Elasticsearch needs to improve its distributed systems to handle scale, performance, and resilience. The work focuses on how nodes communicate, how data is indexed, allocated, and replicated across nodes, and how the cluster coordination system maintains high performance and system safety as nodes and data move.
Requirements
- You have a strong background in distributed systems and consensus algorithms.
- You have strong skills in core Java and are conversant in the standard library of data structures and concurrency constructs, as well as newer language features.
- You have deep technical proficiency in algorithms.
- You have shown your ability to understand and work on complex, highly distributed systems.
- You demonstrate the ability to build and debug features with a broad impact, running on multiple machines.
- You have experience with data stores.
- You have experience with asynchronous event-driven network frameworks such as Netty.
Responsibilities
- Improving Elasticsearch’s components that support concurrent and consistent indexing across multiple machines.
- Maintaining our cluster coordination system so that performance stays high as nodes join and leave the cluster and data moves around, while preserving the safety and liveness properties of the system as a whole.
- Pushing the limits on the number of shards, nodes, and petabytes that Elasticsearch can handle today.
- Investigating issues of all kinds, including performance and concurrency problems, and proposing solutions.
- Helping our support engineers with their harder problems.
Other
- You are able to own projects from beginning to end. This covers both technical design and working with others to develop needed components.
- You have experience managing projects involving multiple engineers.
- Competitive pay based on the work you do here and not your previous salary
- Health coverage for you and your family in many locations
- Ability to craft your calendar with flexible locations and schedules for many roles