Reddit is looking to hire a Senior Software Engineer to work on its Compute team, focusing on platform engineering and cluster engineering to improve the availability, scalability, latency, and efficiency of Reddit's Compute Platform.
Requirements
- 4+ years of experience developing internet-scale software, preferably in the context of infrastructure.
- Proficiency in Go.
- Experience developing on top of Kubernetes or similar distributed systems.
- Kubernetes controller or operator development experience is a huge plus.
- Proficiency operating Linux, with a solid understanding of cgroups, namespaces, and other multi-tenancy primitives.
- Strong troubleshooting skills across both systems and software.
- Experience engineering large systems, tracking work, and driving projects as a self-starter.
Responsibilities
- Software automation that creates, manages, and destroys clusters in our fleet.
- APIs and controllers that support multi-cluster deployment and scheduling mechanics.
- Core SDKs that enable controller development in the larger organization.
- Software that codifies out-of-cluster ancillary concerns such as network configurations and managed services.
- Software that detects node-level performance characteristics and makes availability decisions based on that data.
- Schedulers that pack resources more efficiently and reschedule reactively as compute availability changes.
- Kubernetes controllers that offer APIs in the cluster and perform reconciliation to reach a desired state.
Other
- Work collaboratively with a team of software engineers to create and maintain the foundational platform for running Reddit’s infrastructure.
- Contribute feedback to the technical and strategic direction of the compute platform.
- Share on-call responsibilities with the Compute team.
- Communicate clearly and collaborate effectively with a service-oriented team and company.