ByteDance builds and manages large-scale, highly distributed systems, and is seeking talented individuals to join its Data Infrastructure Site Reliability Engineering (SRE) team to design, develop, and operate cloud-managed, scalable, and reliable infrastructure components.
Requirements
- Experience programming in at least one of the following languages: C, C++, Java, Python, Go, or Rust
- Familiarity with Unix/Linux system internals, networking, and distributed systems
- Experience with MySQL, Redis, Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, Flink, etc.
- Experience in designing and analyzing large-scale distributed systems
- Strong problem-solving skills
Responsibilities
- Participate in and enhance the complete service lifecycle, from inception and design, through development, capacity planning, launch reviews, deployment, operation, and refinement.
- Design and implement software platforms and monitoring frameworks to govern service-oriented architecture (SOA) efficiently, automatically, and intelligently.
- Develop and manage components of cloud-managed data infrastructure, encompassing technologies such as Kubernetes, Redis, MySQL, Flink, and more.
- Establish sustainable mechanisms for scaling systems, such as automation, to drive enhancements in reliability, efficiency, and velocity.
- Provide sustainable user support, manage incident responses, and conduct blameless postmortems as part of our ongoing efforts to improve our systems.
Other
- Bachelor's degree or above in Computer Science or a related technical field
- Commit to an onboarding date by the end of 2026
- Ability to communicate effectively
- Ability to work in a team environment
- Must be able to commit to the onboarding date and clearly state availability and graduation date in the resume