CapCut and TikTok are tackling complex challenges of scale, applying expertise in coding, algorithms, complexity analysis, and large-scale system design to build and run massively distributed, fault-tolerant systems.
Requirements
- Proven work experience as a Site Reliability Engineer, Systems Engineer, or similar software engineering role.
- Proficiency in high-level programming languages (e.g., Python, Go, Java) and shell scripting.
- Experience in network architecture, database modeling, cloud systems, and large-scale distributed systems.
- Strong understanding of Linux operating systems and open-source technologies.
- Experience with MySQL, Redis, Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, etc.
- Knowledge of monitoring tools (such as Prometheus and Grafana) and monitoring methodologies.
- Excellent problem-solving skills, strategic thinking, and a strong ability to debug complex systems.
Responsibilities
- Develop and maintain automation procedures to maximize system efficiency and minimize human intervention.
- Work closely with software engineering teams to design, deploy, and operate system components so that they are functionally robust.
- Ensure systems scale to handle growth in web traffic and data volume.
- Implement monitoring tools and set up metrics to keep track of system health and performance.
- Participate in on-call rotations, assist with incident management, and diagnose, resolve, and prevent production issues.
- Conduct performance tests to find and address system bottlenecks.
- Collaborate with teams across the organization to define Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
Other
- Bachelor's degree in Computer Science, Information Technology, or a related field, and 3+ years of relevant experience.
- Hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department.
- Ability to work with and support systems designed to protect sensitive data and information.
- Must be eligible for strict national security-related screening.
- Excellent communication skills and the ability to effectively collaborate with cross-functional teams.