As part of TikTok's Edge SRE team, ensure that infrastructure services are reliable, fault-tolerant, efficiently scalable, and cost-effective.
Requirements
- 3+ years of experience working with Unix/Linux systems, from kernel to shell and beyond, including system libraries, file systems, and client-server protocols.
- 2+ years of experience in one or more programming languages such as Java, C++, or Go, or scripting experience in Shell or Python.
- Experience designing, analyzing, and building automation and tools for large-scale systems.
- Experience with the Hadoop ecosystem (HDFS, YARN, Spark, etc.).
- Experience building solutions with AWS, Google Cloud, Azure, and other cloud services.
- Experience with networking technologies such as TCP/IP, BGP, and DNS in a carrier-grade environment.
- Experience developing and operating one or more of the following systems: OpenStack, Kubernetes, Nginx, IPVS, ELK stack, Hadoop, etc.
Responsibilities
- Build data pipelines, tools, automations, visualizations and monitors to facilitate the operation and optimization of edge services.
- Perform data monitoring and alerting, data quality assurance, and anomaly detection.
- Document team processes and policies, including methods of engagement and SLOs.
- Analyze, design and implement solutions at the system level to remove bottlenecks and improve edge service performance.
- Implement monitoring and alerting to improve issue detection and response.
- Work in a fast-paced environment.
- Participate in technical operations and rotations in response to performance and reliability issues.
Other
- Master's degree (or Bachelor's degree with 2+ years of experience) in Computer Engineering, Electrical Engineering, Computer Science, or a related major.
- Strong analytical skills and the ability to solve real-world problems in a fast-moving environment.
- Self-driven and capable of working with ambiguity and moving projects from concept to delivery.
- Our organization follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department.
- This role requires the ability to work with and support systems designed to protect sensitive data and information. As such, this role will be subject to strict national security-related screening.