ByteDance is looking to enhance the stability, efficiency, effectiveness, and scalability of its data center and server operations, platforms, and services on a worldwide scale.
Requirements
- Demonstrated proficiency in Linux system administration tasks.
- In-depth comprehension of Linux kernels, drivers, and modules.
- Capable of scripting in Bash and Python to automate routine system operations, including system configuration, performance tuning, and security management within the Linux environment.
- In-depth understanding of server hardware, with the ability to conduct troubleshooting or diagnostics.
- Proficient in customizing operation and maintenance tools to satisfy specific demands for new server hardware.
- Competent in managing the entire software tool lifecycle, ranging from deployment to continuous maintenance.
- Experience in developing and maintaining hardware, network, or service monitoring software for more than 10,000 servers.
Responsibilities
- Contribute to enhancing the stability, efficiency, effectiveness, and scalability of our data center and server operations, platforms, and services on a worldwide scale.
- Participate in and enhance the entire lifecycle of the server fleet - from system design/introduction consultation to launch reviews, deployment, operation, and retirement.
- Develop and deploy tools and solutions to enhance the automation, reliability, scalability, and operability of servers in the datacenter.
- Develop and deploy tools and solutions for improving the availability, latency, and overall service of the datacenter infrastructure, server, and network health.
- Troubleshoot and resolve complex technical issues in a high-pressure, fast-paced environment.
- Conduct high-level root-cause analysis for service interruptions and establish preventive measures.
- Practice sustainable incident response and postmortem.
Other
- Bachelor's degree in Computer Science, Electronic Engineering, relevant technical field, or equivalent practical experience.
- Experience in managing and coordinating teams in a global context.
- 3 years of work experience in a related field.
- An intermediate level of expertise is preferred.
- Proficiency in the operation and maintenance of GPU servers is strongly preferred.