Production System Engineer

TikTok

$87,480 - $228,000

Aug 22, 2025

San Jose, CA, USA

ByteDance is looking to enhance the stability, efficiency, effectiveness, and scalability of its data center and server operations, platform, and service on a worldwide scale.

Requirements

Demonstrated proficiency in Linux system administration tasks.
In-depth understanding of Linux kernels, drivers, and modules.
Capable of scripting in Bash and Python to automate routine system operations, including system configuration, performance tuning, and security management in a Linux environment.
In-depth understanding of server hardware, with the ability to troubleshoot and run diagnostics.
Proficient in customizing operation and maintenance tools to satisfy specific demands for new server hardware.
Competent in managing the entire software tool lifecycle, ranging from deployment to continuous maintenance.
Experience in developing and maintaining hardware, network, or service monitoring software for more than 10,000 servers.

Responsibilities

Contribute to enhancing the stability, efficiency, effectiveness, and scalability of our data center and server operations, platform, and service on a worldwide scale.
Participate in and enhance the entire lifecycle of the server fleet - from system design/introduction consultation to launch reviews, deployment, operation, and retirement.
Develop and deploy tools and solutions to enhance the automation, reliability, scalability, and operability of servers in the datacenter.
Develop and deploy tools and solutions for improving the availability, latency, and overall service of the datacenter infrastructure, server, and network health.
Troubleshoot and resolve complex technical issues in a high-pressure, fast-paced environment.
Conduct high-level root-cause analysis for service interruptions and establish preventive measures.
Practice sustainable incident response and postmortems.

Other

Bachelor's degree in Computer Science, Electronic Engineering, relevant technical field, or equivalent practical experience.
Experience in managing and coordinating teams in a global context.
3 years of work experience in a related field.
An intermediate level of expertise is preferred.
Proficiency in the operation and maintenance of GPU servers is strongly preferred.