Site Reliability Engineer - AML Global Recommendation - USDS

TikTok

$118,657 - $259,200
Sep 18, 2025
San Jose, CA, USA

This Site Reliability Engineering (SRE) role at TikTok develops and runs a massively distributed AI/ML recommendation system serving the United States and the rest of the world, with a focus on high availability, scalability, and fault tolerance.

Requirements

  • Expertise in analyzing and troubleshooting Linux-based distributed systems.
  • Experience programming with at least one commonly used language (C, C++, Python, Go).
  • Strong understanding of data structures and algorithms.
  • Competent knowledge of relational database systems.
  • Ability to design and maintain large-scale systems.
  • Strong understanding of code optimization and routine task automation.
  • Proficiency in at least one machine learning framework: TensorFlow, PyTorch, MXNet, or PaddlePaddle.

Responsibilities

  • Design, build, and maintain highly available, scalable, and fault-tolerant systems.
  • Monitor and analyze system performance, identifying and resolving issues before they affect users.
  • Develop and maintain automated monitoring, alerting, and incident response systems.
  • Collaborate closely with software engineering teams to ensure that applications are designed with reliability, scalability, and performance in mind.
  • Implement and maintain security best practices and ensure compliance with regulatory requirements.
  • Participate in on-call rotations and respond to issues and incidents both during and outside normal business hours.
  • Conduct root cause analysis of incidents, hold post-mortem reviews with stakeholders, and implement preventative measures to reduce the risk of similar incidents recurring.

Other

  • Bachelor's/Master's degree in Computer Science, Computer Engineering, or equivalent years of experience in an SRE or software engineering role.
  • Hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department.
  • Ability to interact and occasionally have unsupervised contact with internal/external clients and/or colleagues.
  • Ability to appropriately handle and manage confidential information, including proprietary and trade secret information, and access to information technology systems.
  • Ability to exercise sound judgment.