Principal Engineer - Site Reliability Engineering - SRE

Toyota

Salary not specified
Sep 13, 2025
Plano, TX, USA

Toyota is looking for a Principal Engineer to drive its Kubernetes, microservices, and containerization strategy, with the goals of ensuring platform resilience, optimizing resource utilization, and enabling seamless disaster recovery and business continuity.

Requirements

  • 7+ years of hands-on experience managing Kubernetes clusters, container orchestration, and microservices deployment in high-performance environments
  • Proven expertise with DevOps automation tools such as GitHub Actions, Terraform, Ansible, Helm, Rancher, and Harness
  • Strong scripting skills in Python or similar languages to build testable automation solutions
  • Deep understanding of monitoring and logging frameworks including Datadog, Splunk, and Prometheus
  • Advanced experience deploying and managing distributed messaging systems like Kafka, RabbitMQ, MQTT, or Amazon Kinesis
  • Experience with hybrid cloud/on-premises infrastructure, including VMware and AWS services
  • Familiarity with business process mining tools (Celonis, SAP Signavio, UiPath) and project management platforms (Jira, MS Project)

Responsibilities

  • Own the end-to-end management of Kubernetes clusters across on-premises and cloud environments, ensuring high availability and performance
  • Design, deploy, and maintain scalable microservices using Helm charts, GitOps tools like Argo CD, and CI/CD pipelines built with GitHub Actions and Terraform
  • Troubleshoot and resolve complex issues spanning cluster components, networking, storage, and application layers to minimize downtime
  • Implement and enforce security best practices to protect containerized environments and applications
  • Monitor system health and resource usage using tools like Datadog, Splunk, and Prometheus, driving continuous performance improvements
  • Collaborate closely with infrastructure, networking, security, and application teams to align solutions with business needs and accelerate delivery
  • Lead incident response efforts and conduct post-mortem analyses to prevent future disruptions

Other

  • Bachelor’s degree, or equivalent experience, providing a strong foundation in software engineering, systems administration, or a related field
  • Excellent analytical, problem-solving, and communication skills, with a collaborative mindset to work effectively across teams
  • Experience with incident management platforms and leading cross-functional incident response
  • Ability to work in an environment built on teamwork, flexibility, and respect
  • Toyota does not offer sponsorship of job applicants for employment-based visas or any other work authorization for this position at this time