Principal Engineer - Site Reliability Engineering - SRE

Toyota

Salary not specified

Sep 13, 2025

Plano, TX, USA

Toyota is seeking a Principal Engineer to drive its Kubernetes, microservices, and containerization strategy: ensuring platform resilience, optimizing resource utilization, and enabling seamless disaster recovery and business continuity.

Requirements

7+ years of hands-on experience managing Kubernetes clusters, container orchestration, and microservices deployment in high-performance environments
Proven expertise with DevOps automation tools such as GitHub Actions, Terraform, Ansible, Helm, Rancher, and Harness
Strong scripting skills in Python or similar languages to build testable automation solutions
Deep understanding of monitoring and logging frameworks including Datadog, Splunk, and Prometheus
Advanced experience deploying and managing distributed messaging systems like Kafka, RabbitMQ, MQTT, or Amazon Kinesis
Experience with hybrid cloud/on-premises infrastructure, including VMware and AWS services
Familiarity with business process mining tools (Celonis, SAP Signavio, UiPath) and project management platforms (Jira, MS Project)

Responsibilities

Own the end-to-end management of Kubernetes clusters across on-premises and cloud environments, ensuring high availability and performance
Design, deploy, and maintain scalable microservices using Helm charts, GitOps tools like Argo CD, and CI/CD pipelines built with GitHub Actions and Terraform
Troubleshoot and resolve complex issues spanning cluster components, networking, storage, and application layers to minimize downtime
Implement and enforce security best practices to protect our containerized environments and applications
Monitor system health and resource usage using tools like Datadog, Splunk, and Prometheus, driving continuous performance improvements
Collaborate closely with infrastructure, networking, security, and application teams to align solutions with business needs and accelerate delivery
Lead incident response efforts and conduct post-mortem analyses to prevent future disruptions

Other

Bachelor’s degree or equivalent experience providing a strong foundation in software engineering, systems administration, or related fields
Excellent analytical, problem-solving, and communication skills, with a collaborative mindset to work effectively across teams
Experience with incident management platforms and leading cross-functional incident response
Ability to work in an environment built on teamwork, flexibility, and respect
Toyota does not offer sponsorship of job applicants for employment-based visas or any other work authorization for this position at this time