Principal Engineer MLOps - DLP Detection

Palo Alto Networks

Salary not specified

Sep 30, 2025

Santa Clara, CA, USA

The company is looking for a principal engineer to lead the design, development, and operation of production-grade machine learning infrastructure at scale. The role involves architecting robust pipelines, deploying and monitoring ML models, and ensuring reliability, reproducibility, and governance across the company's AI/ML ecosystem.

Requirements

Strong programming skills (Python, Go, or Java) with deep expertise in building production systems
Experience with cloud platforms (AWS, GCP, Azure) and with containerization and orchestration (Docker, Kubernetes)
Proven experience in ML infrastructure, including model serving (TensorFlow Serving, TorchServe, Triton) and workflow orchestration (Airflow, Kubeflow, MLflow, Ray, Vertex AI, SageMaker)
Hands-on experience with CI/CD pipelines, infrastructure-as-code (Terraform, Helm), and monitoring/observability tools (Prometheus, Grafana, ELK/EFK stack)
Strong knowledge of data pipelines, feature stores, and streaming systems (Kafka, Spark, Flink)
Understanding of model monitoring, drift detection, retraining pipelines, and governance frameworks

Responsibilities

Lead MLOps architecture: Design and implement scalable ML platforms, CI/CD pipelines, and deployment workflows across cloud and hybrid environments
Operationalize ML models: Build automated systems for training, testing, deployment, monitoring, and rollback of ML models in production
Ensure reliability and governance: Implement model versioning, reproducibility, auditing, and compliance best practices
Drive observability & monitoring: Develop real-time monitoring, alerting, and logging solutions for ML services, ensuring performance, drift detection, and system health
Champion automation & efficiency: Reduce friction between data science, engineering, and operations by introducing best practices for infrastructure-as-code, container orchestration, and continuous delivery
Collaborate cross-functionally: Partner with ML engineers, data scientists, security teams, and product engineering to deliver robust, production-ready AI systems
Lead innovation in MLOps: Evaluate and introduce new tools, frameworks, and practices that elevate the scalability, reliability, and security of ML operations

Other

In office 3 days a week. Not a remote role.
Ability to influence cross-functional stakeholders, define best practices, and mentor engineers at all levels
Passion for operational excellence, scalability, and securing ML systems in mission-critical environments