Cognitiv is seeking a Machine Learning Engineer to help build and scale the next generation of its ML infrastructure as it transitions from a legacy platform to a modern, automated, and highly scalable system. The goal is to improve the automation, reliability, and performance of its Deep Learning Advertising Platform.
Requirements
- Python (primary), PyTorch Lightning
- AWS, Docker, Apache Airflow
- ClickHouse, S3, distributed data processing tools
- Deep Learning, LLMs, Hugging Face ecosystem
- Strong Coder. You write clean, efficient, and scalable code in Python.
- ML Systems Builder. You’ve designed or maintained ML pipelines, automation, or MLOps systems, and have a solid grasp of model training, deployment, and monitoring in production.
- Distributed Data Expert. You’re experienced with distributed data processing (e.g., PySpark) and understand how to scale workflows efficiently.
Responsibilities
- Contribute to the design, development, and automation of ML workflows across data ingestion, training, deployment, and monitoring.
- Build and maintain scalable data pipelines that support high-volume model training and evaluation.
- Partner with senior engineers to optimize system performance and reduce operational bottlenecks.
- Write clean, production-level Python code and participate in code reviews to maintain quality and consistency.
- Help improve monitoring and observability across ML pipelines to ensure reliability in production.
Other
- Collaborate closely with Product, Engineering, and ML Research teams to deliver reliable, high-impact systems.
- Collaborative Engineer. You communicate clearly, thrive in cross-functional environments, and take pride in building reliable, well-architected systems.
- In-Person Collaborator. You’re available to work onsite Monday through Wednesday in San Mateo, partnering closely with peers to accelerate progress.
- Hybrid work model & daily team lunch
- Comprehensive onboarding (Cognitiv University), and more!