CloudBees is seeking an Engineering Leader to drive its Agentic & AI Operations (AIOps) strategy. The role focuses on developing an AI platform for fine-tuning, deploying, and managing AI, ML, and Agentic Services, with the goal of simplifying complexity and helping organizations deliver better software, faster.
Requirements
- 3 years of experience with large-scale systems, with a focus on reliability, scalability, and maintainability, plus 1 year of experience with AI/ML systems.
- Strong hands-on experience with MLOps tools (e.g., MLflow, Kubeflow, SageMaker, Airflow, Metaflow).
- Proven track record building ML pipelines in production environments.
- Experience with cloud infrastructure (AWS, GCP, or Azure) and container orchestration (Kubernetes).
- Deep knowledge of CI/CD practices as they relate to the ML lifecycle.
- Experience deploying and managing LLM services such as Amazon Bedrock or Vertex AI.
- Familiarity with data observability and ML monitoring tools (e.g., EvidentlyAI, Prometheus/Grafana for models).
Responsibilities
- Lead and scale a team responsible for AIOps, including model deployment, monitoring, and lifecycle management.
- Architect and implement AI/ML pipelines that are scalable, observable, and reproducible.
- Collaborate with cross-functional teams (data science, DevOps, product) to integrate AI/ML systems into our SaaS platform.
- Establish best practices for AI/ML experimentation, CI/CD for models, data versioning, and model governance.
- Own the full stack of AIOps infrastructure, from data ingestion to real-time inference systems.
- Drive technical vision and roadmap for ML platform development.
- Launch new platforms from 0 to 1 and drive adoption internally and externally with partner teams.
Other
- 7+ years of engineering experience in platform engineering, system development, or related roles, including at least 3 years in leadership roles.
- Prior experience in a startup or fast-paced SaaS environment.
- Strong collaboration and communication skills.
- Act as a mentor and coach, helping engineers grow in a fast-paced startup environment.
- Manage a team of 5+ engineers.