Enable seamless development, deployment, and monitoring of machine learning models at scale on Google Cloud Platform (GCP)
Requirements
- Proficiency in programming languages such as Python
- Expertise in GCP services, including Vertex AI, Google Kubernetes Engine (GKE), Cloud Run, BigQuery, Cloud Storage, and Cloud Composer
- Experience with infrastructure-as-code tools such as Terraform
- Familiarity with containerization (Docker, GKE) and CI/CD pipelines (GitLab, Bitbucket)
- Knowledge of ML frameworks (TensorFlow, PyTorch, scikit-learn) and MLOps tools compatible with GCP (MLflow, Kubeflow)
Responsibilities
- Design and implement pipelines for deploying machine learning models into production using GCP services
- Build and maintain scalable GCP-based infrastructure using services like Google Compute Engine, Google Kubernetes Engine (GKE), and Cloud Storage
- Develop automated workflows for data ingestion, model training, validation, and deployment using GCP tools
- Implement monitoring solutions using Cloud Monitoring and Cloud Logging to track model performance, data drift, and system health
- Manage versioning of datasets, models, and code using GCP tools like Artifact Registry or Cloud Storage
- Optimize model performance and resource utilization on GCP, leveraging containerization with Docker and GKE
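To illustrate the monitoring responsibility above, a minimal sketch of a data-drift check is shown below. It compares the mean of a recent serving batch against a training baseline; in a real deployment this signal would typically be exported as a custom metric to Cloud Monitoring and evaluated with a proper statistical test. All names here (`detect_drift`, the sample data) are hypothetical and use only the Python standard library, not a GCP client.

```python
import statistics

def detect_drift(baseline, current, threshold=3.0):
    """Flag drift when the current batch mean deviates from the
    baseline mean by more than `threshold` standard errors.

    Hypothetical helper for illustration only; a production check
    would use a dedicated test (e.g. KS or PSI) and emit the score
    as a Cloud Monitoring metric rather than return a bool.
    """
    mu = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)
    se = sigma / len(current) ** 0.5  # standard error of the batch mean
    z = abs(statistics.fmean(current) - mu) / se
    return z > threshold

# Baseline feature distribution: values 0..9, mean 4.5.
baseline = [float(x % 10) for x in range(1000)]
stable = [4.0, 5.0, 4.5, 4.2, 4.8] * 20    # batch centered on the baseline mean
shifted = [9.0, 9.5, 8.8, 9.2, 9.1] * 20   # clearly shifted batch

print(detect_drift(baseline, stable))   # False: no drift detected
print(detect_drift(baseline, shifted))  # True: drift detected
```

The z-score-on-the-mean heuristic keeps the sketch dependency-free; the same structure holds if the comparison is swapped for a two-sample test from SciPy or a library such as Evidently.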
Other
- Strong problem-solving and analytical skills
- Excellent communication and collaboration abilities
- Ability to work in a fast-paced, cross-functional environment