Brain Corp is working to manage and optimize a large-scale fleet of autonomous robots using cloud-based AI and machine learning systems. The goal is efficient data ingestion, model training, and model deployment for autonomous robots operating in commercial environments.
Requirements
- Expert-level knowledge of Google Cloud Platform (GCP) services such as GKE, Dataflow, BigQuery, Cloud Run, Pub/Sub, Vertex AI, and Cloud Storage.
- Strong proficiency in Go, Python, or TypeScript, with an emphasis on maintainable, production-quality code.
- Deep understanding of machine learning pipelines: data ingestion, preprocessing, training, deployment, and inference.
- Experience optimizing GPU workloads, autoscaling, and resource scheduling in cloud environments.
- Proven success in designing high-availability and fault-tolerant distributed systems.
- Hands-on experience with containerization and orchestration technologies (Docker, Kubernetes).
- Familiarity with infrastructure-as-code tools (e.g., Pulumi, Terraform) and CI/CD systems (e.g., Jenkins, GitHub Actions).
Responsibilities
- Lead a team of cloud software engineers, providing technical mentorship, career guidance, and performance management.
- Define and execute the cloud technical roadmap, ensuring alignment with Brain Corp’s business and product goals.
- Architect and implement high-availability, scalable, and secure systems on Google Cloud Platform (GCP) to support machine learning workloads and data ingestion at scale.
- Design, build, and operate ML pipelines that process hundreds of thousands of images daily, enabling rapid model iteration and deployment.
- Develop and optimize GPU resource management strategies, improving model serving throughput, latency, and cost efficiency.
- Build canary and staging environments to ensure safe, progressive deployments and system resilience.
- Collaborate cross-functionally with ML, DevOps, and robotics teams to define APIs, data models, and operational workflows for cloud–robot communication.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.
- 7+ years of professional software engineering experience, including 3+ years in cloud architecture or large-scale distributed systems.
- Proven experience designing and operating GCP-based ML systems at scale.
- Excellent problem-solving, communication, and leadership skills.
- Passion for robotics, automation, and enabling intelligence at scale.