Agtonomy transforms heavy machinery into intelligent, autonomous systems, addressing challenges in agriculture, turf, and beyond. We are scaling our ML platform to accelerate model iteration and improve training performance across an expanding GPU ecosystem.
Requirements
- Experience with Python, Docker, Kubernetes, and Infrastructure as Code (e.g., Terraform).
- Hands-on experience with data pipelines, ETL processes, and distributed computing in cloud environments (AWS, GCP, or similar).
- You’ve wrangled massive datasets and built systems to organize, label, and evaluate them at scale; come with examples!
- Experience working with data from multiple sensor modalities, such as cameras, LiDAR, and radar.
- You’ve benchmarked complex systems or large-scale ML models, finding failure modes and turning them into wins.
- Familiarity with NVIDIA TensorRT or similar tools for optimizing ML inference.
Responsibilities
- Architect and build distributed training pipelines that scale to handle petabytes of real-world data from farms, fields, and other rugged environments.
- Own the ML lifecycle: curate, label, and visualize massive datasets from cameras, LiDAR, and radar to train world-class models.
- Implement metrics and tags to provide a holistic understanding of model performance and enable the discovery of interesting scenarios for training and evaluation.
- Create tools to visualize predictions and identify failure cases.
- Partner with autonomy, platform, and cloud engineers to shape models that run flawlessly on real machines in harsh environments.
Other
- At least 3 years of experience building systems that matter.
- A knack for thriving in a fast-paced, collaborative startup where you’ll own big problems and deliver bigger solutions.
- A collaborative work environment alongside a passionate, mission-driven team!
Interview Process
- Phone Screen with Hiring Manager (30 minutes)
- Technical Evaluation in Domain (1 hour)