Avride is looking to improve the quality and representativeness of the datasets that power its self-driving systems, with the goal of increasing data efficiency, reducing labeling costs, and improving model performance.
Requirements
- Strong Python skills for algorithm development and prototyping
- Solid understanding of ML concepts (metrics, evaluation, dataset sampling, etc.)
- Experience with data processing and analysis at scale
- Ability to move between research prototyping and production engineering
- Experience with auto-labeling, weak supervision, or human-in-the-loop ML
- Exposure to 3D data (point clouds, sensor fusion, 3D annotation pipelines)
- Experience with workflow orchestration systems (Argo, Airflow, etc.)
Responsibilities
- Design and implement algorithms that optimize annotation, including auto-labeling systems that reduce manual effort and increase throughput
- Build data-mining and active-learning pipelines to surface the highest-value samples for training
- Create dataset-quality monitoring systems that identify noise, redundancy, and low-value data
- Develop analytics platforms (databases, dashboards, reporting) to track dataset quality and coverage over time
- Collaborate with ML and Perception teams to integrate research results into production workflows
- Explore emerging approaches (vision-language models, weak supervision, uncertainty estimation) to improve dataset quality and extend automation
Other
- Bachelor's or Master's degree in Computer Science or a related field
- Authorization to work in the U.S. is required
- No relocation sponsorship available
- No remote work options available
- Strong analytical mindset and curiosity to dig deep into data quality problems