Serve Robotics aims to reduce urban congestion and make delivery more efficient by developing and deploying a fleet of autonomous sidewalk robots. The company needs to build and maintain a robust data and ML platform that supports the growth and efficiency of these robotic deliveries and enables data partnerships, ML research, and autonomy engineering.
Requirements
- 3+ years of industry experience building, running, and improving large-volume data processing, feature extraction, and data annotation workflows
- Experience building data mining and search capabilities
- Experience with both Python and SQL is required
- Solid understanding of data distributions and their impact on ML models
- Hands-on experience with and a good understanding of LLMs, VLMs, embeddings, and vector databases
- Experience with data annotation tools and providers such as CVAT, Labelbox, and Label Studio
- Experience integrating cloud inference platforms for LLMs/VLMs (ChatGPT, Gemini, etc.)
Responsibilities
- Develop and maintain highly scalable data processing pipelines for data curation, annotation, search, and ML feature extraction.
- Build data discovery features for the platform.
- Create and maintain search features such as natural language querying.
- Develop and maintain our orchestration and scheduling systems.
- Maintain and evolve our data schemas, such as the unified data attribute system and scenario tagging and management.
- Build integrations with annotation providers to efficiently review large-scale data pre-annotations.
- Collaborate with autonomy engineers to collect feedback, improve documentation, and run tutorials on platform features.
Other
- BS or MS in computer science with a focus on data engineering and/or machine learning
- Experience with robotics systems