At Motional, we're transforming how autonomous vehicles discover critical intelligence hidden within petabytes of multimodal sensor data. Our next-generation autonomous driving stack depends on finding the rare edge cases, long-tail scenarios, and model errors that matter most. OmniTag, our ML-powered multimodal data mining framework, is the engine that powers this discovery.
Requirements
- Deep, hands-on expertise with Ray or Spark (or both) for distributed data processing and large-scale inference workloads
- Expert-level Python proficiency with strong software engineering fundamentals: testing (unit, integration, and end-to-end), CI/CD pipelines, containerization, and code review practices
- Proven experience optimizing and scaling production data pipelines that process terabytes or petabytes of data
- Strong SQL and data manipulation skills; comfort with both structured and semi-structured data
- Experience with cloud infrastructure (AWS preferred: S3, EC2, EKS, EMR, IAM) and infrastructure-as-code patterns
- Demonstrated track record of shipping robust, well-tested, production-grade systems and mentoring junior engineers
- Experience building or scaling vector databases, large-scale information retrieval systems, or similarity search engines
Responsibilities
- Design and build the high-throughput, low-latency backend systems that execute billion-scale inference across Ray/Spark, transforming raw sensor data into unified multimodal representations.
- Own the complete data journey, from ingestion, normalization, and preprocessing of heterogeneous modalities (image, video, LiDAR, audio) through encoding, indexing, and cached embedding storage.
- Enhance our in-house billion-scale vector search engine to power RAG-driven few-shot dataset creation.
- Build comprehensive monitoring, logging, and alerting for multimodal data preprocessing pipelines.
- Work closely with ML engineers to support domain-specific fine-tuning workflows, model versioning, and A/B testing of new encoders and decoders.
- Establish patterns for graceful degradation, fault tolerance, and cost optimization.
- Operate OmniTag as a mission-critical data platform serving the entire ML organization, with a focus on reliability, debuggability, and operational excellence.
Other
- BS in Computer Science or a related field, or equivalent professional experience
- 6+ years designing, building, and operating large-scale distributed systems in production environments
- We encourage a hybrid schedule, with in-office time at one of our locations in Boston, Pittsburgh, or Las Vegas to support collaboration; this role can also be fully remote.
- Motional is a driverless technology company making autonomous vehicles a safe, reliable, and accessible reality.
- We believe in building a great place to work through a progressive, global culture that is diverse and inclusive and that ensures people feel valued at every level of the organization.