May Mobility is transforming cities through autonomous technology to create a safer, greener, more accessible world. We are experiencing a period of significant growth as we expand our autonomous shuttle and mobility services nationwide, and we are seeking a talented Data Engineer who specializes in building highly scalable, reliable, and performant data pipelines to support our automated tagging and analysis systems.
Requirements
- Expert proficiency in PySpark and distributed computing frameworks for processing petabyte-scale datasets.
- Deep working knowledge of cloud ecosystems (AWS, GCP, or Azure) and modern data lake/warehouse technologies.
- Demonstrated experience in migrating prototype data pipelines/scripts into fully managed, production-ready, fault-tolerant ETL systems.
- Ability to work with high-volume, multimodal sensor data and an understanding of the complexities of temporal alignment.
- Experience implementing strategies to reliably synchronize generated metadata and tags (e.g., tags created during real-time processing and during post-processing analysis); see the sketch after this list.
- Experience working with data formats common in the AV/Robotics space (e.g., ROS bags, Protobuf, custom logging formats).
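As a purely illustrative sketch of the tag-synchronization work referenced above (not a description of May Mobility's actual pipeline), the PySpark snippet below merges tags from a real-time tagger and a post-processing pass and keeps the most recently created record per tag. All paths, column names, and the "latest writer wins" policy are assumptions.

```python
# Hypothetical sketch: reconciling real-time and post-processing tags in PySpark.
# Schemas, column names, and paths are illustrative assumptions only.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("tag-sync-sketch").getOrCreate()

# Assume both sources share a schema: (log_id, timestamp_ns, tag, source, created_at).
realtime_tags = spark.read.parquet("s3://example-bucket/tags/realtime/")        # hypothetical path
postproc_tags = spark.read.parquet("s3://example-bucket/tags/postprocessing/")  # hypothetical path

all_tags = realtime_tags.unionByName(postproc_tags)

# For each (log_id, timestamp_ns, tag), keep the most recently created record,
# so a post-processing pass can refine or overwrite a tag emitted in real time.
w = Window.partitionBy("log_id", "timestamp_ns", "tag").orderBy(F.col("created_at").desc())

synced_tags = (
    all_tags
    .withColumn("rank", F.row_number().over(w))
    .filter(F.col("rank") == 1)
    .drop("rank")
)

synced_tags.write.mode("overwrite").partitionBy("log_id").parquet(
    "s3://example-bucket/tags/synced/"  # hypothetical output location
)
```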
Responsibilities
- Design, build, and optimize high-throughput ETL pipelines using PySpark and cloud services to manage the flow of multimodal AV sensor logs.
- Collaborate directly with ML Engineers to productionize, scale, and performance-tune the model inference pipelines, focusing on maximizing data throughput and minimizing operational costs.
- Implement robust data quality checks, schema validation, and monitoring on all raw input data and on the structured, searchable metadata derived from it (see the sketch after this list).
- Identify bottlenecks in data movement and processing, and improve the speed and efficiency of both data preparation and downstream data retrieval for dashboards and data search functionality.
- Serve as the liaison between the Data Science teams and the Data Platform team, advocating for and implementing infrastructure improvements necessary for long-term scalability and reliability.
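As a hypothetical example of the data quality and schema validation responsibility above (field names, thresholds, and paths are assumptions, not the team's actual checks), the sketch below applies an expected schema at read time and quarantines rows that fail basic checks before they reach the searchable metadata store.

```python
# Hypothetical sketch: schema enforcement and basic quality checks on incoming
# sensor-log metadata. All names, thresholds, and paths are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType, DoubleType

spark = SparkSession.builder.appName("metadata-quality-sketch").getOrCreate()

expected_schema = StructType([
    StructField("log_id", StringType(), nullable=False),
    StructField("timestamp_ns", LongType(), nullable=False),
    StructField("sensor", StringType(), nullable=False),
    StructField("confidence", DoubleType(), nullable=True),
])

# Catch schema drift early by applying the expected schema at read time.
raw = spark.read.schema(expected_schema).json("s3://example-bucket/metadata/raw/")  # hypothetical path

# Split records into valid rows and quarantined rows that fail basic checks.
checks = (
    F.col("log_id").isNotNull()
    & F.col("timestamp_ns").isNotNull()
    & (F.col("confidence").isNull() | F.col("confidence").between(0.0, 1.0))
)

valid = raw.filter(checks)
quarantine = raw.filter(~checks)

valid.write.mode("append").parquet("s3://example-bucket/metadata/validated/")        # hypothetical
quarantine.write.mode("append").parquet("s3://example-bucket/metadata/quarantine/")  # hypothetical

# Simple monitoring signal: quarantine rate per batch, which could feed alerting.
print(f"quarantined {quarantine.count()} of {raw.count()} records")
```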
Other
- 5+ years of professional experience in a Data Engineering role, specifically focused on supporting complex analytics initiatives and machine learning.
- Experience collaborating cross-functionally to define service level objectives for pipeline uptime, latency, and data freshness.
- Clear written communication and the ability to align stakeholders on a plan before executing it.
- Excellent attention to detail and rigorous testing methodology.
- Ability to identify complex problems and devise optimal, innovative solutions that often cross organizational boundaries.