Applied Intuition is building a central data and machine learning platform to support all verticals of the company, covering data ingestion from a production fleet, data processing and storage, labeling infrastructure, and machine learning infrastructure.
Requirements
- Strong backend engineering experience
- Hands-on experience with modern data stack technologies such as Apache Spark, Apache Hudi, Trino, Apache Kafka, or similar distributed data processing frameworks
- Knowledge of data lake architectures
- Knowledge of streaming systems
- Knowledge of workflow orchestration platforms such as Flyte (see the sketch after this list)
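For illustration, a minimal sketch of the kind of Flyte workflow referenced above. The flytekit imports are standard, but the task names, the log-parsing step, and the idea of counting extracted frames are hypothetical placeholders, not a description of Applied Intuition's actual pipelines.

```python
# Minimal Flyte workflow sketch (hypothetical tasks; flytekit API only).
from flytekit import task, workflow


@task
def extract_frames(log_uri: str) -> int:
    # Placeholder: a real task would parse a drive log and return
    # the number of frames it extracted.
    return 0


@task
def publish_count(n_frames: int) -> str:
    # Placeholder: report how many frames were processed downstream.
    return f"extracted {n_frames} frames"


@workflow
def ingest_log(log_uri: str) -> str:
    # Flyte wires task outputs to inputs and tracks the execution graph.
    return publish_count(n_frames=extract_frames(log_uri=log_uri))
```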
Responsibilities
- Design and build large-scale data platforms to support our AI research and autonomy stack development, handling petabytes of multimodal sensor data from real-world driving scenarios
- Work on data curation and tagging platforms that enable efficient dataset discovery, labeling workflows, and quality assessment across diverse driving conditions
- Build high-performance data processing systems using modern distributed computing frameworks to transform raw sensor data into training-ready formats
- Work with a technology stack that includes Apache Spark, Apache Hudi, Trino, Apache Kafka, Flyte, Kubernetes, Python, Golang, and Java (a representative pipeline sketch follows this list)
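As a rough illustration of the third responsibility, here is a minimal PySpark sketch that reshapes raw sensor records into a training-ready layout and upserts them into an Apache Hudi table. It assumes the Hudi Spark bundle is on the classpath; the bucket paths, table name, and column names are hypothetical.

```python
# Minimal sketch: raw sensor records -> curated, Hudi-managed frame table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("sensor-curation")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Raw multimodal sensor records landed by ingestion (hypothetical path/schema).
raw = spark.read.parquet("s3://bucket/raw/drive_logs/")

# Light transformation into a training-ready layout: one row per frame,
# partitioned by drive date for efficient discovery.
frames = raw.select("frame_id", "drive_date", "vehicle_id", "camera_uri", "lidar_uri")

hudi_options = {
    "hoodie.table.name": "curated_frames",
    "hoodie.datasource.write.recordkey.field": "frame_id",
    "hoodie.datasource.write.partitionpath.field": "drive_date",
    "hoodie.datasource.write.operation": "upsert",
}

# Upsert so reprocessed drives replace stale rows instead of duplicating them.
frames.write.format("hudi").options(**hudi_options).mode("append").save(
    "s3://bucket/curated/curated_frames/"
)
```

Upserting by record key is one common reason to layer Hudi over a plain data lake: reprocessing a drive log updates existing rows rather than appending duplicates.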
Other
- Problem-solving skills and experience working with cross-functional teams in fast-paced environments
- We are an in-office company, and our expectation is that employees primarily work from their Applied Intuition office 5 days a week. However, we also recognize the importance of flexibility and trust our employees to manage their schedules responsibly. This may include occasional remote work, starting the day with morning meetings from home before heading to the office, or leaving earlier when needed to accommodate family commitments. (Note: For EpiSci job openings, fully remote work will be considered by exception.)