Backflip is building a foundation model for mechanical design so that hundreds of millions of people can create physical objects as easily as imagining them. Today, mechanical design is a bottleneck in the physical world because CAD experts are scarce.
Requirements
- You’ve built and maintained ML data pipelines at scale, ideally for foundation or generative models, that shipped into production in the real world.
- You have deep experience with data engineering for ML, including distributed systems, ETL (data extraction, transformation, and loading), and large-scale data processing (e.g. PySpark, Beam, Ray, or similar).
- You’re fluent in Python and experienced with ML frameworks and data formats (Parquet, TFRecord, HuggingFace datasets, etc.).
- You’ve developed data augmentation, sampling, or curation strategies that improved model performance.
- You are comfortable working with a variety of complex data formats, e.g. for 3D geometry kernels or rendering engines.
- You have an interest in math, geometry, topology, rendering, or computational geometry.
- You’ve worked in 3D printing, CAD, or computer graphics domains.
Responsibilities
- Architect and own Backflip’s ML data pipeline, from ingestion to processing to evaluation.
- Define data strategy: establish best practices for data augmentation, filtering, and sampling at scale.
- Design scalable data systems for multimodal training (text, geometry, CAD, and more).
- Develop and automate data collection, curation, and validation workflows.
- Collaborate with MLEs to design and execute experiments that measure and improve model performance.
- Build tools and metrics for dataset analysis, monitoring, and quality assurance.
- Contribute to model development through insights grounded in data, shaping what, how, and when we train.
Other
- This is a core leadership role within the AI team.
- You think like both an engineer and an experimentalist: curious, analytical, and grounded in evidence.
- You collaborate well across AI development, infra, and product, and enjoy building the data systems that make great models possible.
- You care deeply about data quality, reproducibility, and scalability.
- You’re excited to help shape the future of AI for physical design.