Field AI is transforming how robots interact with the real world. We build risk-aware, reliable, and field-ready AI systems that address the most complex challenges in robotics and unlock the full potential of embodied intelligence.
Requirements
- Strong expertise in computer vision, video understanding, temporal modeling, and VLMs.
- Proficiency in Python and PyTorch with production-level coding skills.
- Experience building pipelines for large-scale video/image datasets.
- Familiarity with AWS or other cloud platforms for ML training and deployment.
- Understanding of MLOps best practices (CI/CD, experiment tracking).
- Hands-on experience fine-tuning open-source multimodal models with tools such as Hugging Face, DeepSpeed, vLLM, FSDP, and LoRA/QLoRA.
- Knowledge of precision tradeoffs (FP16, bfloat16, quantization) and multi-GPU optimization.
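To make the LoRA and precision bullets above concrete, here is a minimal, dependency-free sketch of why low-rank adapters and reduced precision cut trainable-parameter count and memory. All function names and layer sizes are illustrative, not part of the role:

```python
# Approximate bytes per weight at common precisions (int4 via quantization).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int4": 0.5}

def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when fine-tuning the full weight matrix W (d_out x d_in)."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a LoRA adapter: W is frozen and the update
    is factored as B @ A, with A of shape (rank x d_in) and B of shape (d_out x rank)."""
    return rank * d_in + d_out * rank

def weight_bytes(n_params: int, dtype: str) -> float:
    """Approximate memory needed to store n_params weights at a given precision."""
    return n_params * BYTES_PER_PARAM[dtype]

# Illustrative example: one 4096x4096 projection layer, LoRA rank 16.
full = full_params(4096, 4096)      # 16,777,216 trainable params
lora = lora_params(4096, 4096, 16)  # 131,072 trainable params (under 1% of full)
```

The same arithmetic explains why bf16 halves optimizer and weight memory versus fp32, and why 4-bit quantization (as in QLoRA) shrinks frozen base weights roughly 8x while the small LoRA factors stay in higher precision.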
Responsibilities
- Train and fine-tune million- to billion-parameter multimodal models, with a focus on computer vision, video understanding, and vision-language integration.
- Track state-of-the-art research, adapt novel algorithms, and integrate them into FiFM.
- Curate datasets and develop tools to improve model interpretability.
- Build scalable evaluation pipelines for vision and multimodal models.
- Contribute to model observability, drift detection, and error classification.
- Fine-tune and optimize open-source VLMs and multimodal embedding models for efficiency and robustness.
- Build and optimize multi-vector RAG pipelines with vector databases and knowledge graphs.
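The multi-vector retrieval idea behind the last bullet can be sketched with a minimal, dependency-free late-interaction ("MaxSim") scorer, the core scoring rule used by multi-vector retrievers such as ColBERT. Function names and data are illustrative; a production pipeline would score against a vector database index rather than this in-memory loop:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: each query vector takes its best-matching
    document vector, and the per-vector maxima are summed."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)

def rank(query_vecs, corpus):
    """Rank documents (id -> list of embedding vectors) by MaxSim, best first."""
    scores = {doc_id: maxsim(query_vecs, vecs) for doc_id, vecs in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Because every document keeps multiple vectors (e.g., one per token or image patch), MaxSim preserves fine-grained matches that a single pooled embedding would average away.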
Other
- Master’s/Ph.D. in Computer Science, AI/ML, Robotics, or equivalent industry experience.
- 2+ years of industry experience or relevant publications in CV/ML/AI.
- Ability to design scalable evaluation pipelines for vision/VLMs and agent performance.
- Ability to work with a world-class team that thrives on creativity, resilience, and bold thinking.
- Willingness to work in a hybrid or remote environment.