Lila Sciences is seeking an ML Engineer specializing in distributed and scalable training to design and maintain large-scale training systems, optimize performance for massive models, and integrate cutting-edge techniques that improve efficiency and throughput for its scientific superintelligence platform.
Requirements
- Proven experience with distributed ML training frameworks (Megatron-LM, TorchTitan, DeepSpeed, Ray); see the sketch after this list for the kind of fluency involved
- Strong software engineering skills in Python; C++ kernel contributions are a plus
- Understanding of large-scale model training techniques (e.g., data, tensor, and pipeline parallelism; mixed-precision training)
- Experience with cloud or HPC environments
- Prior work with scientific datasets or domain-specific modeling
- Contributions to open-source ML frameworks
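As a rough illustration of the framework fluency the first bullet describes, here is a minimal DeepSpeed training sketch. The model, config values, and synthetic batches are hypothetical placeholders, not details from this posting.

```python
import torch
import torch.nn as nn
import deepspeed

# Hypothetical toy model; the actual role involves much larger LLMs.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

# Illustrative config: ZeRO stage 2 optimizer sharding with bf16 precision.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# deepspeed.initialize wraps the model in a distributed engine that
# handles optimizer sharding, gradient accumulation, and precision.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for _ in range(10):  # placeholder for a real distributed dataloader
    batch = torch.randn(8, 1024).to(model_engine.device)
    loss = model_engine(batch).pow(2).mean()  # dummy loss for the sketch
    model_engine.backward(loss)  # engine manages scaling and accumulation
    model_engine.step()
```

In practice a script like this is started with the `deepspeed` launcher (e.g., `deepspeed --num_gpus=8 train.py`), which sets up the distributed process group across GPUs.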
Responsibilities
- Design and maintain large-scale training systems
- Optimize performance for massive models
- Integrate cutting-edge techniques to improve efficiency and throughput
- Build and operate Ray-based distributed training infrastructure for LLMs and multi-modal models (a minimal Ray Train sketch follows this list)
- Drive performance optimizations across large-scale training and optimization workflows, including SFT, MoE, and long-context scaling
- Orchestrate frontier and open-source LLMs alongside complex, compute-intensive tool use
- Build scalable pipelines for data preprocessing and experiment orchestration, including tools for efficient data loading, pipeline parallelism, and optimizer tuning
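For the Ray-based infrastructure bullet above, a minimal sketch of how distributed data-parallel training is typically expressed with Ray Train's TorchTrainer. The model, batch shapes, and scale settings are illustrative assumptions, not Lila's actual stack.

```python
import torch
import torch.nn as nn
import ray.train.torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    # Each Ray worker runs this loop; prepare_model wraps the model in
    # DistributedDataParallel and moves it to the worker's device.
    model = nn.Linear(1024, 1024)  # hypothetical stand-in for an LLM
    model = ray.train.torch.prepare_model(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=config["lr"])
    device = ray.train.torch.get_device()

    for _ in range(config["steps"]):
        x = torch.randn(8, 1024, device=device)  # placeholder batch
        loss = model(x).pow(2).mean()  # dummy loss for the sketch
        optimizer.zero_grad()
        loss.backward()  # DDP all-reduces gradients across workers
        optimizer.step()


# ScalingConfig controls how many workers Ray schedules and whether each
# gets a GPU; Ray handles process-group setup and worker placement.
trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"lr": 1e-4, "steps": 10},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
result = trainer.fit()
```

The same trainer definition scales from a laptop to a multi-node cluster by changing `ScalingConfig`, which is one reason Ray is a common substrate for this kind of infrastructure.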
Other
- If this sounds like an environment you'd love to work in, even if you only have some of the experience listed above, we encourage you to apply.