Machine Learning Engineer, Distributed & Scalable Training

Pioneering Intelligence

Salary not specified
Aug 21, 2025
Cambridge, MA, US

Lila Sciences is seeking an ML Engineer specializing in distributed and scalable training to design and maintain large-scale training systems, optimize performance for massive models, and integrate cutting-edge techniques that improve efficiency and throughput for its scientific superintelligence platform.

Requirements

  • Proven experience with distributed ML training frameworks (Megatron-LM, TorchTitan, DeepSpeed, Ray)
  • Strong software engineering skills (Python; C++ kernel contributions are a plus)
  • Understanding of large-scale model training techniques
  • Experience with cloud or HPC environments
  • Prior work with scientific datasets or domain-specific modeling
  • Contributions to open-source ML frameworks

Responsibilities

  • Design and maintain large-scale training systems
  • Optimize performance for massive models
  • Integrate cutting-edge techniques to improve efficiency and throughput
  • Build Ray-based distributed training infrastructure for LLMs and multi-modal models
  • Implement performance optimizations for large-scale model training, including training and optimization workflows (SFT, MoE, long-context scaling)
  • Orchestrate frontier and open-source LLMs along with complex, compute-intensive tool use
  • Build scalable pipelines for data preprocessing and experiment orchestration, including tools for efficient data loading, pipeline parallelism, and optimizer tuning

Other

  • If this sounds like an environment you’d love to work in, even if you only have some of the experience listed above, we encourage you to apply.