Job Board
LogoLogo

Get Jobs Tailored to Your Resume

Filtr uses AI to scan 1000+ jobs and finds postings that perfectly matches your resume

Susquehanna International Group (SIG) Logo

Machine Learning Systems Engineer: Distributed Training

Susquehanna International Group (SIG)

Salary not specified
Aug 28, 2025
Ardmore, PA, USA
Apply Now

Strengthen the performance and scalability of our distributed training infrastructure and streamline the development and execution of large-scale training runs.

Requirements

  • Experience with large-scale ML training pipelines and distributed training frameworks
  • Strong software engineering skills in python
  • Passion for diving deep into systems implementations and understanding fundamentals to improve their performance and maintainability
  • Experience improving resource efficiency across distributed computing environments by leveraging profiling, benchmarking, and implementing system-level optimizations

Responsibilities

  • Collaborate with researchers to enable them to develop systems-efficient models and architectures
  • Apply the latest techniques to our internal training runs to achieve impressive hardware efficiency for our training runs
  • Create tooling to help researchers distribute their training jobs more effectively
  • Profile and optimize our training runs

Other

  • This position is a great fit for someone who enjoys working at the intersection of distributed systems and machine learning, values high-performance code, and has an interest in supporting innovative machine learning efforts.