Machine Learning Systems Engineer: Distributed Training

Susquehanna International Group (SIG)

Salary not specified

Aug 28, 2025

Ardmore, PA, USA

Strengthen the performance and scalability of our distributed training infrastructure and streamline the development and execution of large-scale training runs.

Requirements

Experience with large-scale ML training pipelines and distributed training frameworks
Strong software engineering skills in Python
Passion for diving deep into systems implementations and understanding fundamentals to improve their performance and maintainability
Experience improving resource efficiency across distributed computing environments through profiling, benchmarking, and system-level optimization

Responsibilities

Collaborate with researchers to enable them to develop systems-efficient models and architectures
Apply the latest techniques to our internal training runs to achieve high hardware efficiency
Create tooling to help researchers distribute their training jobs more effectively
Profile and optimize our training runs

Other

This position is a great fit for someone who enjoys working at the intersection of distributed systems and machine learning, values high-performance code, and has an interest in supporting innovative machine learning efforts.