Staff Software Engineer, ML Training

StackAV

Salary not specified
Oct 17, 2025
Pittsburgh, PA, US • Remote, US

The company is looking to enhance the efficiency and scalability of its machine learning model training process.

Requirements

  • Experience with both ML Platforms and building ML-based applications
  • Experience building scalable, reliable infrastructure in a fast-paced environment
  • Experience with model training, model optimization, or large data processing pipelines
  • Machine learning expertise is preferred but not required
  • Knows how to push the GPU to its limit, from Python down to the CUDA kernel level
  • Has built the inference or training loop for a large model (ideally with LLM flavor)
  • Knows how to write (readable) high-performance C++

Responsibilities

  • Set up efficiency monitoring for all training jobs to identify models that need improvement
  • Work with customer teams to benchmark/profile their jobs and make improvements
  • Create standardized APIs for stack-wide abstractions such as training datasets, bulk inference jobs, and evaluation metrics
  • Optimize dataloaders and training data formats to ensure high GPU utilization
  • Optimize distributed training configurations (network topologies, sharding strategies, pipelines, etc.)

Other

  • High customer empathy and the ability to communicate well with customers
  • Comfortable reading papers / keeping up with SOTA ML literature
  • 5+ years as a SWE, ideally building infrastructure or customer-facing products; experience in AV or robotics is also a plus
  • U.S. person status and/or citizenship may be required for this position