Staff Software Engineer, Training (Bay Area / Paris / Remote)

Genesis AI

Salary not specified
Sep 9, 2025
San Carlos, CA, US

Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels

Requirements

  • Deep experience in distributed systems, ML infrastructure, or high-performance computing (8+ years)
  • Production-grade expertise in Python
  • Low-level performance mastery: CUDA/cuDNN/Triton, CPU–GPU interactions, data movement, and kernel optimization
  • Scaling at the frontier: experience with PyTorch and training jobs using data, context, pipeline, and model parallelism
  • System-level mindset with a track record of tuning hardware–software interactions for maximum utilization

Responsibilities

  • Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels
  • Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU clusters, ensuring scalability, robustness, and high utilization
  • Implement efficient low-level code (CUDA, cuDNN, Triton, custom kernels) and integrate it seamlessly into high-level training frameworks
  • Optimize workloads for hardware efficiency: CPU/GPU compute balance, memory management, data throughput, and networking
  • Develop monitoring and debugging tools for large-scale runs, enabling rapid diagnosis of performance regressions and failures
