Machine Learning Engineer (CUDA) | $250/hr Remote | Mercor

Crossing Hurdles

$120 - $250/hr
Dec 4, 2025
Remote, US

Mercor collaborates with the world’s leading AI research labs to build and train cutting-edge AI models. In this role, you will develop, optimize, and benchmark CUDA kernels for tensor and operator workloads to improve AI model performance.

Requirements

  • Deep expertise in CUDA, GPU architecture, and memory optimization
  • Proven record of quantifiable performance improvements across hardware generations
  • Proficiency with mixed precision, Tensor Core usage, and low-level numerical stability
  • Familiarity with PyTorch, TensorFlow, or Triton (preferred but not required)

Responsibilities

  • Develop, optimize, and benchmark CUDA kernels for tensor and operator workloads
  • Tune for occupancy, memory coalescing, instruction-level parallelism, and optimal warp scheduling
  • Profile and diagnose performance bottlenecks with tools such as Nsight Systems and Nsight Compute
  • Report performance results, analyze speedups, and propose architectural improvements
  • Integrate kernels with PyTorch and collaborate asynchronously with operator specialists
  • Produce reproducible benchmarks and write comprehensive performance documentation

Other

  • Strong communication and independent problem-solving skills
  • Demonstrated contributions in open-source, research, or performance benchmarking
  • Training support will be provided
  • Hourly contract
  • Remote