Mercor collaborates with the world’s leading AI research labs to build and train cutting-edge AI models. In this role, you will develop, optimize, and benchmark CUDA kernels for tensor and operator workloads to improve AI model performance.
Requirements
- Deep expertise in CUDA, GPU architecture, and memory optimization
- Proven record of quantifiable performance improvements across hardware generations
- Proficiency with mixed precision, Tensor Core usage, and low-level numerical stability
- Familiarity with PyTorch, TensorFlow, or Triton (preferred but not required)
Responsibilities
- Develop, optimize, and benchmark CUDA kernels for tensor and operator workloads
- Tune kernels for occupancy, memory coalescing, instruction-level parallelism, and efficient warp scheduling
- Profile and diagnose performance bottlenecks with tools such as Nsight Systems and Nsight Compute
- Report performance results, analyze speedups, and propose architectural improvements
- Integrate kernels with PyTorch and collaborate asynchronously with operator specialists
- Produce reproducible benchmarks and write comprehensive performance documentation
Other
- Strong communication and independent problem-solving skills
- Demonstrated contributions to open-source projects, research, or performance benchmarking
- Training support will be provided
- Hourly contract
- Remote