An AI startup is seeking a Senior Systems Engineer to optimize deep learning performance at scale.
Requirements
- 3+ years of systems-level engineering experience in deep learning environments
- Strong Python development and debugging skills
- Kernel optimization (parallelization, performance tuning)
- Familiarity with GPU / AI accelerator compute models
- Large-scale distributed training (diagnosing bottlenecks in clusters)
- PyTorch framework optimization and runtime improvements
- Deep understanding of CUDA, Triton, and related internals
Responsibilities
- Work at the intersection of systems, infrastructure, and machine learning, driving improvements across model training, inference, and distributed compute environments
- Focus on kernel-level optimization, GPU/accelerator efficiency, and deep framework tuning to push the boundaries of performance for next-generation AI workloads
- Contribute across large-scale data processing, model parallelism, and runtime efficiency
- Diagnose and optimize performance bottlenecks across kernels, frameworks, and clusters
- Accelerate training and inference performance at scale
Other
- Wear multiple hats in a startup environment
- Work in a collaborative, high-impact environment