Unlocking greater AI capability while dramatically improving efficiency at the infrastructure layer of LLM inference systems.
Requirements
Prior experience contributing to core LLM inference infrastructure (e.g., vLLM, SGLang, TensorRT).
Prior experience in accelerator programming (e.g., CUDA, JAX/Pallas, ROCm).
Advanced computer architecture and performance engineering skills are a big plus.
Responsibilities
Prototype and optimize emerging ML inference systems.
Develop novel memory models for expandable VRAM.
Write efficient GPU kernels for data movement.
Perform design-space exploration, implementation, and benchmarking of inference engines, both in simulation and on real hardware.
Other
This role is part engineering, part research.
This role will be performed on-site from one of our offices in Santa Clara, CA or Boston, MA.
Relocation assistance and visa sponsorship are available.
A collaborative, continuous-learning work environment with smart, dedicated colleagues developing the next generation of high-performance computing architecture.
We value thoughtful disagreement, fast learning, and intellectual fearlessness.