NVIDIA is seeking a Senior Software Engineer to design and implement the core container strategy for NVIDIA Inference Microservices (NIMs) and hosted services. The role focuses on building enterprise-grade software and tooling for container build, packaging, and deployment, improving reliability, performance, and scale across thousands of GPUs.
Requirements
- 6+ years building production software with a strong focus on containers and Kubernetes.
- Strong Python skills for building production-grade tooling and services.
- Experience with Python SDKs and clients for Kubernetes and cloud services.
- Expert knowledge of Docker/BuildKit, containerd/OCI, image layering, multi-stage builds, and registry workflows.
- Deep experience operating workloads on Kubernetes.
- Hands-on experience building and running GPU workloads on Kubernetes, including the NVIDIA device plugin, MIG, CUDA drivers/runtime, and resource isolation.
- Expertise with Helm chart design, Operators, and platform APIs that serve many teams.
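To make the GPU-scheduling requirements above concrete, here is a minimal, illustrative sketch of building a Kubernetes Pod manifest that requests GPUs through the `nvidia.com/gpu` extended resource advertised by the NVIDIA device plugin. The function and pod names are hypothetical, not part of any existing NIM tooling.

```python
from typing import Any


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict[str, Any]:
    """Build a minimal Kubernetes Pod manifest requesting NVIDIA GPUs.

    GPUs are requested via the `nvidia.com/gpu` extended resource,
    which the NVIDIA device plugin advertises on GPU nodes.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        # Extended resources go in limits; requests,
                        # if set at all, must equal limits.
                        "limits": {"nvidia.com/gpu": str(gpus)},
                    },
                }
            ],
            "restartPolicy": "Never",
        },
    }


# Hypothetical image reference, for illustration only.
manifest = gpu_pod_manifest("nim-demo", "nvcr.io/nim/example:latest", gpus=2)
```

A dict like this can be submitted with the official `kubernetes` Python client or rendered into YAML by higher-level build tooling.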
Responsibilities
- Design, build, and harden containers for NIM runtimes and inference backends; enable reproducible, multi-arch, CUDA-optimized builds.
- Develop Python tooling and services for build orchestration, CI/CD integrations, Helm/Operator automation, and test harnesses; enforce quality with typing, linting, and unit/integration tests.
- Help design and evolve Kubernetes deployment patterns for NIMs, including GPU scheduling, autoscaling, and multi-cluster rollouts.
- Optimize container performance: layer layout, startup time, build caching, runtime memory/IO, network, and GPU utilization; instrument with metrics and tracing.
- Evolve the base image strategy, dependency management, and artifact/registry topology.
- Collaborate across research, backend, SRE, and product teams to ensure day-0 availability of new models.
- Mentor teammates; set high engineering standards for container quality, security, and operability.
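As one hedged illustration of the typed, tested Python tooling described above (the function name and version strings are invented, not an existing NIM utility), a build tool might derive a deterministic image tag from its inputs so that identical inputs always produce identical tags, supporting reproducible builds and registry cache hits:

```python
import hashlib


def reproducible_tag(base_image: str, dependencies: list[str]) -> str:
    """Derive a deterministic image tag from build inputs.

    Sorting the dependency list before hashing means the same set of
    inputs yields the same tag regardless of ordering, which supports
    reproducible builds and effective layer/registry caching.
    """
    digest = hashlib.sha256()
    digest.update(base_image.encode())
    for dep in sorted(dependencies):
        # NUL separator prevents ambiguous concatenation of inputs.
        digest.update(b"\x00" + dep.encode())
    return digest.hexdigest()[:12]


# Same inputs in different order produce the same tag.
tag_a = reproducible_tag("cuda:12.4", ["vllm==0.6.0", "torch==2.4.0"])
tag_b = reproducible_tag("cuda:12.4", ["torch==2.4.0", "vllm==0.6.0"])
```

Functions like this are easy to cover with unit tests and type checks, matching the quality bar the role calls for.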
Other
- A degree in Computer Science, Computer Engineering, or a related field (BS or MS) or equivalent experience.
- Excellent collaboration and communication skills; ability to influence cross-functional design.
- Experience with the OpenAI API and Hugging Face APIs, as well as an understanding of different inference backends (vLLM, SGLang, TRT-LLM).
- Background in benchmarking and optimizing inference container performance and startup latency at scale.
- Prior experience designing multi-tenant, multi-cluster, or edge/air-gapped container delivery.
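Benchmarking startup latency at scale, as mentioned above, typically reduces to summarizing cold-start samples into percentiles. The following self-contained sketch (the sample data and function name are illustrative) shows a nearest-rank percentile helper of the kind such a benchmark harness might use:

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of latency samples (in seconds)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank method: ceil(pct/100 * n) gives a 1-based rank;
    # clamp to valid 0-based indices.
    rank = min(len(ordered) - 1, max(0, math.ceil(pct / 100 * len(ordered)) - 1))
    return ordered[rank]


# Hypothetical container cold-start times in seconds.
cold_starts = [4.2, 3.9, 5.1, 4.0, 7.8, 4.4, 4.1, 5.0, 4.3, 4.6]
p50 = percentile(cold_starts, 50)
p95 = percentile(cold_starts, 95)
```

Tracking p50 against p95 over successive builds highlights tail-latency regressions that a mean would hide.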