NVIDIA is seeking a Senior Software Engineer to build and maintain the core infrastructure for its AI Factory initiative, specifically an Inference as a Service platform that manages GPU resources and delivers high-performance, low-latency AI model inference at scale.
Requirements
- Strong programming skills in Python, Go, or C++ with a track record of building production-grade, highly available systems.
- Proven experience with container orchestration technologies like Kubernetes.
- A strong understanding of system architecture for high-performance, low-latency API services.
- Experience in designing, implementing, and optimizing systems for GPU resource management.
- Familiarity with modern observability tools (e.g., DataDog, Prometheus, Grafana, OpenTelemetry).
- Demonstrated experience with deployment strategies and CI/CD pipelines.
- Experience with specialized inference serving frameworks (e.g., NVIDIA Triton Inference Server, TensorRT-LLM, or vLLM).
Responsibilities
- Contribute to the design and development of a scalable, robust, and reliable platform for serving AI models for inference as a service.
- Develop and implement systems for dynamic GPU resource management, autoscaling, and efficient scheduling of inference workloads.
- Build and maintain the core infrastructure, including load balancing and rate limiting, to ensure the stability and high availability of inference services.
- Implement APIs for model deployment, monitoring, and management for a seamless user experience.
- Collaborate with engineering teams to integrate deployment, monitoring, and performance telemetry into our CI/CD pipelines.
- Build tools and frameworks for real-time observability, performance profiling, and debugging of inference services.
- Work with architects to define and implement best practices for long-term platform evolution.
Other
- 12+ years of software engineering experience with expertise in distributed systems or large-scale backend infrastructure.
- Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.
- BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or a related engineering field (or equivalent experience).
- Open-source contributions to projects in the AI/ML, distributed systems, or infrastructure space.
- Hands-on experience with performance optimization techniques for AI models, such as quantization or model compression.