NVIDIA is seeking a Senior Software Engineer to build and maintain the core infrastructure for its AI Factory initiative, specifically an Inference as a Service platform that manages GPU resources and delivers high-performance, low-latency AI model inference at scale.
Requirements
- Strong programming skills in Python, Go, or C++ with a track record of building production-grade, highly available systems.
- Proven experience with container orchestration technologies like Kubernetes.
- A strong understanding of system architecture for high-performance, low-latency API services.
- Experience in designing, implementing, and optimizing systems for GPU resource management.
- Familiarity with modern observability tools (e.g., DataDog, Prometheus, Grafana, OpenTelemetry).
- Demonstrated experience with deployment strategies and CI/CD pipelines.
- Experience with specialized inference serving frameworks (e.g., NVIDIA Triton Inference Server, TensorRT-LLM, or vLLM).
Responsibilities
- Contribute to the design and development of a scalable, robust, and reliable platform for serving AI models for inference as a service.
- Develop and implement systems for dynamic GPU resource management, autoscaling, and efficient scheduling of inference workloads.
- Build and maintain the core infrastructure, including load balancing and rate limiting, to ensure the stability and high availability of inference services.
- Implement APIs for model deployment, monitoring, and management for a seamless user experience.
- Collaborate with engineering teams to integrate deployment, monitoring, and performance telemetry into our CI/CD pipelines.
- Build tools and frameworks for real-time observability, performance profiling, and debugging of inference services.
- Work with architects to define and implement best practices for long-term platform evolution.
Other
- 12+ years of software engineering experience with expertise in distributed systems or large-scale backend infrastructure.
- Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.
- BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or a related engineering field (or equivalent experience).
- Open-source contributions to projects in the AI/ML, distributed systems, or infrastructure space.
- Hands-on experience with performance optimization techniques for AI models, such as quantization or model compression.