Principal Software Engineer - Inference as a Service

NVIDIA

$248,000 - $391,000
Aug 20, 2025
Santa Clara, CA, US

NVIDIA is seeking an engineer to build and maintain the core infrastructure that powers its closed and open-source AI models as part of the NVIDIA AI Factory initiative. The role centers on designing and developing an Inference as a Service platform.

Requirements

  • Strong programming skills in Python, Go, or C++ with a track record of building production-grade, highly available systems.
  • Proven experience with container orchestration technologies like Kubernetes.
  • A deep understanding of system architecture for high-performance, low-latency API services.
  • Experience in designing, implementing, and optimizing systems for GPU resource management.
  • Familiarity with modern observability tools (e.g., Datadog, Prometheus, Grafana, OpenTelemetry).
  • Demonstrated experience with deployment strategies and CI/CD pipelines.
  • Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.

Responsibilities

  • Lead the design and development of a scalable, robust, and reliable platform for serving AI model inference as a service.
  • Architect and implement systems for dynamic GPU resource management, autoscaling, and efficient scheduling of inference workloads.
  • Build and maintain the core infrastructure, including load balancing and rate limiting, to ensure the stability and high availability of inference services.
  • Define and implement APIs for model deployment, monitoring, and management for a seamless user experience.
  • Optimize system performance and latency for various model types, from large language models (LLMs) to computer vision models, ensuring high throughput and responsiveness.
  • Collaborate with engineering teams to integrate deployment, monitoring, and performance telemetry into our CI/CD pipelines.
  • Develop tools and frameworks for real-time observability, performance profiling, and debugging of inference services.

Other

  • BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, another engineering discipline, or a related field (or equivalent experience).
  • 15+ years of software engineering experience with deep expertise in distributed systems or large-scale backend infrastructure.
  • Ability to work in a fast-paced, collaborative environment.
  • Creative and autonomous work style.
  • Commitment to fostering a diverse work environment and equal opportunity employment.