NVIDIA is hiring a Senior System Software Engineer - Infrastructure to help reshape the future of GPU cloud computing and contribute to groundbreaking projects in deep learning and AI.
Requirements
- Strong hands-on experience with AWS services (VPC, IAM, EC2, EKS, Lambda, CloudWatch).
- Deep knowledge of Kubernetes internals, Helm charts, and container orchestration principles.
- Proficiency with GitLab CI/CD or equivalent pipeline automation tools.
- Experience implementing GitOps workflows (ArgoCD, FluxCD).
- Strong foundation in scripting and programming languages such as Python, Bash, or Go.
- Familiarity with networking, load balancing, and security in cloud-native environments.
- Experience enforcing cloud and container security standards and compliance practices.
Responsibilities
- Designing, deploying, and maintaining scalable AWS infrastructure using EKS, EC2, S3, and related services.
- Managing and optimizing Kubernetes clusters for high availability, resilience, and performance.
- Creating and maintaining GitLab CI/CD pipelines to automate build, test, and deployment workflows.
- Developing automation scripts and Infrastructure as Code (IaC) templates with Terraform.
- Monitoring system performance and implementing logging, metrics, and alerting through the LGTM stack (Loki, Grafana, Tempo, Mimir), Prometheus, Datadog, or Splunk.
- Implementing DevSecOps best practices, embedding security scans, compliance checks, and secret management in the CI/CD lifecycle.
- Supporting platform observability, diagnosing production incidents, and enhancing self-service for developer teams.
Other
- BS/MS in Computer Science or equivalent experience.
- 12+ years of hands-on experience building/supporting complex services.
- Excellent documentation, problem-solving, and communication skills for cross-team alignment.
- Ability to work in an encouraging, inclusive environment.
- Must be a great teammate who is inquisitive, innovative, driven to succeed, and autonomous.