NVIDIA is seeking a Senior Systems Software Engineer to accelerate validation and improve the quality of its Autonomous Vehicle (AV) software stack. The role focuses on coverage, analysis, and tooling development to ensure reliability, scalability, and adherence to safety-critical standards in a fast-moving environment.
Requirements
- Strong background in Linux systems, distributed systems, and infrastructure engineering.
- Hands-on experience with the Bazel build system and its integration into CI/CD pipelines.
- Proficiency in C++, Python, and Bash.
- Experience with PostgreSQL and data handling at scale.
- Knowledge of cloud and on-prem environments: Kubernetes, Docker, VM infrastructure.
- Familiarity with logging, monitoring, and alerting stacks (Grafana, Prometheus, ELK stack).
- Prior experience with coverage frameworks (lcov, gcov, VectorCAST) and delivering quality metrics in compliance-heavy environments.
Responsibilities
- Design, deploy, and maintain distributed infrastructure to support AV software builds, simulation, and validation.
- Operate and optimize Bazel-based build and test pipelines, integrating them with CI/CD frameworks (e.g., GitLab, Jenkins).
- Support large-scale data and service workflows with a focus on performance, scalability, and reliability.
- Enable developers with tools, wrappers, and automation that improve correctness, prevent regressions, and enforce quality gates before code is merged.
- Provide mechanisms for automated analysis, triage, and reporting that help developers and stakeholders act on results quickly.
- Build dashboards and metrics for system health, workload quality, and resource utilization across compute and storage environments.
- Communicate proactively with stakeholders, ensuring that no issue is left unattended and that the infrastructure evolves alongside developer needs.
Other
- 5+ years of professional experience in infrastructure, distributed systems, or platform engineering.
- Ability to collaborate across teams and communicate effectively with developers and stakeholders.
- Problem-solving mindset: capable of debugging across the stack (infrastructure, build system, workloads).
- Hands-on experience with static analysis tooling such as Coverity, and with embedding it into developer workflows.
- Background in safety-critical domains such as automotive, with audit-driven workflows.