Senior Software Developer - AI Infra Compute

Oracle

Salary not specified
Oct 23, 2025
Remote, US

OCI (Oracle Cloud Infrastructure) AI Infrastructure is building a cutting-edge, ultra-high-performance GPU platform to support AI/ML/HPC workloads, allowing customers to scale from tens to thousands of GPUs without compromising performance.

Requirements

  • Deep understanding of operating systems, computer networks, and high-performance applications
  • Proficiency in one programming language (Java, Python, C, C++, Go, or shell scripting)
  • Strong background in Linux systems
  • Familiarity with system-level architecture, data synchronization, fault tolerance, and state management
  • General enterprise storage, networking, or computing experience
  • Experience with RoCE and InfiniBand technologies
  • Understanding of distributed systems and algorithms

Responsibilities

  • Designing and developing fundamental architectural changes for GPU delivery, health monitoring, triage automation, and diagnostic services
  • Designing, implementing, and delivering software and firmware for managing GPU-based AI servers
  • Working closely with product teams to debug and resolve customer issues
  • Building groundbreaking solutions for customers from the ground up
  • Delivering and operating large-scale production systems (1000+ server instances)
  • Diving deep into any part of the stack, as well as software debugging and low-level systems troubleshooting
  • Collaborating effectively with the teams this work depends on, including Network and Data Center operations

Other

  • BS or MS degree in Computer Science or a relevant technical field involving coding, or equivalent practical experience
  • Adaptable Engineers: Self-motivated individuals who learn quickly
  • Collaborative Spirit: Comfortable working in a collaborative, agile environment and eager to learn
  • Ability to collaborate effectively with various dependencies
  • 4+ years’ experience delivering and operating large-scale production systems (1000+ server instances)