Senior Software Developer - AI Infra Compute

Oracle

Salary not specified

Oct 23, 2025

Remote, US

OCI (Oracle Cloud Infrastructure) AI Infrastructure is building a cutting-edge, ultra-high-performance GPU platform for AI/ML/HPC workloads that lets customers scale from tens to thousands of GPUs without compromising performance.

Requirements

Deep understanding of operating systems, computer networks, and high-performance applications
Proficiency in at least one programming language (Java, Python, C, C++, Go, or shell scripting)
Strong background in Linux systems
Familiarity with system-level architecture, data synchronization, fault tolerance, and state management
General enterprise storage, networking, or computing experience
Experience with RoCE and InfiniBand technologies
Understanding of distributed systems and algorithms

Responsibilities

Designing and developing fundamental architectural changes for GPU delivery, health monitoring, triage automation, and diagnostic services
Designing, implementing, and delivering software and firmware for managing GPU-based AI servers
Working closely with product teams to debug and resolve customer issues
Building groundbreaking solutions for customers from the ground up
Delivering and operating large-scale production systems (1000+ server instances)
Diving deep into any part of the stack, including software debugging and low-level systems troubleshooting
Collaborating effectively with dependent teams, including Network and Data Center operations

Other

BS or MS degree in Computer Science or a related technical field involving coding, or equivalent practical experience
Adaptable Engineers: Self-motivated individuals who learn quickly
Collaborative Spirit: Comfortable working in a collaborative, agile environment and eager to learn
Ability to collaborate effectively with dependent teams
4+ years’ experience delivering and operating large-scale production systems (1000+ server instances)