Oracle Cloud Infrastructure is seeking a Principal Engineer to develop and optimize AI hardware and software solutions for next-generation Cloud AI Infrastructure platforms.
Requirements
- Demonstrated ability to write high-quality code in Java, Go, C#, or similar object-oriented languages
- Solid knowledge of AI and GPU platform architectures and their capabilities
- Experience working on large-scale, highly distributed services infrastructure
- Solid working experience with GPU vendor test code as well as open-source AI test and characterization tools
- Experience with the architecture, design, and implementation of modern server platforms consisting of multiple architectures and vendors
- Demonstrated experience debugging and root-causing complex issues that may have a mix of hardware and software causes
- Experience with AI accelerator chips and knowledge of AI accelerator benchmarks and tools for performance evaluation
Responsibilities
- Evaluate system architectures and analyze proposed implementation paths
- Work directly with hardware design and development teams on architecture, implementation, development, deployment, and troubleshooting of AI hardware platforms
- Conduct comprehensive benchmarking and performance analysis of AI accelerators from emerging hardware vendors
- Compare and contrast new AI accelerators with industry-standard hardware for training and inference workloads
- Develop tools and processes for evaluating the performance of hardware in real-world AI applications
- Contribute to the design and improvement of performance optimization algorithms for AI models running on the hardware
- Collaborate with software engineers to ensure tight integration with AI workloads
Other
- BS or MS degree in Computer Science or a relevant technical field involving coding, or equivalent practical experience
- 10+ years of total experience in software development
- Systematic problem-solving approach, strong communication skills, a sense of ownership, and drive
- Ability to work independently and provide technical leadership to the broader organization
- Understanding of operational excellence and ability to foster a proactive culture within the team