Red Hat OpenShift AI (RHOAI) is looking for a Principal Software Engineer with Kubernetes and MLOps (Machine Learning Operations) experience to join its rapidly growing engineering team. The team is building a platform, partner ecosystem, and community through which enterprise customers can use AI to solve problems and accelerate business success.
Requirements
- Proven expertise with Kubernetes API development and testing (CRs, Operators, Controllers), including reconciliation logic.
- Strong background in model serving (such as KServe and vLLM) and distributed inference strategies for LLMs (tensor, pipeline, and data parallelism).
- Deep understanding of GPU optimization, autoscaling (KEDA/Knative), and low-latency networking (e.g., NVLink, P2P GPU).
- Experience architecting resilient, secure, and observable systems for model serving, including metrics and tracing.
- Advanced skills in Go and Python; ability to design APIs for high-performance inference and streaming.
- Excellent system troubleshooting skills in cloud environments and the ability to innovate in fast-paced environments.
- Existing contributions to one or more MLOps open source projects such as Kubeflow, KServe, Ray Serve, and vLLM are a huge plus.
Responsibilities
- Lead the team strategy and implementation for Kubernetes-native components in Model Serving, including Custom Resources, Controllers, and Operators.
- Be an influencer and leader in MLOps-related open source communities to help build an active MLOps open source ecosystem for Open Data Hub and OpenShift AI
- Architect and design new features for open source MLOps communities such as Kubeflow and KServe
- Provide technical vision and leadership on critical and high-impact projects
- Ensure non-functional requirements including security, resiliency, and maintainability are met
- Write unit and integration tests and work with quality engineers to ensure product quality
- Use CI/CD best practices to deliver productized solutions into RHOAI
Other
- Mentor, influence, and coach a team of distributed engineers
- Contribute to a culture of continuous improvement by sharing recommendations and technical knowledge with team members
- Collaborate with product management, other engineering teams, and cross-functional teams to analyze and clarify business requirements
- Communicate effectively to stakeholders and team members to ensure proper visibility of development efforts
- Give thoughtful and prompt code reviews