Red Hat is looking to scale its AI Inference team and bring the power of open-source LLMs and vLLM to every enterprise. The company aims to accelerate AI for the enterprise and simplify GenAI deployments by providing a stable platform for building, optimizing, and scaling LLM deployments.
Requirements
- 2+ years of experience in MLOps, DevOps, automation, and modern software deployment practices
- Strong experience with Git, GitHub Actions (including self-hosted runners), Terraform, Jenkins, Ansible, and common automation and monitoring technologies
- Extensive experience administering Kubernetes/OpenShift
- Experience with cloud computing on at least one of the following platforms: AWS, GCP, Azure, or IBM Cloud
- Solid programming skills, especially in Python
- Solid troubleshooting skills
- Experience maintaining infrastructure and ensuring its stability
Responsibilities
- Collaborate with research and product development teams to scale machine learning products for internal and external applications
- Create and manage model training and deployment pipelines
- Actively contribute to managing and releasing upstream and midstream product builds
- Test to ensure correctness, responsiveness, and efficiency
- Troubleshoot, debug, and upgrade development and test pipelines
- Identify and deploy cybersecurity measures by continuously performing vulnerability assessments and risk management
- Keep abreast of the latest technologies and standards in the field
Other
- Familiarity with Agile development methodology
- Ability to interact comfortably with the other members of a large, geographically dispersed team
- Familiarity with contributing to the vLLM CI community is a big plus
- While a Bachelor’s degree or higher in computer science, mathematics, or a related discipline is valued, we prioritize technical prowess, initiative, problem-solving, and practical experience