Red Hat is seeking to solve challenging technical problems at the forefront of deep learning and to bring the power of open-source LLMs and vLLM to every enterprise.
Requirements
- 2+ years of experience in MLOps, DevOps, automation, and modern software deployment practices
- Experience evaluating LLMs for accuracy and for performance on accelerators
- Strong experience with Python and pytest
- Strong experience with Git, GitHub Actions (including self-hosted runners), Terraform, Jenkins, Ansible, and common automation and monitoring technologies
- Extensive experience administering Kubernetes/OpenShift
- Experience with cloud computing on at least one of the following platforms: AWS, GCP, Azure, or IBM Cloud
- Solid programming skills, especially in Python
Responsibilities
- Collaborate with research and product development teams to scale machine learning products for internal and external applications
- Create and manage model training and deployment pipelines
- Actively contribute to managing and releasing upstream and midstream product builds
- Test to ensure correctness, responsiveness, and efficiency
- Troubleshoot, debug, and upgrade dev and test pipelines
- Identify and deploy cybersecurity measures by continuously performing vulnerability assessments and risk management
- Collaborate with a cross-functional team on market requirements and best practices
Other
- Ability to interact comfortably with the other members of a large, geographically dispersed team
- While a Bachelor’s degree or higher in computer science, mathematics, or a related discipline is valued, we prioritize technical prowess, initiative, problem solving, and practical experience
Benefits
- Comprehensive medical, dental, and vision coverage
- Paid time off and holidays
- Paid parental leave plans for all new parents
- Leave benefits including disability, paid family medical leave, and paid military leave