The company is looking to develop systems and APIs that enable customers to run inference on and fine-tune LLMs.
Requirements
- Experience with DevOps practices like CI/CD, automation, containerization (Docker), and orchestration (Kubernetes).
- Proficiency in cloud platforms like AWS, Google Cloud, or Azure.
- Expertise in programming languages such as Python and Go, and in ML frameworks (TensorFlow, PyTorch, scikit-learn).
- Strong understanding of the state of the art in machine learning, especially LLMs.
- 5+ years of experience working on production-level ML training or inference systems.
Responsibilities
- Work closely with engineering, research, and sales on deploying, evaluating, and operating inference systems for both customers and internal use.
- Build and maintain tools, services, and documentation for automation and testing.
- Analyze and improve efficiency, scalability, and stability of various system resources.
- Conduct design and code reviews.
- Participate in an on-call rotation to respond to critical incidents as needed.
Other
- Bachelor’s degree in Computer Science or equivalent industry experience.
- We offer competitive compensation, startup equity, health insurance, and other benefits.