As the Engineering Manager for the Machine Learning Infrastructure team at Moveworks, you will spearhead the development of a cutting-edge platform that powers Moveworks' conversational AI, ensuring the long-term scalability of the company and its core AI product.
Requirements
- Deep technical expertise in designing, building, and scaling end-to-end machine learning systems in production environments.
- Strong command of Python and experience with performant languages such as C++ or Go.
- Extensive experience with deep learning frameworks and libraries such as PyTorch or Hugging Face Transformers.
- Hands-on experience with modern LLM infrastructure, including distributed training frameworks (e.g., DeepSpeed) and inference/serving infrastructure (e.g., vLLM, TensorRT-LLM, Kubernetes).
- A strategic mindset with experience balancing the demands of operating robust, scalable infrastructure with the need for forward-looking research and development.
Responsibilities
- Lead, Mentor, and Grow a world-class team of ML and Systems Engineers, fostering a culture of innovation, ownership, and operational excellence that aligns with Moveworks' core principles.
- Own the Technical Vision and roadmap for the end-to-end ML platform that powers the entire lifecycle—from data synthesis and distributed training to ultra-low-latency inference and serving—for hundreds of production models, including our proprietary MoveLM series.
- Drive the Strategy for model performance and efficiency, making critical architectural decisions to optimize our GPU infrastructure for latency, throughput, and cost at massive scale.
- Partner with Leaders across agentic platform, search platform, product engineering, and core infrastructure teams to define and deliver the foundational infrastructure that will power the next generation of agentic AI experiences.
- Champion a Product Mindset for your platform, building powerful abstractions and tools that accelerate the velocity of machine learning engineers and researchers across the organization.
Other
- 5+ years of industry experience with a proven track record of leading or managing high-performing machine learning or infrastructure teams.
- Excellent communication and collaboration skills, with experience working cross-functionally to deliver complex projects.
- Experience working with machine learning products.