84.51° is looking to create, deploy, and maintain computationally efficient proprietary SLM, LLM, and embedding model implementations, along with the serving infrastructure and end-to-end solutions that support them. The role focuses on model serving and operations within the company's foundation models team, requiring expertise in distributed systems, model serving architectures, GPU cluster management, and MLOps best practices for enterprise workloads and large-scale model deployments.
Requirements
- 5+ years of experience developing cloud-based software solutions, with an understanding of designing for scalability, performance, and reliability in distributed systems
- 2+ years of hands-on experience with foundation models (LLMs, SLMs, embedding models) in production environments; 2+ years of experience in model serving and inference optimization preferred
- Deep knowledge of foundation model serving frameworks, particularly Triton Inference Server and vLLM (a minimal vLLM sketch follows this list)
- Working experience with PyTorch models and inference optimization (quantization, pruning, ONNX export, TensorRT); see the quantization and export sketch after this list
- Knowledge of distributed GPU computing, CUDA programming, and GPU memory optimization techniques
- Hands-on experience with GCP and Azure cloud platforms, including GPU instances, managed services, and networking
- Kubernetes and Docker experience with a focus on GPU workloads and model serving deployments
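To make the vLLM requirement concrete, here is a minimal sketch of offline batched generation with vLLM's Python API. The model ID and prompt are placeholders, and a production deployment would more likely run vLLM's OpenAI-compatible server behind a load balancer; this only illustrates the core interface.

```python
# Minimal vLLM sketch: offline batched generation.
# The model ID is a placeholder, not a statement of 84.51°'s stack.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model ID
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = ["Summarize why KV-cache paging improves GPU memory utilization."]
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```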
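Similarly, a hedged sketch of two of the inference-optimization techniques named above, using a toy module in place of a real model; the layer shapes and output file name are illustrative only.

```python
# Sketch: dynamic quantization of a PyTorch module and ONNX export.
# The toy model stands in for a real embedding or language model head.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 128)).eval()

# Dynamic quantization: weights stored as int8, activations quantized at runtime.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# ONNX export of the FP32 model, e.g. as input to engines such as TensorRT.
dummy = torch.randn(1, 768)
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["embedding"],
    dynamic_axes={"input": {0: "batch"}},
)
```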
Responsibilities
- Lead large-scale foundation model projects that can span months, focusing on model serving, inference optimization, and production deployment
- Leverage known patterns, frameworks, and tools for automating and deploying foundation model serving solutions using Triton, vLLM, and other inference engines
- Develop new tools, processes, and operational capabilities to monitor and analyze foundation model performance, latency, throughput, and resource utilization (see the instrumentation sketch after this list)
- Work with researchers and ML engineers to optimize and scale foundation model serving using best practices in distributed systems, GPU orchestration, and MLOps
- Abstract foundation model serving solutions as robust APIs, microservices, or components that can be reused across the business with high availability and low latency (see the microservice sketch after this list)
- Build, steward, and maintain production-grade foundation model serving infrastructure (robust, reliable, maintainable, observable, scalable, performant) to manage and serve LLMs, SLMs, and embedding models at scale
- Research state-of-the-art foundation model serving technologies, inference optimization techniques, and distributed GPU architectures to identify new opportunities for implementation across the enterprise
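As a hypothetical illustration of the latency and throughput analysis mentioned above (not 84.51°'s actual tooling), a standard-library-only instrumentation sketch:

```python
# Hypothetical sketch: collect per-request latencies and report
# p50/p99 latency plus throughput over an observation window.
import time
import statistics
from contextlib import contextmanager

latencies_ms: list[float] = []

@contextmanager
def record_latency():
    # Time the wrapped inference call and store the latency in ms.
    start = time.perf_counter()
    try:
        yield
    finally:
        latencies_ms.append((time.perf_counter() - start) * 1000)

def report(window_s: float) -> None:
    # quantiles(n=100) yields 99 cut points: index 49 is p50, 98 is p99.
    qs = statistics.quantiles(latencies_ms, n=100)
    print(f"p50={qs[49]:.1f}ms p99={qs[98]:.1f}ms "
          f"throughput={len(latencies_ms) / window_s:.1f} req/s")
```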
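And a hedged sketch of wrapping an embedding model as a reusable microservice; FastAPI and sentence-transformers are illustrative choices rather than a statement of the team's actual stack, and the model name is a placeholder:

```python
# Sketch: expose an embedding model behind a simple HTTP endpoint.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

class EmbedRequest(BaseModel):
    texts: list[str]

@app.post("/embed")
def embed(req: EmbedRequest) -> dict:
    # Encode a batch of texts and return plain lists for JSON serialization.
    vectors = model.encode(req.texts).tolist()
    return {"embeddings": vectors}
```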
Other
- Bachelor's degree or higher in Machine Learning, Computer Science, Computer Engineering, Applied Statistics, or related field
- Foster a collaborative and innovative team environment, encouraging professional growth and development among junior team members in foundation model technologies
- Understand business requirements and trade off latency, cost, throughput, and model accuracy to maximize value and translate research into production-ready serving solutions
- Conduct code reviews, infrastructure reviews, and production readiness assessments for foundation model deployments
- Apply appropriate documentation, version control, and infrastructure-as-code practices, and communicate effectively across internal channels
- Make time-sensitive decisions and solve urgent production issues in foundation model serving environments without escalation
- Excellent communication skills, particularly on technical topics related to distributed systems and model serving architectures