Lead AI/ML Engineer (P4368)

84.51°

$121,000 - $201,250

Dec 10, 2025

Cincinnati, OH, US

84.51° needs someone to create, deploy, and maintain computationally efficient proprietary SLM, LLM, and embedding model implementations, serving infrastructure, and end-to-end solutions. The role sits within its foundation models team and focuses specifically on model serving and operations.

Requirements

2+ years of hands-on experience with foundation models (LLMs, SLMs, embedding models) in production environments; 2+ years of experience in model serving and inference optimization preferred
Deep knowledge of foundation model serving frameworks, particularly Triton Inference Server and vLLM
Working experience with PyTorch models and their optimization for inference (quantization, pruning, ONNX, TensorRT)
Knowledge of distributed GPU computing, CUDA programming, and GPU memory optimization techniques
Hands-on experience with GCP and Azure cloud platforms, including GPU instances, managed services, and networking
Kubernetes & Docker experience with a focus on GPU workloads and model serving deployments
CI/CD pipeline experience with a focus on ML model deployment; GitHub Actions experience preferred

Responsibilities

Lead large-scale foundation model projects that can span months, focusing on model serving, inference optimization, and production deployment
Leverage known patterns, frameworks, and tools for automating & deploying foundation model serving solutions using Triton, vLLM, and other inference engines
Develop new tools, processes, and operational capabilities to monitor and analyze foundation model performance, latency, throughput, and resource utilization
Work with researchers and ML engineers to optimize and scale foundation model serving using best practices in distributed systems, GPU orchestration, and MLOps
Abstract foundation model serving solutions as robust APIs, microservices, or components that can be reused across the business with high availability and low latency
Build, steward, and maintain production-grade foundation model serving infrastructure (robust, reliable, maintainable, observable, scalable, performant) to manage and serve LLMs, SLMs, and embedding models at scale
Research state-of-the-art foundation model serving technologies, inference optimization techniques, and distributed GPU architectures to identify new opportunities for implementation across the enterprise

Other

Foster a collaborative and innovative team environment, encouraging professional growth and development among junior team members in foundation model technologies
Understand business requirements and trade off latency, cost, throughput, and model accuracy to maximize value and translate research into production-ready serving solutions
Responsible for code reviews, infrastructure reviews, and production readiness assessments for foundation model deployments
Apply appropriate documentation, version control, infrastructure-as-code, and internal communication practices across channels
Make time-sensitive decisions and solve urgent production issues in foundation model serving environments without escalation