Lead AI/ML Engineer (P4368)

84.51°

$121,000 - $201,250
Oct 16, 2025
Cincinnati, OH, US

84.51° needs an engineer to create, deploy, and maintain computationally efficient proprietary SLM, LLM, and embedding model implementations, serving infrastructure, and end-to-end solutions, with a specific focus on model serving and operations within its foundation models team.

Requirements

  • 2+ years of hands-on experience with foundation models (LLMs, SLMs, embedding models) in production environments; 2+ years of experience in model serving and inference optimization preferred
  • Deep knowledge of foundation model serving frameworks, particularly Triton Inference Server and vLLM
  • Working experience with PyTorch models and optimization for inference (quantization, pruning, ONNX, TensorRT)
  • Knowledge of distributed GPU computing, CUDA programming, and GPU memory optimization techniques
  • Hands-on experience with GCP and Azure cloud platforms, including GPU instances, managed services, and networking
  • Kubernetes & Docker experience with a focus on GPU workloads and model serving deployments
  • CI/CD pipeline experience with a focus on ML model deployment; GitHub Actions experience preferred

Responsibilities

  • Lead large-scale foundation model projects that can span months, focusing on model serving, inference optimization, and production deployment
  • Leverage known patterns, frameworks, and tools for automating & deploying foundation model serving solutions using Triton, vLLM, and other inference engines
  • Develop new tools, processes, and operational capabilities to monitor and analyze foundation model performance, latency, throughput, and resource utilization
  • Work with researchers and ML engineers to optimize and scale foundation model serving using best practices in distributed systems, GPU orchestration, and MLOps
  • Abstract foundation model serving solutions as robust APIs, microservices, or components that can be reused across the business with high availability and low latency
  • Build, steward, and maintain production-grade foundation model serving infrastructure (robust, reliable, maintainable, observable, scalable, performant) to manage and serve LLMs, SLMs, and embedding models at scale
  • Research state-of-the-art foundation model serving technologies, inference optimization techniques, and distributed GPU architectures to identify new opportunities for implementation across the enterprise

Other

  • Foster a collaborative and innovative team environment, encouraging professional growth and development among junior team members in foundation model technologies
  • Understand business requirements and trade off latency, cost, throughput, and model accuracy to maximize value and translate research into production-ready serving solutions
  • Take responsibility for code reviews, infrastructure reviews, and production readiness assessments for foundation model deployments
  • Apply appropriate documentation, version control, infrastructure-as-code, and other internal communication practices across channels
  • Make time-sensitive decisions and solve urgent production issues in foundation model serving environments without escalation