Advisor - AI HPC Platform Engineering

Eli Lilly

$135,000 - $213,400
Sep 2, 2025
Indianapolis, IN, US

Lilly is seeking an AI HPC Platform Engineer to accelerate the next era of AI and HPC innovation by enabling and supporting leading-edge AI/ML workloads using NVIDIA’s Run:ai platform and traditional HPC infrastructure.

Requirements

  • Hands-on experience in HPC and AI platforms, including in-depth knowledge of accelerators (e.g., GPU), HPC schedulers (e.g., Altair Grid Engine, Slurm), Kubernetes platforms, and container technologies (Docker, Apptainer).
  • 6+ years of demonstrated experience in AI/ML and HPC workloads, infrastructure, and cluster architectures.
  • Expertise in Linux system and HPC administration, including experience with platform observability (e.g., alerting, logging, and metrics).
  • Knowledge of Run:ai core concepts, including roles, departments, projects, workloads, quotas, GPU fractions, and preemptible vs. non-preemptible jobs.
  • Experience writing, building, and running containers, plus an understanding of container registry management and the use of NGC images.
  • Experience with machine learning frameworks such as PyTorch, Keras, and TensorFlow.
  • Strong programming and scripting skills in languages such as Python or Bash.

Responsibilities

  • You will drive the engineering and operations work to design, build, and maintain scalable AI HPC platforms, and collaborate on infrastructure for training and inference on large-scale, distributed GPU clusters.
  • You will play a crucial role in boosting productivity for our Advanced Intelligence teams by advancing our AI and HPC infrastructure and experiences.
  • Collaborate with researchers and scientists to optimize performance and streamline workflows.
  • Leverage tooling and automation for ML workflow orchestration, resource scheduling, data access, and reproducibility.
  • Evolve and operate public cloud and on-premises environments with a focus on availability and performance for AI and HPC workloads.
  • Define and monitor infrastructure metrics as well as ML-specific metrics such as model efficiency, resource utilization, and job success rates.

Other

  • You will bring high learning agility and platform engineering skills to enable the Lilly Technology strategy, identifying opportunities to accelerate our AI journey.
  • You will advance initiatives to enable critical business projects.
  • You will have opportunities to apply agile ways of working, and you should be willing to become an expert in deploying AI and HPC solutions.
  • You will learn about new technologies in AI and HPC.
  • You will bring a passion for continual learning and for staying informed of new technologies, infrastructure trends, and approaches in the AI/ML field.