Advisor - AI HPC Platform Engineering

BioSpace

$135,000 - $213,400
Sep 7, 2025
Indianapolis, IN, US

At Lilly, the goal of this role is to accelerate the next era of AI and HPC innovation by enabling and supporting leading-edge AI/ML workloads on NVIDIA’s Run:ai platform as well as on traditional HPC infrastructure.

Requirements

  • Hands-on experience in HPC and AI platforms, including in-depth knowledge of accelerators (e.g., GPUs), HPC schedulers (e.g., Altair Grid Engine, Slurm), Kubernetes platforms, and container technologies (e.g., Docker, Apptainer).
  • 6+ years of demonstrated experience in AI/ML and HPC workloads, infrastructure, and cluster architectures.
  • Expertise in Linux system and HPC administration, including experience with platform observability (e.g., alerting, logging, and metrics).
  • Knowledge of Run:ai core concepts, including roles, departments, projects, workloads, quotas, GPU fractions, and preemptible vs. non-preemptible jobs.
  • Experience writing, building, and running containers, plus an understanding of container registry management and the use of NGC images.
  • Experience with machine learning frameworks such as PyTorch, Keras, and TensorFlow.
  • Strong programming and scripting skills in languages such as Python or Bash.

Responsibilities

  • Drive the engineering and operations work to design, build, and maintain scalable AI HPC platforms, and collaborate on infrastructure for training and inference on large-scale, distributed GPU clusters.
  • Boost productivity for our Advanced Intelligence teams by advancing our AI and HPC infrastructure and experiences.
  • Collaborate with researchers and scientists to optimize performance and streamline workflows.
  • Leverage tooling and automation for ML workflow orchestration, resource scheduling, data access, and reproducibility.
  • Evolve and operate public cloud and on-premises environments with a focus on availability and performance for AI and HPC workloads.
  • Define and monitor infrastructure metrics as well as ML-specific metrics, such as model efficiency, resource utilization, and job success rates.

Other

  • Bachelor’s degree in Computer Science, Information Technology, or a related technical field.
  • 10+ years’ experience as an HPC Platform Engineer.
  • Demonstrated experience leading a global large-scale infrastructure project.
  • Hybrid role located in Indianapolis, IN (relocation required).
  • Less than 5% travel.