Advisor - AI HPC Platform Engineering

BioSpace

$135,000 - $213,400

Sep 7, 2025

Indianapolis, IN, US

At Lilly, the business need this role addresses is accelerating the next era of AI and HPC innovation by enabling and supporting leading-edge AI/ML workloads on NVIDIA's Run:ai platform as well as on traditional HPC infrastructure.

Requirements

Hands-on experience with HPC and AI platforms, including in-depth knowledge of accelerators (e.g., GPUs), HPC schedulers (e.g., Altair Grid Engine, Slurm), Kubernetes platforms, and container technologies (Docker, Apptainer).
6+ years of demonstrated experience in AI/ML and HPC workloads, infrastructure, and cluster architectures.
Expertise in Linux system and HPC administration, including experience with platform observability (e.g., alerting, logging, and metrics).
Knowledge of Run:ai core concepts, including roles, departments, projects, workloads, quotas, GPU fractions, and preemptible vs. non-preemptible jobs.
Experience writing, building, and running containers, plus an understanding of container registry management and the use of NGC images.
Experience with machine learning frameworks such as PyTorch, Keras, and TensorFlow.
Strong programming and scripting skills in languages such as Python or Bash.

Responsibilities

Drive the engineering and operations work to design, build, and maintain scalable AI HPC platforms, and collaborate on infrastructure for training and inference on large-scale, distributed GPU clusters.
Boost productivity for our Advanced Intelligence teams by advancing our AI and HPC infrastructure and user experience.
Collaborate with researchers and scientists to optimize performance and streamline workflows.
Leverage tooling and automation for ML workflow orchestration, resource scheduling, data access, and reproducibility.
Evolve and operate public cloud and on-premises environments with a focus on availability and performance for AI and HPC workloads.
Define and monitor infrastructure metrics as well as ML-specific metrics, such as model efficiency, resource utilization, and job success rates.

Other

Bachelor's degree in Computer Science, Information Technology, or a related technical field.
10+ years of experience as an HPC Platform Engineer.
Demonstrated experience leading a global large-scale infrastructure project.
Hybrid role located in Indianapolis, IN (relocation required).
Less than 5% travel.