At Lilly, the business problem is to unlock the power of AI- and HPC-based GPU and accelerated compute infrastructure to support cutting-edge AI/ML workloads and improve the understanding and management of disease.
Requirements
- Expertise in Linux system administration, HPC environments, and NVIDIA DGX server management
- Experience with Spectrum-X networking and parallel file systems
- Strong scripting skills and familiarity with containerization and automation tools
- Hands-on experience using or operating High Performance Computing (HPC)-grade infrastructure
- In-depth knowledge of accelerated computing (e.g., GPU), storage (e.g., WEKA), scheduling and orchestration (e.g., Slurm, Kubernetes, LSF), high-speed networking (e.g., Ultra Ethernet, RoCE), and container technologies (e.g., Docker)
- Expertise in running and optimizing large-scale distributed training workloads using PyTorch (DDP, FSDP), NeMo, or JAX
- Some proficiency in at least one scripting language such as Bash, Python, or equivalent
Responsibilities
- Driving the engineering and operations of advanced Linux platforms supporting AI and HPC workloads
- Managing NVIDIA DGX systems using Mission Control, Base Command and Run:AI
- Optimizing Spectrum-X networking and WEKA storage for AI/ML applications
- Driving improvements in AI/HPC infrastructure tooling and operational excellence
- Leading the strategy, engineering and development of advanced Linux computing capabilities for AI/ML
- Advising the senior Linux platform engineer who directs the global Linux strategy for on-premises private cloud and public IaaS Linux services
- Accelerating initiatives in areas such as AI/ML acceleration, infrastructure AIOps automation, HPC management, and infrastructure as code
Other
- Bachelor’s degree in Computer Science, Information Technology, or a related technical field
- 10+ years’ experience as a Linux OS/Platform Engineer
- Demonstrated experience leading a global, large-scale infrastructure project
- Less than 5% travel
- Hybrid role located in Indianapolis, IN (relocation required)