


AI & HPC Infrastructure Engineer

Accenture

$68,300 - $218,800
Oct 17, 2025
Overland Park, KS, US

The Global Infrastructure Engineering AI & HPC team at Accenture aims to enable infrastructure reinvention for the next era of digital solutions, powered by AI and High-Performance Computing (HPC), for its strategic and mission-critical clients.

Requirements

  • Minimum of 4 years of hands-on experience designing, deploying, and managing HPC and AI infrastructure across on-premises, cloud, and hybrid environments
  • Minimum of 4 years of experience with accelerated computing architectures (GPUs, XPUs, DPUs), high-performance fabrics (InfiniBand, Ethernet), SONiC, networking, and modern storage/data platforms
  • Minimum of 4 years of experience with cluster management and orchestration (e.g., Slurm, Run:ai, Kubernetes, Docker), real-time performance monitoring, and observability frameworks
  • Minimum of 4 years of experience with cloud and virtualization platforms (e.g., AWS, Azure, GCP, VMware, Nutanix) and expertise in automation and optimization using scripting (Python, AI tools), along with foundational Infrastructure-as-Code tools such as Terraform and Ansible
  • Minimum of 4 years of experience implementing MLOps and DevSecOps frameworks to enable secure, automated, and reproducible workflows
  • Experience managing the deployment of 1,000+ GPU clusters for HPC and AI workloads with various infrastructure services enabled
  • Experience with GPU computing libraries and accelerators (e.g., NVIDIA CUDA, Dynamo, AMD ROCm)

Responsibilities

  • Design and implement HPC and AI infrastructure solutions, aligning system architecture and deployment roadmaps to industry-specific performance and scalability needs
  • Deploy, configure, and manage XPU-based clusters (CPU/GPU/accelerators) using schedulers, VM/K8s orchestration platforms, Slurm, and containerized platforms in scalable designs to provide Metal as a Service (MaaS), GPUaaS, AIaaS, and other offerings
  • Optimize cluster performance, scalability, energy, and cost efficiency across on-premises, cloud, and hybrid environments
  • Integrate AI and HPC platforms with existing IT systems, data pipelines, and security frameworks
  • Monitor, troubleshoot, and tune infrastructure to ensure high availability, low-latency networking, and workload resiliency
  • Develop and maintain documentation including architecture diagrams, configuration baselines, and operational runbooks
  • Provide technical guidance and support to users, enabling efficient execution of HPC/AI workloads, large-scale models, and simulations

Other

  • Travel may be required for this role, with the amount of travel varying from 25% to 100% depending on business need and client requirements
  • Bachelor's degree or equivalent work experience (minimum 12 years)
  • Applicants for employment in the US must have work authorization that does not now or in the future require sponsorship of a visa for employment authorization in the United States
  • Candidates who are currently employed by a client of Accenture or an affiliated Accenture business may not be eligible for consideration
  • Job candidates will not be obligated to disclose sealed or expunged records of conviction or arrest as part of the hiring process