IT@JH Research Computing is seeking an HPC Scientific Software Engineer to support faculty, researchers, and students engaged in high-performance and AI-driven research. The engineer will deploy, optimize, and maintain scientific software and computational workflows on advanced HPC systems and related infrastructure.
Requirements
- Hands-on experience with SLURM for job scheduling.
- Advanced knowledge of Linux systems and proficiency in Python, Perl, C/C++, and shell scripting for automation and system management.
- Familiarity with scientific application management tools such as containers, Lua module files, CMake, Spack, and EasyBuild.
- Experience using CUDA, DNN, TensorRT, and Intel compilers to enhance system performance.
- Experience overseeing the installation, configuration, and maintenance of HPC packages with tools like CMake, Make, EasyBuild, Spack, and Lua module files.
- Experience developing and managing container orchestration strategies that ensure the scalability, reliability, and security of applications.
Responsibilities
- Develop and refine deployment strategies for scientific software on HPC and AI systems.
- Design computational workflows, select optimal software configurations, and use tools like Ansible for automation.
- Assist teams in implementing, tuning, and optimizing AI models and gateway applications (e.g., XDMoD, ColdFront, Open OnDemand, CryoSPARC Live, SBGrid, AI agents).
- Analyze and optimize the performance of AI models and HPC applications, focusing on GPU-enabled computing.
- Implement parallel processing, distributed computing, and resource management techniques for efficient job execution.
- Develop, debug, and maintain software tools, libraries, and frameworks supporting HPC and AI workloads.
- Manage and support scientific software deployment across HPC, cloud-based, and colocation facilities.
Other
- Master's Degree in a quantitative discipline.
- Five (5) years of experience in HPC user support, software deployment, and performance optimization within an academic or research environment.
- Experience in scientific computing environments and applications.
- Experience with training workshops, performance optimization, and troubleshooting.
- Remote