Baylor University - ITS Research Technology needs assistance with operating and supporting the Kodiak High-Performance Computing (HPC) cluster. The role involves hands-on work in HPC administration, programming support, system monitoring, and documentation.
Requirements
- Basic proficiency in Linux command-line environments; familiarity with scripting languages (e.g., Bash, Python) preferred.
- Demonstrated interest in HPC systems, parallel computing, or system administration.
- Prior exposure to job schedulers or batch systems (e.g., SLURM, PBS).
- Familiarity with compiling and running parallel or GPU-accelerated code.
- Experience with environment module systems or with managing multiple software versions.
Responsibilities
- Provide first-level client support for Kodiak users, including troubleshooting login issues, software module loading, and job submission questions.
- Assist the Senior Research Systems Administrator and Senior Research Technology Programmer with system maintenance tasks, such as applying software updates, monitoring node health, and keeping both CPU and GPU compute nodes running smoothly.
- Aid Kodiak users by guiding them through Linux basics, Kodiak access procedures, and relevant training courses (e.g., Linux Unhatched, NCSA HPC self-paced courses).
- Help maintain documentation, FAQs, and user guides, including updates to the Research Technology Box-AI hub knowledge base.
- Assist with batch job monitoring and basic performance tracking across CPU and GPU compute nodes; escalate more advanced issues to senior staff as needed.
- Work with the Senior Research Systems Administrator on hardware replacement and equipment troubleshooting in the Baylor University - Dutton Data Center.
- Contribute to backup management by validating backup jobs, backup libraries, and off-site backups of HPC data storage.
Other
- Enrolled as a graduate student at Baylor University, preferably in Computer Science, Computer Systems Engineering, Cyber Security, Electrical & Computer Engineering, Information Systems, Mathematics, Mechanical Engineering, or a related field.
- Excellent communication skills for user support and documentation.
- Ability to work both independently and collaboratively under the guidance of senior administrators.