The company is looking to execute and maintain automated integration and system testing processes across a diverse range of HPC operating environments.
Requirements
Experience with the Linux CLI, and Linux tools
Recent experience using Bash/Python to develop scripts to automate testing of HPC systems
Recent Linux administration experience in an HPC multi-host/multi-instance environment
Experience troubleshooting hardware and software issues operating in an HPC environment
Experience configuring and sustaining VMware ESXi/Virtualization environments
Experience with containerization technologies such as Docker
Experience with IaC principles and automation tools including Ansible
Responsibilities
executing and maintaining automated integration and system testing processes across a diverse range of HPC operating environments
development of scripts and playbooks to be leveraged for system-level integration
development of technical documentation
coordination of integration activities
clear communication of results to key stakeholders
conduct performance, functional, redundancy, and failover testing to ensure system stability and reliability under various conditions
Other
Experience with tracking and reporting issues to key stakeholders
Familiar with metrics and monitoring tools used for ingesting, indexing, searching, monitoring, and analyzing data
Demonstrated experience with network monitoring tools, including the ability to configure, analyze, and troubleshoot network performance in HPC systems
Experience with CI/CD principles, methodologies, and tools including GitLab CI