Design, develop, and maintain a comprehensive distributed metrics and monitoring solution for HPC systems to ensure the reliability, performance, and scalability of the HPC infrastructure.
Requirements
- Experience developing in Unix
- Ability to perform shell scripting
- Working knowledge of Configuration Management (CM) tools and Web Services implementation
- Software development using languages such as C, C++, Python, Ruby, Perl, JavaScript, etc.
- Experience with agile development processes
- Experience with source code control systems, such as Git
- Experience using the Linux CLI
- Experience using Linux tools and developing Bash scripts to automate manual processes
- Recent software development experience using Python
- Experience with maintaining security compliance and user management
- Familiarity with Datacenter Infrastructure Management (DCIM) tools such as NetBox
- Familiarity with observability and analytics platforms such as Splunk
- Experience developing system documentation such as SSPs, CONOPS, user guides, and how-to manuals
- Experience with automation frameworks including Ansible for orchestrating deployment
- Experience with CI/CD principles, methodologies, and tools such as GitLab CI and Jenkins
- Experience with the Git source control system
Responsibilities
- Design, develop, test, deploy, document, maintain, and enhance complex and diverse software systems
- Review and test software components for adherence to the design requirements, and document test results
- Resolve software problem reports
- Apply software development and software design methodologies appropriate to the development environment
- Provide specific input to the software components of system design, including hardware/software trade-offs, software reuse, use of Open-Source Software (OSS), Commercial Off-The-Shelf (COTS), or Government Off-The-Shelf (GOTS) software in place of new development, and requirements analysis and synthesis from the system level down to individual software components
- Analyze user requirements to derive software design and performance requirements
- Debug existing software and correct defects
Other
- Master's degree in Computer Science or related discipline from an accredited college or university, plus five (5) years of experience as a SWE
- OR Bachelor's degree in Computer Science or related discipline from an accredited college or university, plus seven (7) years of experience as a SWE
- OR Nine (9) years of experience as a SWE
- TS/SCI clearance with polygraph required