The VA is looking to enhance the reliability, performance, and monitoring of its applications using enterprise monitoring tools and AWS- and Azure-native services.
Requirements
Experience deploying, maintaining, or troubleshooting complex applications at enterprise scale
Experience working in Network, Windows, Unix/Linux, AWS or Azure Cloud, Java/JS Development, Microsoft, or Oracle Database technology areas
Experience with capacity planning, demand management, and performance optimization
Experience with test-driven development, distributed systems, microservices, and cloud-native application implementation
Experience working in an Agile framework, including Kanban and Scrum
Responsibilities
Lead and mentor the SRE team to ensure reliability, performance, and scalability across VA applications.
Collaborate with product and engineering teams to design and optimize monitoring, alerting, and logging using AWS- and Azure-native and enterprise tools.
Manage capacity planning, unification of on-premises and cloud infrastructure, and cost-effective resource allocation.
Develop advanced performance monitoring and alerting capabilities to support cloud transitions and product consolidation efforts.
Proactively resolve system issues and partner with cross-functional teams to improve platform health and user experience.
Other
Ability to collaborate and communicate effectively with cross-functional teams
Ability to obtain and maintain a Public Trust or Suitability/Fitness determination based on client requirements
Bachelor’s degree
Possession of excellent written and verbal communication skills
Possession of excellent critical thinking and error assessment skills