Block is looking for a DevOps Engineer to build and maintain infrastructure for its Hardware Continuous Integration Infrastructure (CII) team, with a focus on boosting developer productivity, improving system performance and reliability, and optimizing mission-critical internal applications.
Requirements
- 8+ years of industry experience architecting, developing, and troubleshooting large-scale infrastructure.
- 5+ years of experience in at least one of the following programming languages: Python or Java.
- Strong experience with cloud infrastructure and Infrastructure as Code (Terraform, CloudFormation, Kubernetes).
- Strong experience with Linux systems, including troubleshooting.
- Strong experience with VMs, Docker, and container orchestration.
- Experience with TCP/IP networking and with network- and application-level security.
- Experience with management and automation tools such as Ansible or Salt.
Responsibilities
- Build scalable infrastructure to manage CI systems (both on-prem and AWS) and applications.
- Minimize the risk of failures that affect durability, availability, and performance.
- Build automation tools that detect and remediate system health issues and prevent their recurrence.
- Build automation for dynamic capacity planning and resource optimization to balance performance and cost efficiency across our on-prem and AWS CI infrastructure.
- Perform periodic on-call duty to ensure the availability and efficiency of the continuous integration infrastructure.
- Perform service migrations with comprehensive documentation, phased rollouts, and stakeholder communication.
- Provide level-1 support in triaging and debugging CI pipelines, build failures, and other supporting services.
Other
- Collaborate across multiple teams, including IT Support, Production Platform Engineering, Hardware Engineering, and Devices Software Engineering, providing technical guidance and solutions to teams with varying levels of infrastructure knowledge.
- Champion improvement projects and coordinate incident response using structured, data-driven approaches.
- Provide support to internal stakeholders with a customer-oriented mindset and develop automation solutions to improve customer experience and usage of CI systems.
- Lead incident response for critical infrastructure outages and implement preventative measures based on root cause analysis.
- Quickly diagnose and resolve complex infrastructure issues that impact developer productivity across hardware teams.