Schwab is looking to improve the availability, performance, reliability, and telemetry of its Trading Platform by implementing SRE best practices and modernizing tooling.
Requirements
- 10+ years of software development and site reliability engineering experience supporting production applications in a public cloud, PCF, or IaaS environment.
- 7+ years in DevOps engineering leadership focusing on complementing production operations with automation and tooling initiatives.
- 5+ years of experience defining, driving, and implementing operational best practices (SLOs, SLIs, error budgets, error monitoring, capacity planning, blameless postmortems, and toil management).
- 5+ years of experience with CI/CD tools and with logging, observability, and telemetry solutions (Bitbucket, Bamboo, GitHub, Jenkins, Datadog, Splunk, Prometheus, Grafana, etc.).
- Proficiency in programming languages and tools for automating repeatable processes and building IaC solutions (Python, CloudFormation, Terraform)
- Knowledge of NoSQL databases (Aerospike, MongoDB preferred)
- Knowledge of IBM MQ, RabbitMQ and Kafka
Responsibilities
- Identifying tactical and strategic opportunities to improve service health, performance, reliability, and telemetry across the Trading Platform.
- Leading the design, architecture, and implementation of an availability and resiliency roadmap that delivers modernized tooling and metrics.
- Working closely with the development team to define a sustainable operating model for Trading applications, focusing on platform scale, availability, fault tolerance, and performance.
- Leading automation and IaC practices so that teams follow patterns that ensure repeatability, consistency, and portability.
- Identifying toil and technical debt, developing a comprehensive plan, and leading the team through its execution.
- Driving a shift-left mindset and influencing architectural decisions to ensure resiliency and scale at the outset of the software development process.
- Being a hands-on technical leader who leads the team from the front and inspires thought leadership within it.
Other
- We value in-office collaboration and fully intend for the selected candidate for this role to work on site in the specified location(s).
- 7+ years of people leadership, supporting highly technical individuals including performance management, talent development, driving efficiencies and talent engagement.
- Leading the team with a data-driven mindset, focusing on key performance metrics such as MTTD, MTTR, and availability, in close collaboration with Trading development and IT Operations teams.