Administer end-to-end disaster recovery testing across Bloomberg’s datacenters for the numerous services that support the applications making up Bloomberg’s product line.
Requirements
- 4+ years of experience in Python and/or TypeScript
- Experience with Unix, Unix tools and shell scripting
- Experience designing stable, long-lasting APIs
- Deep understanding of TCP/IP networking and the OSI model
- Experience designing and automating repeatable processes in a client/server modeled environment
- Experience building monitors and alarms for system performance, status and stability
- Experience with CI/CD systems and writing robust unit and system tests
- Basic knowledge of the Rapid framework
- Experience analyzing existing systems, identifying shortcomings, and improving them with proven methods
- Experience with Chaos Engineering
- Experience with Splunk or Humio, and Grafana or other metric-based reporting tools
- Experience with GitHub and JIRA
Responsibilities
- Manage and develop solutions that support various disaster recovery tools
- Create applications that integrate the services these tools provide into the Bloomberg operational environment and into Bloomberg products
- Develop a tooling suite that tests the clusters and managed services in our datacenters and nodesites in an automated, scalable, and self-driven fashion
- Write tooling backed by unit tests, end-to-end tests, and continuous integration to ensure the highest level of stability
- Perform system tuning and performance analysis, and define and track availability targets such as SLAs, SLOs, and SLIs
- Design and automate repeatable processes in a client/server modeled environment
- Build and maintain sophisticated, highly available, performant, and scalable mission-critical systems
Other
- A degree in Computer Science, Engineering, or a similar field of study, or equivalent work experience
- Passion for product ownership