This role addresses the problem of delivering high-scale, highly available, and resilient services through automation and infrastructure as code, while also building reliability through engineering practices and implementing advanced observability techniques.
Requirements
- Automates with scripting languages such as Python and shell to run, build, and develop applications.
- Coordinates systems using Infrastructure as Code (IaC) tools (IAM, ARM, Terraform, and Chef).
- Deploys applications in a DevOps environment using cloud computing and DevOps concepts (CI/CD pipelines).
- Utilizes modern monitoring tools such as Datadog, Prometheus, and Splunk.
- Demonstrated Expertise (“DE”) designing, architecting, and building scalable and resilient N-tier software solutions, and creating E2E plans for critical services according to DevOps practices, using .NET, Java, Python, Docker, and Kubernetes.
- DE delivering high-scale, highly available, and resilient services according to automation and Infrastructure-as-Code (IaC) methodologies, using OpenTelemetry (OTEL), Datadog, Splunk, Prometheus, and ELK.
- DE building cloud-based platforms for consumption at an enterprise level, using AWS services (EKS, Lambda, EMR, and CloudFormation) and Azure services.
Responsibilities
- Automates with scripting languages such as Python and shell to run, build, and develop applications.
- Coordinates systems using Infrastructure as Code (IaC) tools (IAM, ARM, Terraform, and Chef).
- Deploys applications in a DevOps environment using cloud computing and DevOps concepts (CI/CD pipelines).
- Utilizes modern monitoring tools such as Datadog, Prometheus, and Splunk.
- Confers with systems analysts, engineers, programmers and others to design systems and to obtain information on project limitations and capabilities, performance requirements and interfaces.
- Builds reliability using resiliency engineering, automation, observability, and chaos tests.
- Implements advanced observability practices and techniques at scale.
Other
- Bachelor’s degree (or foreign education equivalent) in Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and five (5) years of experience as a Principal Site Reliability Engineer (or closely related occupation) designing and developing reliability, performance, and scalability of enterprise-wide full stack applications (ensuring seamless integration and high availability) using Datadog, ELK, and Prometheus in a financial services environment.
- Or, alternatively, Master’s degree (or foreign education equivalent) in Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and three (3) years of experience as a Principal Site Reliability Engineer (or closely related occupation) designing and developing reliability, performance, and scalability of enterprise-wide full stack applications (ensuring seamless integration and high availability) using Datadog, ELK, and Prometheus in a financial services environment.
- Fidelity’s hybrid working model blends the best of both onsite and offsite work experiences.
- Most hybrid roles require associates to work onsite every other week (all business days, Monday through Friday) in a Fidelity office.
- Certain laws and regulations may restrict Fidelity from hiring and/or associating with individuals with certain criminal histories.