Schwab is looking to transform the finance industry by building, testing, and deploying a new Schwab Investing Technology suite of product offerings. The Client Experience Technology team needs a Senior DevOps Engineer to support developers and multiple agile teams in this endeavor, ensuring the success of initiatives and the reliability of their platforms.
Requirements
- 5+ years in DevOps/Platform roles on Linux with AWS depth (networking/VPC, IAM, ECR, ECS, RDS)
- Expert with containers (Docker) and 12‑factor services; hands‑on with ECS (Fargate or EC2) and image promotion workflows
- CI/CD mastery with GitHub Actions and/or Bamboo (pipeline design, reusable templates, environment promotion, deployment strategies)
- Terraform at scale (modules, policies/guardrails, plan/apply automation, drift detection) for app and data stacks (incl. RDS)
- Strong networking fundamentals; scripting in Bash and Python
- Experience with Kubernetes/ECS or PCF/Tanzu to support multi‑runtime orgs
- Observability stacks (metrics, logs, tracing) and incident response/retrospectives with SRE
Responsibilities
- Own CI/CD pipelines from build through promotion and deployment for containerized services; define guardrails, quality gates, and rollout/rollback patterns aligned with SRE and Release Management practices.
- Design, build, and operate AWS infrastructure with Terraform (networking, compute, containers, and data services, including RDS) using module standards, workspaces/environments, and automated promotions.
- Diagnose container runtime issues (e.g., task health, service scaling, deployments) and partner with teams during image promotion windows.
- Embed reliability practices: runbooks, production checks, operational readiness, and joint incident/retro participation with SRE.
- Champion observability and change safety: metrics, logs, alerts, and progressive delivery strategies (feature flags, config changes, DB change playbooks).
- Respond to Alerts and Escalations: Actively monitor and respond to system alerts and escalations to ensure the stability and reliability of our services. This includes diagnosing and troubleshooting issues in real time to minimize downtime and impact on users.
- System Recovery Events: Lead and coordinate system recovery efforts during incidents. This involves executing recovery procedures, collaborating with cross-functional teams to restore services, and conducting post-incident reviews to identify root causes and implement preventive measures.
Other
- We value in-office collaboration and fully intend for the selected candidate for this role to work on site in the specified location(s).
- Excellent cross‑functional communication with SRE and Release Management to drive readiness and approvals
- Experience in production change management, including change approval workflows, risk assessment, deployment coordination, and post-deployment monitoring
- Database change automation (scripts, rollbacks, promotions) and safe config strategies (feature flagging, toggles)
- In addition to the salary range, this role is eligible for bonus or incentive opportunities.