Tinder is looking to ensure that its services are observable, predictable, and resilient against failures, so that users experience consistent, high-quality service, even when things go wrong internally.
Requirements
- 1-2+ years of experience with AWS, GCP, Azure or other cloud providers
- Solid experience with languages such as Python or Go
- Docker experience
- Experience with Terraform, CloudFormation, or Ansible
- Kubernetes, Envoy, or service-mesh experience
- Experience managing a CI/CD platform (Buildkite, Jenkins, Spinnaker, etc.)
- Experience with cloud networking (VPC, ELB, Route53/DNS, etc.)
Responsibilities
- Maintain AWS EKS clusters with Terraform and Terragrunt.
- Maintain monitoring systems (Mimir, Alloy, Prometheus, Thanos, Grafana).
- Build Grafana dashboards and SSO integrations for internal tools.
- Coordinate outage resolutions.
- Actively look for ways to refine Site Reliability Engineering processes and tools.
- Collaborate with various technical and functional teams across our company.
- Participate in architecture, design, and code reviews.
Other
- 5-7+ years of software development experience
- Strong communication skills and a passion for driving projects from start to finish
- Excellent knowledge of Computer Science fundamentals
- BS/MS in Computer Science/Engineering or similar subject area
- Ability to recruit, encourage, and develop other team members