PlayStation is looking to develop, automate, and maintain scalable data platforms, emphasizing Infrastructure as Code (IaC) and cloud-native technologies to ensure the reliability and automation of NoSQL, Streaming, and Caching services within AWS and GCP environments. The goal is to provide resilient, high-performance infrastructure for real-time data services that directly impact the scalability and reliability of global data solutions.
Requirements
- 6+ years of software development and SRE experience, including at least 3 years specializing in Go and Infrastructure as Code with a focus on automation.
- Deep proficiency in Go (Golang), with the ability to write performant, idiomatic, and maintainable code for production-scale systems.
- Established track record crafting modular, architecture-focused frameworks in Go, supporting large and complex backend services.
- Expertise with Infrastructure as Code tools such as Terraform and Ansible.
- Expertise in database operations: scaling, consistency tuning, compaction, repair, and backup/recovery.
- Familiarity with NoSQL, caching, and streaming platforms (e.g., Apache Kafka, Redis, AWS MSK).
- Cloud experience (AWS, GCP, or Azure), with knowledge of managed services (e.g., DynamoDB, ElastiCache, MSK, or equivalents).
Responsibilities
- Develop and implement Infrastructure as Code (IaC) and automate the provisioning, monitoring, scaling, and lifecycle management of NoSQL, Streaming, and Caching platforms (e.g., Cassandra, Kafka, Redis).
- Drive end-to-end automation to enable repeatable, reliable, and self-service deployment of data services across cloud and hybrid environments.
- Ensure that platform data solutions remain highly available, scalable, and resilient.
- Define and enforce SLIs, SLOs, and error budgets for data platforms to drive reliability engineering practices.
- Develop highly efficient, self-healing systems with automated redundancy and scalability solutions for databases and streaming platforms.
- Develop observability solutions (metrics, logging, tracing) for Cassandra, Redis, and Kafka/MSK to ensure proactive issue detection.
- Lead incident response for critical database/caching/streaming issues and drive root cause analysis with permanent automated fixes.
Other
- Collaborate with product and platform teams to provide resilient, high-performance infrastructure for real-time data services.
- Embrace DevOps and SRE principles, prioritize automation, and apply AI/ML to improve system performance.
- Work closely with engineering, platform, and product teams to ensure seamless integration and delivery of dependable, scalable, high-performing data services for global PlayStation experiences.
- Excellent communication and collaboration skills, with experience mentoring and influencing peers across diverse teams.