Carrier Services offers seamless integration of Apple Retail Stores and the Apple Online Store with major US carriers for iPhone activations. We are looking for a talented Site Reliability Engineer to join our growing team.
Requirements
- 4 years of experience in incident management for large-scale, customer-facing retail applications, with a focus on impact-driven prioritization, root cause analysis, and timely resolution
- 4 years of proven troubleshooting, problem-solving, and debugging experience in dynamic production environments
- 4 years of hands-on experience in observability and monitoring using tools like Splunk and Prometheus, with expertise in creating complex queries and insightful dashboards
- 4 years of experience with at least one scripting language, such as Python, for automation and efficient system management
- 2 years of experience working with relational and NoSQL databases such as Oracle and Cassandra, including writing and optimizing complex queries for scalable and efficient data access
- Strong problem-solving, software development, and debugging skills
Responsibilities
- Ensure the reliability, scalability, and performance of our systems and services
- Design, build, and maintain robust infrastructure and automation solutions
- Represent the SRE organization in design reviews and operational readiness exercises for new and existing services
- Analyze statistics to build a clear picture of the current state of our systems
- Automate manual operations and improve them through repeated iteration
- Proactively address critical production issues and drive them to closure, working with the required partners
- Participate in an on-call rotation, providing hands-on technical expertise during service-impacting events
Other
- Work closely with our engineering and operations teams
- Collaborate with technical and non-technical teams
- Willingness to participate in on-call rotations and provide weekend coverage as needed
- Experience in communicating complex technical concepts to both technical and non-technical stakeholders
- Proven track record of taking ownership and successfully delivering results