Box needs to enhance the availability, reliability, and resilience of its systems to improve customer experience and drive business growth.
Requirements
- 5+ years of working experience designing, developing, and operating large-scale, customer-facing products or services
- Experience coding in higher-level languages (e.g., Java, Scala, Go, Python)
- Experience designing complex systems and frameworks using proven system design principles, such as Non-Abstract Large System Design (NALSD)
- Experience troubleshooting issues across distributed Linux environments, with comfort tracing problems across applications, systems, and networks
- Proficiency with modern cloud technologies such as GCP, AWS, and Kubernetes
- Experience with service observability practices and tools (e.g., Prometheus, OpenTelemetry, SignalFx, or similar)
- Familiarity with PHP, JavaScript, or Node.js (bonus)
Responsibilities
- Build the software, frameworks, and tools required for reliable operation of Box's services across multiple cloud environments
- Manage the stability and operation of several of Box's most critical production applications through application reviews, capacity planning, and performance tuning
- Develop automation, frameworks, and tools that improve platform reliability, resilience, and availability
- Participate in proofs of concept (POCs) for new projects and frameworks being evaluated for Box's products and platforms
- Improve observability, both as a developer and maintainer of systems and frameworks and as a mentor to our product development teams
- Work with modern cloud-native technologies, including containerization and orchestration (Docker, Kubernetes), service mesh solutions (Istio, Linkerd), and cloud platforms (AWS, GCP)
- Participate in product design reviews and architectural discussions to ensure reliability is considered early in the development lifecycle of products and services
Other
- Natural collaborator who inspires others, mentors junior engineers, and drives technical excellence
- Work from assigned office a minimum of 2 days per week, with a focus on Tuesdays and Thursdays
- Participate in a team on-call rotation