Rubrik's Systems Engineering team ensures the reliability and robustness of Rubrik's products through a rigorous focus on stress, scale, longevity, and resilience. The team collaborates with product teams, product management, and release managers to integrate reliability early and deliver high-quality releases to end customers.
Requirements
- Strong knowledge of data structures, algorithms, and software design
- Solid programming skills in one or more programming languages (Python preferred)
- Experience building AI-based applications and workflows using LLMs
- Working knowledge of virtualization, container technologies, storage, databases, and networking
- Experience with Google Cloud Platform/AWS/Azure or other public cloud technologies
- Experience building high-scale, performant products
- Knowledge of CI/CD and related automation and observability tools such as Jenkins, Ansible, and ELK
Responsibilities
- Drive release reliability certification for Rubrik Cloud Data Management and Rubrik Security Cloud-Private products.
- Architect and build scalable infrastructure and efficient pipelines for stress, scale, and resilience/chaos testing.
- Develop and deploy simulators to optimize cost efficiency and accelerate testing.
- Provide product teams with self-service tools and infrastructure for their validation needs.
- Maintain and evolve long-running, customer-like environments to proactively identify potential issues.
- Design and build infrastructure automation for on-demand creation of complex, customer-like product deployments and system stress/performance pipelines.
- Develop and enhance tools for monitoring, alerting, and telemetry of customer-like deployments.
Other
- Ability to work collaboratively in a team environment and to get up to speed quickly with new technologies.
- BS or MS in Computer Science or a related field, with a minimum of 2 years of relevant work experience.