Articul8 AI is building the next generation of resilient, scalable software systems that help organizations transform their operations. The company is seeking a Senior SDET specializing in chaos engineering and monitoring. The role involves designing and implementing sophisticated test automation frameworks, creating and running chaos experiments that validate system resilience against real-world failures, and ensuring comprehensive monitoring that provides actionable insights in both testing and production.
Requirements
- 5+ years of experience in software testing and quality assurance, with at least 2 years focused on chaos engineering
- Strong programming skills in languages such as Python, Go, and/or Rust
- Experience with chaos engineering tools such as Chaos Monkey, Gremlin, or similar frameworks
- In-depth knowledge of monitoring systems like Prometheus, Grafana, ELK Stack, or similar tools
- Experience implementing observability practices (metrics, logging, tracing) in distributed systems
- Familiarity with container orchestration platforms like Kubernetes and related chaos tools
- Experience with cloud platforms (AWS, GCP, Azure) and their monitoring capabilities
Responsibilities
- Design, develop, and maintain advanced test automation frameworks that incorporate chaos engineering principles
- Create and execute chaos experiments that simulate various failure modes and edge cases in our distributed systems
- Implement monitoring solutions that effectively track system performance, resilience, and failure recovery
- Establish observability practices that provide deep insights into system behavior during chaos experiments
- Develop metrics and dashboards to visualize system reliability and the impact of chaos experiments
- Integrate chaos testing into CI/CD pipelines to validate system resilience continuously
- Mentor engineers through code reviews, technical sessions, and hands-on guidance in test automation, chaos engineering, and monitoring best practices
Other
- Bachelor's degree in Computer Science, Engineering, or a related field
- Excellent communication skills with the ability to present technical findings to various stakeholders
- Master's degree in Computer Science, Engineering, or a related field
- Contributions to open-source testing or chaos engineering projects
- Familiarity with AI/ML systems and their unique testing challenges