Consult with teams to ensure their systems are reliable and scalable. Provide expert guidance and make data-driven recommendations to improve system reliability and performance.
Requirements
- Programming background in Python and experience with Dynatrace.
- Track record of implementing end-to-end observability architectures.
- Proficiency in Infrastructure as Code (IaC) and configuration management.
- Working knowledge of cloud-native infrastructure, distributed microservice architectures, and CI/CD pipelines.
- Hands-on experience with Jira or similar systems.
- Familiarity with observability tools (e.g., metrics, logging, and tracing platforms).
- Hands-on experience building dashboards.
Responsibilities
- Design and implement comprehensive observability solutions that provide visibility into other teams' applications.
- Collaborate with development, operations, and SRE teams to identify key performance metrics such as latency, traffic, errors, and resource saturation.
- Build dashboards, alerts, and reports that provide visibility into system performance.
- Implement alerting solutions that notify the right people when issues occur.
- Fine-tune backend configurations to match customer traffic patterns, maintaining the stability and scalability of applications.
- Configure applications to collect telemetry data, including metrics, logs, and traces.
- Design and implement end-to-end observability solutions that integrate with Dynatrace and ServiceNow CMDB.
Other
- 10–15 years of experience in software engineering, DevOps, SRE, or platform operations roles.
- Lead junior engineers and encourage a culture of continuous improvement.
- Define requirements and advocate for tooling improvements that reduce manual effort.
- Guide system optimization efforts so that our practices keep pace with current industry trends.
- Certifications in cloud platforms (e.g., AWS Certified DevOps Engineer, Azure DevOps Engineer Expert).