Blackhawk Network is seeking an engineer to build scalable observability platforms, integrate real-time metrics pipelines, and work with time-series databases to drive system reliability and performance insights.
Requirements
- Deep expertise in observability tools and practices (e.g., Prometheus, Grafana, OpenTelemetry, Splunk, Coralogix).
- Strong programming skills in Python or similar languages.
- Experience with real-time systems, metrics collection, and time-series databases (e.g., InfluxDB, AWS Timestream).
- Proficiency with AWS services, especially CloudWatch, Lambda, and related infrastructure.
- Familiarity with infrastructure as code (Terraform, CloudFormation) and container orchestration (Kubernetes).
- Solid understanding of distributed systems, microservices, and event-driven architectures.
- Familiarity with APM tools and distributed tracing.
Responsibilities
- Architect, build, and maintain observability platforms using tools such as Prometheus, Grafana, OpenTelemetry, InfluxDB, and AWS CloudWatch.
- Design and implement real-time metrics pipelines and time-series data processing systems.
- Develop scalable APIs and services to expose observability data to internal teams.
- Integrate observability tooling into CI/CD pipelines and service deployment workflows.
- Collaborate with SRE, DevOps, and application teams to embed observability best practices across the SDLC.
- Define and implement SLIs/SLOs, alerting strategies, and performance dashboards.
- Troubleshoot complex distributed systems and contribute to reducing MTTD and MTTR.
Other
- Bachelor's degree in Information Technology, Computer Science, or a related field, or equivalent experience.
- 6+ years of experience in software engineering, platform engineering, or SRE roles.
- Excellent communication skills and a collaborative mindset.
- Ability to work both independently and collaboratively.
- Strong sense of ownership and a hunger to solve complex problems.