The goal of this role is to ensure exceptional reliability and global uptime for impact.com's SaaS platform.
Requirements
- Solid understanding of systems and application design
- Experience building and using observability features with tools such as Elasticsearch, Grafana, LogicMonitor, and Splunk
- Experience with database monitoring and SQL tuning
- Proficiency in at least one high-level programming language (e.g., Python), SQL, and shell scripting
Responsibilities
- Implement comprehensive observability features to proactively identify and resolve issues
- Build and enhance monitoring and alerting solutions with tools such as Elastic APM, Splunk, LogicMonitor, and Grafana
- Analyze client-generated workloads to ensure each client's resource consumption is consistent with their contract
- Troubleshoot across the entire stack: hardware, software, databases, networks, applications, and client-generated workloads
- Drive and contribute to root-cause analysis when issues are identified
- Configure alerts that fire when thresholds are exceeded, and document the associated runbooks
Other
- B.S. in Computer Science or a related field, or equivalent experience
- Ability to prioritize tasks and work independently
- Adaptability and a focus on the simplest, most efficient, and most reliable solutions