Uber needs a centralized, reliable, and interactive observability data platform that includes metrics, logging, and tracing to empower engineers with the tools needed for monitoring, troubleshooting, and performing root cause analysis at scale. The Metrics team is responsible for delivering a cutting-edge, end-to-end distributed metrics solution designed to operate at Uber's scale, ingesting over 5 billion metrics per second and handling over 25K queries per second, with cardinality up to 500K. The system needs to evolve to meet increasing demands and provide intelligent insights that identify issues before they impact customers.
Requirements
- Proficient in one or more backend languages, such as Java, Go, C/C++, or C#, with the ability to pick up new ones quickly.
- Strong problem-solving skills, with relevant experience in designing and implementing large-scale distributed backend services.
- Proven record of building and operating highly reliable distributed systems at scale.
- Experience with OpenTelemetry, Prometheus, or Influx, and/or with building and operating monitoring infrastructure at large scale.
- Under-the-hood experience with Apache Lucene, Elasticsearch, OpenSearch, and other search technologies is a big plus.
- Experience with batch and stream data processing pipelines is a plus.
Responsibilities
- Design system architecture and own key components to deliver a centralized metrics system for Uber.
- Join the on-call rotation and drive continuous improvements in system availability, scalability, performance, and efficiency.
- Collaborate with other infrastructure teams, production engineering teams, and product managers to drive adoption and best practices, and to design and implement high-impact, cross-product features.
Other
- BS or higher degree in Computer Science or a related technical discipline, or equivalent experience.