Uber's data infrastructure needs improvement in data freshness and cost efficiency, along with the development of a cutting-edge metadata management service for unstructured data.
Requirements
- Storage experience is a must: proven experience in designing and building large-scale data storage and unified data catalog solutions.
- Strong understanding of data platform technologies, including data lakehouse architecture and cloud infrastructure (preferably GCP or OCI).
- Demonstrated expertise in data infrastructure components such as data catalog and storage systems.
- Extensive experience with technologies like Apache Spark, Flink, Kafka, and Table Formats (Hudi, Iceberg, Delta).
- Deep knowledge of data catalog/metadata management.
- Experience developing and implementing S3-compatible APIs and managing large-scale tabular and unstructured data.
- Familiarity with multi-cloud environments (OCI/GCP) and strategies for resource efficiency, including developing intelligent scheduling and portable data solutions.
Responsibilities
- Lead the design, development, and deployment of new features for the large-scale cloud data storage platform.
- Drive the architecture and implementation of a unified catalog, focusing on creating a lightweight metadata layer and leveraging native cloud object store capabilities; on behalf of Uber, engage the community and contribute to open source projects.
- Collaborate on the development and rollout of uFlash, including implementing full S3 API support, integrating with existing clients, and onboarding new customers.
- Contribute to strategic Data for AI initiatives.
Other
- Basic Qualifications: Storage Experience is a must
- Preferred Qualifications
- Degree requirements (Bachelor's, Master's, or Ph.D.) are not explicitly mentioned, but a degree may be required.
- Travel requirements are not mentioned.