Apple is seeking an engineer to lead its team's data quality and curation processes across a variety of core products and technologies. The role focuses on developing multimodal data modeling, validation, and curation processes and pipelines that make data trustworthy and widely accepted.
Requirements
- 6+ years of industry experience architecting and developing scalable, reliable software, pipelines, and platforms for validation, analytics, and curation of multimodal data (e.g., image, video, text, audio, and sensor data)
- Proficiency in programming languages such as Python, Java, SQL, or equivalent
- Proficiency with data pipeline, modeling, database, and query tools such as Dagster, PostgreSQL, MongoDB, Trino, or equivalent
- Experience with vision data processing tools such as FFmpeg, GStreamer, OpenCV, or equivalent
- Experience building cloud data warehouses in Snowflake, Redshift, BigQuery, or analogous architectures
- Experience with the practical application of data warehousing concepts, methodologies, and frameworks
- Experience with AI/ML frameworks such as TensorFlow or PyTorch
Responsibilities
- Collaborate with cross-functional teams to establish comprehensive data quality assurance and curation processes, encompassing manual validation and automated workflows
- Define data quality metrics, implement data validation rules, and develop a scalable framework to execute diverse data validation and curation software components on multimodal data
- Design and execute data assurance operations that run data validations, report on data quality, and drive quality improvements throughout data collection, processing, and annotation
- Collaborate with cross-functional teams to implement a scalable framework and pipeline to extract, clean, transform, and standardize multimodal data and metadata generated from a wide range of sources, making the data trustworthy, widely discoverable, and accessible
- Develop and drive the feedback loop between data consumers and the data generation process
Other
- Able to participate in an on-call rotation for mission-critical operations and applications
- Passion for data quality and curation, code elegance, clear documentation, operational excellence, attention to detail, and delivering outstanding user experiences
- Excellent communication skills, with the ability to confidently explain the benefits and constraints of technology solutions to technical and non-technical cross-functional teams
- Experience in managing a team
- Experience with data collection and/or annotation operations