Apple is seeking to improve the quality and accessibility of multimodal data across various core products and technologies, ensuring trustworthy and widely accepted data.
Requirements
- Experience building cloud data warehouses in Snowflake, Redshift, BigQuery, or analogous architectures
- Experience with the practical application of data warehousing concepts, methodologies, and frameworks
- Experience with AI/ML frameworks like TensorFlow or PyTorch
- Experience with machine learning algorithms for data curation and annotation
- Proficiency with programming languages such as Python, Java, and SQL, or equivalent
- Proficiency with data pipeline, modeling, database, and query tools such as Dagster, PostgreSQL, MongoDB, Trino, or equivalent
- Experience with vision data processing tools like FFmpeg, GStreamer, OpenCV, or equivalent
Responsibilities
- Collaborate with cross-functional teams to establish comprehensive data quality assurance and curation processes, encompassing manual validation and automated workflows
- Define data quality metrics, implement data validation rules, and develop a scalable framework to execute diverse data validation and curation software components on multimodal data
- Design and execute data assurance operations that run data validations, report on data quality, and facilitate quality improvement throughout data collection, processing, and annotation
- Collaborate with cross-functional teams to implement a scalable framework and pipeline to extract, clean, transform, and standardize the multimodal data and metadata generated from a wide range of sources, making the data trustworthy, widely discoverable, and accessible
- Develop and drive the feedback loop between data consumers and data generation
- Design and implement systematic processes and automated pipelines, and collaborate with data collection, data processing, ML, and product engineers to create high-quality data, support data-driven product and AI/ML development, and ensure data compliance with security and privacy regulations
- Participate in an on-call rotation for mission-critical operations and applications
Other
- 6+ years of industry experience architecting and developing scalable and reliable software, pipelines, and platforms for validation, analytics, and curation of multimodal data
- B.S. in Computer Science or an equivalent engineering field
- Excellent communication skills, with the ability to confidently express the benefits and constraints of technology solutions to cross-functional technical and non-technical teams
- Experience in managing a team
- Able to work with diverse teams and promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics