Improving foundation models for Apple Intelligence features by leveraging data from diverse sources
Requirements
- Familiarity with stream processing (e.g., Kafka; see the consumer sketch after this list)
- Familiarity with a variety of build and deployment tools (Jenkins, Maven, Docker, Gradle)
- 10+ years of programming experience in Python
- Extensive experience with concurrency and parallelism, functional programming, and decorators in Python (see the decorator sketch after this list)
- Familiarity with other programming languages (Java, Go, Rust, Swift)
- Proficiency with REST APIs, Redis, vector databases, or other large-scale data storage systems
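For the stream-processing requirement, a minimal consumer sketch using the confluent-kafka Python client; the broker address, consumer group, and topic name are illustrative assumptions, not a real deployment:

```python
# A minimal sketch of consuming a stream with the confluent-kafka client.
# Broker address, group id, and topic name are illustrative assumptions.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed local broker
    "group.id": "example-group",            # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["raw-events"])          # hypothetical topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)    # wait up to 1s for a record
        if msg is None:
            continue
        if msg.error():
            print(f"consumer error: {msg.error()}")
            continue
        print(msg.value().decode("utf-8"))  # process the payload
finally:
    consumer.close()
```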
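For the concurrency, functional-programming, and decorator requirement, a minimal standard-library sketch of a decorator that maps a function over its inputs in a thread pool; all names are illustrative:

```python
# A minimal sketch combining decorators, higher-order functions, and
# thread-based concurrency; uses only the standard library.
import functools
from concurrent.futures import ThreadPoolExecutor

def parallel_map(max_workers=8):
    """Decorator: calling fn(items) maps fn over items in a thread pool."""
    def wrap(fn):
        @functools.wraps(fn)
        def run(items):
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                return list(pool.map(fn, items))
        return run
    return wrap

@parallel_map(max_workers=4)
def fetch_length(url: str) -> int:
    # stand-in for I/O-bound work (e.g., an HTTP call)
    return len(url)

print(fetch_length(["https://apple.com", "https://example.com"]))
```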
Responsibilities
- Training data pipeline development: convert raw data into formats accepted by training jobs on GCP and AWS, leveraging internal and open-source training modules (see the pipeline sketch after this list)
- Large-scale inference: leverage internal and open-source inference stacks to run fine-tuned LLMs over massive amounts of data for pre-training and post-training (see the inference sketch after this list)
- Data processing and filtering: efficiently process and filter very large, often messy, datasets (see the filtering sketch after this list)
- Scalable web service backends and APIs to support data access and data inspection tools (see the API sketch after this list)
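Pipeline sketch: a minimal example of converting raw JSONL records into prompt/completion training examples; the field names and file paths are assumptions, not the team's actual schema:

```python
# A minimal sketch of a training-data conversion step: raw JSONL records
# in, prompt/completion pairs out. Field names and paths are assumptions.
import json

def convert(raw_path: str, out_path: str) -> None:
    with open(raw_path) as src, open(out_path, "w") as dst:
        for line in src:
            record = json.loads(line)
            example = {
                "prompt": record["question"],   # assumed raw schema
                "completion": record["answer"],
            }
            dst.write(json.dumps(example) + "\n")

convert("raw.jsonl", "train.jsonl")  # hypothetical file names
```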
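Inference sketch: a minimal example of batch generation with the Hugging Face transformers pipeline; "gpt2" stands in for a fine-tuned LLM, and a production stack would shard this work across many machines:

```python
# A minimal sketch of batch inference; "gpt2" is a stand-in for a
# fine-tuned LLM, and the prompts are illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompts = ["Summarize: the quick brown fox", "Translate to French: hello"]
for out in generator(prompts, max_new_tokens=32):
    print(out[0]["generated_text"])
```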
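Filtering sketch: a minimal example of streaming a large, messy JSONL corpus and keeping only well-formed records; the quality threshold and file name are illustrative assumptions:

```python
# A minimal sketch of streaming a large, messy JSONL file and keeping
# only well-formed records; the length threshold is an assumption.
import json

def clean_records(path: str):
    with open(path) as src:
        for line in src:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue                 # drop malformed lines
            text = record.get("text", "").strip()
            if len(text) >= 20:          # assumed quality threshold
                yield record

kept = sum(1 for _ in clean_records("corpus.jsonl"))  # hypothetical file
print(f"kept {kept} records")
```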
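API sketch: a minimal data-inspection endpoint with FastAPI; the in-memory dataset and route shape are assumptions for illustration, not an internal API:

```python
# A minimal sketch of a data-inspection endpoint with FastAPI; the
# dataset dict and route are illustrative assumptions.
from fastapi import FastAPI, HTTPException

app = FastAPI()
DATASET = {1: {"text": "example record"}}  # stand-in for real storage

@app.get("/records/{record_id}")
def get_record(record_id: int):
    record = DATASET.get(record_id)
    if record is None:
        raise HTTPException(status_code=404, detail="record not found")
    return record

# Run locally with: uvicorn app:app --reload  (assumes this file is app.py)
```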
Other
- BS in Computer Science or equivalent
- A clear and concise communicator with active listening and empathy skills
- Self-motivated and curious, striving to continually learn on the job
- Demonstrated creative and critical thinking with an innate drive to improve how things work
- Apple is an equal opportunity employer that is committed to inclusion and diversity