Improve Foundation Models for Apple Intelligence features by leveraging data from a variety of sources: crawled, licensed, vendored, and internally crowd-sourced.
Requirements
- 10+ years of programming experience in Python.
- Extensive experience with concurrency and parallelism, functional programming, and decorators.
- Familiarity with other programming languages (Java/Go/Rust/Swift).
- Proficient with REST APIs, Redis, vector databases, or other large-scale data storage systems.
- Solid foundational programming skills (algorithms, data structures, OOP, etc.).
- Experience designing, writing, reviewing, testing and delivering software for applications and systems at scale.
- Familiarity with stream processing (Kafka) and with a variety of build tools (Jenkins, Maven, Docker, Gradle).
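To illustrate the kind of Python fluency the requirements above describe, here is a minimal sketch combining a decorator with thread-based concurrency. All names (`timed`, `parallel_map`) are hypothetical, not part of any Apple codebase:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

def timed(fn):
    """Decorator that records the last wall-clock duration of each call."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        wrapper.last_duration = time.perf_counter() - start
        return result
    return wrapper

@timed
def parallel_map(fn, items, max_workers=4):
    """Apply fn to items concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))

print(parallel_map(lambda x: x * x, range(5)))  # [0, 1, 4, 9, 16]
```

`pool.map` keeps results in input order even though the work runs on multiple threads, which is why the decorator-wrapped call behaves like an ordinary `map`.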
Responsibilities
- Training data pipeline development.
- Convert raw data into formats accepted by training jobs on GCP and AWS.
- Leverage internal and open-source training modules.
- Large-scale inference: leverage internal and open-source inference stacks to run fine-tuned LLMs over massive amounts of data for pre-training and post-training.
- Data processing and filtering: efficiently process and filter very large, often messy, datasets.
- Build scalable web service backends and APIs to support data access and data inspection tools.
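The pipeline responsibilities above can be sketched as a raw-record-to-training-example conversion step. The schema and function names here are hypothetical, purely illustrative of the filter-and-serialize pattern:

```python
import json

def to_training_example(record):
    """Map a raw record (hypothetical schema) to a prompt/completion pair,
    or return None to drop it."""
    text = record.get("text", "").strip()
    if len(text) < 10:  # filter out empty or near-empty records
        return None
    return {"prompt": record.get("title", ""), "completion": text}

def convert(raw_records):
    """Filter raw records and serialize the survivors as JSONL lines,
    ready for upload to a training job."""
    lines = []
    for rec in raw_records:
        example = to_training_example(rec)
        if example is not None:
            lines.append(json.dumps(example))
    return lines
```

In a real pipeline the same map/filter stages would run distributed (e.g. on Spark or Beam) rather than in a single loop, but the per-record logic stays this shape.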
Other
- BS in Computer Science or equivalent.
- A good communicator: clear and concise, with active-listening and empathy skills.
- Self-motivated and curious; strives to continually learn on the job.
- Demonstrated creative and critical thinking, an innate drive to improve how things work, and a high tolerance for ambiguity.
- Experience providing architecture and design guidance.