Oracle Cloud Infrastructure (OCI) is looking to train large-scale AI and generative AI models, which requires expertise in data science and machine learning to develop and manage the necessary datasets and pipelines.
Requirements
- Proficiency in Python and a solid foundation in applied ML methods.
- Proficiency with PyTorch, Torchvision, OpenCV, and similar libraries, as well as experience building and deploying DNN models in production.
- Experience building large-scale data pipelines for acquisition, cleaning, augmentation, and validation of data.
- Ability to evaluate datasets for distribution, diversity, anomalies, and fairness to assess overall quality and suitability for generative AI.
- Experience with Computer Vision, NLP, Transformers, Large Language Models, Generative AI, and optimizations for LLM training and serving.
- Experience with multimodal models is a bonus.
- Proven track record of delivering scalable, data-centric ML solutions.
Responsibilities
- Design and develop large-scale datasets to power generative AI models in multimodal domains (e.g., text, vision, speech), with a focus on synthetic data creation.
- Build robust pipelines and tooling for data acquisition, cleaning, transformation, and quality assurance to support model training and evaluation.
- Research, implement, and adapt cutting-edge techniques (e.g., fine-tuning, RLHF, data augmentation) to align generative models with domain-specific needs.
- Curate and annotate datasets, ensuring diversity, representativeness, and compliance with responsible AI practices.
- Evaluate open-source and research models, integrating best practices into data generation workflows.
- Collaborate with engineering teams to ensure datasets and synthetic data pipelines are scalable, reliable, and production-ready.
- Develop metrics and benchmarking frameworks to assess data quality, model alignment, and downstream impact across modalities.
Other
- 3+ years of industry experience.
- Bachelor’s or Master’s degree in Computer Science, Data Science, AI/ML, or a related field.
- Partner cross-functionally with product, research, and infrastructure teams to drive innovation in data preparation and generative AI applications.
- Certain US customer-facing or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.
- May be eligible for bonus and equity.