Data Scientist III

Oracle

$97,500 - $199,500
Sep 6, 2025
New York, NY, US

Oracle Cloud Infrastructure (OCI) is looking to train large-scale AI and generative AI models. The role requires expertise in data science and machine learning to develop and manage the datasets and pipelines that this training depends on.

Requirements

  • Proficiency in Python and a solid foundation in applied ML methods.
  • Proficiency with PyTorch, Torchvision, OpenCV, and similar libraries, as well as experience building and deploying DNN models in production.
  • Experience building large-scale data pipelines for acquisition, cleaning, augmentation, and validation of data.
  • Ability to evaluate datasets for distribution, diversity, anomalies, and fairness to assess overall quality and suitability for generative AI.
  • Experience with Computer Vision, NLP, Transformers, Large Language Models, Generative AI, and optimizations for LLM training and serving.
  • Experience with multimodal models is a bonus.
  • Proven track record of delivering scalable, data-centric ML solutions.

Responsibilities

  • Design and develop large-scale datasets to power generative AI models in multimodal domains (e.g., text, vision, speech), with a focus on synthetic data creation.
  • Build robust pipelines and tooling for data acquisition, cleaning, transformation, and quality assurance to support model training and evaluation.
  • Research, implement, and adapt cutting-edge techniques (e.g., fine-tuning, RLHF, data augmentation) to align generative models with domain-specific needs.
  • Curate and annotate datasets, ensuring diversity, representativeness, and compliance with responsible AI practices.
  • Evaluate open-source and research models, integrating best practices into data generation workflows.
  • Collaborate with engineering teams to ensure datasets and synthetic data pipelines are scalable, reliable, and production-ready.
  • Develop metrics and benchmarking frameworks to assess data quality, model alignment, and downstream impact across modalities.

Other

  • 3+ years of industry experience.
  • Bachelor’s or Master’s in Computer Science, Data Science, AI/ML, or related field.
  • Partner cross-functionally with product, research, and infrastructure teams to drive innovation in data preparation and generative AI applications.
  • Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.
  • May be eligible for bonus and equity.