Data Engineer

EduWorks

$100,000 - $120,000
Sep 19, 2025
Corvallis, OR, US

Eduworks is seeking a Data Engineer to support its research and development on a government-funded autonomous vehicle (AV) driving project. The role involves designing and maintaining scalable video data pipelines, preparing annotated training corpora, and generating adversarial scenarios.

Requirements

  • Strong programming experience in Python, with proficiency in data libraries (Pandas, PySpark, Dask).
  • Experience in multimodal or video dataset preparation, including alignment of video-text pairs and large-scale video or image dataset processing pipelines.
  • Experience contributing to training datasets for LLMs or multimodal LLMs.
  • Experience implementing ETL pipelines with schema validation, logging, and quality checks.
  • Knowledge of Docker containerization.
  • Familiarity with AV datasets (e.g., BDD, nuScenes, Waymo) and annotation schemas.
  • Experience using AV driving simulators (e.g., CARLA).

Responsibilities

  • Design, implement, and optimize data ingestion pipelines for large-scale AV datasets such as BDD100K, BDD-X, nuScenes, and Waymo Open.
  • Standardize, preprocess, and normalize raw video streams (e.g., frame decoding, resolution/frame-rate harmonization, perspective correction).
  • Develop ETL pipelines to validate schema conformity, synchronize annotations, and compute cryptographic hashes for source authenticity.
  • Generate synthetic adversarial data from the CARLA and CHALLENGER simulators as well as diffusion-based video models.
  • Implement semi-supervised annotation workflows combining auto-labeling tools (e.g., YOLOv8, DETR) with human-in-the-loop quality control.
  • Develop tools to manage multimodal datasets (video, annotations, metadata, hashes) and package them into efficient formats such as Parquet for distributed training.
  • Work with ML teams to generate datasets for instruction tuning by pairing manipulated and clean sequences with interpretive rationales.

Other

  • 2 to 5 years of data engineering experience.
  • Bachelor’s or Master’s degree in Computer Science or a related field.