Data Engineer

Eduworks

$100,000 - $120,000

Sep 19, 2025

Corvallis, OR, US

Eduworks is seeking a Data Engineer to support their research and development in a government-funded autonomous vehicle (AV) driving project by designing and maintaining scalable video data pipelines, preparing annotated training corpora, and generating adversarial scenarios.

Requirements

Strong programming experience in Python, with proficiency in data libraries (Pandas, PySpark, Dask).
Experience in multimodal or video dataset preparation, including alignment of video-text pairs and large-scale video or image dataset processing pipelines.
Experience contributing to training datasets for LLMs or multimodal LLMs.
Experience implementing ETL pipelines with schema validation, logging, and quality checks.
Knowledge of Docker containerization.
Familiarity with AV datasets (e.g., BDD, nuScenes, Waymo) and annotation schemas.
Experience using AV driving simulators (e.g., CARLA).

Responsibilities

Design, implement, and optimize data ingestion pipelines for large-scale AV datasets such as BDD100K, BDD-X, nuScenes, and Waymo Open.
Standardize, preprocess, and normalize raw video streams (e.g., frame decoding, resolution/frame-rate harmonization, perspective correction).
Develop ETL pipelines to validate schema conformity, synchronize annotations, and compute cryptographic hashes for source authenticity.
Generate synthetic adversarial data from the CARLA and CHALLENGER simulators as well as diffusion-based video models.
Implement semi-supervised annotation workflows combining auto-labeling tools (e.g., YOLOv8, DETR) with human-in-the-loop quality control.
Develop tools to manage multimodal datasets (video, annotations, metadata, hashes) and package them into efficient formats such as Parquet for distributed training.
Work with ML teams to generate datasets for instruction tuning by pairing manipulated and clean sequences with interpretive rationales.

Other

2 to 5 years of Data Engineering experience
Bachelor’s or Master’s degree in Computer Science or a related field