College Board's BigFuture Division is building data and analytics services that personalize higher-education recruitment and student engagement. This role designs, builds, and operates the data and ML infrastructure behind personalized student experiences at scale.
Requirements
- 4+ years in data engineering (or 3+ with substantial ML productionization), with strong Python and distributed compute (Spark/Glue/Dask) skills.
- Proven experience shipping ML data systems (training/eval datasets, feature or embedding pipelines, artifact/version management, experiment tracking).
- MLOps/LLMOps: orchestration (Airflow/Step Functions), containerization (Docker), and deployment (SageMaker/EKS/ECS); CI/CD for data & models.
- Expert SQL and data modeling for lakehouse/warehouse (Redshift/Athena/Iceberg), with performance tuning for large datasets; a partition-layout sketch follows this list.
- Data quality & contracts (Great Expectations/Deequ), lineage/metadata (OpenLineage/DataHub/Amundsen), and drift/skew monitoring.
- Cloud experience, preferably with AWS services such as S3, Glue, Lambda, Athena, Bedrock, OpenSearch, API Gateway, DynamoDB, SageMaker, Step Functions, Redshift, and Kinesis.
- RAG & vector search experience (OpenSearch KNN/pgvector/FAISS) and prompt/eval frameworks.
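
To make the Spark and lakehouse expectations concrete, here is a minimal PySpark sketch of a batch job that deduplicates raw events and writes date-partitioned Parquet to S3 so that Athena/Redshift Spectrum queries can prune partitions. The bucket paths, column names, and schema are hypothetical.

```python
# Minimal PySpark batch ETL sketch: dedupe raw JSON events and write
# date-partitioned Parquet. Bucket paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-batch-etl").getOrCreate()

events = (
    spark.read.json("s3://example-raw/events/")       # hypothetical source
    .withColumn("event_date", F.to_date("event_ts"))  # derive the partition key
    .dropDuplicates(["event_id"])                     # keeps reruns idempotent
)

(
    events
    .repartition("event_date")                        # one file set per partition
    .write.mode("overwrite")
    .partitionBy("event_date")                        # enables partition pruning downstream
    .parquet("s3://example-curated/events/")          # hypothetical sink
)
```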
Responsibilities
- Design, build, and own batch and streaming ETL (e.g., Kinesis/Kafka → Spark/Glue → Step Functions/Airflow) for training, evaluation, and inference use cases.
- Stand up and maintain offline/online feature stores and embedding pipelines (e.g., S3/Parquet/Iceberg + vector index) with reproducible backfills.
- Implement data contracts & validation (e.g., Great Expectations/Deequ), schema evolution, and metadata/lineage capture (e.g., OpenLineage/DataHub/Amundsen); a contract-validation sketch follows this list.
- Optimize lakehouse/warehouse layouts and partitioning (e.g., Redshift/Athena/Iceberg) for scalable ML and analytics.
- Productionize training and evaluation datasets with versioning (e.g., DVC/LakeFS) and experiment tracking (e.g., MLflow); a tracking sketch follows this list.
- Build RAG foundations: document ingestion, chunking, embeddings, retrieval indexing, and quality evaluation (precision@k, faithfulness, latency, and cost); ingestion and evaluation sketches follow this list.
- Collaborate with data scientists to ship models to serving (e.g., SageMaker/EKS/ECS), automate feature backfills, and capture inference data for continuous improvement.
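
To make the data-contract responsibility concrete, here is a minimal sketch using the classic (pre-1.0) Great Expectations PandasDataset API; the columns and thresholds are hypothetical, and newer GX releases use a different entry point.

```python
# Minimal data-contract sketch using the classic (pre-1.0) Great Expectations
# PandasDataset API. Column names and thresholds are hypothetical.
import great_expectations as ge
import pandas as pd

df = pd.DataFrame({
    "student_id": ["s1", "s2", "s3"],
    "gpa": [3.1, 3.9, 2.7],
})

batch = ge.from_pandas(df)
batch.expect_column_values_to_not_be_null("student_id")
batch.expect_column_values_to_be_unique("student_id")
batch.expect_column_values_to_be_between("gpa", min_value=0.0, max_value=4.0)

result = batch.validate()
if not result.success:
    raise ValueError(f"Data contract violated: {result}")  # fail the pipeline run
```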
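For dataset versioning and experiment tracking, a minimal MLflow sketch; the experiment name, parameter, and metric are placeholders, and the data-version tag would come from whatever versioning tool (e.g., DVC/LakeFS) is in use.

```python
# Minimal MLflow experiment-tracking sketch. Experiment name, params, and
# metrics are placeholders; the dataset tag would come from DVC/LakeFS.
import mlflow

mlflow.set_experiment("bigfuture-recsys")           # hypothetical experiment name

with mlflow.start_run():
    mlflow.set_tag("training_data_version", "v42")  # e.g., a DVC/LakeFS revision
    mlflow.log_param("embedding_dim", 256)
    mlflow.log_metric("precision_at_10", 0.31)
```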
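For RAG ingestion, a minimal sketch of fixed-size chunking plus embeddings from Amazon Bedrock. It assumes the Titan text-embedding request/response shape and model ID, which vary by model version and region; production pipelines typically chunk on semantic boundaries with overlap.

```python
# Minimal RAG-ingestion sketch: fixed-window chunking plus embeddings from
# Amazon Bedrock. Assumes the Titan text-embedding request/response shape
# ({"inputText": ...} -> {"embedding": [...]}); model ID may differ by version.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    # Naive fixed-window chunking; real pipelines usually split on
    # semantic boundaries (headings, sentences) instead.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(text: str) -> list[float]:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",   # assumed model ID
        contentType="application/json",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

# Hypothetical usage: embed every chunk of a local document.
vectors = [(c, embed(c)) for c in chunk(open("doc.txt").read())]
```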
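And for retrieval quality evaluation, a minimal pure-Python precision@k sketch (the document IDs are made up).

```python
# Minimal precision@k sketch for retrieval evaluation: the fraction of the
# top-k retrieved documents that are relevant. Doc IDs are made up.
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    top_k = retrieved[:k]
    return sum(1 for doc in top_k if doc in relevant) / k

# Example: 2 of the top 3 retrieved docs are relevant -> 0.666...
print(precision_at_k(["d1", "d7", "d3", "d9"], {"d1", "d3"}, k=3))
```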
Other
- This is a fully remote role that requires working EST hours.
- Authorization to work in the United States for any employer.
- Curiosity and enthusiasm for emerging technologies: willingness to experiment with and adopt new AI-driven solutions, and comfort learning and applying new digital tools independently and proactively.
- Clear and concise communication skills, written and verbal.
- A learner's mindset and a commitment to growth: welcoming diverse perspectives, giving and receiving timely, respectful feedback, and continuously improving through iterative learning and user input.