Software Engineer, ML Data Platform

Mirage

$185,000 - $285,000
Oct 17, 2025
New York, NY, US

Mirage is looking for a Software Engineer to build and scale the data systems that power their machine learning products, focusing on data engineering and ML infrastructure to handle large-scale streaming pipelines and ensure reliable, discoverable, and performant feature data.

Requirements

  • 4+ years building distributed data systems, feature platforms, or ML infrastructure at scale.
  • Strong experience with streaming and batch pipelines (e.g. Pub/Sub, Kafka, Dataflow, Beam, Flink, Spark).
  • Deep knowledge of cloud-native data stores (e.g. Bigtable, BigQuery, DynamoDB, Snowflake) and schema/versioning best practices.
  • Proficiency in Python and experience building developer-facing libraries or SDKs.
  • Experience with Kubernetes, containerized data infrastructure, and workflow orchestration tools (e.g. Airflow, Temporal).
  • Familiarity with ML workflows and feature store design — enough to partner closely with ML teams.
  • Bonus: Experience working with video, audio, or other unstructured media data in a production environment.

Responsibilities

  • Design and scale feature pipelines: Build distributed data processing systems for feature extraction, orchestration, and serving — including real-time streaming, batch ingestion, and CDC workflows.
  • Build feature extraction: Design and implement reliable, reusable feature pipelines for ML models, ensuring features are accurate, scalable, and production-ready through well-designed SDKs and orchestration tools.
  • Build and evolve storage infrastructure: Manage multi-tier data systems (e.g. Bigtable for online features/state, BigQuery for analytics and offline training), including schema evolution, versioning, and compatibility.
  • Own orchestration and reliability: Lead workflow orchestration design (e.g. Pub/Sub, Busboy, Airflow/Temporal), monitoring, and alerting to ensure reliability at 100M+ video scale.
  • Collaborate with ML teams: Partner with ML engineers on feature availability, dataset curation, and streaming pipelines for training and inference.
  • Optimize for performance and cost: Tune GPU utilization, resource allocation, and data processing efficiency to maximize system throughput and minimize cost.
  • Enable analytics and insights: Support downstream analytics and data science workflows by ensuring data accessibility, discoverability, and performance at scale.

Other

  • All of our roles require you to work in person at our NYC HQ (located in Union Square).
  • We do not work with third-party recruiting agencies; please do not contact us.
  • Comprehensive medical, dental, and vision plans
  • 401(k) with employer match
  • Generous PTO policy