Monstro is looking to build and operate pipelines that turn real-world financial information into reliable, queryable data to support retrieval, knowledge graphs, agents, analytics, and machine learning.
Requirements
- Strong Python and SQL.
- Hands-on document parsing and ETL across PDFs, HTML, JSON, and XML.
- Experience operating vector databases such as pgvector, Pinecone, or Weaviate, including managing multiple collections.
- Building and scheduling ingestion via APIs, web downloads, and cron or an orchestrator, plus experience with cloud storage and queues.
- Understanding of embeddings, chunking strategies, metadata design, and retrieval evaluation.
- Solid data modeling, schema design, indexing, and performance tuning across storage types.
- History of implementing data quality checks, observability, and access controls for sensitive data.
Responsibilities
- Build and own scalable pipelines that parse and normalize unstructured sources for retrieval, knowledge graphs, and agents.
- Conceive and implement novel approaches for processing thousands of types of unstructured documents with accuracy and consistency.
- Process semi-structured sources into consistent, validated schemas.
- Transform structured datasets for analytics, features, and retrieval workloads.
- Create, version, and maintain multiple collections in a vector database.
- Design and implement robust multi-modal document processing systems that handle heterogeneous file formats (PDFs, images, HTML, XML) with automatic schema inference, content-extraction validation, and graceful degradation for malformed inputs, maintaining a 99.9% pipeline uptime SLA.
- Own ingestion from APIs, file drops, partner feeds, and scheduled jobs with monitoring, retries, and alerting.
Other
- Minimum 2 years in a dedicated Data Engineering role at an AI-native startup, or 4+ years of experience in traditional Data Engineering, with ~8+ years of experience in Tech overall.
- Ownership mindset, clear written communication, and effective collaboration with product and engineering.
- Proven ownership of end-to-end pipelines (ingestion → transformation → serving), including scalable sourcing processes, ETL pipelines, and serving services.
- Experience owning and operating infrastructure in production environments.
- Track record of delivering high-consistency systems for mission-critical data pipelines.