INKHUB is ingesting 10 million raw PDFs to build the internet’s richest catalog of marketing-grade B2B content: tagged, summarized, and searchable by topic, company, or intent. The Applied ML Engineer will own the semantic ingestion pipeline, taking raw PDFs through to tagged, summarized, and embedded assets.
Requirements
- Python, PyTorch, and sentence-transformers; experience with OpenAI APIs or similar pretrained LLMs
- FastAPI, Milvus or pgvector, PyPDF/Tika, Airflow or Lambda for orchestration
- Docker, GPU scheduling, Athena/Redshift SQL
- You’ve built ML pipelines that touched real users, not just notebooks
- You’ve worked on semantic search, embeddings, or large-scale tagging
- You’ve wrestled with unstructured data and love turning chaos into clarity
Responsibilities
- Own the ETL pipeline from raw PDFs (S3-ingested) to structured resources
- Finalize our summarization + classification flow using open-source models with GPT-4o fallback
- Apply filtering logic (≤3 years old, ≤100 pages, etc.) to enforce resource quality
- Map each asset into the topic taxonomy (~9,000 topics, targeting 10+ resources per topic)
- Generate dense embeddings using sentence-transformers
- Load and query embeddings using Milvus or pgvector
- Implement “freshness” logic to identify and index only new or updated content based on file diffing, crawl timestamp, or document hash
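To make the summarization + classification flow concrete, here is a minimal sketch of the open-source-first, GPT-4o-fallback pattern described above. The function and callable names are hypothetical, not INKHUB's actual code; the summarizers are passed in as plain callables so the fallback logic stays model-agnostic.

```python
from typing import Callable

def summarize_with_fallback(
    text: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Try the open-source summarizer first; on failure, fall back
    (e.g. to a GPT-4o call). Both arguments are text -> summary callables."""
    try:
        return primary(text)
    except Exception:
        # Primary model failed (OOM, timeout, malformed output, ...):
        # route the document to the fallback model instead of dropping it.
        return fallback(text)
```

In practice the fallback would also be the place to record metrics on how often the open-source model fails, since that ratio drives the GPT-4o cost of the pipeline.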
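The quality-filtering step above (≤3 years old, ≤100 pages) could be expressed as a simple gate over document metadata. This is an illustrative sketch, not INKHUB's implementation; the function name and the 365-day year approximation are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from the posting: ≤3 years old, ≤100 pages.
MAX_AGE = timedelta(days=3 * 365)  # approximate a year as 365 days
MAX_PAGES = 100

def passes_quality_filter(
    published: datetime,
    page_count: int,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if a document meets the freshness and length thresholds."""
    now = now or datetime.utcnow()
    return (now - published) <= MAX_AGE and page_count <= MAX_PAGES
```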
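The document-hash variant of the freshness logic could look like the sketch below: hash each incoming PDF's bytes and re-index only keys whose hash differs from what was last indexed. Function names and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib

def content_hash(pdf_bytes: bytes) -> str:
    """Stable fingerprint of a document's raw bytes."""
    return hashlib.sha256(pdf_bytes).hexdigest()

def select_new_or_updated(
    incoming: dict[str, bytes],
    indexed_hashes: dict[str, str],
) -> list[str]:
    """Return keys (e.g. S3 object keys) that are new or whose content
    changed since they were last indexed."""
    return [
        key
        for key, blob in incoming.items()
        if indexed_hashes.get(key) != content_hash(blob)
    ]
```

The same shape works for the crawl-timestamp variant: swap the hash comparison for a `last_crawled > last_indexed` check per key.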
Other
- You like working fast, iterating with feedback, and tracking metrics that matter