Pelago is seeking a Data Engineer to expand its AI capabilities. The role bridges traditional data infrastructure and next-generation AI workflows, including LLMs, agents, vector search, and human-in-the-loop systems, enabling intelligent, real-time automation while maintaining the core data platform that supports analytics and operational needs.
Requirements
- 3+ years in a data engineering or related role
- Strong SQL and Python programming skills
- Experience with the modern data stack (e.g., dbt, Airflow, Redshift or Snowflake)
- Familiarity with LLM frameworks (e.g., LangChain, LlamaIndex, DSPy)
- Exposure to vector databases and retrieval-based pipelines
- Experience with FastAPI, MLflow, or similar tooling
- Knowledge of AI observability or prompt-tuning frameworks (e.g., TruLens, Ragas)
Responsibilities
- Design, develop, and maintain production-grade ELT/ETL pipelines using modern tools like dbt, Airbyte, Census, and Airflow
- Architect scalable, modular data systems leveraging Redshift or Snowflake to support analytics and operational use cases
- Collaborate with analytics and product teams to deliver clean, governed, and high-impact data models
- Ensure performance, reliability, and observability across batch and streaming workflows
- Uphold best practices for security, compliance, and handling of regulated healthcare data, including PHI
- Build and orchestrate LLM-based agents using frameworks like LangChain, LlamaIndex, or DSPy
- Integrate pipelines with vector databases (e.g., Pinecone, Weaviate) to enable retrieval-augmented generation (RAG)
Other
- Effective communicator with experience working cross-functionally
- Work closely with product, clinical, and platform teams to define technical requirements and deliver impactful solutions
- Document and share best practices to onboard teammates into LLM workflows and tools
- Influence the roadmap for AI-augmented reporting, automation, and operational intelligence across the organization
- 4 days/week in our NYC office