MIT Lincoln Laboratory's Systems Engineering Group is seeking to implement a Retrieval-Augmented Generation (RAG) pipeline that lets engineers query a SQL database in natural language and receive accurate, structured results.
Requirements
- Familiarity with Python programming and with using APIs.
- Basic understanding of SQL databases (queries, joins, filtering).
- Coursework or project exposure to machine learning or NLP.
- Familiarity with vector databases.
- Exposure to LLM frameworks (e.g., LangChain, LlamaIndex).
Responsibilities
- Work with the engineering team to understand the database schema.
- Implement data ingestion and embeddings into a vector store.
- Set up a RAG pipeline (e.g., using LangChain, LlamaIndex, or Haystack) connected to the SQL database.
- Develop simple prompts and queries to return structured data from natural language questions.
- Test the system with real engineering queries and improve retrieval accuracy.
- Document implementation steps and create a “how-to” guide for future engineers.
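For candidates wondering what the retrieval and query steps above might look like in practice, here is a minimal sketch using only the Python standard library. It is not the group's implementation: the `parts` and `tests` tables, the `SCHEMA_DOCS` notes, and every function name are hypothetical, and the bag-of-words `embed` function is a toy stand-in for a real embedding model and vector store. The LLM call that would turn the prompt into SQL is deliberately left out.

```python
import math
import re
import sqlite3
from collections import Counter

# Hypothetical schema notes; a real project would derive these from the actual database.
SCHEMA_DOCS = {
    "parts": "parts table: part_id, name, supplier, unit_cost for each hardware part",
    "tests": "tests table: test_id, part_id, test_date, result of each qualification test",
}


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, k=1):
    """Return the names of the k schema notes most similar to the question."""
    q = embed(question)
    ranked = sorted(
        SCHEMA_DOCS,
        key=lambda name: cosine(q, embed(SCHEMA_DOCS[name])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question):
    """Combine the retrieved schema notes with the question for an LLM."""
    context = "\n".join(SCHEMA_DOCS[name] for name in retrieve(question))
    return f"Schema:\n{context}\n\nQuestion: {question}\nAnswer with one SQL query."


def run_sql(conn, sql):
    """Execute a query and return the rows as a list of dicts (structured output)."""
    cur = conn.execute(sql)
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```

In a fuller pipeline, the string returned by `build_prompt` would be sent to a language model, and the SQL it produces would be validated before being passed to `run_sql`.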
Other
- Currently pursuing or recently completed a degree in Computer Science, Data Science, Electrical/Computer Engineering, or related field.
- Eagerness to learn and ability to work independently with guidance.
- U.S. citizenship is required.
- Must be able to obtain and maintain a Secret-level DoD security clearance.
- Ability to work on-site.