Mithrl is building the world's first commercially available AI Co-Scientist, a discovery engine that transforms messy biological data into insights in minutes. The Data Engineer, Knowledge Graphs role is crucial for building the infrastructure that powers Mithrl's biological knowledge layer, bridging biological knowledge ingestion with high-performance engineering systems.
Requirements
- Strong experience as a data engineer or backend engineer working with data-intensive systems
- Experience building ETL or ELT pipelines for large structured or semi-structured datasets
- Strong understanding of database design, schema modeling, and data architecture
- Experience with graph data models or willingness to learn graph storage concepts
- Proficiency in Python or similar languages for data engineering
- Experience designing and maintaining APIs for data access
- Understanding of versioning, provenance, validation, and reproducibility in data systems
Responsibilities
- Build and maintain ETL pipelines for large public biological datasets and curated knowledge sources
- Design, implement, and evolve schemas and storage models for graph-structured biological data
- Create efficient APIs and query surfaces that allow internal teams and AI systems to retrieve nodes, relationships, pathways, annotations, and graph analytics
- Partner closely with Data Scientists to operationalize curated relationships, harmonized variable IDs, metadata standards, and ontology mappings
- Build data models that support multi-tenant access, versioning, and reproducibility across releases
- Implement scalable storage and indexing strategies for high-volume graph data
- Maintain data quality, validate data integrity, and build monitoring around ingestion and usage
Other
- Strong communication skills and ability to work closely with scientific and engineering teams
- Experience with cloud infrastructure and modern data stack tools
- Experience with graph databases or graph query languages
- Experience with biological or chemical data sources
- Familiarity with ontologies, controlled vocabularies, and metadata standards