Staff Software Engineer, Data Curation

Altos Labs

$221,850 - $300,150
Sep 15, 2025
San Francisco, CA, US

Altos Labs aims to restore cell health and resilience through cell rejuvenation to reverse disease, injury, and disabilities. This role uses AI agents to make complex research data FAIR (Findable, Accessible, Interoperable, Reusable), so that scientists and product teams can ask richer questions, move faster, and advance discovery. The work centers on moving data ingestion and curation from manual processes to LLM-enabled, agentic workflows.

Requirements

  • Demonstrably strong Python skills, particularly for data modeling and processing, plus solid experience with both relational (SQL) and graph data stores and the judgment to choose pragmatically between them (e.g., Postgres/Redshift vs. Neo4j/Neptune).
  • Comfortable building pragmatic ETL/ELT workflows in a major cloud (preferably AWS) using orchestration frameworks or AWS-native tools.
  • Active user of AI coding editors such as Cursor, with a strong interest in designing and building Model Context Protocol (MCP) applications; motivated to migrate processes from manual → automation → agentic.
  • Mature understanding of data quality, provenance, versioning, and “curation as code,” including hands-on use of testing/validation frameworks.
  • Experience with vector databases and search (e.g., Weaviate, FAISS, pgvector) and AI/LLM frameworks (e.g., LiteLLM, LangChain, LlamaIndex) for retrieval-augmented generation and agent workflows.
  • Experience with OBO Foundry ontologies and modern frameworks such as LinkML, BioLink, and BioCypher; familiarity with graph database technologies (e.g., Neo4j, AWS Neptune) and semantic standards (OWL, RDF, SPARQL).
  • Experience creating lightweight semantic layers and AI/LLM-assisted curation workflows (LiteLLM, FastMCP).

Responsibilities

  • Curate and harmonize data. Ingest, profile, clean, normalize, and annotate multi-modal research datasets (e.g., genomics/transcriptomics, proteomics, imaging/microscopy, CRISPR screens, assay/instrument metadata). Map to controlled vocabularies and standards; manage identifiers, synonyms, and crosswalks.
  • Deliver insights from curated data. Focus on the substance: the entities, relationships, and annotations that answer real research and product questions, drawing on public resources such as Ensembl, GEO, PubMed, OMIM, and OLS, among others. Treat pipelines and existing data sources and storage pragmatically, as tools for delivering content and outcomes.
  • Model knowledge to serve decisions. Capture the concepts and links researchers actually use; keep schemas lightweight and purpose-built. Leverage OBO Foundry ontologies; define schemas with LinkML; align to the Biolink Model; and integrate and serve data with platforms such as BioCypher.
  • Quality, governance & AI enablement. Instrument automated checks (tests/expectations) and LLM-assisted validations; develop processes that improve data FAIRification; capture provenance/lineage; codify SOPs; and help migrate processes from manual → automation → agentic (MCP‑integrated) workflows.
  • Serve as a key technical liaison between scientific, data science, and engineering teams, translating complex research needs into scalable and maintainable data solutions.
  • Define and evangelize best practices for data and knowledge engineering across the organization, mentoring junior team members and building reusable, AI-enhanced, enterprise-level components.

Other

  • PhD in Biological Sciences, Computer Science, Software Engineering, or a related quantitative field, or equivalent technical experience.
  • 8+ years of relevant experience in data curation, ontology/knowledge engineering, or data engineering at a biotechnology company (or equivalent experience).
  • Mindset: You prioritize data and business objectives over tools; technology is a means to an end.
  • Experience in basic/exploratory life‑science research across multiple modalities (genomics/transcriptomics, proteomics, imaging/microscopy, screening, model organisms), including as a user of curated content to achieve research/business outcomes.
  • Experience with a data platform such as lamin.ai.