Data Engineer

Prima Mente

Salary not specified
Dec 2, 2025
San Francisco, California, US

Prima Mente's goal is to deeply understand the brain, protect it from neurological disease, and enhance it in health. We do this by generating our own data, building brain foundation models, and translating discoveries into real clinical and research impact. This role focuses on biological data infrastructure at petabyte scale.

Requirements

  • 4+ years of experience building data infrastructure or data platforms, with a demonstrated ability to solve complex distributed systems problems independently
  • Experience building infrastructure for large-scale data processing pipelines (both batch and streaming) using tools such as Spark, Kafka, Apache Flink, and Apache Beam, as well as proprietary solutions such as Nebius
  • Experience designing and implementing large-scale data storage systems (feature stores, time-series databases) for ML use cases, with strong familiarity with relational databases, data warehouses, and object storage, and expertise in database schema design
  • Experience with ML infrastructure, including work at companies that use ML for core business functions
  • Experience building data pipelines for external data sources that are observable, debuggable, and verifiably correct, including handling challenges such as data versioning, point-in-time correctness, and evolving schemas
  • Strong distributed systems and infrastructure skills: comfortable scaling and debugging Kubernetes services, writing Terraform, and working with orchestration tools such as Flyte, Airflow, or Temporal
  • Experience with cloud platforms (AWS, GCP, Azure) and container technologies

Responsibilities

  • Owning and scaling our data infrastructure, including data pipelines, distributed data processing, and storage systems, by several orders of magnitude to handle multi-omic datasets at the >100-petabyte scale
  • Building a unified feature store for all our ML models and biological data analysis workflows
  • Efficiently storing and loading petabytes of biological data for ML
  • Processing and storing predictions and evaluation metrics for large-scale biological forecasting and analysis models
  • Implementing data versioning and point-in-time correctness systems for evolving biological datasets
  • Building observable, debuggable data pipelines that handle the complexity of multi-omic data sources
  • Implementing initial optimizations to existing pipelines and beginning work on scaling our feature store infrastructure for ML models

Other

  • Meaningful Impact: Contribute directly to research infrastructure that powers discoveries potentially impacting millions of lives.
  • Innovation & Autonomy: Work at the forefront of AI and multi-omics, with the freedom to propose and implement state-of-the-art infrastructure solutions.
  • Exceptional Team: Collaborate with talented colleagues from diverse backgrounds across ML, bioinformatics, and engineering.
  • Growth Opportunities: Continuous learning and professional development in a rapidly advancing technical field.
  • Excellent communication skills and experience collaborating within multidisciplinary teams