Job Board

Get Jobs Tailored to Your Resume

Filtr uses AI to scan 1000+ jobs and find postings that match your resume

Software Engineer, Machine Learning Infrastructure

David AI

$140,000 - $230,000
Aug 13, 2025
San Francisco, CA, US

David AI is looking for an engineer to build and scale the core infrastructure that powers its audio ML products, enabling researchers and engineers to train, deploy, and evaluate machine learning models efficiently.

Requirements

  • 5+ years of backend engineering, with 2+ years of ML infrastructure experience.
  • Hands-on experience scaling cloud infrastructure and large-scale data processing pipelines for ML model training and evaluation.
  • Proficient with Docker, Kubernetes, and CI/CD pipelines.
  • Proven experience deploying ML models and managing their lifecycle in production.
  • Strong system design skills, with a focus on optimizing for scale and performance.
  • Proficient in Python with deep Kubernetes experience.
  • Experience with feature stores, experiment tracking (MLflow, Weights & Biases), or custom CI/CD pipelines.

Responsibilities

  • Design and maintain data pipelines for processing massive audio datasets, ensuring terabytes of data are managed, versioned, and fed into model training efficiently.
  • Develop frameworks for training audio models on compute clusters, managing cloud resources, optimizing GPU utilization, and improving experiment reproducibility.
  • Create robust infrastructure for deploying ML models to production, including APIs, microservices, model serving frameworks, and real-time performance monitoring.
  • Apply software engineering best practices, including monitoring, logging, and alerting, to keep production workloads highly available and fault-tolerant.
  • Translate research prototypes into production pipelines, working with ML engineers and data teams to support efficient data labeling and preparation.
  • Evaluate and integrate new MLOps technologies and optimization techniques to enhance infrastructure velocity and reliability.

Other

  • Proven ability to thrive in fast-moving startup environments.