Founding ML Infrastructure Engineer

UniversalAGI

Salary not specified
Nov 13, 2025
Remote, US

UniversalAGI is building an "OpenAI for Physics": foundation AI models for physics that enable end-to-end industrial automation, from initial design through optimization, validation, and production.

Requirements

  • 3+ years of hands-on experience building and scaling ML infrastructure for fine-tuning, training, serving, or deployment
  • Deep experience with cloud platforms (AWS, GCP, Azure) and with infrastructure-as-code and container tooling (Terraform, Kubernetes, Docker)
  • Deep expertise in distributed training frameworks (PyTorch Distributed, DeepSpeed, Ray, etc.) and multi-GPU/multi-node orchestration
  • Strong foundation in ML serving: Experience building low-latency inference systems, model optimization, and production deployment
  • Expert-level coding skills in Python and infrastructure tools, comfortable diving deep into ML frameworks and optimizing performance
  • Understanding of ML workflows: Training pipelines, experiment tracking, model versioning, and the full lifecycle from research to production
  • Experience deploying ML in enterprise environments with strict security, compliance, and air-gapped requirements

Responsibilities

  • Build and scale fine-tuning and training infrastructure for foundation models, including distributed training across multi-GPU and multi-node clusters, optimizing for throughput, cost, and iteration speed
  • Design and implement model serving systems with low latency, high reliability, and the ability to handle complex physics workloads in production
  • Build fine-tuning pipelines that let customers adapt our foundation models to their specific use cases, data, and workflows without compromising model quality or security
  • Build deployment and serving infrastructure for on-premise and cloud environments, working through customer security requirements and compliance constraints
  • Create robust data pipelines that can ingest, validate, and preprocess massive CFD datasets from diverse sources and formats
  • Instrument everything: Build observability, monitoring, and debugging tools that give our team and customers full visibility into model performance, data quality, and system health
  • Work directly with customers on deployment, integration, and scaling challenges, turning their infrastructure pain points into product improvements

Other

  • Work directly with the CEO and founding team
  • Report to the CEO
  • Strong communicator capable of bridging customers, engineers, and researchers, translating infrastructure constraints into product decisions
  • Outstanding execution velocity: Ships fast, debugs quickly, and thrives in ambiguity
  • Exceptional problem-solving ability: Willing to dive deep into unfamiliar systems and figure out what's actually broken