UniversalAGI is building "OpenAI for Physics": foundation AI models for physics that enable end-to-end industrial automation, from initial design through optimization, validation, and production.
Requirements
- 3+ years of hands-on experience building and scaling ML infrastructure for fine-tuning, training, serving, or deployment
- Deep experience with cloud platforms (AWS, GCP, Azure), infrastructure-as-code (Terraform), and container tooling (Kubernetes, Docker)
- Deep expertise in distributed training frameworks (PyTorch Distributed, DeepSpeed, Ray, etc.) and multi-GPU/multi-node orchestration
- Strong foundation in ML serving: Experience building low-latency inference systems, model optimization, and production deployment
- Expert-level coding skills in Python and infrastructure tools, comfortable diving deep into ML frameworks and optimizing performance
- Understanding of ML workflows: Training pipelines, experiment tracking, model versioning, and the full lifecycle from research to production
- Experience deploying ML in enterprise environments with strict security, compliance, and air-gapped requirements
Responsibilities
- Build and scale fine-tuning and training infrastructure for foundation models, including distributed training across multi-GPU and multi-node clusters, optimizing for throughput, cost, and iteration speed
- Design and implement model serving systems with low latency, high reliability, and the ability to handle complex physics workloads in production
- Build fine-tuning pipelines that let customers adapt our foundation models to their specific use cases, data, and workflows without compromising model quality or security
- Build deployment and serving infrastructure for on-premise and cloud environments, working through customer security requirements and compliance constraints
- Create robust data pipelines that can ingest, validate, and preprocess massive computational fluid dynamics (CFD) datasets from diverse sources and formats
- Instrument everything: Build observability, monitoring, and debugging tools that give our team and customers full visibility into model performance, data quality, and system health
- Work directly with customers on deployment, integration, and scaling challenges, turning their infrastructure pain points into product improvements
Other
- Work directly with the CEO and founding team
- Report to the CEO
- Strong communicator capable of bridging customers, engineers, and researchers, translating infrastructure constraints into product decisions
- Outstanding execution velocity: Ships fast, debugs quickly, and thrives in ambiguity
- Exceptional problem-solving ability: Willing to dive deep into unfamiliar systems and figure out what's actually broken