Mirage is seeking to solve the problem of bringing large-scale multimodal video diffusion models into production to deliver low-latency, high-throughput inference at scale for millions of creators worldwide.
Requirements
- Proven experience deploying deep learning models on GPU-based infrastructure (NVIDIA GPUs, CUDA, TensorRT, etc.).
- Strong knowledge of containerization (Docker, Kubernetes) and microservice architectures for ML model serving.
- Proficiency with Python and at least one deep learning framework (PyTorch, TensorFlow).
- Familiarity with compression techniques (quantization, pruning, distillation) for large-scale models.
- Experience profiling and optimizing model inference (batching, concurrency, hardware utilization).
- Hands-on experience with ML pipeline orchestration (Airflow, Kubeflow, Argo) and automated CI/CD for ML.
- Strong grasp of logging, monitoring, and alerting tools (Prometheus, Grafana, etc.) in distributed systems.
Responsibilities
- Develop high-performance GPU-based inference pipelines for large multimodal diffusion models.
- Build, optimize, and maintain serving infrastructure to deliver low-latency predictions at large scale.
- Collaborate with DevOps teams to containerize models, manage autoscaling, and ensure uptime SLAs.
- Leverage techniques like quantization, pruning, and distillation to reduce latency and memory footprint without compromising quality.
- Implement continuous fine-tuning workflows to adapt models based on real-world data and feedback.
- Design and maintain automated CI/CD pipelines for model deployment, versioning, and rollback.
- Explore cutting-edge GPU acceleration frameworks (e.g., TensorRT, Triton, TorchServe) to continuously improve throughput and reduce costs.
Other
- Must be in-person at our NYC HQ (located in Union Square)
- Comprehensive medical, dental, and vision plans
- 401K with employer match
- Generous PTO policy
- Captions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type