MARA is building a modular platform that unifies IaaS, PaaS, and SaaS to deploy, scale, and govern AI workloads across data centers, edge environments, and sovereign clouds.
Requirements
- Proven expertise in model serving and inference optimization (TensorRT, ONNX, vLLM, Triton, DeepSpeed, or similar).
- Strong proficiency in Python, with experience building APIs and pipelines using FastAPI, PyTorch, and Hugging Face tooling (a minimal serving sketch follows this list).
- Experience configuring and tuning RAG systems (vector databases such as Milvus, Weaviate, LanceDB, or pgvector).
- Solid foundation in MLOps practices: versioning (MLflow, DVC), orchestration (Airflow, Kubeflow), and monitoring (Prometheus, Grafana, Sentry).
- Familiarity with distributed compute systems (Kubernetes, Ray, Slurm) and cloud ML stacks (AWS SageMaker, GCP Vertex AI, Azure ML).
- Understanding of prompt engineering, agentic frameworks, and LLM evaluation.
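For candidates gauging the level expected, here is a minimal sketch of the kind of serving endpoint the API bullet above refers to: a FastAPI route wrapping a Hugging Face pipeline. The model choice and request/response schema are illustrative assumptions, not MARA's actual service.

```python
# Minimal serving sketch (illustrative): a FastAPI endpoint wrapping a
# Hugging Face pipeline. A production deployment would add batching,
# GPU placement, and health checks, e.g., via vLLM or Triton.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Loaded once at import time so all requests reuse the same model instance.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

class PredictRequest(BaseModel):
    text: str

class PredictResponse(BaseModel):
    label: str
    score: float

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # pipeline() returns e.g. [{"label": "POSITIVE", "score": 0.999}]
    result = classifier(req.text)[0]
    return PredictResponse(label=result["label"], score=result["score"])
```

Served with `uvicorn main:app`, this yields a single-model HTTP endpoint; the role extends this pattern to multi-model, GPU-backed services.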
Responsibilities
- Own the end-to-end lifecycle of ML model deployment—from training artifacts to production inference services.
- Design, build, and maintain scalable inference pipelines using modern orchestration frameworks (e.g., Kubeflow, Airflow, Ray, MLflow).
- Implement and optimize model serving infrastructure for latency, throughput, and cost efficiency across GPU and CPU clusters.
- Develop and tune Retrieval-Augmented Generation (RAG) systems, including vector database configuration, embedding optimization, and retriever–generator orchestration (see the retrieval sketch after this list).
- Evaluate, benchmark, and optimize large language and multimodal models using quantization, pruning, and distillation techniques (see the quantization sketch after this list).
- Design CI/CD workflows for ML systems, ensuring reproducibility, observability, and continuous delivery of model updates.
- Monitor production model performance, detect drift, and drive improvements to reliability and explainability.
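To make the RAG responsibility concrete, here is a toy retriever sketch under stated assumptions: it uses sentence-transformers and an in-memory document list where a real system would use Milvus, Weaviate, LanceDB, or pgvector, and the embedder name and documents are placeholders.

```python
# Toy retrieval sketch for the RAG bullet above. Embeds documents and a
# query, then ranks by cosine similarity; in production a vector database
# replaces the in-memory array.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedder

docs = [
    "RAG pairs a retriever with a generator over a document store.",
    "Kubernetes schedules containers across GPU and CPU clusters.",
    "Quantization reduces model precision to cut memory and latency.",
]
# normalize_embeddings=True makes the dot product equal cosine similarity.
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q_vec = encoder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

# Retrieved passages would be prepended to the generator's prompt.
print(retrieve("How does retrieval-augmented generation work?"))
```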
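Likewise, a minimal example of one optimization technique named above: post-training dynamic quantization in PyTorch. The toy model is an assumption, standing in for a real serving checkpoint.

```python
# Post-training dynamic quantization sketch: Linear weights are stored as
# int8 and activations are quantized on the fly, trading a small accuracy
# loss for lower memory use and faster CPU inference.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
model.eval()

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    drift = (model(x) - quantized(x)).abs().max().item()
print(f"max output drift after quantization: {drift:.4f}")
```

Benchmarking the same workload before and after such transforms (latency, throughput, accuracy) is the evaluation loop this responsibility describes.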
Other
- 5+ years of experience in applied ML or ML infrastructure engineering.
- Strong collaboration and documentation skills, with the ability to bridge ML research, DevOps, and product development.
- Background in HPC, ML infrastructure, or sovereign/regulated environments (preferred).
- Familiarity with energy-aware computing, modular data centers, or ESG-driven infrastructure design (preferred).
- Experience collaborating with European and global engineering partners (preferred).