Centific is working to translate cutting‑edge research in computer vision, multimodal large models, and embodied/physical AI into production systems that perceive, reason, and act in the real world, enabling enterprise clients to deploy AI safely and at scale.
Requirements
- Strong PyTorch (or JAX) and Python; comfort with CUDA profiling and mixed‑precision training (see the sketch after this list).
- Demonstrated research in computer vision and at least one of: VLMs (e.g., LLaVA‑style, video‑language models), embodied/physical AI, 3D perception.
- Proven ability to move from paper → code → ablation → result with rigorous experiment tracking.
- Experience with video models (e.g., TimeSformer/MViT/VideoMAE), diffusion or 3D GS/NeRF pipelines, or SLAM/scene reconstruction.
- Prior work on multimodal grounding (referring expressions, spatial language, affordances) or temporal reasoning.
- Familiarity with ROS2, DeepStream/TAO, or edge inference optimizations (TensorRT, ONNX).
- Scalable training: Ray, distributed data loaders, sharded checkpoints.
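For context on the mixed‑precision requirement above, here is a minimal sketch of a PyTorch AMP training step. The model, optimizer, and loss are generic placeholders for illustration, not any particular Centific pipeline, and inputs are assumed to already be on the GPU.

```python
# Minimal mixed-precision training step with PyTorch AMP (illustrative sketch).
import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(512, 10).cuda()          # stand-in for a vision backbone
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = GradScaler()                            # scales the loss to avoid fp16 gradient underflow

def train_step(images, labels):
    """One optimizer step; `images` and `labels` are assumed to be CUDA tensors."""
    optimizer.zero_grad(set_to_none=True)
    with autocast():                             # forward pass runs in mixed precision
        logits = model(images)
        loss = criterion(logits, labels)
    scaler.scale(loss).backward()                # backward on the scaled loss
    scaler.step(optimizer)                       # unscales grads; skips the step on inf/NaN
    scaler.update()
    return loss.item()
```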
Responsibilities
- Build and fine‑tune models for detection, tracking, segmentation (2D/3D), pose & activity recognition, and scene understanding (incl. 360° and multi‑view).
- Train/evaluate vision–language models (VLMs) for grounding, dense captioning, temporal QA, and tool‑use; design retrieval‑augmented and agentic loops for perception‑action tasks.
- Prototype perception‑in‑the‑loop policies that close the gap from pixels to actions (simulation + real data). Integrate with planners and task graphs for manipulation, navigation, or safety workflows.
- Curate datasets, author high‑signal evaluation protocols/KPIs, and run ablations that make irreproducible results impossible.
- Package research into reliable services on a modern stack (Kubernetes, Docker, Ray, FastAPI), with profiling, telemetry, and CI for reproducible science (a minimal service sketch follows this list).
- Orchestrate multi‑agent pipelines (e.g., LangGraph‑style graphs) that combine perception, reasoning, simulation, and code‑generation to self‑check and self‑correct (a plain‑Python loop sketch follows this list).
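As an example of the packaging work described above, here is a minimal sketch of a FastAPI inference endpoint wrapping a stand‑in torchvision classifier. The route name, the model choice, and the response schema are illustrative assumptions, not an existing Centific service.

```python
# Minimal FastAPI inference service around a placeholder torchvision model.
import io

import torch
from fastapi import FastAPI, File, UploadFile
from PIL import Image
from torchvision import transforms
from torchvision.models import resnet18

app = FastAPI()
model = resnet18(weights="IMAGENET1K_V1").eval()   # stand-in for a production detector
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    """Classify an uploaded image and return the top-3 classes with scores."""
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    with torch.inference_mode():
        logits = model(preprocess(image).unsqueeze(0))
    top = logits.softmax(-1).topk(3)
    return {"classes": top.indices[0].tolist(), "scores": top.values[0].tolist()}
```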
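And for the self‑check/self‑correct pattern, here is a plain‑Python sketch of a perception → plan → verify loop, independent of any specific framework (no LangGraph dependency). The `perceive`, `propose_plan`, and `verify` callables are hypothetical placeholders for model‑ or simulator‑backed components.

```python
# Self-checking perception -> plan -> verify loop with bounded retries (sketch).
from typing import Callable, Optional, Tuple

def run_episode(
    frame,
    perceive: Callable[[object], str],
    propose_plan: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_retries: int = 3,
) -> Optional[str]:
    """Return a verified plan for `frame`, or None if every retry fails the self-check."""
    observation = perceive(frame)                    # e.g., detections or a VLM caption
    feedback = None
    for _ in range(max_retries):
        plan = propose_plan(observation, feedback)   # reasoning step, conditioned on feedback
        ok, feedback = verify(plan)                  # self-check: rules, critic, or sim rollout
        if ok:
            return plan                              # check passed; hand off to the executor
    return None                                      # escalate or fall back after retries

# Toy usage with stub components:
plan = run_episode(
    "frame_000.png",
    perceive=lambda f: f"red cup on table in {f}",
    propose_plan=lambda obs, fb: f"grasp(red cup)  # based on: {obs}; feedback: {fb}",
    verify=lambda p: ("grasp" in p, "plan must include a grasp action"),
)
print(plan)
```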
Other
- Ph.D. student in CS/EE/Robotics (or related), actively publishing in CV/ML/Robotics (e.g., CVPR/ICCV/ECCV, NeurIPS/ICML/ICLR, CoRL/RSS).
- Public code artifacts (GitHub) and first‑author publications or strong open‑source impact.
- A publishable or open‑sourced outcome (with company approval) or a production‑ready module that measurably moves a product KPI (latency, accuracy, robustness).
- Clean, reproducible code with documented ablations and an evaluation report that a teammate can rerun end‑to‑end.
- A demo that clearly communicates capabilities, limits, and next steps.