Centific is working to translate cutting‑edge research in computer vision, multimodal large models, and embodied/physical AI into production systems that perceive, reason, and act in the real world, enabling enterprise clients to deploy AI safely and at scale.
Requirements
- Strong PyTorch (or JAX) and Python; comfort with CUDA profiling and mixed‑precision training (see the sketch after this list).
- Demonstrated research in computer vision and at least one of: VLMs (e.g., LLaVA‑style, video‑language models), embodied/physical AI, 3D perception.
- Proven ability to move from paper → code → ablation → result with rigorous experiment tracking.
- Experience with video models (e.g., TimeSformer/MViT/VideoMAE), diffusion or 3D GS/NeRF pipelines, or SLAM/scene reconstruction.
- Prior work on multimodal grounding (referring expressions, spatial language, affordances) or temporal reasoning.
- Familiarity with ROS2, DeepStream/TAO, or edge inference optimizations (TensorRT, ONNX).
- Scalable training: Ray, distributed data loaders, sharded checkpoints.
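For context on the mixed‑precision requirement above, here is a minimal sketch of a PyTorch AMP training step. The model, optimizer, and loss are generic placeholders for illustration, not any particular Centific pipeline, and inputs are assumed to already be on the GPU.

```python
# Minimal mixed-precision training step with PyTorch AMP (illustrative sketch).
import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(512, 10).cuda()          # stand-in for a vision backbone
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = GradScaler()                            # scales the loss to avoid fp16 gradient underflow

def train_step(images, labels):
    """One optimizer step; `images` and `labels` are assumed to be CUDA tensors."""
    optimizer.zero_grad(set_to_none=True)
    with autocast():                             # forward pass runs in mixed precision
        logits = model(images)
        loss = criterion(logits, labels)
    scaler.scale(loss).backward()                # backward on the scaled loss
    scaler.step(optimizer)                       # unscales grads; skips the step on inf/NaN
    scaler.update()
    return loss.item()
```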
Responsibilities
- Build and fine‑tune models for detection, tracking, segmentation (2D/3D), pose & activity recognition, and scene understanding (incl. 360° and multi‑view).
- Train/evaluate vision–language models (VLMs) for grounding, dense captioning, temporal QA, and tool‑use; design retrieval‑augmented and agentic loops for perception‑action tasks.
- Prototype perception‑in‑the‑loop policies that close the gap from pixels to actions (simulation + real data). Integrate with planners and task graphs for manipulation, navigation, or safety workflows.
- Curate datasets, author high‑signal evaluation protocols/KPIs, and run ablations that make irreproducible results impossible.
- Package research into reliable services on a modern stack (Kubernetes, Docker, Ray, FastAPI), with profiling, telemetry, and CI for reproducible science (a minimal service sketch follows this list).
- Orchestrate multi‑agent pipelines (e.g., LangGraph‑style graphs) that combine perception, reasoning, simulation, and code‑generation to self‑check and self‑correct (a plain‑Python loop sketch follows this list).
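As an example of the packaging work described above, here is a minimal sketch of a FastAPI inference endpoint wrapping a stand‑in torchvision classifier. The route name, the model choice, and the response schema are illustrative assumptions, not an existing Centific service.

```python
# Minimal FastAPI inference service around a placeholder torchvision model.
import io

import torch
from fastapi import FastAPI, File, UploadFile
from PIL import Image
from torchvision import transforms
from torchvision.models import resnet18

app = FastAPI()
model = resnet18(weights="IMAGENET1K_V1").eval()   # stand-in for a production detector
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    """Classify an uploaded image and return the top-3 classes with scores."""
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    with torch.inference_mode():
        logits = model(preprocess(image).unsqueeze(0))
    top = logits.softmax(-1).topk(3)
    return {"classes": top.indices[0].tolist(), "scores": top.values[0].tolist()}
```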
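And for the self‑check/self‑correct pattern, here is a plain‑Python sketch of a perception → plan → verify loop, independent of any specific framework (no LangGraph dependency). The `perceive`, `propose_plan`, and `verify` callables are hypothetical placeholders for model‑ or simulator‑backed components.

```python
# Self-checking perception -> plan -> verify loop with bounded retries (sketch).
from typing import Callable, Optional, Tuple

def run_episode(
    frame,
    perceive: Callable[[object], str],
    propose_plan: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_retries: int = 3,
) -> Optional[str]:
    """Return a verified plan for `frame`, or None if every retry fails the self-check."""
    observation = perceive(frame)                    # e.g., detections or a VLM caption
    feedback = None
    for _ in range(max_retries):
        plan = propose_plan(observation, feedback)   # reasoning step, conditioned on feedback
        ok, feedback = verify(plan)                  # self-check: rules, critic, or sim rollout
        if ok:
            return plan                              # check passed; hand off to the executor
    return None                                      # escalate or fall back after retries

# Toy usage with stub components:
plan = run_episode(
    "frame_000.png",
    perceive=lambda f: f"red cup on table in {f}",
    propose_plan=lambda obs, fb: f"grasp(red cup)  # based on: {obs}; feedback: {fb}",
    verify=lambda p: ("grasp" in p, "plan must include a grasp action"),
)
print(plan)
```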
Other
- Ph.D. student in CS/EE/Robotics (or related), actively publishing in CV/ML/Robotics (e.g., CVPR/ICCV/ECCV, NeurIPS/ICML/ICLR, CoRL/RSS).
- Public code artifacts (GitHub) and first‑author publications or strong open‑source impact.
- A publishable or open‑sourced outcome (with company approval) or a production‑ready module that measurably moves a product KPI (latency, accuracy, robustness).
- Clean, reproducible code with documented ablations and an evaluation report that a teammate can rerun end‑to‑end.
- A demo that clearly communicates capabilities, limits, and next steps.