Research Intern – Multimodal Foundation Model for Vision

Sony Electronics

From $50

Sep 24, 2025

Remote, US

Sony AI America is seeking research interns to develop efficient and effective methodologies and prototype solutions for building next-generation foundation models for vision in a responsible manner, to improve the experience of billions of customers.

Requirements

Solid coding skills in Python, PyTorch, etc.
Publications or expertise in compact foundation model development and deployment.
Influential open-source projects or paper publications at top conferences, e.g., CVPR, ICCV, ECCV, NeurIPS, ICML, ACL, etc.
Front-end development experience is a plus.
Currently holds, or is in the process of obtaining, a master's or PhD degree in computer science or a related field.

Responsibilities

Conduct fundamental and innovative development in low-cost yet powerful vision-language models (VLMs), unified models, automatic model compression, optimization, and deployment on cloud and edge.
Design or implement state-of-the-art techniques for model compression, inference speedup, deployment on hardware, and tool automation.
Build proofs of concept (PoCs) for various vision+text and generation-related tasks (VQA, captioning, understanding, etc.) and hardware platforms.
Contribute to library and tool development to support the business, or publish influential research in top-tier conferences and journals.

Other

Highly self-motivated and capable of proposing and implementing innovative ideas.
Solid presentation and communication skills for internal and external audiences.
Location flexible (Tokyo, Europe, US)
All qualified applicants will receive consideration for employment without regard to any basis protected by applicable federal, state, or local law, ordinance, or regulation.