1 code implementation • 8 Jun 2023 • Sanjay Subramanian, Medhini Narasimhan, Kushal Khangaonkar, Kevin Yang, Arsha Nagrani, Cordelia Schmid, Andy Zeng, Trevor Darrell, Dan Klein
We present a framework that formulates visual question answering as modular code generation.
no code implementations • 23 Mar 2023 • Medhini Narasimhan, Licheng Yu, Sean Bell, Ning Zhang, Trevor Darrell
We introduce a new pre-trained video model, VideoTaskformer, focused on representing the semantics and structure of instructional videos.
no code implementations • 14 Aug 2022 • Medhini Narasimhan, Arsha Nagrani, Chen Sun, Michael Rubinstein, Trevor Darrell, Anna Rohrbach, Cordelia Schmid
In this work, we focus on summarizing instructional videos, an under-explored area of video summarization.
1 code implementation • NeurIPS 2021 • Jiashun Wang, Huazhe Xu, Medhini Narasimhan, Xiaolong Wang
Thus, instead of predicting each human pose trajectory in isolation, we introduce a Multi-Range Transformers model which contains of a local-range encoder for individual motion and a global-range encoder for social interactions.
1 code implementation • NeurIPS 2021 • Medhini Narasimhan, Anna Rohrbach, Trevor Darrell
A generic video summary is an abridged version of a video that conveys the whole story and features the most important scenes.
no code implementations • 6 Apr 2021 • Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A. Efros, Trevor Darrell
We learn representations for video frames and frame-to-frame transition probabilities by fitting a video-specific model trained using contrastive learning.
no code implementations • 1 Jan 2021 • Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A Efros, Trevor Darrell
By randomly traversing edges with high transition probabilities, we generate diverse temporally smooth videos with novel sequences and transitions.
no code implementations • ECCV 2020 • Medhini Narasimhan, Erik Wijmans, Xinlei Chen, Trevor Darrell, Dhruv Batra, Devi Parikh, Amanpreet Singh
We also demonstrate that reducing the task of room navigation to point navigation improves the performance further.
no code implementations • NeurIPS 2018 • Medhini Narasimhan, Svetlana Lazebnik, Alexander G. Schwing
Given a question-image pair, deep network techniques have been employed to successively reduce the large set of facts until one of the two entities of the final remaining fact is predicted as the answer.
no code implementations • ECCV 2018 • Medhini Narasimhan, Alexander G. Schwing
Question answering is an important task for autonomous agents and virtual assistants alike and was shown to support the disabled in efficiently navigating an overwhelming environment.