Search Results for author: Medhini Narasimhan

Found 10 papers, 3 papers with code

Learning and Verification of Task Structure in Instructional Videos

no code implementations23 Mar 2023 Medhini Narasimhan, Licheng Yu, Sean Bell, Ning Zhang, Trevor Darrell

We introduce a new pre-trained video model, VideoTaskformer, focused on representing the semantics and structure of instructional videos.

Activity Recognition

Multi-Person 3D Motion Prediction with Multi-Range Transformers

1 code implementation NeurIPS 2021 Jiashun Wang, Huazhe Xu, Medhini Narasimhan, Xiaolong Wang

Thus, instead of predicting each human pose trajectory in isolation, we introduce a Multi-Range Transformers model which consists of a local-range encoder for individual motion and a global-range encoder for social interactions.

motion prediction Multi-Person Pose forecasting +1
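The local-range/global-range split described above can be illustrated with a toy sketch. The real encoders are Transformers operating on pose sequences; here both are stand-in averages over hypothetical 1-D "trajectories", just to show how per-person features mix with shared social context before prediction.

```python
# Hypothetical sketch of the multi-range idea from the paper: each person's
# own motion history ("local-range") is combined with a feature computed
# over everyone's motion ("global-range"). The real model uses Transformer
# encoders on 3D poses; these toy functions only average scalar positions.

def local_encode(trajectory):
    # One feature per person: mean of that person's own history.
    return sum(trajectory) / len(trajectory)

def global_encode(trajectories):
    # One shared feature from all persons' motion (social context).
    flat = [p for t in trajectories for p in t]
    return sum(flat) / len(flat)

def predict_next(trajectories):
    g = global_encode(trajectories)
    # Each person's prediction mixes individual motion with group context;
    # the 0.5/0.5 weighting is arbitrary for illustration.
    return [0.5 * local_encode(t) + 0.5 * g for t in trajectories]

people = [[0.0, 1.0, 2.0], [10.0, 10.0, 10.0]]
print(predict_next(people))  # each prediction is pulled toward the group
```

The point of the decomposition is that a person walking in a crowd is constrained both by their own momentum and by the people around them, so the two ranges carry complementary signals.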

CLIP-It! Language-Guided Video Summarization

1 code implementation NeurIPS 2021 Medhini Narasimhan, Anna Rohrbach, Trevor Darrell

A generic video summary is an abridged version of a video that conveys the whole story and features the most important scenes.

Query-focused Summarization Video Summarization

Strumming to the Beat: Audio-Conditioned Contrastive Video Textures

no code implementations6 Apr 2021 Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A. Efros, Trevor Darrell

We learn representations for video frames and frame-to-frame transition probabilities by fitting a video-specific model trained using contrastive learning.

Contrastive Learning Self-Supervised Learning +1

Contrastive Video Textures

no code implementations1 Jan 2021 Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A Efros, Trevor Darrell

By randomly traversing edges with high transition probabilities, we generate diverse temporally smooth videos with novel sequences and transitions.

Contrastive Learning Video Generation
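The generation step described in these two Contrastive Video Textures entries, traversing edges with high transition probabilities to produce a new frame ordering, can be sketched as follows. The transition matrix here is made up; in the papers it comes from a contrastively trained, video-specific model, and `threshold` is a hypothetical parameter for this illustration.

```python
import random

# Hypothetical toy transition matrix over 3 frames: P[i][j] is the learned
# probability of cutting from frame i to frame j. In the paper these values
# come from a contrastively trained model; here they are invented.
P = [
    [0.05, 0.90, 0.05],
    [0.10, 0.10, 0.80],
    [0.70, 0.20, 0.10],
]

def sample_texture(P, start, length, threshold=0.5, seed=0):
    """Generate a frame-index sequence by repeatedly following a
    high-probability transition; fall back to the single best edge
    when no edge clears the threshold."""
    rng = random.Random(seed)
    seq = [start]
    for _ in range(length - 1):
        cur = seq[-1]
        # Keep only edges with high transition probability.
        candidates = [j for j, p in enumerate(P[cur]) if p >= threshold]
        if not candidates:
            candidates = [max(range(len(P[cur])), key=P[cur].__getitem__)]
        seq.append(rng.choice(candidates))
    return seq

print(sample_texture(P, start=0, length=6))  # a smooth re-ordering of frames
```

Randomizing among the surviving high-probability edges is what yields diverse outputs while still keeping each individual cut temporally smooth.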

Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering

no code implementations NeurIPS 2018 Medhini Narasimhan, Svetlana Lazebnik, Alexander G. Schwing

Given a question-image pair, deep network techniques have been employed to successively reduce the large set of facts until one of the two entities of the final remaining fact is predicted as the answer.

Factual Visual Question Answering General Knowledge +2

Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering

no code implementations ECCV 2018 Medhini Narasimhan, Alexander G. Schwing

Question answering is an important task for autonomous agents and virtual assistants alike and was shown to help people with disabilities navigate an overwhelming environment efficiently.

Factual Visual Question Answering General Knowledge +4
