Search Results for author: Sameer Dharur

Found 4 papers, 2 papers with code

Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

no code implementations • 23 Oct 2023 • Gautam Krishna, Sameer Dharur, Oggi Rudovic, Pranay Dighe, Saurabh Adya, Ahmed Hussen Abdelaziz, Ahmed H Tewfik

Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech.

Automatic Speech Recognition, Binary Classification (+2 more)

Episodic Memory Question Answering

no code implementations • CVPR 2022 • Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh

Towards that end, we introduce (1) a new task - Episodic Memory Question Answering (EMQA) wherein an egocentric AI assistant is provided with a video sequence (the tour) and a question as an input and is asked to localize its answer to the question within the tour, (2) a dataset of grounded questions designed to probe the agent's spatio-temporal understanding of the tour, and (3) a model for the task that encodes the scene as an allocentric, top-down semantic feature map and grounds the question into the map to localize the answer.

Question Answering

SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency

1 code implementation • NAACL 2021 • Sameer Dharur, Purva Tendulkar, Dhruv Batra, Devi Parikh, Ramprasaath R. Selvaraju

Recent research in Visual Question Answering (VQA) has revealed state-of-the-art models to be inconsistent in their understanding of the world -- they answer seemingly difficult questions requiring reasoning correctly but get simpler associated sub-questions wrong.

Question Answering, Visual Grounding (+1 more)
