no code implementations • NAACL 2021 • Hrituraj Singh, Anshul Nasery, Denil Mehta, Aishwarya Agarwal, Jatin Lamba, Balaji Vasan Srinivasan
In this paper, we propose a novel task - MIMOQA - Multimodal Input Multimodal Output Question Answering in which the output is also multimodal.
1 code implementation • 3 Apr 2021 • Jatin Lamba, abhishek, Jayaprakash Akula, Rishabh Dabral, Preethi Jyothi, Ganesh Ramakrishnan
In this paper, we present a novel approach to the audio-visual video parsing (AVVP) task that demarcates events from a video separately for audio and visual modalities.
Ranked #1 on
Event Detection
on Audio Set