1 code implementation • ICCV 2021 • Srikar Appalaraju, Bhavan Jasani, Bhargava Urala Kota, Yusheng Xie, R. Manmatha
DocFormer uses text, vision and spatial features and combines them using a novel multi-modal self-attention layer.
Ranked #1 on Document Image Classification on RVL-CDIP
no code implementations • 26 Nov 2019 • Bhavan Jasani, Afshaan Mazagonwalla
In this work, we present a body-pose-based zero-shot action recognition network and demonstrate its performance on the NTU RGB-D dataset.
no code implementations • 8 Nov 2019 • Bhavan Jasani, Rohit Girdhar, Deva Ramanan
Joint vision and language tasks like visual question answering are fascinating because they explore high-level understanding, but at the same time they can be more prone to language biases.
no code implementations • 19 May 2018 • Yash Patel, Kashyap Chitta, Bhavan Jasani
We address the problem of semi-supervised domain adaptation of classification algorithms through deep Q-learning.