no code implementations • EMNLP 2021 • Arjun Akula, Spandana Gella, Keze Wang, Song-Chun Zhu, Siva Reddy
Our model outperforms the state-of-the-art NMN model on the CLEVR-Ref+ dataset, with a +8.1% improvement in accuracy on the single-referent test set and +4.3% on the full test set.
no code implementations • EMNLP 2021 • Arjun Akula, Soravit Changpinyo, Boqing Gong, Piyush Sharma, Song-Chun Zhu, Radu Soricut
One challenge in evaluating visual question answering (VQA) models in the cross-dataset adaptation setting is that the distribution shifts are multi-modal, making it difficult to identify whether shifts in visual or language features play the key role.
no code implementations • NeurIPS 2021 • Arjun Akula, Varun Jampani, Soravit Changpinyo, Song-Chun Zhu
Neural module networks (NMN) are a popular approach for solving multi-modal tasks such as visual question answering (VQA) and visual referring expression recognition (REF).