Search Results for author: Mohit Bansal

Found 269 papers, 177 papers with code

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

9 code implementations • IJCNLP 2019 • Hao Tan, Mohit Bansal

In LXMERT, we build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder.

Ranked #1 on Visual Question Answering (VQA) on VizWiz 2018

Language Modelling, Masked Language Modeling, +4

124,984

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

3 code implementations • 9 Jun 2022 • Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning, Math, +1

2,650

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

2 code implementations • 11 May 2022 • Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, Colin Raffel

ICL incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.

Ranked #1 on Few-Shot Text Classification on RAFT

Few-Shot Text Classification, In-Context Learning

1,966

Unifying Vision, Text, and Layout for Universal Document Processing

2 code implementations • CVPR 2023 • Zineng Tang, ZiYi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal

UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation.

Ranked #5 on Visual Question Answering (VQA) on InfographicVQA (using extra training data)

document understanding, Image Reconstruction, +1

1,631

Any-to-Any Generation via Composable Diffusion

1 code implementation • NeurIPS 2023 • Zineng Tang, ZiYi Yang, Chenguang Zhu, Michael Zeng, Mohit Bansal

We present Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities.

Ranked #7 on Audio Generation on AudioCaps

Audio Generation

1,631

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding

2 code implementations • 20 Dec 2021 • Revanth Gangi Reddy, Xilin Rui, Manling Li, Xudong Lin, Haoyang Wen, Jaemin Cho, Lifu Huang, Mohit Bansal, Avirup Sil, Shih-Fu Chang, Alexander Schwing, Heng Ji

Specifically, the task involves multi-hop questions that require reasoning over image-caption pairs to identify the grounded visual object being referred to and then predicting a span from the news body text to answer the question.

Answer Generation, Data Augmentation, +2

699

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models

2 code implementations • ICCV 2023 • Jaemin Cho, Abhay Zala, Mohit Bansal

In this work, we investigate the visual reasoning capabilities and social biases of different text-to-image models, covering both multimodal transformer language models and diffusion models.

Image Captioning, Image Classification, +9

687

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

1 code implementation • CVPR 2021 • Jie Lei, Linjie Li, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal, Jingjing Liu

Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms (or is on par with) existing methods that exploit full-length videos, suggesting that end-to-end learning with just a few sparsely sampled clips is often more accurate than using densely extracted offline features from full-length videos, proving the proverbial less-is-more principle.

Ranked #24 on Visual Question Answering (VQA) on MSRVTT-QA (using extra training data)

Question Answering, Retrieval, +4

686

Revealing Single Frame Bias for Video-and-Language Learning

2 code implementations • 7 Jun 2022 • Jie Lei, Tamara L. Berg, Mohit Bansal

Training an effective video-and-language model intuitively requires multiple frames as model inputs.

Ranked #5 on Video Retrieval on SSv2-template retrieval (using extra training data)

Fine-grained Action Recognition, Language Modelling, +6

686

Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Adyasha Maharana, Mohit Bansal

In this work, we present several effective adversaries and automated data augmentation policy search methods with the goal of making reading comprehension models more robust to adversarial evaluation, but also improving generalization to the source domain as well as new domains and languages.

Data Augmentation, Reading Comprehension

627

Robustness Gym: Unifying the NLP Evaluation Landscape

2 code implementations • NAACL 2021 • Karan Goel, Nazneen Rajani, Jesse Vig, Samson Tan, Jason Wu, Stephan Zheng, Caiming Xiong, Mohit Bansal, Christopher Ré

Despite impressive performance on standard benchmarks, deep neural networks are often brittle when deployed in real-world systems.

Entity Linking

627

Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

3 code implementations • ACL 2018 • Yen-Chun Chen, Mohit Bansal

Inspired by how humans summarize long documents, we propose an accurate and fast summarization model that first selects salient sentences and then rewrites them abstractively (i.e., compresses and paraphrases) to generate a concise overall summary.

Ranked #7 on Text Summarization on CNN / Daily Mail (Anonymized)

Abstractive Text Summarization, Sentence, +1

623

PaperRobot: Incremental Draft Generation of Scientific Ideas

2 code implementations • ACL 2019 • Qingyun Wang, Lifu Huang, Zhiying Jiang, Kevin Knight, Heng Ji, Mohit Bansal, Yi Luan

We present a PaperRobot who performs as an automatic research assistant by (1) conducting deep understanding of a large collection of human-written papers in a target domain and constructing comprehensive background knowledge graphs (KGs); (2) creating new ideas by predicting links from the background KGs, by combining graph attention and contextual text attention; (3) incrementally writing some key elements of a new paper based on memory-attention networks: from the input title along with predicted related entities to generate a paper abstract, from the abstract to generate conclusion and future work, and finally from future work to generate a title for a follow-on paper.

Ranked #1 on Paper generation (Title-to-abstract) on PubMed Term, Abstract, Conclusion, Title Dataset

Graph Attention, Knowledge Graphs, +4

470

Adversarial NLI: A New Benchmark for Natural Language Understanding

2 code implementations • ACL 2020 • Yixin Nie, Adina Williams, Emily Dinan, Mohit Bansal, Jason Weston, Douwe Kiela

We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure.

Natural Language Understanding

381

How Much Can CLIP Benefit Vision-and-Language Tasks?

4 code implementations • 13 Jul 2021 • Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Zhewei Yao, Kurt Keutzer

Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using a relatively small set of manually-annotated data (as compared to web-crawled data), to perceive the visual world.

Ranked #4 on Vision and Language Navigation on RxR (using extra training data)

Question Answering, Vision and Language Navigation, +2

381

Unifying Vision-and-Language Tasks via Text Generation

2 code implementations • 4 Feb 2021 • Jaemin Cho, Jie Lei, Hao Tan, Mohit Bansal

On 7 popular vision-and-language benchmarks, including visual question answering, referring expression comprehension, visual commonsense reasoning, most of which have been previously modeled as discriminative tasks, our generative approach (with a single unified architecture) reaches comparable performance to recent task-specific state-of-the-art vision-and-language models.

Ranked #3 on Image Captioning on nocaps val

Conditional Text Generation, Image Captioning, +7

351

StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation

1 code implementation • 13 Sep 2022 • Adyasha Maharana, Darryl Hannan, Mohit Bansal

Hence, we first propose the task of story continuation, where the generated visual story is conditioned on a source image, allowing for better generalization to narratives with new characters.

Ranked #2 on Story Continuation on FlintstonesSV

Image Generation, Story Continuation, +2

326

MAttNet: Modular Attention Network for Referring Expression Comprehension

1 code implementation • CVPR 2018 • Licheng Yu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Mohit Bansal, Tamara L. Berg

In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.

Ranked #7 on Generalized Referring Expression Segmentation on gRefCOCO

Generalized Referring Expression Segmentation, Referring Expression, +1

291

Multi-Target Embodied Question Answering

1 code implementation • CVPR 2019 • Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra

To address this, we propose a modular architecture composed of a program generator, a controller, a navigator, and a VQA module.

Embodied Question Answering, Navigate, +1

287

TrustLLM: Trustworthiness in Large Language Models

1 code implementation • 10 Jan 2024 • Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao liu, Heng Ji, Hongyi Wang, huan zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, Joaquin Vanschoren, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yong Chen, Yue Zhao

This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions.

Ethics, Fairness

271

QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

4 code implementations • 20 Jul 2021 • Jie Lei, Tamara L. Berg, Mohit Bansal

Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t.

Ranked #12 on Highlight Detection on QVHighlights

Highlight Detection, Moment Retrieval, +2

235

Detecting Moments and Highlights in Videos via Natural Language Queries

1 code implementation • NeurIPS 2021 • Jie Lei, Tamara Berg, Mohit Bansal

Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t.

Ranked #6 on Video Grounding on QVHighlights

Moment Retrieval, Natural Language Queries, +2

235

End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures

2 code implementations • ACL 2016 • Makoto Miwa, Mohit Bansal

We present a novel end-to-end neural model to extract entities and relations between them.

Ranked #1 on Relation Extraction on ACE 2005 (Sentence Encoder metric)

Relation, Relation Classification

224

Fine-grained Image Captioning with CLIP Reward

1 code implementation • Findings (NAACL) 2022 • Jaemin Cho, Seunghyun Yoon, Ajinkya Kale, Franck Dernoncourt, Trung Bui, Mohit Bansal

Toward more descriptive and distinctive caption generation, we propose using CLIP, a multimodal encoder trained on huge image-text pairs from web, to calculate multimodal similarity and use it as a reward function.

Ranked #26 on Image Captioning on COCO Captions

Caption Generation, Descriptive, +5

224

LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

2 code implementations • 13 Jun 2022 • Yi-Lin Sung, Jaemin Cho, Mohit Bansal

LST saves 69% of the memory costs to fine-tune the whole network, while other methods only save 26% of that in similar parameter usages (hence, 2.7x more memory savings).

Transfer Learning, Visual Question Answering (VQA)

212

VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks

1 code implementation • CVPR 2022 • Yi-Lin Sung, Jaemin Cho, Mohit Bansal

Our results demonstrate that training the adapter with the weight-sharing technique (4.18% of total parameters for image-text tasks and 3.39% for video-text tasks) can match the performance of fine-tuning the entire model.

Image Captioning, Transfer Learning

196

Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision

1 code implementation • EMNLP 2020 • Hao Tan, Mohit Bansal

We find that the main reason hindering this exploration is the large divergence in magnitude and distributions between the visually-grounded language datasets and pure-language corpora.

Image Captioning, Language Modelling

186

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

1 code implementation • ACL 2020 • Jie Lei, Li-Wei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal

Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph.

Ranked #5 on Video Captioning on ActivityNet Captions

Sentence

168

Self-Chained Image-Language Model for Video Localization and Question Answering

1 code implementation • NeurIPS 2023 • Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal

SeViLA framework consists of two modules: Localizer and Answerer, where both are parameter-efficiently fine-tuned from BLIP-2.

Ranked #3 on Zero-Shot Video Question Answer on IntentQA (using extra training data)

Language Modelling, Representation Learning, +2

161

ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs

1 code implementation • 22 Sep 2023 • Justin Chih-Yao Chen, Swarnadeep Saha, Mohit Bansal

Large Language Models (LLMs) still struggle with natural language reasoning tasks.

Math

159

TVQA: Localized, Compositional Video Question Answering

4 code implementations • EMNLP 2018 • Jie Lei, Licheng Yu, Mohit Bansal, Tamara L. Berg

Recent years have witnessed an increasing interest in image-based question-answering (QA) tasks.

Ranked #4 on Video Question Answering on SUTD-TrafficQA

Video Question Answering

158

Improving and Simplifying Pattern Exploiting Training

2 code implementations • EMNLP 2021 • Derek Tam, Rakesh R Menon, Mohit Bansal, Shashank Srivastava, Colin Raffel

However, PET uses task-specific unlabeled data.

Few-Shot Learning

153

TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval

2 code implementations • ECCV 2020 • Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal

The queries are also labeled with query types that indicate whether each of them is more related to video or subtitle or both, allowing for in-depth analysis of the dataset and the methods built on top of it.

Ranked #2 on Video Retrieval on TVR

Moment Retrieval, Retrieval, +2

148

Scaling Data Generation in Vision-and-Language Navigation

1 code implementation • ICCV 2023 • Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao

Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents.

Imitation Learning, Vision and Language Navigation, +1

136

GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models

2 code implementations • 14 Mar 2022 • Archiki Prasad, Peter Hase, Xiang Zhou, Mohit Bansal

Providing natural language instructions in prompts is a useful new paradigm for improving task performance of large language models in a zero-shot setting.

131

Commonsense for Generative Multi-Hop Question Answering Tasks

2 code implementations • EMNLP 2018 • Lisa Bauer, Yicheng Wang, Mohit Bansal

We instead focus on a more challenging multi-hop generative task (NarrativeQA), which requires the model to reason, gather, and synthesize disjoint pieces of information within the context to generate an answer.

Ranked #6 on Question Answering on NarrativeQA

Implicit Relations, Multi-hop Question Answering, +1

122

Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout

1 code implementation • NAACL 2019 • Hao Tan, Licheng Yu, Mohit Bansal

Next, we apply semi-supervised learning (via back-translation) on these dropped-out environments to generate new paths and instructions.

Ranked #1 on Vision-Language Navigation on Room2Room

Navigate, Translation, +1

120

TVQA+: Spatio-Temporal Grounding for Video Question Answering

3 code implementations • ACL 2020 • Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal

We present the task of Spatio-Temporal Video Question Answering, which requires intelligent systems to simultaneously retrieve relevant moments and detect referenced visual concepts (people and objects) to answer natural language questions about videos.

Ranked #6 on Video Question Answering on TVQA

Question Answering, Video Question Answering

120

TVLT: Textless Vision-Language Transformer

1 code implementation • 28 Sep 2022 • Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal

In this work, we present the Textless Vision-Language Transformer (TVLT), where homogeneous transformer blocks take raw visual and audio inputs for vision-and-language representation learning with minimal modality-specific design, and do not use text-specific modules such as tokenization or automatic speech recognition (ASR).

Automatic Speech Recognition (ASR), Image Retrieval, +6

118

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

1 code implementation • 22 May 2022 • Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, ZiYi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji

The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples, such as domain-specific captioning, question answering, and future event prediction.

Attribute, Automatic Speech Recognition, +6

110

TIES-Merging: Resolving Interference When Merging Models

2 code implementations • NeurIPS 2023 • Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal

To address this, we propose our method, TRIM, ELECT SIGN & MERGE (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign.
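The three steps named in this abstract can be sketched roughly as follows. This is a minimal illustration on flat lists of floats, not the authors' implementation: the function name `ties_merge`, the `keep_fraction` and `scale` parameters, the top-k-by-magnitude trimming rule, and electing the sign by the summed trimmed change are all assumptions made for the example.

```python
from typing import List, Sequence


def ties_merge(
    base: Sequence[float],
    finetuned: Sequence[Sequence[float]],
    keep_fraction: float = 0.2,  # hypothetical knob: share of entries kept per model
    scale: float = 1.0,          # hypothetical knob: weight of the merged update
) -> List[float]:
    """Merge several fine-tuned parameter vectors into one (illustrative sketch)."""
    n = len(base)
    # Change of each fine-tuned model relative to the shared base model.
    task_vectors = [[ft[i] - base[i] for i in range(n)] for ft in finetuned]

    # Step 1 (trim): reset entries that changed only a small amount.
    # Assumption: keep the largest-magnitude entries (ties at the cutoff are kept).
    k = max(1, int(keep_fraction * n))
    trimmed = []
    for tv in task_vectors:
        threshold = sorted((abs(v) for v in tv), reverse=True)[k - 1]
        trimmed.append([v if v != 0.0 and abs(v) >= threshold else 0.0 for v in tv])

    merged = list(base)
    for i in range(n):
        column = [tv[i] for tv in trimmed]
        # Step 2 (elect sign): assumption: use the sign of the summed change.
        total = sum(column)
        if total == 0.0:
            continue  # no surviving change, or an exact cancellation: keep base value
        sign = 1.0 if total > 0 else -1.0
        # Step 3 (merge): average only the entries that agree with the elected sign.
        agreeing = [v for v in column if v * sign > 0]
        merged[i] = base[i] + scale * sum(agreeing) / len(agreeing)
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    model_a = [1.0, -0.1, 2.0, 0.05]
    model_b = [-0.5, 0.2, 3.0, 0.0]
    print(ties_merge(base, [model_a, model_b], keep_fraction=0.5))
    # → [1.0, 0.0, 2.5, 0.0]
```

In the toy run, the first parameter shows a sign conflict (+1.0 versus -0.5); the positive sign wins and only the agreeing value is averaged, while the small changes in the second and fourth parameters are trimmed away.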

Transfer Learning

104

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

1 code implementation • 1 Oct 2023 • Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, Huaxiu Yao

Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages.

Hallucination, Hallucination Evaluation, +1

100

Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering

2 code implementations • IJCNLP 2019 • Shiyue Zhang, Mohit Bansal

Second, since the traditional evaluation metrics (e.g., BLEU) often fall short in evaluating the quality of generated questions, we propose a QA-based evaluation method which measures the QG model's ability to mimic human annotators in generating QA training data.

Question Answering, Question Generation, +2

VindLU: A Recipe for Effective Video-and-Language Pretraining

1 code implementation • CVPR 2023 • Feng Cheng, Xizi Wang, Jie Lei, David Crandall, Mohit Bansal, Gedas Bertasius

Furthermore, our model also obtains state-of-the-art video question-answering results on ActivityNet-QA, MSRVTT-QA, MSRVTT-MC and TVQA.

Ranked #2 on Video Retrieval on Condensed Movies (using extra training data)

Question Answering, Retrieval, +3

FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging

1 code implementation • EMNLP 2021 • Han Guo, Nazneen Fatema Rajani, Peter Hase, Mohit Bansal, Caiming Xiong

With the availability of the fast influence functions, we demonstrate their usefulness in four applications.

Data Augmentation

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

1 code implementation • 8 Jun 2021 • Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara Lee Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu

Most existing video-and-language (VidL) research focuses on a single dataset, or multiple datasets of a single task.

Multi-Task Learning, Question Answering, +5

Hierarchical Video-Moment Retrieval and Step-Captioning

1 code implementation • CVPR 2023 • Abhay Zala, Jaemin Cho, Satwik Kottur, Xilun Chen, Barlas Oğuz, Yasher Mehdad, Mohit Bansal

Our hierarchical benchmark consists of video retrieval, moment retrieval, and two novel moment segmentation and step captioning tasks.

Information Retrieval, Moment Retrieval, +4

Vision Transformers are Parameter-Efficient Audio-Visual Learners

1 code implementation • CVPR 2023 • Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius

To do so, we propose a latent audio-visual hybrid (LAVISH) adapter that adapts pretrained ViTs to audio-visual tasks by injecting a small number of trainable parameters into every layer of a frozen ViT.

Ranked #4 on Audio-visual Question Answering on MUSIC-AVQA

Audio-visual Question Answering

Combining Fact Extraction and Verification with Neural Semantic Matching Networks

2 code implementations • 16 Nov 2018 • Yixin Nie, Haonan Chen, Mohit Bansal

The increasing concern with misinformation has stimulated research efforts on automatic fact checking.

Claim Verification, Fact Checking, +5

VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning

1 code implementation • 21 Jun 2021 • Hao Tan, Jie Lei, Thomas Wolf, Mohit Bansal

Unlike language, where the text tokens are more independent, neighboring video tokens typically have strong correlations (e.g., consecutive video frames usually look very similar), and hence uniformly masking individual tokens will make the task too trivial to learn useful representations.

Ranked #10 on Action Recognition on Diving-48

Action Classification, Action Recognition, +2

Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Personalization

1 code implementation • 15 Jun 2023 • Swarnadeep Saha, Peter Hase, Mohit Bansal

We first show that teacher LLMs can indeed intervene on student reasoning to improve their performance.

Expressing Visual Relationships via Language

1 code implementation • ACL 2019 • Hao Tan, Franck Dernoncourt, Zhe Lin, Trung Bui, Mohit Bansal

To push forward the research in this direction, we first introduce a new language-guided image editing dataset that contains a large number of real image pairs with corresponding editing instructions.

Image Captioning, Retrieval

Shortcut-Stacked Sentence Encoders for Multi-Domain Inference

3 code implementations • WS 2017 • Yixin Nie, Mohit Bansal

We present a simple sequential sentence encoder for multi-domain natural language inference.

Ranked #62 on Natural Language Inference on SNLI

Natural Language Inference, Sentence, +1

Revealing the Importance of Semantic Retrieval for Machine Reading at Scale

2 code implementations • IJCNLP 2019 • Yixin Nie, Songhe Wang, Mohit Bansal

In this work, we give general guidelines on system design for MRS by proposing a simple yet effective pipeline system with special consideration on hierarchical semantic retrieval at both paragraph and sentence level, and their potential effects on the downstream task.

Ranked #44 on Question Answering on HotpotQA

Fact Verification Information Retrieval +5

Paper
Code

A Simple LLM Framework for Long-Range Video Question-Answering

1 code implementation • 28 Dec 2023 • Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius

Furthermore, we show that a specialized prompt that asks the LLM first to summarize the noisy short-term visual captions and then answer a given input question leads to a significant LVQA performance boost.

Ranked #1 on Zero-Shot Video Question Answer on NExT-GQA

Large Language Model Long-range modeling +2

Paper
Code

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer

1 code implementation • NeurIPS 2021 • Zineng Tang, Jaemin Cho, Hao Tan, Mohit Bansal

We train a multi-modal teacher model on a video-text dataset, and then transfer its knowledge to a student language model with a text dataset.

Image Retrieval Knowledge Distillation +6

Paper
Code

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

1 code implementation • NeurIPS 2023 • Peter Hase, Mohit Bansal, Been Kim, Asma Ghandeharioun

This finding raises questions about how past work relies on Causal Tracing to select which model layers to edit.

Denoising knowledge editing

Paper
Code

What is More Likely to Happen Next? Video-and-Language Future Event Prediction

1 code implementation • EMNLP 2020 • Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal

Given a video with aligned dialogue, people can often infer what is more likely to happen next.

Paper
Code

EmailSum: Abstractive Email Thread Summarization

1 code implementation • ACL 2021 • Shiyue Zhang, Asli Celikyilmaz, Jianfeng Gao, Mohit Bansal

Furthermore, we find that widely used automatic evaluation metrics (ROUGE, BERTScore) are weakly correlated with human judgments on this email thread summarization task.

Ranked #1 on Email Thread Summarization on EmailSum (short)

Abstractive Text Summarization Email Thread Summarization

Paper
Code

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

1 code implementation • 2 Oct 2023 • Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen

Sparsely activated Mixture-of-Experts (SMoE) has shown promise for scaling up the learning capacity of neural networks; however, it has issues such as (a) high memory usage, due to duplication of the network layers into multiple copies as experts; and (b) redundancy in experts, as common learning-based routing policies suffer from representational collapse.

Paper
Code

FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations

2 code implementations • NAACL 2022 • Leonardo F. R. Ribeiro, Mengwen Liu, Iryna Gurevych, Markus Dreyer, Mohit Bansal

Despite recent improvements in abstractive summarization, most current approaches generate summaries that are not factually consistent with the source document, severely restricting their trust and usage in real-world applications.

Abstractive Text Summarization

Paper
Code

Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?

1 code implementation • ACL 2020 • Peter Hase, Mohit Bansal

Through two kinds of simulation tests involving text and tabular data, we evaluate five explanation methods: (1) LIME, (2) Anchor, (3) Decision Boundary, (4) a Prototype model, and (5) a Composite approach that combines explanations from each method.

counterfactual tabular-classification

Paper
Code

Continual and Multi-Task Architecture Search

1 code implementation • ACL 2019 • Ramakanth Pasunuru, Mohit Bansal

Architecture search is the process of automatically learning the neural model or cell structure that best suits the given task.

Continual Learning General Classification +8

Paper
Code

The Unreasonable Effectiveness of Easy Training Data for Hard Tasks

1 code implementation • 12 Jan 2024 • Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe

In this paper, we present the surprising conclusion that current language models often generalize relatively well from easy to hard data, even performing as well as "oracle" models trained on hard data.

General Knowledge In-Context Learning +1

Paper
Code

GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations

1 code implementation • 19 Feb 2024 • Jinhao Duan, Renming Zhang, James Diffenderfer, Bhavya Kailkhura, Lichao Sun, Elias Stengel-Eskin, Mohit Bansal, Tianlong Chen, Kaidi Xu

As Large Language Models (LLMs) are integrated into critical real-world applications, their strategic and logical reasoning abilities are increasingly crucial.

Card Games Logical Reasoning

Paper
Code

Unified Coarse-to-Fine Alignment for Video-Text Retrieval

1 code implementation • ICCV 2023 • Ziyang Wang, Yi-Lin Sung, Feng Cheng, Gedas Bertasius, Mohit Bansal

Specifically, our model captures the cross-modal similarity information at different granularity levels.

Ranked #11 on Video Retrieval on MSR-VTT

Retrieval Text Retrieval +2

Paper
Code

Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts

1 code implementation • 31 Mar 2024 • Qin Liu, Jaemin Cho, Mohit Bansal, Marc Niethammer

In light of this, we reintroduce this dense design into the generalist models, to facilitate the development of generalist models with high segmentation quality.

Image Segmentation Interactive Segmentation +2

Paper
Code

FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization

1 code implementation • NAACL 2022 • David Wan, Mohit Bansal

We present FactPEGASUS, an abstractive summarization model that addresses the problem of factuality during pre-training and fine-tuning: (1) We augment the sentence selection strategy of PEGASUS's (Zhang et al., 2020) pre-training objective to create pseudo-summaries that are both important and factual; (2) We introduce three complementary components for fine-tuning.

Abstractive Text Summarization Contrastive Learning +1

Paper
Code

Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA

1 code implementation • ACL 2020 • Hyounghun Kim, Zineng Tang, Mohit Bansal

Moreover, our model is also comprised of dual-level attention (word/object and frame level), multi-head self/cross-integration for different sources (video and dense captions), and gates which pass more relevant information to the classifier.

Image Captioning Multi-Label Classification +3

Paper
Code

Punny Captions: Witty Wordplay in Image Descriptions

1 code implementation • NAACL 2018 • Arjun Chandrasekaran, Devi Parikh, Mohit Bansal

Wit is a form of rich interaction that is often grounded in a specific situation (e.g., a comment in response to an event).

Paper
Code

A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

2 code implementations • CVPR 2017 • Licheng Yu, Hao Tan, Mohit Bansal, Tamara L. Berg

The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions.

Referring Expression Referring Expression Comprehension

Paper
Code

Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention

1 code implementation • 21 Nov 2022 • Zineng Tang, Jaemin Cho, Jie Lei, Mohit Bansal

We present Perceiver-VL, a vision-and-language framework that efficiently handles high-dimensional multimodal inputs such as long videos and text.

Cross-Modal Retrieval Language Modelling +1

Paper
Code

An Empirical Study of Multimodal Model Merging

1 code implementation • 28 Apr 2023 • Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal, Lijuan Wang

In this paper, we expand on this concept to a multimodal setup by merging transformers trained on different modalities.

Retrieval Visual Question Answering (VQA)

Paper
Code

EnvEdit: Environment Editing for Vision-and-Language Navigation

1 code implementation • CVPR 2022 • Jialu Li, Hao Tan, Mohit Bansal

Training on these edit-augmented environments prevents the agent from overfitting to existing environments and helps generalize better to new, unseen environments.

Ranked #2 on Vision and Language Navigation on RxR (using extra training data)

Data Augmentation Navigate +1

Paper
Code

ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound

1 code implementation • 6 Apr 2022 • Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius

We introduce an audiovisual method for long-range text-to-video retrieval.

Retrieval Text to Video Retrieval +1

Paper
Code

What Can We Learn from Collective Human Opinions on Natural Language Inference Data?

1 code implementation • EMNLP 2020 • Yixin Nie, Xiang Zhou, Mohit Bansal

Analysis reveals that: (1) high human disagreement exists in a noticeable amount of examples in these datasets; (2) the state-of-the-art models lack the ability to recover the distribution over human labels; (3) models achieve near-perfect accuracy on the subset of data with a high level of human agreement, whereas they can barely beat a random guess on the data with low levels of human agreement, which compose most of the common errors made by state-of-the-art models on the evaluation sets.

Natural Language Inference

Paper
Code

Distributed NLI: Learning to Predict Human Opinion Distributions for Language Reasoning

1 code implementation • Findings (ACL) 2022 • Xiang Zhou, Yixin Nie, Mohit Bansal

We introduce distributed NLI, a new NLU task with a goal to predict the distribution of human judgements for natural language inference.

Natural Language Inference

Paper
Code

Improving Generation and Evaluation of Visual Stories via Semantic Consistency

1 code implementation • NAACL 2021 • Adyasha Maharana, Darryl Hannan, Mohit Bansal

Therefore, we also provide an exploration of evaluation metrics for the model, focused on aspects of the generated frames such as the presence/quality of generated characters, the relevance to captions, and the diversity of the generated images.

Image Generation Story Visualization +1

Paper
Code

Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks

1 code implementation • 29 Sep 2023 • Vaidehi Patil, Peter Hase, Mohit Bansal

Experimentally, we show that even state-of-the-art model editing methods such as ROME struggle to truly delete factual information from models like GPT-J, as our whitebox and blackbox attacks can recover "deleted" information from an edited model 38% of the time.

Model Editing

Paper
Code

Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

1 code implementation • 26 Nov 2021 • Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer

In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks.

Paper
Code

MLP Architectures for Vision-and-Language Modeling: An Empirical Study

1 code implementation • 8 Dec 2021 • Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang

Based on this, we ask an even bolder question: can we have an all-MLP architecture for VL modeling, where both VL fusion and the vision encoder are replaced with MLPs?

Language Modelling Visual Question Answering (VQA)

Paper
Code

Paxion: Patching Action Knowledge in Video-Language Foundation Models

1 code implementation • NeurIPS 2023 • Zhenhailong Wang, Ansel Blume, Sha Li, Genglin Liu, Jaemin Cho, Zineng Tang, Mohit Bansal, Heng Ji

Action knowledge involves the understanding of textual, visual, and temporal aspects of actions.

Ranked #19 on Video Question Answering on NExT-QA (using extra training data)

Action Understanding Object Recognition +1

Paper
Code

MTVR: Multilingual Moment Retrieval in Videos

1 code implementation • ACL 2021 • Jie Lei, Tamara L. Berg, Mohit Bansal

We introduce mTVR, a large-scale multilingual video moment retrieval dataset, containing 218K English and Chinese queries from 21.8K TV show video clips.

Moment Retrieval Retrieval

Paper
Code

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

1 code implementation • 21 Oct 2021 • Adyasha Maharana, Mohit Bansal

Prior work in this domain has shown that there is ample room for improvement in the generated image sequence in terms of visual quality, consistency and relevance.

Dense Captioning Image Generation +1

Paper
Code

Integrating Visuospatial, Linguistic, and Commonsense Structure into Story Visualization

1 code implementation • EMNLP 2021 • Adyasha Maharana, Mohit Bansal

Such information is even more important for story visualization since its inputs have an explicit narrative structure that needs to be translated into an image sequence (or visual story).

Dense Captioning Image Generation +1

Paper
Code

Efficiently Summarizing Text and Graph Encodings of Multi-Document Clusters

1 code implementation • NAACL 2021 • Ramakanth Pasunuru, Mengwen Liu, Mohit Bansal, Sujith Ravi, Markus Dreyer

We also show improvements in a transfer-only setup on the DUC-2004 dataset.

Document Summarization Multi-Document Summarization

Paper
Code

Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension

1 code implementation • ACL 2019 • Yichen Jiang, Nitish Joshi, Yen-Chun Chen, Mohit Bansal

Multi-hop reading comprehension requires the model to explore and connect relevant information from multiple sentences/documents in order to answer the question about the context.

Multi-Hop Reading Comprehension Sentence

Paper
Code

Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline

1 code implementation • CoNLL (EMNLP) 2021 • Ori Ernst, Ori Shapira, Ramakanth Pasunuru, Michael Lepioshkin, Jacob Goldberger, Mohit Bansal, Ido Dagan

Aligning sentences in a reference summary with their counterparts in source documents was shown as a useful auxiliary summarization task, notably for generating training data for salience detection.

Clustering Document Summarization +1

Paper
Code

Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees

1 code implementation • 21 Sep 2022 • Swarnadeep Saha, Shiyue Zhang, Peter Hase, Mohit Bansal

We demonstrate that SP-Search effectively represents the generative process behind human summaries using modules that are typically faithful to their intended behavior.

Abstractive Text Summarization Sentence +1

Paper
Code

Polite Dialogue Generation Without Parallel Data

1 code implementation • TACL 2018 • Tong Niu, Mohit Bansal

We present three weakly-supervised models that can generate diverse polite (or rude) dialogue responses without parallel data.

Dialogue Generation Language Modelling +2

Paper
Code

Evaluating the Factual Consistency of Large Language Models Through News Summarization

1 code implementation • 15 Nov 2022 • Derek Tam, Anisha Mascarenhas, Shiyue Zhang, Sarah Kwan, Mohit Bansal, Colin Raffel

To generate summaries that are factually inconsistent, we generate summaries from a suite of summarization models that we have manually annotated as factually inconsistent.

News Summarization

Paper
Code

MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models

1 code implementation • 2 Feb 2024 • Justin Chih-Yao Chen, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal

Experiments on seven widely-used commonsense and math reasoning benchmarks show that MAGDi improves the reasoning capabilities of smaller models, outperforming several methods that distill from a single teacher and multiple teachers.

Language Modelling Large Language Model +1

Paper
Code

ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness

1 code implementation • 21 Apr 2023 • Archiki Prasad, Swarnadeep Saha, Xiang Zhou, Mohit Bansal

Multi-step reasoning ability is fundamental to many natural language tasks, yet it is unclear what constitutes a good reasoning chain and how to evaluate them.

Informativeness Natural Language Inference +1

Paper
Code

RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios

1 code implementation • NAACL (ACL) 2022 • Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Jie Lei, Hyounghun Kim, Rotem Dror, Haoyu Wang, Michael Regan, Qi Zeng, Qing Lyu, Charles Yu, Carl Edwards, Xiaomeng Jin, Yizhu Jiao, Ghazaleh Kazeminejad, Zhenhailong Wang, Chris Callison-Burch, Mohit Bansal, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Martha Palmer, Heng Ji

We introduce RESIN-11, a new schema-guided event extraction & prediction framework that can be applied to a large variety of newsworthy scenarios.

Event Extraction

Paper
Code

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation

1 code implementation • 13 Apr 2023 • Jaemin Cho, Linjie Li, Zhengyuan Yang, Zhe Gan, Lijuan Wang, Mohit Bansal

In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape.

Ranked #1 on Layout-to-Image Generation on LayoutBench

Layout-to-Image Generation

Paper
Code

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization

1 code implementation • 22 Nov 2023 • Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal

Despite the efficiency of PEFT methods, the size of expert models can make it onerous to retrieve expert models per query over high-latency networks like the Internet or serve multiple experts on a single GPU.

Language Modelling Quantization

Paper
Code

Self-Assembling Modular Networks for Interpretable Multi-Hop Reasoning

1 code implementation • IJCNLP 2019 • Yichen Jiang, Mohit Bansal

Multi-hop QA requires a model to connect multiple pieces of evidence scattered in a long context to answer the question.

Paper
Code

D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning

1 code implementation • 11 Oct 2023 • Adyasha Maharana, Prateek Yadav, Mohit Bansal

There are two dominant approaches: (1) geometry-based data selection for maximizing data diversity in the coreset, and (2) functions that assign difficulty scores to samples based on training dynamics.

Paper
Code

Game-Based Video-Context Dialogue

1 code implementation • EMNLP 2018 • Ramakanth Pasunuru, Mohit Bansal

Current dialogue systems focus more on textual and speech context knowledge and are usually based on two speakers.

Retrieval

Paper
Code

PRover: Proof Generation for Interpretable Reasoning over Rules

2 code implementations • EMNLP 2020 • Swarnadeep Saha, Sayan Ghosh, Shashank Srivastava, Mohit Bansal

First, PROVER generates proofs with an accuracy of 87%, while retaining or improving performance on the QA task, compared to RuleTakers (up to 6% improvement on zero-shot evaluation).

valid

Paper
Code

Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Peter Hase, Shiyue Zhang, Harry Xie, Mohit Bansal

We provide code for the experiments in this paper at https://github.com/peterbhase/LAS-NL-Explanations

Explanation Generation

Paper
Code

ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization

1 code implementation • EMNLP 2020 • Shiyue Zhang, Benjamin Frey, Mohit Bansal

To help save this endangered language, we introduce ChrEn, a Cherokee-English parallel dataset, to facilitate machine translation research between Cherokee and English.

Cultural Vocal Bursts Intensity Prediction Language Modelling +5

Paper
Code

ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback

2 code implementations • ACL 2021 • Shiyue Zhang, Benjamin Frey, Mohit Bansal

The quantitative evaluation demonstrates that our backbone translation models achieve state-of-the-art translation performance and our quality estimation correlates well with both BLEU and human judgment.

Machine Translation NMT +3

Paper
Code

CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion

1 code implementation • 8 Feb 2024 • Shoubin Yu, Jaehong Yoon, Mohit Bansal

Furthermore, we propose a fusion module designed to compress multimodal queries, maintaining computational efficiency in the LLM while combining additional modalities.

Ranked #1 on Question Answering on SQA3D

Computational Efficiency Optical Flow Estimation +2

Paper
Code

Diagnosing the Environment Bias in Vision-and-Language Navigation

1 code implementation • 6 May 2020 • Yubo Zhang, Hao Tan, Mohit Bansal

Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations.

Vision and Language Navigation

Paper
Code

The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations

1 code implementation • NeurIPS 2021 • Peter Hase, Harry Xie, Mohit Bansal

In this paper, we study several under-explored dimensions of FI explanations, providing conceptual and empirical improvements for this form of explanation.

counterfactual Feature Importance +2

Paper
Code

DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization

1 code implementation • NAACL 2021 • Zineng Tang, Jie Lei, Mohit Bansal

Second, to alleviate the temporal misalignment issue, our method incorporates an entropy minimization-based constrained attention loss, to encourage the model to automatically focus on the correct caption from a pool of candidate ASR captions.

Question Answering Retrieval +4

Paper
Code

Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models

1 code implementation • 9 Oct 2023 • Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

An increasing number of vision-language tasks can be handled with little to no training, i.e., in a zero and few-shot manner, by marrying large language models (LLMs) to vision encoders, resulting in large vision-language models (LVLMs).

Language Modelling Question Answering +2

Paper
Code

ManyModalQA: Modality Disambiguation and QA over Diverse Inputs

1 code implementation • 22 Jan 2020 • Darryl Hannan, Akshay Jain, Mohit Bansal

By analyzing this model, we investigate which words in the question are indicative of the modality.

Question Answering Transfer Learning

Paper
Code

MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies

1 code implementation • 26 May 2023 • Shiyue Zhang, Shijie Wu, Ozan Irsoy, Steven Lu, Mohit Bansal, Mark Dredze, David Rosenberg

Autoregressive language models are trained by minimizing the cross-entropy of the model distribution Q relative to the data distribution P -- that is, minimizing the forward cross-entropy, which is equivalent to maximum likelihood estimation (MLE).

Paper
Code

Merging by Matching Models in Task Parameter Subspaces

1 code implementation • 7 Dec 2023 • Derek Tam, Mohit Bansal, Colin Raffel

Model merging aims to cheaply combine individual task-specific models into a single multitask model.

Paper
Code

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

1 code implementation • 19 Jan 2024 • Xiyao Wang, YuHang Zhou, Xiaoyu Liu, Hongjin Lu, Yuancheng Xu, Feihong He, Jaehong Yoon, Taixi Lu, Gedas Bertasius, Mohit Bansal, Huaxiu Yao, Furong Huang

However, current MLLM benchmarks are predominantly designed to evaluate reasoning based on static information about a single image, and the ability of modern MLLMs to extrapolate from image sequences, which is essential for understanding our ever-changing world, has been less investigated.

Language Modelling Large Language Model

Paper
Code

What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment

1 code implementation • NAACL 2016 • Hongyuan Mei, Mohit Bansal, Matthew R. Walter

We propose an end-to-end, domain-independent neural encoder-aligner-decoder model for selective generation, i.e., the joint task of content selection and surface realization.

Data-to-Text Generation

Paper
Code

Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models

1 code implementation • CONLL 2018 • Tong Niu, Mohit Bansal

We present two categories of model-agnostic adversarial strategies that reveal the weaknesses of several generative, task-oriented dialogue models: Should-Not-Change strategies that evaluate over-sensitivity to small and semantics-preserving edits, as well as Should-Change strategies that test if a model is over-stable against subtle yet semantics-changing modifications.

Paper
Code

Extractive is not Faithful: An Investigation of Broad Unfaithfulness Problems in Extractive Summarization

1 code implementation • 8 Sep 2022 • Shiyue Zhang, David Wan, Mohit Bansal

Though extractive summarization is less prone to the common unfaithfulness issues of abstractive summaries, does that mean extractive is equal to faithful?

Abstractive Text Summarization Extractive Summarization

Paper
Code

Multimodal Representation Learning by Alternating Unimodal Adaptation

1 code implementation • 17 Nov 2023 • Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao

This optimization process is controlled by a gradient modification mechanism to prevent the shared head from losing previously acquired information.

Representation Learning

Paper
Code

When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data

1 code implementation • LNLS (ACL) 2022 • Peter Hase, Mohit Bansal

In order to carefully control important properties of the data and explanations, we introduce a synthetic dataset for experiments, and we also make use of three existing datasets with explanations: e-SNLI, TACRED, and SemEval.

Retrieval

Paper
Code

CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination

1 code implementation • NAACL 2022 • Hyounghun Kim, Abhay Zala, Mohit Bansal

Next, a counterfactual imagined scene change (in textual form) is applied, and the model has to predict the new response to the initial question based on this scene change.

counterfactual

Paper
Code

Faithfulness-Aware Decoding Strategies for Abstractive Summarization

1 code implementation • 6 Mar 2023 • David Wan, Mengwen Liu, Kathleen McKeown, Markus Dreyer, Mohit Bansal

We present a systematic study of the effect of generation techniques such as beam search and nucleus sampling on faithfulness in abstractive summarization.

Abstractive Text Summarization

Paper
Code

ReGAL: Refactoring Programs to Discover Generalizable Abstractions

1 code implementation • 29 Jan 2024 • Elias Stengel-Eskin, Archiki Prasad, Mohit Bansal

While large language models (LLMs) are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality.

Date Understanding Program Synthesis

Paper
Code

Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

1 code implementation • 2 Mar 2021 • Ramakanth Pasunuru, Asli Celikyilmaz, Michel Galley, Chenyan Xiong, Yizhe Zhang, Mohit Bansal, Jianfeng Gao

The progress in Query-focused Multi-Document Summarization (QMDS) has been limited by the lack of sufficient large-scale high-quality training datasets.

Data Augmentation Document Summarization +1

Paper
Code

Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information

1 code implementation • NAACL 2018 • Trang Tran, Shubham Toshniwal, Mohit Bansal, Kevin Gimpel, Karen Livescu, Mari Ostendorf

In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses.

Sentence

Paper
Code

Finding a Balanced Degree of Automation for Summary Evaluation

1 code implementation • EMNLP 2021 • Shiyue Zhang, Mohit Bansal

In this work, we propose flexible semiautomatic to automatic summary evaluation metrics, following the Pyramid human evaluation method.

Natural Language Inference Semantic Role Labeling

Paper
Code

Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization

1 code implementation • NAACL 2021 • Yichen Jiang, Asli Celikyilmaz, Paul Smolensky, Paul Soulos, Sudha Rao, Hamid Palangi, Roland Fernandez, Caitlin Smith, Mohit Bansal, Jianfeng Gao

On several syntactic and semantic probing tasks, we demonstrate the emergent structural information in the role vectors and improved syntactic interpretability in the TPR layer outputs.

Abstractive Text Summarization

Paper
Code

Continuous Language Generative Flow

1 code implementation • ACL 2021 • Zineng Tang, Shiyue Zhang, Hyounghun Kim, Mohit Bansal

Recent years have witnessed various types of generative models for natural language generation (NLG), especially RNNs or transformer based sequence-to-sequence models, as well as variational autoencoder (VAE) and generative adversarial network (GAN) based models.

Data Augmentation Density Estimation +9

Paper
Code

iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration

1 code implementation • EMNLP (ACL) 2021 • Eran Hirsch, Alon Eirew, Ori Shapira, Avi Caciularu, Arie Cattan, Ori Ernst, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Ido Dagan

We introduce iFacetSum, a web application for exploring topical document sets.

Paper
Code

Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation

1 code implementation • NAACL 2019 • Ori Shapira, David Gabay, Yang Gao, Hadar Ronen, Ramakanth Pasunuru, Mohit Bansal, Yael Amsterdamer, Ido Dagan

Conducting a manual evaluation is considered an essential part of summary evaluation methodology.

Paper
Code

Avoiding Reasoning Shortcuts: Adversarial Evaluation, Training, and Model Development for Multi-Hop QA

1 code implementation • ACL 2019 • Yichen Jiang, Mohit Bansal

After adversarial training, the baseline's performance improves but is still limited on the adversarial evaluation.

Multi-hop Question Answering Question Answering +1

Paper
Code

Towards Robustifying NLI Models Against Lexical Dataset Biases

1 code implementation • ACL 2020 • Xiang Zhou, Mohit Bansal

While deep learning models are making fast progress on the task of Natural Language Inference, recent studies have also shown that these models achieve high accuracy by exploiting several dataset biases, and without deep understanding of the language semantics.

Data Augmentation Natural Language Inference

Paper
Code

ConjNLI: Natural Language Inference Over Conjunctive Sentences

1 code implementation • EMNLP 2020 • Swarnadeep Saha, Yixin Nie, Mohit Bansal

Reasoning about conjuncts in conjunctive sentences is important for a deeper understanding of conjunctions in English and also how their usages and semantics differ from conjunctive and disjunctive boolean logic.

Natural Language Inference

Paper
Code

ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning

1 code implementation • EMNLP 2021 • Swarnadeep Saha, Prateek Yadav, Lisa Bauer, Mohit Bansal

Recent commonsense-reasoning tasks are typically discriminative in nature, where a model answers a multiple-choice question for a certain context.

Graph Generation Multiple-choice +1

Paper
Code

Analysis of Tree-Structured Architectures for Code Generation

1 code implementation • Findings (ACL) 2021 • Samip Dahal, Adyasha Maharana, Mohit Bansal

Code Generation

Paper
Code

Proposition-Level Clustering for Multi-Document Summarization

2 code implementations • NAACL 2022 • Ori Ernst, Avi Caciularu, Ori Shapira, Ramakanth Pasunuru, Mohit Bansal, Jacob Goldberger, Ido Dagan

Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.

Clustering Document Summarization +3

Paper
Code

Explanation Graph Generation via Pre-trained Language Models: An Empirical Study with Contrastive Learning

1 code implementation • ACL 2022 • Swarnadeep Saha, Prateek Yadav, Mohit Bansal

In this work, we study pre-trained language models that generate explanation graphs in an end-to-end manner and analyze their ability to learn the structural constraints and semantics of such graphs.

Contrastive Learning Graph Generation +1

Paper
Code

Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings

1 code implementation • 9 Feb 2024 • Yichen Jiang, Xiang Zhou, Mohit Bansal

Transformers generalize to novel compositions of structures and entities after being trained on a complex dataset, but easily overfit on datasets of insufficient complexity.

Machine Translation Quantization +2

Paper
Code

Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences

1 code implementation • 12 Jun 2015 • Hongyuan Mei, Mohit Bansal, Matthew R. Walter

We propose a neural sequence-to-sequence model for direction following, a task that is essential to realizing effective autonomous agents.

Natural Language Understanding Sentence

Paper
Code

SafeCity: Understanding Diverse Forms of Sexual Harassment Personal Stories

4 code implementations • EMNLP 2018 • Sweta Karlekar, Mohit Bansal

With the recent rise of #MeToo, an increasing number of personal stories about sexual harassment and sexual abuse have been shared online.

Clustering

Paper
Code

Analyzing Compositionality-Sensitivity of NLI Models

1 code implementation • 16 Nov 2018 • Yixin Nie, Yicheng Wang, Mohit Bansal

Therefore, we propose a compositionality-sensitivity testing setup that analyzes models on natural examples from existing datasets that cannot be solved via lexical features alone (i.e., on which a bag-of-words model gives a high probability to one wrong label), hence revealing the models' actual compositionality awareness.

Natural Language Inference

Paper
Code

On Curriculum Learning for Commonsense Reasoning

1 code implementation • NAACL 2022 • Adyasha Maharana, Mohit Bansal

Hence, we examine the effect of a human-like easy-to-difficult curriculum during finetuning of language models for commonsense reasoning tasks.

Learning-To-Rank Natural Language Understanding +1

Paper
Code

GRAVL-BERT: Graphical Visual-Linguistic Representations for Multimodal Coreference Resolution

1 code implementation • COLING 2022 • Danfeng Guo, Arpit Gupta, Sanchit Agarwal, Jiun-Yu Kao, Shuyang Gao, Arijit Biswas, Chien-Wei Lin, Tagyoung Chung, Mohit Bansal

Learning from multimodal data has become a popular research topic in recent years.

coreference-resolution Visual Grounding

Paper
Code

Evaluating and Improving Factuality in Multimodal Abstractive Summarization

1 code implementation • 4 Nov 2022 • David Wan, Mohit Bansal

Current metrics for evaluating factuality for abstractive document summarization have achieved high correlations with human judgment, but they do not account for the vision modality and thus are not adequate for vision-and-language summarization.

Abstractive Text Summarization Document Summarization

Paper
Code

Non-Sequential Graph Script Induction via Multimedia Grounding

1 code implementation • 27 May 2023 • Yu Zhou, Sha Li, Manling Li, Xudong Lin, Shih-Fu Chang, Mohit Bansal, Heng Ji

To automate the induction of such graph scripts for given tasks, we propose to take advantage of loosely aligned videos of people performing the tasks.

Paper
Code

Soft Self-Consistency Improves Language Model Agents

1 code implementation • 20 Feb 2024 • Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers.

Language Modelling valid

Paper
Code

FixMyPose: Pose Correctional Captioning and Retrieval

1 code implementation • 4 Apr 2021 • Hyounghun Kim, Abhay Zala, Graham Burri, Mohit Bansal

During the correctional-captioning task, models must generate descriptions of how to move from the current to target pose image, whereas in the retrieval task, models should select the correct target pose given the initial pose and correctional description.

Pose Retrieval Retrieval

Paper
Code

Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information

1 code implementation • NAACL 2021 • Jialu Li, Hao Tan, Mohit Bansal

One key challenge in this task is to ground instructions with the current visual information that the agent perceives.

Navigate Sentence +1

Paper
Code

CAISE: Conversational Agent for Image Search and Editing

1 code implementation • 24 Feb 2022 • Hyounghun Kim, Doo Soon Kim, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Mohit Bansal

To our knowledge, this is the first dataset that provides conversational image search and editing annotations, where the agent holds a grounded conversation with users and helps them to search and edit images according to their requests.

Image Retrieval

Paper
Code

Generating Summaries with Controllable Readability Levels

1 code implementation • 16 Oct 2023 • Leonardo F. R. Ribeiro, Mohit Bansal, Markus Dreyer

Readability refers to how easily a reader can understand a written text.

Lay Summarization News Summarization +1

Paper
Code

The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions

1 code implementation • EMNLP 2020 • Xiang Zhou, Yixin Nie, Hao Tan, Mohit Bansal

For the first question, we conduct a thorough empirical study over analysis sets and find that in addition to the unstable final performance, the instability exists all along the training curve.

Model Selection Natural Language Inference +1

Paper
Code

Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks

1 code implementation • 30 Sep 2021 • Yichen Jiang, Mohit Bansal

Motivated by the failure of a Transformer model on the SCAN compositionality challenge (Lake and Baroni, 2018), which requires parsing a command into actions, we propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics, as additional training supervision.

Paper
Code

Inducing Transformer’s Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks

1 code implementation • EMNLP 2021 • Yichen Jiang, Mohit Bansal

Paper
Code

GENE: Global Event Network Embedding

1 code implementation • NAACL (TextGraphs) 2021 • Qi Zeng, Manling Li, Tuan Lai, Heng Ji, Mohit Bansal, Hanghang Tong

Current methods for event representation ignore related events in a corpus-level global context.

coreference-resolution Event Coreference Resolution +1

Paper
Code

Debiasing Multimodal Models via Causal Information Minimization

1 code implementation • 28 Nov 2023 • Vaidehi Patil, Adyasha Maharana, Mohit Bansal

In this paper, we study bias arising from confounders in a causal graph for multimodal data and examine a novel approach that leverages causally-motivated information minimization to learn the confounder representations.

Visual Question Answering (VQA)

Paper
Code

Automatically Learning Data Augmentation Policies for Dialogue Tasks

1 code implementation • IJCNLP 2019 • Tong Niu, Mohit Bansal

Automatic data augmentation (AutoAugment) (Cubuk et al., 2019) searches for optimal perturbation policies via a controller trained using performance rewards of a sampled policy on the target task, hence reducing data-level model bias.

Data Augmentation Dialogue Generation +2

Paper
Code

multiPRover: Generating Multiple Proofs for Improved Interpretability in Rule Reasoning

1 code implementation • NAACL 2021 • Swarnadeep Saha, Prateek Yadav, Mohit Bansal

In order to jointly learn from all proof graphs and exploit the correlations between multiple proofs for a question, we pose this task as a set generation problem over structured output spaces where each proof is represented as a directed graph.

Multi-Label Classification

Paper
Code

Extending Multi-Document Summarization Evaluation to the Interactive Setting

1 code implementation • NAACL 2021 • Ori Shapira, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Yael Amsterdamer, Ido Dagan

In this paper, we develop an end-to-end evaluation framework for interactive summarization, focusing on expansion-based interaction, which considers the accumulating information along a user session.

Document Summarization Multi-Document Summarization

Paper
Code

CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations

1 code implementation • Findings (NAACL) 2022 • Jialu Li, Hao Tan, Mohit Bansal

Empirically, on the Room-Across-Room dataset, we show that our multilingual agent gets large improvements in all metrics over the strong baseline model when generalizing to unseen environments with the cross-lingual language representation and the environment-agnostic visual representation.

Navigate Representation Learning +2

Paper
Code

SETSum: Summarization and Visualization of Student Evaluations of Teaching

1 code implementation • NAACL (ACL) 2022 • Yinuo Hu, Shiyue Zhang, Viji Sathy, A. T. Panter, Mohit Bansal

Ten university professors from diverse departments serve as evaluators of the system and all agree that SETSum helps them interpret SET results more efficiently; and 6 out of 10 instructors prefer our system over the standard static PDF report (while the remaining 4 would like to have both).

Aspect Extraction Sentiment Analysis

Paper
Code

Exclusive Supermask Subnetwork Training for Continual Learning

1 code implementation • 18 Oct 2022 • Prateek Yadav, Mohit Bansal

Although there is no forgetting, the performance of SupSup is sub-optimal because fixed weights restrict its representational power.

Continual Learning Text Classification +1

Paper
Code

DAM: Dynamic Adapter Merging for Continual Video QA Learning

1 code implementation • 13 Mar 2024 • Feng Cheng, Ziyang Wang, Yi-Lin Sung, Yan-Bo Lin, Mohit Bansal, Gedas Bertasius

Our DAM model outperforms prior state-of-the-art continual learning approaches by 9.1% while exhibiting 1.9% less forgetting on 6 VidQA datasets spanning various domains.

Continual Learning Image Classification +2

Paper
Code

How can NLP Help Revitalize Endangered Languages? A Case Study and Roadmap for the Cherokee Language

1 code implementation • ACL 2022 • Shiyue Zhang, Ben Frey, Mohit Bansal

We hope that our work serves not only to inform the NLP community about Cherokee, but also to provide inspiration for future work on endangered languages in general.

Paper
Code

Efficient Few-Shot Fine-Tuning for Opinion Summarization

1 code implementation • Findings (NAACL) 2022 • Arthur Bražinskas, Ramesh Nallapati, Mohit Bansal, Markus Dreyer

In the same vein, we pre-train the adapters in a query-based manner on customer reviews and then fine-tune them on annotated datasets.

Abstractive Text Summarization Opinion Summarization

Paper
Code

Interactive Query-Assisted Summarization via Deep Reinforcement Learning

1 code implementation • NAACL 2022 • Ori Shapira, Ramakanth Pasunuru, Mohit Bansal, Ido Dagan, Yael Amsterdamer

Interactive summarization is a task that facilitates user-guided exploration of information within a document set.

Informativeness reinforcement-learning +1

Paper
Code

Are Hard Examples also Harder to Explain? A Study with Human and Model-Generated Explanations

1 code implementation • 14 Nov 2022 • Swarnadeep Saha, Peter Hase, Nazneen Rajani, Mohit Bansal

We observe that (1) GPT-3 explanations are as grammatical as human explanations regardless of the hardness of the test samples, (2) for easy examples, GPT-3 generates highly supportive explanations but human explanations are more generalizable, and (3) for hard examples, human explanations are significantly better than GPT-3 explanations both in terms of label-supportiveness and generalizability judgements.

Paper
Code

Identify, Align, and Integrate: Matching Knowledge Graphs to Commonsense Reasoning Tasks

1 code implementation • EACL 2021 • Lisa Bauer, Mohit Bansal

For knowledge integration to yield peak performance, it is critical to select a knowledge graph (KG) that is well-aligned with the given task's objective.

Knowledge Graphs

Paper
Code

VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives

1 code implementation • 22 Jun 2022 • Zhuofan Ying, Peter Hase, Mohit Bansal

In this paper, we show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason (RRR) metrics by optimizing for four key model objectives: (1) accurate predictions given limited but sufficient information (Sufficiency); (2) max-entropy predictions given no important information (Uncertainty); (3) invariance of predictions to changes in unimportant features (Invariance); and (4) alignment between model FI explanations and human FI explanations (Plausibility).

Feature Importance Question Answering +2

Paper
Code

Masked Part-Of-Speech Model: Does Modeling Long Context Help Unsupervised POS-tagging?

1 code implementation • NAACL 2022 • Xiang Zhou, Shiyue Zhang, Mohit Bansal

MPoSM can model arbitrary tag dependency and perform POS induction through the objective of masked POS reconstruction.

POS POS Tagging +1

Paper
Code

WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models

1 code implementation • 25 Jul 2022 • Yonatan Bitton, Nitzan Bitton Guetta, Ron Yosef, Yuval Elovici, Mohit Bansal, Gabriel Stanovsky, Roy Schwartz

While vision-and-language models perform well on tasks such as visual question answering, they struggle when it comes to basic human commonsense reasoning skills.

Ranked #1 on Common Sense Reasoning on WinoGAViL

Common Sense Reasoning General Knowledge +4

Paper
Code

Continual Few-Shot Learning for Text Classification

1 code implementation • EMNLP 2021 • Ramakanth Pasunuru, Veselin Stoyanov, Mohit Bansal

In this work, we propose a continual few-shot learning (CFL) task, in which a system is challenged with a difficult phenomenon and asked to learn to correct mistakes with only a few (10 to 15) training examples.

continual few-shot learning Few-Shot Learning +4

Paper
Code

NDH-Full: Learning and Evaluating Navigational Agents on Full-Length Dialogue

1 code implementation • EMNLP 2021 • Hyounghun Kim, Jialu Li, Mohit Bansal

In this paper, we explore the Navigation from Dialogue History (NDH) task, which is based on the Cooperative Vision-and-Dialogue Navigation (CVDN) dataset, and present a state-of-the-art model which is built upon Vision-Language transformers.

Data Augmentation Dynamic Time Warping +1

Paper
Code

HistAlign: Improving Context Dependency in Language Generation by Aligning with History

1 code implementation • 8 May 2023 • David Wan, Shiyue Zhang, Mohit Bansal

Cache-LMs, which augment LMs with a memory of recent history, can increase context dependency and have shown remarkable performance in diverse language generation tasks.

Abstractive Text Summarization Text Generation

Paper
Code

Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions

1 code implementation • 1 Nov 2021 • Prateek Yadav, Peter Hase, Mohit Bansal

Current approaches try to optimize for the cost incurred by users when adopting a recourse, but they assume that all users share the same cost function.

Fairness

Paper
Code

Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality

1 code implementation • 28 Nov 2022 • Yichen Jiang, Xiang Zhou, Mohit Bansal

Recent datasets expose the lack of the systematic generalization ability in standard sequence-to-sequence models.

Data Augmentation Inductive Bias +1

Paper
Code

Data Factors for Better Compositional Generalization

1 code implementation • 8 Nov 2023 • Xiang Zhou, Yichen Jiang, Mohit Bansal

However, in contrast to this poor performance, state-of-the-art models trained on larger and more general datasets show better generalization ability.

Memorization

Paper
Code

Multi-Reward Reinforced Summarization with Saliency and Entailment

no code implementations • NAACL 2018 • Ramakanth Pasunuru, Mohit Bansal

Abstractive text summarization is the task of compressing and rewriting a long document into a short summary while maintaining saliency, directed logical entailment, and non-redundancy.

Ranked #41 on Abstractive Text Summarization on CNN / Daily Mail

Abstractive Text Summarization

Paper
Add Code

Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation

no code implementations • ACL 2018 • Han Guo, Ramakanth Pasunuru, Mohit Bansal

An accurate abstractive summary of a document should contain all its salient information and should be logically entailed by the input document.

Ranked #33 on Text Summarization on GigaWord

Abstractive Text Summarization Multi-Task Learning +2

Paper
Add Code

Object Ordering with Bidirectional Matchings for Visual Reasoning

no code implementations • NAACL 2018 • Hao Tan, Mohit Bansal

Visual reasoning with compositional natural language instructions, e.g., based on the newly-released Cornell Natural Language Visual Reasoning (NLVR) dataset, is a challenging task, where the model needs to have the ability to create an accurate mapping between the diverse phrases and the several objects placed in complex arrangements in the image.

Object Visual Reasoning

Paper
Add Code

Robust Machine Comprehension Models via Adversarial Training

no code implementations • NAACL 2018 • Yicheng Wang, Mohit Bansal

It is shown that many published models for the Stanford Question Answering Dataset (Rajpurkar et al., 2016) lack robustness, suffering an over 50% decrease in F1 score during adversarial evaluation based on the AddSent (Jia and Liang, 2017) algorithm.

Data Augmentation Question Answering +1

Paper
Add Code

Detecting Linguistic Characteristics of Alzheimer's Dementia by Interpreting Neural Models

no code implementations • NAACL 2018 • Sweta Karlekar, Tong Niu, Mohit Bansal

More importantly, we next interpret what these neural models have learned about the linguistic characteristics of AD patients, via analysis based on activation clustering and first-derivative saliency techniques.

Clustering

Paper
Add Code

Source-Target Inference Models for Spatial Instruction Understanding

no code implementations • 12 Jul 2017 • Hao Tan, Mohit Bansal

Models that can execute natural language instructions for situated robotic tasks such as assembly and navigation have several useful applications in homes, offices, and remote scenarios.

Position Position regression +2

Paper
Add Code

Hierarchically-Attentive RNN for Album Summarization and Storytelling

no code implementations • EMNLP 2017 • Licheng Yu, Mohit Bansal, Tamara L. Berg

For this task, we make use of the Visual Storytelling dataset and a model composed of three hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album photos, select representative (summary) photos, and compose the story.

Ranked #15 on Visual Storytelling on VIST (BLEU-3 metric)

Retrieval Visual Storytelling

Paper
Add Code

Multi-Task Video Captioning with Video and Entailment Generation

no code implementations • ACL 2017 • Ramakanth Pasunuru, Mohit Bansal

Video captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a challenge, especially given the lack of sufficient annotated data.

Multi-Task Learning Video Captioning +1

Paper
Add Code

Reinforced Video Captioning with Entailment Rewards

no code implementations • EMNLP 2017 • Ramakanth Pasunuru, Mohit Bansal

Sequence-to-sequence models have shown promising improvements on the temporal task of video captioning, but they optimize word-level cross-entropy loss during training.

reinforcement-learning Reinforcement Learning (RL) +2

Paper
Add Code

Video Highlight Prediction Using Audience Chat Reactions

no code implementations • EMNLP 2017 • Cheng-Yang Fu, Joon Lee, Mohit Bansal, Alexander C. Berg

Sports channel video portals offer an exciting domain for research on multimodal, multilingual analysis.

Paper
Add Code

The Role of Context Types and Dimensionality in Learning Word Embeddings

no code implementations • NAACL 2016 • Oren Melamud, David McClosky, Siddharth Patwardhan, Mohit Bansal

We provide the first extensive evaluation of how using different types of context to learn skip-gram word embeddings affects performance on a wide range of intrinsic and extrinsic NLP tasks.

Learning Word Embeddings

Paper
Add Code

Contextual RNN-GANs for Abstract Reasoning Diagram Generation

no code implementations • 29 Sep 2016 • Arnab Ghosh, Viveka Kulharia, Amitabha Mukerjee, Vinay Namboodiri, Mohit Bansal

Understanding, predicting, and generating object motions and transformations is a core problem in artificial intelligence.

Generative Adversarial Network Video Generation

Paper
Add Code

Coherent Dialogue with Attention-based Language Models

no code implementations • 21 Nov 2016 • Hongyuan Mei, Mohit Bansal, Matthew R. Walter

We model coherent conversation continuation via RNN-based dialogue models equipped with a dynamic attention mechanism.

Language Modelling

Paper
Add Code

Sort Story: Sorting Jumbled Images and Captions into Stories

no code implementations • EMNLP 2016 • Harsh Agrawal, Arjun Chandrasekaran, Dhruv Batra, Devi Parikh, Mohit Bansal

Temporal common sense has applications in AI tasks such as QA, multi-document summarization, and human-AI communication.

Common Sense Reasoning Document Summarization +2

Paper
Add Code

Navigational Instruction Generation as Inverse Reinforcement Learning with Neural Machine Translation

no code implementations • 11 Oct 2016 • Andrea F. Daniele, Mohit Bansal, Matthew R. Walter

We first decide which information to share with the user according to their preferences, using a policy trained from human demonstrations via inverse reinforcement learning.

Machine Translation Navigate +3

Paper
Add Code

Interpreting Neural Networks to Improve Politeness Comprehension

no code implementations • EMNLP 2016 • Malika Aubakirova, Mohit Bansal

We present an interpretable neural network approach to predicting and understanding politeness in natural language requests.

Paper
Add Code

Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions

no code implementations • EMNLP 2016 • Arijit Ray, Gordon Christie, Mohit Bansal, Dhruv Batra, Devi Parikh

We introduce the novel problem of determining the relevance of questions to images in VQA.

Question Answering Question Similarity +1

Paper
Add Code

Who did What: A Large-Scale Person-Centered Cloze Dataset

no code implementations • EMNLP 2016 • Takeshi Onishi, Hai Wang, Mohit Bansal, Kevin Gimpel, David McAllester

We have constructed a new "Who-did-What" dataset of over 200,000 fill-in-the-gap (cloze) multiple choice reading comprehension problems constructed from the LDC English Gigaword newswire corpus.

Multiple-choice Reading Comprehension

Paper
Add Code

Charagram: Embedding Words and Sentences via Character n-grams

no code implementations • EMNLP 2016 • John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu

We present Charagram embeddings, a simple approach for learning character-based compositional models to embed textual sequences.

Part-Of-Speech Tagging Sentence +2

Paper
Add Code

Learning Articulated Motion Models from Visual and Lingual Signals

no code implementations • 17 Nov 2015 • Zhengyang Wu, Mohit Bansal, Matthew R. Walter

In this paper, we present a multimodal learning framework that incorporates both visual and lingual information to estimate the structure and parameters that define kinematic models of articulated objects.

Language Modelling Word Embeddings

Paper
Add Code

Mapping Unseen Words to Task-Trained Embedding Spaces

no code implementations • WS 2016 • Pranava Swaroop Madhyastha, Mohit Bansal, Kevin Gimpel, Karen Livescu

We consider the supervised training setting in which we learn task-specific word embeddings.

Dependency Parsing Sentiment Analysis +1

Paper
Add Code

We Are Humor Beings: Understanding and Predicting Visual Humor

no code implementations • CVPR 2016 • Arjun Chandrasekaran, Ashwin K. Vijayakumar, Stanislaw Antol, Mohit Bansal, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh

We collect two datasets of abstract scenes that facilitate the study of humor at both the scene-level and the object-level.

Paper
Add Code

Towards Universal Paraphrastic Sentence Embeddings

no code implementations • 25 Nov 2015 • John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu

We again find that the word averaging models perform well for sentence similarity and entailment, outperforming LSTMs.

General Classification Sentence +4

Paper
Add Code

Accurate Vision-based Vehicle Localization using Satellite Imagery

no code implementations • 30 Oct 2015 • Hang Chu, Hongyuan Mei, Mohit Bansal, Matthew R. Walter

We propose a method for accurately localizing ground vehicles with the aid of satellite imagery.

Visual Localization

Paper
Add Code

From Paraphrase Database to Compositional Paraphrase Model and Back

1 code implementation • TACL 2015 • John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu, Dan Roth

The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive semantic resource, consisting of a list of phrase pairs with (heuristic) confidence estimates.

Word Embeddings

Paper
Code

Web-scale Surface and Syntactic n-gram Features for Dependency Parsing

no code implementations • 25 Feb 2015 • Dominick Ng, Mohit Bansal, James R. Curran

We develop novel first- and second-order features for dependency parsing based on the Google Syntactic Ngrams corpus, a collection of subtree counts of parsed sentences from scanned books.

Dependency Parsing

Paper
Add Code

Dynamic Multi-Level Multi-Task Learning for Sentence Simplification

no code implementations • COLING 2018 • Han Guo, Ramakanth Pasunuru, Mohit Bansal

In this work, we first present a strong pointer-copy mechanism based sequence-to-sequence sentence simplification model, and then improve its entailment and paraphrasing capabilities via multi-task learning with related auxiliary tasks of entailment and paraphrase generation.

Ranked #2 on Text Simplification on Newsela

Multi-Task Learning Paraphrase Generation +3

Paper
Add Code
