Search Results for author: Mohit Bansal

Found 269 papers, 177 papers with code

Multi-Reward Reinforced Summarization with Saliency and Entailment

no code implementations NAACL 2018 Ramakanth Pasunuru, Mohit Bansal

Abstractive text summarization is the task of compressing and rewriting a long document into a short summary while maintaining saliency, directed logical entailment, and non-redundancy.

Abstractive Text Summarization

Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation

no code implementations ACL 2018 Han Guo, Ramakanth Pasunuru, Mohit Bansal

An accurate abstractive summary of a document should contain all its salient information and should be logically entailed by the input document.

Abstractive Text Summarization Decoder +3

Object Ordering with Bidirectional Matchings for Visual Reasoning

no code implementations NAACL 2018 Hao Tan, Mohit Bansal

Visual reasoning with compositional natural language instructions, e.g., based on the newly-released Cornell Natural Language Visual Reasoning (NLVR) dataset, is a challenging task, where the model needs to have the ability to create an accurate mapping between the diverse phrases and the several objects placed in complex arrangements in the image.

Object Visual Reasoning

Robust Machine Comprehension Models via Adversarial Training

no code implementations NAACL 2018 Yicheng Wang, Mohit Bansal

It is shown that many published models for the Stanford Question Answering Dataset (Rajpurkar et al., 2016) lack robustness, suffering an over 50% decrease in F1 score during adversarial evaluation based on the AddSent (Jia and Liang, 2017) algorithm.

Data Augmentation Question Answering +1

Detecting Linguistic Characteristics of Alzheimer's Dementia by Interpreting Neural Models

no code implementations NAACL 2018 Sweta Karlekar, Tong Niu, Mohit Bansal

More importantly, we next interpret what these neural models have learned about the linguistic characteristics of AD patients, via analysis based on activation clustering and first-derivative saliency techniques.

Clustering

Source-Target Inference Models for Spatial Instruction Understanding

no code implementations 12 Jul 2017 Hao Tan, Mohit Bansal

Models that can execute natural language instructions for situated robotic tasks such as assembly and navigation have several useful applications in homes, offices, and remote scenarios.

Position Position regression +2

Hierarchically-Attentive RNN for Album Summarization and Storytelling

no code implementations EMNLP 2017 Licheng Yu, Mohit Bansal, Tamara L. Berg

For this task, we make use of the Visual Storytelling dataset and a model composed of three hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album photos, select representative (summary) photos, and compose the story.

Ranked #15 on Visual Storytelling on VIST (BLEU-3 metric)

Retrieval Visual Storytelling

Multi-Task Video Captioning with Video and Entailment Generation

no code implementations ACL 2017 Ramakanth Pasunuru, Mohit Bansal

Video captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a challenge, especially given the lack of sufficient annotated data.

Decoder Multi-Task Learning +2

Reinforced Video Captioning with Entailment Rewards

no code implementations EMNLP 2017 Ramakanth Pasunuru, Mohit Bansal

Sequence-to-sequence models have shown promising improvements on the temporal task of video captioning, but they optimize word-level cross-entropy loss during training.

reinforcement-learning Reinforcement Learning (RL) +2

Video Highlight Prediction Using Audience Chat Reactions

no code implementations EMNLP 2017 Cheng-Yang Fu, Joon Lee, Mohit Bansal, Alexander C. Berg

Sports channel video portals offer an exciting domain for research on multimodal, multilingual analysis.

The Role of Context Types and Dimensionality in Learning Word Embeddings

no code implementations NAACL 2016 Oren Melamud, David McClosky, Siddharth Patwardhan, Mohit Bansal

We provide the first extensive evaluation of how using different types of context to learn skip-gram word embeddings affects performance on a wide range of intrinsic and extrinsic NLP tasks.

Learning Word Embeddings

Contextual RNN-GANs for Abstract Reasoning Diagram Generation

no code implementations 29 Sep 2016 Arnab Ghosh, Viveka Kulharia, Amitabha Mukerjee, Vinay Namboodiri, Mohit Bansal

Understanding, predicting, and generating object motions and transformations is a core problem in artificial intelligence.

Generative Adversarial Network Video Generation

Coherent Dialogue with Attention-based Language Models

no code implementations 21 Nov 2016 Hongyuan Mei, Mohit Bansal, Matthew R. Walter

We model coherent conversation continuation via RNN-based dialogue models equipped with a dynamic attention mechanism.

Language Modelling

Navigational Instruction Generation as Inverse Reinforcement Learning with Neural Machine Translation

no code implementations 11 Oct 2016 Andrea F. Daniele, Mohit Bansal, Matthew R. Walter

We first decide which information to share with the user according to their preferences, using a policy trained from human demonstrations via inverse reinforcement learning.

Machine Translation Navigate +3

Interpreting Neural Networks to Improve Politeness Comprehension

no code implementations EMNLP 2016 Malika Aubakirova, Mohit Bansal

We present an interpretable neural network approach to predicting and understanding politeness in natural language requests.

Who did What: A Large-Scale Person-Centered Cloze Dataset

no code implementations EMNLP 2016 Takeshi Onishi, Hai Wang, Mohit Bansal, Kevin Gimpel, David McAllester

We have constructed a new "Who-did-What" dataset of over 200,000 fill-in-the-gap (cloze) multiple choice reading comprehension problems constructed from the LDC English Gigaword newswire corpus.

Multiple-choice Reading Comprehension

Charagram: Embedding Words and Sentences via Character n-grams

no code implementations EMNLP 2016 John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu

We present Charagram embeddings, a simple approach for learning character-based compositional models to embed textual sequences.

Part-Of-Speech Tagging Sentence +2

Learning Articulated Motion Models from Visual and Lingual Signals

no code implementations 17 Nov 2015 Zhengyang Wu, Mohit Bansal, Matthew R. Walter

In this paper, we present a multimodal learning framework that incorporates both visual and lingual information to estimate the structure and parameters that define kinematic models of articulated objects.

Language Modelling Word Embeddings

Towards Universal Paraphrastic Sentence Embeddings

no code implementations 25 Nov 2015 John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu

We again find that the word averaging models perform well for sentence similarity and entailment, outperforming LSTMs.

General Classification Sentence +4

Accurate Vision-based Vehicle Localization using Satellite Imagery

no code implementations 30 Oct 2015 Hang Chu, Hongyuan Mei, Mohit Bansal, Matthew R. Walter

We propose a method for accurately localizing ground vehicles with the aid of satellite imagery.

Visual Localization

From Paraphrase Database to Compositional Paraphrase Model and Back

1 code implementation TACL 2015 John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu, Dan Roth

The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive semantic resource, consisting of a list of phrase pairs with (heuristic) confidence estimates.

Word Embeddings

Web-scale Surface and Syntactic n-gram Features for Dependency Parsing

no code implementations 25 Feb 2015 Dominick Ng, Mohit Bansal, James R. Curran

We develop novel first- and second-order features for dependency parsing based on the Google Syntactic Ngrams corpus, a collection of subtree counts of parsed sentences from scanned books.

Dependency Parsing

Dynamic Multi-Level Multi-Task Learning for Sentence Simplification

no code implementations COLING 2018 Han Guo, Ramakanth Pasunuru, Mohit Bansal

In this work, we first present a strong pointer-copy mechanism based sequence-to-sequence sentence simplification model, and then improve its entailment and paraphrasing capabilities via multi-task learning with related auxiliary tasks of entailment and paraphrase generation.

Multi-Task Learning Paraphrase Generation +3

Closed-Book Training to Improve Summarization Encoder Memory

no code implementations EMNLP 2018 Yichen Jiang, Mohit Bansal

A good neural sequence-to-sequence summarization model should have a strong encoder that can distill and memorize the important information from long input texts so that the decoder can generate salient summaries based on the encoder's memory.

Abstractive Text Summarization Decoder +1

Incorporating Background Knowledge into Video Description Generation

no code implementations EMNLP 2018 Spencer Whitehead, Heng Ji, Mohit Bansal, Shih-Fu Chang, Clare Voss

We develop an approach that uses video meta-data to retrieve topically related news documents for a video and extracts the events and named entities from these documents.

Decoder Text Generation +2

Towards Improving Abstractive Summarization via Entailment Generation

no code implementations WS 2017 Ramakanth Pasunuru, Han Guo, Mohit Bansal

Abstractive summarization, the task of rewriting and compressing a document into a short summary, has achieved considerable success with neural sequence-to-sequence models.

Abstractive Text Summarization Decoder +3

Efficient Generation of Motion Plans from Attribute-Based Natural Language Instructions Using Dynamic Constraint Mapping

no code implementations 8 Jul 2017 Jae Sung Park, Biao Jia, Mohit Bansal, Dinesh Manocha

We generate a factor graph from natural language instructions called the Dynamic Grounding Graph (DGG), which takes latent parameters into account.

Robotics

AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning

no code implementations NAACL 2019 Han Guo, Ramakanth Pasunuru, Mohit Bansal

To address these issues, we present AutoSeM, a two-stage MTL pipeline, where the first stage automatically selects the most useful auxiliary tasks via a Beta-Bernoulli multi-armed bandit with Thompson Sampling, and the second stage learns the training mixing ratio of these selected auxiliary tasks via a Gaussian Process based Bayesian optimization framework.

Bayesian Optimization Inductive Bias +2

Enabling Robots to Understand Incomplete Natural Language Instructions Using Commonsense Reasoning

no code implementations 29 Apr 2019 Haonan Chen, Hao Tan, Alan Kuntz, Mohit Bansal, Ron Alterovitz

Our results show the feasibility of a robot learning commonsense knowledge automatically from web-based textual corpora, and the power of learned commonsense reasoning models in enabling a robot to autonomously perform tasks based on incomplete natural language instructions.

Common Sense Reasoning Language Modelling

Improving Visual Question Answering by Referring to Generated Paragraph Captions

no code implementations ACL 2019 Hyounghun Kim, Mohit Bansal

These paragraph captions can hence contain substantial information of the image for tasks such as visual question answering.

Decoder Image Captioning +3

Multi-Source Domain Adaptation for Text Classification via DistanceNet-Bandits

no code implementations 13 Jan 2020 Han Guo, Ramakanth Pasunuru, Mohit Bansal

Next, we develop a DistanceNet model which uses these distance measures, or a mixture of these distance measures, as an additional loss function to be minimized jointly with the task's loss function, so as to achieve better unsupervised domain adaptation.

General Classification Sentiment Analysis +3

AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses

no code implementations 15 Jan 2020 Tong Niu, Mohit Bansal

In our work, we build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.

Feature Engineering

Modality-Balanced Models for Visual Dialogue

no code implementations 17 Jan 2020 Hyounghun Kim, Hao Tan, Mohit Bansal

The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response to the dialogue.

Visual Dialog

Simple Compounded-Label Training for Fact Extraction and Verification

no code implementations WS 2020 Yixin Nie, Lisa Bauer, Mohit Bansal

Automatic fact checking is an important task motivated by the need for detecting and preventing the spread of misinformation across the web.

Claim Verification Fact Checking +4

Evaluating Interactive Summarization: an Expansion-Based Framework

no code implementations 17 Sep 2020 Ori Shapira, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Yael Amsterdamer, Ido Dagan

Allowing users to interact with multi-document summarizers is a promising direction towards improving and customizing summary results.

ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments

no code implementations Findings of the Association for Computational Linguistics 2020 Hyounghun Kim, Abhay Zala, Graham Burri, Hao Tan, Mohit Bansal

During this task, the agent (similar to a PokeMON GO player) is asked to find and collect different target objects one-by-one by navigating based on natural language instructions in a complex, realistic outdoor environment, but then also ARRAnge the collected objects part-by-part in an egocentric grid-layout environment.

Referring Expression Referring Expression Comprehension +1

DORB: Dynamically Optimizing Multiple Rewards with Bandits

no code implementations EMNLP 2020 Ramakanth Pasunuru, Han Guo, Mohit Bansal

Further, it is important to consider using a dynamic combination and curriculum of metric rewards that flexibly changes over time.

Data-to-Text Generation Question Generation +1

I like fish, especially dolphins: Addressing Contradictions in Dialogue Modeling

no code implementations ACL 2021 Yixin Nie, Mary Williamson, Mohit Bansal, Douwe Kiela, Jason Weston

To quantify how well natural language understanding models can capture consistency in a general conversation, we introduce the DialoguE COntradiction DEtection task (DECODE) and a new conversational dataset containing both human-human and human-bot contradictory dialogues.

Natural Language Understanding

To what extent do human explanations of model behavior align with actual model behavior?

no code implementations EMNLP (BlackboxNLP) 2021 Grusha Prasad, Yixin Nie, Mohit Bansal, Robin Jia, Douwe Kiela, Adina Williams

Given the increasingly prominent role NLP models (will) play in our lives, it is important for human expectations of model behavior to align with actual model behavior.

Natural Language Inference

Dual Reinforcement-Based Specification Generation for Image De-Rendering

no code implementations 2 Mar 2021 Ramakanth Pasunuru, David Rosenberg, Gideon Mann, Mohit Bansal

Since these are sequence models, we must choose an ordering of the objects in the graphics programs for likelihood training.

Decoder Inductive Bias

Hidden Biases in Unreliable News Detection Datasets

no code implementations EACL 2021 Xiang Zhou, Heba Elfardy, Christos Christodoulopoulos, Thomas Butler, Mohit Bansal

Using the observations and experimental results, we provide practical suggestions on how to create more reliable datasets for the unreliable news detection task.

Fact Checking Selection bias

An Empirical Survey of Data Augmentation for Limited Data Learning in NLP

no code implementations 14 Jun 2021 Jiaao Chen, Derek Tam, Colin Raffel, Mohit Bansal, Diyi Yang

NLP has achieved great progress in the past decade through the use of neural models and large labeled datasets.

Data Augmentation News Classification +1

An Overview of Uncertainty Calibration for Text Classification and the Role of Distillation

no code implementations ACL (RepL4NLP) 2021 Han Guo, Ramakanth Pasunuru, Mohit Bansal

Many recalibration methods have been proposed in the literature for quantifying predictive uncertainty and calibrating model outputs, with varying degrees of complexity.

text-classification Text Classification

Learning and Analyzing Generation Order for Undirected Sequence Models

1 code implementation Findings (EMNLP) 2021 Yichen Jiang, Mohit Bansal

On examples with a maximum source and target length of 30 from De-En, WMT'16 English-Romanian, and WMT'21 English-Chinese translation tasks, our learned order outperforms all heuristic generation orders on four out of six tasks.

Machine Translation Translation

Analyzing the Limits of Self-Supervision in Handling Bias in Language

no code implementations 16 Dec 2021 Lisa Bauer, Karthik Gopalakrishnan, Spandana Gella, Yang Liu, Mohit Bansal, Dilek Hakkani-Tur

We define three broad classes of task descriptions for these tasks: statement, question, and completion, with numerous lexical variants within each class.

On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets

no code implementations insights (ACL) 2022 Hyounghun Kim, Aishwarya Padmakumar, Di Jin, Mohit Bansal, Dilek Hakkani-Tur

Natural language guided embodied task completion is a challenging problem since it requires understanding natural language instructions, aligning them with egocentric visual observations, and choosing appropriate actions to execute in the environment to produce desired changes.

Enhanced Knowledge Selection for Grounded Dialogues via Document Semantic Graphs

no code implementations 15 Jun 2022 Sha Li, Mahdi Namazifar, Di Jin, Mohit Bansal, Heng Ji, Yang Liu, Dilek Hakkani-Tur

Providing conversation models with background knowledge has been shown to make open-domain dialogues more informative and engaging.

Multi-Task Learning Response Generation +1

Multimodal Intent Discovery from Livestream Videos

no code implementations Findings (NAACL) 2022 Adyasha Maharana, Quan Tran, Franck Dernoncourt, Seunghyun Yoon, Trung Bui, Walter Chang, Mohit Bansal

We construct and present a new multimodal dataset consisting of software instructional livestreams and containing manual annotations for both detailed and abstract procedural intent that enable training and evaluation of joint video and text understanding models.

Intent Discovery Video Summarization +1

Enhancing Knowledge Selection for Grounded Dialogues via Document Semantic Graphs

no code implementations NAACL 2022 Sha Li, Mahdi Namazifar, Di Jin, Mohit Bansal, Heng Ji, Yang Liu, Dilek Hakkani-Tur

In this work, we propose to automatically convert the background knowledge documents into document semantic graphs and then perform knowledge selection over such graphs.

Multi-Task Learning Response Generation +1

GraDA: Graph Generative Data Augmentation for Commonsense Reasoning

1 code implementation COLING 2022 Adyasha Maharana, Mohit Bansal

Recent advances in commonsense reasoning have been fueled by the availability of large-scale human annotated datasets.

Data Augmentation Knowledge Graphs

Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

1 code implementation 28 Mar 2023 Adyasha Maharana, Amita Kamath, Christopher Clark, Mohit Bansal, Aniruddha Kembhavi

As general purpose vision models get increasingly effective at a wide set of tasks, it is imperative that they be consistent across the tasks they support.

Improving Vision-and-Language Navigation by Generating Future-View Image Semantics

no code implementations CVPR 2023 Jialu Li, Mohit Bansal

We then fine-tune the agent on the VLN task with an auxiliary loss that minimizes the difference between the view semantics generated by the agent and the ground truth view semantics of the next step.

Image Generation Navigate +3

Visual Programming for Text-to-Image Generation and Evaluation

no code implementations 24 May 2023 Jaemin Cho, Abhay Zala, Mohit Bansal

First, we introduce VPGen, an interpretable step-by-step T2I generation framework that decomposes T2I generation into three steps: object/count generation, layout generation, and image generation.

Text-to-Image Generation World Knowledge

On Conditional and Compositional Language Model Differentiable Prompting

no code implementations 4 Jul 2023 Jonathan Pilault, Can Liu, Mohit Bansal, Markus Dreyer

Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks.

Few-Shot Learning Language Modelling +1

VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning

no code implementations 26 Sep 2023 Han Lin, Abhay Zala, Jaemin Cho, Mohit Bansal

Our experiments demonstrate that the VideoDirectorGPT framework substantially improves layout and movement control in both single- and multi-scene video generation and can generate multi-scene videos with visual consistency across scenes, while achieving competitive performance with SOTAs in open-domain single-scene T2V generation.

Image Generation Video Generation

ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models

no code implementations 4 Oct 2023 Yi-Lin Sung, Jaehong Yoon, Mohit Bansal

We first determine the sparsity ratios of different layers or blocks by leveraging the global importance score, which is efficiently computed based on the zeroth-order approximation of the global model gradients.

Model Compression

DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning

no code implementations 18 Oct 2023 Abhay Zala, Han Lin, Jaemin Cho, Mohit Bansal

In the first stage, we use LLMs to generate and iteratively refine 'diagram plans' (in a planner-auditor feedback loop) which describe all the entities (objects and text labels), their relationships (arrows or lines), and their bounding box layouts.

Branch-Solve-Merge Improves Large Language Model Evaluation and Generation

no code implementations 23 Oct 2023 Swarnadeep Saha, Omer Levy, Asli Celikyilmaz, Mohit Bansal, Jason Weston, Xian Li

Large Language Models (LLMs) are frequently used for multi-faceted language generation and evaluation tasks that involve satisfying intricate user constraints or taking into account multiple aspects and criteria.

Language Modelling Large Language Model +1

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

no code implementations 27 Oct 2023 Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above.

Question Answering Question Generation +3

ADaPT: As-Needed Decomposition and Planning with Language Models

no code implementations 8 Nov 2023 Archiki Prasad, Alexander Koller, Mareike Hartmann, Peter Clark, Ashish Sabharwal, Mohit Bansal, Tushar Khot

Large Language Models (LLMs) are increasingly being used for interactive decision-making tasks requiring planning and adapting to the environment.

Decision Making

CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation

no code implementations 30 Nov 2023 Zineng Tang, ZiYi Yang, Mahmoud Khademi, Yang Liu, Chenguang Zhu, Mohit Bansal

We present CoDi-2, a versatile and interactive Multimodal Large Language Model (MLLM) that can follow complex multimodal interleaved instructions, conduct in-context learning (ICL), reason, chat, edit, etc., in an any-to-any input-output modality paradigm.

Image Generation In-Context Learning +3

VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation

no code implementations 5 Feb 2024 Jialu Li, Aishwarya Padmakumar, Gaurav Sukhatme, Mohit Bansal

Outdoor Vision-and-Language Navigation (VLN) requires an agent to navigate through realistic 3D outdoor environments based on natural language instructions.

Language Modelling Masked Language Modeling +2

Evaluating Very Long-Term Conversational Memory of LLM Agents

no code implementations 27 Feb 2024 Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang

Using this pipeline, we collect LoCoMo, a dataset of very long-term conversations, each encompassing 300 turns and 9K tokens on avg., over up to 35 sessions.

Avg Multi-modal Dialogue Generation +1

Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training

no code implementations 4 Mar 2024 David Wan, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal

Highlighting particularly relevant regions of an image can improve the performance of vision-language models (VLMs) on various vision-language (VL) tasks by guiding the model to attend more closely to these regions of interest.

Math Phrase Grounding +2

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

no code implementations 11 Mar 2024 Jialu Li, Jaemin Cho, Yi-Lin Sung, Jaehong Yoon, Mohit Bansal

In this paper, we introduce SELMA: Skill-Specific Expert Learning and Merging with Auto-Generated Data, a novel paradigm to improve the faithfulness of T2I models by fine-tuning models on automatically generated, multi-skill image-text datasets, with skill-specific expert learning and merging.

In-Context Learning

EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents

no code implementations 18 Mar 2024 Abhay Zala, Jaemin Cho, Han Lin, Jaehong Yoon, Mohit Bansal

Instead of directly employing LLMs as agents, can we use LLMs' reasoning capabilities to adaptively create training environments to help smaller embodied RL agents learn useful skills that they are weak at?

Reinforcement Learning (RL) World Knowledge

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

no code implementations 15 Apr 2024 Han Lin, Jaemin Cho, Abhay Zala, Mohit Bansal

Ctrl-Adapter provides diverse capabilities including image control, video control, video control with sparse frames, multi-condition control, compatibility with different backbones, adaptation to unseen control conditions, and video editing.

Image Generation Video Editing +1

Data Factors for Better Compositional Generalization

1 code implementation 8 Nov 2023 Xiang Zhou, Yichen Jiang, Mohit Bansal

However, in contrast to this poor performance, state-of-the-art models trained on larger and more general datasets show better generalization ability.

Memorization

Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions

1 code implementation 1 Nov 2021 Prateek Yadav, Peter Hase, Mohit Bansal

Current approaches try to optimize for the cost incurred by users when adopting a recourse, but they assume that all users share the same cost function.

Fairness

Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality

1 code implementation 28 Nov 2022 Yichen Jiang, Xiang Zhou, Mohit Bansal

Recent datasets expose the lack of the systematic generalization ability in standard sequence-to-sequence models.

Data Augmentation Inductive Bias +1

Continual Few-Shot Learning for Text Classification

1 code implementation EMNLP 2021 Ramakanth Pasunuru, Veselin Stoyanov, Mohit Bansal

In this work, we propose a continual few-shot learning (CFL) task, in which a system is challenged with a difficult phenomenon and asked to learn to correct mistakes with only a few (10 to 15) training examples.

continual few-shot learning Few-Shot Learning +4

NDH-Full: Learning and Evaluating Navigational Agents on Full-Length Dialogue

1 code implementation EMNLP 2021 Hyounghun Kim, Jialu Li, Mohit Bansal

In this paper, we explore the Navigation from Dialogue History (NDH) task, which is based on the Cooperative Vision-and-Dialogue Navigation (CVDN) dataset, and present a state-of-the-art model which is built upon Vision-Language transformers.

Data Augmentation Dynamic Time Warping +1

HistAlign: Improving Context Dependency in Language Generation by Aligning with History

1 code implementation 8 May 2023 David Wan, Shiyue Zhang, Mohit Bansal

Cache-LMs, which augment LMs with a memory of recent history, can increase context dependency and have shown remarkable performance in diverse language generation tasks.

Abstractive Text Summarization Text Generation

Identify, Align, and Integrate: Matching Knowledge Graphs to Commonsense Reasoning Tasks

1 code implementation EACL 2021 Lisa Bauer, Mohit Bansal

For knowledge integration to yield peak performance, it is critical to select a knowledge graph (KG) that is well-aligned with the given task's objective.

Knowledge Graphs

VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives

1 code implementation 22 Jun 2022 Zhuofan Ying, Peter Hase, Mohit Bansal

In this paper, we show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason (RRR) metrics by optimizing for four key model objectives: (1) accurate predictions given limited but sufficient information (Sufficiency); (2) max-entropy predictions given no important information (Uncertainty); (3) invariance of predictions to changes in unimportant features (Invariance); and (4) alignment between model FI explanations and human FI explanations (Plausibility).

Feature Importance Question Answering +2

Masked Part-Of-Speech Model: Does Modeling Long Context Help Unsupervised POS-tagging?

1 code implementation NAACL 2022 Xiang Zhou, Shiyue Zhang, Mohit Bansal

MPoSM can model arbitrary tag dependency and perform POS induction through the objective of masked POS reconstruction.

POS POS Tagging +1

WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models

1 code implementation 25 Jul 2022 Yonatan Bitton, Nitzan Bitton Guetta, Ron Yosef, Yuval Elovici, Mohit Bansal, Gabriel Stanovsky, Roy Schwartz

While vision-and-language models perform well on tasks such as visual question answering, they struggle when it comes to basic human commonsense reasoning skills.

Common Sense Reasoning General Knowledge +4

How can NLP Help Revitalize Endangered Languages? A Case Study and Roadmap for the Cherokee Language

1 code implementation ACL 2022 Shiyue Zhang, Ben Frey, Mohit Bansal

We hope that our work serves not only to inform the NLP community about Cherokee, but also to provide inspiration for future work on endangered languages in general.

Are Hard Examples also Harder to Explain? A Study with Human and Model-Generated Explanations

1 code implementation 14 Nov 2022 Swarnadeep Saha, Peter Hase, Nazneen Rajani, Mohit Bansal

We observe that (1) GPT-3 explanations are as grammatical as human explanations regardless of the hardness of the test samples, (2) for easy examples, GPT-3 generates highly supportive explanations but human explanations are more generalizable, and (3) for hard examples, human explanations are significantly better than GPT-3 explanations both in terms of label-supportiveness and generalizability judgements.

Automatically Learning Data Augmentation Policies for Dialogue Tasks

1 code implementation IJCNLP 2019 Tong Niu, Mohit Bansal

Automatic data augmentation (AutoAugment) (Cubuk et al., 2019) searches for optimal perturbation policies via a controller trained using performance rewards of a sampled policy on the target task, hence reducing data-level model bias.

Data Augmentation Dialogue Generation +2

multiPRover: Generating Multiple Proofs for Improved Interpretability in Rule Reasoning

1 code implementation NAACL 2021 Swarnadeep Saha, Prateek Yadav, Mohit Bansal

In order to jointly learn from all proof graphs and exploit the correlations between multiple proofs for a question, we pose this task as a set generation problem over structured output spaces where each proof is represented as a directed graph.

Multi-Label Classification

Extending Multi-Document Summarization Evaluation to the Interactive Setting

1 code implementation NAACL 2021 Ori Shapira, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Yael Amsterdamer, Ido Dagan

In this paper, we develop an end-to-end evaluation framework for interactive summarization, focusing on expansion-based interaction, which considers the accumulating information along a user session.

Document Summarization Multi-Document Summarization

CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations

1 code implementation Findings (NAACL) 2022 Jialu Li, Hao Tan, Mohit Bansal

Empirically, on the Room-Across-Room dataset, we show that our multilingual agent gets large improvements in all metrics over the strong baseline model when generalizing to unseen environments with the cross-lingual language representation and the environment-agnostic visual representation.

Navigate Representation Learning +2

SETSum: Summarization and Visualization of Student Evaluations of Teaching

1 code implementation NAACL (ACL) 2022 Yinuo Hu, Shiyue Zhang, Viji Sathy, A. T. Panter, Mohit Bansal

Ten university professors from diverse departments serve as evaluators of the system and all agree that SETSum helps them interpret SET results more efficiently; and 6 out of 10 instructors prefer our system over the standard static PDF report (while the remaining 4 would like to have both).

Aspect Extraction Sentiment Analysis

Exclusive Supermask Subnetwork Training for Continual Learning

1 code implementation 18 Oct 2022 Prateek Yadav, Mohit Bansal

Although there is no forgetting, the performance of SupSup is sub-optimal because fixed weights restrict its representational power.

Continual Learning Text Classification +1

The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions

1 code implementation EMNLP 2020 Xiang Zhou, Yixin Nie, Hao Tan, Mohit Bansal

For the first question, we conduct a thorough empirical study over analysis sets and find that in addition to the unstable final performance, the instability exists all along the training curve.

Model Selection Natural Language Inference +1

Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks

1 code implementation 30 Sep 2021 Yichen Jiang, Mohit Bansal

Motivated by the failure of a Transformer model on the SCAN compositionality challenge (Lake and Baroni, 2018), which requires parsing a command into actions, we propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics, as additional training supervision.

Inducing Transformer’s Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks

1 code implementation EMNLP 2021 Yichen Jiang, Mohit Bansal

Motivated by the failure of a Transformer model on the SCAN compositionality challenge (Lake and Baroni, 2018), which requires parsing a command into actions, we propose two auxiliary sequence prediction tasks as additional training supervision.

Debiasing Multimodal Models via Causal Information Minimization

1 code implementation 28 Nov 2023 Vaidehi Patil, Adyasha Maharana, Mohit Bansal

In this paper, we study bias arising from confounders in a causal graph for multimodal data and examine a novel approach that leverages causally-motivated information minimization to learn the confounder representations.

Visual Question Answering (VQA)

FixMyPose: Pose Correctional Captioning and Retrieval

1 code implementation 4 Apr 2021 Hyounghun Kim, Abhay Zala, Graham Burri, Mohit Bansal

During the correctional-captioning task, models must generate descriptions of how to move from the current to target pose image, whereas in the retrieval task, models should select the correct target pose given the initial pose and correctional description.

Pose Retrieval Retrieval

Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information

1 code implementation NAACL 2021 Jialu Li, Hao Tan, Mohit Bansal

One key challenge in this task is to ground instructions with the current visual information that the agent perceives.

Navigate Sentence +1

CAISE: Conversational Agent for Image Search and Editing

1 code implementation 24 Feb 2022 Hyounghun Kim, Doo Soon Kim, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Mohit Bansal

To our knowledge, this is the first dataset that provides conversational image search and editing annotations, where the agent holds a grounded conversation with users and helps them to search and edit images according to their requests.

Image Retrieval

DAM: Dynamic Adapter Merging for Continual Video QA Learning

1 code implementation 13 Mar 2024 Feng Cheng, Ziyang Wang, Yi-Lin Sung, Yan-Bo Lin, Mohit Bansal, Gedas Bertasius

Our DAM model outperforms prior state-of-the-art continual learning approaches by 9.1% while exhibiting 1.9% less forgetting on 6 VidQA datasets spanning various domains.

Continual Learning Image Classification +2

Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences

1 code implementation 12 Jun 2015 Hongyuan Mei, Mohit Bansal, Matthew R. Walter

We propose a neural sequence-to-sequence model for direction following, a task that is essential to realizing effective autonomous agents.

Decoder Natural Language Understanding +1

SafeCity: Understanding Diverse Forms of Sexual Harassment Personal Stories

4 code implementations EMNLP 2018 Sweta Karlekar, Mohit Bansal

With the recent rise of #MeToo, an increasing number of personal stories about sexual harassment and sexual abuse have been shared online.

Clustering

Analyzing Compositionality-Sensitivity of NLI Models

1 code implementation 16 Nov 2018 Yixin Nie, Yicheng Wang, Mohit Bansal

Therefore, we propose a compositionality-sensitivity testing setup that analyzes models on natural examples from existing datasets that cannot be solved via lexical features alone (i.e., on which a bag-of-words model gives a high probability to one wrong label), hence revealing the models' actual compositionality awareness.

Natural Language Inference

On Curriculum Learning for Commonsense Reasoning

1 code implementation NAACL 2022 Adyasha Maharana, Mohit Bansal

Hence, we examine the effect of a human-like easy-to-difficult curriculum during finetuning of language models for commonsense reasoning tasks.

Learning-To-Rank Natural Language Understanding +1

Evaluating and Improving Factuality in Multimodal Abstractive Summarization

1 code implementation 4 Nov 2022 David Wan, Mohit Bansal

Current metrics for evaluating factuality for abstractive document summarization have achieved high correlations with human judgment, but they do not account for the vision modality and thus are not adequate for vision-and-language summarization.

Abstractive Text Summarization Document Summarization

Non-Sequential Graph Script Induction via Multimedia Grounding

1 code implementation 27 May 2023 Yu Zhou, Sha Li, Manling Li, Xudong Lin, Shih-Fu Chang, Mohit Bansal, Heng Ji

To automate the induction of such graph scripts for given tasks, we propose to take advantage of loosely aligned videos of people performing the tasks.

Soft Self-Consistency Improves Language Model Agents

1 code implementation 20 Feb 2024 Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers.

Language Modelling valid
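
As a rough illustration of the majority-voting step that the snippet above attributes to self-consistency (SC), the sketch below picks the most frequent answer among several sampled answers. It is not the paper's code: the function name `majority_vote` and the example answers are hypothetical, and ties fall to whichever answer was sampled first.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer and the fraction of samples that agree with it.

    `answers` is a list of final answers, one per sampled generation.
    Ties are broken in favor of the answer that appears first in the list.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


# Hypothetical answers from five sampled generations.
samples = ["42", "41", "42", "42", "7"]
answer, share = majority_vote(samples)
print(answer, share)
```

Exact-match voting of this kind presupposes that answers can be compared for equality, which is why it is usually applied to short, discrete final answers.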

Towards Robustifying NLI Models Against Lexical Dataset Biases

1 code implementation ACL 2020 Xiang Zhou, Mohit Bansal

While deep learning models are making fast progress on the task of Natural Language Inference, recent studies have also shown that these models achieve high accuracy by exploiting several dataset biases, and without deep understanding of the language semantics.

Data Augmentation Natural Language Inference

ConjNLI: Natural Language Inference Over Conjunctive Sentences

1 code implementation EMNLP 2020 Swarnadeep Saha, Yixin Nie, Mohit Bansal

Reasoning about conjuncts in conjunctive sentences is important for a deeper understanding of conjunctions in English and also how their usages and semantics differ from conjunctive and disjunctive boolean logic.

Natural Language Inference

ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning

1 code implementation EMNLP 2021 Swarnadeep Saha, Prateek Yadav, Lisa Bauer, Mohit Bansal

Recent commonsense-reasoning tasks are typically discriminative in nature, where a model answers a multiple-choice question for a certain context.

Graph Generation Multiple-choice +1

Proposition-Level Clustering for Multi-Document Summarization

2 code implementations NAACL 2022 Ori Ernst, Avi Caciularu, Ori Shapira, Ramakanth Pasunuru, Mohit Bansal, Jacob Goldberger, Ido Dagan

Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.

Clustering Document Summarization +3

Explanation Graph Generation via Pre-trained Language Models: An Empirical Study with Contrastive Learning

1 code implementation ACL 2022 Swarnadeep Saha, Prateek Yadav, Mohit Bansal

In this work, we study pre-trained language models that generate explanation graphs in an end-to-end manner and analyze their ability to learn the structural constraints and semantics of such graphs.

Contrastive Learning Graph Generation +1

Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings

1 code implementation 9 Feb 2024 Yichen Jiang, Xiang Zhou, Mohit Bansal

Transformers generalize to novel compositions of structures and entities after being trained on a complex dataset, but easily overfit on datasets of insufficient complexity.

Machine Translation Quantization +2

Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization

1 code implementation NAACL 2021 Yichen Jiang, Asli Celikyilmaz, Paul Smolensky, Paul Soulos, Sudha Rao, Hamid Palangi, Roland Fernandez, Caitlin Smith, Mohit Bansal, Jianfeng Gao

On several syntactic and semantic probing tasks, we demonstrate the emergent structural information in the role vectors and improved syntactic interpretability in the TPR layer outputs.

Abstractive Text Summarization

Continuous Language Generative Flow

1 code implementation ACL 2021 Zineng Tang, Shiyue Zhang, Hyounghun Kim, Mohit Bansal

Recent years have witnessed various types of generative models for natural language generation (NLG), especially RNNs or transformer based sequence-to-sequence models, as well as variational autoencoder (VAE) and generative adversarial network (GAN) based models.

Data Augmentation Density Estimation +9

Finding a Balanced Degree of Automation for Summary Evaluation

1 code implementation EMNLP 2021 Shiyue Zhang, Mohit Bansal

In this work, we propose flexible semiautomatic to automatic summary evaluation metrics, following the Pyramid human evaluation method.

Natural Language Inference Semantic Role Labeling

Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

1 code implementation 2 Mar 2021 Ramakanth Pasunuru, Asli Celikyilmaz, Michel Galley, Chenyan Xiong, Yizhe Zhang, Mohit Bansal, Jianfeng Gao

The progress in Query-focused Multi-Document Summarization (QMDS) has been limited by the lack of sufficient large-scale high-quality training datasets.

Data Augmentation Document Summarization +1

When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data

1 code implementation LNLS (ACL) 2022 Peter Hase, Mohit Bansal

In order to carefully control important properties of the data and explanations, we introduce a synthetic dataset for experiments, and we also make use of three existing datasets with explanations: e-SNLI, TACRED, and SemEval.

Retrieval

CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination

1 code implementation NAACL 2022 Hyounghun Kim, Abhay Zala, Mohit Bansal

Next, a counterfactual imagined scene change (in textual form) is applied, and the model has to predict the new response to the initial question based on this scene change.

counterfactual

Faithfulness-Aware Decoding Strategies for Abstractive Summarization

1 code implementation 6 Mar 2023 David Wan, Mengwen Liu, Kathleen McKeown, Markus Dreyer, Mohit Bansal

We present a systematic study of the effect of generation techniques such as beam search and nucleus sampling on faithfulness in abstractive summarization.

Abstractive Text Summarization
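
For readers unfamiliar with nucleus (top-p) sampling, one of the generation techniques named in the snippet above, here is a minimal sketch over a toy next-token distribution. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `top_p` default, and the four-token vocabulary are all hypothetical.

```python
import random


def nucleus_filter(probs, top_p=0.9):
    """Keep the smallest set of most-probable tokens whose total mass reaches top_p.

    `probs` maps token -> probability. The kept probabilities are renormalized
    so that they sum to 1.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        mass += p
        if mass >= top_p:
            break
    return {token: p / mass for token, p in kept}


def nucleus_sample(probs, top_p=0.9, rng=random):
    """Draw one token from the renormalized nucleus."""
    filtered = nucleus_filter(probs, top_p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy next-token distribution (hypothetical values).
next_token_probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(nucleus_filter(next_token_probs, top_p=0.75))
print(nucleus_sample(next_token_probs, top_p=0.75))
```

Beam search, by contrast, is deterministic and keeps the highest-scoring partial sequences at every step, whereas the sampler above only truncates the low-probability tail before drawing.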

ReGAL: Refactoring Programs to Discover Generalizable Abstractions

1 code implementation 29 Jan 2024 Elias Stengel-Eskin, Archiki Prasad, Mohit Bansal

While large language models (LLMs) are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality.

Date Understanding Program Synthesis

What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment

1 code implementation NAACL 2016 Hongyuan Mei, Mohit Bansal, Matthew R. Walter

We propose an end-to-end, domain-independent neural encoder-aligner-decoder model for selective generation, i.e., the joint task of content selection and surface realization.

Data-to-Text Generation Decoder

Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models

1 code implementation CONLL 2018 Tong Niu, Mohit Bansal

We present two categories of model-agnostic adversarial strategies that reveal the weaknesses of several generative, task-oriented dialogue models: Should-Not-Change strategies that evaluate over-sensitivity to small and semantics-preserving edits, as well as Should-Change strategies that test if a model is over-stable against subtle yet semantics-changing modifications.

Extractive is not Faithful: An Investigation of Broad Unfaithfulness Problems in Extractive Summarization

1 code implementation 8 Sep 2022 Shiyue Zhang, David Wan, Mohit Bansal

Though extractive summarization is less prone to the common unfaithfulness issues of abstractive summaries, does that mean extractive is equal to faithful?

Abstractive Text Summarization Extractive Summarization

ManyModalQA: Modality Disambiguation and QA over Diverse Inputs

1 code implementation 22 Jan 2020 Darryl Hannan, Akshay Jain, Mohit Bansal

By analyzing this model, we investigate which words in the question are indicative of the modality.

Question Answering Transfer Learning

MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies

1 code implementation 26 May 2023 Shiyue Zhang, Shijie Wu, Ozan Irsoy, Steven Lu, Mohit Bansal, Mark Dredze, David Rosenberg

Autoregressive language models are trained by minimizing the cross-entropy of the model distribution Q relative to the data distribution P -- that is, minimizing the forward cross-entropy, which is equivalent to maximum likelihood estimation (MLE).
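
The equivalence stated in the snippet above can be written out with the standard definitions. The notation here is an assumption, not taken from the listing: $Q_\theta$ is the model with parameters $\theta$, $x_1, \dots, x_N$ are training samples drawn from $P$, and $\hat{P}_N$ is their empirical distribution. The last line gives the usual definition of the reverse direction named in the title, for comparison only.

```latex
\begin{align}
H(P, Q_\theta) &= -\mathbb{E}_{x \sim P}\left[\log Q_\theta(x)\right]
  \approx -\frac{1}{N}\sum_{i=1}^{N} \log Q_\theta(x_i) = H(\hat{P}_N, Q_\theta), \\
\hat{\theta}_{\mathrm{MLE}} &= \arg\max_\theta \frac{1}{N}\sum_{i=1}^{N} \log Q_\theta(x_i)
  = \arg\min_\theta H(\hat{P}_N, Q_\theta), \\
H(Q_\theta, P) &= -\mathbb{E}_{x \sim Q_\theta}\left[\log P(x)\right].
\end{align}
```

Maximizing the average log-likelihood and minimizing the sample estimate of the forward cross-entropy differ only by a sign, so they select the same $\theta$.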

Merging by Matching Models in Task Parameter Subspaces

1 code implementation 7 Dec 2023 Derek Tam, Mohit Bansal, Colin Raffel

Model merging aims to cheaply combine individual task-specific models into a single multitask model.

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

1 code implementation 19 Jan 2024 Xiyao Wang, YuHang Zhou, Xiaoyu Liu, Hongjin Lu, Yuancheng Xu, Feihong He, Jaehong Yoon, Taixi Lu, Gedas Bertasius, Mohit Bansal, Huaxiu Yao, Furong Huang

However, current MLLM benchmarks are predominantly designed to evaluate reasoning based on static information about a single image, and the ability of modern MLLMs to extrapolate from image sequences, which is essential for understanding our ever-changing world, has been less investigated.

Language Modelling Large Language Model

Diagnosing the Environment Bias in Vision-and-Language Navigation

1 code implementation 6 May 2020 Yubo Zhang, Hao Tan, Mohit Bansal

Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations.

Vision and Language Navigation

The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations

1 code implementation NeurIPS 2021 Peter Hase, Harry Xie, Mohit Bansal

In this paper, we study several under-explored dimensions of FI explanations, providing conceptual and empirical improvements for this form of explanation.

counterfactual Feature Importance +2

DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization

1 code implementation NAACL 2021 Zineng Tang, Jie Lei, Mohit Bansal

Second, to alleviate the temporal misalignment issue, our method incorporates an entropy minimization-based constrained attention loss, to encourage the model to automatically focus on the correct caption from a pool of candidate ASR captions.

Question Answering Retrieval +4

Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models

1 code implementation 9 Oct 2023 Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

An increasing number of vision-language tasks can be handled with little to no training, i.e., in a zero and few-shot manner, by marrying large language models (LLMs) to vision encoders, resulting in large vision-language models (LVLMs).

Language Modelling Question Answering +2

Game-Based Video-Context Dialogue

1 code implementation EMNLP 2018 Ramakanth Pasunuru, Mohit Bansal

Current dialogue systems focus more on textual and speech context knowledge and are usually based on two speakers.

Retrieval

PRover: Proof Generation for Interpretable Reasoning over Rules

2 code implementations EMNLP 2020 Swarnadeep Saha, Sayan Ghosh, Shashank Srivastava, Mohit Bansal

First, PRover generates proofs with an accuracy of 87%, while retaining or improving performance on the QA task, compared to RuleTakers (up to 6% improvement on zero-shot evaluation).

valid

ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization

1 code implementation EMNLP 2020 Shiyue Zhang, Benjamin Frey, Mohit Bansal

To help save this endangered language, we introduce ChrEn, a Cherokee-English parallel dataset, to facilitate machine translation research between Cherokee and English.

Cultural Vocal Bursts Intensity Prediction Language Modelling +5

ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback

2 code implementations ACL 2021 Shiyue Zhang, Benjamin Frey, Mohit Bansal

The quantitative evaluation demonstrates that our backbone translation models achieve state-of-the-art translation performance and our quality estimation correlates well with both BLEU and human judgment.

Machine Translation NMT +3

CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion

1 code implementation 8 Feb 2024 Shoubin Yu, Jaehong Yoon, Mohit Bansal

Furthermore, we propose a fusion module designed to compress multimodal queries, maintaining computational efficiency in the LLM while combining additional modalities.

Computational Efficiency Optical Flow Estimation +2

Self-Assembling Modular Networks for Interpretable Multi-Hop Reasoning

1 code implementation IJCNLP 2019 Yichen Jiang, Mohit Bansal

Multi-hop QA requires a model to connect multiple pieces of evidence scattered in a long context to answer the question.

D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning

1 code implementation 11 Oct 2023 Adyasha Maharana, Prateek Yadav, Mohit Bansal

There are two dominant approaches: (1) geometry-based data selection for maximizing data diversity in the coreset, and (2) functions that assign difficulty scores to samples based on training dynamics.

Multimodal Representation Learning by Alternating Unimodal Adaptation

1 code implementation 17 Nov 2023 Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao

This optimization process is controlled by a gradient modification mechanism to prevent the shared head from losing previously acquired information.

Representation Learning

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation

1 code implementation 13 Apr 2023 Jaemin Cho, Linjie Li, Zhengyuan Yang, Zhe Gan, Lijuan Wang, Mohit Bansal

In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape.

Layout-to-Image Generation

ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness

1 code implementation 21 Apr 2023 Archiki Prasad, Swarnadeep Saha, Xiang Zhou, Mohit Bansal

Multi-step reasoning ability is fundamental to many natural language tasks, yet it is unclear what constitutes a good reasoning chain and how to evaluate them.

Informativeness Natural Language Inference +1

Polite Dialogue Generation Without Parallel Data

1 code implementation TACL 2018 Tong Niu, Mohit Bansal

We present three weakly-supervised models that can generate diverse polite (or rude) dialogue responses without parallel data.

Decoder Dialogue Generation +3

Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees

1 code implementation 21 Sep 2022 Swarnadeep Saha, Shiyue Zhang, Peter Hase, Mohit Bansal

We demonstrate that SP-Search effectively represents the generative process behind human summaries using modules that are typically faithful to their intended behavior.

Abstractive Text Summarization Sentence +1

Evaluating the Factual Consistency of Large Language Models Through News Summarization

1 code implementation 15 Nov 2022 Derek Tam, Anisha Mascarenhas, Shiyue Zhang, Sarah Kwan, Mohit Bansal, Colin Raffel

To generate summaries that are factually inconsistent, we generate summaries from a suite of summarization models that we have manually annotated as factually inconsistent.

News Summarization

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization

1 code implementation 22 Nov 2023 Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal

Despite the efficiency of PEFT methods, the size of expert models can make it onerous to retrieve expert models per query over high-latency networks like the Internet or serve multiple experts on a single GPU.

Language Modelling Quantization

Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension

1 code implementation ACL 2019 Yichen Jiang, Nitish Joshi, Yen-Chun Chen, Mohit Bansal

Multi-hop reading comprehension requires the model to explore and connect relevant information from multiple sentences/documents in order to answer the question about the context.

Multi-Hop Reading Comprehension Sentence

Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline

1 code implementation CoNLL (EMNLP) 2021 Ori Ernst, Ori Shapira, Ramakanth Pasunuru, Michael Lepioshkin, Jacob Goldberger, Mohit Bansal, Ido Dagan

Aligning sentences in a reference summary with their counterparts in source documents was shown as a useful auxiliary summarization task, notably for generating training data for salience detection.

Clustering Document Summarization +1

MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models

1 code implementation 2 Feb 2024 Justin Chih-Yao Chen, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal

Experiments on seven widely-used commonsense and math reasoning benchmarks show that MAGDi improves the reasoning capabilities of smaller models, outperforming several methods that distill from a single teacher and multiple teachers.

Language Modelling Large Language Model +1

mTVR: Multilingual Moment Retrieval in Videos

1 code implementation ACL 2021 Jie Lei, Tamara L. Berg, Mohit Bansal

We introduce mTVR, a large-scale multilingual video moment retrieval dataset, containing 218K English and Chinese queries from 21.8K TV show video clips.

Moment Retrieval Retrieval

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

1 code implementation 21 Oct 2021 Adyasha Maharana, Mohit Bansal

Prior work in this domain has shown that there is ample room for improvement in the generated image sequence in terms of visual quality, consistency and relevance.

Dense Captioning Image Generation +1

Integrating Visuospatial, Linguistic, and Commonsense Structure into Story Visualization

1 code implementation EMNLP 2021 Adyasha Maharana, Mohit Bansal

Such information is even more important for story visualization since its inputs have an explicit narrative structure that needs to be translated into an image sequence (or visual story).

Dense Captioning Image Generation +1

Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

1 code implementation 26 Nov 2021 Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer

In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks.

MLP Architectures for Vision-and-Language Modeling: An Empirical Study

1 code implementation 8 Dec 2021 Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang

Based on this, we ask an even bolder question: can we have an all-MLP architecture for VL modeling, where both VL fusion and the vision encoder are replaced with MLPs?

Language Modelling Visual Question Answering (VQA)

What Can We Learn from Collective Human Opinions on Natural Language Inference Data?

1 code implementation EMNLP 2020 Yixin Nie, Xiang Zhou, Mohit Bansal

Analysis reveals that: (1) high human disagreement exists in a noticeable amount of examples in these datasets; (2) the state-of-the-art models lack the ability to recover the distribution over human labels; (3) models achieve near-perfect accuracy on the subset of data with a high level of human agreement, whereas they can barely beat a random guess on the data with low levels of human agreement, which compose most of the common errors made by state-of-the-art models on the evaluation sets.

Natural Language Inference

Distributed NLI: Learning to Predict Human Opinion Distributions for Language Reasoning

1 code implementation Findings (ACL) 2022 Xiang Zhou, Yixin Nie, Mohit Bansal

We introduce distributed NLI, a new NLU task with a goal to predict the distribution of human judgements for natural language inference.

Natural Language Inference

Improving Generation and Evaluation of Visual Stories via Semantic Consistency

1 code implementation NAACL 2021 Adyasha Maharana, Darryl Hannan, Mohit Bansal

Therefore, we also provide an exploration of evaluation metrics for the model, focused on aspects of the generated frames such as the presence/quality of generated characters, the relevance to captions, and the diversity of the generated images.

Image Generation Story Visualization +1

Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks

1 code implementation 29 Sep 2023 Vaidehi Patil, Peter Hase, Mohit Bansal

Experimentally, we show that even state-of-the-art model editing methods such as ROME struggle to truly delete factual information from models like GPT-J, as our whitebox and blackbox attacks can recover "deleted" information from an edited model 38% of the time.

Model Editing

EnvEdit: Environment Editing for Vision-and-Language Navigation

1 code implementation CVPR 2022 Jialu Li, Hao Tan, Mohit Bansal

Training on these edit-augmented environments prevents the agent from overfitting to existing environments and helps generalize better to new, unseen environments.

Ranked #2 on Vision and Language Navigation on RxR (using extra training data)

Data Augmentation Navigate +1

Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention

1 code implementation 21 Nov 2022 Zineng Tang, Jaemin Cho, Jie Lei, Mohit Bansal

We present Perceiver-VL, a vision-and-language framework that efficiently handles high-dimensional multimodal inputs such as long videos and text.

Cross-Modal Retrieval Language Modelling +1

Punny Captions: Witty Wordplay in Image Descriptions

1 code implementation NAACL 2018 Arjun Chandrasekaran, Devi Parikh, Mohit Bansal

Wit is a form of rich interaction that is often grounded in a specific situation (e.g., a comment in response to an event).

Decoder

A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

2 code implementations CVPR 2017 Licheng Yu, Hao Tan, Mohit Bansal, Tamara L. Berg

The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions.

Referring Expression Referring Expression Comprehension

An Empirical Study of Multimodal Model Merging

1 code implementation 28 Apr 2023 Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal, Lijuan Wang

In this paper, we expand on this concept to a multimodal setup by merging transformers trained on different modalities.

Retrieval Visual Question Answering (VQA)
