Search Results for author: Mohit Bansal

Found 156 papers, 90 papers with code

An Overview of Uncertainty Calibration for Text Classification and the Role of Distillation

no code implementations ACL (RepL4NLP) 2021 Han Guo, Ramakanth Pasunuru, Mohit Bansal

Many recalibration methods have been proposed in the literature for quantifying predictive uncertainty and calibrating model outputs, with varying degrees of complexity.

Text Classification

Integrating Visuospatial, Linguistic, and Commonsense Structure into Story Visualization

1 code implementation EMNLP 2021 Adyasha Maharana, Mohit Bansal

Such information is even more important for story visualization since its inputs have an explicit narrative structure that needs to be translated into an image sequence (or visual story).

Fine-tuning Image Generation +1

NDH-Full: Learning and Evaluating Navigational Agents on Full-Length Dialogue

1 code implementation EMNLP 2021 Hyounghun Kim, Jialu Li, Mohit Bansal

In this paper, we explore the Navigation from Dialogue History (NDH) task, which is based on the Cooperative Vision-and-Dialogue Navigation (CVDN) dataset, and present a state-of-the-art model which is built upon Vision-Language transformers.

Curriculum Learning Data Augmentation +1

Inducing Transformer’s Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks

1 code implementation EMNLP 2021 Yichen Jiang, Mohit Bansal

Motivated by the failure of a Transformer model on the SCAN compositionality challenge (Lake and Baroni, 2018), which requires parsing a command into actions, we propose two auxiliary sequence prediction tasks as additional training supervision.

Continual Few-Shot Learning for Text Classification

1 code implementation EMNLP 2021 Ramakanth Pasunuru, Veselin Stoyanov, Mohit Bansal

In this work, we propose a continual few-shot learning (CFL) task, in which a system is challenged with a difficult phenomenon and asked to learn to correct mistakes with only a few (10 to 15) training examples.

Classification Few-Shot Learning +3

Learning and Analyzing Generation Order for Undirected Sequence Models

no code implementations Findings (EMNLP) 2021 Yichen Jiang, Mohit Bansal

In this work, we train a policy that learns the generation order for a pre-trained, undirected translation model via reinforcement learning.

Machine Translation Translation

Detecting Moments and Highlights in Videos via Natural Language Queries

1 code implementation NeurIPS 2021 Jie Lei, Tamara Berg, Mohit Bansal

Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t.

Moment Retrieval

Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

1 code implementation 26 Nov 2021 Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer

In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks.

Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions

1 code implementation 1 Nov 2021 Prateek Yadav, Peter Hase, Mohit Bansal

We propose an objective function, Expected Minimum Cost (EMC), based on two key ideas: (1) when presenting a set of options to a user, it is vital that there is at least one low-cost solution the user could adopt; (2) when we do not know the user's true cost function, we can approximately optimize for user satisfaction by first sampling plausible cost functions, then finding a set that achieves a good cost for the user in expectation.

Fairness
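
The EMC objective in the abstract above can be read as an average, over sampled cost functions, of the cheapest option in the presented set. The sketch below illustrates that reading only and is not the authors' implementation. The function name `expected_min_cost`, the scalar options, and the absolute-distance cost functions are all hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical type: a cost function maps an option (here a scalar) to a cost.
CostFn = Callable[[float], float]


def expected_min_cost(options: Sequence[float], cost_fns: Sequence[CostFn]) -> float:
    """Average, over sampled cost functions, of the cheapest option in the set."""
    return sum(min(c(o) for o in options) for c in cost_fns) / len(cost_fns)


# Assumed toy setup: each sampled cost function is the distance to a preferred value.
cost_fns = [lambda x, t=t: abs(x - t) for t in (1.0, 2.0, 4.0)]

emc_single = expected_min_cost([2.0], cost_fns)     # one option offered
emc_pair = expected_min_cost([1.0, 4.0], cost_fns)  # two options offered
```

In this toy setup the two-option set scores lower than the single option, because each sampled cost function finds at least one cheap choice in the set.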

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

1 code implementation 21 Oct 2021 Adyasha Maharana, Mohit Bansal

Prior work in this domain has shown that there is ample room for improvement in the generated image sequence in terms of visual quality, consistency and relevance.

Fine-tuning Image Generation +1

Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks

1 code implementation 30 Sep 2021 Yichen Jiang, Mohit Bansal

Motivated by the failure of a Transformer model on the SCAN compositionality challenge (Lake and Baroni, 2018), which requires parsing a command into actions, we propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics, as additional training supervision.

Finding a Balanced Degree of Automation for Summary Evaluation

1 code implementation EMNLP 2021 Shiyue Zhang, Mohit Bansal

In this work, we propose flexible semiautomatic to automatic summary evaluation metrics, following the Pyramid human evaluation method.

Natural Language Inference Semantic Role Labeling

Continuous Language Generative Flow

1 code implementation ACL 2021 Zineng Tang, Shiyue Zhang, Hyounghun Kim, Mohit Bansal

Recent years have witnessed various types of generative models for natural language generation (NLG), especially RNNs or transformer based sequence-to-sequence models, as well as variational autoencoder (VAE) and generative adversarial network (GAN) based models.

Data Augmentation Density Estimation +6

MTVR: Multilingual Moment Retrieval in Videos

1 code implementation ACL 2021 Jie Lei, Tamara L. Berg, Mohit Bansal

We introduce mTVR, a large-scale multilingual video moment retrieval dataset, containing 218K English and Chinese queries from 21.8K TV show video clips.

Moment Retrieval

EmailSum: Abstractive Email Thread Summarization

1 code implementation ACL 2021 Shiyue Zhang, Asli Celikyilmaz, Jianfeng Gao, Mohit Bansal

Furthermore, we find that widely used automatic evaluation metrics (ROUGE, BERTScore) are weakly correlated with human judgments on this email thread summarization task.

Abstractive Text Summarization Email Thread Summarization

ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback

2 code implementations ACL 2021 Shiyue Zhang, Benjamin Frey, Mohit Bansal

The quantitative evaluation demonstrates that our backbone translation models achieve state-of-the-art translation performance and our quality estimation correlates well with both BLEU and human judgment.

Machine Translation Translation +1

QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

1 code implementation 20 Jul 2021 Jie Lei, Tamara L. Berg, Mohit Bansal

Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t.

Moment Retrieval

How Much Can CLIP Benefit Vision-and-Language Tasks?

2 code implementations 13 Jul 2021 Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Zhewei Yao, Kurt Keutzer

Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using a relatively small set of manually-annotated data (as compared to web-crawled data), to perceive the visual world.

Ranked #2 on Visual Entailment on SNLI-VE val (using extra training data)

Fine-tuning Question Answering +2

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer

1 code implementation NeurIPS 2021 Zineng Tang, Jaemin Cho, Hao Tan, Mohit Bansal

We train a multi-modal teacher model on a video-text dataset, and then transfer its knowledge to a student language model with a text dataset.

Image Retrieval Knowledge Distillation +5

VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning

1 code implementation 21 Jun 2021 Hao Tan, Jie Lei, Thomas Wolf, Mohit Bansal

Unlike language, where the text tokens are more independent, neighboring video tokens typically have strong correlations (e.g., consecutive video frames usually look very similar), and hence uniformly masking individual tokens will make the task too trivial to learn useful representations.

Action Classification Action Recognition +2

An Empirical Survey of Data Augmentation for Limited Data Learning in NLP

no code implementations 14 Jun 2021 Jiaao Chen, Derek Tam, Colin Raffel, Mohit Bansal, Diyi Yang

NLP has achieved great progress in the past decade through the use of neural models and large labeled datasets.

Data Augmentation News Classification

multiPRover: Generating Multiple Proofs for Improved Interpretability in Rule Reasoning

1 code implementation NAACL 2021 Swarnadeep Saha, Prateek Yadav, Mohit Bansal

In order to jointly learn from all proof graphs and exploit the correlations between multiple proofs for a question, we pose this task as a set generation problem over structured output spaces where each proof is represented as a directed graph.

Multi-Label Classification

Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization

1 code implementation NAACL 2021 Yichen Jiang, Asli Celikyilmaz, Paul Smolensky, Paul Soulos, Sudha Rao, Hamid Palangi, Roland Fernandez, Caitlin Smith, Mohit Bansal, Jianfeng Gao

On several syntactic and semantic probing tasks, we demonstrate the emergent structural information in the role vectors and improved syntactic interpretability in the TPR layer outputs.

Abstractive Text Summarization

The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations

1 code implementation NeurIPS 2021 Peter Hase, Harry Xie, Mohit Bansal

In this paper, we study several under-explored dimensions of FI explanations, providing conceptual and empirical improvements for this form of explanation.

Feature Importance Text Classification

DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization

1 code implementation NAACL 2021 Zineng Tang, Jie Lei, Mohit Bansal

Second, to alleviate the temporal misalignment issue, our method incorporates an entropy minimization-based constrained attention loss, to encourage the model to automatically focus on the correct caption from a pool of candidate ASR captions.

Question Answering Video Captioning +2

Extending Multi-Document Summarization Evaluation to the Interactive Setting

no code implementations NAACL 2021 Ori Shapira, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Yael Amsterdamer, Ido Dagan

In this paper, we develop an end-to-end evaluation framework for interactive summarization, focusing on expansion-based interaction, which considers the accumulating information along a user session.

Document Summarization Multi-Document Summarization

Improving Generation and Evaluation of Visual Stories via Semantic Consistency

1 code implementation NAACL 2021 Adyasha Maharana, Darryl Hannan, Mohit Bansal

Therefore, we also provide an exploration of evaluation metrics for the model, focused on aspects of the generated frames such as the presence/quality of generated characters, the relevance to captions, and the diversity of the generated images.

Image Generation Story Visualization +1

Identify, Align, and Integrate: Matching Knowledge Graphs to Commonsense Reasoning Tasks

1 code implementation EACL 2021 Lisa Bauer, Mohit Bansal

For knowledge integration to yield peak performance, it is critical to select a knowledge graph (KG) that is well-aligned with the given task's objective.

Knowledge Graphs

Hidden Biases in Unreliable News Detection Datasets

no code implementations EACL 2021 Xiang Zhou, Heba Elfardy, Christos Christodoulopoulos, Thomas Butler, Mohit Bansal

Using the observations and experimental results, we provide practical suggestions on how to create more reliable datasets for the unreliable news detection task.

Fact Checking Selection bias

Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information

1 code implementation NAACL 2021 Jialu Li, Hao Tan, Mohit Bansal

One key challenge in this task is to ground instructions with the current visual information that the agent perceives.

Vision-Language Navigation

Distributed NLI: Learning to Predict Human Opinion Distributions for Language Reasoning

no code implementations 18 Apr 2021 Xiang Zhou, Yixin Nie, Mohit Bansal

We show that MC Dropout is able to achieve decent performance without any distribution annotations while Re-Calibration can further give substantial improvements when extra distribution annotations are provided, suggesting the value of multiple annotations for the example in modeling the distribution of human judgements.

Natural Language Inference

ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning

1 code implementation EMNLP 2021 Swarnadeep Saha, Prateek Yadav, Lisa Bauer, Mohit Bansal

Recent commonsense-reasoning tasks are typically discriminative in nature, where a model answers a multiple-choice question for a certain context.

Graph Generation Text Generation

FixMyPose: Pose Correctional Captioning and Retrieval

1 code implementation 4 Apr 2021 Hyounghun Kim, Abhay Zala, Graham Burri, Mohit Bansal

During the correctional-captioning task, models must generate descriptions of how to move from the current to target pose image, whereas in the retrieval task, models should select the correct target pose given the initial pose and correctional description.

Pose Retrieval

Dual Reinforcement-Based Specification Generation for Image De-Rendering

no code implementations 2 Mar 2021 Ramakanth Pasunuru, David Rosenberg, Gideon Mann, Mohit Bansal

Since these are sequence models, we must choose an ordering of the objects in the graphics programs for likelihood training.

Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

1 code implementation 2 Mar 2021 Ramakanth Pasunuru, Asli Celikyilmaz, Michel Galley, Chenyan Xiong, Yizhe Zhang, Mohit Bansal, Jianfeng Gao

The progress in Query-focused Multi-Document Summarization (QMDS) has been limited by the lack of sufficient large-scale high-quality training datasets.

Data Augmentation Document Summarization +1

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

1 code implementation CVPR 2021 Jie Lei, Linjie Li, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal, Jingjing Liu

Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms (or is on par with) existing methods that exploit full-length videos, suggesting that end-to-end learning with just a few sparsely sampled clips is often more accurate than using densely extracted offline features from full-length videos, proving the proverbial less-is-more principle.

Ranked #2 on Visual Question Answering on MSRVTT-QA (using extra training data)

Question Answering Video Question Answering +2

Unifying Vision-and-Language Tasks via Text Generation

1 code implementation 4 Feb 2021 Jaemin Cho, Jie Lei, Hao Tan, Mohit Bansal

On 7 popular vision-and-language benchmarks, including visual question answering, referring expression comprehension, visual commonsense reasoning, most of which have been previously modeled as discriminative tasks, our generative approach (with a single unified architecture) reaches comparable performance to recent task-specific state-of-the-art vision-and-language models.

Conditional Text Generation Image Captioning +6

When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data

1 code implementation 3 Feb 2021 Peter Hase, Mohit Bansal

In order to carefully control important properties of the data and explanations, we introduce a synthetic dataset for experiments, and we also make use of three existing datasets with explanations: e-SNLI, TACRED, and SemEval.

Robustness Gym: Unifying the NLP Evaluation Landscape

2 code implementations NAACL 2021 Karan Goel, Nazneen Rajani, Jesse Vig, Samson Tan, Jason Wu, Stephan Zheng, Caiming Xiong, Mohit Bansal, Christopher Ré

Despite impressive performance on standard benchmarks, deep neural networks are often brittle when deployed in real-world systems.

Entity Linking

I like fish, especially dolphins: Addressing Contradictions in Dialogue Modeling

no code implementations ACL 2021 Yixin Nie, Mary Williamson, Mohit Bansal, Douwe Kiela, Jason Weston

To quantify how well natural language understanding models can capture consistency in a general conversation, we introduce the DialoguE COntradiction DEtection task (DECODE) and a new conversational dataset containing both human-human and human-bot contradictory dialogues.

Language understanding Natural Language Understanding

To what extent do human explanations of model behavior align with actual model behavior?

no code implementations EMNLP (BlackboxNLP) 2021 Grusha Prasad, Yixin Nie, Mohit Bansal, Robin Jia, Douwe Kiela, Adina Williams

Given the increasingly prominent role NLP models (will) play in our lives, it is important for human expectations of model behavior to align with actual model behavior.

Natural Language Inference

ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments

no code implementations Findings of the Association for Computational Linguistics 2020 Hyounghun Kim, Abhay Zala, Graham Burri, Hao Tan, Mohit Bansal

During this task, the agent (similar to a PokeMON GO player) is asked to find and collect different target objects one-by-one by navigating based on natural language instructions in a complex, realistic outdoor environment, but then also ARRAnge the collected objects part-by-part in an egocentric grid-layout environment.

Referring Expression Comprehension Vision and Language Navigation

DORB: Dynamically Optimizing Multiple Rewards with Bandits

no code implementations EMNLP 2020 Ramakanth Pasunuru, Han Guo, Mohit Bansal

Further, it is important to consider using a dynamic combination and curriculum of metric rewards that flexibly changes over time.

Data-to-Text Generation Question Generation

ConjNLI: Natural Language Inference Over Conjunctive Sentences

1 code implementation EMNLP 2020 Swarnadeep Saha, Yixin Nie, Mohit Bansal

Reasoning about conjuncts in conjunctive sentences is important for a deeper understanding of conjunctions in English and also how their usages and semantics differ from conjunctive and disjunctive boolean logic.

Fine-tuning Natural Language Inference

What is More Likely to Happen Next? Video-and-Language Future Event Prediction

1 code implementation EMNLP 2020 Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal

Given a video with aligned dialogue, people can often infer what is more likely to happen next.

Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision

1 code implementation EMNLP 2020 Hao Tan, Mohit Bansal

We find that the main reason hindering this exploration is the large divergence in magnitude and distributions between the visually-grounded language datasets and pure-language corpora.

Image Captioning Language Modelling +1

ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization

1 code implementation EMNLP 2020 Shiyue Zhang, Benjamin Frey, Mohit Bansal

To help save this endangered language, we introduce ChrEn, a Cherokee-English parallel dataset, to facilitate machine translation research between Cherokee and English.

Language Modelling Machine Translation +2

What Can We Learn from Collective Human Opinions on Natural Language Inference Data?

1 code implementation EMNLP 2020 Yixin Nie, Xiang Zhou, Mohit Bansal

Analysis reveals that: (1) high human disagreement exists in a noticeable amount of examples in these datasets; (2) the state-of-the-art models lack the ability to recover the distribution over human labels; (3) models achieve near-perfect accuracy on the subset of data with a high level of human agreement, whereas they can barely beat a random guess on the data with low levels of human agreement, which compose most of the common errors made by state-of-the-art models on the evaluation sets.

Natural Language Inference

PRover: Proof Generation for Interpretable Reasoning over Rules

2 code implementations EMNLP 2020 Swarnadeep Saha, Sayan Ghosh, Shashank Srivastava, Mohit Bansal

First, PROVER generates proofs with an accuracy of 87%, while retaining or improving performance on the QA task, compared to RuleTakers (up to 6% improvement on zero-shot evaluation).

Evaluating Interactive Summarization: an Expansion-Based Framework

1 code implementation 17 Sep 2020 Ori Shapira, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Yael Amsterdamer, Ido Dagan

Allowing users to interact with multi-document summarizers is a promising direction towards improving and customizing summary results.

Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline

1 code implementation CoNLL (EMNLP) 2021 Ori Ernst, Ori Shapira, Ramakanth Pasunuru, Michael Lepioshkin, Jacob Goldberger, Mohit Bansal, Ido Dagan

Aligning sentences in a reference summary with their counterparts in source documents was shown as a useful auxiliary summarization task, notably for generating training data for salience detection.

Document Summarization Multi-Document Summarization

Simple Compounded-Label Training for Fact Extraction and Verification

no code implementations WS 2020 Yixin Nie, Lisa Bauer, Mohit Bansal

Automatic fact checking is an important task motivated by the need for detecting and preventing the spread of misinformation across the web.

Fact Checking Misinformation +1

Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA

1 code implementation ACL 2020 Hyounghun Kim, Zineng Tang, Mohit Bansal

Moreover, our model is also comprised of dual-level attention (word/object and frame level), multi-head self/cross-integration for different sources (video and dense captions), and gates which pass more relevant information to the classifier.

Image Captioning Multi-Label Classification +3

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

1 code implementation ACL 2020 Jie Lei, Li-Wei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal

Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph.

Towards Robustifying NLI Models Against Lexical Dataset Biases

1 code implementation ACL 2020 Xiang Zhou, Mohit Bansal

While deep learning models are making fast progress on the task of Natural Language Inference, recent studies have also shown that these models achieve high accuracy by exploiting several dataset biases, and without deep understanding of the language semantics.

Data Augmentation Natural Language Inference

Diagnosing the Environment Bias in Vision-and-Language Navigation

1 code implementation 6 May 2020 Yubo Zhang, Hao Tan, Mohit Bansal

Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations.

Vision and Language Navigation

Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?

1 code implementation ACL 2020 Peter Hase, Mohit Bansal

Through two kinds of simulation tests involving text and tabular data, we evaluate five explanation methods: (1) LIME, (2) Anchor, (3) Decision Boundary, (4) a Prototype model, and (5) a Composite approach that combines explanations from each method.

The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions

1 code implementation EMNLP 2020 Xiang Zhou, Yixin Nie, Hao Tan, Mohit Bansal

For the first question, we conduct a thorough empirical study over analysis sets and find that in addition to the unstable final performance, the instability exists all along the training curve.

Model Selection Natural Language Inference +1

Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension

1 code implementation Findings of the Association for Computational Linguistics 2020 Adyasha Maharana, Mohit Bansal

In this work, we present several effective adversaries and automated data augmentation policy search methods with the goal of making reading comprehension models more robust to adversarial evaluation, but also improving generalization to the source domain as well as new domains and languages.

Data Augmentation Reading Comprehension

TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval

2 code implementations ECCV 2020 Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal

The queries are also labeled with query types that indicate whether each of them is more related to video or subtitle or both, allowing for in-depth analysis of the dataset and the methods that are built on top of it.

Moment Retrieval Video Corpus Moment Retrieval +1

ManyModalQA: Modality Disambiguation and QA over Diverse Inputs

1 code implementation 22 Jan 2020 Darryl Hannan, Akshay Jain, Mohit Bansal

By analyzing this model, we investigate which words in the question are indicative of the modality.

Fine-tuning Question Answering +1

Modality-Balanced Models for Visual Dialogue

no code implementations 17 Jan 2020 Hyounghun Kim, Hao Tan, Mohit Bansal

The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response to the dialogue.

Visual Dialog

AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses

no code implementations 15 Jan 2020 Tong Niu, Mohit Bansal

In our work, we build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.

Feature Engineering Fine-tuning

Multi-Source Domain Adaptation for Text Classification via DistanceNet-Bandits

no code implementations 13 Jan 2020 Han Guo, Ramakanth Pasunuru, Mohit Bansal

Next, we develop a DistanceNet model which uses these distance measures, or a mixture of these distance measures, as an additional loss function to be minimized jointly with the task's loss function, so as to achieve better unsupervised domain adaptation.

Classification General Classification +3

Automatically Learning Data Augmentation Policies for Dialogue Tasks

1 code implementation IJCNLP 2019 Tong Niu, Mohit Bansal

Automatic data augmentation (AutoAugment) (Cubuk et al., 2019) searches for optimal perturbation policies via a controller trained using performance rewards of a sampled policy on the target task, hence reducing data-level model bias.

Data Augmentation Dialogue Generation +1

Revealing the Importance of Semantic Retrieval for Machine Reading at Scale

2 code implementations IJCNLP 2019 Yixin Nie, Songhe Wang, Mohit Bansal

In this work, we give general guidelines on system design for MRS by proposing a simple yet effective pipeline system with special consideration on hierarchical semantic retrieval at both paragraph and sentence level, and their potential effects on the downstream task.

Fact Verification Information Retrieval +3

Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering

1 code implementation IJCNLP 2019 Shiyue Zhang, Mohit Bansal

Second, since the traditional evaluation metrics (e.g., BLEU) often fall short in evaluating the quality of generated questions, we propose a QA-based evaluation method which measures the QG model's ability to mimic human annotators in generating QA training data.

Question Answering Question Generation

Self-Assembling Modular Networks for Interpretable Multi-Hop Reasoning

1 code implementation IJCNLP 2019 Yichen Jiang, Mohit Bansal

Multi-hop QA requires a model to connect multiple pieces of evidence scattered in a long context to answer the question.

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

6 code implementations IJCNLP 2019 Hao Tan, Mohit Bansal

In LXMERT, we build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder.

Fine-tuning Language Modelling +3

Expressing Visual Relationships via Language

1 code implementation ACL 2019 Hao Tan, Franck Dernoncourt, Zhe Lin, Trung Bui, Mohit Bansal

To push forward the research in this direction, we first introduce a new language-guided image editing dataset that contains a large number of real image pairs with corresponding editing instructions.

Image Captioning

Improving Visual Question Answering by Referring to Generated Paragraph Captions

no code implementations ACL 2019 Hyounghun Kim, Mohit Bansal

These paragraph captions can hence contain substantial information of the image for tasks such as visual question answering.

Image Captioning Question Answering +1

Continual and Multi-Task Architecture Search

1 code implementation ACL 2019 Ramakanth Pasunuru, Mohit Bansal

Architecture search is the process of automatically learning the neural model or cell structure that best suits the given task.

Continual Learning General Classification +5

Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension

1 code implementation ACL 2019 Yichen Jiang, Nitish Joshi, Yen-Chun Chen, Mohit Bansal

Multi-hop reading comprehension requires the model to explore and connect relevant information from multiple sentences/documents in order to answer the question about the context.

Multi-Hop Reading Comprehension

PaperRobot: Incremental Draft Generation of Scientific Ideas

2 code implementations ACL 2019 Qingyun Wang, Lifu Huang, Zhiying Jiang, Kevin Knight, Heng Ji, Mohit Bansal, Yi Luan

We present a PaperRobot who performs as an automatic research assistant by (1) conducting deep understanding of a large collection of human-written papers in a target domain and constructing comprehensive background knowledge graphs (KGs); (2) creating new ideas by predicting links from the background KGs, by combining graph attention and contextual text attention; (3) incrementally writing some key elements of a new paper based on memory-attention networks: from the input title along with predicted related entities to generate a paper abstract, from the abstract to generate conclusion and future work, and finally from future work to generate a title for a follow-on paper.

Graph Attention Knowledge Graphs +4

Enabling Robots to Understand Incomplete Natural Language Instructions Using Commonsense Reasoning

no code implementations 29 Apr 2019 Haonan Chen, Hao Tan, Alan Kuntz, Mohit Bansal, Ron Alterovitz

Our results show the feasibility of a robot learning commonsense knowledge automatically from web-based textual corpora, and the power of learned commonsense reasoning models in enabling a robot to autonomously perform tasks based on incomplete natural language instructions.

Common Sense Reasoning Language Modelling

TVQA+: Spatio-Temporal Grounding for Video Question Answering

3 code implementations ACL 2020 Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal

We present the task of Spatio-Temporal Video Question Answering, which requires intelligent systems to simultaneously retrieve relevant moments and detect referenced visual concepts (people and objects) to answer natural language questions about videos.

Question Answering Video Question Answering

Multi-Target Embodied Question Answering

1 code implementation CVPR 2019 Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra

To address this, we propose a modular architecture composed of a program generator, a controller, a navigator, and a VQA module.

Embodied Question Answering Question Answering

Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout

1 code implementation NAACL 2019 Hao Tan, Licheng Yu, Mohit Bansal

Next, we apply semi-supervised learning (via back-translation) on these dropped-out environments to generate new paths and instructions.

Fine-tuning Translation +1

AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning

no code implementations NAACL 2019 Han Guo, Ramakanth Pasunuru, Mohit Bansal

To address these issues, we present AutoSeM, a two-stage MTL pipeline, where the first stage automatically selects the most useful auxiliary tasks via a Beta-Bernoulli multi-armed bandit with Thompson Sampling, and the second stage learns the training mixing ratio of these selected auxiliary tasks via a Gaussian Process based Bayesian optimization framework.

Language understanding Multi-Task Learning
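
As a rough illustration of the first stage described above, the following is a minimal sketch of a Beta-Bernoulli multi-armed bandit with Thompson Sampling. It is not the authors' implementation: the function names, the uniform Beta(1, 1) prior, and the simulated Bernoulli rewards (standing in for "did training on this auxiliary task help the primary task?") are all assumptions made for the example.

```python
import random


def thompson_select(successes, failures, rng):
    """Draw one sample from each arm's Beta posterior; return the arm with the largest draw."""
    # Beta(1 + successes, 1 + failures) is the posterior under an assumed uniform prior.
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def run_bandit(true_utilities, rounds=2000, seed=0):
    """Simulate the bandit; each arm is a hypothetical auxiliary task.

    true_utilities[i] is an assumed probability that task i yields a reward of 1.
    Returns per-arm success and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_utilities)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        # Simulated Bernoulli reward for the chosen arm.
        if rng.random() < true_utilities[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.2, 0.5, 0.8])
    for i, (w, l) in enumerate(zip(wins, losses)):
        # Posterior mean of each arm's utility under the Beta(1, 1) prior.
        print(f"task {i}: pulls={w + l}, posterior mean={(w + 1) / (w + l + 2):.2f}")
```

In a selection setting like the one the abstract describes, arms whose posterior mean stays high would be the natural candidates to keep; how the paper thresholds or ranks them is not stated in this listing.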

Analyzing Compositionality-Sensitivity of NLI Models

1 code implementation 16 Nov 2018 Yixin Nie, Yicheng Wang, Mohit Bansal

Therefore, we propose a compositionality-sensitivity testing setup that analyzes models on natural examples from existing datasets that cannot be solved via lexical features alone (i.e., on which a bag-of-words model gives a high probability to one wrong label), hence revealing the models' actual compositionality awareness.

Natural Language Inference

Combining Fact Extraction and Verification with Neural Semantic Matching Networks

2 code implementations 16 Nov 2018 Yixin Nie, Haonan Chen, Mohit Bansal

The increasing concern with misinformation has stimulated research efforts on automatic fact checking.

Fact Checking Fact Verification +2

Incorporating Background Knowledge into Video Description Generation

no code implementations EMNLP 2018 Spencer Whitehead, Heng Ji, Mohit Bansal, Shih-Fu Chang, Clare Voss

We develop an approach that uses video meta-data to retrieve topically related news documents for a video and extracts the events and named entities from these documents.

Text Generation Video Captioning +1

Commonsense for Generative Multi-Hop Question Answering Tasks

2 code implementations EMNLP 2018 Lisa Bauer, Yicheng Wang, Mohit Bansal

We instead focus on a more challenging multi-hop generative task (NarrativeQA), which requires the model to reason, gather, and synthesize disjoint pieces of information within the context to generate an answer.

Multi-hop Question Answering Question Answering +1

SafeCity: Understanding Diverse Forms of Sexual Harassment Personal Stories

4 code implementations EMNLP 2018 Sweta Karlekar, Mohit Bansal

With the recent rise of #MeToo, an increasing number of personal stories about sexual harassment and sexual abuse have been shared online.

Closed-Book Training to Improve Summarization Encoder Memory

no code implementations EMNLP 2018 Yichen Jiang, Mohit Bansal

A good neural sequence-to-sequence summarization model should have a strong encoder that can distill and memorize the important information from long input texts so that the decoder can generate salient summaries based on the encoder's memory.

Abstractive Text Summarization

Game-Based Video-Context Dialogue

1 code implementation EMNLP 2018 Ramakanth Pasunuru, Mohit Bansal

Current dialogue systems focus more on textual and speech context knowledge and are usually based on two speakers.

Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models

1 code implementation CONLL 2018 Tong Niu, Mohit Bansal

We present two categories of model-agnostic adversarial strategies that reveal the weaknesses of several generative, task-oriented dialogue models: Should-Not-Change strategies that evaluate over-sensitivity to small and semantics-preserving edits, as well as Should-Change strategies that test if a model is over-stable against subtle yet semantics-changing modifications.

Dynamic Multi-Level Multi-Task Learning for Sentence Simplification

no code implementations COLING 2018 Han Guo, Ramakanth Pasunuru, Mohit Bansal

In this work, we first present a strong pointer-copy mechanism based sequence-to-sequence sentence simplification model, and then improve its entailment and paraphrasing capabilities via multi-task learning with related auxiliary tasks of entailment and paraphrase generation.

Multi-Task Learning Paraphrase Generation +1

Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

2 code implementations ACL 2018 Yen-Chun Chen, Mohit Bansal

Inspired by how humans summarize long documents, we propose an accurate and fast summarization model that first selects salient sentences and then rewrites them abstractively (i.e., compresses and paraphrases) to generate a concise overall summary.

Abstractive Text Summarization Sentence ReWriting

Polite Dialogue Generation Without Parallel Data

1 code implementation TACL 2018 Tong Niu, Mohit Bansal

We present three weakly-supervised models that can generate diverse polite (or rude) dialogue responses without parallel data.

Dialogue Generation Fine-tuning +1

Object Ordering with Bidirectional Matchings for Visual Reasoning

no code implementations NAACL 2018 Hao Tan, Mohit Bansal

Visual reasoning with compositional natural language instructions, e.g., based on the newly-released Cornell Natural Language Visual Reasoning (NLVR) dataset, is a challenging task, where the model needs to have the ability to create an accurate mapping between the diverse phrases and the several objects placed in complex arrangements in the image.

Visual Reasoning

Robust Machine Comprehension Models via Adversarial Training

no code implementations NAACL 2018 Yicheng Wang, Mohit Bansal

It is shown that many published models for the Stanford Question Answering Dataset (Rajpurkar et al., 2016) lack robustness, suffering an over 50% decrease in F1 score during adversarial evaluation based on the AddSent (Jia and Liang, 2017) algorithm.

Data Augmentation Question Answering +1

Detecting Linguistic Characteristics of Alzheimer's Dementia by Interpreting Neural Models

no code implementations NAACL 2018 Sweta Karlekar, Tong Niu, Mohit Bansal

More importantly, we next interpret what these neural models have learned about the linguistic characteristics of AD patients, via analysis based on activation clustering and first-derivative saliency techniques.

Multi-Reward Reinforced Summarization with Saliency and Entailment

no code implementations NAACL 2018 Ramakanth Pasunuru, Mohit Bansal

Abstractive text summarization is the task of compressing and rewriting a long document into a short summary while maintaining saliency, directed logical entailment, and non-redundancy.

Abstractive Text Summarization

Towards Improving Abstractive Summarization via Entailment Generation

no code implementations WS 2017 Ramakanth Pasunuru, Han Guo, Mohit Bansal

Abstractive summarization, the task of rewriting and compressing a document into a short summary, has achieved considerable success with neural sequence-to-sequence models.

Abstractive Text Summarization Machine Translation +2

Hierarchically-Attentive RNN for Album Summarization and Storytelling

no code implementations EMNLP 2017 Licheng Yu, Mohit Bansal, Tamara L. Berg

For this task, we make use of the Visual Storytelling dataset and a model composed of three hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album photos, select representative (summary) photos, and compose the story.

Visual Storytelling

Reinforced Video Captioning with Entailment Rewards

no code implementations EMNLP 2017 Ramakanth Pasunuru, Mohit Bansal

Sequence-to-sequence models have shown promising improvements on the temporal task of video captioning, but they optimize word-level cross-entropy loss during training.

Video Captioning

Video Highlight Prediction Using Audience Chat Reactions

no code implementations EMNLP 2017 Cheng-Yang Fu, Joon Lee, Mohit Bansal, Alexander C. Berg

Sports channel video portals offer an exciting domain for research on multimodal, multilingual analysis.

League of Legends

Source-Target Inference Models for Spatial Instruction Understanding

no code implementations 12 Jul 2017 Hao Tan, Mohit Bansal

Models that can execute natural language instructions for situated robotic tasks such as assembly and navigation have several useful applications in homes, offices, and remote scenarios.

Representation Learning

Efficient Generation of Motion Plans from Attribute-Based Natural Language Instructions Using Dynamic Constraint Mapping

no code implementations 8 Jul 2017 Jae Sung Park, Biao Jia, Mohit Bansal, Dinesh Manocha

We generate a factor graph from natural language instructions called the Dynamic Grounding Graph (DGG), which takes latent parameters into account.

Robotics

Punny Captions: Witty Wordplay in Image Descriptions

1 code implementation NAACL 2018 Arjun Chandrasekaran, Devi Parikh, Mohit Bansal

Wit is a form of rich interaction that is often grounded in a specific situation (e.g., a comment in response to an event).

Multi-Task Video Captioning with Video and Entailment Generation

no code implementations ACL 2017 Ramakanth Pasunuru, Mohit Bansal

Video captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a challenge, especially given the lack of sufficient annotated data.

Multi-Task Learning Video Captioning +1

A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

2 code implementations CVPR 2017 Licheng Yu, Hao Tan, Mohit Bansal, Tamara L. Berg

The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions.

Referring Expression Comprehension

Coherent Dialogue with Attention-based Language Models

no code implementations 21 Nov 2016 Hongyuan Mei, Mohit Bansal, Matthew R. Walter

We model coherent conversation continuation via RNN-based dialogue models equipped with a dynamic attention mechanism.

Language Modelling

Navigational Instruction Generation as Inverse Reinforcement Learning with Neural Machine Translation

no code implementations 11 Oct 2016 Andrea F. Daniele, Mohit Bansal, Matthew R. Walter

We first decide which information to share with the user according to their preferences, using a policy trained from human demonstrations via inverse reinforcement learning.

Human robot interaction Machine Translation +1

Interpreting Neural Networks to Improve Politeness Comprehension

no code implementations EMNLP 2016 Malika Aubakirova, Mohit Bansal

We present an interpretable neural network approach to predicting and understanding politeness in natural language requests.

Contextual RNN-GANs for Abstract Reasoning Diagram Generation

no code implementations 29 Sep 2016 Arnab Ghosh, Viveka Kulharia, Amitabha Mukerjee, Vinay Namboodiri, Mohit Bansal

Understanding, predicting, and generating object motions and transformations is a core problem in artificial intelligence.

Video Generation

Who did What: A Large-Scale Person-Centered Cloze Dataset

no code implementations EMNLP 2016 Takeshi Onishi, Hai Wang, Mohit Bansal, Kevin Gimpel, David McAllester

We have constructed a new "Who-did-What" dataset of over 200,000 fill-in-the-gap (cloze) multiple choice reading comprehension problems constructed from the LDC English Gigaword newswire corpus.

Reading Comprehension

Charagram: Embedding Words and Sentences via Character n-grams

no code implementations EMNLP 2016 John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu

We present Charagram embeddings, a simple approach for learning character-based compositional models to embed textual sequences.

Part-Of-Speech Tagging Sentence Similarity +1

The Role of Context Types and Dimensionality in Learning Word Embeddings

no code implementations NAACL 2016 Oren Melamud, David McClosky, Siddharth Patwardhan, Mohit Bansal

We provide the first extensive evaluation of how using different types of context to learn skip-gram word embeddings affects performance on a wide range of intrinsic and extrinsic NLP tasks.

Learning Word Embeddings

End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures

2 code implementations ACL 2016 Makoto Miwa, Mohit Bansal

We present a novel end-to-end neural model to extract entities and relations between them.

Ranked #1 on Relation Extraction on ACE 2005 (Sentence Encoder metric)

Relation Classification

Towards Universal Paraphrastic Sentence Embeddings

no code implementations 25 Nov 2015 John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu

We again find that the word averaging models perform well for sentence similarity and entailment, outperforming LSTMs.

General Classification Sentence Embeddings +2

Learning Articulated Motion Models from Visual and Lingual Signals

no code implementations 17 Nov 2015 Zhengyang Wu, Mohit Bansal, Matthew R. Walter

In this paper, we present a multimodal learning framework that incorporates both visual and lingual information to estimate the structure and parameters that define kinematic models of articulated objects.

Language Modelling Word Embeddings

Accurate Vision-based Vehicle Localization using Satellite Imagery

no code implementations 30 Oct 2015 Hang Chu, Hongyuan Mei, Mohit Bansal, Matthew R. Walter

We propose a method for accurately localizing ground vehicles with the aid of satellite imagery.

Visual Localization

What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment

1 code implementation NAACL 2016 Hongyuan Mei, Mohit Bansal, Matthew R. Walter

We propose an end-to-end, domain-independent neural encoder-aligner-decoder model for selective generation, i.e., the joint task of content selection and surface realization.

Data-to-Text Generation

Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences

1 code implementation 12 Jun 2015 Hongyuan Mei, Mohit Bansal, Matthew R. Walter

We propose a neural sequence-to-sequence model for direction following, a task that is essential to realizing effective autonomous agents.

Natural Language Understanding

From Paraphrase Database to Compositional Paraphrase Model and Back

1 code implementation TACL 2015 John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu, Dan Roth

The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive semantic resource, consisting of a list of phrase pairs with (heuristic) confidence estimates.

Word Embeddings

Web-scale Surface and Syntactic n-gram Features for Dependency Parsing

no code implementations 25 Feb 2015 Dominick Ng, Mohit Bansal, James R. Curran

We develop novel first- and second-order features for dependency parsing based on the Google Syntactic Ngrams corpus, a collection of subtree counts of parsed sentences from scanned books.

Dependency Parsing
