1 code implementation • COLING 2022 • Danfeng Guo, Arpit Gupta, Sanchit Agarwal, Jiun-Yu Kao, Shuyang Gao, Arijit Biswas, Chien-Wei Lin, Tagyoung Chung, Mohit Bansal
Learning from multimodal data has become a popular research topic in recent years.
1 code implementation • NAACL (TextGraphs) 2021 • Qi Zeng, Manling Li, Tuan Lai, Heng Ji, Mohit Bansal, Hanghang Tong
Current methods for event representation ignore related events in a corpus-level global context.
no code implementations • NAACL (sdp) 2021 • Yash Gupta, Pawan Sasanka Ammanamanchi, Shikha Bordia, Arjun Manoharan, Deepak Mittal, Ramakanth Pasunuru, Manish Shrivastava, Maneesh Singh, Mohit Bansal, Preethi Jyothi
Large pretrained models have seen enormous success in extractive summarization tasks.
no code implementations • Findings (NAACL) 2022 • Adyasha Maharana, Quan Tran, Franck Dernoncourt, Seunghyun Yoon, Trung Bui, Walter Chang, Mohit Bansal
We construct and present a new multimodal dataset consisting of software instructional livestreams and containing manual annotations for both detailed and abstract procedural intent that enable training and evaluation of joint video and text understanding models.
1 code implementation • EMNLP 2021 • Ramakanth Pasunuru, Veselin Stoyanov, Mohit Bansal
In this work, we propose a continual few-shot learning (CFL) task, in which a system is challenged with a difficult phenomenon and asked to learn to correct mistakes with only a few (10 to 15) training examples.
no code implementations • ACL (RepL4NLP) 2021 • Han Guo, Ramakanth Pasunuru, Mohit Bansal
Many recalibration methods have been proposed in the literature for quantifying predictive uncertainty and calibrating model outputs, with varying degrees of complexity.
1 code implementation • EMNLP 2021 • Adyasha Maharana, Mohit Bansal
Such information is even more important for story visualization since its inputs have an explicit narrative structure that needs to be translated into an image sequence (or visual story).
1 code implementation • EMNLP 2021 • Hyounghun Kim, Jialu Li, Mohit Bansal
In this paper, we explore the Navigation from Dialogue History (NDH) task, which is based on the Cooperative Vision-and-Dialogue Navigation (CVDN) dataset, and present a state-of-the-art model which is built upon Vision-Language transformers.
1 code implementation • EMNLP 2021 • Yichen Jiang, Mohit Bansal
Motivated by the failure of a Transformer model on the SCAN compositionality challenge (Lake and Baroni, 2018), which requires parsing a command into actions, we propose two auxiliary sequence prediction tasks as additional training supervision.
1 code implementation • COLING 2022 • Adyasha Maharana, Mohit Bansal
Recent advances in commonsense reasoning have been fueled by the availability of large-scale human annotated datasets.
1 code implementation • NAACL 2022 • Adyasha Maharana, Mohit Bansal
Hence, we examine the effect of a human-like easy-to-difficult curriculum during finetuning of language models for commonsense reasoning tasks.
no code implementations • NAACL (DeeLIO) 2021 • Lisa Bauer, Lingjia Deng, Mohit Bansal
We examine the effect of domain-specific external knowledge variations on deep large scale language model performance.
1 code implementation • NAACL 2022 • Ori Shapira, Ramakanth Pasunuru, Mohit Bansal, Ido Dagan, Yael Amsterdamer
Interactive summarization is a task that facilitates user-guided exploration of information within a document set.
no code implementations • NAACL 2022 • Sha Li, Mahdi Namazifar, Di Jin, Mohit Bansal, Heng Ji, Yang Liu, Dilek Hakkani-Tur
In this work, we propose to automatically convert the background knowledge documents into document semantic graphs and then perform knowledge selection over such graphs.
1 code implementation • NAACL (ACL) 2022 • Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Jie Lei, Hyounghun Kim, Rotem Dror, Haoyu Wang, Michael Regan, Qi Zeng, Qing Lyu, Charles Yu, Carl Edwards, Xiaomeng Jin, Yizhu Jiao, Ghazaleh Kazeminejad, Zhenhailong Wang, Chris Callison-Burch, Mohit Bansal, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Martha Palmer, Heng Ji
We introduce RESIN-11, a new schema-guided event extraction and prediction framework that can be applied to a large variety of newsworthy scenarios.
1 code implementation • 1 May 2025 • Vaidehi Patil, Yi-Lin Sung, Peter Hase, Jie Peng, Tianlong Chen, Mohit Bansal
To address this gap, we first introduce a multimodal unlearning benchmark, UnLOK-VQA (Unlearning Outside Knowledge VQA), as well as an attack-and-defense framework to evaluate methods for deleting specific multimodal knowledge from MLLMs.
no code implementations • 22 Apr 2025 • Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Junyuan Mao, Yu Wang, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Wenjie Qu, Yue Liu, Chengwei Liu, Yifan Zhang, Qiankun Li, Chongye Guo, Yalan Qin, Zhaoxin Fan, Yi Ding, Donghai Hong, Jiaming Ji, Yingxin Lai, Zitong Yu, Xinfeng Li, Yifan Jiang, Yanhui Li, Xinyu Deng, Junlin Wu, Dongxia Wang, Yihao Huang, Yufei Guo, Jen-tse Huang, Qiufeng Wang, Wenxuan Wang, Dongrui Liu, Yanwei Yue, Wenke Huang, Guancheng Wan, Heng Chang, Tianlin Li, Yi Yu, Chenghao Li, Jiawei Li, Lei Bai, Jie Zhang, Qing Guo, Jingyi Wang, Tianlong Chen, Joey Tianyi Zhou, Xiaojun Jia, Weisong Sun, Cong Wu, Jing Chen, Xuming Hu, Yiming Li, Xiao Wang, Ningyu Zhang, Luu Anh Tuan, Guowen Xu, Jiaheng Zhang, Tianwei Zhang, Xingjun Ma, Jindong Gu, Xiang Wang, Bo An, Jun Sun, Mohit Bansal, Shirui Pan, Lingjuan Lyu, Yuval Elovici, Bhavya Kailkhura, Yaodong Yang, Hongwei Li, Wenyuan Xu, Yizhou Sun, Wei Wang, Qing Li, Ke Tang, Yu-Gang Jiang, Felix Juefei-Xu, Hui Xiong, XiaoFeng Wang, DaCheng Tao, Philip S. Yu, Qingsong Wen, Yang Liu
Existing surveys on LLM safety primarily focus on specific stages of the LLM lifecycle, e.g., the deployment phase or fine-tuning phase, and thus lack a comprehensive understanding of the entire "lifechain" of LLMs.
1 code implementation • 17 Apr 2025 • Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
Large language model (LLM) agents are increasingly employing retrieval-augmented generation (RAG) to improve the factuality of their responses.
no code implementations • 11 Apr 2025 • Jialu Li, Shoubin Yu, Han Lin, Jaemin Cho, Jaehong Yoon, Mohit Bansal
Video-MSG consists of three steps: in the first two, it creates a Video Sketch, a fine-grained spatio-temporal plan for the final video that specifies background, foreground, and object trajectories in the form of draft video frames.
1 code implementation • 10 Apr 2025 • Hanqi Xiao, Yi-Lin Sung, Elias Stengel-Eskin, Mohit Bansal
We compare TaCQ-based quantization to existing mixed-precision quantization methods when conditioning both on general-purpose and task-specific data.
no code implementations • 21 Mar 2025 • Brihi Joshi, Sriram Venkatapathy, Mohit Bansal, Nanyun Peng, Haw-Shiuan Chang
Evaluating creative text such as human-written stories with language models has always been a challenging task, owing to the subjectivity of multi-annotator ratings.
no code implementations • 19 Mar 2025 • David Wan, Justin Chih-Yao Chen, Elias Stengel-Eskin, Mohit Bansal
We investigate how iterative collaboration among multiple instances and types of large language models (LLMs) enhances subtasks in the refinement process, such as error detection, critiquing unfaithful sentences, and making corrections based on critiques.
no code implementations • 18 Mar 2025 • Shoubin Yu, Difan Liu, Ziqiao Ma, Yicong Hong, Yang Zhou, Hao Tan, Joyce Chai, Mohit Bansal
To support diverse tasks and complex instructions, we employ a curriculum learning strategy: first aligning the MLLM and video diffusion model with large-scale instructional image editing data, followed by end-to-end fine-tuning on high-quality multitask video data.
no code implementations • 7 Mar 2025 • Justin Chih-Yao Chen, Sukwon Yun, Elias Stengel-Eskin, Tianlong Chen, Mohit Bansal
We propose a skill-based recruiting strategy that dynamically selects the most relevant set of expert LLMs for diverse reasoning tasks based on their strengths.
1 code implementation • 3 Mar 2025 • Yi-Lin Sung, Prateek Yadav, Jialu Li, Jaehong Yoon, Mohit Bansal
Building on this finding, we propose RSQ (Rotate, Scale, then Quantize), which (1) applies rotations (orthogonal transformation) to the model to mitigate outliers (those with exceptionally large magnitude), (2) scales the token feature based on its importance, and (3) quantizes the model using the GPTQ framework with the second-order statistics computed by scaled tokens.
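As a rough illustration of these three steps, here is a minimal NumPy sketch on a single weight matrix; the random orthogonal rotation, the importance-weighted second-order statistics, and the round-to-nearest quantizer are simplifying stand-ins for the paper's GPTQ-based pipeline, and all function and variable names are hypothetical.

```python
import numpy as np

def rsq_quantize(W, X, importance, bits=4):
    """Sketch of Rotate-Scale-Quantize on one layer.

    W: (d_out, d_in) weights; X: (n_tokens, d_in) calibration activations;
    importance: (n_tokens,) per-token importance weights.
    """
    rng = np.random.default_rng(0)

    # (1) Rotate: a random orthogonal transform spreads outlier weights.
    #     y = W x = (W Q)(Q^T x), so rotating weights and activations
    #     together leaves the layer's function unchanged.
    Q, _ = np.linalg.qr(rng.standard_normal((W.shape[1], W.shape[1])))
    W_rot, X_rot = W @ Q, X @ Q

    # (2) Scale: weight calibration tokens by importance before computing
    #     the second-order statistics used by the quantizer (H = X^T diag(s) X).
    s = importance / importance.sum()
    H = (X_rot * s[:, None]).T @ X_rot

    # (3) Quantize: symmetric round-to-nearest per output channel
    #     (the paper uses GPTQ with H; RTN keeps this sketch short).
    qmax = 2 ** (bits - 1) - 1
    step = np.abs(W_rot).max(axis=1, keepdims=True) / qmax
    W_q = np.round(W_rot / step) * step
    return W_q, Q, H
```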
no code implementations • 21 Feb 2025 • Zaid Khan, Ali Farhadi, Ranjay Krishna, Luca Weihs, Mohit Bansal, Tanmay Gupta
When a human requests an LLM to complete a coding task using functionality from a large code repository, how do we provide context from the repo to the LLM?
1 code implementation • 20 Feb 2025 • Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal
We find that UPCORE improves both standard metrics and AUC, benefitting from positive transfer between the coreset and pruned points while reducing negative transfer from the forget set to points outside of it.
1 code implementation • 3 Feb 2025 • Archiki Prasad, Elias Stengel-Eskin, Justin Chih-Yao Chen, Zaid Khan, Mohit Bansal
However, we uncover a trade-off between generating unit test inputs that reveal errors when given a faulty code and correctly predicting the unit test output without access to the gold solution.
1 code implementation • 12 Dec 2024 • Xizi Wang, Feng Cheng, Ziyang Wang, Huiyu Wang, Md Mohaiminul Islam, Lorenzo Torresani, Mohit Bansal, Gedas Bertasius, David Crandall
Recent work has focused on enabling Video LLMs to perform video temporal grounding via next-token prediction of temporal timestamps.
1 code implementation • 11 Dec 2024 • Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, LiMin Wang
In this paper, we introduce a Self-Refining Data Flywheel (SRDF) that generates high-quality and large-scale navigational instruction-trajectory pairs by iteratively refining the data pool through the collaboration between two models, the instruction generator and the navigator, without any human-in-the-loop annotation.
1 code implementation • 10 Dec 2024 • Shiyue Zhang, David Wan, Arie Cattan, Ayal Klein, Ido Dagan, Mohit Bansal
The Pyramid human evaluation protocol, which assesses content selection by breaking the reference summary into sub-units and verifying their presence in the system summary, has been widely adopted.
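For readers unfamiliar with the protocol, a minimal sketch of a pyramid-style recall computation follows; the presence judge is pluggable (human annotators or an NLI model in practice), and the substring judge in the usage lines is purely illustrative.

```python
def pyramid_recall(reference_units, system_summary, is_present):
    """Fraction of reference sub-units judged present in the system summary."""
    hits = sum(bool(is_present(unit, system_summary)) for unit in reference_units)
    return hits / max(1, len(reference_units))

# Toy usage with a naive substring judge:
judge = lambda unit, summary: unit.lower() in summary.lower()
score = pyramid_recall(["the ceo resigned", "costs rose"],
                       "The CEO resigned amid rising costs.", judge)
```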
1 code implementation • 7 Dec 2024 • Gengze Zhou, Yicong Hong, Zun Wang, Chongyang Zhao, Mohit Bansal, Qi Wu
The academic field of learning instruction-guided visual navigation can be broadly categorized into high-level category-specific search and low-level language-guided navigation, depending on the granularity of the language instruction: the former emphasizes the exploration process, while the latter concentrates on following detailed textual commands.
no code implementations • 29 Nov 2024 • Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister
Reverse thinking plays a crucial role in human reasoning.
no code implementations • 25 Nov 2024 • Zun Wang, Jialu Li, Han Lin, Jaehong Yoon, Mohit Bansal
To address these challenges, we propose DreamRunner, a novel story-to-video generation method: First, we structure the input script using a large language model (LLM) to facilitate both coarse-grained scene planning as well as fine-grained object-level layout and motion planning.
no code implementations • 22 Nov 2024 • Daeun Lee, Jaehong Yoon, Jaemin Cho, Mohit Bansal
In (2) refinement planning, we identify accurately generated objects and then create localized prompts to refine other areas in the video.
no code implementations • 15 Nov 2024 • Andong Deng, Tongjia Chen, Shoubin Yu, Taojiannan Yang, Lincoln Spencer, Yapeng Tian, Ajmal Saeed Mian, Mohit Bansal, Chen Chen
In this paper, we introduce Motion-Grounded Video Reasoning, a new motion understanding task that requires generating visual answers (video segmentation masks) according to the input question, and hence needs implicit spatiotemporal reasoning and grounding.
no code implementations • 7 Nov 2024 • Jaemin Cho, Debanjan Mahata, Ozan Irsoy, Yujie He, Mohit Bansal
However, there are difficulties in applying these methods in real-world scenarios: (a) questions often require information spread across different pages or documents, and MLMs cannot handle many long documents; (b) documents often have important information in visual elements such as figures, but text extraction tools ignore them.
no code implementations • 6 Nov 2024 • Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang, Jing Xu, Maryam Fazel-Zarandi, Mohit Bansal, Sainbayar Sukhbaatar, Jason Weston, Jane Yu
Self-alignment, whereby models learn to improve themselves without human annotation, is a rapidly growing research area.
1 code implementation • 3 Nov 2024 • Haw-Shiuan Chang, Nanyun Peng, Mohit Bansal, Anil Ramakrishna, Tagyoung Chung
In FactualityPrompts, an open-ended text generation benchmark, sampling with APD significantly boosts factuality compared to CD sampling and its variants, and achieves state-of-the-art results for Pythia 6.9B and OPT 6.7B.
1 code implementation • 31 Oct 2024 • David Wan, Jesse Vig, Mohit Bansal, Shafiq Joty
Large Language Models (LLMs) often exhibit positional bias in long-context settings, under-attending to information in the middle of inputs.
no code implementations • 24 Oct 2024 • Jialu Li, Yuanzhen Li, Neal Wadhwa, Yael Pritch, David E. Jacobs, Michael Rubinstein, Mohit Bansal, Nataniel Ruiz
We introduce the concept of a generative infinite game, a video game that transcends the traditional boundaries of finite, hard-coded systems by using generative models.
1 code implementation • 18 Oct 2024 • Elias Stengel-Eskin, Peter Hase, Mohit Bansal
PBT consistently improves resistance to misinformation and resilience to being challenged while also resulting in the best overall performance on holistic data containing both positive and negative persuasion.
1 code implementation • 16 Oct 2024 • Jaehong Yoon, Shoubin Yu, Vaidehi Patil, Huaxiu Yao, Mohit Bansal
To address these, we propose SAFREE, a novel, training-free approach for safe T2I and T2V, that does not alter the model's weights.
1 code implementation • 14 Oct 2024 • Adyasha Maharana, Jaehong Yoon, Tianlong Chen, Mohit Bansal
This data selector samples a subset of the most important samples from each skill cluster for training.
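A hedged sketch of what such a selector might look like: samples are bucketed by skill cluster and the top-scoring members of each bucket are kept. The `skill_of` and `score_of` callables are hypothetical stand-ins for the paper's clustering and importance estimates.

```python
from collections import defaultdict

def select_coreset(samples, skill_of, score_of, per_cluster=100):
    """Keep the `per_cluster` most important samples from each skill cluster."""
    clusters = defaultdict(list)
    for s in samples:
        clusters[skill_of(s)].append(s)
    coreset = []
    for members in clusters.values():
        members.sort(key=score_of, reverse=True)  # highest importance first
        coreset.extend(members[:per_cluster])
    return coreset
```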
no code implementations • 9 Oct 2024 • Thomas Palmeira Ferraz, Kartik Mehta, Yu-Hsiang Lin, Haw-Shiuan Chang, Shereen Oraby, Sijia Liu, Vivek Subramanian, Tagyoung Chung, Mohit Bansal, Nanyun Peng
To address the performance gap between open-source and proprietary models, we propose the Decompose, Critique and Refine (DeCRIM) self-correction pipeline, which enhances LLMs' ability to follow constraints.
1 code implementation • 9 Oct 2024 • Pingzhi Li, Prateek Yadav, Jaehong Yoon, Jie Peng, Yi-Lin Sung, Mohit Bansal, Tianlong Chen
Our experiments using T5-based models for T0 and FLAN tasks demonstrate that GLIDER achieves substantially improved held-in performance while maintaining strong generalization on held-out tasks.
1 code implementation • 9 Oct 2024 • Arie Cattan, Paul Roit, Shiyue Zhang, David Wan, Roee Aharoni, Idan Szpektor, Mohit Bansal, Ido Dagan
There has been an increasing interest in detecting hallucinations in model-generated texts, both manually and automatically, at varying levels of granularity.
1 code implementation • 8 Oct 2024 • Zaid Khan, Elias Stengel-Eskin, Jaemin Cho, Mohit Bansal
Students are iteratively trained and evaluated on generated data, and their feedback (in the form of errors or weak skills) is reported to the agent after each iteration.
no code implementations • 4 Oct 2024 • Prateek Yadav, Tu Vu, Jonathan Lai, Alexandra Chronopoulou, Manaal Faruqui, Mohit Bansal, Tsendsuren Munkhdalai
This leaves many unanswered questions about the effect of scaling model size and how it interplays with other key factors -- like base model quality and the number of expert models -- to affect the merged model's performance.
no code implementations • 4 Oct 2024 • Han Lin, Tushar Nagarajan, Nicolas Ballas, Mido Assran, Mojtaba Komeili, Mohit Bansal, Koustuv Sinha
In this work, we show that a strong off-the-shelf frozen pretrained visual encoder, along with a well-designed prediction model, can achieve state-of-the-art (SoTA) performance in forecasting and procedural planning, without pretraining the prediction model or requiring additional supervision from language or ASR.
1 code implementation • 2 Oct 2024 • Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
Our results on commonsense and math reasoning tasks demonstrate that LASeR can boost iterative LLM optimization by optimizing for multiple RMs, improving the absolute average accuracy of Llama-3-8B over three datasets by 2.67% over training with ensemble RM scores while also showing superior training efficiency (e.g., a 2x speedup).
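LASeR frames reward-model selection as a bandit problem; the sketch below shows a generic UCB-1 loop over a set of RMs, where the feedback signal and all names are assumptions rather than the paper's exact formulation.

```python
import math

class RewardModelBandit:
    """UCB-1 selection over multiple reward models (arms)."""
    def __init__(self, n_rms):
        self.counts = [0] * n_rms
        self.values = [0.0] * n_rms
        self.t = 0

    def select(self):
        self.t += 1
        for i, c in enumerate(self.counts):
            if c == 0:
                return i  # try every RM once first
        ucb = [v + math.sqrt(2 * math.log(self.t) / c)
               for v, c in zip(self.values, self.counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, arm, feedback):
        # `feedback` is whatever scalar improvement is observed after
        # optimizing the LLM against the selected RM for one iteration.
        self.counts[arm] += 1
        self.values[arm] += (feedback - self.values[arm]) / self.counts[arm]
```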
1 code implementation • 18 Sep 2024 • Justin Chih-Yao Chen, Archiki Prasad, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal
Moreover, to ensure effective refinement, we employ a multi-agent loop with three agents: Solver, Reviewer (which generates targeted feedback based on step-wise RM scores), and the Refiner (which incorporates feedback).
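The loop structure described here can be sketched in a few lines; each agent below is an arbitrary callable wrapping an LLM, and the stopping rule (empty feedback) is an assumption for illustration.

```python
def refinement_loop(solver, reviewer, refiner, problem, max_rounds=3):
    """Solver -> Reviewer -> Refiner loop; agents are LLM-backed callables."""
    solution = solver(problem)
    for _ in range(max_rounds):
        feedback = reviewer(problem, solution)  # targeted, step-wise feedback
        if not feedback:                        # nothing left to fix: accept
            break
        solution = refiner(problem, solution, feedback)
    return solution
```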
1 code implementation • 11 Sep 2024 • Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
Knowledge conflict arises from discrepancies between information in the context of a large language model (LLM) and the knowledge stored in its parameters.
no code implementations • 25 Aug 2024 • Suyash Vardhan Mathur, Jainit Sushil Bafna, Kunal Kartik, Harshita Khandelwal, Manish Shrivastava, Vivek Gupta, Mohit Bansal, Dan Roth
With the evolution of AI models capable of multimodal reasoning, it is pertinent to assess their efficacy in handling such structured data.
no code implementations • 13 Aug 2024 • Prateek Yadav, Colin Raffel, Mohammed Muqeeth, Lucas Caccia, Haokun Liu, Tianlong Chen, Mohit Bansal, Leshem Choshen, Alessandro Sordoni
The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to a particular domain or task.
1 code implementation • 19 Jul 2024 • Swarnadeep Saha, Archiki Prasad, Justin Chih-Yao Chen, Peter Hase, Elias Stengel-Eskin, Mohit Bansal
To this end, we propose the System-1.x Planner, a controllable planning framework with LLMs that is capable of generating hybrid plans and balancing between the two planning modes based on the difficulty of the problem at hand.
1 code implementation • 9 Jul 2024 • Yue Zhang, Ziqiao Ma, Jialu Li, Yanyuan Qiao, Zun Wang, Joyce Chai, Qi Wu, Mohit Bansal, Parisa Kordjamshidi
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years and many approaches have emerged to advance their development.
no code implementations • 27 Jun 2024 • Peter Hase, Thomas Hofweber, Xiang Zhou, Elias Stengel-Eskin, Mohit Bansal
With this goal in mind, this paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research.
1 code implementation • 17 Jun 2024 • Amith Ananthram, Elias Stengel-Eskin, Carl Vondrick, Mohit Bansal, Kathleen McKeown
Moreover, while prompting in the language of a target culture can lead to reductions in bias, it is not a substitute for building AI more representative of the world's languages.
1 code implementation • 11 Jun 2024 • Haw-Shiuan Chang, Nanyun Peng, Mohit Bansal, Anil Ramakrishna, Tagyoung Chung
If an LLM's entropy is higher than the asymptotic entropy (i.e., the LLM is more uncertain than it should be), the THF model predicts a high hallucination hazard, which leads to a lower p threshold in REAL sampling.
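In pseudo-form, this decision rule amounts to shrinking the nucleus threshold when observed entropy exceeds the predicted asymptotic entropy; the linear shrinkage rule and all constants below are illustrative assumptions, not the paper's THF model.

```python
import numpy as np

def entropy(probs):
    p = probs[probs > 0]
    return float(-(p * np.log(p)).sum())

def real_style_top_p(next_token_probs, asymptotic_entropy,
                     base_p=0.95, sensitivity=0.5, min_p=0.1):
    """Lower the top-p threshold when the model looks over-uncertain."""
    gap = max(0.0, entropy(next_token_probs) - asymptotic_entropy)
    return max(min_p, base_p - sensitivity * gap)
```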
no code implementations • 5 Jun 2024 • Thomas Hofweber, Peter Hase, Elias Stengel-Eskin, Mohit Bansal
We consider both logical coherence norms as well as coherence norms tied to the strength of belief.
1 code implementation • 2 Jun 2024 • Ori Ernst, Ori Shapira, Aviv Slobodkin, Sharon Adar, Mohit Bansal, Jacob Goldberger, Ran Levy, Ido Dagan
Multi-document summarization (MDS) is a challenging task, often decomposed to subtasks of salience and redundancy detection, followed by text generation.
2 code implementations • 31 May 2024 • Elias Stengel-Eskin, Peter Hase, Mohit Bansal
To calibrate both implicit and explicit confidence markers, we introduce a pragmatic, listener-aware finetuning method (LACIE) that models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener.
1 code implementation • 29 May 2024 • Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal
Specifically, we incorporate multigranularity information into a tree-based representation, allowing VideoTree to extract query-relevant details from long videos in a coarse-to-fine manner.
2 code implementations • 28 May 2024 • Jaehong Yoon, Shoubin Yu, Mohit Bansal
(3) RACCooN also plans to imagine new objects in a given video, so users simply prompt the model to receive a detailed video editing plan for complex video editing.
no code implementations • 8 May 2024 • Xuehai He, Jian Zheng, Jacob Zhiyuan Fang, Robinson Piramuthu, Mohit Bansal, Vicente Ordonez, Gunnar A Sigurdsson, Nanyun Peng, Xin Eric Wang
Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps.
no code implementations • 15 Apr 2024 • Han Lin, Jaemin Cho, Abhay Zala, Mohit Bansal
ControlNets are widely used for adding spatial control to text-to-image diffusion models with different conditions, such as depth maps, scribbles/sketches, and human poses.
1 code implementation • 31 Mar 2024 • Qin Liu, Jaemin Cho, Mohit Bansal, Marc Niethammer
In light of this, we reintroduce this dense design into the generalist models, to facilitate the development of generalist models with high segmentation quality.
no code implementations • 30 Mar 2024 • Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, Felix Friedrich, Prateek Yadav, Tanmay Laud, Vu Minh Chien, Terry Yue Zhuo, Diganta Misra, Ben Bogin, Xuan-Son Vu, Marzena Karpinska, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota, Niklas Muennighoff, Suhas Pai, Tosin Adewumi, Veronika Laippala, Xiaozhe Yao, Adalberto Junior, Alpay Ariyak, Aleksandr Drozd, Jordan Clive, Kshitij Gupta, Liangyu Chen, Qi Sun, Ken Tsui, Noah Persaud, Nour Fahmy, Tianlong Chen, Mohit Bansal, Nicolo Monti, Tai Dang, Ziyang Luo, Tien-Tung Bui, Roberto Navigli, Virendra Mehta, Matthew Blumberg, Victor May, Huu Nguyen, Sampo Pyysalo
Despite these efforts, such models encounter challenges such as limited multilingual capabilities, risks of catastrophic forgetting during continual pretraining, and the high costs of training models from scratch, alongside the need to align with AI safety standards and regulatory frameworks.
no code implementations • 18 Mar 2024 • Abhay Zala, Jaemin Cho, Han Lin, Jaehong Yoon, Mohit Bansal
Then, we enable the LLM to continuously adapt the generated environments to progressively improve the skills that the agent is weak at, by providing feedback to the LLM in the form of the agent's performance.
1 code implementation • 13 Mar 2024 • Feng Cheng, Ziyang Wang, Yi-Lin Sung, Yan-Bo Lin, Mohit Bansal, Gedas Bertasius
Our DAM model outperforms prior state-of-the-art continual learning approaches by 9.1% while exhibiting 1.9% less forgetting on 6 VidQA datasets spanning various domains.
no code implementations • 11 Mar 2024 • Jialu Li, Jaemin Cho, Yi-Lin Sung, Jaehong Yoon, Mohit Bansal
In this paper, we introduce SELMA: Skill-Specific Expert Learning and Merging with Auto-Generated Data, a novel paradigm to improve the faithfulness of T2I models by fine-tuning models on automatically generated, multi-skill image-text datasets, with skill-specific expert learning and merging.
no code implementations • 4 Mar 2024 • David Wan, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal
Highlighting particularly relevant regions of an image can improve the performance of vision-language models (VLMs) on various vision-language (VL) tasks by guiding the model to attend more closely to these regions of interest.
no code implementations • 28 Feb 2024 • Alyssa Hwang, Kalpit Dixit, Miguel Ballesteros, Yassine Benajiba, Vittorio Castelli, Markus Dreyer, Mohit Bansal, Kathleen McKeown
We present NewsQs (news-cues), a dataset that provides question-answer pairs for multiple news documents.
1 code implementation • 27 Feb 2024 • Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang
Using this pipeline, we collect LoCoMo, a dataset of very long-term conversations, each encompassing 300 turns and 9K tokens on average, over up to 35 sessions.
1 code implementation • 20 Feb 2024 • Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers.
2 code implementations • 19 Feb 2024 • Jinhao Duan, Renming Zhang, James Diffenderfer, Bhavya Kailkhura, Lichao Sun, Elias Stengel-Eskin, Mohit Bansal, Tianlong Chen, Kaidi Xu
We further characterize the game-theoretic properties of LLMs, such as equilibrium and Pareto Efficiency in repeated games.
no code implementations • 13 Feb 2024 • Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Yuguang Yao, Chris Yuhao Liu, Xiaojun Xu, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu
We explore machine unlearning (MU) in the domain of large language models (LLMs), referred to as LLM unlearning.
1 code implementation • 9 Feb 2024 • Yichen Jiang, Xiang Zhou, Mohit Bansal
Transformers generalize to novel compositions of structures and entities after being trained on a complex dataset, but easily overfit on datasets of insufficient complexity.
1 code implementation • 8 Feb 2024 • Shoubin Yu, Jaehong Yoon, Mohit Bansal
Despite impressive advancements in recent multimodal reasoning approaches, they are still limited in flexibility and efficiency, as these models typically process only a few fixed modality inputs and require updates to numerous parameters.
Ranked #1 on Question Answering on SQA3D
no code implementations • 5 Feb 2024 • Jialu Li, Aishwarya Padmakumar, Gaurav Sukhatme, Mohit Bansal
Outdoor Vision-and-Language Navigation (VLN) requires an agent to navigate through realistic 3D outdoor environments based on natural language instructions.
1 code implementation • 2 Feb 2024 • Justin Chih-Yao Chen, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal
Experiments on seven widely used commonsense and math reasoning benchmarks show that MAGDi improves the reasoning capabilities of smaller models, outperforming several methods that distill from a single teacher and multiple teachers.
1 code implementation • 29 Jan 2024 • Elias Stengel-Eskin, Archiki Prasad, Mohit Bansal
While large language models (LLMs) are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality.
1 code implementation • 19 Jan 2024 • Xiyao Wang, YuHang Zhou, Xiaoyu Liu, Hongjin Lu, Yuancheng Xu, Feihong He, Jaehong Yoon, Taixi Lu, Gedas Bertasius, Mohit Bansal, Huaxiu Yao, Furong Huang
However, current MLLM benchmarks are predominantly designed to evaluate reasoning based on static information about a single image, and the ability of modern MLLMs to extrapolate from image sequences, which is essential for understanding our ever-changing world, has been less investigated.
1 code implementation • 12 Jan 2024 • Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe
In this paper, we present the surprising conclusion that current pretrained language models often generalize relatively well from easy to hard data, even performing as well as oracle models finetuned on hard data.
2 code implementations • 10 Jan 2024 • Yue Huang, Lichao Sun, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao liu, Heng Ji, Hongyi Wang, huan zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, Joaquin Vanschoren, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yong Chen, Yue Zhao
This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions.
1 code implementation • CVPR 2024 • Qin Liu, Jaemin Cho, Mohit Bansal, Marc Niethammer
In light of this, we reintroduce this dense design into the generalist models to facilitate the development of generalist models with high segmentation quality.
no code implementations • CVPR 2024 • Zineng Tang, ZiYi Yang, Mahmoud Khademi, Yang Liu, Chenguang Zhu, Mohit Bansal
We present CoDi-2, a Multimodal Large Language Model (MLLM) for learning in-context interleaved multimodal representations.
1 code implementation • 28 Dec 2023 • Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius
Furthermore, we show that a specialized prompt that asks the LLM first to summarize the noisy short-term visual captions and then answer a given input question leads to a significant LVQA performance boost.
Ranked #4 on Zero-Shot Video Question Answer on NExT-GQA
1 code implementation • 7 Dec 2023 • Derek Tam, Mohit Bansal, Colin Raffel
Model merging aims to cheaply combine individual task-specific models into a single multitask model.
no code implementations • 30 Nov 2023 • Zineng Tang, ZiYi Yang, Mahmoud Khademi, Yang Liu, Chenguang Zhu, Mohit Bansal
We present CoDi-2, a versatile and interactive Multimodal Large Language Model (MLLM) that can follow complex multimodal interleaved instructions, conduct in-context learning (ICL), reason, chat, edit, etc., in an any-to-any input-output modality paradigm.
1 code implementation • 28 Nov 2023 • Vaidehi Patil, Adyasha Maharana, Mohit Bansal
In this paper, we study bias arising from confounders in a causal graph for multimodal data and examine a novel approach that leverages causally-motivated information minimization to learn the confounder representations.
1 code implementation • 22 Nov 2023 • Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal
Despite the efficiency of PEFT methods, the size of expert models can make it onerous to retrieve expert models per query over high-latency networks like the Internet or serve multiple experts on a single GPU.
no code implementations • CVPR 2024 • Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao
This optimization process is controlled by a gradient modification mechanism to prevent the shared head from losing previously acquired information.
1 code implementation • 8 Nov 2023 • Archiki Prasad, Alexander Koller, Mareike Hartmann, Peter Clark, Ashish Sabharwal, Mohit Bansal, Tushar Khot
Large Language Models (LLMs) are increasingly being used for interactive decision-making tasks requiring planning and adapting to the environment.
1 code implementation • 8 Nov 2023 • Xiang Zhou, Yichen Jiang, Mohit Bansal
However, in contrast to this poor performance, state-of-the-art models trained on larger and more general datasets show better generalization ability.
no code implementations • 27 Oct 2023 • Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang
With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above.
no code implementations • 23 Oct 2023 • Swarnadeep Saha, Omer Levy, Asli Celikyilmaz, Mohit Bansal, Jason Weston, Xian Li
Large Language Models (LLMs) are frequently used for multi-faceted language generation and evaluation tasks that involve satisfying intricate user constraints or taking into account multiple aspects and criteria.
no code implementations • 18 Oct 2023 • Abhay Zala, Han Lin, Jaemin Cho, Mohit Bansal
In the second stage, we use a diagram generator, DiagramGLIGEN, and a text label rendering module to generate diagrams (with clear text labels) following the diagram plans.
1 code implementation • 16 Oct 2023 • Leonardo F. R. Ribeiro, Mohit Bansal, Markus Dreyer
Readability refers to how easily a reader can understand a written text.
1 code implementation • 11 Oct 2023 • Adyasha Maharana, Prateek Yadav, Mohit Bansal
There are two dominant approaches: (1) geometry-based data selection for maximizing data diversity in the coreset, and (2) functions that assign difficulty scores to samples based on training dynamics.
1 code implementation • 9 Oct 2023 • Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
An increasing number of vision-language tasks can be handled with little to no training, i.e., in a zero- or few-shot manner, by marrying large language models (LLMs) to vision encoders, resulting in large vision-language models (LVLMs).
no code implementations • 4 Oct 2023 • Yi-Lin Sung, Jaehong Yoon, Mohit Bansal
We first determine the sparsity ratios of different layers or blocks by leveraging the global importance score, which is efficiently computed based on the zeroth-order approximation of the global model gradients.
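A zeroth-order gradient approximation of the kind referenced here can be computed with forward passes only; this two-point random-perturbation estimator is a standard form, with names and constants chosen for illustration.

```python
import numpy as np

def zeroth_order_grad(loss_fn, w, eps=1e-3, n_samples=8, seed=0):
    """Estimate grad loss(w) from finite differences along random directions."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(w)
    for _ in range(n_samples):
        z = rng.standard_normal(w.shape)
        # central difference of the loss along direction z, no backprop needed
        g += (loss_fn(w + eps * z) - loss_fn(w - eps * z)) / (2 * eps) * z
    return g / n_samples
```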
1 code implementation • 2 Oct 2023 • Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen
Sparsely activated Mixture-of-Experts (SMoE) models have shown promise in scaling up the learning capacity of neural networks; however, they have issues like (a) high memory usage, due to duplication of the network layers into multiple copies as experts, and (b) redundancy in experts, as common learning-based routing policies suffer from representational collapse.
1 code implementation • 1 Oct 2023 • Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, Huaxiu Yao
Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages.
1 code implementation • 29 Sep 2023 • Vaidehi Patil, Peter Hase, Mohit Bansal
Experimentally, we show that even state-of-the-art model editing methods such as ROME struggle to truly delete factual information from models like GPT-J, as our whitebox and blackbox attacks can recover "deleted" information from an edited model 38% of the time.
no code implementations • 26 Sep 2023 • Han Lin, Abhay Zala, Jaemin Cho, Mohit Bansal
Our experiments demonstrate that our proposed VideoDirectorGPT framework substantially improves layout and movement control in both single- and multi-scene video generation and can generate multi-scene videos with consistency, while achieving competitive performance with SOTAs in open-domain single-scene T2V generation.
2 code implementations • 22 Sep 2023 • Justin Chih-Yao Chen, Swarnadeep Saha, Mohit Bansal
In each round, ReConcile initiates discussion between agents via a 'discussion prompt' that consists of (a) grouped answers and explanations generated by each agent in the previous round, (b) their confidence scores, and (c) demonstrations of answer-rectifying human explanations, used for convincing other agents.
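A sketch of how such a discussion prompt might be assembled from the previous round's results; the field names and demo text are placeholders, not the paper's exact template.

```python
def discussion_prompt(question, prior_round, demo):
    """Build a round-t prompt from round t-1 answers, explanations, confidences."""
    grouped = "\n".join(
        f"- {r['agent']}: {r['answer']} (confidence {r['confidence']:.2f})\n"
        f"  reasoning: {r['explanation']}"
        for r in prior_round
    )
    return (f"Question: {question}\n\n"
            f"Answers from the previous round:\n{grouped}\n\n"
            f"Answer-rectifying human explanation:\n{demo}\n\n"
            f"Reconsider and give your final answer.")
```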
1 code implementation • ICCV 2023 • Ziyang Wang, Yi-Lin Sung, Feng Cheng, Gedas Bertasius, Mohit Bansal
Specifically, our model captures the cross-modal similarity information at different granularity levels.
Ranked #12 on Video Retrieval on MSR-VTT
1 code implementation • ICCV 2023 • Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao
Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents.
no code implementations • 5 Jul 2023 • Prateek Yadav, Qing Sun, Hantian Ding, Xiaopeng Li, Dejiao Zhang, Ming Tan, Xiaofei Ma, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Mohit Bansal, Bing Xiang
Large-scale code generation models such as Codex and CodeT5 have achieved impressive performance.
no code implementations • 4 Jul 2023 • Jonathan Pilault, Can Liu, Mohit Bansal, Markus Dreyer
Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks.
1 code implementation • 15 Jun 2023 • Swarnadeep Saha, Peter Hase, Mohit Bansal
We first show that teacher LLMs can indeed intervene on student reasoning to improve their performance.
3 code implementations • NeurIPS 2023 • Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal
To address this, we propose our method, TRIM, ELECT SIGN & MERGE (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign.
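A compact NumPy sketch of the three steps on flat parameter vectors (real checkpoints would be processed tensor by tensor); the density constant and the disjoint-mean details are simplifications of the paper's procedure.

```python
import numpy as np

def ties_merge(pretrained, finetuned_models, density=0.2):
    """Merge fine-tuned checkpoints into one model via Trim/Elect/Merge."""
    tasks = [ft - pretrained for ft in finetuned_models]   # task vectors

    # (1) Trim: keep only the top-`density` fraction of each task vector
    #     by magnitude, resetting the small deltas to zero.
    trimmed = []
    for t in tasks:
        k = max(1, int(density * t.size))
        thresh = np.sort(np.abs(t).ravel())[-k]
        trimmed.append(np.where(np.abs(t) >= thresh, t, 0.0))

    # (2) Elect sign: per parameter, pick the sign with the larger total mass.
    elected = np.sign(sum(trimmed))

    # (3) Merge: average only the deltas that agree with the elected sign.
    total = np.zeros_like(pretrained, dtype=float)
    count = np.zeros_like(pretrained, dtype=float)
    for t in trimmed:
        mask = (np.sign(t) == elected) & (t != 0)
        total += np.where(mask, t, 0.0)
        count += mask
    merged_delta = np.where(count > 0, total / np.maximum(count, 1), 0.0)
    return pretrained + merged_delta
```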
1 code implementation • 27 May 2023 • Yu Zhou, Sha Li, Manling Li, Xudong Lin, Shih-Fu Chang, Mohit Bansal, Heng Ji
To automate the induction of such graph scripts for given tasks, we propose to take advantage of loosely aligned videos of people performing the tasks.
1 code implementation • 26 May 2023 • Shiyue Zhang, Shijie Wu, Ozan Irsoy, Steven Lu, Mohit Bansal, Mark Dredze, David Rosenberg
Autoregressive language models are trained by minimizing the cross-entropy of the model distribution Q relative to the data distribution P -- that is, minimizing the forward cross-entropy, which is equivalent to maximum likelihood estimation (MLE).
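Written out, the identity this sentence relies on is the standard one (nothing here is specific to the paper):

```latex
H(P, Q) \;=\; \mathbb{E}_{x \sim P}\!\left[-\log Q(x)\right]
\;\approx\; -\frac{1}{N}\sum_{i=1}^{N} \log Q(x_i), \qquad x_i \sim P,
```

so minimizing the forward cross-entropy over a finite training corpus is exactly maximizing the log-likelihood of that corpus under Q.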
no code implementations • 24 May 2023 • Jaemin Cho, Abhay Zala, Mohit Bansal
First, we introduce VPGen, an interpretable step-by-step T2I generation framework that decomposes T2I generation into three steps: object/count generation, layout generation, and image generation.
2 code implementations • NeurIPS 2023 • Zineng Tang, ZiYi Yang, Chenguang Zhu, Michael Zeng, Mohit Bansal
We present Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities.
Ranked #8 on Audio Generation on AudioCaps (FAD metric)
1 code implementation • NeurIPS 2023 • Zhenhailong Wang, Ansel Blume, Sha Li, Genglin Liu, Jaemin Cho, Zineng Tang, Mohit Bansal, Heng Ji
Action knowledge involves the understanding of textual, visual, and temporal aspects of actions.
Ranked #42 on Video Question Answering on NExT-QA (using extra training data)
1 code implementation • NeurIPS 2023 • Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal
The SeViLA framework consists of two modules: Localizer and Answerer, where both are parameter-efficiently fine-tuned from BLIP-2.
Ranked #2 on Video Question Answering on NExT-QA (Efficient)
1 code implementation • 8 May 2023 • David Wan, Shiyue Zhang, Mohit Bansal
Cache-LMs, which augment LMs with a memory of recent history, can increase context dependency and have shown remarkable performance in diverse language generation tasks.
1 code implementation • 28 Apr 2023 • Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal, Lijuan Wang
In this paper, we expand on this concept to a multimodal setup by merging transformers trained on different modalities.
1 code implementation • 21 Apr 2023 • Archiki Prasad, Swarnadeep Saha, Xiang Zhou, Mohit Bansal
Multi-step reasoning ability is fundamental to many natural language tasks, yet it is unclear what constitutes a good reasoning chain and how to evaluate them.
2 code implementations • 13 Apr 2023 • Jaemin Cho, Linjie Li, Zhengyuan Yang, Zhe Gan, Lijuan Wang, Mohit Bansal
In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape.
no code implementations • CVPR 2023 • Jialu Li, Mohit Bansal
We then fine-tune the agent on the VLN task with an auxiliary loss that minimizes the difference between the view semantics generated by the agent and the ground truth view semantics of the next step.
1 code implementation • CVPR 2023 • Abhay Zala, Jaemin Cho, Satwik Kottur, Xilun Chen, Barlas Oğuz, Yashar Mehdad, Mohit Bansal
Our hierarchical benchmark consists of video retrieval, moment retrieval, and two novel moment segmentation and step captioning tasks.
1 code implementation • 28 Mar 2023 • Adyasha Maharana, Amita Kamath, Christopher Clark, Mohit Bansal, Aniruddha Kembhavi
As general purpose vision models get increasingly effective at a wide set of tasks, it is imperative that they be consistent across the tasks they support.
1 code implementation • 6 Mar 2023 • David Wan, Mengwen Liu, Kathleen McKeown, Markus Dreyer, Mohit Bansal
We present a systematic study of the effect of generation techniques such as beam search and nucleus sampling on faithfulness in abstractive summarization.
1 code implementation • NeurIPS 2023 • Peter Hase, Mohit Bansal, Been Kim, Asma Ghandeharioun
This finding raises questions about how past work relies on Causal Tracing to select which model layers to edit.
no code implementations • 16 Dec 2022 • Swarnadeep Saha, Xinyan Velocity Yu, Mohit Bansal, Ramakanth Pasunuru, Asli Celikyilmaz
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning.
1 code implementation • CVPR 2023 • Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius
To do so, we propose a latent audio-visual hybrid (LAVISH) adapter that adapts pretrained ViTs to audio-visual tasks by injecting a small number of trainable parameters into every layer of a frozen ViT.
Ranked #4 on Audio-visual Question Answering on MUSIC-AVQA
1 code implementation • CVPR 2023 • Feng Cheng, Xizi Wang, Jie Lei, David Crandall, Mohit Bansal, Gedas Bertasius
Furthermore, our model also obtains state-of-the-art video question-answering results on ActivityNet-QA, MSRVTT-QA, MSRVTT-MC and TVQA.
Ranked #2 on Video Retrieval on Condensed Movies (using extra training data)
4 code implementations • CVPR 2023 • Zineng Tang, ZiYi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal
UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation.
Ranked #5 on Visual Question Answering (VQA) on InfographicVQA (using extra training data)
1 code implementation • 28 Nov 2022 • Yichen Jiang, Xiang Zhou, Mohit Bansal
Recent datasets expose the lack of the systematic generalization ability in standard sequence-to-sequence models.
1 code implementation • 21 Nov 2022 • Zineng Tang, Jaemin Cho, Jie Lei, Mohit Bansal
We present Perceiver-VL, a vision-and-language framework that efficiently handles high-dimensional multimodal inputs such as long videos and text.
1 code implementation • 15 Nov 2022 • Derek Tam, Anisha Mascarenhas, Shiyue Zhang, Sarah Kwan, Mohit Bansal, Colin Raffel
To generate summaries that are factually inconsistent, we generate summaries from a suite of summarization models that we have manually annotated as factually inconsistent.
1 code implementation • 14 Nov 2022 • Swarnadeep Saha, Peter Hase, Nazneen Rajani, Mohit Bansal
We observe that (1) GPT-3 explanations are as grammatical as human explanations regardless of the hardness of the test samples, (2) for easy examples, GPT-3 generates highly supportive explanations but human explanations are more generalizable, and (3) for hard examples, human explanations are significantly better than GPT-3 explanations both in terms of label-supportiveness and generalizability judgements.
1 code implementation • 4 Nov 2022 • David Wan, Mohit Bansal
Current metrics for evaluating factuality for abstractive document summarization have achieved high correlations with human judgment, but they do not account for the vision modality and thus are not adequate for vision-and-language summarization.
1 code implementation • 18 Oct 2022 • Prateek Yadav, Mohit Bansal
Although there is no forgetting, the performance of SupSup is sub-optimal because fixed weights restrict its representational power.
3 code implementations • 28 Sep 2022 • Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal
In this work, we present the Textless Vision-Language Transformer (TVLT), where homogeneous transformer blocks take raw visual and audio inputs for vision-and-language representation learning with minimal modality-specific design, and do not use text-specific modules such as tokenization or automatic speech recognition (ASR).
1 code implementation • 21 Sep 2022 • Swarnadeep Saha, Shiyue Zhang, Peter Hase, Mohit Bansal
We demonstrate that SP-Search effectively represents the generative process behind human summaries using modules that are typically faithful to their intended behavior.
1 code implementation • 13 Sep 2022 • Adyasha Maharana, Darryl Hannan, Mohit Bansal
Hence, we first propose the task of story continuation, where the generated visual story is conditioned on a source image, allowing for better generalization to narratives with new characters.
Ranked #3 on Story Continuation on FlintstonesSV
1 code implementation • 8 Sep 2022 • Shiyue Zhang, David Wan, Mohit Bansal
Though extractive summarization is less prone to the common unfaithfulness issues of abstractive summaries, does that mean extractive is equal to faithful?
1 code implementation • 25 Jul 2022 • Yonatan Bitton, Nitzan Bitton Guetta, Ron Yosef, Yuval Elovici, Mohit Bansal, Gabriel Stanovsky, Roy Schwartz
While vision-and-language models perform well on tasks such as visual question answering, they struggle when it comes to basic human commonsense reasoning skills.
Ranked #1 on Common Sense Reasoning on WinoGAViL
1 code implementation • NAACL 2022 • Hyounghun Kim, Abhay Zala, Mohit Bansal
Next, a counterfactual imagined scene change (in textual form) is applied, and the model has to predict the new response to the initial question based on this scene change.
1 code implementation • NAACL (ACL) 2022 • Yinuo Hu, Shiyue Zhang, Viji Sathy, A. T. Panter, Mohit Bansal
Ten university professors from diverse departments serve as evaluators of the system and all agree that SETSum helps them interpret SET results more efficiently; and 6 out of 10 instructors prefer our system over the standard static PDF report (while the remaining 4 would like to have both).
1 code implementation • Findings (NAACL) 2022 • Jialu Li, Hao Tan, Mohit Bansal
Empirically, on the Room-Across-Room dataset, we show that our multilingual agent gets large improvements in all metrics over the strong baseline model when generalizing to unseen environments with the cross-lingual language representation and the environment-agnostic visual representation.
1 code implementation • NAACL 2022 • Xiang Zhou, Shiyue Zhang, Mohit Bansal
MPoSM can model arbitrary tag dependency and perform POS induction through the objective of masked POS reconstruction.
1 code implementation • 22 Jun 2022 • Zhuofan Ying, Peter Hase, Mohit Bansal
In this paper, we show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason (RRR) metrics by optimizing for four key model objectives: (1) accurate predictions given limited but sufficient information (Sufficiency); (2) max-entropy predictions given no important information (Uncertainty); (3) invariance of predictions to changes in unimportant features (Invariance); and (4) alignment between model FI explanations and human FI explanations (Plausibility).
no code implementations • 15 Jun 2022 • Sha Li, Mahdi Namazifar, Di Jin, Mohit Bansal, Heng Ji, Yang Liu, Dilek Hakkani-Tur
Providing conversation models with background knowledge has been shown to make open-domain dialogues more informative and engaging.
2 code implementations • 13 Jun 2022 • Yi-Lin Sung, Jaemin Cho, Mohit Bansal
LST saves 69% of the memory cost of fine-tuning the whole network, while other methods save only 26% at similar parameter usage (hence, 2.7x more memory savings).
5 code implementations • 9 Jun 2022 • Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. 
Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. 
Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu
BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.
2 code implementations • 7 Jun 2022 • Jie Lei, Tamara L. Berg, Mohit Bansal
Training an effective video-and-language model intuitively requires multiple frames as model inputs.
Ranked #5 on Video Retrieval on SSv2-template retrieval (using extra training data)
1 code implementation • Findings (NAACL) 2022 • Jaemin Cho, Seunghyun Yoon, Ajinkya Kale, Franck Dernoncourt, Trung Bui, Mohit Bansal
Toward more descriptive and distinctive caption generation, we propose using CLIP, a multimodal encoder trained on huge amounts of image-text pairs from the web, to compute multimodal similarity and use it as a reward function.
Ranked #26 on Image Captioning on COCO Captions
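As a rough sketch of the idea, such a reward can be computed as the cosine similarity between CLIP's image and text embeddings; the snippet below uses the Hugging Face transformers CLIP checkpoint as an assumed stand-in for the paper's exact setup.

```python
# Minimal sketch: cosine similarity between a generated caption and its image,
# used as a scalar reward in an RL-style captioning objective. The checkpoint
# and reward scaling here are illustrative assumptions.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_reward(image, caption: str) -> float:
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (img_emb * txt_emb).sum(-1).item()  # cosine similarity in [-1, 1]
```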
1 code implementation • 22 May 2022 • Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, ZiYi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji
The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples, such as domain-specific captioning, question answering, and future event prediction.
no code implementations • insights (ACL) 2022 • Hyounghun Kim, Aishwarya Padmakumar, Di Jin, Mohit Bansal, Dilek Hakkani-Tur
Natural language guided embodied task completion is a challenging problem since it requires understanding natural language instructions, aligning them with egocentric visual observations, and choosing appropriate actions to execute in the environment to produce desired changes.
1 code implementation • NAACL 2022 • David Wan, Mohit Bansal
We present FactPEGASUS, an abstractive summarization model that addresses the problem of factuality during pre-training and fine-tuning: (1) We augment the sentence selection strategy of PEGASUS's (Zhang et al., 2020) pre-training objective to create pseudo-summaries that are both important and factual; (2) We introduce three complementary components for fine-tuning.
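A minimal sketch of the kind of importance-plus-factuality sentence selection described in (1); the ROUGE-based importance score follows the PEGASUS recipe, while `factuality_score` is a hypothetical placeholder for a factual-consistency model, and the multiplicative combination is an assumption, not the paper's exact objective.

```python
# Illustrative sketch: pick a pseudo-summary sentence that is both important
# (high ROUGE overlap with the rest of the document) and factual (high score
# under some factuality model).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)

def factuality_score(sentence: str, context: str) -> float:
    raise NotImplementedError  # placeholder for a factual-consistency model

def select_pseudo_summary(sentences: list[str]) -> str:
    best, best_score = None, float("-inf")
    for i, sent in enumerate(sentences):
        rest = " ".join(sentences[:i] + sentences[i + 1:])
        importance = scorer.score(rest, sent)["rouge1"].fmeasure
        score = importance * factuality_score(sent, rest)  # assumed combination
        if score > best_score:
            best, best_score = sent, score
    return best
```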
2 code implementations • 11 May 2022 • Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, Colin Raffel
In-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Ranked #1 on Few-Shot Text Classification on RAFT
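The parameter-efficient alternative the paper proposes, (IA)^3, learns small vectors that rescale inner activations of a frozen model; a simplified PyTorch sketch of that idea follows (placement inside a real Transformer block is abstracted away).

```python
# Minimal sketch of (IA)^3-style rescaling: learned vectors that multiply
# attention keys, attention values, and FFN activations element-wise, while
# the frozen base model's weights stay untouched. Shapes are simplified.
import torch
import torch.nn as nn

class IA3Scaling(nn.Module):
    def __init__(self, d_k: int, d_v: int, d_ff: int):
        super().__init__()
        self.l_k = nn.Parameter(torch.ones(d_k))    # rescales attention keys
        self.l_v = nn.Parameter(torch.ones(d_v))    # rescales attention values
        self.l_ff = nn.Parameter(torch.ones(d_ff))  # rescales FFN activations

    def scale_keys(self, k):   return k * self.l_k
    def scale_values(self, v): return v * self.l_v
    def scale_ffn(self, h):    return h * self.l_ff
```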
1 code implementation • Findings (NAACL) 2022 • Arthur Bražinskas, Ramesh Nallapati, Mohit Bansal, Markus Dreyer
In the same vein, we pre-train the adapters in a query-based manner on customer reviews and then fine-tune them on annotated datasets.
no code implementations • AMTA 2022 • Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzman
Since a skewed data distribution is considered to be harmful, a sampling strategy is usually used to balance languages in the corpus.
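One widely used such strategy is temperature-based sampling, where language i is drawn with probability proportional to (n_i / Σ_j n_j)^(1/T); a small sketch follows, with the caveat that the exact strategy analyzed in the paper may differ.

```python
# Temperature-based language sampling: T = 1 keeps the natural (skewed)
# distribution; larger T flattens it toward uniform. A common recipe for
# balancing multilingual corpora.
def sampling_probs(sizes: dict[str, int], T: float = 5.0) -> dict[str, float]:
    total = sum(sizes.values())
    weights = {lang: (n / total) ** (1.0 / T) for lang, n in sizes.items()}
    z = sum(weights.values())
    return {lang: w / z for lang, w in weights.items()}

# e.g. sampling_probs({"en": 1_000_000, "ro": 50_000}) upweights "ro"
```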
1 code implementation • ACL 2022 • Shiyue Zhang, Ben Frey, Mohit Bansal
We hope that our work serves not only to inform the NLP community about Cherokee, but also to provide inspiration for future work on endangered languages in general.
3 code implementations • NAACL 2022 • Leonardo F. R. Ribeiro, Mengwen Liu, Iryna Gurevych, Markus Dreyer, Mohit Bansal
Despite recent improvements in abstractive summarization, most current approaches generate summaries that are not factually consistent with the source document, which severely limits their trustworthiness and usage in real-world applications.
1 code implementation • ACL 2022 • Swarnadeep Saha, Prateek Yadav, Mohit Bansal
In this work, we study pre-trained language models that generate explanation graphs in an end-to-end manner and analyze their ability to learn the structural constraints and semantics of such graphs.
1 code implementation • 6 Apr 2022 • Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius
We introduce an audiovisual method for long-range text-to-video retrieval.
1 code implementation • CVPR 2022 • Jialu Li, Hao Tan, Mohit Bansal
Training on these edit-augmented environments prevents the agent from overfitting to existing environments and helps generalize better to new, unseen environments.
Ranked #2 on Vision and Language Navigation on RxR (using extra training data)
2 code implementations • 14 Mar 2022 • Archiki Prasad, Peter Hase, Xiang Zhou, Mohit Bansal
Providing natural language instructions in prompts is a useful new paradigm for improving task performance of large language models in a zero-shot setting.
no code implementations • 10 Mar 2022 • Jie Lei, Xinlei Chen, Ning Zhang, Mengjiao Wang, Mohit Bansal, Tamara L. Berg, Licheng Yu
In this work, we propose LoopITR, which combines them in the same network for joint learning.
1 code implementation • 24 Feb 2022 • Hyounghun Kim, Doo Soon Kim, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Mohit Bansal
To our knowledge, this is the first dataset that provides conversational image search and editing annotations, where the agent holds a grounded conversation with users and helps them to search and edit images according to their requests.
2 code implementations • ICCV 2023 • Jaemin Cho, Abhay Zala, Mohit Bansal
In this work, we investigate the visual reasoning capabilities and social biases of different text-to-image models, covering both multimodal transformer language models and diffusion models.
2 code implementations • 20 Dec 2021 • Revanth Gangi Reddy, Xilin Rui, Manling Li, Xudong Lin, Haoyang Wen, Jaemin Cho, Lifu Huang, Mohit Bansal, Avirup Sil, Shih-Fu Chang, Alexander Schwing, Heng Ji
Specifically, the task involves multi-hop questions that require reasoning over image-caption pairs to identify the grounded visual object being referred to and then predicting a span from the news body text to answer the question.
2 code implementations • NAACL 2022 • Ori Ernst, Avi Caciularu, Ori Shapira, Ramakanth Pasunuru, Mohit Bansal, Jacob Goldberger, Ido Dagan
Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.
1 code implementation • Findings (EMNLP) 2021 • Yichen Jiang, Mohit Bansal
On examples with a maximum source and target length of 30 from De-En, WMT'16 English-Romanian, and WMT'21 English-Chinese translation tasks, our learned order outperforms all heuristic generation orders on four out of six tasks.
no code implementations • 16 Dec 2021 • Lisa Bauer, Karthik Gopalakrishnan, Spandana Gella, Yang Liu, Mohit Bansal, Dilek Hakkani-Tur
We define three broad classes of task descriptions for these tasks: statement, question, and completion, with numerous lexical variants within each class.
1 code implementation • CVPR 2022 • Yi-Lin Sung, Jaemin Cho, Mohit Bansal
Our results demonstrate that training the adapter with the weight-sharing technique (4.18% of total parameters for image-text tasks and 3.39% for video-text tasks) can match the performance of fine-tuning the entire model.
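For intuition, a bottleneck adapter whose weights are shared across tasks might look like the simplified sketch below; the sharing granularity and placement used in the paper may differ from this assumption.

```python
# Bottleneck adapter: down-project -> nonlinearity -> up-project + residual.
# Reusing one adapter's weights across tasks is what keeps the trainable
# parameter count to a few percent of the full model. Simplified sketch.
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual connection

shared_adapter = Adapter(d_model=768)  # reused across tasks (assumed setup)
```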
1 code implementation • 8 Dec 2021 • Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang
Based on this, we ask an even bolder question: can we have an all-MLP architecture for VL modeling, where both VL fusion and the vision encoder are replaced with MLPs?
1 code implementation • NeurIPS 2021 • Jie Lei, Tamara Berg, Mohit Bansal
Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t.
Ranked #6 on Video Grounding on QVHighlights
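A hypothetical annotation record matching the description above might look as follows; the field names and values are illustrative, not the dataset's released schema.

```python
# Hypothetical annotation record for a moment-retrieval/highlight dataset of
# this kind; field names are illustrative, not the released schema.
annotation = {
    "video_id": "example_clip_001",
    "query": "A woman unpacks camera gear on a wooden table.",  # free-form NL
    "relevant_windows": [[12.0, 34.0], [50.0, 58.0]],  # [start_s, end_s] moments
}
```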
1 code implementation • 26 Nov 2021 • Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer
In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks.
1 code implementation • 1 Nov 2021 • Prateek Yadav, Peter Hase, Mohit Bansal
Current approaches try to optimize for the cost incurred by users when adopting a recourse, but they assume that all users share the same cost function.
1 code implementation • 21 Oct 2021 • Adyasha Maharana, Mohit Bansal
Prior work in this domain has shown that there is ample room for improvement in the generated image sequence in terms of visual quality, consistency and relevance.
1 code implementation • 30 Sep 2021 • Yichen Jiang, Mohit Bansal
Motivated by the failure of a Transformer model on the SCAN compositionality challenge (Lake and Baroni, 2018), which requires parsing a command into actions, we propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics, as additional training supervision.
1 code implementation • EMNLP (ACL) 2021 • Eran Hirsch, Alon Eirew, Ori Shapira, Avi Caciularu, Arie Cattan, Ori Ernst, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Ido Dagan
We introduce iFacetSum, a web application for exploring topical document sets.
1 code implementation • EMNLP 2021 • Shiyue Zhang, Mohit Bansal
In this work, we propose flexible semi-automatic to automatic summary evaluation metrics, following the Pyramid human evaluation method.
1 code implementation • ACL 2021 • Zineng Tang, Shiyue Zhang, Hyounghun Kim, Mohit Bansal
Recent years have witnessed various types of generative models for natural language generation (NLG), especially RNN- or Transformer-based sequence-to-sequence models, as well as variational autoencoder (VAE) and generative adversarial network (GAN) based models.
no code implementations • ACL 2021 • Yi Fung, Christopher Thomas, Revanth Gangi Reddy, Sandeep Polisetty, Heng Ji, Shih-Fu Chang, Kathleen McKeown, Mohit Bansal, Avi Sil
To defend against machine-generated fake news, an effective mechanism is urgently needed.
1 code implementation • ACL 2021 • Jie Lei, Tamara L. Berg, Mohit Bansal
We introduce mTVR, a large-scale multilingual video moment retrieval dataset, containing 218K English and Chinese queries from 21.8K TV show video clips.
1 code implementation • ACL 2021 • Shiyue Zhang, Asli Celikyilmaz, Jianfeng Gao, Mohit Bansal
Furthermore, we find that widely used automatic evaluation metrics (ROUGE, BERTScore) are weakly correlated with human judgments on this email thread summarization task.
Ranked #1 on Email Thread Summarization on EmailSum (short)
2 code implementations • ACL 2021 • Shiyue Zhang, Benjamin Frey, Mohit Bansal
The quantitative evaluation demonstrates that our backbone translation models achieve state-of-the-art translation performance and our quality estimation correlates well with both BLEU and human judgment.
4 code implementations • 20 Jul 2021 • Jie Lei, Tamara L. Berg, Mohit Bansal
Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t.
Ranked #18 on Highlight Detection on QVHighlights
4 code implementations • 13 Jul 2021 • Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Zhewei Yao, Kurt Keutzer
Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using a relatively small set of manually-annotated data (as compared to web-crawled data), to perceive the visual world.
Ranked #4 on Vision and Language Navigation on RxR (using extra training data)
1 code implementation • NeurIPS 2021 • Zineng Tang, Jaemin Cho, Hao Tan, Mohit Bansal
We train a multi-modal teacher model on a video-text dataset, and then transfer its knowledge to a student language model with a text dataset.
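A minimal sketch of such teacher-to-student transfer as feature-level distillation; the loss choice and which representations are matched are simplifying assumptions here, as the paper's objectives are more involved.

```python
# Sketch of cross-modal knowledge distillation: align the text-side hidden
# states of a student language model with those of a frozen video-text
# teacher on the same text. MSE is an illustrative choice of loss.
import torch
import torch.nn.functional as F

def distill_loss(student_hidden: torch.Tensor,
                 teacher_hidden: torch.Tensor) -> torch.Tensor:
    # student_hidden, teacher_hidden: (batch, seq_len, d_model)
    return F.mse_loss(student_hidden, teacher_hidden.detach())
```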
1 code implementation • 21 Jun 2021 • Hao Tan, Jie Lei, Thomas Wolf, Mohit Bansal
Unlike language, where the text tokens are more independent, neighboring video tokens typically have strong correlations (e.g., consecutive video frames usually look very similar), and hence uniformly masking individual tokens will make the task too trivial to learn useful representations.
Ranked #11 on Action Recognition on Diving-48
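One way to avoid that triviality, and roughly what block-wise masking does, is to mask contiguous spatiotemporal cubes of tokens instead of independent ones; a sketch follows, with block size and masking ratio as illustrative choices.

```python
# Block-wise masking sketch: mask whole spatiotemporal cubes of tokens so the
# model cannot trivially copy a masked token from its near-identical neighbor.
import numpy as np

def block_mask(t: int, h: int, w: int, block: int = 2, ratio: float = 0.5):
    mask = np.zeros((t, h, w), dtype=bool)
    n_blocks = (t // block) * (h // block) * (w // block)
    n_masked = int(ratio * n_blocks)
    for idx in np.random.choice(n_blocks, n_masked, replace=False):
        ti, rem = divmod(idx, (h // block) * (w // block))
        hi, wi = divmod(rem, w // block)
        mask[ti*block:(ti+1)*block,
             hi*block:(hi+1)*block,
             wi*block:(wi+1)*block] = True
    return mask
```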
no code implementations • Joint Conference on Lexical and Computational Semantics 2021 • Duccio Pappadopulo, Lisa Bauer, Marco Farina, Ozan İrsoy, Mohit Bansal
In this paper, we apply DAG-LSTMs to the conversation disentanglement task.
no code implementations • 14 Jun 2021 • Jiaao Chen, Derek Tam, Colin Raffel, Mohit Bansal, Diyi Yang
NLP has achieved great progress in the past decade through the use of neural models and large labeled datasets.
1 code implementation • 8 Jun 2021 • Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara Lee Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu
Most existing video-and-language (VidL) research focuses on a single dataset, or multiple datasets of a single task.
1 code implementation • NAACL 2021 • Swarnadeep Saha, Prateek Yadav, Mohit Bansal
In order to jointly learn from all proof graphs and exploit the correlations between multiple proofs for a question, we pose this task as a set generation problem over structured output spaces where each proof is represented as a directed graph.
1 code implementation • NAACL 2021 • Yichen Jiang, Asli Celikyilmaz, Paul Smolensky, Paul Soulos, Sudha Rao, Hamid Palangi, Roland Fernandez, Caitlin Smith, Mohit Bansal, Jianfeng Gao
On several syntactic and semantic probing tasks, we demonstrate the emergent structural information in the role vectors and improved syntactic interpretability in the TPR layer outputs.
1 code implementation • NeurIPS 2021 • Peter Hase, Harry Xie, Mohit Bansal
In this paper, we study several under-explored dimensions of feature importance (FI) explanations, providing conceptual and empirical improvements for this form of explanation.
1 code implementation • NAACL 2021 • Ramakanth Pasunuru, Mengwen Liu, Mohit Bansal, Sujith Ravi, Markus Dreyer
We also show improvements in a transfer-only setup on the DUC-2004 dataset.
1 code implementation • NAACL 2021 • Zineng Tang, Jie Lei, Mohit Bansal
Second, to alleviate the temporal misalignment issue, our method incorporates an entropy minimization-based constrained attention loss, to encourage the model to automatically focus on the correct caption from a pool of candidate ASR captions.
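A minimal sketch of an entropy penalty over the attention distribution across candidate captions; this is a simplified assumption of the paper's constrained attention loss, not its exact formulation.

```python
# Sketch: given the model's attention distribution over K candidate ASR
# captions, penalize high entropy so the model commits to one caption rather
# than spreading attention uniformly.
import torch

def attention_entropy_loss(attn: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # attn: (batch, K) attention weights over K candidate captions; rows sum to 1
    return -(attn * (attn + eps).log()).sum(dim=-1).mean()
```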