no code implementations • 12 Mar 2024 • Shikhar Murty, Christopher Manning, Peter Shaw, Mandar Joshi, Kenton Lee
Unfortunately, LM agents often fail to generalize to new environments without human demonstrations.
no code implementations • CVPR 2024 • Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
We explore the boundaries of scaling up a multilingual vision and language model both in terms of size of the components and the breadth of its training task mixture.
no code implementations • 16 Nov 2023 • Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova
Pre-processing tools, such as optical character recognition (OCR), can map document image inputs to textual tokens, and large language models (LLMs) can then reason over the text.
1 code implementation • NeurIPS 2023 • Peter Shaw, Mandar Joshi, James Cohan, Jonathan Berant, Panupong Pasupat, Hexiang Hu, Urvashi Khandelwal, Kenton Lee, Kristina Toutanova
Much of the previous work towards digital agents for graphical user interfaces (GUIs) has relied on text-based representations (derived from HTML or other structured data sources), which are not always readily available.
2 code implementations • 29 May 2023 • Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture.
Ranked #1 on Fine-Grained Image Recognition on OVEN
2 code implementations • ICCV 2023 • Hexiang Hu, Yi Luan, Yang Chen, Urvashi Khandelwal, Mandar Joshi, Kenton Lee, Kristina Toutanova, Ming-Wei Chang
Large-scale multi-modal pre-training models such as CLIP and PaLI exhibit strong generalization on various visual domains and tasks.
Ranked #2 on Fine-Grained Image Recognition on OVEN
1 code implementation • 20 Dec 2022 • Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun
Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over finetuned SOTA on human-written queries from the task of chart QA.
Chart Question Answering
Factual Inconsistency Detection in Chart Captioning
1 code implementation • 19 Dec 2022 • Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos
Visual language data such as plots, charts, and infographics are ubiquitous in the human world.
Ranked #2 on Visual Question Answering on PlotQA-D2
4 code implementations • 7 Oct 2022 • Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova
Visually-situated language is ubiquitous -- sources range from textbooks with diagrams to web pages with images and tables, to mobile apps with buttons and forms.
Ranked #18 on Visual Question Answering (VQA) on InfographicVQA
no code implementations • 9 May 2022 • Mandar Joshi, Terra Blevins, Mike Lewis, Daniel S. Weld, Luke Zettlemoyer
Creating labeled natural language training data is expensive and requires significant human effort.
1 code implementation • 15 Apr 2022 • Devendra Singh Sachan, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, Luke Zettlemoyer
We propose a simple and effective re-ranking method for improving passage retrieval in open question answering.
no code implementations • 19 Jan 2022 • Armen Aghajanyan, Bernie Huang, Candace Ross, Vladimir Karpukhin, Hu Xu, Naman Goyal, Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer
We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens.
1 code implementation • ACL 2021 • Weijia Shi, Mandar Joshi, Luke Zettlemoyer
Short textual descriptions of entities provide summaries of their key attributes and have been shown to be useful sources of background knowledge for tasks such as entity linking and question answering.
no code implementations • ICLR 2022 • Armen Aghajanyan, Dmytro Okhonko, Mike Lewis, Mandar Joshi, Hu Xu, Gargi Ghosh, Luke Zettlemoyer
We introduce HTLM, a hyper-text language model trained on a large-scale web crawl.
Ranked #1 on Table-to-Text Generation on DART
1 code implementation • 9 Jun 2021 • Weijia Shi, Mandar Joshi, Luke Zettlemoyer
Short textual descriptions of entities provide summaries of their key attributes and have been shown to be useful sources of background knowledge for tasks such as entity linking and question answering.
1 code implementation • Joint Conference on Lexical and Computational Semantics 2021 • Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan
We point out that common evaluation practices for cross-document coreference resolution have been unrealistically permissive in their assumed settings, yielding inflated results.
coreference-resolution
Cross Document Coreference Resolution
1 code implementation • Findings (ACL) 2021 • Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan
Here, we introduce the first end-to-end model for CD coreference resolution from raw text, which extends the prominent model for within-document coreference to the CD setting.
coreference-resolution
Cross Document Coreference Resolution
no code implementations • EACL 2021 • Terra Blevins, Mandar Joshi, Luke Zettlemoyer
Current models for Word Sense Disambiguation (WSD) struggle to disambiguate rare senses, despite reaching human performance on global WSD metrics.
2 code implementations • 23 Sep 2020 • Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan
Recent evaluation protocols for Cross-document (CD) coreference resolution have often been inconsistent or lenient, leading to incomparable results across works and overestimation of performance.
coreference-resolution
Cross Document Coreference Resolution
2 code implementations • EMNLP 2020 • Bhargavi Paranjape, Mandar Joshi, John Thickstun, Hannaneh Hajishirzi, Luke Zettlemoyer
Decisions of complex language understanding models can be rationalized by limiting their inputs to a relevant subsequence of the original text.
no code implementations • 24 Apr 2020 • Mandar Joshi, Kenton Lee, Yi Luan, Kristina Toutanova
We present a method to represent input texts by contextualizing them jointly with dynamically retrieved textual encyclopedic background knowledge from multiple documents.
2 code implementations • IJCNLP 2019 • Mandar Joshi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer
We apply BERT to coreference resolution, achieving strong improvements on the OntoNotes (+3.9 F1) and GAP (+11.5 F1) benchmarks.
Ranked #11 on Coreference Resolution on CoNLL 2012 (using extra training data)
67 code implementations • 26 Jul 2019 • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.
Ranked #1 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (Wasserstein Distance (WD) metric, using extra training data)
6 code implementations • TACL 2020 • Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy
We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text.
Ranked #1 on Question Answering on TriviaQA (F1 metric)
3 code implementations • NAACL 2019 • Mandar Joshi, Eunsol Choi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer
Reasoning about implied relationships (e.g., paraphrastic, common sense, encyclopedic) between pairs of words is crucial for many cross-sentence inference problems.
3 code implementations • ACL 2017 • Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer
We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples.