Search Results for author: Kristina Toutanova

Found 46 papers, 18 papers with code

Understanding the World's Museums through Vision-Language Reasoning

no code implementations 2 Dec 2024 Ada-Astrid Balauca, Sanjana Garai, Stefan Balauca, Rasesh Udayakumar Shetty, Naitik Agrawal, Dhwanil Subhashbhai Shah, Yuqian Fu, Xi Wang, Kristina Toutanova, Danda Pani Paudel, Luc van Gool

In this work, we facilitate such reasoning by (a) collecting and curating a large-scale dataset of 65M images and 200M question-answer pairs in the standard museum catalog format for exhibits from all around the world; (b) training large vision-language models on the collected dataset; (c) benchmarking their ability on five visual question answering tasks.

Benchmarking Question Answering +1

ALTA: Compiler-Based Analysis of Transformers

1 code implementation 23 Oct 2024 Peter Shaw, James Cohan, Jacob Eisenstein, Kenton Lee, Jonathan Berant, Kristina Toutanova

We propose a new programming language called ALTA and a compiler that can map ALTA programs to Transformer weights.

Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits

1 code implementation 3 Sep 2024 Ada-Astrid Balauca, Danda Pani Paudel, Kristina Toutanova, Luc van Gool

In this work, we aim to adapt CLIP for fine-grained and structured -- in the form of tabular data -- visual understanding of museum exhibits.

Attribute

Mitigating Catastrophic Forgetting in Language Transfer via Model Merging

no code implementations 11 Jul 2024 Anton Alexandrov, Veselin Raychev, Mark Niklas Müller, Ce Zhang, Martin Vechev, Kristina Toutanova

As open-weight large language models (LLMs) achieve ever more impressive performances across a wide range of tasks in English, practitioners aim to adapt these models to different languages.

Efficient End-to-End Visual Document Understanding with Rationale Distillation

no code implementations 16 Nov 2023 Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova

Pre-processing tools, such as optical character recognition (OCR), can map document image inputs to textual tokens, then large language models (LLMs) can reason over text.

Document Understanding Image to Text +2

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

1 code implementation NeurIPS 2023 Peter Shaw, Mandar Joshi, James Cohan, Jonathan Berant, Panupong Pasupat, Hexiang Hu, Urvashi Khandelwal, Kenton Lee, Kristina Toutanova

Much of the previous work towards digital agents for graphical user interfaces (GUIs) has relied on text-based representations (derived from HTML or other structured data sources), which are not always readily available.

Instruction Following

Anchor Prediction: Automatic Refinement of Internet Links

1 code implementation 23 May 2023 Nelson F. Liu, Kenton Lee, Kristina Toutanova

Internet links enable users to deepen their understanding of a topic by providing convenient access to related information.

Implicit Relations

QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations

1 code implementation 19 May 2023 Chaitanya Malaviya, Peter Shaw, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

To study the ability of retrieval systems to meet such information needs, we construct QUEST, a dataset of 3357 natural language queries with implicit set operations, each mapping to a set of entities corresponding to Wikipedia documents.

Natural Language Queries Negation +1
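To make "implicit set operations" concrete, here is a minimal sketch using Python sets. The query, entity names, and category sets below are hypothetical illustrations, not drawn from the QUEST dataset; they only show how a query can implicitly combine atomic entity sets with intersection and difference.

```python
# Hypothetical entity sets, as if each were the answer set for an
# atomic category (e.g. all fantasy novels, all children's books).
fantasy = {"A Wizard of Earthsea", "The Hobbit", "Jonathan Strange & Mr Norrell"}
childrens = {"The Hobbit", "Charlotte's Web"}

# A query like "fantasy novels that are not children's books"
# implicitly requests a set difference over the two categories.
answer = fantasy - childrens

# A query like "fantasy novels for children" implicitly intersects them.
overlap = fantasy & childrens

print(sorted(answer))   # entities satisfying fantasy AND NOT children's
print(sorted(overlap))  # entities satisfying fantasy AND children's
```

A retrieval system evaluated on such queries must recover the full answer set, not just a single top-ranked entity, which is what makes negation and intersection challenging.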

Revisiting the Primacy of English in Zero-shot Cross-lingual Transfer

no code implementations30 Jun 2021 Iulia Turc, Kenton Lee, Jacob Eisenstein, Ming-Wei Chang, Kristina Toutanova

Zero-shot cross-lingual transfer is emerging as a practical solution: pre-trained models, later fine-tuned on one transfer language, exhibit surprisingly strong performance when tested on many target languages.

Question Answering Zero-Shot Cross-Lingual Transfer

Joint Passage Ranking for Diverse Multi-Answer Retrieval

no code implementations EMNLP 2021 Sewon Min, Kenton Lee, Ming-Wei Chang, Kristina Toutanova, Hannaneh Hajishirzi

We study multi-answer retrieval, an under-explored problem that requires retrieving passages to cover multiple distinct answers for a given question.

Answer Generation Diversity +5

Representations for Question Answering from Documents with Tables and Text

no code implementations EACL 2021 Vicky Zayats, Kristina Toutanova, Mari Ostendorf

Tables in Web documents are pervasive and can be directly used to answer many of the queries searched on the Web, motivating their integration in question answering.

Natural Questions Question Answering

Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both?

1 code implementation ACL 2021 Peter Shaw, Ming-Wei Chang, Panupong Pasupat, Kristina Toutanova

This has motivated new specialized architectures with stronger compositional biases, but most of these approaches have only been evaluated on synthetically-generated datasets, which are not representative of natural language variation.

Semantic Parsing

Probabilistic Assumptions Matter: Improved Models for Distantly-Supervised Document-Level Question Answering

1 code implementation ACL 2020 Hao Cheng, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

We address the problem of extractive question answering using document-level distant supervision, pairing questions and relevant documents with answer strings.

Extractive Question-Answering Question Answering +1

Sparse, Dense, and Attentional Representations for Text Retrieval

1 code implementation 1 May 2020 Yi Luan, Jacob Eisenstein, Kristina Toutanova, Michael Collins

Dual encoders perform retrieval by encoding documents and queries into dense low-dimensional vectors, scoring each document by its inner product with the query.

Open-Domain Question Answering Text Retrieval
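The dual-encoder scoring described in this abstract can be sketched in a few lines. The `encode` function below is a stand-in for a learned encoder (random unit vectors, purely for illustration); the paper's actual models are neural encoders, but the retrieval step itself is exactly this inner-product ranking.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(n_texts, dim=8):
    # Stand-in for a learned encoder: each text becomes a dense
    # low-dimensional unit vector (here, random for illustration).
    v = rng.standard_normal((n_texts, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

doc_vecs = encode(3)           # 3 documents -> matrix of shape (3, dim)
query_vec = encode(1)[0]       # 1 query    -> vector of shape (dim,)

# Score every document by its inner product with the query,
# then rank documents from highest to lowest score.
scores = doc_vecs @ query_vec
ranking = np.argsort(-scores)
```

Because scoring reduces to a matrix-vector product, document vectors can be precomputed and indexed, which is what makes dense retrieval efficient at scale.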

Contextualized Representations Using Textual Encyclopedic Knowledge

no code implementations 24 Apr 2020 Mandar Joshi, Kenton Lee, Yi Luan, Kristina Toutanova

We present a method to represent input texts by contextualizing them jointly with dynamically retrieved textual encyclopedic background knowledge from multiple documents.

Language Modelling Reading Comprehension +1

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

40 code implementations ICLR 2020 Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training.

Knowledge Distillation Language Modelling +2

Zero-Shot Entity Linking by Reading Entity Descriptions

3 code implementations ACL 2019 Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, Honglak Lee

First, we show that strong reading comprehension models pre-trained on large unlabeled data can be used to generalize to unseen entities.

Entity Linking Reading Comprehension

Latent Retrieval for Weakly Supervised Open Domain Question Answering

3 code implementations ACL 2019 Kenton Lee, Ming-Wei Chang, Kristina Toutanova

We show for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs and without any IR system.

Information Retrieval Open-Domain Question Answering +1

Language Model Pre-training for Hierarchical Document Representations

no code implementations ICLR 2019 Ming-Wei Chang, Kristina Toutanova, Kenton Lee, Jacob Devlin

Hierarchical neural architectures are often used to capture long-distance dependencies and have been applied to many document-level tasks such as summarization, document segmentation, and sentiment analysis.

Document Summarization Extractive Document Summarization +6

Improving Span-based Question Answering Systems with Coarsely Labeled Data

no code implementations 5 Nov 2018 Hao Cheng, Ming-Wei Chang, Kenton Lee, Ankur Parikh, Michael Collins, Kristina Toutanova

We study approaches to improve fine-grained short answer Question Answering models by integrating coarse-grained data annotated for paragraph-level relevance and show that coarsely annotated data can bring significant performance gains.

Multi-Task Learning Question Answering

A Nested Attention Neural Hybrid Model for Grammatical Error Correction

no code implementations ACL 2017 Jianshu Ji, Qinlong Wang, Kristina Toutanova, Yongen Gong, Steven Truong, Jianfeng Gao

Grammatical error correction (GEC) systems strive to correct both global errors in word order and usage, and local errors in spelling and inflection.

Grammatical Error Correction Machine Translation +1

NLP for Precision Medicine

no code implementations ACL 2017 Hoifung Poon, Chris Quirk, Kristina Toutanova, Wen-tau Yih

We will introduce precision medicine and showcase the vast opportunities for NLP in this burgeoning field with great societal impact.

Decision Making Entity Linking +2

E-TIPSY: Search Query Corpus Annotated with Entities, Term Importance, POS Tags, and Syntactic Parses

no code implementations LREC 2016 Yuval Marton, Kristina Toutanova

In addition, it contains automatically produced annotations of named entities, part-of-speech tags, and syntactic parses for the same queries.

POS
