Search Results for author: Doug Downey

Found 62 papers, 35 papers with code

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature

no code implementations · 10 Jun 2024 · David Wadden, Kejian Shi, Jacob Morrison, Aakanksha Naik, Shruti Singh, Nitzan Barzilay, Kyle Lo, Tom Hope, Luca Soldaini, Shannon Zejiang Shen, Doug Downey, Hannaneh Hajishirzi, Arman Cohan

We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks covering five essential scientific literature understanding capabilities: information extraction, summarization, question answering, claim verification, and classification.

Claim Verification, Instruction Following (+3 more)

TOPICAL: TOPIC Pages AutomagicaLly

1 code implementation · 3 May 2024 · John Giorgi, Amanpreet Singh, Doug Downey, Sergey Feldman, Lucy Lu Wang

Topic pages aggregate useful information about an entity or concept into a single succinct and accessible article.

Retrieval

MARG: Multi-Agent Review Generation for Scientific Papers

1 code implementation · 8 Jan 2024 · Mike D'Arcy, Tom Hope, Larry Birnbaum, Doug Downey

We study the ability of LLMs to generate feedback for scientific papers and develop MARG, a feedback generation approach using multiple LLM instances that engage in internal discussion.

Review Generation, Specificity

CARE: Extracting Experimental Findings From Clinical Literature

no code implementations · 16 Nov 2023 · Aakanksha Naik, Bailey Kuehl, Erin Bransom, Doug Downey, Tom Hope

Focusing on biomedicine, this work presents CARE, a new information extraction (IE) dataset for the task of extracting clinical findings.

Relation Extraction

ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews

1 code implementation · 21 Jun 2023 · Mike D'Arcy, Alexis Ross, Erin Bransom, Bailey Kuehl, Jonathan Bragg, Tom Hope, Doug Downey

Revising scientific papers based on peer feedback is a challenging task that requires not only deep scientific knowledge and reasoning, but also the ability to recognize the implicit requests in high-level feedback and to choose the best of many possible ways to update the manuscript in response.

Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents

1 code implementation · 1 Jun 2023 · Catherine Chen, Zejiang Shen, Dan Klein, Gabriel Stanovsky, Doug Downey, Kyle Lo

Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers.

SciMON: Scientific Inspiration Machines Optimized for Novelty

1 code implementation · 23 May 2023 · Qingyun Wang, Doug Downey, Heng Ji, Tom Hope

We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature.

Contextualized Literature-based Discovery, Link Prediction (+1 more)

S2abEL: A Dataset for Entity Linking from Scientific Tables

1 code implementation · 30 Apr 2023 · Yuze Lou, Bailey Kuehl, Erin Bransom, Sergey Feldman, Aakanksha Naik, Doug Downey

Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications.

Entity Linking, Question Answering
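The entity-linking task defined in the S2abEL snippet can be made concrete with a toy sketch: generate candidate knowledge-base entries from an alias table, then pick the candidate whose context keywords overlap most with the words around the mention. The knowledge base, its entry ids, and the overlap heuristic are all hypothetical illustrations, not the S2abEL method or data.

```python
from typing import List, Optional

# Hypothetical toy knowledge base (not S2abEL data):
# entry id -> canonical name, surface-form aliases, and context keywords.
KB = {
    "E1": {"name": "BERT (language model)", "aliases": {"bert"},
           "context": {"transformer", "pretraining", "nlp"}},
    "E2": {"name": "Bert (Sesame Street character)", "aliases": {"bert"},
           "context": {"muppet", "ernie", "television"}},
    "E3": {"name": "ImageNet", "aliases": {"imagenet", "ilsvrc"},
           "context": {"images", "classification", "benchmark"}},
}


def link(mention: str, context: List[str]) -> Optional[str]:
    """Return the KB entry id for `mention`, or None if no entry matches (NIL)."""
    # Candidate generation: exact, case-insensitive alias lookup.
    candidates = [eid for eid, entry in KB.items() if mention.lower() in entry["aliases"]]
    if not candidates:
        return None
    # Disambiguation: prefer the candidate sharing the most keywords with the context.
    words = {w.lower() for w in context}
    return max(candidates, key=lambda eid: len(KB[eid]["context"] & words))


print(link("BERT", ["we", "fine-tune", "a", "pretraining", "transformer"]))  # E1
print(link("Bert", ["Ernie", "and", "Bert"]))                                # E2
print(link("GPT", ["a", "language", "model"]))                               # None
```

Real linkers replace both steps with learned models, but the two-stage split (candidate generation, then disambiguation) is a common way to structure the problem.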

Beyond Summarization: Designing AI Support for Real-World Expository Writing Tasks

no code implementations · 5 Apr 2023 · Zejiang Shen, Tal August, Pao Siangliulue, Kyle Lo, Jonathan Bragg, Jeff Hammerbacher, Doug Downey, Joseph Chee Chang, David Sontag

In this position paper, we argue that developing AI supports for expository writing has unique and exciting research challenges and can lead to high real-world impacts.

Relatedly: Scaffolding Literature Reviews with Existing Related Work Sections

no code implementations · 13 Feb 2023 · Srishti Palani, Aakanksha Naik, Doug Downey, Amy X. Zhang, Jonathan Bragg, Joseph Chee Chang

Scholars who want to research a scientific topic must take time to read, extract meaning, and identify connections across many papers.

Descriptive, Navigate (+1 more)

I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation

no code implementations · 19 Dec 2022 · Chandra Bhagavatula, Jena D. Hwang, Doug Downey, Ronan Le Bras, Ximing Lu, Lianhui Qin, Keisuke Sakaguchi, Swabha Swayamdipta, Peter West, Yejin Choi

Here, we investigate an alternative that a priori seems impossible: can smaller language models (e.g., GPT-2) win over models that are orders of magnitude larger and better (e.g., GPT-3), if powered with novel commonsense distillation algorithms?

Imitation Learning, Knowledge Distillation

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

2 code implementations · 23 Nov 2022 · Amanpreet Singh, Mike D'Arcy, Arman Cohan, Doug Downey, Sergey Feldman

We introduce SciRepEval, the first comprehensive benchmark for training and evaluating scientific document representations.

Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models

1 code implementation · 23 Oct 2022 · Victor S. Bursztyn, David Demeter, Doug Downey, Larry Birnbaum

In this work, we present compositional fine-tuning (CFT): an approach based on explicitly decomposing a target task into component tasks, and then fine-tuning smaller LMs on a curriculum of such component tasks.

Sports Understanding

FeedLens: Polymorphic Lenses for Personalizing Exploratory Search over Knowledge Graphs

no code implementations · 16 Aug 2022 · Harmanpreet Kaur, Doug Downey, Amanpreet Singh, Evie Yu-Yen Cheng, Daniel S. Weld, Jonathan Bragg

We implement our technique in a novel system, FeedLens, which is built over Semantic Scholar, a production system for navigating the scientific literature KG.

Knowledge Graphs

Embedding Recycling for Language Models

1 code implementation · 11 Jul 2022 · Jon Saad-Falcon, Amanpreet Singh, Luca Soldaini, Mike D'Arcy, Arman Cohan, Doug Downey

Real-world applications of neural language models often involve running many different models over the same corpus.

Question Answering, Text Classification

Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities

1 code implementation · 22 Jun 2022 · Zejiang Shen, Kyle Lo, Lauren Yu, Nathan Dahlberg, Margo Schlanger, Doug Downey

With the advent of large language models, methods for abstractive summarization have made great strides, creating potential for use in applications to aid knowledge workers processing unwieldy document collections.

Abstractive Text Summarization, Document Summarization (+2 more)

Penguins Don't Fly: Reasoning about Generics through Instantiations and Exceptions

no code implementations · 23 May 2022 · Emily Allaway, Jena D. Hwang, Chandra Bhagavatula, Kathleen McKeown, Doug Downey, Yejin Choi

Generics express generalizations about the world (e.g., birds can fly) that are not universally true (e.g., newborn birds and penguins cannot fly).

Natural Language Inference

CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction

1 code implementation · 16 May 2022 · Tara Safavi, Doug Downey, Tom Hope

Knowledge graph (KG) link prediction is a fundamental task in artificial intelligence, with applications in natural language processing, information retrieval, and biomedicine.

Information Retrieval, Knowledge Graph Embeddings (+2 more)

A Computational Inflection for Scientific Discovery

no code implementations · 4 May 2022 · Tom Hope, Doug Downey, Oren Etzioni, Daniel S. Weld, Eric Horvitz

We stand at the foot of a significant inflection in the trajectory of scientific discovery.

Retrieval

From Who You Know to What You Read: Augmenting Scientific Recommendations with Implicit Social Networks

no code implementations · 21 Apr 2022 · Hyeonsu B. Kang, Rafal Kocielnik, Andrew Head, Jiangjiang Yang, Matt Latzke, Aniket Kittur, Daniel S. Weld, Doug Downey, Jonathan Bragg

To improve the discovery experience, we introduce multiple new methods for augmenting recommendations with textual relevance messages that highlight knowledge-graph connections between recommended papers and a user's publication and interaction history.

Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search

1 code implementation · 16 Mar 2022 · Daniel King, Zejiang Shen, Nishant Subramani, Daniel S. Weld, Iz Beltagy, Doug Downey

Based on our findings, we present PINOCCHIO, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations.

Abstractive Text Summarization

Few-Shot Self-Rationalization with Natural Language Prompts

1 code implementation · Findings (NAACL) 2022 · Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters

We identify the right prompting approach by extensively exploring natural language prompts on FEB. Then, by using this prompt and scaling the model size, we demonstrate that making progress on few-shot self-rationalization is possible.

Limitations of Active Learning With Deep Transformer Language Models

no code implementations · 29 Sep 2021 · Mike D'Arcy, Doug Downey

Active Learning (AL) has the potential to reduce labeling cost when training natural language processing models, but its effectiveness with the large pretrained transformer language models that power today's NLP is uncertain.

Active Learning
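For readers new to active learning, the sketch below shows one standard acquisition step, least-confidence uncertainty sampling: the current model scores the unlabeled pool, and the examples whose top-class probability is lowest are sent for labeling. This is a generic illustration with made-up probabilities; the snippet does not say which acquisition strategies the paper evaluates.

```python
import numpy as np


def least_confidence_query(probs: np.ndarray, batch_size: int) -> np.ndarray:
    """Select unlabeled examples whose most likely class has the lowest probability.

    probs: array of shape [n_unlabeled, n_classes] holding model predictions.
    Returns the indices of the `batch_size` least-confident examples.
    """
    confidence = probs.max(axis=1)               # top-class probability per example
    return np.argsort(confidence)[:batch_size]   # ascending: least confident first


# Hypothetical predictions from a classifier on four unlabeled examples.
probs = np.array([[0.90, 0.10],
                  [0.55, 0.45],
                  [0.70, 0.30],
                  [0.51, 0.49]])
print(least_confidence_query(probs, 2))  # [3 1]
```

In a full loop, the selected examples would be labeled, added to the training set, and the model retrained before the next query.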

Exploring The Role of Local and Global Explanations in Recommender Systems

no code implementations · 27 Sep 2021 · Marissa Radensky, Doug Downey, Kyle Lo, Zoran Popović, Daniel S. Weld

We note that the two explanation approaches may be better compared in the context of a higher-stakes or more opaque domain.

Recommendation Systems

VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups

1 code implementation · 1 Jun 2021 · Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey

Experiments are conducted on a newly curated evaluation suite, S2-VLUE, that unifies existing automatically-labeled datasets and includes a new dataset of manual annotations covering diverse papers from 19 scientific disciplines.

Language Modelling, Text Classification (+2 more)

ABNIRML: Analyzing the Behavior of Neural IR Models

2 code implementations · 2 Nov 2020 · Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan

Pretrained contextualized language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search.

Language Modelling, Sentence

High-Precision Extraction of Emerging Concepts from Scientific Literature

1 code implementation · 11 Jun 2020 · Daniel King, Doug Downey, Daniel S. Weld

From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%, compared to 86% for prior work, and a substantially better precision-yield trade-off across the top 15,000 extractions.

Vocal Bursts Intensity Prediction
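The Precision@1000 figure quoted in the snippet is the fraction of the 1,000 highest-ranked extractions that are correct. A minimal sketch of that metric and of a precision-yield curve follows; the labels are made up, and treating "yield" as the number of correct extractions at each cutoff is our own assumption about the term.

```python
from typing import List, Sequence, Tuple


def precision_at_k(ranked_is_correct: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked extractions that were judged correct."""
    if k <= 0 or k > len(ranked_is_correct):
        raise ValueError("k must be between 1 and the number of ranked items")
    return sum(ranked_is_correct[:k]) / k


def precision_yield(ranked_is_correct: Sequence[bool],
                    cutoffs: Sequence[int]) -> List[Tuple[int, float, int]]:
    """(cutoff, precision, yield) at each cutoff, assuming yield = count of correct items."""
    return [(k, precision_at_k(ranked_is_correct, k), sum(ranked_is_correct[:k]))
            for k in cutoffs]


# Hypothetical judgments for a ranked list of extracted concepts (True = correct).
labels = [True, True, False, True]
print(precision_yield(labels, [1, 2, 4]))  # [(1, 1.0, 1), (2, 1.0, 2), (4, 0.75, 3)]
```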

Stolen Probability: A Structural Weakness of Neural Language Models

1 code implementation · ACL 2020 · David Demeter, Gregory Kimmel, Doug Downey

Neural Network Language Models (NNLMs) generate probability distributions by applying a softmax function to a distance metric formed by taking the dot product of a prediction vector with all word vectors in a high-dimensional embedding space.

Inductive Bias
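The output layer described in the Stolen Probability snippet (a softmax over dot products between a prediction vector and every word vector) is sketched below with NumPy, using hypothetical 2-d embeddings. The loop illustrates one consequence of that form, our own illustration derived from linearity rather than quoted from the paper: a word whose embedding lies inside the convex hull of other embeddings gets a logit that is a weighted average of theirs, so it can never receive strictly the highest probability.

```python
import numpy as np


def nnlm_output_distribution(h: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Softmax over dot products of prediction vector h [d] with word vectors W [V, d]."""
    logits = W @ h                   # one dot-product score per vocabulary word
    logits = logits - logits.max()   # shift by the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


# Hypothetical 5-word vocabulary. Word 4 sits at the origin, which is the average
# of words 0-3 and hence strictly inside their convex hull.
W = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [0.0, 0.0]])

rng = np.random.default_rng(0)
for _ in range(1000):
    h = rng.normal(size=2) * 10.0    # arbitrary prediction vectors
    p = nnlm_output_distribution(h, W)
    # The interior word's logit is an average of the others', so it never wins outright.
    assert p[4] <= p[:4].max()
```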

SPECTER: Document-level Representation Learning using Citation-informed Transformers

5 code implementations · ACL 2020 · Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld

We propose SPECTER, a new method to generate document-level embedding of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph.

Citation Prediction, Document Classification (+4 more)
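The SPECTER snippet names the citation graph as the relatedness signal but not the training objective. One common way to turn citation links into a loss is a triplet margin objective, in which a paper cited by the query is the positive and an uncited paper the negative. The sketch below assumes Euclidean distance and a margin of 1, and uses plain vectors in place of Transformer embeddings; treat it as an illustration, not the paper's code.

```python
import numpy as np


def triplet_margin_loss(query: np.ndarray, positive: np.ndarray,
                        negative: np.ndarray, margin: float = 1.0) -> float:
    """max(d(q, p+) - d(q, p-) + margin, 0) with Euclidean distance d."""
    d_pos = np.linalg.norm(query - positive)
    d_neg = np.linalg.norm(query - negative)
    return float(max(d_pos - d_neg + margin, 0.0))


# Hypothetical 3-d "document embeddings".
query = np.array([1.0, 0.0, 0.0])
cited = np.array([0.9, 0.1, 0.0])      # paper cited by the query (positive)
uncited = np.array([-1.0, 0.0, 0.0])   # paper not cited by the query (negative)

print(triplet_margin_loss(query, cited, uncited))  # 0.0: already separated by more than the margin
print(triplet_margin_loss(query, uncited, cited))  # positive: roles swapped, so the loss pushes back
```

Minimizing this loss pulls cited papers toward the query in embedding space and pushes uncited ones at least `margin` farther away.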

LIMEADE: From AI Explanations to Advice Taking

1 code implementation · 9 Mar 2020 · Benjamin Charles Germain Lee, Doug Downey, Kyle Lo, Daniel S. Weld

We show our method improves accuracy compared to a rigorous baseline on the image classification domains.

BIG-bench Machine Learning, Image Classification (+1 more)

Using Large Corpus N-gram Statistics to Improve Recurrent Neural Language Models

no code implementations · NAACL 2019 · Yiben Yang, Ji-Ping Wang, Doug Downey

Recurrent neural network language models (RNNLMs) form a valuable foundation for many NLP systems, but training them can be computationally expensive and may take days on a large corpus.

CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense

1 code implementation · WS 2019 · Michael Chen, Mike D'Arcy, Alisa Liu, Jared Fernandez, Doug Downey

To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems.

Common Sense Reasoning, Question Answering (+2 more)

CODAH: An Adversarially Authored Question-Answer Dataset for Common Sense

2 code implementations · 8 Apr 2019 · Michael Chen, Mike D'Arcy, Alisa Liu, Jared Fernandez, Doug Downey

To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems.

Ranked #1 on Common Sense Reasoning on CODAH (using extra training data)

Common Sense Reasoning, Question Answering (+2 more)

Sampling Informative Training Data for RNN Language Models

no code implementations · ACL 2018 · Jared Fernandez, Doug Downey

We propose an unsupervised importance sampling approach to selecting training data for recurrent neural network (RNN) language models.

Language Modelling
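The snippet names unsupervised importance sampling as the selection mechanism but not the score used to rank sentences. The sketch below shows generic importance sampling of training sentences under a hypothetical score (mean unigram surprisal, our own choice, not the paper's). Reweighting each draw by 1/(N·q_i) keeps the weighted training loss an unbiased estimate of the full-corpus average, provided every q_i is positive.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

Sentence = List[str]


def unigram_surprisal_scores(corpus: List[Sentence]) -> List[float]:
    """Hypothetical importance score: mean per-token surprisal under a corpus unigram model."""
    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values())
    return [sum(-math.log(counts[t] / total) for t in sent) / len(sent) for sent in corpus]


def importance_sample(corpus: List[Sentence], scores: List[float], n: int,
                      seed: int = 0) -> List[Tuple[Sentence, float]]:
    """Draw n sentences (with replacement) with probability q_i proportional to score.

    Each draw carries weight 1 / (N * q_i), so the weighted average of per-sentence
    losses over the sample estimates the uniform full-corpus average without bias.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    q = [s / total for s in scores]
    rng = random.Random(seed)
    picks = rng.choices(range(len(corpus)), weights=q, k=n)
    return [(corpus[i], 1.0 / (len(corpus) * q[i])) for i in picks]


# Hypothetical three-sentence corpus; rarer vocabulary yields higher scores.
corpus = [["the", "cat", "sat"], ["the", "the", "the"], ["quantum", "chromodynamics", "lattice"]]
scores = unigram_surprisal_scores(corpus)
sample = importance_sample(corpus, scores, n=5)
```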

Definition Modeling: Learning to define word embeddings in natural language

2 code implementations · 1 Dec 2016 · Thanapon Noraset, Chen Liang, Larry Birnbaum, Doug Downey

Distributed representations of words have been shown to capture lexical semantics, as demonstrated by their effectiveness in word similarity and analogical relation tasks.

Word Embeddings, Word Similarity
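The Definition Modeling snippet cites word similarity and analogical relation tasks as evidence that embeddings capture lexical semantics. Those tasks are commonly scored with cosine similarity and the vector-offset method, sketched below. The 3-d vectors are hypothetical stand-ins for learned embeddings, which typically have hundreds of dimensions.

```python
import numpy as np

# Hypothetical toy embeddings, hand-picked so the example behaves as intended.
emb = {
    "king":  np.array([0.8, 0.7, 0.1]),
    "queen": np.array([0.8, 0.1, 0.7]),
    "man":   np.array([0.2, 0.8, 0.1]),
    "woman": np.array([0.2, 0.1, 0.8]),
    "apple": np.array([-0.7, 0.2, 0.1]),
}


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def analogy(a: str, b: str, c: str, emb: dict) -> str:
    """Solve a : b :: c : ? as the word closest to b - a + c, excluding a, b, c."""
    target = emb[b] - emb[a] + emb[c]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))


print(analogy("man", "king", "woman", emb))  # queen
```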
