2 code implementations • 1 Dec 2016 • Thanapon Noraset, Chen Liang, Larry Birnbaum, Doug Downey
Distributed representations of words have been shown to capture lexical semantics, as demonstrated by their effectiveness in word similarity and analogical relation tasks.
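The analogy task mentioned here can be sketched with toy vectors and cosine similarity — a minimal illustration with made-up 3-d embeddings standing in for vectors trained on a large corpus:

```python
import numpy as np

# Toy embeddings; real systems learn these from large corpora.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.2, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.3, 0.8]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """Solve a : b :: c : ? by vector arithmetic (b - a + c)."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

best = analogy("man", "king", "woman")  # "queen" with these toy vectors
```

The toy vectors are constructed so that king - man + woman lands exactly on queen; with trained embeddings the nearest neighbor is only approximately the answer.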
no code implementations • EMNLP 2017 • Jared Fernandez, Zhaocheng Yu, Doug Downey
Many Natural Language Processing (NLP) models rely on distributed vector representations of words.
no code implementations • NAACL 2018 • Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Lu Wang, Chris Wilhelm, Zheng Yuan, Madeleine van Zuylen, Oren Etzioni
We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery.
no code implementations • ACL 2018 • Jared Fernandez, Doug Downey
We propose an unsupervised importance sampling approach to selecting training data for recurrent neural network (RNN) language models.
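Importance sampling of training data can be sketched as drawing sentences with probability proportional to a precomputed weight. The corpus and weights below are hypothetical, and the weighting scheme is a stand-in rather than the paper's actual method:

```python
import random

random.seed(0)

# Hypothetical corpus with precomputed importance weights (e.g. an
# in-domain relevance score); the paper's actual weighting is unsupervised
# and not reproduced here.
corpus = [("sentence about proteins", 0.9),
          ("sports headline", 0.2),
          ("clinical trial report", 0.8),
          ("celebrity gossip", 0.1)]

def sample_training_data(corpus, k):
    sentences, weights = zip(*corpus)
    # Draw k sentences with replacement, proportional to their weights.
    return random.choices(sentences, weights=weights, k=k)

batch = sample_training_data(corpus, k=3)
```

High-weight sentences dominate the sampled batches, so the language model trains mostly on data deemed relevant without discarding the rest of the corpus outright.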
1 code implementation • ACL 2018 • Yiben Yang, Larry Birnbaum, Ji-Ping Wang, Doug Downey
Intelligent systems require common sense, but automatically extracting this knowledge from text can be difficult.
no code implementations • EMNLP 2018 • Thanapon Noraset, Doug Downey, Lidong Bing
Recurrent neural network language models (RNNLMs) are the current standard-bearer for statistical language modeling.
1 code implementation • 28 Jan 2019 • Hanyu Shi, Martin Gerlach, Isabel Diersen, Doug Downey, Luis A. N. Amaral
Topic models are in widespread use in natural language processing and beyond.
2 code implementations • 8 Apr 2019 • Michael Chen, Mike D'Arcy, Alisa Liu, Jared Fernandez, Doug Downey
To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems.
Ranked #1 on Common Sense Reasoning on CODAH (using extra training data)
no code implementations • NAACL 2019 • Yiben Yang, Ji-Ping Wang, Doug Downey
Recurrent neural network language models (RNNLMs) form a valuable foundation for many NLP systems, but training the models can be computationally expensive, and may take days on a large corpus.
no code implementations • SEMEVAL 2019 • Rajagopal Venkatesaramani, Doug Downey, Bradley Malin, Yevgeniy Vorobeychik
We introduce a novel topic modeling approach based on constructing a semantic set cover for clusters of similar documents.
1 code implementation • WS 2019 • Michael Chen, Mike D'Arcy, Alisa Liu, Jared Fernandez, Doug Downey
To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems.
2 code implementations • ICLR 2020 • Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi
Abductive reasoning is inference to the most plausible explanation.
no code implementations • 19 Sep 2019 • Ruimin Zhu, Thanapon Noraset, Alisa Liu, Wenxin Jiang, Doug Downey
Word embeddings capture syntactic and semantic information about words.
no code implementations • 11 Dec 2019 • David Demeter, Doug Downey
How can we augment today's neural models with such encodings?
1 code implementation • 9 Mar 2020 • Benjamin Charles Germain Lee, Doug Downey, Kyle Lo, Daniel S. Weld
We show our method improves accuracy compared to a rigorous baseline on the image classification domains.
5 code implementations • ACL 2020 • Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld
We propose SPECTER, a new method to generate document-level embeddings of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph.
Ranked #1 on Document Classification on SciDocs (MAG)
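The citation signal is used through a triplet objective: a paper's embedding should sit closer to a paper it cites than to an unrelated one. A minimal numpy sketch, with toy 2-d vectors standing in for the Transformer's document embeddings:

```python
import numpy as np

def triplet_margin_loss(query, pos, neg, margin=1.0):
    """Penalize the query embedding for being closer to an uncited
    paper (neg) than to a cited one (pos), up to a margin."""
    d_pos = np.linalg.norm(query - pos)
    d_neg = np.linalg.norm(query - neg)
    return max(0.0, d_pos - d_neg + margin)

q = np.array([1.0, 0.0])         # hypothetical query-paper embedding
cited = np.array([0.9, 0.1])     # hypothetical embedding of a cited paper
uncited = np.array([-1.0, 0.5])  # hypothetical embedding of an unrelated paper

loss = triplet_margin_loss(q, cited, uncited)  # 0.0: already well separated
```

When the cited paper is already more than `margin` closer than the uncited one, the loss is zero; otherwise the gradient pulls the query toward its citations.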
6 code implementations • ACL 2020 • Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
Language models pretrained on text from a wide variety of sources form the foundation of today's NLP.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Yiben Yang, Chaitanya Malaviya, Jared Fernandez, Swabha Swayamdipta, Ronan Le Bras, Ji-Ping Wang, Chandra Bhagavatula, Yejin Choi, Doug Downey
Recent advances in commonsense reasoning depend on large-scale human-annotated training data to achieve peak performance.
Ranked #1 on Question Answering on CODAH
1 code implementation • ACL 2020 • David Demeter, Gregory Kimmel, Doug Downey
Neural Network Language Models (NNLMs) generate probability distributions by applying a softmax function to a distance metric formed by taking the dot product of a prediction vector with all word vectors in a high-dimensional embedding space.
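That softmax-over-dot-products mechanism can be sketched directly, with a toy three-word vocabulary and 2-d embeddings standing in for a real model's high-dimensional space:

```python
import numpy as np

def next_word_distribution(h, E):
    """Softmax over the dot products of a prediction vector h with
    every row of the output embedding matrix E (vocab_size x dim)."""
    logits = E @ h
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

E = np.array([[1.0, 0.0],   # hypothetical vector for "cat"
              [0.0, 1.0],   # hypothetical vector for "dog"
              [0.5, 0.5]])  # hypothetical vector for "the"
h = np.array([2.0, 0.0])    # the model's prediction vector
p = next_word_distribution(h, E)   # highest mass on the first word
```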
1 code implementation • 11 Jun 2020 • Daniel King, Doug Downey, Daniel S. Weld
From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%, compared to 86% for prior work, and a substantially better precision-yield trade-off across the top 15,000 extractions.
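The Precision@k metric itself is simple to compute over ranked, human-judged extractions; the judgments below are hypothetical:

```python
def precision_at_k(labels, k):
    """labels: correctness (1/0) of extractions, best-ranked first.
    Returns the fraction of the top-k that are correct."""
    top = labels[:k]
    return sum(top) / len(top)

# Hypothetical ranked judgments: 1 = correct extraction, 0 = error.
ranked = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
p5 = precision_at_k(ranked, 5)    # 4 of the top 5 are correct -> 0.8
```

Sweeping k traces out the precision-yield curve the entry refers to: larger k yields more extractions but typically lower precision.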
2 code implementations • 2 Nov 2020 • Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan
Pretrained contextualized language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search.
1 code implementation • 3 Mar 2021 • Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, Nazli Goharian
Managing the data for Information Retrieval (IR) experiments can be challenging.
2 code implementations • AKBC 2021 • Arie Cattan, Sophie Johnson, Daniel Weld, Ido Dagan, Iz Beltagy, Doug Downey, Tom Hope
Determining coreference of concept mentions across multiple documents is a fundamental task in natural language understanding.
Tasks: Coreference Resolution, Cross-Document Coreference Resolution (+1)
1 code implementation • 1 Jun 2021 • Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey
Experiments are conducted on a newly curated evaluation suite, S2-VLUE, that unifies existing automatically-labeled datasets and includes a new dataset of manual annotations covering diverse papers from 19 scientific disciplines.
1 code implementation • 15 Sep 2021 • Victor S. Bursztyn, Jennifer Healey, Nedim Lipka, Eunyee Koh, Doug Downey, Larry Birnbaum
Conversations aimed at determining good recommendations are iterative in nature.
no code implementations • 27 Sep 2021 • Marissa Radensky, Doug Downey, Kyle Lo, Zoran Popović, Daniel S. Weld
However, we note that the two explanation approaches may be better compared in the context of a higher-stakes or more opaque domain.
no code implementations • 29 Sep 2021 • Mike D'Arcy, Doug Downey
Active Learning (AL) has the potential to reduce labeling cost when training natural language processing models, but its effectiveness with the large pretrained transformer language models that power today's NLP is uncertain.
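A standard AL acquisition strategy — uncertainty sampling, used here only as an illustration and not necessarily the strategy the paper evaluates — selects the unlabeled examples whose predicted class distributions have the highest entropy:

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(pool, k):
    """pool: {example_id: predicted class distribution}. Return the k
    examples the model is least certain about (highest entropy)."""
    return sorted(pool, key=lambda x: entropy(pool[x]), reverse=True)[:k]

pool = {"a": [0.98, 0.02],   # confident prediction
        "b": [0.55, 0.45],   # near-uniform, i.e. uncertain
        "c": [0.70, 0.30]}
chosen = select_for_labeling(pool, k=1)   # picks "b"
```

The chosen examples are sent to annotators, the model is retrained, and the loop repeats; the entry's question is whether this loop still pays off with large pretrained transformers.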
1 code implementation • Findings (NAACL) 2022 • Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters
We identify the right prompting approach by extensively exploring natural language prompts on FEB. Then, by using this prompt and scaling the model size, we demonstrate that making progress on few-shot self-rationalization is possible.
1 code implementation • 16 Mar 2022 • Daniel King, Zejiang Shen, Nishant Subramani, Daniel S. Weld, Iz Beltagy, Doug Downey
Based on our findings, we present PINOCCHIO, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations.
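The idea of constraining decoding away from unsupported content can be shown with a toy sketch. This uses greedy decoding rather than beam search, and a simple copy constraint (the token must occur in the source) stands in for PINOCCHIO's actual consistency check:

```python
def constrained_decode(step_probs, source_tokens):
    """Greedy sketch of consistency-constrained decoding: at each step,
    drop candidate tokens that fail a consistency check (here, a toy
    copy constraint). step_probs is a list of per-step probability
    dicts over candidate tokens."""
    allowed = set(source_tokens)
    output = []
    for probs in step_probs:
        candidates = {t: p for t, p in probs.items() if t in allowed}
        if not candidates:          # fall back if every token is blocked
            candidates = probs
        output.append(max(candidates, key=candidates.get))
    return output

source = ["the", "trial", "used", "120", "patients"]
steps = [{"the": 0.6, "a": 0.4},
         {"study": 0.5, "trial": 0.3, "used": 0.2},  # "study" not in source
         {"used": 0.7, "had": 0.3},
         {"200": 0.55, "120": 0.45}]                 # "200" would be a hallucination
summary = constrained_decode(steps, source)
```

At steps 2 and 4 the model's top choice is blocked, so the decoder falls back to the highest-probability consistent token — the same shape of intervention, applied per beam, that constrained beam search performs.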
no code implementations • 21 Apr 2022 • Hyeonsu B. Kang, Rafal Kocielnik, Andrew Head, Jiangjiang Yang, Matt Latzke, Aniket Kittur, Daniel S. Weld, Doug Downey, Jonathan Bragg
To improve the discovery experience, we introduce multiple new methods for augmenting recommendations with textual relevance messages that highlight knowledge-graph connections between recommended papers and a user's publication and interaction history.
no code implementations • 4 May 2022 • Tom Hope, Doug Downey, Oren Etzioni, Daniel S. Weld, Eric Horvitz
We stand at the foot of a significant inflection in the trajectory of scientific discovery.
1 code implementation • 14 May 2022 • Sonia K. Murthy, Kyle Lo, Daniel King, Chandra Bhagavatula, Bailey Kuehl, Sophie Johnson, Jonathan Borchardt, Daniel S. Weld, Tom Hope, Doug Downey
We present ACCoRD, an end-to-end system tackling the novel task of generating sets of descriptions of scientific concepts.
1 code implementation • 16 May 2022 • Tara Safavi, Doug Downey, Tom Hope
Knowledge graph (KG) link prediction is a fundamental task in artificial intelligence, with applications in natural language processing, information retrieval, and biomedicine.
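KG link prediction can be illustrated with a classic TransE-style scorer, which models a true triple (h, r, t) as h + r ≈ t. The entities, relation, and embeddings below are made up, and this is a generic illustration rather than the paper's specific approach:

```python
import numpy as np

# Hypothetical 2-d entity and relation embeddings.
entities = {"aspirin":  np.array([0.2, 0.1]),
            "headache": np.array([0.7, 0.9]),
            "paris":    np.array([0.9, 0.0])}
relations = {"treats": np.array([0.5, 0.8])}

def transe_score(h, r, t):
    """Higher (less negative) score = more plausible triple."""
    return -np.linalg.norm(entities[h] + relations[r] - entities[t])

def predict_tail(h, r):
    """Rank candidate tails for the query (h, r, ?)."""
    return max((e for e in entities if e != h),
               key=lambda t: transe_score(h, r, t))

best = predict_tail("aspirin", "treats")   # "headache" with these vectors
```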
no code implementations • 23 May 2022 • Emily Allaway, Jena D. Hwang, Chandra Bhagavatula, Kathleen McKeown, Doug Downey, Yejin Choi
Generics express generalizations about the world (e.g., birds can fly) that are not universally true (e.g., newborn birds and penguins cannot fly).
1 code implementation • 22 Jun 2022 • Zejiang Shen, Kyle Lo, Lauren Yu, Nathan Dahlberg, Margo Schlanger, Doug Downey
With the advent of large language models, methods for abstractive summarization have made great strides, creating potential for use in applications to aid knowledge workers processing unwieldy document collections.
1 code implementation • 11 Jul 2022 • Jon Saad-Falcon, Amanpreet Singh, Luca Soldaini, Mike D'Arcy, Arman Cohan, Doug Downey
Real-world applications of neural language models often involve running many different models over the same corpus.
no code implementations • 16 Aug 2022 • Harmanpreet Kaur, Doug Downey, Amanpreet Singh, Evie Yu-Yen Cheng, Daniel S. Weld, Jonathan Bragg
We implement our technique in a novel system, FeedLens, which is built over Semantic Scholar, a production system for navigating the scientific literature KG.
1 code implementation • 23 Oct 2022 • Victor S. Bursztyn, David Demeter, Doug Downey, Larry Birnbaum
In this work, we present compositional fine-tuning (CFT): an approach based on explicitly decomposing a target task into component tasks, and then fine-tuning smaller LMs on a curriculum of such component tasks.
2 code implementations • 23 Nov 2022 • Amanpreet Singh, Mike D'Arcy, Arman Cohan, Doug Downey, Sergey Feldman
In response, we introduce SciRepEval, the first comprehensive benchmark for training and evaluating scientific document representations.
no code implementations • 19 Dec 2022 • Chandra Bhagavatula, Jena D. Hwang, Doug Downey, Ronan Le Bras, Ximing Lu, Lianhui Qin, Keisuke Sakaguchi, Swabha Swayamdipta, Peter West, Yejin Choi
Here, we investigate an alternative that a priori seems impossible: can smaller language models (e.g., GPT-2) win over models that are orders of magnitude larger and better (e.g., GPT-3), if powered with novel commonsense distillation algorithms?
1 code implementation • 24 Jan 2023 • Rodney Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney, David Graham, Fangzhou Hu, Regan Huff, Daniel King, Sebastian Kohlmeier, Bailey Kuehl, Michael Langan, Daniel Lin, Haokun Liu, Kyle Lo, Jaron Lochner, Kelsey MacMillan, Tyler Murray, Chris Newell, Smita Rao, Shaurya Rohatgi, Paul Sayre, Zejiang Shen, Amanpreet Singh, Luca Soldaini, Shivashankar Subramanian, Amber Tanaka, Alex D. Wade, Linda Wagner, Lucy Lu Wang, Chris Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Madeleine van Zuylen, Daniel S. Weld
The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field.
no code implementations • 13 Feb 2023 • Srishti Palani, Aakanksha Naik, Doug Downey, Amy X. Zhang, Jonathan Bragg, Joseph Chee Chang
Scholars who want to research a scientific topic must take time to read, extract meaning, and identify connections across many papers.
no code implementations • 25 Mar 2023 • Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Kinney, Aniket Kittur, Hyeonsu Kang, Egor Klevak, Bailey Kuehl, Michael Langan, Matt Latzke, Jaron Lochner, Kelsey MacMillan, Eric Marsh, Tyler Murray, Aakanksha Naik, Ngoc-Uyen Nguyen, Srishti Palani, Soya Park, Caroline Paulic, Napol Rachatasumrit, Smita Rao, Paul Sayre, Zejiang Shen, Pao Siangliulue, Luca Soldaini, Huy Tran, Madeleine van Zuylen, Lucy Lu Wang, Christopher Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Marti A. Hearst, Daniel S. Weld
Scholarly publications are key to the transfer of knowledge from scholars to others.
no code implementations • 5 Apr 2023 • Zejiang Shen, Tal August, Pao Siangliulue, Kyle Lo, Jonathan Bragg, Jeff Hammerbacher, Doug Downey, Joseph Chee Chang, David Sontag
In this position paper, we argue that developing AI supports for expository writing has unique and exciting research challenges and can lead to high real-world impacts.
1 code implementation • 30 Apr 2023 • Yuze Lou, Bailey Kuehl, Erin Bransom, Sergey Feldman, Aakanksha Naik, Doug Downey
Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications.
1 code implementation • 23 May 2023 • Qingyun Wang, Doug Downey, Heng Ji, Tom Hope
We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature.
Tasks: Contextualized Literature-based Discovery, Link Prediction (+1)
1 code implementation • 1 Jun 2023 • Catherine Chen, Zejiang Shen, Dan Klein, Gabriel Stanovsky, Doug Downey, Kyle Lo
Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers.
1 code implementation • 21 Jun 2023 • Mike D'Arcy, Alexis Ross, Erin Bransom, Bailey Kuehl, Jonathan Bragg, Tom Hope, Doug Downey
Revising scientific papers based on peer feedback is a challenging task that requires not only deep scientific knowledge and reasoning, but also the ability to recognize the implicit requests in high-level feedback and to choose the best of many possible ways to update the manuscript in response.
no code implementations • 16 Nov 2023 • Aakanksha Naik, Bailey Kuehl, Erin Bransom, Doug Downey, Tom Hope
Focusing on biomedicine, this work presents CARE (Clinical Aggregation-oriented Result Extraction) -- a new IE dataset for the task of extracting clinical findings.
1 code implementation • 19 Nov 2023 • Arie Cattan, Tom Hope, Doug Downey, Roy Bar-Haim, Lilach Eden, Yoav Kantor, Ido Dagan
Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items.
Tasks: Coreference Resolution, Cross-Document Coreference Resolution
1 code implementation • 8 Jan 2024 • Mike D'Arcy, Tom Hope, Larry Birnbaum, Doug Downey
We study the ability of LLMs to generate feedback for scientific papers and develop MARG, a feedback generation approach using multiple LLM instances that engage in internal discussion.
1 code implementation • CoNLL (EMNLP) 2021 • David Demeter, Doug Downey
The capabilities of today’s natural language processing systems are typically evaluated using large datasets of curated questions and answers.
1 code implementation • EMNLP 2021 • Victor Bursztyn, Jennifer Healey, Nedim Lipka, Eunyee Koh, Doug Downey, Larry Birnbaum
Conversations aimed at determining good recommendations are iterative in nature.