2 code implementations • 1 Dec 2016 • Thanapon Noraset, Chen Liang, Larry Birnbaum, Doug Downey
Distributed representations of words have been shown to capture lexical semantics, as demonstrated by their effectiveness in word similarity and analogical relation tasks.
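The analogy task mentioned here can be sketched with toy vectors and cosine similarity — a minimal illustration with made-up 3-d embeddings standing in for vectors trained on a large corpus:

```python
import numpy as np

# Toy embeddings; real systems learn these from large corpora.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.2, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.3, 0.8]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """Solve a : b :: c : ? by vector arithmetic (b - a + c)."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

best = analogy("man", "king", "woman")  # "queen" with these toy vectors
```

The toy vectors are constructed so that king - man + woman lands exactly on queen; with trained embeddings the nearest neighbor is only approximately the answer.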
no code implementations • EMNLP 2017 • Jared Fernandez, Zhaocheng Yu, Doug Downey
Many Natural Language Processing (NLP) models rely on distributed vector representations of words.
no code implementations • NAACL 2018 • Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Lu Wang, Chris Wilhelm, Zheng Yuan, Madeleine van Zuylen, Oren Etzioni
We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery.
no code implementations • ACL 2018 • Jared Fernandez, Doug Downey
We propose an unsupervised importance sampling approach to selecting training data for recurrent neural network (RNN) language models.
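Importance sampling of training data can be sketched as drawing sentences with probability proportional to a precomputed weight. The corpus and weights below are hypothetical, and the weighting scheme is a stand-in rather than the paper's actual method:

```python
import random

random.seed(0)

# Hypothetical corpus with precomputed importance weights (e.g. an
# in-domain relevance score); the paper's actual weighting is unsupervised
# and not reproduced here.
corpus = [("sentence about proteins", 0.9),
          ("sports headline", 0.2),
          ("clinical trial report", 0.8),
          ("celebrity gossip", 0.1)]

def sample_training_data(corpus, k):
    sentences, weights = zip(*corpus)
    # Draw k sentences with replacement, proportional to their weights.
    return random.choices(sentences, weights=weights, k=k)

batch = sample_training_data(corpus, k=3)
```

High-weight sentences dominate the sampled batches, so the language model trains mostly on data deemed relevant without discarding the rest of the corpus outright.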
1 code implementation • ACL 2018 • Yiben Yang, Larry Birnbaum, Ji-Ping Wang, Doug Downey
Intelligent systems require common sense, but automatically extracting this knowledge from text can be difficult.
no code implementations • EMNLP 2018 • Thanapon Noraset, Doug Downey, Lidong Bing
Recurrent neural network language models (RNNLMs) are the current standard-bearer for statistical language modeling.
1 code implementation • 28 Jan 2019 • Hanyu Shi, Martin Gerlach, Isabel Diersen, Doug Downey, Luis A. N. Amaral
Topic models are in widespread use in natural language processing and beyond.
2 code implementations • 8 Apr 2019 • Michael Chen, Mike D'Arcy, Alisa Liu, Jared Fernandez, Doug Downey
To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems.
Ranked #1 on Common Sense Reasoning on CODAH (using extra training data)
no code implementations • NAACL 2019 • Yiben Yang, Ji-Ping Wang, Doug Downey
Recurrent neural network language models (RNNLMs) form a valuable foundation for many NLP systems, but training the models can be computationally expensive, and may take days on a large corpus.
no code implementations • SEMEVAL 2019 • Rajagopal Venkatesaramani, Doug Downey, Bradley Malin, Yevgeniy Vorobeychik
We introduce a novel topic modeling approach based on constructing a semantic set cover for clusters of similar documents.
1 code implementation • WS 2019 • Michael Chen, Mike D'Arcy, Alisa Liu, Jared Fernandez, Doug Downey
To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems.
2 code implementations • ICLR 2020 • Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi
Abductive reasoning is inference to the most plausible explanation.
no code implementations • 19 Sep 2019 • Ruimin Zhu, Thanapon Noraset, Alisa Liu, Wenxin Jiang, Doug Downey
Word embeddings capture syntactic and semantic information about words.
no code implementations • 11 Dec 2019 • David Demeter, Doug Downey
How can we augment today's neural models with such encodings?
1 code implementation • 9 Mar 2020 • Benjamin Charles Germain Lee, Doug Downey, Kyle Lo, Daniel S. Weld
We show our method improves accuracy compared to a rigorous baseline on the image classification domains.
5 code implementations • ACL 2020 • Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld
We propose SPECTER, a new method to generate document-level embeddings of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph.
Ranked #1 on Document Classification on SciDocs (MAG)
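The citation signal is used through a triplet objective: a paper's embedding should sit closer to a paper it cites than to an unrelated one. A minimal numpy sketch, with toy 2-d vectors standing in for the Transformer's document embeddings:

```python
import numpy as np

def triplet_margin_loss(query, pos, neg, margin=1.0):
    """Penalize the query embedding for being closer to an uncited
    paper (neg) than to a cited one (pos), up to a margin."""
    d_pos = np.linalg.norm(query - pos)
    d_neg = np.linalg.norm(query - neg)
    return max(0.0, d_pos - d_neg + margin)

q = np.array([1.0, 0.0])         # hypothetical query-paper embedding
cited = np.array([0.9, 0.1])     # hypothetical embedding of a cited paper
uncited = np.array([-1.0, 0.5])  # hypothetical embedding of an unrelated paper

loss = triplet_margin_loss(q, cited, uncited)  # 0.0: already well separated
```

When the cited paper is already more than `margin` closer than the uncited one, the loss is zero; otherwise the gradient pulls the query toward its citations.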
6 code implementations • ACL 2020 • Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
Language models pretrained on text from a wide variety of sources form the foundation of today's NLP.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Yiben Yang, Chaitanya Malaviya, Jared Fernandez, Swabha Swayamdipta, Ronan Le Bras, Ji-Ping Wang, Chandra Bhagavatula, Yejin Choi, Doug Downey
Recent advances in commonsense reasoning depend on large-scale human-annotated training data to achieve peak performance.
Ranked #1 on Question Answering on CODAH
1 code implementation • ACL 2020 • David Demeter, Gregory Kimmel, Doug Downey
Neural Network Language Models (NNLMs) generate probability distributions by applying a softmax function to a distance metric formed by taking the dot product of a prediction vector with all word vectors in a high-dimensional embedding space.
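That softmax-over-dot-products mechanism can be sketched directly, with a toy three-word vocabulary and 2-d embeddings standing in for a real model's high-dimensional space:

```python
import numpy as np

def next_word_distribution(h, E):
    """Softmax over the dot products of a prediction vector h with
    every row of the output embedding matrix E (vocab_size x dim)."""
    logits = E @ h
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

E = np.array([[1.0, 0.0],   # hypothetical vector for "cat"
              [0.0, 1.0],   # hypothetical vector for "dog"
              [0.5, 0.5]])  # hypothetical vector for "the"
h = np.array([2.0, 0.0])    # the model's prediction vector
p = next_word_distribution(h, E)   # highest mass on the first word
```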
1 code implementation • 11 Jun 2020 • Daniel King, Doug Downey, Daniel S. Weld
From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%, compared to 86% for prior work, and a substantially better precision-yield trade-off across the top 15,000 extractions.
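The Precision@k metric itself is simple to compute over ranked, human-judged extractions; the judgments below are hypothetical:

```python
def precision_at_k(labels, k):
    """labels: correctness (1/0) of extractions, best-ranked first.
    Returns the fraction of the top-k that are correct."""
    top = labels[:k]
    return sum(top) / len(top)

# Hypothetical ranked judgments: 1 = correct extraction, 0 = error.
ranked = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
p5 = precision_at_k(ranked, 5)    # 4 of the top 5 are correct -> 0.8
```

Sweeping k traces out the precision-yield curve the entry refers to: larger k yields more extractions but typically lower precision.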
2 code implementations • 2 Nov 2020 • Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan
Pretrained contextualized language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search.
1 code implementation • 3 Mar 2021 • Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, Nazli Goharian
Managing the data for Information Retrieval (IR) experiments can be challenging.
2 code implementations • AKBC 2021 • Arie Cattan, Sophie Johnson, Daniel Weld, Ido Dagan, Iz Beltagy, Doug Downey, Tom Hope
Determining coreference of concept mentions across multiple documents is a fundamental task in natural language understanding.
Tasks: Coreference Resolution, Cross-Document Coreference Resolution (+1)
1 code implementation • 1 Jun 2021 • Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey
Experiments are conducted on a newly curated evaluation suite, S2-VLUE, that unifies existing automatically-labeled datasets and includes a new dataset of manual annotations covering diverse papers from 19 scientific disciplines.
1 code implementation • 15 Sep 2021 • Victor S. Bursztyn, Jennifer Healey, Nedim Lipka, Eunyee Koh, Doug Downey, Larry Birnbaum
Conversations aimed at determining good recommendations are iterative in nature.
no code implementations • 27 Sep 2021 • Marissa Radensky, Doug Downey, Kyle Lo, Zoran Popović, Daniel S. Weld
However, we note that the two explanation approaches may be better compared in the context of a higher-stakes or more opaque domain.
no code implementations • 29 Sep 2021 • Mike D'Arcy, Doug Downey
Active Learning (AL) has the potential to reduce labeling cost when training natural language processing models, but its effectiveness with the large pretrained transformer language models that power today's NLP is uncertain.
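A standard AL acquisition strategy — uncertainty sampling, used here only as an illustration and not necessarily the strategy the paper evaluates — selects the unlabeled examples whose predicted class distributions have the highest entropy:

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(pool, k):
    """pool: {example_id: predicted class distribution}. Return the k
    examples the model is least certain about (highest entropy)."""
    return sorted(pool, key=lambda x: entropy(pool[x]), reverse=True)[:k]

pool = {"a": [0.98, 0.02],   # confident prediction
        "b": [0.55, 0.45],   # near-uniform, i.e. uncertain
        "c": [0.70, 0.30]}
chosen = select_for_labeling(pool, k=1)   # picks "b"
```

The chosen examples are sent to annotators, the model is retrained, and the loop repeats; the entry's question is whether this loop still pays off with large pretrained transformers.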
1 code implementation • Findings (NAACL) 2022 • Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters
We identify the right prompting approach by extensively exploring natural language prompts on FEB. Then, by using this prompt and scaling the model size, we demonstrate that making progress on few-shot self-rationalization is possible.
1 code implementation • 16 Mar 2022 • Daniel King, Zejiang Shen, Nishant Subramani, Daniel S. Weld, Iz Beltagy, Doug Downey
Based on our findings, we present PINOCCHIO, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations.
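The idea of constraining decoding away from unsupported content can be shown with a toy sketch. This uses greedy decoding rather than beam search, and a simple copy constraint (the token must occur in the source) stands in for PINOCCHIO's actual consistency check:

```python
def constrained_decode(step_probs, source_tokens):
    """Greedy sketch of consistency-constrained decoding: at each step,
    drop candidate tokens that fail a consistency check (here, a toy
    copy constraint). step_probs is a list of per-step probability
    dicts over candidate tokens."""
    allowed = set(source_tokens)
    output = []
    for probs in step_probs:
        candidates = {t: p for t, p in probs.items() if t in allowed}
        if not candidates:          # fall back if every token is blocked
            candidates = probs
        output.append(max(candidates, key=candidates.get))
    return output

source = ["the", "trial", "used", "120", "patients"]
steps = [{"the": 0.6, "a": 0.4},
         {"study": 0.5, "trial": 0.3, "used": 0.2},  # "study" not in source
         {"used": 0.7, "had": 0.3},
         {"200": 0.55, "120": 0.45}]                 # "200" would be a hallucination
summary = constrained_decode(steps, source)
```

At steps 2 and 4 the model's top choice is blocked, so the decoder falls back to the highest-probability consistent token — the same shape of intervention, applied per beam, that constrained beam search performs.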
no code implementations • 21 Apr 2022 • Hyeonsu B. Kang, Rafal Kocielnik, Andrew Head, Jiangjiang Yang, Matt Latzke, Aniket Kittur, Daniel S. Weld, Doug Downey, Jonathan Bragg
To improve the discovery experience, we introduce multiple new methods for augmenting recommendations with textual relevance messages that highlight knowledge-graph connections between recommended papers and a user's publication and interaction history.
no code implementations • 4 May 2022 • Tom Hope, Doug Downey, Oren Etzioni, Daniel S. Weld, Eric Horvitz
We stand at the foot of a significant inflection in the trajectory of scientific discovery.
1 code implementation • 14 May 2022 • Sonia K. Murthy, Kyle Lo, Daniel King, Chandra Bhagavatula, Bailey Kuehl, Sophie Johnson, Jonathan Borchardt, Daniel S. Weld, Tom Hope, Doug Downey
We present ACCoRD, an end-to-end system tackling the novel task of generating sets of descriptions of scientific concepts.
1 code implementation • 16 May 2022 • Tara Safavi, Doug Downey, Tom Hope
Knowledge graph (KG) link prediction is a fundamental task in artificial intelligence, with applications in natural language processing, information retrieval, and biomedicine.
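KG link prediction can be illustrated with a classic TransE-style scorer, which models a true triple (h, r, t) as h + r ≈ t. The entities, relation, and embeddings below are made up, and this is a generic illustration rather than the paper's specific approach:

```python
import numpy as np

# Hypothetical 2-d entity and relation embeddings.
entities = {"aspirin":  np.array([0.2, 0.1]),
            "headache": np.array([0.7, 0.9]),
            "paris":    np.array([0.9, 0.0])}
relations = {"treats": np.array([0.5, 0.8])}

def transe_score(h, r, t):
    """Higher (less negative) score = more plausible triple."""
    return -np.linalg.norm(entities[h] + relations[r] - entities[t])

def predict_tail(h, r):
    """Rank candidate tails for the query (h, r, ?)."""
    return max((e for e in entities if e != h),
               key=lambda t: transe_score(h, r, t))

best = predict_tail("aspirin", "treats")   # "headache" with these vectors
```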
no code implementations • 23 May 2022 • Emily Allaway, Jena D. Hwang, Chandra Bhagavatula, Kathleen McKeown, Doug Downey, Yejin Choi
Generics express generalizations about the world (e.g., birds can fly) that are not universally true (e.g., newborn birds and penguins cannot fly).
1 code implementation • 22 Jun 2022 • Zejiang Shen, Kyle Lo, Lauren Yu, Nathan Dahlberg, Margo Schlanger, Doug Downey
With the advent of large language models, methods for abstractive summarization have made great strides, creating potential for use in applications to aid knowledge workers processing unwieldy document collections.
1 code implementation • 11 Jul 2022 • Jon Saad-Falcon, Amanpreet Singh, Luca Soldaini, Mike D'Arcy, Arman Cohan, Doug Downey
Real-world applications of neural language models often involve running many different models over the same corpus.
no code implementations • 16 Aug 2022 • Harmanpreet Kaur, Doug Downey, Amanpreet Singh, Evie Yu-Yen Cheng, Daniel S. Weld, Jonathan Bragg
We implement our technique in a novel system, FeedLens, which is built over Semantic Scholar, a production system for navigating the scientific literature KG.
1 code implementation • 23 Oct 2022 • Victor S. Bursztyn, David Demeter, Doug Downey, Larry Birnbaum
In this work, we present compositional fine-tuning (CFT): an approach based on explicitly decomposing a target task into component tasks, and then fine-tuning smaller LMs on a curriculum of such component tasks.
2 code implementations • 23 Nov 2022 • Amanpreet Singh, Mike D'Arcy, Arman Cohan, Doug Downey, Sergey Feldman
In response, we introduce SciRepEval, the first comprehensive benchmark for training and evaluating scientific document representations.
no code implementations • 19 Dec 2022 • Chandra Bhagavatula, Jena D. Hwang, Doug Downey, Ronan Le Bras, Ximing Lu, Lianhui Qin, Keisuke Sakaguchi, Swabha Swayamdipta, Peter West, Yejin Choi
Here, we investigate an alternative that a priori seems impossible: can smaller language models (e.g., GPT-2) win over models that are orders of magnitude larger and better (e.g., GPT-3), if powered with novel commonsense distillation algorithms?
1 code implementation • 24 Jan 2023 • Rodney Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney, David Graham, Fangzhou Hu, Regan Huff, Daniel King, Sebastian Kohlmeier, Bailey Kuehl, Michael Langan, Daniel Lin, Haokun Liu, Kyle Lo, Jaron Lochner, Kelsey MacMillan, Tyler Murray, Chris Newell, Smita Rao, Shaurya Rohatgi, Paul Sayre, Zejiang Shen, Amanpreet Singh, Luca Soldaini, Shivashankar Subramanian, Amber Tanaka, Alex D. Wade, Linda Wagner, Lucy Lu Wang, Chris Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Madeleine van Zuylen, Daniel S. Weld
The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field.
no code implementations • 13 Feb 2023 • Srishti Palani, Aakanksha Naik, Doug Downey, Amy X. Zhang, Jonathan Bragg, Joseph Chee Chang
Scholars who want to research a scientific topic must take time to read, extract meaning, and identify connections across many papers.
no code implementations • 25 Mar 2023 • Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Kinney, Aniket Kittur, Hyeonsu Kang, Egor Klevak, Bailey Kuehl, Michael Langan, Matt Latzke, Jaron Lochner, Kelsey MacMillan, Eric Marsh, Tyler Murray, Aakanksha Naik, Ngoc-Uyen Nguyen, Srishti Palani, Soya Park, Caroline Paulic, Napol Rachatasumrit, Smita Rao, Paul Sayre, Zejiang Shen, Pao Siangliulue, Luca Soldaini, Huy Tran, Madeleine van Zuylen, Lucy Lu Wang, Christopher Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Marti A. Hearst, Daniel S. Weld
Scholarly publications are key to the transfer of knowledge from scholars to others.
no code implementations • 5 Apr 2023 • Zejiang Shen, Tal August, Pao Siangliulue, Kyle Lo, Jonathan Bragg, Jeff Hammerbacher, Doug Downey, Joseph Chee Chang, David Sontag
In this position paper, we argue that developing AI supports for expository writing has unique and exciting research challenges and can lead to high real-world impacts.
1 code implementation • 30 Apr 2023 • Yuze Lou, Bailey Kuehl, Erin Bransom, Sergey Feldman, Aakanksha Naik, Doug Downey
Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications.
1 code implementation • 23 May 2023 • Qingyun Wang, Doug Downey, Heng Ji, Tom Hope
We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature.
Tasks: Contextualized Literature-based Discovery, Link Prediction (+1)
1 code implementation • 1 Jun 2023 • Catherine Chen, Zejiang Shen, Dan Klein, Gabriel Stanovsky, Doug Downey, Kyle Lo
Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers.
1 code implementation • 21 Jun 2023 • Mike D'Arcy, Alexis Ross, Erin Bransom, Bailey Kuehl, Jonathan Bragg, Tom Hope, Doug Downey
Revising scientific papers based on peer feedback is a challenging task that requires not only deep scientific knowledge and reasoning, but also the ability to recognize the implicit requests in high-level feedback and to choose the best of many possible ways to update the manuscript in response.
no code implementations • 16 Nov 2023 • Aakanksha Naik, Bailey Kuehl, Erin Bransom, Doug Downey, Tom Hope
Focusing on biomedicine, this work presents CARE (Clinical Aggregation-oriented Result Extraction) -- a new IE dataset for the task of extracting clinical findings.
1 code implementation • 19 Nov 2023 • Arie Cattan, Tom Hope, Doug Downey, Roy Bar-Haim, Lilach Eden, Yoav Kantor, Ido Dagan
Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items.
Tasks: Coreference Resolution, Cross-Document Coreference Resolution
1 code implementation • 8 Jan 2024 • Mike D'Arcy, Tom Hope, Larry Birnbaum, Doug Downey
We study the ability of LLMs to generate feedback for scientific papers and develop MARG, a feedback generation approach using multiple LLM instances that engage in internal discussion.
1 code implementation • CoNLL (EMNLP) 2021 • David Demeter, Doug Downey
The capabilities of today’s natural language processing systems are typically evaluated using large datasets of curated questions and answers.
1 code implementation • EMNLP 2021 • Victor Bursztyn, Jennifer Healey, Nedim Lipka, Eunyee Koh, Doug Downey, Larry Birnbaum
Conversations aimed at determining good recommendations are iterative in nature.