1 code implementation • EMNLP 2021 • Victor Bursztyn, Jennifer Healey, Nedim Lipka, Eunyee Koh, Doug Downey, Larry Birnbaum
Conversations aimed at determining good recommendations are iterative in nature.
1 code implementation • CoNLL (EMNLP) 2021 • David Demeter, Doug Downey
The capabilities of today’s natural language processing systems are typically evaluated using large datasets of curated questions and answers.
1 code implementation • 1 Jun 2023 • Catherine Chen, Zejiang Shen, Dan Klein, Gabriel Stanovsky, Doug Downey, Kyle Lo
Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers.
1 code implementation • 23 May 2023 • Qingyun Wang, Doug Downey, Heng Ji, Tom Hope
Literature-Based Discovery (LBD) aims to discover new scientific knowledge by mining papers and generating hypotheses.
1 code implementation • 30 Apr 2023 • Yuze Lou, Bailey Kuehl, Erin Bransom, Sergey Feldman, Aakanksha Naik, Doug Downey
Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications.
no code implementations • 5 Apr 2023 • Zejiang Shen, Tal August, Pao Siangliulue, Kyle Lo, Jonathan Bragg, Jeff Hammerbacher, Doug Downey, Joseph Chee Chang, David Sontag
In this position paper, we argue that developing AI supports for expository writing has unique and exciting research challenges and can lead to high real-world impacts.
no code implementations • 25 Mar 2023 • Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Kinney, Aniket Kittur, Hyeonsu Kang, Egor Klevak, Bailey Kuehl, Michael Langan, Matt Latzke, Jaron Lochner, Kelsey MacMillan, Eric Marsh, Tyler Murray, Aakanksha Naik, Ngoc-Uyen Nguyen, Srishti Palani, Soya Park, Caroline Paulic, Napol Rachatasumrit, Smita Rao, Paul Sayre, Zejiang Shen, Pao Siangliulue, Luca Soldaini, Huy Tran, Madeleine van Zuylen, Lucy Lu Wang, Christopher Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Marti A. Hearst, Daniel S. Weld
Scholarly publications are key to the transfer of knowledge from scholars to others.
no code implementations • 13 Feb 2023 • Srishti Palani, Aakanksha Naik, Doug Downey, Amy X. Zhang, Jonathan Bragg, Joseph Chee Chang
Scholars who want to research a scientific topic must take time to read, extract meaning, and identify connections across many papers.
1 code implementation • 24 Jan 2023 • Rodney Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney, David Graham, Fangzhou Hu, Regan Huff, Daniel King, Sebastian Kohlmeier, Bailey Kuehl, Michael Langan, Daniel Lin, Haokun Liu, Kyle Lo, Jaron Lochner, Kelsey MacMillan, Tyler Murray, Chris Newell, Smita Rao, Shaurya Rohatgi, Paul Sayre, Zejiang Shen, Amanpreet Singh, Luca Soldaini, Shivashankar Subramanian, Amber Tanaka, Alex D. Wade, Linda Wagner, Lucy Lu Wang, Chris Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Madeleine van Zuylen, Daniel S. Weld
The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field.
no code implementations • 19 Dec 2022 • Chandra Bhagavatula, Jena D. Hwang, Doug Downey, Ronan Le Bras, Ximing Lu, Lianhui Qin, Keisuke Sakaguchi, Swabha Swayamdipta, Peter West, Yejin Choi
Here, we investigate an alternative that a priori seems impossible: can smaller language models (e. g., GPT-2) win over models that are orders of magnitude larger and better (e. g., GPT-3), if powered with novel commonsense distillation algorithms?
1 code implementation • 23 Nov 2022 • Amanpreet Singh, Mike D'Arcy, Arman Cohan, Doug Downey, Sergey Feldman
However, existing benchmarks for evaluating these representations fail to capture the diversity of relevant tasks.
1 code implementation • 23 Oct 2022 • Victor S. Bursztyn, David Demeter, Doug Downey, Larry Birnbaum
In this work, we present compositional fine-tuning (CFT): an approach based on explicitly decomposing a target task into component tasks, and then fine-tuning smaller LMs on a curriculum of such component tasks.
no code implementations • 16 Aug 2022 • Harmanpreet Kaur, Doug Downey, Amanpreet Singh, Evie Yu-Yen Cheng, Daniel S. Weld, Jonathan Bragg
We implement our technique in a novel system, FeedLens, which is built over Semantic Scholar, a production system for navigating the scientific literature KG.
1 code implementation • 11 Jul 2022 • Jon Saad-Falcon, Amanpreet Singh, Luca Soldaini, Mike D'Arcy, Arman Cohan, Doug Downey
Real-world applications of neural language models often involve running many different models over the same corpus.
1 code implementation • 22 Jun 2022 • Zejiang Shen, Kyle Lo, Lauren Yu, Nathan Dahlberg, Margo Schlanger, Doug Downey
With the advent of large language models, methods for abstractive summarization have made great strides, creating potential for use in applications to aid knowledge workers processing unwieldy document collections.
no code implementations • 23 May 2022 • Emily Allaway, Jena D. Hwang, Chandra Bhagavatula, Kathleen McKeown, Doug Downey, Yejin Choi
Generics express generalizations about the world (e. g., birds can fly) that are not universally true (e. g., newborn birds and penguins cannot fly).
1 code implementation • 16 May 2022 • Tara Safavi, Doug Downey, Tom Hope
Knowledge graph (KG) link prediction is a fundamental task in artificial intelligence, with applications in natural language processing, information retrieval, and biomedicine.
1 code implementation • 14 May 2022 • Sonia K. Murthy, Kyle Lo, Daniel King, Chandra Bhagavatula, Bailey Kuehl, Sophie Johnson, Jonathan Borchardt, Daniel S. Weld, Tom Hope, Doug Downey
We present ACCoRD, an end-to-end system tackling the novel task of generating sets of descriptions of scientific concepts.
no code implementations • 4 May 2022 • Tom Hope, Doug Downey, Oren Etzioni, Daniel S. Weld, Eric Horvitz
We stand at the foot of a significant inflection in the trajectory of scientific discovery.
no code implementations • 21 Apr 2022 • Hyeonsu B. Kang, Rafal Kocielnik, Andrew Head, Jiangjiang Yang, Matt Latzke, Aniket Kittur, Daniel S. Weld, Doug Downey, Jonathan Bragg
To improve the discovery experience we introduce multiple new methods for \em augmenting recommendations with textual relevance messages that highlight knowledge-graph connections between recommended papers and a user's publication and interaction history.
no code implementations • 16 Mar 2022 • Daniel King, Zejiang Shen, Nishant Subramani, Daniel S. Weld, Iz Beltagy, Doug Downey
Based on our findings, we present PINOCCHIO, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations.
1 code implementation • Findings (NAACL) 2022 • Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters
We identify the right prompting approach by extensively exploring natural language prompts on FEB. Then, by using this prompt and scaling the model size, we demonstrate that making progress on few-shot self-rationalization is possible.
no code implementations • 29 Sep 2021 • Mike D'Arcy, Doug Downey
Active Learning (AL) has the potential to reduce labeling cost when training natural language processing models, but its effectiveness with the large pretrained transformer language models that power today's NLP is uncertain.
no code implementations • 27 Sep 2021 • Marissa Radensky, Doug Downey, Kyle Lo, Zoran Popović, Daniel S. Weld
However, we note that the two explanation approaches may be better compared in the context of a higher-stakes or more opaque domain.
1 code implementation • 15 Sep 2021 • Victor S. Bursztyn, Jennifer Healey, Nedim Lipka, Eunyee Koh, Doug Downey, Larry Birnbaum
Conversations aimed at determining good recommendations are iterative in nature.
1 code implementation • 1 Jun 2021 • Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey
Experiments are conducted on a newly curated evaluation suite, S2-VLUE, that unifies existing automatically-labeled datasets and includes a new dataset of manual annotations covering diverse papers from 19 scientific disciplines.
2 code implementations • AKBC 2021 • Arie Cattan, Sophie Johnson, Daniel Weld, Ido Dagan, Iz Beltagy, Doug Downey, Tom Hope
Determining coreference of concept mentions across multiple documents is a fundamental task in natural language understanding.
coreference-resolution
Cross Document Coreference Resolution
+1
1 code implementation • 3 Mar 2021 • Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, Nazli Goharian
Managing the data for Information Retrieval (IR) experiments can be challenging.
2 code implementations • 2 Nov 2020 • Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan
We present a new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML), which includes new types of diagnostic tests that allow us to probe several characteristics---such as sensitivity to word order---that are not addressed by previous techniques.
1 code implementation • 11 Jun 2020 • Daniel King, Doug Downey, Daniel S. Weld
From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%, compared to 86% for prior work, and a substantially better precision-yield trade-off across the top 15, 000 extractions.
1 code implementation • ACL 2020 • David Demeter, Gregory Kimmel, Doug Downey
Neural Network Language Models (NNLMs) generate probability distributions by applying a softmax function to a distance metric formed by taking the dot product of a prediction vector with all word vectors in a high-dimensional embedding space.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Yiben Yang, Chaitanya Malaviya, Jared Fernandez, Swabha Swayamdipta, Ronan Le Bras, Ji-Ping Wang, Chandra Bhagavatula, Yejin Choi, Doug Downey
Recent advances in commonsense reasoning depend on large-scale human-annotated training data to achieve peak performance.
Ranked #1 on
Question Answering
on CODAH
5 code implementations • ACL 2020 • Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
Language models pretrained on text from a wide variety of sources form the foundation of today's NLP.
5 code implementations • ACL 2020 • Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld
We propose SPECTER, a new method to generate document-level embedding of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph.
Ranked #1 on
Document Classification
on SciDocs (MAG)
1 code implementation • 9 Mar 2020 • Benjamin Charles Germain Lee, Doug Downey, Kyle Lo, Daniel S. Weld
We show our method improves accuracy compared to a rigorous baseline on the image classification domains.
no code implementations • 11 Dec 2019 • David Demeter, Doug Downey
How can we augment today's neural models with such encodings?
no code implementations • 19 Sep 2019 • Ruimin Zhu, Thanapon Noraset, Alisa Liu, Wenxin Jiang, Doug Downey
Word embeddings capture syntactic and semantic information about words.
2 code implementations • ICLR 2020 • Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi
Abductive reasoning is inference to the most plausible explanation.
no code implementations • NAACL 2019 • Yiben Yang, Ji-Ping Wang, Doug Downey
Recurrent neural network language models (RNNLM) form a valuable foundation for many NLP systems, but training the models can be computationally expensive, and may take days to train on a large corpus.
1 code implementation • WS 2019 • Michael Chen, Mike D{'}Arcy, Alisa Liu, Fern, Jared ez, Doug Downey
To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems.
no code implementations • SEMEVAL 2019 • Rajagopal Venkatesaramani, Doug Downey, Bradley Malin, Yevgeniy Vorobeychik
We introduce a novel topic modeling approach based on constructing a semantic set cover for clusters of similar documents.
1 code implementation • 8 Apr 2019 • Michael Chen, Mike D'Arcy, Alisa Liu, Jared Fernandez, Doug Downey
To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems.
Ranked #1 on
Common Sense Reasoning
on CODAH
(using extra training data)
1 code implementation • 28 Jan 2019 • Hanyu Shi, Martin Gerlach, Isabel Diersen, Doug Downey, Luis A. N. Amaral
Topic models are in widespread use in natural language processing and beyond.
no code implementations • EMNLP 2018 • Thanapon Noraset, Doug Downey, Lidong Bing
Recurrent neural network language models (RNNLMs) are the current standard-bearer for statistical language modeling.
no code implementations • ACL 2018 • Fern, Jared ez, Doug Downey
We propose an unsupervised importance sampling approach to selecting training data for recurrent neural network (RNNs) language models.
1 code implementation • ACL 2018 • Yiben Yang, Larry Birnbaum, Ji-Ping Wang, Doug Downey
Intelligent systems require common sense, but automatically extracting this knowledge from text can be difficult.
no code implementations • NAACL 2018 • Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Lu Wang, Chris Wilhelm, Zheng Yuan, Madeleine van Zuylen, Oren Etzioni
We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery.
no code implementations • EMNLP 2017 • Fern, Jared ez, Zhaocheng Yu, Doug Downey
Many Natural Language Processing (NLP) models rely on distributed vector representations of words.
2 code implementations • 1 Dec 2016 • Thanapon Noraset, Chen Liang, Larry Birnbaum, Doug Downey
Distributed representations of words have been shown to capture lexical semantics, as demonstrated by their effectiveness in word similarity and analogical relation tasks.