no code implementations • LREC 2014 • Heeyoung Lee, Mihai Surdeanu, Bill MacCartney, Dan Jurafsky
We investigate the importance of text analysis for stock price prediction.
no code implementations • LREC 2014 • Kevin Reschke, Martin Jankowiak, Mihai Surdeanu, Christopher Manning, Daniel Jurafsky
We present a new publicly available dataset and event extraction task in the plane crash domain based on Wikipedia infoboxes and newswire text.
no code implementations • 8 Sep 2014 • Daniel Fried, Mihai Surdeanu, Stephen Kobourov, Melanie Hingle, Dane Bell
We investigate the predictive power behind the language of food on social media.
no code implementations • TACL 2015 • Daniel Fried, Peter Jansen, Gustave Hahn-Powell, Mihai Surdeanu, Peter Clark
We introduce a higher-order formalism that allows all these lexical semantic models to chain direct evidence to construct indirect associations between question and answer texts, by casting the task as the traversal of graphs that encode direct term associations.
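A minimal sketch of the chaining idea in this entry: treat direct term associations as graph edges and search for short paths that connect question terms to answer terms, so that a chain of direct evidence yields an indirect association. The toy graph and helper below are illustrative only, not the paper's lexical semantic models.

```python
from collections import deque

# Toy direct-association graph between terms; in the paper these edges
# would come from lexical semantic models. Illustrative only.
assoc = {
    "rain": {"cloud", "water"},
    "cloud": {"sky", "condensation"},
    "condensation": {"vapor"},
}

def association_path(graph, start, goal, max_hops=3):
    """Breadth-first search for a chain of direct associations linking
    two terms; a short chain is treated as indirect evidence."""
    queue, seen = deque([(start, [start])]), {start}
    while queue:
        term, path = queue.popleft()
        if term == goal:
            return path
        if len(path) > max_hops:
            continue
        for nbr in graph.get(term, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, path + [nbr]))
    return None

print(association_path(assoc, "rain", "vapor"))
# ['rain', 'cloud', 'condensation', 'vapor']
```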
1 code implementation • 24 Sep 2015 • Marco A. Valenzuela-Escárcega, Gus Hahn-Powell, Mihai Surdeanu
Here we include a thorough definition of the Odin rule language, together with a description of the Odin API in the Scala language, which allows one to apply these rules to arbitrary texts.
no code implementations • LREC 2016 • Dane Bell, Daniel Fried, Luwen Huangfu, Mihai Surdeanu, Stephen Kobourov
The strategy uses a game-like quiz with data and questions acquired semi-automatically from Twitter.
no code implementations • LREC 2016 • Dane Bell, Gus Hahn-Powell, Marco A. Valenzuela-Escárcega, Mihai Surdeanu
We describe challenges and advantages unique to coreference resolution in the biomedical domain, and a sieve-based architecture that leverages domain knowledge for both entity and event coreference resolution.
no code implementations • LREC 2016 • Marco A. Valenzuela-Escárcega, Gus Hahn-Powell, Mihai Surdeanu
Odin is an information extraction framework that applies cascades of finite state automata over both surface text and syntactic dependency graphs.
2 code implementations • WS 2016 • Gus Hahn-Powell, Dane Bell, Marco A. Valenzuela-Escárcega, Mihai Surdeanu
Causal precedence between biochemical interactions is crucial in the biomedical domain, because it transforms collections of individual interactions, e.g., bindings and phosphorylations, into the causal mechanisms needed to inform meaningful search and inference.
no code implementations • WS 2016 • Marco A. Valenzuela-Escárcega, Gus Hahn-Powell, Dane Bell, Mihai Surdeanu
We propose an approach for biomedical information extraction that marries the advantages of machine learning models, e.g., learning directly from data, with the benefits of rule-based approaches, e.g., interpretability.
no code implementations • EMNLP 2016 • Rebecca Sharp, Mihai Surdeanu, Peter Jansen, Peter Clark, Michael Hammond
We argue that a better approach is to look for answers that are related to the question in a relevant way, according to the information need of the question, which may be determined through task-specific embeddings.
no code implementations • COLING 2016 • Peter Jansen, Niranjan Balasubramanian, Mihai Surdeanu, Peter Clark
These explanations are used to create a fine-grained categorization of the requirements.
no code implementations • CL 2017 • Peter Jansen, Rebecca Sharp, Mihai Surdeanu, Peter Clark
Our best configuration answers 44% of the questions correctly, where the top justifications for 57% of these correct answers contain a compelling human-readable justification that explains the inference required to arrive at the correct answer.
no code implementations • CONLL 2017 • Rebecca Sharp, Mihai Surdeanu, Peter Jansen, Marco A. Valenzuela-Escárcega, Peter Clark, Michael Hammond
We propose a neural network architecture for QA that reranks answer justifications as an intermediate (and human-interpretable) step in answer selection.
Ranked #1 on Question Answering on AI2 Kaggle Dataset
no code implementations • EMNLP 2017 • Enrique Noriega-Atala, Marco A. Valenzuela-Escárcega, Clayton T. Morrison, Mihai Surdeanu
In this work, we introduce a focused reading approach that guides the machine reading of biomedical literature towards the documents that should be read to answer a biomedical query as efficiently as possible.
1 code implementation • LREC 2018 • Angus G. Forbes, Kristine Lee, Gus Hahn-Powell, Marco A. Valenzuela-Escárcega, Mihai Surdeanu
Additionally, we include an approach to representing text annotations in which annotation subgraphs, or semantic summaries, are used to show relationships outside of the sequential context of the text itself.
no code implementations • WS 2019 • Marco A. Valenzuela-Escárcega, Ajay Nagesh, Mihai Surdeanu
We propose a lightly-supervised approach for information extraction, in particular named entity classification, which combines the benefits of traditional bootstrapping, i.e., use of limited annotations and interpretability of extraction patterns, with the robust learning approaches proposed in representation learning.
no code implementations • WS 2018 • Fan Luo, Marco A. Valenzuela-Escárcega, Gus Hahn-Powell, Mihai Surdeanu
We introduce a machine learning approach for the identification of "white spaces" in scientific knowledge.
no code implementations • NAACL 2018 • Ajay Nagesh, Mihai Surdeanu
We propose a novel approach to semi-supervised learning for information extraction that uses ladder networks (Rasmus et al., 2015).
no code implementations • 5 Jul 2018 • Vikas Yadav, Rebecca Sharp, Mihai Surdeanu
We also achieve 26.56% and 58.36% on the ARC Challenge and Easy datasets, respectively.
no code implementations • COLING 2018 • Ajay Nagesh, Mihai Surdeanu
Several semi-supervised representation learning methods have been proposed recently that mitigate the drawbacks of traditional bootstrapping: some reduce the amount of semantic drift introduced by iterative approaches through one-shot learning; others address the sparsity of data through the learning of custom, dense representations for the information modeled.
no code implementations • WS 2018 • Dane Bell, Egoitz Laparra, Aditya Kousik, Terron Ishihara, Mihai Surdeanu, Stephen Kobourov
This work explores the detection of individuals' risk of type 2 diabetes mellitus (T2DM) directly from their social media (Twitter) activity.
no code implementations • EMNLP 2018 • Matthew Berger, Ajay Nagesh, Joshua Levine, Mihai Surdeanu, Helen Zhang
We challenge a common assumption in active learning, that a list-based interface populated by informative samples provides for efficient and effective data annotation.
no code implementations • WS 2018 • Mithun Paul, Rebecca Sharp, Mihai Surdeanu
For example, such a system trained in the news domain may learn that a sentence like "Palestinians recognize Texas as part of Mexico" tends to be unsupported, but this fact (and its corresponding lexicalized cues) has no value in, say, a scientific domain.
1 code implementation • NAACL 2019 • Vikas Yadav, Steven Bethard, Mihai Surdeanu
We propose a simple, fast, and mostly-unsupervised approach for non-factoid question answering (QA) called Alignment over Heterogeneous Embeddings (AHE).
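A rough sketch of the alignment idea behind AHE, under the assumption that each question token aligns to its most similar answer token (cosine) and is weighted by IDF; the paper's exact scorer, and how it combines multiple embedding spaces, may differ.

```python
import numpy as np

def ahe_score(q_vecs, a_vecs, q_idf):
    """Illustrative alignment score: each question token aligns to its
    most similar answer token (cosine), weighted by the token's IDF."""
    def unit(m):
        return m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = unit(q_vecs) @ unit(a_vecs).T          # (|Q|, |A|) cosine matrix
    return float(np.sum(q_idf * sims.max(axis=1)))

# Toy usage with random vectors standing in for GloVe/BERT embeddings.
rng = np.random.default_rng(0)
q, a = rng.normal(size=(4, 50)), rng.normal(size=(9, 50))
idf = np.array([1.2, 0.3, 2.0, 1.1])
print(ahe_score(q, a, idf))
```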
no code implementations • NAACL 2019 • George C. G. Barbosa, Zechy Wong, Gus Hahn-Powell, Dane Bell, Rebecca Sharp, Marco A. Valenzuela-Escárcega, Mihai Surdeanu
Many of the most pressing current research problems (e.g., public health, food security, or climate change) require multi-disciplinary collaborations.
no code implementations • SEMEVAL 2019 • Pooja Lakshmi Narayan, Ajay Nagesh, Mihai Surdeanu
Our work aims to address this gap by exploring different noise strategies for the semi-supervised named entity classification task, including statistical methods such as adding Gaussian noise to input embeddings, and linguistically-inspired ones such as dropping words and replacing words with their synonyms.
no code implementations • SEMEVAL 2019 • Vikas Yadav, Egoitz Laparra, Ti-Tai Wang, Mihai Surdeanu, Steven Bethard
We present the Named Entity Recognition (NER) and disambiguation model used by the University of Arizona team (UArizona) for the SemEval 2019 task 12.
no code implementations • WS 2019 • Fan Luo, Ajay Nagesh, Rebecca Sharp, Mihai Surdeanu
Generating a large amount of training data for information extraction (IE) is either costly (if annotations are created manually), or runs the risk of introducing noisy instances (if distant supervision is used).
no code implementations • WS 2019 • Enrique Noriega-Atala, Zhengzhong Liang, John Bachman, Clayton Morrison, Mihai Surdeanu
An important task in the machine reading of biochemical events expressed in biomedical texts is correctly reading the polarity, i.e., attributing whether the biochemical event is a promotion or an inhibition.
1 code implementation • NAACL 2019 • Rebecca Sharp, Adarsh Pyarelal, Benjamin Gyori, Keith Alcock, Egoitz Laparra, Marco A. Valenzuela-Escárcega, Ajay Nagesh, Vikas Yadav, John Bachman, Zheng Tang, Heather Lent, Fan Luo, Mithun Paul, Steven Bethard, Kobus Barnard, Clayton Morrison, Mihai Surdeanu
Building causal models of complicated phenomena such as food insecurity is currently a slow and labor-intensive manual process.
no code implementations • IJCNLP 2019 • Sandeep Suntwal, Mithun Paul, Rebecca Sharp, Mihai Surdeanu
As expected, even though this method achieves high accuracy when evaluated in the same domain, the performance in the target domain is poor, marginally above chance. To mitigate this dependence on lexicalized information, we experiment with several strategies for masking out names by replacing them with their semantic category, coupled with a unique identifier to mark that the same or new entities are referenced between claim and evidence.
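A small sketch of the masking strategy described here: entity mentions are replaced with a semantic category plus an identifier that stays consistent across claim and evidence. The category labels and placeholder format below are assumptions, not the paper's exact scheme.

```python
def mask_entities(tokens, entities):
    """Replace named entities with CATEGORY-Ck placeholders so that the
    same entity receives the same identifier in claim and evidence.
    `entities` maps surface forms to NE categories (illustrative)."""
    ids, out = {}, []
    for tok in tokens:
        if tok in entities:
            key = (entities[tok], tok)
            if key not in ids:
                ids[key] = sum(1 for k in ids if k[0] == entities[tok]) + 1
            out.append(f"{entities[tok]}-C{ids[key]}")
        else:
            out.append(tok)
    return out

ner = {"Palestinians": "PERSON", "Texas": "LOCATION", "Mexico": "LOCATION"}
print(mask_entities("Palestinians recognize Texas as part of Mexico".split(), ner))
# ['PERSON-C1', 'recognize', 'LOCATION-C1', 'as', 'part', 'of', 'LOCATION-C2']
```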
no code implementations • WS 2019 • Hoang Van, Ahmad Musa, Hang Chen, Stephen Kobourov, Mihai Surdeanu
Second, we investigate the effect of socioeconomic factors (income, poverty, and education) on predicting state-level T2DM rates.
no code implementations • IJCNLP 2019 • Vikas Yadav, Steven Bethard, Mihai Surdeanu
We show that the sentences selected by our method improve the performance of a state-of-the-art supervised QA model on two multi-hop QA datasets: AI2's Reasoning Challenge (ARC) and Multi-Sentence Reading Comprehension (MultiRC).
no code implementations • LREC 2020 • Robert Vacareanu, George Caique Gouveia Barbosa, Marco A. Valenzuela-Escárcega, Mihai Surdeanu
For example, for the sentence John eats cake, the tag to be predicted for the token cake is -1 because its head (eats) occurs one token to the left.
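A minimal sketch of this tagging scheme: the tag for each token is the signed offset from the token's position to its head's position. The treatment of the root token below is a guess.

```python
def relative_head_tags(heads):
    """Convert absolute head indices into relative-offset tags:
    tag = head position minus token position. The root convention
    ("ROOT" for a negative head index) is an assumption."""
    return ["ROOT" if h < 0 else h - i for i, h in enumerate(heads)]

# "John eats cake": eats (index 1) is the root; John and cake attach to it.
print(relative_head_tags([1, -1, 1]))  # [1, 'ROOT', -1]
```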
no code implementations • LREC 2020 • Mithun Paul Panenghat, Sandeep Suntwal, Faiz Rafique, Rebecca Sharp, Mihai Surdeanu
Modeling natural language inference is a challenging task.
1 code implementation • ACL 2020 • Vikas Yadav, Steven Bethard, Mihai Surdeanu
Evidence retrieval is a critical stage of question answering (QA), necessary not only to improve performance, but also to explain the decisions of the corresponding QA method.
no code implementations • ACL 2020 • Zheng Tang, Gus Hahn-Powell, Mihai Surdeanu
Our approach uses an encoder-decoder architecture, which jointly trains a classifier for event extraction, and a rule decoder that generates syntactico-semantic rules that explain the decisions of the event classifier.
no code implementations • 22 Sep 2020 • Zhengzhong Liang, Yiyun Zhao, Mihai Surdeanu
Evidence retrieval is a key component of explainable question answering (QA).
no code implementations • 15 Oct 2020 • Hoang Van, Ahmad Musa, Mihai Surdeanu, Stephen Kobourov
Specifically, we analyze over 770,000 tweets published during the lockdown and the equivalent period in the five previous years and highlight several worrying trends.
no code implementations • COLING 2020 • Robert Vacareanu, Marco A. Valenzuela-Escárcega, Rebecca Sharp, Mihai Surdeanu
This paper explores an unsupervised approach to learning a compositional representation function for multi-word expressions (MWEs), and evaluates it on the Tratz dataset, which associates two-word expressions with the semantic relation between the compound constituents (e.g., the label employer is associated with the noun compound government agency) (Tratz, 2011).
no code implementations • NAACL 2021 • Zhengzhong Liang, Steven Bethard, Mihai Surdeanu
Moreover, models trained on simpler tasks tend to fail when directly tested on more complex problems.
no code implementations • NAACL 2021 • Mitch Paul Mithun, Sandeep Suntwal, Mihai Surdeanu
While neural networks produce state-of-the-art performance in several NLP tasks, they generally depend heavily on lexicalized information, which transfers poorly between domains.
no code implementations • NAACL 2021 • Vikas Yadav, Steven Bethard, Mihai Surdeanu
We specifically emphasize the importance of retrieving evidence jointly by showing several comparative analyses to other methods that retrieve and rerank evidence sentences individually.
1 code implementation • 8 Jun 2021 • Hoang Van, Vikas Yadav, Mihai Surdeanu
We propose a simple and effective strategy for data augmentation for low-resource machine reading comprehension (MRC).
1 code implementation • Findings (EMNLP) 2021 • Hoang Van, Zheng Tang, Mihai Surdeanu
The general goal of text simplification (TS) is to reduce text complexity for human consumption.
no code implementations • 17 Dec 2021 • Enrique Noriega-Atala, Peter M. Lovett, Clayton T. Morrison, Mihai Surdeanu
We introduce a family of deep-learning architectures for inter-sentence relation extraction, i.e., relations where the participants are not necessarily in the same sentence.
2 code implementations • LREC 2022 • Roya Kabiri, Simin Karimi, Mihai Surdeanu
We then investigate the parsing of informal Persian by training two dependency parsers on existing formal treebanks and evaluating them on out-of-domain data, i.e., the development set of our informal treebank.
no code implementations • LREC 2022 • Andrew Zupon, Andrew Carnie, Michael Hammond, Mihai Surdeanu
Annotation inconsistencies between data sets can cause problems for low-resource NLP, where noisy or inconsistent data cannot be as easily replaced compared with resource-rich languages.
1 code implementation • LREC 2022 • Robert Vacareanu, Marco A. Valenzuela-Escárcega, George C. G. Barbosa, Rebecca Sharp, Mihai Surdeanu
While deep learning approaches to information extraction have had many successes, they can be difficult to augment or maintain as needs shift.
1 code implementation • 25 Apr 2022 • Zheng Tang, Mihai Surdeanu
Our approach uses a multi-task learning architecture, which jointly trains a classifier for relation extraction, and a sequence model that labels words in the context of the relation that explain the decisions of the relation classifier.
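A hedged PyTorch sketch of the two-headed idea described here: a shared encoder feeds both a relation classifier and a per-token tagger that marks the words explaining the decision. The architecture sizes and loss weighting are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class JointREExplainer(nn.Module):
    """Shared encoder with (a) a relation classifier over a pooled
    summary and (b) a per-token tagger for explanation words."""
    def __init__(self, hidden=128, n_rel=10, n_tag=2):
        super().__init__()
        self.encoder = nn.GRU(100, hidden, batch_first=True, bidirectional=True)
        self.rel_head = nn.Linear(2 * hidden, n_rel)
        self.tag_head = nn.Linear(2 * hidden, n_tag)

    def forward(self, x):
        states, _ = self.encoder(x)               # (B, T, 2H)
        rel_logits = self.rel_head(states.mean(dim=1))
        tag_logits = self.tag_head(states)        # per-token explanation labels
        return rel_logits, tag_logits

model = JointREExplainer()
x = torch.randn(2, 16, 100)                       # toy batch of token embeddings
rel_logits, tag_logits = model(x)
rel_loss = nn.functional.cross_entropy(rel_logits, torch.tensor([3, 7]))
tag_loss = nn.functional.cross_entropy(tag_logits.reshape(-1, 2),
                                       torch.randint(2, (32,)))
loss = rel_loss + 0.5 * tag_loss                  # weighted joint objective
```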
no code implementations • 7 May 2022 • Zhengzhong Liang, Tushar Khot, Steven Bethard, Mihai Surdeanu, Ashish Sabharwal
Considerable progress has been made recently in open-domain question answering (QA) problems, which require Information Retrieval (IR) and Reading Comprehension (RC).
2 code implementations • ACL ARR November 2021 • Mohaddeseh Bastan, Nishant Shankar, Mihai Surdeanu, Niranjan Balasubramanian
We leverage this structure and create a summarization task, where the input is a collection of sentences and the main entities in an abstract, and the output includes the relationship and a sentence that summarizes the mechanism.
no code implementations • NAACL (SUKI) 2022 • Enrique Noriega-Atala, Mihai Surdeanu, Clayton T. Morrison
We propose a method to teach an automated agent to learn how to search for multi-hop paths of relations between entities in an open domain.
no code implementations • 25 Aug 2022 • Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour
We construct these compact subsets from the unstructured data using a combination of abstractive summaries and extractive keywords.
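A sketch of one way such a compact instance could be assembled, pairing an abstractive summary with top TF-IDF keywords; the summarizer, the keyword method, and the mixing recipe are assumptions rather than the paper's exact pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def compact_instance(doc, summarize, top_k=5):
    """Concatenate an abstractive summary with the document's top
    TF-IDF keywords; `summarize` is any abstractive summarizer."""
    tfidf = TfidfVectorizer(stop_words="english").fit([doc])
    scores = tfidf.transform([doc]).toarray()[0]
    vocab = tfidf.get_feature_names_out()
    keywords = [vocab[i] for i in scores.argsort()[::-1][:top_k]]
    return summarize(doc) + " " + " ".join(keywords)

# Toy usage with a stand-in summarizer (first sentence of the document).
doc = "Knee injuries are common in young athletes. Surgical repair of the ACL is often required. Recovery takes months."
print(compact_instance(doc, lambda d: d.split(". ")[0] + "."))
```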
1 code implementation • 26 Oct 2022 • Mohaddeseh Bastan, Mihai Surdeanu, Niranjan Balasubramanian
We introduce a novel semi-supervised procedure that bootstraps an NLI dataset from an existing biomedical dataset that pairs mechanisms with experimental evidence in abstracts.
Ranked #1 on Natural Language Inference on BioNLI
1 code implementation • 30 Oct 2022 • Alice Saebom Kwak, Jacob O. Israelsen, Clayton T. Morrison, Derek E. Bambauer, Mihai Surdeanu
This work introduces a natural language inference (NLI) dataset that focuses on the validity of statements in legal wills.
1 code implementation • 28 Apr 2023 • Zhengzhong Liang, Zeyu Zhang, Steven Bethard, Mihai Surdeanu
Language models have been successfully applied to a variety of reasoning tasks in NLP, yet they still struggle with compositional generalization.
no code implementations • 6 Jul 2023 • Enfa George, Mihai Surdeanu
Such a dataset is necessary to address the challenge of distinguishing between sexually suggestive content and virtual sex education videos on TikTok.
1 code implementation • 11 Jul 2023 • Sushma Anand Akoju, Robert Vacareanu, Haris Riaz, Eduardo Blanco, Mihai Surdeanu
To this end, we modify the original texts using a set of phrases: modifiers that correspond to universal quantifiers, existential quantifiers, negation, and other concept modifiers in Natural Logic (NL) (MacCartney, 2009).
no code implementations • Proceedings of the 8th Workshop on Representation Learning for NLP 2023 • Mahdi Rahimi, Mihai Surdeanu
While fully supervised relation classification (RC) models perform well on large-scale datasets, their performance drops drastically in low-resource settings.
no code implementations • 14 Jul 2023 • Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour
We propose a novel task-agnostic in-domain pre-training method that sits between generic pre-training and fine-tuning.
1 code implementation • 16 Aug 2023 • Shahriar Golchin, Mihai Surdeanu
To estimate contamination of individual instances, we employ "guided instruction": a prompt consisting of the dataset name, partition type, and a random-length initial segment of a reference instance, asking the LLM to complete it.
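A sketch of how such a guided-instruction prompt could be assembled; the wording below is paraphrased, not the exact prompt from the paper.

```python
def guided_instruction(dataset, split, first_piece):
    """Build a guided-instruction prompt: name the dataset and partition,
    give an initial segment of an instance, and ask for its completion."""
    return (
        f"You are given the first part of an instance from the {split} "
        f"split of the {dataset} dataset. Complete it exactly as it "
        f"appears in the dataset:\n\n{first_piece}"
    )

prompt = guided_instruction("AG News", "test",
                            "Wall St. Bears Claw Back Into the")
# A near-verbatim completion suggests the instance was seen in training.
```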
no code implementations • 4 Nov 2023 • Fan Luo, Mihai Surdeanu
Building a question answering (QA) model with lower annotation costs can be achieved by utilizing an active learning (AL) training strategy.
no code implementations • 5 Nov 2023 • Fan Luo, Mihai Surdeanu
However, semantic equivalence is not the only relevance signal that needs to be considered when retrieving evidence for multi-hop questions.
1 code implementation • 10 Nov 2023 • Shahriar Golchin, Mihai Surdeanu
We propose the Data Contamination Quiz (DCQ), a simple and effective approach to detect data contamination in large language models (LLMs) and estimate the amount of it.
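A sketch of constructing one quiz item in the spirit of the DCQ: the verbatim instance is shuffled among paraphrases and the model is asked to pick the one it has seen. The option count and phrasing are illustrative.

```python
import random

def contamination_quiz(original, paraphrases):
    """Build a multiple-choice item: the verbatim instance hidden among
    paraphrases; picking it suggests the model memorized the data."""
    options = paraphrases + [original]
    random.shuffle(options)
    lines = [f"({chr(65 + i)}) {o}" for i, o in enumerate(options)]
    question = ("Which of the following did you see during training? "
                "Answer with a single letter.\n" + "\n".join(lines))
    answer = chr(65 + options.index(original))
    return question, answer

q, a = contamination_quiz(
    "The quick brown fox jumps over the lazy dog.",
    ["A fast brown fox leaps over a sleepy dog.",
     "The speedy fox hops over the idle dog."])
print(q, "\nExpected answer if contaminated:", a)
```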
1 code implementation • 4 Feb 2024 • Razvan-Gabriel Dumitru, Darius Peteleaza, Mihai Surdeanu
Further, the additional parameters necessary for the multiple temporal perspectives are fine-tuned with minimal computational overhead, avoiding the need for a full pre-training.
no code implementations • 5 Mar 2024 • Robert Vacareanu, Fahmida Alam, Md Asiful Islam, Haris Riaz, Mihai Surdeanu
Human interventions to the rules for the TACRED relation org:parents boost the performance on that relation by as much as 26% relative improvement, without negatively impacting the other relations, and without retraining the semantic matching component.
1 code implementation • 26 Mar 2024 • Haris Riaz, Razvan-Gabriel Dumitru, Mihai Surdeanu
In a zero-shot setting, ELLEN also achieves over 75% of the performance of a strong, fully supervised model trained on gold data.
no code implementations • 5 Apr 2024 • Fahmida Alam, Md Asiful Islam, Robert Vacareanu, Mihai Surdeanu
We introduce a meta dataset for few-shot relation extraction, which includes two datasets derived from existing supervised relation extraction datasets NYT29 (Takanobu et al., 2019; Nayak and Ng, 2020) and WIKIDATA (Sorokin and Gurevych, 2017) as well as a few-shot form of the TACRED dataset (Sabo et al., 2021).
2 code implementations • 11 Apr 2024 • Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu
We analyze how well pre-trained large language models (e.g., Llama2, GPT-4, Claude 3) can do linear and non-linear regression when given in-context examples, without any additional training or gradient updates.
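A minimal sketch of how such in-context regression examples could be formatted; the template is a guess at the general setup, not the paper's prompt.

```python
def regression_prompt(examples, x_new):
    """Format (x, y) pairs as in-context examples for an LLM regressor,
    ending with a new input whose output the model must continue."""
    lines = [f"Input: {x:.2f}\nOutput: {y:.2f}" for x, y in examples]
    lines.append(f"Input: {x_new:.2f}\nOutput:")
    return "\n".join(lines)

train = [(1.0, 3.1), (2.0, 5.0), (3.0, 7.2)]  # roughly y = 2x + 1
print(regression_prompt(train, 4.0))
# The model is expected to continue with a number near 9.0.
```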
no code implementations • LREC 2022 • Fan Luo, Mihai Surdeanu
Through an evaluation on HotpotQA, a popular dataset for multi-hop QA, we show that our method yields: (a) improved evidence retrieval; (b) improved QA performance when using the retrieved sentences; and (c) effective and faithful explanations when answers are provided.
no code implementations • LREC 2022 • Mahdi Rahimi, Mihai Surdeanu
With their Discovery of Inference Rules from Text (DIRT) algorithm, Lin and Pantel (2001) made a seminal contribution to the field of rule acquisition from text, by adapting the distributional hypothesis of Harris (1954) to rules that model binary relations such as X treat Y. DIRT’s relevance is renewed in today’s neural era given the recent focus on interpretability in the field of natural language processing.
no code implementations • BioNLP (ACL) 2022 • Zhengzhong Liang, Enrique Noriega-Atala, Clayton Morrison, Mihai Surdeanu
Recognizing causal precedence relations among the chemical interactions in biomedical literature is crucial to understanding the underlying biological mechanisms.
no code implementations • EMNLP 2021 • Mitch Paul Mithun, Sandeep Suntwal, Mihai Surdeanu
While neural networks produce state-of-the-art performance in several NLP tasks, they generally depend heavily on lexicalized information, which transfers poorly between domains.
no code implementations • NAACL (HCINLP) 2022 • Mihai Surdeanu, John Hungerford, Yee Seng Chan, Jessica MacBride, Benjamin Gyori, Andrew Zupon, Zheng Tang, Haoling Qiu, Bonan Min, Yan Zverev, Caitlin Hilverman, Max Thomas, Walter Andrews, Keith Alcock, Zeyu Zhang, Michael Reynolds, Steven Bethard, Rebecca Sharp, Egoitz Laparra
An existing domain taxonomy for normalizing content is often assumed when discussing approaches to information extraction, yet often in real-world scenarios there is none. When one does exist, as the information needs shift, it must be continually extended.
no code implementations • insights (ACL) 2022 • Maria Alexeeva, Allegra A. Beal, Mihai Surdeanu
In this paper, we introduce and justify a new task—causal link extraction based on beliefs—and do a qualitative analysis of the ability of a large language model—InstructGPT-3—to generate implicit consequences of beliefs.
no code implementations • NAACL (TrustNLP) 2021 • Zheng Tang, Mihai Surdeanu
We introduce a method that transforms a rule-based relation extraction (RE) classifier into a neural one such that both interpretability and performance are achieved.
no code implementations • NAACL (ACL) 2022 • Robert Vacareanu, George C.G. Barbosa, Enrique Noriega-Atala, Gus Hahn-Powell, Rebecca Sharp, Marco A. Valenzuela-Escárcega, Mihai Surdeanu
We propose a system that assists a user in constructing transparent information extraction models, consisting of patterns (or rules) written in a declarative language, through program synthesis. Users of our system can specify their requirements through the use of examples, which are collected with a search interface. The rule-synthesis system proposes rule candidates and the results of applying them on a textual corpus; the user has the option to accept the candidate, request another option, or adjust the examples provided to the system. Through an interactive evaluation, we show that our approach generates high-precision rules even in a 1-shot setting.
no code implementations • PANDL (COLING) 2022 • Robert Vacareanu, Dane Bell, Mihai Surdeanu
In this paper we revisit the direction of using lexico-syntactic patterns for relation extraction instead of today’s ubiquitous neural classifiers.
no code implementations • EMNLP (insights) 2020 • Zhengzhong Liang, Mihai Surdeanu
Large pretrained language models (LMs) have been used successfully for multi-hop question answering.
no code implementations • EMNLP (insights) 2020 • Andrew Zupon, Faiz Rafique, Mihai Surdeanu
Neural networks are a common tool in NLP, but it is not always clear which architecture to use for a given task.