Search Results for author: Mihai Surdeanu

Found 82 papers, 16 papers with code

Interpretability Rules: Jointly Bootstrapping a Neural Relation Extractor with an Explanation Decoder

no code implementations NAACL (TrustNLP) 2021 Zheng Tang, Mihai Surdeanu

We introduce a method that transforms a rule-based relation extraction (RE) classifier into a neural one such that both interpretability and performance are achieved.

Relation Extraction

A Human-machine Interface for Few-shot Rule Synthesis for Information Extraction

no code implementations NAACL (ACL) 2022 Robert Vacareanu, George C.G. Barbosa, Enrique Noriega-Atala, Gus Hahn-Powell, Rebecca Sharp, Marco A. Valenzuela-Escárcega, Mihai Surdeanu

We propose a system that assists a user in constructing transparent information extraction models, consisting of patterns (or rules) written in a declarative language, through program synthesis. Users of our system can specify their requirements through the use of examples, which are collected with a search interface. The rule-synthesis system proposes rule candidates and the results of applying them on a textual corpus; the user has the option to accept the candidate, request another option, or adjust the examples provided to the system. Through an interactive evaluation, we show that our approach generates high-precision rules even in a 1-shot setting.

Relation Extraction

Taxonomy Builder: a Data-driven and User-centric Tool for Streamlining Taxonomy Construction

no code implementations NAACL (HCINLP) 2022 Mihai Surdeanu, John Hungerford, Yee Seng Chan, Jessica MacBride, Benjamin Gyori, Andrew Zupon, Zheng Tang, Haoling Qiu, Bonan Min, Yan Zverev, Caitlin Hilverman, Max Thomas, Walter Andrews, Keith Alcock, Zeyu Zhang, Michael Reynolds, Steven Bethard, Rebecca Sharp, Egoitz Laparra

An existing domain taxonomy for normalizing content is often assumed when discussing approaches to information extraction, yet often in real-world scenarios there is none. When one does exist, as the information needs shift, it must be continually extended.

Pretrained Language Models Text Summarization

Combining Extraction and Generation for Constructing Belief-Consequence Causal Links

no code implementations insights (ACL) 2022 Maria Alexeeva, Allegra A. Beal, Mihai Surdeanu

In this paper, we introduce and justify a new task—causal link extraction based on beliefs—and do a qualitative analysis of the ability of a large language model—InstructGPT-3—to generate implicit consequences of beliefs.

Language Modelling

Low Resource Causal Event Detection from Biomedical Literature

no code implementations BioNLP (ACL) 2022 Zhengzhong Liang, Enrique Noriega-Atala, Clayton Morrison, Mihai Surdeanu

Recognizing causal precedence relations among the chemical interactions in biomedical literature is crucial to understanding the underlying biological mechanisms.

Event Detection Knowledge Distillation

Students Who Study Together Learn Better: On the Importance of Collective Knowledge Distillation for Domain Transfer in Fact Verification

no code implementations EMNLP 2021 Mitch Paul Mithun, Sandeep Suntwal, Mihai Surdeanu

While neural networks produce state-of-the-art performance in several NLP tasks, they generally depend heavily on lexicalized information, which transfers poorly between domains.

Fact Verification Knowledge Distillation

PatternRank: Jointly Ranking Patterns and Extractions for Relation Extraction Using Graph-Based Algorithms

no code implementations PANDL (COLING) 2022 Robert Vacareanu, Dane Bell, Mihai Surdeanu

In this paper we revisit the direction of using lexico-syntactic patterns for relation extraction instead of today’s ubiquitous neural classifiers.

Relation Extraction

A STEP towards Interpretable Multi-Hop Reasoning: Bridge Phrase Identification and Query Expansion

no code implementations LREC 2022 Fan Luo, Mihai Surdeanu

Through an evaluation on HotpotQA, a popular dataset for multi-hop QA, we show that our method yields: (a) improved evidence retrieval, (b) improved QA performance when using the retrieved sentences; and (c) effective and faithful explanations when answers are provided.

Multi-hop Question Answering Question Answering +1

Do Transformer Networks Improve the Discovery of Rules from Text?

no code implementations LREC 2022 Mahdi Rahimi, Mihai Surdeanu

With their Discovery of Inference Rules from Text (DIRT) algorithm, Lin and Pantel (2001) made a seminal contribution to the field of rule acquisition from text, by adapting the distributional hypothesis of Harris (1954) to rules that model binary relations such as X treat Y. DIRT’s relevance is renewed in today’s neural era given the recent focus on interpretability in the field of natural language processing.

Language Modelling Question Answering

Validity Assessment of Legal Will Statements as Natural Language Inference

1 code implementation 30 Oct 2022 Alice Saebom Kwak, Jacob O. Israelsen, Clayton T. Morrison, Derek E. Bambauer, Mihai Surdeanu

This work introduces a natural language inference (NLI) dataset that focuses on the validity of statements in legal wills.

Natural Language Inference

BioNLI: Generating a Biomedical NLI Dataset Using Lexico-semantic Constraints for Adversarial Examples

1 code implementation 26 Oct 2022 Mohaddeseh Bastan, Mihai Surdeanu, Niranjan Balasubramanian

We introduce a novel semi-supervised procedure that bootstraps an NLI dataset from an existing biomedical dataset that pairs mechanisms with experimental evidence in abstracts.

Decision Making Natural Language Inference

A Compact Pretraining Approach for Neural Language Models

1 code implementation 25 Aug 2022 Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour

We construct these compact subsets from the unstructured data using a combination of abstractive summaries and extractive keywords.

Domain Adaptation

SuMe: A Dataset Towards Summarizing Biomedical Mechanisms

1 code implementation ACL ARR November 2021 Mohaddeseh Bastan, Nishant Shankar, Mihai Surdeanu, Niranjan Balasubramanian

We leverage this structure and create a summarization task, where the input is a collection of sentences and the main entities in an abstract, and the output includes the relationship and a sentence that summarizes the mechanism.

Better Retrieval May Not Lead to Better Question Answering

no code implementations 7 May 2022 Zhengzhong Liang, Tushar Khot, Steven Bethard, Mihai Surdeanu, Ashish Sabharwal

Considerable progress has been made recently in open-domain question answering (QA) problems, which require Information Retrieval (IR) and Reading Comprehension (RC).

Information Retrieval Open-Domain Question Answering +3

It Takes Two Flints to Make a Fire: Multitask Learning of Neural Relation and Explanation Classifiers

1 code implementation 25 Apr 2022 Zheng Tang, Mihai Surdeanu

Our approach uses a multi-task learning architecture, which jointly trains a classifier for relation extraction, and a sequence model that labels words in the context of the relation that explain the decisions of the relation classifier.

Multi-Task Learning Relation Extraction

Automatic Correction of Syntactic Dependency Annotation Differences

no code implementations LREC 2022 Andrew Zupon, Andrew Carnie, Michael Hammond, Mihai Surdeanu

Annotation inconsistencies between data sets can cause problems for low-resource NLP, where noisy or inconsistent data cannot be as easily replaced compared with resource-rich languages.

Dependency Parsing TAG

Informal Persian Universal Dependency Treebank

2 code implementations LREC 2022 Roya Kabiri, Simin Karimi, Mihai Surdeanu

We then investigate the parsing of informal Persian by training two dependency parsers on existing formal treebanks and evaluating them on out-of-domain data, i.e., the development set of our informal treebank.

Neural Architectures for Biological Inter-Sentence Relation Extraction

no code implementations 17 Dec 2021 Enrique Noriega-Atala, Peter M. Lovett, Clayton T. Morrison, Mihai Surdeanu

We introduce a family of deep-learning architectures for inter-sentence relation extraction, i.e., relations where the participants are not necessarily in the same sentence.

Feature Engineering Relation Extraction

Cheap and Good? Simple and Effective Data Augmentation for Low Resource Machine Reading

1 code implementation 8 Jun 2021 Hoang Van, Vikas Yadav, Mihai Surdeanu

We propose a simple and effective strategy for data augmentation for low-resource machine reading comprehension (MRC).

Data Augmentation Machine Reading Comprehension +1

If You Want to Go Far Go Together: Unsupervised Joint Candidate Evidence Retrieval for Multi-hop Question Answering

no code implementations NAACL 2021 Vikas Yadav, Steven Bethard, Mihai Surdeanu

We specifically emphasize the importance of retrieving evidence jointly by showing several comparative analyses against other methods that retrieve and rerank evidence sentences individually.

Answer Selection Multi-hop Question Answering +1

Data and Model Distillation as a Solution for Domain-transferable Fact Verification

no code implementations NAACL 2021 Mitch Paul Mithun, Sandeep Suntwal, Mihai Surdeanu

While neural networks produce state-of-the-art performance in several NLP tasks, they generally depend heavily on lexicalized information, which transfers poorly between domains.

Fact Verification

An Unsupervised Method for Learning Representations of Multi-word Expressions for Semantic Classification

no code implementations COLING 2020 Robert Vacareanu, Marco A. Valenzuela-Escárcega, Rebecca Sharp, Mihai Surdeanu

This paper explores an unsupervised approach to learning a compositional representation function for multi-word expressions (MWEs), and evaluates it on the Tratz dataset, which associates two-word expressions with the semantic relation between the compound constituents (e.g., the label employer is associated with the noun compound government agency) (Tratz, 2011).

The Language of Food during the Pandemic: Hints about the Dietary Effects of Covid-19

no code implementations 15 Oct 2020 Hoang Van, Ahmad Musa, Mihai Surdeanu, Stephen Kobourov

Specifically, we analyze over 770,000 tweets published during the lockdown and the equivalent period in the five previous years and highlight several worrying trends.


Exploring Interpretability in Event Extraction: Multitask Learning of a Neural Event Classifier and an Explanation Decoder

no code implementations ACL 2020 Zheng Tang, Gus Hahn-Powell, Mihai Surdeanu

Our approach uses an encoder-decoder architecture, which jointly trains a classifier for event extraction, and a rule decoder that generates syntactico-semantic rules that explain the decisions of the event classifier.

Event Extraction

Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering

1 code implementation ACL 2020 Vikas Yadav, Steven Bethard, Mihai Surdeanu

Evidence retrieval is a critical stage of question answering (QA), necessary not only to improve performance, but also to explain the decisions of the corresponding QA method.

Evidence Selection Multi-hop Question Answering +2

Parsing as Tagging

no code implementations LREC 2020 Robert Vacareanu, George Caique Gouveia Barbosa, Marco A. Valenzuela-Escárcega, Mihai Surdeanu

For example, for the sentence John eats cake, the tag to be predicted for the token cake is -1 because its head (eats) occurs one token to the left.

Dependency Parsing TAG
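
The relative-position tagging idea in the Parsing as Tagging snippet above can be illustrated with a minimal sketch. It assumes 0-based head indices and marks the root with tag 0; both conventions, and the function names `heads_to_tags` and `tags_to_heads`, are hypothetical choices for this illustration and are not taken from the paper.

```python
from typing import List, Optional


def heads_to_tags(heads: List[Optional[int]]) -> List[int]:
    """Turn 0-based head indices into relative-offset tags.

    The tag of token i is (head index - i), so a head one token to the
    left yields -1. The root (head None) gets tag 0, which is an
    assumption of this sketch, not a convention confirmed by the paper.
    """
    return [0 if head is None else head - i for i, head in enumerate(heads)]


def tags_to_heads(tags: List[int]) -> List[Optional[int]]:
    """Invert heads_to_tags: recover head indices from relative tags."""
    return [None if tag == 0 else i + tag for i, tag in enumerate(tags)]


tokens = ["John", "eats", "cake"]
heads = [1, None, 1]  # "eats" is the root; it heads both "John" and "cake"
tags = heads_to_tags(heads)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag:+d}")
```

With this encoding the token cake receives the tag -1, matching the example in the snippet, and any sequence tagger that predicts one integer per token can in principle be decoded back into a head assignment.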

Quick and (not so) Dirty: Unsupervised Selection of Justification Sentences for Multi-hop Question Answering

no code implementations IJCNLP 2019 Vikas Yadav, Steven Bethard, Mihai Surdeanu

We show that the sentences selected by our method improve the performance of a state-of-the-art supervised QA model on two multi-hop QA datasets: AI2's Reasoning Challenge (ARC) and Multi-Sentence Reading Comprehension (MultiRC).

Information Retrieval Multi-hop Question Answering +3

What does the language of foods say about us?

no code implementations WS 2019 Hoang Van, Ahmad Musa, Hang Chen, Stephen Kobourov, Mihai Surdeanu

Second, we investigate the effect of socioeconomic factors (income, poverty, and education) on predicting state-level T2DM rates.

On the Importance of Delexicalization for Fact Verification

no code implementations IJCNLP 2019 Sandeep Suntwal, Mithun Paul, Rebecca Sharp, Mihai Surdeanu

As expected, even though this method achieves high accuracy when evaluated in the same domain, the performance in the target domain is poor, marginally above chance. To mitigate this dependence on lexicalized information, we experiment with several strategies for masking out names by replacing them with their semantic category, coupled with a unique identifier to mark that the same or new entities are referenced between claim and evidence.

Fact Verification Natural Language Inference +2
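
The masking strategy described in the delexicalization snippet above (replace each name with its semantic category plus an identifier shared between claim and evidence) can be sketched as follows. Entity spans are supplied by hand here; the span format, the `CATEGORY-n` tag format, and the `delexicalize` function are assumptions made for illustration and are not the authors' implementation.

```python
from typing import Dict, List, Tuple

# (start, end_exclusive, category): a hypothetical span format for this sketch
Span = Tuple[int, int, str]


def delexicalize(
    claim: List[str], claim_spans: List[Span],
    evidence: List[str], evidence_spans: List[Span],
) -> Tuple[List[str], List[str]]:
    """Replace entity mentions with category tags carrying a shared id.

    The same entity string gets the same tag in claim and evidence, so a
    downstream model can see whether both texts refer to the same entity
    without seeing the name itself.
    """
    ids: Dict[Tuple[str, str], str] = {}
    counters: Dict[str, int] = {}

    def mask(tokens: List[str], spans: List[Span]) -> List[str]:
        out: List[str] = []
        pos = 0
        for start, end, category in sorted(spans):
            out.extend(tokens[pos:start])  # copy the tokens before the entity
            key = (category, " ".join(tokens[start:end]).lower())
            if key not in ids:  # first mention: allocate a new id
                counters[category] = counters.get(category, 0) + 1
                ids[key] = f"{category}-{counters[category]}"
            out.append(ids[key])
            pos = end
        out.extend(tokens[pos:])  # copy the tokens after the last entity
        return out

    return mask(claim, claim_spans), mask(evidence, evidence_spans)


claim_out, evidence_out = delexicalize(
    ["Alice", "Smith", "visited", "Paris"],
    [(0, 2, "PERSON"), (3, 4, "LOCATION")],
    ["Paris", "welcomed", "Bob"],
    [(0, 1, "LOCATION"), (2, 3, "PERSON")],
)
print(claim_out)
print(evidence_out)
```

In this toy input, Paris maps to the same tag in both texts while Bob receives a fresh PERSON id, which is the "same or new entity" signal the snippet refers to.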

Semi-Supervised Teacher-Student Architecture for Relation Extraction

no code implementations WS 2019 Fan Luo, Ajay Nagesh, Rebecca Sharp, Mihai Surdeanu

Generating a large amount of training data for information extraction (IE) is either costly (if annotations are created manually), or runs the risk of introducing noisy instances (if distant supervision is used).

Binary Relation Extraction Denoising

Understanding the Polarity of Events in the Biomedical Literature: Deep Learning vs. Linguistically-informed Methods

no code implementations WS 2019 Enrique Noriega-Atala, Zhengzhong Liang, John Bachman, Clayton Morrison, Mihai Surdeanu

An important task in the machine reading of biochemical events expressed in biomedical texts is correctly reading the polarity, i.e., attributing whether the biochemical event is a promotion or an inhibition.

Reading Comprehension

Exploration of Noise Strategies in Semi-supervised Named Entity Classification

no code implementations SEMEVAL 2019 Pooja Lakshmi Narayan, Ajay Nagesh, Mihai Surdeanu

Our work aims to address this gap by exploring different noise strategies for the semi-supervised named entity classification task, including statistical methods such as adding Gaussian noise to input embeddings, and linguistically-inspired ones such as dropping words and replacing words with their synonyms.

Classification General Classification +1

Alignment over Heterogeneous Embeddings for Question Answering

1 code implementation NAACL 2019 Vikas Yadav, Steven Bethard, Mihai Surdeanu

We propose a simple, fast, and mostly-unsupervised approach for non-factoid question answering (QA) called Alignment over Heterogeneous Embeddings (AHE).

Question Answering Sentence Embeddings

A mostly unlexicalized model for recognizing textual entailment

no code implementations WS 2018 Mithun Paul, Rebecca Sharp, Mihai Surdeanu

For example, such a system trained in the news domain may learn that a sentence like "Palestinians recognize Texas as part of Mexico" tends to be unsupported, but this fact (and its corresponding lexicalized cues) have no value in, say, a scientific domain.

Fake News Detection Information Retrieval +3

Detecting Diabetes Risk from Social Media Activity

no code implementations WS 2018 Dane Bell, Egoitz Laparra, Aditya Kousik, Terron Ishihara, Mihai Surdeanu, Stephen Kobourov

This work explores the detection of individuals' risk of type 2 diabetes mellitus (T2DM) directly from their social media (Twitter) activity.

Domain Adaptation

Visual Supervision in Bootstrapped Information Extraction

no code implementations EMNLP 2018 Matthew Berger, Ajay Nagesh, Joshua Levine, Mihai Surdeanu, Helen Zhang

We challenge a common assumption in active learning, that a list-based interface populated by informative samples provides for efficient and effective data annotation.

Active Learning General Classification

An Exploration of Three Lightly-supervised Representation Learning Approaches for Named Entity Classification

no code implementations COLING 2018 Ajay Nagesh, Mihai Surdeanu

Several semi-supervised representation learning methods have been proposed recently that mitigate the drawbacks of traditional bootstrapping: they reduce the amount of semantic drift introduced by iterative approaches through one-shot learning; others address the sparsity of data through the learning of custom, dense representation for the information modeled.

General Classification One-Shot Learning +1

Lightly-supervised Representation Learning with Global Interpretability

no code implementations WS 2019 Marco A. Valenzuela-Escárcega, Ajay Nagesh, Mihai Surdeanu

We propose a lightly-supervised approach for information extraction, in particular named entity classification, which combines the benefits of traditional bootstrapping, i.e., use of limited annotations and interpretability of extraction patterns, with the robust learning approaches proposed in representation learning.

Representation Learning

Text Annotation Graphs: Annotating Complex Natural Language Phenomena

1 code implementation LREC 2018 Angus G. Forbes, Kristine Lee, Gus Hahn-Powell, Marco A. Valenzuela-Escárcega, Mihai Surdeanu

Additionally, we include an approach to representing text annotations in which annotation subgraphs, or semantic summaries, are used to show relationships outside of the sequential context of the text itself.

Event Extraction TAG +1

Learning what to read: Focused machine reading

no code implementations EMNLP 2017 Enrique Noriega-Atala, Marco A. Valenzuela-Escarcega, Clayton T. Morrison, Mihai Surdeanu

In this work, we introduce a focused reading approach to guide the machine reading of biomedical literature towards what literature should be read to answer a biomedical query as efficiently as possible.

Reading Comprehension

Framing QA as Building and Ranking Intersentence Answer Justifications

no code implementations CL 2017 Peter Jansen, Rebecca Sharp, Mihai Surdeanu, Peter Clark

Our best configuration answers 44% of the questions correctly, where the top justifications for 57% of these correct answers contain a compelling human-readable justification that explains the inference required to arrive at the correct answer.

Multiple-choice Question Answering

Creating Causal Embeddings for Question Answering with Minimal Supervision

no code implementations EMNLP 2016 Rebecca Sharp, Mihai Surdeanu, Peter Jansen, Peter Clark, Michael Hammond

We argue that a better approach is to look for answers that are related to the question in a relevant way, according to the information need of the question, which may be determined through task-specific embeddings.

Question Answering Word Embeddings

SnapToGrid: From Statistical to Interpretable Models for Biomedical Information Extraction

no code implementations WS 2016 Marco A. Valenzuela-Escarcega, Gus Hahn-Powell, Dane Bell, Mihai Surdeanu

We propose an approach for biomedical information extraction that marries the advantages of machine learning models, e.g., learning directly from data, with the benefits of rule-based approaches, e.g., interpretability.

BIG-bench Machine Learning Event Extraction

This before That: Causal Precedence in the Biomedical Domain

2 code implementations WS 2016 Gus Hahn-Powell, Dane Bell, Marco A. Valenzuela-Escárcega, Mihai Surdeanu

Causal precedence between biochemical interactions is crucial in the biomedical domain, because it transforms collections of individual interactions, e.g., bindings and phosphorylations, into the causal mechanisms needed to inform meaningful search and inference.

Odin's Runes: A Rule Language for Information Extraction

no code implementations LREC 2016 Marco A. Valenzuela-Escárcega, Gus Hahn-Powell, Mihai Surdeanu

Odin is an information extraction framework that applies cascades of finite state automata over both surface text and syntactic dependency graphs.

Sieve-based Coreference Resolution in the Biomedical Domain

no code implementations LREC 2016 Dane Bell, Gus Hahn-Powell, Marco A. Valenzuela-Escárcega, Mihai Surdeanu

We describe challenges and advantages unique to coreference resolution in the biomedical domain, and a sieve-based architecture that leverages domain knowledge for both entity and event coreference resolution.

Coreference Resolution +2

Description of the Odin Event Extraction Framework and Rule Language

1 code implementation 24 Sep 2015 Marco A. Valenzuela-Escárcega, Gus Hahn-Powell, Mihai Surdeanu

Here we include a thorough definition of the Odin rule language, together with a description of the Odin API in the Scala language, which allows one to apply these rules to arbitrary texts.

Event Extraction

Higher-order Lexical Semantic Models for Non-factoid Answer Reranking

no code implementations TACL 2015 Daniel Fried, Peter Jansen, Gustave Hahn-Powell, Mihai Surdeanu, Peter Clark

We introduce a higher-order formalism that allows all these lexical semantic models to chain direct evidence to construct indirect associations between question and answer texts, by casting the task as the traversal of graphs that encode direct term associations.

Open-Domain Question Answering Semantic Similarity +1

Analyzing the Language of Food on Social Media

no code implementations 8 Sep 2014 Daniel Fried, Mihai Surdeanu, Stephen Kobourov, Melanie Hingle, Dane Bell

We investigate the predictive power behind the language of food on social media.

Event Extraction Using Distant Supervision

no code implementations LREC 2014 Kevin Reschke, Martin Jankowiak, Mihai Surdeanu, Christopher Manning, Daniel Jurafsky

We present a new publicly available dataset and event extraction task in the plane crash domain based on Wikipedia infoboxes and newswire text.

Event Extraction Knowledge Base Population +2
