Search Results for author: Reut Tsarfaty

Found 49 papers, 12 papers with code

QADiscourse - Discourse Relations as QA Pairs: Representation, Crowdsourcing and Baselines

no code implementations EMNLP 2020 Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan

Discourse relations describe how two propositions relate to one another, and identifying them automatically is an integral part of natural language understanding.

Natural Language Understanding

Well-Defined Morphology is Sentence-Level Morphology

no code implementations EMNLP (MRL) 2021 Omer Goldman, Reut Tsarfaty

Morphological tasks have gained considerable popularity within the NLP community in recent years, with large multilingual datasets providing morphological analyses of words, either in or out of context.

Morphological Analysis Morphological Inflection

(Un)solving Morphological Inflection: Lemma Overlap Artificially Inflates Models’ Performance

no code implementations ACL 2022 Omer Goldman, David Guriel, Reut Tsarfaty

In the domain of morphology, inflection is a fundamental and important task that has gained a lot of traction in recent years, mostly via SIGMORPHON’s shared tasks. With average accuracy above 0.9 across all languages, the task is considered mostly solved by relatively generic neural seq2seq models, even with little training data. In this work, we propose to re-evaluate morphological inflection models by employing harder train-test splits that will challenge the generalization capacity of the models.

Morphological Inflection
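The harder splits mentioned in the abstract above can be illustrated with a lemma-disjoint split, in which all inflections of a given lemma go entirely to either train or test. This is a minimal sketch, assuming the data are (lemma, form, features) triples in the style of UniMorph entries; `lemma_disjoint_split` and its parameters are hypothetical and not taken from the authors' released code, whose exact splitting procedure may differ.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (lemma, inflected form, feature bundle)


def lemma_disjoint_split(
    triples: List[Triple], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Triple], List[Triple]]:
    """Hypothetical helper: split so that no lemma occurs in both train and test."""
    by_lemma: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        by_lemma[triple[0]].append(triple)

    # Sort before shuffling so the split depends only on the data and the seed.
    lemmas = sorted(by_lemma)
    random.Random(seed).shuffle(lemmas)

    # Hold out whole lemmas (at least one), never individual forms.
    n_test = max(1, round(len(lemmas) * test_fraction))
    test_lemmas = set(lemmas[:n_test])

    train: List[Triple] = []
    test: List[Triple] = []
    for lemma in lemmas:
        (test if lemma in test_lemmas else train).extend(by_lemma[lemma])
    return train, test


data = [
    ("run", "ran", "V;PST"),
    ("run", "running", "V;V.PTCP;PRS"),
    ("walk", "walked", "V;PST"),
    ("walk", "walks", "V;3;SG;PRS"),
    ("sing", "sang", "V;PST"),
]
train, test = lemma_disjoint_split(data, test_fraction=0.34)
```

By contrast, a random split over individual triples could put `ran` in training and `running` in test, so a model is evaluated on a lemma it has already seen.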

UniMorph 4.0: Universal Morphology

no code implementations 7 May 2022 Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection
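To make the "type-level resource of annotated data" concrete: UniMorph data, as far as we are aware, are distributed as tab-separated lines holding a lemma, an inflected form, and a semicolon-separated feature bundle (for example `run`, `ran`, `V;PST`). The sketch below assumes that three-column layout and ignores any extra columns a release may add; `Entry` and `parse_unimorph` are hypothetical names, not part of an official UniMorph library.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Entry(NamedTuple):
    lemma: str
    form: str
    features: FrozenSet[str]


def parse_unimorph(lines: Iterable[str]) -> List[Entry]:
    """Parse UniMorph-style lines: lemma <TAB> form <TAB> feat1;feat2;..."""
    entries: List[Entry] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        cols = line.split("\t")
        if len(cols) < 3:
            raise ValueError(f"expected at least 3 tab-separated columns: {line!r}")
        # Only the first three columns are used; any extra columns are ignored.
        lemma, form, feats = cols[0], cols[1], cols[2]
        entries.append(Entry(lemma, form, frozenset(feats.split(";"))))
    return entries


entries = parse_unimorph(["run\tran\tV;PST\n", "\n", "walk\twalks\tV;3;SG;PRS\n"])
```

Storing the feature bundle as a frozenset is a design choice of this sketch: two bundles listing the same features in a different order compare equal.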

Breaking Character: Are Subwords Good Enough for MRLs After All?

no code implementations 10 Apr 2022 Omri Keren, Tal Avinari, Reut Tsarfaty, Omer Levy

Large pretrained language models (PLMs) typically tokenize the input string into contiguous subwords before any pretraining or inference.

Language Modelling Morphological Disambiguation +4

Neural Token Segmentation for High Token-Internal Complexity

no code implementations 21 Mar 2022 Idan Brusilovsky, Reut Tsarfaty

Tokenizing raw texts into word units is an essential pre-processing step for critical tasks in the NLP pipeline such as tagging, parsing, named entity recognition, and more.

Dependency Parsing Named Entity Recognition +1

Morphological Reinflection with Multiple Arguments: An Extended Annotation schema and a Georgian Case Study

no code implementations ACL 2022 David Guriel, Omer Goldman, Reut Tsarfaty

In recent years, a flurry of morphological datasets has emerged, most notably UniMorph, a multilingual repository of inflection tables.

Morphology Without Borders: Clause-Level Morphological Annotation

no code implementations 25 Feb 2022 Omer Goldman, Reut Tsarfaty

We use this dataset to derive 3 clause-level morphological tasks: inflection, reinflection and analysis.

Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking

no code implementations 30 Nov 2021 Ronen Tamari, Kyle Richardson, Aviad Sar-Shalom, Noam Kahlon, Nelson Liu, Reut Tsarfaty, Dafna Shahaf

However, the main synthetic resource for story understanding, the bAbI benchmark, lacks such a systematic mechanism for controllable task generation.

Natural Language Understanding

Text-based NP Enrichment

1 code implementation 24 Sep 2021 Yanai Elazar, Victoria Basmov, Yoav Goldberg, Reut Tsarfaty

Understanding the relations between entities denoted by NPs in a text is a critical part of human-like natural language understanding.

Natural Language Understanding

Asking It All: Generating Contextualized Questions for any Semantic Role

1 code implementation EMNLP 2021 Valentina Pyatkin, Paul Roit, Julian Michael, Reut Tsarfaty, Yoav Goldberg, Ido Dagan

We develop a two-stage model for this task, which first produces a context-independent question prototype for each role and then revises it to be contextually appropriate for the passage.

Question Generation

(Un)solving Morphological Inflection: Lemma Overlap Artificially Inflates Models' Performance

1 code implementation 12 Aug 2021 Omer Goldman, David Guriel, Reut Tsarfaty

The effect is most significant for low-resourced languages with a drop as high as 95 points, but even high-resourced languages lose about 10 points on average.

Morphological Inflection

The Possible, the Plausible, and the Desirable: Event-Based Modality Detection for Language Processing

2 code implementations ACL 2021 Valentina Pyatkin, Shoval Sadde, Aynat Rubinstein, Paul Portner, Reut Tsarfaty

Modality is the linguistic ability to describe events with added information such as how desirable, plausible, or feasible they are.

Minimal Supervision for Morphological Inflection

1 code implementation EMNLP 2021 Omer Goldman, Reut Tsarfaty

Neural models for the various flavours of morphological inflection tasks have proven extremely accurate given ample labeled data, which may be slow and costly to obtain.

Morphological Inflection

A Pointer Network Architecture for Joint Morphological Segmentation and Tagging

no code implementations Findings of the Association for Computational Linguistics 2020 Amit Seker, Reut Tsarfaty

Neural morphological disambiguation (MD) may be addressed as a simple pipeline, where segmentation is followed by sequence tagging, or as an end-to-end model that predicts morphemes from raw tokens.

Morphological Disambiguation

ZEST: Zero-shot Learning from Text Descriptions using Textual Similarity and Visual Summarization

1 code implementation Findings of the Association for Computational Linguistics 2020 Tzuf Paz-Argaman, Yuval Atzmon, Gal Chechik, Reut Tsarfaty

Specifically, given images of birds with free-text descriptions of their species, we learn to classify images of previously unseen species based on species descriptions.

Zero-Shot Learning

QADiscourse - Discourse Relations as QA Pairs: Representation, Crowdsourcing and Baselines

1 code implementation 6 Oct 2020 Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan

Discourse relations describe how two propositions relate to one another, and identifying them automatically is an integral part of natural language understanding.

Natural Language Understanding

Evaluating NLP Models via Contrast Sets

no code implementations 1 Oct 2020 Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou

Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.

Reading Comprehension Sentiment Analysis

Neural Modeling for Named Entities and Morphology (NEMO²)

3 code implementations 30 Jul 2020 Dan Bareket, Reut Tsarfaty

Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated as classification over a sequence of tokens.

Named Entity Recognition NER
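Formulating NER as classification over a sequence of tokens usually means assigning one label per token; a common encoding for this is BIO (Begin/Inside/Outside) tagging. The sketch below is an illustrative assumption, not the NEMO² code: it converts entity spans over whitespace tokens into BIO tags and does not model the morpheme-level setting the paper's title refers to. `spans_to_bio` is a hypothetical helper.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity label)


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Hypothetical helper: encode entity spans as one BIO label per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span {(start, end, label)}")
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # tokens continuing the same entity
    return tags


tokens = ["Dan", "visited", "Tel", "Aviv", "yesterday"]
tags = spans_to_bio(tokens, [(0, 1, "PER"), (2, 4, "LOC")])
# tags == ["B-PER", "O", "B-LOC", "I-LOC", "O"]
```

A sequence classifier is then trained to predict `tags` from `tokens`, and predicted tags are decoded back into spans.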

Getting the ##life out of living: How Adequate Are Word-Pieces for Modelling Complex Morphology?

no code implementations WS 2020 Stav Klein, Reut Tsarfaty

Therefore, when using word-pieces in MRLs, we must consider that: (1) a linear segmentation into sub-word units might not capture the full morphological complexity of words; and (2) representations that leave morphological knowledge on sub-word units inaccessible might negatively affect performance.

TAG Word Embeddings
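The `##` in the title is the continuation marker used by WordPiece vocabularies: a piece that does not start a word carries a `##` prefix. The sketch below shows the greedy longest-match-first segmentation that BERT-style WordPiece tokenizers apply at inference time. The toy vocabulary is invented for illustration rather than learned from data, and `wordpiece` is a hypothetical function name. With this vocabulary, `life` and `lives` share no piece, a small instance of point (1) in the abstract above.

```python
from typing import List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first segmentation; non-initial pieces carry '##'."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking it until a vocab hit.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no full segmentation exists: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, invented for illustration.
vocab = {"life", "liv", "##es", "##ing"}
for w in ["life", "lives", "living"]:
    print(w, wordpiece(w, vocab))
# life ['life']
# lives ['liv', '##es']
# living ['liv', '##ing']
```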

From SPMRL to NMRL: What Did We Learn (and Unlearn) in a Decade of Parsing Morphologically-Rich Languages (MRLs)?

no code implementations ACL 2020 Reut Tsarfaty, Dan Bareket, Stav Klein, Amit Seker

It has been exactly a decade since the establishment of SPMRL, a research initiative unifying multiple research efforts to address the peculiar challenges of Statistical Parsing for Morphologically-Rich Languages (MRLs). Here we reflect on parsing MRLs in that decade, highlight the solutions and lessons learned for the architectural, modeling and lexical challenges in the pre-neural era, and argue that similar challenges re-emerge in neural architectures for MRLs.

pyBART: Evidence-based Syntactic Transformations for IE

1 code implementation ACL 2020 Aryeh Tiktinsky, Yoav Goldberg, Reut Tsarfaty

We present pyBART, an easy-to-use open-source Python library for converting English UD trees either to Enhanced UD graphs or to our representation.

Relation Extraction

Ecological Semantics: Programming Environments for Situated Language Understanding

no code implementations 10 Mar 2020 Ronen Tamari, Gabriel Stanovsky, Dafna Shahaf, Reut Tsarfaty

Large-scale natural language understanding (NLU) systems have made impressive progress: they can be applied flexibly across a variety of tasks, and employ minimal structural assumptions.

Common Sense Reasoning Grounded language learning +1

RUN through the Streets: A New Dataset and Baseline Models for Realistic Urban Navigation

1 code implementation IJCNLP 2019 Tzuf Paz-Argaman, Reut Tsarfaty

Following navigation instructions in natural language requires a composition of language, action, and knowledge of the environment.

What's Wrong with Hebrew NLP? And How to Make it Right

no code implementations IJCNLP 2019 Reut Tsarfaty, Amit Seker, Shoval Sadde, Stav Klein

For languages with simple morphology, such as English, automatic annotation pipelines such as spaCy or Stanford's CoreNLP successfully serve projects in academia and industry.

Morphological Disambiguation

The Hebrew Universal Dependency Treebank: Past Present and Future

no code implementations WS 2018 Shoval Sade, Amit Seker, Reut Tsarfaty

The Hebrew treebank (HTB), consisting of 6221 morpho-syntactically annotated newspaper sentences, has been the only resource for training and validating statistical parsers and taggers for Hebrew, for almost two decades now.

Dependency Parsing

Universal Morpho-Syntactic Parsing and the Contribution of Lexica: Analyzing the ONLP Lab Submission to the CoNLL 2018 Shared Task

no code implementations CONLL 2018 Amit Seker, Amir More, Reut Tsarfaty

We present the contribution of the ONLP lab at the Open University of Israel to the UD shared task on multilingual parsing from raw text to Universal Dependencies.

Representations and Architectures in Neural Sentiment Analysis for Morphologically Rich Languages: A Case Study from Modern Hebrew

no code implementations COLING 2018 Adam Amram, Anat Ben David, Reut Tsarfaty

To empirically study these effects, we develop a new sentiment analysis benchmark for Hebrew, based on 12K social media comments, and provide two instances of these data: in token-based and morpheme-based settings.

Sentiment Analysis Text Classification

Universal Joint Morph-Syntactic Processing: The Open University of Israel's Submission to The CoNLL 2017 Shared Task

no code implementations CONLL 2017 Amir More, Reut Tsarfaty

Our parser requires a lattice as input, so we generate morphological analyses of surface tokens using a data-driven morphological analyzer that derives its lexicon from the UD training corpora, and we rely on UDPipe for sentence segmentation and surface-level tokenization.

Sentence segmentation Word Embeddings
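One way to picture the lattice input mentioned above is as a directed acyclic graph whose nodes are positions inside a surface token and whose arcs are candidate morphemes with tags; each start-to-end path is one morphological analysis. The sketch below assumes that representation and uses the familiar ambiguity of the transliterated Hebrew token "bcl" ("onion" vs. "in (the) shadow") as a toy example. `Arc`, `lattice_paths`, and the UD-style tags are illustrative assumptions, not the submission's actual data structures, and a joint parser would choose among these paths rather than merely enumerate them.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Arc(NamedTuple):
    src: int       # lattice node where the morpheme starts
    dst: int       # lattice node where the morpheme ends
    morpheme: str
    tag: str


def lattice_paths(arcs: List[Arc], start: int, end: int) -> List[List[Arc]]:
    """Enumerate every start-to-end path in an acyclic lattice by depth-first search."""
    outgoing: Dict[int, List[Arc]] = defaultdict(list)
    for arc in arcs:
        outgoing[arc.src].append(arc)

    paths: List[List[Arc]] = []

    def walk(node: int, prefix: List[Arc]) -> None:
        if node == end:
            paths.append(list(prefix))  # copy: prefix is mutated while backtracking
            return
        for arc in outgoing[node]:
            prefix.append(arc)
            walk(arc.dst, prefix)
            prefix.pop()

    walk(start, [])
    return paths


# Toy lattice for the transliterated Hebrew token "bcl" (illustrative only).
arcs = [
    Arc(0, 3, "bcl", "NOUN"),  # "onion"
    Arc(0, 1, "b", "ADP"),     # "in"
    Arc(1, 3, "cl", "NOUN"),   # "shadow"
    Arc(1, 2, "h", "DET"),     # covert definite article "the"
    Arc(2, 3, "cl", "NOUN"),   # "shadow"
]
for path in lattice_paths(arcs, 0, 3):
    print(" + ".join(f"{a.morpheme}/{a.tag}" for a in path))
# bcl/NOUN
# b/ADP + cl/NOUN
# b/ADP + h/DET + cl/NOUN
```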

Data-Driven Broad-Coverage Grammars for Opinionated Natural Language Generation (ONLG)

no code implementations ACL 2017 Tomer Cagan, Stefan L. Frank, Reut Tsarfaty

Opinionated Natural Language Generation (ONLG) is a new, challenging task that aims to automatically generate human-like, subjective responses to opinionated articles online.

Language Modelling Text Generation +1

Data-Driven Morphological Analysis and Disambiguation for Morphologically Rich Languages and Universal Dependencies

1 code implementation COLING 2016 Amir More, Reut Tsarfaty

Parsing texts into universal dependencies (UD) in realistic scenarios requires infrastructure for the morphological analysis and disambiguation (MA&D) of typologically different languages as a first tier.

Morphological Analysis TAG
