no code implementations • EMNLP 2020 • Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan
Discourse relations describe how two propositions relate to one another, and identifying them automatically is an integral part of natural language understanding.
no code implementations • LREC 2022 • Merel Scholman, Valentina Pyatkin, Frances Yung, Ido Dagan, Reut Tsarfaty, Vera Demberg
The current contribution studies the effect of worker selection and training on the agreement on implicit relation labels between workers and gold labels, for both the DC and the QA method.
no code implementations • ACL 2022 • Amit Seker, Elron Bandel, Dan Bareket, Idan Brusilovsky, Refael Greenfeld, Reut Tsarfaty
First, Hebrew resources for training large language models are so far not of the same magnitude as their English counterparts.
no code implementations • ACL 2022 • Omer Goldman, David Guriel, Reut Tsarfaty
In the domain of Morphology, Inflection is a fundamental and important task that gained a lot of traction in recent years, mostly via SIGMORPHON's shared tasks. With average accuracy above 0.9 over the scores of all languages, the task is considered mostly solved using relatively generic neural seq2seq models, even with little data provided. In this work, we propose to re-evaluate morphological inflection models by employing harder train-test splits that will challenge the generalization capacity of the models.
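The "harder train-test splits" idea can be illustrated with a minimal sketch: instead of sampling inflected forms at random (which lets the same lemma appear in both train and test), the data is split so that test lemmas are entirely unseen during training. This is an illustrative reconstruction, not the paper's actual code; the function name and data layout are hypothetical.

```python
import random

def lemma_split(triples, test_ratio=0.2, seed=0):
    """Split (lemma, features, form) triples so that test lemmas never
    appear in training -- a harder split than random form sampling.
    Hypothetical sketch; not the authors' implementation."""
    lemmas = sorted({lemma for lemma, _, _ in triples})
    rng = random.Random(seed)
    rng.shuffle(lemmas)
    n_test = max(1, int(len(lemmas) * test_ratio))
    test_lemmas = set(lemmas[:n_test])
    train = [t for t in triples if t[0] not in test_lemmas]
    test = [t for t in triples if t[0] in test_lemmas]
    return train, test
```

Under a random form split, a model can succeed by memorizing a lemma's stem from other cells of its paradigm; a lemma-disjoint split removes that shortcut and tests true generalization.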
no code implementations • EMNLP (MRL) 2021 • Omer Goldman, Reut Tsarfaty
Morphological tasks have gained decent popularity within the NLP community in recent years, with large multi-lingual datasets providing morphological analysis of words, either in or out of context.
no code implementations • *SEM (NAACL) 2022 • Ronen Tamari, Kyle Richardson, Noam Kahlon, Aviad Sar-Shalom, Nelson F. Liu, Reut Tsarfaty, Dafna Shahaf
However, the main synthetic resource for story understanding, the bAbI benchmark, lacks such a systematic mechanism for controllable task generation.
no code implementations • 1 Oct 2023 • Aviya Maimon, Reut Tsarfaty
On two benchmarks for coherence scoring rated by humans, one containing 500 automatically-generated short stories and another containing 4k real-world texts, our experiments confirm that jointly training on the proposed tasks leads to better performance on each task compared with task-specific models, and to better performance on assessing coherence overall, compared with strong baselines.
no code implementations • 6 Jul 2023 • Roni Rabin, Alexandre Djerbetian, Roee Engelberg, Lidan Hackmon, Gal Elidan, Reut Tsarfaty, Amir Globerson
Human communication often involves information gaps between the interlocutors.
1 code implementation • 2 Jul 2023 • Tzuf Paz-Argaman, Tal Bauman, Itai Mondshine, Itzhak Omer, Sagi Dalyot, Reut Tsarfaty
The task of textual geolocation - retrieving the coordinates of a place based on a free-form language description - calls for not only grounding but also natural language understanding and geospatial reasoning.
1 code implementation • 21 Jun 2023 • David Guriel, Omer Goldman, Reut Tsarfaty
Recent years have brought great advances in solving morphological tasks, mostly due to powerful neural models applied to various tasks such as (re)inflection and analysis.
no code implementations • 26 May 2023 • Royi Rassin, Yoav Goldberg, Reut Tsarfaty
In this work we propose a conjunct resolution task that operates directly on the text and makes use of a split-and-rephrase paradigm in order to recover the missing elements in the coordination structure.
no code implementations • 24 May 2023 • Victoria Basmov, Yoav Goldberg, Reut Tsarfaty
This paper sheds light on the limitations of ChatGPT's understanding capabilities, focusing on simple inference tasks that are typically easy for humans but appear to be challenging for the model.
no code implementations • 3 Apr 2023 • Valentina Pyatkin, Frances Yung, Merel C. J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg
Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks.
no code implementations • 19 Dec 2022 • Matan Eyal, Hila Noga, Roee Aharoni, Idan Szpektor, Reut Tsarfaty
We demonstrate that by casting tasks in the Hebrew NLP pipeline as text-to-text tasks, we can leverage powerful multilingual, pretrained sequence-to-sequence models such as mT5, eliminating the need for a specialized, morpheme-based, separately fine-tuned decoder.
no code implementations • 28 Nov 2022 • Eylon Gueta, Avi Shmidman, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Joshua Guedalia, Moshe Koppel, Dan Bareket, Amit Seker, Reut Tsarfaty
We perform a contrastive analysis of this model against all previous Hebrew PLMs (mBERT, heBERT, AlephBERT) and assess the effects of larger vocabularies on task performance.
1 code implementation • 15 Nov 2022 • Kyle Richardson, Ronen Tamari, Oren Sultan, Reut Tsarfaty, Dafna Shahaf, Ashish Sabharwal
Can we teach natural language understanding models to track their beliefs through intermediate points in text?
no code implementations • LREC 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
no code implementations • 10 Apr 2022 • Omri Keren, Tal Avinari, Reut Tsarfaty, Omer Levy
Large pretrained language models (PLMs) typically tokenize the input string into contiguous subwords before any pretraining or inference.
no code implementations • 21 Mar 2022 • Idan Brusilovsky, Reut Tsarfaty
Tokenizing raw texts into word units is an essential pre-processing step for critical tasks in the NLP pipeline such as tagging, parsing, named entity recognition, and more.
no code implementations • ACL 2022 • David Guriel, Omer Goldman, Reut Tsarfaty
In recent years, a flurry of morphological datasets has emerged, most notably UniMorph, a multi-lingual repository of inflection tables.
no code implementations • 25 Feb 2022 • Omer Goldman, Reut Tsarfaty
We use this dataset to derive 3 clause-level morphological tasks: inflection, reinflection and analysis.
no code implementations • 30 Nov 2021 • Ronen Tamari, Kyle Richardson, Aviad Sar-Shalom, Noam Kahlon, Nelson Liu, Reut Tsarfaty, Dafna Shahaf
However, the main synthetic resource for story understanding, the bAbI benchmark, lacks such a systematic mechanism for controllable task generation.
1 code implementation • 24 Sep 2021 • Yanai Elazar, Victoria Basmov, Yoav Goldberg, Reut Tsarfaty
Understanding the relations between entities denoted by NPs in a text is a critical part of human-like natural language understanding.
1 code implementation • EMNLP 2021 • Valentina Pyatkin, Paul Roit, Julian Michael, Reut Tsarfaty, Yoav Goldberg, Ido Dagan
We develop a two-stage model for this task, which first produces a context-independent question prototype for each role and then revises it to be contextually appropriate for the passage.
1 code implementation • 12 Aug 2021 • Omer Goldman, David Guriel, Reut Tsarfaty
The effect is most significant for low-resourced languages with a drop as high as 95 points, but even high-resourced languages lose about 10 points on average.
no code implementations • 27 Jun 2021 • Royi Lachmy, Valentina Pyatkin, Avshalom Manevich, Reut Tsarfaty
Abstraction is a core tenet of human cognition and communication.
2 code implementations • ACL 2021 • Valentina Pyatkin, Shoval Sadde, Aynat Rubinstein, Paul Portner, Reut Tsarfaty
Modality is the linguistic ability to describe events with added information such as how desirable, plausible, or feasible they are.
1 code implementation • EMNLP 2021 • Omer Goldman, Reut Tsarfaty
Neural models for the various flavours of morphological inflection tasks have proven to be extremely accurate given ample labeled data -- data that may be slow and costly to obtain.
2 code implementations • 8 Apr 2021 • Amit Seker, Elron Bandel, Dan Bareket, Idan Brusilovsky, Refael Shaked Greenfeld, Reut Tsarfaty
Second, there are no accepted tasks and benchmarks to evaluate the progress of Hebrew PLMs on.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Amit Seker, Reut Tsarfaty
Neural MD may be addressed as a simple pipeline, where segmentation is followed by sequence tagging, or as an end-to-end model, predicting morphemes from raw tokens.
no code implementations • EMNLP (insights) 2020 • Yanai Elazar, Victoria Basmov, Shauli Ravfogel, Yoav Goldberg, Reut Tsarfaty
In this work, we follow known methodologies of collecting labeled data for the complement coercion phenomenon.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Tzuf Paz-Argaman, Yuval Atzmon, Gal Chechik, Reut Tsarfaty
Specifically, given birds' images with free-text descriptions of their species, we learn to classify images of previously-unseen species based on species descriptions.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Moshe Koppel, Reut Tsarfaty
One of the primary tasks of morphological parsers is the disambiguation of homographs.
1 code implementation • 6 Oct 2020 • Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan
Discourse relations describe how two propositions relate to one another, and identifying them automatically is an integral part of natural language understanding.
no code implementations • 1 Oct 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
4 code implementations • 30 Jul 2020 • Dan Bareket, Reut Tsarfaty
Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated as classification over a sequence of tokens.
no code implementations • WS 2020 • Stav Klein, Reut Tsarfaty
Therefore, when using word-pieces in MRLs, we must consider that: (1) a linear segmentation into sub-word units might not capture the full morphological complexity of words; and (2) representations that leave morphological knowledge on sub-word units inaccessible might negatively affect performance.
no code implementations • ACL 2020 • Reut Tsarfaty, Dan Bareket, Stav Klein, Amit Seker
It has been exactly a decade since the first establishment of SPMRL, a research initiative unifying multiple research efforts to address the peculiar challenges of Statistical Parsing for Morphologically-Rich Languages (MRLs). Here we reflect on parsing MRLs in that decade, highlight the solutions and lessons learned for the architectural, modeling and lexical challenges in the pre-neural era, and argue that similar challenges re-emerge in neural architectures for MRLs.
1 code implementation • ACL 2020 • Aryeh Tiktinsky, Yoav Goldberg, Reut Tsarfaty
We present pyBART, an easy-to-use open-source Python library for converting English UD trees either to Enhanced UD graphs or to our representation.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
no code implementations • 10 Mar 2020 • Ronen Tamari, Gabriel Stanovsky, Dafna Shahaf, Reut Tsarfaty
Large-scale natural language understanding (NLU) systems have made impressive progress: they can be applied flexibly across a variety of tasks, and employ minimal structural assumptions.
1 code implementation • IJCNLP 2019 • Tzuf Paz-Argaman, Reut Tsarfaty
Following navigation instructions in natural language requires a composition of language, action, and knowledge of the environment.
no code implementations • IJCNLP 2019 • Reut Tsarfaty, Amit Seker, Shoval Sadde, Stav Klein
For languages with simple morphology, such as English, automatic annotation pipelines such as spaCy or Stanford's CoreNLP successfully serve projects in academia and the industry.
no code implementations • TACL 2019 • Amir More, Amit Seker, Victoria Basmova, Reut Tsarfaty
In standard NLP pipelines, morphological analysis and disambiguation (MA&D) precedes syntactic and semantic downstream tasks.
no code implementations • WS 2018 • Shoval Sade, Amit Seker, Reut Tsarfaty
The Hebrew treebank (HTB), consisting of 6221 morpho-syntactically annotated newspaper sentences, has been the only resource for training and validating statistical parsers and taggers for Hebrew, for almost two decades now.
no code implementations • CONLL 2018 • Amit Seker, Amir More, Reut Tsarfaty
We present the contribution of the ONLP lab at the Open University of Israel to the UD shared task on multilingual parsing from raw text to Universal Dependencies.
no code implementations • COLING 2018 • Adam Amram, Anat Ben David, Reut Tsarfaty
To empirically study these effects, we develop a new sentiment analysis benchmark for Hebrew, based on 12K social media comments, and provide two instances of these data: in token-based and morpheme-based settings.
no code implementations • CONLL 2017 • Amir More, Reut Tsarfaty
Our parser requires a lattice as input, so we generate morphological analyses of surface tokens using a data-driven morphological analyzer that derives its lexicon from the UD training corpora, and we rely on UDPipe for sentence segmentation and surface-level tokenization.
no code implementations • ACL 2017 • Tomer Cagan, Stefan L. Frank, Reut Tsarfaty
Opinionated Natural Language Generation (ONLG) is a new, challenging task that aims to automatically generate human-like, subjective responses to opinionated articles online.
1 code implementation • COLING 2016 • Amir More, Reut Tsarfaty
Parsing texts into universal dependencies (UD) in realistic scenarios requires infrastructure for the morphological analysis and disambiguation (MA&D) of typologically different languages as a first tier.
no code implementations • LREC 2016 • Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, Daniel Zeman
Cross-linguistically consistent annotation is necessary for sound comparative evaluation and cross-lingual learning experiments.
no code implementations • WS 2013 • Djamé Seddah, Reut Tsarfaty, Sandra Kübler, Marie Candito, Jinho D. Choi, Richárd Farkas, Jennifer Foster, Iakes Goenaga, Koldo Gojenola Galletebeitia, Yoav Goldberg, Spence Green, Nizar Habash, Marco Kuhlmann, Wolfgang Maier, Joakim Nivre, Adam Przepiórkowski, Ryan Roth, Wolfgang Seeker, Yannick Versley, Veronika Vincze, Marcin Woliński, Alina Wróblewska, Eric Villemonte de la Clergerie