no code implementations • EMNLP 2020 • Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan
Discourse relations describe how two propositions relate to one another, and identifying them automatically is an integral part of natural language understanding.
no code implementations • ACL 2022 • Amit Seker, Elron Bandel, Dan Bareket, Idan Brusilovsky, Refael Greenfeld, Reut Tsarfaty
First, so far, Hebrew resources for training large language models are not of the same magnitude as their English counterparts.
no code implementations • EMNLP (MRL) 2021 • Omer Goldman, Reut Tsarfaty
Morphological tasks have gained decent popularity within the NLP community in the recent years, with large multi-lingual datasets providing morphological analysis of words, either in or out of context.
no code implementations • ACL 2022 • Omer Goldman, David Guriel, Reut Tsarfaty
In the domain of Morphology, Inflection is a fundamental and important task that gained a lot of traction in recent years, mostly via SIGMORPHON’s shared-tasks. With average accuracy above 0. 9 over the scores of all languages, the task is considered mostly solved using relatively generic neural seq2seq models, even with little data provided. In this work, we propose to re-evaluate morphological inflection models by employing harder train-test splits that will challenge the generalization capacity of the models.
no code implementations • 7 May 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
no code implementations • 10 Apr 2022 • Omri Keren, Tal Avinari, Reut Tsarfaty, Omer Levy
Large pretrained language models (PLMs) typically tokenize the input string into contiguous subwords before any pretraining or inference.
no code implementations • 21 Mar 2022 • Idan Brusilovsky, Reut Tsarfaty
Tokenizing raw texts into word units is an essential pre-processing step for critical tasks in the NLP pipeline such as tagging, parsing, named entity recognition, and more.
no code implementations • ACL 2022 • David Guriel, Omer Goldman, Reut Tsarfaty
In recent years, a flurry of morphological datasets had emerged, most notably UniMorph, a multi-lingual repository of inflection tables.
no code implementations • 25 Feb 2022 • Omer Goldman, Reut Tsarfaty
We use this dataset to derive 3 clause-level morphological tasks: inflection, reinflection and analysis.
no code implementations • 30 Nov 2021 • Ronen Tamari, Kyle Richardson, Aviad Sar-Shalom, Noam Kahlon, Nelson Liu, Reut Tsarfaty, Dafna Shahaf
However, the main synthetic resource for story understanding, the bAbI benchmark, lacks such a systematic mechanism for controllable task generation.
1 code implementation • 24 Sep 2021 • Yanai Elazar, Victoria Basmov, Yoav Goldberg, Reut Tsarfaty
Understanding the relations between entities denoted by NPs in a text is a critical part of human-like natural language understanding.
1 code implementation • EMNLP 2021 • Valentina Pyatkin, Paul Roit, Julian Michael, Reut Tsarfaty, Yoav Goldberg, Ido Dagan
We develop a two-stage model for this task, which first produces a context-independent question prototype for each role and then revises it to be contextually appropriate for the passage.
1 code implementation • 12 Aug 2021 • Omer Goldman, David Guriel, Reut Tsarfaty
The effect is most significant for low-resourced languages with a drop as high as 95 points, but even high-resourced languages lose about 10 points on average.
no code implementations • 27 Jun 2021 • Royi Lachmy, Valentina Pyatkin, Reut Tsarfaty
Forming and interpreting abstraction is a core process in human communication.
2 code implementations • ACL 2021 • Valentina Pyatkin, Shoval Sadde, Aynat Rubinstein, Paul Portner, Reut Tsarfaty
Modality is the linguistic ability to describe events with added information such as how desirable, plausible, or feasible they are.
1 code implementation • EMNLP 2021 • Omer Goldman, Reut Tsarfaty
Neural models for the various flavours of morphological inflection tasks have proven to be extremely accurate given ample labeled data -- data that may be slow and costly to obtain.
1 code implementation • 8 Apr 2021 • Amit Seker, Elron Bandel, Dan Bareket, Idan Brusilovsky, Refael Shaked Greenfeld, Reut Tsarfaty
Second, there are no accepted tasks and benchmarks to evaluate the progress of Hebrew PLMs on.
Ranked #1 on
Named Entity Recognition
on NEMO-Corpus (token,test)
no code implementations • Findings of the Association for Computational Linguistics 2020 • Amit Seker, Reut Tsarfaty
Neural MD may be addressed as a simple pipeline, where segmentation is followed by sequence tagging, or as an end-to-end model, predicting morphemes from raw tokens.
no code implementations • EMNLP (insights) 2020 • Yanai Elazar, Victoria Basmov, Shauli Ravfogel, Yoav Goldberg, Reut Tsarfaty
In this work, we follow known methodologies of collecting labeled data for the complement coercion phenomenon.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Tzuf Paz-Argaman, Yuval Atzmon, Gal Chechik, Reut Tsarfaty
Specifically, given birds' images with free-text descriptions of their species, we learn to classify images of previously-unseen species based on specie descriptions.
1 code implementation • 6 Oct 2020 • Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan
Discourse relations describe how two propositions relate to one another, and identifying them automatically is an integral part of natural language understanding.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Moshe Koppel, Reut Tsarfaty
One of the primary tasks of morphological parsers is the disambiguation of homographs.
no code implementations • 1 Oct 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e. g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
3 code implementations • 30 Jul 2020 • Dan Bareket, Reut Tsarfaty
Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated as classification over a sequence of tokens.
Ranked #2 on
Named Entity Recognition
on NEMO-Corpus (token,test)
no code implementations • WS 2020 • Stav Klein, Reut Tsarfaty
Therefore, when using word-pieces in MRLs, we must consider that: (1) a linear segmentation into sub-word units might not capture the full morphological complexity of words; and (2) representations that leave morphological knowledge on sub-word units inaccessible might negatively affect performance.
no code implementations • ACL 2020 • Reut Tsarfaty, Dan Bareket, Stav Klein, Amit Seker
It has been exactly a decade since the first establishment of SPMRL, a research initiative unifying multiple research efforts to address the peculiar challenges of Statistical Parsing for Morphologically-Rich Languages (MRLs). Here we reflect on parsing MRLs in that decade, highlight the solutions and lessons learned for the architectural, modeling and lexical challenges in the pre-neural era, and argue that similar challenges re-emerge in neural architectures for MRLs.
1 code implementation • ACL 2020 • Aryeh Tiktinsky, Yoav Goldberg, Reut Tsarfaty
We present pyBART, an easy-to-use open-source Python library for converting English UD trees either to Enhanced UD graphs or to our representation.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e. g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
no code implementations • 10 Mar 2020 • Ronen Tamari, Gabriel Stanovsky, Dafna Shahaf, Reut Tsarfaty
Large-scale natural language understanding (NLU) systems have made impressive progress: they can be applied flexibly across a variety of tasks, and employ minimal structural assumptions.
1 code implementation • IJCNLP 2019 • Tzuf Paz-Argaman, Reut Tsarfaty
Following navigation instructions in natural language requires a composition of language, action, and knowledge of the environment.
no code implementations • IJCNLP 2019 • Reut Tsarfaty, Amit Seker, Shoval Sadde, Stav Klein
For languages with simple morphology, such as English, automatic annotation pipelines such as spaCy or Stanford's CoreNLP successfully serve projects in academia and the industry.
no code implementations • TACL 2019 • Amir More, Amit Seker, Victoria Basmova, Reut Tsarfaty
In standard NLP pipelines, morphological analysis and disambiguation (MA{\&}D) precedes syntactic and semantic downstream tasks.
no code implementations • WS 2018 • Shoval Sade, Amit Seker, Reut Tsarfaty
The Hebrew treebank (HTB), consisting of 6221 morpho-syntactically annotated newspaper sentences, has been the only resource for training and validating statistical parsers and taggers for Hebrew, for almost two decades now.
no code implementations • CONLL 2018 • Amit Seker, Amir More, Reut Tsarfaty
We present the contribution of the ONLP lab at the Open University of Israel to the UD shared task on multilingual parsing from raw text to Universal Dependencies.
no code implementations • COLING 2018 • Adam Amram, Anat Ben David, Reut Tsarfaty
To empirically study these effects, we develop a new sentiment analysis benchmark for Hebrew, based on 12K social media comments, and provide two instances of these data: in token-based and morpheme-based settings.
no code implementations • CONLL 2017 • Amir More, Reut Tsarfaty
Our parser requires a lattice as input, so we generate morphological analyses of surface tokens using a data-driven morphological analyzer that derives its lexicon from the UD training corpora, and we rely on UDPipe for sentence segmentation and surface-level tokenization.
no code implementations • ACL 2017 • Tomer Cagan, Stefan L. Frank, Reut Tsarfaty
Opinionated Natural Language Generation (ONLG) is a new, challenging, task that aims to automatically generate human-like, subjective, responses to opinionated articles online.
1 code implementation • COLING 2016 • Amir More, Reut Tsarfaty
Parsing texts into universal dependencies (UD) in realistic scenarios requires infrastructure for the morphological analysis and disambiguation (MA{\&}D) of typologically different languages as a first tier.
no code implementations • LREC 2016 • Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Haji{\v{c}}, Christopher D. Manning, Ryan Mcdonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, Daniel Zeman
Cross-linguistically consistent annotation is necessary for sound comparative evaluation and cross-lingual learning experiments.
no code implementations • WS 2013 • Djam{\'e} Seddah, Reut Tsarfaty, S K{\"u}bler, ra, C, Marie ito, Jinho D. Choi, Rich{\'a}rd Farkas, Jennifer Foster, Iakes Goenaga, Koldo Gojenola Galletebeitia, Yoav Goldberg, Spence Green, Nizar Habash, Marco Kuhlmann, Wolfgang Maier, Joakim Nivre, Adam Przepi{\'o}rkowski, Ryan Roth, Wolfgang Seeker, Yannick Versley, Veronika Vincze, Marcin Woli{\'n}ski, Alina Wr{\'o}blewska, Eric Villemonte de la Clergerie