no code implementations • ACL 2022 • Amit Seker, Elron Bandel, Dan Bareket, Idan Brusilovsky, Refael Greenfeld, Reut Tsarfaty
First, Hebrew resources for training large language models are so far not of the same magnitude as their English counterparts.
no code implementations • ACL 2022 • Omer Goldman, David Guriel, Reut Tsarfaty
In the domain of Morphology, Inflection is a fundamental and important task that has gained a lot of traction in recent years, mostly via SIGMORPHON's shared tasks. With average accuracy above 0.9 across all languages, the task is considered mostly solved using relatively generic neural seq2seq models, even with little data provided. In this work, we propose to re-evaluate morphological inflection models by employing harder train-test splits that will challenge the generalization capacity of the models.
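To make the proposed evaluation concrete, here is a minimal sketch of a lemma-disjoint split (toy data and function names, not the paper's code): unlike a naive random split over forms, it guarantees that no lemma seen in training appears at test time, which is what stresses generalization.

    import random

    # Toy inflection triples: (lemma, feature bundle, inflected form).
    data = [
        ("walk", "V;PST", "walked"), ("walk", "V;V.PTCP;PRS", "walking"),
        ("sing", "V;PST", "sang"),   ("sing", "V;V.PTCP;PRS", "singing"),
        ("go",   "V;PST", "went"),   ("go",   "V;V.PTCP;PRS", "going"),
    ]

    def lemma_split(triples, test_ratio=0.5, seed=0):
        """Harder split: train and test lemma sets are disjoint."""
        lemmas = sorted({lemma for lemma, _, _ in triples})
        random.Random(seed).shuffle(lemmas)
        cut = int(len(lemmas) * (1 - test_ratio))
        train_lemmas = set(lemmas[:cut])
        train = [t for t in triples if t[0] in train_lemmas]
        test = [t for t in triples if t[0] not in train_lemmas]
        return train, test

    train, test = lemma_split(data)
    # No lemma leaks across the split, unlike a random form-level split.
    assert not {l for l, _, _ in train} & {l for l, _, _ in test}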
no code implementations • EMNLP 2020 • Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan
Discourse relations describe how two propositions relate to one another, and identifying them automatically is an integral part of natural language understanding.
no code implementations • *SEM (NAACL) 2022 • Ronen Tamari, Kyle Richardson, Noam Kahlon, Aviad Sar-Shalom, Nelson F. Liu, Reut Tsarfaty, Dafna Shahaf
However, the main synthetic resource for story understanding, the bAbI benchmark, lacks such a systematic mechanism for controllable task generation.
no code implementations • LREC 2022 • Merel Scholman, Valentina Pyatkin, Frances Yung, Ido Dagan, Reut Tsarfaty, Vera Demberg
The current contribution studies the effect of worker selection and training on the agreement on implicit relation labels between workers and gold labels, for both the DC and the QA method.
no code implementations • EMNLP (MRL) 2021 • Omer Goldman, Reut Tsarfaty
Morphological tasks have gained considerable popularity within the NLP community in recent years, with large multi-lingual datasets providing morphological analysis of words, either in or out of context.
no code implementations • 6 Aug 2024 • Avshalom Manevich, Reut Tsarfaty
Large Vision-Language Models (LVLMs) are an extension of Large Language Models (LLMs) that facilitate processing both image and text inputs, expanding AI capabilities.
1 code implementation • 15 Jul 2024 • Asaf Achi Mordechai, Yoav Goldberg, Reut Tsarfaty
Current Text-to-Code models demonstrate impressive capabilities in generating executable code from natural language snippets.
no code implementations • 29 Jun 2024 • Omer Goldman, Alon Jacovi, Aviv Slobodkin, Aviya Maimon, Ido Dagan, Reut Tsarfaty
By using a descriptive vocabulary and discussing the relevant properties of difficulty in long-context processing, we can conduct more informed research in this area.
1 code implementation • 28 Jun 2024 • Tzuf Paz-Argaman, John Palowitch, Sayali Kulkarni, Reut Tsarfaty, Jason Baldridge
However, performance substantially drops in new environments with no training data.
1 code implementation • 6 Jun 2024 • Tzuf Paz-Argaman, Itai Mondshine, Asaf Achi Mordechai, Reut Tsarfaty
While large language models (LLMs) excel in various natural language tasks in English, their performance in lower-resourced languages like Hebrew, especially for generative tasks such as abstractive summarization, remains unclear.
no code implementations • 31 May 2024 • Valentina Pyatkin, Bonnie Webber, Ido Dagan, Reut Tsarfaty
Semantically, superlatives perform a set comparison: one or more elements have the minimal or maximal value of some property within a set.
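A toy rendering of that semantics (illustrative only, not from the paper): the superlative denotes whichever member(s) of the comparison set maximize the measured property, and ties show why the denotation can be plural.

    # "The tallest person in the room" as a set comparison.
    heights = {"Ana": 172, "Ben": 180, "Chen": 180, "Dana": 165}
    tallest = max(heights.values())
    denotation = {name for name, h in heights.items() if h == tallest}
    print(denotation)  # {'Ben', 'Chen'}: a tie makes the denotation plural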
1 code implementation • 11 May 2024 • Avi Shmidman, Cheyn Shmuel Shmidman, Dan Bareket, Moshe Koppel, Reut Tsarfaty
We evaluate all existing models for contextualized Hebrew embeddings on a novel Hebrew homograph challenge set that we deliver.
no code implementations • 9 Apr 2024 • Victoria Basmov, Yoav Goldberg, Reut Tsarfaty
In particular, while some models prove virtually unaffected by knowledge conflicts in affirmative and negative contexts, when faced with more semantically involved modal and conditional environments, they often fail to separate the text from their internal knowledge.
no code implementations • 11 Mar 2024 • Shaltiel Shmidman, Avi Shmidman, Moshe Koppel, Reut Tsarfaty
Syntactic parsing remains a critical tool for relation extraction and information extraction, especially in resource-scarce languages where LLMs are lacking.
no code implementations • 10 Mar 2024 • Omer Goldman, Avi Caciularu, Matan Eyal, Kris Cao, Idan Szpektor, Reut Tsarfaty
Despite it being the cornerstone of BPE, the most common tokenization algorithm, the importance of compression in the tokenization process is still unclear.
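For readers unfamiliar with the algorithm under discussion, here is a textbook-style sketch of BPE's greedy merge loop (not the paper's code): each iteration merges the most frequent adjacent symbol pair, which is precisely a compression step.

    import re
    from collections import Counter

    def pair_counts(vocab):
        """Count adjacent symbol pairs, weighted by word frequency."""
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        return pairs

    def merge(pair, vocab):
        """Merge one pair wherever it occurs as two whole symbols."""
        pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
        return {pattern.sub("".join(pair), w): f for w, f in vocab.items()}

    # Words as space-separated symbols with an end-of-word marker.
    vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
             "n e w e s t </w>": 6, "w i d e s t </w>": 3}

    merges = []
    for _ in range(4):
        counts = pair_counts(vocab)
        best = max(counts, key=counts.get)  # the most compressing merge
        vocab = merge(best, vocab)
        merges.append(best)
    print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o')]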
no code implementations • 4 Mar 2024 • Yotam Intrator, Matan Halfon, Roman Goldenberg, Reut Tsarfaty, Matan Eyal, Ehud Rivlin, Yossi Matias, Natalia Aizenberg
Large language models hold significant promise in multilingual applications.
1 code implementation • 26 Feb 2024 • Tzuf Paz-Argaman, Sayali Kulkarni, John Palowitch, Jason Baldridge, Reut Tsarfaty
Current navigation studies concentrate on egocentric local descriptions (e.g., 'it will be on your right') that require reasoning over the agent's local perception.
no code implementations • 4 Feb 2024 • Danit Yshaayahu Levi, Reut Tsarfaty
Contemporary multilingual dependency parsers can parse a diverse set of languages, but for Morphologically Rich Languages (MRLs), performance is attested to be lower than for other languages.
no code implementations • 3 Jan 2024 • Uri Shaham, Jonathan Herzig, Roee Aharoni, Idan Szpektor, Reut Tsarfaty, Matan Eyal
As instruction-tuned large language models (LLMs) gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial.
no code implementations • 1 Nov 2023 • Eylon Gueta, Omer Goldman, Reut Tsarfaty
We investigate the hypothesis that incorporating explicit morphological knowledge in the pre-training phase can improve the performance of PLMs for MRLs.
1 code implementation • 25 Oct 2023 • Daniela Ben-David, Tzuf Paz-Argaman, Reut Tsarfaty
On the well-known task of stylized image captioning, our experiments show that our approach outperforms semi-supervised state-of-the-art models, while being zero-shot and avoiding costly training, data collection, and prompt engineering.
no code implementations • 25 Oct 2023 • Aviya Maimon, Reut Tsarfaty
Up until now, little work has been done on explicitly assessing the coherence of generated texts and analyzing the factors contributing to (in)coherence.
no code implementations • 24 Oct 2023 • Tal Levy, Omer Goldman, Reut Tsarfaty
The ability to identify and control different kinds of linguistic information encoded in vector representations of words has many use cases, especially for explainability and bias removal.
no code implementations • 1 Oct 2023 • Aviya Maimon, Reut Tsarfaty
On two benchmarks for coherence scoring rated by humans, one containing 500 automatically-generated short stories and another containing 4k real-world texts, our experiments confirm that jointly training on the proposed tasks leads to better performance on each task compared with task-specific models, and to better performance on assessing coherence overall, compared with strong baselines.
no code implementations • 6 Jul 2023 • Roni Rabin, Alexandre Djerbetian, Roee Engelberg, Lidan Hackmon, Gal Elidan, Reut Tsarfaty, Amir Globerson
Human communication often involves information gaps between the interlocutors.
1 code implementation • 2 Jul 2023 • Tzuf Paz-Argaman, Tal Bauman, Itai Mondshine, Itzhak Omer, Sagi Dalyot, Reut Tsarfaty
The task of textual geolocation - retrieving the coordinates of a place based on a free-form language description - calls for not only grounding but also natural language understanding and geospatial reasoning.
1 code implementation • 21 Jun 2023 • David Guriel, Omer Goldman, Reut Tsarfaty
Recent years have brought great advances in solving morphological tasks, mostly due to powerful neural models applied to tasks such as (re)inflection and analysis.
no code implementations • 26 May 2023 • Royi Rassin, Yoav Goldberg, Reut Tsarfaty
In this work we propose a conjunct resolution task that operates directly on the text and makes use of a split-and-rephrase paradigm in order to recover the missing elements in the coordination structure.
no code implementations • 24 May 2023 • Victoria Basmov, Yoav Goldberg, Reut Tsarfaty
We evaluate LLMs' language understanding capacities on simple inference tasks that most humans find trivial.
1 code implementation • 3 Apr 2023 • Valentina Pyatkin, Frances Yung, Merel C. J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg
Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks.
no code implementations • 19 Dec 2022 • Matan Eyal, Hila Noga, Roee Aharoni, Idan Szpektor, Reut Tsarfaty
We demonstrate that by casting tasks in the Hebrew NLP pipeline as text-to-text tasks, we can leverage powerful multilingual, pretrained sequence-to-sequence models such as mT5, eliminating the need for a specialized, morpheme-based, separately fine-tuned decoder.
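The general recipe behind such casting can be sketched as follows (hypothetical prefixes and output formats, not the paper's own): every task is reduced to a (source text, target text) pair, so one seq2seq model serves the whole pipeline.

    # Each pipeline task becomes plain text in, plain text out.
    examples = [
        ("segment: unhappiness", "un happi ness"),                      # segmentation
        ("ner: Reut visited Tel Aviv", "Reut / PER ; Tel Aviv / LOC"),  # NER
        ("sentiment: what a wonderful movie", "positive"),              # classification
    ]
    for source, target in examples:
        print(f"{source!r} -> {target!r}")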
no code implementations • 28 Nov 2022 • Eylon Gueta, Avi Shmidman, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Joshua Guedalia, Moshe Koppel, Dan Bareket, Amit Seker, Reut Tsarfaty
We perform a contrastive analysis of this model against all previous Hebrew PLMs (mBERT, heBERT, AlephBERT) and assess the effects of larger vocabularies on task performance.
Ranked #1 on Named Entity Recognition (NER) on NEMO-Corpus
1 code implementation • 15 Nov 2022 • Kyle Richardson, Ronen Tamari, Oren Sultan, Reut Tsarfaty, Dafna Shahaf, Ashish Sabharwal
Can we teach natural language understanding models to track their beliefs through intermediate points in text?
no code implementations • LREC 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
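To illustrate what such a resource looks like, here are toy entries in UniMorph-style notation (illustrative, not drawn from the release): each type-level record pairs a lemma and an inflected form with a bundle of schema features.

    # (lemma, inflected form, feature bundle) in UniMorph-style notation.
    entries = [
        ("walk",  "walked",  "V;PST"),
        ("walk",  "walking", "V;V.PTCP;PRS"),
        ("mouse", "mice",    "N;PL"),
    ]
    for lemma, form, feats in entries:
        print(f"{lemma}\t{form}\t{feats}")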
no code implementations • 10 Apr 2022 • Omri Keren, Tal Avinari, Reut Tsarfaty, Omer Levy
Large pretrained language models (PLMs) typically tokenize the input string into contiguous subwords before any pretraining or inference.
no code implementations • 21 Mar 2022 • Idan Brusilovsky, Reut Tsarfaty
Tokenizing raw texts into word units is an essential pre-processing step for critical tasks in the NLP pipeline such as tagging, parsing, named entity recognition, and more.
no code implementations • ACL 2022 • David Guriel, Omer Goldman, Reut Tsarfaty
In recent years, a flurry of morphological datasets has emerged, most notably UniMorph, a multi-lingual repository of inflection tables.
no code implementations • 25 Feb 2022 • Omer Goldman, Reut Tsarfaty
We use this dataset to derive 3 clause-level morphological tasks: inflection, reinflection and analysis.
1 code implementation • 24 Sep 2021 • Yanai Elazar, Victoria Basmov, Yoav Goldberg, Reut Tsarfaty
Understanding the relations between entities denoted by NPs in a text is a critical part of human-like natural language understanding.
1 code implementation • EMNLP 2021 • Valentina Pyatkin, Paul Roit, Julian Michael, Reut Tsarfaty, Yoav Goldberg, Ido Dagan
We develop a two-stage model for this task, which first produces a context-independent question prototype for each role and then revises it to be contextually appropriate for the passage.
1 code implementation • 12 Aug 2021 • Omer Goldman, David Guriel, Reut Tsarfaty
The effect is most significant for low-resourced languages with a drop as high as 95 points, but even high-resourced languages lose about 10 points on average.
no code implementations • 27 Jun 2021 • Royi Lachmy, Valentina Pyatkin, Avshalom Manevich, Reut Tsarfaty
Abstraction is a core tenet of human cognition and communication.
2 code implementations • ACL 2021 • Valentina Pyatkin, Shoval Sadde, Aynat Rubinstein, Paul Portner, Reut Tsarfaty
Modality is the linguistic ability to describe events with added information such as how desirable, plausible, or feasible they are.
1 code implementation • EMNLP 2021 • Omer Goldman, Reut Tsarfaty
Neural models for the various flavours of morphological inflection tasks have proven to be extremely accurate given ample labeled data -- data that may be slow and costly to obtain.
2 code implementations • 8 Apr 2021 • Amit Seker, Elron Bandel, Dan Bareket, Idan Brusilovsky, Refael Shaked Greenfeld, Reut Tsarfaty
Second, there are no accepted tasks and benchmarks to evaluate the progress of Hebrew PLMs on.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Amit Seker, Reut Tsarfaty
Neural MD may be addressed as a simple pipeline, where segmentation is followed by sequence tagging, or as an end-to-end model, predicting morphemes from raw tokens.
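The architectural contrast can be sketched schematically (toy rules, not the paper's models): in the pipeline, any segmentation error propagates into tagging, whereas a joint model scores morphemes and tags together.

    def segment(token):
        """Pipeline stage 1: split a raw token into morphemes (toy rule)."""
        return token.split("+")

    def tag(morphemes):
        """Pipeline stage 2: tag the predicted morpheme sequence (toy rule)."""
        return [(m, "ADP" if len(m) == 1 else "NOUN") for m in morphemes]

    def pipeline_md(token):
        # Stage 2 can never repair a mistake made in stage 1.
        return tag(segment(token))

    # An end-to-end model would instead map the raw token directly to the
    # most probable (morphemes, tags) pair, scoring both decisions jointly.
    print(pipeline_md("b+house"))  # [('b', 'ADP'), ('house', 'NOUN')]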
no code implementations • EMNLP (insights) 2020 • Yanai Elazar, Victoria Basmov, Shauli Ravfogel, Yoav Goldberg, Reut Tsarfaty
In this work, we follow known methodologies of collecting labeled data for the complement coercion phenomenon.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Tzuf Paz-Argaman, Yuval Atzmon, Gal Chechik, Reut Tsarfaty
Specifically, given birds' images with free-text descriptions of their species, we learn to classify images of previously-unseen species based on species descriptions.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Moshe Koppel, Reut Tsarfaty
One of the primary tasks of morphological parsers is the disambiguation of homographs.
4 code implementations • 30 Jul 2020 • Dan Bareket, Reut Tsarfaty
Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated as classification over a sequence of tokens.
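Concretely, the standard formulation assigns one BIO label per token; a small illustrative decoder (not the paper's code) recovers entity spans from those labels.

    tokens = ["Reut", "Tsarfaty", "works", "at", "Bar-Ilan", "University"]
    labels = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG"]

    def decode_bio(tokens, labels):
        """Turn per-token BIO labels back into (entity text, type) spans."""
        spans, current, etype = [], [], None
        for tok, lab in zip(tokens, labels):
            if lab.startswith("B-"):
                if current:
                    spans.append((" ".join(current), etype))
                current, etype = [tok], lab[2:]
            elif lab.startswith("I-") and current:
                current.append(tok)
            else:  # "O", or a stray "I-" with no open span
                if current:
                    spans.append((" ".join(current), etype))
                current, etype = [], None
        if current:
            spans.append((" ".join(current), etype))
        return spans

    print(decode_bio(tokens, labels))
    # [('Reut Tsarfaty', 'PER'), ('Bar-Ilan University', 'ORG')]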
no code implementations • WS 2020 • Stav Klein, Reut Tsarfaty
Therefore, when using word-pieces in MRLs, we must consider that: (1) a linear segmentation into sub-word units might not capture the full morphological complexity of words; and (2) representations that leave morphological knowledge on sub-word units inaccessible might negatively affect performance.
1 code implementation • ACL 2020 • Aryeh Tiktinsky, Yoav Goldberg, Reut Tsarfaty
We present pyBART, an easy-to-use open-source Python library for converting English UD trees either to Enhanced UD graphs or to our representation.
no code implementations • ACL 2020 • Reut Tsarfaty, Dan Bareket, Stav Klein, Amit Seker
It has been exactly a decade since the establishment of SPMRL, a research initiative unifying multiple research efforts to address the peculiar challenges of Statistical Parsing for Morphologically-Rich Languages (MRLs). Here we reflect on parsing MRLs in that decade, highlight the solutions and lessons learned for the architectural, modeling and lexical challenges in the pre-neural era, and argue that similar challenges re-emerge in neural architectures for MRLs.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
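A toy illustration of such a decision rule (hypothetical, for intuition only): an NLI classifier that keys on a spurious surface cue can look accurate on artifact-ridden test sets while ignoring the premise entirely.

    def artifact_classifier(premise: str, hypothesis: str) -> str:
        # Ignores the premise: a "simple decision rule" exploiting a
        # negation artifact in the hypotheses.
        words = hypothesis.lower().split()
        return "contradiction" if "not" in words else "entailment"

    print(artifact_classifier("A man sleeps.", "The man is not awake."))
    # contradiction (right answer, wrong reason)
    print(artifact_classifier("A man sleeps.", "The man is awake."))
    # entailment (wrong: the true label is contradiction)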
no code implementations • 10 Mar 2020 • Ronen Tamari, Gabriel Stanovsky, Dafna Shahaf, Reut Tsarfaty
Large-scale natural language understanding (NLU) systems have made impressive progress: they can be applied flexibly across a variety of tasks, and employ minimal structural assumptions.
1 code implementation • IJCNLP 2019 • Tzuf Paz-Argaman, Reut Tsarfaty
Following navigation instructions in natural language requires a composition of language, action, and knowledge of the environment.
no code implementations • IJCNLP 2019 • Reut Tsarfaty, Amit Seker, Shoval Sadde, Stav Klein
For languages with simple morphology, such as English, automatic annotation pipelines such as spaCy or Stanford's CoreNLP successfully serve projects in academia and industry.
no code implementations • TACL 2019 • Amir More, Amit Seker, Victoria Basmova, Reut Tsarfaty
In standard NLP pipelines, morphological analysis and disambiguation (MA&D) precedes syntactic and semantic downstream tasks.
no code implementations • WS 2018 • Shoval Sade, Amit Seker, Reut Tsarfaty
The Hebrew treebank (HTB), consisting of 6221 morpho-syntactically annotated newspaper sentences, has been the only resource for training and validating statistical parsers and taggers for Hebrew, for almost two decades now.
no code implementations • CONLL 2018 • Amit Seker, Amir More, Reut Tsarfaty
We present the contribution of the ONLP lab at the Open University of Israel to the UD shared task on multilingual parsing from raw text to Universal Dependencies.
no code implementations • COLING 2018 • Adam Amram, Anat Ben David, Reut Tsarfaty
To empirically study these effects, we develop a new sentiment analysis benchmark for Hebrew, based on 12K social media comments, and provide two instances of these data: in token-based and morpheme-based settings.
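The two settings can be illustrated with a toy transliterated example (ours, not taken from the benchmark): the same surface token is one opaque unit in the token-based instance but several meaning-bearing units in the morpheme-based one.

    token_based = ["wlbyth"]                 # one opaque surface token
    morpheme_based = ["w", "l", "byt", "h"]  # roughly: and + to + house + her
    assert "".join(morpheme_based) == token_based[0]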
no code implementations • CONLL 2017 • Amir More, Reut Tsarfaty
Our parser requires a lattice as input, so we generate morphological analyses of surface tokens using a data-driven morphological analyzer that derives its lexicon from the UD training corpora, and we rely on UDPipe for sentence segmentation and surface-level tokenization.
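A morphological lattice of the kind used as parser input can be sketched as follows (toy analyses of the classic transliterated token 'bcl', which reads either as the noun 'onion' or as 'b'+'cl', 'in shadow'); all names here are illustrative:

    # Nodes are character positions; edges carry (form, POS) analyses.
    lattice = {
        (0, 3): [("bcl", "NOUN")],  # "onion"
        (0, 1): [("b", "ADP")],     # "in"
        (1, 3): [("cl", "NOUN")],   # "shadow"
    }

    def paths(lattice, start, end, prefix=()):
        """Enumerate every analysis path through the lattice."""
        if start == end:
            yield prefix
            return
        for (i, j), analyses in lattice.items():
            if i == start:
                for analysis in analyses:
                    yield from paths(lattice, j, end, prefix + (analysis,))

    for p in paths(lattice, 0, 3):
        print(p)  # the disambiguator/parser must pick one path
    # (('bcl', 'NOUN'),)
    # (('b', 'ADP'), ('cl', 'NOUN'))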
no code implementations • ACL 2017 • Tomer Cagan, Stefan L. Frank, Reut Tsarfaty
Opinionated Natural Language Generation (ONLG) is a new, challenging task that aims to automatically generate human-like, subjective responses to opinionated articles online.
1 code implementation • COLING 2016 • Amir More, Reut Tsarfaty
Parsing texts into universal dependencies (UD) in realistic scenarios requires infrastructure for the morphological analysis and disambiguation (MA&D) of typologically different languages as a first tier.
no code implementations • LREC 2016 • Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, Daniel Zeman
Cross-linguistically consistent annotation is necessary for sound comparative evaluation and cross-lingual learning experiments.
no code implementations • WS 2013 • Djamé Seddah, Reut Tsarfaty, Sandra Kübler, Marie Candito, Jinho D. Choi, Richárd Farkas, Jennifer Foster, Iakes Goenaga, Koldo Gojenola Galletebeitia, Yoav Goldberg, Spence Green, Nizar Habash, Marco Kuhlmann, Wolfgang Maier, Joakim Nivre, Adam Przepiórkowski, Ryan Roth, Wolfgang Seeker, Yannick Versley, Veronika Vincze, Marcin Woliński, Alina Wróblewska, Eric Villemonte de la Clergerie