Search Results for author: Reut Tsarfaty

Found 79 papers, 23 papers with code

(Un)solving Morphological Inflection: Lemma Overlap Artificially Inflates Models’ Performance

no code implementations ACL 2022 Omer Goldman, David Guriel, Reut Tsarfaty

In the domain of Morphology, Inflection is a fundamental and important task that has gained a lot of traction in recent years, mostly via SIGMORPHON's shared tasks. With average accuracy above 0.9 across all languages, the task is considered mostly solved using relatively generic neural seq2seq models, even with little data provided. In this work, we propose to re-evaluate morphological inflection models by employing harder train-test splits that will challenge the generalization capacity of the models.

LEMMA Morphological Inflection

QADiscourse - Discourse Relations as QA Pairs: Representation, Crowdsourcing and Baselines

no code implementations EMNLP 2020 Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan

Discourse relations describe how two propositions relate to one another, and identifying them automatically is an integral part of natural language understanding.

Natural Language Understanding Sentence

Design Choices in Crowdsourcing Discourse Relation Annotations: The Effect of Worker Selection and Training

no code implementations LREC 2022 Merel Scholman, Valentina Pyatkin, Frances Yung, Ido Dagan, Reut Tsarfaty, Vera Demberg

The current contribution studies the effect of worker selection and training on the agreement on implicit relation labels between workers and gold labels, for both the DC and the QA method.

Relation

Well-Defined Morphology is Sentence-Level Morphology

no code implementations EMNLP (MRL) 2021 Omer Goldman, Reut Tsarfaty

Morphological tasks have gained decent popularity within the NLP community in recent years, with large multi-lingual datasets providing morphological analysis of words, either in or out of context.

Morphological Analysis Morphological Inflection +1

Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)

no code implementations 6 Aug 2024 Avshalom Manevich, Reut Tsarfaty

Large Vision-Language Models (LVLMs) are an extension of Large Language Models (LLMs) that facilitate processing both image and text inputs, expanding AI capabilities.

Object

NoviCode: Generating Programs from Natural Language Utterances by Novices

1 code implementation 15 Jul 2024 Asaf Achi Mordechai, Yoav Goldberg, Reut Tsarfaty

Current Text-to-Code models demonstrate impressive capabilities in generating executable code from natural language snippets.

Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP

no code implementations 29 Jun 2024 Omer Goldman, Alon Jacovi, Aviv Slobodkin, Aviya Maimon, Ido Dagan, Reut Tsarfaty

By using a descriptive vocabulary and discussing the relevant properties of difficulty in long-context, we can implement more informed research in this area.

Book summarization Descriptive

HeSum: a Novel Dataset for Abstractive Text Summarization in Hebrew

1 code implementation 6 Jun 2024 Tzuf Paz-Argaman, Itai Mondshine, Asaf Achi Mordechai, Reut Tsarfaty

While large language models (LLMs) excel in various natural language tasks in English, their performance in lower-resourced languages like Hebrew, especially for generative tasks such as abstractive summarization, remains unclear.

Abstractive Text Summarization Sentence

Superlatives in Context: Modeling the Implicit Semantics of Superlatives

no code implementations 31 May 2024 Valentina Pyatkin, Bonnie Webber, Ido Dagan, Reut Tsarfaty

Semantically, superlatives perform a set comparison: something (or some things) has the min/max property out of a set.

Do Pretrained Contextual Language Models Distinguish between Hebrew Homograph Analyses?

1 code implementation 11 May 2024 Avi Shmidman, Cheyn Shmuel Shmidman, Dan Bareket, Moshe Koppel, Reut Tsarfaty

We evaluate all existing models for contextualized Hebrew embeddings on novel Hebrew homograph challenge sets that we deliver.

Word Sense Disambiguation

LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements

no code implementations 9 Apr 2024 Victoria Basmov, Yoav Goldberg, Reut Tsarfaty

In particular, while some models prove virtually unaffected by knowledge conflicts in affirmative and negative contexts, when faced with more semantically involved modal and conditional environments, they often fail to separate the text from their internal knowledge.

Natural Language Understanding Question Answering +2

MRL Parsing Without Tears: The Case of Hebrew

no code implementations 11 Mar 2024 Shaltiel Shmidman, Avi Shmidman, Moshe Koppel, Reut Tsarfaty

Syntactic parsing remains a critical tool for relation extraction and information extraction, especially in resource-scarce languages where LLMs are lacking.

Dependency Parsing POS +2

Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance

no code implementations 10 Mar 2024 Omer Goldman, Avi Caciularu, Matan Eyal, Kris Cao, Idan Szpektor, Reut Tsarfaty

Despite it being the cornerstone of BPE, the most common tokenization algorithm, the importance of compression in the tokenization process is still unclear.

Language Modelling Text Compression

Where Do We Go from Here? Multi-scale Allocentric Relational Inference from Natural Spatial Descriptions

1 code implementation 26 Feb 2024 Tzuf Paz-Argaman, Sayali Kulkarni, John Palowitch, Jason Baldridge, Reut Tsarfaty

Current navigation studies concentrate on egocentric local descriptions (e.g., 'it will be on your right') that require reasoning over the agent's local perception.

Information Retrieval

A Truly Joint Neural Architecture for Segmentation and Parsing

no code implementations 4 Feb 2024 Danit Yshaayahu Levi, Reut Tsarfaty

Contemporary multilingual dependency parsers can parse a diverse set of languages, but for Morphologically Rich Languages (MRLs), performance is attested to be lower than for other languages.

ARC Segmentation

Multilingual Instruction Tuning With Just a Pinch of Multilinguality

no code implementations 3 Jan 2024 Uri Shaham, Jonathan Herzig, Roee Aharoni, Idan Szpektor, Reut Tsarfaty, Matan Eyal

As instruction-tuned large language models (LLMs) gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial.

Cross-Lingual Transfer Instruction Following

Explicit Morphological Knowledge Improves Pre-training of Language Models for Hebrew

no code implementations 1 Nov 2023 Eylon Gueta, Omer Goldman, Reut Tsarfaty

We investigate the hypothesis that incorporating explicit morphological knowledge in the pre-training phase can improve the performance of PLMs for MRLs.

Apollo: Zero-shot MultiModal Reasoning with Multiple Experts

1 code implementation 25 Oct 2023 Daniela Ben-David, Tzuf Paz-Argaman, Reut Tsarfaty

On the well-known task of stylized image captioning, our experiments show that our approach outperforms semi-supervised state-of-the-art models, while being zero-shot and avoiding costly training, data collection, and prompt engineering.

Image Captioning Multimodal Reasoning +1

CoheSentia: A Novel Benchmark of Incremental versus Holistic Assessment of Coherence in Generated Texts

no code implementations 25 Oct 2023 Aviya Maimon, Reut Tsarfaty

Up until now, little work has been done on explicitly assessing the coherence of generated texts and analyzing the factors contributing to (in)coherence.

Sentence

Is Probing All You Need? Indicator Tasks as an Alternative to Probing Embedding Spaces

no code implementations 24 Oct 2023 Tal Levy, Omer Goldman, Reut Tsarfaty

The ability to identify and control different kinds of linguistic information encoded in vector representations of words has many use cases, especially for explainability and bias removal.

A Novel Computational and Modeling Foundation for Automatic Coherence Assessment

no code implementations 1 Oct 2023 Aviya Maimon, Reut Tsarfaty

On two benchmarks for coherence scoring rated by humans, one containing 500 automatically-generated short stories and another containing 4k real-world texts, our experiments confirm that jointly training on the proposed tasks leads to better performance on each task compared with task-specific models, and to better performance on assessing coherence overall, compared with strong baselines.

4k Long Form Question Answering

HeGeL: A Novel Dataset for Geo-Location from Hebrew Text

1 code implementation 2 Jul 2023 Tzuf Paz-Argaman, Tal Bauman, Itai Mondshine, Itzhak Omer, Sagi Dalyot, Reut Tsarfaty

The task of textual geolocation - retrieving the coordinates of a place based on a free-form language description - calls for not only grounding but also natural language understanding and geospatial reasoning.

Natural Language Understanding Retrieval

Morphological Inflection with Phonological Features

1 code implementation 21 Jun 2023 David Guriel, Omer Goldman, Reut Tsarfaty

Recent years have brought great advances into solving morphological tasks, mostly due to powerful neural models applied to various tasks as (re)inflection and analysis.

Morphological Inflection

Conjunct Resolution in the Face of Verbal Omissions

no code implementations 26 May 2023 Royi Rassin, Yoav Goldberg, Reut Tsarfaty

In this work we propose a conjunct resolution task that operates directly on the text and makes use of a split-and-rephrase paradigm in order to recover the missing elements in the coordination structure.

Missing Elements Sentence +1

Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds

no code implementations 24 May 2023 Victoria Basmov, Yoav Goldberg, Reut Tsarfaty

We evaluate LLMs' language understanding capacities on simple inference tasks that most humans find trivial.

Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design

1 code implementation 3 Apr 2023 Valentina Pyatkin, Frances Yung, Merel C. J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg

Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks.

Multilingual Sequence-to-Sequence Models for Hebrew NLP

no code implementations 19 Dec 2022 Matan Eyal, Hila Noga, Roee Aharoni, Idan Szpektor, Reut Tsarfaty

We demonstrate that by casting tasks in the Hebrew NLP pipeline as text-to-text tasks, we can leverage powerful multilingual, pretrained sequence-to-sequence models such as mT5, eliminating the need for a specialized, morpheme-based, separately fine-tuned decoder.

Decoder named-entity-recognition +2

UniMorph 4.0: Universal Morphology

no code implementations LREC 2022 Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

Breaking Character: Are Subwords Good Enough for MRLs After All?

no code implementations 10 Apr 2022 Omri Keren, Tal Avinari, Reut Tsarfaty, Omer Levy

Large pretrained language models (PLMs) typically tokenize the input string into contiguous subwords before any pretraining or inference.

Extractive Question-Answering Language Modelling +7

Neural Token Segmentation for High Token-Internal Complexity

no code implementations 21 Mar 2022 Idan Brusilovsky, Reut Tsarfaty

Tokenizing raw texts into word units is an essential pre-processing step for critical tasks in the NLP pipeline such as tagging, parsing, named entity recognition, and more.

Dependency Parsing named-entity-recognition +5

Morphological Reinflection with Multiple Arguments: An Extended Annotation schema and a Georgian Case Study

no code implementations ACL 2022 David Guriel, Omer Goldman, Reut Tsarfaty

In recent years, a flurry of morphological datasets has emerged, most notably UniMorph, a multi-lingual repository of inflection tables.

LEMMA

Morphology Without Borders: Clause-Level Morphology

no code implementations 25 Feb 2022 Omer Goldman, Reut Tsarfaty

We use this dataset to derive 3 clause-level morphological tasks: inflection, reinflection and analysis.

Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking

no code implementations 30 Nov 2021 Ronen Tamari, Kyle Richardson, Aviad Sar-Shalom, Noam Kahlon, Nelson Liu, Reut Tsarfaty, Dafna Shahaf

However, the main synthetic resource for story understanding, the bAbI benchmark, lacks such a systematic mechanism for controllable task generation.

Benchmarking Natural Language Understanding

Text-based NP Enrichment

1 code implementation 24 Sep 2021 Yanai Elazar, Victoria Basmov, Yoav Goldberg, Reut Tsarfaty

Understanding the relations between entities denoted by NPs in a text is a critical part of human-like natural language understanding.

Natural Language Understanding

Asking It All: Generating Contextualized Questions for any Semantic Role

1 code implementation EMNLP 2021 Valentina Pyatkin, Paul Roit, Julian Michael, Reut Tsarfaty, Yoav Goldberg, Ido Dagan

We develop a two-stage model for this task, which first produces a context-independent question prototype for each role and then revises it to be contextually appropriate for the passage.

Question Generation Question-Generation

(Un)solving Morphological Inflection: Lemma Overlap Artificially Inflates Models' Performance

1 code implementation 12 Aug 2021 Omer Goldman, David Guriel, Reut Tsarfaty

The effect is most significant for low-resourced languages with a drop as high as 95 points, but even high-resourced languages lose about 10 points on average.

LEMMA Morphological Inflection

The Possible, the Plausible, and the Desirable: Event-Based Modality Detection for Language Processing

2 code implementations ACL 2021 Valentina Pyatkin, Shoval Sadde, Aynat Rubinstein, Paul Portner, Reut Tsarfaty

Modality is the linguistic ability to describe events with added information such as how desirable, plausible, or feasible they are.

Minimal Supervision for Morphological Inflection

1 code implementation EMNLP 2021 Omer Goldman, Reut Tsarfaty

Neural models for the various flavours of morphological inflection tasks have proven to be extremely accurate given ample labeled data -- data that may be slow and costly to obtain.

Morphological Inflection

A Pointer Network Architecture for Joint Morphological Segmentation and Tagging

no code implementations Findings of the Association for Computational Linguistics 2020 Amit Seker, Reut Tsarfaty

Neural MD may be addressed as a simple pipeline, where segmentation is followed by sequence tagging, or as an end-to-end model, predicting morphemes from raw tokens.

Morphological Disambiguation

ZEST: Zero-shot Learning from Text Descriptions using Textual Similarity and Visual Summarization

1 code implementation Findings of the Association for Computational Linguistics 2020 Tzuf Paz-Argaman, Yuval Atzmon, Gal Chechik, Reut Tsarfaty

Specifically, given birds' images with free-text descriptions of their species, we learn to classify images of previously-unseen species based on species descriptions.

Zero-Shot Learning

QADiscourse -- Discourse Relations as QA Pairs: Representation, Crowdsourcing and Baselines

1 code implementation 6 Oct 2020 Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan

Discourse relations describe how two propositions relate to one another, and identifying them automatically is an integral part of natural language understanding.

Natural Language Understanding Sentence

Evaluating NLP Models via Contrast Sets

no code implementations 1 Oct 2020 Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou

Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.

Reading Comprehension Sentiment Analysis

Neural Modeling for Named Entities and Morphology (NEMO^2)

4 code implementations 30 Jul 2020 Dan Bareket, Reut Tsarfaty

Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated as classification over a sequence of tokens.

Named Entity Recognition

Getting the ##life out of living: How Adequate Are Word-Pieces for Modelling Complex Morphology?

no code implementations WS 2020 Stav Klein, Reut Tsarfaty

Therefore, when using word-pieces in MRLs, we must consider that: (1) a linear segmentation into sub-word units might not capture the full morphological complexity of words; and (2) representations that leave morphological knowledge on sub-word units inaccessible might negatively affect performance.

TAG Word Embeddings

pyBART: Evidence-based Syntactic Transformations for IE

1 code implementation ACL 2020 Aryeh Tiktinsky, Yoav Goldberg, Reut Tsarfaty

We present pyBART, an easy-to-use open-source Python library for converting English UD trees either to Enhanced UD graphs or to our representation.

Relation Extraction

From SPMRL to NMRL: What Did We Learn (and Unlearn) in a Decade of Parsing Morphologically-Rich Languages (MRLs)?

no code implementations ACL 2020 Reut Tsarfaty, Dan Bareket, Stav Klein, Amit Seker

It has been exactly a decade since the first establishment of SPMRL, a research initiative unifying multiple research efforts to address the peculiar challenges of Statistical Parsing for Morphologically-Rich Languages (MRLs). Here we reflect on parsing MRLs in that decade, highlight the solutions and lessons learned for the architectural, modeling and lexical challenges in the pre-neural era, and argue that similar challenges re-emerge in neural architectures for MRLs.

Ecological Semantics: Programming Environments for Situated Language Understanding

no code implementations10 Mar 2020 Ronen Tamari, Gabriel Stanovsky, Dafna Shahaf, Reut Tsarfaty

Large-scale natural language understanding (NLU) systems have made impressive progress: they can be applied flexibly across a variety of tasks, and employ minimal structural assumptions.

Common Sense Reasoning Grounded language learning +1

RUN through the Streets: A New Dataset and Baseline Models for Realistic Urban Navigation

1 code implementation IJCNLP 2019 Tzuf Paz-Argaman, Reut Tsarfaty

Following navigation instructions in natural language requires a composition of language, action, and knowledge of the environment.

What's Wrong with Hebrew NLP? And How to Make it Right

no code implementations IJCNLP 2019 Reut Tsarfaty, Amit Seker, Shoval Sadde, Stav Klein

For languages with simple morphology, such as English, automatic annotation pipelines such as spaCy or Stanford's CoreNLP successfully serve projects in academia and industry.

Morphological Disambiguation

The Hebrew Universal Dependency Treebank: Past Present and Future

no code implementations WS 2018 Shoval Sade, Amit Seker, Reut Tsarfaty

The Hebrew treebank (HTB), consisting of 6221 morpho-syntactically annotated newspaper sentences, has been the only resource for training and validating statistical parsers and taggers for Hebrew, for almost two decades now.

Dependency Parsing

Universal Morpho-Syntactic Parsing and the Contribution of Lexica: Analyzing the ONLP Lab Submission to the CoNLL 2018 Shared Task

no code implementations CONLL 2018 Amit Seker, Amir More, Reut Tsarfaty

We present the contribution of the ONLP lab at the Open University of Israel to the UD shared task on multilingual parsing from raw text to Universal Dependencies.

Representations and Architectures in Neural Sentiment Analysis for Morphologically Rich Languages: A Case Study from Modern Hebrew

no code implementations COLING 2018 Adam Amram, Anat Ben David, Reut Tsarfaty

To empirically study these effects, we develop a new sentiment analysis benchmark for Hebrew, based on 12K social media comments, and provide two instances of these data: in token-based and morpheme-based settings.

Sentiment Analysis Text Classification

Universal Joint Morph-Syntactic Processing: The Open University of Israel's Submission to The CoNLL 2017 Shared Task

no code implementations CONLL 2017 Amir More, Reut Tsarfaty

Our parser requires a lattice as input, so we generate morphological analyses of surface tokens using a data-driven morphological analyzer that derives its lexicon from the UD training corpora, and we rely on UDPipe for sentence segmentation and surface-level tokenization.

MORPH Sentence +2

Data-Driven Broad-Coverage Grammars for Opinionated Natural Language Generation (ONLG)

no code implementations ACL 2017 Tomer Cagan, Stefan L. Frank, Reut Tsarfaty

Opinionated Natural Language Generation (ONLG) is a new, challenging, task that aims to automatically generate human-like, subjective, responses to opinionated articles online.

Language Modelling Text Generation +1

Data-Driven Morphological Analysis and Disambiguation for Morphologically Rich Languages and Universal Dependencies

1 code implementation COLING 2016 Amir More, Reut Tsarfaty

Parsing texts into universal dependencies (UD) in realistic scenarios requires infrastructure for the morphological analysis and disambiguation (MA&D) of typologically different languages as a first tier.

Morphological Analysis TAG
