Search Results for author: Omer Goldman

Found 17 papers, 7 papers with code

SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and Typologically Diverse Morphological Inflection

1 code implementation NAACL (SIGMORPHON) 2022 Jordan Kodner, Salam Khalifa, Khuyagbaatar Batsuren, Hossep Dolatian, Ryan Cotterell, Faruk Akkus, Antonios Anastasopoulos, Taras Andrushko, Aryaman Arora, Nona Atanalov, Gábor Bella, Elena Budianskaya, Yustinus Ghanggo Ate, Omer Goldman, David Guriel, Simon Guriel, Silvia Guriel-Agiashvili, Witold Kieraś, Andrew Krizhanovsky, Natalia Krizhanovsky, Igor Marchenko, Magdalena Markowska, Polina Mashkovtseva, Maria Nepomniashchaya, Daria Rodionova, Karina Scheifer, Alexandra Sorova, Anastasia Yemelina, Jeremiah Young, Ekaterina Vylomova

The 2022 SIGMORPHON–UniMorph shared task on large scale morphological inflection generation included a wide range of typologically diverse languages: 33 languages from 11 top-level language families: Arabic (Modern Standard), Assamese, Braj, Chukchi, Eastern Armenian, Evenki, Georgian, Gothic, Gujarati, Hebrew, Hungarian, Itelmen, Karelian, Kazakh, Ket, Khalkha Mongolian, Kholosi, Korean, Lamahalot, Low German, Ludic, Magahi, Middle Low German, Old English, Old High German, Old Norse, Polish, Pomak, Slovak, Turkish, Upper Sorbian, Veps, and Xibe.

Morphological Inflection

Well-Defined Morphology is Sentence-Level Morphology

no code implementations EMNLP (MRL) 2021 Omer Goldman, Reut Tsarfaty

Morphological tasks have gained decent popularity within the NLP community in the recent years, with large multi-lingual datasets providing morphological analysis of words, either in or out of context.

Morphological Analysis Morphological Inflection

(Un)solving Morphological Inflection: Lemma Overlap Artificially Inflates Models’ Performance

no code implementations ACL 2022 Omer Goldman, David Guriel, Reut Tsarfaty

In the domain of Morphology, Inflection is a fundamental and important task that gained a lot of traction in recent years, mostly via SIGMORPHON’s shared-tasks. With average accuracy above 0.9 over the scores of all languages, the task is considered mostly solved using relatively generic neural seq2seq models, even with little data provided. In this work, we propose to re-evaluate morphological inflection models by employing harder train-test splits that will challenge the generalization capacity of the models.

LEMMA Morphological Inflection

Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance

no code implementations 10 Mar 2024 Omer Goldman, Avi Caciularu, Matan Eyal, Kris Cao, Idan Szpektor, Reut Tsarfaty

Despite it being the cornerstone of BPE, the most common tokenization algorithm, the importance of compression in the tokenization process is still unclear.

Language Modelling Text Compression

Explicit Morphological Knowledge Improves Pre-training of Language Models for Hebrew

no code implementations 1 Nov 2023 Eylon Gueta, Omer Goldman, Reut Tsarfaty

We investigate the hypothesis that incorporating explicit morphological knowledge in the pre-training phase can improve the performance of PLMs for MRLs.

Is Probing All You Need? Indicator Tasks as an Alternative to Probing Embedding Spaces

no code implementations 24 Oct 2023 Tal Levy, Omer Goldman, Reut Tsarfaty

The ability to identify and control different kinds of linguistic information encoded in vector representations of words has many use cases, especially for explainability and bias removal.

Morphological Inflection with Phonological Features

1 code implementation 21 Jun 2023 David Guriel, Omer Goldman, Reut Tsarfaty

Recent years have brought great advances into solving morphological tasks, mostly due to powerful neural models applied to various tasks as (re)inflection and analysis.

Morphological Inflection

Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks

1 code implementation 17 May 2023 Alon Jacovi, Avi Caciularu, Omer Goldman, Yoav Goldberg

Data contamination has become prevalent and challenging with the rise of models pretrained on large automatically-crawled corpora.

UniMorph 4.0: Universal Morphology

no code implementations LREC 2022 Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

Morphological Reinflection with Multiple Arguments: An Extended Annotation schema and a Georgian Case Study

no code implementations ACL 2022 David Guriel, Omer Goldman, Reut Tsarfaty

In recent years, a flurry of morphological datasets had emerged, most notably UniMorph, a multi-lingual repository of inflection tables.

LEMMA

Morphology Without Borders: Clause-Level Morphology

no code implementations 25 Feb 2022 Omer Goldman, Reut Tsarfaty

We use this dataset to derive 3 clause-level morphological tasks: inflection, reinflection and analysis.

(Un)solving Morphological Inflection: Lemma Overlap Artificially Inflates Models' Performance

1 code implementation 12 Aug 2021 Omer Goldman, David Guriel, Reut Tsarfaty

The effect is most significant for low-resourced languages with a drop as high as 95 points, but even high-resourced languages lose about 10 points on average.

LEMMA Morphological Inflection

Minimal Supervision for Morphological Inflection

1 code implementation EMNLP 2021 Omer Goldman, Reut Tsarfaty

Neural models for the various flavours of morphological inflection tasks have proven to be extremely accurate given ample labeled data -- data that may be slow and costly to obtain.

Morphological Inflection

Weakly Supervised Semantic Parsing with Abstract Examples

no code implementations ACL 2018 Omer Goldman, Veronica Latcinnik, Ehud Nave, Amir Globerson, Jonathan Berant

Training semantic parsers from weak supervision (denotations) rather than strong supervision (programs) complicates training in two ways.

Semantic Parsing Visual Reasoning

Weakly-supervised Semantic Parsing with Abstract Examples

1 code implementation 14 Nov 2017 Omer Goldman, Veronica Latcinnik, Udi Naveh, Amir Globerson, Jonathan Berant

Training semantic parsers from weak supervision (denotations) rather than strong supervision (programs) complicates training in two ways.

Semantic Parsing Visual Reasoning
