1 code implementation • EMNLP (BlackboxNLP) 2020 • Hila Gonen, Shauli Ravfogel, Yanai Elazar, Yoav Goldberg
Recent works have demonstrated that multilingual BERT (mBERT) learns rich cross-lingual representations that allow for transfer across languages.
no code implementations • 15 Mar 2024 • Tomasz Limisiewicz, Terra Blevins, Hila Gonen, Orevaoghene Ahia, Luke Zettlemoyer
A major consideration in multilingual language modeling is how to best represent languages with diverse vocabularies and scripts.
no code implementations • 19 Jan 2024 • Terra Blevins, Tomasz Limisiewicz, Suchin Gururangan, Margaret Li, Hila Gonen, Noah A. Smith, Luke Zettlemoyer
Despite their popularity in non-English NLP, multilingual language models often underperform monolingual ones due to inter-language competition for model parameters.
1 code implementation • arXiv 2023 • Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, LJ Miranda, Barbara Plank, Arij Riabi, Yuval Pinter
We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages.
Ranked #1 on Named Entity Recognition (NER) on UNER v1 (Danish)
1 code implementation • 23 Oct 2023 • Jaechan Lee, Alisa Liu, Orevaoghene Ahia, Hila Gonen, Noah A. Smith
In experiments, we compare MT-specific models and language models for (i) their preference when given an ambiguous subsentence, (ii) their sensitivity to disambiguating context, and (iii) the performance disparity between figurative and literal source sentences.
no code implementations • 24 May 2023 • Akari Asai, Sneha Kudugunta, Xinyan Velocity Yu, Terra Blevins, Hila Gonen, Machel Reid, Yulia Tsvetkov, Sebastian Ruder, Hannaneh Hajishirzi
Despite remarkable advancements in few-shot generalization in natural language processing, most models are developed and evaluated primarily in English.
no code implementations • 23 May 2023 • Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David R. Mortensen, Noah A. Smith, Yulia Tsvetkov
Language models have graduated from being research prototypes to commercialized products offered as web APIs, and recent works have highlighted the multilingual capabilities of these products.
no code implementations • 15 Feb 2023 • Marjan Ghazvininejad, Hila Gonen, Luke Zettlemoyer
Large language models (LLMs) demonstrate remarkable machine translation (MT) abilities via prompting, even though they were not explicitly trained for this task.
2 code implementations • 25 Jan 2023 • Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa
Large multilingual language models typically rely on a single vocabulary shared across 100+ languages.
no code implementations • 20 Dec 2022 • Weijia Shi, Xiaochuang Han, Hila Gonen, Ari Holtzman, Yulia Tsvetkov, Luke Zettlemoyer
Large language models can perform new tasks in a zero-shot fashion, given natural language prompts that specify the desired behavior.
no code implementations • 8 Dec 2022 • Hila Gonen, Srini Iyer, Terra Blevins, Noah A. Smith, Luke Zettlemoyer
Language models can be prompted to perform a wide variety of zero- and few-shot learning problems.
no code implementations • 15 Nov 2022 • Terra Blevins, Hila Gonen, Luke Zettlemoyer
Although pretrained language models (PLMs) can be prompted to perform a wide range of language tasks, it remains an open question how much this ability comes from generalizable linguistic understanding versus surface-level lexical patterns.
no code implementations • 24 May 2022 • Terra Blevins, Hila Gonen, Luke Zettlemoyer
The emergent cross-lingual transfer seen in multilingual pretrained models has sparked significant interest in studying their behavior.
1 code implementation • RepL4NLP (ACL) 2022 • Hila Gonen, Shauli Ravfogel, Yoav Goldberg
Multilingual language models have been shown to allow for nontrivial transfer across scripts and languages.
1 code implementation • ACL 2020 • Hila Gonen, Ganesh Jawahar, Djamé Seddah, Yoav Goldberg
The problem of comparing two bodies of text and searching for words that differ in their usage between them arises often in digital humanities and computational social science.
no code implementations • NAACL 2021 • Iftah Gamzu, Hila Gonen, Gilad Kutiel, Ran Levy, Eugene Agichtein
This task is closely related to the task of Multi-Document Summarization in the product reviews domain but differs in its objective and its level of conciseness.
1 code implementation • COLING 2020 • Ella Rabinovich, Hila Gonen, Suzanne Stevenson
A large body of research on gender-linked language has established foundations regarding cross-gender differences in lexical, emotional, and topical preferences, along with their sociological underpinnings.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Hila Gonen, Kellie Webster
The successful application of neural methods to machine translation has realized huge quality advances for the community.
2 code implementations • ACL 2020 • Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, Yoav Goldberg
The ability to control for the kinds of information encoded in neural representation has a variety of use cases, especially in light of the challenge of interpreting these models.
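This entry describes removing, or "guarding," specific information from neural representations. As a rough illustration of the iterative nullspace-projection idea behind it (a minimal sketch on synthetic data using scikit-learn's `LogisticRegression`, not the authors' implementation; all names here are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def nullspace_projection(W):
    """Projection matrix onto the nullspace of the rows of W (shape d x d)."""
    # Orthonormal basis of the row space via SVD, then project it out.
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    basis = Vt[s > 1e-10]  # directions the linear classifier relies on
    return np.eye(W.shape[1]) - basis.T @ basis

def inlp(X, z, n_iter=3):
    """Iteratively remove linearly decodable information about z from X."""
    P = np.eye(X.shape[1])
    Xp = X.copy()
    for _ in range(n_iter):
        # Train a linear probe for the protected attribute z,
        # then project the representations onto its nullspace.
        clf = LogisticRegression(max_iter=1000).fit(Xp, z)
        P = nullspace_projection(clf.coef_) @ P
        Xp = X @ P.T
    return Xp, P

# Toy data: the attribute z is encoded along the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
z = (X[:, 0] > 0).astype(int)

X_clean, P = inlp(X, z, n_iter=3)
# After projection, a fresh linear probe should do much worse at
# recovering z from X_clean than from the original X.
acc = LogisticRegression(max_iter=1000).fit(X_clean, z).score(X_clean, z)
```

Each iteration removes one linear direction predictive of the attribute, so a few rounds suffice on this toy example; the paper applies the same loop to real model representations.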
1 code implementation • CoNLL 2019 • Hila Gonen, Yova Kementchedjhieva, Yoav Goldberg
Many natural languages also assign grammatical gender to inanimate nouns.
no code implementations • IJCNLP 2019 • Rowan Hall Maudslay, Hila Gonen, Ryan Cotterell, Simone Teufel
An alternative approach is Counterfactual Data Augmentation (CDA), in which a corpus is duplicated and augmented to remove bias, e.g., by swapping all inherently gendered words in the copy.
2 code implementations • NAACL 2019 • Hila Gonen, Yoav Goldberg
Word embeddings are widely used in NLP for a vast range of tasks.
1 code implementation • IJCNLP 2019 • Hila Gonen, Yoav Goldberg
We focus on the problem of language modeling for code-switched language, in the context of automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • COLING 2016 • Hila Gonen, Yoav Goldberg
Prepositions are very common and very ambiguous, and understanding their sense is critical for understanding the meaning of the sentence.