Search Results for author: Leshem Choshen

Found 73 papers, 41 papers with code

Active Learning for BERT: An Empirical Study

1 code implementation EMNLP 2020 Liat Ein-Dor, Alon Halfon, Ariel Gera, Eyal Shnarch, Lena Dankin, Leshem Choshen, Marina Danilevsky, Ranit Aharonov, Yoav Katz, Noam Slonim

Here, we present a large-scale empirical study on active learning techniques for BERT-based classification, addressing a diverse set of AL strategies and datasets.

Active Learning Binary text classification +3

$Q^{2}$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering

no code implementations EMNLP 2021 Or Honovich, Leshem Choshen, Roee Aharoni, Ella Neeman, Idan Szpektor, Omri Abend

Neural knowledge-grounded generative models for dialogue often produce content that is factually inconsistent with the knowledge they rely on, making them unreliable and limiting their applicability.

Abstractive Text Summarization Natural Language Inference +3

ZipNN: Lossless Compression for AI Models

1 code implementation 7 Nov 2024 Moshik Hershcovitch, Andrew Wood, Leshem Choshen, Guy Girmonsky, Roy Leibovitz, Ilias Ennmouri, Michal Malka, Peter Chin, Swaminathan Sundararaman, Danny Harnik

As model sizes and the scale of their deployment grow, their sheer size burdens the infrastructure, requiring more network bandwidth and more storage to accommodate them.

Model Compression

Model merging with SVD to tie the Knots

1 code implementation 25 Oct 2024 George Stoica, Pratik Ramesh, Boglarka Ecsedi, Leshem Choshen, Judy Hoffman

We study this phenomenon and observe that the weights of LoRA finetuned models showcase a lower degree of alignment compared to their fully-finetuned counterparts.

A Hitchhiker's Guide to Scaling Law Estimation

1 code implementation 15 Oct 2024 Leshem Choshen, Yang Zhang, Jacob Andreas

Moreover, while different model families differ in scaling behavior, they are often similar enough that a target model's behavior can be predicted from a single model with the same architecture, along with scaling parameter estimates derived from other model families.

Can You Trust Your Metric? Automatic Concatenation-Based Tests for Metric Validity

no code implementations 22 Aug 2024 Ora Nova Fandina, Leshem Choshen, Eitan Farchi, George Kour, Yotam Perlitz, Orna Raz

We applied these tests in a model safety scenario to assess the reliability of harmfulness detection metrics, uncovering a number of inconsistencies.

Language Modelling Large Language Model +1

Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs

no code implementations 20 Aug 2024 Maxim Ifergan, Leshem Choshen, Roee Aharoni, Idan Szpektor, Omri Abend

These findings highlight the need for improved multilingual knowledge representation in LLMs and suggest a path for the development of more robust and consistent multilingual LLMs.

knowledge editing

The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community

no code implementations 15 Aug 2024 Shachar Don-Yehiya, Leshem Choshen, Omri Abend

We introduce the ShareLM collection, a unified set of human conversations with large language models, and its accompanying plugin, a Web extension for voluntarily contributing user-model conversations.

A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning

no code implementations 13 Aug 2024 Prateek Yadav, Colin Raffel, Mohammed Muqeeth, Lucas Caccia, Haokun Liu, Tianlong Chen, Mohit Bansal, Leshem Choshen, Alessandro Sordoni

The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to a particular domain or task.

Survey

Do These LLM Benchmarks Agree? Fixing Benchmark Evaluation with BenchBench

1 code implementation 18 Jul 2024 Yotam Perlitz, Ariel Gera, Ofir Arviv, Asaf Yehudai, Elron Bandel, Eyal Shnarch, Michal Shmueli-Scheuer, Leshem Choshen

Despite the crucial role of benchmark agreement testing (BAT) for benchmark builders and consumers, there are no standardized procedures for such agreement testing.

Language Modelling

Learning from Naturally Occurring Feedback

1 code implementation 15 Jul 2024 Shachar Don-Yehiya, Leshem Choshen, Omri Abend

Training with the extracted feedback shows significant performance improvements over baseline models, demonstrating the efficacy of our approach in enhancing model alignment to human preferences.

Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead

no code implementations 17 Jun 2024 Rickard Brüel-Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj, Leshem Choshen, Kristjan Greenewald, Mikhail Yurochkin, Justin Solomon

Fine-tuning large language models (LLMs) with low-rank adaptations (LoRAs) has become common practice, often yielding numerous copies of the same LLM differing only in their LoRA updates.

Model Compression

Efficient multi-prompt evaluation of LLMs

2 code implementations 27 May 2024 Felipe Maia Polo, Ronald Xu, Lucas Weber, Mírian Silva, Onkar Bhardwaj, Leshem Choshen, Allysson Flavio Melo de Oliveira, Yuekai Sun, Mikhail Yurochkin

Most popular benchmarks for comparing LLMs rely on a limited set of prompt templates, which may not fully capture the LLMs' abilities and can affect the reproducibility of results on leaderboards.

MMLU

Holmes: A Benchmark to Assess the Linguistic Competence of Language Models

no code implementations 29 Apr 2024 Andreas Waldis, Yotam Perlitz, Leshem Choshen, Yufang Hou, Iryna Gurevych

We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs) - their unconscious understanding of linguistic phenomena.

Part-Of-Speech Tagging

[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

1 code implementation 9 Apr 2024 Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang

The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-inspired benchmarks, or analysis techniques.

Lossless and Near-Lossless Compression for Foundation Models

1 code implementation 5 Apr 2024 Moshik Hershcovitch, Leshem Choshen, Andrew Wood, Ilias Enmouri, Peter Chin, Swaminathan Sundararaman, Danny Harnik

As model sizes and the scale of their deployment grow, their sheer size burdens the infrastructure, requiring more network bandwidth and more storage to accommodate them.

Asymmetry in Low-Rank Adapters of Foundation Models

1 code implementation 26 Feb 2024 Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de Ocáriz Borde, Rickard Brüel Gabrielsson, Leshem Choshen, Marzyeh Ghassemi, Mikhail Yurochkin, Justin Solomon

Specifically, when updating the parameter matrices of a neural network by adding a product $BA$, we observe that the $B$ and $A$ matrices have distinct functions: $A$ extracts features from the input, while $B$ uses these features to create the desired output.

parameter-efficient fine-tuning
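As a rough illustration of the asymmetry described above, the following is a minimal sketch (not the authors' code): in a LoRA-style update $W + BA$, the $A$ factor projects inputs into a low-rank feature space and the $B$ factor writes those features back into the output space. Dimensions and initialization here are illustrative assumptions.

```python
# Minimal sketch of a LoRA-style update W' = W + B @ A (illustrative, not the paper's code).
# A (r x d_in) extracts low-rank features from the input; B (d_out x r) maps them to the output.
import numpy as np

d_in, d_out, r = 16, 8, 4
W = np.random.randn(d_out, d_in) * 0.01   # frozen pretrained weight
A = np.random.randn(r, d_in) * 0.01       # "feature extractor" factor
B = np.zeros((d_out, r))                  # "output writer" factor, initialized to zero

x = np.random.randn(d_in)
h_base = W @ x                 # pretrained forward pass
h_lora = (W + B @ A) @ x       # adapted forward pass; equals h_base while B is zero
print(np.allclose(h_base, h_lora))  # True at initialization
```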

tinyBenchmarks: evaluating LLMs with fewer examples

2 code implementations 22 Feb 2024 Felipe Maia Polo, Lucas Weber, Leshem Choshen, Yuekai Sun, Gongjun Xu, Mikhail Yurochkin

The versatility of large language models (LLMs) has led to the creation of diverse benchmarks that thoroughly test a variety of language models' abilities.

MMLU Multiple-choice

Label-Efficient Model Selection for Text Generation

no code implementations 12 Feb 2024 Shir Ashury-Tahan, Ariel Gera, Benjamin Sznajder, Leshem Choshen, Liat Ein-Dor, Eyal Shnarch

Model selection for a given target task can be costly, as it may entail extensive annotation of the quality of outputs of different models.

Model Selection Text Generation

Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI

1 code implementation 25 Jan 2024 Elron Bandel, Yotam Perlitz, Elad Venezian, Roni Friedman-Melamed, Ofir Arviv, Matan Orbach, Shachar Don-Yehyia, Dafna Sheinwald, Ariel Gera, Leshem Choshen, Michal Shmueli-Scheuer, Yoav Katz

In the dynamic landscape of generative NLP, traditional text processing pipelines limit research flexibility and reproducibility, as they are tailored to specific dataset, task, and model combinations.

Genie: Achieving Human Parity in Content-Grounded Datasets Generation

no code implementations 25 Jan 2024 Asaf Yehudai, Boaz Carmeli, Yosi Mass, Ofir Arviv, Nathaniel Mills, Assaf Toledo, Eyal Shnarch, Leshem Choshen

Furthermore, we compare models trained on our data with models trained on human-written data -- ELI5 and ASQA for LFQA and CNN-DailyMail for Summarization.

Long Form Question Answering

Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability

no code implementations 16 Jan 2024 Afra Feyza Akyürek, Ekin Akyürek, Leshem Choshen, Derry Wijaya, Jacob Andreas

Given a collection of seed documents, DCT prompts LMs to generate additional text implied by these documents, reason globally about the correctness of this generated text, and finally fine-tune on text inferred to be correct.

Fact Verification Text Generation

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization

1 code implementation 22 Nov 2023 Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal

Despite the efficiency of PEFT methods, the size of expert models can make it onerous to retrieve expert models per query over high-latency networks like the Internet or serve multiple experts on a single GPU.

Language Modelling MMLU +2

Human Learning by Model Feedback: The Dynamics of Iterative Prompting with Midjourney

1 code implementation 20 Nov 2023 Shachar Don-Yehiya, Leshem Choshen, Omri Abend

Generating images with a Text-to-Image model often requires multiple trials, where human users iteratively update their prompt based on feedback, namely the output image.

Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion

1 code implementation 13 Nov 2023 Kerem Zaman, Leshem Choshen, Shashank Srivastava

Model fusion research aims to aggregate the knowledge of multiple individual models to enhance performance by combining their weights.

Memorization text-classification +1

Efficient Benchmarking of Language Models

no code implementations 22 Aug 2023 Yotam Perlitz, Elron Bandel, Ariel Gera, Ofir Arviv, Liat Ein-Dor, Eyal Shnarch, Noam Slonim, Michal Shmueli-Scheuer, Leshem Choshen

The increasing versatility of language models (LMs) has given rise to a new class of benchmarks that comprehensively assess a broad range of capabilities.

Benchmarking

TIES-Merging: Resolving Interference When Merging Models

3 code implementations NeurIPS 2023 Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal

To address this, we propose our method, TRIM, ELECT SIGN & MERGE (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign.

Transfer Learning
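To make the three steps concrete, here is a hedged sketch of trim, elect sign, and merge applied to per-parameter task vectors (finetuned weights minus pretrained weights). The keep fraction, shapes, and data are illustrative assumptions, not the official implementation.

```python
# Hedged sketch of the three TIES-Merging steps on "task vectors"
# (finetuned weights minus pretrained weights); not the official implementation.
import numpy as np

def ties_merge(task_vectors, keep_frac=0.2):
    # (1) Trim: reset parameters that changed only a small amount during fine-tuning.
    trimmed = []
    for tv in task_vectors:
        k = max(1, int(keep_frac * tv.size))
        thresh = np.sort(np.abs(tv).ravel())[-k]
        trimmed.append(np.where(np.abs(tv) >= thresh, tv, 0.0))
    stacked = np.stack(trimmed)

    # (2) Elect sign: per parameter, keep the sign with the larger total magnitude.
    elected = np.sign(stacked.sum(axis=0))

    # (3) Merge: average only the entries that agree with the elected sign.
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return (stacked * agree).sum(axis=0) / counts

vectors = [np.random.randn(4, 4) for _ in range(3)]
print(ties_merge(vectors).shape)  # (4, 4)
```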

Jump to Conclusions: Short-Cutting Transformers With Linear Transformations

2 code implementations 16 Mar 2023 Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva

This approximation far exceeds the prevailing practice of inspecting hidden representations from all layers, in the space of the final layer.

Decision Making Language Modelling
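A minimal sketch of the underlying idea (synthetic data, not the paper's code): fit a linear map from an intermediate layer's hidden states to the final layer's, then use it to approximate final representations without running the remaining layers. The dimensions and the synthetic "activations" are illustrative assumptions.

```python
# Hedged sketch: learn a linear "shortcut" from layer-l hidden states to final-layer
# hidden states by least squares; data here is synthetic, not real transformer activations.
import numpy as np

n, d = 1000, 64
H_mid = np.random.randn(n, d)                              # hidden states at some layer l
W_true = np.random.randn(d, d) / np.sqrt(d)
H_final = H_mid @ W_true + 0.01 * np.random.randn(n, d)    # final-layer hidden states

W_jump, *_ = np.linalg.lstsq(H_mid, H_final, rcond=None)   # fit the linear shortcut

h_new = np.random.randn(1, d)
approx_final = h_new @ W_jump   # cheap approximation of the final representation
print(approx_final.shape)       # (1, 64)
```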

Knowledge is a Region in Weight Space for Fine-tuned Language Models

no code implementations 9 Feb 2023 Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen

Notably, we show that language models that have been finetuned on the same dataset form a tight cluster in the weight space, while models finetuned on different datasets from the same underlying task form a looser cluster.

Call for Papers -- The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

1 code implementation 27 Jan 2023 Alex Warstadt, Leshem Choshen, Aaron Mueller, Adina Williams, Ethan Wilcox, Chengxu Zhuang

In partnership with CoNLL and CMCL, we provide a platform for approaches to pretraining with a limited-size corpus sourced from data inspired by the input to children.

Language Acquisition Language Modelling +1

DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering

1 code implementation 10 Nov 2022 Ella Neeman, Roee Aharoni, Or Honovich, Leshem Choshen, Idan Szpektor, Omri Abend

Question answering models commonly have access to two sources of "knowledge" during inference time: (1) parametric knowledge - the factual knowledge encoded in the model weights, and (2) contextual knowledge - external knowledge (e.g., a Wikipedia passage) given to the model to generate a grounded answer.

counterfactual Data Augmentation +2

Where to start? Analyzing the potential value of intermediate models

no code implementations 31 Oct 2022 Leshem Choshen, Elad Venezian, Shachar Don-Yehia, Noam Slonim, Yoav Katz

Such a model, finetuned on some source dataset, may provide a better starting point for a new finetuning process on a desired target dataset.

Reinforcement Learning with Large Action Spaces for Neural Machine Translation

no code implementations COLING 2022 Asaf Yehudai, Leshem Choshen, Lior Fox, Omri Abend

Applying reinforcement learning (RL) following maximum likelihood estimation (MLE) pre-training is a versatile method for enhancing neural machine translation (NMT) performance.

Machine Translation NMT +6

PreQuEL: Quality Estimation of Machine Translation Outputs in Advance

1 code implementation 18 May 2022 Shachar Don-Yehiya, Leshem Choshen, Omri Abend

We show that this augmentation method can improve the performance of the Quality-Estimation task as well.

Data Augmentation Machine Translation +2

Some Grammatical Errors are Frequent, Others are Important

1 code implementation 11 May 2022 Leshem Choshen, Ofir Shifman, Omri Abend

In Grammatical Error Correction, systems are evaluated by the number of errors they correct.

Grammatical Error Correction

Fusing finetuned models for better pretraining

2 code implementations 6 Apr 2022 Leshem Choshen, Elad Venezian, Noam Slonim, Yoav Katz

We also show that fusing is often better than intertraining.

On Neurons Invariant to Sentence Structural Changes in Neural Machine Translation

1 code implementation 6 Oct 2021 Gal Patel, Leshem Choshen, Omri Abend

We present a methodology that explores how sentence structure is reflected in neural representations of machine translation systems.

Machine Translation Sentence +1

The Grammar-Learning Trajectories of Neural Language Models

1 code implementation ACL 2022 Leshem Choshen, Guy Hacohen, Daphna Weinshall, Omri Abend

These findings suggest that there is some mutual inductive bias that underlies these models' learning of linguistic phenomena.

Inductive Bias

ComSum: Commit Messages Summarization and Meaning Preservation

1 code implementation 23 Aug 2021 Leshem Choshen, Idan Amit

We present ComSum, a data set of 7 million commit messages for text summarization.

Text Summarization

Part of Speech and Universal Dependency effects on English Arabic Machine Translation

no code implementations 1 Jun 2021 Ofek Rafaeli, Omri Abend, Leshem Choshen, Dmitry Nikolaev

In this research paper, I elaborate on a method to evaluate machine translation models based on their performance on underlying syntactic phenomena between English and Arabic.

BIG-bench Machine Learning Machine Translation +1

$Q^{2}$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering

1 code implementation 16 Apr 2021 Or Honovich, Leshem Choshen, Roee Aharoni, Ella Neeman, Idan Szpektor, Omri Abend

Neural knowledge-grounded generative models for dialogue often produce content that is factually inconsistent with the knowledge they rely on, making them unreliable and limiting their applicability.

Abstractive Text Summarization Dialogue Evaluation +4

Mediators in Determining what Processing BERT Performs First

1 code implementation NAACL 2021 Aviv Slobodkin, Leshem Choshen, Omri Abend

Probing neural models for the ability to perform downstream tasks using their activation patterns is often used to localize what parts of the network specialize in performing what tasks.

SERRANT: a syntactic classifier for English Grammatical Error Types

1 code implementation 6 Apr 2021 Leshem Choshen, Matanel Oren, Dmitry Nikolaev, Omri Abend

SERRANT is a system and code for automatic classification of English grammatical errors that combines SErCl and ERRANT.

General Classification

Enhancing the Transformer Decoder with Transition-based Syntax

1 code implementation 29 Jan 2021 Leshem Choshen, Omri Abend

Notwithstanding recent advances, syntactic generalization remains a challenge for text decoders.

Decoder Machine Translation +2

Cluster & Tune: Enhance BERT Performance in Low Resource Text Classification

no code implementations 1 Jan 2021 Eyal Shnarch, Ariel Gera, Alon Halfon, Lena Dankin, Leshem Choshen, Ranit Aharonov, Noam Slonim

In such low-resource scenarios, we suggest performing an unsupervised classification task prior to fine-tuning on the target task.

Clustering General Classification +2

Classifying Syntactic Errors in Learner Language

1 code implementation CONLL 2020 Leshem Choshen, Dmitry Nikolaev, Yevgeni Berzak, Omri Abend

We present a method for classifying syntactic errors in learner language, namely errors whose correction alters the morphosyntactic structure of a sentence.

Classification General Classification +2

Unsupervised Expressive Rules Provide Explainability and Assist Human Experts Grasping New Domains

no code implementations Findings of the Association for Computational Linguistics 2020 Eyal Shnarch, Leshem Choshen, Guy Moshkowich, Noam Slonim, Ranit Aharonov

Approaching new data can be quite daunting: you do not know how your categories of interest are realized in it, there is commonly no labeled data at hand, and the performance of domain adaptation methods is unsatisfactory.

Domain Adaptation

Automatically Extracting Challenge Sets for Non-Local Phenomena in Neural Machine Translation

no code implementations CONLL 2019 Leshem Choshen, Omri Abend

We show that the state-of-the-art Transformer MT model is not biased towards monotonic reordering (unlike previous recurrent neural network models), but that nevertheless, long-distance dependencies remain a challenge for the model.

Machine Translation Translation

Automatically Extracting Challenge Sets for Non local Phenomena in Neural Machine Translation

1 code implementation 15 Sep 2019 Leshem Choshen, Omri Abend

We show that the state-of-the-art Transformer Machine Translation (MT) model is not biased towards monotonic reordering (unlike previous recurrent neural network models), but that nevertheless, long-distance dependencies remain a challenge for the model.

Machine Translation Translation

Are You Convinced? Choosing the More Convincing Evidence with a Siamese Network

no code implementations ACL 2019 Martin Gleize, Eyal Shnarch, Leshem Choshen, Lena Dankin, Guy Moshkowich, Ranit Aharonov, Noam Slonim

With the advancement in argument detection, we suggest paying more attention to the challenging task of identifying the more convincing arguments.

On the Weaknesses of Reinforcement Learning for Neural Machine Translation

no code implementations ICLR 2020 Leshem Choshen, Lior Fox, Zohar Aizenbud, Omri Abend

Reinforcement learning (RL) is frequently used to increase performance in text generation tasks, including machine translation (MT), notably through the use of Minimum Risk Training (MRT) and Generative Adversarial Networks (GAN).

Machine Translation reinforcement-learning +4

The Language of Legal and Illegal Activity on the Darknet

2 code implementations ACL 2019 Leshem Choshen, Dan Eldad, Daniel Hershcovich, Elior Sulem, Omri Abend

The non-indexed parts of the Internet (the Darknet) have become a haven for both legal and illegal anonymous activity.

POS

SemEval-2019 Task 1: Cross-lingual Semantic Parsing with UCCA

no code implementations SEMEVAL 2019 Daniel Hershcovich, Zohar Aizenbud, Leshem Choshen, Elior Sulem, Ari Rappoport, Omri Abend

We present the SemEval 2019 shared task on UCCA parsing in English, German and French, and discuss the participating systems and results.

UCCA Parsing

Inherent Biases in Reference-based Evaluation for Grammatical Error Correction

1 code implementation ACL 2018 Leshem Choshen, Omri Abend

The prevalent use of too few references for evaluating text-to-text generation is known to bias estimates of their quality (henceforth, low coverage bias or LCB).

Grammatical Error Correction Sentence +3

SemEval 2019 Shared Task: Cross-lingual Semantic Parsing with UCCA - Call for Participation

no code implementations 31 May 2018 Daniel Hershcovich, Leshem Choshen, Elior Sulem, Zohar Aizenbud, Ari Rappoport, Omri Abend

Given the success of recent semantic parsing shared tasks (on SDP and AMR), we expect the task to have a significant contribution to the advancement of UCCA parsing in particular, and semantic parsing in general.

UCCA Parsing

Automatic Metric Validation for Grammatical Error Correction

1 code implementation ACL 2018 Leshem Choshen, Omri Abend

Metric validation in Grammatical Error Correction (GEC) is currently done by observing the correlation between human and metric-induced rankings.

Grammatical Error Correction

Inherent Biases in Reference based Evaluation for Grammatical Error Correction and Text Simplification

1 code implementation 30 Apr 2018 Leshem Choshen, Omri Abend

The prevalent use of too few references for evaluating text-to-text generation is known to bias estimates of their quality (low coverage bias or LCB).

Grammatical Error Correction Sentence +3

Reference-less Measure of Faithfulness for Grammatical Error Correction

1 code implementation NAACL 2018 Leshem Choshen, Omri Abend

We propose USim, a semantic measure for Grammatical Error Correction (GEC) that measures the semantic faithfulness of the output to the source, thereby complementing existing reference-less measures (RLMs) for measuring the output's grammaticality.

Grammatical Error Correction valid

DORA The Explorer: Directed Outreaching Reinforcement Action-Selection

1 code implementation ICLR 2018 Leshem Choshen, Lior Fox, Yonatan Loewenstein

We compare our approach to commonly used RL techniques, and show that using $E$-values improves learning and performance over traditional counters.

Reinforcement Learning Reinforcement Learning (RL) +1
