1 code implementation • EMNLP 2020 • Liat Ein-Dor, Alon Halfon, Ariel Gera, Eyal Shnarch, Lena Dankin, Leshem Choshen, Marina Danilevsky, Ranit Aharonov, Yoav Katz, Noam Slonim
Here, we present a large-scale empirical study on active learning techniques for BERT-based classification, addressing a diverse set of AL strategies and datasets.
no code implementations • EMNLP 2021 • Or Honovich, Leshem Choshen, Roee Aharoni, Ella Neeman, Idan Szpektor, Omri Abend
Neural knowledge-grounded generative models for dialogue often produce content that is factually inconsistent with the knowledge they rely on, making them unreliable and limiting their applicability.
no code implementations • 4 Dec 2024 • Shivalika Singh, Angelika Romanou, Clémentine Fourrier, David I. Adelani, Jian Gang Ngui, Daniel Vila-Suero, Peerat Limkonchotiwat, Kelly Marchisio, Wei Qi Leong, Yosephine Susanto, Raymond Ng, Shayne Longpre, Wei-Yin Ko, Madeline Smith, Antoine Bosselut, Alice Oh, Andre F. T. Martins, Leshem Choshen, Daphne Ippolito, Enzo Ferrante, Marzieh Fadaee, Beyza Ermis, Sara Hooker
Cultural biases in multilingual datasets pose significant challenges for their effectiveness as global benchmarks.
1 code implementation • 7 Nov 2024 • Moshik Hershcovitch, Andrew Wood, Leshem Choshen, Guy Girmonsky, Roy Leibovitz, Ilias Ennmouri, Michal Malka, Peter Chin, Swaminathan Sundararaman, Danny Harnik
With the growth of model sizes and the scale of their deployment, their sheer size burdens the infrastructure, requiring more network bandwidth and more storage to accommodate them.
1 code implementation • 25 Oct 2024 • George Stoica, Pratik Ramesh, Boglarka Ecsedi, Leshem Choshen, Judy Hoffman
We study this phenomenon and observe that the weights of LoRA finetuned models showcase a lower degree of alignment compared to their fully-finetuned counterparts.
1 code implementation • 15 Oct 2024 • Leshem Choshen, Yang Zhang, Jacob Andreas
Moreover, while different model families differ in scaling behavior, they are often similar enough that a target model's behavior can be predicted from a single model with the same architecture, along with scaling parameter estimates derived from other model families.
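As a rough illustration of this kind of extrapolation, the sketch below fits a generic saturating power law to a few hypothetical (size, loss) observations from smaller models and extrapolates to a larger target; the functional form and all numbers are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (parameter count, loss) observations from smaller models in one family.
sizes = np.array([1e7, 3e7, 1e8, 3e8])
losses = np.array([4.1, 3.7, 3.3, 3.0])

def power_law(n, a, alpha, c):
    # Generic saturating power law: loss = a * n^(-alpha) + c
    return a * n ** (-alpha) + c

params, _ = curve_fit(power_law, sizes, losses, p0=(20.0, 0.1, 2.0), maxfev=10000)
print(power_law(1e9, *params))  # extrapolated loss of a hypothetical 1B-parameter model
```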
no code implementations • 22 Aug 2024 • Ora Nova Fandina, Leshem Choshen, Eitan Farchi, George Kour, Yotam Perlitz, Orna Raz
We applied these tests in a model safety scenario to assess the reliability of harmfulness detection metrics, uncovering a number of inconsistencies.
no code implementations • 20 Aug 2024 • Maxim Ifergan, Leshem Choshen, Roee Aharoni, Idan Szpektor, Omri Abend
These findings highlight the need for improved multilingual knowledge representation in LLMs and suggest a path for the development of more robust and consistent multilingual LLMs.
no code implementations • 15 Aug 2024 • Shachar Don-Yehiya, Ben Burtenshaw, Ramon Fernandez Astudillo, Cailean Osborne, Mimansa Jaiswal, Tzu-Sheng Kuo, Wenting Zhao, Idan Shenfeld, Andi Peng, Mikhail Yurochkin, Atoosa Kasirzadeh, Yangsibo Huang, Tatsunori Hashimoto, Yacine Jernite, Daniel Vila-Suero, Omri Abend, Jennifer Ding, Sara Hooker, Hannah Rose Kirk, Leshem Choshen
In this work, we bring together interdisciplinary experts to assess the opportunities and challenges to realizing an open ecosystem of human feedback for AI.
no code implementations • 15 Aug 2024 • Shachar Don-Yehiya, Leshem Choshen, Omri Abend
We introduce the ShareLM collection, a unified set of human conversations with large language models, and its accompanying plugin, a Web extension for voluntarily contributing user-model conversations.
no code implementations • 13 Aug 2024 • Prateek Yadav, Colin Raffel, Mohammed Muqeeth, Lucas Caccia, Haokun Liu, Tianlong Chen, Mohit Bansal, Leshem Choshen, Alessandro Sordoni
The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to a particular domain or task.
no code implementations • 31 Jul 2024 • Oscar Sainz, Iker García-Ferrero, Alon Jacovi, Jon Ander Campos, Yanai Elazar, Eneko Agirre, Yoav Goldberg, Wei-Lin Chen, Jenny Chim, Leshem Choshen, Luca D'Amico-Wong, Melissa Dell, Run-Ze Fan, Shahriar Golchin, Yucheng Li, PengFei Liu, Bhavish Pahwa, Ameya Prabhu, Suryansh Sharma, Emily Silcock, Kateryna Solonko, David Stap, Mihai Surdeanu, Yu-Min Tseng, Vishaal Udandarao, Zengzhi Wang, Ruijie Xu, Jinglin Yang
The workshop fostered a shared task to collect evidence on data contamination in current available datasets and models.
1 code implementation • 18 Jul 2024 • Yotam Perlitz, Ariel Gera, Ofir Arviv, Asaf Yehudai, Elron Bandel, Eyal Shnarch, Michal Shmueli-Scheuer, Leshem Choshen
Despite the crucial role of BAT for benchmark builders and consumers, there are no standardized procedures for such agreement testing.
1 code implementation • 15 Jul 2024 • Shachar Don-Yehiya, Leshem Choshen, Omri Abend
Training with the extracted feedback shows significant performance improvements over baseline models, demonstrating the efficacy of our approach in enhancing model alignment to human preferences.
no code implementations • 17 Jun 2024 • Rickard Brüel-Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj, Leshem Choshen, Kristjan Greenewald, Mikhail Yurochkin, Justin Solomon
Fine-tuning large language models (LLMs) with low-rank adaptations (LoRAs) has become common practice, often yielding numerous copies of the same LLM differing only in their LoRA updates.
2 code implementations • 27 May 2024 • Felipe Maia Polo, Ronald Xu, Lucas Weber, Mírian Silva, Onkar Bhardwaj, Leshem Choshen, Allysson Flavio Melo de Oliveira, Yuekai Sun, Mikhail Yurochkin
Most popular benchmarks for comparing LLMs rely on a limited set of prompt templates, which may not fully capture the LLMs' abilities and can affect the reproducibility of results on leaderboards.
no code implementations • 29 Apr 2024 • Andreas Waldis, Yotam Perlitz, Leshem Choshen, Yufang Hou, Iryna Gurevych
We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs) - their unconscious understanding of linguistic phenomena.
1 code implementation • 9 Apr 2024 • Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang
The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-inspired benchmarks, or analysis techniques.
1 code implementation • 5 Apr 2024 • Moshik Hershcovitch, Leshem Choshen, Andrew Wood, Ilias Enmouri, Peter Chin, Swaminathan Sundararaman, Danny Harnik
With the growth of model sizes and the scale of their deployment, their sheer size burdens the infrastructure, requiring more network bandwidth and more storage to accommodate them.
no code implementations • 30 Mar 2024 • Eli Schwartz, Leshem Choshen, Joseph Shtok, Sivan Doveh, Leonid Karlinsky, Assaf Arbelle
Language models struggle with handling numerical data and performing arithmetic operations.
1 code implementation • 26 Feb 2024 • Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de Ocáriz Borde, Rickard Brüel Gabrielsson, Leshem Choshen, Marzyeh Ghassemi, Mikhail Yurochkin, Justin Solomon
Specifically, when updating the parameter matrices of a neural network by adding a product $BA$, we observe that the $B$ and $A$ matrices have distinct functions: $A$ extracts features from the input, while $B$ uses these features to create the desired output.
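For readers less familiar with the setup, here is a minimal sketch (with assumed dimensions and random tensors) of how an additive low-rank update $BA$ augments a frozen weight matrix: $A$ projects the input down to a low-rank feature space, and $B$ maps those features back to the output space.

```python
import torch

d_in, d_out, r = 768, 768, 8        # assumed dimensions and rank
W = torch.randn(d_out, d_in)        # frozen pretrained weight matrix
A = 0.01 * torch.randn(r, d_in)     # A: extracts low-rank features from the input
B = torch.zeros(d_out, r)           # B: maps those features to the output space (starts at zero)

x = torch.randn(d_in)
h = W @ x + B @ (A @ x)             # forward pass with the additive update BA
```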
2 code implementations • 22 Feb 2024 • Felipe Maia Polo, Lucas Weber, Leshem Choshen, Yuekai Sun, Gongjun Xu, Mikhail Yurochkin
The versatility of large language models (LLMs) has led to the creation of diverse benchmarks that thoroughly test a variety of language models' abilities.
no code implementations • 12 Feb 2024 • Shir Ashury-Tahan, Ariel Gera, Benjamin Sznajder, Leshem Choshen, Liat Ein-Dor, Eyal Shnarch
Model selection for a given target task can be costly, as it may entail extensive annotation of the quality of outputs of different models.
1 code implementation • 25 Jan 2024 • Elron Bandel, Yotam Perlitz, Elad Venezian, Roni Friedman-Melamed, Ofir Arviv, Matan Orbach, Shachar Don-Yehyia, Dafna Sheinwald, Ariel Gera, Leshem Choshen, Michal Shmueli-Scheuer, Yoav Katz
In the dynamic landscape of generative NLP, traditional text processing pipelines limit research flexibility and reproducibility, as they are tailored to specific dataset, task, and model combinations.
no code implementations • 25 Jan 2024 • Asaf Yehudai, Boaz Carmeli, Yosi Mass, Ofir Arviv, Nathaniel Mills, Assaf Toledo, Eyal Shnarch, Leshem Choshen
Furthermore, we compare models trained on our data with models trained on human-written data -- ELI5 and ASQA for LFQA and CNN-DailyMail for Summarization.
no code implementations • 16 Jan 2024 • Afra Feyza Akyürek, Ekin Akyürek, Leshem Choshen, Derry Wijaya, Jacob Andreas
Given a collection of seed documents, DCT prompts LMs to generate additional text implied by these documents, reason globally about the correctness of this generated text, and finally fine-tune on text inferred to be correct.
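The loop can be sketched schematically as below; `generate_implications`, `consistency_score`, and `finetune` are hypothetical placeholders for the LM calls, not the paper's actual API.

```python
def deductive_closure_step(lm, seed_documents, threshold=0.5):
    """Schematic sketch of one generate-verify-finetune round (hypothetical helpers)."""
    candidates = []
    for doc in seed_documents:
        # Prompt the LM for additional text it considers implied by the seed document.
        candidates.extend(generate_implications(lm, doc))
    # Reason globally about correctness: keep only candidates judged consistent
    # with the seeds and with the rest of the generated text.
    accepted = [c for c in candidates
                if consistency_score(lm, c, seed_documents + candidates) > threshold]
    # Fine-tune on the text inferred to be correct.
    return finetune(lm, accepted)
```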
1 code implementation • 22 Nov 2023 • Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal
Despite the efficiency of PEFT methods, the size of expert models can make it onerous to retrieve expert models per query over high-latency networks like the Internet or serve multiple experts on a single GPU.
1 code implementation • 20 Nov 2023 • Shachar Don-Yehiya, Leshem Choshen, Omri Abend
Generating images with a Text-to-Image model often requires multiple trials, where human users iteratively update their prompt based on feedback, namely the output image.
1 code implementation • 13 Nov 2023 • Kerem Zaman, Leshem Choshen, Shashank Srivastava
Model fusion research aims to aggregate the knowledge of multiple individual models to enhance performance by combining their weights.
no code implementations • 22 Aug 2023 • Yotam Perlitz, Elron Bandel, Ariel Gera, Ofir Arviv, Liat Ein-Dor, Eyal Shnarch, Noam Slonim, Michal Shmueli-Scheuer, Leshem Choshen
The increasing versatility of language models (LMs) has given rise to a new class of benchmarks that comprehensively assess a broad range of capabilities.
3 code implementations • NeurIPS 2023 • Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal
To address this, we propose our method, TRIM, ELECT SIGN & MERGE (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign.
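A minimal sketch of these three steps over task vectors (each model's difference from the shared base) might look as follows; the density and scaling values are illustrative assumptions, not the paper's tuned settings.

```python
import torch

def ties_merge(base_state, finetuned_states, density=0.2, lam=1.0):
    """Sketch of TIES-Merging: trim, elect sign, disjoint merge (illustrative settings)."""
    merged = {}
    for name, base in base_state.items():
        # Task vectors: difference between each finetuned model and the base.
        deltas = torch.stack([ft[name] - base for ft in finetuned_states])
        # (1) Trim: keep only the top-`density` fraction of each task vector by magnitude.
        k = max(1, int(density * deltas[0].numel()))
        trimmed = torch.zeros_like(deltas)
        for i, d in enumerate(deltas):
            idx = d.abs().flatten().topk(k).indices
            trimmed[i].view(-1)[idx] = d.view(-1)[idx]
        # (2) Elect sign: per parameter, the sign with the larger total magnitude wins.
        elected_sign = torch.sign(trimmed.sum(dim=0))
        # (3) Disjoint merge: average only the entries that agree with the elected sign.
        agree = (torch.sign(trimmed) == elected_sign) & (trimmed != 0)
        summed = (trimmed * agree).sum(dim=0)
        counts = agree.sum(dim=0).clamp(min=1)
        merged[name] = base + lam * summed / counts
    return merged
```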
no code implementations • 24 May 2023 • Taelin Karidi, Leshem Choshen, Gal Patel, Omri Abend
For example, nouns and verbs are among the most frequent POS tags.
2 code implementations • 16 Mar 2023 • Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva
This approximation far exceeds the prevailing practice of inspecting hidden representations from all layers, in the space of the final layer.
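As a rough illustration of such an approximation, one can fit a linear map from intermediate-layer hidden states to final-layer states by least squares; the sketch below uses random placeholder data and assumed dimensions, and is not the paper's exact procedure.

```python
import numpy as np

# Hypothetical paired hidden states: layer-l states H_l and final-layer states H_L,
# one row per token position (placeholder random data for illustration).
n_tokens, d = 1000, 768
H_l = np.random.randn(n_tokens, d)
H_L = np.random.randn(n_tokens, d)

# Fit a linear map M minimizing ||H_l @ M - H_L||^2 by least squares.
M, *_ = np.linalg.lstsq(H_l, H_L, rcond=None)

# Approximate final-layer representations directly from layer l
# (the baseline practice corresponds to using the identity map instead of M).
H_L_hat = H_l @ M
```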
no code implementations • 9 Feb 2023 • Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen
Notably, we show that language models that have been finetuned on the same dataset form a tight cluster in the weight space, while models finetuned on different datasets from the same underlying task form a looser cluster.
1 code implementation • 27 Jan 2023 • Alex Warstadt, Leshem Choshen, Aaron Mueller, Adina Williams, Ethan Wilcox, Chengxu Zhuang
In partnership with CoNLL and CMCL, we provide a platform for approaches to pretraining with a limited-size corpus sourced from data inspired by the input to children.
no code implementations • 2 Dec 2022 • Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen
We propose a new paradigm to continually evolve pretrained models, denoted ColD Fusion.
1 code implementation • 10 Nov 2022 • Ella Neeman, Roee Aharoni, Or Honovich, Leshem Choshen, Idan Szpektor, Omri Abend
Question answering models commonly have access to two sources of "knowledge" during inference time: (1) parametric knowledge - the factual knowledge encoded in the model weights, and (2) contextual knowledge - external knowledge (e.g., a Wikipedia passage) given to the model to generate a grounded answer.
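To make the distinction concrete, a minimal sketch of querying the same model with and without contextual knowledge might look as follows; `lm_generate` is a hypothetical text-generation helper, not a specific library call.

```python
def answer(question, lm_generate, passage=None):
    """Hypothetical sketch contrasting parametric (closed-book) and contextual (grounded) answering."""
    if passage is None:
        # Parametric knowledge only: the model answers from what is encoded in its weights.
        prompt = f"Question: {question}\nAnswer:"
    else:
        # Contextual knowledge: an external passage (e.g., from Wikipedia) grounds the answer.
        prompt = f"Context: {passage}\nQuestion: {question}\nAnswer:"
    return lm_generate(prompt)
```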
no code implementations • 31 Oct 2022 • Leshem Choshen, Elad Venezian, Shachar Don-Yehia, Noam Slonim, Yoav Katz
Such a model, finetuned on some source dataset, may provide a better starting point for a new finetuning process on a desired target dataset.
no code implementations • COLING 2022 • Asaf Yehudai, Leshem Choshen, Lior Fox, Omri Abend
Applying reinforcement learning (RL) following maximum likelihood estimation (MLE) pre-training is a versatile method for enhancing neural machine translation (NMT) performance.
1 code implementation • 2 Aug 2022 • Eyal Shnarch, Alon Halfon, Ariel Gera, Marina Danilevsky, Yannis Katsis, Leshem Choshen, Martin Santillan Cooper, Dina Epelboim, Zheng Zhang, Dakuo Wang, Lucy Yip, Liat Ein-Dor, Lena Dankin, Ilya Shnayderman, Ranit Aharonov, Yunyao Li, Naftali Liberman, Philip Levin Slesarev, Gwilym Newton, Shila Ofek-Koifman, Noam Slonim, Yoav Katz
Text classification can be useful in many real-world scenarios, saving a lot of time for end users.
1 code implementation • 18 May 2022 • Shachar Don-Yehiya, Leshem Choshen, Omri Abend
We show that this augmentation method can improve the performance of the Quality-Estimation task as well.
1 code implementation • 11 May 2022 • Leshem Choshen, Ofir Shifman, Omri Abend
In Grammatical Error Correction, systems are evaluated by the number of errors they correct.
2 code implementations • 6 Apr 2022 • Leshem Choshen, Elad Venezian, Noam Slonim, Yoav Katz
We also show that fusing is often better than intertraining.
1 code implementation • ACL 2022 • Eyal Shnarch, Ariel Gera, Alon Halfon, Lena Dankin, Leshem Choshen, Ranit Aharonov, Noam Slonim
In real-world scenarios, a text classification task often begins with a cold start, when labeled data is scarce.
no code implementations • *SEM (NAACL) 2022 • Aviv Slobodkin, Leshem Choshen, Omri Abend
We further show an additional gain when using both semantic and syntactic structures in some language pairs.
1 code implementation • 6 Oct 2021 • Gal Patel, Leshem Choshen, Omri Abend
We present a methodology that explores how sentence structure is reflected in neural representations of machine translation systems.
1 code implementation • ACL 2022 • Leshem Choshen, Guy Hacohen, Daphna Weinshall, Omri Abend
These findings suggest that there is some mutual inductive bias that underlies these models' learning of linguistic phenomena.
1 code implementation • 23 Aug 2021 • Leshem Choshen, Idan Amit
We present ComSum, a data set of 7 million commit messages for text summarization.
no code implementations • 1 Jun 2021 • Ofek Rafaeli, Omri Abend, Leshem Choshen, Dmitry Nikolaev
In this research paper, I will elaborate on a method to evaluate machine translation models based on their performance on underlying syntactic phenomena between English and Arabic.
1 code implementation • 16 Apr 2021 • Or Honovich, Leshem Choshen, Roee Aharoni, Ella Neeman, Idan Szpektor, Omri Abend
Neural knowledge-grounded generative models for dialogue often produce content that is factually inconsistent with the knowledge they rely on, making them unreliable and limiting their applicability.
1 code implementation • NAACL 2021 • Aviv Slobodkin, Leshem Choshen, Omri Abend
Probing neural models for the ability to perform downstream tasks using their activation patterns is often used to localize what parts of the network specialize in performing what tasks.
1 code implementation • LREC 2022 • Piyawat Lertvittayakumjorn, Leshem Choshen, Eyal Shnarch, Francesca Toni
Data exploration is an important step of every data science and machine learning project, including those involving textual data.
1 code implementation • 6 Apr 2021 • Leshem Choshen, Matanel Oren, Dmitry Nikolaev, Omri Abend
SERRANT is a system and code for automatic classification of English grammatical errors that combines SErCl and ERRANT.
1 code implementation • 29 Jan 2021 • Leshem Choshen, Omri Abend
Notwithstanding recent advances, syntactic generalization remains a challenge for text decoders.
no code implementations • 1 Jan 2021 • Eyal Shnarch, Ariel Gera, Alon Halfon, Lena Dankin, Leshem Choshen, Ranit Aharonov, Noam Slonim
In such low-resource scenarios, we suggest performing an unsupervised classification task prior to fine-tuning on the target task.
1 code implementation • CONLL 2020 • Leshem Choshen, Dmitry Nikolaev, Yevgeni Berzak, Omri Abend
We present a method for classifying syntactic errors in learner language, namely errors whose correction alters the morphosyntactic structure of a sentence.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Eyal Shnarch, Leshem Choshen, Guy Moshkowich, Noam Slonim, Ranit Aharonov
Approaching new data can be quite daunting; you do not know how your categories of interest are realized in it, there is commonly no labeled data at hand, and the performance of domain adaptation methods is unsatisfactory.
no code implementations • 25 Nov 2019 • Liat Ein-Dor, Eyal Shnarch, Lena Dankin, Alon Halfon, Benjamin Sznajder, Ariel Gera, Carlos Alzate, Martin Gleize, Leshem Choshen, Yufang Hou, Yonatan Bilu, Ranit Aharonov, Noam Slonim
One of the main tasks in argument mining is the retrieval of argumentative content pertaining to a given topic.
no code implementations • CONLL 2019 • Leshem Choshen, Omri Abend
We show that the state-of-the-art Transformer MT model is not biased towards monotonic reordering (unlike previous recurrent neural network models), but that nevertheless, long-distance dependencies remain a challenge for the model.
1 code implementation • 15 Sep 2019 • Leshem Choshen, Omri Abend
We show that the state-of-the-art Transformer Machine Translation (MT) model is not biased towards monotonic reordering (unlike previous recurrent neural network models), but that nevertheless, long-distance dependencies remain a challenge for the model.
no code implementations • ACL 2019 • Martin Gleize, Eyal Shnarch, Leshem Choshen, Lena Dankin, Guy Moshkowich, Ranit Aharonov, Noam Slonim
With the advancement in argument detection, we suggest to pay more attention to the challenging task of identifying the more convincing arguments.
no code implementations • ICLR 2020 • Leshem Choshen, Lior Fox, Zohar Aizenbud, Omri Abend
Reinforcement learning (RL) is frequently used to increase performance in text generation tasks, including machine translation (MT), notably through the use of Minimum Risk Training (MRT) and Generative Adversarial Networks (GAN).
1 code implementation • WS 2019 • Yoav Kantor, Yoav Katz, Leshem Choshen, Edo Cohen-Karlik, Naftali Liberman, Assaf Toledo, Amir Menczel, Noam Slonim
We also present a spellchecker created for this task which outperforms standard spellcheckers tested on the task of spellchecking.
Ranked #9 on Grammatical Error Correction on BEA-2019 (test)
no code implementations • ICML 2020 • Guy Hacohen, Leshem Choshen, Daphna Weinshall
We further show that this pattern of results reflects how neural networks learn benchmark datasets.
2 code implementations • ACL 2019 • Leshem Choshen, Dan Eldad, Daniel Hershcovich, Elior Sulem, Omri Abend
The non-indexed parts of the Internet (the Darknet) have become a haven for both legal and illegal anonymous activity.
no code implementations • SEMEVAL 2019 • Daniel Hershcovich, Zohar Aizenbud, Leshem Choshen, Elior Sulem, Ari Rappoport, Omri Abend
We present the SemEval 2019 shared task on UCCA parsing in English, German and French, and discuss the participating systems and results.
1 code implementation • ACL 2018 • Leshem Choshen, Omri Abend
The prevalent use of too few references for evaluating text-to-text generation is known to bias estimates of their quality (henceforth, low coverage bias or LCB).
no code implementations • ACL 2018 • Eyal Shnarch, Carlos Alzate, Lena Dankin, Martin Gleize, Yufang Hou, Leshem Choshen, Ranit Aharonov, Noam Slonim
We propose a methodology to blend high quality but scarce strong labeled data with noisy but abundant weak labeled data during the training of neural networks.
no code implementations • 31 May 2018 • Daniel Hershcovich, Leshem Choshen, Elior Sulem, Zohar Aizenbud, Ari Rappoport, Omri Abend
Given the success of recent semantic parsing shared tasks (on SDP and AMR), we expect the task to have a significant contribution to the advancement of UCCA parsing in particular, and semantic parsing in general.
1 code implementation • ACL 2018 • Leshem Choshen, Omri Abend
Metric validation in Grammatical Error Correction (GEC) is currently done by observing the correlation between human and metric-induced rankings.
1 code implementation • 30 Apr 2018 • Leshem Choshen, Omri Abend
The prevalent use of too few references for evaluating text-to-text generation is known to bias estimates of their quality (low coverage bias or LCB).
1 code implementation • NAACL 2018 • Leshem Choshen, Omri Abend
We propose USim, a semantic measure for Grammatical Error Correction (GEC) that measures the semantic faithfulness of the output to the source, thereby complementing existing reference-less measures (RLMs) for measuring the output's grammaticality.
1 code implementation • ICLR 2018 • Leshem Choshen, Lior Fox, Yonatan Loewenstein
We compare our approach to commonly used RL techniques, and show that using $E$-values improves learning and performance over traditional counters.