no code implementations • 16 Apr 2025 • Jack Merullo, Noah A. Smith, Sarah Wiegreffe, Yanai Elazar
Pretraining data has a direct impact on the behaviors and quality of language models (LMs), but we only understand the most basic principles of this relationship.
no code implementations • 9 Apr 2025 • Jiacheng Liu, Taylor Blanton, Yanai Elazar, Sewon Min, YenSung Chen, Arnavi Chheda-Kothary, Huy Tran, Byron Bischoff, Eric Marsh, Michael Schmitz, Cassidy Trier, Aaron Sarnat, Jenna James, Jon Borchardt, Bailey Kuehl, Evie Cheng, Karen Farley, Sruthi Sreeram, Taira Anderson, David Albright, Carissa Schoenick, Luca Soldaini, Dirk Groeneveld, Rock Yuren Pang, Pang Wei Koh, Noah A. Smith, Sophie Lebrecht, Yejin Choi, Hannaneh Hajishirzi, Ali Farhadi, Jesse Dodge
We present OLMoTrace, the first system that traces the outputs of language models back to their full, multi-trillion-token training data in real time.
no code implementations • 25 Feb 2025 • Shanshan Xu, T. Y. S. S Santosh, Yanai Elazar, Quirin Vogel, Barbara Plank, Matthias Grabmair
The increased adoption of Large Language Models (LLMs) and their potential to shape public opinion have sparked interest in assessing these models' political leanings.
no code implementations • 29 Oct 2024 • Royi Rassin, Aviv Slobodkin, Shauli Ravfogel, Yanai Elazar, Yoav Goldberg
GRADE leverages the world knowledge embedded in large language models and visual question-answering systems to identify relevant concept-specific axes of diversity (e.g., "shape" and "color" for the concept "cookie").
1 code implementation • 24 Oct 2024 • Lester James V. Miranda, Yizhong Wang, Yanai Elazar, Sachin Kumar, Valentina Pyatkin, Faeze Brahman, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi
We analyze features from the routing model to identify characteristics of instances that can benefit from human feedback, e.g., prompts with a moderate safety concern or moderate intent complexity.
1 code implementation • 19 Oct 2024 • Sahil Verma, Royi Rassin, Arnav Das, Gantavya Bhatt, Preethi Seshadri, Chirag Shah, Jeff Bilmes, Hannaneh Hajishirzi, Yanai Elazar
We seek to determine the point at which a model was trained on enough instances to imitate a concept -- the imitation threshold.
no code implementations • 31 Jul 2024 • Oscar Sainz, Iker García-Ferrero, Alon Jacovi, Jon Ander Campos, Yanai Elazar, Eneko Agirre, Yoav Goldberg, Wei-Lin Chen, Jenny Chim, Leshem Choshen, Luca D'Amico-Wong, Melissa Dell, Run-Ze Fan, Shahriar Golchin, Yucheng Li, PengFei Liu, Bhavish Pahwa, Ameya Prabhu, Suryansh Sharma, Emily Silcock, Kateryna Solonko, David Stap, Mihai Surdeanu, Yu-Min Tseng, Vishaal Udandarao, Zengzhi Wang, Ruijie Xu, Jinglin Yang
The workshop fostered a shared task to collect evidence on data contamination in currently available datasets and models.
no code implementations • 20 Jul 2024 • Xinyi Wang, Antonis Antoniades, Yanai Elazar, Alfonso Amayuelas, Alon Albalak, Kexun Zhang, William Yang Wang
Furthermore, while model performance improves across all tasks as LLM size increases, only factual question answering shows an increase in memorization, whereas machine translation and reasoning tasks exhibit greater generalization, producing more novel outputs.
no code implementations • 28 Jun 2024 • Chantal Shaib, Yanai Elazar, Junyi Jessy Li, Byron C. Wallace
Recent work on evaluating the diversity of text generated by LLMs has focused on word-level features.
1 code implementation • 18 Jun 2024 • William Merrill, Noah A. Smith, Yanai Elazar
In this work, we investigate the extent to which modern LMs generate $n$-grams from their training data, evaluating both (i) the probability LMs assign to complete training $n$-grams and (ii) $n$-novelty, the proportion of $n$-grams generated by an LM that did not appear in the training data (for arbitrarily large $n$).
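A minimal sketch of how the $n$-novelty quantity described above could be computed; the whitespace tokenization, toy corpus, and helper names here are illustrative assumptions rather than the paper's actual pipeline.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Iterable[Tuple[str, ...]]:
    """Yield all contiguous n-grams of a token sequence."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def n_novelty(generated: List[str], training_ngrams: Set[Tuple[str, ...]], n: int) -> float:
    """Proportion of n-grams in the generated text that never appear in the training data."""
    generated_ngrams = list(ngrams(generated, n))
    if not generated_ngrams:
        return 0.0
    novel = sum(1 for g in generated_ngrams if g not in training_ngrams)
    return novel / len(generated_ngrams)


# Toy example with whitespace tokenization (an assumption for illustration).
training_text = "the cat sat on the mat".split()
generated_text = "the cat sat on the sofa".split()
train_4grams = set(ngrams(training_text, 4))
print(n_novelty(generated_text, train_4grams, n=4))  # 1 of 3 generated 4-grams is novel -> ~0.33
```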
no code implementations • 2 Jun 2024 • Bar Iluz, Yanai Elazar, Asaf Yehudai, Gabriel Stanovsky
Most works on gender bias focus on intrinsic bias -- removing traces of information about a protected group from the model's internal representation.
1 code implementation • 26 Feb 2024 • Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, William Yang Wang
A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training.
no code implementations • 21 Feb 2024 • Qing Lyu, Kumar Shridhar, Chaitanya Malaviya, Li Zhang, Yanai Elazar, Niket Tandon, Marianna Apidianaki, Mrinmaya Sachan, Chris Callison-Burch
Accurately gauging the confidence level of Large Language Models' (LLMs) predictions is pivotal for their reliable application.
3 code implementations • 1 Feb 2024 • Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs.
1 code implementation • 31 Jan 2024 • Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training data impacts model capabilities and limitations.
no code implementations • 16 Dec 2023 • Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge
Evaluations of language models (LMs) commonly report perplexity on monolithic data held out from training.
no code implementations • 16 Nov 2023 • Yanai Elazar, Bhargavi Paranjape, Hao Peng, Sarah Wiegreffe, Khyathi Raghavi, Vivek Srikumar, Sameer Singh, Noah A. Smith
Previous work has found that datasets with paired inputs are prone to correlations between a specific part of the input (e.g., the hypothesis in NLI) and the label; consequently, models trained on only that part outperform chance.
1 code implementation • 31 Oct 2023 • Yanai Elazar, Akshita Bhagia, Ian Magnusson, Abhilasha Ravichander, Dustin Schwenk, Alane Suhr, Pete Walsh, Dirk Groeneveld, Luca Soldaini, Sameer Singh, Hanna Hajishirzi, Noah A. Smith, Jesse Dodge
We open-source WIMBD's code and artifacts to provide a standard set of evaluations for new text-based corpora and to encourage more analyses and transparency around them.
1 code implementation • 1 Aug 2023 • Preethi Seshadri, Sameer Singh, Yanai Elazar
Bias amplification is a phenomenon in which models exacerbate biases or stereotypes present in the training data.
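As a toy illustration of the phenomenon described above, one simple way to quantify it is to compare how skewed a model's predictions are toward a group with the skew already present in the training data; the difference-of-proportions measure and the hypothetical counts below are assumptions for illustration, not necessarily the metric used in the paper.

```python
from typing import List, Tuple


def cooccurrence_rate(pairs: List[Tuple[str, str]], attribute: str, group: str) -> float:
    """P(group | attribute): fraction of examples tagged with `attribute` whose label is `group`."""
    with_attr = [g for a, g in pairs if a == attribute]
    return with_attr.count(group) / len(with_attr) if with_attr else 0.0


# Hypothetical (activity, gender) pairs from training labels vs. model predictions.
train_pairs = [("cooking", "woman")] * 66 + [("cooking", "man")] * 34
pred_pairs = [("cooking", "woman")] * 84 + [("cooking", "man")] * 16

train_rate = cooccurrence_rate(train_pairs, "cooking", "woman")  # 0.66
pred_rate = cooccurrence_rate(pred_pairs, "cooking", "woman")    # 0.84

# Positive values mean the model's predictions are more skewed than its training data.
amplification = pred_rate - train_rate
print(f"bias amplification: {amplification:+.2f}")  # +0.18
```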
2 code implementations • 24 Jun 2023 • Yanai Elazar, Jiayao Zhang, David Wadden, Bo Zhang, Noah A. Smith
However, since quality is a challenging construct to estimate, we adopt the negative outcome control method, using paper citation count as a control variable to debias the confounding effect of quality.
1 code implementation • 26 May 2023 • Marius Mosbach, Tiago Pimentel, Shauli Ravfogel, Dietrich Klakow, Yanai Elazar
In this paper, we compare the generalization of few-shot fine-tuning and in-context learning to challenge datasets, while controlling for the models used, the number of examples, and the number of parameters, ranging from 125M to 30B.
no code implementations • 7 Mar 2023 • Amit Moryossef, Yanai Elazar, Yoav Goldberg
Piano fingering -- knowing which finger to use to play each note in a musical piece -- is a hard and important skill to master when learning to play the piano.
1 code implementation • 23 Oct 2022 • Elron Bandel, Yoav Goldberg, Yanai Elazar
While fine-tuned language models perform well on many tasks, they were also shown to rely on superficial surface features such as lexical overlap.
no code implementations • 12 Oct 2022 • Hongming Zhang, Yintong Huo, Yanai Elazar, Yangqiu Song, Yoav Goldberg, Dan Roth
We first align commonsense tasks with relevant knowledge from commonsense knowledge bases and ask humans to annotate whether that knowledge is sufficient.
no code implementations • 6 Oct 2022 • Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin
We present a taxonomy for characterising and understanding generalisation research in NLP.
no code implementations • 28 Jul 2022 • Yanai Elazar, Nora Kassner, Shauli Ravfogel, Amir Feder, Abhilasha Ravichander, Marius Mosbach, Yonatan Belinkov, Hinrich Schütze, Yoav Goldberg
Our causal framework and our results demonstrate the importance of studying datasets and the benefits of causality for understanding NLP models.
1 code implementation • 24 Sep 2021 • Yanai Elazar, Victoria Basmov, Yoav Goldberg, Reut Tsarfaty
Understanding the relations between entities denoted by NPs in a text is a critical part of human-like natural language understanding.
no code implementations • 17 Apr 2021 • Ofer Sabo, Yanai Elazar, Yoav Goldberg, Ido Dagan
We explore Few-Shot Learning (FSL) for Relation Classification (RC).
no code implementations • EMNLP 2021 • Yanai Elazar, Hongming Zhang, Yoav Goldberg, Dan Roth
To support this claim, we first show that the current evaluation method of the Winograd Schema (WS) is sub-optimal and propose a modification that uses twin sentences for evaluation.
Ranked #24 on Coreference Resolution on Winograd Schema Challenge
1 code implementation • EMNLP 2021 • Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg
Our method is based on projecting model representation to a latent space that captures only the features that are useful (to the model) to differentiate two potential decisions.
1 code implementation • 1 Feb 2021 • Yanai Elazar, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Eduard Hovy, Hinrich Schütze, Yoav Goldberg
In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge?
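A minimal sketch of one way the consistency question above can be probed: query a masked LM with two paraphrases of the same factual statement and check whether the top predictions agree. The model checkpoint and prompts are assumptions for illustration.

```python
# pip install transformers torch
from transformers import pipeline

# Hypothetical setup: any masked-LM checkpoint works here.
fill = pipeline("fill-mask", model="bert-base-cased")

# Two paraphrases of the same factual query; a consistent PLM should answer both the same way.
paraphrases = [
    "Albert Einstein was born in [MASK].",
    "Albert Einstein is originally from [MASK].",
]

top_predictions = [fill(p)[0]["token_str"].strip() for p in paraphrases]
consistent = len(set(top_predictions)) == 1
print(top_predictions, "consistent:", consistent)
```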
1 code implementation • EACL 2021 • Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah
Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language, not seen during the fine-tuning.
1 code implementation • 16 Oct 2020 • Hila Gonen, Shauli Ravfogel, Yanai Elazar, Yoav Goldberg
Recent works have demonstrated that multilingual BERT (mBERT) learns rich cross-lingual representations that allow for transfer across languages.
no code implementations • EMNLP (insights) 2020 • Yanai Elazar, Victoria Basmov, Shauli Ravfogel, Yoav Goldberg, Reut Tsarfaty
In this work, we follow known methodologies of collecting labeled data for the complement coercion phenomenon.
no code implementations • EMNLP (BlackboxNLP) 2020 • Xikun Zhang, Deepak Ramachandran, Ian Tenney, Yanai Elazar, Dan Roth
Pretrained Language Models (LMs) have been shown to possess significant linguistic, common sense, and factual knowledge.
1 code implementation • EMNLP (BlackboxNLP) 2020 • Shauli Ravfogel, Yanai Elazar, Jacob Goldberger, Yoav Goldberg
Contextualized word representations, such as ELMo and BERT, were shown to perform well on various semantic and syntactic tasks.
no code implementations • 1 Jun 2020 • Yanai Elazar, Shauli Ravfogel, Alon Jacovi, Yoav Goldberg
In this work, we point out the inability to infer behavioral conclusions from probing results and offer an alternative method that focuses on how the information is being used, rather than on what information is encoded.
2 code implementations • ACL 2020 • Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, Yoav Goldberg
The ability to control for the kinds of information encoded in neural representation has a variety of use cases, especially in light of the challenge of interpreting these models.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
2 code implementations • 31 Dec 2019 • Alon Talmor, Yanai Elazar, Yoav Goldberg, Jonathan Berant
A fundamental challenge is to understand whether the performance of a LM on a task should be attributed to the pre-trained representations or to the process of fine-tuning on the task data.
no code implementations • IJCNLP 2019 • Maria Barrett, Yova Kementchedjhieva, Yanai Elazar, Desmond Elliott, Anders Søgaard
Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on.
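A minimal sketch of a generic diagnostic-classifier (probing) setup like the one referenced above: fit a simple classifier on frozen representations to predict a protected attribute and check whether it beats chance on held-out data. The random features below stand in for real model representations and are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for frozen model representations (X) and a binary protected attribute (y).
X = rng.normal(size=(1000, 64))
y = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Diagnostic classifier: above-chance held-out accuracy means the attribute is still extractable.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out probe accuracy: {probe.score(X_test, y_test):.2f}")  # ~0.5 here, since X is random
```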
no code implementations • 25 Sep 2019 • Amit Moryossef, Yanai Elazar, Yoav Goldberg
Automatic Piano Fingering is a hard task that computers can learn from data.
1 code implementation • ACL 2019 • Yanai Elazar, Abhijit Mahabal, Deepak Ramachandran, Tania Bedrax-Weiss, Dan Roth
Most current NLP systems have little knowledge about quantitative attributes of objects and events.
1 code implementation • 26 May 2019 • Yanai Elazar, Yoav Goldberg
We provide the first computational treatment of fused-heads constructions (FH), focusing on the numeric fused-heads (NFH).
Ranked #1 on Missing Elements on Numeric Fused-Head (dev)
1 code implementation • EMNLP 2018 • Yanai Elazar, Yoav Goldberg
Recent advances in Representation Learning and Adversarial Training seem to succeed in removing unwanted features from the learned representation.
no code implementations • 10 Jul 2018 • Yehezkel S. Resheff, Yanai Elazar, Moni Shahar, Oren Sar Shalom
Latent factor models for recommender systems represent users and items as low dimensional vectors.