Search Results for author: Yonatan Belinkov

Found 101 papers, 57 papers with code

REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space

no code implementations13 Jun 2024 Tomer Ashuach, Martin Tutek, Yonatan Belinkov

REVS identifies and modifies a small subset of neurons relevant for each piece of sensitive information.

DEPTH: Discourse Education through Pre-Training Hierarchically

1 code implementation13 May 2024 Zachary Bamberger, Ofek Glick, Chaim Baskin, Yonatan Belinkov

Language Models (LMs) often struggle with linguistic understanding at the discourse level, even though discourse patterns such as coherence, cohesion, and narrative flow are prevalent in their pre-training data.

Decoder Natural Language Understanding +1

Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs

1 code implementation15 Apr 2024 Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov

In this work, we first introduce an approach for constructing datasets based on the model knowledge for detection and intervention methods in closed-book and open-book question-answering settings.

Hallucination Language Modelling +1

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms

1 code implementation26 Mar 2024 Michael Hanna, Sandro Pezzelle, Yonatan Belinkov

Most studies determine which edges belong in a LM's circuit by performing causal interventions on each edge independently, but this scales poorly with model size.

Language Modelling

Concept-Best-Matching: Evaluating Compositionality in Emergent Communication

no code implementations17 Mar 2024 Boaz Carmeli, Yonatan Belinkov, Ron Meir

Artificial agents that learn to communicate in order to accomplish a given task acquire communication protocols that are typically opaque to a human.

Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information

no code implementations14 Mar 2024 Shadi Iskander, Kira Radinsky, Yonatan Belinkov

Mitigating social biases typically requires identifying the social groups associated with each data sample.

Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines

no code implementations9 Mar 2024 Michael Toker, Hadas Orgad, Mor Ventura, Dana Arad, Yonatan Belinkov

Text-to-image diffusion models (T2I) use a latent representation of a text prompt to guide the image generation process.

Image Generation Retrieval

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

no code implementations22 Feb 2024 Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau

We identify the mechanism that enables entity tracking and show that (i) in both the original model and its fine-tuned versions primarily the same circuit implements entity tracking.

Code Generation Instruction Following

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space

no code implementations20 Feb 2024 Shahar Katz, Yonatan Belinkov, Mor Geva, Lior Wolf

Understanding how Transformer-based Language Models (LMs) learn and recall information is a key goal of the deep learning community.

Language Modelling

Accelerating the Global Aggregation of Local Explanations

1 code implementation13 Dec 2023 Alon Mor, Yonatan Belinkov, Benny Kimelfeld

% We devise techniques for accelerating the global aggregation of the Anchor algorithm.

When Language Models Fall in Love: Animacy Processing in Transformer Language Models

1 code implementation23 Oct 2023 Michael Hanna, Yonatan Belinkov, Sandro Pezzelle

However, we also show that even when presented with stories about atypically animate entities, such as a peanut in love, LMs adapt: they treat these entities as animate, though they do not adapt as well as humans.

Unified Concept Editing in Diffusion Models

1 code implementation25 Aug 2023 Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, David Bau

Text-to-image models suffer from various safety issues that may limit their suitability for deployment.

Linearity of Relation Decoding in Transformer Language Models

1 code implementation17 Aug 2023 Evan Hernandez, Arnab Sen Sharma, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas, Yonatan Belinkov, David Bau

Linear relation representations may be obtained by constructing a first-order approximation to the LM from a single prompt, and they exist for a variety of factual, commonsense, and linguistic relations.


Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias

1 code implementation1 Aug 2023 Itay Itzhak, Gabriel Stanovsky, Nir Rosenfeld, Yonatan Belinkov

Recent studies show that instruction tuning (IT) and reinforcement learning from human feedback (RLHF) improve the abilities of large language models (LMs) dramatically.

Decision Making

Generating Benchmarks for Factuality Evaluation of Language Models

2 code implementations13 Jul 2023 Dor Muhlgay, Ori Ram, Inbal Magar, Yoav Levine, Nir Ratner, Yonatan Belinkov, Omri Abend, Kevin Leyton-Brown, Amnon Shashua, Yoav Shoham

FACTOR automatically transforms a factual corpus of interest into a benchmark evaluating an LM's propensity to generate true facts from the corpus vs. similar but incorrect statements.

Language Modelling Retrieval

ReFACT: Updating Text-to-Image Models by Editing the Text Encoder

1 code implementation1 Jun 2023 Dana Arad, Hadas Orgad, Yonatan Belinkov

Our world is marked by unprecedented technological, global, and socio-political transformations, posing a significant challenge to text-to-image generative models.

Image Generation

A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis

1 code implementation24 May 2023 Alessandro Stolfo, Yonatan Belinkov, Mrinmaya Sachan

Mathematical reasoning in large language models (LMs) has garnered significant attention in recent work, but there is a limited understanding of how these models process and store information related to arithmetic tasks within their architecture.

Arithmetic Reasoning Mathematical Reasoning +2

VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers

2 code implementations22 May 2023 Shahar Katz, Yonatan Belinkov

Recent advances in interpretability suggest we can project weights and hidden states of transformer-based language models (LMs) to their vocabulary, a transformation that makes them more human interpretable.

Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection

1 code implementation17 May 2023 Shadi Iskander, Kira Radinsky, Yonatan Belinkov

In this work, we propose Iterative Gradient-Based Projection (IGBP), a novel method for removing non-linear encoded concepts from neural representations.


ContraSim -- A Similarity Measure Based on Contrastive Learning

1 code implementation29 Mar 2023 Adir Rahamim, Yonatan Belinkov

In contrast to common closed-form similarity measures, ContraSim learns a parameterized measure by using both similar and dissimilar examples.

Contrastive Learning

Editing Implicit Assumptions in Text-to-Image Diffusion Models

1 code implementation ICCV 2023 Hadas Orgad, Bahjat Kawar, Yonatan Belinkov

Our Text-to-Image Model Editing method, TIME for short, receives a pair of inputs: a "source" under-specified prompt for which the model makes an implicit assumption (e. g., "a pack of roses"), and a "destination" prompt that describes the same setting, but with a specified desired attribute (e. g., "a pack of blue roses").

Attribute Model Editing

Parallel Context Windows for Large Language Models

1 code implementation21 Dec 2022 Nir Ratner, Yoav Levine, Yonatan Belinkov, Ori Ram, Inbal Magar, Omri Abend, Ehud Karpas, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham

We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training.

In-Context Learning Playing the Game of 2048 +2

BLIND: Bias Removal With No Demographics

1 code implementation20 Dec 2022 Hadas Orgad, Yonatan Belinkov

Common methods to mitigate biases require prior information on the types of biases that should be mitigated (e. g., gender or racial bias) and the social groups associated with each data sample.

Sentiment Analysis Sentiment Classification

What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary

1 code implementation20 Dec 2022 Ori Ram, Liat Bezalel, Adi Zicher, Yonatan Belinkov, Jonathan Berant, Amir Globerson

We leverage this insight and propose a simple way to enrich query and passage representations with lexical information at inference time, and show that this significantly improves performance compared to the original model in zero-shot settings, and specifically on the BEIR benchmark.


BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

6 code implementations9 Nov 2022 BigScience Workshop, :, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, Dragomir Radev, Eduardo González Ponferrada, Efrat Levkovizh, Ethan Kim, Eyal Bar Natan, Francesco De Toni, Gérard Dupont, Germán Kruszewski, Giada Pistilli, Hady Elsahar, Hamza Benyamina, Hieu Tran, Ian Yu, Idris Abdulmumin, Isaac Johnson, Itziar Gonzalez-Dios, Javier de la Rosa, Jenny Chim, Jesse Dodge, Jian Zhu, Jonathan Chang, Jörg Frohberg, Joseph Tobing, Joydeep Bhattacharjee, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro von Werra, Leon Weber, Long Phan, Loubna Ben allal, Ludovic Tanguy, Manan Dey, Manuel Romero Muñoz, Maraim Masoud, María Grandury, Mario Šaško, Max Huang, Maximin Coavoux, Mayank Singh, Mike Tian-Jian Jiang, Minh Chien Vu, Mohammad A. Jauhar, Mustafa Ghaleb, Nishant Subramani, Nora Kassner, Nurulaqilla Khamis, Olivier Nguyen, Omar Espejel, Ona de Gibert, Paulo Villegas, Peter Henderson, Pierre Colombo, Priscilla Amuok, Quentin Lhoest, Rheza Harliman, Rishi Bommasani, Roberto Luis López, Rui Ribeiro, Salomey Osei, Sampo Pyysalo, Sebastian Nagel, Shamik Bose, Shamsuddeen Hassan Muhammad, Shanya Sharma, Shayne Longpre, Somaieh Nikpoor, Stanislav Silberberg, Suhas Pai, Sydney Zink, Tiago Timponi Torrent, Timo Schick, Tristan Thrush, Valentin Danchev, Vassilina Nikoulina, Veronika Laippala, Violette Lepercq, Vrinda Prabhu, Zaid Alyafeai, Zeerak Talat, Arun Raja, Benjamin Heinzerling, Chenglei Si, Davut Emre Taşar, Elizabeth Salesky, Sabrina J. Mielke, Wilson Y. Lee, Abheesht Sharma, Andrea Santilli, Antoine Chaffin, Arnaud Stiegler, Debajyoti Datta, Eliza Szczechla, Gunjan Chhablani, Han Wang, Harshit Pandey, Hendrik Strobelt, Jason Alan Fries, Jos Rozen, Leo Gao, Lintang Sutawika, M Saiful Bari, Maged S. Al-shaibani, Matteo Manica, Nihal Nayak, Ryan Teehan, Samuel Albanie, Sheng Shen, Srulik Ben-David, Stephen H. Bach, Taewoon Kim, Tali Bers, Thibault Fevry, Trishala Neeraj, Urmish Thakker, Vikas Raunak, Xiangru Tang, Zheng-Xin Yong, Zhiqing Sun, Shaked Brody, Yallow Uri, Hadar Tojarieh, Adam Roberts, Hyung Won Chung, Jaesung Tae, Jason Phang, Ofir Press, Conglong Li, Deepak Narayanan, Hatim Bourfoune, Jared Casper, Jeff Rasley, Max Ryabinin, Mayank Mishra, Minjia Zhang, Mohammad Shoeybi, Myriam Peyrounette, Nicolas Patry, Nouamane Tazi, Omar Sanseviero, Patrick von Platen, Pierre Cornette, Pierre François Lavallée, Rémi Lacroix, Samyam Rajbhandari, Sanchit Gandhi, Shaden Smith, Stéphane Requena, Suraj Patil, Tim Dettmers, Ahmed Baruwa, Amanpreet Singh, Anastasia Cheveleva, Anne-Laure Ligozat, Arjun Subramonian, Aurélie Névéol, Charles Lovering, Dan Garrette, Deepak Tunuguntla, Ehud Reiter, Ekaterina Taktasheva, Ekaterina Voloshina, Eli Bogdanov, Genta Indra Winata, Hailey Schoelkopf, Jan-Christoph Kalo, Jekaterina Novikova, Jessica Zosa Forde, Jordan Clive, Jungo Kasai, Ken Kawamura, Liam Hazan, Marine Carpuat, Miruna Clinciu, Najoung Kim, Newton Cheng, Oleg Serikov, Omer Antverg, Oskar van der Wal, Rui Zhang, Ruochen Zhang, Sebastian Gehrmann, Shachar Mirkin, Shani Pais, Tatiana Shavrina, Thomas Scialom, Tian Yun, Tomasz Limisiewicz, Verena Rieser, Vitaly Protasov, Vladislav Mikhailov, Yada Pruksachatkun, Yonatan Belinkov, Zachary Bamberger, Zdeněk Kasner, Alice Rueda, Amanda Pestana, Amir Feizpour, Ammar Khan, Amy Faranak, Ana Santos, Anthony Hevia, Antigona Unldreaj, Arash Aghagol, Arezoo Abdollahi, Aycha Tammour, Azadeh HajiHosseini, Bahareh Behroozi, Benjamin Ajibade, Bharat Saxena, Carlos Muñoz Ferrandis, Daniel McDuff, Danish Contractor, David Lansky, Davis David, Douwe Kiela, Duong A. Nguyen, Edward Tan, Emi Baylor, Ezinwanne Ozoani, Fatima Mirza, Frankline Ononiwu, Habib Rezanejad, Hessie Jones, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, Isar Nejadgholi, Jesse Passmore, Josh Seltzer, Julio Bonis Sanz, Livia Dutra, Mairon Samagaio, Maraim Elbadri, Margot Mieskes, Marissa Gerchick, Martha Akinlolu, Michael McKenna, Mike Qiu, Muhammed Ghauri, Mykola Burynok, Nafis Abrar, Nazneen Rajani, Nour Elkott, Nour Fahmy, Olanrewaju Samuel, Ran An, Rasmus Kromann, Ryan Hao, Samira Alizadeh, Sarmad Shubber, Silas Wang, Sourav Roy, Sylvain Viguier, Thanh Le, Tobi Oyebade, Trieu Le, Yoyo Yang, Zach Nguyen, Abhinav Ramesh Kashyap, Alfredo Palasciano, Alison Callahan, Anima Shukla, Antonio Miranda-Escalada, Ayush Singh, Benjamin Beilharz, Bo wang, Caio Brito, Chenxi Zhou, Chirag Jain, Chuxin Xu, Clémentine Fourrier, Daniel León Periñán, Daniel Molano, Dian Yu, Enrique Manjavacas, Fabio Barth, Florian Fuhrimann, Gabriel Altay, Giyaseddin Bayrak, Gully Burns, Helena U. Vrabec, Imane Bello, Ishani Dash, Jihyun Kang, John Giorgi, Jonas Golde, Jose David Posada, Karthik Rangasai Sivaraman, Lokesh Bulchandani, Lu Liu, Luisa Shinzato, Madeleine Hahn de Bykhovetz, Maiko Takeuchi, Marc Pàmies, Maria A Castillo, Marianna Nezhurina, Mario Sänger, Matthias Samwald, Michael Cullan, Michael Weinberg, Michiel De Wolf, Mina Mihaljcic, Minna Liu, Moritz Freidank, Myungsun Kang, Natasha Seelam, Nathan Dahlberg, Nicholas Michio Broad, Nikolaus Muellner, Pascale Fung, Patrick Haller, Ramya Chandrasekhar, Renata Eisenberg, Robert Martin, Rodrigo Canalli, Rosaline Su, Ruisi Su, Samuel Cahyawijaya, Samuele Garda, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Simon Ott, Sinee Sang-aroonsiri, Srishti Kumar, Stefan Schweter, Sushil Bharati, Tanmay Laud, Théo Gigant, Tomoya Kainuma, Wojciech Kusa, Yanis Labrak, Yash Shailesh Bajaj, Yash Venkatraman, Yifan Xu, Yingxin Xu, Yu Xu, Zhe Tan, Zhongli Xie, Zifan Ye, Mathilde Bras, Younes Belkada, Thomas Wolf

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions.

Decoder Language Modelling +1

Emergent Quantized Communication

no code implementations4 Nov 2022 Boaz Carmeli, Ron Meir, Yonatan Belinkov

The field of emergent communication aims to understand the characteristics of communication as it emerges from artificial agents solving tasks that require information exchange.


Choose Your Lenses: Flaws in Gender Bias Evaluation

no code implementations NAACL (GeBNLP) 2022 Hadas Orgad, Yonatan Belinkov

In this position paper, we assess the current paradigm of gender bias evaluation and identify several flaws in it.

Measures of Information Reflect Memorization Patterns

no code implementations17 Oct 2022 Rachit Bansal, Danish Pruthi, Yonatan Belinkov

In this work, we hypothesize -- and subsequently show -- that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization.

Memorization Model Selection

Mass-Editing Memory in a Transformer

2 code implementations13 Oct 2022 Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, David Bau

Recent work has shown exciting promise in updating large language models with new memories, so as to replace obsolete information or add specialized knowledge.

Language Modelling

Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions

no code implementations28 Jul 2022 Yanai Elazar, Nora Kassner, Shauli Ravfogel, Amir Feder, Abhilasha Ravichander, Marius Mosbach, Yonatan Belinkov, Hinrich Schütze, Yoav Goldberg

Our causal framework and our results demonstrate the importance of studying datasets and the benefits of causality for understanding NLP models.

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

4 code implementations9 Jun 2022 Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning Math +1

IDANI: Inference-time Domain Adaptation via Neuron-level Interventions

1 code implementation DeepLo 2022 Omer Antverg, Eyal Ben-David, Yonatan Belinkov

We propose a new approach for domain adaptation (DA), using neuron-level interventions: We modify the representation of each test example in specific neurons, resulting in a counterfactual example from the source domain, which the model is more familiar with.

counterfactual Domain Adaptation

How Gender Debiasing Affects Internal Model Representations, and Why It Matters

2 code implementations NAACL 2022 Hadas Orgad, Seraphina Goldfarb-Tarrant, Yonatan Belinkov

Common studies of gender bias in NLP focus either on extrinsic bias measured by model performance on a downstream task or on intrinsic bias found in models' internal representations.

Locating and Editing Factual Associations in GPT

2 code implementations10 Feb 2022 Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov

To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME).

counterfactual Model Editing +2

On the Pitfalls of Analyzing Individual Neurons in Language Models

2 code implementations ICLR 2022 Omer Antverg, Yonatan Belinkov

Among these, the common approach is to use an external probe to rank neurons according to their relevance to some linguistic attribute, and to evaluate the obtained ranking using the same probe that produced it.


Variance Pruning: Pruning Language Models via Temporal Neuron Variance

no code implementations29 Sep 2021 Berry Weinstein, Yonatan Belinkov

As language models become larger, different pruning methods have been proposed to reduce model size.

Natural Language Understanding

A Generative Approach for Mitigating Structural Biases in Natural Language Inference

1 code implementation *SEM (NAACL) 2022 Dimion Asael, Zachary Ziegler, Yonatan Belinkov

Many natural language inference (NLI) datasets contain biases that allow models to perform well by only using a biased subset of the input, without considering the remainder features.

Natural Language Inference

Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models

1 code implementation ACL 2021 Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart Shieber, Tal Linzen, Yonatan Belinkov

Targeted syntactic evaluations have demonstrated the ability of language models to perform subject-verb agreement given difficult contexts.


Variational Information Bottleneck for Effective Low-Resource Fine-Tuning

1 code implementation ICLR 2021 Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson

Moreover, we show that our VIB model finds sentence representations that are more robust to biases in natural language inference datasets, and thereby obtains better generalization to out-of-domain datasets.

Natural Language Inference Sentence +1

Supervising Model Attention with Human Explanations for Robust Natural Language Inference

1 code implementation16 Apr 2021 Joe Stacey, Yonatan Belinkov, Marek Rei

Natural Language Inference (NLI) models are known to learn from biases and artefacts within their training data, impacting how well they generalise to other unseen datasets.

Natural Language Inference

Probing Classifiers: Promises, Shortcomings, and Advances

no code implementations CL (ACL) 2022 Yonatan Belinkov

Probing classifiers have emerged as one of the prominent methodologies for interpreting and analyzing deep neural network models of natural language processing.

Learning from others' mistakes: Avoiding dataset biases without modeling them

no code implementations ICLR 2021 Victor Sanh, Thomas Wolf, Yonatan Belinkov, Alexander M. Rush

State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended underlying task.

Similarity Analysis of Self-Supervised Speech Representations

no code implementations22 Oct 2020 Yu-An Chung, Yonatan Belinkov, James Glass

We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations.

Representation Learning

Analyzing Individual Neurons in Pre-trained Language Models

1 code implementation EMNLP 2020 Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Yonatan Belinkov

We found small subsets of neurons to predict linguistic tasks, with lower level tasks (such as morphology) localized in fewer neurons, compared to higher level task of predicting syntax.

Interpretability and Analysis in Neural NLP

no code implementations ACL 2020 Yonatan Belinkov, Sebastian Gehrmann, Ellie Pavlick

While deep learning has transformed the natural language processing (NLP) field and impacted the larger computational linguistics community, the rise of neural networks is stained by their opaque nature: It is challenging to interpret the inner workings of neural network models, and explicate their behavior.

Probing Neural Dialog Models for Conversational Understanding

1 code implementation WS 2020 Abdelrhman Saleh, Tovly Deutsch, Stephen Casper, Yonatan Belinkov, Stuart Shieber

The predominant approach to open-domain dialog generation relies on end-to-end training of neural models on chat datasets.

Open-Domain Dialog

The Sensitivity of Language Models and Humans to Winograd Schema Perturbations

2 code implementations ACL 2020 Mostafa Abdou, Vinit Ravishankar, Maria Barrett, Yonatan Belinkov, Desmond Elliott, Anders Søgaard

Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of common sense reasoning ability.

Common Sense Reasoning

Similarity Analysis of Contextual Word Representation Models

1 code implementation ACL 2020 John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James Glass

We use existing and novel similarity measures that aim to gauge the level of localization of information in the deep models, and facilitate the investigation of which design factors affect model similarity, without requiring any external linguistic annotation.

Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?

no code implementations EACL 2021 Abhilasha Ravichander, Yonatan Belinkov, Eduard Hovy

Although neural models have achieved impressive results on several NLP benchmarks, little is understood about the mechanisms they use to perform language tasks.

Natural Language Inference Sentence +1

Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias

1 code implementation26 Apr 2020 Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Simas Sakenis, Jason Huang, Yaron Singer, Stuart Shieber

Common methods for interpreting neural models in natural language processing typically examine either their structure or their behavior, but not both.

Analyzing Redundancy in Pretrained Transformer Models

1 code implementation EMNLP 2020 Fahim Dalvi, Hassan Sajjad, Nadir Durrani, Yonatan Belinkov

Transformer-based deep NLP models are trained using hundreds of millions of parameters, limiting their applicability in computationally constrained environments.

Transfer Learning

Memory-Augmented Recurrent Neural Networks Can Learn Generalized Dyck Languages

2 code implementations8 Nov 2019 Mirac Suzgun, Sebastian Gehrmann, Yonatan Belinkov, Stuart M. Shieber

We introduce three memory-augmented Recurrent Neural Networks (MARNNs) and explore their capabilities on a series of simple language modeling tasks whose solutions require stack-based mechanisms.

Language Modelling

A Constructive Prediction of the Generalization Error Across Scales

no code implementations ICLR 2020 Jonathan S. Rosenfeld, Amir Rosenfeld, Yonatan Belinkov, Nir Shavit

In this work, we present a functional form which approximates well the generalization error in practice.

End-to-End Bias Mitigation by Modelling Biases in Corpora

2 code implementations ACL 2020 Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson

We experiment on large-scale natural language inference and fact verification benchmarks, evaluating on out-of-domain datasets that are specifically designed to assess the robustness of models against known biases in the training data.

Fact Verification Natural Language Inference +1

Don't Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference

1 code implementation ACL 2019 Yonatan Belinkov, Adam Poliak, Stuart M. Shieber, Benjamin Van Durme, Alexander M. Rush

In contrast to standard approaches to NLI, our methods predict the probability of a premise given a hypothesis and NLI label, discouraging models from ignoring the premise.

Natural Language Inference

Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition

1 code implementation9 Jul 2019 Yonatan Belinkov, Ahmed Ali, James Glass

End-to-end neural network systems for automatic speech recognition (ASR) are trained from acoustic features to text transcriptions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects

1 code implementation NAACL 2019 Gabriel Grand, Yonatan Belinkov

Visual question answering (VQA) models have been shown to over-rely on linguistic biases in VQA datasets, answering questions "blindly" without considering visual context.

Question Answering Visual Question Answering

LSTM Networks Can Perform Dynamic Counting

no code implementations WS 2019 Mirac Suzgun, Sebastian Gehrmann, Yonatan Belinkov, Stuart M. Shieber

In this paper, we systematically assess the ability of standard recurrent networks to perform dynamic counting and to encode hierarchical representations.

Analyzing the Structure of Attention in a Transformer Language Model

no code implementations WS 2019 Jesse Vig, Yonatan Belinkov

The Transformer is a fully attention-based alternative to recurrent networks that has achieved state-of-the-art results across a range of NLP tasks.

Language Modelling

One Size Does Not Fit All: Comparing NMT Representations of Different Granularities

no code implementations NAACL 2019 Nadir Durrani, Fahim Dalvi, Hassan Sajjad, Yonatan Belinkov, Preslav Nakov

Recent work has shown that contextualized word representations derived from neural machine translation are a viable alternative to such from simple word predictions tasks.

Machine Translation NMT +1

Linguistic Knowledge and Transferability of Contextual Representations

no code implementations NAACL 2019 Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith

Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language.

Language Modelling

Character-based Surprisal as a Model of Reading Difficulty in the Presence of Error

no code implementations2 Feb 2019 Michael Hahn, Frank Keller, Yonatan Bisk, Yonatan Belinkov

Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones.

NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks

2 code implementations21 Dec 2018 Fahim Dalvi, Avery Nortonsmith, D. Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, James Glass

We present a toolkit to facilitate the interpretation and understanding of neural network models.

What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models

1 code implementation21 Dec 2018 Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, Anthony Bau, James Glass

We further present a comprehensive analysis of neurons with the aim to address the following questions: i) how localized or distributed are different linguistic properties in the models?

Language Modelling Machine Translation +1

Analysis Methods in Neural Language Processing: A Survey

no code implementations TACL 2019 Yonatan Belinkov, James Glass

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems.

On Evaluating the Generalization of LSTM Models in Formal Languages

1 code implementation WS 2019 Mirac Suzgun, Yonatan Belinkov, Stuart M. Shieber

Recurrent Neural Networks (RNNs) are theoretically Turing-complete and established themselves as a dominant model for language processing.

Studying the History of the Arabic Language: Language Technology and a Large-Scale Historical Corpus

1 code implementation11 Sep 2018 Yonatan Belinkov, Alexander Magidow, Alberto Barrón-Cedeño, Avi Shmidman, Maxim Romanov

Arabic is a widely-spoken language with a long and rich history, but existing corpora and language technology focus mostly on modern Arabic and its varieties.

On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference

1 code implementation NAACL 2018 Adam Poliak, Yonatan Belinkov, James Glass, Benjamin Van Durme

We propose a process for investigating the extent to which sentence representations arising from neural machine translation (NMT) systems encode distinct semantic phenomena.

Machine Translation Natural Language Inference +4

Synthetic and Natural Noise Both Break Neural Machine Translation

3 code implementations ICLR 2018 Yonatan Belinkov, Yonatan Bisk

Character-based neural machine translation (NMT) models alleviate out-of-vocabulary issues, learn morphology, and move us closer to completely end-to-end translation systems.

Machine Translation NMT +1

Understanding and Improving Morphological Learning in the Neural Machine Translation Decoder

no code implementations IJCNLP 2017 Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, Stephan Vogel

End-to-end training makes the neural machine translation (NMT) architecture simpler, yet elegant compared to traditional statistical machine translation (SMT).

Decoder Machine Translation +3

Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

1 code implementation NeurIPS 2017 Yonatan Belinkov, James Glass

In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Neural Machine Translation Training in a Multi-Domain Scenario

no code implementations IWSLT 2017 Hassan Sajjad, Nadir Durrani, Fahim Dalvi, Yonatan Belinkov, Stephan Vogel

Model stacking works best when training begins with the furthest out-of-domain data and the model is incrementally fine-tuned with the next furthest domain and so on.

Machine Translation Translation

What do Neural Machine Translation Models Learn about Morphology?

1 code implementation ACL 2017 Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James Glass

Neural machine translation (MT) models obtain state-of-the-art performance while maintaining a simple, end-to-end architecture.

Decoder Machine Translation +2

Shamela: A Large-Scale Historical Arabic Corpus

no code implementations WS 2016 Yonatan Belinkov, Alexander Magidow, Maxim Romanov, Avi Shmidman, Moshe Koppel

Arabic is a widely-spoken language with a rich and long history spanning more than fourteen centuries.

Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results

no code implementations25 Sep 2016 Yonatan Belinkov, James Glass

Machine translation between Arabic and Hebrew has so far been limited by a lack of parallel corpora, despite the political and cultural importance of this language pair.

Machine Translation Translation

Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

3 code implementations15 Aug 2016 Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, Yoav Goldberg

The analysis sheds light on the relative strengths of different sentence embedding methods with respect to these low level prediction tasks, and on the effect of the encoded vector's dimensionality on the resulting representations.

Sentence Sentence Embedding +1

Cannot find the paper you are looking for? You can Submit a new open access paper.