Search Results for author: Behnam Neyshabur

Found 44 papers, 20 papers with code

Understanding the effect of sparsity on neural networks robustness

no code implementations 22 Jun 2022 Lukas Timpl, Rahim Entezari, Hanie Sedghi, Behnam Neyshabur, Olga Saukh

This paper examines the impact of static sparsity on the robustness of a trained network to weight perturbations, data corruption, and adversarial examples.
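
This paper reports an empirical study; as a rough illustration of one of the probes it describes, here is a minimal sketch (not the authors' code: the tiny model, the random placeholder data, and the sparsity and noise levels are all assumptions) that magnitude-prunes a small network and measures its loss under Gaussian perturbations of the surviving weights.

```python
# Hypothetical sketch: magnitude-prune an MLP, then probe robustness by measuring
# how the loss changes when Gaussian noise is added to the remaining (nonzero) weights.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))   # placeholder data
loss_fn = nn.CrossEntropyLoss()

def magnitude_prune(net, sparsity):
    """Zero out the smallest-magnitude entries of each weight matrix."""
    for p in net.parameters():
        if p.dim() > 1:                                     # prune weight matrices only
            k = int(p.numel() * sparsity)
            if k > 0:
                thresh = p.abs().flatten().kthvalue(k).values
                p.data[p.abs() <= thresh] = 0.0

def loss_under_weight_noise(net, sigma, trials=10):
    """Average loss after adding N(0, sigma^2) noise to the nonzero weights."""
    base = [p.detach().clone() for p in net.parameters()]
    vals = []
    for _ in range(trials):
        for p, b in zip(net.parameters(), base):
            p.data = b + sigma * torch.randn_like(b) * (b != 0)
        vals.append(loss_fn(net(x), y).item())
    for p, b in zip(net.parameters(), base):                # restore original weights
        p.data = b
    return sum(vals) / len(vals)

magnitude_prune(model, sparsity=0.8)
print(f"loss under weight noise: {loss_under_weight_noise(model, sigma=0.05):.4f}")
```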

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

1 code implementation9 Jun 2022 Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. 
Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramón Risco Delgado, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. 
Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Timothy Telleen-Lawton, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning

Block-Recurrent Transformers

no code implementations 11 Mar 2022 DeLesley Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur

The recurrent cell is merely a transformer layer: it uses self-attention and cross-attention to efficiently compute a recurrent function over a large set of state vectors and tokens.

Language Modelling
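
As a loose sketch of the idea behind Block-Recurrent Transformers (this is not the published architecture, which uses gated state updates and other details omitted here; all names and sizes below are placeholder assumptions), the cell below mixes self-attention over a block of tokens with cross-attention to a set of state vectors, and emits an updated state to carry to the next block.

```python
# Hypothetical sketch of a block-recurrent cell, not the published model.
import torch
import torch.nn as nn

class BlockRecurrentCellSketch(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.tok_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.tok_cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.state_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.state_cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.tok_ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                    nn.Linear(4 * d_model, d_model))

    def forward(self, tokens, state):
        # tokens: (batch, block_len, d_model); state: (batch, n_state, d_model)
        t = tokens + self.tok_self(tokens, tokens, tokens)[0] \
                   + self.tok_cross(tokens, state, state)[0]      # tokens attend to state
        t = t + self.tok_ff(t)
        s = state + self.state_self(state, state, state)[0] \
                  + self.state_cross(state, tokens, tokens)[0]    # state attends to tokens
        return t, s                                               # s is carried to the next block

cell = BlockRecurrentCellSketch()
tokens, state = torch.randn(2, 64, 128), torch.randn(2, 32, 128)
out, next_state = cell(tokens, state)
```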

Data Scaling Laws in NMT: The Effect of Noise and Architecture

no code implementations 4 Feb 2022 Yamini Bansal, Behrooz Ghorbani, Ankush Garg, Biao Zhang, Maxim Krikun, Colin Cherry, Behnam Neyshabur, Orhan Firat

In this work, we study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT).

Language Modelling, Machine Translation

Leveraging Unlabeled Data to Predict Out-of-Distribution Performance

no code implementations ICLR 2022 Saurabh Garg, Sivaraman Balakrishnan, Zachary C. Lipton, Behnam Neyshabur, Hanie Sedghi

Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions that may cause performance drops.

The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks

2 code implementations ICLR 2022 Rahim Entezari, Hanie Sedghi, Olga Saukh, Behnam Neyshabur

In this paper, we conjecture that if the permutation invariance of neural networks is taken into account, SGD solutions will likely have no barrier in the linear interpolation between them.
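
A minimal sketch of the kind of check this conjecture suggests (not the paper's experiments): align the hidden units of one two-layer ReLU network to another via a permutation found by linear assignment, then evaluate the loss along the straight line between the two parameter vectors. Here the two networks are random stand-ins for independently trained SGD solutions, and the data are placeholders.

```python
# Hypothetical sketch: permutation alignment followed by linear interpolation of weights.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
d, h = 10, 32
X = rng.normal(size=(512, d))                        # placeholder inputs
y = np.sign(X @ rng.normal(size=d))                  # placeholder labels in {-1, +1}

def init():
    return rng.normal(size=(h, d)) / np.sqrt(d), rng.normal(size=h) / np.sqrt(h)

def loss(W, v):
    return np.mean((np.maximum(X @ W.T, 0.0) @ v - y) ** 2)   # two-layer ReLU net, squared loss

(W1, v1), (W2, v2) = init(), init()                  # stand-ins for two trained SGD solutions

# Match hidden units of net 2 to net 1 by similarity of (incoming weights, outgoing weight).
feat1 = np.concatenate([W1, v1[:, None]], axis=1)
feat2 = np.concatenate([W2, v2[:, None]], axis=1)
_, col = linear_sum_assignment(-feat2 @ feat1.T)     # col[i]: unit of net 1 matched to unit i of net 2
perm = np.argsort(col)                               # inverse permutation
W2a, v2a = W2[perm], v2[perm]

for alpha in np.linspace(0.0, 1.0, 11):              # loss along the linear path
    Wm, vm = (1 - alpha) * W1 + alpha * W2a, (1 - alpha) * v1 + alpha * v2a
    print(f"alpha={alpha:.1f}  loss={loss(Wm, vm):.4f}")
```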

A Loss Curvature Perspective on Training Instability in Deep Learning

no code implementations 8 Oct 2021 Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David Cardoze, George Dahl, Zachary Nado, Orhan Firat

In this work, we study the evolution of the loss Hessian across many classification tasks in order to understand the effect the curvature of the loss has on the training dynamics.
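
Studies of loss curvature like this one typically rely on Hessian-vector products rather than forming the Hessian; the sketch below (a generic tool, not the paper's code, with a placeholder model and batch) estimates the largest Hessian eigenvalue of the loss via power iteration.

```python
# Hypothetical sketch: top Hessian eigenvalue via power iteration on Hessian-vector products.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.Tanh(), nn.Linear(64, 3))
x, y = torch.randn(128, 20), torch.randint(0, 3, (128,))        # placeholder batch
params = [p for p in model.parameters() if p.requires_grad]
loss = nn.CrossEntropyLoss()(model(x), y)
grads = torch.autograd.grad(loss, params, create_graph=True)    # keep graph for second backward

def hvp(vec):
    """Hessian-vector product H @ vec via a second backward pass."""
    return torch.autograd.grad(grads, params, grad_outputs=vec, retain_graph=True)

v = [torch.randn_like(p) for p in params]
for _ in range(50):                                             # power iteration
    hv = hvp(v)
    norm = torch.sqrt(sum((g * g).sum() for g in hv))
    v = [g / norm for g in hv]

lambda_max = sum((g * u).sum() for g, u in zip(hvp(v), v))      # Rayleigh quotient with unit v
print(f"estimated top Hessian eigenvalue: {lambda_max.item():.4f}")
```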

Exploring the Limits of Large Scale Pre-training

no code implementations ICLR 2022 Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi

Recent developments in large-scale machine learning suggest that by scaling up data, model size and training time properly, one might observe that improvements in pre-training would transfer favorably to most downstream tasks.

A Loss Curvature Perspective on Training Instabilities of Deep Learning Models

no code implementations ICLR 2022 Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David Cardoze, George Edward Dahl, Zachary Nado, Orhan Firat

In this work, we study the evolution of the loss Hessian across many classification tasks in order to understand the effect the curvature of the loss has on the training dynamics.

The Evolution of Out-of-Distribution Robustness Throughout Fine-Tuning

no code implementations 30 Jun 2021 Anders Andreassen, Yasaman Bahri, Behnam Neyshabur, Rebecca Roelofs

Although machine learning models typically experience a drop in performance on out-of-distribution data, accuracies on in- versus out-of-distribution data are widely observed to follow a single linear trend when evaluated across a testbed of models.

Deep Learning Through the Lens of Example Difficulty

1 code implementation NeurIPS 2021 Robert J. N. Baldock, Hartmut Maennel, Behnam Neyshabur

Existing work on understanding deep learning often employs measures that compress all data-dependent information into a few numbers.

When Do Curricula Work?

1 code implementation ICLR 2021 Xiaoxia Wu, Ethan Dyer, Behnam Neyshabur

Inspired by common use cases of curriculum learning in practice, we investigate the role of limited training time budget and noisy data in the success of curriculum learning.

Understanding the Failure Modes of Out-of-Distribution Generalization

1 code implementation ICLR 2021 Vaishnavh Nagarajan, Anders Andreassen, Behnam Neyshabur

Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only during training, resulting in poor accuracy at test time.

Image Classification, Out-of-Distribution Generalization

Are wider nets better given the same number of parameters?

2 code implementations ICLR 2021 Anna Golubeva, Behnam Neyshabur, Guy Gur-Ari

Empirical studies demonstrate that the performance of neural networks improves with an increasing number of parameters.

Extreme Memorization via Scale of Initialization

1 code implementation ICLR 2021 Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur

In the case of the homogeneous ReLU activation, we show that this behavior can be attributed to the loss function.

Image Classification

What is being transferred in transfer learning?

1 code implementation NeurIPS 2020 Behnam Neyshabur, Hanie Sedghi, Chiyuan Zhang

One desired capability for machines is the ability to transfer their knowledge of one domain to another where data is (usually) scarce.

Transfer Learning

Towards Learning Convolutions from Scratch

no code implementations NeurIPS 2020 Behnam Neyshabur

Convolution is one of the most essential components of architectures used in computer vision.

Image Classification, Inductive Bias

Observational Overfitting in Reinforcement Learning

no code implementations ICLR 2020 Xingyou Song, Yiding Jiang, Stephen Tu, Yilun Du, Behnam Neyshabur

A major component of overfitting in model-free reinforcement learning (RL) involves the case where the agent may mistakenly correlate reward with certain spurious features from the observations generated by the Markov Decision Process (MDP).

Reinforcement Learning

The role of over-parametrization in generalization of neural networks

1 code implementation ICLR 2019 Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro

Despite existing work on ensuring generalization of neural networks in terms of scale-sensitive complexity measures such as norms, margin, and sharpness, these complexity measures do not explain why neural networks generalize better with over-parametrization.

Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks

2 code implementations 30 May 2018 Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro

Despite existing work on ensuring generalization of neural networks in terms of scale-sensitive complexity measures such as norms, margin, and sharpness, these complexity measures do not explain why neural networks generalize better with over-parametrization.

Stronger generalization bounds for deep nets via a compression approach

no code implementations ICML 2018 Sanjeev Arora, Rong Ge, Behnam Neyshabur, Yi Zhang

Analysis of correctness of our compression relies upon some newly identified "noise stability" properties of trained deep nets, which are also experimentally verified.

Generalization Bounds

Implicit Regularization in Deep Learning

1 code implementation 6 Sep 2017 Behnam Neyshabur

In an attempt to better understand generalization in deep learning, we study several possible explanations.

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

no code implementations ICLR 2018 Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro

We present a generalization bound for feedforward neural networks in terms of the product of the spectral norm of the layers and the Frobenius norm of the weights.
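
As a hedged illustration of the kind of quantity such a bound depends on (the exact statement, constants, and depth/width factors are in the paper; the model here is a placeholder), the sketch below computes a spectral complexity of the rough form: product of squared layer spectral norms times the sum of squared Frobenius-to-spectral ratios.

```python
# Hypothetical sketch: a spectral-norm-based complexity measure for a small MLP.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 10))
weights = [m.weight for m in model if isinstance(m, nn.Linear)]

spec = [torch.linalg.matrix_norm(W, ord=2) for W in weights]      # spectral norms
frob = [torch.linalg.matrix_norm(W, ord='fro') for W in weights]  # Frobenius norms

prod_spec_sq = torch.prod(torch.stack(spec) ** 2)                 # prod_i ||W_i||_2^2
ratio_sum = sum((f / s) ** 2 for f, s in zip(frob, spec))         # sum_i ||W_i||_F^2 / ||W_i||_2^2
print(f"complexity measure (up to constants): {(prod_spec_sq * ratio_sum).item():.4f}")
```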

Exploring Generalization in Deep Learning

1 code implementation NeurIPS 2017 Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro

With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness.

Implicit Regularization in Matrix Factorization

no code implementations NeurIPS 2017 Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro

We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$.
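
A minimal sketch of that setting (placeholder dimensions and measurements, not the paper's experiments): take underdetermined Gaussian measurements of a planted low-rank PSD matrix, run gradient descent on a full-dimensional factorization X = U Uᵀ from a small initialization, and report the nuclear norm of the fitted solution. The paper conjectures that, with small initialization and step size, these dynamics converge to the minimum nuclear norm solution.

```python
# Hypothetical sketch: gradient descent on a factorized underdetermined quadratic objective.
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 60                                        # 60 linear measurements of a 20x20 matrix
A = rng.normal(size=(m, n, n))                       # measurement matrices
B = rng.normal(size=(n, 2))
X_star = B @ B.T                                     # planted rank-2 PSD matrix
y = np.einsum('mij,ij->m', A, X_star)                # observed measurements

U = 1e-3 * rng.normal(size=(n, n))                   # small init, full-rank factorization X = U U^T
lr = 1e-3
for _ in range(20000):
    r = np.einsum('mij,ij->m', A, U @ U.T) - y       # residuals
    grad_X = np.einsum('m,mij->ij', r, A) / m        # gradient w.r.t. X
    U -= lr * (grad_X + grad_X.T) @ U                # chain rule through X = U U^T

X_hat = U @ U.T
residual = np.linalg.norm(np.einsum('mij,ij->m', A, X_hat) - y)
nuclear = np.linalg.svd(X_hat, compute_uv=False).sum()
print(f"measurement residual: {residual:.2e}   nuclear norm of solution: {nuclear:.3f}")
print(f"nuclear norm of planted matrix: {np.linalg.svd(X_star, compute_uv=False).sum():.3f}")
```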

Stabilizing GAN Training with Multiple Random Projections

2 code implementations ICLR 2018 Behnam Neyshabur, Srinadh Bhojanapalli, Ayan Chakrabarti

Training generative adversarial networks is unstable in high-dimensions as the true data distribution tends to be concentrated in a small fraction of the ambient space.

Geometry of Optimization and Implicit Regularization in Deep Learning

1 code implementation 8 May 2017 Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro

We argue that the optimization plays a crucial role in generalization of deep learning models through implicit regularization.

Corralling a Band of Bandit Algorithms

no code implementations 19 Dec 2016 Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire

We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be run on its own.

Multi-Armed Bandits, Online Learning

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

no code implementations NeurIPS 2016 Behnam Neyshabur, Yuhuai Wu, Ruslan Salakhutdinov, Nathan Srebro

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations.

Global Optimality of Local Search for Low Rank Matrix Recovery

no code implementations NeurIPS 2016 Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro

We show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements.

Data-Dependent Path Normalization in Neural Networks

no code implementations 20 Nov 2015 Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro

We propose a unified framework for neural net normalization, regularization and optimization, which includes Path-SGD and Batch-Normalization and interpolates between them across two different dimensions.

Path-SGD: Path-Normalized Optimization in Deep Neural Networks

1 code implementation NeurIPS 2015 Behnam Neyshabur, Ruslan Salakhutdinov, Nathan Srebro

We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights.
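
Path-SGD optimizes in the geometry induced by the path regularizer; as a small sketch of the underlying quantity (assuming a bias-free ReLU MLP, and not the full Path-SGD update), the squared path norm, i.e. the sum over input-to-output paths of the product of squared edge weights, can be computed with a single pass through the element-wise squared weight matrices.

```python
# Hypothetical sketch: squared path norm of a bias-free ReLU MLP.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64, bias=False), nn.ReLU(),
                      nn.Linear(64, 64, bias=False), nn.ReLU(),
                      nn.Linear(64, 10, bias=False))

def squared_path_norm(net, in_dim):
    v = torch.ones(in_dim)                    # one unit of "path mass" per input node
    for layer in net:
        if isinstance(layer, nn.Linear):
            v = (layer.weight ** 2) @ v       # push mass through element-wise squared weights
    return v.sum()                            # sum over output nodes = sum over paths of prod w_e^2

print(f"squared path norm: {squared_path_norm(model, 20).item():.4f}")
```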

Norm-Based Capacity Control in Neural Networks

no code implementations 27 Feb 2015 Behnam Neyshabur, Ryota Tomioka, Nathan Srebro

We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks.

In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning

no code implementations 20 Dec 2014 Behnam Neyshabur, Ryota Tomioka, Nathan Srebro

We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multilayer feed-forward networks.

Inductive Bias

On Symmetric and Asymmetric LSHs for Inner Product Search

1 code implementation 21 Oct 2014 Behnam Neyshabur, Nathan Srebro

We consider the problem of designing locality-sensitive hashes (LSH) for inner product similarity, and the power of asymmetric hashes in this context.
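
A hedged sketch of the kind of asymmetric construction studied in this line of work (details and guarantees are in the paper; the data and all sizes are placeholders): rescale the database so every vector has norm at most one, lift data and queries with two different transforms so that inner product becomes cosine similarity, then hash both sides with ordinary sign random projections.

```python
# Hypothetical sketch: asymmetric transforms reduce inner product search to an angular LSH.
import numpy as np

rng = np.random.default_rng(0)
d, n, n_bits = 32, 1000, 64
database = rng.normal(size=(n, d))
database /= np.linalg.norm(database, axis=1).max()     # rescale so all norms are <= 1
query = rng.normal(size=d)
query /= np.linalg.norm(query)                         # unit-norm query

def lift_data(x):
    # data transform: append sqrt(1 - ||x||^2) so the lifted vector has unit norm
    return np.append(x, np.sqrt(max(1.0 - x @ x, 0.0)))

def lift_query(q):
    # query transform: append 0, so <lift_data(x), lift_query(q)> = <x, q>
    return np.append(q, 0.0)

proj = rng.normal(size=(n_bits, d + 1))                # shared sign-random-projection hash
db_codes = np.sign(np.stack([lift_data(x) for x in database]) @ proj.T)
q_code = np.sign(lift_query(query) @ proj.T)

scores = (db_codes == q_code).mean(axis=1)             # fraction of agreeing bits
print("hash-based top-10:", np.argsort(-scores)[:10])
print("exact MIPS top-10:", np.argsort(-(database @ query))[:10])
```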

Clustering, Hamming Embedding, Generalized LSH and the Max Norm

no code implementations 13 May 2014 Behnam Neyshabur, Yury Makarychev, Nathan Srebro

We study the convex relaxation of clustering and Hamming embedding, focusing on the asymmetric case (co-clustering and asymmetric Hamming embedding), their relationship to LSH as studied by Charikar (2002) and to the max-norm ball, and the differences between the symmetric and asymmetric versions.

The Power of Asymmetry in Binary Hashing

1 code implementation NeurIPS 2013 Behnam Neyshabur, Payman Yadollahpour, Yury Makarychev, Ruslan Salakhutdinov, Nathan Srebro

When approximating binary similarity using the Hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps.

Sparse Matrix Factorization

no code implementations 13 Nov 2013 Behnam Neyshabur, Rina Panigrahy

We investigate the problem of factorizing a matrix into several sparse matrices and propose an algorithm for this under randomness and sparsity assumptions.

Dictionary Learning
