1 code implementation • 18 Jul 2023 • Pratyush Maini, Michael C. Mozer, Hanie Sedghi, Zachary C. Lipton, J. Zico Kolter, Chiyuan Zhang
Recent efforts at explaining the interplay of memorization and generalization in deep overparametrized networks have posited that neural networks $\textit{memorize}$ "hard" examples in the final few layers of the model.
2 code implementations • 27 Feb 2023 • Rahim Entezari, Mitchell Wortsman, Olga Saukh, M. Moein Shariatnia, Hanie Sedghi, Ludwig Schmidt
We investigate the impact of pre-training data distribution on few-shot and full fine-tuning performance using 3 pre-training methods (supervised, contrastive language-image, and image-image), 7 pre-training datasets, and 9 downstream datasets.
1 code implementation • 8 Dec 2022 • Mahsa Forouzesh, Hanie Sedghi, Patrick Thiran
We empirically show the effectiveness of our metric in tracking memorization on various architectures and datasets and provide theoretical insights into the design of the susceptibility metric.
no code implementations • 18 Nov 2022 • Amr Khalifa, Michael C. Mozer, Hanie Sedghi, Behnam Neyshabur, Ibrahim Alabdulmohsin
Inspired by this, we show that extending temperature scaling across all layers improves both calibration and accuracy.
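For context, standard temperature scaling rescales only the output logits by a single scalar tuned on held-out data. Below is a minimal sketch of that baseline, with a note on how the layer-wise extension mentioned above differs; the function names, bounds, and hyperparameters are illustrative, not the paper's exact procedure.

```python
# A minimal sketch of classic (last-layer) temperature scaling, the baseline that
# the entry above extends to every layer.  All names and bounds are illustrative.
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature):
    probs = softmax(logits / temperature)
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()

def fit_temperature(val_logits, val_labels):
    """Choose T > 0 minimizing validation NLL of softmax(logits / T)."""
    res = minimize_scalar(lambda t: nll(val_logits, val_labels, t),
                          bounds=(0.05, 10.0), method="bounded")
    return res.x

# Layer-wise variant (as suggested by the entry above): instead of one scalar T on
# the output logits, learn one temperature per layer's pre-activations and tune
# them jointly on held-out data.  The exact parameterization is paper-specific and
# is described only in words here.
```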
no code implementations • 15 Nov 2022 • Hattie Zhou, Azade Nova, Hugo Larochelle, Aaron Courville, Behnam Neyshabur, Hanie Sedghi
Large language models (LLMs) have shown increasing in-context learning capabilities through scaling up model and data size.
1 code implementation • 15 Nov 2022 • Keller Jordan, Hanie Sedghi, Olga Saukh, Rahim Entezari, Behnam Neyshabur
In this paper, we examine the conjecture of Entezari et al. (2021), which states that if the permutation invariance of neural networks is taken into account, then there is likely no loss barrier along the linear interpolation between SGD solutions.
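For context, the loss barrier can be measured by evaluating the network along the straight line between two sets of trained weights. A minimal sketch of that measurement follows, assuming hypothetical `make_model`, `eval_loss`, and `loader` helpers; the permutation alignment that the conjecture concerns is not shown.

```python
# A minimal sketch of measuring the loss barrier between two trained networks by
# evaluating the loss along the linear path between their weights.  `make_model`,
# `eval_loss`, and `loader` are hypothetical stand-ins; aligning the second network
# by a permutation of its units (the point of the conjecture) is not shown.
import torch

@torch.no_grad()
def interpolate_state(sd_a, sd_b, alpha):
    """Return (1 - alpha) * theta_a + alpha * theta_b, key by key."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

@torch.no_grad()
def loss_barrier(model_a, model_b, make_model, eval_loss, loader, steps=11):
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    alphas = torch.linspace(0, 1, steps).tolist()
    losses = []
    for alpha in alphas:
        model = make_model()
        model.load_state_dict(interpolate_state(sd_a, sd_b, alpha))
        losses.append(eval_loss(model, loader))
    # Barrier: worst interpolated loss minus the linear interpolation of endpoint losses.
    endpoints = [(1 - a) * losses[0] + a * losses[-1] for a in alphas]
    return max(l - e for l, e in zip(losses, endpoints))
```

With BatchNorm layers, the interpolated network's running statistics typically need to be recomputed on training data before evaluation, otherwise the measured barrier is inflated.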
no code implementations • 22 Jun 2022 • Lukas Timpl, Rahim Entezari, Hanie Sedghi, Behnam Neyshabur, Olga Saukh
This paper examines the impact of static sparsity on the robustness of a trained network to weight perturbations, data corruption, and adversarial examples.
1 code implementation • ICLR 2022 • Saurabh Garg, Sivaraman Balakrishnan, Zachary C. Lipton, Behnam Neyshabur, Hanie Sedghi
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions that may cause performance drops.
1 code implementation • ICLR 2022 • Rahim Entezari, Hanie Sedghi, Olga Saukh, Behnam Neyshabur
In this paper, we conjecture that if the permutation invariance of neural networks is taken into account, SGD solutions will likely have no barrier in the linear interpolation between them.
no code implementations • ICLR 2022 • Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi
Recent developments in large-scale machine learning suggest that by scaling up data, model size, and training time properly, improvements in pre-training transfer favorably to most downstream tasks.
no code implementations • 29 Sep 2021 • Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi
It is shown that under the following two assumptions: (a) access to samples from intermediate distributions, and (b) samples being annotated with the amount of change from the source distribution, self-training can be successfully applied on gradually shifted samples to adapt the model toward the target distribution.
1 code implementation • 10 Jun 2021 • Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi
It has been shown that under the following two assumptions: (a) access to samples from intermediate distributions, and (b) samples being annotated with the amount of change from the source distribution, self-training can be successfully applied on gradually shifted samples to adapt the model toward the target distribution.
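A minimal sketch of the gradual self-training loop implied by assumptions (a) and (b): pseudo-label the least-shifted unlabeled samples first, retrain on those pseudo-labels, and repeat moving toward the target distribution. The `train` and `predict` helpers and the grouping of samples by annotated shift are hypothetical stand-ins.

```python
# A minimal sketch of gradual self-training under assumptions (a) and (b) above:
# unlabeled samples are grouped by their annotated amount of shift from the source
# domain, and the model is repeatedly retrained on its own pseudo-labels, one group
# at a time, from least to most shifted.  `train` and `predict` are hypothetical.
def gradual_self_train(model, groups_by_shift, train, predict):
    """groups_by_shift: list of unlabeled sample batches, ordered by shift amount."""
    for unlabeled_x in groups_by_shift:              # least shifted first
        pseudo_y = predict(model, unlabeled_x)       # label the next intermediate domain
        model = train(model, unlabeled_x, pseudo_y)  # adapt on the pseudo-labeled data
    return model
```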
no code implementations • ICLR 2021 • Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi
We propose a new framework for reasoning about generalization in deep learning.
2 code implementations • 16 Oct 2020 • Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi
We propose a new framework for reasoning about generalization in deep learning.
1 code implementation • NeurIPS 2020 • Behnam Neyshabur, Hanie Sedghi, Chiyuan Zhang
One desired capability for machines is the ability to transfer their knowledge of one domain to another where data is (usually) scarce.
no code implementations • ICLR 2020 • Niladri S. Chatterji, Behnam Neyshabur, Hanie Sedghi
We study the phenomenon that some modules of deep neural networks (DNNs) are more critical than others.
no code implementations • ICLR 2020 • Philip M. Long, Hanie Sedghi
We prove bounds on the generalization error of convolutional networks.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
no code implementations • 7 Jan 2019 • Philip M. Long, Hanie Sedghi
We analyze the joint probability distribution on the lengths of the vectors of hidden variables in different layers of a fully connected deep network, when the weights and biases are chosen randomly according to Gaussian distributions, and the input is in $\{ -1, 1\}^N$.
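For intuition, the quantity studied above can be simulated directly: draw Gaussian weights and biases, feed an input from $\{ -1, 1\}^N$ through the network, and record the length of each layer's hidden vector. The widths, variances, and ReLU nonlinearity below are illustrative choices, not the paper's exact setting.

```python
# A small simulation, for illustration only, of the lengths of hidden-layer vectors
# of a random fully connected network with Gaussian weights and biases and an input
# drawn from {-1, 1}^N.  Widths, variances, and the ReLU are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
N, depth, width = 128, 5, 256

x = rng.choice([-1.0, 1.0], size=N)          # input on the hypercube
h, lengths = x, []
fan_in = N
for _ in range(depth):
    W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(width, fan_in))
    b = rng.normal(0.0, 0.1, size=width)
    h = np.maximum(W @ h + b, 0.0)           # ReLU layer
    lengths.append(np.linalg.norm(h))
    fan_in = width

print("per-layer hidden lengths:", np.round(lengths, 2))
```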
1 code implementation • ICLR 2019 • Hanie Sedghi, Vineet Gupta, Philip M. Long
We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their efficient computation.
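The efficient computation rests on the fact that, for circular (wrap-around) padding, the 2D Fourier transform block-diagonalizes the convolution, so the layer's singular values are the singular values of one small $c_{in} \times c_{out}$ matrix per frequency. A short NumPy sketch in that spirit, assuming circular padding and a kernel laid out as (k, k, c_in, c_out):

```python
# A short sketch in the spirit of the result above: under circular padding, the
# singular values of a 2D multi-channel convolutional layer are the singular values
# of the per-frequency channel matrices of the Fourier-transformed kernel.  The
# kernel layout (k, k, c_in, c_out) and the circular-padding assumption are
# assumptions of this sketch.
import numpy as np

def conv_singular_values(kernel, input_hw):
    """kernel: (k, k, c_in, c_out); input_hw: spatial size (n, n) of the input."""
    transform = np.fft.fft2(kernel, input_hw, axes=(0, 1))  # one c_in x c_out matrix per frequency
    return np.linalg.svd(transform, compute_uv=False)       # SVD broadcast over frequencies

# Example: all singular values of a random 3x3 convolution on 32x32 inputs.
svs = conv_singular_values(np.random.randn(3, 3, 16, 32), (32, 32))
print(svs.shape, svs.max())   # (32, 32, 16); the largest value is the layer's spectral norm
```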
no code implementations • TACL 2018 • Hanie Sedghi, Ashish Sabharwal
Given a knowledge base or KB containing (noisy) facts about common nouns or generics, such as "all trees produce oxygen" or "some animals live in forests", we consider the problem of inferring additional such facts at a precision similar to that of the starting KB.
no code implementations • 3 Mar 2016 • Hanie Sedghi, Anima Anandkumar
We consider the problem of training input-output recurrent neural networks (RNN) for sequence labeling tasks.
no code implementations • 28 Jun 2015 • Majid Janzamin, Hanie Sedghi, Anima Anandkumar
We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks.
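At the heart of such guaranteed methods is the decomposition of a moment tensor into rank-one components. The sketch below shows only that core subroutine, tensor power iteration with deflation, on a synthetic symmetric tensor with orthogonal components; forming the score-function-based moment tensor from data and recovering both layers of the network are not shown.

```python
# A minimal sketch of tensor power iteration with deflation on a synthetic symmetric
# tensor with orthogonal components.  This is only the core decomposition subroutine
# behind such guaranteed training methods, not the full algorithm.
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 3
A, _ = np.linalg.qr(rng.normal(size=(d, d)))          # orthonormal columns
true_w, comps = np.array([3.0, 2.0, 1.0]), A[:, :k]
T = sum(w * np.einsum('i,j,k->ijk', c, c, c) for w, c in zip(true_w, comps.T))

def power_iteration(T, iters=100):
    v = rng.normal(size=T.shape[0]); v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum('ijk,j,k->i', T, v, v)          # T(I, v, v)
        v /= np.linalg.norm(v)
    lam = np.einsum('ijk,i,j,k->', T, v, v, v)
    return lam, v

recovered = []
for _ in range(k):                                     # deflate after each component
    lam, v = power_iteration(T)
    recovered.append(lam)
    T = T - lam * np.einsum('i,j,k->ijk', v, v, v)

print(np.round(recovered, 3))   # a permutation of the true weights {3.0, 2.0, 1.0}
```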
no code implementations • 16 Mar 2015 • Anima Anandkumar, Hanie Sedghi
Community detection in graphs has been extensively studied both in theory and in applications.
no code implementations • 19 Dec 2014 • Majid Janzamin, Hanie Sedghi, Anima Anandkumar
In this paper, we consider a novel class of matrix and tensor-valued features, which can be pre-trained using unlabeled samples.
no code implementations • 9 Dec 2014 • Hanie Sedghi, Majid Janzamin, Anima Anandkumar
In contrast, we present a tensor decomposition method which is guaranteed to correctly recover the parameters.
no code implementations • 9 Dec 2014 • Majid Janzamin, Hanie Sedghi, Anima Anandkumar
In this paper, we consider a novel class of matrix and tensor-valued features, which can be pre-trained using unlabeled samples.
no code implementations • 8 Dec 2014 • Hanie Sedghi, Anima Anandkumar
We provide novel guaranteed approaches for training feedforward neural networks with sparse connectivity.
no code implementations • NeurIPS 2014 • Hanie Sedghi, Anima Anandkumar, Edmond Jonckheere
We first analyze the simple setting, where the optimization problem consists of a loss function and a single regularizer (e.g. sparse optimization), and then extend to the multi-block setting with multiple regularizers and multiple variables (e.g. matrix decomposition into sparse and low-rank components).
no code implementations • 7 Mar 2014 • Hanie Sedghi, Edmond Jonckheere
We propose a decentralized false data injection detection scheme based on Markov graph of the bus phase angles.
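The underlying idea is that, under a DC power-flow approximation, bus phase angles behave approximately as a Gaussian Markov random field whose conditional-independence graph mirrors the grid topology, so a detector can flag measurement windows whose learned graph deviates from the expected one. The sketch below uses graphical lasso as the structure estimator and a simple edge-count test; both are illustrative choices, not the paper's decentralized detector.

```python
# A hedged sketch: estimate the Markov graph of bus phase angles from measurements
# and raise an alarm when it drifts from a reference graph.  The estimator
# (graphical lasso), the thresholds, and `reference_adjacency` are illustrative.
import numpy as np
from sklearn.covariance import GraphicalLasso

def markov_graph(phase_angle_samples, alpha=0.05, tol=1e-3):
    """phase_angle_samples: (num_samples, num_buses) matrix of measurements."""
    precision = GraphicalLasso(alpha=alpha).fit(phase_angle_samples).precision_
    adjacency = (np.abs(precision) > tol).astype(int)
    np.fill_diagonal(adjacency, 0)
    return adjacency

def flag_injection(window, reference_adjacency, max_edge_changes=0):
    """Alarm if the learned Markov graph differs from the reference grid graph."""
    changed = np.abs(markov_graph(window) - reference_adjacency).sum() // 2
    return changed > max_edge_changes
```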
2 code implementations • NeurIPS 2014 • Hanie Sedghi, Anima Anandkumar, Edmond Jonckheere
For sparse optimization, we establish that the modified ADMM method has an optimal convergence rate of $\mathcal{O}(s\log d/T)$, where $s$ is the sparsity level, $d$ is the data dimension and $T$ is the number of steps.
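To make the objects in this rate concrete, below is plain (batch) ADMM on a lasso-type problem, $\min_{x} \tfrac{1}{2}\|Ax-b\|^2 + \lambda\|z\|_1$ subject to $x = z$, where the sparsity level $s$ is the number of nonzeros in the solution and $d$ its dimension. This is the standard method with illustrative step sizes, not the modified stochastic ADMM analyzed in the paper.

```python
# A minimal sketch of ADMM on a lasso-type problem, 0.5*||Ax - b||^2 + lam*||z||_1
# subject to x = z.  Plain batch ADMM with illustrative step sizes, not the paper's
# modified stochastic variant that attains the O(s log d / T) rate.
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def admm_lasso(A, b, lam=0.1, rho=1.0, steps=200):
    d = A.shape[1]
    x = z = u = np.zeros(d)
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.cholesky(AtA + rho * np.eye(d))        # factor reused every iteration
    for _ in range(steps):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # ridge-type x-update
        z = soft_threshold(x + u, lam / rho)               # prox of the l1 regularizer
        u = u + x - z                                      # scaled dual update
    return z

# Tiny usage example with a synthetic sparse ground truth.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 200)); x_true = np.zeros(200); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.normal(size=50)
print(np.nonzero(admm_lasso(A, b, lam=0.5))[0][:10])
```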