no code implementations • ICML 2020 • Jeffrey Negrea, Daniel Roy, Gintare Karolina Dziugaite
At the same time, we bound the risk of $\hat h$ in terms of a surrogate that is constructed by conditioning and shown to belong to a nonrandom class with uniformly small generalization error.
1 code implementation • 2 Jun 2022 • Mansheej Paul, Brett W. Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite
A striking observation about iterative magnitude pruning (IMP; Frankle et al., 2020) is that, after just a few hundred steps of dense training, the method can find a sparse sub-network that can be trained to the same accuracy as the dense network.
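For readers unfamiliar with the pruning-and-rewinding step that IMP repeats, here is a minimal sketch of one round; the layer shape, 20% pruning rate, and the stand-in for further training are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries, keeping a (1 - sparsity) fraction."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)                  # number of weights to remove
    threshold = np.partition(flat, k)[k]           # k-th smallest magnitude
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)
    return weights * mask, mask

# Illustrative IMP round: train, prune 20% by magnitude, rewind the surviving
# weights to their values at an early training step, then retrain with the mask fixed.
rng = np.random.default_rng(0)
w_early = rng.normal(size=(256, 128))              # weights after a few hundred dense steps
w_trained = w_early + 0.1 * rng.normal(size=(256, 128))  # stand-in for fully trained weights
_, mask = magnitude_prune(w_trained, sparsity=0.2)
w_rewound = w_early * mask                         # sparse subnetwork to retrain
print(f"remaining weights: {mask.mean():.0%}")
```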
no code implementations • NeurIPS 2021 • Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy
We further show that an inherent limitation of proper learning of VC classes contradicts the existence of a proper learner with constant CMI, and it implies a negative resolution to an open problem of Steinke and Zakynthinou (2020).
no code implementations • 22 Oct 2021 • Soufiane Hayou, Bobby He, Gintare Karolina Dziugaite
In the linear model, we show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the 'prior' and 'posterior' data.
1 code implementation • NeurIPS 2021 • Mansheej Paul, Surya Ganguli, Gintare Karolina Dziugaite
In this work, we make the striking observation that, on standard vision benchmarks, the initial loss gradient norm of individual training examples, averaged over several weight initializations, can be used to identify a smaller set of training data that is important for generalization.
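As a rough illustration of the scoring idea (not the paper's exact protocol), one can average each training example's initial loss-gradient norm over a few random initializations and keep the highest-scoring examples; the tiny model and random data below are placeholders.

```python
import torch, torch.nn as nn, torch.nn.functional as F

def grad_norm_scores(make_model, xs, ys, n_inits=5):
    """Per-example loss-gradient norm at initialization, averaged over random seeds."""
    scores = torch.zeros(len(xs))
    for seed in range(n_inits):
        torch.manual_seed(seed)
        model = make_model()                       # fresh initialization per seed
        for i, (x, y) in enumerate(zip(xs, ys)):
            model.zero_grad()
            loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
            loss.backward()
            sq = sum((p.grad ** 2).sum() for p in model.parameters())
            scores[i] += sq.sqrt() / n_inits
    return scores   # larger scores ~ examples that matter more early in training

# Toy usage with random data standing in for a vision benchmark.
xs, ys = torch.randn(32, 10), torch.randint(0, 3, (32,))
scores = grad_norm_scores(lambda: nn.Linear(10, 3), xs, ys)
keep = scores.argsort(descending=True)[:16]        # keep the highest-scoring half
```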
no code implementations • 1 Feb 2021 • Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy
The key factors our bounds depend on are the variance of the gradients (with respect to the data distribution), the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations of the final output.
no code implementations • 14 Dec 2020 • Yiding Jiang, Pierre Foret, Scott Yak, Daniel M. Roy, Hossein Mobahi, Gintare Karolina Dziugaite, Samy Bengio, Suriya Gunasekar, Isabelle Guyon, Behnam Neyshabur
Understanding generalization is arguably one of the most important open questions in deep learning.
no code implementations • 5 Nov 2020 • Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy
We provide a negative resolution to a conjecture of Steinke and Zakynthinou (2020a), by showing that their bound on the conditional mutual information (CMI) of proper learners of Vapnik--Chervonenkis (VC) classes cannot be improved from $d \log n + 2$ to $O(d)$, where $n$ is the number of i.i.d. samples.
no code implementations • NeurIPS 2020 • Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli
We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK.
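A minimal sketch of how one might compute the empirical (data-dependent) NTK Gram matrix that such a study tracks over training; the scalar-output architecture and random batch are placeholder assumptions.

```python
import torch, torch.nn as nn

def empirical_ntk(model, xs):
    """Gram matrix K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)> for a scalar-output model."""
    grads = []
    for x in xs:
        model.zero_grad()
        model(x.unsqueeze(0)).squeeze().backward()
        grads.append(torch.cat([p.grad.flatten() for p in model.parameters()]))
    G = torch.stack(grads)
    return G @ G.T

model = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 1))
K = empirical_ntk(model, torch.randn(16, 8))       # recompute during training to watch it evolve
```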
no code implementations • 26 Oct 2020 • Gintare Karolina Dziugaite, Shai Ben-David, Daniel M. Roy
We then model the act of enforcing interpretability as that of performing empirical risk minimization over the set of interpretable hypotheses.
1 code implementation • NeurIPS 2020 • Gintare Karolina Dziugaite, Alexandre Drouin, Brady Neal, Nitarshan Rajkumar, Ethan Caballero, Linbo Wang, Ioannis Mitliagkas, Daniel M. Roy
A large volume of work aims to close this gap, primarily by developing bounds on generalization error, optimization error, and excess risk.
no code implementations • ICLR 2021 • Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
Recent work has explored the possibility of pruning neural networks at initialization.
no code implementations • 19 Jun 2020 • Gintare Karolina Dziugaite, Kyle Hsu, Waseem Gharbieh, Gabriel Arpino, Daniel M. Roy
In this work, we show that the bound based on the oracle prior can be suboptimal: In some cases, a stronger bound is obtained by using a data-dependent oracle prior, i.e., a conditional expectation of the posterior, given a subset of the training data that is then excluded from the empirical risk term.
no code implementations • NeurIPS 2020 • Mahdi Haghifam, Jeffrey Negrea, Ashish Khisti, Daniel M. Roy, Gintare Karolina Dziugaite
Finally, we apply these bounds to the study of the Langevin dynamics algorithm, showing that conditioning on the supersample allows us to exploit information in the optimization trajectory to obtain tighter bounds based on hypothesis tests.
no code implementations • 25 Mar 2020 • Elnaz Barshan, Marc-Etienne Brunet, Gintare Karolina Dziugaite
In this work, we focus on the use of influence functions to identify relevant training examples that one might hope "explain" the predictions of a machine learning model.
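As a simplified sketch of the scoring being analyzed: the influence of a training point on a test prediction can be approximated by the alignment of their loss gradients (the full influence-function formula also applies an inverse-Hessian factor, omitted here). The model, data, and variable names are placeholders.

```python
import torch, torch.nn as nn, torch.nn.functional as F

def loss_grad(model, x, y):
    """Flattened parameter gradient of the loss on a single example."""
    model.zero_grad()
    F.binary_cross_entropy_with_logits(model(x.unsqueeze(0)).squeeze(), y).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

# Simplified influence score: gradient alignment between each training point and
# a test point, used to rank candidate "explanations" of the prediction.
model = nn.Linear(5, 1)
x_train, y_train = torch.randn(100, 5), torch.randint(0, 2, (100,)).float()
x_test, y_test = torch.randn(5), torch.tensor(1.0)

g_test = loss_grad(model, x_test, y_test)
scores = torch.stack([loss_grad(model, x, y) @ g_test
                      for x, y in zip(x_train, y_train)])
most_relevant = scores.argsort(descending=True)[:5]
```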
1 code implementation • ICML 2020 • Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation).
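A minimal sketch of the linear-interpolation check behind this question: evaluate the loss at evenly spaced points on the straight line between the weights found by two runs. The model, perturbation, and evaluation batch below are placeholders for two actual SGD runs.

```python
import copy
import torch, torch.nn as nn, torch.nn.functional as F

@torch.no_grad()
def loss_on_path(model_a, model_b, xs, ys, n_points=11):
    """Cross-entropy loss along the line segment between two sets of weights."""
    losses = []
    probe = copy.deepcopy(model_a)
    sa, sb = model_a.state_dict(), model_b.state_dict()
    for alpha in torch.linspace(0, 1, n_points):
        probe.load_state_dict({k: (1 - alpha) * sa[k] + alpha * sb[k] for k in sa})
        losses.append(F.cross_entropy(probe(xs), ys).item())
    return losses   # a flat profile (no barrier) indicates linear mode connectivity

# Toy usage: a randomly perturbed copy stands in for a second SGD run.
net_a = nn.Linear(20, 4)
net_b = copy.deepcopy(net_a)
with torch.no_grad():
    for p in net_b.parameters():
        p.add_(0.01 * torch.randn_like(p))
xs, ys = torch.randn(64, 20), torch.randint(0, 4, (64,))
print(loss_on_path(net_a, net_b, xs, ys))
```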
no code implementations • 9 Dec 2019 • Jeffrey Negrea, Gintare Karolina Dziugaite, Daniel M. Roy
At the same time, we bound the risk of $\hat h$ in terms of surrogates constructed by conditioning and denoising, respectively, and shown to belong to nonrandom classes with uniformly small generalization error.
1 code implementation • NeurIPS 2019 • Jeffrey Negrea, Mahdi Haghifam, Gintare Karolina Dziugaite, Ashish Khisti, Daniel M. Roy
In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019).
no code implementations • 25 Sep 2019 • Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
We observe that these subnetworks match the accuracy of the full network only when two SGD runs for the same subnetwork are connected by linear paths with no change in test error.
no code implementations • 10 Jun 2019 • Chin-wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville
Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to scale to the high-dimensional setting of stochastic neural networks.
3 code implementations • 5 Mar 2019 • Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
With this change, it finds small subnetworks of deeper networks (e.g., 80% sparsity on ResNet-50) that can complete the training process to match the accuracy of the original network on more challenging tasks (e.g., ImageNet).
no code implementations • NeurIPS 2018 • Gintare Karolina Dziugaite, Daniel M. Roy
The Probably Approximately Correct (PAC) Bayes framework (McAllester, 1999) can incorporate knowledge about the learning algorithm and (data) distribution through the use of distribution-dependent priors, yielding tighter generalization bounds on data-dependent posteriors.
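As a hedged illustration of how a PAC-Bayes bound is evaluated numerically (this is the generic PAC-Bayes-kl bound, not the paper's specific distribution-dependent construction), the sketch below inverts the kl inequality by bisection given an empirical Gibbs risk, a KL divergence between posterior and prior, a sample size, and a confidence level; the example numbers are purely illustrative.

```python
import math

def kl_bernoulli(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p, q = min(max(p, eps), 1 - eps), min(max(q, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def pac_bayes_kl_bound(emp_risk, kl, n, delta):
    """Largest p with kl(emp_risk || p) <= (kl + log(2*sqrt(n)/delta)) / n, via bisection."""
    rhs = (kl + math.log(2 * math.sqrt(n) / delta)) / n
    lo, hi = emp_risk, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if kl_bernoulli(emp_risk, mid) > rhs:
            hi = mid
        else:
            lo = mid
    return hi   # upper bound on the risk of the Gibbs classifier

print(pac_bayes_kl_bound(emp_risk=0.05, kl=500.0, n=60000, delta=0.05))
```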
no code implementations • ICLR 2018 • Gintare Karolina Dziugaite, Daniel M. Roy
We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier.
2 code implementations • 31 Mar 2017 • Gintare Karolina Dziugaite, Daniel M. Roy
One of the defining properties of deep learning is that models are chosen to have many more parameters than available training data.
no code implementations • 2 Aug 2016 • Gintare Karolina Dziugaite, Zoubin Ghahramani, Daniel M. Roy
For Fast-Gradient-Sign perturbations of small magnitude, we found that JPG compression often largely reverses the drop in classification accuracy, though not always.
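A minimal sketch of the defence being probed: craft a fast-gradient-sign perturbation, round-trip the perturbed image through JPEG, and compare predictions. The untrained ResNet-18, random image, and quality setting are placeholders, so the printed labels only demonstrate the mechanics.

```python
import io
import torch, torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

def fgsm(model, x, y, eps):
    """Fast-gradient-sign perturbation of a single image tensor in [0, 1]."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def jpeg_roundtrip(x, quality=75):
    """Encode a CHW float tensor as JPEG and decode it again."""
    buf = io.BytesIO()
    transforms.ToPILImage()(x).save(buf, format="JPEG", quality=quality)
    return transforms.ToTensor()(Image.open(io.BytesIO(buf.getvalue())))

model = models.resnet18(weights=None).eval()       # untrained stand-in for a trained classifier
x = torch.rand(3, 224, 224)                        # placeholder image in [0, 1]
y = model(x.unsqueeze(0)).argmax(1)
x_adv = fgsm(model, x, y.squeeze(0), eps=2 / 255)
for name, img in [("clean", x), ("adversarial", x_adv), ("adv + JPEG", jpeg_roundtrip(x_adv))]:
    print(name, model(img.unsqueeze(0)).argmax(1).item())
```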
2 code implementations • 19 Nov 2015 • Gintare Karolina Dziugaite, Daniel M. Roy
Here we consider replacing the inner product by an arbitrary function that we learn from the data at the same time as we learn the latent feature vectors.
Ranked #10 on Recommendation Systems on MovieLens 1M
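A minimal sketch of the idea described above: user and item latent vectors are concatenated and passed through a small MLP instead of being combined by an inner product. The dimensions, toy data, and training loop are placeholder choices, not the paper's exact architecture.

```python
import torch, torch.nn as nn

class NeuralMF(nn.Module):
    """Rating model: a learned function of user/item latent features replaces the dot product."""
    def __init__(self, n_users, n_items, dim=16, hidden=64):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)
        self.item = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, u, i):
        return self.mlp(torch.cat([self.user(u), self.item(i)], dim=-1)).squeeze(-1)

# Toy training loop on random (user, item, rating) triples.
model = NeuralMF(n_users=100, n_items=200)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
u, i = torch.randint(0, 100, (512,)), torch.randint(0, 200, (512,))
r = torch.rand(512) * 4 + 1                        # ratings in [1, 5]
for _ in range(50):
    opt.zero_grad()
    loss = ((model(u, i) - r) ** 2).mean()
    loss.backward()
    opt.step()
```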
no code implementations • 14 May 2015 • Gintare Karolina Dziugaite, Daniel M. Roy, Zoubin Ghahramani
We frame learning as optimization of a two-sample test statistic: informally speaking, a good generator network produces samples that cause a two-sample test to fail to reject the null hypothesis.
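A minimal sketch of such a training criterion: a biased estimate of the squared maximum mean discrepancy (MMD) with a Gaussian kernel between real and generated samples, which a generator network would be trained to minimize by backpropagating through it. The bandwidth and toy data are placeholder choices.

```python
import torch

def gaussian_kernel(x, y, bandwidth=1.0):
    d2 = torch.cdist(x, y) ** 2
    return torch.exp(-d2 / (2 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between samples x and y."""
    k = gaussian_kernel
    return (k(x, x, bandwidth).mean() - 2 * k(x, y, bandwidth).mean()
            + k(y, y, bandwidth).mean())

# Toy check: samples from the same distribution give a small statistic,
# samples from a shifted distribution give a larger one.
real = torch.randn(256, 2)
fake_good, fake_bad = torch.randn(256, 2), torch.randn(256, 2) + 2.0
print(mmd2(real, fake_good).item(), mmd2(real, fake_bad).item())
```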