Search Results for author: Gintare Karolina Dziugaite

Found 42 papers, 9 papers with code

Generalization via Derandomization

no code implementations ICML 2020 Jeffrey Negrea, Daniel Roy, Gintare Karolina Dziugaite

At the same time, we bound the risk of $\hat h$ in terms of a surrogate that is constructed by conditioning and shown to belong to a nonrandom class with uniformly small generalization error.

Simultaneous linear connectivity of neural networks modulo permutation

no code implementations9 Apr 2024 Ekansh Sharma, Devin Kwok, Tom Denton, Daniel M. Roy, David Rolnick, Gintare Karolina Dziugaite

In contrast, the claim of "strong linear connectivity", that for each network there exists one permutation that simultaneously connects it with the other networks, is both intuitively and practically more desirable.
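
As a concrete illustration of the permutation symmetry at play (a minimal numpy sketch, not code from the paper): permuting the hidden units of an MLP, with consistent re-indexing of the adjacent weight matrices, leaves the network's function unchanged, which is why networks are only compared modulo permutation.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(64, 10)), rng.normal(size=64)  # input -> hidden
W2, b2 = rng.normal(size=(1, 64)), rng.normal(size=1)    # hidden -> output

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2        # one-hidden-layer ReLU net

perm = rng.permutation(64)                               # relabel the hidden units
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]          # permute rows of W1/b1, columns of W2

x = rng.normal(size=10)
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))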

Mixtures of Experts Unlock Parameter Scaling for Deep RL

no code implementations13 Feb 2024 Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro

The recent rapid progress in (self-)supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size.

Reinforcement Learning Self-Supervised Learning
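
A schematic of the parameter-scaling recipe the paper studies, replacing a dense layer with a soft mixture of experts (an illustrative PyTorch sketch; the gating and expert design here is hypothetical, not the paper's exact architecture):

import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Drop-in replacement for a dense layer: a gating network softly
    mixes the outputs of several expert layers (schematic only)."""
    def __init__(self, d_in, d_out, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d_in, d_out) for _ in range(n_experts))
        self.gate = nn.Linear(d_in, n_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)             # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (batch, d_out, n_experts)
        return (outs * weights.unsqueeze(1)).sum(dim=-1)          # (batch, d_out)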

Dataset Difficulty and the Role of Inductive Bias

no code implementations3 Jan 2024 Devin Kwok, Nikhil Anand, Jonathan Frankle, Gintare Karolina Dziugaite, David Rolnick

Motivated by the goals of dataset pruning and defect identification, a growing body of methods has been developed to score individual examples within a dataset.

Inductive Bias

Leveraging Function Space Aggregation for Federated Learning at Scale

no code implementations17 Nov 2023 Nikita Dhawan, Nicole Mitchell, Zachary Charles, Zachary Garrett, Gintare Karolina Dziugaite

Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization.

Distributed Optimization Federated Learning
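
The direct averaging step in question, in a minimal sketch (names illustrative): FedAvg computes a dataset-size-weighted mean of client parameter updates, which the paper proposes to replace with aggregation in function space.

import numpy as np

def fedavg(client_updates, client_weights):
    """Weighted average of client parameter updates (the direct averaging
    FedAvg performs; function-space aggregation would replace this step)."""
    total = sum(client_weights)
    return {
        name: sum(w * upd[name] for w, upd in zip(client_weights, client_updates)) / total
        for name in client_updates[0]
    }

# Example: two clients, weighted by local dataset size.
u1 = {"layer.w": np.array([1.0, 2.0])}
u2 = {"layer.w": np.array([3.0, 4.0])}
print(fedavg([u1, u2], client_weights=[10, 30]))  # -> [2.5, 3.5]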

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

no code implementations7 Oct 2023 Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite

We study two natural scaling techniques -- weight pruning and simply training a smaller or larger model, which we refer to as dense scaling -- and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference.

In-Context Learning

Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias

no code implementations30 May 2023 Yu Yang, Eric Gan, Gintare Karolina Dziugaite, Baharan Mirzasoleiman

In this work, we provide the first theoretical analysis of the effect of simplicity bias on learning spurious correlations.

Inductive Bias

Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization

no code implementations27 Dec 2022 Mahdi Haghifam, Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund, Daniel M. Roy, Gintare Karolina Dziugaite

To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization.

Generalization Bounds
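
For reference, the prototypical bound in this family (Xu and Raginsky, 2017) states that for a $\sigma$-sub-Gaussian loss,

\[ \bigl|\mathbb{E}[\mathrm{gen}(W, S)]\bigr| \;\le\; \sqrt{\frac{2\sigma^2\, I(W; S)}{n}}, \]

where $W$ is the algorithm's output, $S$ the $n$-point training sample, and $I(W; S)$ their mutual information; the paper shows that bounds of this type cannot certify the minimax rates gradient descent actually achieves in stochastic convex optimization.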

The Effect of Data Dimensionality on Neural Network Prunability

no code implementations1 Dec 2022 Zachary Ankner, Alex Renda, Gintare Karolina Dziugaite, Jonathan Frankle, Tian Jin

Practitioners prune neural networks for efficiency gains and generalization improvements, but few scrutinize the factors determining the prunability of a neural network: the maximum fraction of weights that pruning can remove without compromising the model's test accuracy.
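
Under that definition, prunability could be estimated with a sparsity sweep; a minimal sketch, assuming a user-supplied evaluate function that returns test accuracy for a list of pruned weight arrays (illustrative, not the paper's protocol):

import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights, globally."""
    flat = np.abs(np.concatenate([w.ravel() for w in weights]))
    threshold = np.quantile(flat, sparsity)
    return [np.where(np.abs(w) < threshold, 0.0, w) for w in weights]

def prunability(weights, evaluate, dense_acc, tol=0.01):
    """Largest sparsity whose pruned accuracy stays within `tol` of dense accuracy."""
    best = 0.0
    for s in np.linspace(0.0, 0.99, 100):
        if evaluate(magnitude_prune(weights, s)) >= dense_acc - tol:
            best = s
    return best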

Pruning's Effect on Generalization Through the Lens of Training and Regularization

no code implementations25 Oct 2022 Tian Jin, Michael Carbin, Daniel M. Roy, Jonathan Frankle, Gintare Karolina Dziugaite

Pruning models in this over-parameterized regime leads to a contradiction -- while theory predicts that reducing model size harms generalization, pruning to a range of sparsities nonetheless improves it.

Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?

no code implementations6 Oct 2022 Mansheej Paul, Feng Chen, Brett W. Larsen, Jonathan Frankle, Surya Ganguli, Gintare Karolina Dziugaite

Third, we show how the flatness of the error landscape at the end of training determines a limit on the fraction of weights that can be pruned at each iteration of IMP.

Understanding Generalization via Leave-One-Out Conditional Mutual Information

no code implementations29 Jun 2022 Mahdi Haghifam, Shay Moran, Daniel M. Roy, Gintare Karolina Dziugaite

These leave-one-out variants of the conditional mutual information (CMI) of an algorithm (Steinke and Zakynthinou, 2020) are also seen to control the mean generalization error of learning algorithms with bounded loss functions.

Transductive Learning
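
For context, the conditional mutual information of an algorithm $A$ in the Steinke and Zakynthinou (2020) framework is

\[ \mathrm{CMI}(A) = I(W; U \mid \tilde{Z}), \]

where $\tilde{Z}$ is a supersample of $n$ pairs of i.i.d. examples, $U \sim \mathrm{Unif}(\{0,1\}^n)$ selects one example per pair to form the training set, and $W$ is the output of $A$ on the selected examples; the paper's leave-one-out variants are defined analogously with a single left-out example in place of the paired supersample.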

Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks

1 code implementation2 Jun 2022 Mansheej Paul, Brett W. Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite

A striking observation about iterative magnitude pruning (IMP; Frankle et al., 2020) is that -- after just a few hundred steps of dense training -- the method can find a sparse sub-network that can be trained to the same accuracy as the dense network.
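
A minimal sketch of the IMP-with-rewinding loop referenced above (Frankle et al., 2020), assuming a user-supplied train function; this is schematic, not the authors' code:

import numpy as np

def magnitude_mask(theta, mask, frac=0.2):
    """Prune the smallest-magnitude `frac` of still-surviving weights."""
    threshold = np.quantile(np.abs(theta[mask]), frac)
    return mask & (np.abs(theta) >= threshold)

def imp_with_rewinding(theta_k, train, rounds=10, frac=0.2):
    """theta_k: weights saved after a few hundred steps of dense training
    (the rewind point); `train` maps (weights, mask) -> trained weights."""
    mask = np.ones_like(theta_k, dtype=bool)
    for _ in range(rounds):
        theta = train(theta_k, mask)              # train the masked subnetwork to completion
        mask = magnitude_mask(theta, mask, frac)  # drop the smallest 20% of surviving weights
        # the next round rewinds to theta_k, keeping only the updated mask
    return theta_k, mask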

Towards a Unified Information-Theoretic Framework for Generalization

no code implementations NeurIPS 2021 Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy

We further show that an inherent limitation of proper learning of VC classes contradicts the existence of a proper learner with constant CMI, which implies a negative resolution to an open problem of Steinke and Zakynthinou (2020).

Generalization Bounds

Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning

no code implementations22 Oct 2021 Soufiane Hayou, Bobby He, Gintare Karolina Dziugaite

In the linear model, we show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the 'prior' and 'posterior' data.

L2 Regularization Regression

Deep Learning on a Data Diet: Finding Important Examples Early in Training

1 code implementation NeurIPS 2021 Mansheej Paul, Surya Ganguli, Gintare Karolina Dziugaite

Compared to recent work that prunes data by discarding examples that are rarely forgotten over the course of training, our scores use only local information early in training.
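
One of the paper's scores, EL2N (error L2-norm), in a minimal PyTorch sketch; it needs only a checkpoint from early in training, in line with the "local information" claim above:

import torch
import torch.nn.functional as F

def el2n_scores(model, inputs, labels, n_classes):
    """EL2N score: L2 norm of (softmax prediction - one-hot label), computed
    at an early-training checkpoint; higher scores flag important examples."""
    with torch.no_grad():
        probs = F.softmax(model(inputs), dim=-1)
        onehot = F.one_hot(labels, n_classes).float()
        return (probs - onehot).norm(dim=-1)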

Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

no code implementations1 Feb 2021 Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy

The key factors our bounds depend on are the variance of the gradients (with respect to the data distribution), the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations of the final output.

Generalization Bounds Stochastic Optimization

On the Information Complexity of Proper Learners for VC Classes in the Realizable Case

no code implementations5 Nov 2020 Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy

We provide a negative resolution to a conjecture of Steinke and Zakynthinou (2020a) by showing that their bound on the conditional mutual information (CMI) of proper learners of Vapnik--Chervonenkis (VC) classes cannot be improved from $d \log n + 2$ to $O(d)$, where $n$ is the number of i.i.d. training examples.

Enforcing Interpretability and its Statistical Impacts: Trade-offs between Accuracy and Interpretability

no code implementations26 Oct 2020 Gintare Karolina Dziugaite, Shai Ben-David, Daniel M. Roy

We then model the act of enforcing interpretability as that of performing empirical risk minimization over the set of interpretable hypotheses.

Binary Classification Learning Theory +1

In Search of Robust Measures of Generalization

1 code implementation NeurIPS 2020 Gintare Karolina Dziugaite, Alexandre Drouin, Brady Neal, Nitarshan Rajkumar, Ethan Caballero, Linbo Wang, Ioannis Mitliagkas, Daniel M. Roy

A large volume of work aims to close this gap, primarily by developing bounds on generalization error, optimization error, and excess risk.

Generalization Bounds

On the role of data in PAC-Bayes bounds

no code implementations19 Jun 2020 Gintare Karolina Dziugaite, Kyle Hsu, Waseem Gharbieh, Gabriel Arpino, Daniel M. Roy

In this work, we show that the bound based on the oracle prior can be suboptimal: In some cases, a stronger bound is obtained by using a data-dependent oracle prior, i.e., a conditional expectation of the posterior, given a subset of the training data that is then excluded from the empirical risk term.

Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms

no code implementations NeurIPS 2020 Mahdi Haghifam, Jeffrey Negrea, Ashish Khisti, Daniel M. Roy, Gintare Karolina Dziugaite

Finally, we apply these bounds to the study of the Langevin dynamics algorithm, showing that conditioning on the supersample allows us to exploit information in the optimization trajectory to obtain tighter bounds based on hypothesis tests.

Generalization Bounds

RelatIF: Identifying Explanatory Training Examples via Relative Influence

no code implementations25 Mar 2020 Elnaz Barshan, Marc-Etienne Brunet, Gintare Karolina Dziugaite

In this work, we focus on the use of influence functions to identify relevant training examples that one might hope "explain" the predictions of a machine learning model.
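
For reference, the classical influence-function approximation such methods build on (Koh and Liang, 2017) scores a training point $z$ against a test point $z_{\mathrm{test}}$ as

\[ \mathcal{I}(z, z_{\mathrm{test}}) = -\nabla_\theta L(z_{\mathrm{test}}, \hat\theta)^\top H_{\hat\theta}^{-1} \nabla_\theta L(z, \hat\theta), \]

where $\hat\theta$ minimizes the training loss and $H_{\hat\theta}$ is its Hessian; RelatIF, per the title, normalizes such scores into a relative influence so that a few globally influential examples do not dominate every explanation.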

Linear Mode Connectivity and the Lottery Ticket Hypothesis

2 code implementations ICML 2020 Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin

We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation).

Linear Mode Connectivity
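
The test in question, in a minimal sketch assuming a user-supplied test_error function: two trained solutions are linearly connected if error does not rise along the straight line between them (one common convention measures the barrier relative to the mean of the endpoint errors).

import numpy as np

def error_barrier(theta_a, theta_b, test_error, n_points=21):
    """Max rise in test error along the line between two solutions;
    a near-zero barrier indicates linear mode connectivity (stability to SGD noise)."""
    base = 0.5 * (test_error(theta_a) + test_error(theta_b))
    path = [test_error((1 - t) * theta_a + t * theta_b)
            for t in np.linspace(0.0, 1.0, n_points)]
    return max(path) - base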

In Defense of Uniform Convergence: Generalization via derandomization with an application to interpolating predictors

no code implementations9 Dec 2019 Jeffrey Negrea, Gintare Karolina Dziugaite, Daniel M. Roy

At the same time, we bound the risk of $\hat h$ in terms of surrogates constructed by conditioning and denoising, respectively, and shown to belong to nonrandom classes with uniformly small generalization error.

Denoising

Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates

1 code implementation NeurIPS 2019 Jeffrey Negrea, Mahdi Haghifam, Gintare Karolina Dziugaite, Ashish Khisti, Daniel M. Roy

In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019).

Generalization Bounds
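
For reference, SGLD, the noisy iterative algorithm analyzed in this line of work, takes steps of the form

\[ \theta_{t+1} = \theta_t - \eta_t\, \hat\nabla L(\theta_t) + \sqrt{\tfrac{2\eta_t}{\beta}}\, \xi_t, \qquad \xi_t \sim \mathcal{N}(0, I), \]

with step size $\eta_t$, inverse temperature $\beta$, and minibatch gradient $\hat\nabla L$; the injected Gaussian noise is what allows each step's contribution to the information between weights and data to be bounded.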

Mode Connectivity and Sparse Neural Networks

no code implementations25 Sep 2019 Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin

We observe that these subnetworks match the accuracy of the full network only when two SGD runs for the same subnetwork are connected by linear paths with no change in test error.

Stochastic Neural Network with Kronecker Flow

no code implementations10 Jun 2019 Chin-wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville

Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to scale to the high-dimensional setting of stochastic neural networks.

Multi-Armed Bandits Thompson Sampling +1

Stabilizing the Lottery Ticket Hypothesis

3 code implementations5 Mar 2019 Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin

With this change, it finds small subnetworks of deeper networks (e.g., 80% sparsity on ResNet-50) that can complete the training process to match the accuracy of the original network on more challenging tasks (e.g., ImageNet).

Data-dependent PAC-Bayes priors via differential privacy

no code implementations NeurIPS 2018 Gintare Karolina Dziugaite, Daniel M. Roy

The Probably Approximately Correct (PAC) Bayes framework (McAllester, 1999) can incorporate knowledge about the learning algorithm and (data) distribution through the use of distribution-dependent priors, yielding tighter generalization bounds on data-dependent posteriors.

Generalization Bounds
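
One standard bound of this kind (a McAllester-style statement for [0,1]-valued loss): with probability at least $1-\delta$ over the sample, simultaneously for all posteriors $\rho$,

\[ R(\rho) \;\le\; \hat{R}(\rho) + \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}, \]

where $\pi$ is the prior; letting $\pi$ depend on the data distribution (here, through a differentially private mechanism) can shrink the KL term and hence the bound.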

Entropy-SGD optimizes the prior of a PAC-Bayes bound: Data-dependent PAC-Bayes priors via differential privacy

no code implementations ICLR 2018 Gintare Karolina Dziugaite, Daniel M. Roy

We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier.

Generalization Bounds

Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors

no code implementations ICML 2018 Gintare Karolina Dziugaite, Daniel M. Roy

We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier.

Generalization Bounds

Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

3 code implementations31 Mar 2017 Gintare Karolina Dziugaite, Daniel M. Roy

One of the defining properties of deep learning is that models are chosen to have many more parameters than available training data.

Generalization Bounds
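
A toy evaluation of a McAllester-style bound (same form as above) showing how such a bound can be nonvacuous for an overparameterized stochastic network once the KL term is small relative to the sample size; the figures below are made up for illustration, not the paper's results.

import math

def pac_bayes_bound(emp_risk, kl, n, delta=0.05):
    """McAllester-style PAC-Bayes risk bound (one standard form)."""
    return emp_risk + math.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))

# Illustrative numbers: 55k training points (MNIST-sized), small empirical risk.
print(pac_bayes_bound(emp_risk=0.028, kl=5000, n=55_000))  # ~0.24: nonvacuous (< 1)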

A study of the effect of JPG compression on adversarial images

no code implementations2 Aug 2016 Gintare Karolina Dziugaite, Zoubin Ghahramani, Daniel M. Roy

For Fast-Gradient-Sign perturbations of small magnitude, we found that JPG compression often reverses the drop in classification accuracy to a large extent, but not always.

Classification General Classification +1
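
The defense itself is a short round trip through the JPG codec; a minimal PIL sketch (function name ours) that re-encodes an image before classification:

import io
import numpy as np
from PIL import Image

def jpg_compress(image_uint8, quality=75):
    """Re-encode an image as JPG and decode it back; lossy compression can
    remove small Fast-Gradient-Sign perturbations before classification."""
    buf = io.BytesIO()
    Image.fromarray(image_uint8).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf))

# Usage: classify jpg_compress(adv_image) instead of adv_image, where adv_image
# is, e.g., a Fast-Gradient-Sign perturbed input (generation not shown).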

Neural Network Matrix Factorization

2 code implementations19 Nov 2015 Gintare Karolina Dziugaite, Daniel M. Roy

Here we consider replacing the inner product by an arbitrary function that we learn from the data at the same time as we learn the latent feature vectors.

Collaborative Filtering
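
A schematic of the idea in PyTorch (illustrative; layer sizes and the exact inputs to the learned function differ in the paper): the rating is predicted by a small network applied to the latent vectors rather than by their inner product.

import torch
import torch.nn as nn

class NNMF(nn.Module):
    """Matrix factorization with the inner product replaced by a learned
    function of the latent feature vectors (schematic of the paper's idea)."""
    def __init__(self, n_users, n_items, d=32, hidden=64):
        super().__init__()
        self.user = nn.Embedding(n_users, d)
        self.item = nn.Embedding(n_items, d)
        self.f = nn.Sequential(                      # learned replacement for <u, v>
            nn.Linear(2 * d, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, u, i):
        return self.f(torch.cat([self.user(u), self.item(i)], dim=-1)).squeeze(-1)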

Training generative neural networks via Maximum Mean Discrepancy optimization

no code implementations14 May 2015 Gintare Karolina Dziugaite, Daniel M. Roy, Zoubin Ghahramani

We frame learning as an optimization minimizing a two-sample test statistic -- informally speaking, a good generator network produces samples that cause a two-sample test to fail to reject the null hypothesis.
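
The two-sample statistic in question is the (squared) maximum mean discrepancy; a minimal PyTorch sketch with a Gaussian kernel (a biased estimator, for brevity), minimized over generator samples:

import torch

def mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD with a Gaussian kernel: the two-sample
    statistic that generator training minimizes."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Training sketch: backpropagate through mmd2(generator(noise), data_batch).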
