no code implementations • 27 Dec 2022 • Mahdi Haghifam, Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund, Daniel M. Roy, Gintare Karolina Dziugaite
To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization.
no code implementations • 25 Oct 2022 • Tian Jin, Michael Carbin, Daniel M. Roy, Jonathan Frankle, Gintare Karolina Dziugaite
Pruning models in this over-parameterized regime leads to a contradiction -- while theory predicts that reducing model size harms generalization, pruning to a range of sparsities nonetheless improves it.
no code implementations • 25 Jul 2022 • Jeffrey Negrea, Jun Yang, Haoyue Feng, Daniel M. Roy, Jonathan H. Huggins
Tuning of stochastic gradient algorithms (SGAs) for optimization and sampling is often based on heuristics and trial-and-error rather than generalizable theory.
no code implementations • 29 Jun 2022 • Mahdi Haghifam, Shay Moran, Daniel M. Roy, Gintare Karolina Dziugaite
These leave-one-out variants of the conditional mutual information (CMI) of an algorithm (Steinke and Zakynthinou, 2020) are also seen to control the mean generalization error of learning algorithms with bounded loss functions.
no code implementations • 6 Jun 2022 • Mufan Bill Li, Mihai Nica, Daniel M. Roy
In this work, we study the distribution of this random matrix.
1 code implementation • 10 Feb 2022 • Blair Bilodeau, Linbo Wang, Daniel M. Roy
In this work, we formalize and study this notion of adaptivity, and provide a novel algorithm that simultaneously achieves (a) optimal regret when a d-separator is observed, improving on classical minimax algorithms, and (b) significantly smaller regret than recent causal bandit algorithms when the observed variables are not a d-separator.
no code implementations • NeurIPS 2021 • Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy
We further show that an inherent limitation of proper learning of VC classes contradicts the existence of a proper learner with constant CMI; this implies a negative resolution to an open problem of Steinke and Zakynthinou (2020).
1 code implementation • NeurIPS 2021 • Jeffrey Negrea, Blair Bilodeau, Nicolò Campolongo, Francesco Orabona, Daniel M. Roy
Quantile (and, more generally, KL) regret bounds, such as those achieved by NormalHedge (Chaudhuri, Freund, and Hsu 2009) and its variants, relax the goal of competing against the best individual expert to only competing against a majority of experts on adversarial data.
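For readers unfamiliar with the algorithm named above, here is a minimal from-memory sketch of the NormalHedge-style potential weighting (Chaudhuri, Freund, and Hsu 2009); this is an illustration only, not the authors' code, and the toy loop at the end uses made-up losses.

```python
import numpy as np

def normalhedge_weights(regrets):
    """NormalHedge weights from cumulative regrets.

    Solve for c > 0 such that (1/N) * sum_i exp([R_i]_+^2 / (2c)) = e,
    then set w_i proportional to ([R_i]_+ / c) * exp([R_i]_+^2 / (2c)).
    """
    r = np.maximum(regrets, 0.0)
    if np.all(r == 0):                      # no expert is ahead: fall back to uniform
        return np.full(len(r), 1.0 / len(r))

    def avg_potential(c):
        return np.mean(np.exp(r ** 2 / (2.0 * c))) - np.e

    lo, hi = 1e-12, 1.0                     # avg_potential is decreasing in c
    while avg_potential(hi) > 0:
        hi *= 2.0
    for _ in range(100):                    # bisection for the root
        mid = 0.5 * (lo + hi)
        if avg_potential(mid) > 0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    w = (r / c) * np.exp(r ** 2 / (2.0 * c))
    return w / w.sum()

# Toy prediction-with-expert-advice loop (losses are synthetic):
rng = np.random.default_rng(0)
regrets = np.zeros(5)
for t in range(100):
    w = normalhedge_weights(regrets)
    expert_losses = rng.uniform(0, 1, size=5)
    algo_loss = w @ expert_losses
    regrets += algo_loss - expert_losses    # instantaneous regret to each expert
```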
no code implementations • NeurIPS 2021 • Mufan Bill Li, Mihai Nica, Daniel M. Roy
To provide a better approximation, we study ReLU ResNets in the infinite-depth-and-width limit, where both depth and width tend to infinity as their ratio, $d/n$, remains constant.
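A quick numerical illustration (not from the paper) of why the ratio $d/n$ is the relevant quantity: simulate the output of a randomly initialized ReLU network and observe that the fluctuations of the log squared norm grow with depth relative to width.

```python
import numpy as np

def log_sq_norm_after_relu_layers(depth, width, trials=500, seed=0):
    """Log squared output norm of a He-initialized ReLU MLP at a fixed unit-norm input."""
    rng = np.random.default_rng(seed)
    out = np.empty(trials)
    for t in range(trials):
        x = np.ones(width) / np.sqrt(width)
        for _ in range(depth):
            W = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
            x = np.maximum(W @ x, 0.0)
        out[t] = np.log(np.sum(x ** 2) + 1e-300)
    return out

# The spread of log ||x_d||^2 grows with the depth-to-width ratio d/n, which is why
# a joint limit with d/n held fixed gives a sharper picture than width alone.
for depth, width in [(4, 64), (16, 64), (64, 64)]:
    samples = log_sq_norm_after_relu_layers(depth, width)
    print(depth / width, samples.std())
```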
no code implementations • 28 Apr 2021 • Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy
As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training.
no code implementations • 1 Feb 2021 • Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy
Our bounds depend on three key factors: the variance of the gradients (with respect to the data distribution), the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations of the final output.
no code implementations • 14 Dec 2020 • Yiding Jiang, Pierre Foret, Scott Yak, Daniel M. Roy, Hossein Mobahi, Gintare Karolina Dziugaite, Samy Bengio, Suriya Gunasekar, Isabelle Guyon, Behnam Neyshabur
Understanding generalization is arguably one of the most important open questions in deep learning.
no code implementations • 5 Nov 2020 • Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy
We provide a negative resolution to a conjecture of Steinke and Zakynthinou (2020a), by showing that their bound on the conditional mutual information (CMI) of proper learners of Vapnik--Chervonenkis (VC) classes cannot be improved from $d \log n + 2$ to $O(d)$, where $n$ is the number of i.i.d. training examples.
no code implementations • NeurIPS 2020 • Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli
We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK.
no code implementations • 26 Oct 2020 • Gintare Karolina Dziugaite, Shai Ben-David, Daniel M. Roy
We then model the act of enforcing interpretability as that of performing empirical risk minimization over the set of interpretable hypotheses.
1 code implementation • NeurIPS 2020 • Gintare Karolina Dziugaite, Alexandre Drouin, Brady Neal, Nitarshan Rajkumar, Ethan Caballero, Linbo Wang, Ioannis Mitliagkas, Daniel M. Roy
A large volume of work aims to close this gap, primarily by developing bounds on generalization error, optimization error, and excess risk.
no code implementations • ICLR 2021 • Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
Recent work has explored the possibility of pruning neural networks at initialization.
no code implementations • 13 Jul 2020 • Blair Bilodeau, Jeffrey Negrea, Daniel M. Roy
This framework recovers the classical i.i.d. setting, when the unknown constraint set is restricted to be a singleton, and the unconstrained adversarial setting, when the constraint set is the set of all distributions.
no code implementations • 2 Jul 2020 • Blair Bilodeau, Dylan J. Foster, Daniel M. Roy
We consider the classical problem of sequential probability assignment under logarithmic loss while competing against an arbitrary, potentially nonparametric class of experts.
no code implementations • 19 Jun 2020 • Gintare Karolina Dziugaite, Kyle Hsu, Waseem Gharbieh, Gabriel Arpino, Daniel M. Roy
In this work, we show that the bound based on the oracle prior can be suboptimal: in some cases, a stronger bound is obtained by using a data-dependent oracle prior, i.e., a conditional expectation of the posterior, given a subset of the training data that is then excluded from the empirical risk term.
no code implementations • NeurIPS 2020 • Mahdi Haghifam, Jeffrey Negrea, Ashish Khisti, Daniel M. Roy, Gintare Karolina Dziugaite
Finally, we apply these bounds to the study of the Langevin dynamics algorithm, showing that conditioning on the supersample allows us to exploit information in the optimization trajectory to obtain tighter bounds based on hypothesis tests.
2 code implementations • ICML 2020 • Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation).
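A minimal sketch of the kind of check this describes: evaluate the test error along the straight line between two trained networks and report the barrier above the better endpoint. This is an illustration only; `model_a`, `model_b`, and `evaluate` are placeholders supplied by the caller.

```python
import copy
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Linearly interpolate two networks' parameters: (1 - alpha) * a + alpha * b."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

@torch.no_grad()
def error_barrier(model_a, model_b, evaluate, num_points=11):
    """Largest test error along the segment between two networks, minus the better endpoint.

    `evaluate(model) -> test error` is assumed to be supplied by the caller.
    A barrier near zero indicates the two SGD runs are linearly mode connected.
    """
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    best_endpoint = min(evaluate(model_a), evaluate(model_b))
    probe = copy.deepcopy(model_a)
    worst = -float("inf")
    for alpha in torch.linspace(0, 1, num_points):
        probe.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha.item()))
        worst = max(worst, evaluate(probe))
    return worst - best_endpoint
```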
no code implementations • 9 Dec 2019 • Jeffrey Negrea, Gintare Karolina Dziugaite, Daniel M. Roy
At the same time, we bound the risk of $\hat h$ in terms of surrogates constructed by conditioning and denoising, respectively, which we show belong to nonrandom classes with uniformly small generalization error.
1 code implementation • NeurIPS 2019 • Jeffrey Negrea, Mahdi Haghifam, Gintare Karolina Dziugaite, Ashish Khisti, Daniel M. Roy
In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019).
no code implementations • 25 Sep 2019 • Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy
As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel.
no code implementations • 25 Sep 2019 • Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
We observe that these subnetworks match the accuracy of the full network only when two SGD runs for the same subnetwork are connected by linear paths with no change in test error.
no code implementations • NeurIPS 2019 • Jun Yang, Shengyang Sun, Daniel M. Roy
The developments of Rademacher complexity and PAC-Bayesian theory have been largely independent.
no code implementations • 17 Aug 2019 • Creighton Heaukulani, Daniel M. Roy
We develop constructions for exchangeable sequences of point processes that are rendered conditionally i.i.d.
1 code implementation • 16 Aug 2019 • Ali Ramezani-Kebrya, Fartash Faghri, Daniel M. Roy
As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel.
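To make "communication-efficient" concrete, here is a generic stochastic gradient-quantization sketch in the spirit of QSGD-style compression. It is not the specific scheme proposed in this paper; it only illustrates the unbiased-quantization idea.

```python
import numpy as np

def stochastic_quantize(g, num_levels=16, rng=None):
    """Unbiased stochastic quantization of a gradient vector (uniform levels).

    Each coordinate is encoded by the vector norm, its sign, and an integer level in
    {0, ..., num_levels}, so it can be communicated with a few bits per coordinate.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(g)
    if norm == 0:
        return np.zeros_like(g)
    scaled = np.abs(g) / norm * num_levels          # in [0, num_levels]
    lower = np.floor(scaled)
    prob_up = scaled - lower                        # round up with this probability
    levels = lower + (rng.uniform(size=g.shape) < prob_up)
    return np.sign(g) * norm * levels / num_levels  # E[output] == g

# Sanity check of unbiasedness on a synthetic gradient:
rng = np.random.default_rng(0)
g = rng.normal(size=1000)
avg = np.mean([stochastic_quantize(g, rng=rng) for _ in range(2000)], axis=0)
print(np.max(np.abs(avg - g)))   # should be small
```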
3 code implementations • 5 Mar 2019 • Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
With this change, it finds small subnetworks of deeper networks (e.g., 80% sparsity on ResNet-50) that can complete the training process to match the accuracy of the original network on more challenging tasks (e.g., ImageNet).
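A minimal sketch of iterative magnitude pruning with weight rewinding, the kind of procedure described here. It is an illustration under assumptions: `train` is a placeholder training routine, the pruning schedule is made up, and this is not the authors' implementation.

```python
import torch

def global_magnitude_mask(model, sparsity, current_mask):
    """Keep the largest-magnitude weights globally; prune the rest."""
    scores = torch.cat([(p.abs() * m).flatten()
                        for p, m in zip(model.parameters(), current_mask)])
    k = int(sparsity * scores.numel())
    threshold = torch.kthvalue(scores, k).values
    return [(p.abs() * m > threshold).float()
            for p, m in zip(model.parameters(), current_mask)]

def imp_with_rewinding(model, train, rewind_step, rounds=5, per_round=0.2):
    """Iterative magnitude pruning with rewinding (sketch; `train` is a placeholder).

    1. Train briefly and store the weights at `rewind_step` (instead of at init).
    2. Repeatedly: train to completion, prune the smallest surviving weights globally,
       rewind the remaining weights to the stored checkpoint, and retrain.
    """
    mask = [torch.ones_like(p) for p in model.parameters()]
    train(model, mask, steps=rewind_step)
    rewind_state = [p.detach().clone() for p in model.parameters()]
    sparsity = 0.0
    for _ in range(rounds):
        train(model, mask, steps=None)                  # train to completion
        sparsity = 1.0 - (1.0 - sparsity) * (1.0 - per_round)
        mask = global_magnitude_mask(model, sparsity, mask)
        with torch.no_grad():                           # rewind surviving weights
            for p, w0, m in zip(model.parameters(), rewind_state, mask):
                p.copy_(w0 * m)
    return mask
```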
no code implementations • NeurIPS 2018 • Gintare Karolina Dziugaite, Daniel M. Roy
The Probably Approximately Correct (PAC) Bayes framework (McAllester, 1999) can incorporate knowledge about the learning algorithm and (data) distribution through the use of distribution-dependent priors, yielding tighter generalization bounds on data-dependent posteriors.
no code implementations • ICLR 2018 • Gintare Karolina Dziugaite, Daniel M. Roy
We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier.
no code implementations • ICML 2018 • Gintare Karolina Dziugaite, Daniel M. Roy
We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier.
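To make the "Gibbs (posterior) classifier" concrete: classify with weights drawn afresh from a distribution centered at the learned weights, and estimate its risk by averaging over draws. The sketch below assumes an isotropic Gaussian perturbation for illustration; `model` and `loader` are placeholders.

```python
import torch

@torch.no_grad()
def gibbs_classifier_error(model, loader, sigma=0.01, num_samples=10):
    """Monte Carlo estimate of the error of a Gibbs classifier.

    The Gibbs classifier draws fresh weights w + eps, eps ~ N(0, sigma^2 I),
    for every prediction; here we average the test error over a few such draws.
    (Sketch only; `model` and `loader` are placeholders.)
    """
    clean_state = [p.detach().clone() for p in model.parameters()]
    errors = []
    for _ in range(num_samples):
        for p, w in zip(model.parameters(), clean_state):
            p.copy_(w + sigma * torch.randn_like(w))
        wrong = total = 0
        for x, y in loader:
            wrong += (model(x).argmax(dim=1) != y).sum().item()
            total += y.numel()
        errors.append(wrong / total)
    for p, w in zip(model.parameters(), clean_state):   # restore original weights
        p.copy_(w)
    return sum(errors) / len(errors)
```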
1 code implementation • 6 Dec 2017 • Victor Veitch, Ekansh Sharma, Zacharie Naulet, Daniel M. Roy
A variety of machine learning tasks (e.g., matrix factorization, topic modelling, and feature allocation) can be viewed as learning the parameters of a probability distribution over bipartite graphs.
1 code implementation • 5 Dec 2017 • Zacharie Naulet, Ekansh Sharma, Victor Veitch, Daniel M. Roy
Graphex processes resolve some pathologies in traditional random graph models, notably, providing models that are both projective and allow sparsity.
Subjects: Statistics Theory; MSC: Primary 62F10, Secondary 60G55, 60G70
2 code implementations • 31 Mar 2017 • Gintare Karolina Dziugaite, Daniel M. Roy
One of the defining properties of deep learning is that models are chosen to have many more parameters than available training data.
no code implementations • 2 Aug 2016 • Gintare Karolina Dziugaite, Zoubin Ghahramani, Daniel M. Roy
For Fast-Gradient-Sign perturbations of small magnitude, we find that JPG compression often largely reverses the drop in classification accuracy, but not always.
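A minimal sketch of the preprocessing step studied here: round-trip the (possibly adversarial) image through JPEG compression before classifying it. `classify` and `adversarial_image` are placeholders, and the quality setting is illustrative.

```python
import io
import numpy as np
from PIL import Image

def jpeg_round_trip(image_uint8, quality=75):
    """Compress an HxWx3 uint8 image to JPEG and decode it back.

    JPEG's lossy quantization tends to wash out small Fast-Gradient-Sign
    perturbations, which is the effect studied in the paper.
    """
    buf = io.BytesIO()
    Image.fromarray(image_uint8).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf))

# Usage sketch:
# label_before = classify(adversarial_image)
# label_after  = classify(jpeg_round_trip(adversarial_image, quality=75))
```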
no code implementations • 7 Jul 2016 • Marco Battiston, Stefano Favaro, Daniel M. Roy, Yee Whye Teh
We characterize the class of exchangeable feature allocations assigning probability $V_{n, k}\prod_{l=1}^{k}W_{m_{l}}U_{n-m_{l}}$ to a feature allocation of $n$ individuals, displaying $k$ features with counts $(m_{1},\ldots, m_{k})$ for these features.
no code implementations • 16 Jun 2016 • Matej Balog, Balaji Lakshminarayanan, Zoubin Ghahramani, Daniel M. Roy, Yee Whye Teh
We introduce the Mondrian kernel, a fast random feature approximation to the Laplace kernel.
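To convey the flavor of such approximations: averaging, over independent random axis-aligned partitions, the indicator that two points fall in the same cell can recover the Laplace kernel exp(-lam * ||x - y||_1). The sketch below uses a random-binning construction with Gamma-distributed grid pitches, which is in the same spirit but is not the authors' Mondrian construction.

```python
import numpy as np

def same_cell_indicator(x, y, lam, rng):
    """One random axis-aligned partition: are x and y in the same cell?

    Per dimension, the grid pitch is drawn from Gamma(shape=2, rate=lam) and the
    offset uniformly, which makes the same-cell probability exp(-lam * |x_j - y_j|).
    """
    d = len(x)
    pitch = rng.gamma(shape=2.0, scale=1.0 / lam, size=d)
    offset = rng.uniform(0.0, pitch)
    return np.all(np.floor((x - offset) / pitch) == np.floor((y - offset) / pitch))

def approx_laplace_kernel(x, y, lam=1.0, num_partitions=2000, seed=0):
    """Monte Carlo approximation of exp(-lam * ||x - y||_1) by random binning."""
    rng = np.random.default_rng(seed)
    hits = sum(same_cell_indicator(x, y, lam, rng) for _ in range(num_partitions))
    return hits / num_partitions

x, y = np.array([0.2, 0.7]), np.array([0.5, 0.4])
print(approx_laplace_kernel(x, y, lam=1.0), np.exp(-np.sum(np.abs(x - y))))
```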
no code implementations • NeurIPS 2016 • Roger B. Grosse, Siddharth Ancha, Daniel M. Roy
Markov chain Monte Carlo (MCMC) is one of the main workhorses of probabilistic inference, but it is notoriously hard to measure the quality of approximate posterior samples.
no code implementations • 8 Dec 2015 • Creighton Heaukulani, Daniel M. Roy
We investigate a class of feature allocation models that generalize the Indian buffet process and are parameterized by Gibbs-type random measures.
2 code implementations • 19 Nov 2015 • Gintare Karolina Dziugaite, Daniel M. Roy
Here we consider replacing the inner product by an arbitrary function that we learn from the data at the same time as we learn the latent feature vectors.
Ranked #9 on Recommendation Systems on MovieLens 1M
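A hedged sketch of the idea of replacing the inner product with a learned function: concatenate user and item feature vectors and feed them through a small MLP. The architecture, sizes, and training loop below are illustrative, not the authors' model.

```python
import torch
import torch.nn as nn

class NeuralMatrixFactorization(nn.Module):
    """Matrix completion where the bilinear score u . v is replaced by a learned MLP."""
    def __init__(self, num_users, num_items, dim=32, hidden=64):
        super().__init__()
        self.user = nn.Embedding(num_users, dim)
        self.item = nn.Embedding(num_items, dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, user_ids, item_ids):
        z = torch.cat([self.user(user_ids), self.item(item_ids)], dim=-1)
        return self.mlp(z).squeeze(-1)   # predicted rating

# Usage sketch: regression on observed (user, item, rating) triples.
model = NeuralMatrixFactorization(num_users=1000, num_items=2000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
users = torch.tensor([0, 1, 2]); items = torch.tensor([10, 20, 30])
ratings = torch.tensor([4.0, 3.0, 5.0])
loss = nn.functional.mse_loss(model(users, items), ratings)
loss.backward(); optimizer.step()
```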
1 code implementation • 11 Jun 2015 • Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh
Many real-world regression problems demand a measure of the uncertainty associated with each prediction.
no code implementations • 14 May 2015 • Gintare Karolina Dziugaite, Daniel M. Roy, Zoubin Ghahramani
We frame learning as an optimization problem minimizing a two-sample test statistic: informally speaking, a good generator network produces samples that cause a two-sample test to fail to reject the null hypothesis.
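One concrete instance of such a statistic is the kernel maximum mean discrepancy (MMD). Below is a minimal unbiased MMD^2 estimator with a Gaussian kernel, for illustration only; the generator referenced in the comment is a placeholder.

```python
import torch

def gaussian_kernel(a, b, bandwidth=1.0):
    """k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = torch.cdist(a, b) ** 2
    return torch.exp(-sq_dists / (2 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between samples x (data) and y (generated).

    Driving this statistic toward zero trains the generator to produce samples
    a kernel two-sample test cannot distinguish from the data.
    """
    m, n = x.shape[0], y.shape[0]
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    term_xx = (k_xx.sum() - k_xx.diagonal().sum()) / (m * (m - 1))
    term_yy = (k_yy.sum() - k_yy.diagonal().sum()) / (n * (n - 1))
    return term_xx + term_yy - 2 * k_xy.mean()

# Usage sketch: `generator(noise)` is a placeholder network trained by
# minimizing mmd2_unbiased(real_batch, generator(noise)) with backprop.
```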
no code implementations • 16 Feb 2015 • Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh
Additive regression trees are flexible non-parametric models and popular off-the-shelf tools for real-world non-linear regression.
no code implementations • 31 Dec 2014 • Daniel M. Roy
We describe the combinatorial stochastic process underlying a sequence of conditionally independent Bernoulli processes with a shared beta process hazard measure.
2 code implementations • NeurIPS 2014 • Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh
Ensembles of randomized decision trees, usually referred to as random forests, are widely used for classification and regression tasks in machine learning and statistics.
no code implementations • 31 Dec 2013 • Creighton Heaukulani, Daniel M. Roy
This generalizes conditionally i.i.d. sequences of Bernoulli processes with a common beta process base measure, for which the combinatorial structure is described by the Indian buffet process.
no code implementations • 30 Dec 2013 • Peter Orbanz, Daniel M. Roy
The natural habitat of most Bayesian methods is data represented by exchangeable sequences of observations, for which de Finetti's theorem provides the theoretical foundation.
no code implementations • 3 Mar 2013 • Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh
Unlike classic decision tree learning algorithms like ID3, C4.5, and CART, which work in a top-down manner, existing Bayesian algorithms produce an approximation to the posterior distribution by evolving a complete tree (or collection thereof) iteratively via local Monte Carlo modifications to the structure of the tree, e.g., using Markov chain Monte Carlo (MCMC).
no code implementations • 19 Dec 2012 • Cameron E. Freer, Daniel M. Roy, Joshua B. Tenenbaum
In the intervening years, the idea of cognition as computation has emerged as a fundamental tenet of Artificial Intelligence (AI) and cognitive science.
no code implementations • 13 Jun 2012 • Noah Goodman, Vikash Mansinghka, Daniel M. Roy, Keith Bonawitz, Joshua B. Tenenbaum
We introduce Church, a universal language for describing stochastic generative processes.
no code implementations • 17 May 2010 • Nathanael L. Ackerman, Cameron E. Freer, Daniel M. Roy
Specifically, we construct a pair of computable random variables in the unit interval such that the conditional distribution of the first variable given the second encodes the halting problem.
no code implementations • NeurIPS 2008 • Daniel M. Roy, Yee W. Teh
We describe a novel stochastic process that can be used to construct a multidimensional generalization of the stick-breaking process and which is related to the classic stick-breaking process described by Sethuraman (1994) in one dimension.
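For reference, a minimal sampler for the classic one-dimensional stick-breaking construction of Sethuraman (1994) that this process generalizes; the concentration parameter and truncation level below are illustrative.

```python
import numpy as np

def stick_breaking_weights(alpha, num_sticks, rng=None):
    """Classic one-dimensional stick-breaking (Sethuraman, 1994).

    Draw beta_k ~ Beta(1, alpha) and set w_k = beta_k * prod_{j<k} (1 - beta_j):
    break off a Beta-distributed fraction of whatever stick remains. The weights
    sum to 1 as num_sticks grows.
    """
    rng = rng or np.random.default_rng()
    betas = rng.beta(1.0, alpha, size=num_sticks)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

w = stick_breaking_weights(alpha=2.0, num_sticks=20, rng=np.random.default_rng(0))
print(w.sum())   # close to 1 for enough sticks
```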