Search Results for author: Daniel M. Roy

Found 48 papers, 12 papers with code

Towards a Unified Information-Theoretic Framework for Generalization

no code implementations NeurIPS 2021 Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy

We further show that an inherent limitation of proper learning of VC classes contradicts the existence of a proper learner with constant CMI, and it implies a negative resolution to an open problem of Steinke and Zakynthinou (2020).

Generalization Bounds

Minimax Optimal Quantile and Semi-Adversarial Regret via Root-Logarithmic Regularizers

1 code implementation NeurIPS 2021 Jeffrey Negrea, Blair Bilodeau, Nicolò Campolongo, Francesco Orabona, Daniel M. Roy

Quantile (and, more generally, KL) regret bounds, such as those achieved by NormalHedge (Chaudhuri, Freund, and Hsu 2009) and its variants, relax the goal of competing against the best individual expert to only competing against a majority of experts on adversarial data.

The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization

no code implementations NeurIPS 2021 Mufan Bill Li, Mihai Nica, Daniel M. Roy

To provide a better approximation, we study ReLU ResNets in the infinite-depth-and-width limit, where both depth and width tend to infinity as their ratio, $d/n$, remains constant.

Gaussian Processes

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

no code implementations 28 Apr 2021 Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training.

Quantization

Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

no code implementations 1 Feb 2021 Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy

The key factors our bounds depend on are the variance of the gradients (with respect to the data distribution), the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations to the final output.

Generalization Bounds Stochastic Optimization

On the Information Complexity of Proper Learners for VC Classes in the Realizable Case

no code implementations 5 Nov 2020 Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy

We provide a negative resolution to a conjecture of Steinke and Zakynthinou (2020a), by showing that their bound on the conditional mutual information (CMI) of proper learners of Vapnik--Chervonenkis (VC) classes cannot be improved from $d \log n +2$ to $O(d)$, where $n$ is the number of i.i.d.

Enforcing Interpretability and its Statistical Impacts: Trade-offs between Accuracy and Interpretability

no code implementations 26 Oct 2020 Gintare Karolina Dziugaite, Shai Ben-David, Daniel M. Roy

We then model the act of enforcing interpretability as that of performing empirical risk minimization over the set of interpretable hypotheses.

Learning Theory

In Search of Robust Measures of Generalization

1 code implementation NeurIPS 2020 Gintare Karolina Dziugaite, Alexandre Drouin, Brady Neal, Nitarshan Rajkumar, Ethan Caballero, Linbo Wang, Ioannis Mitliagkas, Daniel M. Roy

A large volume of work aims to close this gap, primarily by developing bounds on generalization error, optimization error, and excess risk.

Generalization Bounds

Relaxing the I.I.D. Assumption: Adaptively Minimax Optimal Regret via Root-Entropic Regularization

no code implementations 13 Jul 2020 Blair Bilodeau, Jeffrey Negrea, Daniel M. Roy

We show that Hedge with deterministic learning rates is suboptimal between these extremes, and present a new algorithm that adaptively achieves the minimax optimal rate of regret with respect to our relaxations of the i.i.d.

Tight Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance

no code implementations 2 Jul 2020 Blair Bilodeau, Dylan J. Foster, Daniel M. Roy

We consider the classical problem of sequential probability assignment under logarithmic loss while competing against an arbitrary, potentially nonparametric class of experts.

On the role of data in PAC-Bayes bounds

no code implementations 19 Jun 2020 Gintare Karolina Dziugaite, Kyle Hsu, Waseem Gharbieh, Gabriel Arpino, Daniel M. Roy

In this work, we show that the bound based on the oracle prior can be suboptimal: in some cases, a stronger bound is obtained by using a data-dependent oracle prior, i.e., a conditional expectation of the posterior, given a subset of the training data that is then excluded from the empirical risk term.

Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms

no code implementations NeurIPS 2020 Mahdi Haghifam, Jeffrey Negrea, Ashish Khisti, Daniel M. Roy, Gintare Karolina Dziugaite

Finally, we apply these bounds to the study of the Langevin dynamics algorithm, showing that conditioning on the super sample allows us to exploit information in the optimization trajectory to obtain tighter bounds based on hypothesis tests.

Generalization Bounds

Linear Mode Connectivity and the Lottery Ticket Hypothesis

1 code implementation ICML 2020 Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin

We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation).

In Defense of Uniform Convergence: Generalization via derandomization with an application to interpolating predictors

no code implementations 9 Dec 2019 Jeffrey Negrea, Gintare Karolina Dziugaite, Daniel M. Roy

At the same time, we bound the risk of $\hat h$ in terms of surrogates constructed by conditioning and denoising, respectively, and shown to belong to nonrandom classes with uniformly small generalization error.

Denoising

Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates

1 code implementation NeurIPS 2019 Jeffrey Negrea, Mahdi Haghifam, Gintare Karolina Dziugaite, Ashish Khisti, Daniel M. Roy

In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019).

Generalization Bounds

Mode Connectivity and Sparse Neural Networks

no code implementations 25 Sep 2019 Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin

We observe that these subnetworks match the accuracy of the full network only when two SGD runs for the same subnetwork are connected by linear paths with no change in test error.

Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

no code implementations 25 Sep 2019 Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel.

Quantization

Black-box constructions for exchangeable sequences of random multisets

no code implementations 17 Aug 2019 Creighton Heaukulani, Daniel M. Roy

We develop constructions for exchangeable sequences of point processes that are rendered conditionally i.i.d.

Point Processes

NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization

1 code implementation 16 Aug 2019 Ali Ramezani-Kebrya, Fartash Faghri, Daniel M. Roy

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel.

Quantization

Stabilizing the Lottery Ticket Hypothesis

3 code implementations 5 Mar 2019 Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin

With this change, it finds small subnetworks of deeper networks (e.g., 80% sparsity on ResNet-50) that can complete the training process to match the accuracy of the original network on more challenging tasks (e.g., ImageNet).

Data-dependent PAC-Bayes priors via differential privacy

no code implementations NeurIPS 2018 Gintare Karolina Dziugaite, Daniel M. Roy

The Probably Approximately Correct (PAC) Bayes framework (McAllester, 1999) can incorporate knowledge about the learning algorithm and (data) distribution through the use of distribution-dependent priors, yielding tighter generalization bounds on data-dependent posteriors.

Generalization Bounds

Entropy-SGD optimizes the prior of a PAC-Bayes bound: Data-dependent PAC-Bayes priors via differential privacy

no code implementations ICLR 2018 Gintare Karolina Dziugaite, Daniel M. Roy

We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier.

Generalization Bounds

Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors

no code implementations ICML 2018 Gintare Karolina Dziugaite, Daniel M. Roy

We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier.

Generalization Bounds

Exchangeable modelling of relational data: checking sparsity, train-test splitting, and sparse exchangeable Poisson matrix factorization

1 code implementation 6 Dec 2017 Victor Veitch, Ekansh Sharma, Zacharie Naulet, Daniel M. Roy

A variety of machine learning tasks---e.g., matrix factorization, topic modelling, and feature allocation---can be viewed as learning the parameters of a probability distribution over bipartite graphs.

Variational Inference

Bootstrap estimators for the tail-index and for the count statistics of graphex processes

1 code implementation 5 Dec 2017 Zacharie Naulet, Ekansh Sharma, Victor Veitch, Daniel M. Roy

Graphex processes resolve some pathologies in traditional random graph models, notably, providing models that are both projective and allow sparsity.

Statistics Theory Primary 62F10, secondary 60G55, 60G70

Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

1 code implementation 31 Mar 2017 Gintare Karolina Dziugaite, Daniel M. Roy

One of the defining properties of deep learning is that models are chosen to have many more parameters than available training data.

Generalization Bounds

A study of the effect of JPG compression on adversarial images

no code implementations 2 Aug 2016 Gintare Karolina Dziugaite, Zoubin Ghahramani, Daniel M. Roy

For Fast-Gradient-Sign perturbations of small magnitude, we found that JPG compression often reverses the drop in classification accuracy to a large extent, but not always.

Classification General Classification

A characterization of product-form exchangeable feature probability functions

no code implementations 7 Jul 2016 Marco Battiston, Stefano Favaro, Daniel M. Roy, Yee Whye Teh

We characterize the class of exchangeable feature allocations assigning probability $V_{n, k}\prod_{l=1}^{k}W_{m_{l}}U_{n-m_{l}}$ to a feature allocation of $n$ individuals, displaying $k$ features with counts $(m_{1},\ldots, m_{k})$ for these features.

The Mondrian Kernel

no code implementations 16 Jun 2016 Matej Balog, Balaji Lakshminarayanan, Zoubin Ghahramani, Daniel M. Roy, Yee Whye Teh

We introduce the Mondrian kernel, a fast random feature approximation to the Laplace kernel.

Measuring the reliability of MCMC inference with bidirectional Monte Carlo

no code implementations NeurIPS 2016 Roger B. Grosse, Siddharth Ancha, Daniel M. Roy

Markov chain Monte Carlo (MCMC) is one of the main workhorses of probabilistic inference, but it is notoriously hard to measure the quality of approximate posterior samples.

Probabilistic Programming

Gibbs-type Indian buffet processes

no code implementations 8 Dec 2015 Creighton Heaukulani, Daniel M. Roy

We investigate a class of feature allocation models that generalize the Indian buffet process and are parameterized by Gibbs-type random measures.

Neural Network Matrix Factorization

2 code implementations 19 Nov 2015 Gintare Karolina Dziugaite, Daniel M. Roy

Here we consider replacing the inner product by an arbitrary function that we learn from the data at the same time as we learn the latent feature vectors.

Collaborative Filtering

Mondrian Forests for Large-Scale Regression when Uncertainty Matters

1 code implementation 11 Jun 2015 Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh

Many real-world regression problems demand a measure of the uncertainty associated with each prediction.

Gaussian Processes

Training generative neural networks via Maximum Mean Discrepancy optimization

no code implementations 14 May 2015 Gintare Karolina Dziugaite, Daniel M. Roy, Zoubin Ghahramani

We frame learning as an optimization minimizing a two-sample test statistic---informally speaking, a good generator network produces samples that cause a two-sample test to fail to reject the null hypothesis.

Particle Gibbs for Bayesian Additive Regression Trees

no code implementations 16 Feb 2015 Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh

Additive regression trees are flexible non-parametric models and popular off-the-shelf tools for real-world non-linear regression.

The continuum-of-urns scheme, generalized beta and Indian buffet processes, and hierarchies thereof

no code implementations 31 Dec 2014 Daniel M. Roy

We describe the combinatorial stochastic process underlying a sequence of conditionally independent Bernoulli processes with a shared beta process hazard measure.

Mondrian Forests: Efficient Online Random Forests

2 code implementations NeurIPS 2014 Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh

Ensembles of randomized decision trees, usually referred to as random forests, are widely used for classification and regression tasks in machine learning and statistics.

The combinatorial structure of beta negative binomial processes

no code implementations 31 Dec 2013 Creighton Heaukulani, Daniel M. Roy

sequences of Bernoulli processes with a common beta process base measure, in which case the combinatorial structure is described by the Indian buffet process.

Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures

no code implementations 30 Dec 2013 Peter Orbanz, Daniel M. Roy

The natural habitat of most Bayesian methods is data represented by exchangeable sequences of observations, for which de Finetti's theorem provides the theoretical foundation.

Collaborative Filtering Link Prediction

Top-down particle filtering for Bayesian decision trees

no code implementations 3 Mar 2013 Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh

Unlike classic decision tree learning algorithms like ID3, C4.5 and CART, which work in a top-down manner, existing Bayesian algorithms produce an approximation to the posterior distribution by evolving a complete tree (or collection thereof) iteratively via local Monte Carlo modifications to the structure of the tree, e.g., using Markov chain Monte Carlo (MCMC).

Towards common-sense reasoning via conditional simulation: legacies of Turing in Artificial Intelligence

no code implementations 19 Dec 2012 Cameron E. Freer, Daniel M. Roy, Joshua B. Tenenbaum

In the intervening years, the idea of cognition as computation has emerged as a fundamental tenet of Artificial Intelligence (AI) and cognitive science.

Common Sense Reasoning

Church: a language for generative models

no code implementations 13 Jun 2012 Noah Goodman, Vikash Mansinghka, Daniel M. Roy, Keith Bonawitz, Joshua B. Tenenbaum

We introduce Church, a universal language for describing stochastic generative processes.

On the computability of conditional probability

no code implementations 17 May 2010 Nathanael L. Ackerman, Cameron E. Freer, Daniel M. Roy

Specifically, we construct a pair of computable random variables in the unit interval such that the conditional distribution of the first variable given the second encodes the halting problem.

The Mondrian Process

no code implementations NeurIPS 2008 Daniel M. Roy, Yee W. Teh

We describe a novel stochastic process that can be used to construct a multidimensional generalization of the stick-breaking process and which is related to the classic stick-breaking process described by Sethuraman (1994) in one dimension.
