Search Results for author: Christopher De Sa

Found 44 papers, 15 papers with code

Equivariant Manifold Flows

no code implementations 19 Jul 2021 Isay Katsman, Aaron Lou, Derek Lim, Qingxuan Jiang, Ser-Nam Lim, Christopher De Sa

Tractably modelling distributions over manifolds has long been an important goal in the natural sciences.

How Low Can We Go: Trading Memory for Error in Low-Precision Training

no code implementations 17 Jun 2021 Chengrun Yang, Ziyang Wu, Jerry Chee, Christopher De Sa, Madeleine Udell

Low-precision arithmetic trains deep learning models using less energy, less memory and less time.


Variance Reduced Training with Stratified Sampling for Forecasting Models

no code implementations 2 Mar 2021 Yucheng Lu, Youngsuk Park, Lifan Chen, Yuyang Wang, Christopher De Sa, Dean Foster

In large-scale time series forecasting, one often encounters the situation where the temporal patterns of time series, while drifting over time, differ from one another in the same dataset.

Time Series, Time Series Forecasting

Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision

no code implementations 26 Feb 2021 Johan Bjorck, Xiangyu Chen, Christopher De Sa, Carla P. Gomes, Kilian Q. Weinberger

Low-precision training has become a popular approach to reduce compute requirements, memory footprint, and energy consumption in supervised learning.

Continuous Control

Hyperparameter Optimization Is Deceiving Us, and How to Stop It

no code implementations 5 Feb 2021 A. Feder Cooper, Yucheng Lu, Jessica Zosa Forde, Christopher De Sa

Recent empirical work shows that inconsistent results, based on choice of hyperparameter optimization (HPO) configuration, are a widespread problem in ML research.

Hyperparameter Optimization

Revisiting BFloat16 Training

no code implementations 1 Jan 2021 Pedram Zamirai, Jian Zhang, Christopher R. Aberger, Christopher De Sa

We ask whether pure 16-bit training, which requires only 16-bit compute units, can still match the model accuracy attained by 32-bit training.

Revisiting BFloat16 Training

no code implementations 13 Oct 2020 Pedram Zamirai, Jian Zhang, Christopher R. Aberger, Christopher De Sa

State-of-the-art generic low-precision training algorithms use a mix of 16-bit and 32-bit precision, creating the folklore that 16-bit hardware compute units alone are not enough to maximize model accuracy.

Meta-Learning Divergences of Variational Inference

no code implementations 6 Jul 2020 Ruqi Zhang, Yingzhen Li, Christopher De Sa, Sam Devlin, Cheng Zhang

Variational inference (VI) plays an essential role in approximate Bayesian inference due to its computational efficiency and broad applicability.

Bayesian Inference, Few-Shot Learning, +3

Understanding Accuracy-Efficiency Trade-Offs as a Means for Holding Distributed ML Systems Accountable

1 code implementation 4 Jul 2020 A. Feder Cooper, Karen Levy, Christopher De Sa

Trade-offs between accuracy and efficiency are found in multiple non-computing domains, such as law and public health, which have developed rules and heuristics to guide how to balance the two in conditions of uncertainty.

Autonomous Vehicles, Distributed Computing

Asymptotically Optimal Exact Minibatch Metropolis-Hastings

1 code implementation NeurIPS 2020 Ruqi Zhang, A. Feder Cooper, Christopher De Sa

Metropolis-Hastings (MH) is a commonly-used MCMC algorithm, but it can be intractable on large datasets due to requiring computations over the whole dataset.

Neural Manifold Ordinary Differential Equations

2 code implementations NeurIPS 2020 Aaron Lou, Derek Lim, Isay Katsman, Leo Huang, Qingxuan Jiang, Ser-Nam Lim, Christopher De Sa

To better conform to data geometry, recent deep generative modelling techniques adapt Euclidean constructions to non-Euclidean spaces.

Density Estimation

Optimal Complexity in Decentralized Training

no code implementations 15 Jun 2020 Yucheng Lu, Christopher De Sa

Decentralization is a promising method of scaling up parallel machine learning systems.

Image Classification

MixML: A Unified Analysis of Weakly Consistent Parallel Learning

no code implementations 14 May 2020 Yucheng Lu, Jack Nash, Christopher De Sa

Parallelism is a ubiquitous method for accelerating machine learning algorithms.

Optimizing JPEG Quantization for Classification Networks

no code implementations 5 Mar 2020 Zhijing Li, Christopher De Sa, Adrian Sampson

While a long history of work has sought better Q-tables, existing work either seeks to minimize image distortion or to optimize for models of the human visual system.

Classification, General Classification, +2

Differentiating through the Fréchet Mean

2 code implementations ICML 2020 Aaron Lou, Isay Katsman, Qingxuan Jiang, Serge Belongie, Ser-Nam Lim, Christopher De Sa

Recent advances in deep representation learning on Riemannian manifolds extend classical deep learning operations to better capture the geometry of the manifold.

Representation Learning

AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC

1 code implementation 29 Feb 2020 Ruqi Zhang, A. Feder Cooper, Christopher De Sa

This improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution.

Moniqua: Modulo Quantized Communication in Decentralized SGD

no code implementations ICML 2020 Yucheng Lu, Christopher De Sa

Running Stochastic Gradient Descent (SGD) in a decentralized fashion has shown promising results.


Poisson-Minibatching for Gibbs Sampling with Convergence Rate Guarantees

1 code implementation NeurIPS 2019 Ruqi Zhang, Christopher De Sa

Gibbs sampling is a Markov chain Monte Carlo method that is often used for learning and inference on graphical models.

OverQ: Opportunistic Outlier Quantization for Neural Network Accelerators

no code implementations 13 Oct 2019 Ritchie Zhao, Jordan Dotzel, Zhanqiu Hu, Preslav Ivanov, Christopher De Sa, Zhiru Zhang

Specialized hardware for handling activation outliers can enable low-precision neural networks, but at the cost of nontrivial area overhead.


PipeMare: Asynchronous Pipeline Parallel DNN Training

no code implementations 9 Oct 2019 Bowen Yang, Jian Zhang, Jonathan Li, Christopher Ré, Christopher R. Aberger, Christopher De Sa

Pipeline parallelism (PP) when training neural networks enables larger models to be partitioned spatially, leading to both lower network communication and overall higher hardware utilization.

QPyTorch: A Low-Precision Arithmetic Simulation Framework

2 code implementations 9 Oct 2019 Tianyi Zhang, Zhiqiu Lin, Guandao Yang, Christopher De Sa

Low-precision training reduces computational cost and produces efficient models.


Dimension-Free Bounds for Low-Precision Training

no code implementations ICLR 2019 Zheng Li, Christopher De Sa

Low-precision training is a promising way of decreasing the time and energy cost of training machine learning models.


SWALP: Stochastic Weight Averaging in Low-Precision Training

2 code implementations 26 Apr 2019 Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa

Low precision operations can provide scalability, memory savings, portability, and energy efficiency.

Distributed Learning with Sublinear Communication

no code implementations 28 Feb 2019 Jayadev Acharya, Christopher De Sa, Dylan J. Foster, Karthik Sridharan

In distributed statistical learning, $N$ samples are split across $m$ machines and a learner wishes to use minimal communication to learn as well as if the examples were on a single machine.


Improving Neural Network Quantization without Retraining using Outlier Channel Splitting

3 code implementations 28 Jan 2019 Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang

The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training.

Language Modelling, Neural Network Compression, +1

Building Efficient Deep Neural Networks with Unitary Group Convolutions

no code implementations CVPR 2019 Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang

UGConvs generalize two disparate ideas in CNN architecture, channel shuffling (i.e. ShuffleNet) and block-circulant networks (i.e. CirCNN), and provide unifying insights that lead to a deeper understanding of each technique.

Minibatch Gibbs Sampling on Large Graphical Models

no code implementations ICML 2018 Christopher De Sa, Vincent Chen, Wing Wong

Gibbs sampling is the de facto Markov chain Monte Carlo method used for inference and learning on large scale graphical models.

Channel Gating Neural Networks

1 code implementation NeurIPS 2019 Weizhe Hua, Yuan Zhou, Christopher De Sa, Zhiru Zhang, G. Edward Suh

Combining our method with knowledge distillation reduces the compute cost of ResNet-18 by 2.6$\times$ without accuracy drop on ImageNet.

Knowledge Distillation, Network Pruning

Representation Tradeoffs for Hyperbolic Embeddings

1 code implementation ICML 2018 Christopher De Sa, Albert Gu, Christopher Ré, Frederic Sala

Given a tree, we give a combinatorial construction that embeds the tree in hyperbolic space with arbitrarily low distortion without using optimization.

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

no code implementations 23 Mar 2018 Dan Alistarh, Christopher De Sa, Nikola Konstantinov

Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks.

A Kernel Theory of Modern Data Augmentation

no code implementations 16 Mar 2018 Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré

Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines.

Data Augmentation

High-Accuracy Low-Precision Training

1 code implementation 9 Mar 2018 Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré

Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it.


Gaussian Quadrature for Kernel Features

no code implementations NeurIPS 2017 Tri Dao, Christopher De Sa, Christopher Ré

We show that deterministic feature maps can be constructed, for any $\gamma > 0$, to achieve error $\epsilon$ with $O(e^{e^\gamma} + \epsilon^{-1/\gamma})$ samples as $\epsilon$ goes to 0.

Speech Recognition

Accelerated Stochastic Power Iteration

2 code implementations 10 Jul 2017 Christopher De Sa, Bryan He, Ioannis Mitliagkas, Christopher Ré, Peng Xu

We propose a simple variant of the power iteration with an added momentum term, that achieves both the optimal sample and iteration complexity.

Dimensionality Reduction

Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

no code implementations 25 Oct 2016 Paroma Varma, Bryan He, Dan Iter, Peng Xu, Rose Yu, Christopher De Sa, Christopher Ré

Prior work has explored learning accuracies for these sources even without ground truth labels, but they assume that a single accuracy parameter is sufficient to model the behavior of these sources over the entire training set.

Relation Extraction

Parallel SGD: When does averaging help?

no code implementations 23 Jun 2016 Jian Zhang, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré

Consider a number of workers running SGD independently on the same pool of data and averaging the models every once in a while -- a common but not well understood practice.

Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much

no code implementations NeurIPS 2016 Bryan He, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré

Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions.

Data Programming: Creating Large Training Sets, Quickly

3 code implementations NeurIPS 2016 Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré

Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable.

Slot Filling

Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling

no code implementations 24 Feb 2016 Christopher De Sa, Kunle Olukotun, Christopher Ré

Gibbs sampling is a Markov chain Monte Carlo technique commonly used for estimating marginal distributions.

Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width

no code implementations NeurIPS 2015 Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

Gibbs sampling on factor graphs is a widely used inference technique, which often produces good empirical results.

Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms

no code implementations 22 Jun 2015 Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

with relaxed assumptions on the sparsity of the problem; (2) we analyze asynchronous SGD algorithms for non-convex matrix problems including matrix completion; and (3) we design and analyze an asynchronous SGD algorithm, called Buckwild!, that uses lower-precision arithmetic.

Matrix Completion

Incremental Knowledge Base Construction Using DeepDive

no code implementations 3 Feb 2015 Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher Ré

Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration.

Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems

no code implementations 5 Nov 2014 Christopher De Sa, Kunle Olukotun, Christopher Ré

Stochastic gradient descent (SGD) on a low-rank factorization is commonly employed to speed up matrix problems including matrix completion, subspace tracking, and SDP relaxation.

Matrix Completion
