Search Results for author: Christopher De Sa

Found 66 papers, 32 papers with code

‘Tecnologica cosa’: Modeling Storyteller Personalities in Boccaccio’s ‘Decameron’

no code implementations EMNLP (LaTeCH-CLfL) 2021 A. Feder Cooper, Maria Antoniak, Christopher De Sa, Marilyn Migiel, David Mimno

We explore Boccaccio’s Decameron to see how digital humanities tools can be used for tasks that have limited data in a language no longer in contemporary use: medieval Italian.

QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks

1 code implementation 6 Feb 2024 Albert Tseng, Jerry Chee, Qingyao Sun, Volodymyr Kuleshov, Christopher De Sa

Second, QuIP# uses vector quantization techniques to take advantage of the ball-shaped sub-Gaussian distribution that incoherent weights possess: specifically, we introduce a set of hardware-efficient codebooks based on the highly symmetric $E_8$ lattice, which achieves the optimal 8-dimensional unit ball packing.
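To make the vector quantization step concrete, here is a minimal nearest-codeword quantizer in NumPy. The random codebook below is a placeholder assumption for illustration, not the structured $E_8$-based codebooks the paper constructs; it only shows how groups of weights are snapped to their closest codeword and stored as indices.

```python
import numpy as np

def vector_quantize(weights, codebook):
    """Quantize d-dimensional weight vectors to their nearest codebook entries.

    weights:  (n, d) array of weight vectors to quantize
    codebook: (k, d) array of codewords (QuIP# builds these from the E8 lattice;
              here the codebook is an arbitrary placeholder)
    Returns the (n,) array of codeword indices and the (n, d) quantized weights.
    """
    # Squared Euclidean distance from every weight vector to every codeword.
    d2 = ((weights[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)            # index of the nearest codeword
    return idx, codebook[idx]          # store idx; dequantize by table lookup

# Toy usage with a random 8-dimensional codebook (not the E8 construction).
rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 8))
weights = rng.standard_normal((1000, 8))
idx, w_hat = vector_quantize(weights, codebook)
```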

Quantization

Inference for Probabilistic Dependency Graphs

1 code implementation 9 Nov 2023 Oliver E. Richardson, Joseph Y. Halpern, Christopher De Sa

Probabilistic dependency graphs (PDGs) are a flexible class of probabilistic graphical models, subsuming Bayesian Networks and Factor Graphs.

ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers

2 code implementations 28 Sep 2023 Junjie Yin, Jiahao Dong, Yingheng Wang, Christopher De Sa, Volodymyr Kuleshov

We propose a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 2/3/4-bit precision on as little as one 24GB GPU.

Instruction Following Natural Language Inference +3

InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models

no code implementations 14 Jun 2023 Yingheng Wang, Yair Schiff, Aaron Gokaslan, Weishen Pan, Fei Wang, Christopher De Sa, Volodymyr Kuleshov

While diffusion models excel at generating high-quality samples, their latent variables typically lack semantic meaning and are not suitable for representation learning.

Representation Learning

Coneheads: Hierarchy Aware Attention

1 code implementation NeurIPS 2023 Albert Tseng, Tao Yu, Toni J. B. Liu, Christopher De Sa

These networks rely heavily on the dot product attention operator, which computes the similarity between two points by taking their inner product.
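For reference, the dot product attention operator that this excerpt refers to can be sketched in a few lines of NumPy; the hierarchy-aware, cone-based alternative proposed in the paper is not shown here.

```python
import numpy as np

def dot_product_attention(Q, K, V):
    """Scaled dot-product attention: similarity of a query and a key is their inner product."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # pairwise inner products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # attention-weighted sum of values

# Toy usage: 4 queries attending over 6 key/value pairs of dimension 8.
rng = np.random.default_rng(0)
out = dot_product_attention(rng.standard_normal((4, 8)),
                            rng.standard_normal((6, 8)),
                            rng.standard_normal((6, 8)))
```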

Shadow Cones: Unveiling Partial Orders in Hyperbolic Space

1 code implementation 24 May 2023 Tao Yu, Toni J. B. Liu, Albert Tseng, Christopher De Sa

Our findings indicate that shadow cones offer an innovative, general approach to geometrically encode partial orders, enabling better representation and analysis of datasets with hierarchical structures.

STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition

no code implementations 2 Feb 2023 Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Christopher De Sa, Amir Yazdanbakhsh

Recent innovations on hardware (e.g., Nvidia A100) have motivated learning N:M structured sparsity masks from scratch for fast model inference.

Machine Translation

Coordinating Distributed Example Orders for Provably Accelerated Training

1 code implementation NeurIPS 2023 A. Feder Cooper, Wentao Guo, Khiem Pham, Tiancheng Yuan, Charlie F. Ruan, Yucheng Lu, Christopher De Sa

Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR).

MCTensor: A High-Precision Deep Learning Library with Multi-Component Floating-Point

1 code implementation 18 Jul 2022 Tao Yu, Wentao Guo, Jianan Canal Li, Tiancheng Yuan, Christopher De Sa

In this paper, we introduce MCTensor, a library based on PyTorch for providing general-purpose and high-precision arithmetic for DL training.
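Multi-component floating point represents one number as an unevaluated sum of several machine floats, so low bits that a single float would round away survive in later components. The sketch below, built on the classic error-free TwoSum transform, is a plain-Python illustration of that idea and is not MCTensor's actual PyTorch API.

```python
def two_sum(a: float, b: float):
    """Error-free transform: return (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e

def mc_add(components, x):
    """Add a float x to a multi-component value, pushing rounding error into lower components."""
    out, carry = [], x
    for c in components:
        s, carry = two_sum(c, carry)
        out.append(s)
    out.append(carry)
    return out

print(mc_add([1.0, 0.0], 1e-17))  # the tiny addend survives in a lower component
```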

Non-Determinism and the Lawlessness of Machine Learning Code

no code implementations 23 Jun 2022 A. Feder Cooper, Jonathan Frankle, Christopher De Sa

In this paper, we clarify the overlap and differences between these two concepts, and show that the effects of non-determinism, and consequently its implications for the law, become clearer from the perspective of reasoning about ML outputs as distributions over possible outcomes.

Legal Reasoning

Low-Precision Stochastic Gradient Langevin Dynamics

1 code implementation 20 Jun 2022 Ruqi Zhang, Andrew Gordon Wilson, Christopher De Sa

While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored.

Quantization

GraB: Finding Provably Better Data Permutations than Random Reshuffling

2 code implementations 22 May 2022 Yucheng Lu, Wentao Guo, Christopher De Sa

To reduce the memory overhead, we leverage discrepancy minimization theory to propose an online Gradient Balancing algorithm (GraB) that enjoys the same rate as herding, while reducing the memory usage from $O(nd)$ to just $O(d)$ and computation from $O(n^2)$ to $O(n)$, where $d$ denotes the model dimension and $n$ the number of training examples.
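As a rough sketch of the balancing idea (simplified; the actual GraB algorithm works online with the $O(d)$ memory and rates stated above), the snippet below greedily assigns a $\pm 1$ sign to each centered per-example gradient so that the signed running sum stays small, then visits positively signed examples first and negatively signed examples last in the next epoch.

```python
import numpy as np

def balance_order(gradients):
    """Toy sign-balancing reordering of examples (not the full GraB algorithm).

    gradients: (n, d) per-example gradients observed during the previous epoch.
    Returns a new visiting order for the next epoch.
    """
    g = gradients - gradients.mean(axis=0)     # center so signed sums can cancel
    run = np.zeros(g.shape[1])
    front, back = [], []
    for i, gi in enumerate(g):
        # Greedily choose the sign that keeps the running signed sum small.
        if np.dot(run, gi) <= 0:
            run += gi
            front.append(i)                    # +1: visit early next epoch
        else:
            run -= gi
            back.append(i)                     # -1: visit late next epoch
    return front + back[::-1]

order = balance_order(np.random.default_rng(0).standard_normal((8, 3)))
```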

Structured Pruning is All You Need for Pruning CNNs at Initialization

no code implementations 4 Mar 2022 Yaohui Cai, Weizhe Hua, Hongzheng Chen, G. Edward Suh, Christopher De Sa, Zhiru Zhang

In addition, since PreCropping compresses CNNs at initialization, the computational and memory costs of CNNs are reduced for both training and inference on commodity hardware.

Model Compression

Random Laplacian Features for Learning with Hyperbolic Space

1 code implementation 14 Feb 2022 Tao Yu, Christopher De Sa

Due to its geometric properties, hyperbolic space can support high-fidelity embeddings of tree- and graph-structured data, upon which various hyperbolic networks have been developed.

Graph Learning Node Classification +2

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam

1 code implementation 12 Feb 2022 Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He

1-bit gradient compression and local steps are two representative techniques that enable drastic communication reduction in distributed SGD.
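To show what 1-bit gradient compression means in general, here is a generic sign-compression-with-error-feedback sketch; this is an illustrative assumption, not the 0/1 Adam algorithm or its specific compression scheme.

```python
import numpy as np

class OneBitCompressor:
    """Generic 1-bit (sign) gradient compression with error feedback."""

    def __init__(self, dim):
        self.error = np.zeros(dim)              # residual left over from previous rounds

    def compress(self, grad):
        corrected = grad + self.error           # fold the residual back in
        scale = np.abs(corrected).mean()        # one shared magnitude per tensor
        signs = np.sign(corrected)              # one bit per coordinate
        self.error = corrected - scale * signs  # remember what the 1-bit code lost
        return signs, scale                     # what is actually communicated

comp = OneBitCompressor(dim=4)
signs, scale = comp.compress(np.array([0.3, -0.1, 0.02, -0.4]))
```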

Open-Ended Question Answering

Understanding Hyperdimensional Computing for Parallel Single-Pass Learning

1 code implementation 10 Feb 2022 Tao Yu, Yichi Zhang, Zhiru Zhang, Christopher De Sa

Using representation theory, we characterize which similarity matrices can be "expressed" by finite group VSA hypervectors, and we show how these VSAs can be constructed.

A General Analysis of Example-Selection for Stochastic Gradient Descent

no code implementations ICLR 2022 Yucheng Lu, Si Yi Meng, Christopher De Sa

In this paper, we develop a broad condition on the sequence of examples used by SGD that is sufficient to prove tight convergence rates in both strongly convex and non-convex settings.

Data Augmentation

Tecnologica cosa: Modeling Storyteller Personalities in Boccaccio's Decameron

no code implementations 22 Sep 2021 A. Feder Cooper, Maria Antoniak, Christopher De Sa, Marilyn Migiel, David Mimno

We explore Boccaccio's Decameron to see how digital humanities tools can be used for tasks that have limited data in a language no longer in contemporary use: medieval Italian.

Model Preserving Compression for Neural Networks

1 code implementation 30 Jul 2021 Jerry Chee, Megan Renz, Anil Damle, Christopher De Sa

After training complex deep learning models, a common task is to compress the model to reduce compute and storage demands.

Network Pruning

Equivariant Manifold Flows

1 code implementation NeurIPS 2021 Isay Katsman, Aaron Lou, Derek Lim, Qingxuan Jiang, Ser-Nam Lim, Christopher De Sa

Tractably modelling distributions over manifolds has long been an important goal in the natural sciences.

Variance Reduced Training with Stratified Sampling for Forecasting Models

no code implementations 2 Mar 2021 Yucheng Lu, Youngsuk Park, Lifan Chen, Yuyang Wang, Christopher De Sa, Dean Foster

In large-scale time series forecasting, one often encounters the situation where the temporal patterns of time series, while drifting over time, differ from one another in the same dataset.

Time Series Time Series Forecasting

Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision

no code implementations 26 Feb 2021 Johan Bjorck, Xiangyu Chen, Christopher De Sa, Carla P. Gomes, Kilian Q. Weinberger

Low-precision training has become a popular approach to reduce compute requirements, memory footprint, and energy consumption in supervised learning.

Continuous Control reinforcement-learning +1

Hyperparameter Optimization Is Deceiving Us, and How to Stop It

1 code implementation NeurIPS 2021 A. Feder Cooper, Yucheng Lu, Jessica Zosa Forde, Christopher De Sa

Recent empirical work shows that inconsistent results based on choice of hyperparameter optimization (HPO) configuration are a widespread problem in ML research.

Hyperparameter Optimization

Revisiting BFloat16 Training

no code implementations 1 Jan 2021 Pedram Zamirai, Jian Zhang, Christopher R. Aberger, Christopher De Sa

We ask whether we can do pure 16-bit training, which requires only 16-bit compute units, while still matching the model accuracy attained by 32-bit training.

Revisiting BFloat16 Training

no code implementations 13 Oct 2020 Pedram Zamirai, Jian Zhang, Christopher R. Aberger, Christopher De Sa

State-of-the-art generic low-precision training algorithms use a mix of 16-bit and 32-bit precision, creating the folklore that 16-bit hardware compute units alone are not enough to maximize model accuracy.

Meta-Learning Divergences of Variational Inference

no code implementations 6 Jul 2020 Ruqi Zhang, Yingzhen Li, Christopher De Sa, Sam Devlin, Cheng Zhang

Variational inference (VI) plays an essential role in approximate Bayesian inference due to its computational efficiency and broad applicability.

Bayesian Inference Computational Efficiency +4

Accuracy-Efficiency Trade-Offs and Accountability in Distributed ML Systems

1 code implementation 4 Jul 2020 A. Feder Cooper, Karen Levy, Christopher De Sa

Trade-offs between accuracy and efficiency pervade law, public health, and other non-computing domains, which have developed policies to guide how to balance the two in conditions of uncertainty.

Autonomous Vehicles Distributed Computing

Asymptotically Optimal Exact Minibatch Metropolis-Hastings

1 code implementation NeurIPS 2020 Ruqi Zhang, A. Feder Cooper, Christopher De Sa

Metropolis-Hastings (MH) is a commonly-used MCMC algorithm, but it can be intractable on large datasets due to requiring computations over the whole dataset.
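For context, the exact full-batch Metropolis-Hastings step that minibatch methods aim to avoid computing over the whole dataset looks like the following toy sketch (symmetric random-walk proposal assumed).

```python
import numpy as np

def mh_step(theta, log_target, propose, rng):
    """One Metropolis-Hastings step with a symmetric proposal.

    Full-batch MH evaluates log_target (e.g. a log posterior summed over the
    entire dataset) at every step, which is what gets expensive at scale.
    """
    theta_new = propose(theta, rng)
    log_alpha = log_target(theta_new) - log_target(theta)
    return theta_new if np.log(rng.uniform()) < log_alpha else theta

rng = np.random.default_rng(0)
log_target = lambda t: -0.5 * t ** 2                    # standard normal target
propose = lambda t, rng: t + rng.normal(scale=0.5)      # random-walk proposal
chain = [0.0]
for _ in range(1000):
    chain.append(mh_step(chain[-1], log_target, propose, rng))
```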

regression

Neural Manifold Ordinary Differential Equations

3 code implementations NeurIPS 2020 Aaron Lou, Derek Lim, Isay Katsman, Leo Huang, Qingxuan Jiang, Ser-Nam Lim, Christopher De Sa

To better conform to data geometry, recent deep generative modelling techniques adapt Euclidean constructions to non-Euclidean spaces.

Density Estimation

Optimal Complexity in Decentralized Training

no code implementations 15 Jun 2020 Yucheng Lu, Christopher De Sa

Decentralization is a promising method of scaling up parallel machine learning systems.

Image Classification

Optimizing JPEG Quantization for Classification Networks

no code implementations 5 Mar 2020 Zhijing Li, Christopher De Sa, Adrian Sampson

While a long history of work has sought better Q-tables, existing work either seeks to minimize image distortion or to optimize for models of the human visual system.

Bayesian Optimization Classification +3

AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC

1 code implementation 29 Feb 2020 Ruqi Zhang, A. Feder Cooper, Christopher De Sa

Subsampling the data at each iteration improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution.

Differentiating through the Fréchet Mean

2 code implementations ICML 2020 Aaron Lou, Isay Katsman, Qingxuan Jiang, Serge Belongie, Ser-Nam Lim, Christopher De Sa

Recent advances in deep representation learning on Riemannian manifolds extend classical deep learning operations to better capture the geometry of the manifold.
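For reference, the Fréchet mean that the paper differentiates through is the standard generalization of the Euclidean mean to a Riemannian manifold $\mathcal{M}$ with distance $d_{\mathcal{M}}$: $\mu = \arg\min_{p \in \mathcal{M}} \frac{1}{n} \sum_{i=1}^{n} d_{\mathcal{M}}(p, x_i)^2$, which reduces to the ordinary arithmetic mean when $\mathcal{M} = \mathbb{R}^d$.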

Representation Learning

Moniqua: Modulo Quantized Communication in Decentralized SGD

no code implementations ICML 2020 Yucheng Lu, Christopher De Sa

Running Stochastic Gradient Descent (SGD) in a decentralized fashion has shown promising results.

Quantization

Poisson-Minibatching for Gibbs Sampling with Convergence Rate Guarantees

1 code implementation NeurIPS 2019 Ruqi Zhang, Christopher De Sa

Gibbs sampling is a Markov chain Monte Carlo method that is often used for learning and inference on graphical models.

OverQ: Opportunistic Outlier Quantization for Neural Network Accelerators

no code implementations 13 Oct 2019 Ritchie Zhao, Jordan Dotzel, Zhanqiu Hu, Preslav Ivanov, Christopher De Sa, Zhiru Zhang

Specialized hardware for handling activation outliers can enable low-precision neural networks, but at the cost of nontrivial area overhead.

Quantization

PipeMare: Asynchronous Pipeline Parallel DNN Training

no code implementations 9 Oct 2019 Bowen Yang, Jian Zhang, Jonathan Li, Christopher Ré, Christopher R. Aberger, Christopher De Sa

Pipeline parallelism (PP) when training neural networks enables larger models to be partitioned spatially, leading to both lower network communication and overall higher hardware utilization.

QPyTorch: A Low-Precision Arithmetic Simulation Framework

2 code implementations 9 Oct 2019 Tianyi Zhang, Zhiqiu Lin, Guandao Yang, Christopher De Sa

Low-precision training reduces computational cost and produces efficient models.

Quantization

Dimension-Free Bounds for Low-Precision Training

no code implementations ICLR 2019 Zheng Li, Christopher De Sa

Low-precision training is a promising way of decreasing the time and energy cost of training machine learning models.

Quantization

SWALP: Stochastic Weight Averaging in Low-Precision Training

3 code implementations 26 Apr 2019 Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa

Low precision operations can provide scalability, memory savings, portability, and energy efficiency.

Distributed Learning with Sublinear Communication

no code implementations 28 Feb 2019 Jayadev Acharya, Christopher De Sa, Dylan J. Foster, Karthik Sridharan

In distributed statistical learning, $N$ samples are split across $m$ machines and a learner wishes to use minimal communication to learn as well as if the examples were on a single machine.

Quantization

Improving Neural Network Quantization without Retraining using Outlier Channel Splitting

3 code implementations 28 Jan 2019 Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang

The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training.

Language Modelling Neural Network Compression +1

Building Efficient Deep Neural Networks with Unitary Group Convolutions

no code implementations CVPR 2019 Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang

UGConvs generalize two disparate ideas in CNN architecture, channel shuffling (i.e. ShuffleNet) and block-circulant networks (i.e. CirCNN), and provide unifying insights that lead to a deeper understanding of each technique.

Minibatch Gibbs Sampling on Large Graphical Models

no code implementations ICML 2018 Christopher De Sa, Vincent Chen, Wing Wong

Gibbs sampling is the de facto Markov chain Monte Carlo method used for inference and learning on large scale graphical models.

Channel Gating Neural Networks

1 code implementation NeurIPS 2019 Weizhe Hua, Yuan Zhou, Christopher De Sa, Zhiru Zhang, G. Edward Suh

Combining our method with knowledge distillation reduces the compute cost of ResNet-18 by 2.6$\times$ without accuracy drop on ImageNet.

Knowledge Distillation Network Pruning

Representation Tradeoffs for Hyperbolic Embeddings

3 code implementations ICML 2018 Christopher De Sa, Albert Gu, Christopher Ré, Frederic Sala

Given a tree, we give a combinatorial construction that embeds the tree in hyperbolic space with arbitrarily low distortion without using optimization.

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

no code implementations 23 Mar 2018 Dan Alistarh, Christopher De Sa, Nikola Konstantinov

Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks.

BIG-bench Machine Learning

A Kernel Theory of Modern Data Augmentation

no code implementations 16 Mar 2018 Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré

Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines.

BIG-bench Machine Learning Data Augmentation

High-Accuracy Low-Precision Training

1 code implementation 9 Mar 2018 Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré

Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it.

Quantization Vocal Bursts Intensity Prediction

Gaussian Quadrature for Kernel Features

no code implementations NeurIPS 2017 Tri Dao, Christopher De Sa, Christopher Ré

We show that deterministic feature maps can be constructed, for any $\gamma > 0$, to achieve error $\epsilon$ with $O(e^{e^\gamma} + \epsilon^{-1/\gamma})$ samples as $\epsilon$ goes to 0.

Speech Recognition

Accelerated Stochastic Power Iteration

2 code implementations 10 Jul 2017 Christopher De Sa, Bryan He, Ioannis Mitliagkas, Christopher Ré, Peng Xu

We propose a simple variant of the power iteration with an added momentum term, that achieves both the optimal sample and iteration complexity.
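The momentum recursion described here, $x_{t+1} = A x_t - \beta x_{t-1}$, can be sketched deterministically as follows; the paper's stochastic variants and its guidance on choosing $\beta$ are not reproduced, so the momentum value below is an arbitrary illustration.

```python
import numpy as np

def power_iteration_momentum(A, beta, n_iter=200, seed=0):
    """Power iteration with a heavy-ball momentum term: x_{t+1} = A x_t - beta * x_{t-1}."""
    rng = np.random.default_rng(seed)
    x_prev = np.zeros(A.shape[0])
    x = rng.standard_normal(A.shape[0])
    for _ in range(n_iter):
        x_next = A @ x - beta * x_prev
        x_prev, x = x, x_next
        # Rescale both iterates by the same factor: the recursion is linear,
        # so this prevents overflow without changing the limiting direction.
        scale = np.linalg.norm(x)
        x, x_prev = x / scale, x_prev / scale
    return x  # approximate top eigenvector (up to sign)

# Toy usage on a random symmetric positive semi-definite matrix.
B = np.random.default_rng(1).standard_normal((5, 5))
v = power_iteration_momentum(B @ B.T, beta=0.1)
```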

Dimensionality Reduction

Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

no code implementations 25 Oct 2016 Paroma Varma, Bryan He, Dan Iter, Peng Xu, Rose Yu, Christopher De Sa, Christopher Ré

Prior work has explored learning accuracies for these sources even without ground truth labels, but it assumes that a single accuracy parameter is sufficient to model the behavior of these sources over the entire training set.

Relation Extraction

Parallel SGD: When does averaging help?

no code implementations 23 Jun 2016 Jian Zhang, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré

Consider a number of workers running SGD independently on the same pool of data and averaging the models every once in a while -- a common but not well understood practice.

Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much

no code implementations NeurIPS 2016 Bryan He, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré

Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions.
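As a toy illustration of the samplers whose scan order the paper studies, here is a systematic-scan Gibbs sampler for a zero-mean bivariate Gaussian, where each coordinate's conditional given the other is available in closed form; a random-scan sampler would instead pick the coordinate to update uniformly at random at each step.

```python
import numpy as np

def gibbs_bivariate_gaussian(rho, n_samples=5000, seed=0):
    """Systematic-scan Gibbs sampler for a zero-mean bivariate Gaussian with correlation rho.

    Conditionals: x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2).
    """
    rng = np.random.default_rng(seed)
    x, y, samples = 0.0, 0.0, []
    sd = np.sqrt(1.0 - rho ** 2)
    for _ in range(n_samples):
        x = rng.normal(rho * y, sd)    # resample x from its conditional
        y = rng.normal(rho * x, sd)    # resample y from its conditional
        samples.append((x, y))
    return np.array(samples)

samples = gibbs_bivariate_gaussian(rho=0.8)
```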

Data Programming: Creating Large Training Sets, Quickly

4 code implementations NeurIPS 2016 Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré

Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable.

BIG-bench Machine Learning Slot Filling

Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling

no code implementations 24 Feb 2016 Christopher De Sa, Kunle Olukotun, Christopher Ré

Gibbs sampling is a Markov chain Monte Carlo technique commonly used for estimating marginal distributions.

Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width

no code implementations NeurIPS 2015 Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

Gibbs sampling on factor graphs is a widely used inference technique, which often produces good empirical results.

Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms

no code implementations 22 Jun 2015 Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

In this work, (1) we analyze Hogwild!-style algorithms with relaxed assumptions on the sparsity of the problem; (2) we analyze asynchronous SGD algorithms for non-convex matrix problems including matrix completion; and (3) we design and analyze an asynchronous SGD algorithm, called Buckwild!, that uses lower-precision arithmetic.

Matrix Completion

Incremental Knowledge Base Construction Using DeepDive

no code implementations 3 Feb 2015 Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher Ré

Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration.

Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems

no code implementations 5 Nov 2014 Christopher De Sa, Kunle Olukotun, Christopher Ré

Stochastic gradient descent (SGD) on a low-rank factorization is commonly employed to speed up matrix problems including matrix completion, subspace tracking, and SDP relaxation.
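A minimal sketch of that approach, SGD on the factorization $M \approx UV^\top$ using only observed entries, is below; the rank, step size, and initialization are arbitrary illustrative choices rather than the paper's setup.

```python
import numpy as np

def sgd_matrix_completion(observed, shape, rank=5, lr=0.05, epochs=50, seed=0):
    """SGD on a low-rank factorization M ~= U @ V.T, fit to the observed entries.

    observed: list of (i, j, value) triples for the known entries of M.
    """
    rng = np.random.default_rng(seed)
    n, m = shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(epochs):
        for i, j, v in observed:
            err = U[i] @ V[j] - v                     # residual on this observed entry
            # Simultaneous update so both gradients use the pre-update factors.
            U[i], V[j] = U[i] - lr * err * V[j], V[j] - lr * err * U[i]
    return U, V

# Toy usage: a few observed entries of a 10 x 8 matrix.
obs = [(0, 1, 2.0), (3, 4, -1.0), (7, 2, 0.5)]
U, V = sgd_matrix_completion(obs, shape=(10, 8))
```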

Matrix Completion
