no code implementations • 9 Dec 2024 • A. Feder Cooper, Christopher A. Choquette-Choo, Miranda Bogen, Matthew Jagielski, Katja Filippova, Ken Ziyu Liu, Alexandra Chouldechova, Jamie Hayes, Yangsibo Huang, Niloofar Mireshghallah, Ilia Shumailov, Eleni Triantafillou, Peter Kairouz, Nicole Mitchell, Percy Liang, Daniel E. Ho, Yejin Choi, Sanmi Koyejo, Fernando Delgado, James Grimmelmann, Vitaly Shmatikov, Christopher De Sa, Solon Barocas, Amy Cyphert, Mark Lemley, danah boyd, Jennifer Wortman Vaughan, Miles Brundage, David Bau, Seth Neel, Abigail Z. Jacobs, Andreas Terzis, Hanna Wallach, Nicolas Papernot, Katherine Lee
We articulate fundamental mismatches between technical methods for machine unlearning in Generative AI and documented aspirations for the broader impact these methods could have on law and policy.
1 code implementation • 3 Oct 2024 • Andres Potapczynski, Shikai Qiu, Marc Finzi, Christopher Ferri, Zixi Chen, Micah Goldblum, Bayan Bruss, Christopher De Sa, Andrew Gordon Wilson
To analyze the framework, we develop a taxonomy of all such operators based on their computational and algebraic properties and show that differences in the compute-optimal scaling laws are mostly governed by a small number of variables that we introduce.
1 code implementation • 17 Jun 2024 • Albert Tseng, Qingyao Sun, David Hou, Christopher De Sa
Here, we introduce QTIP, which instead uses trellis coded quantization (TCQ) to achieve ultra-high-dimensional quantization.
no code implementations • 7 Jun 2024 • Si Yi Meng, Antonio Orvieto, Daniel Yiming Cao, Christopher De Sa
In one dimension, we show that a step size less than $1/\lambda$ suffices for global convergence.
no code implementations • 5 Jun 2024 • Wentao Guo, Jikai Long, Yimeng Zeng, Zirui Liu, Xinyu Yang, Yide Ran, Jacob R. Gardner, Osbert Bastani, Christopher De Sa, Xiaodong Yu, Beidi Chen, Zhaozhuo Xu
Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models using only forward passes.
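For intuition only, here is a minimal sketch of a two-point (SPSA-style) zeroth-order gradient estimate, which needs just two forward evaluations and no backpropagation; it is a generic ZO estimator, not necessarily the scheme used in this paper, and `loss_fn` and `mu` below are illustrative placeholders.

```python
import numpy as np

def zo_gradient_estimate(loss_fn, params, mu=1e-3, rng=np.random.default_rng(0)):
    """Two-point (SPSA-style) zeroth-order gradient estimate using only forward passes."""
    z = rng.standard_normal(params.shape)           # random perturbation direction
    loss_plus = loss_fn(params + mu * z)            # forward pass 1
    loss_minus = loss_fn(params - mu * z)           # forward pass 2
    return (loss_plus - loss_minus) / (2 * mu) * z  # directional finite difference

# toy usage: quadratic loss, one ZO-SGD step
loss = lambda w: float(np.sum(w ** 2))
w = np.ones(4)
w -= 0.1 * zo_gradient_estimate(loss, w)
```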
2 code implementations • 6 Feb 2024 • Albert Tseng, Jerry Chee, Qingyao Sun, Volodymyr Kuleshov, Christopher De Sa
Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing their weights to low-precision.
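As a generic baseline sketch (round-to-nearest quantization, not the method proposed in this line of work), PTQ maps each weight onto a low-precision grid and stores integer codes plus a per-row scale; all names below are illustrative.

```python
import numpy as np

def quantize_rtn(W, bits=4):
    """Round-to-nearest symmetric quantization of a weight matrix, per row."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for signed 4-bit
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                          # avoid division by zero
    codes = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    return codes.astype(np.float32) * scale

W = np.random.randn(8, 16).astype(np.float32)
codes, scale = quantize_rtn(W, bits=4)
print("max reconstruction error:", np.abs(W - dequantize(codes, scale)).max())
```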
1 code implementation • 9 Nov 2023 • Oliver E. Richardson, Joseph Y. Halpern, Christopher De Sa
Probabilistic dependency graphs (PDGs) are a flexible class of probabilistic graphical models, subsuming Bayesian Networks and Factor Graphs.
3 code implementations • 28 Sep 2023 • Junjie Yin, Jiahao Dong, Yingheng Wang, Christopher De Sa, Volodymyr Kuleshov
We propose a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 2/3/4-bit precision on as little as one 24GB GPU.
1 code implementation • NeurIPS 2023 • Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Christopher De Sa
This work studies post-training parameter quantization in large language models (LLMs).
no code implementations • 14 Jun 2023 • Yingheng Wang, Yair Schiff, Aaron Gokaslan, Weishen Pan, Fei Wang, Christopher De Sa, Volodymyr Kuleshov
While diffusion models excel at generating high-quality samples, their latent variables typically lack semantic meaning and are not suitable for representation learning.
3 code implementations • NeurIPS 2023 • Albert Tseng, Tao Yu, Toni J. B. Liu, Christopher De Sa
These networks rely heavily on the dot product attention operator, which computes the similarity between two points by taking their inner product.
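For reference, a minimal numpy sketch of the standard dot-product attention operator described here (not the paper's modification of it); the dimensions are arbitrary.

```python
import numpy as np

def dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V for a single attention head."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # pairwise inner products
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

Q = np.random.randn(5, 64); K = np.random.randn(7, 64); V = np.random.randn(7, 32)
print(dot_product_attention(Q, K, V).shape)          # (5, 32)
```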
1 code implementation • 24 May 2023 • Tao Yu, Toni J. B. Liu, Albert Tseng, Christopher De Sa
Specifically, we model partial orders as subset relations between shadows formed by a light source and opaque objects in hyperbolic space.
1 code implementation • NeurIPS 2023 • A. Feder Cooper, Wentao Guo, Khiem Pham, Tiancheng Yuan, Charlie F. Ruan, Yucheng Lu, Christopher De Sa
Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR).
1 code implementation • 2 Feb 2023 • Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Christopher De Sa, Amir Yazdanbakhsh
Recent innovations in hardware (e.g., Nvidia A100) have motivated learning N:M structured sparsity masks from scratch for fast model inference.
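For context, an N:M mask keeps N weights in every group of M consecutive weights; the sketch below builds a simple magnitude-based 2:4 mask, which is only an illustration of the constraint, not the learning method proposed in the paper.

```python
import numpy as np

def nm_sparsity_mask(W, n=2, m=4):
    """Keep the n largest-magnitude entries in each consecutive group of m weights."""
    flat = W.reshape(-1, m)                                   # groups of m weights
    keep = np.argsort(-np.abs(flat), axis=1)[:, :n]           # top-n indices per group
    mask = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return mask.reshape(W.shape)

W = np.random.randn(4, 8)
mask = nm_sparsity_mask(W)                                    # exactly 2 of every 4 kept
print((mask.reshape(-1, 4).sum(axis=1) == 2).all())           # True
```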
1 code implementation • 27 Jan 2023 • A. Feder Cooper, Katherine Lee, Madiha Zahrah Choksi, Solon Barocas, Christopher De Sa, James Grimmelmann, Jon Kleinberg, Siddhartha Sen, Baobao Zhang
Variance in predictions across different trained models is a significant, under-explored source of error in fair binary classification.
1 code implementation • 18 Jul 2022 • Tao Yu, Wentao Guo, Jianan Canal Li, Tiancheng Yuan, Christopher De Sa
In this paper, we introduce MCTensor, a library based on PyTorch for providing general-purpose and high-precision arithmetic for DL training.
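One textbook building block for this kind of extra precision is an error-free transformation such as Knuth's Two-Sum, which splits a floating-point addition into a rounded sum plus an exactly recovered residual; this is standard numerics shown for intuition, not MCTensor's actual API.

```python
def two_sum(a: float, b: float):
    """Knuth's Two-Sum: return (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    bp = s - a                      # the part of b actually absorbed into s
    e = (a - (s - bp)) + (b - bp)   # rounding error, recovered exactly
    return s, e

s, e = two_sum(1.0, 1e-17)
print(s, e)                         # 1.0 1e-17 (the residual is preserved)
```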
no code implementations • 23 Jun 2022 • A. Feder Cooper, Jonathan Frankle, Christopher De Sa
In this paper, we clarify the overlap and differences between these two concepts and show that the effects of non-determinism, and consequently its implications for the law, become clearer when ML outputs are treated as distributions over possible outcomes.
1 code implementation • 20 Jun 2022 • Ruqi Zhang, Andrew Gordon Wilson, Christopher De Sa
While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored.
3 code implementations • 22 May 2022 • Yucheng Lu, Wentao Guo, Christopher De Sa
To reduce the memory overhead, we leverage discrepancy minimization theory to propose an online Gradient Balancing algorithm (GraB) that enjoys the same rate as herding, while reducing the memory usage from $O(nd)$ to just $O(d)$ and computation from $O(n^2)$ to $O(n)$, where $d$ denotes the model dimension.
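Very roughly, the idea behind such balanced orderings can be illustrated by greedily assigning each centered gradient a sign that keeps a running sum small and then placing positively signed examples before negatively signed ones; this is a toy sketch in the spirit of herding/discrepancy minimization, not the GraB algorithm itself, and unlike the online algorithm it stores all gradients.

```python
import numpy as np

def balance_and_reorder(grads):
    """Greedy sign balancing of centered vectors, then a balanced permutation (toy sketch)."""
    centered = grads - grads.mean(axis=0)
    running = np.zeros(grads.shape[1])
    pos, neg = [], []
    for i, z in enumerate(centered):
        # choose the sign that keeps the running sum's norm smaller
        if np.linalg.norm(running + z) <= np.linalg.norm(running - z):
            running += z; pos.append(i)
        else:
            running -= z; neg.append(i)
    return pos + neg[::-1]          # +1 examples first, -1 examples in reverse order

grads = np.random.randn(16, 3)
print(balance_and_reorder(grads))
```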
no code implementations • 4 Mar 2022 • Yaohui Cai, Weizhe Hua, Hongzheng Chen, G. Edward Suh, Christopher De Sa, Zhiru Zhang
In addition, since PreCropping compresses CNNs at initialization, the computational and memory costs of CNNs are reduced for both training and inference on commodity hardware.
1 code implementation • 14 Feb 2022 • Tao Yu, Christopher De Sa
Due to its geometric properties, hyperbolic space can support high-fidelity embeddings of tree- and graph-structured data, upon which various hyperbolic networks have been developed.
1 code implementation • 12 Feb 2022 • Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He
1-bit gradient compression and local steps are two representative techniques that enable drastic communication reduction in distributed SGD.
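As a generic illustration (not the algorithm introduced in this paper), 1-bit compression transmits only the sign of each gradient coordinate plus a single scale, commonly paired with an error-feedback buffer; the function and buffer below are assumed names for the sketch.

```python
import numpy as np

def one_bit_compress(grad, error_buffer):
    """Sign compression with error feedback: send signs plus one scale per tensor."""
    corrected = grad + error_buffer                   # fold in previously dropped error
    scale = np.abs(corrected).mean()                  # single float to transmit
    compressed = scale * np.sign(corrected)           # 1 bit per coordinate (+ scale)
    error_buffer[:] = corrected - compressed          # remember what was lost
    return compressed

g = np.random.randn(10)
buf = np.zeros_like(g)
print(one_bit_compress(g, buf))
```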
1 code implementation • 10 Feb 2022 • Tao Yu, Yichi Zhang, Zhiru Zhang, Christopher De Sa
Using representation theory, we characterize which similarity matrices can be "expressed" by finite group VSA hypervectors, and we show how these VSAs can be constructed.
no code implementations • ICLR 2022 • Yucheng Lu, Si Yi Meng, Christopher De Sa
In this paper, we develop a broad condition on the sequence of examples used by SGD that is sufficient to prove tight convergence rates in both strongly convex and non-convex settings.
no code implementations • 22 Sep 2021 • A. Feder Cooper, Maria Antoniak, Christopher De Sa, Marilyn Migiel, David Mimno
We explore Boccaccio's Decameron to see how digital humanities tools can be used for tasks that have limited data in a language no longer in contemporary use: medieval Italian.
1 code implementation • 30 Jul 2021 • Jerry Chee, Megan Renz, Anil Damle, Christopher De Sa
After training complex deep learning models, a common task is to compress the model to reduce compute and storage demands.
1 code implementation • NeurIPS 2021 • Isay Katsman, Aaron Lou, Derek Lim, Qingxuan Jiang, Ser-Nam Lim, Christopher De Sa
Tractably modelling distributions over manifolds has long been an important goal in the natural sciences.
1 code implementation • ICLR 2022 • Chengrun Yang, Ziyang Wu, Jerry Chee, Christopher De Sa, Madeleine Udell
Low-precision arithmetic trains deep learning models using less energy, less memory and less time.
no code implementations • 2 Mar 2021 • Yucheng Lu, Youngsuk Park, Lifan Chen, Yuyang Wang, Christopher De Sa, Dean Foster
In large-scale time series forecasting, one often encounters the situation where the temporal patterns of time series, while drifting over time, differ from one another in the same dataset.
no code implementations • 26 Feb 2021 • Johan Bjorck, Xiangyu Chen, Christopher De Sa, Carla P. Gomes, Kilian Q. Weinberger
Low-precision training has become a popular approach to reduce compute requirements, memory footprint, and energy consumption in supervised learning.
1 code implementation • NeurIPS 2021 • A. Feder Cooper, Yucheng Lu, Jessica Zosa Forde, Christopher De Sa
Recent empirical work shows that inconsistent results based on choice of hyperparameter optimization (HPO) configuration are a widespread problem in ML research.
no code implementations • 1 Jan 2021 • Pedram Zamirai, Jian Zhang, Christopher R Aberger, Christopher De Sa
We ask whether we can do pure 16-bit training, which requires only 16-bit compute units, while still matching the model accuracy attained by 32-bit training.
no code implementations • 13 Oct 2020 • Pedram Zamirai, Jian Zhang, Christopher R. Aberger, Christopher De Sa
State-of-the-art generic low-precision training algorithms use a mix of 16-bit and 32-bit precision, creating the folklore that 16-bit hardware compute units alone are not enough to maximize model accuracy.
no code implementations • 6 Jul 2020 • Ruqi Zhang, Yingzhen Li, Christopher De Sa, Sam Devlin, Cheng Zhang
Variational inference (VI) plays an essential role in approximate Bayesian inference due to its computational efficiency and broad applicability.
1 code implementation • 4 Jul 2020 • A. Feder Cooper, Karen Levy, Christopher De Sa
Trade-offs between accuracy and efficiency pervade law, public health, and other non-computing domains, which have developed policies to guide how to balance the two in conditions of uncertainty.
1 code implementation • NeurIPS 2020 • Ruqi Zhang, A. Feder Cooper, Christopher De Sa
Metropolis-Hastings (MH) is a commonly-used MCMC algorithm, but it can be intractable on large datasets due to requiring computations over the whole dataset.
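For context, a bare-bones random-walk MH step evaluates the log posterior over the data in its accept/reject ratio, which is where the per-iteration cost on large datasets comes from; the sketch below is the generic algorithm on a toy target, not the minibatch method proposed here.

```python
import numpy as np

def mh_step(theta, log_post, proposal_std, rng):
    """One random-walk Metropolis-Hastings step."""
    prop = theta + proposal_std * rng.standard_normal()
    log_alpha = log_post(prop) - log_post(theta)       # log acceptance ratio
    return prop if np.log(rng.uniform()) < log_alpha else theta

log_post = lambda t: -0.5 * t ** 2                     # toy target: standard normal
rng = np.random.default_rng(0)
theta, samples = 0.0, []
for _ in range(1000):
    theta = mh_step(theta, log_post, 1.0, rng)
    samples.append(theta)
print(np.mean(samples), np.var(samples))               # roughly 0 and 1
```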
3 code implementations • NeurIPS 2020 • Aaron Lou, Derek Lim, Isay Katsman, Leo Huang, Qingxuan Jiang, Ser-Nam Lim, Christopher De Sa
To better conform to data geometry, recent deep generative modelling techniques adapt Euclidean constructions to non-Euclidean spaces.
no code implementations • 15 Jun 2020 • Yucheng Lu, Christopher De Sa
Decentralization is a promising method of scaling up parallel machine learning systems.
no code implementations • 14 May 2020 • Yucheng Lu, Jack Nash, Christopher De Sa
Parallelism is a ubiquitous method for accelerating machine learning algorithms.
no code implementations • 5 Mar 2020 • Zhijing Li, Christopher De Sa, Adrian Sampson
While a long history of work has sought better Q-tables, existing work either seeks to minimize image distortion or to optimize for models of the human visual system.
1 code implementation • 29 Feb 2020 • Ruqi Zhang, A. Feder Cooper, Christopher De Sa
This improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution.
2 code implementations • ICML 2020 • Aaron Lou, Isay Katsman, Qingxuan Jiang, Serge Belongie, Ser-Nam Lim, Christopher De Sa
Recent advances in deep representation learning on Riemannian manifolds extend classical deep learning operations to better capture the geometry of the manifold.
no code implementations • ICML 2020 • Yucheng Lu, Christopher De Sa
Running Stochastic Gradient Descent (SGD) in a decentralized fashion has shown promising results.
1 code implementation • NeurIPS 2019 • Ruqi Zhang, Christopher De Sa
Gibbs sampling is a Markov chain Monte Carlo method that is often used for learning and inference on graphical models.
no code implementations • 13 Oct 2019 • Ritchie Zhao, Jordan Dotzel, Zhanqiu Hu, Preslav Ivanov, Christopher De Sa, Zhiru Zhang
Specialized hardware for handling activation outliers can enable low-precision neural networks, but at the cost of nontrivial area overhead.
2 code implementations • 9 Oct 2019 • Tianyi Zhang, Zhiqiu Lin, Guandao Yang, Christopher De Sa
Low-precision training reduces computational cost and produces efficient models.
no code implementations • 9 Oct 2019 • Bowen Yang, Jian Zhang, Jonathan Li, Christopher Ré, Christopher R. Aberger, Christopher De Sa
Pipeline parallelism (PP), when used to train neural networks, enables larger models to be partitioned spatially, leading to both lower network communication and higher overall hardware utilization.
no code implementations • ICLR 2019 • Zheng Li, Christopher De Sa
Low-precision training is a promising way of decreasing the time and energy cost of training machine learning models.
3 code implementations • 26 Apr 2019 • Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa
Low precision operations can provide scalability, memory savings, portability, and energy efficiency.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
no code implementations • 28 Feb 2019 • Jayadev Acharya, Christopher De Sa, Dylan J. Foster, Karthik Sridharan
In distributed statistical learning, $N$ samples are split across $m$ machines and a learner wishes to use minimal communication to learn as well as if the examples were on a single machine.
3 code implementations • 28 Jan 2019 • Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang
The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training.
no code implementations • CVPR 2019 • Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang
UGConvs generalize two disparate ideas in CNN architecture, channel shuffling (i.e., ShuffleNet) and block-circulant networks (i.e., CirCNN), and provide unifying insights that lead to a deeper understanding of each technique.
no code implementations • ICML 2018 • Christopher De Sa, Vincent Chen, Wing Wong
Gibbs sampling is the de facto Markov chain Monte Carlo method used for inference and learning on large-scale graphical models.
1 code implementation • NeurIPS 2019 • Weizhe Hua, Yuan Zhou, Christopher De Sa, Zhiru Zhang, G. Edward Suh
Combining our method with knowledge distillation reduces the compute cost of ResNet-18 by 2.6$\times$ without accuracy drop on ImageNet.
3 code implementations • ICML 2018 • Christopher De Sa, Albert Gu, Christopher Ré, Frederic Sala
Given a tree, we give a combinatorial construction that embeds the tree in hyperbolic space with arbitrarily low distortion without using optimization.
no code implementations • 23 Mar 2018 • Dan Alistarh, Christopher De Sa, Nikola Konstantinov
Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks.
no code implementations • 16 Mar 2018 • Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré
Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines.
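Concretely, a class-preserving transformation is any map that changes the input but not its label, for example a horizontal flip of an image; the sketch below is a generic illustration with assumed names, not the augmentation framework studied in the paper.

```python
import numpy as np

def augment_with_flips(images, labels):
    """Expand a training set with horizontally flipped copies (class-preserving)."""
    flipped = images[:, :, ::-1]                      # flip the width axis of (N, H, W)
    return np.concatenate([images, flipped]), np.concatenate([labels, labels])

X = np.random.rand(10, 28, 28)
y = np.arange(10) % 2
X_aug, y_aug = augment_with_flips(X, y)
print(X_aug.shape, y_aug.shape)                       # (20, 28, 28) (20,)
```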
1 code implementation • 9 Mar 2018 • Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré
Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it.
no code implementations • NeurIPS 2017 • Tri Dao, Christopher De Sa, Christopher Ré
We show that deterministic feature maps can be constructed, for any $\gamma > 0$, to achieve error $\epsilon$ with $O(e^{e^\gamma} + \epsilon^{-1/\gamma})$ samples as $\epsilon$ goes to 0.
2 code implementations • 10 Jul 2017 • Christopher De Sa, Bryan He, Ioannis Mitliagkas, Christopher Ré, Peng Xu
We propose a simple variant of the power iteration with an added momentum term that achieves both the optimal sample complexity and the optimal iteration complexity.
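A minimal sketch of a power iteration with a heavy-ball-style momentum term, assuming the two-term recurrence x_{t+1} = A x_t - beta * x_{t-1}; the exact variant and the choice of beta in the paper may differ.

```python
import numpy as np

def momentum_power_iteration(A, beta, iters=200, rng=np.random.default_rng(0)):
    """Estimate the top eigenvector of symmetric A via x_{t+1} = A x_t - beta x_{t-1}."""
    x_prev = np.zeros(A.shape[0])
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x_next = A @ x - beta * x_prev
        norm = np.linalg.norm(x_next)
        x_prev, x = x / norm, x_next / norm           # rescale both by the same factor
    return x

A = np.diag([3.0, 1.0, 0.5])
v = momentum_power_iteration(A, beta=0.25)            # beta ~ (second eigenvalue / 2)^2
print(np.abs(v))                                      # close to the first basis vector
```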
no code implementations • 25 Oct 2016 • Paroma Varma, Bryan He, Dan Iter, Peng Xu, Rose Yu, Christopher De Sa, Christopher Ré
Prior work has explored learning accuracies for these sources even without ground truth labels, but they assume that a single accuracy parameter is sufficient to model the behavior of these sources over the entire training set.
no code implementations • 23 Jun 2016 • Jian Zhang, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré
Consider a number of workers running SGD independently on the same pool of data and averaging the models every once in a while -- a common but not well-understood practice.
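A toy sketch of the practice being described (independent SGD runs whose models are periodically averaged), on a simple quadratic objective with made-up constants.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, lr = 4, 5, 0.1
target = rng.standard_normal(dim)                     # minimizer of the quadratic loss
models = [rng.standard_normal(dim) for _ in range(n_workers)]

for step in range(1, 101):
    for w in range(n_workers):
        noise = 0.1 * rng.standard_normal(dim)        # stochastic gradient noise
        grad = models[w] - target + noise             # gradient of 0.5 * ||x - target||^2
        models[w] -= lr * grad                        # each worker runs SGD independently
    if step % 10 == 0:                                # "every once in a while": average
        avg = np.mean(models, axis=0)
        models = [avg.copy() for _ in range(n_workers)]

print(np.linalg.norm(np.mean(models, axis=0) - target))   # small
```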
no code implementations • NeurIPS 2016 • Bryan He, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré
Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions.
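For example, for a bivariate Gaussian with correlation rho, each conditional is itself Gaussian, and a Gibbs sampler simply alternates draws from the two conditionals; this generic sketch illustrates the technique, not the analysis in the paper.

```python
import numpy as np

def gibbs_bivariate_gaussian(rho, n_samples, rng=np.random.default_rng(0)):
    """Gibbs sampling for (x, y) ~ N(0, [[1, rho], [rho, 1]])."""
    x, y = 0.0, 0.0
    cond_std = np.sqrt(1 - rho ** 2)
    samples = []
    for _ in range(n_samples):
        x = rho * y + cond_std * rng.standard_normal()    # x | y ~ N(rho*y, 1 - rho^2)
        y = rho * x + cond_std * rng.standard_normal()    # y | x ~ N(rho*x, 1 - rho^2)
        samples.append((x, y))
    return np.array(samples)

s = gibbs_bivariate_gaussian(rho=0.8, n_samples=5000)
print(np.corrcoef(s[:, 0], s[:, 1])[0, 1])                # close to 0.8
```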
4 code implementations • NeurIPS 2016 • Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré
Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable.
no code implementations • 24 Feb 2016 • Christopher De Sa, Kunle Olukotun, Christopher Ré
Gibbs sampling is a Markov chain Monte Carlo technique commonly used for estimating marginal distributions.
no code implementations • NeurIPS 2015 • Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré
Gibbs sampling on factor graphs is a widely used inference technique, which often produces good empirical results.
no code implementations • 22 Jun 2015 • Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré
Our contributions include: (1) an analysis of asynchronous SGD with relaxed assumptions on the sparsity of the problem; (2) an analysis of asynchronous SGD algorithms for non-convex matrix problems including matrix completion; and (3) the design and analysis of an asynchronous SGD algorithm, called Buckwild!, that uses lower-precision arithmetic.
no code implementations • 3 Feb 2015 • Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher Ré
Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration.
no code implementations • 5 Nov 2014 • Christopher De Sa, Kunle Olukotun, Christopher Ré
Stochastic gradient descent (SGD) on a low-rank factorization is commonly employed to speed up matrix problems including matrix completion, subspace tracking, and SDP relaxation.
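For concreteness, a generic sketch (not the paper's algorithm) of SGD on a low-rank factorization X ≈ L R^T for matrix completion, updating one observed entry at a time; sizes, step size, and iteration count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, rank, lr = 30, 20, 3, 0.05
M = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, m))   # rank-3 ground truth
mask = rng.random((n, m)) < 0.5                                       # observed entries
obs = list(zip(*np.nonzero(mask)))

L = 0.1 * rng.standard_normal((n, rank))
R = 0.1 * rng.standard_normal((m, rank))
for _ in range(300):                                  # SGD sweeps over observed entries
    for i, j in obs:
        err = L[i] @ R[j] - M[i, j]                   # residual on one observed entry
        L[i], R[j] = L[i] - lr * err * R[j], R[j] - lr * err * L[i]

print(np.abs((L @ R.T - M)[mask]).mean())             # training error on observed entries
```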