
no code implementations • EMNLP (LaTeCHCLfL, CLFL, LaTeCH) 2021 • A. Cooper, Maria Antoniak, Christopher De Sa, Marilyn Migiel, David Mimno

We explore Boccaccio’s Decameron to see how digital humanities tools can be used for tasks that have limited data in a language no longer in contemporary use: medieval Italian.

1 code implementation • 25 Jul 2023 • Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Christopher De Sa

This work studies post-training parameter quantization in large language models (LLMs).
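
To make the setting concrete, here is a minimal sketch of the simplest post-training scheme: uniform round-to-nearest quantization of a weight vector. The hypothetical `quantize_rtn` below is a toy baseline, far simpler than the adaptive rounding methods studied in this line of work.

```python
def quantize_rtn(weights, n_bits=4):
    """Uniform round-to-nearest quantization of a list of weights.

    A toy baseline for illustration only: real post-training
    quantization schemes are considerably more sophisticated.
    """
    lo, hi = min(weights), max(weights)
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Snap each weight to the nearest of 2^n_bits evenly spaced levels.
    codes = [round((w - lo) / scale) for w in weights]
    return [lo + c * scale for c in codes]
```

At 4 bits the reconstruction error of any weight is bounded by half the step size between levels.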

no code implementations • 14 Jun 2023 • Yingheng Wang, Yair Schiff, Aaron Gokaslan, Weishen Pan, Fei Wang, Christopher De Sa, Volodymyr Kuleshov

While diffusion models excel at generating high-quality samples, their latent variables typically lack semantic meaning and are not suitable for representation learning.

1 code implementation • 13 Jun 2023 • Kush Bhatia, Avanika Narayan, Christopher De Sa, Christopher Ré

As such, we focus on the LLM's reasoning abilities and demonstrate that this performance gap exists due to their inability to perform simple probabilistic reasoning tasks.

1 code implementation • 1 Jun 2023 • Albert Tseng, Tao Yu, Toni J. B. Liu, Christopher De Sa

These networks rely heavily on the dot product attention operator, which computes the similarity between two points by taking their inner product.
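
For readers unfamiliar with the operator, here is a minimal pure-Python sketch of scaled dot-product attention (an illustration of the standard operator, not the paper's proposed replacement; the hypothetical `dot_product_attention` works on plain lists):

```python
import math

def dot_product_attention(queries, keys, values):
    """Scaled dot-product attention: each query scores every key by
    their inner product (scaled by sqrt(d)), softmax-normalizes the
    scores, and returns the weighted average of the values."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

With one-hot values, the output row is exactly the attention weights, which sum to 1.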

1 code implementation • 24 May 2023 • Tao Yu, Toni J. B. Liu, Albert Tseng, Christopher De Sa

Our findings indicate that shadow cones offer an innovative, general approach to geometrically encode partial orders, enabling better representation and analysis of datasets with hierarchical structures.

no code implementations • 2 Feb 2023 • Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Christopher De Sa, Amir Yazdanbakhsh

Recent innovations in hardware (e.g., Nvidia A100) have motivated learning N:M structured sparsity masks from scratch for fast model inference.

no code implementations • 2 Feb 2023 • A. Feder Cooper, Wentao Guo, Khiem Pham, Tiancheng Yuan, Charlie F. Ruan, Yucheng Lu, Christopher De Sa

Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings that are guaranteed to outperform random reshuffling (RR).

1 code implementation • 27 Jan 2023 • A. Feder Cooper, Katherine Lee, Madiha Zahrah Choksi, Solon Barocas, Christopher De Sa, James Grimmelmann, Jon Kleinberg, Siddhartha Sen, Baobao Zhang

Variance in predictions across different trained models is a significant, under-explored source of error in fair classification.

1 code implementation • 18 Jul 2022 • Tao Yu, Wentao Guo, Jianan Canal Li, Tiancheng Yuan, Christopher De Sa

In this paper, we introduce MCTensor, a library based on PyTorch for providing general-purpose and high-precision arithmetic for DL training.

no code implementations • 23 Jun 2022 • A. Feder Cooper, Jonathan Frankle, Christopher De Sa

In this paper, we clarify the overlap and differences between these two concepts, and show that the effects of non-determinism, and consequently its implications for the law, become clearer from the perspective of reasoning about ML outputs as distributions over possible outcomes.

1 code implementation • 20 Jun 2022 • Ruqi Zhang, Andrew Gordon Wilson, Christopher De Sa

While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored.

1 code implementation • 22 May 2022 • Yucheng Lu, Wentao Guo, Christopher De Sa

To reduce the memory overhead, we leverage discrepancy minimization theory to propose an online Gradient Balancing algorithm (GraB) that enjoys the same rate as herding, while reducing the memory usage from $O(nd)$ to just $O(d)$ and computation from $O(n^2)$ to $O(n)$, where $d$ denotes the model dimension.
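
As a rough one-dimensional illustration of the balancing idea (a toy, not the paper's algorithm, which balances $d$-dimensional gradient vectors using herding theory), the hypothetical `balance_order` below greedily assigns each centered gradient a sign that keeps a running signed sum small, then orders the positively signed examples first and the negatively signed ones in reverse:

```python
def balance_order(grads):
    """Greedy sign balancing over scalar 'gradients': choose the sign
    that keeps the running signed sum of centered values small, then
    emit +1 items in order followed by -1 items reversed. A simplified
    echo of discrepancy-minimizing example ordering."""
    mean = sum(grads) / len(grads)
    centered = [g - mean for g in grads]
    run, signs = 0.0, []
    for c in centered:
        sign = 1 if abs(run + c) <= abs(run - c) else -1
        run += sign * c
        signs.append(sign)
    pos = [i for i, s in enumerate(signs) if s == 1]
    neg = [i for i, s in enumerate(signs) if s == -1]
    return pos + neg[::-1]
```

The output is a permutation of the example indices, usable as a training-epoch ordering.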

no code implementations • 4 Mar 2022 • Yaohui Cai, Weizhe Hua, Hongzheng Chen, G. Edward Suh, Christopher De Sa, Zhiru Zhang

In addition, since PreCropping compresses CNNs at initialization, the computational and memory costs of CNNs are reduced for both training and inference on commodity hardware.

1 code implementation • 14 Feb 2022 • Tao Yu, Christopher De Sa

Due to its geometric properties, hyperbolic space can support high-fidelity embeddings of tree- and graph-structured data, upon which various hyperbolic networks have been developed.

1 code implementation • 12 Feb 2022 • Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He

1-bit gradient compression and local steps are two representative techniques that enable drastic communication reduction in distributed SGD.

1 code implementation • 10 Feb 2022 • Tao Yu, Yichi Zhang, Zhiru Zhang, Christopher De Sa

Using representation theory, we characterize which similarity matrices can be "expressed" by finite group VSA hypervectors, and we show how these VSAs can be constructed.

no code implementations • ICLR 2022 • Yucheng Lu, Si Yi Meng, Christopher De Sa

In this paper, we develop a broad condition on the sequence of examples used by SGD that is sufficient to prove tight convergence rates in both strongly convex and non-convex settings.

1 code implementation • 30 Jul 2021 • Jerry Chee, Megan Renz, Anil Damle, Christopher De Sa

After training complex deep learning models, a common task is to compress the model to reduce compute and storage demands.

1 code implementation • NeurIPS 2021 • Isay Katsman, Aaron Lou, Derek Lim, Qingxuan Jiang, Ser-Nam Lim, Christopher De Sa

Tractably modelling distributions over manifolds has long been an important goal in the natural sciences.

1 code implementation • ICLR 2022 • Chengrun Yang, Ziyang Wu, Jerry Chee, Christopher De Sa, Madeleine Udell

Low-precision arithmetic trains deep learning models using less energy, less memory and less time.

no code implementations • 2 Mar 2021 • Yucheng Lu, Youngsuk Park, Lifan Chen, Yuyang Wang, Christopher De Sa, Dean Foster

In large-scale time series forecasting, one often encounters the situation where the temporal patterns of time series, while drifting over time, differ from one another in the same dataset.

no code implementations • 26 Feb 2021 • Johan Bjorck, Xiangyu Chen, Christopher De Sa, Carla P. Gomes, Kilian Q. Weinberger

Low-precision training has become a popular approach to reduce compute requirements, memory footprint, and energy consumption in supervised learning.

1 code implementation • NeurIPS 2021 • A. Feder Cooper, Yucheng Lu, Jessica Zosa Forde, Christopher De Sa

Recent empirical work shows that inconsistent results based on choice of hyperparameter optimization (HPO) configuration are a widespread problem in ML research.

no code implementations • 1 Jan 2021 • Pedram Zamirai, Jian Zhang, Christopher R Aberger, Christopher De Sa

We ask whether we can do pure 16-bit training, which requires only 16-bit compute units, while still matching the model accuracy attained by 32-bit training.

no code implementations • 13 Oct 2020 • Pedram Zamirai, Jian Zhang, Christopher R. Aberger, Christopher De Sa

State-of-the-art generic low-precision training algorithms use a mix of 16-bit and 32-bit precision, creating the folklore that 16-bit hardware compute units alone are not enough to maximize model accuracy.

no code implementations • 6 Jul 2020 • Ruqi Zhang, Yingzhen Li, Christopher De Sa, Sam Devlin, Cheng Zhang

Variational inference (VI) plays an essential role in approximate Bayesian inference due to its computational efficiency and broad applicability.

1 code implementation • 4 Jul 2020 • A. Feder Cooper, Karen Levy, Christopher De Sa

Trade-offs between accuracy and efficiency pervade law, public health, and other non-computing domains, which have developed policies to guide how to balance the two in conditions of uncertainty.

1 code implementation • NeurIPS 2020 • Ruqi Zhang, A. Feder Cooper, Christopher De Sa

Metropolis-Hastings (MH) is a commonly-used MCMC algorithm, but it can be intractable on large datasets due to requiring computations over the whole dataset.
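
For context, a minimal random-walk Metropolis-Hastings sampler (a textbook illustration, not this paper's minibatch scheme). Note that `log_density` is evaluated on the full dataset at every step, which is precisely the cost that minibatch variants aim to avoid:

```python
import math
import random

def metropolis_hastings(log_density, x0, n_steps, step=0.5, seed=0):
    """Random-walk MH: propose x' = x + N(0, step^2) and accept with
    probability min(1, p(x') / p(x)), computed in log space."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        prop = x + rng.gauss(0.0, step)
        log_alpha = log_density(prop) - log_density(x)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x = prop
        samples.append(x)
    return samples
```

Targeting a standard normal (log-density −x²/2), the chain's sample mean and variance approach 0 and 1.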

2 code implementations • NeurIPS 2020 • Aaron Lou, Derek Lim, Isay Katsman, Leo Huang, Qingxuan Jiang, Ser-Nam Lim, Christopher De Sa

To better conform to data geometry, recent deep generative modelling techniques adapt Euclidean constructions to non-Euclidean spaces.

no code implementations • 15 Jun 2020 • Yucheng Lu, Christopher De Sa

Decentralization is a promising method of scaling up parallel machine learning systems.

no code implementations • 14 May 2020 • Yucheng Lu, Jack Nash, Christopher De Sa

Parallelism is a ubiquitous method for accelerating machine learning algorithms.

no code implementations • 5 Mar 2020 • Zhijing Li, Christopher De Sa, Adrian Sampson

While a long history of work has sought better Q-tables, existing work either seeks to minimize image distortion or to optimize for models of the human visual system.

2 code implementations • ICML 2020 • Aaron Lou, Isay Katsman, Qingxuan Jiang, Serge Belongie, Ser-Nam Lim, Christopher De Sa

Recent advances in deep representation learning on Riemannian manifolds extend classical deep learning operations to better capture the geometry of the manifold.

1 code implementation • 29 Feb 2020 • Ruqi Zhang, A. Feder Cooper, Christopher De Sa

This improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution.

no code implementations • ICML 2020 • Yucheng Lu, Christopher De Sa

Running Stochastic Gradient Descent (SGD) in a decentralized fashion has shown promising results.

1 code implementation • NeurIPS 2019 • Ruqi Zhang, Christopher De Sa

Gibbs sampling is a Markov chain Monte Carlo method that is often used for learning and inference on graphical models.

no code implementations • 13 Oct 2019 • Ritchie Zhao, Jordan Dotzel, Zhanqiu Hu, Preslav Ivanov, Christopher De Sa, Zhiru Zhang

Specialized hardware for handling activation outliers can enable low-precision neural networks, but at the cost of nontrivial area overhead.

2 code implementations • 9 Oct 2019 • Tianyi Zhang, Zhiqiu Lin, Guandao Yang, Christopher De Sa

Low-precision training reduces computational cost and produces efficient models.

no code implementations • 9 Oct 2019 • Bowen Yang, Jian Zhang, Jonathan Li, Christopher Ré, Christopher R. Aberger, Christopher De Sa

Pipeline parallelism (PP) when training neural networks enables larger models to be partitioned spatially, leading to both lower network communication and overall higher hardware utilization.

no code implementations • ICLR 2019 • Zheng Li, Christopher De Sa

Low-precision training is a promising way of decreasing the time and energy cost of training machine learning models.

2 code implementations • 26 Apr 2019 • Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa

Low precision operations can provide scalability, memory savings, portability, and energy efficiency.

no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael. I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar

Machine learning (ML) techniques are enjoying rapidly increasing adoption.

no code implementations • 28 Feb 2019 • Jayadev Acharya, Christopher De Sa, Dylan J. Foster, Karthik Sridharan

In distributed statistical learning, $N$ samples are split across $m$ machines and a learner wishes to use minimal communication to learn as well as if the examples were on a single machine.

3 code implementations • 28 Jan 2019 • Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang

The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training.

no code implementations • CVPR 2019 • Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang

UGConvs generalize two disparate ideas in CNN architecture, channel shuffling (i.e., ShuffleNet) and block-circulant networks (i.e., CirCNN), and provide unifying insights that lead to a deeper understanding of each technique.

no code implementations • ICML 2018 • Christopher De Sa, Vincent Chen, Wing Wong

Gibbs sampling is the de facto Markov chain Monte Carlo method used for inference and learning on large scale graphical models.

1 code implementation • NeurIPS 2019 • Weizhe Hua, Yuan Zhou, Christopher De Sa, Zhiru Zhang, G. Edward Suh

Combining our method with knowledge distillation reduces the compute cost of ResNet-18 by 2.6$\times$ without accuracy drop on ImageNet.

2 code implementations • ICML 2018 • Christopher De Sa, Albert Gu, Christopher Ré, Frederic Sala

Given a tree, we give a combinatorial construction that embeds the tree in hyperbolic space with arbitrarily low distortion without using optimization.
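
A small numerical illustration of why hyperbolic space suits trees (this shows only the geometry, not the paper's combinatorial construction): in the Poincaré disk, two points placed on opposite sides of the origin satisfy the tree-path identity d(u, w) = d(u, root) + d(root, w) exactly, so a path graph embeds along a diameter with zero distortion.

```python
import math

def poincare_dist(x, y):
    """Geodesic distance in the Poincare disk model of hyperbolic space:
    d(x, y) = acosh(1 + 2|x - y|^2 / ((1 - |x|^2)(1 - |y|^2)))."""
    sq = lambda v: v[0] * v[0] + v[1] * v[1]
    diff = (x[0] - y[0], x[1] - y[1])
    return math.acosh(1 + 2 * sq(diff) / ((1 - sq(x)) * (1 - sq(y))))

# A 3-node path u - root - w embedded along a diameter of the disk.
root, u, w = (0.0, 0.0), (0.9, 0.0), (-0.9, 0.0)
```

Pushing u and w toward the boundary makes all three distances grow without bound, which is how a disk of unit Euclidean radius can host arbitrarily deep trees.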

no code implementations • 23 Mar 2018 • Dan Alistarh, Christopher De Sa, Nikola Konstantinov

Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks.

no code implementations • 16 Mar 2018 • Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré

Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines.

1 code implementation • 9 Mar 2018 • Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré

Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it.

no code implementations • NeurIPS 2017 • Tri Dao, Christopher De Sa, Christopher Ré

We show that deterministic feature maps can be constructed, for any $\gamma > 0$, to achieve error $\epsilon$ with $O(e^{e^\gamma} + \epsilon^{-1/\gamma})$ samples as $\epsilon$ goes to 0.

2 code implementations • 10 Jul 2017 • Christopher De Sa, Bryan He, Ioannis Mitliagkas, Christopher Ré, Peng Xu

We propose a simple variant of the power iteration with an added momentum term, that achieves both the optimal sample and iteration complexity.
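
The idea can be sketched in a few lines. The hypothetical `power_iteration_momentum` below follows the heavy-ball recurrence x_{t+1} = A x_t − beta·x_{t−1} with renormalization, a simplified deterministic sketch; the paper also analyzes stochastic settings and the choice of beta:

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm(x):
    return sum(v * v for v in x) ** 0.5

def power_iteration_momentum(A, beta, n_iters=100):
    """Power iteration with a momentum term: x_new = A x - beta * x_prev,
    with both x and x_prev rescaled by the same factor each step so the
    recurrence is unchanged up to global scale. Converges to the top
    eigenvector for suitable beta."""
    n_dim = len(A)
    x_prev = [0.0] * n_dim
    x = [1.0 / n_dim ** 0.5] * n_dim
    for _ in range(n_iters):
        x_new = [a - beta * p for a, p in zip(matvec(A, x), x_prev)]
        n = norm(x_new)
        x_prev = [v / n for v in x]
        x = [v / n for v in x_new]
    return x
```

On diag(2, 1) with beta = (lambda_2 / 2)^2 = 0.25, the iterate locks onto the top eigenvector (1, 0).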

no code implementations • 25 Oct 2016 • Paroma Varma, Bryan He, Dan Iter, Peng Xu, Rose Yu, Christopher De Sa, Christopher Ré

Prior work has explored learning accuracies for these sources even without ground truth labels, but they assume that a single accuracy parameter is sufficient to model the behavior of these sources over the entire training set.

no code implementations • 23 Jun 2016 • Jian Zhang, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré

Consider a number of workers running SGD independently on the same pool of data and averaging the models every once in a while -- a common but not well understood practice.

no code implementations • NeurIPS 2016 • Bryan He, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré

Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions.
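
As a minimal example of the method (an illustration only, not this paper's scan-order analysis), here is a Gibbs sampler for a bivariate normal with correlation rho, where both conditionals are available in closed form:

```python
import random

def gibbs_bivariate_normal(rho, n_steps, seed=0):
    """Gibbs sampling for a standard bivariate normal with correlation
    rho: alternately draw x | y ~ N(rho*y, 1 - rho^2) and
    y | x ~ N(rho*x, 1 - rho^2), i.e., each variable is sampled from
    its exact conditional distribution."""
    rng = random.Random(seed)
    sd = (1 - rho * rho) ** 0.5
    x = y = 0.0
    samples = []
    for _ in range(n_steps):
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        samples.append((x, y))
    return samples
```

With rho = 0.8, the empirical correlation of the chain's samples converges to 0.8.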

4 code implementations • NeurIPS 2016 • Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré

Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable.

no code implementations • 24 Feb 2016 • Christopher De Sa, Kunle Olukotun, Christopher Ré

Gibbs sampling is a Markov chain Monte Carlo technique commonly used for estimating marginal distributions.

no code implementations • NeurIPS 2015 • Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

Gibbs sampling on factor graphs is a widely used inference technique, which often produces good empirical results.

no code implementations • 22 Jun 2015 • Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

with relaxed assumptions on the sparsity of the problem; (2) we analyze asynchronous SGD algorithms for non-convex matrix problems including matrix completion; and (3) we design and analyze an asynchronous SGD algorithm, called Buckwild!, that uses lower-precision arithmetic.

no code implementations • 3 Feb 2015 • Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher Ré

Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration.

no code implementations • 5 Nov 2014 • Christopher De Sa, Kunle Olukotun, Christopher Ré

Stochastic gradient descent (SGD) on a low-rank factorization is commonly employed to speed up matrix problems including matrix completion, subspace tracking, and SDP relaxation.
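
A hedged sketch of that formulation (the hypothetical `sgd_matrix_completion` below is a bare-bones instance, not the paper's analyzed algorithm verbatim): factor R ≈ U Vᵀ and run SGD over the observed entries, updating only the row of U and row of V that each entry touches.

```python
import random

def sgd_matrix_completion(observed, n, m, rank=2, lr=0.05, epochs=500, seed=0):
    """SGD on a low-rank factorization R ~ U V^T. For each observed
    entry (i, j, r), take a gradient step on the squared error
    (U_i . V_j - r)^2 with respect to U_i and V_j."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.3) for _ in range(rank)] for _ in range(n)]
    V = [[rng.gauss(0, 0.3) for _ in range(rank)] for _ in range(m)]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j, r in observed:
            err = sum(u * v for u, v in zip(U[i], V[j])) - r
            for k in range(rank):
                U[i][k], V[j][k] = (U[i][k] - lr * err * V[j][k],
                                    V[j][k] - lr * err * U[i][k])
    return U, V
```

On a fully observed rank-1 matrix, the factorization drives the reconstruction error to near zero despite the problem being nonconvex.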

Papers With Code is a free resource with all data licensed under CC-BY-SA.