no code implementations • 14 Feb 2025 • Anne Ouyang, Simon Guo, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini
Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; therefore, we explore using language models (LMs) to automate kernel generation.
no code implementations • 24 Jan 2025 • Ryan Ehrlich, Bradley Brown, Jordan Juravsky, Ronald Clark, Christopher Ré, Azalia Mirhoseini
Here, we explore this problem in the context of solving real-world GitHub issues from the SWE-bench dataset.
1 code implementation • 9 Dec 2024 • Michael Wornow, Suhana Bedi, Miguel Angel Fuentes Hernandez, Ethan Steinberg, Jason Alan Fries, Christopher Ré, Sanmi Koyejo, Nigam H. Shah
We find that longer context models improve predictive performance -- our Mamba-based model surpasses the prior state-of-the-art on 9/14 tasks on the EHRSHOT prediction benchmark.
1 code implementation • 6 Dec 2024 • Neel Guha, Mayee F. Chen, Trevor Chow, Ishan S. Khare, Christopher Ré
Using this graphical model, we estimate sample-dependent quality scores for each LLM, and route each sample to the LLM with the highest corresponding score.
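For intuition, the routing rule itself is simple once per-sample scores exist. A minimal sketch, assuming the quality scores are already estimated (the paper obtains them from a latent variable graphical model without labeled data; the score matrix here is made up):

```python
import numpy as np

# Hypothetical setup: 3 LLMs, 5 samples. In the paper the per-sample quality
# scores are estimated without labels; here we just use a given score matrix
# to illustrate the routing rule.
rng = np.random.default_rng(0)
scores = rng.random((5, 3))  # scores[i, j] = estimated quality of LLM j on sample i

routes = scores.argmax(axis=1)  # send each sample to its highest-scoring LLM
for i, j in enumerate(routes):
    print(f"sample {i} -> LLM {j} (score {scores[i, j]:.2f})")
```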
1 code implementation • 19 Nov 2024 • Maurice Weber, Daniel Fu, Quentin Anthony, Yonatan Oren, Shane Adams, Anton Alexandrov, Xiaozhong Lyu, Huu Nguyen, Xiaozhe Yao, Virginia Adams, Ben Athiwaratkun, Rahul Chalamala, Kezhen Chen, Max Ryabinin, Tri Dao, Percy Liang, Christopher Ré, Irina Rish, Ce Zhang
In addition, we release RedPajama-V2, a massive web-only dataset consisting of raw, unfiltered text data together with quality signals and metadata.
1 code implementation • 8 Nov 2024 • Mayee F. Chen, Michael Y. Hu, Nicholas Lourie, Kyunghyun Cho, Christopher Ré
Finally, we leverage the insights from our framework to derive a new online method named Aioli, which directly estimates the mixing law parameters throughout training and uses them to dynamically adjust proportions.
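As a loose illustration of the general pattern (estimate how much each data group currently helps, then renormalize the mixture), here is a hypothetical multiplicative-weights update on proportions; Aioli's actual update fits linear mixing-law parameters from training dynamics, which this sketch does not attempt:

```python
import numpy as np

# Hypothetical multiplicative-weights update on data mixture proportions.
# `benefit` stands in for the estimated mixing-law quantities: how much each
# data group is currently helping the training objective.
def update_proportions(p, benefit, lr=0.5):
    p = p * np.exp(lr * benefit)   # upweight groups estimated to help more
    return p / p.sum()             # renormalize to a valid mixture

p = np.array([0.25, 0.25, 0.25, 0.25])       # start from a uniform mixture
benefit = np.array([0.1, 0.4, -0.2, 0.05])   # assumed per-group estimates
print(update_proportions(p, benefit))
```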
1 code implementation • 7 Nov 2024 • Tanishq Kumar, Zachary Ankner, Benjamin F. Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, Cengiz Pehlevan, Christopher Ré, Aditi Raghunathan
Low precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this.
1 code implementation • 27 Oct 2024 • Benjamin F. Spector, Simran Arora, Aaryan Singhal, Daniel Y. Fu, Christopher Ré
We match cuBLAS and FlashAttention-3 on GEMM and attention inference performance and outperform the strongest baselines by $10-40\%$ on attention backwards, $8\times$ on state space models, and $14\times$ on linear attention.
1 code implementation • 14 Oct 2024 • Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré
When compared with prior approaches under the same compute budgets, LoLCATs significantly improves linearizing quality, closing the gap between linearized and original Llama 3.1 70B and 405B LLMs by 77.8% and 78.1% on 5-shot MMLU.
no code implementations • 11 Oct 2024 • Vishnu Sarukkai, Brennan Shacklett, Zander Majercik, Kush Bhatia, Christopher Ré, Kayvon Fatahalian
Our two-step solution leverages the task domain knowledge and the code synthesis abilities of LLMs to author progress functions that estimate task progress from a given state.
2 code implementations • 8 Oct 2024 • Christopher Fifty, Ronald G. Junkins, Dennis Duan, Aniketh Iger, Jerry W. Liu, Ehsan Amid, Sebastian Thrun, Christopher Ré
However, as vector quantization is non-differentiable, the gradient to the encoder flows around the vector quantization layer rather than through it in a straight-through approximation.
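The straight-through approximation mentioned here is a standard trick and easy to make concrete. A minimal PyTorch sketch of a VQ layer whose forward pass quantizes but whose backward pass copies gradients past the non-differentiable argmin:

```python
import torch

def vector_quantize(z_e, codebook):
    """Nearest-neighbor quantization with a straight-through gradient.

    z_e: (batch, dim) encoder outputs; codebook: (K, dim) code vectors.
    """
    # Find the nearest code for each encoder output.
    dists = torch.cdist(z_e, codebook)   # (batch, K)
    codes = dists.argmin(dim=1)
    z_q = codebook[codes]
    # Straight-through: forward uses z_q, backward routes d(loss)/d(z_q)
    # directly to z_e, flowing *around* the non-differentiable argmin.
    return z_e + (z_q - z_e).detach()

z_e = torch.randn(8, 16, requires_grad=True)
codebook = torch.randn(512, 16)
out = vector_quantize(z_e, codebook)
out.sum().backward()
print(z_e.grad.shape)  # gradients reach the encoder despite the argmin
```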
no code implementations • 7 Oct 2024 • Avanika Narayan, Mayee F. Chen, Kush Bhatia, Christopher Ré
We find that fine-tuning on Cookbook-generated data is able to improve performance on its corresponding task by up to 52.7 accuracy points.
1 code implementation • 23 Sep 2024 • Jon Saad-Falcon, Adrian Gamarra Lafuente, Shlok Natarajan, Nahum Maru, Hristo Todorov, Etash Guha, E. Kelly Buchanan, Mayee Chen, Neel Guha, Christopher Ré, Azalia Mirhoseini
Additionally, efficiently and automatically searching the space of model choices, inference-time techniques, and their compositions is challenging due to the large design space.
1 code implementation • 31 Jul 2024 • Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini
Across multiple tasks and models, we observe that coverage -- the fraction of problems that are solved by any generated sample -- scales with the number of samples over four orders of magnitude.
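Coverage here is what the code-generation literature calls pass@k. A small sketch of the standard unbiased estimator (not necessarily the paper's exact evaluation code):

```python
import math

def pass_at_k(n, c, k):
    """Unbiased estimate of coverage with k samples (the standard pass@k
    estimator): probability that at least one of k draws from n generated
    samples, of which c are correct, solves the problem."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# e.g., 100 samples generated, 3 correct: coverage grows with the budget k
for k in (1, 10, 100):
    print(k, round(pass_at_k(100, 3, k), 3))
```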
1 code implementation • 7 Jul 2024 • Simran Arora, Aman Timalsina, Aaryan Singhal, Benjamin Spector, Sabri Eyuboglu, Xinyi Zhao, Ashish Rao, Atri Rudra, Christopher Ré
Recurrent large language models that compete with Transformers in language modeling perplexity are emerging at a rapid rate (e.g., Mamba, RWKV).
1 code implementation • 10 May 2024 • Rom N. Parnichkun, Stefano Massaroli, Alessandro Moro, Jimmy T. H. Smith, Ramin Hasani, Mathias Lechner, Qi An, Christopher Ré, Hajime Asama, Stefano Ermon, Taiji Suzuki, Atsushi Yamashita, Michael Poli
We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size.
1 code implementation • 26 Mar 2024 • Michael Poli, Armin W Thomas, Eric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli
The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation.
2 code implementations • 28 Feb 2024 • Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré
In this work, we explore whether we can improve language model efficiency (e.g., by reducing memory consumption) without compromising on recall.
1 code implementation • 18 Feb 2024 • Gautam Machiraju, Alexander Derry, Arjun Desai, Neel Guha, Amir-Hossein Karimi, James Zou, Russ Altman, Christopher Ré, Parag Mallick
Feature attribution, the ability to localize regions of the input data that are relevant for classification, is an important capability for ML models in scientific and biomedical domains.
no code implementations • 12 Feb 2024 • Jon Saad-Falcon, Daniel Y. Fu, Simran Arora, Neel Guha, Christopher Ré
Retrieval pipelines, an integral component of many machine learning systems, perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text.
1 code implementation • 7 Feb 2024 • Jordan Juravsky, Bradley Brown, Ryan Ehrlich, Daniel Y. Fu, Christopher Ré, Azalia Mirhoseini
Decoding in this large-batch setting can be bottlenecked by the attention operation, which reads large key-value (KV) caches from memory and computes inefficient matrix-vector products for every sequence in the batch.
1 code implementation • 6 Feb 2024 • Michael Zhang, Kush Bhatia, Hermann Kumbong, Christopher Ré
Experiments show Hedgehog recovers over 99% of standard Transformer quality in train-from-scratch and finetuned-conversion settings, outperforming prior linear attentions up to 6 perplexity points on WikiText-103 with causal GPTs, and up to 8.7 GLUE score points on finetuned bidirectional BERTs.
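For readers unfamiliar with linear attention, the substitution at its core is easy to state: replace softmax(QK^T)V with phi(Q)(phi(K)^T V) for a feature map phi, which is linear rather than quadratic in sequence length. A sketch with elu+1 as a stand-in feature map (Hedgehog's contribution is learning phi, which this does not show):

```python
import torch

def linear_attention(q, k, v, phi=lambda x: torch.nn.functional.elu(x) + 1):
    """Bidirectional linear attention sketch.

    Replaces softmax(QK^T)V with phi(Q) (phi(K)^T V), which is O(n d^2)
    instead of O(n^2 d). Hedgehog *learns* phi; elu+1 is a stand-in here.
    """
    q, k = phi(q), phi(k)
    kv = k.transpose(-2, -1) @ v   # (d, d) summary of keys and values
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)  # normalizer
    return (q @ kv) / (z + 1e-6)

q = k = v = torch.randn(2, 128, 64)       # (batch, seq, dim)
print(linear_attention(q, k, v).shape)    # torch.Size([2, 128, 64])
```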
3 code implementations • 8 Dec 2023 • Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré
To close the gap between synthetics and real language, we develop a new formalization of the task called multi-query associative recall (MQAR) that better reflects actual language.
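A rough sketch of what an MQAR instance looks like, with the exact token format assumed rather than taken from the paper: a context of key-value bindings followed by multiple queried keys, each to be answered with its bound value.

```python
import random

def make_mqar_example(num_kv=4, num_queries=3, vocab=list("abcdefgh")):
    """Illustrative multi-query associative recall instance (format assumed):
    a sequence of key-value pairs followed by several queried keys, each of
    which must be answered with the value bound to it earlier in context."""
    keys = random.sample(vocab, num_kv)
    kv = {k: random.choice("0123456789") for k in keys}
    context = [tok for k in keys for tok in (k, kv[k])]
    queries = random.sample(keys, num_queries)
    answers = [kv[q] for q in queries]
    return context, queries, answers

random.seed(0)
print(make_mqar_example())
```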
1 code implementation • 10 Nov 2023 • Daniel Y. Fu, Hermann Kumbong, Eric Nguyen, Christopher Ré
FlashFFTConv uses a matrix decomposition that computes the FFT using matrix multiply units and enables kernel fusion for long sequences, reducing I/O.
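For context, the operation being accelerated is ordinary long convolution via the FFT; FlashFFTConv's contribution is computing the FFT itself with matrix multiplies and fused kernels, which this reference sketch does not attempt:

```python
import numpy as np

def fft_conv(u, kernel):
    """Reference long convolution via FFT: O(n log n) instead of O(n^2).
    FlashFFTConv computes the same operation but decomposes the FFT into
    matrix multiplies to use tensor cores; this is only the mathematical
    baseline it accelerates."""
    n = len(u)
    f = np.fft.rfft(u, n=2 * n)       # zero-pad to avoid circular wraparound
    g = np.fft.rfft(kernel, n=2 * n)
    return np.fft.irfft(f * g, n=2 * n)[:n]

u, k = np.random.randn(1024), np.random.randn(1024)
assert np.allclose(fft_conv(u, k), np.convolve(u, k)[:1024])
```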
1 code implementation • NeurIPS 2023 • Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré
We ask: are there performant architectures that can scale sub-quadratically along sequence length and model dimension?
1 code implementation • NeurIPS 2023 • Neel Guha, Julian Nyarko, Daniel E. Ho, Christopher Ré, Adam Chilton, Aditya Narayana, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, Daniel N. Rockmore, Diego Zambrano, Dmitry Talisman, Enam Hoque, Faiz Surani, Frank Fagan, Galit Sarfaty, Gregory M. Dickinson, Haggai Porat, Jason Hegland, Jessica Wu, Joe Nudell, Joel Niklaus, John Nay, Jonathan H. Choi, Kevin Tobia, Margaret Hagan, Megan Ma, Michael Livermore, Nikon Rasumov-Rahe, Nils Holzenberger, Noam Kolt, Peter Henderson, Sean Rehaag, Sharad Goel, Shang Gao, Spencer Williams, Sunny Gandhi, Tom Zur, Varun Iyer, Zehua Li
The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform?
2 code implementations • 24 Jun 2023 • Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang Wang, Beidi Chen
Based on these insights, we propose Heavy Hitter Oracle (H$_2$O), a KV cache eviction policy that dynamically retains a balance of recent and H$_2$ tokens.
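A toy sketch of the eviction idea, with all specifics (scoring, budget handling) assumed: under a fixed cache budget, always keep the most recent entries and fill the remainder with the highest accumulated-attention ("heavy hitter") entries.

```python
import numpy as np

def h2o_keep_mask(attn_scores, budget, recent=4):
    """Toy H2O-style eviction (details assumed): keep the `recent` newest
    cache entries plus the highest accumulated-attention ("heavy hitter")
    entries, up to `budget` total."""
    n = len(attn_scores)
    keep = set(range(n - recent, n))        # always keep recent tokens
    heavy = np.argsort(attn_scores)[::-1]   # rank by accumulated attention
    for idx in heavy:
        if len(keep) >= budget:
            break
        keep.add(int(idx))
    return sorted(keep)

scores = np.array([5.0, 0.1, 3.2, 0.05, 0.2, 1.1, 0.0, 0.3])  # hypothetical
print(h2o_keep_mask(scores, budget=6))  # [0, 2, 4, 5, 6, 7]
```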
1 code implementation • 14 Jun 2023 • Khaled Saab, Siyi Tang, Mohamed Taha, Christopher Lee-Messer, Christopher Ré, Daniel Rubin
We find that our multilabel model significantly improves overall seizure onset detection performance (+5.9 AUROC points) while greatly improving performance among subgroups (up to +8.3 AUROC points), and decreases false positives on non-epileptiform abnormalities by 8 FPR points.
1 code implementation • 19 Apr 2023 • Simran Arora, Brandon Yang, Sabri Eyuboglu, Avanika Narayan, Andrew Hojel, Immanuel Trummer, Christopher Ré
Code synthesis is cheap, but far less accurate than directly processing each document with the LLM.
1 code implementation • 16 Mar 2023 • Michael Zhang, Khaled K. Saab, Michael Poli, Tri Dao, Karan Goel, Christopher Ré
For expressivity, we propose a new SSM parameterization based on the companion matrix -- a canonical representation for discrete-time processes -- which enables SpaceTime's SSM layers to learn desirable autoregressive processes.
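The companion matrix itself is simple to construct; a small sketch (SpaceTime's exact parameterization and normalization may differ):

```python
import numpy as np

def companion(a):
    """Companion matrix: a shift matrix whose last column holds the free
    parameters. Its characteristic polynomial is determined by that column,
    which is what lets an SSM layer parameterized this way represent
    autoregressive processes compactly."""
    d = len(a)
    A = np.zeros((d, d))
    A[1:, :-1] = np.eye(d - 1)   # sub-diagonal of ones: a pure shift
    A[:, -1] = a                 # one column of learnable coefficients
    return A

print(companion(np.array([0.5, -0.2, 0.1])))
```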
1 code implementation • 13 Mar 2023 • Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
As a result, when running OPT-175B on a single 16GB GPU, FlexGen achieves significantly higher throughput compared to state-of-the-art offloading systems, reaching a generation throughput of 1 token/s for the first time with an effective batch size of 144.
no code implementations • 1 Mar 2023 • Vishnu Sarukkai, Linden Li, Arden Ma, Christopher Ré, Kayvon Fatahalian
We seek to give users precise control over diffusion-based image generation by modeling complex scenes as sequences of layers, which define the desired spatial arrangement and visual attributes of objects in the scene.
6 code implementations • 21 Feb 2023 • Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré
Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale.
Ranked #37 on Language Modelling on WikiText-103
1 code implementation • 13 Feb 2023 • Daniel Y. Fu, Elliot L. Epstein, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré
We find that a key requirement to achieving high performance is keeping the convolution kernels smooth.
3 code implementations • 28 Dec 2022 • Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré
First, we use synthetic language modeling tasks to understand the gap between SSMs and attention.
Ranked #2 on Language Modelling on The Pile (Test perplexity metric)
1 code implementation • 26 Nov 2022 • Michael Poli, Stefano Massaroli, Federico Berto, Jinkyoo Park, Tri Dao, Christopher Ré, Stefano Ermon
Instead, this work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
3 code implementations • 16 Nov 2022 • Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda
We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models.
1 code implementation • 12 Oct 2022 • Eric Nguyen, Karan Goel, Albert Gu, Gordon W. Downs, Preey Shah, Tri Dao, Stephen A. Baccus, Christopher Ré
On ImageNet-1k, S4ND exceeds the performance of a Vision Transformer baseline by $1.5\%$ when training with a 1D sequence of patches, and matches ConvNeXt when modeling images in 2D.
3 code implementations • 5 Oct 2022 • Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré
Prompting is a brittle process wherein small modifications to the prompt can cause large variations in the model predictions, and therefore significant effort is dedicated towards designing a painstakingly "perfect prompt" for a task.
Ranked #1 on Question Answering on Story Cloze
1 code implementation • 18 Sep 2022 • Lingjiao Chen, Zhihua Jin, Sabri Eyuboglu, Christopher Ré, Matei Zaharia, James Zou
HAPI is the first large-scale dataset of ML API usages and is a unique resource for studying ML-as-a-service (MLaaS).
1 code implementation • 13 Sep 2022 • Neel Guha, Daniel E. Ho, Julian Nyarko, Christopher Ré
Finally, inspired by the Open Science movement, we make a call for the legal and computer science communities to join our efforts by contributing new tasks.
no code implementations • 14 Jul 2022 • Michael Zhang, Christopher Ré
We also find that efficient ways to improve model inference (e.g., via adapters, lightweight networks with FM embeddings as inputs) do not consistently improve and can sometimes hurt group robustness compared to zero-shot (e.g., increasing the accuracy gap by 50.1 pp on CelebA).
1 code implementation • 24 Jun 2022 • Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, Christopher Ré
Linear time-invariant state space models (SSMs) are a classical model family from engineering and statistics that has recently been shown to be very promising in machine learning through the Structured State Space sequence model (S4).
Ranked #2 on Long-range modeling on LRA
2 code implementations • 23 Jun 2022 • Albert Gu, Ankit Gupta, Karan Goel, Christopher Ré
On the other hand, a recent variant of S4 called DSS showed that restricting the state matrix to be fully diagonal can still preserve the performance of the original model when using a specific initialization based on approximating S4's matrix.
2 code implementations • 22 Jun 2022 • Armin W. Thomas, Christopher Ré, Russell A. Poldrack
At their core, these frameworks learn the dynamics of brain activity by modeling sequences of activity akin to how sequences of text are modeled in NLP.
no code implementations • 17 Jun 2022 • Jupinder Parmar, Khaled Saab, Brian Pogatchnik, Daniel Rubin, Christopher Ré
Domain generalization in medical image classification is an important problem for trustworthy machine learning to be deployed in healthcare.
no code implementations • 31 May 2022 • Armin W. Thomas, Christopher Ré, Russell A. Poldrack
Deep learning (DL) models find increasing application in mental state decoding, where researchers seek to understand the mapping between mental states (e.g., perceiving fear or joy) and brain activity by identifying those brain regions (and networks) whose activity allows these states to be accurately identified (i.e., decoded).
11 code implementations • 27 May 2022 • Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method.
1 code implementation • 27 May 2022 • Simran Arora, Christopher Ré
However, privacy and quality appear to be in tension in existing systems for personal tasks.
2 code implementations • 20 May 2022 • Avanika Narayan, Ines Chami, Laurel Orr, Simran Arora, Christopher Ré
Foundation Models (FMs) are models trained on large corpora of data that, at very large scale, can generalize to new tasks without any task-specific finetuning.
Ranked #11 on Entity Resolution on Amazon-Google
1 code implementation • Findings (ACL) 2022 • Megan Leszczynski, Daniel Y. Fu, Mayee F. Chen, Christopher Ré
Entity retrieval, retrieving information about entity mentions in a query, is a key step in open-domain tasks, such as question answering or fact checking.
1 code implementation • 15 Apr 2022 • Mayee F. Chen, Daniel Y. Fu, Avanika Narayan, Michael Zhang, Zhao Song, Kayvon Fatahalian, Christopher Ré
We first prove that adding a weighted class-conditional InfoNCE loss to SupCon controls the degree of spread.
2 code implementations • 1 Apr 2022 • Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré
To address these issues, we propose a class of matrices (Monarch) that is hardware-efficient (they are parameterized as products of two block-diagonal matrices for better hardware utilization) and expressive (they can represent many commonly used transforms).
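A toy sketch of applying a Monarch-style product without materializing the dense matrix, with the permutation conventions simplified to a reshape-transpose (the paper's exact factorization may differ):

```python
import torch

# Toy Monarch-style product: an n x n map built from two block-diagonal
# factors with a transpose-style permutation between them, so the dense
# matrix is never formed and each block multiply maps well to hardware.
m = 4                      # n = m * m = 16
B1 = torch.randn(m, m, m)  # m blocks of size m x m
B2 = torch.randn(m, m, m)

def monarch_apply(x):
    h = torch.einsum('bij,bj->bi', B1, x.view(m, m))  # block-diagonal multiply
    h = h.t().contiguous()                            # permute (reshape-transpose)
    return torch.einsum('bij,bj->bi', B2, h).t().reshape(-1)

x = torch.randn(m * m)
print(monarch_apply(x).shape)  # torch.Size([16])
```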
1 code implementation • 24 Mar 2022 • Mayee F. Chen, Daniel Y. Fu, Dyah Adila, Michael Zhang, Frederic Sala, Kayvon Fatahalian, Christopher Ré
Despite the black-box nature of foundation models, we prove results characterizing how our approach improves performance and show that lift scales with the smoothness of label distributions in embedding space.
2 code implementations • ICLR 2022 • Sabri Eyuboglu, Maya Varma, Khaled Saab, Jean-Benoit Delbrouck, Christopher Lee-Messer, Jared Dunnmon, James Zou, Christopher Ré
In this work, we address these challenges by first designing a principled evaluation framework that enables a quantitative comparison of SDMs across 1,235 slice discovery settings in three input domains (natural images, medical images, and time-series data).
1 code implementation • 14 Mar 2022 • Arjun D Desai, Andrew M Schmidt, Elka B Rubin, Christopher M Sandino, Marianne S Black, Valentina Mazzoli, Kathryn J Stevens, Robert Boutin, Christopher Ré, Garry E Gold, Brian A Hargreaves, Akshay S Chaudhari
While recent machine learning methods for MRI reconstruction and analysis have shown promise for reducing this burden, these techniques are primarily validated with imperfect image quality metrics that are discordant with clinically relevant measures, which ultimately hampers clinical deployment and clinician trust.
1 code implementation • 14 Mar 2022 • Simran Arora, Patrick Lewis, Angela Fan, Jacob Kahn, Christopher Ré
We first define the Public-Private Autoregressive Information Retrieval (PAIR) privacy framework for the novel retrieval setting over multiple privacy scopes.
Ranked #1 on Multi-hop Question Answering on ConcurrentQA
1 code implementation • 3 Mar 2022 • Michael Zhang, Nimit S. Sohoni, Hongyang R. Zhang, Chelsea Finn, Christopher Ré
As ERM models can be good spurious attribute predictors, CNC works by (1) using a trained ERM model's outputs to identify samples with the same class but dissimilar spurious features, and (2) training a robust model with contrastive learning to learn similar representations for same-class samples.
6 code implementations • 20 Feb 2022 • Karan Goel, Albert Gu, Chris Donahue, Christopher Ré
SaShiMi yields state-of-the-art performance for unconditional waveform generation in the autoregressive setting.
no code implementations • 31 Dec 2021 • Nimit S. Sohoni, Maziar Sanjabi, Nicolas Ballas, Aditya Grover, Shaoliang Nie, Hamed Firooz, Christopher Ré
Theoretically, we provide generalization bounds for our approach in terms of the worst-group performance, which scale with respect to both the total number of training points and the number of training points with group labels.
1 code implementation • ICLR 2022 • Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Ré
To address this, our main insight is to optimize over a continuous superset of sparse matrices with a fixed structure known as products of butterfly matrices.
2 code implementations • 8 Nov 2021 • Avanika Narayan, Piero Molino, Karan Goel, Willie Neiswanger, Christopher Ré
LBT provides a configurable interface for controlling training and customizing evaluation, a standardized training framework for eliminating confounding variables, and support for multi-objective evaluation.
1 code implementation • 3 Nov 2021 • Arjun D Desai, Beliz Gunel, Batu M Ozturkler, Harris Beg, Shreyas Vasanawala, Brian A Hargreaves, Christopher Ré, John M Pauly, Akshay S Chaudhari
Deep neural networks have enabled improved image quality and fast inference times for various inverse problems, including accelerated magnetic resonance imaging (MRI) reconstruction.
7 code implementations • ICLR 2022 • Albert Gu, Karan Goel, Christopher Ré
A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies.
1 code implementation • NeurIPS 2021 • Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré
Recent advances in efficient Transformers have exploited either the sparsity or low-rank properties of attention matrices to reduce the computational and memory bottlenecks of modeling long sequences.
2 code implementations • NeurIPS 2021 • Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré
Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency.
Ranked #2 on Sequential Image Classification on Sequential MNIST
1 code implementation • Findings (EMNLP) 2021 • Maya Varma, Laurel Orr, Sen Wu, Megan Leszczynski, Xiao Ling, Christopher Ré
Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities.
no code implementations • 29 Sep 2021 • Daniel Yang Fu, Mayee F Chen, Michael Zhang, Kayvon Fatahalian, Christopher Ré
Supervised contrastive learning optimizes a loss that pushes together embeddings of points from the same class while pulling apart embeddings of points from different classes.
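A minimal single-view sketch of that loss (the full method uses augmented multi-view batches, so treat this as illustrative only):

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, tau=0.1):
    """Minimal supervised contrastive loss sketch: for each anchor, pull
    toward other same-class embeddings and push away the rest (one view
    per sample)."""
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / tau
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    sim.masked_fill_(self_mask, float('-inf'))  # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # negative mean log-probability of same-class pairs, per anchor
    per_anchor = log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return -per_anchor.mean()

z = torch.randn(8, 32)
y = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(supcon_loss(z, y))
```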
no code implementations • 16 Aug 2021 • Armin W. Thomas, Christopher Ré, Russell A. Poldrack
In cognitive decoding, researchers aim to characterize a brain region's representations by identifying the cognitive states (e.g., accepting/rejecting a gamble) that can be identified from the region's activity.
2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, aditi raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
2 code implementations • 16 Jul 2021 • Piero Molino, Christopher Ré
In this article we describe how ML systems are currently structured, highlight important factors for their success and adoption, discuss the issues current ML systems face, and explain how the systems we developed addressed them.
1 code implementation • 1 Jul 2021 • Mayee Chen, Karan Goel, Nimit S. Sohoni, Fait Poms, Kayvon Fatahalian, Christopher Ré
If an unlabeled sample from the target distribution is available, along with a labeled sample from a possibly different source distribution, standard approaches such as importance weighting can be applied to estimate performance on the target.
1 code implementation • 7 Jun 2021 • Ines Chami, Albert Gu, Dat Nguyen, Christopher Ré
Given directions, PCA relies on: (1) a parameterization of subspaces spanned by these directions, (2) a method of projection onto subspaces that preserves information in these directions, and (3) an objective to optimize, namely the variance explained by projections.
1 code implementation • 2 Jun 2021 • Sahaana Suri, Ihab F. Ilyas, Christopher Ré, Theodoros Rekatsinas
Context enrichment, or rebuilding fragmented context, using keyless joins is an implicit or explicit step in machine learning (ML) pipelines over structured data sources.
3 code implementations • NeurIPS 2021 • Nicholas Roberts, Mikhail Khodak, Tri Dao, Liam Li, Christopher Ré, Ameet Talwalkar
An important goal of AutoML is to automate away the design of neural networks on new tasks in under-explored domains.
1 code implementation • 3 Mar 2021 • Mayee F. Chen, Benjamin Cohen-Wang, Stephen Mussmann, Frederic Sala, Christopher Ré
We apply our decomposition framework to three scenarios -- well-specified, misspecified, and corrected models -- to 1) choose between labeled and unlabeled data and 2) learn from their combination.
2 code implementations • NAACL 2021 • Karan Goel, Nazneen Rajani, Jesse Vig, Samson Tan, Jason Wu, Stephan Zheng, Caiming Xiong, Mohit Bansal, Christopher Ré
Despite impressive performance on standard benchmarks, deep neural networks are often brittle when deployed in real-world systems.
2 code implementations • ICLR 2020 • Tri Dao, Nimit S. Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra, Christopher Ré
Modern neural network architectures use structured linear transformations, such as low-rank matrices, sparse matrices, permutations, and the Fourier transform, to improve inference speed and reduce memory usage compared to general linear maps.
1 code implementation • NeurIPS 2020 • Nimit S. Sohoni, Jared A. Dunnmon, Geoffrey Angus, Albert Gu, Christopher Ré
As the subclass labels are frequently unavailable, models trained using only the coarser-grained class labels often exhibit highly variable performance across different subclasses.
no code implementations • 22 Oct 2020 • Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su
Intuitively, the transfer effect from one task to another task depends on dataset shifts such as sample sizes and covariance matrices.
2 code implementations • NeurIPS 2020 • Ines Chami, Albert Gu, Vaggos Chatziafratis, Christopher Ré
Recently, Dasgupta reframed HC as a discrete optimization problem by introducing a global cost function measuring the quality of a given tree.
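Dasgupta's cost is concrete enough to compute directly: each pair of leaves pays its similarity times the size of the subtree rooted at the pair's least common ancestor, so a good tree merges similar points low. A small sketch on binary trees encoded as nested tuples, with made-up similarities:

```python
import itertools

def leaves(t):
    return [t] if not isinstance(t, tuple) else leaves(t[0]) + leaves(t[1])

def dasgupta_cost(tree, w):
    """Dasgupta's cost for a binary hierarchical clustering: each pair (i, j)
    with similarity w[i, j] pays for the size of the subtree rooted at its
    least common ancestor."""
    if not isinstance(tree, tuple):
        return 0.0
    left, right = map(leaves, tree)
    # pairs split here have their LCA at this node; size = |left| + |right|
    cost = sum(w.get((min(i, j), max(i, j)), 0.0)
               for i, j in itertools.product(left, right)) * (len(left) + len(right))
    return cost + dasgupta_cost(tree[0], w) + dasgupta_cost(tree[1], w)

w = {(0, 1): 1.0, (2, 3): 1.0, (1, 2): 0.1}   # hypothetical similarities
print(dasgupta_cost(((0, 1), (2, 3)), w))      # good tree: lower cost
print(dasgupta_cost(((0, 2), (1, 3)), w))      # bad tree: higher cost
```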
1 code implementation • ICLR 2021 • Karan Goel, Albert Gu, Yixuan Li, Christopher Ré
Particularly concerning are models with inconsistent performance on specific subgroups of a class, e.g., exhibiting disparities in skin cancer classification in the presence or absence of a spurious bandage.
1 code implementation • 26 Jun 2020 • Mayee F. Chen, Daniel Y. Fu, Frederic Sala, Sen Wu, Ravi Teja Mullapudi, Fait Poms, Kayvon Fatahalian, Christopher Ré
Our goal is to enable machine learning systems to be trained interactively.
no code implementations • ACL 2020 • Simran Arora, Avner May, Jian Zhang, Christopher Ré
We study the settings for which deep contextual embeddings (e.g., BERT) give large improvements in performance relative to classic pretrained embeddings (e.g., GloVe), and an even simpler baseline, random word embeddings, focusing on the impact of the training set size and the linguistic properties of the task.
1 code implementation • 7 May 2020 • Ines Chami, Sami Abu-El-Haija, Bryan Perozzi, Christopher Ré, Kevin Murphy
The second, graph regularized neural networks, leverages graphs to augment neural network losses with a regularization objective for semi-supervised learning.
2 code implementations • ICML 2020 • Sen Wu, Hongyang R. Zhang, Gregory Valiant, Christopher Ré
We validate our proposed scheme on image and text datasets.
no code implementations • ICLR 2020 • Sen Wu, Hongyang R. Zhang, Christopher Ré
We investigate multi-task learning approaches that use a shared feature representation for all tasks.
3 code implementations • ACL 2020 • Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, Christopher Ré
However, existing hyperbolic embedding methods do not account for the rich logical patterns in KGs.
Ranked #5 on Link Prediction on YAGO3-10
no code implementations • 11 Apr 2020 • Zhaobin Kuang, Frederic Sala, Nimit Sohoni, Sen Wu, Aldo Córdova-Palomera, Jared Dunnmon, James Priest, Christopher Ré
To relax these assumptions, we propose Ivy, a new method to combine IV candidates that can handle correlated and invalid IV candidates in a robust manner.
no code implementations • 17 Mar 2020 • Sarah M. Hooper, Jared A. Dunnmon, Matthew P. Lungren, Sanjiv Sam Gambhir, Christopher Ré, Adam S. Wang, Bhavik N. Patel
We then show that the trained model is robust to reduced tube current and fewer projections, with the AUROC dropping only 0.65% for images acquired with a 16x reduction in tube current and 0.22% for images acquired with 8x fewer projections.
1 code implementation • 29 Feb 2020 • Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Ré
To theoretically explain this tradeoff, we introduce a new measure of embedding instability, the eigenspace instability measure, which we prove bounds the disagreement in downstream predictions introduced by the change in word embeddings.
1 code implementation • ICML 2020 • Daniel Y. Fu, Mayee F. Chen, Frederic Sala, Sarah M. Hooper, Kayvon Fatahalian, Christopher Ré
In this work, we show that, for a class of latent variable models highly applicable to weak supervision, we can find a closed-form solution to model parameters, obviating the need for iterative solutions like stochastic gradient descent (SGD).
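One classical instance of such a closed-form solution is the triplet method for binary, conditionally independent labeling functions, where three observable agreement rates determine each source's accuracy; a simplified sketch (the paper's model class is more general):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
y = rng.choice([-1, 1], size=n)        # latent true labels (never observed)
accs = np.array([0.9, 0.75, 0.6])      # hidden labeling-function accuracies

# Conditionally independent labeling functions: agree with y w.p. accs[i]
L = np.where(rng.random((n, 3)) < accs, y[:, None], -y[:, None])

# Triplet method: with E[l_i l_j] = a_i a_j (a_i = E[l_i y]), three observable
# second moments give each a_i in closed form, with no SGD and no labels.
M = (L.T @ L) / n
a0 = np.sqrt(M[0, 1] * M[0, 2] / M[1, 2])
print(a0, "vs true", 2 * accs[0] - 1)  # a_i = 2 * accuracy - 1
```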
3 code implementations • NeurIPS 2019 • Ines Chami, Rex Ying, Christopher Ré, Jure Leskovec
Here we propose Hyperbolic Graph Convolutional Neural Network (HGCN), the first inductive hyperbolic GCN that leverages both the expressiveness of GCNs and hyperbolic geometry to learn inductive node representations for hierarchical and scale-free graphs.
Ranked #1 on Link Prediction on PPI (Accuracy metric)
no code implementations • NeurIPS 2019 • Frederic Sala, Paroma Varma, Jason Fries, Daniel Y. Fu, Shiori Sagawa, Saelig Khattar, Ashwini Ramamoorthy, Ke Xiao, Kayvon Fatahalian, James Priest, Christopher Ré
Multi-resolution sources exacerbate this challenge due to complex correlations and sample complexity that scales in the length of the sequence.
no code implementations • 9 Oct 2019 • Bowen Yang, Jian Zhang, Jonathan Li, Christopher Ré, Christopher R. Aberger, Christopher De Sa
Pipeline parallelism (PP) when training neural networks enables larger models to be partitioned spatially, leading to both lower network communication and overall higher hardware utilization.
1 code implementation • 7 Oct 2019 • Daniel Y. Fu, Will Crichton, James Hong, Xinwei Yao, Haotian Zhang, Anh Truong, Avanika Narayan, Maneesh Agrawala, Christopher Ré, Kayvon Fatahalian
Many real-world video analysis applications require the ability to identify domain-specific events in video, such as interviews and commercials in TV news broadcasts, or action sequences in film.
1 code implementation • 27 Sep 2019 • Luke Oakden-Rayner, Jared Dunnmon, Gustavo Carneiro, Christopher Ré
Machine learning models for medical image analysis often suffer from poor performance on important subsets of a population that are not identified during training or testing.
2 code implementations • NeurIPS 2019 • Vincent S. Chen, Sen Wu, Zhenzhen Weng, Alexander Ratner, Christopher Ré
In real-world machine learning applications, data subsets correspond to especially critical outcomes: vulnerable cyclist detections are safety-critical in an autonomous driving task, and "question" sentences might be important to a dialogue agent's language understanding for product purposes.
1 code implementation • 7 Sep 2019 • Christopher Ré, Feng Niu, Pallavi Gudipati, Charles Srisuwananukorn
We describe a system called Overton, whose main design goal is to support engineers in building, monitoring, and improving production machine learning systems.
1 code implementation • NeurIPS 2019 • Avner May, Jian Zhang, Tri Dao, Christopher Ré
Finally, we show that by using the eigenspace overlap score as a selection criterion between embeddings drawn from a representative set we compressed, we can efficiently identify the better performing embedding with up to $2\times$ lower selection error rates than the next best measure of compression quality, and avoid the cost of training a model for each task of interest.
no code implementations • ICLR 2019 • Albert Gu, Frederic Sala, Beliz Gunel, Christopher Ré
The quality of the representations achieved by embeddings is determined by how well the geometry of the embedding space matches the structure of the data.
no code implementations • 24 Apr 2019 • Nimit S. Sohoni, Christopher R. Aberger, Megan Leszczynski, Jian Zhang, Christopher Ré
In this paper we study a fundamental question: How much memory is actually needed to train a neural network?
1 code implementation • 3 Apr 2019 • Alison Callahan, Jason A. Fries, Christopher Ré, James I Huddleston III, Nicholas J Giori, Scott Delp, Nigam H. Shah
Using hip replacements as a test case, our methods accurately extracted implant details and reports of complications and pain from electronic health records with up to 96.3% precision, 98.5% recall, and 97.4% F1, improved classification performance by 12.7-53.0% over rule-based methods, and detected over 6 times as many complication events compared to using structured data alone.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael. I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
no code implementations • 26 Mar 2019 • Jared Dunnmon, Alexander Ratner, Nishith Khandwala, Khaled Saab, Matthew Markert, Hersh Sagreiya, Roger Goldman, Christopher Lee-Messer, Matthew Lungren, Daniel Rubin, Christopher Ré
Labeling training datasets has become a key barrier to building medical machine learning models.
no code implementations • 14 Mar 2019 • Paroma Varma, Frederic Sala, Ann He, Alexander Ratner, Christopher Ré
Labeling training data is a key bottleneck in the modern machine learning pipeline.
1 code implementation • 14 Mar 2019 • Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, Christopher Ré
Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions.
no code implementations • 2 Dec 2018 • Stephen H. Bach, Daniel Rodriguez, Yintao Liu, Chong Luo, Haidong Shao, Cassandra Xia, Souvik Sen, Alexander Ratner, Braden Hancock, Houman Alborzi, Rahul Kuchhal, Christopher Ré, Rob Malkin
Labeling training data is one of the most costly bottlenecks in developing machine learning-based applications.
1 code implementation • 31 Oct 2018 • Jian Zhang, Avner May, Tri Dao, Christopher Ré
We investigate how to train kernel approximation methods that generalize well under a memory budget.
1 code implementation • 5 Oct 2018 • Alexander Ratner, Braden Hancock, Jared Dunnmon, Frederic Sala, Shreyash Pandey, Christopher Ré
Snorkel MeTaL: A framework for training models with multi-task weak supervision
Ranked #1 on Semantic Textual Similarity on SentEval
1 code implementation • NeurIPS 2018 • Anna T. Thomas, Albert Gu, Tri Dao, Atri Rudra, Christopher Ré
The low displacement rank (LDR) framework for structured matrices represents a matrix through two displacement operators and a low-rank residual.
no code implementations • 2 Jul 2018 • Aarthy Shivram Arun, Sai Vikneshwar Mani Jayaraman, Christopher Ré, Atri Rudra
We revisit the classical problem of exact inference on probabilistic graphical models (PGMs).
2 code implementations • ACL 2018 • Braden Hancock, Paroma Varma, Stephanie Wang, Martin Bringmann, Percy Liang, Christopher Ré
Training accurate classifiers requires many labels, but each label provides only limited information (one bit for binary classification).
3 code implementations • ICML 2018 • Christopher De Sa, Albert Gu, Christopher Ré, Frederic Sala
Given a tree, we give a combinatorial construction that embeds the tree in hyperbolic space with arbitrarily low distortion without using optimization.
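To make the setting concrete, here is the Poincare ball distance together with a crude placement of a root and its children; the paper's combinatorial construction chooses radii and angles carefully to drive distortion arbitrarily low, which this sketch does not do:

```python
import numpy as np

def poincare_dist(u, v):
    """Geodesic distance in the Poincare ball model of hyperbolic space."""
    uu, vv, diff = u @ u, v @ v, u - v
    return np.arccosh(1 + 2 * (diff @ diff) / ((1 - uu) * (1 - vv)))

# Place the root at the origin and its children at equal angles: a crude
# version of the combinatorial construction.
r, k = 0.9, 4
children = [r * np.array([np.cos(t), np.sin(t)])
            for t in 2 * np.pi * np.arange(k) / k]
root = np.zeros(2)
d_rc = poincare_dist(root, children[0])
d_cc = poincare_dist(children[0], children[2])  # opposite branches
print(d_cc / (2 * d_rc))  # ~1: the path runs through the root, as in a tree
```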
no code implementations • 5 Apr 2018 • Aarthy Shivram Arun, Sai Vikneshwar Mani Jayaraman, Christopher Ré, Atri Rudra
We revisit the classical problem of exact inference on probabilistic graphical models (PGMs).
no code implementations • 16 Mar 2018 • Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré
Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines.
1 code implementation • 9 Mar 2018 • Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré
Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it.
2 code implementations • 28 Nov 2017 • Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré
In a user study, subject matter experts build models 2.8x faster and increase predictive performance by an average of 45.5% versus seven hours of hand labeling.
no code implementations • NeurIPS 2017 • Tri Dao, Christopher De Sa, Christopher Ré
We show that deterministic feature maps can be constructed, for any $\gamma > 0$, to achieve error $\epsilon$ with $O(e^{e^\gamma} + \epsilon^{-1/\gamma})$ samples as $\epsilon$ goes to 0.
no code implementations • NeurIPS 2017 • Paroma Varma, Bryan He, Payal Bajaj, Imon Banerjee, Nishith Khandwala, Daniel L. Rubin, Christopher Ré
Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline.
1 code implementation • NeurIPS 2017 • Alexander J. Ratner, Henry R. Ehrenberg, Zeshan Hussain, Jared Dunnmon, Christopher Ré
Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels.
2 code implementations • 10 Jul 2017 • Christopher De Sa, Bryan He, Ioannis Mitliagkas, Christopher Ré, Peng Xu
We propose a simple variant of the power iteration with an added momentum term, that achieves both the optimal sample and iteration complexity.
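The update itself is a two-term recurrence and fits in a few lines; a sketch on a small diagonal matrix, with the momentum coefficient set from the second eigenvalue as the analysis suggests (treating that choice as an assumption here):

```python
import numpy as np

def power_iteration_momentum(A, beta, iters=100, seed=0):
    """Power iteration with a momentum (heavy-ball) term:
    x_{t+1} = A x_t - beta * x_{t-1}, then rescale. Setting beta near
    (lambda_2 / 2)^2 accelerates convergence to the top eigenvector."""
    rng = np.random.default_rng(seed)
    x_prev = np.zeros(A.shape[0])
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x_new = A @ x - beta * x_prev
        scale = np.linalg.norm(x_new)
        x_prev, x = x / scale, x_new / scale  # rescale both, same factor
    return x

A = np.diag([1.0, 0.9, 0.5, 0.1])
v = power_iteration_momentum(A, beta=(0.9 / 2) ** 2)
print(np.abs(v[0]))  # ~1: converged to the top eigenvector e_1
```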
3 code implementations • 13 May 2017 • Madalina Fiterau, Suvrat Bhooshan, Jason Fries, Charles Bournhonesque, Jennifer Hicks, Eni Halilaj, Christopher Ré, Scott Delp
In healthcare applications, temporal variables that encode movement, health status and longitudinal patient evolution are often accompanied by rich structured information such as demographics, diagnostics and medical exam data.
no code implementations • 20 Apr 2017 • Jason Fries, Sen Wu, Alex Ratner, Christopher Ré
We present SwellShark, a framework for building biomedical named entity recognition (NER) systems quickly and without hand-labeled data.
Ranked #2 on Weakly-Supervised Named Entity Recognition on BC5CDR
2 code implementations • 15 Mar 2017 • Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis, Christopher Ré
We focus on knowledge base construction (KBC) from richly formatted data.
no code implementations • ICML 2017 • Stephen H. Bach, Bryan He, Alexander Ratner, Christopher Ré
Curating labeled training data has become the primary bottleneck in machine learning.
no code implementations • 25 Oct 2016 • Paroma Varma, Bryan He, Dan Iter, Peng Xu, Rose Yu, Christopher De Sa, Christopher Ré
Prior work has explored learning accuracies for these sources even without ground truth labels, but they assume that a single accuracy parameter is sufficient to model the behavior of these sources over the entire training set.
no code implementations • NeurIPS 2016 • Peng Xu, Jiyan Yang, Farbod Roosta-Khorasani, Christopher Ré, Michael W. Mahoney
As second-order methods prove to be effective in finding the minimizer to a high-precision, in this work, we propose randomized Newton-type algorithms that exploit \textit{non-uniform} sub-sampling of $\{\nabla^2 f_i(w)\}_{i=1}^{n}$, as well as inexact updates, as means to reduce the computational complexity.
no code implementations • 23 Jun 2016 • Jian Zhang, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré
Consider a number of workers running SGD independently on the same pool of data and averaging the models every once in a while -- a common but not well understood practice.
1 code implementation • 14 Jun 2016 • Stefan Hadjis, Ce Zhang, Ioannis Mitliagkas, Dan Iter, Christopher Ré
Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs.
no code implementations • NeurIPS 2016 • Bryan He, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré
Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions.
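A textbook example, not the paper's setting: Gibbs sampling a standard bivariate Gaussian with correlation rho, where both conditionals are one-dimensional Gaussians:

```python
import numpy as np

def gibbs_bivariate_gaussian(rho, iters=5000, seed=0):
    """Gibbs sampler for a standard bivariate Gaussian with correlation rho:
    alternately sample each variable from its conditional given the other,
    x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y."""
    rng = np.random.default_rng(seed)
    x = y = 0.0
    samples = []
    for _ in range(iters):
        x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))
        y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))
        samples.append((x, y))
    return np.array(samples)

s = gibbs_bivariate_gaussian(0.8)
print(np.corrcoef(s[1000:, 0], s[1000:, 1])[0, 1])  # ~0.8 after burn-in
```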
3 code implementations • 31 May 2016 • Ioannis Mitliagkas, Ce Zhang, Stefan Hadjis, Christopher Ré
Since asynchronous methods have better hardware efficiency, this result may shed light on when asynchronous execution is more efficient for deep learning systems.
4 code implementations • NeurIPS 2016 • Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré
Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable.
no code implementations • 24 Feb 2016 • Christopher De Sa, Kunle Olukotun, Christopher Ré
Gibbs sampling is a Markov chain Monte Carlo technique commonly used for estimating marginal distributions.
no code implementations • NeurIPS 2015 • Sorathan Chaturapruek, John C. Duchi, Christopher Ré
We show that asymptotically, completely asynchronous stochastic gradient procedures achieve optimal (even to constant factors) convergence rates for the solution of convex optimization problems under nearly the same conditions required for asymptotic optimality of standard stochastic gradient procedures.
no code implementations • NeurIPS 2015 • Christopher M. De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré
Stochastic gradient descent (SGD) is a ubiquitous algorithm for a variety of machine learning problems.
no code implementations • NeurIPS 2015 • Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré
Gibbs sampling on factor graphs is a widely used inference technique, which often produces good empirical results.