Search Results for author: Christopher Ré

Found 133 papers, 85 papers with code

Mechanistic Design and Scaling of Hybrid Architectures

no code implementations 26 Mar 2024 Michael Poli, Armin W Thomas, Eric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli

The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation.

Simple linear attention language models balance the recall-throughput tradeoff

1 code implementation 28 Feb 2024 Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré

In this work, we explore whether we can improve language model efficiency (e.g., by reducing memory consumption) without compromising on recall.

Language Modelling Text Generation

Prospector Heads: Generalized Feature Attribution for Large Models & Data

1 code implementation 18 Feb 2024 Gautam Machiraju, Alexander Derry, Arjun Desai, Neel Guha, Amir-Hossein Karimi, James Zou, Russ Altman, Christopher Ré, Parag Mallick

Feature attribution, the ability to localize regions of the input data that are relevant for classification, is an important capability for machine learning models in scientific and biomedical domains.

Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT

no code implementations 12 Feb 2024 Jon Saad-Falcon, Daniel Y. Fu, Simran Arora, Neel Guha, Christopher Ré

Retrieval pipelines, an integral component of many machine learning systems, perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text.

Benchmarking Chunking +2

Hydragen: High-Throughput LLM Inference with Shared Prefixes

no code implementations 7 Feb 2024 Jordan Juravsky, Bradley Brown, Ryan Ehrlich, Daniel Y. Fu, Christopher Ré, Azalia Mirhoseini

Decoding in this large-batch setting can be bottlenecked by the attention operation, which reads large key-value (KV) caches from memory and computes inefficient matrix-vector products for every sequence in the batch.

16k Chatbot

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry

no code implementations 6 Feb 2024 Michael Zhang, Kush Bhatia, Hermann Kumbong, Christopher Ré

Experiments show Hedgehog recovers over 99% of standard Transformer quality in train-from-scratch and finetuned-conversion settings, outperforming prior linear attentions by up to 6 perplexity points on WikiText-103 with causal GPTs, and by up to 8.7 GLUE score points on finetuned bidirectional BERTs.

Zoology: Measuring and Improving Recall in Efficient Language Models

2 code implementations 8 Dec 2023 Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré

To close the gap between synthetics and real language, we develop a new formalization of the task called multi-query associative recall (MQAR) that better reflects actual language.

FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores

1 code implementation 10 Nov 2023 Daniel Y. Fu, Hermann Kumbong, Eric Nguyen, Christopher Ré

FlashFFTConv uses a matrix decomposition that computes the FFT using matrix multiply units and enables kernel fusion for long sequences, reducing I/O.
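As a rough illustration of the idea that a Fourier transform can be expressed with matrix multiplies, the toy sketch below builds a dense DFT matrix and uses it for a circular convolution (transform, pointwise multiply, inverse transform). It is plain Python under simplifying assumptions and does not reproduce FlashFFTConv's decomposition, kernel fusion, or tensor-core kernels; all function names are hypothetical.

```python
import cmath

def dft_matrix(n, inverse=False):
    """Dense n x n DFT matrix; inverse=True gives the unnormalized inverse."""
    sign = 1 if inverse else -1
    return [[cmath.exp(sign * 2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]

def matvec(m, v):
    """Plain matrix-vector product, standing in for a matrix multiply unit."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def circular_conv_via_dft(u, k):
    """Circular convolution: transform, multiply pointwise, transform back."""
    n = len(u)
    f, f_inv = dft_matrix(n), dft_matrix(n, inverse=True)
    u_hat, k_hat = matvec(f, u), matvec(f, k)
    prod = [a * b for a, b in zip(u_hat, k_hat)]
    return [(y / n).real for y in matvec(f_inv, prod)]

def circular_conv_direct(u, k):
    """Reference O(n^2) circular convolution used only for comparison."""
    n = len(u)
    return [sum(u[j] * k[(i - j) % n] for j in range(n)) for i in range(n)]
```

Comparing `circular_conv_via_dft` against `circular_conv_direct` on small inputs is a quick way to check the transform-domain route; a dense DFT matrix costs O(n^2) per transform, so this shows the matrix formulation only, not an efficient algorithm.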

H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

1 code implementation 24 Jun 2023 Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang Wang, Beidi Chen

Based on these insights, we propose Heavy Hitter Oracle (H$_2$O), a KV cache eviction policy that dynamically retains a balance of recent and H$_2$ tokens.
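A minimal sketch of an eviction rule of this kind, assuming each cached token carries an accumulated attention score and that "heavy hitters" are the tokens with the largest such scores. The scoring rule, the function name, and the `budget`/`recent` parameters are illustrative assumptions, not the paper's implementation.

```python
def select_kv_to_keep(acc_scores, budget, recent):
    """Return sorted indices of cached tokens to keep.

    acc_scores: accumulated attention score per cached token (oldest first).
    budget: total number of tokens to retain.
    recent: how many of the newest tokens are always retained.
    """
    n = len(acc_scores)
    if n <= budget:
        return list(range(n))
    recent = min(recent, budget)
    recent_idx = list(range(n - recent, n))
    older = range(n - recent)
    # Among older tokens, keep those with the largest accumulated scores.
    heavy = sorted(older, key=lambda i: acc_scores[i], reverse=True)[:budget - recent]
    return sorted(heavy + recent_idx)
```

With scores `[0.9, 0.1, 0.5, 0.05, 0.2, 0.3]`, a budget of 4, and 2 recent slots, the rule keeps the two newest tokens plus the two highest-scoring older ones.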

Towards trustworthy seizure onset detection using workflow notes

1 code implementation 14 Jun 2023 Khaled Saab, Siyi Tang, Mohamed Taha, Christopher Lee-Messer, Christopher Ré, Daniel Rubin

We find that our multilabel model significantly improves overall seizure onset detection performance (+5.9 AUROC points) while greatly improving performance among subgroups (up to +8.3 AUROC points), and decreases false positives on non-epileptiform abnormalities by 8 FPR points.

EEG

Effectively Modeling Time Series with Simple Discrete State Spaces

1 code implementation 16 Mar 2023 Michael Zhang, Khaled K. Saab, Michael Poli, Tri Dao, Karan Goel, Christopher Ré

For expressivity, we propose a new SSM parameterization based on the companion matrix -- a canonical representation for discrete-time processes -- which enables SpaceTime's SSM layers to learn desirable autoregressive processes.

Time Series Time Series Classification
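To make the companion-matrix remark concrete, the sketch below writes an autoregressive recurrence x_t = a_1 x_{t-1} + ... + a_p x_{t-p} as a linear state update with a companion matrix. It uses one common convention (coefficients in the first row) and hypothetical helper names; it is not SpaceTime's parameterization or code.

```python
def companion(coeffs):
    """Companion matrix whose first row holds the AR coefficients."""
    p = len(coeffs)
    rows = [list(coeffs)]
    for i in range(p - 1):
        # Shift rows: copy the previous state entries down by one slot.
        rows.append([1.0 if j == i else 0.0 for j in range(p)])
    return rows

def step(a, state):
    """One state update: new_state = A @ state."""
    return [sum(x * y for x, y in zip(row, state)) for row in a]

def rollout(coeffs, init, n):
    """init lists the last p values, newest first; returns the next n values."""
    a, state, out = companion(coeffs), list(init), []
    for _ in range(n):
        state = step(a, state)
        out.append(state[0])
    return out
```

For example, coefficients `[1, 1]` with initial state `[1, 1]` follow the Fibonacci recurrence, so the rollout continues 2, 3, 5, 8.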

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

1 code implementation 13 Mar 2023 Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

As a result, when running OPT-175B on a single 16GB GPU, FlexGen achieves significantly higher throughput compared to state-of-the-art offloading systems, reaching a generation throughput of 1 token/s for the first time with an effective batch size of 144.

Language Modelling Large Language Model

Collage Diffusion

no code implementations 1 Mar 2023 Vishnu Sarukkai, Linden Li, Arden Ma, Christopher Ré, Kayvon Fatahalian

We seek to give users precise control over diffusion-based image generation by modeling complex scenes as sequences of layers, which define the desired spatial arrangement and visual attributes of objects in the scene.

Conditional Image Generation Image Harmonization

Hyena Hierarchy: Towards Larger Convolutional Language Models

5 code implementations 21 Feb 2023 Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré

Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale.

2k 8k +2

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

3 code implementations 28 Dec 2022 Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré

First, we use synthetic language modeling tasks to understand the gap between SSMs and attention.

Ranked #2 on Language Modelling on The Pile (Test perplexity metric)

8k Coreference Resolution +5

S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces

1 code implementation 12 Oct 2022 Eric Nguyen, Karan Goel, Albert Gu, Gordon W. Downs, Preey Shah, Tri Dao, Stephen A. Baccus, Christopher Ré

On ImageNet-1k, S4ND exceeds the performance of a Vision Transformer baseline by $1.5\%$ when training with a $1$D sequence of patches, and matches ConvNeXt when modeling images in $2$D.

Inductive Bias Video Classification

Ask Me Anything: A simple strategy for prompting language models

3 code implementations 5 Oct 2022 Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré

Prompting is a brittle process wherein small modifications to the prompt can cause large variations in the model predictions, and therefore significant effort is dedicated towards designing a painstakingly "perfect prompt" for a task.

Coreference Resolution Natural Language Inference +2

HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions

1 code implementation 18 Sep 2022 Lingjiao Chen, Zhihua Jin, Sabri Eyuboglu, Christopher Ré, Matei Zaharia, James Zou

HAPI is the first large-scale dataset of ML API usages and is a unique resource for studying ML-as-a-service (MLaaS).

Object Detection +4

LegalBench: Prototyping a Collaborative Benchmark for Legal Reasoning

1 code implementation 13 Sep 2022 Neel Guha, Daniel E. Ho, Julian Nyarko, Christopher Ré

Finally, inspired by the Open Science movement, we make a call for the legal and computer science communities to join our efforts by contributing new tasks.

Legal Reasoning

Contrastive Adapters for Foundation Model Group Robustness

no code implementations 14 Jul 2022 Michael Zhang, Christopher Ré

We also find that efficient ways to improve model inference (e.g., via adapters, lightweight networks with FM embeddings as inputs) do not consistently improve and can sometimes hurt group robustness compared to zero-shot (e.g., increasing the accuracy gap by 50.1 pp on CelebA).

Contrastive Learning Zero-Shot Learning

How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections

1 code implementation 24 Jun 2022 Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, Christopher Ré

Linear time-invariant state space models (SSMs) are a classical model class from engineering and statistics that has recently been shown to be very promising in machine learning through the Structured State Space sequence model (S4).

Long-range modeling

On the Parameterization and Initialization of Diagonal State Space Models

2 code implementations 23 Jun 2022 Albert Gu, Ankit Gupta, Karan Goel, Christopher Ré

On the other hand, a recent variant of S4 called DSS showed that restricting the state matrix to be fully diagonal can still preserve the performance of the original model when using a specific initialization based on approximating S4's matrix.

Long-range modeling Time Series Analysis

Self-Supervised Learning of Brain Dynamics from Broad Neuroimaging Data

1 code implementation 22 Jun 2022 Armin W. Thomas, Christopher Ré, Russell A. Poldrack

At their core, these frameworks learn the dynamics of brain activity by modeling sequences of activity akin to how sequences of text are modeled in NLP.

Causal Language Modeling Language Modelling +1

The Importance of Background Information for Out of Distribution Generalization

no code implementations 17 Jun 2022 Jupinder Parmar, Khaled Saab, Brian Pogatchnik, Daniel Rubin, Christopher Ré

Domain generalization in medical image classification is an important problem for trustworthy machine learning to be deployed in healthcare.

Domain Generalization Image Classification +3

Comparing interpretation methods in mental state decoding analyses with deep learning models

no code implementations 31 May 2022 Armin W. Thomas, Christopher Ré, Russell A. Poldrack

Deep learning (DL) models find increasing application in mental state decoding, where researchers seek to understand the mapping between mental states (e.g., perceiving fear or joy) and brain activity by identifying those brain regions (and networks) whose activity allows these states to be accurately identified (i.e., decoded).

Explainable artificial intelligence

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

9 code implementations 27 May 2022 Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré

We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method.

16k 4k +3

Can Foundation Models Help Us Achieve Perfect Secrecy?

1 code implementation 27 May 2022 Simran Arora, Christopher Ré

However, privacy and quality appear to be in tension in existing systems for personal tasks.

Federated Learning In-Context Learning +1

Can Foundation Models Wrangle Your Data?

2 code implementations 20 May 2022 Avanika Narayan, Ines Chami, Laurel Orr, Simran Arora, Christopher Ré

Foundation Models (FMs) are models trained on large corpora of data that, at very large scale, can generalize to new tasks without any task-specific finetuning.

Entity Resolution Imputation +1

TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval

1 code implementation Findings (ACL) 2022 Megan Leszczynski, Daniel Y. Fu, Mayee F. Chen, Christopher Ré

Entity retrieval (retrieving information about entity mentions in a query) is a key step in open-domain tasks, such as question answering or fact checking.

Entity Retrieval Fact Checking +3

Monarch: Expressive Structured Matrices for Efficient and Accurate Training

1 code implementation 1 Apr 2022 Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré

To address these issues, we propose a class of matrices (Monarch) that is hardware-efficient (they are parameterized as products of two block-diagonal matrices for better hardware utilization) and expressive (they can represent many commonly used transforms).

Language Modelling MRI Reconstruction
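The "products of two block-diagonal matrices" description can be sketched as a matrix-vector product for a vector of length n = b * b. The interleaving transpose permutation between the two factors is an assumption added here so that the blocks mix with each other; the excerpt does not state it, and the helper names are hypothetical. This is an illustrative toy, not the Monarch implementation.

```python
def matvec(m, v):
    """Plain dense matrix-vector product for one small block."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def block_diag_matvec(blocks, x):
    """Multiply a block-diagonal matrix (given as its square blocks) by x."""
    b = len(blocks[0])
    out = []
    for i, blk in enumerate(blocks):
        out.extend(matvec(blk, x[i * b:(i + 1) * b]))
    return out

def transpose_perm(x, b):
    """View x as a b x b row-major grid and return its transpose, flattened."""
    return [x[c * b + r] for r in range(b) for c in range(b)]

def two_block_diag_matvec(left, right, x):
    """y = P * L * P * R * x, with P the (assumed) transpose permutation."""
    b = len(right)
    y = block_diag_matvec(right, x)
    y = transpose_perm(y, b)
    y = block_diag_matvec(left, y)
    return transpose_perm(y, b)
```

Each factor stores b blocks of size b x b, so the two factors hold 2 * b^3 = 2 * n^1.5 values in total, compared with n^2 for a dense matrix of the same shape.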

Domino: Discovering Systematic Errors with Cross-Modal Embeddings

2 code implementations ICLR 2022 Sabri Eyuboglu, Maya Varma, Khaled Saab, Jean-Benoit Delbrouck, Christopher Lee-Messer, Jared Dunnmon, James Zou, Christopher Ré

In this work, we address these challenges by first designing a principled evaluation framework that enables a quantitative comparison of SDMs across 1,235 slice discovery settings in three input domains (natural images, medical images, and time-series data).

Representation Learning Time Series Analysis

Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision

1 code implementation 24 Mar 2022 Mayee F. Chen, Daniel Y. Fu, Dyah Adila, Michael Zhang, Frederic Sala, Kayvon Fatahalian, Christopher Ré

Despite the black-box nature of foundation models, we prove results characterizing how our approach improves performance and show that lift scales with the smoothness of label distributions in embedding space.

SKM-TEA: A Dataset for Accelerated MRI Reconstruction with Dense Image Labels for Quantitative Clinical Evaluation

1 code implementation 14 Mar 2022 Arjun D Desai, Andrew M Schmidt, Elka B Rubin, Christopher M Sandino, Marianne S Black, Valentina Mazzoli, Kathryn J Stevens, Robert Boutin, Christopher Ré, Garry E Gold, Brian A Hargreaves, Akshay S Chaudhari

While recent machine learning methods for MRI reconstruction and analysis have shown promise for reducing this burden, these techniques are primarily validated with imperfect image quality metrics that are discordant with clinically relevant measures, which ultimately hampers clinical deployment and clinician trust.

MRI Reconstruction

Reasoning over Public and Private Data in Retrieval-Based Systems

1 code implementation 14 Mar 2022 Simran Arora, Patrick Lewis, Angela Fan, Jacob Kahn, Christopher Ré

We first define the PUBLIC-PRIVATE AUTOREGRESSIVE INFORMATION RETRIEVAL (PAIR) privacy framework for the novel retrieval setting over multiple privacy scopes.

Fact Checking Information Retrieval +3

Correct-N-Contrast: A Contrastive Approach for Improving Robustness to Spurious Correlations

1 code implementation 3 Mar 2022 Michael Zhang, Nimit S. Sohoni, Hongyang R. Zhang, Chelsea Finn, Christopher Ré

As ERM models can be good spurious attribute predictors, CNC works by (1) using a trained ERM model's outputs to identify samples with the same class but dissimilar spurious features, and (2) training a robust model with contrastive learning to learn similar representations for same-class samples.

Attribute Contrastive Learning

It's Raw! Audio Generation with State-Space Models

6 code implementations 20 Feb 2022 Karan Goel, Albert Gu, Chris Donahue, Christopher Ré

SaShiMi yields state-of-the-art performance for unconditional waveform generation in the autoregressive setting.

Audio Generation Density Estimation +1

BARACK: Partially Supervised Group Robustness With Guarantees

no code implementations 31 Dec 2021 Nimit S. Sohoni, Maziar Sanjabi, Nicolas Ballas, Aditya Grover, Shaoliang Nie, Hamed Firooz, Christopher Ré

Theoretically, we provide generalization bounds for our approach in terms of the worst-group performance, which scale with respect to both the total number of training points and the number of training points with group labels.

Fairness Generalization Bounds

Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models

1 code implementation ICLR 2022 Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Ré

To address this, our main insight is to optimize over a continuous superset of sparse matrices with a fixed structure known as products of butterfly matrices.

Language Modelling

Personalized Benchmarking with the Ludwig Benchmarking Toolkit

2 code implementations 8 Nov 2021 Avanika Narayan, Piero Molino, Karan Goel, Willie Neiswanger, Christopher Ré

LBT provides a configurable interface for controlling training and customizing evaluation, a standardized training framework for eliminating confounding variables, and support for multi-objective evaluation.

Benchmarking Hyperparameter Optimization +2

VORTEX: Physics-Driven Data Augmentations Using Consistency Training for Robust Accelerated MRI Reconstruction

1 code implementation 3 Nov 2021 Arjun D Desai, Beliz Gunel, Batu M Ozturkler, Harris Beg, Shreyas Vasanawala, Brian A Hargreaves, Christopher Ré, John M Pauly, Akshay S Chaudhari

Deep neural networks have enabled improved image quality and fast inference times for various inverse problems, including accelerated magnetic resonance imaging (MRI) reconstruction.

Data Augmentation MRI Reconstruction

Efficiently Modeling Long Sequences with Structured State Spaces

9 code implementations ICLR 2022 Albert Gu, Karan Goel, Christopher Ré

A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies.

16k Data Augmentation +3

Scatterbrain: Unifying Sparse and Low-rank Attention Approximation

1 code implementation NeurIPS 2021 Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré

Recent advances in efficient Transformers have exploited either the sparsity or low-rank properties of attention matrices to reduce the computational and memory bottlenecks of modeling long sequences.

Image Generation Language Modelling

Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers

2 code implementations NeurIPS 2021 Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré

Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency.

Computational Efficiency Memorization +3

Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text

1 code implementation Findings (EMNLP) 2021 Maya Varma, Laurel Orr, Sen Wu, Megan Leszczynski, Xiao Ling, Christopher Ré

Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities.

Data Integration Entity Disambiguation

The Details Matter: Preventing Class Collapse in Supervised Contrastive Learning

no code implementations 29 Sep 2021 Daniel Yang Fu, Mayee F Chen, Michael Zhang, Kayvon Fatahalian, Christopher Ré

Supervised contrastive learning optimizes a loss that pushes together embeddings of points from the same class while pulling apart embeddings of points from different classes.

Contrastive Learning Transfer Learning

On the Opportunities and Risks of Foundation Models

2 code implementations 16 Aug 2021 Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Challenges for cognitive decoding using deep learning methods

no code implementations 16 Aug 2021 Armin W. Thomas, Christopher Ré, Russell A. Poldrack

In cognitive decoding, researchers aim to characterize a brain region's representations by identifying the cognitive states (e.g., accepting/rejecting a gamble) that can be identified from the region's activity.

Explainable artificial intelligence Transfer Learning

Declarative Machine Learning Systems

2 code implementations 16 Jul 2021 Piero Molino, Christopher Ré

In this article we describe how ML systems are currently structured, highlight important factors for their success and adoption, discuss the issues current ML systems are facing, and explain how the systems we developed addressed them.

BIG-bench Machine Learning

Mandoline: Model Evaluation under Distribution Shift

1 code implementation 1 Jul 2021 Mayee Chen, Karan Goel, Nimit S. Sohoni, Fait Poms, Kayvon Fatahalian, Christopher Ré

If an unlabeled sample from the target distribution is available, along with a labeled sample from a possibly different source distribution, standard approaches such as importance weighting can be applied to estimate performance on the target.

Density Ratio Estimation Epidemiology
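The standard importance-weighting baseline mentioned in this snippet can be sketched as follows, assuming the target/source density ratio is known for a single discrete grouping variable. That assumption, along with the function names, is purely illustrative; estimating the ratio is the hard part in practice, and this is not Mandoline's own method.

```python
def weights_from_group_ratio(groups, source_freq, target_freq):
    """Density ratio target/source for each labeled source example."""
    return [target_freq[g] / source_freq[g] for g in groups]

def importance_weighted_accuracy(correct, weights):
    """Self-normalized importance-weighted estimate of target accuracy."""
    total = sum(weights)
    return sum(c * w for c, w in zip(correct, weights)) / total

# Toy labeled source sample: the model is right on group "a", wrong on "b".
groups = ["a", "a", "a", "b"]
correct = [1, 1, 1, 0]
w = weights_from_group_ratio(groups, {"a": 0.75, "b": 0.25}, {"a": 0.25, "b": 0.75})
estimate = importance_weighted_accuracy(correct, w)
```

In this toy setup the unweighted source accuracy is 0.75, while the reweighted estimate reflects a target distribution in which the harder group "b" is three times as common as "a".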

HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections

1 code implementation 7 Jun 2021 Ines Chami, Albert Gu, Dat Nguyen, Christopher Ré

Given directions, PCA relies on: (1) a parameterization of subspaces spanned by these directions, (2) a method of projection onto subspaces that preserves information in these directions, and (3) an objective to optimize, namely the variance explained by projections.

Dimensionality Reduction

Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins

1 code implementation 2 Jun 2021 Sahaana Suri, Ihab F. Ilyas, Christopher Ré, Theodoros Rekatsinas

Context enrichment, or rebuilding fragmented context, using keyless joins is an implicit or explicit step in machine learning (ML) pipelines over structured data sources.

Question Answering Representation Learning

Scatterbrain: Unifying Sparse and Low-rank Attention

1 code implementation NeurIPS 2021 Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré

Recent advances in efficient Transformers have exploited either the sparsity or low-rank properties of attention matrices to reduce the computational and memory bottlenecks of modeling long sequences.

Image Generation Language Modelling

Comparing the Value of Labeled and Unlabeled Data in Method-of-Moments Latent Variable Estimation

1 code implementation 3 Mar 2021 Mayee F. Chen, Benjamin Cohen-Wang, Stephen Mussmann, Frederic Sala, Christopher Ré

We apply our decomposition framework to three scenarios -- well-specified, misspecified, and corrected models -- to 1) choose between labeled and unlabeled data and 2) learn from their combination.

Robustness Gym: Unifying the NLP Evaluation Landscape

2 code implementations NAACL 2021 Karan Goel, Nazneen Rajani, Jesse Vig, Samson Tan, Jason Wu, Stephan Zheng, Caiming Xiong, Mohit Bansal, Christopher Ré

Despite impressive performance on standard benchmarks, deep neural networks are often brittle when deployed in real-world systems.

Entity Linking

Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps

2 code implementations ICLR 2020 Tri Dao, Nimit S. Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra, Christopher Ré

Modern neural network architectures use structured linear transformations, such as low-rank matrices, sparse matrices, permutations, and the Fourier transform, to improve inference speed and reduce memory usage compared to general linear maps.

Image Classification speech-recognition +1

No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems

1 code implementation NeurIPS 2020 Nimit S. Sohoni, Jared A. Dunnmon, Geoffrey Angus, Albert Gu, Christopher Ré

As the subclass labels are frequently unavailable, models trained using only the coarser-grained class labels often exhibit highly variable performance across different subclasses.

Clustering General Classification +1

Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers

no code implementations22 Oct 2020 Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su

Intuitively, the transfer effect from one task to another task depends on dataset shifts such as sample sizes and covariance matrices.

Multi-Task Learning text-classification +1

From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering

2 code implementations NeurIPS 2020 Ines Chami, Albert Gu, Vaggos Chatziafratis, Christopher Ré

Recently, Dasgupta reframed HC as a discrete optimization problem by introducing a global cost function measuring the quality of a given tree.

Clustering

Model Patching: Closing the Subgroup Performance Gap with Data Augmentation

1 code implementation ICLR 2021 Karan Goel, Albert Gu, Yixuan Li, Christopher Ré

Particularly concerning are models with inconsistent performance on specific subgroups of a class, e.g., exhibiting disparities in skin cancer classification in the presence or absence of a spurious bandage.

Data Augmentation Skin Cancer Classification

Contextual Embeddings: When Are They Worth It?

no code implementations ACL 2020 Simran Arora, Avner May, Jian Zhang, Christopher Ré

We study the settings for which deep contextual embeddings (e.g., BERT) give large improvements in performance relative to classic pretrained embeddings (e.g., GloVe) and an even simpler baseline, random word embeddings, focusing on the impact of the training set size and the linguistic properties of the task.

Word Embeddings

Machine Learning on Graphs: A Model and Comprehensive Taxonomy

1 code implementation 7 May 2020 Ines Chami, Sami Abu-El-Haija, Bryan Perozzi, Christopher Ré, Kevin Murphy

The second, graph regularized neural networks, leverages graphs to augment neural network losses with a regularization objective for semi-supervised learning.

BIG-bench Machine Learning Graph Attention +3

Ivy: Instrumental Variable Synthesis for Causal Inference

no code implementations 11 Apr 2020 Zhaobin Kuang, Frederic Sala, Nimit Sohoni, Sen Wu, Aldo Córdova-Palomera, Jared Dunnmon, James Priest, Christopher Ré

To relax these assumptions, we propose Ivy, a new method to combine IV candidates that can handle correlated and invalid IV candidates in a robust manner.

Causal Inference Epidemiology +1

Assessing Robustness to Noise: Low-Cost Head CT Triage

no code implementations 17 Mar 2020 Sarah M. Hooper, Jared A. Dunnmon, Matthew P. Lungren, Sanjiv Sam Gambhir, Christopher Ré, Adam S. Wang, Bhavik N. Patel

We then show that the trained model is robust to reduced tube current and fewer projections, with the AUROC dropping only 0.65% for images acquired with a 16x reduction in tube current and 0.22% for images acquired with 8x fewer projections.

Computed Tomography (CT) Image Classification +1

Understanding the Downstream Instability of Word Embeddings

1 code implementation 29 Feb 2020 Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Ré

To theoretically explain this tradeoff, we introduce a new measure of embedding instability, the eigenspace instability measure, which we prove bounds the disagreement in downstream predictions introduced by the change in word embeddings.

Word Embeddings

Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

1 code implementation ICML 2020 Daniel Y. Fu, Mayee F. Chen, Frederic Sala, Sarah M. Hooper, Kayvon Fatahalian, Christopher Ré

In this work, we show that, for a class of latent variable models highly applicable to weak supervision, we can find a closed-form solution to model parameters, obviating the need for iterative solutions like stochastic gradient descent (SGD).

Hyperbolic Graph Convolutional Neural Networks

3 code implementations NeurIPS 2019 Ines Chami, Rex Ying, Christopher Ré, Jure Leskovec

Here we propose Hyperbolic Graph Convolutional Neural Network (HGCN), the first inductive hyperbolic GCN that leverages both the expressiveness of GCNs and hyperbolic geometry to learn inductive node representations for hierarchical and scale-free graphs.

Ranked #1 on Link Prediction on PPI (Accuracy metric)

Link Prediction Node Classification

Multi-Resolution Weak Supervision for Sequential Data

no code implementations NeurIPS 2019 Frederic Sala, Paroma Varma, Jason Fries, Daniel Y. Fu, Shiori Sagawa, Saelig Khattar, Ashwini Ramamoorthy, Ke Xiao, Kayvon Fatahalian, James Priest, Christopher Ré

Multi-resolution sources exacerbate this challenge due to complex correlations and sample complexity that scales in the length of the sequence.

PipeMare: Asynchronous Pipeline Parallel DNN Training

no code implementations 9 Oct 2019 Bowen Yang, Jian Zhang, Jonathan Li, Christopher Ré, Christopher R. Aberger, Christopher De Sa

Pipeline parallelism (PP) when training neural networks enables larger models to be partitioned spatially, leading to both lower network communication and overall higher hardware utilization.

Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels

1 code implementation 7 Oct 2019 Daniel Y. Fu, Will Crichton, James Hong, Xinwei Yao, Haotian Zhang, Anh Truong, Avanika Narayan, Maneesh Agrawala, Christopher Ré, Kayvon Fatahalian

Many real-world video analysis applications require the ability to identify domain-specific events in video, such as interviews and commercials in TV news broadcasts, or action sequences in film.

Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging

no code implementations 27 Sep 2019 Luke Oakden-Rayner, Jared Dunnmon, Gustavo Carneiro, Christopher Ré

Machine learning models for medical image analysis often suffer from poor performance on important subsets of a population that are not identified during training or testing.

BIG-bench Machine Learning

Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices

2 code implementations NeurIPS 2019 Vincent S. Chen, Sen Wu, Zhenzhen Weng, Alexander Ratner, Christopher Ré

In real-world machine learning applications, data subsets correspond to especially critical outcomes: vulnerable cyclist detections are safety-critical in an autonomous driving task, and "question" sentences might be important to a dialogue agent's language understanding for product purposes.

Autonomous Driving BIG-bench Machine Learning

Overton: A Data System for Monitoring and Improving Machine-Learned Products

1 code implementation7 Sep 2019 Christopher Ré, Feng Niu, Pallavi Gudipati, Charles Srisuwananukorn

We describe a system called Overton, whose main design goal is to support engineers in building, monitoring, and improving production machine learning systems.

BIG-bench Machine Learning

On the Downstream Performance of Compressed Word Embeddings

1 code implementation NeurIPS 2019 Avner May, Jian Zhang, Tri Dao, Christopher Ré

Finally, we show that by using the eigenspace overlap score as a selection criterion between embeddings drawn from a representative set we compressed, we can efficiently identify the better performing embedding with up to $2\times$ lower selection error rates than the next best measure of compression quality, and avoid the cost of training a model for each task of interest.

Generalization Bounds Quantization +1

Learning Mixed-Curvature Representations in Product Spaces

no code implementations ICLR 2019 Albert Gu, Frederic Sala, Beliz Gunel, Christopher Ré

The quality of the representations achieved by embeddings is determined by how well the geometry of the embedding space matches the structure of the data.

Riemannian optimization Word Embeddings

Medical device surveillance with electronic health records

1 code implementation3 Apr 2019 Alison Callahan, Jason A. Fries, Christopher Ré, James I Huddleston III, Nicholas J Giori, Scott Delp, Nigam H. Shah

Using hip replacements as a test case, our methods accurately extracted implant details and reports of complications and pain from electronic health records with up to 96.3% precision, 98.5% recall, and 97.4% F1, improved classification performance by 12.7-53.0% over rule-based methods, and detected over 6 times as many complication events compared to using structured data alone.

Reading Comprehension

Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations

1 code implementation14 Mar 2019 Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, Christopher Ré

Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions.

BIG-bench Machine Learning

Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation

1 code implementation31 Oct 2018 Jian Zhang, Avner May, Tri Dao, Christopher Ré

We investigate how to train kernel approximation methods that generalize well under a memory budget.

Quantization

Learning Compressed Transforms with Low Displacement Rank

1 code implementation NeurIPS 2018 Anna T. Thomas, Albert Gu, Tri Dao, Atri Rudra, Christopher Ré

The low displacement rank (LDR) framework for structured matrices represents a matrix through two displacement operators and a low-rank residual.

Image Classification Language Modelling

Hypertree Decompositions Revisited for PGMs

no code implementations2 Jul 2018 Aarthy Shivram Arun, Sai Vikneshwar Mani Jayaraman, Christopher Ré, Atri Rudra

We revisit the classical problem of exact inference on probabilistic graphical models (PGMs).

Training Classifiers with Natural Language Explanations

2 code implementations ACL 2018 Braden Hancock, Paroma Varma, Stephanie Wang, Martin Bringmann, Percy Liang, Christopher Ré

Training accurate classifiers requires many labels, but each label provides only limited information (one bit for binary classification).

Binary Classification General Classification +1

Representation Tradeoffs for Hyperbolic Embeddings

3 code implementations ICML 2018 Christopher De Sa, Albert Gu, Christopher Ré, Frederic Sala

Given a tree, we give a combinatorial construction that embeds the tree in hyperbolic space with arbitrarily low distortion without using optimization.

Hypertree Decompositions Revisited for PGMs

no code implementations5 Apr 2018 Aarthy Shivram Arun, Sai Vikneshwar Mani Jayaraman, Christopher Ré, Atri Rudra

We revisit the classical problem of exact inference on probabilistic graphical models (PGMs).

A Kernel Theory of Modern Data Augmentation

no code implementations16 Mar 2018 Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré

Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines.

BIG-bench Machine Learning Data Augmentation

High-Accuracy Low-Precision Training

1 code implementation9 Mar 2018 Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré

Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it.

Quantization Vocal Bursts Intensity Prediction

Snorkel: Rapid Training Data Creation with Weak Supervision

2 code implementations28 Nov 2017 Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré

In a user study, subject matter experts build models 2.8x faster and increase predictive performance an average 45.5% versus seven hours of hand labeling.

BIG-bench Machine Learning

Gaussian Quadrature for Kernel Features

no code implementations NeurIPS 2017 Tri Dao, Christopher De Sa, Christopher Ré

We show that deterministic feature maps can be constructed, for any $\gamma > 0$, to achieve error $\epsilon$ with $O(e^{e^\gamma} + \epsilon^{-1/\gamma})$ samples as $\epsilon$ goes to 0.

Speech Recognition

Inferring Generative Model Structure with Static Analysis

no code implementations NeurIPS 2017 Paroma Varma, Bryan He, Payal Bajaj, Imon Banerjee, Nishith Khandwala, Daniel L. Rubin, Christopher Ré

Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline.

Learning to Compose Domain-Specific Transformations for Data Augmentation

1 code implementation NeurIPS 2017 Alexander J. Ratner, Henry R. Ehrenberg, Zeshan Hussain, Jared Dunnmon, Christopher Ré

Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels.

Image Augmentation Relation Extraction +1

Accelerated Stochastic Power Iteration

2 code implementations10 Jul 2017 Christopher De Sa, Bryan He, Ioannis Mitliagkas, Christopher Ré, Peng Xu

We propose a simple variant of the power iteration with an added momentum term, that achieves both the optimal sample and iteration complexity.

Dimensionality Reduction

ShortFuse: Biomedical Time Series Representations in the Presence of Structured Information

1 code implementation13 May 2017 Madalina Fiterau, Suvrat Bhooshan, Jason Fries, Charles Bournhonesque, Jennifer Hicks, Eni Halilaj, Christopher Ré, Scott Delp

In healthcare applications, temporal variables that encode movement, health status and longitudinal patient evolution are often accompanied by rich structured information such as demographics, diagnostics and medical exam data.

Time Series Time Series Analysis

Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

no code implementations25 Oct 2016 Paroma Varma, Bryan He, Dan Iter, Peng Xu, Rose Yu, Christopher De Sa, Christopher Ré

Prior work has explored learning accuracies for these sources even without ground truth labels, but they assume that a single accuracy parameter is sufficient to model the behavior of these sources over the entire training set.

Relation Extraction

Sub-sampled Newton Methods with Non-uniform Sampling

no code implementations NeurIPS 2016 Peng Xu, Jiyan Yang, Farbod Roosta-Khorasani, Christopher Ré, Michael W. Mahoney

As second-order methods prove to be effective in finding the minimizer to high precision, in this work we propose randomized Newton-type algorithms that exploit non-uniform sub-sampling of $\{\nabla^2 f_i(w)\}_{i=1}^{n}$, as well as inexact updates, as means to reduce the computational complexity.

Second-order methods

Parallel SGD: When does averaging help?

no code implementations23 Jun 2016 Jian Zhang, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré

Consider a number of workers running SGD independently on the same pool of data and averaging the models every once in a while -- a common but not well understood practice.

Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs

1 code implementation14 Jun 2016 Stefan Hadjis, Ce Zhang, Ioannis Mitliagkas, Dan Iter, Christopher Ré

Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs.

Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much

no code implementations NeurIPS 2016 Bryan He, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré

Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions.

Asynchrony begets Momentum, with an Application to Deep Learning

3 code implementations31 May 2016 Ioannis Mitliagkas, Ce Zhang, Stefan Hadjis, Christopher Ré

Since asynchronous methods have better hardware efficiency, this result may shed light on when asynchronous execution is more efficient for deep learning systems.

Data Programming: Creating Large Training Sets, Quickly

4 code implementations NeurIPS 2016 Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré

Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable.

BIG-bench Machine Learning Slot Filling

Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling

no code implementations24 Feb 2016 Christopher De Sa, Kunle Olukotun, Christopher Ré

Gibbs sampling is a Markov chain Monte Carlo technique commonly used for estimating marginal distributions.

Asynchronous stochastic convex optimization: the noise is in the noise and SGD don't care

no code implementations NeurIPS 2015 Sorathan Chaturapruek, John C. Duchi, Christopher Ré

We show that asymptotically, completely asynchronous stochastic gradient procedures achieve optimal (even to constant factors) convergence rates for the solution of convex optimization problems under nearly the same conditions required for asymptotic optimality of standard stochastic gradient procedures.

Stochastic Optimization

Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width

no code implementations NeurIPS 2015 Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

Gibbs sampling on factor graphs is a widely used inference technique, which often produces good empirical results.

Asynchronous stochastic convex optimization

1 code implementation4 Aug 2015 John C. Duchi, Sorathan Chaturapruek, Christopher Ré

We show that asymptotically, completely asynchronous stochastic gradient procedures achieve optimal (even to constant factors) convergence rates for the solution of convex optimization problems under nearly the same conditions required for asymptotic optimality of standard stochastic gradient procedures.

Stochastic Optimization

Building a Large-scale Multimodal Knowledge Base System for Answering Visual Queries

no code implementations20 Jul 2015 Yuke Zhu, Ce Zhang, Christopher Ré, Li Fei-Fei

The complexity of the visual world creates significant challenges for comprehensive visual understanding.

Retrieval

Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms

no code implementations22 Jun 2015 Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

with relaxed assumptions on the sparsity of the problem; (2) we analyze asynchronous SGD algorithms for non-convex matrix problems including matrix completion; and (3) we design and analyze an asynchronous SGD algorithm, called Buckwild!, that uses lower-precision arithmetic.

Matrix Completion

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

1 code implementation16 Apr 2015 Stefan Hadjis, Firas Abuzaid, Ce Zhang, Christopher Ré

We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals.

Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning

no code implementations12 Feb 2015 Jiyan Yang, Yin-Lam Chow, Christopher Ré, Michael W. Mahoney

We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems, e.g., $\ell_2$ and $\ell_1$ regression problems.

regression

Incremental Knowledge Base Construction Using DeepDive

no code implementations3 Feb 2015 Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher Ré

Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration.

Parallel Feature Selection Inspired by Group Testing

no code implementations NeurIPS 2014 Yingbo Zhou, Utkarsh Porwal, Ce Zhang, Hung Q. Ngo, XuanLong Nguyen, Christopher Ré, Venu Govindaraju

Superior performance of our method is demonstrated on a challenging relation extraction task from a very large data set that has both redundant features and a sample size on the order of millions.

feature selection General Classification +1

Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems

no code implementations5 Nov 2014 Christopher De Sa, Kunle Olukotun, Christopher Ré

Stochastic gradient descent (SGD) on a low-rank factorization is commonly employed to speed up matrix problems including matrix completion, subspace tracking, and SDP relaxation.

Matrix Completion

Feature Engineering for Knowledge Base Construction

no code implementations24 Jul 2014 Christopher Ré, Amir Abbas Sadeghian, Zifei Shan, Jaeho Shin, Feiran Wang, Sen Wu, Ce Zhang

Our approach to KBC is based on joint probabilistic inference and learning, but we do not see inference as either a panacea or a magic bullet: inference is a tool that allows us to be systematic in how we construct, debug, and improve the quality of such systems.

Feature Engineering

A machine-compiled macroevolutionary history of Phanerozoic life

no code implementations11 Jun 2014 Shanan E. Peters, Ce Zhang, Miron Livny, Christopher Ré

Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of palaeontological data.

Data Integration Reading Comprehension

DimmWitted: A Study of Main-Memory Statistical Analytics

1 code implementation28 Mar 2014 Ce Zhang, Christopher Ré

We perform the first study of the tradeoff space of access methods and replication to support statistical analytics using first-order methods executed in the main memory of a Non-Uniform Memory Access (NUMA) machine.
