Search Results for author: Peter Bailis

Found 30 papers, 13 papers with code

Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems

no code implementations • 4 Mar 2024 • Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou

We find empirically that, across multiple language tasks, the performance of Voting Inference Systems surprisingly first increases and then decreases as a function of the number of LLM calls.

Language Modelling • Large Language Model
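A minimal sketch of the kind of Voting Inference System studied here: the same query is sent to the model several times and the answers are aggregated by majority vote. `call_llm` is a hypothetical stand-in (here a biased random answerer), not any particular API.

```python
import random
from collections import Counter

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call, simulated as an answerer that is right ~75% of the time."""
    return random.choice(["A", "A", "A", "B"])

def voting_inference(prompt: str, num_calls: int) -> str:
    """Issue the same query num_calls times and return the majority answer."""
    answers = [call_llm(prompt) for _ in range(num_calls)]
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    for k in (1, 5, 25):
        correct = sum(voting_inference("toy question", k) == "A" for _ in range(1000))
        print(f"{k:>2} LLM calls: majority vote correct in {correct / 1000:.1%} of trials")
```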

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

1 code implementation • 3 Feb 2024 • Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang

Autoregressive decoding of large language models (LLMs) is memory-bandwidth bound, resulting in high latency and significant waste of the parallel processing power of modern accelerators.

Code Completion

Online Speculative Decoding

no code implementations • 11 Oct 2023 • Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang

We develop a prototype of online speculative decoding based on online knowledge distillation and evaluate it using both synthetic and real query data on several popular LLMs.

Knowledge Distillation
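For context, a much-simplified sketch of the speculative decoding loop this work builds on: a cheap draft model proposes a short run of tokens and the target model keeps the longest verified prefix. This is the greedy-match variant with toy next-token functions; the paper's rejection sampling and online distillation of the draft model are omitted.

```python
from typing import Callable, List

Model = Callable[[List[str]], str]  # maps a token prefix to the next token

def speculative_decode(target: Model, draft: Model, prompt: List[str],
                       max_new_tokens: int, k: int = 4) -> List[str]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # Draft model proposes k tokens autoregressively (cheap).
        proposal = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # Target model verifies the proposal; in a real system this is one
        # batched forward pass rather than a Python loop.
        accepted = 0
        for i in range(k):
            if target(tokens + proposal[:i]) == proposal[i]:
                accepted += 1
            else:
                break
        tokens += proposal[:accepted]
        # Always emit at least one token from the target model to make progress.
        if accepted < k:
            tokens.append(target(tokens))
    return tokens[:len(prompt) + max_new_tokens]

# Toy usage: the draft agrees with the target on most steps.
target = lambda ctx: "a" if len(ctx) % 3 else "b"
draft = lambda ctx: "a"
print(speculative_decode(target, draft, ["<s>"], max_new_tokens=8))
```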

Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates

no code implementations • 27 Jul 2021 • Daniel Kang, John Guibas, Peter Bailis, Tatsunori Hashimoto, Yi Sun, Matei Zaharia

Given a dataset $\mathcal{D}$, we are interested in computing the mean of a subset of $\mathcal{D}$ which matches a predicate.
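As a point of reference, a minimal sketch of the plain sampling baseline for this problem: evaluate the expensive predicate only on a uniform sample and report the sample mean with a normal-approximation confidence interval. `expensive_predicate` is a hypothetical stand-in for, e.g., an ML model; the paper's proxy-based acceleration is not shown.

```python
import math
import random

def expensive_predicate(record: float) -> bool:
    """Stand-in for a costly check (e.g., an ML model applied to the record)."""
    return record > 0.7

def approx_filtered_mean(data, sample_size: int, seed: int = 0):
    random.seed(seed)
    sample = random.sample(data, sample_size)
    matches = [x for x in sample if expensive_predicate(x)]
    if not matches:
        return None, None
    mean = sum(matches) / len(matches)
    # Normal-approximation half-width of a 95% confidence interval.
    var = sum((x - mean) ** 2 for x in matches) / max(len(matches) - 1, 1)
    half_width = 1.96 * math.sqrt(var / len(matches))
    return mean, half_width

data = [random.random() for _ in range(100_000)]
est, ci = approx_filtered_mean(data, sample_size=2_000)
print(f"estimated mean of matching records: {est:.3f} +/- {ci:.3f}")
```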

Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training

1 code implementation • 17 Feb 2021 • Kai Sheng Tai, Peter Bailis, Gregory Valiant

Self-training is a standard approach to semi-supervised learning where the learner's own predictions on unlabeled data are used as supervision during training.

Classification • General Classification • +1
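A small sketch of the vanilla self-training loop the abstract refers to, using scikit-learn: train on the labeled points, pseudo-label the unlabeled points the model is most confident about, and retrain. The paper replaces the naive confidence thresholding below with a Sinkhorn-based label allocation on an annealed schedule.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_informative=8, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:100] = True  # only 100 points start out labeled
pseudo_y = y.copy()

for round_ in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], pseudo_y[labeled])
    probs = clf.predict_proba(X[~labeled])
    confident = probs.max(axis=1) > 0.95
    idx = np.where(~labeled)[0][confident]
    pseudo_y[idx] = clf.predict(X[idx])   # use the model's own predictions as labels
    labeled[idx] = True
    print(f"round {round_}: {labeled.sum()} points treated as labeled")

print("accuracy on all points:", clf.score(X, y))
```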

Leveraging Organizational Resources to Adapt Models to New Data Modalities

no code implementations • 23 Aug 2020 • Sahaana Suri, Raghuveer Chanda, Neslihan Bulut, Pradyumna Narayana, Yemao Zeng, Peter Bailis, Sugato Basu, Girija Narlikar, Christopher Re, Abishek Sethi

As applications in large organizations evolve, the machine learning (ML) models that power them must adapt the same predictive tasks to newly arising data modalities (e.g., a new video content launch in a social media application requires existing text or image models to extend to video).

Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics

no code implementations • 25 Jul 2020 • Daniel Kang, Ankit Mathur, Teja Veeramacheneni, Peter Bailis, Matei Zaharia

This runtime engine a) efficiently pipelines preprocessing and DNN execution for inference, b) places preprocessing operations on the CPU or GPU in a hardware- and input-aware manner, and c) efficiently manages memory and threading for high throughput execution.

Chromatic Learning for Sparse Datasets

no code implementations • 6 Jun 2020 • Vladimir Feinberg, Peter Bailis

By leveraging the structural properties of the co-occurrence graph, CL can compress sparse datasets, such as KDD Cup 2012, that contain over 50M features down to 1024, using an order of magnitude fewer features than frequency-based truncation and the hashing trick while maintaining the same test error for linear models.

Feature Compression
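A toy sketch of the core idea: features that never co-occur within an example can safely share one dense column, so greedily coloring the feature co-occurrence graph yields a much lower-dimensional encoding. The data and the naive greedy coloring below are illustrative only.

```python
from collections import defaultdict
from itertools import combinations

examples = [  # each example is a set of active sparse feature ids
    {"user:7", "ad:42", "site:news"},
    {"user:9", "ad:42", "site:sports"},
    {"user:7", "ad:13", "site:sports"},
]

# Build the co-occurrence graph: an edge between features seen together.
neighbors = defaultdict(set)
for ex in examples:
    for a, b in combinations(sorted(ex), 2):
        neighbors[a].add(b)
        neighbors[b].add(a)

# Greedy coloring: assign each feature the smallest color unused by its neighbors.
color = {}
for feat in sorted(neighbors, key=lambda f: -len(neighbors[f])):
    taken = {color[n] for n in neighbors[feat] if n in color}
    color[feat] = next(c for c in range(len(neighbors) + 1) if c not in taken)

num_columns = max(color.values()) + 1
print(f"{len(neighbors)} sparse features -> {num_columns} dense columns")

# Re-encode an example: column index = color. Collisions cannot happen within
# one example because co-occurring features always receive distinct colors.
def encode(ex):
    row = [0] * num_columns
    for feat in ex:
        row[color[feat]] = 1
    return row

print(encode(examples[0]))
```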

Model Assertions for Monitoring and Improving ML Models

1 code implementation • 3 Mar 2020 • Daniel Kang, Deepti Raghavan, Peter Bailis, Matei Zaharia

We propose methods of using model assertions at all stages of ML system deployment, including runtime monitoring, validating labels, and continuously improving ML models.

Active Learning
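A minimal sketch of what a model assertion can look like in practice: a black-box function over model outputs that flags likely errors for monitoring or relabeling. The flickering rule below (an object that vanishes for a single frame probably indicates a missed detection) is a hypothetical example, not one taken from the paper.

```python
from typing import List, Set

def flicker_assertion(detections_per_frame: List[Set[str]]) -> List[int]:
    """Return frame indices where a class disappears for exactly one frame."""
    flagged = []
    for t in range(1, len(detections_per_frame) - 1):
        prev, cur, nxt = detections_per_frame[t - 1:t + 2]
        if (prev & nxt) - cur:  # present before and after, missing now
            flagged.append(t)
    return flagged

# Toy stream of per-frame detections from some object detector.
stream = [{"car"}, {"car", "person"}, {"car"}, {"car", "person"}, {"person"}]
violations = flicker_assertion(stream)
print("frames flagged for review/labeling:", violations)
```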

LIT: Learned Intermediate Representation Training for Model Compression

1 code implementation • 4 Sep 2019 • Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia

In this work, we introduce Learned Intermediate Representation Training (LIT), a novel model compression technique that outperforms a range of recent model compression techniques by leveraging the highly repetitive structure of modern DNNs (e.g., ResNet).

Image Classification • Model Compression • +2
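A toy PyTorch sketch of the intermediate-representation idea: a smaller student block is trained to reproduce the output of the corresponding teacher block, rather than only the final logits as in classic distillation. Plain MLP blocks stand in for ResNet stages here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
teacher_block = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))
student_block = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))

opt = torch.optim.Adam(student_block.parameters(), lr=1e-3)
mse = nn.MSELoss()

for step in range(200):
    x = torch.randn(32, 64)                # activations entering the block
    with torch.no_grad():
        target = teacher_block(x)          # teacher's intermediate representation
    loss = mse(student_block(x), target)   # student mimics the block's output
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final block-matching loss:", loss.item())
```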

Selection via Proxy: Efficient Data Selection for Deep Learning

1 code implementation • ICLR 2020 • Cody Coleman, Christopher Yeh, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia

By removing hidden layers from the target model, using smaller architectures, and training for fewer epochs, we create proxies that are an order of magnitude faster to train.

Active Learning • Computational Efficiency
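A compact sketch of the selection-via-proxy recipe: a cheap proxy model ranks the pool by predictive uncertainty, and only the highest-ranked points are used to train the larger target model. The models and sizes below are toy stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=5000, n_informative=10, random_state=0)

# 1) Train a small, fast proxy on a seed set.
proxy = LogisticRegression(max_iter=200).fit(X[:500], y[:500])

# 2) Score the remaining pool by predictive entropy (higher = more informative).
probs = proxy.predict_proba(X[500:])
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
selected = 500 + np.argsort(-entropy)[:1000]   # top-1000 most uncertain points

# 3) Train the larger target model only on the selected subset.
target = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=200, random_state=0)
target.fit(X[selected], y[selected])
print("target accuracy (trained on selected subset):", target.score(X, y))
```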

Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference

no code implementations • 3 Jun 2019 • Peter Kraft, Daniel Kang, Deepak Narayanan, Shoumik Palkar, Peter Bailis, Matei Zaharia

First, Willump automatically cascades feature computation for classification queries: Willump classifies most data inputs using only high-value, low-cost features selected through empirical observations of ML model performance, improving query performance by up to 5x without statistically significant accuracy loss.

BIG-bench Machine Learning
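A small sketch of the feature cascade described above: most inputs are classified by a model over cheap features, and only low-confidence inputs pay for the full feature set. Which features count as cheap, and the models used, are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=4000, n_features=20, n_informative=12,
                           random_state=0)
cheap = X[:, :4]          # pretend the first 4 features are cheap to compute
X_train, X_test = X[:3000], X[3000:]
y_train, y_test = y[:3000], y[3000:]

cheap_model = LogisticRegression(max_iter=500).fit(cheap[:3000], y_train)
full_model = LogisticRegression(max_iter=500).fit(X_train, y_train)

def cascade_predict(x_full, threshold=0.9):
    p = cheap_model.predict_proba(x_full[:, :4])
    confident = p.max(axis=1) >= threshold
    preds = p.argmax(axis=1)
    # Only the unconfident rows fall through to the expensive feature set.
    preds[~confident] = full_model.predict(x_full[~confident])
    return preds, confident.mean()

preds, frac_cheap = cascade_predict(X_test)
print(f"accuracy {np.mean(preds == y_test):.3f}, "
      f"{frac_cheap:.0%} of queries answered with cheap features only")
```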

CrossTrainer: Practical Domain Adaptation with Loss Reweighting

1 code implementation • 7 May 2019 • Justin Chen, Edward Gan, Kexin Rong, Sahaana Suri, Peter Bailis

Domain adaptation provides a powerful set of model training techniques given domain-specific training data and supplemental data with unknown relevance.

Domain Adaptation
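A brief sketch of loss reweighting for this setting: supplemental source-domain examples are weighted relative to the scarce target-domain examples, and the weight is chosen on a target validation set. CrossTrainer's contribution is searching this weight efficiently; the small grid below is only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=5300, n_informative=10, random_state=0)
X_src = X[:5000] + rng.normal(0.8, 0.5, X[:5000].shape)   # plentiful but shifted source data
y_src = y[:5000]
X_tgt, y_tgt = X[5000:5200], y[5000:5200]                 # scarce target training data
X_val, y_val = X[5200:], y[5200:]                         # target-domain validation set

X_all = np.vstack([X_tgt, X_src])
y_all = np.concatenate([y_tgt, y_src])

best = None
for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
    # Per-example weights: target examples get alpha, source examples (1 - alpha),
    # each normalized by domain size so the domains are comparable in total weight.
    w = np.concatenate([np.full(len(y_tgt), alpha / len(y_tgt)),
                        np.full(len(y_src), (1 - alpha) / len(y_src))])
    clf = LogisticRegression(max_iter=500).fit(X_all, y_all, sample_weight=w)
    acc = clf.score(X_val, y_val)
    if best is None or acc > best[1]:
        best = (alpha, acc)

print(f"best target weight alpha={best[0]}, target validation accuracy={best[1]:.3f}")
```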

Select Via Proxy: Efficient Data Selection For Training Deep Networks

no code implementations • ICLR 2019 • Cody Coleman, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia

In our approach, we first quickly train a small proxy model, use it to estimate the utility of individual training data points, and then select the most informative points for training the large target model.

BIG-bench Machine Learning • Image Classification • +1

Equivariant Transformer Networks

3 code implementations • 25 Jan 2019 • Kai Sheng Tai, Peter Bailis, Gregory Valiant

How can prior knowledge on the transformation invariances of a domain be incorporated into the architecture of a neural network?

General Classification • Image Classification

LIT: Block-wise Intermediate Representation Training for Model Compression

no code implementations • ICLR 2019 • Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia

Knowledge distillation (KD) is a popular method for reducing the computational overhead of deep network inference, in which the output of a teacher model is used to train a smaller, faster student model.

Knowledge Distillation • Model Compression
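For reference, a bare-bones sketch of the Hinton-style soft-target distillation objective the abstract describes; LIT extends this by also matching block-wise intermediate representations. Shapes and hyperparameters below are arbitrary.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-target term: student matches the teacher's softened distribution.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Hard-target term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(kd_loss(student_logits, teacher_logits, labels))
```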

Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark

no code implementations • 4 Jun 2018 • Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, Kunle Olukotun, Chris Re, Matei Zaharia

In this work, we analyze the entries from DAWNBench, which received optimized submissions from multiple industrial groups, to investigate the behavior of TTA as a metric as well as trends in the best-performing entries.

Benchmarking • BIG-bench Machine Learning

BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics

no code implementations • 2 May 2018 • Daniel Kang, Peter Bailis, Matei Zaharia

We introduce two new query optimization techniques in BlazeIt that are not supported by prior work.

Databases

Model Specialization for Inference Via End-to-End Distillation, Pruning, and Cascades

no code implementations • ICLR 2018 • Daniel Kang, Karey Shi, Thao Ngyuen, Stephanie Mallard, Peter Bailis, Matei Zaharia

Thus, simply fine-tuning or transfer learning from a general-purpose network inherits a large computational cost that may not be necessary for a given task.

General Classification • Image Classification

Sketching Linear Classifiers over Data Streams

1 code implementation • 7 Nov 2017 • Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant

We introduce a new sub-linear space sketch, the Weight-Median Sketch, for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model.

feature selection
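A simplified, from-scratch illustration of the idea: the weight vector of a streaming linear classifier is stored in a small count-sketch-style table, gradient updates touch only hashed buckets, and individual weights are recovered by a median across rows. This is a sketch of the concept, not the paper's exact algorithm or guarantees.

```python
import hashlib
import math
import statistics

class SketchedLinearModel:
    """Logistic regression whose weight vector lives in a fixed-size sketch."""
    def __init__(self, rows=5, buckets=128, lr=0.1):
        self.rows, self.buckets, self.lr = rows, buckets, lr
        self.table = [[0.0] * buckets for _ in range(rows)]

    def _hash(self, feature, row):
        # Bucket index and a pseudo-random sign for (feature, row).
        h = int(hashlib.md5(f"{row}:{feature}".encode()).hexdigest(), 16)
        return h % self.buckets, 1.0 if (h >> 20) & 1 else -1.0

    def weight(self, feature):
        # Each row gives a noisy estimate; the median suppresses hash collisions.
        ests = []
        for r in range(self.rows):
            b, s = self._hash(feature, r)
            ests.append(s * self.table[r][b])
        return statistics.median(ests)

    def predict_margin(self, x):          # x: dict of feature name -> value
        return sum(self.weight(f) * v for f, v in x.items())

    def update(self, x, y):               # y in {-1, +1}; one SGD step on logistic loss
        g = -y / (1.0 + math.exp(y * self.predict_margin(x)))
        for f, v in x.items():
            for r in range(self.rows):
                b, s = self._hash(f, r)
                self.table[r][b] -= self.lr * g * v * s

model = SketchedLinearModel()
for _ in range(200):       # toy stream: "spam" predicts +1, "ham" predicts -1
    model.update({"spam": 1.0, "noise": 1.0}, +1)
    model.update({"ham": 1.0, "noise": 1.0}, -1)
print({f: round(model.weight(f), 2) for f in ("spam", "ham", "noise")})
```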

To Index or Not to Index: Optimizing Exact Maximum Inner Product Search

1 code implementation • 5 Jun 2017 • Firas Abuzaid, Geet Sethi, Peter Bailis, Matei Zaharia

The brute-force approach to solving exact MIPS is computationally expensive, thus spurring recent development of novel indexes and pruning techniques for this task.

Recommendation Systems
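The brute-force exact MIPS baseline mentioned above, for reference: score every item against the query with one dense matrix product and take the top-k. The paper's contribution is deciding when an index or pruning scheme beats this; the sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
item_vectors = rng.normal(size=(100_000, 64))   # e.g., item factors from a recommender
query = rng.normal(size=64)                     # e.g., a user factor vector

def exact_mips_topk(items, q, k=10):
    scores = items @ q                  # one inner product per item: O(n * d)
    top = np.argpartition(-scores, k)[:k]
    return top[np.argsort(-scores[top])]

print("top-10 item ids:", exact_mips_topk(item_vectors, query))
```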

Infrastructure for Usable Machine Learning: The Stanford DAWN Project

no code implementations • 22 May 2017 • Peter Bailis, Kunle Olukotun, Christopher Re, Matei Zaharia

Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering organizations.

BIG-bench Machine Learning

NoScope: Optimizing Neural Network Queries over Video at Scale

1 code implementation • 7 Mar 2017 • Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, Matei Zaharia

Given a target video, object to detect, and reference neural network, NoScope automatically searches for and trains a sequence, or cascade, of models that preserves the accuracy of the reference network but is specialized to the target video and is therefore far less computationally expensive.

Binary Classification
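A schematic sketch of the kind of cascade described above: a cheap difference detector skips near-identical frames, a small video-specialized model handles confident frames, and only ambiguous frames reach the expensive reference network. All three components are hypothetical stubs.

```python
import numpy as np

def difference_detector(prev_frame, frame, tau=5.0):
    return np.mean(np.abs(frame - prev_frame)) > tau   # "has the scene changed?"

def specialized_model(frame):
    # Stand-in for a tiny model trained on this specific video; returns
    # (probability of the target object, confidence in that probability).
    return float(frame.mean() > 128), 0.95

def reference_model(frame):
    return float(frame.mean() > 128)      # stand-in for the full, expensive NN

def noscope_style_query(frames, low=0.2, high=0.8):
    results, prev, last = [], None, 0.0
    for frame in frames:
        if prev is not None and not difference_detector(prev, frame):
            results.append(last)           # frame unchanged: reuse previous answer
        else:
            p, conf = specialized_model(frame)
            last = p if (conf > 0.9 and (p <= low or p >= high)) else reference_model(frame)
            results.append(last)
        prev = frame
    return results

frames = [np.full((32, 32), v, dtype=float) for v in (10, 10, 200, 205, 40)]
print(noscope_style_query(frames))
```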

Highly Available Transactions: Virtues and Limitations (Extended Version)

no code implementations • 1 Feb 2013 • Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica

To minimize network latency and remain online during server failures and network partitions, many modern distributed data storage systems eschew transactional functionality, which provides strong semantic guarantees for groups of multiple operations over multiple data items.

Databases
