Search Results for author: Ion Stoica

Found 91 papers, 46 papers with code

Highly Available Transactions: Virtues and Limitations (Extended Version)

no code implementations 1 Feb 2013 Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica

To minimize network latency and remain online during server failures and network partitions, many modern distributed data storage systems eschew transactional functionality, which provides strong semantic guarantees for groups of multiple operations over multiple data items.

Databases

SparkNet: Training Deep Networks in Spark

1 code implementation 19 Nov 2015 Philipp Moritz, Robert Nishihara, Ion Stoica, Michael I. Jordan

We introduce SparkNet, a framework for training deep networks in Spark.

Fast and Accurate Performance Analysis of LTE Radio Access Networks

no code implementations 16 May 2016 Anand Padmanabha Iyer, Ion Stoica, Mosharaf Chowdhury, Li Erran Li

Our choice of this domain is influenced by its commonalities with several other domains that produce real-time data, our access to a large live dataset, and its real-time nature and dimensionality, which make it a natural fit for a popular analysis technique: machine learning (ML).

Feature Engineering Multi-Task Learning

Real-Time Machine Learning: The Missing Pieces

2 code implementations 11 Mar 2017 Robert Nishihara, Philipp Moritz, Stephanie Wang, Alexey Tumanov, William Paul, Johann Schleier-Smith, Richard Liaw, Mehrdad Niknami, Michael I. Jordan, Ion Stoica

Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making.

BIG-bench Machine Learning Decision Making

Multi-Level Discovery of Deep Options

no code implementations 24 Mar 2017 Roy Fox, Sanjay Krishnan, Ion Stoica, Ken Goldberg

Augmenting an agent's control with useful higher-level behaviors called options can greatly reduce the sample complexity of reinforcement learning, but manually designing options is infeasible in high-dimensional and abstract state spaces.

A Berkeley View of Systems Challenges for AI

no code implementations 15 Dec 2017 Ion Stoica, Dawn Song, Raluca Ada Popa, David Patterson, Michael W. Mahoney, Randy Katz, Anthony D. Joseph, Michael Jordan, Joseph M. Hellerstein, Joseph E. Gonzalez, Ken Goldberg, Ali Ghodsi, David Culler, Pieter Abbeel

With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production.

Machine Translation speech-recognition +1

Ray: A Distributed Framework for Emerging AI Applications

4 code implementations 16 Dec 2017 Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, Ion Stoica

To meet the performance requirements, Ray employs a distributed scheduler and a distributed and fault-tolerant store to manage the system's control state.

reinforcement-learning Reinforcement Learning (RL)

RLlib: Abstractions for Distributed Reinforcement Learning

3 code implementations ICML 2018 Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica

Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation.

reinforcement-learning Reinforcement Learning (RL)

Parametrized Hierarchical Procedures for Neural Programming

no code implementations ICLR 2018 Roy Fox, Richard Shin, Sanjay Krishnan, Ken Goldberg, Dawn Song, Ion Stoica

Neural programs are highly accurate and structured policies that perform algorithmic tasks by controlling the behavior of a computation mechanism.

Imitation Learning

Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning

no code implementations 28 Feb 2018 Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael I. Jordan, Joseph E. Gonzalez, Sergey Levine

By enabling wider use of learned dynamics models within a model-free reinforcement learning algorithm, we improve value estimation, which, in turn, reduces the sample complexity of learning.

Continuous Control reinforcement-learning +1

Tune: A Research Platform for Distributed Model Selection and Training

4 code implementations 13 Jul 2018 Richard Liaw, Eric Liang, Robert Nishihara, Philipp Moritz, Joseph E. Gonzalez, Ion Stoica

We show that this interface meets the requirements for a broad range of hyperparameter search algorithms, allows straightforward scaling of search to large clusters, and simplifies algorithm implementation.

Hyperparameter Optimization Model Selection

Learning to Optimize Join Queries With Deep Reinforcement Learning

no code implementations 9 Aug 2018 Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, Ion Stoica

Exhaustive enumeration of all possible join orders is often avoided, and most optimizers leverage heuristics to prune the search space.

Databases

InferLine: ML Inference Pipeline Composition Framework

1 code implementation 5 Dec 2018 Daniel Crankshaw, Gur-Eyal Sela, Corey Zumar, Xiangxi Mo, Joseph E. Gonzalez, Ion Stoica, Alexey Tumanov

The dominant cost in production machine learning workloads is not training individual models but serving predictions from increasingly complex prediction pipelines spanning multiple models, machine learning frameworks, and parallel hardware accelerators.

Distributed, Parallel, and Cluster Computing

AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning

1 code implementation 15 Jan 2019 Ameer Haj-Ali, Qijing Huang, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek

We implement a framework in the context of the LLVM compiler to optimize the ordering for HLS programs and compare the performance of deep reinforcement learning to state-of-the-art algorithms that address the phase-ordering problem.

reinforcement-learning Reinforcement Learning (RL)

The OoO VLIW JIT Compiler for GPU Inference

no code implementations 28 Jan 2019 Paras Jain, Xiangxi Mo, Ajay Jain, Alexey Tumanov, Joseph E. Gonzalez, Ion Stoica

Current trends in Machine Learning (ML) inference on hardware-accelerated devices (e.g., GPUs, TPUs) point to alarmingly low utilization.

Cloud Programming Simplified: A Berkeley View on Serverless Computing

no code implementations 9 Feb 2019 Eric Jonas, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, Joao Carreira, Karl Krauth, Neeraja Yadwadkar, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica, David A. Patterson

Serverless cloud computing handles virtually all the system administration operations needed to make it easier for programmers to use the cloud.

Operating Systems

Neural Packet Classification

no code implementations 27 Feb 2019 Eric Liang, Hang Zhu, Xin Jin, Ion Stoica

First, many existing solutions build a decision tree iteratively by splitting nodes in the tree.

Classification General Classification +2

Constrained Thompson Sampling for Wireless Link Optimization

no code implementations 28 Feb 2019 Vidit Saxena, Joseph E. Gonzalez, Ion Stoica, Hugo Tullberg, Joakim Jaldén

We model rate selection as a stochastic multi-armed bandit (MAB) problem, where a finite set of transmission rates are modeled as independent bandit arms.

Thompson Sampling
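As background for the snippet above, a minimal Thompson sampling loop for independent Bernoulli bandit arms can be sketched as follows. This is the plain, unconstrained algorithm with assumed Beta(1, 1) priors; the function name and arm probabilities are invented for illustration, and the paper's constrained variant for wireless link optimization adds structure not shown here.

```python
import random

def thompson_sampling(success_probs, rounds=5000, seed=0):
    """Plain Thompson sampling for independent Bernoulli arms
    with Beta(1, 1) priors (toy sketch, not the paper's method)."""
    rng = random.Random(seed)
    wins = [1] * len(success_probs)    # Beta alpha counts
    losses = [1] * len(success_probs)  # Beta beta counts
    pulls = [0] * len(success_probs)
    for _ in range(rounds):
        # sample a plausible success rate per arm, pull the best sample
        samples = [rng.betavariate(w, l) for w, l in zip(wins, losses)]
        arm = max(range(len(samples)), key=samples.__getitem__)
        reward = rng.random() < success_probs[arm]
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.7])
print(pulls.index(max(pulls)))  # index of the most-pulled arm (expected: 2)
```

With enough rounds, the posterior concentrates and almost all pulls go to the best arm, which is the exploration/exploitation trade-off the MAB formulation captures.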

Communication-efficient distributed SGD with Sketching

2 code implementations NeurIPS 2019 Nikita Ivkin, Daniel Rothchild, Enayat Ullah, Vladimir Braverman, Ion Stoica, Raman Arora

Large-scale distributed training of neural networks is often limited by network bandwidth, wherein the communication time overwhelms the local computation time.

Harmonia: Near-Linear Scalability for Replicated Storage with In-Network Conflict Detection

no code implementations 18 Apr 2019 Hang Zhu, Zhihao Bai, Jialin Li, Ellis Michael, Dan Ports, Ion Stoica, Xin Jin

Experimental results show that Harmonia improves the throughput of these protocols by up to 10X for a replication factor of 10, providing near-linear scalability up to the limit of our testbed.

Distributed, Parallel, and Cluster Computing

Deep Unsupervised Cardinality Estimation

1 code implementation 10 May 2019 Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, Ion Stoica

To produce a truly usable estimator, we develop a Monte Carlo integration scheme on top of autoregressive models that can efficiently handle range queries with dozens of dimensions or more.

Density Estimation
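The idea of Monte Carlo integration over an autoregressive factorization can be illustrated on a tiny discrete distribution: sample each variable only from the part of its support inside the range, and weight by the range mass at each step. The function name and the two-variable distribution below are invented for illustration; the paper's estimator runs this kind of progressive sampling over learned neural autoregressive models, not hand-written tables.

```python
import random

def estimate_range_prob(p1, p2_given, r1, r2, n=2000, seed=0):
    """Monte Carlo estimate of P(X1 in r1, X2 in r2) under the
    autoregressive factorization p(x1) * p(x2 | x1).
    p1: dict value -> prob; p2_given: dict x1 -> (dict value -> prob).
    Toy sketch of progressive sampling, not the paper's system."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        in_range = {v: p for v, p in p1.items() if v in r1}
        q1 = sum(in_range.values())           # range mass of X1
        if q1 == 0.0:
            continue                          # empty range contributes 0
        x1 = rng.choices(list(in_range), weights=list(in_range.values()))[0]
        q2 = sum(p for v, p in p2_given[x1].items() if v in r2)
        total += q1 * q2                      # unbiased per-sample estimate
    return total / n

# exact probability is 0.5 * 0 + 0.5 * 0.5 = 0.25
p1 = {0: 0.5, 1: 0.5}
p2_given = {0: {0: 1.0}, 1: {0: 0.5, 1: 0.5}}
est = estimate_range_prob(p1, p2_given, r1={0, 1}, r2={1})
print(round(est, 2))  # close to the exact value 0.25
```

Each sample costs one forward pass per dimension, which is what makes range queries over dozens of dimensions tractable.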

Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules

3 code implementations 14 May 2019 Daniel Ho, Eric Liang, Ion Stoica, Pieter Abbeel, Xi Chen

A key challenge in leveraging data augmentation for neural network training is choosing an effective augmentation policy from a large search space of candidate operations.

Image Augmentation

Helen: Maliciously Secure Coopetitive Learning for Linear Models

no code implementations 16 Jul 2019 Wenting Zheng, Raluca Ada Popa, Joseph E. Gonzalez, Ion Stoica

Many organizations wish to collaboratively train machine learning models on their combined datasets for a common benefit (e.g., better medical research, or fraud detection).

Fraud Detection

A View on Deep Reinforcement Learning in System Optimization

no code implementations 4 Aug 2019 Ameer Haj-Ali, Nesreen K. Ahmed, Ted Willke, Joseph Gonzalez, Krste Asanovic, Ion Stoica

We propose a set of essential metrics to guide future works in evaluating the efficacy of using deep reinforcement learning in system optimization.

reinforcement-learning Reinforcement Learning (RL)

NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning

1 code implementation 20 Sep 2019 Ameer Haj-Ali, Nesreen K. Ahmed, Ted Willke, Sophia Shao, Krste Asanovic, Ion Stoica

However, these models are unable to capture the data dependency, the computation graph, or the organization of instructions.

Distributed, Parallel, and Cluster Computing Performance Programming Languages

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

2 code implementations 7 Oct 2019 Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez

We formalize the problem of trading-off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies.

Blink: Fast and Generic Collectives for Distributed ML

no code implementations 11 Oct 2019 Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Jorgen Thelin, Nikhil Devanur, Ion Stoica

Model parameter synchronization across GPUs introduces high overheads for data-parallel training at scale.

Image Classification

HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline

no code implementations 8 Jan 2020 Richard Liaw, Romil Bhardwaj, Lisa Dunlap, Yitian Zou, Joseph Gonzalez, Ion Stoica, Alexey Tumanov

Prior research in resource scheduling for machine learning training workloads has largely focused on minimizing job completion times.

Scheduling

Hoplite: Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems

1 code implementation 13 Feb 2020 Siyuan Zhuang, Zhuohan Li, Danyang Zhuo, Stephanie Wang, Eric Liang, Robert Nishihara, Philipp Moritz, Ion Stoica

Task-based distributed frameworks (e.g., Ray, Dask, Hydro) have become increasingly popular for distributed applications that contain asynchronous and dynamic workloads, including asynchronous gradient descent, reinforcement learning, and model serving.

Distributed Computing reinforcement-learning +1

VCG Mechanism Design with Unknown Agent Values under Stochastic Bandit Feedback

no code implementations 19 Apr 2020 Kirthevasan Kandasamy, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica

To that end, we first define three notions of regret for the welfare, the individual utilities of each agent and that of the mechanism.

ProTuner: Tuning Programs with Monte Carlo Tree Search

no code implementations 27 May 2020 Ameer Haj-Ali, Hasan Genc, Qijing Huang, William Moses, John Wawrzynek, Krste Asanović, Ion Stoica

We explore applying the Monte Carlo Tree Search (MCTS) algorithm in a notoriously difficult task: tuning programs for high-performance deep learning and image processing.

Scheduling

NeuroCard: One Cardinality Estimator for All Tables

1 code implementation 15 Jun 2020 Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, Ion Stoica

Query optimizers rely on accurate cardinality estimates to produce good execution plans.

Variable Skipping for Autoregressive Range Density Estimation

1 code implementation ICML 2020 Eric Liang, Zongheng Yang, Ion Stoica, Pieter Abbeel, Yan Duan, Xi Chen

In this paper, we explore a technique, variable skipping, for accelerating range density estimation over deep autoregressive models.

Data Augmentation Density Estimation

FetchSGD: Communication-Efficient Federated Learning with Sketching

no code implementations 15 Jul 2020 Daniel Rothchild, Ashwinee Panda, Enayat Ullah, Nikita Ivkin, Ion Stoica, Vladimir Braverman, Joseph Gonzalez, Raman Arora

A key insight in the design of FetchSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch.

Federated Learning
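The linearity the abstract refers to can be demonstrated on a toy Count Sketch: because the sketch is a linear map, adding two gradient sketches equals sketching the sum of the gradients, so running averages like momentum can be maintained directly in sketch space. The class below is an invented minimal illustration of that property, not FetchSGD's implementation.

```python
import random

class CountSketch:
    """Toy Count Sketch: a linear map from a dense vector to a small
    table of signed counters (illustration only, not FetchSGD)."""
    def __init__(self, rows=5, cols=64, dim=256, seed=0):
        rng = random.Random(seed)
        # one bucket index and one random sign per (row, coordinate)
        self.bucket = [[rng.randrange(cols) for _ in range(dim)] for _ in range(rows)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(rows)]
        self.rows = rows
        self.table = [[0.0] * cols for _ in range(rows)]

    def add(self, vec, scale=1.0):
        for r in range(self.rows):
            for i, v in enumerate(vec):
                if v:
                    self.table[r][self.bucket[r][i]] += scale * self.sign[r][i] * v

    def estimate(self, i):
        ests = sorted(self.sign[r][i] * self.table[r][self.bucket[r][i]]
                      for r in range(self.rows))
        return ests[len(ests) // 2]           # median across rows

dim = 256
g1 = [0.0] * dim; g1[7] = 5.0                 # two sparse "gradients"
g2 = [0.0] * dim; g2[7] = 3.0

s = CountSketch(dim=dim)                      # sketch(g1) + sketch(g2)
s.add(g1); s.add(g2)
t = CountSketch(dim=dim)                      # sketch(g1 + g2)
t.add([a + b for a, b in zip(g1, g2)])
print(s.estimate(7), t.estimate(7))           # both recover 8.0
```

A momentum update such as m ← βm + g can therefore be applied to the sketch tables themselves (e.g., via `add(g, scale=1.0)` after scaling the tables by β), which is the insight the abstract describes.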

RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem

1 code implementation NeurIPS 2021 Eric Liang, Zhanghao Wu, Michael Luo, Sven Mika, Joseph E. Gonzalez, Ion Stoica

Researchers and practitioners in the field of reinforcement learning (RL) frequently leverage parallel computation, which has led to a plethora of new algorithms and systems in the last few years.

reinforcement-learning Reinforcement Learning (RL)

Online Learning Demands in Max-min Fairness

no code implementations 15 Dec 2020 Kirthevasan Kandasamy, Gur-Eyal Sela, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica

We describe mechanisms for allocating a scarce resource among multiple users in a way that is efficient, fair, and strategy-proof, even when users do not know their own resource requirements.

Fairness
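For context, the classic max-min fair allocation (with known demands) is computed by water-filling: repeatedly give every unsatisfied user an equal share of the remaining capacity, capping users at their demand. The routine below is a standard textbook sketch of that baseline; the paper studies the harder online setting where demands are unknown and users are strategic.

```python
def max_min_fair(demands, capacity):
    """Water-filling max-min fair allocation for known demands
    (background sketch; not the paper's online mechanism)."""
    alloc = {u: 0.0 for u in demands}
    active = set(demands)
    remaining = float(capacity)
    while active and remaining > 1e-12:
        share = remaining / len(active)
        # users whose residual demand fits within an equal share
        satisfied = {u for u in active if demands[u] - alloc[u] <= share}
        if not satisfied:
            for u in active:                  # split what's left equally
                alloc[u] += share
            remaining = 0.0
        else:
            for u in satisfied:               # cap them at their demand
                remaining -= demands[u] - alloc[u]
                alloc[u] = demands[u]
            active -= satisfied
    return alloc

print(max_min_fair({"a": 2, "b": 4, "c": 10}, 12))
# a and b get their full demands; c gets the remaining 6
```

Small demands are fully served and the slack is redistributed, which is exactly the fairness notion being made strategy-proof in the online setting.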

Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers

no code implementations 19 Dec 2020 Romil Bhardwaj, Zhengxu Xia, Ganesh Ananthanarayanan, Junchen Jiang, Nikolaos Karianakis, Yuanchao Shu, Kevin Hsieh, Victor Bahl, Ion Stoica

Compressed models that are deployed on the edge servers for inference suffer from data drift, where the live video data diverges from the training data.

Scaling Replicated State Machines with Compartmentalization [Technical Report]

no code implementations 31 Dec 2020 Michael Whittaker, Ailidani Ailijiang, Aleksey Charapko, Murat Demirbas, Neil Giridharan, Joseph M. Hellerstein, Heidi Howard, Ion Stoica, Adriana Szekeres

In this paper, we introduce compartmentalization, the first comprehensive technique to eliminate state machine replication bottlenecks.

Distributed, Parallel, and Cluster Computing

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models

1 code implementation 16 Feb 2021 Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica

With this key idea, we design TeraPipe, a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models.

PAC Best Arm Identification Under a Deadline

no code implementations 6 Jun 2021 Brijen Thananjeyan, Kirthevasan Kandasamy, Ion Stoica, Michael I. Jordan, Ken Goldberg, Joseph E. Gonzalez

In this work, the decision-maker is given a deadline of $T$ rounds, where, on each round, it can adaptively choose which arms to pull and how many times to pull them; this distinguishes the number of decisions made (i.e., time or number of rounds) from the number of samples acquired (cost).

Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments

no code implementations 18 Jun 2021 Abdus Salam Azad, Edward Kim, Qiancheng Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Sanjit A. Seshia

To showcase the benefits, we interfaced SCENIC to an existing RTS environment, the Google Research Football (GRF) simulator, and introduced a benchmark of 32 realistic scenarios, encoded in SCENIC, to train RL agents and test their generalization capabilities.

reinforcement-learning Reinforcement Learning (RL)

Accelerating Quadratic Optimization with Reinforcement Learning

1 code implementation NeurIPS 2021 Jeffrey Ichnowski, Paras Jain, Bartolomeo Stellato, Goran Banjac, Michael Luo, Francesco Borrelli, Joseph E. Gonzalez, Ion Stoica, Ken Goldberg

First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be rapidly solved.

reinforcement-learning Reinforcement Learning (RL)

Grounded Graph Decoding Improves Compositional Generalization in Question Answering

1 code implementation Findings (EMNLP) 2021 Yu Gai, Paras Jain, Wendi Zhang, Joseph E. Gonzalez, Dawn Song, Ion Stoica

Grounding enables the model to retain syntax information from the input, thereby significantly improving generalization over complex inputs.

Question Answering

Composing MPC with LQR and Neural Network for Amortized Efficiency and Stable Control

no code implementations 14 Dec 2021 Fangyu Wu, Guanhua Wang, Siyuan Zhuang, Kehan Wang, Alexander Keimer, Ion Stoica, Alexandre Bayen

The proposed scheme does not require pre-computation and can improve the amortized running time of the composed MPC with a well-trained neural network.

Computational Efficiency Model Predictive Control

Balsa: Learning a Query Optimizer Without Expert Demonstrations

1 code implementation 5 Jan 2022 Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, Ion Stoica

Query optimizers are a performance-critical component in every database system.

Representing Long-Range Context for Graph Neural Networks with Global Attention

1 code implementation NeurIPS 2021 Zhanghao Wu, Paras Jain, Matthew A. Wright, Azalia Mirhoseini, Joseph E. Gonzalez, Ion Stoica

Inspired by recent computer vision results that find position-invariant attention performant in learning long-range relationships, our method, which we call GraphTrans, applies a permutation-invariant Transformer module after a standard GNN module.

Graph Classification Graph Embedding

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

1 code implementation 28 Jan 2022 Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica

Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.

NumS: Scalable Array Programming for the Cloud

1 code implementation 28 Jun 2022 Melih Elibol, Vinamra Benara, Samyu Yagati, Lianmin Zheng, Alvin Cheung, Michael I. Jordan, Ion Stoica

LSHS is a local search method which optimizes operator placement by minimizing maximum memory and network load on any given node within a distributed system.

regression Scheduling

POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging

1 code implementation 15 Jul 2022 Shishir G. Patil, Paras Jain, Prabal Dutta, Ion Stoica, Joseph E. Gonzalez

We demonstrate that it is possible to fine-tune both ResNet-18 and BERT within the memory constraints of a Cortex-M class embedded device while outperforming current edge training methods in energy efficiency.

Privacy Preserving

Scaling up Trustless DNN Inference with Zero-Knowledge Proofs

1 code implementation 17 Oct 2022 Daniel Kang, Tatsunori Hashimoto, Ion Stoica, Yi Sun

In this work, we present the first practical ImageNet-scale method to verify ML model inference non-interactively, i.e., after the inference has been done.

Retrieval SNARKS +1

CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

1 code implementation 19 Oct 2022 Abdus Salam Azad, Izzeddin Gur, Jasper Emhoff, Nathaniel Alexis, Aleksandra Faust, Pieter Abbeel, Ion Stoica

Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks.

Reinforcement Learning (RL) Representation Learning +1

On Optimizing the Communication of Model Parallelism

no code implementations 10 Nov 2022 Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

This pattern emerges when the two paradigms of model parallelism - intra-operator and inter-operator parallelism - are combined to support large models on large clusters.

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

2 code implementations 22 Feb 2023 Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Model parallelism is conventionally viewed as a method to scale a single large deep learning model beyond the memory limits of a single device.

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

1 code implementation 13 Mar 2023 Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

As a result, when running OPT-175B on a single 16GB GPU, FlexGen achieves significantly higher throughput compared to state-of-the-art offloading systems, reaching a generation throughput of 1 token/s for the first time with an effective batch size of 144.

Language Modelling Large Language Model

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

5 code implementations NeurIPS 2023 Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences.

Chatbot Language Modelling +1

Efficient Memory Management for Large Language Model Serving with PagedAttention

4 code implementations 12 Sep 2023 Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica

On top of it, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage.

Language Modelling Large Language Model +1
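The near-zero-waste claim comes from paging the KV cache: each request's cache is a sequence of fixed-size blocks allocated on demand through a per-request block table, so waste is bounded by at most one partial block per request. The class below is a deliberately tiny, invented sketch of that block-table idea; it is not vLLM's implementation and omits sharing across requests.

```python
class BlockKVCache:
    """Toy sketch of PagedAttention-style KV-cache paging: logical
    blocks map to physical blocks allocated on demand (not vLLM)."""
    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))   # free physical block ids
        self.tables = {}                      # request id -> block table
        self.lengths = {}                     # request id -> tokens stored

    def append_token(self, req):
        table = self.tables.setdefault(req, [])
        n = self.lengths.get(req, 0)
        if n % self.block_size == 0:          # current block is full
            if not self.free:
                raise MemoryError("no free KV blocks")
            table.append(self.free.pop())     # map the next logical block
        self.lengths[req] = n + 1

    def release(self, req):
        # finished request: return its physical blocks to the free pool
        self.free.extend(self.tables.pop(req, []))
        self.lengths.pop(req, None)

cache = BlockKVCache(num_blocks=8, block_size=4)
for _ in range(6):
    cache.append_token("req-A")               # 6 tokens span 2 blocks
print(len(cache.tables["req-A"]))             # 2
cache.release("req-A")
print(len(cache.free))                        # 8
```

Contrast this with reserving one maximum-length contiguous buffer per request, where everything past the generated length is wasted for the request's entire lifetime.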

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

1 code implementation 21 Sep 2023 Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications.

Chatbot Instruction Following

LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers

1 code implementation 5 Oct 2023 Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Xuezhe Ma, Hao Zhang

Through comprehensive experiments on single and cross-node training, we show that LightSeq achieves up to 1.24-2.01x end-to-end speedup, and a 2-8x longer sequence length on models with fewer heads, compared to Megatron-LM.

Online Speculative Decoding

no code implementations 11 Oct 2023 Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang

We develop a prototype of online speculative decoding based on online knowledge distillation and evaluate it using both synthetic and real query data on several popular LLMs.

Knowledge Distillation

MemGPT: Towards LLMs as Operating Systems

no code implementations 12 Oct 2023 Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, Joseph E. Gonzalez

Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis.

Management

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

1 code implementation 6 Nov 2023 Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica

To capitalize on these opportunities, we present S-LoRA, a system designed for the scalable serving of many LoRA adapters.

Rethinking Benchmark and Contamination for Language Models with Rephrased Samples

1 code implementation 8 Nov 2023 Shuo Yang, Wei-Lin Chiang, Lianmin Zheng, Joseph E. Gonzalez, Ion Stoica

Many have raised concerns about the trustworthiness of public benchmarks due to potential contamination in pre-training or fine-tuning datasets.

LLM-Assisted Code Cleaning For Training Accurate Code Generators

no code implementations 25 Nov 2023 Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica

In this work, we investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system.

Code Generation

Efficiently Programming Large Language Models using SGLang

1 code implementation 12 Dec 2023 Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Jeff Huang, Chuyue Sun, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng

SGLang is designed for the efficient programming of LLMs and incorporates primitives for common LLM programming patterns.

CodeScholar: Growing Idiomatic Code Examples

1 code implementation 23 Dec 2023 Manish Shetty, Koushik Sen, Ion Stoica

A tool that could generate realistic, idiomatic, and contextual usage examples for one or more APIs would be immensely beneficial to developers.

Program Synthesis

SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads

no code implementations 27 Dec 2023 Alind Khare, Dhruv Garg, Sukrit Kalra, Snigdha Grandhi, Ion Stoica, Alexey Tumanov

Serving models under such conditions requires these systems to strike a careful balance between the latency and accuracy requirements of the application and the overall efficiency of utilization of scarce resources.

Scheduling

Fairness in Serving Large Language Models

1 code implementation 31 Dec 2023 Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica

High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading.

Fairness Scheduling

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

1 code implementation 3 Feb 2024 Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang

Autoregressive decoding of large language models (LLMs) is memory bandwidth bounded, resulting in high latency and significant wastes of the parallel processing power of modern accelerators.

Code Completion

Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems

no code implementations 4 Mar 2024 Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou

We find empirically that across multiple language tasks, surprisingly, Voting Inference Systems' performance first increases but then decreases as a function of the number of LLM calls.

Language Modelling Large Language Model

Optimizing LLM Queries in Relational Workloads

no code implementations 9 Mar 2024 Shu Liu, Asim Biswal, Audrey Cheng, Xiangxi Mo, Shiyi Cao, Joseph E. Gonzalez, Ion Stoica, Matei Zaharia

In this paper, we explore how to optimize LLM inference for analytical workloads that invoke LLMs within relational queries.

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

no code implementations 12 Mar 2024 Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica

Large Language Models (LLMs) applied to code-related applications have emerged as a prominent field, attracting significant interest from both academia and industry.

Code Generation

depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers

1 code implementation 14 Mar 2024 Kaichao You, Runsheng Bai, Meng Cao, Jianmin Wang, Ion Stoica, Mingsheng Long

PyTorch 2.x introduces a compiler designed to accelerate deep learning programs.

RAFT: Adapting Language Model to Domain Specific RAG

1 code implementation 15 Mar 2024 Tianjun Zhang, Shishir G. Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez

In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in an "open-book" in-domain setting.

Language Modelling

Communication-Efficient Federated Learning with Sketching

no code implementations ICML 2020 Daniel Rothchild, Ashwinee Panda, Enayat Ullah, Nikita Ivkin, Vladimir Braverman, Joseph Gonzalez, Ion Stoica, Raman Arora

A key insight in the design of FedSketchedSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch.

Federated Learning
