Search Results for author: Ion Stoica

Found 74 papers, 35 papers with code

Communication-Efficient Federated Learning with Sketching

no code implementations • ICML 2020 • Daniel Rothchild, Ashwinee Panda, Enayat Ullah, Nikita Ivkin, Vladimir Braverman, Joseph Gonzalez, Ion Stoica, Raman Arora

A key insight in the design of FedSketchedSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch.

Federated Learning

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

1 code implementation • 21 Sep 2023 • Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications.

Chatbot • Instruction Following

Efficient Memory Management for Large Language Model Serving with PagedAttention

2 code implementations • 12 Sep 2023 • Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica

On top of it, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage.

Language Modelling • Large Language Model • +1

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

2 code implementations • 9 Jun 2023 • Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences.

Chatbot • Language Modelling • +1

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

1 code implementation • 13 Mar 2023 • Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

As a result, when running OPT-175B on a single 16GB GPU, FlexGen achieves significantly higher throughput compared to state-of-the-art offloading systems, reaching a generation throughput of 1 token/s for the first time with an effective batch size of 144.

Language Modelling • Large Language Model

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

2 code implementations • 22 Feb 2023 • Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Model parallelism is conventionally viewed as a method to scale a single large deep learning model beyond the memory limits of a single device.

On Optimizing the Communication of Model Parallelism

no code implementations • 10 Nov 2022 • Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

This pattern emerges when the two paradigms of model parallelism - intra-operator and inter-operator parallelism - are combined to support large models on large clusters.

CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

1 code implementation • 19 Oct 2022 • Abdus Salam Azad, Izzeddin Gur, Jasper Emhoff, Nathaniel Alexis, Aleksandra Faust, Pieter Abbeel, Ion Stoica

Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks.

Reinforcement Learning (RL) • Representation Learning

Scaling up Trustless DNN Inference with Zero-Knowledge Proofs

no code implementations • 17 Oct 2022 • Daniel Kang, Tatsunori Hashimoto, Ion Stoica, Yi Sun

In this work, we present the first practical ImageNet-scale method to verify ML model inference non-interactively, i.e., after the inference has been done.

Retrieval • SNARKS

POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging

1 code implementation • 15 Jul 2022 • Shishir G. Patil, Paras Jain, Prabal Dutta, Ion Stoica, Joseph E. Gonzalez

We demonstrate that it is possible to fine-tune both ResNet-18 and BERT within the memory constraints of a Cortex-M class embedded device while outperforming current edge training methods in energy efficiency.

Privacy Preserving

NumS: Scalable Array Programming for the Cloud

no code implementations • 28 Jun 2022 • Melih Elibol, Vinamra Benara, Samyu Yagati, Lianmin Zheng, Alvin Cheung, Michael I. Jordan, Ion Stoica

LSHS is a local search method which optimizes operator placement by minimizing maximum memory and network load on any given node within a distributed system.

regression • Scheduling

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

1 code implementation • 28 Jan 2022 • Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica

Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.

Representing Long-Range Context for Graph Neural Networks with Global Attention

1 code implementation • NeurIPS 2021 • Zhanghao Wu, Paras Jain, Matthew A. Wright, Azalia Mirhoseini, Joseph E. Gonzalez, Ion Stoica

Inspired by recent computer vision results that find position-invariant attention performant in learning long-range relationships, our method, which we call GraphTrans, applies a permutation-invariant Transformer module after a standard GNN module.

Graph Classification • Graph Embedding

Balsa: Learning a Query Optimizer Without Expert Demonstrations

1 code implementation • 5 Jan 2022 • Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, Ion Stoica

Query optimizers are a performance-critical component in every database system.

Composing MPC with LQR and Neural Network for Amortized Efficiency and Stable Control

no code implementations • 14 Dec 2021 • Fangyu Wu, Guanhua Wang, Siyuan Zhuang, Kehan Wang, Alexander Keimer, Ion Stoica, Alexandre Bayen

The proposed scheme does not require pre-computation and can improve the amortized running time of the composed MPC with a well-trained neural network.

Grounded Graph Decoding Improves Compositional Generalization in Question Answering

1 code implementation • Findings (EMNLP) 2021 • Yu Gai, Paras Jain, Wendi Zhang, Joseph E. Gonzalez, Dawn Song, Ion Stoica

Grounding enables the model to retain syntax information from the input, thereby significantly improving generalization over complex inputs.

Question Answering

Accelerating Quadratic Optimization with Reinforcement Learning

1 code implementation • NeurIPS 2021 • Jeffrey Ichnowski, Paras Jain, Bartolomeo Stellato, Goran Banjac, Michael Luo, Francesco Borrelli, Joseph E. Gonzalez, Ion Stoica, Ken Goldberg

First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be rapidly solved.

reinforcement-learning • Reinforcement Learning (RL)

Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments

no code implementations • 18 Jun 2021 • Abdus Salam Azad, Edward Kim, Qiancheng Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Sanjit A. Seshia

To showcase the benefits, we interfaced SCENIC to an existing RTS environment, the Google Research Football (GRF) simulator, and introduced a benchmark consisting of 32 realistic scenarios, encoded in SCENIC, to train RL agents and test their generalization capabilities.

reinforcement-learning • Reinforcement Learning (RL)

PAC Best Arm Identification Under a Deadline

no code implementations • 6 Jun 2021 • Brijen Thananjeyan, Kirthevasan Kandasamy, Ion Stoica, Michael I. Jordan, Ken Goldberg, Joseph E. Gonzalez

In this work, the decision-maker is given a deadline of $T$ rounds, where, on each round, it can adaptively choose which arms to pull and how many times to pull them; this distinguishes the number of decisions made (i.e., time or number of rounds) from the number of samples acquired (cost).

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models

1 code implementation • 16 Feb 2021 • Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica

With this key idea, we design TeraPipe, a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models.

Scaling Replicated State Machines with Compartmentalization [Technical Report]

no code implementations • 31 Dec 2020 • Michael Whittaker, Ailidani Ailijiang, Aleksey Charapko, Murat Demirbas, Neil Giridharan, Joseph M. Hellerstein, Heidi Howard, Ion Stoica, Adriana Szekeres

In this paper, we introduce compartmentalization, the first comprehensive technique to eliminate state machine replication bottlenecks.

Distributed, Parallel, and Cluster Computing

Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers

no code implementations • 19 Dec 2020 • Romil Bhardwaj, Zhengxu Xia, Ganesh Ananthanarayanan, Junchen Jiang, Nikolaos Karianakis, Yuanchao Shu, Kevin Hsieh, Victor Bahl, Ion Stoica

Compressed models that are deployed on the edge servers for inference suffer from data drift, where the live video data diverges from the training data.

Online Learning Demands in Max-min Fairness

no code implementations • 15 Dec 2020 • Kirthevasan Kandasamy, Gur-Eyal Sela, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica

We describe mechanisms for the allocation of a scarce resource among multiple users in a way that is efficient, fair, and strategy-proof, even when users do not know their resource requirements.


RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem

1 code implementation • NeurIPS 2021 • Eric Liang, Zhanghao Wu, Michael Luo, Sven Mika, Joseph E. Gonzalez, Ion Stoica

Researchers and practitioners in the field of reinforcement learning (RL) frequently leverage parallel computation, which has led to a plethora of new algorithms and systems in the last few years.

reinforcement-learning • Reinforcement Learning (RL)

FetchSGD: Communication-Efficient Federated Learning with Sketching

no code implementations • 15 Jul 2020 • Daniel Rothchild, Ashwinee Panda, Enayat Ullah, Nikita Ivkin, Ion Stoica, Vladimir Braverman, Joseph Gonzalez, Raman Arora

A key insight in the design of FetchSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch.

Federated Learning

Variable Skipping for Autoregressive Range Density Estimation

1 code implementation • ICML 2020 • Eric Liang, Zongheng Yang, Ion Stoica, Pieter Abbeel, Yan Duan, Xi Chen

In this paper, we explore a technique, variable skipping, for accelerating range density estimation over deep autoregressive models.

Data Augmentation • Density Estimation

NeuroCard: One Cardinality Estimator for All Tables

1 code implementation • 15 Jun 2020 • Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, Ion Stoica

Query optimizers rely on accurate cardinality estimates to produce good execution plans.

ProTuner: Tuning Programs with Monte Carlo Tree Search

no code implementations • 27 May 2020 • Ameer Haj-Ali, Hasan Genc, Qijing Huang, William Moses, John Wawrzynek, Krste Asanović, Ion Stoica

We explore applying the Monte Carlo Tree Search (MCTS) algorithm in a notoriously difficult task: tuning programs for high-performance deep learning and image processing.


VCG Mechanism Design with Unknown Agent Values under Stochastic Bandit Feedback

no code implementations • 19 Apr 2020 • Kirthevasan Kandasamy, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica

To that end, we first define three notions of regret for the welfare, the individual utilities of each agent and that of the mechanism.

Hoplite: Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems

1 code implementation • 13 Feb 2020 • Siyuan Zhuang, Zhuohan Li, Danyang Zhuo, Stephanie Wang, Eric Liang, Robert Nishihara, Philipp Moritz, Ion Stoica

Task-based distributed frameworks (e.g., Ray, Dask, Hydro) have become increasingly popular for distributed applications that contain asynchronous and dynamic workloads, including asynchronous gradient descent, reinforcement learning, and model serving.

Distributed Computing • reinforcement-learning • +1

HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline

no code implementations • 8 Jan 2020 • Richard Liaw, Romil Bhardwaj, Lisa Dunlap, Yitian Zou, Joseph Gonzalez, Ion Stoica, Alexey Tumanov

Prior research in resource scheduling for machine learning training workloads has largely focused on minimizing job completion times.


Blink: Fast and Generic Collectives for Distributed ML

no code implementations • 11 Oct 2019 • Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Jorgen Thelin, Nikhil Devanur, Ion Stoica

Model parameter synchronization across GPUs introduces high overheads for data-parallel training at scale.

Image Classification

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

2 code implementations • 7 Oct 2019 • Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez

We formalize the problem of trading-off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies.

NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning

1 code implementation • 20 Sep 2019 • Ameer Haj-Ali, Nesreen K. Ahmed, Ted Willke, Sophia Shao, Krste Asanovic, Ion Stoica

However, these models are unable to capture the data dependency, the computation graph, or the organization of instructions.

Distributed, Parallel, and Cluster Computing • Performance • Programming Languages

A View on Deep Reinforcement Learning in System Optimization

no code implementations • 4 Aug 2019 • Ameer Haj-Ali, Nesreen K. Ahmed, Ted Willke, Joseph Gonzalez, Krste Asanovic, Ion Stoica

We propose a set of essential metrics to guide future works in evaluating the efficacy of using deep reinforcement learning in system optimization.

reinforcement-learning • Reinforcement Learning (RL)

Helen: Maliciously Secure Coopetitive Learning for Linear Models

no code implementations • 16 Jul 2019 • Wenting Zheng, Raluca Ada Popa, Joseph E. Gonzalez, Ion Stoica

Many organizations wish to collaboratively train machine learning models on their combined datasets for a common benefit (e.g., better medical research, or fraud detection).

Fraud Detection

Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules

3 code implementations • 14 May 2019 • Daniel Ho, Eric Liang, Ion Stoica, Pieter Abbeel, Xi Chen

A key challenge in leveraging data augmentation for neural network training is choosing an effective augmentation policy from a large search space of candidate operations.

Image Augmentation

Deep Unsupervised Cardinality Estimation

1 code implementation • 10 May 2019 • Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, Ion Stoica

To produce a truly usable estimator, we develop a Monte Carlo integration scheme on top of autoregressive models that can efficiently handle range queries with dozens of dimensions or more.

Density Estimation

Harmonia: Near-Linear Scalability for Replicated Storage with In-Network Conflict Detection

no code implementations • 18 Apr 2019 • Hang Zhu, Zhihao Bai, Jialin Li, Ellis Michael, Dan Ports, Ion Stoica, Xin Jin

Experimental results show that Harmonia improves the throughput of these protocols by up to 10X for a replication factor of 10, providing near-linear scalability up to the limit of our testbed.

Distributed, Parallel, and Cluster Computing

Communication-efficient distributed SGD with Sketching

2 code implementations • NeurIPS 2019 • Nikita Ivkin, Daniel Rothchild, Enayat Ullah, Vladimir Braverman, Ion Stoica, Raman Arora

Large-scale distributed training of neural networks is often limited by network bandwidth, wherein the communication time overwhelms the local computation time.

Constrained Thompson Sampling for Wireless Link Optimization

no code implementations • 28 Feb 2019 • Vidit Saxena, Joseph E. Gonzalez, Ion Stoica, Hugo Tullberg, Joakim Jaldén

We model rate selection as a stochastic multi-armed bandit (MAB) problem, where a finite set of transmission rates are modeled as independent bandit arms.

Thompson Sampling

Neural Packet Classification

no code implementations • 27 Feb 2019 • Eric Liang, Hang Zhu, Xin Jin, Ion Stoica

First, many of the existing solutions are iteratively building a decision tree by splitting nodes in the tree.

Classification • General Classification • +2

Cloud Programming Simplified: A Berkeley View on Serverless Computing

no code implementations • 9 Feb 2019 • Eric Jonas, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, Joao Carreira, Karl Krauth, Neeraja Yadwadkar, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica, David A. Patterson

Serverless cloud computing handles virtually all the system administration operations needed to make it easier for programmers to use the cloud.

Operating Systems

The OoO VLIW JIT Compiler for GPU Inference

no code implementations • 28 Jan 2019 • Paras Jain, Xiangxi Mo, Ajay Jain, Alexey Tumanov, Joseph E. Gonzalez, Ion Stoica

Current trends in Machine Learning (ML) inference on hardware accelerated devices (e.g., GPUs, TPUs) point to alarmingly low utilization.

AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning

1 code implementation • 15 Jan 2019 • Ameer Haj-Ali, Qijing Huang, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek

We implement a framework in the context of the LLVM compiler to optimize the ordering for HLS programs and compare the performance of deep reinforcement learning to state-of-the-art algorithms that address the phase-ordering problem.

reinforcement-learning • Reinforcement Learning (RL)

InferLine: ML Inference Pipeline Composition Framework

1 code implementation • 5 Dec 2018 • Daniel Crankshaw, Gur-Eyal Sela, Corey Zumar, Xiangxi Mo, Joseph E. Gonzalez, Ion Stoica, Alexey Tumanov

The dominant cost in production machine learning workloads is not training individual models but serving predictions from increasingly complex prediction pipelines spanning multiple models, machine learning frameworks, and parallel hardware accelerators.

Distributed, Parallel, and Cluster Computing

Learning to Optimize Join Queries With Deep Reinforcement Learning

no code implementations • 9 Aug 2018 • Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, Ion Stoica

Exhaustive enumeration of all possible join orders is often avoided, and most optimizers leverage heuristics to prune the search space.


Tune: A Research Platform for Distributed Model Selection and Training

4 code implementations • 13 Jul 2018 • Richard Liaw, Eric Liang, Robert Nishihara, Philipp Moritz, Joseph E. Gonzalez, Ion Stoica

We show that this interface meets the requirements for a broad range of hyperparameter search algorithms, allows straightforward scaling of search to large clusters, and simplifies algorithm implementation.

Hyperparameter Optimization • Model Selection

Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning

no code implementations • 28 Feb 2018 • Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael I. Jordan, Joseph E. Gonzalez, Sergey Levine

By enabling wider use of learned dynamics models within a model-free reinforcement learning algorithm, we improve value estimation, which, in turn, reduces the sample complexity of learning.

Continuous Control • reinforcement-learning • +1

Parametrized Hierarchical Procedures for Neural Programming

no code implementations • ICLR 2018 • Roy Fox, Richard Shin, Sanjay Krishnan, Ken Goldberg, Dawn Song, Ion Stoica

Neural programs are highly accurate and structured policies that perform algorithmic tasks by controlling the behavior of a computation mechanism.

Imitation Learning

RLlib: Abstractions for Distributed Reinforcement Learning

3 code implementations • ICML 2018 • Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica

Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation.

reinforcement-learning • Reinforcement Learning (RL)

Ray: A Distributed Framework for Emerging AI Applications

4 code implementations • 16 Dec 2017 • Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, Ion Stoica

To meet the performance requirements, Ray employs a distributed scheduler and a distributed and fault-tolerant store to manage the system's control state.

reinforcement-learning • Reinforcement Learning (RL)

A Berkeley View of Systems Challenges for AI

no code implementations • 15 Dec 2017 • Ion Stoica, Dawn Song, Raluca Ada Popa, David Patterson, Michael W. Mahoney, Randy Katz, Anthony D. Joseph, Michael Jordan, Joseph M. Hellerstein, Joseph E. Gonzalez, Ken Goldberg, Ali Ghodsi, David Culler, Pieter Abbeel

With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production.

Machine Translation • speech-recognition • +1

Multi-Level Discovery of Deep Options

no code implementations • 24 Mar 2017 • Roy Fox, Sanjay Krishnan, Ion Stoica, Ken Goldberg

Augmenting an agent's control with useful higher-level behaviors called options can greatly reduce the sample complexity of reinforcement learning, but manually designing options is infeasible in high-dimensional and abstract state spaces.

Real-Time Machine Learning: The Missing Pieces

2 code implementations • 11 Mar 2017 • Robert Nishihara, Philipp Moritz, Stephanie Wang, Alexey Tumanov, William Paul, Johann Schleier-Smith, Richard Liaw, Mehrdad Niknami, Michael I. Jordan, Ion Stoica

Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making.

BIG-bench Machine Learning • Decision Making

Fast and Accurate Performance Analysis of LTE Radio Access Networks

no code implementations • 16 May 2016 • Anand Padmanabha Iyer, Ion Stoica, Mosharaf Chowdhury, Li Erran Li

Our choice of this domain is influenced by its commonalities with several other domains that produce real-time data, our access to a large live dataset, and their real-time nature and dimensionality, which make it a natural fit for a popular analysis technique, machine learning (ML).

Feature Engineering • Multi-Task Learning

SparkNet: Training Deep Networks in Spark

1 code implementation • 19 Nov 2015 • Philipp Moritz, Robert Nishihara, Ion Stoica, Michael I. Jordan

We introduce SparkNet, a framework for training deep networks in Spark.

Highly Available Transactions: Virtues and Limitations (Extended Version)

no code implementations • 1 Feb 2013 • Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica

To minimize network latency and remain online during server failures and network partitions, many modern distributed data storage systems eschew transactional functionality, which provides strong semantic guarantees for groups of multiple operations over multiple data items.

