no code implementations • ICML 2020 • Daniel Rothchild, Ashwinee Panda, Enayat Ullah, Nikita Ivkin, Vladimir Braverman, Joseph Gonzalez, Ion Stoica, Raman Arora
A key insight in the design of FedSketchedSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch.
no code implementations • 28 Aug 2024 • Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang
In this paper, we reexamine this assumption -- we show that, although predicting the exact generation length of each request is infeasible, it is possible to predict the relative ranks of output lengths in a batch of requests, using learning to rank.
1 code implementation • 11 Aug 2024 • Shuo Yang, Ying Sheng, Joseph E. Gonzalez, Ion Stoica, Lianmin Zheng
Our key insight is that the pattern of channel sparsity is relatively static, allowing us to use offline calibration to make it efficient at runtime, thereby enabling accurate and efficient identification of important tokens.
no code implementations • 7 Aug 2024 • Deevashwer Rathee, Dacheng Li, Ion Stoica, Hao Zhang, Raluca Popa
Many inference services based on large language models (LLMs) pose a privacy concern, either revealing user prompts to the service or the proprietary weights to the user.
no code implementations • 23 Jul 2024 • Jared Quincy Davis, Boris Hanin, Lingjiao Chen, Peter Bailis, Ion Stoica, Matei Zaharia
This work aims to inform future research and practice in the design of compound AI systems.
1 code implementation • 26 Jun 2024 • Isaac Ong, Amjad Almahairi, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez, M Waleed Kadous, Ion Stoica
Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost.
no code implementations • 20 Jun 2024 • Xiaoxuan Liu, Cade Daniel, Langxiang Hu, Woosuk Kwon, Zhuohan Li, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
SmartSpec dynamically determines the best speculation length for each request (from 0, i. e., no speculation, to many tokens) -- hence the associated speculative execution costs -- based on a new metric called goodput, which characterizes the current observed load of the entire system and the speculation accuracy.
2 code implementations • 17 Jun 2024 • Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica
Under these principles, we developed BenchBuilder, a living benchmark that filters high-quality prompts from live data sources to enable offline evaluation on fresh, challenging prompts.
no code implementations • 31 May 2024 • Justin Cui, Wei-Lin Chiang, Ion Stoica, Cho-Jui Hsieh
OR-Bench comprises 80, 000 seemingly toxic prompts across 10 common rejection categories, a subset of around 1, 000 hard prompts that are challenging even for state-of-the-art LLMs, and an additional 600 toxic prompts to prevent indiscriminate responses.
2 code implementations • 26 May 2024 • Vinamra Benara, Chandan Singh, John X. Morris, Richard Antonello, Ion Stoica, Alexander G. Huth, Jianfeng Gao
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks.
no code implementations • 29 Apr 2024 • Michael Luo, Justin Wong, Brandon Trabucco, Yanping Huang, Joseph E. Gonzalez, Zhifeng Chen, Ruslan Salakhutdinov, Ion Stoica
This paper explores the problem of matching the prompt to a set of relevant adapters, built on recent work that highlight the performance gains of composing adapters.
1 code implementation • 22 Apr 2024 • Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica
Based on this analysis, we introduce M\'elange, a GPU allocation framework that navigates these diverse LLM service characteristics and heterogeneous GPU option space to automatically and efficiently derive the minimal-cost GPU allocation for a given LLM service.
1 code implementation • 10 Apr 2024 • Shishir G. Patil, Tianjun Zhang, Vivian Fang, Noppapon C., Roy Huang, Aaron Hao, Martin Casado, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica
We believe this is critical to unlock the potential for LLM agents to interact with applications and services with limited (post-facto) human involvement.
no code implementations • 6 Apr 2024 • Suppakit Waiwitlikhit, Ion Stoica, Yi Sun, Tatsunori Hashimoto, Daniel Kang
In this work, we show that it is possible to simultaneously allow model providers to keep their model weights (but not architecture) and data secret while allowing other parties to trustlessly audit model and data properties.
1 code implementation • 15 Mar 2024 • Tianjun Zhang, Shishir G. Patil, Naman jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez
In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in a "open-book" in-domain settings.
1 code implementation • 14 Mar 2024 • Kaichao You, Runsheng Bai, Meng Cao, Jianmin Wang, Ion Stoica, Mingsheng Long
PyTorch \texttt{2. x} introduces a compiler designed to accelerate deep learning programs.
no code implementations • 12 Mar 2024 • Naman jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica
Large Language Models (LLMs) applied to code-related applications have emerged as a prominent field, attracting significant interest from both academia and industry.
no code implementations • 9 Mar 2024 • Shu Liu, Asim Biswal, Audrey Cheng, Xiangxi Mo, Shiyi Cao, Joseph E. Gonzalez, Ion Stoica, Matei Zaharia
In this paper, we explore how to optimize LLM inference for analytical workloads that invoke LLMs within relational queries.
3 code implementations • 7 Mar 2024 • Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, Ion Stoica
To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences.
no code implementations • 4 Mar 2024 • Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou
Many recent state-of-the-art results in language tasks were achieved using compound systems that perform multiple Language Model (LM) calls and aggregate their responses.
1 code implementation • 3 Feb 2024 • Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang
Autoregressive decoding of large language models (LLMs) is memory bandwidth bounded, resulting in high latency and significant wastes of the parallel processing power of modern accelerators.
2 code implementations • 31 Dec 2023 • Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica
High-demand LLM inference services (e. g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading.
no code implementations • 27 Dec 2023 • Alind Khare, Dhruv Garg, Sukrit Kalra, Snigdha Grandhi, Ion Stoica, Alexey Tumanov
Serving models under such conditions requires these systems to strike a careful balance between the latency and accuracy requirements of the application and the overall efficiency of utilization of scarce resources.
1 code implementation • 23 Dec 2023 • Manish Shetty, Koushik Sen, Ion Stoica
A tool that could generate realistic, idiomatic, and contextual usage examples for one or more APIs would be immensely beneficial to developers.
1 code implementation • 12 Dec 2023 • Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng
SGLang consists of a frontend language and a runtime.
no code implementations • 25 Nov 2023 • Naman jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica
In this work, we investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system.
1 code implementation • 8 Nov 2023 • Shuo Yang, Wei-Lin Chiang, Lianmin Zheng, Joseph E. Gonzalez, Ion Stoica
Many have raised concerns about the trustworthiness of public benchmarks due to potential contamination in pre-training or fine-tuning datasets.
2 code implementations • 6 Nov 2023 • Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica
To capitalize on these opportunities, we present S-LoRA, a system designed for the scalable serving of many LoRA adapters.
1 code implementation • 12 Oct 2023 • Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, Joseph E. Gonzalez
Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis.
1 code implementation • 11 Oct 2023 • Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
Adapting to query distribution mitigates the shifts between the training distribution of the draft model and the query distribution, enabling the draft model to more accurately predict the target model's outputs.
1 code implementation • 5 Oct 2023 • Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang
FlashAttention (Dao, 2023) effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU.
4 code implementations • 21 Sep 2023 • Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang
Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications.
6 code implementations • 12 Sep 2023 • Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica
On top of it, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage.
8 code implementations • NeurIPS 2023 • Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica
Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences.
Ranked #3 on Long-Context Understanding on Ada-LEval (TSort)
1 code implementation • 13 Mar 2023 • Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
As a result, when running OPT-175B on a single 16GB GPU, FlexGen achieves significantly higher throughput compared to state-of-the-art offloading systems, reaching a generation throughput of 1 token/s for the first time with an effective batch size of 144.
2 code implementations • 22 Feb 2023 • Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica
Model parallelism is conventionally viewed as a method to scale a single large deep learning model beyond the memory limits of a single device.
no code implementations • 11 Feb 2023 • Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei Zaharia, Tatsunori Hashimoto
Recent advances in instruction-following large language models (LLMs) have led to dramatic improvements in a range of NLP tasks.
no code implementations • 10 Nov 2022 • Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang
This pattern emerges when the two paradigms of model parallelism - intra-operator and inter-operator parallelism - are combined to support large models on large clusters.
1 code implementation • 19 Oct 2022 • Abdus Salam Azad, Izzeddin Gur, Jasper Emhoff, Nathaniel Alexis, Aleksandra Faust, Pieter Abbeel, Ion Stoica
Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks.
1 code implementation • 17 Oct 2022 • Daniel Kang, Tatsunori Hashimoto, Ion Stoica, Yi Sun
In this work, we present the first practical ImageNet-scale method to verify ML model inference non-interactively, i. e., after the inference has been done.
1 code implementation • 16 Aug 2022 • Gur-Eyal Sela, Ionel Gog, Justin Wong, Kumar Krishna Agrawal, Xiangxi Mo, Sukrit Kalra, Peter Schafhalter, Eric Leong, Xin Wang, Bharathan Balaji, Joseph Gonzalez, Ion Stoica
These works evaluate accuracy offline, one image at a time.
1 code implementation • 15 Jul 2022 • Shishir G. Patil, Paras Jain, Prabal Dutta, Ion Stoica, Joseph E. Gonzalez
We demonstrate that it is possible to fine-tune both ResNet-18 and BERT within the memory constraints of a Cortex-M class embedded device while outperforming current edge training methods in energy efficiency.
1 code implementation • 28 Jun 2022 • Melih Elibol, Vinamra Benara, Samyu Yagati, Lianmin Zheng, Alvin Cheung, Michael I. Jordan, Ion Stoica
LSHS is a local search method which optimizes operator placement by minimizing maximum memory and network load on any given node within a distributed system.
1 code implementation • 28 Jan 2022 • Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica
Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.
1 code implementation • NeurIPS 2021 • Zhanghao Wu, Paras Jain, Matthew A. Wright, Azalia Mirhoseini, Joseph E. Gonzalez, Ion Stoica
Inspired by recent computer vision results that find position-invariant attention performant in learning long-range relationships, our method, which we call GraphTrans, applies a permutation-invariant Transformer module after a standard GNN module.
1 code implementation • 5 Jan 2022 • Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, Ion Stoica
Query optimizers are a performance-critical component in every database system.
no code implementations • 14 Dec 2021 • Fangyu Wu, Guanhua Wang, Siyuan Zhuang, Kehan Wang, Alexander Keimer, Ion Stoica, Alexandre Bayen
The proposed scheme does not require pre-computation and can improve the amortized running time of the composed MPC with a well-trained neural network.
no code implementations • 7 Dec 2021 • Michael Luo, Ashwin Balakrishna, Brijen Thananjeyan, Suraj Nair, Julian Ibarz, Jie Tan, Chelsea Finn, Ion Stoica, Ken Goldberg
Safe exploration is critical for using reinforcement learning (RL) in risk-sensitive environments.
1 code implementation • Findings (EMNLP) 2021 • Yu Gai, Paras Jain, Wendi Zhang, Joseph E. Gonzalez, Dawn Song, Ion Stoica
Grounding enables the model to retain syntax information from the input in thereby significantly improving generalization over complex inputs.
1 code implementation • NeurIPS 2021 • Jeffrey Ichnowski, Paras Jain, Bartolomeo Stellato, Goran Banjac, Michael Luo, Francesco Borrelli, Joseph E. Gonzalez, Ion Stoica, Ken Goldberg
First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be rapidly solved.
no code implementations • 18 Jun 2021 • Abdus Salam Azad, Edward Kim, Qiancheng Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Sanjit A. Seshia
To showcase the benefits, we interfaced SCENIC to an existing RTS environment Google Research Football(GRF) simulator and introduced a benchmark consisting of 32 realistic scenarios, encoded in SCENIC, to train RL agents and testing their generalization capabilities.
no code implementations • 11 Jun 2021 • Wenshuo Guo, Kirthevasan Kandasamy, Joseph E Gonzalez, Michael I. Jordan, Ion Stoica
The allocations at a CE are Pareto efficient and fair.
no code implementations • 6 Jun 2021 • Brijen Thananjeyan, Kirthevasan Kandasamy, Ion Stoica, Michael I. Jordan, Ken Goldberg, Joseph E. Gonzalez
In this work, the decision-maker is given a deadline of $T$ rounds, where, on each round, it can adaptively choose which arms to pull and how many times to pull them; this distinguishes the number of decisions made (i. e., time or number of rounds) from the number of samples acquired (cost).
4 code implementations • 29 Apr 2021 • Jianfei Chen, Lianmin Zheng, Zhewei Yao, Dequan Wang, Ion Stoica, Michael W. Mahoney, Joseph E. Gonzalez
On all these tasks, ActNN compresses the activation to 2 bits on average, with negligible accuracy loss.
no code implementations • JSYS Mar_Papers 2021 • Michael Whittaker, Neil Giridharan, Adriana Szekeres, Joseph Hellerstein, Heidi Howard, Faisal Nawab, Ion Stoica
State machine replication protocols, like MultiPaxos and Raft, are at the heart of numerous distributed systems.
no code implementations • JSYS Mar_Papers 2021 • Michael Whittaker, Neil Giridharan, Adriana Szekeres, Joseph Hellerstein, Ion Stoica
MultiPaxos and Raft are the two most popular and widely deployed state machine replication protocols.
1 code implementation • 16 Feb 2021 • Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica
With this key idea, we design TeraPipe, a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models.
no code implementations • 31 Dec 2020 • Michael Whittaker, Ailidani Ailijiang, Aleksey Charapko, Murat Demirbas, Neil Giridharan, Joseph M. Hellerstein, Heidi Howard, Ion Stoica, Adriana Szekeres
In this paper, we introduce compartmentalization, the first comprehensive technique to eliminate state machine replication bottlenecks.
Distributed, Parallel, and Cluster Computing
no code implementations • 19 Dec 2020 • Romil Bhardwaj, Zhengxu Xia, Ganesh Ananthanarayanan, Junchen Jiang, Nikolaos Karianakis, Yuanchao Shu, Kevin Hsieh, Victor Bahl, Ion Stoica
Compressed models that are deployed on the edge servers for inference suffer from data drift, where the live video data diverges from the training data.
no code implementations • 15 Dec 2020 • Kirthevasan Kandasamy, Gur-Eyal Sela, Joseph E Gonzalez, Michael I Jordan, Ion Stoica
We describe mechanisms for the allocation of a scarce resource among multiple users in a way that is efficient, fair, and strategy-proof, but when users do not know their resource requirements.
1 code implementation • NeurIPS 2021 • Eric Liang, Zhanghao Wu, Michael Luo, Sven Mika, Joseph E. Gonzalez, Ion Stoica
Researchers and practitioners in the field of reinforcement learning (RL) frequently leverage parallel computation, which has led to a plethora of new algorithms and systems in the last few years.
no code implementations • 31 Oct 2020 • Brijen Thananjeyan, Kirthevasan Kandasamy, Ion Stoica, Michael I. Jordan, Ken Goldberg, Joseph E. Gonzalez
Second, we present an algorithm for a fixed deadline setting, where we are given a time deadline and need to maximize the probability of finding the best arm.
no code implementations • 15 Jul 2020 • Daniel Rothchild, Ashwinee Panda, Enayat Ullah, Nikita Ivkin, Ion Stoica, Vladimir Braverman, Joseph Gonzalez, Raman Arora
A key insight in the design of FetchSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch.
1 code implementation • ICML 2020 • Eric Liang, Zongheng Yang, Ion Stoica, Pieter Abbeel, Yan Duan, Xi Chen
In this paper, we explore a technique, variable skipping, for accelerating range density estimation over deep autoregressive models.
1 code implementation • EMNLP 2021 • Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph E. Gonzalez, Ion Stoica
Recent work learns contextual representations of source code by reconstructing tokens from their context.
Ranked #1 on Method name prediction on CodeSearchNet
1 code implementation • 15 Jun 2020 • Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, Ion Stoica
Query optimizers rely on accurate cardinality estimates to produce good execution plans.
no code implementations • 11 Jun 2020 • Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica
Ansor can find high-performance programs that are outside the search space of existing state-of-the-art approaches.
no code implementations • 27 May 2020 • Ameer Haj-Ali, Hasan Genc, Qijing Huang, William Moses, John Wawrzynek, Krste Asanović, Ion Stoica
We explore applying the Monte Carlo Tree Search (MCTS) algorithm in a notoriously difficult task: tuning programs for high-performance deep learning and image processing.
no code implementations • 19 Apr 2020 • Kirthevasan Kandasamy, Joseph E. Gonzalez, Michael. I. Jordan, Ion Stoica
To that end, we first define three notions of regret for the welfare, the individual utilities of each agent and that of the mechanism.
1 code implementation • 2 Mar 2020 • Qijing Huang, Ameer Haj-Ali, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek
We compare the performance of AutoPhase to state-of-the-art algorithms that address the phase-ordering problem.
1 code implementation • 13 Feb 2020 • Siyuan Zhuang, Zhuohan Li, Danyang Zhuo, Stephanie Wang, Eric Liang, Robert Nishihara, Philipp Moritz, Ion Stoica
Task-based distributed frameworks (e. g., Ray, Dask, Hydro) have become increasingly popular for distributed applications that contain asynchronous and dynamic workloads, including asynchronous gradient descent, reinforcement learning, and model serving.
no code implementations • 8 Jan 2020 • Richard Liaw, Romil Bhardwaj, Lisa Dunlap, Yitian Zou, Joseph Gonzalez, Ion Stoica, Alexey Tumanov
Prior research in resource scheduling for machine learning training workloads has largely focused on minimizing job completion times.
1 code implementation • 29 Dec 2019 • Roy Fox, Richard Shin, William Paul, Yitian Zou, Dawn Song, Ken Goldberg, Pieter Abbeel, Ion Stoica
Autonomous agents can learn by imitating teacher demonstrations of the intended behavior.
no code implementations • ICLR 2020 • Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, Ion Stoica
To address this, we propose a new distributed reinforcement learning algorithm, IMPACT.
5 code implementations • 22 Nov 2019 • Hasan Genc, Seah Kim, Alon Amid, Ameer Haj-Ali, Vighnesh Iyer, Pranav Prakash, Jerry Zhao, Daniel Grubb, Harrison Liew, Howard Mao, Albert Ou, Colin Schmidt, Samuel Steffl, John Wright, Ion Stoica, Jonathan Ragan-Kelley, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao
DNN accelerators are often developed and evaluated in isolation without considering the cross-stack, system-level effects in real-world environments.
no code implementations • 11 Oct 2019 • Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Jorgen Thelin, Nikhil Devanur, Ion Stoica
Model parameter synchronization across GPUs introduces high overheads for data-parallel training at scale.
2 code implementations • 7 Oct 2019 • Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez
We formalize the problem of trading-off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies.
1 code implementation • 20 Sep 2019 • Ameer Haj-Ali, Nesreen K. Ahmed, Ted Willke, Sophia Shao, Krste Asanovic, Ion Stoica
However, these models are unable to capture the data dependency, the computation graph, or the organization of instructions.
Distributed, Parallel, and Cluster Computing Performance Programming Languages
no code implementations • 4 Aug 2019 • Ameer Haj-Ali, Nesreen K. Ahmed, Ted Willke, Joseph Gonzalez, Krste Asanovic, Ion Stoica
We propose a set of essential metrics to guide future works in evaluating the efficacy of using deep reinforcement learning in system optimization.
no code implementations • 16 Jul 2019 • Wenting Zheng, Raluca Ada Popa, Joseph E. Gonzalez, Ion Stoica
Many organizations wish to collaboratively train machine learning models on their combined datasets for a common benefit (e. g., better medical research, or fraud detection).
3 code implementations • 14 May 2019 • Daniel Ho, Eric Liang, Ion Stoica, Pieter Abbeel, Xi Chen
A key challenge in leveraging data augmentation for neural network training is choosing an effective augmentation policy from a large search space of candidate operations.
Ranked #4 on Image Classification on SVHN
1 code implementation • 10 May 2019 • Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, Ion Stoica
To produce a truly usable estimator, we develop a Monte Carlo integration scheme on top of autoregressive models that can efficiently handle range queries with dozens of dimensions or more.
no code implementations • 18 Apr 2019 • Hang Zhu, Zhihao Bai, Jialin Li, Ellis Michael, Dan Ports, Ion Stoica, Xin Jin
Experimental results show that Harmonia improves the throughput of these protocols by up to 10X for a replication factor of 10, providing near-linear scalability up to the limit of our testbed.
Distributed, Parallel, and Cluster Computing
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael. I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
2 code implementations • NeurIPS 2019 • Nikita Ivkin, Daniel Rothchild, Enayat Ullah, Vladimir Braverman, Ion Stoica, Raman Arora
Large-scale distributed training of neural networks is often limited by network bandwidth, wherein the communication time overwhelms the local computation time.
no code implementations • 28 Feb 2019 • Vidit Saxena, Joseph E. Gonzalez, Ion Stoica, Hugo Tullberg, Joakim Jaldén
We model rate selection as a stochastic multi-armed bandit (MAB) problem, where a finite set of transmission rates are modeled as independent bandit arms.
no code implementations • 27 Feb 2019 • Eric Liang, Hang Zhu, Xin Jin, Ion Stoica
First, many of the existing solutions are iteratively building a decision tree by splitting nodes in the tree.
no code implementations • 9 Feb 2019 • Eric Jonas, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, Joao Carreira, Karl Krauth, Neeraja Yadwadkar, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica, David A. Patterson
Serverless cloud computing handles virtually all the system administration operations needed to make it easier for programmers to use the cloud.
Operating Systems
no code implementations • 28 Jan 2019 • Paras Jain, Xiangxi Mo, Ajay Jain, Alexey Tumanov, Joseph E. Gonzalez, Ion Stoica
Current trends in Machine Learning~(ML) inference on hardware accelerated devices (e. g., GPUs, TPUs) point to alarmingly low utilization.
1 code implementation • 15 Jan 2019 • Ameer Haj-Ali, Qijing Huang, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek
We implement a framework in the context of the LLVM compiler to optimize the ordering for HLS programs and compare the performance of deep reinforcement learning to state-of-the-art algorithms that address the phase-ordering problem.
1 code implementation • 5 Dec 2018 • Daniel Crankshaw, Gur-Eyal Sela, Corey Zumar, Xiangxi Mo, Joseph E. Gonzalez, Ion Stoica, Alexey Tumanov
The dominant cost in production machine learning workloads is not training individual models but serving predictions from increasingly complex prediction pipelines spanning multiple models, machine learning frameworks, and parallel hardware accelerators.
Distributed, Parallel, and Cluster Computing
no code implementations • 9 Aug 2018 • Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, Ion Stoica
Exhaustive enumeration of all possible join orders is often avoided, and most optimizers leverage heuristics to prune the search space.
Databases
4 code implementations • 13 Jul 2018 • Richard Liaw, Eric Liang, Robert Nishihara, Philipp Moritz, Joseph E. Gonzalez, Ion Stoica
We show that this interface meets the requirements for a broad range of hyperparameter search algorithms, allows straightforward scaling of search to large clusters, and simplifies algorithm implementation.
no code implementations • 28 Feb 2018 • Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael. I. Jordan, Joseph E. Gonzalez, Sergey Levine
By enabling wider use of learned dynamics models within a model-free reinforcement learning algorithm, we improve value estimation, which, in turn, reduces the sample complexity of learning.
no code implementations • ICLR 2018 • Roy Fox, Richard Shin, Sanjay Krishnan, Ken Goldberg, Dawn Song, Ion Stoica
Neural programs are highly accurate and structured policies that perform algorithmic tasks by controlling the behavior of a computation mechanism.
3 code implementations • ICML 2018 • Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael. I. Jordan, Ion Stoica
Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation.
4 code implementations • 16 Dec 2017 • Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael. I. Jordan, Ion Stoica
To meet the performance requirements, Ray employs a distributed scheduler and a distributed and fault-tolerant store to manage the system's control state.
no code implementations • 15 Dec 2017 • Ion Stoica, Dawn Song, Raluca Ada Popa, David Patterson, Michael W. Mahoney, Randy Katz, Anthony D. Joseph, Michael Jordan, Joseph M. Hellerstein, Joseph E. Gonzalez, Ken Goldberg, Ali Ghodsi, David Culler, Pieter Abbeel
With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production.
no code implementations • 24 Mar 2017 • Roy Fox, Sanjay Krishnan, Ion Stoica, Ken Goldberg
Augmenting an agent's control with useful higher-level behaviors called options can greatly reduce the sample complexity of reinforcement learning, but manually designing options is infeasible in high-dimensional and abstract state spaces.
2 code implementations • 11 Mar 2017 • Robert Nishihara, Philipp Moritz, Stephanie Wang, Alexey Tumanov, William Paul, Johann Schleier-Smith, Richard Liaw, Mehrdad Niknami, Michael. I. Jordan, Ion Stoica
Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making.
no code implementations • 9 Dec 2016 • Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica
In this paper, we introduce Clipper, a general-purpose low-latency prediction serving system.
no code implementations • 16 May 2016 • Anand Padmanabha Iyer, Ion Stoica, Mosharaf Chowdhury, Li Erran Li
Our choice of this domain is influenced by its commonalities with several other domains that produce real-time data, our access to a large live dataset, and their real-time nature and dimensionality which makes it a natural fit for a popular analysis technique, machine learning (ML).
1 code implementation • 19 Nov 2015 • Philipp Moritz, Robert Nishihara, Ion Stoica, Michael. I. Jordan
We introduce SparkNet, a framework for training deep networks in Spark.
no code implementations • 1 Feb 2013 • Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica
To minimize network latency and remain online during server failures and network partitions, many modern distributed data storage systems eschew transactional functionality, which provides strong semantic guarantees for groups of multiple operations over multiple data items.
Databases