Search Results for author: Henry Hoffmann

Found 10 papers, 2 papers with code

SEASONS: Signal and Energy Aware Sensing on iNtermittent Systems

no code implementations • 13 Feb 2024 • Pouya Mahdi Gholami, Henry Hoffmann

Both energy-aware, batteryless intermittent systems and signal-aware adaptive sampling algorithms (ASA) aim to maximize sensor data accuracy under energy constraints in edge devices.

CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving

1 code implementation • 11 Oct 2023 • YuHan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, YuYang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang

Compared to the recent systems that reuse the KV cache, CacheGen reduces the KV cache size by 3.7-4.3x and the total delay in fetching and processing contexts by 2.7-3.2x while having negligible impact on the LLM response quality in accuracy or perplexity.

Language Modelling Quantization

Automatic and Efficient Customization of Neural Networks for ML Applications

no code implementations • 7 Oct 2023 • YuHan Liu, Chengcheng Wan, Kuntai Du, Henry Hoffmann, Junchen Jiang, Shan Lu, Michael Maire

ML APIs have greatly relieved application developers of the burden to design and train their own neural network models -- classifying objects in an image can now be as simple as one line of Python code to call an API.

Acela: Predictable Datacenter-level Maintenance Job Scheduling

no code implementations • 10 Dec 2022 • Yi Ding, Aijia Gao, Thibaud Ryden, Kaushik Mitra, Sukumar Kalmanje, Yanai Golany, Michael Carbin, Henry Hoffmann

While it is tempting to use prior machine learning techniques for predicting job duration, we find that the structure of the maintenance job scheduling problem creates a unique challenge.

Scheduling

SCOPE: Safe Exploration for Dynamic Computer Systems Optimization

no code implementations • 22 Apr 2022 • Hyunji Kim, Ahsan Pervaiz, Henry Hoffmann, Michael Carbin, Yi Ding

Such solutions monitor past system executions to learn the system's behavior under different hardware resource allocations before dynamically tuning resources to optimize the application execution.

Safe Exploration

Cello: Efficient Computer Systems Optimization with Predictive Early Termination and Censored Regression

no code implementations • 11 Apr 2022 • Yi Ding, Alex Renda, Ahsan Pervaiz, Michael Carbin, Henry Hoffmann

Our evaluation shows that compared to the state-of-the-art SEML approach in computer systems optimization, Cello improves latency by 1.19X for minimizing latency under a power constraint, and improves energy by 1.18X for minimizing energy under a latency constraint.

regression

NURD: Negative-Unlabeled Learning for Online Datacenter Straggler Prediction

1 code implementation • 16 Mar 2022 • Yi Ding, Avinash Rao, Hyebin Song, Rebecca Willett, Henry Hoffmann

To predict stragglers accurately and early without labeled positive examples or assumptions on latency distributions, this paper presents NURD, a novel Negative-Unlabeled learning approach with Reweighting and Distribution-compensation that only trains on negative and unlabeled streaming data.

Orthogonalized SGD and Nested Architectures for Anytime Neural Networks

no code implementations • ICML 2020 • Chengcheng Wan, Henry Hoffmann, Shan Lu, Michael Maire

We propose a novel variant of SGD customized for training network architectures that support anytime behavior: such networks produce a series of increasingly accurate outputs over time.

ALERT: Accurate Learning for Energy and Timeliness

no code implementations • 31 Oct 2019 • Chengcheng Wan, Muhammad Santriaji, Eri Rogers, Henry Hoffmann, Michael Maire, Shan Lu

An increasing number of software applications incorporate runtime Deep Neural Networks (DNNs) to process sensor data and return inference results to humans.

Image Classification
