Search Results for author: Shaoduo Gan

Found 10 papers, 5 papers with code

SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget

1 code implementation · 7 Apr 2024 · ZiHao Wang, Bin Cui, Shaoduo Gan

In this work, we found that by identifying the importance of attention layers, we could optimize the KV-cache jointly from two dimensions, i.e., sequence-wise and layer-wise.

Language Modelling · Large Language Model +1
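For intuition, a minimal sketch of the two-dimensional idea described in the abstract above: give each layer a KV-cache budget in proportion to a per-layer importance score, then prune each layer's cache sequence-wise down to its budget. The function names, the scoring, and the pruning rule here are illustrative assumptions, not the paper's exact algorithm.

```python
import torch

def allocate_layer_budgets(layer_importance, total_budget):
    """Split a total KV-cache token budget across layers in proportion
    to a per-layer importance score (hypothetical scoring)."""
    weights = torch.tensor(layer_importance, dtype=torch.float)
    weights = weights / weights.sum()
    return (weights * total_budget).round().int().tolist()

def prune_kv_cache(keys, values, attn_scores, budget):
    """Sequence-wise pruning for one layer: keep only the `budget` tokens
    with the highest cumulative attention score."""
    if keys.shape[0] <= budget:
        return keys, values
    keep = attn_scores.topk(budget).indices.sort().values
    return keys[keep], values[keep]

# Example: 4 layers sharing a total budget of 1024 cached tokens.
budgets = allocate_layer_budgets([0.9, 0.5, 0.3, 0.2], total_budget=1024)
keys, values = torch.randn(2000, 64), torch.randn(2000, 64)
scores = torch.rand(2000)
k, v = prune_kv_cache(keys, values, scores, budgets[0])
print(budgets, k.shape)
```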

Stochastic Gradient Descent without Full Data Shuffle

1 code implementation · 12 Jun 2022 · Lijie Xu, Shuang Qiu, Binhang Yuan, Jiawei Jiang, Cedric Renggli, Shaoduo Gan, Kaan Kara, Guoliang Li, Ji Liu, Wentao Wu, Jieping Ye, Ce Zhang

In this paper, we first conduct a systematic empirical study of existing data shuffling strategies, which reveals that all of them have room for improvement: each suffers in either I/O performance or convergence rate.

Computational Efficiency
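As a rough illustration of the trade-off named in the abstract above, here is a sketch of a two-level partial shuffle: randomize the order of on-disk blocks (cheap, mostly sequential I/O), then shuffle tuples within a small in-memory buffer. All names and parameters are illustrative assumptions, not the paper's exact method.

```python
import random

def block_shuffle_iterator(dataset, block_size=1024, buffer_blocks=4, seed=0):
    """Two-level partial shuffle: shuffle block order, then shuffle tuples
    inside a small in-memory buffer. Illustrative sketch only."""
    rng = random.Random(seed)
    blocks = [dataset[i:i + block_size] for i in range(0, len(dataset), block_size)]
    rng.shuffle(blocks)                          # block-level shuffle
    buffer = []
    for block in blocks:
        buffer.extend(block)
        if len(buffer) >= buffer_blocks * block_size:
            rng.shuffle(buffer)                  # tuple-level shuffle within buffer
            yield from buffer
            buffer = []
    rng.shuffle(buffer)                          # flush the tail
    yield from buffer

# Usage: feed examples to SGD without ever shuffling the full dataset on disk.
for example in block_shuffle_iterator(list(range(10_000))):
    pass  # train_step(example)
```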

Towards Demystifying Serverless Machine Learning Training

1 code implementation · 17 May 2021 · Jiawei Jiang, Shaoduo Gan, Yue Liu, Fanlin Wang, Gustavo Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, Ce Zhang

The appeal of serverless (FaaS) has triggered growing interest in how to use it in data-intensive applications such as ETL, query processing, or machine learning (ML).

BIG-bench Machine Learning

1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed

2 code implementations · 4 Feb 2021 · Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He

One of the most effective methods is error-compensated compression, which offers robust convergence speed even under 1-bit compression.
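A minimal sketch of the error-compensated 1-bit compression pattern mentioned above: compress each tensor to its sign plus one scale, and carry the compression error forward into the next step. This follows the generic error-feedback recipe, not the specific 1-bit Adam implementation.

```python
import torch

class OneBitCompressor:
    """Error-compensated 1-bit (sign) compression; an illustrative sketch."""
    def __init__(self):
        self.error = None

    def compress(self, tensor):
        if self.error is None:
            self.error = torch.zeros_like(tensor)
        corrected = tensor + self.error          # add residual from the last step
        scale = corrected.abs().mean()           # one scalar per tensor
        compressed = scale * corrected.sign()    # 1 bit per element plus the scale
        self.error = corrected - compressed      # remember what was lost
        return compressed

# Usage: compress a gradient before communication, then apply it as usual.
comp = OneBitCompressor()
grad = torch.randn(8)
print(comp.compress(grad))
```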

Scaling Unsupervised Domain Adaptation through Optimal Collaborator Selection and Lazy Discriminator Synchronization

no code implementations · 1 Jan 2021 · Akhil Mathur, Shaoduo Gan, Anton Isopoussu, Fahim Kawsar, Nadia Berthouze, Nicholas Donald Lane

Breakthroughs in unsupervised domain adaptation (uDA) have opened up the possibility of adapting models from a label-rich source domain to unlabeled target domains.

Privacy Preserving Unsupervised Domain Adaptation

APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm

no code implementations · 26 Aug 2020 · Hanlin Tang, Shaoduo Gan, Samyam Rajbhandari, Xiangru Lian, Ji Liu, Yuxiong He, Ce Zhang

Adam is an important optimization algorithm for efficient and accurate training of major workloads such as BERT and ImageNet models.

Multi-Step Decentralized Domain Adaptation

no code implementations · 25 Sep 2019 · Akhil Mathur, Shaoduo Gan, Anton Isopoussu, Fahim Kawsar, Nadia Berthouze, Nicholas D. Lane

Despite the recent breakthroughs in unsupervised domain adaptation (uDA), no prior work has studied the challenges of applying these methods in practical machine learning scenarios.

Privacy Preserving Unsupervised Domain Adaptation

Communication Compression for Decentralized Training

no code implementations · NeurIPS 2018 · Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, Ji Liu

In this paper, we explore a natural question: can the combination of both techniques lead to a system that is robust to both bandwidth and latency?
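As a sketch of what combining the two techniques can look like, here is one illustrative worker update that mixes a local SGD step with gossip averaging of compressed (sign-plus-scale) neighbor differences; this is an assumption-laden toy, not the algorithm analyzed in the paper.

```python
import torch

def decentralized_compressed_step(params, neighbor_params, lr, grad):
    """One worker update: local SGD step, then average in compressed
    differences received from neighbors. Illustrative sketch only."""
    # Local SGD update.
    params = params - lr * grad
    # Exchange compressed differences with neighbors and average them in.
    for p_nb in neighbor_params:
        diff = p_nb - params
        compressed_diff = diff.abs().mean() * diff.sign()   # low-bandwidth message
        params = params + compressed_diff / (len(neighbor_params) + 1)
    return params

# Usage with two neighbors on a ring topology (toy tensors).
w = torch.zeros(4)
print(decentralized_compressed_step(w, [torch.ones(4), -torch.ones(4)],
                                    lr=0.1, grad=torch.randn(4)))
```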
