Search Results for author: Dacheng Li

Found 9 papers, 7 papers with code

Fairness in Serving Large Language Models

1 code implementation • 31 Dec 2023 • Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica

High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading.

Fairness · Scheduling

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

1 code implementation • 6 Nov 2023 • Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica

To capitalize on these opportunities, we present S-LoRA, a system designed for the scalable serving of many LoRA adapters.

DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training

1 code implementation • 5 Oct 2023 • Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang

FlashAttention (Dao, 2023) effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU.

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

5 code implementations • NeurIPS 2023 • Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences.

Chatbot · Language Modelling +2

Does compressing activations help model parallel training?

no code implementations • 6 Jan 2023 • Song Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman

Finally, we provide insights for future development of model parallelism compression algorithms.

Quantization

MPCFormer: fast, performant and private Transformer inference with MPC

1 code implementation • 2 Nov 2022 • Dacheng Li, Rulin Shao, Hongyi Wang, Han Guo, Eric P. Xing, Hao Zhang

Through extensive evaluations, we show that MPCFormer significantly speeds up Transformer inference in MPC settings while achieving similar ML performance to the input model.

Knowledge Distillation

AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness

1 code implementation • 13 Oct 2022 • Dacheng Li, Hongyi Wang, Eric Xing, Hao Zhang

Scaling up model sizes can lead to fundamentally new capabilities in many machine learning (ML) tasks.


Dual Contradistinctive Generative Autoencoder

no code implementations • CVPR 2021 • Gaurav Parmar, Dacheng Li, Kwonjoon Lee, Zhuowen Tu

Our model, named dual contradistinctive generative autoencoder (DC-VAE), integrates an instance-level discriminative loss (maintaining the instance-level fidelity for the reconstruction/synthesis) with a set-level adversarial loss (encouraging the set-level fidelity for the reconstruction/synthesis), both being contradistinctive.

Image Generation · Image Reconstruction +1
