Search Results for author: Binhang Yuan

Found 19 papers, 10 papers with code

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

no code implementations • 5 Apr 2024 • Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Binhang Yuan, Wenhu Chen, Jie Fu, Ge Zhang

In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs.

Language Modelling Large Language Model

DeFT: Flash Tree-attention with IO-Awareness for Efficient Tree-search-based LLM Inference

no code implementations • 30 Mar 2024 • Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin

Decoding using tree search can greatly enhance the inference quality for transformer-based Large Language Models (LLMs).

Exploring the Robustness of Decentralized Training for Large Language Models

no code implementations • 1 Dec 2023 • Lin Lu, Chenxi Dai, Wangcheng Tao, Binhang Yuan, Yanan Sun, Pan Zhou

Decentralized training of large language models has emerged as an effective way to democratize this technology.

Federated Learning

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

1 code implementation • 26 Oct 2023 • Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Ré, Beidi Chen

We show that contextual sparsity exists, that it can be accurately predicted, and that we can exploit it to speed up LLM inference in wall-clock time without compromising the LLM's quality or in-context learning ability.

In-Context Learning
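
The contextual-sparsity result above lends itself to a compact illustration. The following is a minimal, hypothetical sketch (module shapes, the cheap predictor, and k are my assumptions, not the paper's released code) of an MLP block that uses a small predictor to select the top-k hidden units per input and computes only those:

    import torch
    import torch.nn as nn

    class ContextuallySparseMLP(nn.Module):
        """Toy MLP that activates only a predicted subset of hidden units."""
        def __init__(self, d_model=512, d_ff=2048, k=256):
            super().__init__()
            self.up = nn.Linear(d_model, d_ff)
            self.down = nn.Linear(d_ff, d_model)
            # Cheap predictor that guesses which hidden units will fire.
            self.predictor = nn.Sequential(
                nn.Linear(d_model, 64), nn.ReLU(), nn.Linear(64, d_ff))
            self.k = k

        def forward(self, x):                        # x: (batch, d_model)
            idx = self.predictor(x).topk(self.k, dim=-1).indices  # (batch, k)
            w_up, b_up = self.up.weight[idx], self.up.bias[idx]   # gather rows
            h = torch.relu(torch.einsum("bkd,bd->bk", w_up, x) + b_up)
            w_down = self.down.weight[:, idx]        # (d_model, batch, k)
            return torch.einsum("dbk,bk->bd", w_down, h) + self.down.bias

    out = ContextuallySparseMLP()(torch.randn(4, 512))
    print(out.shape)  # torch.Size([4, 512]) at ~12.5% of the dense FLOPs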

Serving Deep Learning Model in Relational Databases

no code implementations • 7 Oct 2023 • Alexandre Eichenberger, Qi Lin, Saif Masood, Hong Min, Alexander Sim, Jie Wang, Yida Wang, Kesheng Wu, Binhang Yuan, Lixi Zhou, Jia Zou

Serving deep learning (DL) models on relational data has become a critical requirement across diverse commercial and scientific domains, sparking growing interest recently.

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

1 code implementation • 13 Mar 2023 • Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

As a result, when running OPT-175B on a single 16GB GPU, FlexGen achieves significantly higher throughput compared to state-of-the-art offloading systems, reaching a generation throughput of 1 token/s for the first time with an effective batch size of 144.

Language Modelling Large Language Model
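
The 1 token/s figure follows from a simple I/O argument: when weights are streamed from CPU or disk at every decoding step, the step cost is roughly constant in batch size, so a large effective batch amortizes the weight traffic. A back-of-the-envelope sketch (the bandwidth number is an assumption, not FlexGen's measurement):

    # Toy throughput model for weight offloading; all numbers are assumptions.
    weights_gb = 350            # ~OPT-175B in fp16 (175B params x 2 bytes)
    io_gbps = 2.0               # assumed effective CPU/disk-to-GPU bandwidth
    step_seconds = weights_gb / io_gbps  # stream all weights once per step

    for batch in (1, 16, 144):
        # Every sequence in the batch advances by one token per step.
        print(f"batch={batch:3d}: {batch / step_seconds:.3f} tokens/s")
    # batch=144 gives ~0.8 tokens/s: batching amortizes the weight traffic.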

Stochastic Gradient Descent without Full Data Shuffle

1 code implementation • 12 Jun 2022 • Lijie Xu, Shuang Qiu, Binhang Yuan, Jiawei Jiang, Cedric Renggli, Shaoduo Gan, Kaan Kara, Guoliang Li, Ji Liu, Wentao Wu, Jieping Ye, Ce Zhang

In this paper, we first conduct a systematic empirical study of existing data shuffling strategies, which reveals that all existing strategies have room for improvement: each suffers in either I/O performance or convergence rate.

Computational Efficiency
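
One way to picture the I/O-versus-convergence tradeoff studied above is a two-level block shuffle: read blocks sequentially (good I/O) while randomizing both the block order and the tuples inside a small in-memory buffer (better convergence). The sketch below is a hypothetical member of that family, not the paper's exact algorithm:

    import random

    def block_shuffle_stream(blocks, buffer_blocks=2, seed=0):
        """blocks: list of lists of examples stored contiguously on disk."""
        rng = random.Random(seed)
        order = list(range(len(blocks)))
        rng.shuffle(order)            # level 1: random block order
        buf = []
        for b in order:
            buf.extend(blocks[b])     # sequential read within each block
            if len(buf) >= buffer_blocks * len(blocks[0]):
                rng.shuffle(buf)      # level 2: shuffle inside the buffer
                yield from buf
                buf = []
        rng.shuffle(buf)
        yield from buf

    # Example: 4 blocks of 4 examples each.
    data = [[i * 4 + j for j in range(4)] for i in range(4)]
    print(list(block_shuffle_stream(data)))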

Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees

1 code implementation • 2 Jun 2022 • Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Ré, Ce Zhang

Communication compression is a crucial technique for modern distributed learning systems to alleviate their communication bottlenecks over slower networks.
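
As a rough illustration of what activation compression looks like, the toy below quantizes activations to int8 before they cross the slow link. This is a generic sketch; the paper's method compresses activation *changes* and comes with convergence guarantees, which this toy does not implement:

    import torch

    def quantize_int8(x: torch.Tensor):
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
        return q, scale               # ~4x fewer bytes on the wire than fp32

    def dequantize(q, scale):
        return q.to(torch.float32) * scale

    acts = torch.randn(8, 1024)       # activations at a pipeline boundary
    q, s = quantize_int8(acts)
    print((dequantize(q, s) - acts).abs().max())  # small reconstruction error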

Decentralized Training of Foundation Models in Heterogeneous Environments

1 code implementation • 2 Jun 2022 • Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Ré, Ce Zhang

Our key technical contribution is a scheduling algorithm that allocates different computational "tasklets" in the training of foundation models to a group of decentralized GPU devices connected by a slow heterogeneous network.

Scheduling
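
A toy version of that placement problem: put adjacent pipeline "tasklets" on well-connected device pairs so the slowest inter-stage link is as fast as possible. The bandwidths and the brute-force search below are illustrative assumptions; the paper's scheduler uses a real cost model and does not enumerate placements:

    import itertools

    bandwidth = {("a", "b"): 10.0, ("b", "c"): 5.0, ("a", "c"): 1.0}  # GB/s
    def bw(u, v):
        return bandwidth.get((u, v)) or bandwidth.get((v, u))

    devices = ["a", "b", "c"]         # one pipeline stage per device
    best = max(itertools.permutations(devices),
               key=lambda order: min(bw(order[i], order[i + 1])
                                     for i in range(len(order) - 1)))
    print(best)  # ('a', 'b', 'c'): maximizes the slowest inter-stage link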

Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters

1 code implementation • 10 Nov 2021 • Xiangru Lian, Binhang Yuan, XueFeng Zhu, Yulong Wang, Yongjun He, Honghuan Wu, Lei Sun, Haodong Lyu, Chengjun Liu, Xing Dong, Yiqiao Liao, Mingnan Luo, Congfei Zhang, Jingru Xie, Haonan Li, Lei Chen, Renjie Huang, Jianying Lin, Chengchun Shu, Xuezhong Qiu, Zhishan Liu, Dongying Kong, Lei Yuan, Hai Yu, Sen Yang, Ce Zhang, Ji Liu

Specifically, to ensure both training efficiency and training accuracy, we design a novel hybrid training algorithm in which the embedding layer and the dense neural network are handled by different synchronization mechanisms; we then build a system called Persia (short for parallel recommendation training system with hybrid acceleration) to support this hybrid training algorithm.

Recommendation Systems
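
The hybrid split can be shown structurally in a few lines: the huge embedding table takes cheap, barrier-free updates, while the small dense network stays synchronous. The single-process toy below only marks where Persia's async/sync boundary would sit; sizes and optimizers are assumptions:

    import torch
    import torch.nn as nn

    embedding = nn.EmbeddingBag(10_000, 16, sparse=True)  # stand-in for the huge table
    dense = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
    emb_opt = torch.optim.SGD(embedding.parameters(), lr=0.1)  # async side
    dense_opt = torch.optim.SGD(dense.parameters(), lr=0.01)   # sync side

    def train_step(ids, offsets, y):
        pred = dense(embedding(ids, offsets)).squeeze(-1)
        loss = nn.functional.mse_loss(pred, y)
        emb_opt.zero_grad(); dense_opt.zero_grad()
        loss.backward()
        emb_opt.step()    # in Persia: applied immediately, no global barrier
        # in Persia: dense gradients would be all-reduced across workers here
        dense_opt.step()
        return loss.item()

    ids = torch.randint(0, 10_000, (64,))
    offsets = torch.arange(0, 64, 4)  # 16 examples of 4 ids each
    print(train_step(ids, offsets, torch.randn(16)))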

Tensor Relational Algebra for Machine Learning System Design

no code implementations • 1 Sep 2020 • Binhang Yuan, Dimitrije Jankov, Jia Zou, Yuxin Tang, Daniel Bourgeois, Chris Jermaine

This implementation abstraction provides little built-in support for ML systems to scale past a single machine, or for handling large models with matrices or tensors that do not easily fit into the RAM of an ASIC.

BIG-bench Machine Learning
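
The relational framing is easy to demonstrate in miniature: store each matrix as a relation of indexed chunks, and matrix multiply becomes a join on the shared index followed by an aggregation. A hypothetical toy (the block size and dict representation are my assumptions):

    import numpy as np
    from collections import defaultdict

    def to_relation(M, bs):
        """Matrix -> relation mapping (block_row, block_col) to a chunk."""
        return {(i, j): M[i*bs:(i+1)*bs, j*bs:(j+1)*bs]
                for i in range(M.shape[0] // bs)
                for j in range(M.shape[1] // bs)}

    def rel_matmul(A, B):
        out = defaultdict(float)
        for (i, k), a in A.items():        # join on the shared index k ...
            for (k2, j), b in B.items():
                if k == k2:
                    out[(i, j)] = out[(i, j)] + a @ b  # ... then aggregate
        return out

    X, Y = np.random.rand(4, 4), np.random.rand(4, 4)
    R = rel_matmul(to_relation(X, 2), to_relation(Y, 2))
    print(np.allclose(np.block([[R[0, 0], R[0, 1]],
                                [R[1, 0], R[1, 1]]]), X @ Y))  # True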

A Federated Learning Framework for Healthcare IoT devices

no code implementations • 7 May 2020 • Binhang Yuan, Song Ge, Wenhui Xing

The Internet of Things (IoT) revolution has shown potential to give rise to many medical applications with access to large volumes of healthcare data collected by IoT devices.

Federated Learning

Distributed Learning of Deep Neural Networks using Independent Subnet Training

2 code implementations • 4 Oct 2019 • Binhang Yuan, Cameron R. Wolfe, Chen Dun, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine

These properties allow IST to cope with distributed data, slow interconnects, and limited device memory, making it well suited to settings where distribution is mandatory.

BIG-bench Machine Learning Image Classification +2
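
In outline, one IST round partitions the hidden units across workers, lets each worker train its slice locally with no gradient exchange, and then reassembles the network before repartitioning. The numpy toy below is a hypothetical single-layer illustration (the partitioning scheme and output-layer-only update are simplifications):

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hidden, workers = 8, 12, 3
    W1 = rng.standard_normal((d_in, d_hidden)) * 0.1
    W2 = rng.standard_normal((d_hidden, 1)) * 0.1

    # One repartition round: disjoint neuron sets, one per worker.
    for p in np.array_split(rng.permutation(d_hidden), workers):
        w1_local, w2_local = W1[:, p], W2[p]   # worker stores only its slice
        x, y = rng.standard_normal((4, d_in)), rng.standard_normal((4, 1))
        h = np.maximum(x @ w1_local, 0.0)      # local forward pass (ReLU)
        err = h @ w2_local - y
        w2_local -= 0.01 * h.T @ err / 4       # local SGD step, no comms
        W1[:, p], W2[p] = w1_local, w2_local   # reassemble the full network
    print(W2.ravel())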

Diagnosing Cardiac Abnormalities from 12-Lead Electrocardiograms Using Enhanced Deep Convolutional Neural Networks

no code implementations • 15 Aug 2019 • Binhang Yuan, Wenhui Xing

We train an enhanced deep convolutional neural network to identify eight cardiac abnormalities from standard 12-lead electrocardiograms (ECGs), using a dataset of 14,000 ECGs.
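
A 12-lead ECG is naturally a 12-channel 1-D signal, so a model of this kind typically pairs 1-D convolutions with one output per abnormality. The sketch below is purely illustrative; layer sizes, the sampling rate, and the multi-label head are assumptions, not the paper's architecture:

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv1d(12, 32, kernel_size=15, stride=2), nn.ReLU(),
        nn.Conv1d(32, 64, kernel_size=15, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        nn.Linear(64, 8),                  # one logit per abnormality
    )
    ecg = torch.randn(2, 12, 5000)         # two 10 s recordings at 500 Hz
    print(torch.sigmoid(model(ecg)).shape) # (2, 8) per-class probabilities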
