Search Results for author: Xupeng Miao

Found 23 papers, 16 papers with code

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification

3 code implementations • 16 May 2023 • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia

Our evaluation shows that SpecInfer outperforms existing LLM serving systems by 1.5-2.8x for distributed LLM inference and by 2.6-3.5x for offloading-based LLM inference, while preserving the same generative performance.

Language Modelling • Large Language Model
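
To make the title's mechanism concrete, here is a minimal Python sketch of tree-based speculative verification, assuming a draft model has already proposed a token tree; `TreeNode`, `verify_tree`, and the toy target model are illustrative stand-ins, not SpecInfer's actual API.

```python
# Illustrative sketch of tree-based speculative decoding verification
# (not the SpecInfer implementation; all names are hypothetical).
# A small model proposes a token tree; the target LLM verifies the
# tree and keeps the longest root-to-leaf prefix it agrees with.

from dataclasses import dataclass, field

@dataclass
class TreeNode:
    token: int
    children: list["TreeNode"] = field(default_factory=list)

def verify_tree(root: TreeNode, target_next_token) -> list[int]:
    """Walk the speculated tree, following the child that matches the
    target model's prediction at each step; stop at the first mismatch."""
    accepted, node, context = [], root, []
    while True:
        context.append(node.token)
        accepted.append(node.token)
        want = target_next_token(context)  # target model's choice here
        match = next((c for c in node.children if c.token == want), None)
        if match is None:
            return accepted  # speculation diverged; accept the prefix only
        node = match

# Toy target model: always predicts token + 1.
if __name__ == "__main__":
    tree = TreeNode(1, [TreeNode(2, [TreeNode(3)]), TreeNode(9)])
    print(verify_tree(tree, lambda ctx: ctx[-1] + 1))  # -> [1, 2, 3]
```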

PointCLIP: Point Cloud Understanding by CLIP

2 code implementations • CVPR 2022 • Renrui Zhang, Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li

On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D.

3D Open-Vocabulary Instance Segmentation • Few-Shot Learning • +6
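
As a rough illustration of what such an adapter could look like, the PyTorch sketch below fuses per-view CLIP features into a global feature and blends it back residually; the module layout, dimensions, and the `ratio` blending weight are assumptions for illustration, not the paper's code.

```python
# Minimal PyTorch sketch of the inter-view adapter idea: per-view CLIP
# features are fused into a global feature, which is blended back into
# the original (zero-shot) features via a residual path.

import torch
import torch.nn as nn

class InterViewAdapter(nn.Module):
    def __init__(self, num_views: int = 6, dim: int = 512, ratio: float = 0.6):
        super().__init__()
        self.fuse = nn.Sequential(            # fuse concatenated view features
            nn.Linear(num_views * dim, dim), nn.ReLU(),
            nn.Linear(dim, num_views * dim),
        )
        self.num_views, self.dim, self.ratio = num_views, dim, ratio

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, num_views, dim) from a frozen CLIP encoder
        b = view_feats.shape[0]
        fused = self.fuse(view_feats.flatten(1)).view(b, self.num_views, self.dim)
        # residual blend of adapted and original features
        blended = self.ratio * fused + (1 - self.ratio) * view_feats
        return blended.mean(dim=1)            # (batch, dim) global feature

if __name__ == "__main__":
    feats = torch.randn(2, 6, 512)            # stand-in for CLIP view features
    print(InterViewAdapter()(feats).shape)    # torch.Size([2, 512])
```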

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

2 code implementations • 25 Nov 2022 • Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui

Transformer models have achieved state-of-the-art performance across various application domains and have gradually become the foundation of advanced large deep learning (DL) models.

Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference

1 code implementation • 27 May 2023 • Zihao Yu, Haoyang Li, Fangcheng Fu, Xupeng Miao, Bin Cui

The key intuition behind our approach is to exploit the semantic mapping between minor modifications to the input text and the affected regions of the output image.

Text-to-Image Generation
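
A small NumPy sketch of that intuition, under the assumption that the text edit has already been mapped to a boolean region mask (the paper derives this mapping from semantic information; here it is hard-coded for illustration):

```python
# Illustrative cache-reuse sketch: recompute only the feature-map
# regions affected by the text edit, reusing the cached result elsewhere.

import numpy as np

def edit_with_cache(cached: np.ndarray, recompute_region, mask: np.ndarray):
    """cached: (H, W, C) feature map from the previous prompt.
    mask: (H, W) boolean map of regions affected by the text edit."""
    out = cached.copy()
    out[mask] = recompute_region(mask)        # dense compute only where needed
    return out

if __name__ == "__main__":
    H, W, C = 64, 64, 4
    cached = np.zeros((H, W, C))
    mask = np.zeros((H, W), dtype=bool)
    mask[16:32, 16:32] = True                 # region tied to the edited token
    fresh = lambda m: np.ones((m.sum(), C))   # stand-in for sparse diffusion
    out = edit_with_cache(cached, fresh, mask)
    print(out[mask].mean(), out[~mask].mean())  # 1.0 0.0
```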

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

1 code implementation • 5 Jul 2023 • Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui

Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models.

CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

1 code implementation • 28 Sep 2022 • Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzheng Ma, Xupeng Miao, Xuming He, Bin Cui

Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual representations with great transferability, achieving promising accuracy for zero-shot classification.

Training-free 3D Point Cloud Classification • Transfer Learning • +1
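
The sketch below illustrates what parameter-free cross-modal attention can look like in NumPy: visual and textual features attend to each other through plain matrix products and softmax, with no trainable weights. The exact scaling and update rule are assumptions for illustration, not CALIP's precise formulation.

```python
# Parameter-free cross-modal attention sketch: no learned projections,
# just similarity, softmax, and residual updates on both modalities.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def parameter_free_attention(visual, text, scale=None):
    """visual: (num_patches, dim) spatial features; text: (num_classes, dim)."""
    scale = scale or visual.shape[-1] ** -0.5
    attn = softmax(visual @ text.T * scale, axis=-1)      # (patches, classes)
    visual_updated = visual + attn @ text                 # text-aware visual feats
    text_updated = text + softmax(text @ visual.T * scale, -1) @ visual
    return visual_updated, text_updated

if __name__ == "__main__":
    v, t = np.random.randn(49, 512), np.random.randn(10, 512)
    v2, t2 = parameter_free_attention(v, t)
    print(v2.shape, t2.shape)  # (49, 512) (10, 512)
```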

SpotServe: Serving Generative Large Language Models on Preemptible Instances

1 code implementation • 27 Nov 2023 • Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, Zhihao Jia

This paper aims to reduce the monetary cost of serving LLMs by leveraging preemptible GPU instances on modern clouds, which offer access to spare GPUs at a much cheaper price than regular instances but may be preempted by the cloud at any time.

Graph Matching
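
The Graph Matching tag hints at the kind of decision such a system faces on a preemption. As a hedged sketch, the snippet below uses SciPy's assignment solver to reassign model shards to surviving instances while minimizing a made-up migration-cost matrix; the cost model and function names are illustrative assumptions, not SpotServe's algorithm.

```python
# Toy migration planner: match shards to surviving instances so the
# total amount of state moved over the network is minimized.

import numpy as np
from scipy.optimize import linear_sum_assignment

def plan_migration(move_cost: np.ndarray):
    """move_cost[i, j]: bytes to move if shard i lands on instance j
    (zero when the instance already holds that shard's state)."""
    shards, instances = linear_sum_assignment(move_cost)
    plan = list(zip(shards.tolist(), instances.tolist()))
    return plan, move_cost[shards, instances].sum()

if __name__ == "__main__":
    # 3 shards, 3 surviving instances; instance k already caches shard k.
    cost = np.full((3, 3), 10.0)
    np.fill_diagonal(cost, 0.0)
    plan, total = plan_migration(cost)
    print(plan, total)  # [(0, 0), (1, 1), (2, 2)] 0.0
```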

Model-enhanced Vector Index

1 code implementation • NeurIPS 2023 • Hailin Zhang, Yujing Wang, Qi Chen, Ruiheng Chang, Ting Zhang, Ziming Miao, Yingyan Hou, Yang Ding, Xupeng Miao, Haonan Wang, Bochen Pang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Xing Xie, Mao Yang, Bin Cui

We empirically show that our model achieves better performance on the commonly used academic benchmarks MSMARCO Passage and Natural Questions, with comparable serving latency to dense retrieval solutions.

Natural Questions • Quantization • +1

Generative Dense Retrieval: Memory Can Be a Burden

1 code implementation • 19 Jan 2024 • Peiwen Yuan, Xinglin Wang, Shaoxiong Feng, Boyuan Pan, Yiwei Li, HeDa Wang, Xupeng Miao, Kan Li

A memorizing-free matching mechanism from Dense Retrieval (DR) is then introduced to conduct fine-grained intra-cluster matching from clusters to relevant documents.

Retrieval
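
A toy sketch of the coarse-to-fine flow, assuming a generative step has already selected candidate clusters; the data and the `retrieve` helper are illustrative, not the paper's interface:

```python
# Coarse-to-fine retrieval sketch: restrict dense matching to the
# documents inside the clusters picked by the generative step.

import numpy as np

def retrieve(query_emb, doc_embs, doc_cluster, selected_clusters, k=3):
    """doc_embs: (num_docs, dim); doc_cluster: (num_docs,) cluster ids."""
    candidates = np.where(np.isin(doc_cluster, selected_clusters))[0]
    scores = doc_embs[candidates] @ query_emb      # intra-cluster matching
    top = candidates[np.argsort(-scores)[:k]]
    return top.tolist()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = rng.standard_normal((100, 64))
    clusters = rng.integers(0, 8, size=100)
    q = docs[7]                                    # query close to doc 7
    print(retrieve(q, docs, clusters, [clusters[7]]))
```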

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models

1 code implementation • 13 Jan 2024 • Zhengxin Zhang, Dan Zhao, Xupeng Miao, Gabriele Oliaro, Qing Li, Yong Jiang, Zhihao Jia

Experiments show that QST can reduce the total memory footprint by up to 2.3$\times$ and speed up the finetuning process by up to 3$\times$ while achieving competitive performance compared with the state-of-the-art.
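
The gist of side tuning can be sketched in a few lines of PyTorch: the backbone (quantized in QST) is frozen, and gradients flow only through a small side network, which keeps tuning memory low. The module layout below is an illustrative assumption, not the paper's architecture.

```python
# Side tuning sketch over a frozen backbone: only the small side
# network receives gradients during finetuning.

import torch
import torch.nn as nn

class SideTunedModel(nn.Module):
    def __init__(self, backbone: nn.Module, dim: int, side_dim: int = 32):
        super().__init__()
        self.backbone = backbone.requires_grad_(False)  # frozen; quantized in QST
        self.down = nn.Linear(dim, side_dim)            # tap backbone activations
        self.side = nn.Linear(side_dim, side_dim)
        self.up = nn.Linear(side_dim, dim)

    def forward(self, x):
        with torch.no_grad():                           # no backbone gradients
            h = self.backbone(x)
        return h + self.up(torch.relu(self.side(self.down(h))))

if __name__ == "__main__":
    model = SideTunedModel(nn.Linear(128, 128), dim=128)
    loss = model(torch.randn(4, 128)).sum()
    loss.backward()                                     # touches side params only
    print(any(p.grad is not None for p in model.backbone.parameters()))  # False
```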

ROD: Reception-aware Online Distillation for Sparse Graphs

1 code implementation • 25 Jul 2021 • Wentao Zhang, Yuezihan Jiang, Yang Li, Zeang Sheng, Yu Shen, Xupeng Miao, Liang Wang, Zhi Yang, Bin Cui

Unfortunately, many real-world networks are sparse in terms of both edges and labels, leading to sub-optimal performance of GNNs.

Clustering • Graph Learning • +5

Experimental Analysis of Large-scale Learnable Vector Storage Compression

1 code implementation • 27 Nov 2023 • Hailin Zhang, Penghao Zhao, Xupeng Miao, Yingxia Shao, Zirui Liu, Tong Yang, Bin Cui

Learnable embedding vectors are one of the most important applications in machine learning and are widely used in various database-related domains.

Benchmarking

DeGNN: Characterizing and Improving Graph Neural Networks with Graph Decomposition

no code implementations • 10 Oct 2019 • Xupeng Miao, Nezihe Merve Gürel, Wentao Zhang, Zhichao Han, Bo Li, Wei Min, Xi Rao, Hansheng Ren, Yinan Shan, Yingxia Shao, Yujie Wang, Fan Wu, Hui Xue, Yaming Yang, Zitao Zhang, Yang Zhao, Shuai Zhang, Yujing Wang, Bin Cui, Ce Zhang

Despite the wide application of Graph Convolutional Networks (GCNs), one major limitation is that they do not benefit from increasing depth and suffer from the oversmoothing problem.

Memory-aware framework for fast and scalable second-order random walk over billion-edge natural graphs

no code implementations • The VLDB Journal 2021 • Yingxia Shao, Shiyue Huang, Yawen Li, Xupeng Miao, Bin Cui, Lei Chen

In this paper, to clearly compare the efficiency of various node sampling methods, we first design a cost model and propose two new node sampling methods: one follows the acceptance-rejection paradigm to achieve a better balance between memory and time cost, and the other is optimized for fast sampling of the skewed probability distributions found in natural graphs.

Community Detection • Graph Embedding
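
The acceptance-rejection idea can be sketched for a node2vec-style second-order walk: draw a neighbor cheaply, then accept it with probability proportional to its second-order bias, trading a little sampling time for not materializing per-edge transition tables. The graph and the `p`, `q` parameters below are toy assumptions.

```python
# Acceptance-rejection sampling for a second-order random walk step.
# Bias: 1/p to return to the previous node, 1 for common neighbors of
# prev and cur, 1/q to move farther away (node2vec-style).

import random

def second_order_step(graph, prev, cur, p=4.0, q=0.25):
    """graph: dict node -> set of neighbors."""
    max_bias = max(1.0, 1.0 / p, 1.0 / q)
    neighbors = list(graph[cur])
    while True:
        nxt = random.choice(neighbors)          # cheap uniform proposal
        if nxt == prev:
            bias = 1.0 / p
        elif nxt in graph[prev]:
            bias = 1.0
        else:
            bias = 1.0 / q
        if random.random() < bias / max_bias:   # accept or retry
            return nxt

if __name__ == "__main__":
    g = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
    walk = [0, 1]
    for _ in range(5):
        walk.append(second_order_step(g, walk[-2], walk[-1]))
    print(walk)
```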

ZOOMER: Boosting Retrieval on Web-scale Graphs by Regions of Interest

1 code implementation • 20 Mar 2022 • Yuezihan Jiang, Yu Cheng, Hanyu Zhao, Wentao Zhang, Xupeng Miao, Yu He, Liang Wang, Zhi Yang, Bin Cui

We introduce ZOOMER, a system deployed at Taobao, the largest e-commerce platform in China, for training and serving GNN-based recommendations over web-scale graphs.

Retrieval

Towards Communication-efficient Vertical Federated Learning Training via Cache-enabled Local Updates

no code implementations • 29 Jul 2022 • Fangcheng Fu, Xupeng Miao, Jiawei Jiang, Huanran Xue, Bin Cui

Vertical federated learning (VFL) is an emerging paradigm that allows different parties (e.g., organizations or enterprises) to collaboratively build machine learning models with privacy protection.

Vertical Federated Learning

Distributed Graph Neural Network Training: A Survey

no code implementations • 1 Nov 2022 • Yingxia Shao, Hongzheng Li, Xizhi Gu, Hongbo Yin, Yawen Li, Xupeng Miao, Wentao Zhang, Bin Cui, Lei Chen

In recent years, many efforts have been made on distributed GNN training, and an array of training algorithms and systems have been proposed.

Distributed Computing

Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

no code implementations • 6 Mar 2023 • Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, Bin Cui

Recent years have witnessed the unprecedented achievements of large-scale pre-trained models, especially the Transformer models.

Management • Scheduling

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

no code implementations • 8 Apr 2023 • Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui

We first present an empirical analysis of the problems and opportunities of training MoE models, which motivates us to overcome the routing imbalance and fluctuation problems with a dynamic expert management and device placement mechanism.

Scheduling
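
As a toy illustration of dynamic expert placement, the sketch below greedily moves experts from the most loaded device to the least loaded one based on routing counts; the policy and numbers are illustrative assumptions, not FlexMoE's algorithm.

```python
# Greedy expert rebalancing sketch: given per-expert routing load,
# shift the lightest expert off the hottest device while that still
# narrows the load gap between devices.

def rebalance(placement: dict[int, list[int]], load: dict[int, float]):
    """placement: device -> experts on it; load: expert -> tokens routed."""
    def device_load(d):
        return sum(load[e] for e in placement[d])
    while True:
        hot = max(placement, key=device_load)
        cold = min(placement, key=device_load)
        if len(placement[hot]) <= 1:
            return placement
        gap = device_load(hot) - device_load(cold)
        e = min(placement[hot], key=lambda x: load[x])  # lightest expert on hot
        if load[e] >= gap:        # moving it would not reduce the imbalance
            return placement
        placement[hot].remove(e)
        placement[cold].append(e)

if __name__ == "__main__":
    placement = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
    load = {0: 90, 1: 80, 2: 5, 3: 5, 4: 1, 5: 1, 6: 1, 7: 1}
    print(rebalance(placement, load))   # roughly balanced device loads
```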

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

no code implementations • 23 Dec 2023 • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia

In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data.

Language Modelling • Large Language Model
