Search Results for author: Yuzhang Shang

Found 35 papers, 21 papers with code

Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis

no code implementations • 18 Feb 2025 • Jiaqi Zhao, Ming Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, YaoWei Wang, Min Zhang, Liqiang Nie

Then, we conduct extensive experiments with baselines from each class, covering models of various sizes (7B-70B), bitwidths, training levels (LLaMA 1/2/3/3.1), architectures (Mixtral, DeepSeekMoE, and Mamba), and modalities (LLaVA-1.5 and VILA-1.5) on a wide range of evaluation metrics. Through comparative analysis of the results, we summarize the strengths of each PTQ strategy and the model-size-bitwidth trade-offs with respect to performance.

Benchmarking Mamba +1

PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models

1 code implementation • 18 Feb 2025 • Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, YaoWei Wang, Min Zhang

To explore the real limit of PTQ, we propose an extremely low-bit PTQ method called PTQ1.61, which enables weight quantization to 1.61-bit for the first time.

Binarization Quantization
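
PTQ1.61's actual scheme is not detailed in this snippet; purely as a hypothetical illustration of how a fractional average bitwidth such as 1.61 bits can arise, the sketch below binarizes some weight columns and keeps the rest at 2 bits (the column-norm salience rule and the 1-bit/2-bit split are assumptions, not the paper's method).

```python
import torch

def mixed_bit_quantize(w: torch.Tensor, frac_2bit: float = 0.61) -> torch.Tensor:
    """Illustrative only: quantize a (rows, cols) weight matrix so that
    frac_2bit of columns get 2 bits and the rest get 1 bit, giving an
    average bitwidth of 1*(1-frac_2bit) + 2*frac_2bit (1.61 for 0.61)."""
    n_cols = w.shape[1]
    n_salient = int(round(frac_2bit * n_cols))
    # Rank columns by L2 norm as a stand-in salience metric (assumption).
    salient = torch.topk(w.norm(dim=0), n_salient).indices
    mask = torch.ones(n_cols, dtype=torch.bool)
    mask[salient] = False
    w_q = torch.empty_like(w)
    # 1-bit columns: sign with a per-column scale (mean absolute value).
    scale1 = w[:, mask].abs().mean(dim=0, keepdim=True)
    w_q[:, mask] = scale1 * w[:, mask].sign()
    # 2-bit columns: 4-level uniform quantization per column.
    ws = w[:, salient]
    lo = ws.min(dim=0, keepdim=True).values
    step = (ws.max(dim=0, keepdim=True).values - lo).clamp(min=1e-8) / 3
    w_q[:, salient] = torch.round((ws - lo) / step) * step + lo
    return w_q
```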

DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation

1 code implementation • 17 Feb 2025 • Zhihang Yuan, Siyuan Wang, Rui Xie, Hanling Zhang, Tongcheng Fang, Yuzhang Shang, Shengen Yan, Guohao Dai, Yu Wang

In this paper, we propose the Dynamic Latent Frame Rate VAE (DLFR-VAE), a training-free paradigm that applies adaptive temporal compression in the latent space.

Video Generation
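
A rough sketch of what training-free adaptive temporal compression in latent space can look like; the frame-difference threshold below is an assumed criterion for illustration, not necessarily DLFR-VAE's rule.

```python
import torch

def compress_latent_frames(latents: torch.Tensor, thresh: float = 0.1):
    """latents: (T, C, H, W) per-frame latents. Keep a frame only if it
    differs enough from the last kept frame, so static spans are stored
    at a lower effective frame rate."""
    kept = [latents[0]]
    for t in range(1, latents.shape[0]):
        if (latents[t] - kept[-1]).abs().mean() > thresh:
            kept.append(latents[t])
    return torch.stack(kept)  # (T' <= T, C, H, W)
```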

E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling

no code implementations • 18 Dec 2024 • Zhihang Yuan, Yuzhang Shang, Hanling Zhang, Tongcheng Fang, Rui Xie, Bingxin Xu, Yan Yan, Shengen Yan, Guohao Dai, Yu Wang

Our approach not only enhances computational efficiency but also aligns naturally with image generation principles by operating in continuous token space and following a hierarchical generation process from coarse to fine details.

Computational Efficiency Denoising +1

freePruner: A Training-free Approach for Large Multimodal Model Acceleration

no code implementations • 23 Nov 2024 • Bingxin Xu, Yuzhang Shang, Yunhao Ge, Qian Lou, Yan Yan

Large Multimodal Models (LMMs) have demonstrated impressive capabilities in visual-language tasks but face significant deployment challenges due to their high computational demands.

Quantization Question Answering +2
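
A minimal sketch of training-free visual token reduction in the spirit of the abstract; using attention received from a [CLS]-style query as the importance score and a fixed keep ratio are assumptions for illustration, not freePruner's exact procedure.

```python
import torch

def reduce_visual_tokens(tokens: torch.Tensor, attn_scores: torch.Tensor,
                         keep_ratio: float = 0.25) -> torch.Tensor:
    """tokens: (N, D) visual tokens; attn_scores: (N,) attention each
    token receives. Training-free reduction: keep only the top-scoring
    tokens, preserving their original order."""
    n_keep = max(1, int(keep_ratio * tokens.shape[0]))
    idx = torch.topk(attn_scores, n_keep).indices.sort().values
    return tokens[idx]  # (n_keep, D)
```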

Prompt Diffusion Robustifies Any-Modality Prompt Learning

no code implementations • 26 Oct 2024 • Yingjun Du, Gaowen Liu, Yuzhang Shang, Yuguang Yao, Ramana Kompella, Cees G. M. Snoek

This paper introduces prompt diffusion, which uses a diffusion model to gradually refine the prompts to obtain a customized prompt for each sample.

Computational Efficiency Domain Generalization +1

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

1 code implementation • 14 Oct 2024 • Mu Cai, Reuben Tan, Jianrui Zhang, Bocheng Zou, Kai Zhang, Feng Yao, Fangrui Zhu, Jing Gu, Yiwu Zhong, Yuzhang Shang, Yao Dou, Jaden Park, Jianfeng Gao, Yong Jae Lee, Jianwei Yang

TemporalBench consists of ~10K video question-answer pairs, derived from ~2K high-quality human annotations detailing the temporal dynamics in video clips.

2k Benchmarking +4

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

1 code implementation • 30 Sep 2024 • Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan

RIG generates two key types of instruction data: 1) Adversarial Instruction-following data, which mixes negative and positive samples to enhance the model's discriminative understanding.

Instruction Following Language Modeling +2

Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner

no code implementations • 19 Sep 2024 • Yuzhang Shang, Bingxin Xu, Weitai Kang, Mu Cai, Yuheng Li, Zehao Wen, Zhen Dong, Kurt Keutzer, Yong Jae Lee, Yan Yan

In this paper, we first identify the primary challenges in interpolating Video-LLMs: (1) the video encoder and modality alignment projector are fixed, preventing the integration of additional frames into Video-LLMs, and (2) the LLM backbone is limited in its context length, which complicates the processing of an increased number of video tokens.
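
For challenge (2), a common training-free remedy in the literature is position interpolation, which rescales position indices so a longer sequence stays within the rotary-angle range seen during training; the sketch below is that generic recipe, not necessarily this paper's mechanism.

```python
import torch

def interpolated_rope_angles(seq_len: int, dim: int,
                             base: float = 10000.0, scale: float = 4.0):
    """Position interpolation: divide position indices by `scale` so a
    scale-times-longer token sequence reuses the rotary-angle range the
    LLM saw during training."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(seq_len).float() / scale  # compressed positions
    return torch.outer(positions, inv_freq)  # (seq_len, dim // 2) angles
```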

DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture

1 code implementation • 5 Sep 2024 • Qianlong Xiang, Miao Zhang, Yuzhang Shang, Jianlong Wu, Yan Yan, Liqiang Nie

Furthermore, considering that the source data is either inaccessible or too enormous to store for current generative models, we introduce a new paradigm for distilling them without source data, termed Data-Free Knowledge Distillation for Diffusion Models (DKDM).

Data-free Knowledge Distillation Denoising
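
A highly simplified sketch of distillation without source data: train the student to match the frozen teacher's denoising predictions on synthesized noisy inputs. The names and the eps-matching loss are assumptions for illustration; DKDM's actual pipeline differs.

```python
import torch
import torch.nn.functional as F

def distill_step_without_data(teacher, student, optimizer,
                              batch: int = 8, shape=(3, 32, 32), T: int = 1000):
    """One illustrative step: sample stand-in noisy inputs and random
    timesteps, then fit the student's denoising output to the teacher's."""
    x_t = torch.randn(batch, *shape)        # synthetic noisy samples
    t = torch.randint(0, T, (batch,))       # random diffusion timesteps
    with torch.no_grad():
        target = teacher(x_t, t)            # teacher's denoising output
    loss = F.mse_loss(student(x_t, t), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```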

Distilling Long-tailed Datasets

1 code implementation • 24 Aug 2024 • Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan

It reduces the distance between the student and the biased expert trajectories and prevents the tail class bias from being distilled to the synthetic dataset.

Dataset Distillation Efficient Neural Network

Dataset Quantization with Active Learning based Adaptive Sampling

1 code implementation • 9 Jul 2024 • Zhenghao Zhao, Yuzhang Shang, Junyi Wu, Yan Yan

In addition, we introduce a novel pipeline for dataset quantization, utilizing the feature space from the final stage of dataset quantization to generate more precise dataset bins.

Active Learning Dataset Distillation +1

PTQ4DiT: Post-training Quantization for Diffusion Transformers

1 code implementation • 25 May 2024 • Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan

SSC extends this approach by dynamically adjusting the balanced salience to capture the temporal variations in activation.

Image Generation Quantization

Efficient Multitask Dense Predictor via Binarization

no code implementations • CVPR 2024 • Yuzhang Shang, Dan Xu, Gaowen Liu, Ramana Rao Kompella, Yan Yan

Moreover, we introduce a knowledge distillation mechanism to correct the direction of information flow in backward propagation.

Binarization Knowledge Distillation +2

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

1 code implementation • 22 Mar 2024 • Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan

In response, we propose PruMerge, a novel adaptive visual token reduction strategy that significantly reduces the number of visual tokens without compromising the performance of LMMs.

Language Modelling Large Language Model +4
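
A minimal sketch of the select-then-merge idea the abstract describes: keep the most important visual tokens and fold each pruned token into its most similar kept token. The attention-based selection and cosine-similarity merging rules here are assumptions for illustration, not PruMerge's exact algorithm.

```python
import torch
import torch.nn.functional as F

def select_and_merge(tokens: torch.Tensor, cls_attn: torch.Tensor,
                     keep_ratio: float = 0.25) -> torch.Tensor:
    """tokens: (N, D) visual tokens; cls_attn: (N,) importance scores.
    Select the most important tokens, then merge each pruned token into
    its nearest kept token so its information is folded in, not lost."""
    n_keep = max(1, int(keep_ratio * tokens.shape[0]))
    keep_idx = torch.topk(cls_attn, n_keep).indices
    kept = tokens[keep_idx].clone()
    pruned_mask = torch.ones(tokens.shape[0], dtype=torch.bool)
    pruned_mask[keep_idx] = False
    pruned = tokens[pruned_mask]
    if pruned.shape[0] > 0:
        # Assign each pruned token to its nearest kept token (cosine sim).
        sim = F.normalize(pruned, dim=1) @ F.normalize(kept, dim=1).T  # (P, K)
        nearest = sim.argmax(dim=1)
        for k in range(n_keep):
            group = pruned[nearest == k]
            if group.shape[0] > 0:
                kept[k] = (kept[k] + group.mean(dim=0)) / 2  # averaging merge
    return kept  # (n_keep, D)
```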

FBPT: A Fully Binary Point Transformer

no code implementations • 15 Mar 2024 • Zhixing Hou, Yuzhang Shang, Yan Yan

This paper presents a novel Fully Binary Point Cloud Transformer (FBPT) model with the potential for broad application in robotics and on mobile devices.

Binarization Point Cloud Classification

Online Multi-spectral Neuron Tracing

no code implementations • 10 Mar 2024 • Bin Duan, Yuzhang Shang, Dawen Cai, Yan Yan

In this paper, we propose an online multi-spectral neuron tracing method with uniquely designed modules, where no offline training is required.

LLM Inference Unveiled: Survey and Roofline Model Insights

2 code implementations • 26 Feb 2024 • Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer

Our survey stands out from traditional literature reviews not only by summarizing the current state of research but also by introducing a framework based on the roofline model for the systematic analysis of LLM inference techniques.

Knowledge Distillation Language Modelling +5
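
The roofline model itself fits in a few lines: attainable throughput is the minimum of the compute peak and bandwidth times arithmetic intensity. The hardware numbers below are illustrative A100-like values, not figures from the survey.

```python
def attainable_tflops(arithmetic_intensity: float,
                      peak_tflops: float, bandwidth_tb_s: float) -> float:
    """Roofline model: performance is memory-bound (bandwidth * intensity)
    left of the ridge point and compute-bound (peak) right of it.
    arithmetic_intensity is in FLOPs/byte, bandwidth in TB/s."""
    return min(peak_tflops, bandwidth_tb_s * arithmetic_intensity)

# Illustrative A100-like numbers: the decode stage of LLM inference runs
# matrix-vector products at ~1 FLOP/byte, so it sits far to the left of
# the ridge point and is bandwidth-bound.
print(attainable_tflops(1.0, 312.0, 2.0))    # 2.0   (memory-bound)
print(attainable_tflops(300.0, 312.0, 2.0))  # 312.0 (compute-bound)
```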

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

1 code implementation • 6 Feb 2024 • Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Junchi Yan, Yan Yan

We empirically verify that our approach modifies the activation distribution and provides meaningful temporal information, facilitating easier and more accurate quantization.

Image Generation Model Compression +1

Enhancing Post-training Quantization Calibration through Contrastive Learning

no code implementations • CVPR 2024 • Yuzhang Shang, Gaowen Liu, Ramana Rao Kompella, Yan Yan

We aim to calibrate the quantized activations by maximizing the mutual information between the pre- and post-quantized activations.

Contrastive Learning Quantization
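
Maximizing mutual information is commonly made tractable with a contrastive InfoNCE bound; below is a generic sketch of such an objective between pre- and post-quantization activations (a standard proxy for MI maximization, not necessarily this paper's exact loss).

```python
import torch
import torch.nn.functional as F

def info_nce(a_fp: torch.Tensor, a_q: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """a_fp, a_q: (B, D) pre- and post-quantization activations for the
    same batch. Matched pairs are positives; all other pairings are
    negatives. Minimizing this loss maximizes a lower bound on MI."""
    a_fp = F.normalize(a_fp, dim=1)
    a_q = F.normalize(a_q, dim=1)
    logits = a_fp @ a_q.T / temperature       # (B, B) similarity matrix
    labels = torch.arange(a_fp.shape[0])      # positives on the diagonal
    return F.cross_entropy(logits, labels)
```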

MIM4DD: Mutual Information Maximization for Dataset Distillation

1 code implementation • NeurIPS 2023 • Yuzhang Shang, Zhihang Yuan, Yan Yan

Thus, we introduce mutual information (MI) as a metric to quantify the information shared between the synthetic and real datasets, and devise MIM4DD, which numerically maximizes the MI via a newly designed optimizable objective within a contrastive learning framework to update the synthetic dataset.

Contrastive Learning Dataset Distillation

ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models

1 code implementation • 10 Dec 2023 • Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, Guangyu Sun

Based on the success of the low-rank decomposition of projection matrices in the self-attention module, we further introduce ASVD to compress the KV cache.
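
A sketch of the activation-aware decomposition idea: scale the weight's input dimension by calibration-time activation magnitudes so heavily used channels are approximated more faithfully, take a truncated SVD, then fold the scaling back in. Details here are illustrative assumptions and differ from the paper's formulation.

```python
import torch

def activation_aware_svd(w: torch.Tensor, act_scale: torch.Tensor, rank: int):
    """w: (out, in) weight; act_scale: (in,) per-channel activation
    magnitudes from a calibration set. Returns factors a, b with
    w ~= a @ b, where the low-rank error is weighted by activations."""
    s = act_scale.clamp(min=1e-6)
    u, sigma, vT = torch.linalg.svd(w * s, full_matrices=False)
    a = u[:, :rank] * sigma[:rank]   # (out, rank)
    b = vT[:rank] / s                # (rank, in), undo the scaling
    return a, b
```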

PB-LLM: Partially Binarized Large Language Models

2 code implementations • 29 Sep 2023 • Yuzhang Shang, Zhihang Yuan, Qiang Wu, Zhen Dong

This paper explores network binarization, a radical form of quantization, compressing model weights to a single bit, specifically for Large Language Models (LLMs) compression.

Binarization Quantization
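
A minimal sketch of partial binarization: keep a small fraction of salient weights intact and binarize the rest with a scaling factor. Magnitude-based salience is a simplifying assumption here; PB-LLM studies salience criteria explicitly.

```python
import torch

def partially_binarize(w: torch.Tensor, salient_frac: float = 0.1) -> torch.Tensor:
    """Binarize all but the most salient weights, which stay at full
    precision (salience = magnitude, an assumption for illustration)."""
    k = int(salient_frac * w.numel())
    thresh = w.abs().flatten().topk(k).values.min() if k > 0 else float("inf")
    salient = w.abs() >= thresh
    alpha = w[~salient].abs().mean()              # scale for the binary part
    return torch.where(salient, w, alpha * w.sign())
```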

Causal-DFQ: Causality Guided Data-free Network Quantization

1 code implementation • ICCV 2023 • Yuzhang Shang, Bingxin Xu, Gaowen Liu, Ramana Kompella, Yan Yan

Inspired by this causal understanding, we propose the Causality-guided Data-free Network Quantization method, Causal-DFQ, to eliminate the reliance on data by approaching an equilibrium of causality-driven intervened distributions.

Data Free Quantization Neural Network Compression

RPTQ: Reorder-based Post-training Quantization for Large Language Models

1 code implementation • 3 Apr 2023 • Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu

In this paper, we identify that the challenge in quantizing activations in LLMs arises from varying ranges across channels, rather than solely the presence of outliers.

Quantization
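
A simplified sketch of the reorder idea: group channels with similar value ranges and quantize each group with its own parameters, so one wide-range channel no longer stretches the scale for all the others. Sorting by range is a stand-in for RPTQ's clustering, used here only for illustration.

```python
import torch

def reorder_group_quantize(x: torch.Tensor, n_groups: int = 4, n_bits: int = 8):
    """x: (tokens, channels) activations. Sort channels by value range,
    split them into groups of similar range, and quantize each group
    with its own scale and zero point."""
    ranges = x.max(dim=0).values - x.min(dim=0).values
    order = ranges.argsort()                      # reorder channels by range
    x_q = torch.empty_like(x)
    q_max = 2 ** n_bits - 1
    for g in order.chunk(n_groups):               # per-group quantization
        lo = x[:, g].min()
        scale = (x[:, g].max() - lo).clamp(min=1e-8) / q_max
        x_q[:, g] = torch.round((x[:, g] - lo) / scale) * scale + lo
    return x_q, order
```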

BPT: Binary Point Cloud Transformer for Place Recognition

no code implementations • 2 Mar 2023 • Zhixing Hou, Yuzhang Shang, Tian Gao, Yan Yan

To solve this issue, we propose a binary point cloud transformer for place recognition.

Post-training Quantization on Diffusion Models

1 code implementation • CVPR 2023 • Yuzhang Shang, Zhihang Yuan, Bin Xie, Bingzhe Wu, Yan Yan

These approaches define a forward diffusion process for transforming data into noise and a backward denoising process for sampling data from noise.

Denoising Noise Estimation +1
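
For reference, in standard DDPM notation the two processes mentioned here are conventionally written as

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\bigl(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\bigr),
\qquad
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\bigl(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\bigr),
```

where $\beta_t$ is the noise schedule and $\theta$ parameterizes the learned denoiser whose behavior post-training quantization must preserve across timesteps.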

Lipschitz Continuity Retained Binary Neural Network

1 code implementation • 13 Jul 2022 • Yuzhang Shang, Dan Xu, Bin Duan, Ziliang Zong, Liqiang Nie, Yan Yan

Relying on the premise that the performance of a binary neural network can be largely restored by eliminating the quantization error between full-precision weight vectors and their binary counterparts, existing work on network binarization frequently adopts the idea of model robustness to reach this objective.

Binarization Quantization

Network Binarization via Contrastive Learning

1 code implementation • 6 Jul 2022 • Yuzhang Shang, Dan Xu, Ziliang Zong, Liqiang Nie, Yan Yan

Neural network binarization accelerates deep models by quantizing their weights and activations into 1-bit.

Binarization Contrastive Learning +2
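
The 1-bit quantization this line of work builds on is conventionally implemented as sign() in the forward pass with a straight-through estimator in the backward pass; below is a minimal PyTorch sketch of that standard building block (not this paper's contrastive objective).

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Forward: sign(); backward: straight-through estimator that passes
    gradients only where |x| <= 1 (the standard 1-bit building block)."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.sign()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()  # clip gradient outside [-1, 1]

# Usage: w_bin = BinarizeSTE.apply(w) inside a layer's forward pass.
```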

Supplementing Missing Visions via Dialog for Scene Graph Generations

1 code implementation • 23 Apr 2022 • Zhenghao Zhao, Ye Zhu, Xiaoguang Zhu, Yuzhang Shang, Yan Yan

Most current AI systems rely on the premise that the input visual data are sufficient to achieve competitive performance in various computer vision tasks.

Graph Generation Scene Graph Generation

Win the Lottery Ticket via Fourier Analysis: Frequencies Guided Network Pruning

no code implementations • 30 Jan 2022 • Yuzhang Shang, Bin Duan, Ziliang Zong, Liqiang Nie, Yan Yan

Extensive experiments on CIFAR-10 and CIFAR-100 demonstrate the superiority of our novel Fourier-analysis-based MBP over traditional MBP algorithms.

Knowledge Distillation Network Pruning

Contrastive Mutual Information Maximization for Binary Neural Networks

no code implementations • 29 Sep 2021 • Yuzhang Shang, Dan Xu, Ziliang Zong, Liqiang Nie, Yan Yan

Neural network binarization accelerates deep models by quantizing their weights and activations into 1-bit.

Binarization Contrastive Learning +2

Lipschitz Continuity Guided Knowledge Distillation

no code implementations • ICCV 2021 • Yuzhang Shang, Bin Duan, Ziliang Zong, Liqiang Nie, Yan Yan

Knowledge distillation has become one of the most important model compression techniques by distilling knowledge from larger teacher networks to smaller student ones.

Knowledge Distillation Model Compression +2
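
For background, the classic distillation objective (Hinton et al.) that this line of work builds on blends hard-label cross-entropy with a temperature-softened KL term; shown here as reference, not this paper's Lipschitz-based variant.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 4.0,
            alpha: float = 0.5):
    """Classic knowledge distillation: pull the student's softened output
    distribution toward the teacher's while also fitting the hard labels."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T  # T^2 restores gradient scale
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```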
