no code implementations • 18 Feb 2025 • Sifan Zhou, Shuo Wang, Zhihang Yuan, Mingjia Shi, Yuzhang Shang, Dawei Yang
Fine-tuning technologies for Large Language Models (LLMs) have achieved remarkable results.
no code implementations • 18 Feb 2025 • Jiaqi Zhao, Ming Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, YaoWei Wang, Min Zhang, Liqiang Nie
Then, we conduct extensive experiments with the baselines within each class, covering models of various sizes (7B-70B), bitwidths, training levels (LLaMA1/2/3/3.1), architectures (Mixtral, DeepSeekMoE and Mamba) and modalities (LLaVA1.5 and VILA1.5) on a wide range of evaluation metrics. Through comparative analysis of the results, we summarize the strengths of each PTQ strategy and the model-size-bitwidth trade-offs with respect to performance.
1 code implementation • 18 Feb 2025 • Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, YaoWei Wang, Min Zhang
To explore the real limit of PTQ, we propose an extremely low-bit PTQ method called PTQ1.61, which enables weight quantization to 1.61-bit for the first time.
1 code implementation • 17 Feb 2025 • Zhihang Yuan, Siyuan Wang, Rui Xie, Hanling Zhang, Tongcheng Fang, Yuzhang Shang, Shengen Yan, Guohao Dai, Yu Wang
In this paper, we propose the Dynamic Latent Frame Rate VAE (DLFR-VAE), a training-free paradigm that can make use of adaptive temporal compression in latent space.
no code implementations • 18 Dec 2024 • Zhihang Yuan, Yuzhang Shang, Hanling Zhang, Tongcheng Fang, Rui Xie, Bingxin Xu, Yan Yan, Shengen Yan, Guohao Dai, Yu Wang
Our approach not only enhances computational efficiency but also aligns naturally with image generation principles by operating in continuous token space and following a hierarchical generation process from coarse to fine details.
no code implementations • 23 Nov 2024 • Bingxin Xu, Yuzhang Shang, Yunhao Ge, Qian Lou, Yan Yan
Large Multimodal Models (LMMs) have demonstrated impressive capabilities in visual-language tasks but face significant deployment challenges due to their high computational demands.
no code implementations • 26 Oct 2024 • Yingjun Du, Gaowen Liu, Yuzhang Shang, Yuguang Yao, Ramana Kompella, Cees G. M. Snoek
This paper introduces prompt diffusion, which uses a diffusion model to gradually refine the prompts to obtain a customized prompt for each sample.
1 code implementation • 14 Oct 2024 • Mu Cai, Reuben Tan, Jianrui Zhang, Bocheng Zou, Kai Zhang, Feng Yao, Fangrui Zhu, Jing Gu, Yiwu Zhong, Yuzhang Shang, Yao Dou, Jaden Park, Jianfeng Gao, Yong Jae Lee, Jianwei Yang
TemporalBench consists of ~10K video question-answer pairs, derived from ~2K high-quality human annotations detailing the temporal dynamics in video clips.
1 code implementation • 30 Sep 2024 • Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan
RIG generates two key types of instruction data: 1) Adversarial Instruction-following data, which mixes negative and positive samples to enhance the model's discriminative understanding.
no code implementations • 19 Sep 2024 • Yuzhang Shang, Bingxin Xu, Weitai Kang, Mu Cai, Yuheng Li, Zehao Wen, Zhen Dong, Kurt Keutzer, Yong Jae Lee, Yan Yan
In this paper, we first identify the primary challenges in interpolating Video-LLMs: (1) the video encoder and modality alignment projector are fixed, preventing the integration of additional frames into Video-LLMs, and (2) the LLM backbone is limited in its content length capabilities, which complicates the processing of an increased number of video tokens.
1 code implementation • 5 Sep 2024 • Qianlong Xiang, Miao Zhang, Yuzhang Shang, Jianlong Wu, Yan Yan, Liqiang Nie
Furthermore, considering that the source data is either inaccessible or too large to store for current generative models, we introduce a new paradigm for their distillation without source data, termed Data-Free Knowledge Distillation for Diffusion Models (DKDM).
1 code implementation • 24 Aug 2024 • Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan
It reduces the distance between the student and the biased expert trajectories and prevents the tail class bias from being distilled to the synthetic dataset.
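For context, the trajectory-matching objective referred to here measures how closely a student trained on the synthetic data follows expert checkpoints trained on real data. Below is a minimal sketch of the standard (MTT-style) matching loss, without the paper's debiasing of the expert trajectories; the toy parameter vectors are assumptions for illustration.

```python
import numpy as np

def trajectory_matching_loss(student_end, expert_target, expert_start):
    """Normalized parameter-matching distance used in trajectory-matching
    dataset distillation: how far the student (trained on synthetic data)
    ends up from the expert checkpoint it is trying to reach, relative to
    how far the expert itself travelled."""
    num = np.sum((student_end - expert_target) ** 2)
    den = np.sum((expert_start - expert_target) ** 2) + 1e-12
    return num / den

# Flattened parameter vectors (toy sizes, purely illustrative).
theta_start = np.random.randn(10_000)
theta_target = theta_start + 0.1 * np.random.randn(10_000)   # expert after a few real-data steps
theta_student = theta_start + 0.1 * np.random.randn(10_000)  # student after steps on synthetic data
print(trajectory_matching_loss(theta_student, theta_target, theta_start))
```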
1 code implementation • 9 Jul 2024 • Zhenghao Zhao, Yuzhang Shang, Junyi Wu, Yan Yan
In addition, we introduce a novel pipeline for dataset quantization, utilizing the feature space from the final stage of dataset quantization to generate more precise dataset bins.
2 code implementations • 27 May 2024 • Kai Wang, Mingjia Shi, Yukun Zhou, Zekai Li, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Hanwang Zhang, Yang You
Training diffusion models is always a computation-intensive task.
1 code implementation • 25 May 2024 • Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan
SSC extends this approach by dynamically adjusting the balanced salience to capture the temporal variations in activation.
no code implementations • CVPR 2024 • Yuzhang Shang, Dan Xu, Gaowen Liu, Ramana Rao Kompella, Yan Yan
Moreover, we introduce a knowledge distillation mechanism to correct the direction of information flow in backward propagation.
1 code implementation • 22 Mar 2024 • Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan
In response, we propose PruMerge, a novel adaptive visual token reduction strategy that significantly reduces the number of visual tokens without compromising the performance of LMMs.
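As a rough illustration of the prune-and-merge idea (not the exact PruMerge selection rule, which relies on attention-based outlier detection), the sketch below keeps a fixed fraction of visual tokens ranked by a salience score and merges each pruned token into its most similar kept token; the keep ratio and cosine-similarity assignment are assumptions made only for this example.

```python
import numpy as np

def prune_and_merge(tokens, salience, keep_ratio=0.25):
    """Generic visual-token reduction sketch (not the exact PruMerge rule).

    tokens:   (N, D) array of visual token features.
    salience: (N,) importance score per token, e.g. attention from a class token.
    Returns a reduced (K, D) token matrix with pruned tokens merged into
    their most similar kept token.
    """
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    keep_idx = np.argsort(salience)[-k:]            # indices of the K most salient tokens
    drop_idx = np.setdiff1d(np.arange(n), keep_idx)

    kept = tokens[keep_idx].copy()
    counts = np.ones(k)                             # how many tokens each kept slot absorbs

    def normalize(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

    # Cosine similarity between pruned and kept tokens.
    sim = normalize(tokens[drop_idx]) @ normalize(tokens[keep_idx]).T  # (N-K, K)
    assign = sim.argmax(axis=1)                     # nearest kept token for each pruned one

    # Merge each pruned token into its assigned kept token by running average.
    for j, a in zip(drop_idx, assign):
        counts[a] += 1
        kept[a] += (tokens[j] - kept[a]) / counts[a]
    return kept

# Example: 576 CLIP-style tokens of dimension 1024 reduced to 144.
tokens = np.random.randn(576, 1024).astype(np.float32)
salience = np.random.rand(576)                      # stand-in for attention scores
print(prune_and_merge(tokens, salience).shape)      # (144, 1024)
```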
no code implementations • 15 Mar 2024 • Zhixing Hou, Yuzhang Shang, Yan Yan
This paper presents a novel Fully Binary Point Cloud Transformer (FBPT) model, which has the potential to be widely applied in robotics and on mobile devices.
no code implementations • 10 Mar 2024 • Bin Duan, Yuzhang Shang, Dawen Cai, Yan Yan
In this paper, we propose an online multi-spectral neuron tracing method with uniquely designed modules, where no offline training is required.
2 code implementations • 26 Feb 2024 • Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model for systematic analysis of LLM inference techniques.
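The roofline model bounds attainable throughput by the lesser of peak compute and memory bandwidth times arithmetic intensity. A minimal sketch follows, with hypothetical hardware numbers chosen only for illustration.

```python
def roofline(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Attainable performance (FLOP/s) under the roofline model."""
    intensity = flops / bytes_moved                  # FLOPs per byte of memory traffic
    return min(peak_flops, peak_bandwidth * intensity), intensity

# Hypothetical accelerator: 300 TFLOP/s peak compute, 1.5 TB/s memory bandwidth.
PEAK_FLOPS = 300e12
PEAK_BW = 1.5e12

# LLM decode is typically memory-bound: an fp16 GEMV over a 4096x4096 weight
# matrix moves ~2 bytes per weight and performs ~2 FLOPs per weight,
# i.e. roughly 1 FLOP per byte.
attainable, ai = roofline(flops=2 * 4096 * 4096,
                          bytes_moved=2 * 4096 * 4096,
                          peak_flops=PEAK_FLOPS,
                          peak_bandwidth=PEAK_BW)
print(f"arithmetic intensity = {ai:.1f} FLOP/byte, "
      f"attainable = {attainable / 1e12:.1f} TFLOP/s")   # far below peak compute
```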
1 code implementation • 6 Feb 2024 • Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Junchi Yan, Yan Yan
We empirically verify that our approach modifies the activation distribution and provides meaningful temporal information, facilitating easier and more accurate quantization.
no code implementations • CVPR 2024 • Yuzhang Shang, Gaowen Liu, Ramana Rao Kompella, Yan Yan
We aim to calibrate the quantized activations by maximizing the mutual information between the pre- and post-quantized activations.
1 code implementation • NeurIPS 2023 • Yuzhang Shang, Zhihang Yuan, Yan Yan
Thus, we introduce mutual information (MI) as the metric to quantify the shared information between the synthetic and the real datasets, and devise MIM4DD, which numerically maximizes the MI via a newly designed optimizable objective within a contrastive learning framework to update the synthetic dataset.
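As a rough sketch of maximizing MI with a contrastive objective (the standard InfoNCE lower bound; the paper's exact objective and sampling scheme may differ), one can treat matched synthetic/real feature pairs as positives and all other pairings in the batch as negatives.

```python
import numpy as np

def info_nce_loss(syn_feats, real_feats, temperature=0.1):
    """InfoNCE loss over matched synthetic/real feature pairs.

    syn_feats, real_feats: (B, D) arrays where row i of each is a positive pair.
    Minimizing this loss maximizes a lower bound on the mutual information
    between the two feature sets.
    """
    def normalize(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

    s, r = normalize(syn_feats), normalize(real_feats)
    logits = (s @ r.T) / temperature                  # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                # positives lie on the diagonal

syn = np.random.randn(64, 128)
real = syn + 0.1 * np.random.randn(64, 128)           # correlated positives
print(info_nce_loss(syn, real))                       # low loss <=> high shared information
```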
1 code implementation • 10 Dec 2023 • Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, Guangyu Sun
Based on the success of the low-rank decomposition of projection matrices in the self-attention module, we further introduce ASVD to compress the KV cache.
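As a hedged illustration of the low-rank idea (plain truncated SVD; ASVD additionally weights the decomposition by activation statistics, which this sketch omits), a projection matrix can be factored into two thin matrices so that the projected states, and hence the cached KV entries, live in a lower-dimensional space.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Factor W (d_out x d_in) into A (d_out x r) @ B (r x d_in) via truncated SVD."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]        # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

d, rank = 4096, 512
W_k = np.random.randn(d, d) / np.sqrt(d)               # stand-in for a key-projection matrix
A, B = low_rank_factorize(W_k, rank)

x = np.random.randn(d)                                  # one token's hidden state
k_low = B @ x                                           # r-dim state: cache this instead of the d-dim key
k_full = A @ k_low                                      # reconstruct the full key when needed
print(A.shape, B.shape, k_low.shape)                    # (4096, 512) (512, 4096) (512,)
# Note: real projection matrices tend to have fast-decaying spectra, which is
# what makes low ranks effective; the random matrix here only illustrates shapes.
```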
2 code implementations • 29 Sep 2023 • Yuzhang Shang, Zhihang Yuan, Qiang Wu, Zhen Dong
This paper explores network binarization, a radical form of quantization that compresses model weights to a single bit, specifically for compressing Large Language Models (LLMs).
1 code implementation • ICCV 2023 • Yuzhang Shang, Bingxin Xu, Gaowen Liu, Ramana Kompella, Yan Yan
Inspired by the causal understanding, we propose the Causality-guided Data-free Network Quantization method, Causal-DFQ, to eliminate the reliance on data via approaching an equilibrium of causality-driven intervened distributions.
1 code implementation • 3 Apr 2023 • Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu
In this paper, we identify that the challenge in quantizing activations in LLMs arises from varying ranges across channels, rather than solely the presence of outliers.
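To make the channel-range issue concrete, the following sketch (a generic per-channel rescaling, not the paper's exact method) quantizes an activation tensor to int8 with a single per-tensor scale versus after equalizing channel ranges, and compares the reconstruction error; the synthetic channel magnitudes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, C = 128, 64
channel_scale = rng.uniform(0.1, 50.0, size=C)          # channels with very different ranges
x = rng.standard_normal((T, C)) * channel_scale          # activations (tokens x channels)

def quant_dequant_int8(a, scale):
    q = np.clip(np.round(a / scale), -127, 127)
    return q * scale

# Per-tensor quantization: one scale, dominated by the largest channel.
err_tensor = np.abs(x - quant_dequant_int8(x, np.abs(x).max() / 127)).mean()

# Equalize channel ranges first, quantize, then undo the rescaling.
s = np.abs(x).max(axis=0)                                # per-channel max magnitude
x_eq = x / s
x_hat = quant_dequant_int8(x_eq, np.abs(x_eq).max() / 127) * s
err_equalized = np.abs(x - x_hat).mean()

print(err_tensor, err_equalized)                         # error typically drops after equalization
```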
no code implementations • 2 Mar 2023 • Zhixing Hou, Yuzhang Shang, Tian Gao, Yan Yan
To solve this issue, we propose a binary point cloud transformer for place recognition.
1 code implementation • CVPR 2023 • Yuzhang Shang, Zhihang Yuan, Bin Xie, Bingzhe Wu, Yan Yan
These approaches define a forward diffusion process for transforming data into noise and a backward denoising process for sampling data from noise.
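For reference, the standard DDPM forward process adds Gaussian noise according to a variance schedule, so that x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps. A minimal sketch with a common linear beta schedule:

```python
import numpy as np

# Linear variance schedule beta_1..beta_T (common DDPM defaults).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)                     # a_bar_t = prod_{s<=t} (1 - beta_s)

def forward_diffuse(x0, t, rng=np.random.default_rng()):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps, eps

x0 = np.random.randn(3, 32, 32)                          # a toy "image"
x_t, eps = forward_diffuse(x0, t=500)
# The backward (denoising) process trains a network to predict eps from (x_t, t)
# and uses it to step from pure noise back toward data.
```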
1 code implementation • 13 Jul 2022 • Yuzhang Shang, Dan Xu, Bin Duan, Ziliang Zong, Liqiang Nie, Yan Yan
Relying on the premise that the performance of a binary neural network can be largely restored by eliminating the quantization error between full-precision weight vectors and their corresponding binary vectors, existing works on network binarization frequently adopt the idea of model robustness to reach this objective.
1 code implementation • 6 Jul 2022 • Yuzhang Shang, Dan Xu, Ziliang Zong, Liqiang Nie, Yan Yan
Neural network binarization accelerates deep models by quantizing their weights and activations to 1-bit.
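As a hedged sketch of what 1-bit weights mean in practice (the classic sign-plus-scaling binarization in the style of XNOR-Net, not the contrastive-learning objective proposed in this paper), a weight matrix is replaced by its sign and one real-valued scaling factor per output channel.

```python
import numpy as np

def binarize_weights(W):
    """Binarize a weight matrix row-wise: W ~= alpha * sign(W).

    Each output channel keeps one real scaling factor alpha = mean(|w|),
    the closed-form minimizer of ||W - alpha * B||^2 with B in {-1, +1}.
    """
    B = np.where(W >= 0, 1.0, -1.0)                    # 1-bit weights
    alpha = np.abs(W).mean(axis=1, keepdims=True)      # per-row scaling factor
    return alpha, B

W = np.random.randn(256, 512) * 0.05
alpha, B = binarize_weights(W)
W_hat = alpha * B
print(np.linalg.norm(W - W_hat) / np.linalg.norm(W))   # relative binarization error
```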
1 code implementation • 23 Apr 2022 • Zhenghao Zhao, Ye Zhu, Xiaoguang Zhu, Yuzhang Shang, Yan Yan
Most current AI systems rely on the premise that the input visual data are sufficient to achieve competitive performance in various computer vision tasks.
no code implementations • 30 Jan 2022 • Yuzhang Shang, Bin Duan, Ziliang Zong, Liqiang Nie, Yan Yan
Extensive experiments on CIFAR-10 and CIFAR-100 demonstrate the superiority of our novel Fourier-analysis-based MBP compared to traditional MBP algorithms.
no code implementations • 29 Sep 2021 • Yuzhang Shang, Dan Xu, Ziliang Zong, Liqiang Nie, Yan Yan
Neural network binarization accelerates deep models by quantizing their weights and activations to 1-bit.
no code implementations • ICCV 2021 • Yuzhang Shang, Bin Duan, Ziliang Zong, Liqiang Nie, Yan Yan
Knowledge distillation has become one of the most important model compression techniques by distilling knowledge from larger teacher networks to smaller student ones.