Search Results for author: Mengzhao Chen

Found 17 papers, 15 papers with code

Scaling Law for Quantization-Aware Training

3 code implementations • 20 May 2025 • Mengzhao Chen, Chaoyi Zhang, Jing Liu, Yutao Zeng, Zeyue Xue, Zhiheng Liu, Yunshui Li, Jin Ma, Jie Huang, Xun Zhou, Ping Luo

Through 268 QAT experiments, we show that quantization error decreases as model size increases, but rises with more training tokens and coarser quantization granularity.
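A toy sketch of a scaling-law shape that is merely consistent with the trends stated above (error falls with model size, rises with training tokens and coarser group size); the functional form and every constant below are placeholders, not the paper's fitted law.

```python
# Toy illustration only: quantization error falls as model size N grows, and
# rises with training tokens D and with coarser granularity (larger group G).
# The power-law form and the constants are placeholders, not the paper's fit.
def quant_error(N, D, G, k=0.1, alpha=0.3, beta=0.1, gamma=0.2):
    return k * (D ** beta) * (G ** gamma) / (N ** alpha)

for N in (1e8, 1e9, 1e10):      # model parameters
    for D in (1e10, 1e11):      # training tokens
        print(f"N={N:.0e} D={D:.0e} G=128 -> error={quant_error(N, D, 128):.4f}")
```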

Quantization

Model Merging in Pre-training of Large Language Models

no code implementations • 17 May 2025 • Yunshui Li, Yiyuan Ma, Shen Yan, Chaoyi Zhang, Jing Liu, Jianqiao Lu, Ziwen Xu, Mengzhao Chen, Minrui Wang, Shiyi Zhan, Jin Ma, Xunhao Lai, Yao Luo, Xingyan Bin, Hongbin Ren, Mingji Han, Wenhao Hao, Bairen Yi, Lingjun Liu, Bole Ma, Xiaoying Jia, Zhou Xun, Siyuan Qiao, Liang Xiang, Yonghui Wu

Model merging has emerged as a promising technique for enhancing large language models, though its application in large-scale pre-training remains relatively unexplored.
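A minimal sketch of model merging in its simplest form, plain parameter averaging across checkpoints; this shows the generic idea only and is not necessarily the merging scheme studied in the paper.

```python
# Generic checkpoint averaging, the simplest form of model merging.
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Average matching tensors from several checkpoints."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage (paths are placeholders):
# ckpts = [torch.load(p, map_location="cpu") for p in ["ckpt_a.pt", "ckpt_b.pt"]]
# model.load_state_dict(merge_state_dicts(ckpts))
```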

Mixture-of-Experts

DanceGRPO: Unleashing GRPO on Visual Generation

1 code implementation • 12 May 2025 • Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, Zhiheng Liu, Wei Liu, Qiushan Guo, Weilin Huang, Ping Luo

This paper introduces DanceGRPO, the first unified framework to adapt Group Relative Policy Optimization (GRPO) to visual generation paradigms, unleashing one unified RL algorithm across two generative paradigms (diffusion models and rectified flows), three tasks (text-to-image, text-to-video, image-to-video), four foundation models (Stable Diffusion, HunyuanVideo, FLUX, SkyReel-I2V), and five reward models (image/video aesthetics, text-image alignment, video motion quality, and binary reward).
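A sketch of the group-relative advantage at the core of GRPO: rewards for a group of samples generated from the same prompt are standardized within the group. How DanceGRPO plugs this into diffusion and rectified-flow sampling is not shown here.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8):
    """rewards: (num_prompts, group_size) -> advantages of the same shape."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

rewards = torch.tensor([[0.2, 0.8, 0.5, 0.9]])  # one prompt, a group of 4 samples
print(group_relative_advantages(rewards))
```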

Denoising • Reinforcement Learning +3

Enhance-A-Video: Better Generated Video for Free

1 code implementation • 11 Feb 2025 • Yang Luo, Xuanlei Zhao, Mengzhao Chen, Kaipeng Zhang, Wenqi Shao, Kai Wang, Zhangyang Wang, Yang You

DiT-based video generation has achieved remarkable results, but research into enhancing existing models remains relatively unexplored.

Video Generation

LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation

no code implementations • 22 Jan 2025 • Jiahao Wang, Ning Kang, Lewei Yao, Mengzhao Chen, Chengyue Wu, Songyang Zhang, Shuchen Xue, Yong liu, Taiqiang Wu, Xihui Liu, Kaipeng Zhang, Shifeng Zhang, Wenqi Shao, Zhenguo Li, Ping Luo

(3) Hybrid knowledge distillation objective: using a pre-trained diffusion Transformer to guide the training of the student linear Transformer, supervising not only the predicted noise but also the variance of the reverse diffusion process.
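A hedged sketch of a hybrid distillation loss in the spirit described above: the student is supervised on both the teacher's predicted noise and its predicted variance of the reverse process. The exact weighting and parameterization in LiT may differ.

```python
import torch.nn.functional as F

def hybrid_kd_loss(student_out, teacher_out, lambda_var=1.0):
    """Each *_out is a tuple (pred_noise, pred_variance) of matching tensors."""
    s_eps, s_var = student_out
    t_eps, t_var = teacher_out
    noise_loss = F.mse_loss(s_eps, t_eps.detach())   # match teacher's predicted noise
    var_loss = F.mse_loss(s_var, t_var.detach())     # match teacher's predicted variance
    return noise_loss + lambda_var * var_loss
```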

Knowledge Distillation • Mamba +2

PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization

1 code implementation • 7 Oct 2024 • Mengzhao Chen, Yi Liu, Jiahao Wang, Yi Bin, Wenqi Shao, Ping Luo

In this work, we propose PrefixQuant, a novel quantization method that achieves state-of-the-art performance across various precision levels (W4A4KV4 and W4A8KV4) and granularities (dynamic and static quantization) by effectively isolating token-wise outliers.
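A sketch of the intuition only: if outlier activations are confined to a small set of prefixed positions, a static per-tensor quantization range can be calibrated on the remaining tokens and stays tight. This is an illustration, not PrefixQuant's actual implementation.

```python
import torch

def static_quantize(x: torch.Tensor, num_prefix: int, bits: int = 4):
    """x: (seq_len, hidden). Calibrate the scale on non-prefix tokens only."""
    qmax = 2 ** (bits - 1) - 1
    body = x[num_prefix:]                     # outliers assumed isolated in the prefix
    scale = body.abs().max() / qmax           # static, per-tensor scale
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale                          # dequantized ("fake-quant") values
```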

Common Sense Reasoning • Quantization

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

1 code implementation • 10 Jul 2024 • Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, Ping Luo

To the best of our knowledge, Block-AP is the first method to enable direct training of all parameters in a block-wise manner, reducing accuracy loss in low-bit scenarios by enhancing the solution space during optimization.
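A hedged sketch of block-wise training of all parameters: each quantized block is optimized to reproduce the full-precision block's outputs on cached inputs, one block at a time. The quantizer, optimizer settings, and data pipeline of EfficientQAT's actual recipe are not reproduced here.

```python
import torch
import torch.nn.functional as F

def train_block(q_block, fp_block, cached_inputs, steps=100, lr=1e-4):
    """Optimize all parameters of q_block to match fp_block's outputs."""
    opt = torch.optim.AdamW(q_block.parameters(), lr=lr)
    for _ in range(steps):
        for x in cached_inputs:              # activations cached from preceding blocks
            with torch.no_grad():
                target = fp_block(x)         # full-precision reference output
            loss = F.mse_loss(q_block(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return q_block
```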

Quantization

Adapting LLaMA Decoder to Vision Transformer

1 code implementation • 10 Apr 2024 • Jiahao Wang, Wenqi Shao, Mengzhao Chen, Chengyue Wu, Yong liu, Taiqiang Wu, Kaipeng Zhang, Songyang Zhang, Kai Chen, Ping Luo

We first "LLaMAfy" a standard ViT step-by-step to align with LLaMA's architecture, and find that directly applying a causal mask to the self-attention brings an attention collapse issue, resulting in the failure to the network training.

Computational Efficiency • Decoder +2

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

2 code implementations • 18 Feb 2024 • Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo

Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization and text question-answering.

Question Answering • Text Summarization

I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization

1 code implementation • 16 Nov 2023 • Yunshan Zhong, Jiawei Hu, Mengzhao Chen, Rongrong Ji

Despite the scalable performance of vision transformers (ViTs), their dense computational costs (training & inference) undermine their position in industrial applications.

Quantization

SMMix: Self-Motivated Image Mixing for Vision Transformers

1 code implementation • ICCV 2023 • Mengzhao Chen, Mingbao Lin, Zhihang Lin, Yuxin Zhang, Fei Chao, Rongrong Ji

Thanks to the careful design of the self-motivated paradigm, our SMMix achieves a smaller training overhead and better performance than other CutMix variants.
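For reference, a generic CutMix-style mixing sketch, since SMMix is positioned against CutMix variants: a box from one image is pasted into another and the labels are mixed by area. SMMix's self-motivated region and label design is not reproduced here.

```python
import torch

def cutmix(images, labels, lam=0.7):
    """images: (B, C, H, W); labels: (B, num_classes) one-hot."""
    B, _, H, W = images.shape
    perm = torch.randperm(B)
    cut_h, cut_w = int(H * (1 - lam) ** 0.5), int(W * (1 - lam) ** 0.5)
    y = torch.randint(0, H - cut_h + 1, (1,)).item()
    x = torch.randint(0, W - cut_w + 1, (1,)).item()
    mixed = images.clone()
    mixed[:, :, y:y + cut_h, x:x + cut_w] = images[perm, :, y:y + cut_h, x:x + cut_w]
    area = (cut_h * cut_w) / (H * W)                 # fraction of pixels replaced
    mixed_labels = (1 - area) * labels + area * labels[perm]
    return mixed, mixed_labels
```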

Super Vision Transformer

1 code implementation • 23 May 2022 • Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao

Experimental results on ImageNet demonstrate that our SuperViT can considerably reduce the computational costs of ViT models while even increasing performance.

CF-ViT: A General Coarse-to-Fine Method for Vision Transformer

1 code implementation • 8 Mar 2022 • Mengzhao Chen, Mingbao Lin, Ke Li, Yunhang Shen, Yongjian Wu, Fei Chao, Rongrong Ji

Our proposed CF-ViT is motivated by two important observations in modern ViT models: (1) The coarse-grained patch splitting can locate informative regions of an input image.
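A hedged sketch of the coarse-to-fine idea: run inference on coarse patches first, and only re-process at fine patch granularity when the coarse prediction is not confident. The `model(image, patch_size=...)` interface is hypothetical, and CF-ViT's actual region selection and feature reuse are more involved.

```python
import torch

def coarse_to_fine_infer(model, image, threshold=0.9):
    # Coarse stage: fewer, larger patches (hypothetical patch_size argument).
    logits = model(image, patch_size=28)
    probs = torch.softmax(logits, dim=-1)
    if probs.max() >= threshold:        # confident enough: stop early
        return logits
    # Fine stage: re-run with fine-grained patches.
    return model(image, patch_size=14)
```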

OptG: Optimizing Gradient-driven Criteria in Network Sparsity

1 code implementation • 30 Jan 2022 • Yuxin Zhang, Mingbao Lin, Mengzhao Chen, Fei Chao, Rongrong Ji

We prove that supermask training accumulates the gradient-driven sparsity criteria of both removed and preserved weights, and that it can partly resolve the independence paradox.
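A hedged sketch of supermask training in general, the setting the statement above analyzes: learnable per-weight scores, a top-k binary mask, and a straight-through gradient back to the scores. OptG's specific criterion is not reproduced.

```python
import torch

class SupermaskLinear(torch.nn.Linear):
    def __init__(self, in_f, out_f, sparsity=0.5):
        super().__init__(in_f, out_f)
        self.scores = torch.nn.Parameter(torch.randn_like(self.weight) * 0.01)
        self.sparsity = sparsity

    def forward(self, x):
        k = int(self.weight.numel() * (1 - self.sparsity))   # weights to keep
        threshold = self.scores.flatten().topk(k).values.min()
        hard = (self.scores >= threshold).float()
        # Straight-through: forward uses the hard mask, gradients flow to scores.
        mask = hard + self.scores - self.scores.detach()
        return torch.nn.functional.linear(x, self.weight * mask, self.bias)
```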

Fine-grained Data Distribution Alignment for Post-Training Quantization

1 code implementation • 9 Sep 2021 • Yunshan Zhong, Mingbao Lin, Mengzhao Chen, Ke Li, Yunhang Shen, Fei Chao, Yongjian Wu, Rongrong Ji

While post-training quantization owes its popularity mostly to avoiding access to the complete original training dataset, its poor performance also stems from the scarcity of available images.

Quantization
