Search Results for author: Shengen Yan

Found 19 papers, 11 papers with code

DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation

no code implementations17 Feb 2025 Zhihang Yuan, Siyuan Wang, Rui Xie, Hanling Zhang, Tongcheng Fang, Yuzhang Shang, Shengen Yan, Guohao Dai, Yu Wang

In this paper, we propose the Dynamic Latent Frame Rate VAE (DLFR-VAE), a training-free paradigm that applies adaptive temporal compression in the latent space.

Video Generation
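
A minimal sketch of the adaptive-temporal-compression idea described above, not taken from the paper (no code is released for this entry): the inter-frame difference used as the content-change signal, the chunk size, and the two candidate frame rates are all assumptions for illustration.

```python
import torch

def choose_latent_frame_rate(frames, chunk_size=8, motion_threshold=0.05):
    """Toy dynamic frame-rate selection: static chunks get fewer latent frames.

    frames: (T, C, H, W) video in [0, 1].
    Returns, per chunk, how many latent frames to keep (1 for near-static
    chunks, chunk_size // 2 otherwise); the thresholds are arbitrary here.
    """
    rates = []
    for start in range(0, frames.shape[0], chunk_size):
        chunk = frames[start:start + chunk_size]
        if chunk.shape[0] < 2:
            rates.append(chunk.shape[0])
            continue
        motion = (chunk[1:] - chunk[:-1]).abs().mean()  # crude content-change proxy
        rates.append(1 if motion < motion_threshold else chunk.shape[0] // 2)
    return rates

video = torch.rand(32, 3, 64, 64)
print(choose_latent_frame_rate(video))
```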

FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models

1 code implementation30 Dec 2024 Tianyu Fu, Tengxuan Liu, Qinghao Han, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

Leveraging the unique properties of similarity over importance, we introduce FrameFusion, a novel approach that combines similarity-based merging with importance-based pruning for better token reduction in LVLMs.

Question Answering Token Reduction +1
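
A minimal sketch of the two-stage idea from the abstract above, not the released FrameFusion code: the cosine-similarity merge across same-position tokens in adjacent frames, the fixed threshold, and the generic importance scores are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def reduce_video_tokens(tokens, importance, sim_threshold=0.9, keep_ratio=0.5):
    """Toy two-stage reduction: merge near-duplicate tokens, then prune by importance.

    tokens:     (T, N, D) visual tokens for T frames, N tokens per frame
    importance: (T, N)    per-token importance scores (e.g. attention received)
    Returns a flat (M, D) tensor of surviving tokens.
    """
    tokens = tokens.clone()
    T, N, _ = tokens.shape
    keep = torch.ones(T, N, dtype=torch.bool)

    # Stage 1: similarity-based merging. A token nearly identical to the token
    # at the same spatial position in the previous frame is averaged into it.
    for t in range(1, T):
        sim = F.cosine_similarity(tokens[t - 1], tokens[t], dim=-1)  # (N,)
        dup = sim > sim_threshold
        tokens[t - 1][dup] = 0.5 * (tokens[t - 1][dup] + tokens[t][dup])
        keep[t][dup] = False

    # Stage 2: importance-based pruning of whatever survived the merge.
    flat_tokens, flat_scores = tokens[keep], importance[keep]
    n_keep = max(1, int(keep_ratio * flat_tokens.shape[0]))
    return flat_tokens[flat_scores.topk(n_keep).indices]

toks, imp = torch.randn(8, 16, 64), torch.rand(8, 16)
print(reduce_video_tokens(toks, imp).shape)
```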

MBQ: Modality-Balanced Quantization for Large Vision-Language Models

1 code implementation27 Dec 2024 Shiyao Li, Yingchun Hu, Xuefei Ning, Xihui Liu, Ke Hong, Xiaotao Jia, Xiuhong Li, Yaqi Yan, Pei Ran, Guohao Dai, Shengen Yan, Huazhong Yang, Yu Wang

Therefore, treating tokens from different modalities equally, as in existing PTQ methods, may over-emphasize the insensitive modalities, leading to significant accuracy loss.

Quantization
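
One way to picture balancing modalities during calibration is to weight the quantization error on text and vision tokens differently when searching a scale. The sketch below is not the paper's algorithm; the per-tensor scale, the grid search, and the lam_text/lam_vis weights are assumptions for illustration.

```python
import torch

def search_scale_modality_balanced(w, x_text, x_vis, lam_text=1.0, lam_vis=0.2,
                                   n_bits=4, grid=20):
    """Toy per-tensor scale search for weight-only quantization of one linear layer.

    Instead of weighting all calibration tokens equally, the output error on
    text tokens and vision tokens is weighted separately (lam_text, lam_vis),
    reflecting their different sensitivity to quantization.
    """
    qmax = 2 ** (n_bits - 1) - 1
    base = w.abs().max() / qmax
    best_scale, best_err = base, float("inf")
    for r in torch.linspace(0.5, 1.0, grid):
        scale = base * r
        w_q = (w / scale).round().clamp(-qmax - 1, qmax) * scale
        err = (lam_text * ((x_text @ (w - w_q).T) ** 2).mean()
               + lam_vis * ((x_vis @ (w - w_q).T) ** 2).mean())
        if err < best_err:
            best_err, best_scale = err.item(), scale
    return best_scale

w = torch.randn(256, 128)
x_text, x_vis = torch.randn(512, 128), torch.randn(512, 128)
print(search_scale_modality_balanced(w, x_text, x_vis))
```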

E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling

no code implementations18 Dec 2024 Zhihang Yuan, Yuzhang Shang, Hanling Zhang, Tongcheng Fang, Rui Xie, Bingxin Xu, Yan Yan, Shengen Yan, Guohao Dai, Yu Wang

Our approach not only enhances computational efficiency but also aligns naturally with image generation principles by operating in continuous token space and following a hierarchical generation process from coarse to fine details.

Computational Efficiency Denoising +1
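
A minimal sketch of a coarse-to-fine loop over a continuous latent, assuming bilinear upsampling between stages and identity placeholders for the learned per-stage refiners; it only illustrates the hierarchical structure named in the abstract, not the paper's model (no code is released for this entry).

```python
import torch
import torch.nn.functional as F

def coarse_to_fine_generate(stages, base_res=8, channels=4):
    """Toy multistage coarse-to-fine generation in a continuous latent space.

    `stages` holds one callable per resolution; each takes the upsampled latent
    from the previous stage and returns a refined latent at that resolution.
    The identity lambdas below stand in for learned refinement models.
    """
    latent = torch.randn(1, channels, base_res, base_res)  # coarsest stage starts from noise
    for i, stage in enumerate(stages):
        if i > 0:  # move to the next, finer resolution
            latent = F.interpolate(latent, scale_factor=2, mode="bilinear",
                                   align_corners=False)
        latent = stage(latent)  # refine details at the current scale
    return latent

stages = [lambda z: z for _ in range(3)]       # placeholders for learned refiners
print(coarse_to_fine_generate(stages).shape)   # torch.Size([1, 4, 32, 32])
```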

CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios

1 code implementation16 Sep 2024 Luning Wang, Shiyao Li, Xuefei Ning, Zhihang Yuan, Shengen Yan, Guohao Dai, Yu Wang

Therefore, we introduce CSKV, a training-efficient Channel Shrinking technique for KV cache compression: (1) We first analyze the singular value distribution of the KV cache, revealing significant redundancy and compression potential along the channel dimension.
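
A minimal sketch of channel shrinking via a truncated SVD of calibration key states; the rank, the synthetic low-rank data, and applying a single fixed projection at inference are assumptions made for illustration, not the released CSKV implementation.

```python
import torch

def build_channel_shrink(k_cache, rank):
    """Toy channel-dimension compression of a key cache via truncated SVD.

    k_cache: (num_tokens, head_dim) key states collected on a calibration set.
    Returns (down, up) so that k @ down is the shrunk cache with `rank` channels
    and (k @ down) @ up approximately reconstructs the original head_dim.
    """
    _, s, vh = torch.linalg.svd(k_cache, full_matrices=False)  # s shows the channel redundancy
    down, up = vh[:rank].T, vh[:rank]
    return down, up, s

# Synthetic key states with low channel rank, mimicking the redundancy reported above.
k = torch.randn(4096, 32) @ torch.randn(32, 128)
down, up, s = build_channel_shrink(k, rank=32)
k_small = k @ down  # stored cache is 4x smaller along the channel dimension
print("relative reconstruction error:", ((k_small @ up - k).norm() / k.norm()).item())
```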

Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

1 code implementation1 Jul 2024 Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

For example, we demonstrate that pruning up to 75% of experts in Mixtral 8×7B-Instruct results in a substantial reduction in parameters with minimal performance loss.
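
As a rough illustration only: one simple criterion for expert pruning is to keep the experts that receive the most routing mass on a calibration set. The usage-frequency heuristic below is an assumption for the sketch and may differ from the paper's actual pruning criterion.

```python
import torch

def prune_experts_by_usage(router_logits, num_keep):
    """Toy expert pruning: keep the experts that receive the most routing mass.

    router_logits: (num_tokens, num_experts) gating logits from a calibration set.
    Returns the indices of the experts to keep; the rest would be removed from
    the MoE layer and the router re-normalized over the survivors.
    """
    probs = router_logits.softmax(dim=-1)
    usage = probs.sum(dim=0)                       # total routing mass per expert
    return usage.topk(num_keep).indices.sort().values

logits = torch.randn(10_000, 8)                    # e.g. a Mixtral-style 8-expert layer
print(prune_experts_by_usage(logits, num_keep=2))  # keep 2 of 8 experts (75% pruned)
```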

MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

1 code implementation21 Jun 2024 Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, Tianqi Wu, Hongyi Wang, Zixiao Huang, Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

Existing methods typically employ a uniform sparse attention mask, applying the same sparse pattern across different attention heads and input lengths.

Language Modeling Language Modelling +3
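
A minimal sketch of the contrast with a uniform mask: heterogeneous causal sliding windows, one width per head. The specific window sizes and the pure sliding-window form are assumptions for illustration, not the MoA search procedure.

```python
import torch

def heterogeneous_window_masks(seq_len, window_sizes):
    """Toy heterogeneous sparse attention: one causal sliding window per head.

    window_sizes: one window width per attention head, so different heads keep
    different amounts of local context instead of sharing a uniform pattern.
    Returns a (num_heads, seq_len, seq_len) boolean mask (True = attend).
    """
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    causal = j <= i
    return torch.stack([causal & ((i - j) < w) for w in window_sizes])

masks = heterogeneous_window_masks(seq_len=16, window_sizes=[4, 8, 16])
print(masks.float().mean(dim=(1, 2)))  # heads with wider windows attend more densely
```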

DiTFastAttn: Attention Compression for Diffusion Transformer Models

no code implementations12 Jun 2024 Zhihang Yuan, Hanling Zhang, Pu Lu, Xuefei Ning, Linfeng Zhang, Tianchen Zhao, Shengen Yan, Guohao Dai, Yu Wang

Diffusion Transformers (DiT) excel at image and video generation but face computational challenges due to the quadratic complexity of self-attention operators.

2k Image Generation +1

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

1 code implementation4 Jun 2024 Tianchen Zhao, Tongcheng Fang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

Diffusion transformers (DiTs) have exhibited remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions.

Quantization Video Generation

Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better

1 code implementation2 Apr 2024 Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Sergey Yekhanin, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

For example, LCSC achieves better performance with one function evaluation (NFE) than the base model with two NFEs on consistency distillation, and decreases the NFE of diffusion models from 15 to 9 while maintaining generation quality on CIFAR-10.
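
The core operation named in the title, a linear combination of saved checkpoints, can be sketched directly; the fixed coefficients below are placeholders for the ones the method searches for.

```python
import torch

def combine_checkpoints(state_dicts, coeffs):
    """Toy linear combination of checkpoints saved along one training trajectory.

    Every parameter tensor in the result is the weighted sum of the corresponding
    tensors in the input checkpoints; the weights are fixed here, whereas the
    paper searches for them.
    """
    assert len(state_dicts) == len(coeffs)
    return {name: sum(c * sd[name].float() for c, sd in zip(coeffs, state_dicts))
            for name in state_dicts[0]}

# Example with a tiny model standing in for a consistency/diffusion model.
model = torch.nn.Linear(4, 4)
ckpts = [{k: v + 0.01 * i for k, v in model.state_dict().items()} for i in range(3)]
model.load_state_dict(combine_checkpoints(ckpts, [0.2, 0.3, 0.5]))
```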

Evaluating Quantized Large Language Models

1 code implementation28 Feb 2024 Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

Specifically, PTQ can effectively reduce memory consumption and computational overhead in LLMs.

Mamba Quantization
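
A minimal sketch of the kind of weight-only PTQ being evaluated, assuming simple round-to-nearest with per-group scales; the bit-width, group size, and memory arithmetic are illustrative, not the paper's evaluation setup.

```python
import torch

def quantize_weight_rtn(w, n_bits=4, group_size=128):
    """Toy round-to-nearest weight-only PTQ with per-group scales.

    w: (out_features, in_features) full-precision weight; in_features must be
    divisible by group_size.
    Returns integer codes, per-group scales, and the dequantized weight.
    """
    out_f, in_f = w.shape
    qmax = 2 ** (n_bits - 1) - 1
    wg = w.reshape(out_f, in_f // group_size, group_size)
    scale = wg.abs().amax(dim=-1, keepdim=True) / qmax
    codes = (wg / scale).round().clamp(-qmax - 1, qmax)
    w_dq = (codes * scale).reshape(out_f, in_f)
    return codes.to(torch.int8), scale, w_dq

w = torch.randn(4096, 4096)
codes, scale, w_dq = quantize_weight_rtn(w)
print("mean abs error:", (w - w_dq).abs().mean().item())
print("memory: fp16 %.0f MB -> 4-bit ~%.0f MB" % (w.numel() * 2 / 2**20, w.numel() * 0.5 / 2**20))
```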

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K

1 code implementation6 Feb 2024 Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang

In contrast, the average context lengths of mainstream benchmarks are insufficient (5k-21k), and they suffer from potential knowledge leakage and inaccurate metrics, resulting in biased evaluation.

16k Benchmarking

A Simulation Platform for Multi-tenant Machine Learning Services on Thousands of GPUs

no code implementations10 Jan 2022 Ruofan Liang, Bingsheng He, Shengen Yan, Peng Sun

Multi-tenant machine learning services have emerged as data-intensive workloads in data centers, making heavy use of GPU resources.

BIG-bench Machine Learning Scheduling

Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters

1 code implementation3 Sep 2021 Qinghao Hu, Peng Sun, Shengen Yan, Yonggang Wen, Tianwei Zhang

Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services in both the research community and industry.

Management Scheduling

Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes

1 code implementation19 Feb 2019 Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, Yonggang Wen

To address this problem, we propose a communication backend named GradientFlow for distributed DNN training, and employ a set of network optimization techniques.

Distributed, Parallel, and Cluster Computing

Deep Image: Scaling up Image Recognition

no code implementations13 Jan 2015 Ren Wu, Shengen Yan, Yi Shan, Qingqing Dang, Gang Sun

We present a state-of-the-art image recognition system, Deep Image, developed using end-to-end deep learning.

Data Augmentation Deep Learning
