Search Results for author: Yefei He

Found 18 papers, 10 papers with code

Neighboring Autoregressive Modeling for Efficient Visual Generation

1 code implementation 12 Mar 2025 Yefei He, Yuanyu He, Shaoxuan He, Feng Chen, Hong Zhou, Kaipeng Zhang, Bohan Zhuang

In this paper, we propose Neighboring Autoregressive Modeling (NAR), a novel paradigm that formulates autoregressive visual generation as a progressive outpainting procedure, following a near-to-far "next-neighbor prediction" mechanism.

Text-to-Image Generation Video Generation

ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality

1 code implementation 5 Dec 2024 Yefei He, Feng Chen, Yuanyu He, Shaoxuan He, Hong Zhou, Kaipeng Zhang, Bohan Zhuang

By decoding multiple tokens simultaneously in a single forward pass, the number of forward passes required to generate an image is significantly reduced, resulting in a substantial improvement in generation efficiency.

Image Generation

ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification

no code implementations 11 Oct 2024 Yefei He, Feng Chen, Jing Liu, Wenqi Shao, Hong Zhou, Kaipeng Zhang, Bohan Zhuang

The efficiency of large vision-language models (LVLMs) is constrained by the computational bottleneck of the attention mechanism during the prefill phase and the memory bottleneck of fetching the key-value (KV) cache in the decoding phase, particularly in scenarios involving high-resolution images or videos.

MME Quantization +1

ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models

no code implementations 13 Jun 2024 Jing Liu, Ruihao Gong, Mingyang Zhang, Yefei He, Jianfei Cai, Bohan Zhuang

LLM development involves pre-training a foundation model on massive data, followed by fine-tuning on task-specific data to create specialized experts.

Code Generation domain classification +3

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models

no code implementations 23 May 2024 Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang

In this paper, we present a simple yet effective approach, called MiniCache, to compress the KV cache across layers from a novel depth perspective, significantly reducing the memory footprint for LLM inference.

Quantization

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification

1 code implementation 23 May 2024 Yefei He, Luoming Zhang, Weijia Wu, Jing Liu, Hong Zhou, Bohan Zhuang

In terms of efficiency, ZipCache also showcases a $37.3\%$ reduction in prefill-phase latency, a $56.9\%$ reduction in decoding-phase latency, and a $19.8\%$ reduction in GPU memory usage when evaluating the LLaMA3-8B model with an input length of $4096$.

GSM8K Quantization

DragAnything: Motion Control for Anything using Entity Representation

2 code implementations 12 Mar 2024 Weijia Wu, Zhuang Li, YuChao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang

We introduce DragAnything, which utilizes an entity representation to achieve motion control for any object in controllable video generation.

Object Video Generation

Towards Accurate Post-training Quantization for Reparameterized Models

1 code implementation 25 Feb 2024 Luoming Zhang, Yefei He, Wen Fei, Zhenyu Lou, Weijia Wu, YangWei Ying, Hong Zhou

Our framework outperforms previous methods by approximately 1% for 8-bit PTQ and 2% for 6-bit PTQ, showcasing its superior performance.

Quantization

Paragraph-to-Image Generation with Information-Enriched Diffusion Model

1 code implementation 24 Nov 2023 Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang

In this paper, we introduce an information-enriched diffusion model for the paragraph-to-image generation task, termed ParaDiffusion, which delves into the transference of the extensive semantic comprehension capabilities of large language models to the task of image generation.

Image Generation Language Modeling +2

Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM

no code implementations 7 Oct 2023 Luoming Zhang, Wen Fei, Weijia Wu, Yefei He, Zhenyu Lou, Hong Zhou

Fine-grained quantization has smaller quantization loss, consequently achieving superior performance.

Quantization

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

1 code implementation 5 Oct 2023 Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

In this paper, we introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.

Denoising Image Generation +2

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

1 code implementation NeurIPS 2023 Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen

To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.

Dataset Generation Decoder +7

PTQD: Accurate Post-Training Quantization for Diffusion Models

1 code implementation NeurIPS 2023 Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

To address these challenges, we propose a unified formulation for the quantization noise and diffusion perturbed noise in the quantized denoising process.

Denoising Image Generation +1

BiViT: Extremely Compressed Binary Vision Transformers

no code implementations ICCV 2023 Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.

Binarization object-detection +1

BiViT: Extremely Compressed Binary Vision Transformer

no code implementations 14 Nov 2022 Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.

Binarization object-detection +1

Binarizing by Classification: Is soft function really necessary?

no code implementations 16 May 2022 Yefei He, Luoming Zhang, Weijia Wu, Hong Zhou

Extensive experiments demonstrate that the proposed method yields surprising performance both in image classification and human pose estimation tasks.

Ranked #1 on Binarization on ImageNet (Top 1 Accuracy metric)

Binarization Binary Classification +3

Data-Free Quantization with Accurate Activation Clipping and Adaptive Batch Normalization

no code implementations 8 Apr 2022 Yefei He, Luoming Zhang, Weijia Wu, Hong Zhou

In this paper, we present a simple yet effective data-free quantization method with accurate activation clipping and adaptive batch normalization.

Data Free Quantization
