Search Results for author: Rongyu Zhang

Found 24 papers, 7 papers with code

SpikeGen: Generative Framework for Visual Spike Stream Processing

no code implementations • 23 May 2025 • Gaole Dai, Menghang Dong, Rongyu Zhang, Ruichuan An, Shanghang Zhang, Tiejun Huang

These models not only facilitate the conditional fusion of existing information from both spike and RGB modalities but also enable the conditional generation based on latent priors.

Deblurring, Novel View Synthesis, +1

NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment

no code implementations • 22 May 2025 • Shuhao Han, Haotian Fan, Fangyuan Kong, Wenjie Liao, Chunle Guo, Chongyi Li, Radu Timofte, Liang Li, Tao Li, Junhui Cui, Yunqiu Wang, Yang Tai, Jingwei Sun, Jianhui Sun, Xinli Yue, Tianyi Wang, Huan Hou, Junda Lu, Xinyang Huang, Zitang Zhou, Zijian Zhang, Xuhui Zheng, Xuecheng Wu, Chong Peng, Xuezhi Cao, Trong-Hieu Nguyen-Mau, Minh-Hoang Le, Minh-Khoa Le-Phan, Duy-Nam Ly, Hai-Dang Nguyen, Minh-Triet Tran, Yukang Lin, Yan Hong, Chuanbiao Song, Siyuan Li, Jun Lan, Zhichao Zhang, Xinyue Li, Wei Sun, ZiCheng Zhang, Yunhao Li, Xiaohong Liu, Guangtao Zhai, Zitong Xu, Huiyu Duan, Jiarui Wang, Guangji Ma, Liu Yang, Lu Liu, Qiang Hu, Xiongkuo Min, Zichuan Wang, Zhenchen Tang, Bo Peng, Jing Dong, Fengbin Guan, Zihao Yu, Yiting Lu, Wei Luo, Xin Li, Minhao Lin, Haofeng Chen, Xuanxuan He, Kele Xu, Qisheng Xu, Zijian Gao, Tianjiao Wan, Bo-Cheng Qiu, Chih-Chung Hsu, Chia-Ming Lee, Yu-Fan Lin, Bo Yu, Zehao Wang, Da Mu, Mingxiu Chen, Junkang Fang, Huamei Sun, Wending Zhao, Zhiyu Wang, Wang Liu, Weikang Yu, Puhong Duan, Bin Sun, Xudong Kang, Shutao Li, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Jiarong He, Zhishan Qiao, Yongqing Huang, Zewen Chen, Zhe Pang, Juan Wang, Jian Guo, Zhizhuo Shao, Ziyu Feng, Bing Li, Weiming Hu, Hesong Li, Dehua Liu, Zeming Liu, Qingsong Xie, Ruichen Wang, Zhihao LI, Yuqi Liang, Jianqi Bi, Jun Luo, Junfeng Yang, Can Li, Jing Fu, Hongwei Xu, Mingrui Long, Lulin Tang

A total of 211 participants have registered in the structure track.

Image Restoration, Text to Image Generation, +1

FBQuant: FeedBack Quantization for Large Language Models

no code implementations • 25 Jan 2025 • Yijiang Liu, Hengyu Fang, Liulu He, Rongyu Zhang, Yichuan Bai, Yuan Du, Li Du

Deploying Large Language Models (LLMs) on edge devices is increasingly important, as it eliminates reliance on network connections, reduces expensive API calls, and enhances user privacy.

Quantization

EVA: An Embodied World Model for Future Video Anticipation

no code implementations • 20 Oct 2024 • Xiaowei Chi, Chun-Kai Fan, Hengyuan Zhang, Xingqun Qi, Rongyu Zhang, Anthony Chen, Chi-Min Chan, Wei Xue, Qifeng Liu, Shanghang Zhang, Yike Guo

It leverages the complementary strengths of pre-trained vision-language and video generation models, enabling them to function as a world model in embodied scenarios.

Language Modeling, Language Modelling, +5

PAT: Pruning-Aware Tuning for Large Language Models

2 code implementations • 27 Aug 2024 • Yijiang Liu, Huanrui Yang, Youxin Chen, Rongyu Zhang, Miao Wang, Yuan Du, Li Du

Since model fine-tuning refines the general and chaotic knowledge in pre-trained models, we aim to incorporate structural pruning into fine-tuning, and propose the Pruning-Aware Tuning (PAT) paradigm to eliminate model redundancy while preserving model performance to the maximum extent.

FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models

1 code implementation • 15 Aug 2024 • Zhongyu Zhao, Menghang Dong, Rongyu Zhang, Wenzhao Zheng, Yunpeng Zhang, Huanrui Yang, Dalong Du, Kurt Keutzer, Shanghang Zhang

Recent research has demonstrated that Feed-Forward Networks (FFNs) in Large Language Models (LLMs) play a pivotal role in storing diverse linguistic and factual knowledge.

Mixture-of-Experts

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

1 code implementation • 30 Jul 2024 • Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo

To fill this gap, we present MMTrail, a large-scale multi-modality video-language dataset incorporating more than 20M trailer clips with visual captions, and 2M high-quality clips with multimodal captions.

Audio Generation, Image to Video Generation, +2

Implicit Neural Image Field for Biological Microscopy Image Compression

1 code implementation • 29 May 2024 • Gaole Dai, Cheng-Ching Tseng, Qingpo Wuwu, Rongyu Zhang, Shaokang Wang, Ming Lu, Tiejun Huang, Yu Zhou, Ali Ata Tuz, Matthias Gunzer, Jianxu Chen, Shanghang Zhang

The rapid pace of innovation in biological microscopy imaging has led to large images, putting pressure on data storage and impeding efficient sharing, management, and visualization.

Image Compression, Management

Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

1 code implementation • 26 May 2024 • Rongyu Zhang, Aosong Cheng, Yulin Luo, Gaole Dai, Huanrui Yang, Jiaming Liu, Ran Xu, Li Du, Yuan Du, Yanbing Jiang, Shanghang Zhang

Continual Test-Time Adaptation (CTTA), which aims to adapt the pre-trained model to ever-evolving target domains, emerges as an important task for vision models.

feature selection, Mixture-of-Experts, +1

Multi-level Personalized Federated Learning on Heterogeneous and Long-Tailed Data

no code implementations • 10 May 2024 • Rongyu Zhang, Yun Chen, Chenrui Wu, Fangxin Wang, Bo Li

Federated learning (FL) offers a privacy-centric distributed learning framework, enabling model training on individual clients and central aggregation without necessitating data exchange.

Autonomous Vehicles, image-classification, +3

Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning

no code implementations • 13 Apr 2024 • Yijiang Liu, Rongyu Zhang, Huanrui Yang, Kurt Keutzer, Yuan Du, Li Du, Shanghang Zhang

Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications, ranging from content generation and interactive entertainment to artistic creation.

Diversity, Mixture-of-Experts

M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation

1 code implementation • 29 Nov 2023 • Xiaowei Chi, Rongyu Zhang, Zhengkai Jiang, Yijiang Liu, Yatian Wang, Xingqun Qi, Wenhan Luo, Peng Gao, Shanghang Zhang, Qifeng Liu, Yike Guo

Moreover, to further enhance the effectiveness of $M^{3}Adapter$ while preserving the coherence of semantic context comprehension, we introduce a two-stage $M^{3}FT$ fine-tuning strategy.

Image Generation, Language Modelling, +1

Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning with Hierarchical Aggregation

no code implementations • 27 Mar 2023 • Rongyu Zhang, Xiaowei Chi, Guiliang Liu, Wenyi Zhang, Yuan Du, Fangxin Wang

Multimodal learning has seen great success mining data features from multiple modalities with remarkable model performance improvement.

Decoder, Federated Learning, +1

BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks

no code implementations • CVPR 2023 • Xiaowei Chi, Jiaming Liu, Ming Lu, Rongyu Zhang, Zhaoqing Wang, Yandong Guo, Shanghang Zhang

In order to find them, we further propose a LiDAR-guided sampling strategy to leverage the statistical distribution of LiDAR to determine the heights of local slices.

3D Object Detection, Autonomous Driving, +1

Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-world

no code implementations • CVPR 2023 • Yulu Gan, Mingjie Pan, Rongyu Zhang, Zijian Ling, Lingran Zhao, Jiaming Liu, Shanghang Zhang

To enable the device model to deal with changing environments, we propose a new learning paradigm of Cloud-Device Collaborative Continual Adaptation, which encourages collaboration between cloud and device and improves the generalization of the device model.

Device-Cloud Collaboration, object-detection, +3

BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection

no code implementations • 30 Nov 2022 • Jiaming Liu, Rongyu Zhang, Xiaoqi Li, Xiaowei Chi, Zehui Chen, Ming Lu, Yandong Guo, Shanghang Zhang

In this paper, we propose a Multi-space Alignment Teacher-Student (MATS) framework to ease the domain shift accumulation, which consists of a Depth-Aware Teacher (DAT) and a Geometric-space Aligned Student (GAS) model.

3D Object Detection, Autonomous Driving, +4

Multi-Frames Temporal Abnormal Clues Learning Method for Face Anti-Spoofing

no code implementations • 8 Aug 2022 • Heng Cong, Rongyu Zhang, Jiarong He, Jin Gao

Face anti-spoofing research is widely used in face recognition and has received increasing attention from industry and academia.

Face Anti-Spoofing, Face Recognition
