Search Results for author: Xiawu Zheng

Found 55 papers, 32 papers with code

VITA: Towards Open-Source Interactive Omni Multimodal LLM

no code implementations 9 Aug 2024 Chaoyou Fu, Haojia Lin, Zuwei Long, Yunhang Shen, Meng Zhao, Yifan Zhang, Shaoqi Dong, Xiong Wang, Di Yin, Long Ma, Xiawu Zheng, Ran He, Rongrong Ji, Yunsheng Wu, Caifeng Shan, Xing Sun

The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas.

Language Modelling Large Language Model +2

Multi-branch Collaborative Learning Network for 3D Visual Grounding

1 code implementation 7 Jul 2024 Zhipeng Qian, Yiwei Ma, Zhekai Lin, Jiayi Ji, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji

3D referring expression comprehension (3DREC) and segmentation (3DRES) have overlapping objectives, indicating their potential for collaboration.

3D visual grounding Referring Expression +1

Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion

1 code implementation 28 Jun 2024 Quanmin Liang, Zhilin Huang, Xiawu Zheng, Feidiao Yang, Jun Peng, Kai Huang, Yonghong Tian

FFM is designed for the fusion of contextual information within neighboring event streams, leveraging the coupling relationship between positive and negative events to alleviate the misleading of noises in the respective branches.

Object Recognition Super-Resolution +1

Local Manifold Learning for No-Reference Image Quality Assessment

no code implementations 27 Jun 2024 Timin Gao, Wensheng Pan, Yan Zhang, Sicheng Zhao, Shengchuan Zhang, Xiawu Zheng, Ke Li, Liujuan Cao, Rongrong Ji

This crop is then used to cluster other crops from the same image as the positive class, while crops from different images are treated as negative classes to increase inter-class distance.

Contrastive Learning NR-IQA

Depth-Guided Semi-Supervised Instance Segmentation

no code implementations 25 Jun 2024 Xin Chen, Jie Hu, Xiawu Zheng, Jianghang Lin, Liujuan Cao, Rongrong Ji

Additionally, to manage the variability of depth images during training, we introduce the Depth Controller.

Depth Estimation Instance Segmentation +2

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

no code implementations 14 Jun 2024 Chenyu Zhou, Mengdan Zhang, Peixian Chen, Chaoyou Fu, Yunhang Shen, Xiawu Zheng, Xing Sun, Rongrong Ji

In support of this task, we further craft a new VEGA dataset, tailored for the IITC task on scientific content, and devised a subtask, Image-Text Association (ITA), to refine image-text correlation skills.

Reading Comprehension

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

no code implementations 31 May 2024 Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei LI, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun

With Video-MME, we extensively evaluate various state-of-the-art MLLMs, including GPT-4 series and Gemini 1.5 Pro, as well as open-source image models like InternVL-Chat-V1.5 and video models like LLaVA-NeXT-Video.

Bilateral Event Mining and Complementary for Event Stream Super-Resolution

1 code implementation CVPR 2024 Zhilin Huang, Quanmin Liang, Yijie Yu, Chujun Qin, Xiawu Zheng, Kai Huang, Zikun Zhou, Wenming Yang

In this paper, we propose a bilateral event mining and complementary network (BMCNet) to fully leverage the potential of each event and capture the shared information to complement each other simultaneously.

Object Recognition Super-Resolution +1

GraCo: Granularity-Controllable Interactive Segmentation

no code implementations CVPR 2024 Yian Zhao, Kehan Li, Zesen Cheng, Pengchong Qiao, Xiawu Zheng, Rongrong Ji, Chang Liu, Li Yuan, Jie Chen

In this work, we introduce Granularity-Controllable Interactive Segmentation (GraCo), a novel approach that allows precise control of prediction granularity by introducing additional parameters to input.

Interactive Segmentation Segmentation

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

no code implementations 24 Apr 2024 Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji

This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models (MLLMs) and their cognitive capability.

Decision Making Logical Reasoning +1

Multi-Modal Prompt Learning on Blind Image Quality Assessment

1 code implementation 23 Apr 2024 Wensheng Pan, Timin Gao, Yan Zhang, Runze Hu, Xiawu Zheng, Enwei Zhang, Yuting Gao, Yutao Liu, Yunhang Shen, Ke Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.

Motion-aware Latent Diffusion Models for Video Frame Interpolation

no code implementations 21 Apr 2024 Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, YaoWei Wang, Wenming Yang

With the advancement of AIGC, video frame interpolation (VFI) has become a crucial component in existing video generation frameworks, attracting widespread research interest.

Motion Estimation Video Frame Interpolation +1

Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization

no code implementations 17 Apr 2024 Yongdong Luo, Haojia Lin, Xiawu Zheng, Yigeng Jiang, Fei Chao, Jie Hu, Guannan Jiang, Songan Zhang, Rongrong Ji

3D Visual Grounding (3DVG) and 3D Dense Captioning (3DDC) are two crucial tasks in various 3D applications, which require both shared and complementary information in localization and visual-language relationships.

3D dense captioning 3D visual grounding +1

AffineQuant: Affine Transformation Quantization for Large Language Models

1 code implementation 19 Mar 2024 Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Fei Chao, Rongrong Ji

Among these techniques, Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its noteworthy compression efficiency and cost-effectiveness in the context of training.

Quantization

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

1 code implementation 5 Mar 2024 Gen Luo, Yiyi Zhou, Yuxin Zhang, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji

Contrary to previous works, we study this problem from the perspective of image resolution, and reveal that a combination of low- and high-resolution visual features can effectively mitigate this shortcoming.

Visual Question Answering

EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs

1 code implementation 19 Feb 2024 Song Guo, Fan Wu, Lei Zhang, Xiawu Zheng, Shengchuan Zhang, Fei Chao, Yiyu Shi, Rongrong Ji

For instance, on the Wikitext2 dataset with LlamaV1-7B at 70% sparsity, our proposed EBFT achieves a perplexity of 16.88, surpassing the state-of-the-art DSnoT with a perplexity of 75.14.

Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation

1 code implementation 18 Jan 2024 Zesen Cheng, Kehan Li, Hao Li, Peng Jin, Chang Liu, Xiawu Zheng, Rongrong Ji, Jie Chen

To mold instance queries to follow Brownian bridge and accomplish alignment with class texts, we design Bridge-Text Alignment (BTA) to learn discriminative bridge-level representations of instances via contrastive objectives.

Instance Segmentation Semantic Segmentation +1

Binding-Adaptive Diffusion Models for Structure-Based Drug Design

1 code implementation 15 Jan 2024 Zhilin Huang, Ling Yang, Zaixi Zhang, Xiangxin Zhou, Yu Bao, Xiawu Zheng, Yuwei Yang, Yu Wang, Wenming Yang

Then the selected protein-ligand subcomplex is processed with SE(3)-equivariant neural networks, and transmitted back to each atom of the complex for augmenting the target-aware 3D molecule diffusion generation with binding interaction information.

Avg

RepAn: Enhanced Annealing through Re-parameterization

1 code implementation CVPR 2024 Xiang Fei, Xiawu Zheng, Yan Wang, Fei Chao, Chenglin Wu, Liujuan Cao

The simulated annealing algorithm aims to improve model convergence through multiple restarts of training.

Incremental Learning

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

1 code implementation CVPR 2024 Xinzi Cao, Xiawu Zheng, Guanhong Wang, Weijiang Yu, Yunhang Shen, Ke Li, Yutong Lu, Yonghong Tian

The LER optimizes the distribution of potential known class samples in unlabeled data thus ensuring the preservation of knowledge related to known categories while learning novel classes.

Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity

no code implementations 11 Dec 2023 Xudong Li, Timin Gao, Runze Hu, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Jingyuan Zheng, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Rongrong Ji

Specifically, QFM-IQM enhances the semantic noise distinguish capabilities by matching image pairs with similar quality scores but varying semantic features as adversarial semantic noise and adaptively adjusting the upstream task's features by reducing sensitivity to adversarial noise perturbation.

Contrastive Learning feature selection +1

Less is More: Learning Reference Knowledge Using No-Reference Image Quality Assessment

no code implementations 1 Dec 2023 Xudong Li, Jingyuan Zheng, Xiawu Zheng, Runze Hu, Enwei Zhang, Yuting Gao, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Yan Zhang, Rongrong Ji

Concretely, by innovatively introducing a novel feature distillation method in IQA, we propose a new framework to learn comparative knowledge from non-aligned reference images.

Inductive Bias NR-IQA

AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration

1 code implementation ICCV 2023 Lijiang Li, Huixia Li, Xiawu Zheng, Jie Wu, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan, Fei Chao, Rongrong Ji

Therefore, we propose to search the optimal time steps sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training.

Image Generation single-image-generation

DLIP: Distilling Language-Image Pre-training

no code implementations 24 Aug 2023 Huafeng Kuang, Jie Wu, Xiawu Zheng, Ming Li, Xuefeng Xiao, Rui Wang, Min Zheng, Rongrong Ji

Furthermore, DLIP succeeds in retaining more than 95% of the performance with 22.4% parameters and 24.8% FLOPs compared to the teacher model and accelerates inference speed by 2.7x.

Image Captioning Image-text Retrieval +5

A Unified Framework for 3D Point Cloud Visual Grounding

1 code implementation 23 Aug 2023 Haojia Lin, Yongdong Luo, Xiawu Zheng, Lijiang Li, Fei Chao, Taisong Jin, Donghao Luo, Yan Wang, Liujuan Cao, Rongrong Ji

This elaborate design enables 3DRefTR to achieve both well-performing 3DRES and 3DREC capacities with only a 6% additional latency compared to the original 3DREC model.

Referring Expression Referring Expression Comprehension +1

Knowledge Prompt-tuning for Sequential Recommendation

1 code implementation 14 Aug 2023 Jianyang Zhai, Xiawu Zheng, Chang-Dong Wang, Hui Li, Yonghong Tian

Pre-trained language models (PLMs) have demonstrated strong performance in sequential recommendation (SR), which are utilized to extract general knowledge.

General Knowledge Sequential Recommendation

Learning Sparse Neural Networks with Identity Layers

no code implementations 14 Jul 2023 Mingjian Ni, Guangyao Chen, Xiawu Zheng, Peixi Peng, Li Yuan, Yonghong Tian

Applying such theory, we propose a plug-and-play CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR, which utilizes CKA to reduce feature similarity between layers and increase network sparsity.

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

3 code implementations 23 Jun 2023 Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, Yunsheng Wu, Rongrong Ji

Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image.

Benchmarking Language Modelling +4

Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning

1 code implementation CVPR 2023 Yu Wang, Pengchong Qiao, Chang Liu, Guoli Song, Xiawu Zheng, Jie Chen

We argue that an overlooked problem of robust SSL is its corrupted information on semantic level, practically limiting the development of the field.

Data-Efficient Image Quality Assessment with Attention-Panel Decoder

1 code implementation 11 Apr 2023 Guanyi Qin, Runze Hu, Yutao Liu, Xiawu Zheng, Haotian Liu, Xiu Li, Yan Zhang

Blind Image Quality Assessment (BIQA) is a fundamental task in computer vision, which however remains unresolved due to the complex distortion conditions and diversified image contents.

Decoder

A Unified Framework for Soft Threshold Pruning

1 code implementation 25 Feb 2023 Yanqi Chen, Zhengyu Ma, Wei Fang, Xiawu Zheng, Zhaofei Yu, Yonghong Tian

In this work, we reformulate soft threshold pruning as an implicit optimization problem solved using the Iterative Shrinkage-Thresholding Algorithm (ISTA), a classic method from the fields of sparse recovery and compressed sensing.

Scheduling

Automatic Network Pruning via Hilbert-Schmidt Independence Criterion Lasso under Information Bottleneck Principle

1 code implementation ICCV 2023 Song Guo, Lei Zhang, Xiawu Zheng, Yan Wang, Yuchao Li, Fei Chao, Chenglin Wu, Shengchuan Zhang, Rongrong Ji

In this paper, we try to solve this problem by introducing a principled and unified framework based on Information Bottleneck (IB) theory, which further guides us to an automatic pruning approach.

Network Pruning

Meta Architecture for Point Cloud Analysis

1 code implementation CVPR 2023 Haojia Lin, Xiawu Zheng, Lijiang Li, Fei Chao, Shanshan Wang, Yan Wang, Yonghong Tian, Rongrong Ji

However, the lack of a unified framework to interpret those networks makes any systematic comparison, contrast, or analysis challenging, and practically limits healthy development of the field.

3D Semantic Segmentation

Iterative Data Refinement for Self-Supervised MR Image Reconstruction

no code implementations 24 Nov 2022 Xue Liu, Juan Zou, Xiawu Zheng, Cheng Li, Hairong Zheng, Shanshan Wang

Then, we design an effective self-supervised training data refinement method to reduce this data bias.

Image Reconstruction

DIGEST: Deeply supervIsed knowledGE tranSfer neTwork learning for brain tumor segmentation with incomplete multi-modal MRI scans

no code implementations 15 Nov 2022 Haoran Li, Cheng Li, Weijian Huang, Xiawu Zheng, Yan Xi, Shanshan Wang

In this work, we propose a Deeply supervIsed knowledGE tranSfer neTwork (DIGEST), which achieves accurate brain tumor segmentation under different modality-missing scenarios.

Brain Tumor Segmentation Image Segmentation +3

Adaptive PromptNet For Auxiliary Glioma Diagnosis without Contrast-Enhanced MRI

no code implementations 15 Nov 2022 Yeqi Wang, Weijian Huang, Cheng Li, Xiawu Zheng, Yusong Lin, Shanshan Wang

Multi-contrast magnetic resonance imaging (MRI)-based automatic auxiliary glioma diagnosis plays an important role in the clinic.

Training-free Transformer Architecture Search

1 code implementation CVPR 2022 Qinqin Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Xing Sun, Yonghong Tian, Jie Chen, Rongrong Ji

Recently, Vision Transformer (ViT) has achieved remarkable success in several computer vision tasks.

Diversity

Neural Architecture Search With Representation Mutual Information

1 code implementation CVPR 2022 Xiawu Zheng, Xiang Fei, Lei Zhang, Chenglin Wu, Fei Chao, Jianzhuang Liu, Wei Zeng, Yonghong Tian, Rongrong Ji

Building upon RMI, we further propose a new search algorithm termed RMI-NAS, facilitating with a theorem to guarantee the global optimal of the searched architecture.

Neural Architecture Search

OMPQ: Orthogonal Mixed Precision Quantization

1 code implementation 16 Sep 2021 Yuexiao Ma, Taisong Jin, Xiawu Zheng, Yan Wang, Huixia Li, Yongjian Wu, Guannan Jiang, Wei zhang, Rongrong Ji

Instead of solving a problem of the original integer programming, we propose to optimize a proxy metric, the concept of network orthogonality, which is highly correlated with the loss of the integer programming but also easy to optimize with linear programming.

AutoML Quantization

An Information Theory-inspired Strategy for Automatic Network Pruning

1 code implementation 19 Aug 2021 Xiawu Zheng, Yuexiao Ma, Teng Xi, Gang Zhang, Errui Ding, Yuchao Li, Jie Chen, Yonghong Tian, Rongrong Ji

This practically limits the application of model compression when the model needs to be deployed on a wide range of devices.

AutoML Model Compression +1

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient

1 code implementation 4 Jun 2021 Shaokun Zhang, Xiawu Zheng, Chenyi Yang, Yuchao Li, Yan Wang, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, Rongrong Ji

Motivated by the necessity of efficient inference across various constraints on BERT, we propose a novel approach, YOCO-BERT, to achieve compress once and deploy everywhere.

AutoML Model Compression

On Evolving Attention Towards Domain Adaptation

no code implementations 25 Mar 2021 Kekai Sheng, Ke Li, Xiawu Zheng, Jian Liang, WeiMing Dong, Feiyue Huang, Rongrong Ji, Xing Sun

However, considering that the configuration of attention, i.e., the type and the position of attention module, affects the performance significantly, it is more generalized to optimize the attention configuration automatically to be specialized for arbitrary UDA scenario.

Partial Domain Adaptation Unsupervised Domain Adaptation

EC-DARTS: Inducing Equalized and Consistent Optimization Into DARTS

no code implementations ICCV 2021 Qinqin Zhou, Xiawu Zheng, Liujuan Cao, Bineng Zhong, Teng Xi, Gang Zhang, Errui Ding, Mingliang Xu, Rongrong Ji

EC-DARTS decouples different operations based on their categories to optimize the operation weights so that the operation gap between them is shrunk.

PAMS: Quantized Super-Resolution via Parameterized Max Scale

1 code implementation ECCV 2020 Huixia Li, Chenqian Yan, Shaohui Lin, Xiawu Zheng, Yuchao Li, Baochang Zhang, Fan Yang, Rongrong Ji

Specifically, most state-of-the-art SR models without batch normalization have a large dynamic quantization range, which also serves as another cause of performance drop.

Quantization Super-Resolution +1

Binarized Neural Architecture Search for Efficient Object Recognition

no code implementations 8 Sep 2020 Hanlin Chen, Li'an Zhuo, Baochang Zhang, Xiawu Zheng, Jianzhuang Liu, Rongrong Ji, David Doermann, Guodong Guo

In this paper, binarized neural architecture search (BNAS), with a search space of binarized convolutions, is introduced to produce extremely compressed models to reduce huge computational cost on embedded devices for edge computing.

Edge-computing Face Recognition +3

Rethinking Performance Estimation in Neural Architecture Search

1 code implementation CVPR 2020 Xiawu Zheng, Rongrong Ji, Qiang Wang, Qixiang Ye, Zhenguo Li, Yonghong Tian, Qi Tian

In this paper, we provide a novel yet systematic rethinking of PE in a resource constrained regime, termed budgeted PE (BPE), which precisely and effectively estimates the performance of an architecture sampled from an architecture space.

Neural Architecture Search

Binarized Neural Architecture Search

no code implementations 25 Nov 2019 Hanlin Chen, Li'an Zhuo, Baochang Zhang, Xiawu Zheng, Jianzhuang Liu, David Doermann, Rongrong Ji

A variant, binarized neural architecture search (BNAS), with a search space of binarized convolutions, can produce extremely compressed models.

Neural Architecture Search

DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning

1 code implementation 28 May 2019 Xiawu Zheng, Chenyi Yang, Shaokun Zhang, Yan Wang, Baochang Zhang, Yongjian Wu, Yunsheng Wu, Ling Shao, Rongrong Ji

With the proposed efficient network generation method, we directly obtain the optimal neural architectures on given constraints, which is practical for on-device models across diverse search spaces and constraints.

Neural Architecture Search

Multinomial Distribution Learning for Effective Neural Architecture Search

1 code implementation ICCV 2019 Xiawu Zheng, Rongrong Ji, Lang Tang, Baochang Zhang, Jianzhuang Liu, Qi Tian

Therefore, NAS can be transformed to a multinomial distribution learning problem, i.e., the distribution is optimized to have a high expectation of the performance.

Neural Architecture Search
