Search Results for author: Yunhang Shen

Found 48 papers, 32 papers with code

Enabling Deep Residual Networks for Weakly Supervised Object Detection

no code implementations ECCV 2020 Yunhang Shen, Rongrong Ji, Yan Wang, Zhiwei Chen, Feng Zheng, Feiyue Huang, Yunsheng Wu

Weakly supervised object detection (WSOD) has attracted extensive research attention due to its great flexibility in exploiting large-scale image-level annotations for detector training.

Object object-detection +1

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

1 code implementation 5 Dec 2024 Bo Tong, Bokai Lai, Yiyi Zhou, Gen Luo, Yunhang Shen, Ke Li, Xiaoshuai Sun, Rongrong Ji

Despite a big leap forward in capability, multimodal large language models (MLLMs) tend to behave like a sloth in practical use, i.e., slow response and high latency.

Descriptive

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification

1 code implementation 1 Dec 2024 Wenxuan Huang, Zijie Zhai, Yunhang Shen, Shaoshen Cao, Fei Zhao, Xiangfeng Xu, Zheyu Ye, Shaohui Lin

To address this problem, we propose Dynamic-LLaVA, a dynamic vision-language context sparsification framework that reduces the redundancy of the vision context in the prefill stage and decreases the memory and computation overhead of the generated language context during decoding.
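
As a rough illustration of what vision-context sparsification means (not Dynamic-LLaVA's actual predictor or schedule), the sketch below drops low-importance visual tokens before the prefill stage; the scoring function, token count, and keep ratio are all assumptions.

```python
import numpy as np

def sparsify_vision_tokens(vision_tokens, scores, keep_ratio=0.5):
    """Keep only the highest-scoring visual tokens before the prefill stage.

    vision_tokens: (N, D) array of visual token embeddings.
    scores:        (N,) importance score per token (e.g. attention mass).
    keep_ratio:    fraction of tokens to retain.
    """
    n_keep = max(1, int(len(scores) * keep_ratio))
    keep_idx = np.argsort(scores)[::-1][:n_keep]   # top-k tokens by score
    return vision_tokens[np.sort(keep_idx)]        # preserve original token order

# toy usage: 576 visual tokens of dim 64 with random importance scores
tokens = np.random.randn(576, 64)
scores = np.random.rand(576)
pruned = sparsify_vision_tokens(tokens, scores, keep_ratio=0.25)
print(pruned.shape)  # (144, 64)
```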

Scale Contrastive Learning with Selective Attentions for Blind Image Quality Assessment

no code implementations 13 Nov 2024 Zihao Huang, Xudong Li, Bohan Fu, Xiaohui Chu, Ke Li, Yunhang Shen, Yan Zhang

This paper addresses two primary challenges: the significant redundancy of information across different scales, and the confusion caused by combining features from these scales, which may vary widely in quality.

Blind Image Quality Assessment Contrastive Learning

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

no code implementations 1 Nov 2024 Xiong Wang, Yangze Li, Chaoyou Fu, Yunhang Shen, Lei Xie, Ke Li, Xing Sun, Long Ma

Our main contribution is that the speech input and output modalities can be easily connected to a textual LLM while keeping the LLM's parameters frozen throughout the training process.

VITA: Towards Open-Source Interactive Omni Multimodal LLM

1 code implementation 9 Aug 2024 Chaoyou Fu, Haojia Lin, Zuwei Long, Yunhang Shen, Meng Zhao, Yifan Zhang, Shaoqi Dong, Xiong Wang, Di Yin, Long Ma, Xiawu Zheng, Ran He, Rongrong Ji, Yunsheng Wu, Caifeng Shan, Xing Sun

The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas.

Language Modelling Large Language Model +2

HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

1 code implementation 27 Jun 2024 Liujuan Cao, Jianghang Lin, Zebo Hong, Yunhang Shen, Shaohui Lin, Chao Chen, Rongrong Ji

Most WSOD methods rely on traditional object proposals to generate candidate regions and are confronted with unstable training, which easily gets stuck in a poor local optimum.

Object object-detection +1

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

no code implementations 14 Jun 2024 Chenyu Zhou, Mengdan Zhang, Peixian Chen, Chaoyou Fu, Yunhang Shen, Xiawu Zheng, Xing Sun, Rongrong Ji

In support of this task, we further craft a new VEGA dataset, tailored for the IITC task on scientific content, and devise a subtask, Image-Text Association (ITA), to refine image-text correlation skills.

Reading Comprehension

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

1 code implementation 31 May 2024 Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei LI, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun

With Video-MME, we extensively evaluate various state-of-the-art MLLMs, including the GPT-4 series and Gemini 1.5 Pro, as well as open-source image models like InternVL-Chat-V1.5 and video models like LLaVA-NeXT-Video.

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

no code implementations 24 Apr 2024 Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji

This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models (MLLMs) and their cognitive capability.

Decision Making Logical Reasoning +1

Multi-Modal Prompt Learning on Blind Image Quality Assessment

1 code implementation 23 Apr 2024 Wensheng Pan, Timin Gao, Yan Zhang, Runze Hu, Xiawu Zheng, Enwei Zhang, Yuting Gao, Yutao Liu, Yunhang Shen, Ke Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.

Fusion-Mamba for Cross-modality Object Detection

no code implementations 14 Apr 2024 Wenhao Dong, Haodong Zhu, Shaohui Lin, Xiaoyan Luo, Yunhang Shen, Xuhui Liu, Juan Zhang, Guodong Guo, Baochang Zhang

In this paper, we investigate cross-modality fusion by associating cross-modal features in a hidden state space based on an improved Mamba with a gating mechanism.

Mamba Object +2
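
The paper's fusion operates in an improved Mamba hidden state space; the sketch below shows only the generic gating idea over two modality feature streams (e.g., RGB and infrared), with made-up shapes and weights, and is not the Fusion-Mamba architecture itself.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(feat_rgb, feat_ir, w_gate, b_gate):
    """Fuse RGB and infrared features with a learned gate.

    feat_rgb, feat_ir: (N, D) feature maps flattened to token form.
    w_gate:            (2*D, D) gate projection weights.
    b_gate:            (D,) gate bias.
    """
    gate = sigmoid(np.concatenate([feat_rgb, feat_ir], axis=-1) @ w_gate + b_gate)
    return gate * feat_rgb + (1.0 - gate) * feat_ir  # per-channel convex combination

# toy usage with random features and weights
rng = np.random.default_rng(0)
rgb, ir = rng.normal(size=(100, 32)), rng.normal(size=(100, 32))
w, b = rng.normal(size=(64, 32)) * 0.1, np.zeros(32)
fused = gated_fusion(rgb, ir, w, b)
print(fused.shape)  # (100, 32)
```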

A General and Efficient Training for Transformer via Token Expansion

1 code implementation CVPR 2024 Wenxuan Huang, Yunhang Shen, Jiao Xie, Baochang Zhang, Gaoqi He, Ke Li, Xing Sun, Shaohui Lin

The remarkable performance of Vision Transformers (ViTs) typically requires an extremely large training cost.

Rethinking Centered Kernel Alignment in Knowledge Distillation

1 code implementation 22 Jan 2024 Zikai Zhou, Yunhang Shen, Shitong Shao, Linrui Gong, Shaohui Lin

This paper first provides a theoretical perspective to illustrate the effectiveness of CKA, decoupling CKA into the upper bound of the Maximum Mean Discrepancy (MMD) and a constant term.

Image Classification Knowledge Distillation +2
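
For context, this is the standard linear CKA computation between student and teacher feature matrices; it is a minimal sketch and does not include the paper's MMD-based upper bound or its distillation loss.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two feature matrices.

    X: (n, d1) features of n samples from the student.
    Y: (n, d2) features of the same n samples from the teacher.
    """
    X = X - X.mean(axis=0, keepdims=True)   # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2
    norm_x = np.linalg.norm(X.T @ X, 'fro')
    norm_y = np.linalg.norm(Y.T @ Y, 'fro')
    return hsic / (norm_x * norm_y)

# identical representations give CKA = 1; unrelated noise gives a value near 0
A = np.random.randn(256, 64)
print(linear_cka(A, A))                         # ~1.0
print(linear_cka(A, np.random.randn(256, 32)))  # small
```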

Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization

no code implementations 13 Jan 2024 Mengtian Li, Shaohui Lin, Zihan Wang, Yunhang Shen, Baochang Zhang, Lizhuang Ma

Semi-supervised learning (SSL), thanks to the significant reduction of data annotation costs, has been an active research topic for large-scale 3D scene understanding.

Pseudo Label Representation Learning +2

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

1 code implementation CVPR 2024 Xinzi Cao, Xiawu Zheng, Guanhong Wang, Weijiang Yu, Yunhang Shen, Ke Li, Yutong Lu, Yonghong Tian

The LER optimizes the distribution of potential known-class samples in unlabeled data, thus ensuring the preservation of knowledge related to known categories while learning novel classes.

Weakly Supervised Open-Vocabulary Object Detection

no code implementations 19 Dec 2023 Jianghang Lin, Yunhang Shen, Bingquan Wang, Shaohui Lin, Ke Li, Liujuan Cao

Despite weakly supervised object detection (WSOD) being a promising step toward evading strong instance-level annotations, its capability is confined to closed-set categories within a single training dataset.

Attribute Novel Concepts +6

SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space

1 code implementation 13 Dec 2023 Yunchen Li, Zhou Yu, Gaoqi He, Yunhang Shen, Ke Li, Xing Sun, Shaohui Lin

On the other hand, the model unconditionally learns the probability distribution of the data $p(X)$ and generates samples that conform to this distribution.

Denoising Traffic Prediction

Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity

no code implementations 11 Dec 2023 Xudong Li, Timin Gao, Runze Hu, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Jingyuan Zheng, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Rongrong Ji

Specifically, QFM-IQM strengthens the ability to distinguish semantic noise by matching image pairs with similar quality scores but differing semantic features as adversarial semantic noise, and adaptively adjusts the upstream task's features by reducing their sensitivity to adversarial noise perturbation.

Contrastive Learning feature selection +1
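
A minimal sketch of the pair-mining idea described above: for each image, find counterparts with similar quality scores but dissimilar semantic features. The score tolerance, distance metric, and top-k value are illustrative assumptions, not QFM-IQM's actual settings.

```python
import numpy as np

def mine_semantic_noise_pairs(features, mos, score_tol=0.1, top_k=5):
    """For each image, return partners with a similar quality score (MOS) but
    dissimilar semantic features -- the pairs treated as adversarial semantic noise.

    features: (N, D) semantic features.
    mos:      (N,) subjective quality scores.
    """
    pairs = []
    for i in range(len(mos)):
        close_score = np.abs(mos - mos[i]) < score_tol   # similar quality
        close_score[i] = False
        if not close_score.any():
            continue
        cand = np.where(close_score)[0]
        dist = np.linalg.norm(features[cand] - features[i], axis=1)
        far = cand[np.argsort(dist)[::-1][:top_k]]       # most dissimilar semantics
        pairs.extend((i, int(j)) for j in far)
    return pairs

feats = np.random.randn(200, 128)
scores = np.random.rand(200) * 5.0
print(len(mine_semantic_noise_pairs(feats, scores)))
```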

Aligning and Prompting Everything All at Once for Universal Visual Perception

2 code implementations CVPR 2024 Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, Shaohui Lin, Rongrong Ji

However, predominant paradigms, driven by casting instance-level tasks as an object-word alignment, bring heavy cross-modality interaction, which is not effective in prompting object detection and visual grounding.

Object object-detection +6

Less is More: Learning Reference Knowledge Using No-Reference Image Quality Assessment

no code implementations 1 Dec 2023 Xudong Li, Jingyuan Zheng, Xiawu Zheng, Runze Hu, Enwei Zhang, Yuting Gao, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Yan Zhang, Rongrong Ji

Concretely, by introducing a novel feature distillation method into IQA, we propose a new framework to learn comparative knowledge from non-aligned reference images.

Inductive Bias NR-IQA

Woodpecker: Hallucination Correction for Multimodal Large Language Models

1 code implementation 24 Oct 2023 Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen

Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content.

Hallucination

Filter Pruning for Efficient CNNs via Knowledge-driven Differential Filter Sampler

1 code implementation 1 Jul 2023 Shaohui Lin, Wenxuan Huang, Jiao Xie, Baochang Zhang, Yunhang Shen, Zhou Yu, Jungong Han, David Doermann

In this paper, we propose a novel Knowledge-driven Differential Filter Sampler (KDFS) with Masked Filter Modeling (MFM) framework for filter pruning, which globally prunes the redundant filters based on the prior knowledge of a pre-trained model in a differential and non-alternative optimization.

Decoder Image Classification +1
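
KDFS learns its sampler end-to-end with Masked Filter Modeling; the snippet below is only a deliberately simplified, norm-based stand-in that shows what global (cross-layer) filter selection looks like.

```python
import numpy as np

def global_filter_prune(conv_weights, keep_ratio=0.5):
    """Globally rank filters across all layers by L1 norm and keep the top fraction.

    A simplified stand-in for KDFS's learned differential sampler, illustrating
    only the "global" (cross-layer) selection step.

    conv_weights: list of (out_ch, in_ch, k, k) arrays, one per conv layer.
    Returns a list of boolean keep-masks, one per layer.
    """
    scores = [np.abs(w).sum(axis=(1, 2, 3)) for w in conv_weights]  # one score per filter
    all_scores = np.concatenate(scores)
    threshold = np.quantile(all_scores, 1.0 - keep_ratio)           # global cut-off
    return [s >= threshold for s in scores]

layers = [np.random.randn(64, 3, 3, 3), np.random.randn(128, 64, 3, 3)]
masks = global_filter_prune(layers, keep_ratio=0.5)
print([int(m.sum()) for m in masks])  # filters kept in each layer
```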

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

3 code implementations 23 Jun 2023 Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, Yunsheng Wu, Rongrong Ji

A Multimodal Large Language Model (MLLM) relies on a powerful LLM to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image.

Benchmarking Language Modelling +4

Active Teacher for Semi-Supervised Object Detection

1 code implementation CVPR 2022 Peng Mi, Jianghang Lin, Yiyi Zhou, Yunhang Shen, Gen Luo, Xiaoshuai Sun, Liujuan Cao, Rongrong Fu, Qiang Xu, Rongrong Ji

In this paper, we study teacher-student learning from the perspective of data initialization and propose a novel algorithm called Active Teacher (source code is available at https://github.com/HunterJ-Lin/ActiveTeacher) for semi-supervised object detection (SSOD).

Diversity Object +3
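
Active Teacher scores unlabeled images along difficulty, information, and diversity; the sketch below substitutes a plain predictive-entropy criterion to show the general shape of such data selection, with hypothetical names and a made-up labeling budget.

```python
import numpy as np

def select_images_to_label(class_probs_per_image, budget=100):
    """Rank unlabeled images by the mean predictive entropy of their detections
    and pick the most uncertain ones for annotation.  This is a generic
    uncertainty criterion, not Active Teacher's exact metric.

    class_probs_per_image: list of (num_boxes, num_classes) softmax score arrays
                           predicted on unlabeled images.
    """
    def mean_entropy(probs):
        probs = np.clip(probs, 1e-12, 1.0)
        return float((-probs * np.log(probs)).sum(axis=1).mean()) if len(probs) else 0.0

    scores = np.array([mean_entropy(p) for p in class_probs_per_image])
    return np.argsort(scores)[::-1][:budget]   # indices of images to send for labeling

# toy usage: 500 images, each with 1-9 detected boxes over 20 classes
preds = [np.random.dirichlet(np.ones(20), size=np.random.randint(1, 10)) for _ in range(500)]
print(select_images_to_label(preds, budget=10))
```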

FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning

1 code implementation 1 Dec 2022 Yulei Qin, Xingyu Chen, Chao Chen, Yunhang Shen, Bo Ren, Yun Gu, Jie Yang, Chunhua Shen

Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between the web domain and the real-world domain.

Contrastive Learning Representation Learning

ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement

1 code implementation 25 Sep 2022 Dongli Tan, Jiang-Jiang Liu, Xingyu Chen, Chao Chen, Ruixin Zhang, Yunhang Shen, Shouhong Ding, Rongrong Ji

In this paper, we propose an efficient structure named Efficient Correspondence Transformer (ECO-TR) that finds correspondences in a coarse-to-fine manner, which significantly improves the efficiency of the functional correspondence model.

Outlier Detection

LAB-Net: LAB Color-Space Oriented Lightweight Network for Shadow Removal

1 code implementation 27 Aug 2022 Hong Yang, Gongrui Nan, Mingbao Lin, Fei Chao, Yunhang Shen, Ke Li, Rongrong Ji

Finally, the LSA modules are further developed to fully use the prior information in non-shadow regions to cleanse the shadow regions.

Shadow Removal

Efficient Decoder-free Object Detection with Transformers

2 code implementations 14 Jun 2022 Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li, Chunhua Shen

A natural usage of ViTs in detection is to replace the CNN-based backbone with a transformer-based backbone, which is straightforward and effective, at the price of a considerable computational burden for inference.

Decoder Object +1

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation

1 code implementation 2 Apr 2022 Jing He, Yiyi Zhou, Qi Zhang, Jun Peng, Yunhang Shen, Xiaoshuai Sun, Chao Chen, Rongrong Ji

Pixel synthesis is a promising research paradigm for image generation, which can well exploit pixel-wise prior knowledge for generation.

Image Generation regression

End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation

1 code implementation 1 Apr 2022 Mingrui Wu, Jiaxin Gu, Yunhang Shen, Mingbao Lin, Chao Chen, Xiaoshuai Sun

Extensive experiments on the HICO-Det dataset demonstrate that our model discovers potential interactive pairs and enables the recognition of unseen HOIs.

Human-Object Interaction Detection Knowledge Distillation +4

SeqTR: A Simple yet Universal Network for Visual Grounding

3 code implementations 30 Mar 2022 Chaoyang Zhu, Yiyi Zhou, Yunhang Shen, Gen Luo, Xingjia Pan, Mingbao Lin, Chao Chen, Liujuan Cao, Xiaoshuai Sun, Rongrong Ji

In this paper, we propose a simple yet universal network termed SeqTR for visual grounding tasks, e.g., phrase localization, referring expression comprehension (REC) and segmentation (RES).

Decoder Referring Expression +3

CF-ViT: A General Coarse-to-Fine Method for Vision Transformer

1 code implementation 8 Mar 2022 Mengzhao Chen, Mingbao Lin, Ke Li, Yunhang Shen, Yongjian Wu, Fei Chao, Rongrong Ji

Our proposed CF-ViT is motivated by two important observations in modern ViT models: (1) The coarse-grained patch splitting can locate informative regions of an input image.
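
A minimal sketch of coarse-to-fine patch splitting: coarse patches are scored and only the most informative ones are re-split at a finer granularity. Pixel variance is used here as a stand-in for CF-ViT's attention-based informativeness criterion, and all sizes and ratios are illustrative.

```python
import numpy as np

def coarse_to_fine_patches(image, coarse=32, fine=16, refine_ratio=0.25):
    """Split an image into coarse patches, then re-split the most "informative"
    ones into finer patches.  Informativeness here is just pixel variance.

    image: (H, W, C) array with H, W divisible by `coarse`.
    Returns (coarse_patches_kept, fine_patches).
    """
    H, W, _ = image.shape
    patches = [image[i:i + coarse, j:j + coarse]
               for i in range(0, H, coarse) for j in range(0, W, coarse)]
    var = np.array([p.var() for p in patches])
    n_refine = max(1, int(len(patches) * refine_ratio))
    refine_idx = set(np.argsort(var)[::-1][:n_refine])

    kept, fine_patches = [], []
    for idx, p in enumerate(patches):
        if idx in refine_idx:                      # informative: split finer
            fine_patches += [p[a:a + fine, b:b + fine]
                             for a in range(0, coarse, fine)
                             for b in range(0, coarse, fine)]
        else:
            kept.append(p)
    return kept, fine_patches

img = np.random.rand(224, 224, 3)
coarse_kept, fine = coarse_to_fine_patches(img)
print(len(coarse_kept), len(fine))   # 37 coarse patches kept, 48 fine patches
```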

Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks

1 code implementation 8 Mar 2022 Yunshan Zhong, Mingbao Lin, Xunchao Li, Ke Li, Yunhang Shen, Fei Chao, Yongjian Wu, Rongrong Ji

However, these methods suffer from severe performance degradation when quantizing the SR models to ultra-low precision (e.g., 2-bit and 3-bit) with the low-cost layer-wise quantizer.

Quantization Super-Resolution
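
For reference, this is the plain layer-wise min-max uniform quantizer whose accuracy collapses at ultra-low precision, i.e., the baseline behavior the paper addresses; DDTB's trainable dynamic bounds are not implemented here.

```python
import numpy as np

def uniform_quantize(x, n_bits=2):
    """Layer-wise min-max uniform quantization (the low-cost baseline);
    DDTB replaces the fixed min/max with trainable, dynamically adjusted bounds.
    """
    lo, hi = x.min(), x.max()
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale)   # integer code in [0, levels]
    return q * scale + lo            # de-quantized value

w = np.random.randn(1000)
for bits in (8, 3, 2):
    err = np.abs(uniform_quantize(w, bits) - w).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")   # error grows sharply at 2-3 bits
```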

HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization

1 code implementation CVPR 2022 Mengtian Li, Yuan Xie, Yunhang Shen, Bo Ke, Ruizhi Qiao, Bo Ren, Shaohui Lin, Lizhuang Ma

To address the huge labeling cost in large-scale point cloud semantic segmentation, we propose a novel hybrid contrastive regularization (HybridCR) framework in a weakly supervised setting, which obtains competitive performance compared to its fully supervised counterpart.

Diversity Semantic Segmentation +2

LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization

no code implementations 10 Dec 2021 Zhiwei Chen, Changan Wang, Yabiao Wang, Guannan Jiang, Yunhang Shen, Ying Tai, Chengjie Wang, Wei zhang, Liujuan Cao

In this paper, we propose a novel framework built upon the transformer, termed LCTR (Local Continuity TRansformer), which aims to enhance the local perception capability of global features among long-range feature dependencies.

Inductive Bias Object +1

Fine-grained Data Distribution Alignment for Post-Training Quantization

1 code implementation 9 Sep 2021 Yunshan Zhong, Mingbao Lin, Mengzhao Chen, Ke Li, Yunhang Shen, Fei Chao, Yongjian Wu, Rongrong Ji

While post-training quantization is popular largely because it avoids accessing the original complete training dataset, its poor performance also stems from the resulting scarcity of images.

Quantization

UWSOD: Toward Fully-Supervised-Level Capacity Weakly Supervised Object Detection

1 code implementation NeurIPS 2020 Yunhang Shen, Rongrong Ji, Zhiwei Chen, Yongjian Wu, Feiyue Huang

In this paper, we propose a unified WSOD framework, termed UWSOD, to develop a high-capacity general detection model with only image-level labels, which is self-contained and does not require external modules or additional supervision.

Object object-detection +2

Noise-Aware Fully Webly Supervised Object Detection

no code implementations CVPR 2020 Yunhang Shen, Rongrong Ji, Zhiwei Chen, Xiaopeng Hong, Feng Zheng, Jianzhuang Liu, Mingliang Xu, Qi Tian

We investigate the emerging task of learning object detectors with sole image-level labels on the web without requiring any other supervision like precise annotations or additional images from well-annotated benchmark datasets.

Object object-detection +1

Cyclic Guidance for Weakly Supervised Joint Detection and Segmentation

1 code implementation CVPR 2019 Yunhang Shen, Rongrong Ji, Yan Wang, Yongjian Wu, Liujuan Cao

In this paper, we join weakly supervised object detection and segmentation tasks with a multi-task learning scheme for the first time, which uses their respective failure patterns to complement each other's learning.

Image-level Supervised Instance Segmentation Multi-Task Learning +6
