Search Results for author: Yanning Zhang

Found 172 papers, 55 papers with code

Language Embedding Meets Dynamic Graph: A New Exploration for Neural Architecture Representation Learning

no code implementations 9 Jun 2025 Haizhao Jing, Haokui Zhang, Zhenhao Shang, Rong Xiao, Peng Wang, Yanning Zhang

Specifically, inspired by large language models (LLMs), we propose a language embedding framework where both neural architectures and hardware platform specifications are projected into a unified semantic space through tokenization and LLM processing, enabling zero-shot prediction across different hardware platforms for the first time.

Attribute Graph Representation Learning

Application of convolutional neural networks in image super-resolution

no code implementations 3 Jun 2025 Chunwei Tian, Mingjian Song, WangMeng Zuo, Bo Du, Yanning Zhang, Shichao Zhang

Owing to their strong learning abilities, convolutional neural networks (CNNs) have become mainstream methods for image super-resolution.

Image Super-Resolution

Adaptive Spatial Augmentation for Semi-supervised Semantic Segmentation

no code implementations 29 May 2025 Lingyan Ran, YaLi Li, Tao Zhuo, Shizhou Zhang, Yanning Zhang

In semi-supervised semantic segmentation (SSSS), data augmentation plays a crucial role in the weak-to-strong consistency regularization framework, as it enhances diversity and improves model generalization.

Data Augmentation Diversity +1

Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition

1 code implementation 9 May 2025 Congqi Cao, Peiheng Han, Yueran Zhang, Yating Yu, Qinyi Lv, Lingtong Min, Yanning Zhang

Large-scale pre-trained models have achieved remarkable success in language and image tasks, leading an increasing number of studies to explore the application of pre-trained image models, such as CLIP, in the domain of few-shot action recognition (FSAR).

cross-modal alignment Few-Shot action recognition +2

No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves

1 code implementation 5 May 2025 Dengyang Jiang, Mengmeng Wang, Liuzhuozheng Li, Lei Zhang, Haoyu Wang, Wei Wei, Guang Dai, Yanning Zhang, Jingdong Wang

Recent studies have demonstrated that learning a meaningful internal representation can both accelerate generative training and enhance the generation quality of diffusion transformers.

Image Generation Representation Learning

Vision and Intention Boost Large Language Model in Long-Term Action Anticipation

no code implementations 3 May 2025 Congqi Cao, Lanshu Hu, Yating Yu, Yanning Zhang

To tackle the limitations that single-modality methods face, we propose a novel Intention-Conditioned Vision-Language (ICVL) model that fully leverages the rich semantic information of visual data and the powerful reasoning capabilities of LLMs.

Action Anticipation In-Context Learning +4

FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement

1 code implementation 27 Apr 2025 Kangbiao Shi, Yixu Feng, Tao Hu, Yu Cao, Peng Wu, Yijin Liang, Yanning Zhang, Qingsen Yan

The advent of Deep Neural Networks (DNNs) has driven remarkable progress in low-light image enhancement (LLIE), with diverse architectures (e.g., CNNs and Transformers) and color spaces (e.g., sRGB, HSV, HVI) yielding impressive results.

Low-Light Image Enhancement

SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model

no code implementations 14 Apr 2025 Zongcan Ding, Haodong Zhang, Peng Wu, Guansong Pang, Zhiwei Yang, Peng Wang, Yanning Zhang

Extensive experiments on four benchmarks demonstrate that SlowFastVAD effectively combines the strengths of both fast and slow detectors, and achieves remarkable detection accuracy and interpretability with significantly reduced computational overhead, making it well-suited for real-world VAD applications with high reliability requirements.

Anomaly Detection Domain Adaptation +6

Boosting HDR Image Reconstruction via Semantic Knowledge Transfer

no code implementations 19 Mar 2025 Qingsen Yan, Tao Hu, Genggeng Chen, Wei Dong, Yanning Zhang

Recovering High Dynamic Range (HDR) images from multiple Low Dynamic Range (LDR) images becomes challenging when the LDR images exhibit noticeable degradation and missing content.

HDR Reconstruction Image Reconstruction +1

AxisPose: Model-Free Matching-Free Single-Shot 6D Object Pose Estimation via Axis Generation

no code implementations 9 Mar 2025 Yang Zou, Zhaoshuai Qi, Yating Liu, Zihao Xu, Weipeng Sun, Weiyi Liu, Xingyuan Li, Jiaqi Yang, Yanning Zhang

Specifically, AxisPose constructs an Axis Generation Module (AGM) to capture the latent geometric distribution of object axes through a diffusion model.

3D Feature Matching 6D Pose Estimation +4

HVI: A New color space for Low-light Image Enhancement

1 code implementation CVPR 2025 Qingsen Yan, Yixu Feng, Cheng Zhang, Guansong Pang, Kangbiao Shi, Peng Wu, Wei Dong, Jinqiu Sun, Yanning Zhang

Low-Light Image Enhancement (LLIE) is a crucial computer vision task that aims to restore detailed visual information from corrupted low-light images.

Low-Light Image Enhancement

C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation

1 code implementation 27 Feb 2025 Yuhao Li, Mirana Claire Angel, Salman Khan, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

Furthermore, we introduce a new video object interaction (VOI) dataset to evaluate the generation quality of motion controlled video generation methods.

Object Video Generation

Learning to Generalize without Bias for Open-Vocabulary Action Recognition

no code implementations 27 Feb 2025 Yating Yu, Congqi Cao, Yifan Zhang, Yanning Zhang

Leveraging the effective visual-text alignment and static generalizability from CLIP, recent video learners adopt CLIP initialization with further regularization or recombination for generalization in open-vocabulary action recognition in-context.

Meta-Learning Open Vocabulary Action Recognition

MoE$^2$: Optimizing Collaborative Inference for Edge Large Language Models

no code implementations 16 Jan 2025 Lyudong Jin, Yanning Zhang, Yanhan Li, Shurong Wang, Howard H. Yang, Jian Wu, Meng Zhang

Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing tasks.

Collaborative Inference

Efficient Decoupled Feature 3D Gaussian Splatting via Hierarchical Compression

no code implementations CVPR 2025 Zhenqi Dai, Ting Liu, Yanning Zhang

To mitigate this, we propose Decoupled Feature 3D Gaussian Splatting (DF-3DGS), a novel method that decouples the color and semantic fields, thereby reducing the number of 3D Gaussians required for semantic representation.

3DGS Quantization +1

Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding

1 code implementation CVPR 2025 Wei Suo, Lijun Zhang, Mengyang Sun, Lin Yuanbo Wu, Peng Wang, Yanning Zhang

Large Vision-Language Models (LVLMs) have obtained impressive performance in visual content understanding and multi-modal reasoning.

Hallucination

Low-Biased General Annotated Dataset Generation

no code implementations CVPR 2025 Dengyang Jiang, Haoyu Wang, Lei Zhang, Wei Wei, Guang Dai, Mengmeng Wang, Jingdong Wang, Yanning Zhang

Pre-training backbone networks on a general annotated dataset (e.g., ImageNet) that comprises numerous manually collected images with category annotations has proven to be indispensable for enhancing the generalization capacity of downstream visual tasks.

Dataset Generation Image Generation

Dual-Granularity Semantic Guided Sparse Routing Diffusion Model for General Pansharpening

1 code implementation CVPR 2025 Yinghui Xing, Litao Qu, Shizhou Zhang, Di Xu, Yingkun Yang, Yanning Zhang

To address the domain gap produced by varying satellite sensors and distinct scenes, we propose a dual-granularity semantic guided sparse routing diffusion model for general pansharpening.

Pansharpening

Revisiting Generative Replay for Class Incremental Object Detection

1 code implementation CVPR 2025 Shizhou Zhang, Xueqiang Lv, Yinghui Xing, Qirui Wu, Di Xu, Yanning Zhang

Furthermore, we propose to use a Similarity-based Cross Sampling mechanism to select more valuable confusing data between old and new tasks to more effectively mitigate catastrophic forgetting and reduce the false alarm rate for the new task.

class-incremental learning Class Incremental Learning +4

3D Registration in 30 Years: A Survey

1 code implementation 18 Dec 2024 Jiaqi Yang, Chu'ai Zhang, Zhengbao Wang, Xinyue Cao, Xuan Ouyang, Xiyu Zhang, Zhenxuan Zeng, Zhao Zeng, Borui Lu, Zhiyi Xia, Qian Zhang, Yulan Guo, Yanning Zhang

3D point cloud registration is a fundamental problem in computer vision, computer graphics, robotics, remote sensing, etc.

Point Cloud Registration Survey

Unbiased General Annotated Dataset Generation

no code implementations 14 Dec 2024 Dengyang Jiang, Haoyu Wang, Lei Zhang, Wei Wei, Guang Dai, Mengmeng Wang, Jingdong Wang, Yanning Zhang

Pre-training backbone networks on a general annotated dataset (e.g., ImageNet) that comprises numerous manually collected images with category annotations has proven to be indispensable for enhancing the generalization capacity of downstream visual tasks.

Dataset Generation Image Generation

Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models

no code implementations 9 Dec 2024 Wei Suo, Ji Ma, Mengyang Sun, Lin Yuanbo Wu, Peng Wang, Yanning Zhang

Although Large Vision-Language Models (LVLMs) have achieved impressive results, their high computational cost poses a significant barrier to wider application.

All Self-Supervised Learning

Hyperspectral Image Spectral-Spatial Feature Extraction via Tensor Principal Component Analysis

no code implementations 8 Dec 2024 Yuemei Ren, Liang Liao, Stephen John Maybank, Yanning Zhang, Xin Liu

This paper addresses the challenge of spectral-spatial feature extraction for hyperspectral image classification by introducing a novel tensor-based framework.

Hyperspectral image analysis Hyperspectral Image Classification +1

Sustainable Self-evolution Adversarial Training

no code implementations 3 Dec 2024 Wenxuan Wang, Chenglei Wang, Huihui Qi, Menghao Ye, Xuelin Qian, Peng Wang, Yanning Zhang

With the wide application of deep neural network models in various computer vision tasks, there has been a proliferation of adversarial example generation strategies aimed at deeply exploring model security.

Adversarial Defense Continual Learning

CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning

no code implementations 25 Nov 2024 Duo Wu, Jinghe Wang, Yuan Meng, Yanning Zhang, Le Sun, Zhi Wang

To push this paradigm toward practical applications, it is crucial for LLMs to consider tool execution costs (e.g., execution time) for tool planning.

Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution

1 code implementation 19 Nov 2024 Yang Zou, Zhixin Chen, Zhipeng Zhang, Xingyuan Li, Long Ma, JinYuan Liu, Peng Wang, Yanning Zhang

In this work, we emphasize the infrared spectral distribution fidelity and propose a Contourlet refinement gate framework to restore infrared modal-specific features while preserving spectral distribution fidelity.

Image Enhancement Image Super-Resolution +2

Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning

no code implementations 3 Nov 2024 Fei Zhou, Peng Wang, Lei Zhang, Zhenghua Chen, Wei Wei, Chen Ding, Guosheng Lin, Yanning Zhang

Meta-learning offers a promising avenue for few-shot learning (FSL), enabling models to glean a generalizable feature embedding through episodic training on synthetic FSL tasks in a source domain.

cross-domain few-shot learning

Day-Night Adaptation: An Innovative Source-free Adaptation Framework for Medical Image Segmentation

no code implementations 17 Oct 2024 Ziyang Chen, Yiwen Ye, Yongsheng Pan, Jingfeng Zhang, Yanning Zhang, Yong Xia

To facilitate adaptation while preserving data privacy, source-free domain adaptation (SFDA) and test-time adaptation (TTA) have emerged as effective paradigms, relying solely on target domain data.

Image Segmentation Medical Image Segmentation +4

UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation

1 code implementation 30 Sep 2024 Cheng Zhang, Dong Gong, Jiumei He, Yu Zhu, Jinqiu Sun, Yanning Zhang

Inspired by the success of deep generative models and fine-tuning techniques, we propose a universal image restoration framework based on multiple low-rank adapters (LoRA) from multi-domain transfer learning.

Multi-Task Learning Unified Image Restoration

Deep Learning for Video Anomaly Detection: A Review

no code implementations 9 Sep 2024 Peng Wu, Chengyu Pan, Yuting Yan, Guansong Pang, Peng Wang, Yanning Zhang

Video anomaly detection (VAD) aims to discover behaviors or events deviating from the normality in videos.

Anomaly Detection Deep Learning +1

Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach

2 code implementations 14 Aug 2024 Shizhou Zhang, Wenlong Luo, De Cheng, Qingchun Yang, Lingyan Ran, Yinghui Xing, Yanning Zhang

In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities.

Language Modeling Language Modelling +1

Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts

no code implementations 12 Aug 2024 Peng Wu, Xuerong Zhou, Guansong Pang, Zhiwei Yang, Qingsen Yan, Peng Wang, Yanning Zhang

Existing works typically involve extracting global features from full-resolution video frames and training frame-level classifiers to detect anomalies in the temporal dimension.

Anomaly Detection Event Detection +3

Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition

no code implementations 1 Aug 2024 Congqi Cao, Yueran Zhang, Yating Yu, Qinyi Lv, Lingtong Min, Yanning Zhang

Existing works in few-shot action recognition mostly fine-tune a pre-trained image model and design sophisticated temporal alignment modules at feature level.

Few-Shot action recognition Few Shot Action Recognition

A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap

1 code implementation 31 Jul 2024 Lijun Zhang, Wei Suo, Peng Wang, Yanning Zhang

On one hand, considering the crucial role of human-object pairs information in HOI tasks, the feature alignment module aligns the human-object pairs by aggregating instance information.

Human-Object Interaction Detection Image Reconstruction +3

Visual Prompt Selection for In-Context Learning Segmentation

1 code implementation 14 Jul 2024 Wei Suo, Lanqing Lai, Mengyang Sun, Hanwang Zhang, Peng Wang, Yanning Zhang

As a fundamental and extensively studied task in computer vision, image segmentation aims to locate and identify different semantic concepts at the pixel level.

Diversity Image Segmentation +3

Indoor 3D Reconstruction with an Unknown Camera-Projector Pair

no code implementations 2 Jul 2024 Zhaoshuai Qi, Yifeng Hao, Rui Hu, Wenyou Chang, Jiaqi Yang, Yanning Zhang

Structured-light methods that use a camera-projector pair (CPP) play a vital role in indoor 3D reconstruction, especially for scenes with weak textures.

3D Reconstruction

Visual Prompt Tuning in Null Space for Continual Learning

1 code implementation 9 Jun 2024 Yue Lu, Shizhou Zhang, De Cheng, Yinghui Xing, Nannan Wang, Peng Wang, Yanning Zhang

Existing prompt-tuning methods have demonstrated impressive performances in continual learning (CL), by selecting and updating relevant prompts in the vision-transformer models.

Continual Learning Visual Prompt Tuning

Multi-Granularity Language-Guided Multi-Object Tracking

1 code implementation 7 Jun 2024 Yuhao Li, Muzammal Naseer, Jiale Cao, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

To this end, we propose a new multi-object tracking framework, named LG-MOT, that explicitly leverages language information at different levels of granularity (scene- and instance-level) and combines it with standard visual features to obtain discriminative representations.

Multi-Object Tracking Object +1

RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision

no code implementations 29 May 2024 Jinzhong Wang, Xuetao Tian, Shun Dai, Tao Zhuo, Haorui Zeng, Hongjuan Liu, Jiaqi Liu, Xiuwei Zhang, Yanning Zhang

Multispectral object detection, utilizing both visible (RGB) and thermal infrared (T) modals, has garnered significant attention for its robust performance across diverse weather and lighting conditions.

Multispectral Object Detection Object +2

C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning

no code implementations 21 May 2024 Ji Ma, Wei Suo, Peng Wang, Yanning Zhang

Vision-Language Instruction Tuning (VLIT) is a critical training phase for Large Vision-Language Models (LVLMs).

Contrastive Learning

Dual-Modal Prompting for Sketch-Based Image Retrieval

no code implementations 29 Apr 2024 Liying Gao, Bingliang Jiao, Peng Wang, Shizhou Zhang, Hanwang Zhang, Yanning Zhang

In this study, we aim to tackle two major challenges of this task simultaneously: i) zero-shot, dealing with unseen categories, and ii) fine-grained, referring to intra-category instance-level retrieval.

Retrieval Sketch-Based Image Retrieval

CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task

1 code implementation 22 Apr 2024 Kangzhen Yang, Tao Hu, Kexin Dai, Genggeng Chen, Yu Cao, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

In real-world scenarios, images captured often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people usually can only obtain low dynamic range images.

Deblurring Denoising +2

NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

3 code implementations 22 Apr 2024 Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, Yu Zhu, Peng Wu, Wei Dong, Jinqiu Sun, Yanning Zhang, Qingsen Yan, Wenbin Zou, Weipeng Yang, Yunxiang Li, Qiaomu Wei, Tian Ye, Sixiang Chen, Zhao Zhang, Suiyi Zhao, Bo wang, Yan Luo, Zhichao Zuo, Mingshen Wang, Junhu Wang, Yanyan Wei, Xiaopeng Sun, Yu Gao, Jiancheng Huang, Hongming Chen, Xiang Chen, Hui Tang, Yuanbin Chen, Yuanbo Zhou, Xinwei Dai, Xintao Qiu, Wei Deng, Qinquan Gao, Tong Tong, Mingjia Li, Jin Hu, Xinyu He, Xiaojie Guo, sabarinathan, K Uma, A Sasithradevi, B Sathya Bama, S. Mohamed Mansoor Roomi, V. Srivatsav, Jinjuan Wang, Long Sun, Qiuying Chen, Jiahong Shao, Yizhi Zhang, Marcos V. Conde, Daniel Feijoo, Juan C. Benito, Alvaro García, Jaeho Lee, Seongwan Kim, Sharif S M A, Nodirkhuja Khujaev, Roman Tsoy, Ali Murtaza, Uswah Khairuddin, Ahmad 'Athif Mohd Faudzi, Sampada Malagi, Amogh Joshi, Nikhil Akalwadi, Chaitra Desai, Ramesh Ashok Tabib, Uma Mudenagudi, Wenyi Lian, Wenjing Lian, Jagadeesh Kalyanshetti, Vijayalaxmi Ashok Aralikatti, Palani Yashaswini, Nitish Upasi, Dikshit Hegde, Ujwala Patil, Sujata C, Xingzhuo Yan, Wei Hao, Minghan Fu, Pooja Choksy, Anjali Sarvaiya, Kishor Upla, Kiran Raja, Hailong Yan, Yunkai Zhang, Baiang Li, Jingyi Zhang, Huan Zheng

This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results.

4k Low-Light Image Enhancement +1

GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo

1 code implementation CVPR 2024 Jiang Wu, Rui Li, Haofei Xu, Wenxun Zhao, Yu Zhu, Jinqiu Sun, Yanning Zhang

More specifically, we correspond and propagate adjacent costs to the reference pixel by leveraging the local geometric smoothness in conjunction with surface normals.

3D Reconstruction

Generating Content for HDR Deghosting from Frequency View

no code implementations CVPR 2024 Tao Hu, Qingsen Yan, Yuankai Qi, Yanning Zhang

To address this challenge, we propose the Low-Frequency aware Diffusion (LF-Diff) model for ghost-free HDR imaging.

HDR Reconstruction regression

A self-supervised CNN for image watermark removal

1 code implementation 9 Mar 2024 Chunwei Tian, Menghua Zheng, Tiancai Jiao, WangMeng Zuo, Yanning Zhang, Chia-Wen Lin

Popular convolutional neural networks mainly use paired images in a supervised way for image watermark removal.

Perceptive self-supervised learning network for noisy image watermark removal

1 code implementation 4 Mar 2024 Chunwei Tian, Menghua Zheng, Bo Li, Yanning Zhang, Shichao Zhang, David Zhang

Specifically, the paired watermark images are obtained in a self-supervised way, and paired noisy images (i.e., noisy and reference images) are obtained in a supervised way.

Self-Supervised Learning

Semi-Supervised Semantic Segmentation Based on Pseudo-Labels: A Survey

no code implementations 4 Mar 2024 Lingyan Ran, YaLi Li, Guoqiang Liang, Yanning Zhang

Semantic segmentation is an important and popular research area in computer vision that focuses on classifying pixels in an image based on their semantics.

Image Segmentation Pseudo Label +3

You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement

1 code implementation 8 Feb 2024 Qingsen Yan, Yixu Feng, Cheng Zhang, Pei Wang, Peng Wu, Wei Dong, Jinqiu Sun, Yanning Zhang

Further, we design a novel Color and Intensity Decoupling Network (CIDNet) with two branches dedicated to processing the decoupled image brightness and color in the HVI space.

Low-light Image Deblurring and Enhancement Low-Light Image Enhancement

Instance by Instance: An Iterative Framework for Multi-instance 3D Registration

no code implementations 6 Feb 2024 Xinyue Cao, Xiyu Zhang, Yuxin Cheng, Zhaoshuai Qi, Yanning Zhang, Jiaqi Yang

Multi-instance registration is a challenging problem in computer vision and robotics, where multiple instances of an object need to be registered in a standard coordinate system.

Boosting Multi-view Stereo with Late Cost Aggregation

1 code implementation 22 Jan 2024 Jiang Wu, Rui Li, Yu Zhu, Wenxun Zhao, Jinqiu Sun, Yanning Zhang

To address this challenge, we present a late aggregation approach that allows for aggregating pairwise costs throughout the network feed-forward process, achieving accurate estimations with only minor changes of the plain CasMVSNet.

Blocking Geometric Matching

CrossDiff: Exploring Self-Supervised Representation of Pansharpening via Cross-Predictive Diffusion Model

no code implementations 10 Jan 2024 Yinghui Xing, Litao Qu, Shizhou Zhang, Kai Zhang, Yanning Zhang

Fusion of a panchromatic (PAN) image and corresponding multispectral (MS) image is also known as pansharpening, which aims to combine abundant spatial details of PAN and spectral information of MS. Due to the absence of high-resolution MS images, available deep-learning-based methods usually follow the paradigm of training at reduced resolution and testing at both reduced and full resolution.

Pansharpening

DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution

no code implementations 30 Nov 2023 Axi Niu, Kang Zhang, Joshua Tian Jin Tee, Trung X. Pham, Jinqiu Sun, Chang D. Yoo, In So Kweon, Yanning Zhang

It is well known that the adversarial optimization of GAN-based image super-resolution (SR) methods makes the preceding SR model generate unpleasant and undesirable artifacts, leading to large distortion.

Attribute Data Augmentation +1

Open-Vocabulary Video Anomaly Detection

no code implementations CVPR 2024 Peng Wu, Xuerong Zhou, Guansong Pang, Yujia Sun, Jing Liu, Peng Wang, Yanning Zhang

Particularly, we devise a semantic knowledge injection module to introduce semantic knowledge from large language models for the detection task, and design a novel anomaly synthesis module to generate pseudo unseen anomaly videos with the help of large vision generation models for the classification task.

Anomaly Detection Weakly-supervised Video Anomaly Detection

Multiple Object Tracking based on Occlusion-Aware Embedding Consistency Learning

no code implementations 5 Nov 2023 Yaoqi Hu, Axi Niu, Yu Zhu, Qingsen Yan, Jinqiu Sun, Yanning Zhang

The OPM predicts occlusion information for each true detection, facilitating the selection of valid samples for consistency learning of the track's visual embedding.

Multiple Object Tracking Object +1

Towards High-quality HDR Deghosting with Conditional Diffusion Models

no code implementations 2 Nov 2023 Qingsen Yan, Tao Hu, Yuan Sun, Hao Tang, Yu Zhu, Wei Dong, Luc van Gool, Yanning Zhang

To address this challenge, we formulate the HDR deghosting problem as an image generation task that leverages LDR features as the diffusion model's condition, consisting of the feature condition generator and the noise predictor.

Denoising Image Generation

Adapt Anything: Tailor Any Image Classifiers across Domains And Categories Using Text-to-Image Diffusion Models

no code implementations 25 Oct 2023 WeiJie Chen, Haoyu Wang, Shicai Yang, Lei Zhang, Wei Wei, Yanning Zhang, Luojun Lin, Di Xie, Yueting Zhuang

Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator as well as the corresponding unlabeled target data.

Domain Adaptation image-classification +1

A cross Transformer for image denoising

1 code implementation 16 Oct 2023 Chunwei Tian, Menghua Zheng, WangMeng Zuo, Shichao Zhang, Yanning Zhang, Chia-Wen Ling

To avoid loss of key information, PB uses three heterogeneous networks to implement multiple interactions of multi-level features to broadly search for extra information for improving the adaptability of an obtained denoiser for complex scenes.

Image Denoising

Human-centric Behavior Description in Videos: New Benchmark and Model

no code implementations 4 Oct 2023 Lingru Zhou, Yiqi Gao, Manqing Zhang, Peng Wu, Peng Wang, Yanning Zhang

To address this challenge, we construct a human-centric video surveillance captioning dataset, which provides detailed descriptions of the dynamic behaviors of 7,820 individuals.

Video Captioning

Ground-to-Aerial Person Search: Benchmark Dataset and Approach

1 code implementation 24 Aug 2023 Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, Yanning Zhang

In this work, we construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS, which contains 31,770 images with 260,559 annotated bounding boxes for 2,644 identities appearing in both UAV and ground surveillance cameras.

Knowledge Distillation Person Search

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

1 code implementation 22 Aug 2023 Peng Wu, Xuerong Zhou, Guansong Pang, Lingru Zhou, Qingsen Yan, Peng Wang, Yanning Zhang

With the benefit of dual branch, VadCLIP achieves both coarse-grained and fine-grained video anomaly detection by transferring pre-trained knowledge from CLIP to WSVAD task.

Anomaly Detection Binary Classification +1

Learning multi-domain feature relation for visible and Long-wave Infrared image patch matching

no code implementations 9 Aug 2023 Xiuwei Zhang, Yanping Li, Zhaoshuai Qi, Yi Sun, Yanning Zhang

Recently, learning-based algorithms have achieved promising performance on cross-spectral image patch matching, which, however, is still far from satisfactory for practical application.

Patch Matching Relation

Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization

1 code implementation 9 Aug 2023 Tianyu Liu, Peng Zhang, Wei Huang, Yufei zha, Tao You, Yanning Zhang

By decoupling the gradients of visual and audio modalities, the discriminative visual representations of sound sources can be learned with the designed Induction Vector in a bootstrap manner, which also enables the audio modality to be aligned with the visual modality consistently.

Contrastive Learning Sound Source Localization

All-in-one Multi-degradation Image Restoration Network via Hierarchical Degradation Representation

no code implementations 6 Aug 2023 Cheng Zhang, Yu Zhu, Qingsen Yan, Jinqiu Sun, Yanning Zhang

To address this issue, we propose a novel All-in-one Multi-degradation Image Restoration Network (AMIRNet) that can effectively capture and utilize accurate degradation representation for image restoration.

All Contrastive Learning +4

Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model

1 code implementation 24 Jul 2023 Peng Wu, Jing Liu, Xiangteng He, Yuxin Peng, Peng Wang, Yanning Zhang

In this context, we propose a novel task called Video Anomaly Retrieval (VAR), which aims to pragmatically retrieve relevant anomalous videos by cross-modalities, e.g., language descriptions and synchronous audios.

Anomaly Detection Retrieval +2

Pre-train, Adapt and Detect: Multi-Task Adapter Tuning for Camouflaged Object Detection

no code implementations 20 Jul 2023 Yinghui Xing, Dexuan Kong, Shizhou Zhang, Geng Chen, Lingyan Ran, Peng Wang, Yanning Zhang

Camouflaged object detection (COD), aiming to segment camouflaged objects which exhibit similar patterns with the background, is a challenging task.

Multi-Task Learning object-detection +1

VS-TransGRU: A Novel Transformer-GRU-based Framework Enhanced by Visual-Semantic Fusion for Egocentric Action Anticipation

no code implementations 8 Jul 2023 Congqi Cao, Ze Sun, Qinyi Lv, Lingtong Min, Yanning Zhang

Egocentric action anticipation is a challenging task that aims to make advanced predictions of future actions from current and historical observations in the first-person view.

Action Anticipation Decoder

ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution

no code implementations 3 Jul 2023 Axi Niu, Pham Xuan Trung, Kang Zhang, Jinqiu Sun, Yu Zhu, In So Kweon, Yanning Zhang

To speed up inference and further enhance the performance, our research revisits diffusion models in image super-resolution and proposes a straightforward yet significant diffusion model-based super-resolution method called ACDMSR (accelerated conditional diffusion model for image super-resolution).

Denoising Image Super-Resolution +1

Learning from Multi-Perception Features for Real-Word Image Super-resolution

no code implementations 26 May 2023 Axi Niu, Kang Zhang, Trung X. Pham, Pei Wang, Jinqiu Sun, In So Kweon, Yanning Zhang

Currently, there are two popular approaches for addressing real-world image super-resolution problems: degradation-estimation-based and blind-based methods.

Image Super-Resolution

A New Comprehensive Benchmark for Semi-supervised Video Anomaly Detection and Anticipation

no code implementations CVPR 2023 Congqi Cao, Yue Lu, Peng Wang, Yanning Zhang

At present, it is the largest semi-supervised VAD dataset, with the largest number of scenes and classes of anomalies and the longest duration, and it is the only one that considers scene-dependent anomalies.

Anomaly Detection Video Anomaly Detection

3D Registration with Maximal Cliques

1 code implementation CVPR 2023 Xiyu Zhang, Jiaqi Yang, Shikun Zhang, Yanning Zhang

The key insight is to loosen the previous maximum clique constraint, and mine more local consensus information in a graph for accurate pose hypotheses generation: 1) A compatibility graph is constructed to render the affinity relationship between initial correspondences.

Point Cloud Registration

Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes

1 code implementation CVPR 2023 Rui Li, Dong Gong, Wei Yin, Hao Chen, Yu Zhu, Kaixuan Wang, Xiaozhi Chen, Jinqiu Sun, Yanning Zhang

To let the geometric perception learned from multi-view cues in static areas propagate to the monocular representation in dynamic areas and let monocular cues enhance the representation of multi-view cost volume, we propose a cross-cue fusion (CCF) module, which includes the cross-cue attention (CCA) to encode the spatially non-local relative intra-relations from each source to enhance the representation of the other.

Autonomous Driving Depth Estimation

A Unified HDR Imaging Method with Pixel and Patch Level

no code implementations CVPR 2023 Qingsen Yan, Weiye Chen, Song Zhang, Yu Zhu, Jinqiu Sun, Yanning Zhang

The proposed HyHDRNet consists of a content alignment subnetwork and a Transformer-based fusion subnetwork.

MixCycle: Mixup Assisted Semi-Supervised 3D Single Object Tracking with Cycle Consistency

1 code implementation ICCV 2023 Qiao Wu, Jiaqi Yang, Kun Sun, Chu'ai Zhang, Yanning Zhang, Mathieu Salzmann

Specifically, we introduce two cycle-consistency strategies for supervision: 1) Self tracking cycles, which leverage labels to help the model converge better in the early stages of training; 2) forward-backward cycles, which strengthen the tracker's robustness to motion variations and the template noise caused by the template update strategy.

3D Single Object Tracking Data Augmentation +1

Co-Occurrence Matters: Learning Action Relation for Temporal Action Localization

no code implementations15 Mar 2023 Congqi Cao, Yizhe WANG, Yue Lu, Xin Zhang, Yanning Zhang

Existing works in this field mainly suffer from two weaknesses: (1) They often neglect the multi-label case and only focus on temporal modeling.

Relation Temporal Action Localization

PSNet: a deep learning model based digital phase shifting algorithm from a single fringe image

no code implementations14 Mar 2023 Zhaoshuai Qi, Xiaojun Liu, Xiaolin Liu, Jiaqi Yang, Yanning Zhang

As the gold standard for phase retrieval, phase-shifting algorithm (PS) has been widely used in optical interferometry, fringe projection profilometry, etc.

Retrieval

GRAN: Ghost Residual Attention Network for Single Image Super Resolution

no code implementations28 Feb 2023 Axi Niu, Pei Wang, Yu Zhu, Jinqiu Sun, Qingsen Yan, Yanning Zhang

GRAB consists of the Ghost Module and Channel and Spatial Attention Module (CSAM) to alleviate the generation of redundant features.

Image Super-Resolution

New Insights on Relieving Task-Recency Bias for Online Class Incremental Learning

1 code implementation16 Feb 2023 Guoqiang Liang, Zhaojie Chen, Zhaoqiang Chen, Shiyu Ji, Yanning Zhang

Among all settings, online class incremental learning (OCIL), where incoming samples from the data stream can be used only once, is more challenging and is encountered more frequently in the real world.

class-incremental learning Class Incremental Learning +2

Take a Prior from Other Tasks for Severe Blur Removal

no code implementations14 Feb 2023 Pei Wang, Danna Xue, Yu Zhu, Jinqiu Sun, Qingsen Yan, Sung-Eui Yoon, Yanning Zhang

For general scene deblurring, the feature space of the blurry image and corresponding sharp image under the high-level vision task is closer, which inspires us to rely on other tasks (e.g., classification) to learn a comprehensive prior in severe blur removal cases.

Deblurring Image Deblurring +1

MS-DETR: Multispectral Pedestrian Detection Transformer with Loosely Coupled Fusion and Modality-Balanced Optimization

1 code implementation1 Feb 2023 Yinghui Xing, Shuo Yang, Song Wang, Shizhou Zhang, Guoqiang Liang, Xiuwei Zhang, Yanning Zhang

Multispectral pedestrian detection is an important task for many around-the-clock applications, since the visible and thermal modalities can provide complementary information especially under low light conditions.

Decoder Pedestrian Detection

Revisiting Prototypical Network for Cross Domain Few-Shot Learning

1 code implementation CVPR 2023 Fei Zhou, Peng Wang, Lei Zhang, Wei Wei, Yanning Zhang

Prototypical Network is a popular few-shot solver that aims at establishing a feature metric generalizable to novel few-shot classification (FSC) tasks using deep neural networks.

cross-domain few-shot learning Knowledge Distillation

Weakly Supervised Video Anomaly Detection Based on Cross-Batch Clustering Guidance

no code implementations16 Dec 2022 Congqi Cao, Xin Zhang, Shizhou Zhang, Peng Wang, Yanning Zhang

To enhance the discriminative power of features, we propose a batch clustering based loss to encourage a clustering branch to generate distinct normal and abnormal clusters based on a batch of data.

Anomaly Detection Clustering +1

Generalizable Person Re-Identification via Viewpoint Alignment and Fusion

no code implementations5 Dec 2022 Bingliang Jiao, Lingqiao Liu, Liying Gao, Guosheng Lin, Ruiqi Wu, Shizhou Zhang, Peng Wang, Yanning Zhang

The key insight of this design is that the cross-attention mechanism in the transformer could be an ideal solution to align the discriminative texture clues from the original image with the canonical view image, which could compensate for the low-quality texture information of the canonical view image.

Domain Generalization Generalizable Person Re-identification +1

Multi-stage image denoising with the wavelet transform

1 code implementation26 Sep 2022 Chunwei Tian, Menghua Zheng, WangMeng Zuo, Bob Zhang, Yanning Zhang, David Zhang

In this paper, we propose a multi-stage image denoising CNN with the wavelet transform (MWDCNN) via three stages, i.e., a dynamic convolutional block (DCB), two cascaded wavelet transform and enhancement blocks (WEBs) and a residual block (RB).

Image Denoising

A heterogeneous group CNN for image super-resolution

1 code implementation26 Sep 2022 Chunwei Tian, Yanning Zhang, WangMeng Zuo, Chia-Wen Lin, David Zhang, Yixuan Yuan

To prevent loss of original information, a multi-level enhancement mechanism guides a CNN to achieve a symmetric architecture for promoting expressive ability of HGSRCNN.

Image Super-Resolution

Context Recovery and Knowledge Retrieval: A Novel Two-Stream Framework for Video Anomaly Detection

1 code implementation7 Sep 2022 Congqi Cao, Yue Lu, Yanning Zhang

For the context recovery stream, we propose a spatiotemporal U-Net which can fully utilize the motion information to predict the future frame.

Anomaly Detection Retrieval +1

Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model

1 code implementation17 Aug 2022 Yinghui Xing, Qirui Wu, De Cheng, Shizhou Zhang, Guoqiang Liang, Peng Wang, Yanning Zhang

To make the final image feature concentrate more on the target visual concept, a Class-Aware Visual Prompt Tuning (CAVPT) scheme is further proposed in our DPT, where the class-aware visual prompt is generated dynamically by performing the cross attention between text prompts features and image patch token embeddings to encode both the downstream task-related information and visual instance information.

General Knowledge Language Modelling +1

PC-GANs: Progressive Compensation Generative Adversarial Networks for Pan-sharpening

no code implementations29 Jul 2022 Yinghui Xing, Shuyuan Yang, Song Wang, Yan Zhang, Yanning Zhang

Most of the available deep learning-based pan-sharpening methods sharpen the multispectral images through a one-step scheme, which strongly depends on the reconstruction ability of the network.

Generative Adversarial Network Pansharpening

Pansharpening via Frequency-Aware Fusion Network with Explicit Similarity Constraints

1 code implementation18 Jul 2022 Yinghui Xing, Yan Zhang, Houjun He, Xiuwei Zhang, Yanning Zhang

The process of fusing a high spatial resolution (HR) panchromatic (PAN) image and a low spatial resolution (LR) multispectral (MS) image to obtain an HRMS image is known as pansharpening.

Pansharpening

SlimSeg: Slimmable Semantic Segmentation with Boundary Supervision

no code implementations13 Jul 2022 Danna Xue, Fei Yang, Pei Wang, Luis Herranz, Jinqiu Sun, Yu Zhu, Yanning Zhang

Accurate semantic segmentation models typically require significant computational resources, inhibiting their use in practical applications.

Knowledge Distillation Segmentation +1

Going the Extra Mile in Face Image Quality Assessment: A Novel Database and Model

no code implementations11 Jul 2022 Shaolin Su, Hanhe Lin, Vlad Hosu, Oliver Wiedemann, Jinqiu Sun, Yu Zhu, Hantao Liu, Yanning Zhang, Dietmar Saupe

An accurate computational model for image quality assessment (IQA) benefits many vision applications, such as image filtering, image processing, and image generation.

Face Image Quality Face Image Quality Assessment +4

NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

no code implementations25 May 2022 Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, Jin Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang, Javen Qinfeng Shi, Dong Gong, Dan Zhu, Mengdi Sun, Guannan Chen, Yang Hu, Haowei Li, Baozhu Zou, Zhen Liu, Wenjie Lin, Ting Jiang, Chengzhi Jiang, Xinpeng Li, Mingyan Han, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Juan Marín-Vega, Michael Sloth, Peter Schneider-Kamp, Richard Röttger, Chunyang Li, Long Bao, Gang He, Ziyao Xu, Li Xu, Gen Zhan, Ming Sun, Xing Wen, Junlin Li, Shuang Feng, Fei Lei, Rui Liu, Junxiang Ruan, Tianhong Dai, Wei Li, Zhan Lu, Hengyan Liu, Peian Huang, Guangyu Ren, Yonglin Luo, Chang Liu, Qiang Tu, Fangya Li, Ruipeng Gang, Chenghua Li, Jinjing Li, Sai Ma, Chenming Liu, Yizhen Cao, Steven Tel, Barthelemy Heyrman, Dominique Ginhac, Chul Lee, Gahyeon Kim, Seonghyun Park, An Gia Vien, Truong Thanh Nhat Mai, Howoon Yoon, Tu Vo, Alexander Holston, Sheir Zaheer, Chan Y. Park

The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e., solutions cannot exceed a given number of operations).

Image Restoration Vocal Bursts Intensity Prediction

Exploring and Evaluating Image Restoration Potential in Dynamic Scenes

1 code implementation CVPR 2022 Cheng Zhang, Shaolin Su, Yu Zhu, Qingsen Yan, Jinqiu Sun, Yanning Zhang

In this paper, to better study an image's potential value that can be explored for restoration, we propose a novel concept, referring to image restoration potential (IRP).

Image Restoration

An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection

no code implementations10 Mar 2022 Ganglai Wang, Peng Zhang, Lei Xie, Wei Huang, Yufei zha, Yanning Zhang

DeepFake-based digital facial forgery threatens public media security, and when lip manipulation is used in talking face generation, fake video detection becomes even more difficult.

Decision Making Face Detection +2

Audio-visual speech separation based on joint feature representation with cross-modal attention

no code implementations5 Mar 2022 Junwen Xiong, Peng Zhang, Lei Xie, Wei Huang, Yufei zha, Yanning Zhang

Multi-modal based speech separation has exhibited a specific advantage on isolating the target character in multi-talker noisy environments.

Optical Flow Estimation Speech Separation

Adaptive Graph Convolutional Networks for Weakly Supervised Anomaly Detection in Videos

no code implementations14 Feb 2022 Congqi Cao, Xin Zhang, Shizhou Zhang, Peng Wang, Yanning Zhang

For weakly supervised anomaly detection, most existing work is limited to the problem of inadequate video representation due to the inability of modeling long-term contextual information.

Graph Learning Supervised Anomaly Detection +1

Fast Adversarial Training with Noise Augmentation: A Unified Perspective on RandStart and GradAlign

no code implementations11 Feb 2022 Axi Niu, Kang Zhang, Chaoning Zhang, Chenshuang Zhang, In So Kweon, Chang D. Yoo, Yanning Zhang

RandStart works only for a relatively small perturbation of 8/255 under the l_\infty constraint, and GradAlign improves on it by extending the perturbation size to 16/255 (also under the l_\infty constraint), but at the cost of being 3 to 4 times slower.

Data Augmentation

Multi-Domain Joint Training for Person Re-Identification

no code implementations6 Jan 2022 Lu Yang, Lingqiao Liu, Yunlong Wang, Peng Wang, Yanning Zhang

Our discovery is that training with such an adaptive model can better benefit from more training samples.

Person Re-Identification

Learnable Locality-Sensitive Hashing for Video Anomaly Detection

1 code implementation15 Nov 2021 Yue Lu, Congqi Cao, Yanning Zhang

In this paper, we propose a novel distance-based VAD method to take advantage of all the available normal data efficiently and flexibly.

Abnormal Event Detection In Video Video Anomaly Detection

NAS-FCOS: Efficient Search for Object Detection Architectures

1 code implementation24 Oct 2021 Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, Yanning Zhang

Neural Architecture Search (NAS) has shown great potential in effectively reducing manual effort in network design by automatically discovering optimal architectures.

Neural Architecture Search Object +2

Text-based Person Search in Full Images via Semantic-Driven Proposal Generation

1 code implementation27 Sep 2021 Shizhou Zhang, De Cheng, Wenlong Luo, Yinghui Xing, Duo Long, Hao Li, Kai Niu, Guoqiang Liang, Yanning Zhang

Finding target persons in full scene images with a query of text description has important practical applications in intelligent video surveillance. However, different from the real-world scenarios where the bounding boxes are not available, existing text-based person retrieval methods mainly focus on the cross modal matching between the query text descriptions and the gallery of cropped pedestrian images.

Person Search Retrieval +3

Unsupervised Cross-Modal Distillation for Thermal Infrared Tracking

1 code implementation31 Jul 2021 Jingxian Sun, Lichao Zhang, Yufei zha, Abel Gonzalez-Garcia, Peng Zhang, Wei Huang, Yanning Zhang

To solve this problem, we propose to distill representations of the TIR modality from the RGB modality with Cross-Modal Distillation (CMD) on a large amount of unlabeled paired RGB-TIR data.

Transfer Learning

Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network

no code implementations24 May 2021 Guoqiang Liang, Yanbing Lv, Shucheng Li, Shizhou Zhang, Yanning Zhang

Specifically, the generator employs a fully convolutional sequence network to extract global representation of a video, and an attention-based network to output normalized importance scores.

Generative Adversarial Network Unsupervised Video Summarization

Center Prediction Loss for Re-identification

no code implementations30 Apr 2021 Lu Yang, Yunlong Wang, Lingqiao Liu, Peng Wang, Lu Chi, Zehuan Yuan, Changhu Wang, Yanning Zhang

In this paper, we propose a new loss based on center predictivity, that is, a sample must be positioned in a location of the feature space such that from it we can roughly predict the location of the center of same-class samples.

Prediction

Dynamic Image Restoration and Fusion Based on Dynamic Degradation

no code implementations26 Apr 2021 Aiqing Fang, Xinbo Zhao, Jiaqi Yang, Yanning Zhang

In addition, a dynamic degradation kernel is proposed to improve the robustness of image restoration and fusion.

Image Restoration

Efficient Spatialtemporal Context Modeling for Action Recognition

no code implementations20 Mar 2021 Congqi Cao, Yue Lu, Yifan Zhang, Dongmei Jiang, Yanning Zhang

Inspired by the 2D criss-cross attention used in segmentation tasks, we propose a recurrent 3D criss-cross attention (RCCA-3D) module to model the dense long-range spatiotemporal contextual information in video for action recognition.

Action Recognition Relation

Pluggable Weakly-Supervised Cross-View Learning for Accurate Vehicle Re-Identification

no code implementations9 Mar 2021 Lu Yang, Hongbang Liu, Jinghao Zhou, Lingqiao Liu, Lei Zhang, Peng Wang, Yanning Zhang

Learning cross-view consistent feature representation is the key for accurate vehicle Re-identification (ReID), since the visual appearance of vehicles changes significantly under different viewpoints.

Vehicle Re-Identification

Learning Depth via Leveraging Semantics: Self-supervised Monocular Depth Estimation with Both Implicit and Explicit Semantic Guidance

no code implementations11 Feb 2021 Rui Li, Xiantuo He, Danna Xue, Shaolin Su, Qing Mao, Yu Zhu, Jinqiu Sun, Yanning Zhang

While the mappings between image and pixel-wise depth are well-studied in current methods, the correlation between image, depth and scene semantics, however, is less considered.

Monocular Depth Estimation

Non-uniform Motion Deblurring with Blurry Component Divided Guidance

no code implementations15 Jan 2021 Pei Wang, Wei Sun, Qingsen Yan, Axi Niu, Rui Li, Yu Zhu, Jinqiu Sun, Yanning Zhang

To tackle the above problems, we present a deep two-branch network to deal with blurry images via a component divided module, which divides an image into two components based on the representation of blurry degree.

Decoder Image Deblurring +1

Towards Accurate Camouflaged Object Detection with Mixture Convolution and Interactive Fusion

no code implementations14 Jan 2021 Geng Chen, Xinrui Chen, Bo Dong, Mingchen Zhuge, Yongxiong Wang, Hongbo Bi, Jian Chen, Peng Wang, Yanning Zhang

Our method detects camouflaged objects with an effective fusion strategy, which aggregates the rich context information from a large receptive field.

object-detection Object Detection

Semantic-Guided Representation Enhancement for Self-supervised Monocular Trained Depth Estimation

no code implementations15 Dec 2020 Rui Li, Qing Mao, Pei Wang, Xiantuo He, Yu Zhu, Jinqiu Sun, Yanning Zhang

Based on this framework, we enhance the local feature representation by sampling and feeding the point-based features that locate on the semantic edges to an individual Semantic-guided Edge Enhancement module (SEEM), which is specifically designed for promoting depth estimation on the challenging semantic borders.

Depth Estimation Semantic Segmentation

Unsupervised Alternating Optimization for Blind Hyperspectral Imagery Super-resolution

no code implementations3 Dec 2020 Jiangtao Nie, Lei Zhang, Wei Wei, Zhiqiang Lang, Yanning Zhang

One of the main reasons is that the predefined degeneration models (e.g., blur in the spatial domain) utilized by most HSI SR methods often differ greatly from the real one, which causes these deep models to overfit and ultimately degrades their performance on real data.

Meta-Learning Super-Resolution

Meta-Generating Deep Attentive Metric for Few-shot Classification

no code implementations3 Dec 2020 Lei Zhang, Fei Zhou, Wei Wei, Yanning Zhang

To mitigate this problem, we present a novel deep metric meta-generation method that turns to an orthogonal direction, i.e., learning to adaptively generate a specific metric for a new FSL task based on the task description (e.g., a few labelled samples).

Classification Few-Shot Learning +1

On Efficient and Robust Metrics for RANSAC Hypotheses and 3D Rigid Registration

no code implementations10 Nov 2020 Jiaqi Yang, Zhiqiang Huang, Siwen Quan, Qian Zhang, Yanning Zhang, Zhiguo Cao

This paper focuses on developing efficient and robust evaluation metrics for RANSAC hypotheses to achieve accurate 3D rigid registration.

Few-shot Action Recognition with Implicit Temporal Alignment and Pair Similarity Optimization

no code implementations13 Oct 2020 Congqi Cao, Yajuan Li, Qinyi Lv, Peng Wang, Yanning Zhang

Few-shot learning aims to recognize instances from novel classes with few labeled samples, which has great value in research and application.

Few-Shot action recognition Few Shot Action Recognition +4

AE-Netv2: Optimization of Image Fusion Efficiency and Network Architecture

no code implementations5 Oct 2020 Aiqing Fang, Xinbo Zhao, Jiaqi Yang, Beibei Qin, Yanning Zhang

Finally, we explore the commonness and characteristics of different image fusion tasks, which provides a research basis for further research on the continuous learning characteristics of human brain in the field of image fusion.

AE-Net: Autonomous Evolution Image Fusion Method Inspired by Human Cognitive Mechanism

no code implementations17 Jul 2020 Aiqing Fang, Xinbo Zhao, Jiaqi Yang, Shihao Cao, Yanning Zhang

Firstly, the relationship between human brain cognitive mechanism and image fusion task is analyzed and a physical model is established to simulate human brain cognitive mechanism.

IllumiNet: Transferring Illumination from Planar Surfaces to Virtual Objects in Augmented Reality

no code implementations12 Jul 2020 Di Xu, Zhen Li, Yanning Zhang, Qi Cao

This paper presents an illumination estimation method for virtual objects in real environment by learning.

A Robust Attentional Framework for License Plate Recognition in the Wild

no code implementations6 Jun 2020 Linjiang Zhang, Peng Wang, Hui Li, Zhen Li, Chunhua Shen, Yanning Zhang

On the other hand, the 2D attentional based license plate recognizer with an Xception-based CNN encoder is capable of recognizing license plates with different patterns under various scenarios accurately and robustly.

Image Generation License Plate Recognition

Attention-based network for low-light image enhancement

no code implementations20 May 2020 Cheng Zhang, Qingsen Yan, Yu Zhu, Xianjun Li, Jinqiu Sun, Yanning Zhang

Extensive experiments demonstrate the superiority of the proposed network in terms of suppressing the chromatic aberration and noise artifacts in enhancement, especially when the low-light image has severe noise.

Denoising Low-Light Image Enhancement

Learning to Compare Relation: Semantic Alignment for Few-Shot Learning

no code implementations29 Feb 2020 Congqi Cao, Yanning Zhang

First, we introduce a semantic alignment loss to align the relation statistics of the features from samples that belong to the same category.

Few-Shot Learning Metric Learning +1

Learning to Zoom-in via Learning to Zoom-out: Real-world Super-resolution by Generating and Adapting Degradation

no code implementations8 Jan 2020 Dong Gong, Wei Sun, Qinfeng Shi, Anton Van Den Hengel, Yanning Zhang

Most learning-based super-resolution (SR) methods aim to recover high-resolution (HR) image from a given low-resolution (LR) image via learning on LR-HR image pairs.

Super-Resolution

Cross-Modal Image Fusion Theory Guided by Subjective Visual Attention

no code implementations23 Dec 2019 Aiqing Fang, Xinbo Zhao, Yanning Zhang

In order to improve the robustness and contextual awareness of image fusion tasks, we proposed a multi-task auxiliary learning image fusion theory guided by subjective attention.

Auxiliary Learning

A Cross-Modal Image Fusion Method Guided by Human Visual Characteristics

no code implementations18 Dec 2019 Aiqing Fang, Xinbo Zhao, Jiaqi Yang, Yanning Zhang

The feature selection, nonlinear combination and multi-task auxiliary learning mechanisms of the human visual perception system play an important role in real-world scenarios, but image fusion theory based on these characteristics of human visual perception has received little study.

Auxiliary Learning feature selection

Person Re-identification in Aerial Imagery

1 code implementation14 Aug 2019 Shizhou Zhang, Qi Zhang, Yifei Yang, Xing Wei, Peng Wang, Bingliang Jiao, Yanning Zhang

Our method can learn a discriminative and compact feature representation for ReID in aerial imagery and can be trained in an end-to-end fashion efficiently.

object-detection Object Detection +1

A Performance Evaluation of Correspondence Grouping Methods for 3D Rigid Data Matching

no code implementations5 Jul 2019 Jiaqi Yang, Ke Xian, Peng Wang, Yanning Zhang

Seeking consistent point-to-point correspondences between 3D rigid data (point clouds, meshes, or depth maps) is a fundamental problem in 3D computer vision.

3D Object Recognition Point Cloud Registration +1

Evaluating Local Geometric Feature Representations for 3D Rigid Data Matching

no code implementations29 Jun 2019 Jiaqi Yang, Siwen Quan, Peng Wang, Yanning Zhang

The outcomes present interesting findings that may shed new light on this community and provide complementary perspectives to existing evaluations on the topic of local geometric feature description.

Object Recognition Point Cloud Registration +1

Vehicle Re-identification in Aerial Imagery: Dataset and Approach

no code implementations ICCV 2019 Peng Wang, Bingliang Jiao, Lu Yang, Yifei Yang, Shizhou Zhang, Wei Wei, Yanning Zhang

It is capable of explicitly detecting discriminative parts for each specific vehicle and significantly outperforms the evaluated baselines and state-of-the-art vehicle ReID approaches.

Vehicle Re-Identification

Pixel-aware Deep Function-mixture Network for Spectral Super-Resolution

no code implementations24 Mar 2019 Lei Zhang, Zhiqiang Lang, Peng Wang, Wei Wei, Shengcai Liao, Ling Shao, Yanning Zhang

To address this problem, we propose a pixel-aware deep function-mixture network for SSR, which is composed of a new class of modules, termed function-mixture (FM) blocks.

Spectral Super-Resolution Super-Resolution

MPTV: Matching Pursuit Based Total Variation Minimization for Image Deconvolution

no code implementations12 Oct 2018 Dong Gong, Mingkui Tan, Qinfeng Shi, Anton Van Den Hengel, Yanning Zhang

Compared to existing methods, MPTV is less sensitive to the choice of the trade-off parameter between data fitting and regularization.

Image Deconvolution

A Pulmonary Nodule Detection Model Based on Progressive Resolution and Hierarchical Saliency

no code implementations2 Jul 2018 Jun-Jie Zhang, Yong Xia, Yanning Zhang

Detection of pulmonary nodules on chest CT is an essential step in the early diagnosis of lung cancer, which is critical for best patient care.

Accurate Spectral Super-resolution from Single RGB Image Using Multi-scale CNN

no code implementations10 Jun 2018 Yiqi Yan, Lei Zhang, Jun Li, Wei Wei, Yanning Zhang

Different from traditional hyperspectral super-resolution approaches that focus on improving the spatial resolution, spectral super-resolution aims at producing a high-resolution hyperspectral image from the RGB observation with super-resolution in spectral domain.

Spectral Reconstruction Spectral Super-Resolution +1

Adaptive Importance Learning for Improving Lightweight Image Super-resolution Network

no code implementations5 Jun 2018 Lei Zhang, Peng Wang, Chunhua Shen, Lingqiao Liu, Wei Wei, Yanning Zhang, Anton Van Den Hengel

In this study, we revisit this problem from an orthogonal view, and propose a novel learning strategy to maximize the pixel-wise fitting capacity of a given lightweight network architecture.

Image Super-Resolution

Learning Deep Gradient Descent Optimization for Image Deconvolution

1 code implementation10 Apr 2018 Dong Gong, Zhen Zhang, Qinfeng Shi, Anton Van Den Hengel, Chunhua Shen, Yanning Zhang

Extensive experiments on synthetic benchmarks and challenging real-world images demonstrate that the proposed deep optimization method is effective and robust to produce favorable results as well as practical for real-world image deblurring applications.

Image Deblurring Image Deconvolution

Significantly Fast and Robust Fuzzy C-Means Clustering Algorithm Based on Morphological Reconstruction and Membership Filtering

no code implementations IEEE 2018 Tao Lei, Xiaohong Jia, Yanning Zhang, Lifeng He, Hongying Meng, Asoke K. Nandi

However, the introduction of local spatial information often leads to a high computational complexity, arising out of an iterative calculation of the distance between pixels within local spatial neighbors and clustering centers.

Clustering Image Segmentation +1

Self-Paced Kernel Estimation for Robust Blind Image Deblurring

no code implementations ICCV 2017 Dong Gong, Mingkui Tan, Yanning Zhang, Anton Van Den Hengel, Qinfeng Shi

Rather than attempt to identify outliers to the model a priori, we instead propose to sequentially identify inliers, and gradually incorporate them into the estimation process.

Image Deblurring

Beyond Low Rank: A Data-Adaptive Tensor Completion Method

no code implementations3 Aug 2017 Lei Zhang, Wei Wei, Qinfeng Shi, Chunhua Shen, Anton Van Den Hengel, Yanning Zhang

The prior for the non-low-rank structure is established based on a mixture of Gaussians which is shown to be flexible enough, and powerful enough, to inform the completion process for a variety of real tensor data.

From Motion Blur to Motion Flow: a Deep Learning Solution for Removing Heterogeneous Motion Blur

no code implementations CVPR 2017 Dong Gong, Jie Yang, Lingqiao Liu, Yanning Zhang, Ian Reid, Chunhua Shen, Anton Van Den Hengel, Qinfeng Shi

The critical observation underpinning our approach is thus that learning the motion flow instead allows the model to focus on the cause of the blur, irrespective of the image content.

Tensor Power Iteration for Multi-Graph Matching

no code implementations CVPR 2016 Xinchu Shi, Haibin Ling, Weiming Hu, Junliang Xing, Yanning Zhang

Due to its wide range of applications, matching between two graphs has been extensively studied and remains an active topic.

Graph Matching

Blind Image Deconvolution by Automatic Gradient Activation

no code implementations CVPR 2016 Dong Gong, Mingkui Tan, Yanning Zhang, Anton Van Den Hengel, Qinfeng Shi

We show here that a subset of the image gradients is adequate to estimate the blur kernel robustly, whether or not the gradient image is sparse.

Image Deconvolution

Hyperspectral Compressive Sensing Using Manifold-Structured Sparsity Prior

no code implementations ICCV 2015 Lei Zhang, Wei Wei, Yanning Zhang, Fei Li, Chunhua Shen, Qinfeng Shi

To reconstruct hyperspectral image (HSI) accurately from a few noisy compressive measurements, we present a novel manifold-structured sparsity prior based hyperspectral compressive sensing (HCS) method in this study.

Compressive Sensing

Modeling Deformable Gradient Compositions for Single-Image Super-Resolution

no code implementations CVPR 2015 Yu Zhu, Yanning Zhang, Boyan Bonev, Alan L. Yuille

Based on the fact that singular primitive patches are more invariant to the scale change (i.e., have less ambiguity across different scales), we represent the non-singular primitives as compositions of singular ones, each of which is allowed some deformation.

Image Super-Resolution Triplet

Reweighted Laplace Prior Based Hyperspectral Compressive Sensing for Unknown Sparsity

no code implementations CVPR 2015 Lei Zhang, Wei Wei, Yanning Zhang, Chunna Tian, Fei Li

To address this problem, a novel reweighted Laplace prior based hyperspectral compressive sensing method is proposed in this study.

Compressive Sensing Noise Estimation

Constraint Reduction using Marginal Polytope Diagrams for MAP LP Relaxations

no code implementations17 Dec 2013 Zhen Zhang, Qinfeng Shi, Yanning Zhang, Chunhua Shen, Anton Van Den Hengel

We show that using Marginal Polytope Diagrams allows the number of constraints to be reduced without loosening the LP relaxations.

Part-Based Visual Tracking with Online Latent Structural Learning

no code implementations CVPR 2013 Rui Yao, Qinfeng Shi, Chunhua Shen, Yanning Zhang, Anton Van Den Hengel

Despite many advances made in the area, deformable targets and partial occlusions continue to represent key problems in visual tracking.

Structured Prediction Visual Tracking

Multi-image Blind Deblurring Using a Coupled Adaptive Sparse Prior

no code implementations CVPR 2013 Haichao Zhang, David Wipf, Yanning Zhang

This paper presents a robust algorithm for estimating a single latent sharp image given multiple blurry and/or noisy observations.

Deblurring
