no code implementations • 14 Apr 2025 • Yating Liu, Yaowei Li, Xiangyuan Lan, Wenming Yang, Zimo Liu, Qingmin Liao
Text-based Person Retrieval (TPR), a multi-modal task that aims to retrieve the target person from a pool of candidate images given a text description, has recently garnered considerable attention due to the progress of contrastive visual-language pre-trained models.
no code implementations • 3 Apr 2025 • Sinchee Chin, Fan Zhang, Xiaochen Yang, Jing-Hao Xue, Wenming Yang, Peng Jia, Guijin Wang, Luo Yingqun
Time Series Anomaly Detection (TSAD) is essential for uncovering rare and potentially harmful events in unlabeled time series data.
no code implementations • 18 Jan 2025 • Jiaqi Lin, Zhihao Li, Binxiao Huang, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Xiaofei Wu, Fenglong Song, Wenming Yang
We validate our method on several appearance-variant scenes, and demonstrate that it achieves state-of-the-art rendering quality with minimal training time and memory usage, without compromising rendering speeds.
no code implementations • 17 Jan 2025 • Huiyun Cao, Yuan Shi, Bin Xia, Xiaoyu Jin, Wenming Yang
Specifically, DiffStereo first learns latent high-frequency representations (LHFR) of HQ images.
no code implementations • 19 Nov 2024 • Zhehan Kan, Ce Zhang, Zihan Liao, Yapeng Tian, Wenming Yang, Junyuan Xiao, Xu Li, Dongmei Jiang, YaoWei Wang, Qingmin Liao
Large Vision-Language Model (LVLM) systems have demonstrated impressive vision-language reasoning capabilities but suffer from pervasive and severe hallucination issues, posing significant risks in critical domains such as healthcare and autonomous systems.
1 code implementation • 5 Nov 2024 • Yifan Wang, Xiaochen Yang, Fanqi Pu, Qingmin Liao, Wenming Yang
Specifically, EH-FAM employs multi-head attention with a global receptive field to extract semantic features for small-scale objects and leverages lightweight convolutional modules to efficiently aggregate visual features across different scales.
1 code implementation • IEEE International Conference on Image Processing (ICIP) 2024 • Yichen Shi, Feifei Zhang, Wenming Yang, Guijin Wang, Nan Su
By exploring the appearance asymmetry and the consequent feature space asymmetry, we devise a main branch and two agent regression tasks.
Ranked #1 on Gaze Estimation on EYEDIAP
1 code implementation • 25 Oct 2024 • Fanqi Pu, Yifan Wang, Jiru Deng, Wenming Yang
However, due to depth errors originating from the object's visual surface, the height of the bounding box often fails to represent the actual projected central height, which undermines the effectiveness of geometric depth.
Ranked #3 on Monocular 3D Object Detection on KITTI Cars Hard
no code implementations • 17 Oct 2024 • Junhao Gu, Peng-Tao Jiang, Hao Zhang, Mi Zhou, Jinwei Chen, Wenming Yang, Bo Li
To address this challenge, we introduce ConsisSR to handle both semantic and pixel-level consistency.
no code implementations • 15 Oct 2024 • Sin Chee Chin, Xuan Zhang, Lee Yeong Khang, Wenming Yang
The second stage of CONSULT uses PatchCore for conventional feature extraction via the fine-tuned weights from the first stage.
no code implementations • 14 Oct 2024 • Xuan Zhang, Sin Chee Chin, Tingxuan Gao, Wenming Yang
In contrast, our research shows that data mixing, a potent augmentation technique for long-tailed recognition, can generate pseudo-OOD data that exhibit the features of both in-distribution (ID) data and OOD data.
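The data-mixing idea in the entry above can be illustrated with a generic mixup-style interpolation. This is a minimal sketch, not the authors' method: the function names, the Beta(1, 1) sampling of the mixing ratio, and the toy feature vectors are all assumptions, and the snippet does not say how the mixed samples are then used as pseudo-OOD data.

```python
import random

def mix_samples(x_a, x_b, lam):
    """Convex combination of two equally sized feature vectors (mixup-style)."""
    if len(x_a) != len(x_b):
        raise ValueError("inputs must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]

def mix_labels(y_a, y_b, lam, num_classes):
    """Soft label whose mass follows the same mixing ratio as the inputs."""
    soft = [0.0] * num_classes
    soft[y_a] += lam
    soft[y_b] += 1.0 - lam
    return soft

# Hypothetical usage: mix a head-class sample with a tail-class sample.
rng = random.Random(0)
lam = rng.betavariate(1.0, 1.0)  # assumed prior over the mixing ratio
head_sample = [1.0, 0.0, 2.0]    # toy features, class 0
tail_sample = [0.0, 4.0, 2.0]    # toy features, class 2
mixed = mix_samples(head_sample, tail_sample, lam)
mixed_label = mix_labels(0, 2, lam, num_classes=3)
```

A mixed sample lies between two in-distribution samples in input space, which is one way to read the claim that it carries features of both ID and OOD data.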
no code implementations • 29 Aug 2024 • Xiaoyu Jin, Zunnan Xu, Mingwen Ou, Wenming Yang
Character animation is a transformative field in computer graphics and vision, enabling dynamic and realistic video animations from static images.
1 code implementation • 19 Aug 2024 • Zhengchao Huang, Bin Xia, Zicheng Lin, Zhun Mou, Wenming Yang, Jiaya Jia
The rapid advancement of deepfake technologies has sparked widespread public concern, particularly as face forgery poses a serious threat to public information security.
no code implementations • 6 Aug 2024 • Xinyi Zhang, Qiqi Bao, Qinpeng Cui, Wenming Yang, Qingmin Liao
This architecture introduces local enhancement with GCN by capturing relationships between neighboring joints, thus producing new representations to complement Mamba's outputs.
no code implementations • 25 Jul 2024 • Jintong Hu, Bin Xia, Bin Chen, Wenming Yang, Lei Zhang
Although these approaches have shown promising results, their performance is constrained by the limited representation ability of discrete latent codes in the encoded features.
1 code implementation • 18 Jul 2024 • Boyuan Wang, Yun Qu, Yuhang Jiang, Jianzhun Shao, Chang Liu, Wenming Yang, Xiangyang Ji
Conventional state representations in reinforcement learning often omit critical task-related details, presenting a significant challenge for value networks in establishing accurate mappings from states to task rewards.
1 code implementation • 20 Jun 2024 • Jintong Hu, Siyan Chen, Zhiyi Pan, Sen Zeng, Wenming Yang
Although Convolutional Neural Networks (CNNs) and non-local attention methods have achieved notable success in medical image segmentation, they either struggle to capture long-range spatial dependencies due to their reliance on local features, or face significant computational and feature integration challenges when attempting to address this issue with global attention mechanisms.
1 code implementation • 27 May 2024 • Zongkai Zhang, Zidong Xu, Wenming Yang, Qingmin Liao, Jing-Hao Xue
To bridge these gaps, we propose a novel binarized deep convolution (BDC) unit that effectively enhances performance while increasing the number of binarized convolutional layers.
1 code implementation • CVPR 2024 • Zhilin Huang, Quanmin Liang, Yijie Yu, Chujun Qin, Xiawu Zheng, Kai Huang, Zikun Zhou, Wenming Yang
In this paper, we propose a bilateral event mining and complementary network (BMCNet) to fully leverage the potential of each event and capture the shared information to complement each other simultaneously.
no code implementations • 21 Apr 2024 • Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, YaoWei Wang, Wenming Yang
With the advancement of AIGC, video frame interpolation (VFI) has become a crucial component in existing video generation frameworks, attracting widespread research interest.
no code implementations • 12 Apr 2024 • Jingrui Ye, Zongkai Zhang, Yujiao Jiang, Qingmin Liao, Wenming Yang, Zongqing Lu
OccGaussian initializes 3D Gaussian distributions in the canonical space, and we perform occlusion feature queries at occluded regions; the aggregated pixel-aligned feature is extracted to compensate for the missing information.
1 code implementation • IEEE Robotics and Automation Letters 2023 • Siang Chen, Wei Tang, Pengwei Xie, Wenming Yang, Guijin Wang
Specifically, Gaussian encoding and the grid-based strategy are applied to predict grasp heatmaps as guidance to aggregate local points into graspable regions and provide global semantic information.
Ranked #4 on Robotic Grasping on GraspNet-1Billion
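The Gaussian encoding mentioned in the grasping entry above can be pictured as rendering a smooth peak around each annotated location on a grid. The following is a minimal sketch under assumptions: the grid size, the `sigma` value, the max-merge of overlapping peaks, and the function names are illustrative and are not taken from the paper.

```python
import math

def gaussian_heatmap(height, width, centers, sigma=1.5):
    """Render an unnormalized 2D Gaussian around each (row, col) center.

    Overlapping peaks are merged with max, so every center keeps value 1.0.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for cy, cx in centers:
        for y in range(height):
            for x in range(width):
                d2 = (y - cy) ** 2 + (x - cx) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap

def peak(heatmap):
    """Return the (row, col) of the largest heatmap value."""
    best = max((v, y, x) for y, row in enumerate(heatmap) for x, v in enumerate(row))
    return best[1], best[2]

# Hypothetical usage: one graspable location at row 2, column 3 of a 5x5 grid.
target = gaussian_heatmap(5, 5, [(2, 3)])
```

A network trained to regress such a target produces a dense score map, and high-scoring cells can then guide which local points are grouped into candidate regions.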
1 code implementation • 25 Mar 2024 • Jintong Hu, Hui Che, Zishuo Li, Wenming Yang
Ultrasound imaging is crucial for evaluating organ morphology and function, yet depth adjustment can degrade image quality and field-of-view, presenting a depth-dependent dilemma.
no code implementations • 19 Mar 2024 • Jintong Hu, Bin Xia, Bingchen Li, Wenming Yang
Deep learning-based denoisers have been the focus of recent developments in image denoising.
1 code implementation • 18 Mar 2024 • Yuan Shi, Bin Xia, Xiaoyu Jin, Xing Wang, Tianyu Zhao, Xin Xia, Xuefeng Xiao, Wenming Yang
To address these challenges, we propose VmambaIR, which introduces State Space Models (SSMs) with linear complexity into comprehensive image restoration tasks.
1 code implementation • CVPR 2024 • Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, Youliang Yan, Wenming Yang
Existing NeRF-based methods for large scene reconstruction often have limitations in visual quality and rendering speed.
no code implementations • 3 Feb 2024 • Yanjun Liu, Wenming Yang, Qingmin Liao
To fill this gap, we introduce DiffVein, a unified diffusion model-based framework which simultaneously addresses vein segmentation and authentication tasks.
no code implementations • 21 Jan 2024 • Xiaoyu Jin, Yuan Shi, Bin Xia, Wenming Yang
By employing a pretrained multi-modal large language model and a vision language model, we generate text descriptions and encode them as context embedding with degradation information for the degraded image.
1 code implementation • 15 Jan 2024 • Zhilin Huang, Ling Yang, Zaixi Zhang, Xiangxin Zhou, Yu Bao, Xiawu Zheng, Yuwei Yang, Yu Wang, Wenming Yang
Then the selected protein-ligand subcomplex is processed with SE(3)-equivariant neural networks, and transmitted back to each atom of the complex for augmenting the target-aware 3D molecule diffusion generation with binding interaction information.
1 code implementation • 10 Jan 2024 • Hongbo Kang, Yong Wang, Mengyuan Liu, Doudou Wu, Peng Liu, Xinlin Yuan, Wenming Yang
To address these two challenges, we propose a diffusion-based refinement framework called DRPose, which refines the output of deterministic models by reverse diffusion and achieves more suitable multi-hypothesis prediction for the current pose benchmark by multi-step refinement with multiple noises.
1 code implementation • CVPR 2024 • Peng Lu, Tao Jiang, Yining Li, Xiangtai Li, Kai Chen, Wenming Yang
Real-time multi-person pose estimation presents significant challenges in balancing speed and precision.
Ranked #1 on Multi-Person Pose Estimation on CrowdPose (using extra training data)
no code implementations • 16 Nov 2023 • Yuan Shi, Bin Xia, Rui Zhu, Qingmin Liao, Wenming Yang
Color-guided depth map super-resolution (CDSR) improves the spatial resolution of a low-quality depth map with the corresponding high-quality color map, benefiting various applications such as 3D reconstruction, virtual reality, and augmented reality.
no code implementations • 31 Oct 2023 • Yuxin Ye, Wenming Yang, Yapeng Tian
LAVSS is inspired by the correlation between spatial audio and visual location.
1 code implementation • 18 Sep 2023 • Yating Liu, Yaowei Li, Zimo Liu, Wenming Yang, YaoWei Wang, Qingmin Liao
Text-based Person Retrieval (TPR) aims to retrieve the target person images given a textual query.
1 code implementation • ICCV 2023 • Xiuzhe Wu, Pengfei Hu, Yang Wu, Xiaoyang Lyu, Yan-Pei Cao, Ying Shan, Wenming Yang, Zhongqian Sun, Xiaojuan Qi
Therefore, directly learning a mapping function from speech to the entire head image is prone to ambiguity, particularly when using a short video for training.
no code implementations • 26 Aug 2023 • Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool
Compared to traditional DMs, the compact IPR enables DiffI2I to obtain more accurate outcomes and employ a lighter denoising network and fewer iterations.
1 code implementation • 15 Aug 2023 • Yue Lv, Jinxi Xiang, Jun Zhang, Wenming Yang, Xiao Han, Wei Yang
We thus introduce a dynamic gating network on top of the low-rank adaptation method, in order to decide which decoder layer should employ adaptation.
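One way to picture a gate placed on top of low-rank adaptation is a layer whose low-rank update is scaled by a learned switch. This is only a sketch under assumptions: the sigmoid gate computed from the layer input, the parameter names `A`, `B`, `gate_w`, and `gate_b`, and the toy dimensions are all hypothetical, and the paper's gating network may choose layers quite differently.

```python
import math

def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_lora_layer(x, W, A, B, gate_w, gate_b):
    """Frozen weight W plus a gated low-rank update: y = W x + g * B (A x).

    A is (r x d_in) and B is (d_out x r), with r much smaller than d_in.
    The gate g in (0, 1) decides how much adaptation this layer applies.
    """
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    gate = sigmoid(sum(g * v for g, v in zip(gate_w, x)) + gate_b)
    return [b + gate * d for b, d in zip(base, low_rank)], gate

# Hypothetical usage with a 2x2 frozen weight and a rank-1 update.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [0.0]]
y, g = gated_lora_layer([1.0, 2.0], W, A, B, gate_w=[0.0, 0.0], gate_b=0.0)
```

When the gate saturates near zero the layer falls back to its frozen weights, which matches the stated goal of applying adaptation only in selected decoder layers.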
1 code implementation • 10 Aug 2023 • Hongbo Kang, Yong Wang, Mengyuan Liu, Doudou Wu, Peng Liu, Wenming Yang
Notably, our model achieves state-of-the-art performance on all action categories in the Human3.6M dataset using detected 2D poses from CPN, and our code is available at: https://github.com/KHB1698/DC-GCT.
Ranked #35 on 3D Human Pose Estimation on MPI-INF-3DHP (AUC metric)
1 code implementation • 5 Jul 2023 • Jiamiao Zhang, Yichen Chi, Jun Lyu, Wenming Yang, Yapeng Tian
Limited by imaging systems, the reconstruction of Magnetic Resonance Imaging (MRI) images from partial measurement is essential to medical imaging research.
no code implementations • 29 May 2023 • Ruofan Zhang, Jinjin Gu, Haoyu Chen, Chao Dong, Yulun Zhang, Wenming Yang
In this work, we introduce a novel approach to craft training degradation distributions using a small set of reference images.
1 code implementation • 24 May 2023 • Yichen Chi, Junhao Gu, Jiamiao Zhang, Wenming Yang, Yapeng Tian
We explicitly tackle motion blurs in egocentric videos using a Dual Branch Deblur Network (DB$^2$Net) in the VSR framework.
1 code implementation • ICCV 2023 • Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Luc van Gool
Diffusion models (DMs) have achieved SOTA performance by modeling the image synthesis process as a sequential application of a denoising network.
no code implementations • 13 Feb 2023 • Yanjun Liu, Wenming Yang
Instead of using ground-truth labels as direct supervision, our relative and corner losses are derived from the homogeneous transformation, which enables the model to learn the geometric consistency between objects.
no code implementations • 28 Jan 2023 • Yuzhen Qin, Li Sun, Hui Chen, Wei-Qiang Zhang, Wenming Yang, Jintao Fei, Guijin Wang
However, it is challenging to develop a single-lead-based ECG interpretation model for multi-disease diagnosis due to the lack of some key disease information.
no code implementations • ICCV 2023 • Yingfan Tao, Jingna Sun, Hao Yang, Li Chen, Xu Wang, Wenming Yang, Daniel Du, Min Zheng
LGLA consists of two core components: a Class-aware Logit Adjustment (CLA) strategy and an Adaptive Angular Weighted (AAW) loss.
1 code implementation • 30 Nov 2022 • Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool
It consists of a knowledge distillation based implicit degradation estimator network (KD-IDE) and an efficient SR network.
no code implementations • 23 Nov 2022 • Yang Li, Guijin Wang, Zhourui Xia, Wenming Yang, Li Sun
Auxiliary diagnosis of cardiac electrophysiological status can be obtained through the analysis of 12-lead electrocardiograms (ECGs).
no code implementations • 9 Oct 2022 • Jinjin Gu, Haoming Cai, Chenyu Dong, Ruofan Zhang, Yulun Zhang, Wenming Yang, Chun Yuan
We finally use a guided fusion operation to integrate the sharp edges generated by the network and flat areas by the interpolation method to get the final SR image.
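The fusion step described above can be read as a per-pixel blend in which an edge weight selects between the two reconstructions. The sketch below assumes such a weight map is already available; how the paper derives its guide is not stated in this snippet, and the function name and toy values are illustrative.

```python
def guided_fusion(network_sr, interpolated_sr, edge_weight):
    """Blend two equally sized 2D images pixel by pixel.

    edge_weight holds values in [0, 1]: 1 keeps the network output (edges),
    0 keeps the interpolated result (flat areas).
    """
    fused = []
    for net_row, interp_row, w_row in zip(network_sr, interpolated_sr, edge_weight):
        fused.append([w * n + (1.0 - w) * i
                      for n, i, w in zip(net_row, interp_row, w_row)])
    return fused

# Hypothetical 2x2 example: the left column is treated as edge-like.
network_sr = [[0.9, 0.1], [0.8, 0.2]]
interpolated_sr = [[0.5, 0.5], [0.5, 0.5]]
edge_weight = [[1.0, 0.0], [0.5, 0.0]]
fused = guided_fusion(network_sr, interpolated_sr, edge_weight)
```

A soft weight between 0 and 1 avoids visible seams where edge regions meet flat regions.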
2 code implementations • 2 Oct 2022 • Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool
In this study, we reconsider components in binary convolution, such as residual connection, BatchNorm, activation function, and structure, for IR tasks.
1 code implementation • 28 Jul 2022 • Bin Xia, Yapeng Tian, Yulun Zhang, Yucheng Hang, Wenming Yang, Qingmin Liao
Most CNN-based super-resolution (SR) methods assume that the degradation is known (e.g., bicubic).
1 code implementation • CVPR 2023 • Bin Xia, Jingwen He, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Luc van Gool
In SSL, we design pruning schemes for several key components in VSR models, including residual blocks, recurrent networks, and upsampling networks.
1 code implementation • CVPR 2022 • Yucheng Hang, Bin Xia, Wenming Yang, Qingmin Liao
In addition, we propose a background-attentional adaptive instance normalization (BAIN) to achieve an attention-weighted background feature distribution according to the foreground-background feature similarity.
1 code implementation • 14 Mar 2022 • Hai Wang, Xiaoyu Xiang, Yapeng Tian, Wenming Yang, Qingmin Liao
Second, we put forward a spatial-temporal deformable feature aggregation (STDFA) module, in which spatial and temporal contexts in dynamic video frames are adaptively captured and aggregated to enhance SR reconstruction.
1 code implementation • 12 Jan 2022 • Bin Xia, Yapeng Tian, Yucheng Hang, Wenming Yang, Qingmin Liao, Jie Zhou
To improve matching efficiency, we design a novel Embedded PatchMatch scheme with random sample propagation, which involves end-to-end training with asymptotically linear computational cost with respect to the input size.
1 code implementation • 11 Jan 2022 • Bin Xia, Yucheng Hang, Yapeng Tian, Wenming Yang, Qingmin Liao, Jie Zhou
To demonstrate the effectiveness of ENLCA, we build an architecture called Efficient Non-Local Contrastive Network (ENLCN) by adding a few of our modules to a simple backbone.
2 code implementations • 2 Aug 2021 • Liyang Liu, Shilong Zhang, Zhanghui Kuang, Aojun Zhou, Jing-Hao Xue, Xinjiang Wang, Yimin Chen, Wenming Yang, Qingmin Liao, Wayne Zhang
Our method can be used to prune any structures including those with coupled channels.
Ranked #4 on Network Pruning on ImageNet
no code implementations • 7 May 2021 • Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, SungJun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, ZiRui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, Yifan Chen, Yujiu Yang, Yang Li, Tao Zhang, Longtao Feng, Yiting Liao, Junlin Li, William Thong, Jose Costa Pereira, Ales Leonardis, Steven McDonagh, Kele Xu, Lehan Yang, Hengxing Cai, Pengfei Sun, Seyed Mehdi Ayyoubzadeh, Ali Royat, Sid Ahmed Fezza, Dounia Hammou, Wassim Hamidouche, Sewoong Ahn, Gwangjin Yoon, Koki Tsubota, Hiroaki Akutsu, Kiyoharu Aizawa
This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021.
no code implementations • 6 May 2021 • Jingyu Guo, Wei Wang, Wenming Yang, Qingmin Liao, Jie Zhou
In this paper, we introduce a brand new scheme, namely external-reference image quality assessment (ER-IQA), by introducing external reference images to bridge the gap between FR and NR-IQA.
1 code implementation • 21 Apr 2021 • Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng Li, Thomas Tanay, Fenglong Song, Wentao Chao, Qiang Guo, Yan Liu, Jiang Li, Xiaochao Qu, Dewang Hou, Jiayu Yang, Lyn Jiang, Di You, Zhenyu Zhang, Chong Mou, Iaroslav Koshelev, Pavel Ostyakov, Andrey Somov, Jia Hao, Xueyi Zou, Shijie Zhao, Xiaopeng Sun, Yiting Liao, Yuanzhi Zhang, Qing Wang, Gen Zhan, Mengxi Guo, Junlin Li, Ming Lu, Zhan Ma, Pablo Navarrete Michelini, Hai Wang, Yiyun Chen, Jingyu Guo, Liliang Zhang, Wenming Yang, Sijung Kim, Syehoon Oh, Yucong Wang, Minjie Cai, Wei Hao, Kangdi Shi, Liangyan Li, Jun Chen, Wei Gao, Wang Liu, XiaoYu Zhang, Linjie Zhou, Sixin Lin, Ru Wang
This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results.
1 code implementation • 26 Mar 2021 • Wenhao Li, Hong Liu, Runwei Ding, Mengyuan Liu, Pichao Wang, Wenming Yang
The modified VTE is termed the Strided Transformer Encoder (STE), which is built upon the outputs of VTE.
Ranked #2 on 3D Human Pose Estimation on HumanEva-I
3 code implementations • ICLR 2021 • Liyang Liu, Yi Li, Zhanghui Kuang, Jing-Hao Xue, Yimin Chen, Wenming Yang, Qingmin Liao, Wayne Zhang
Multi-task learning (MTL) has been widely used in representation learning.
1 code implementation • 13 Sep 2020 • Yucheng Hang, Qingmin Liao, Wenming Yang, Yupeng Chen, Jie Zhou
The adaptive spatial attention branch (ASAB) and the adaptive channel attention branch (ACAB) constitute the adaptive dual attention module (ADAM), which can capture the long-range spatial and channel-wise contextual information to expand the receptive field and distinguish different types of information for more effective feature representations.
no code implementations • 28 Mar 2020 • Juncheng Zhang, Qingmin Liao, Shaojun Liu, Haoyu Ma, Wenming Yang, Jing-Hao Xue
In this letter, we introduce a large and realistic multi-focus dataset called Real-MFF, which contains 710 pairs of source images with corresponding ground truth images.
1 code implementation • 9 Sep 2019 • Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue, Qingmin Liao
In this paper, we develop a concise but efficient network architecture called linear compressing based skip-connecting network (LCSCNet) for image super-resolution.
Ranked #19 on Image Super-Resolution on BSD100 - 3x upscaling
1 code implementation • ICCV 2019 • Wei Wang, Ruiming Guo, Yapeng Tian, Wenming Yang
Deep learning methods have witnessed great progress in image restoration as measured by specific metrics (e.g., PSNR, SSIM).
2 code implementations • 15 Feb 2019 • Wenming Yang, Wei Wang, Xuechen Zhang, Shuifa Sun, Qingmin Liao
Specifically, a spindle block is composed of a dimension extension unit, a feature exploration unit and a feature refinement unit.
Ranked #17 on Image Super-Resolution on Manga109 - 3x upscaling
1 code implementation • 11 Dec 2018 • Peng Lu, Gao Huang, Hangyu Lin, Wenming Yang, Guodong Guo, Yanwei Fu
This paper proposes a novel approach for Sketch-Based Image Retrieval (SBIR), for which the key is to bridge the gap between sketches and photos in terms of the data representation.
1 code implementation • 9 Aug 2018 • Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue
Single image super-resolution (SISR) is a notoriously challenging ill-posed problem, which aims to obtain a high-resolution (HR) output from one of its low-resolution (LR) versions.