no code implementations • 11 Jul 2024 • Zijie Yue, Miaojing Shi, Hanli Wang, Shuai Ding, Qijun Chen, Shanlin Yang
Next, we introduce a frequency-oriented vision-text pair generation method by carefully creating contrastive spatio-temporal maps from positive and negative samples and designing proper text prompts to describe their relative ratios of signal frequencies.
no code implementations • 12 Jun 2024 • Juncheng Wu, Zhangkai Ni, Hanli Wang, Wenhan Yang, Yuyin Zhou, Shiqi Wang
In this paper, we present Deep Degradation Response (DDR), a method to quantify changes in image deep features under varying degradation conditions.
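The core idea of quantifying how deep features shift under degradation can be sketched as follows. This is a toy illustration, not the authors' DDR implementation: the hand-rolled statistics stand in for deep features, and cosine distance stands in for whatever response measure the paper actually uses.

```python
import numpy as np

def toy_features(img):
    # Stand-in for deep features: simple intensity and gradient statistics.
    grad = np.abs(np.diff(img, axis=1)).mean()
    return np.array([img.mean(), img.std(), grad])

def degradation_response(img, degrade):
    # Cosine distance between features of an image and its degraded copy:
    # the larger the response, the more the features moved under degradation.
    f0, f1 = toy_features(img), toy_features(degrade(img))
    cos = f0 @ f1 / (np.linalg.norm(f0) * np.linalg.norm(f1) + 1e-12)
    return 1.0 - cos

rng = np.random.default_rng(0)
img = rng.random((32, 32))
# A crude horizontal box blur as one example degradation condition.
blur = lambda x: (x + np.roll(x, 1, axis=1) + np.roll(x, -1, axis=1)) / 3.0
r_blur = degradation_response(img, blur)
r_none = degradation_response(img, lambda x: x)
```

An identity "degradation" leaves the features unchanged and yields a near-zero response, while blurring shrinks the gradient statistics and produces a strictly larger one.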
1 code implementation • 29 May 2024 • Zhangkai Ni, Yue Liu, Keyan Ding, Wenhan Yang, Hanli Wang, Shiqi Wang
To bridge these gaps, we propose integrating deep features from pre-trained visual models with a statistical analysis model into a Multi-scale Deep Feature Statistics (MDFS) model for achieving opinion-unaware BIQA (OU-BIQA), thereby eliminating the reliance on human rating data and significantly improving training efficiency.
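The opinion-unaware recipe of pairing deep features with a statistical model can be sketched in a NIQE-like form: fit a multivariate Gaussian to features of pristine images, then score a test image by its distance to those statistics, with no human ratings involved. The feature dimensions and fitting procedure below are illustrative assumptions, not the MDFS model itself.

```python
import numpy as np

def fit_pristine_model(feats):
    # Fit a multivariate Gaussian to features extracted from pristine images.
    mu = feats.mean(axis=0)
    icov = np.linalg.pinv(np.cov(feats, rowvar=False))
    return mu, icov

def quality_score(f, mu, icov):
    # Mahalanobis distance to the pristine statistics (larger = worse).
    d = f - mu
    return float(np.sqrt(d @ icov @ d))

rng = np.random.default_rng(1)
pristine = rng.normal(0.0, 1.0, size=(200, 4))   # toy pristine feature set
mu, icov = fit_pristine_model(pristine)
near = quality_score(np.zeros(4), mu, icov)      # feature close to pristine
far = quality_score(np.full(4, 5.0), mu, icov)   # strongly deviating feature
```

A feature vector near the pristine distribution scores low; one far from it scores high, which is the opinion-unaware ranking principle.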
1 code implementation • CVPR 2024 • Zhangkai Ni, Juncheng Wu, Zian Wang, Wenhan Yang, Hanli Wang, Lin Ma
This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution, which heavily rely on paired training datasets with precise pixel-level alignment.

1 code implementation • 14 Dec 2023 • Zhangkai Ni, Peiqi Yang, Wenhan Yang, Hanli Wang, Lin Ma, Sam Kwong
Through this, we construct a novel collaborative module that aligns information from various views and meanwhile imposes self-supervised constraints to ensure multi-view consistency in both geometry and appearance.
1 code implementation • IEEE Transactions on Multimedia 2023 • Dongjie Ye, Zhangkai Ni, Wenhan Yang, Hanli Wang, Shiqi Wang, Sam Kwong
Benefiting from the learned memory, more complex distributions of reference images in the entire dataset can be “remembered” to facilitate the adjustment of the testing samples more adaptively.
Ranked #3 on Low-Light Image Enhancement on LOL-v2
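The "remembered" reference distribution can be sketched as a soft-attention memory: stored reference features act as memory slots, and a test sample's query attends over them to read back an adaptively weighted reference. The slot layout below is a deliberately tiny, hypothetical stand-in for the learned memory.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_readout(query, memory):
    # Attend over remembered reference features and read back a weighted
    # combination, adapting the reference to the current test sample.
    weights = softmax(memory @ query)
    return weights @ memory, weights

memory = np.eye(4)                        # four toy remembered reference features
query = np.array([0.0, 0.0, 3.0, 0.0])   # test feature closest to slot 2
read, w = memory_readout(query, memory)
```

The attention weights form a distribution over slots, so the readout interpolates between remembered references rather than picking a single fixed one.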
no code implementations • 30 Jan 2023 • Jian Zhu, Hanli Wang, Miaojing Shi
On the other hand, BLIP-2 as an MLLM is employed to process images and texts, and the referring expressions in texts involving specific visual objects are modified with linguistic object labels to serve as comprehensible MLLM inputs.
no code implementations • 13 Sep 2022 • Yu Tian, Zhangkai Ni, Baoliang Chen, Shurun Wang, Shiqi Wang, Hanli Wang, Sam Kwong
In particular, in order to maximize redundancy removal without impairing robust identity information, we apply an encoder with multiple feature extraction and attention-based feature decomposition modules to progressively decompose face features into two uncorrelated components, i.e., identity and residual features, via self-supervised learning.
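The decomposition into uncorrelated identity and residual components can be sketched with a linear projection: project the feature onto an identity subspace and keep the orthogonal remainder as the residual. The paper's modules are learned and attention-based; the fixed subspace here is only a geometric illustration of the two-component split.

```python
import numpy as np

def decompose(feature, id_basis):
    # Project the feature onto the identity subspace; the remainder is the
    # residual component, orthogonal (hence uncorrelated) to the identity part.
    q, _ = np.linalg.qr(id_basis)
    identity = q @ (q.T @ feature)
    residual = feature - identity
    return identity, residual

rng = np.random.default_rng(3)
feat = rng.normal(size=6)
basis = rng.normal(size=(6, 2))   # toy identity-subspace directions
idf, res = decompose(feat, basis)
```

By construction the two parts are orthogonal and sum back to the original feature, mirroring the "decompose without losing identity information" constraint.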
1 code implementation • 6 Sep 2022 • Yue Liu, Zhangkai Ni, Shiqi Wang, Hanli Wang, Sam Kwong
In this paper, a novel and effective image quality assessment (IQA) algorithm based on frequency disparity for high dynamic range (HDR) images is proposed, termed the local-global frequency feature-based model (LGFM).
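The frequency-disparity idea rests on separating an image into low-frequency (global) and high-frequency (local) content. A minimal sketch with a radial Fourier-domain mask is shown below; LGFM's actual filter bank and feature pooling are more elaborate, and the cutoff here is an arbitrary assumption.

```python
import numpy as np

def frequency_split(img, cutoff=4):
    # Separate an image into low- and high-frequency parts using a radial
    # mask in the shifted Fourier domain (a crude stand-in for a filter bank).
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h // 2, xx - w // 2)
    low = np.fft.ifft2(np.fft.ifftshift(F * (r <= cutoff))).real
    high = img - low
    return low, high

rng = np.random.default_rng(4)
img = rng.random((16, 16))
low, high = frequency_split(img)
```

The two bands reconstruct the image exactly, and the low-frequency band carries less variance than the original, so distortions can be assessed per band.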
no code implementations • IEEE Transactions on Circuits and Systems for Video Technology 2022 • Jinjing Gu, Hanli Wang
In this work, a coherent visual storytelling (CoVS) framework is designed to address the above-mentioned problems.
Ranked #4 on Visual Storytelling on VIST
no code implementations • 3 Jul 2022 • Zhangkai Ni, Wenhan Yang, Hanli Wang, Shiqi Wang, Lin Ma, Sam Kwong
Free from the fundamental limitation of fitting to paired training data, recent unsupervised low-light enhancement methods excel at adjusting the illumination and contrast of images.
no code implementations • 12 Mar 2022 • Qinyu Li, Tengpeng Li, Hanli Wang, Chang Wen Chen
In this work, a comprehensive study is conducted on video paragraph captioning, with the goal of generating paragraph-level descriptions for a given video.
no code implementations • 10 Mar 2022 • Tengpeng Li, Hanli Wang, Bin He, Chang Wen Chen
Third, a unified one-stage story generation model with encoder-decoder structure is proposed to simultaneously train and infer the knowledge-enriched attention network, group-wise semantic module and multi-modal story generation decoder in an end-to-end fashion.
no code implementations • 10 Mar 2022 • Ran Chen, Hanli Wang, Lei Wang, Sam Kwong
Second, previous approaches only consider learning single-stream similarity alignment (i.e., image-to-text or text-to-image), which is inadequate to fully exploit similarity information for image-text matching.
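The single-stream limitation can be made concrete with a toy bidirectional matcher: score a cosine-similarity matrix in both directions, retrieving the best caption per image and the best image per caption rather than committing to one direction. The feature vectors below are hypothetical, hand-picked so that pair i matches pair i.

```python
import numpy as np

def cosine_sim(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def bidirectional_match(img_feats, txt_feats):
    # Score both streams: best caption for each image (image-to-text)
    # and best image for each caption (text-to-image).
    s = cosine_sim(img_feats, txt_feats)
    return s.argmax(axis=1), s.argmax(axis=0)

# Toy aligned pairs: image i is described by caption i.
imgs = np.array([[1.0, 0.1], [0.1, 1.0], [1.0, 1.0]])
txts = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
i2t, t2i = bidirectional_match(imgs, txts)
```

A method that only checks one direction would miss disagreements between the two retrieval streams; using both is the point the excerpt makes.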
no code implementations • 28 Jan 2022 • Yu Tian, Zhangkai Ni, Baoliang Chen, Shiqi Wang, Hanli Wang, Sam Kwong
However, little work has been dedicated to the automatic quality assessment of such GAN-generated face images (GFIs), and even less to the generalized and robust quality assessment of GFIs generated with unseen GAN models.
no code implementations • ACM Multimedia Asia 2022 • Ruichao Fan, Hanli Wang, Jinjing Gu, Xianhui Liu
As there is no ground-truth topic information, a pre-trained BERT model based on visual contents and annotated stories is utilized to mine topics.
Ranked #2 on Visual Storytelling on VIST
1 code implementation • 31 Dec 2021 • Dongjie Ye, Zhangkai Ni, Hanli Wang, Jian Zhang, Shiqi Wang, Sam Kwong
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
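The sampling-then-recovery pipeline can be sketched linearly: measurements y = Φx, then a reconstruction from y. In the paper both Φ and the recovery are learned end-to-end; here a random matrix and a minimum-norm pseudoinverse solve stand in for them, purely as an assumption-laden illustration.

```python
import numpy as np

def sample(x, phi):
    # Compressive measurements y = Phi x (Phi is learned in the paper;
    # random Gaussian here).
    return phi @ x

def recover(y, phi):
    # Minimum-norm linear recovery x_hat = pinv(Phi) y, standing in for
    # the learned recovery network.
    return np.linalg.pinv(phi) @ y

rng = np.random.default_rng(5)
n, m = 64, 32                                # signal length, measurement count
phi = rng.normal(size=(m, n)) / np.sqrt(m)
x = np.zeros(n)
x[:4] = 1.0                                  # a simple compressible signal
y = sample(x, phi)
x_hat = recover(y, phi)
```

The recovery is consistent with the measurements by construction; a learned decoder additionally exploits image priors that the pseudoinverse cannot.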
no code implementations • CVPR 2018 • Hanzhang Wang, Hanli Wang, Kaisheng Xu
Vision-to-language tasks require a unified semantic understanding of visual content.
1 code implementation • CVPR 2016 • Bowen Zhang, Li-Min Wang, Zhe Wang, Yu Qiao, Hanli Wang
The deep two-stream architecture exhibited excellent performance on video-based action recognition.
Ranked #73 on Action Recognition on UCF101
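The two-stream design can be sketched as two classifiers fused late: a spatial stream over RGB frames and a temporal stream over optical flow, with their class scores averaged. Linear classifiers and random features below are toy stand-ins for the convolutional streams.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def stream(frames, w):
    # One recognition stream: average features over time, then classify.
    return softmax(w @ frames.mean(axis=0))

def two_stream_predict(rgb, flow, w_rgb, w_flow):
    # Late fusion: average the class scores of the spatial (RGB) stream
    # and the temporal (optical-flow) stream.
    return (stream(rgb, w_rgb) + stream(flow, w_flow)) / 2.0

rng = np.random.default_rng(6)
T, d, k = 10, 8, 5                           # frames, feature dim, classes
rgb, flow = rng.random((T, d)), rng.random((T, d))
w_rgb, w_flow = rng.normal(size=(k, d)), rng.normal(size=(k, d))
probs = two_stream_predict(rgb, flow, w_rgb, w_flow)
```

The fused output remains a valid class distribution, since it averages two softmax outputs.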
no code implementations • 21 Feb 2015 • Weiyao Lin, Yuanzhe Chen, Jianxin Wu, Hanli Wang, Bin Sheng, Hongxiang Li
Based on this network, we further model people in the scene as packages, and human activities as the process of package transmission in the network.