no code implementations • 26 Nov 2023 • Jiawang Bai, Kuofeng Gao, Shaobo Min, Shu-Tao Xia, Zhifeng Li, Wei Liu
Contrastive Vision-Language Pre-training, known as CLIP, has shown promising effectiveness in addressing downstream image recognition tasks.
no code implementations • 12 Oct 2023 • Pandeng Li, Hongtao Xie, Jiannan Ge, Lei Zhang, Shaobo Min, Yongdong Zhang
Hence, we address this problem by decomposing video information into reconstruction-dependent and semantic-dependent information, which disentangles the semantic extraction from reconstruction constraint.
no code implementations • 7 Apr 2022 • Jie Jiang, Shaobo Min, Weijie Kong, Dihong Gong, Hongfa Wang, Zhifeng Li, Wei Liu
With multi-level representations for video and text, hierarchical contrastive learning is designed to explore fine-grained cross-modal relationships, i.e., frame-word, clip-phrase, and video-sentence, which enables HCMI to achieve a comprehensive semantic comparison between video and text modalities.
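The multi-level contrastive idea above can be sketched as a sum of symmetric InfoNCE losses, one per granularity (frame-word, clip-phrase, video-sentence). This is an illustrative simplification, not the paper's implementation; the function names, the temperature value, and the per-level weights are all assumptions.

```python
import numpy as np

def info_nce(a, b, tau=0.07):
    """Symmetric InfoNCE loss between two sets of paired embeddings
    a, b of shape (N, D); row i of a is matched with row i of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau  # (N, N) similarity matrix
    idx = np.arange(len(a))

    def ce(l):
        # cross-entropy with the diagonal (matched pair) as the target
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    return 0.5 * (ce(logits) + ce(logits.T))

def hierarchical_loss(levels, weights=None):
    """Sum InfoNCE over (video_side, text_side) embedding pairs at each
    granularity level, e.g. frame-word, clip-phrase, video-sentence."""
    weights = weights or [1.0] * len(levels)
    return sum(w * info_nce(v, t) for w, (v, t) in zip(weights, levels))
```

In this reading, each level contributes an independent alignment term, so coarse (video-sentence) and fine (frame-word) correspondences are supervised jointly.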
Ranked #1 on Video Retrieval on MSR-VTT-1kA (using extra training data)
no code implementations • NeurIPS 2021 • Chaoqun Wang, Shaobo Min, Xuejin Chen, Xiaoyan Sun, Houqiang Li
This enables DPPN to produce visual representations with accurate attribute localization ability, which benefits the semantic-visual alignment and representation transferability.
no code implementations • 13 Jun 2021 • Shaobo Min, Qi Dai, Hongtao Xie, Chuang Gan, Yongdong Zhang, Jingdong Wang
Cross-modal correlation provides an inherent supervision for video unsupervised representation learning.
no code implementations • 5 Apr 2021 • Chaoqun Wang, Xuejin Chen, Shaobo Min, Xiaoyan Sun, Houqiang Li
First, DCEN leverages task labels to cluster representations of the same semantic category by cross-modal contrastive learning and exploring semantic-visual complementarity.
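The label-driven clustering step described above resembles a supervised contrastive loss, where all samples sharing a semantic category are pulled together. The sketch below is a generic single-modality stand-in for that idea, not DCEN's actual cross-modal formulation; the function name and temperature are assumptions.

```python
import numpy as np

def supervised_contrastive(feats, labels, tau=0.1):
    """Supervised contrastive loss: for each anchor, treat every other
    sample with the same label as a positive and all remaining samples
    as negatives, which clusters same-category representations."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T / tau
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    pos = (labels[:, None] == labels[None, :]) & not_self
    # log-softmax over all non-self samples
    logits = sim - sim.max(axis=1, keepdims=True)
    expv = np.exp(logits) * not_self
    logp = logits - np.log(expv.sum(axis=1, keepdims=True))
    # mean log-probability of positives, per anchor that has positives
    has_pos = pos.sum(axis=1) > 0
    loss = -(logp * pos).sum(axis=1)[has_pos] / pos.sum(axis=1)[has_pos]
    return loss.mean()
```

The loss is minimized when same-label embeddings are close and different-label embeddings are far apart under cosine similarity.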
no code implementations • NeurIPS 2020 • Shaobo Min, Hongtao Xie, Hantao Yao, Xuran Deng, Zheng-Jun Zha, Yongdong Zhang
In this paper, we introduce a new task, named Hierarchical Granularity Transfer Learning (HGTL), to recognize sub-level categories with basic-level annotations and semantic descriptions for hierarchical categories.
no code implementations • 31 May 2020 • Hantao Yao, Shaobo Min, Yongdong Zhang, Changsheng Xu
Then, an attentional graph attribute embedding is proposed to reduce the semantic bias between seen and unseen categories, which utilizes the graph operation to capture the semantic relationship between categories.
1 code implementation • CVPR 2020 • Shaobo Min, Hantao Yao, Hongtao Xie, Chaoqun Wang, Zheng-Jun Zha, Yongdong Zhang
Recent methods focus on learning a unified semantic-aligned visual representation to transfer knowledge between two domains, while ignoring the effect of semantic-free visual representation in alleviating the biased recognition problem.
1 code implementation • 30 Mar 2020 • Shaobo Min, Hantao Yao, Hongtao Xie, Zheng-Jun Zha, Yongdong Zhang
In this paper, we propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation in terms of square-root, low-rank, and sparsity.
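Of the three normalization objectives above, the square-root term is the one most often approximated iteratively. A minimal sketch, covering only that square-root component (MOMN itself optimizes all three objectives jointly, which this does not attempt), using the Newton-Schulz iteration on a bilinear-pooled feature matrix:

```python
import numpy as np

def bilinear_pool(X):
    """Second-order (bilinear) pooling of local features X, shape (n, d)."""
    return X.T @ X / X.shape[0]

def newton_schulz_sqrt(A, iters=10):
    """Approximate the matrix square-root of an SPD matrix A via the
    coupled Newton-Schulz iteration, pre-scaled for convergence."""
    norm_a = np.linalg.norm(A)          # Frobenius norm for pre-scaling
    Y = A / norm_a                      # scaled matrix, eigenvalues in (0, 1]
    Z = np.eye(A.shape[0])
    I = np.eye(A.shape[0])
    for _ in range(iters):
        T = 0.5 * (3 * I - Z @ Y)
        Y = Y @ T                       # Y -> sqrt of scaled A
        Z = T @ Z                       # Z -> inverse sqrt of scaled A
    return Y * np.sqrt(norm_a)          # undo the pre-scaling
```

Square-root normalization of this kind is commonly used to suppress bursty second-order statistics in bilinear representations; the low-rank and sparsity objectives would add further constraints on top.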
1 code implementation • 12 Aug 2019 • Shaobo Min, Hantao Yao, Hongtao Xie, Zheng-Jun Zha, Yongdong Zhang
In contrast to previous methods, the DSEN decomposes the domain-shared projection function into one domain-invariant and two domain-specific sub-functions to explore the similarities and differences between two domains.
no code implementations • 31 Jul 2018 • Shaobo Min, Xuejin Chen, Zheng-Jun Zha, Feng Wu, Yongdong Zhang
Learning-based methods suffer from a deficiency of clean annotations, especially in biomedical segmentation.