1 code implementation • 26 Nov 2024 • Mengzhao Wang, Huafeng Li, Yafei Zhang, Jinxing Li, Minghong Xie, Dapeng Tao
The retrieval branch uses inter-video contrastive learning to roughly align the global features of paragraphs and videos, reducing modality differences and constructing a coarse-grained feature space to break free from the need for correspondence between paragraphs and videos.
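As a rough illustration of what aligning global paragraph and video features with contrastive learning can look like, here is a minimal sketch. It assumes a symmetric InfoNCE-style objective over a batch of matched pairs; the excerpt does not give the exact loss, so the function names, the temperature value, and the symmetric form are all assumptions rather than the authors' formulation.

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def log_softmax_at(row, idx):
    """Numerically stable log-softmax of `row`, evaluated at position `idx`."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - lse

def symmetric_info_nce(paragraph_feats, video_feats, temperature=0.07):
    """Hypothetical loss: pair i of paragraph/video features is the positive,
    every other video (or paragraph) in the batch acts as a negative."""
    p = [l2_normalize(f) for f in paragraph_feats]
    v = [l2_normalize(f) for f in video_feats]
    n = len(p)
    # Cosine similarities scaled by the (assumed) temperature.
    sim = [[sum(a * b for a, b in zip(p[i], v[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Paragraph-to-video direction: each row should peak on its diagonal entry.
    p2v = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Video-to-paragraph direction: same computation on the transposed matrix.
    sim_t = [list(col) for col in zip(*sim)]
    v2p = -sum(log_softmax_at(sim_t[j], j) for j in range(n)) / n
    return 0.5 * (p2v + v2p)

aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

A batch whose pairs already agree yields a loss near zero, while a batch with swapped pairs is penalized heavily, which is the pressure that pulls the two modalities into a shared coarse-grained space.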
1 code implementation • 31 Oct 2024 • Minghong Xie, Mengzhao Wang, Huafeng Li, Yafei Zhang, Dapeng Tao, Zhengtao Yu
In addition, a corresponding target object position progressive correction strategy is defined based on the hierarchical matching mechanism to achieve accurate positioning for the target object described in the text.
1 code implementation • 17 May 2024 • Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu, Shirui Pan, Bo Du
To ensure masking uniformity of subgraphs across these scales, we propose a novel coarse-to-fine strategy that initiates masking at the coarsest scale and progressively back-projects the mask to the finer scales.
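The coarse-to-fine idea can be sketched as follows, assuming each scale is linked to the next coarser one by a hard assignment list (fine node index to parent index). That representation, the function names, and the random choice of masked nodes at the coarsest scale are illustrative assumptions; the excerpt does not specify how the scales or the initial mask are built.

```python
import random

def back_project_mask(coarse_mask, assignment):
    """A fine node inherits the mask state of the coarse node it was pooled into.
    `assignment[i]` is the (assumed) parent index of fine node i."""
    return [coarse_mask[parent] for parent in assignment]

def coarse_to_fine_masks(assignments, num_coarsest, mask_ratio=0.5, seed=0):
    """`assignments` is ordered from finest to coarsest: assignments[k] maps
    nodes at scale k to nodes at scale k + 1. Returns one mask per scale,
    finest first."""
    rng = random.Random(seed)
    num_masked = max(1, round(mask_ratio * num_coarsest))
    masked_ids = set(rng.sample(range(num_coarsest), num_masked))
    # Start masking at the coarsest scale.
    mask = [i in masked_ids for i in range(num_coarsest)]
    masks = [mask]
    # Walk back toward the finest scale, projecting the mask at each step.
    for assignment in reversed(assignments):
        mask = back_project_mask(mask, assignment)
        masks.append(mask)
    masks.reverse()
    return masks

# Toy hierarchy: 8 nodes -> 4 nodes -> 2 nodes.
assignments = [[0, 0, 1, 1, 2, 2, 3, 3], [0, 0, 1, 1]]
for scale_mask in coarse_to_fine_masks(assignments, num_coarsest=2):
    print(scale_mask)
```

Because every finer mask is derived from the coarser one, the masked regions describe the same subgraph at every scale, which is one way to read the "masking uniformity" the excerpt mentions.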
1 code implementation • CVPR 2024 • Wentao Tan, Changxing Ding, Jiayu Jiang, Fei Wang, Yibing Zhan, Dapeng Tao
Thus, we propose a novel method that uses MLLMs to caption images according to various templates.
1 code implementation • 24 Apr 2024 • Chuang Liu, Yuyao Wang, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu
To this end, we introduce a novel structure-guided masking strategy (i.e., StructMAE), designed to refine the existing GMAE models.
1 code implementation • 7 Mar 2024 • Huafeng Li, Zhenmei Yang, Yafei Zhang, Dapeng Tao, Zhengtao Yu
This network, comprising single-frame HDR reconstruction with enhanced stop image (SHDR-ESI) and SHDR-ESI-assisted multi-exposure HDR reconstruction (SHDRA-MHDR), effectively leverages the ghost-free characteristic of single-frame HDR reconstruction and the detail-enhancing capability of ESI in oversaturated areas.
no code implementations • 6 Feb 2024 • Yanfang Zhang, Yiliu Sun, Yibing Zhan, Dapeng Tao, DaCheng Tao, Chen Gong
The experimental results on popular LLMs, such as GPT-3.5-turbo and Gemini-pro, show that our IR method enhances the overall accuracy of factual reasoning by 27.33% and mathematical proof by 31.43%, when compared with traditional DR methods.
1 code implementation • 11 Jan 2024 • Xiaoyan Yu, Neng Dong, Liehuang Zhu, Hao Peng, Dapeng Tao
Additionally, acknowledging the complementary nature of semantic details across different modalities, we integrate text features from the bimodal language descriptions to achieve comprehensive semantics.
no code implementations • 9 Dec 2023 • Chuang Liu, Yibing Zhan, Xueqi Ma, Liang Ding, Dapeng Tao, Jia Wu, Wenbin Hu, Bo Du
Graph Transformers (GTs) have achieved impressive results on various graph-related tasks.
1 code implementation • CVPR 2024 • Yijun Yang, Tianyi Zhou, Kanxue Li, Dapeng Tao, Lusong Li, Li Shen, Xiaodong He, Jing Jiang, Yuhui Shi
While large language models (LLMs) excel in a simulated world of texts, they struggle to interact with the more realistic world without perceptions of other modalities such as visual or audio signals.
1 code implementation • 26 Nov 2023 • Lei Wang, Yibing Zhan, Leilei Ma, Dapeng Tao, Liang Ding, Chen Gong
The "splice" in our method is two-fold: 1) Each mixed image is a splice of several downsampled images in the form of a grid, where the semantics of images attending to mixing are blended without object deficiencies for alleviating co-occurred bias; 2) We splice mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image with different scales to contribute to training together.
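The two splicing steps can be sketched on toy data as below. This is a simplified illustration, not the authors' implementation: images are plain 2D lists, downsampling is done by striding, the grid is fixed per call, and label handling is omitted because the excerpt does not describe it. All function names are hypothetical.

```python
def downsample(image, factor):
    """Keep every `factor`-th row and column (nearest-neighbour style)."""
    return [row[::factor] for row in image[::factor]]

def splice_grid(images, grid=2):
    """Splice grid * grid downsampled images into one image of the original
    size. Assumes image height and width are divisible by `grid`."""
    assert len(images) == grid * grid
    tiles = [downsample(img, grid) for img in images]
    mixed = []
    for r in range(grid):
        row_tiles = tiles[r * grid:(r + 1) * grid]
        # Concatenate the tiles of this grid row side by side, line by line.
        for y in range(len(row_tiles[0])):
            spliced_row = []
            for tile in row_tiles:
                spliced_row.extend(tile[y])
            mixed.append(spliced_row)
    return mixed

def splice_mixed_batch(batch, grid=2):
    """Append the mixed images to the original mini-batch, so each image
    appears at full scale and at a reduced scale inside a grid."""
    group = grid * grid
    mixed = [splice_grid(batch[i:i + group], grid)
             for i in range(0, len(batch) - group + 1, group)]
    return batch + mixed

# Four 4x4 "images", each filled with its own index.
batch = [[[k] * 4 for _ in range(4)] for k in range(4)]
new_batch = splice_mixed_batch(batch)
for row in new_batch[-1]:
    print(row)
```

Each source image stays whole inside its grid cell, only smaller, which matches the excerpt's point that semantics are blended "without object deficiencies".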
no code implementations • 7 Sep 2023 • Yichen Ouyang, Wenhao Chai, Jiayi Ye, Dapeng Tao, Yibing Zhan, Gaoang Wang
In light of the above issues, we present Consist3D, a three-stage framework Chasing for semantic-, geometric-, and saturation-Consistent Text-to-3D generation from a single image, in which the first two stages aim to learn parameterized consistency tokens, and the last stage is for optimization.
no code implementations • 23 Aug 2023 • Huafeng Li, Shedan Yang, Yafei Zhang, Dapeng Tao, Zhengtao Yu
In addition, to further reduce the negative impact of modal discrepancy and text diversity on cross-modal matching, we propose to use other sample knowledge of the same modality, i.e., external knowledge, to enhance identity-consistent features and weaken identity-inconsistent features.
1 code implementation • 22 Jul 2023 • Yafei Zhang, Zhiyuan Li, Huafeng Li, Dapeng Tao
To this end, a multi-modal MR brain tumor segmentation method with tumor prototype-driven and multi-expert integration is proposed.
no code implementations • 13 Jul 2023 • Haoran Wang, Qinghua Cheng, Baosheng Yu, Yibing Zhan, Dapeng Tao, Liang Ding, Haibin Ling
We evaluated our method on three popular egocentric action recognition datasets, Something-Something V2, H2O, and EPIC-KITCHENS-100, and the experimental results demonstrate the effectiveness of the proposed method for handling data scarcity problems, including long-tailed and few-shot egocentric action recognition.
no code implementations • 8 Jul 2023 • Huafeng Li, Le Xu, Yafei Zhang, Dapeng Tao, Zhengtao Yu
In this work, the changes of views, posture, background and modal discrepancy are considered as the main factors that cause the perturbations of person identity features.
no code implementations • 18 Jul 2022 • Chuang Liu, Xueqi Ma, Yibing Zhan, Liang Ding, Dapeng Tao, Bo Du, Wenbin Hu, Danilo Mandic
However, the LTH-based methods suffer from two major drawbacks: 1) they require exhaustive and iterative training of dense models, resulting in an extremely large training computation cost, and 2) they only trim graph structures and model parameters but ignore the node feature dimension, where significant redundancy exists.
no code implementations • 21 Aug 2020 • Jinfeng Li, Weifeng Liu, Yicong Zhou, Jun Yu, Dapeng Tao
Traditional domain adaptation algorithms assume that enough labeled data, which are treated as the prior knowledge, are available in the source domain.
no code implementations • 22 Oct 2019 • Yuanxin Zhu, Zhao Yang, Li Wang, Sai Zhao, Xiao Hu, Dapeng Tao
With the joint supervision of Cross-Entropy (CE) loss and HC loss, the network is trained to achieve two vital objectives: maximizing inter-class discrepancy and maximizing intra-class cross-modality similarity.
Cross-Modality Person Re-identification Person Re-Identification
2 code implementations • 24 Jun 2019 • Sihui Luo, Xinchao Wang, Gongfan Fang, Yao Hu, Dapeng Tao, Mingli Song
An increasing number of well-trained deep networks have been released online by researchers and developers, enabling the community to reuse them in a plug-and-play way without accessing the training annotations.
1 code implementation • CVPR 2019 • Jingwen Ye, Yixin Ji, Xinchao Wang, Kairi Ou, Dapeng Tao, Mingli Song
In this paper, we investigate a novel deep-model reusing task.
no code implementations • 6 Mar 2019 • De Xie, Cheng Deng, Hao Wang, Chao Li, Dapeng Tao
Two-stream architectures have shown strong performance in video classification tasks.
no code implementations • 21 Jun 2018 • Xueqi Ma, Weifeng Liu, Dapeng Tao, Yicong Zhou
Therefore, we develop an ensemble p-Laplacian regularization (EpLapR) to fully approximate the intrinsic manifold of the data distribution.
no code implementations • 22 Apr 2018 • Fusheng Hao, Jun Cheng, Lei Wang, Xinchao Wang, Jianzhong Cao, Xiping Hu, Dapeng Tao
Discriminative features are obtained by constraining the deep CNNs to map training samples as closely as possible to their corresponding anchors.
no code implementations • 7 Aug 2016 • Yanan Guo, Lei Li, Weifeng Liu, Jun Cheng, Dapeng Tao
Since human actions can be characterized by multiple feature representations extracted from Kinect and inertial sensors, multiview features must be encoded into a unified space optimal for human action recognition.