no code implementations • 11 Mar 2023 • Teng Wang, Jinrui Zhang, Feng Zheng, Wenhao Jiang, Ran Cheng, Ping Luo
TEG learns to adaptively ground the possible event proposals given a set of sentences by estimating the cross-modal distance in a joint semantic space.
1 code implementation • CVPR 2022 • Zhihui Lin, Tianyu Yang, Maomao Li, Ziyu Wang, Chun Yuan, Wenhao Jiang, Wei Liu
Matching-based methods, especially those based on space-time memory, are significantly ahead of other solutions in semi-supervised video object segmentation (VOS).
Ranked #4 on
Semi-Supervised Video Object Segmentation
on MOSE
Semantic Segmentation
Semi-Supervised Video Object Segmentation
+1
1 code implementation • 17 Jun 2022 • Teng Wang, Wenhao Jiang, Zhichao Lu, Feng Zheng, Ran Cheng, Chengguo Yin, Ping Luo
Existing vision-language pre-training (VLP) methods primarily rely on paired image-text datasets, which are either annotated by enormous human labors, or crawled from the internet followed by elaborate data cleaning techniques.
2 code implementations • 28 Jan 2022 • Ziyu Wang, Wenhao Jiang, Yiming Zhu, Li Yuan, Yibing Song, Wei Liu
In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can yield a good representation power for deep recognition models.
no code implementations • 11 May 2021 • Guiyu Tian, Wenhao Jiang, Wei Liu, Yadong Mu
To this end, MorphNet jointly optimizes two objectives for sample-adaptive poisoning: a reconstruction loss that preserves the visual similarity between benign / poisoned point clouds, and a classification loss that enforces a modern recognition model of point clouds tends to mis-classify the poisoned sample to a pre-specified target category.
1 code implementation • CVPR 2021 • Tian Pan, Yibing Song, Tianyu Yang, Wenhao Jiang, Wei Liu
By empowering the temporal robustness of the encoder and modeling the temporal decay of the keys, our VideoMoCo improves MoCo temporally based on contrastive learning.
Ranked #73 on
Action Recognition
on HMDB-51
no code implementations • ECCV 2020 • Shaoxiang Chen, Wenhao Jiang, Wei Liu, Yu-Gang Jiang
Inspired by the fact that there exist cross-modal interactions in the human brain, we propose a novel method for learning pairwise modality interactions in order to better exploit complementary information for each pair of modalities in videos and thus improve performances on both tasks.
1 code implementation • 11 Sep 2019 • Jingwen Wang, Lin Ma, Wenhao Jiang
The task of temporally grounding language queries in videos is to temporally localize the best matched video segment corresponding to a given language (sentence).
1 code implementation • 3 Sep 2019 • GuanXiong Luo, Na Zhao, Wenhao Jiang, Edward S. Hui, Peng Cao
Purpose: To develop a deep learning-based Bayesian inference for MRI reconstruction.
1 code implementation • ICCV 2019 • Bairui Wang, Lin Ma, Wei zhang, Wenhao Jiang, Jingwen Wang, Wei Liu
In this paper, we propose to guide the video caption generation with Part-of-Speech (POS) information, based on a gated fusion of multiple representations of input videos.
no code implementations • 24 Jun 2019 • Wenhao Jiang, Zhiyu Liu, Kit-Hang Lee, Shihui Chen, Yui-Lun Ng, Qi Dou, Hing-Chiu Chang, Ka-Wai Kwok
Abdominal magnetic resonance imaging (MRI) provides a straightforward way of characterizing tissue and locating lesions of patients as in standard diagnosis.
no code implementations • 2 Feb 2019 • Bairui Wang, Lin Ma, Wei zhang, Wenhao Jiang, Feng Zhang
In this paper, we propose a novel model with a hierarchical photo-scene encoder and a reconstructor for the task of album storytelling.
no code implementations • ECCV 2018 • Wenhao Jiang, Lin Ma, Yu-Gang Jiang, Wei Liu, Tong Zhang
In this paper, in order to exploit the complementary information from multiple encoders, we propose a novel Recurrent Fusion Network (RFNet) for tackling image captioning.
no code implementations • 3 Apr 2018 • Wenhao Jiang, Lin Ma, Xinpeng Chen, Hanwang Zhang, Wei Liu
Recently, much advance has been made in image captioning, and an encoder-decoder framework has achieved outstanding performance for this task.
1 code implementation • CVPR 2018 • Jingwen Wang, Wenhao Jiang, Lin Ma, Wei Liu, Yong Xu
We propose a bidirectional proposal method that effectively exploits both past and future contexts to make proposal predictions.
1 code implementation • CVPR 2018 • Xinpeng Chen, Lin Ma, Wenhao Jiang, Jian Yao, Wei Liu
Recently, caption generation with an encoder-decoder framework has been extensively studied and applied in different domains, such as image captioning, code captioning, and so on.
no code implementations • CVPR 2017 • Hao-Zhi Huang, Hao Wang, Wenhan Luo, Lin Ma, Wenhao Jiang, Xiaolong Zhu, Zhifeng Li, Wei Liu
More specifically, a hybrid loss is proposed to capitalize on the content information of input frames, the style information of a given style image, and the temporal information of consecutive frames.
no code implementations • 5 Sep 2015 • Wenhao Jiang, Cheng Deng, Wei Liu, Feiping Nie, Fu-Lai Chung, Heng Huang
Domain adaptation problems arise in a variety of applications, where a training dataset from the \textit{source} domain and a test dataset from the \textit{target} domain typically follow different distributions.