no code implementations • 3 Apr 2018 • Wenhao Jiang, Lin Ma, Xinpeng Chen, Hanwang Zhang, Wei Liu
Recently, much progress has been made in image captioning, and the encoder-decoder framework has achieved outstanding performance on this task.
no code implementations • 5 Sep 2015 • Wenhao Jiang, Cheng Deng, Wei Liu, Feiping Nie, Fu-Lai Chung, Heng Huang
Domain adaptation problems arise in a variety of applications, where a training dataset from the source domain and a test dataset from the target domain typically follow different distributions.
no code implementations • ECCV 2018 • Wenhao Jiang, Lin Ma, Yu-Gang Jiang, Wei Liu, Tong Zhang
In this paper, in order to exploit the complementary information from multiple encoders, we propose a novel Recurrent Fusion Network (RFNet) for tackling image captioning.
no code implementations • CVPR 2017 • Hao-Zhi Huang, Hao Wang, Wenhan Luo, Lin Ma, Wenhao Jiang, Xiaolong Zhu, Zhifeng Li, Wei Liu
More specifically, a hybrid loss is proposed to capitalize on the content information of input frames, the style information of a given style image, and the temporal information of consecutive frames.
no code implementations • 2 Feb 2019 • Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Feng Zhang
In this paper, we propose a novel model with a hierarchical photo-scene encoder and a reconstructor for the task of album storytelling.
Ranked #5 on Image-guided Story Ending Generation on VIST-E
no code implementations • 24 Jun 2019 • Wenhao Jiang, Zhiyu Liu, Kit-Hang Lee, Shihui Chen, Yui-Lun Ng, Qi Dou, Hing-Chiu Chang, Ka-Wai Kwok
Abdominal magnetic resonance imaging (MRI) provides a straightforward way of characterizing tissue and locating lesions in patients, as in standard diagnosis.
no code implementations • ECCV 2020 • Shaoxiang Chen, Wenhao Jiang, Wei Liu, Yu-Gang Jiang
Inspired by the fact that there exist cross-modal interactions in the human brain, we propose a novel method for learning pairwise modality interactions in order to better exploit complementary information for each pair of modalities in videos and thus improve performance on both tasks.
no code implementations • 11 May 2021 • Guiyu Tian, Wenhao Jiang, Wei Liu, Yadong Mu
To this end, MorphNet jointly optimizes two objectives for sample-adaptive poisoning: a reconstruction loss that preserves the visual similarity between benign and poisoned point clouds, and a classification loss that pushes a modern point cloud recognition model to misclassify the poisoned sample as a pre-specified target category.
no code implementations • 24 Apr 2023 • Yihan Zhang, Wenhao Jiang, Feng Zheng, Chiu C. Tan, Xinghua Shi, Hongchang Gao
This motivates us to study decentralized minimax optimization algorithms for the nonconvex-nonconcave problem.
no code implementations • 23 Oct 2023 • Huiyu Mai, Wenhao Jiang, Zhihong Deng
Unsupervised text style transfer aims at training a generative model that can alter the style of the input sentence while preserving its content without using any parallel data.
no code implementations • 3 Nov 2023 • Ziyu Wang, Wenhao Jiang, Zixuan Zhang, Wei Tang, Junchi Yan
Real-world sequential processes often combine simple subsystems that interact with each other in certain forms.
no code implementations • 18 Feb 2024 • Peng Xing, Yinghui Li, Shirong Ma, Xinnian Liang, Haojing Huang, Yangning Li, Hai-Tao Zheng, Wenhao Jiang, Ying Shen
Chinese Spelling Correction (CSC) aims to detect and correct spelling errors in given sentences.
no code implementations • 18 Feb 2024 • Yinghui Li, Shang Qin, Jingheng Ye, Shirong Ma, Yangning Li, Libo Qin, Xuming Hu, Wenhao Jiang, Hai-Tao Zheng, Philip S. Yu
To help the Chinese Grammatical Error Correction (CGEC) field better adapt to the era of large language models (LLMs), we rethink the roles of LLMs in the CGEC task so that they can be better utilized and explored.
1 code implementation • 2 Feb 2024 • Wenhao Jiang, Duo Li, Menghan Hu, Guangtao Zhai, Xiaokang Yang, Xiao-Ping Zhang
To tackle the issues of catastrophic forgetting and overfitting in few-shot class-incremental learning (FSCIL), previous work has primarily concentrated on preserving the memory of old knowledge during the incremental phase.
1 code implementation • 7 Mar 2024 • Yangning Li, Qingsong Lv, Tianyu Yu, Yinghui Li, Shulin Huang, Tingwei Lu, Xuming Hu, Wenhao Jiang, Hai-Tao Zheng, Hui Wang
To solve this issue, we first introduce negative seed entities in the inputs, which belong to the same fine-grained semantic class as the positive seed entities but differ in certain attributes.
1 code implementation • 3 Sep 2019 • GuanXiong Luo, Na Zhao, Wenhao Jiang, Edward S. Hui, Peng Cao
Purpose: To develop a deep learning-based Bayesian inference for MRI reconstruction.
1 code implementation • 17 Jun 2022 • Teng Wang, Wenhao Jiang, Zhichao Lu, Feng Zheng, Ran Cheng, Chengguo Yin, Ping Luo
Existing vision-language pre-training (VLP) methods primarily rely on paired image-text datasets, which are either annotated with enormous human labor or crawled from the internet and then cleaned with elaborate techniques.
1 code implementation • 11 Mar 2023 • Teng Wang, Jinrui Zhang, Feng Zheng, Wenhao Jiang, Ran Cheng, Ping Luo
Our framework is easily extensible to tasks covering visually-grounded language understanding and generation.
1 code implementation • CVPR 2022 • Zhihui Lin, Tianyu Yang, Maomao Li, Ziyu Wang, Chun Yuan, Wenhao Jiang, Wei Liu
Matching-based methods, especially those based on space-time memory, are significantly ahead of other solutions in semi-supervised video object segmentation (VOS).
Semantic Segmentation • Semi-Supervised Video Object Segmentation
1 code implementation • 11 Sep 2019 • Jingwen Wang, Lin Ma, Wenhao Jiang
The task of temporally grounding language queries in videos is to temporally localize the best-matched video segment corresponding to a given language query (sentence).
1 code implementation • ICCV 2019 • Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu
In this paper, we propose to guide the video caption generation with Part-of-Speech (POS) information, based on a gated fusion of multiple representations of input videos.
1 code implementation • CVPR 2018 • Xinpeng Chen, Lin Ma, Wenhao Jiang, Jian Yao, Wei Liu
Recently, caption generation with an encoder-decoder framework has been extensively studied and applied in different domains, such as image captioning and code captioning.
1 code implementation • CVPR 2021 • Tian Pan, Yibing Song, Tianyu Yang, Wenhao Jiang, Wei Liu
By empowering the temporal robustness of the encoder and modeling the temporal decay of the keys, our VideoMoCo improves MoCo temporally based on contrastive learning.
Ranked #76 on Action Recognition on HMDB-51
1 code implementation • CVPR 2018 • Jingwen Wang, Wenhao Jiang, Lin Ma, Wei Liu, Yong Xu
We propose a bidirectional proposal method that effectively exploits both past and future contexts to make proposal predictions.
2 code implementations • 28 Jan 2022 • Ziyu Wang, Wenhao Jiang, Yiming Zhu, Li Yuan, Yibing Song, Wei Liu
In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can yield good representational power for deep recognition models.
4 code implementations • 3 Oct 2023 • Bin Zhu, Bin Lin, Munan Ning, Yang Yan, Jiaxi Cui, Hongfa Wang, Yatian Pang, Wenhao Jiang, Junwu Zhang, Zongwei Li, Wancai Zhang, Zhifeng Li, Wei Liu, Li Yuan
We thus propose a dataset with Video, Infrared, Depth, Audio and their corresponding Language, named VIDAL-10M.
Ranked #1 on Zero-shot Audio Classification on VGG-Sound (using extra training data)