no code implementations • 17 Dec 2024 • Yuhong Chen, Ailin Song, Huifeng Yin, Shuai Zhong, Fuhai Chen, Qi Xu, Shiping Wang, Mingkun Xu
However, traditional multi-view learning approaches are tailored to scenarios with fixed data views and fall short of emulating the human brain's sequential processing of incoming signals.
no code implementations • 10 Dec 2024 • Fuhai Chen, Pengpeng Huang, Xuri Ge, Jie Huang, Zishuo Bao
However, multimodal sentiment analysis is affected by unimodal data bias, e.g., text sentiment can be misleading due to explicit sentiment semantics, which lowers the accuracy of the final sentiment classification.
no code implementations • 1 Aug 2024 • Xuri Ge, Junchen Fu, Fuhai Chen, Shan An, Nicu Sebe, Joemon M. Jose
Facial action units (AUs), as defined in the Facial Action Coding System (FACS), have received significant research interest owing to their diverse range of applications in facial state analysis.
no code implementations • 5 Jun 2024 • Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Jie Wang, Joemon M. Jose
In this paper, we propose a Hybrid-modal Interaction with multiple Relational Enhancements (termed Hire) for image-text matching, which correlates the intra- and inter-modal semantics between objects and words with implicit and explicit relationship modelling.
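A minimal sketch (not the authors' code) of the general idea: implicit intra-modal relations are modelled with self-attention among region/word features before cross-modal matching. All shapes, names, and the pooling scheme here are illustrative assumptions.

import torch
import torch.nn.functional as F

def self_relation(feats):
    # feats: (n, d) region or word features; implicit relations via
    # scaled dot-product self-attention with a residual connection.
    d = feats.size(-1)
    attn = F.softmax(feats @ feats.t() / d ** 0.5, dim=-1)
    return feats + attn @ feats          # relation-enhanced features

def match_score(regions, words):
    # Cosine similarity between relation-enhanced region and word
    # features, max-pooled over regions and averaged over words.
    r = F.normalize(self_relation(regions), dim=-1)
    w = F.normalize(self_relation(words), dim=-1)
    sim = w @ r.t()                      # (n_words, n_regions)
    return sim.max(dim=-1).values.mean()

score = match_score(torch.randn(36, 512), torch.randn(12, 512))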
1 code implementation • 26 Apr 2024 • Xuri Ge, Songpei Xu, Fuhai Chen, Jie Wang, Guoxin Wang, Shan An, Joemon M. Jose
In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image-sentence retrieval.
Ranked #1 on Cross-Modal Retrieval on MSCOCO
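A hedged sketch of the semantic-spatial highlighting idea named above: region features are re-weighted using cues that, in 3SHNet, would come from a segmentation backbone pooled over each region. The gated convex combination below is an assumed illustration, not the paper's exact design.

import torch
import torch.nn as nn

class SemanticSpatialHighlight(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, regions, seg_feats):
        # regions, seg_feats: (n, dim); seg_feats stand in for
        # segmentation-derived spatial-semantic cues per region.
        g = self.gate(torch.cat([regions, seg_feats], dim=-1))
        return g * regions + (1 - g) * seg_feats  # highlighted features

m = SemanticSpatialHighlight()
out = m(torch.randn(36, 512), torch.randn(36, 512))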
no code implementations • 17 Oct 2022 • Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Joemon M. Jose
To correlate the context of objects with the textual context, we further refine the visual semantic representation via the cross-level object-sentence and word-image based interactive attention.
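A minimal sketch, under assumed shapes, of what cross-level interactive attention can look like: objects attend to a sentence-level context vector and words attend to an image-level context vector; the paper's actual module is more elaborate.

import torch
import torch.nn.functional as F

def interactive_attention(objects, words):
    # objects: (n_obj, d) region features; words: (n_word, d) word features.
    sentence = words.mean(dim=0, keepdim=True)    # (1, d) sentence context
    image = objects.mean(dim=0, keepdim=True)     # (1, d) image context
    d = objects.size(-1)
    obj_w = F.softmax(objects @ sentence.t() / d ** 0.5, dim=0)  # object-sentence
    word_w = F.softmax(words @ image.t() / d ** 0.5, dim=0)      # word-image
    return (obj_w * objects).sum(0), (word_w * words).sum(0)

v, t = interactive_attention(torch.randn(36, 512), torch.randn(12, 512))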
no code implementations • 13 Mar 2022 • Chengpeng Dai, Fuhai Chen, Xiaoshuai Sun, Rongrong Ji, Qixiang Ye, Yongjian Wu
Recently, automatic video captioning has attracted increasing attention, where the core challenge lies in capturing the key semantic items, such as objects and actions, together with their spatial-temporal correlations, from redundant frames and semantic content.
no code implementations • 12 Mar 2022 • Fuhai Chen, Rongrong Ji, Chengpeng Dai, Xuri Ge, Shengchuang Zhang, Xiaojing Ma, Yue Gao
Echocardiography is widely used in clinical practice for diagnosis and treatment, e.g., of common congenital heart defects.
no code implementations • 12 Mar 2022 • Fuhai Chen, Xuri Ge, Xiaoshuai Sun, Yue Gao, Jianzhuang Liu, Fufeng Chen, Wenjie Li
The key of referring expression comprehension lies in capturing the cross-modal visual-linguistic relevance.
1 code implementation • 15 Nov 2021 • Haotong Zhang, Fuhai Chen, Angela Yao
We present a (semi-)weakly supervised method that uses only a small number of fully labelled sequences, with the majority of sequences labelled only with their (single) upcoming action.
no code implementations • 5 Aug 2021 • Xuri Ge, Fuhai Chen, Joemon M. Jose, Zhilong Ji, Zhongqin Wu, Xiao Liu
In this work, we propose to address the above issue from two aspects: (i) constructing intrinsic structure (along with relations) among the fragments of respective modalities, e.g., "dog → play → ball" as the semantic structure of an image, and (ii) seeking explicit inter-modal structural and semantic correspondence between the visual and textual modalities.
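As a toy illustration of aspect (i), the sketch below enhances fragment features along an assumed semantic structure such as "dog -> play -> ball" with one round of normalized message passing; the adjacency and features are placeholders, not the paper's construction.

import torch
import torch.nn.functional as F

feats = torch.randn(3, 512)              # features for [dog, play, ball]
adj = torch.tensor([[0., 1., 0.],        # dog -> play
                    [0., 0., 1.],        # play -> ball
                    [0., 0., 0.]])
adj = adj + adj.t() + torch.eye(3)       # symmetrize, add self-loops
deg = adj.sum(-1, keepdim=True)
enhanced = F.relu((adj / deg) @ feats)   # neighborhood-averaged fragments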
1 code implementation • 13 Dec 2020 • Jiayi Ji, Yunpeng Luo, Xiaoshuai Sun, Fuhai Chen, Gen Luo, Yongjian Wu, Yue Gao, Rongrong Ji
The latter contains a Global Adaptive Controller that can adaptively fuse the global information into the decoder to guide the caption generation.
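A minimal sketch of such adaptive fusion: a learned gate decides how much of the global image vector enters each decoding step. The layer sizes and the additive form are assumptions for illustration, not the paper's exact controller.

import torch
import torch.nn as nn

class GlobalAdaptiveFusion(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, h_t, g):
        # h_t: (batch, dim) decoder hidden state; g: (batch, dim) global feature.
        beta = self.gate(torch.cat([h_t, g], dim=-1))  # per-dim fusion strength
        return h_t + beta * g                          # fused decoding state

fuse = GlobalAdaptiveFusion()
h = fuse(torch.randn(4, 512), torch.randn(4, 512))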
no code implementations • NeurIPS 2019 • Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang, Yan Wang
To model these two inherent diversities in image captioning, we propose a Variational Structured Semantic Inferring model (termed VSSI-cap) executed in a novel structured encoder-inferer-decoder schema.
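The variational part of such an encoder-inferer-decoder schema typically relies on the reparameterization trick; a self-contained sketch (with assumed dimensions, not the paper's exact inferer) follows.

import torch
import torch.nn as nn

class Inferer(nn.Module):
    def __init__(self, dim=512, z_dim=128):
        super().__init__()
        self.mu = nn.Linear(dim, z_dim)
        self.logvar = nn.Linear(dim, z_dim)

    def forward(self, feats):
        mu, logvar = self.mu(feats), self.logvar(feats)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return z, kl   # z feeds the decoder; kl joins the training loss

z, kl = Inferer()(torch.randn(4, 512))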
no code implementations • 9 Oct 2019 • Fuhai Chen, Rongrong Ji, Chengpeng Dai, Xiaoshuai Sun, Chia-Wen Lin, Jiayi Ji, Baochang Zhang, Feiyue Huang, Liujuan Cao
Specifically, we propose a novel Structured-Spatial Semantic Embedding model for image deblurring (termed S3E-Deblur), which introduces a novel Structured-Spatial Semantic tree model (S3-tree) to bridge two basic tasks in computer vision: image deblurring (ImD) and image captioning (ImC).
no code implementations • 7 Aug 2019 • Chen Shen, Rongrong Ji, Fuhai Chen, Xiaoshuai Sun, Xiangming Li
Specifically, the proposed module first embeds the scene concepts into factored weights explicitly and then attends to the visual information extracted from the input image.
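A minimal sketch, with assumed shapes, of the described pattern: a scene-concept distribution is folded into factored weights that then attend over visual features.

import torch
import torch.nn.functional as F

def scene_factored_attention(scene_probs, concept_emb, visual):
    # scene_probs: (k,) scene-concept distribution; concept_emb: (k, d)
    # concept embeddings; visual: (n, d) grid/region features.
    w = scene_probs @ concept_emb                            # (d,) factored weights
    attn = F.softmax(visual @ w / w.numel() ** 0.5, dim=0)   # (n,) attention
    return attn @ visual                                     # scene-conditioned context

ctx = scene_factored_attention(torch.softmax(torch.randn(10), 0),
                               torch.randn(10, 512), torch.randn(49, 512))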
no code implementations • CVPR 2018 • Fuhai Chen, Rongrong Ji, Xiaoshuai Sun, Yongjian Wu, Jinsong Su
In offline optimization, we adopt an end-to-end formulation that jointly trains the visual tree parser, the structured relevance and diversity constraints, and the LSTM-based captioning model.
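The shape of such a joint objective can be sketched as a weighted sum; the term names and weights below are placeholders, not the paper's values.

def total_loss(caption_nll, relevance_loss, diversity_loss,
               lambda_rel=0.1, lambda_div=0.1):
    # Captioning loss plus weighted structured relevance/diversity terms,
    # all trained end-to-end in the offline stage described above.
    return caption_nll + lambda_rel * relevance_loss + lambda_div * diversity_loss

loss = total_loss(caption_nll=2.3, relevance_loss=0.8, diversity_loss=0.5)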