Search Results for author: Xuri Ge

Found 22 papers, 3 papers with code

The 1st EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval

no code implementations · 21 Apr 2025 · Junchen Fu, Xuri Ge, Xin Xin, HaiTao Yu, Yue Feng, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose

Multimodal representation learning has garnered significant attention in the AI community, largely due to the success of large pre-trained multimodal foundation models like LLaMA, GPT, Mistral, and CLIP.

Cross-Modal Retrieval · Information Retrieval +3

Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis

no code implementations · 14 Apr 2025 · Kaiwen Zheng, Xuri Ge, Junchen Fu, Jun Peng, Joemon M. Jose

First, we compile a new Multimodal Face Dataset (MFA) by leveraging GPT-4o to generate detailed multilevel language descriptions of faces, incorporating Action Unit (AU) and emotion descriptions.

Emotion Recognition · Representation Learning

Multimodal Sentiment Analysis Based on Causal Reasoning

no code implementations · 10 Dec 2024 · Fuhai Chen, Pengpeng Huang, Xuri Ge, Jie Huang, Zishuo Bao

However, multimodal sentiment analysis is affected by unimodal data bias; e.g., text sentiment can be misleading due to explicit sentiment semantics, leading to low accuracy in the final sentiment classification.

counterfactual · Counterfactual Inference +2

R^3AG: First Workshop on Refined and Reliable Retrieval Augmented Generation

no code implementations · 27 Oct 2024 · Zihan Wang, Xuri Ge, Joemon M. Jose, HaiTao Yu, Weizhi Ma, Zhaochun Ren, Xin Xin

At the end of the workshop, we aim to have a clearer understanding of how to improve the reliability and applicability of RAG with more robust information retrieval and language generation.

Information Retrieval · Language Modelling +3

HpEIS: Learning Hand Pose Embeddings for Multimedia Interactive Systems

no code implementations · 11 Oct 2024 · Songpei Xu, Xuri Ge, Chaitanya Kaul, Roderick Murray-Smith

We present a novel Hand-pose Embedding Interactive System (HpEIS) as a virtual sensor, which maps users' flexible hand poses to a two-dimensional visual space using a Variational Autoencoder (VAE) trained on a variety of hand poses.

Data Augmentation
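
As a rough sketch of the mechanism described above, the following PyTorch snippet shows a VAE whose 2D latent mean can serve as a pose's position in an interactive visual space. It is an illustration under assumed settings (a 63-dimensional pose input from 21 hand joints × 3 coordinates, and the layer sizes); it is not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseVAE(nn.Module):
    """Maps a hand-pose feature vector to a 2D latent space (assumed sizes)."""
    def __init__(self, pose_dim: int = 63, latent_dim: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(pose_dim, 128), nn.ReLU())
        self.fc_mu = nn.Linear(128, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(128, latent_dim)  # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, pose_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error plus KL divergence to the standard normal prior.
    rec = F.mse_loss(recon, x, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# At interaction time, the latent mean of a new pose gives its 2D coordinates.
```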

Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning

no code implementations · 1 Aug 2024 · Xuri Ge, Junchen Fu, Fuhai Chen, Shan An, Nicu Sebe, Joemon M. Jose

Facial action units (AUs), as defined in the Facial Action Coding System (FACS), have received significant research interest owing to their diverse range of applications in facial state analysis.

Facial Action Unit Detection · Representation Learning

Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching

no code implementations · 5 Jun 2024 · Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Jie Wang, Joemon M. Jose

In this paper, we propose a Hybrid-modal Interaction with multiple Relational Enhancements (termed Hire) for image-text matching, which correlates the intra- and inter-modal semantics between objects and words with implicit and explicit relationship modelling.

cross-modal alignment · Image-text matching +2

Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

no code implementations · 26 May 2024 · Tong Shi, Xuri Ge, Joemon M. Jose, Nicolas Pugeault, Paul Henderson

Capturing complex temporal relationships between video and audio modalities is vital for Audio-Visual Emotion Recognition (AVER).

Emotion Recognition · Optical Flow Estimation

3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting

1 code implementation · 26 Apr 2024 · Xuri Ge, Songpei Xu, Fuhai Chen, Jie Wang, Guoxin Wang, Shan An, Joemon M. Jose

In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image-sentence retrieval.

Cross-Modal Retrieval · Retrieval +2

CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora

no code implementations · 23 Feb 2024 · Zijun Long, Xuri Ge, Richard McCreadie, Joemon Jose

Text-to-image retrieval aims to find the relevant images based on a text query, which is important in various use-cases, such as digital libraries, e-commerce, and multimedia databases.

Computational Efficiency · Image Retrieval +2

The Relationship Between Speech Features Changes When You Get Depressed: Feature Correlations for Improving Speed and Performance of Depression Detection

no code implementations · 6 Jul 2023 · Fuxiang Tao, Wei Ma, Xuri Ge, Anna Esposito, Alessandro Vinciarelli

The results show that the models used in the experiments improve in terms of training speed and performance when fed with feature correlation matrices rather than with feature vectors.

Depression Detection · Feature Correlation
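
The paper's core input transformation is simple to illustrate. The sketch below (assumptions only: 40-dimensional frame-level features and NumPy's Pearson correlation; not the authors' pipeline) turns a recording's sequence of per-frame speech feature vectors into the single feature-correlation matrix that replaces raw feature vectors as model input.

```python
import numpy as np

def correlation_matrix(frames: np.ndarray) -> np.ndarray:
    """frames: (num_frames, num_features) per-frame speech features
    (e.g., MFCCs). Returns the (num_features, num_features) Pearson
    correlation matrix computed across frames."""
    return np.corrcoef(frames, rowvar=False)

rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 40))  # hypothetical 40-dim features, 500 frames
C = correlation_matrix(frames)       # model input: one matrix per recording
print(C.shape)                       # (40, 40)
```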

Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval

no code implementations · 17 Oct 2022 · Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Joemon M. Jose

To correlate the context of objects with the textual context, we further refine the visual semantic representation via cross-level object-sentence and word-image interactive attention.

cross-modal alignment · Object +3
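
A minimal sketch of the kind of interactive attention described above: word features attend over image-region features so that textual context refines the visual semantic representation. The dimensions, the single-head form, and all names here are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def cross_attention(queries, keys, values):
    """queries: (num_words, d); keys/values: (num_regions, d).
    Returns one attended visual vector per word."""
    d = queries.size(-1)
    scores = queries @ keys.T / d ** 0.5   # (num_words, num_regions)
    weights = F.softmax(scores, dim=-1)    # attention over image regions
    return weights @ values                # (num_words, d)

words = torch.randn(12, 256)    # hypothetical word embeddings
regions = torch.randn(36, 256)  # hypothetical region (object) features
attended = cross_attention(words, regions, regions)
print(attended.shape)           # torch.Size([12, 256])
```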

MGRR-Net: Multi-level Graph Relational Reasoning Network for Facial Action Units Detection

no code implementations · 4 Apr 2022 · Xuri Ge, Joemon M. Jose, Songpei Xu, Xiao Liu, Hu Han

While region-level feature learning from local face-patch features via a graph neural network can encode the correlations across different AUs, pixel-wise and channel-wise feature learning via a graph attention network can enhance the discrimination ability of AU features derived from global face features.

Graph Attention · Graph Neural Network +1
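
As a toy illustration of the graph-attention mechanism the abstract refers to (a single-head sketch with assumed sizes, not MGRR-Net itself), each AU node below re-weights messages from its neighbours so that correlated AUs can share evidence.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention over AU node features (toy sizes)."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)
        self.attn = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, nodes, adj):
        # nodes: (N, dim) AU features; adj: (N, N) 0/1 adjacency matrix.
        h = self.proj(nodes)
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1),
             h.unsqueeze(0).expand(n, n, -1)], dim=-1)  # all node pairs
        e = F.leaky_relu(self.attn(pairs)).squeeze(-1)  # (N, N) edge scores
        e = e.masked_fill(adj == 0, float("-inf"))      # keep graph edges only
        return F.softmax(e, dim=-1) @ h                 # attention-weighted messages

layer = GraphAttentionLayer(64)
au_feats = torch.randn(12, 64)     # 12 hypothetical AU nodes
adj = torch.ones(12, 12)           # fully connected toy AU graph
print(layer(au_feats, adj).shape)  # torch.Size([12, 64])
```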

Automatic Facial Paralysis Estimation with Facial Action Units

no code implementations · 3 Mar 2022 · Xuri Ge, Joemon M. Jose, Pengcheng Wang, Arunachalam Iyer, Xiao Liu, Hu Han

In this paper, we propose a novel Adaptive Local-Global Relational Network (ALGRNet) for facial AU detection and use it to classify facial paralysis severity.

Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval

no code implementations · 5 Aug 2021 · Xuri Ge, Fuhai Chen, Joemon M. Jose, Zhilong Ji, Zhongqin Wu, Xiao Liu

In this work, we propose to address the above issue from two aspects: (i) constructing intrinsic structure (along with relations) among the fragments of respective modalities, e.g., "dog → play → ball" in the semantic structure for an image, and (ii) seeking explicit inter-modal structural and semantic correspondence between the visual and textual modalities.

cross-modal alignment · Retrieval +3
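
Aspect (i) is easy to picture with a toy example. The snippet below (an assumed representation, not the paper's code) stores the "dog → play → ball" structure as a small directed graph over fragment embeddings, along which one message-passing step propagates relational context.

```python
import torch

fragments = ["dog", "play", "ball"]           # intra-modal fragments
relations = [("dog", "play"), ("play", "ball")]

index = {name: i for i, name in enumerate(fragments)}
adj = torch.zeros(len(fragments), len(fragments))
for head, tail in relations:
    adj[index[head], index[tail]] = 1.0       # directed relation edge

embeddings = torch.randn(len(fragments), 128)  # hypothetical fragment features
# One message-passing step propagates relational context along the structure.
context = adj @ embeddings                     # (3, 128)
```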

Variational Structured Semantic Inference for Diverse Image Captioning

no code implementations · NeurIPS 2019 · Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang, Yan Wang

To model these two inherent diversities in image captioning, we propose a Variational Structured Semantic Inferring model (termed VSSI-cap) executed in a novel structured encoder-inferer-decoder schema.

Decoder · Diversity +1
