Search Results for author: Siyang Song

Found 34 papers, 23 papers with code

CauSkelNet: Causal Representation Learning for Human Behaviour Analysis

no code implementations23 Sep 2024 Xingrui Gu, Chuyi Jiang, Erte Wang, Zekun Wu, Qiang Cui, Leimin Tian, Lianlong Wu, Siyang Song, Chuang Yu

Constrained by the lack of model interpretability and a deep understanding of human movement in traditional movement recognition machine learning methods, this study introduces a novel representation learning method based on causal inference to better understand human joint dynamics and complex behaviors.

Causal Inference Representation Learning

Robust Facial Reactions Generation: An Emotion-Aware Framework with Modality Compensation

no code implementations22 Jul 2024 Guanyu Hu, Jie Wei, Siyang Song, Dimitrios Kollias, Xinyu Yang, Zhonglin Sun, Odysseus Kaloidas

In scenarios where speech modality data is missing, the performance of appropriate generation shows an improvement, and when facial data is missing, it only exhibits minimal degradation.

Graph in Graph Neural Network

1 code implementation30 Jun 2024 Jiongshu Wang, Jing Yang, Jiankang Deng, Hatice Gunes, Siyang Song

Given a set of graphs or a data sample whose components can be represented by a set of graphs (called multi-graph data sample), our GIG network starts with a GIG sample generation (GSG) module which encodes the input as a \textbf{GIG sample}, where each GIG vertex includes a graph.

Action Recognition Graph Neural Network

LOGCAN++: Adaptive Local-global class-aware network for semantic segmentation of remote sensing imagery

1 code implementation24 Jun 2024 Xiaowen Ma, Rongrong Lian, Zhenkai Wu, Hongbo Guo, Mengting Ma, Sensen Wu, Zhenhong Du, Siyang Song, Wei zhang

In particular, we introduce affine transformations in the LCA module for adaptive extraction of local class representations to effectively tolerate scale and orientation variations in remotely sensed images.

Image Segmentation Segmentation +2

Rethinking Remote Sensing Change Detection With A Mask View

1 code implementation21 Jun 2024 Xiaowen Ma, Zhenkai Wu, Rongrong Lian, Wei zhang, Siyang Song

Consequently, we further propose the instance network CDMaskFormer customized for the change detection task, which includes: (i) a Spatial-temporal convolutional attention-based instantiated change extractor to capture spatio-temporal context simultaneously with lightweight operations; and (ii) a scene-guided axial attention-instantiated transformer decoder to extract more spatial details.

Change Detection Decoder

Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition

1 code implementation CVPR 2024 Zihan Wang, Siyang Song, Cheng Luo, Songhe Deng, Weicheng Xie, Linlin Shen

Human facial action units (AUs) are mutually related in a hierarchical manner, as not only they are associated with each other in both spatial and temporal domains but also AUs located in the same/close facial regions show stronger relationships than those of different facial regions.

REACT 2024: the Second Multiple Appropriate Facial Reaction Generation Challenge

1 code implementation10 Jan 2024 Siyang Song, Micol Spitale, Cheng Luo, Cristina Palmero, German Barquero, Hengde Zhu, Sergio Escalera, Michel Valstar, Tobias Baur, Fabien Ringeval, Elisabeth Andre, Hatice Gunes

In dyadic interactions, humans communicate their intentions and state of mind using verbal and non-verbal cues, where multiple different facial reactions might be appropriate in response to a specific speaker behaviour.

Domain Separation Graph Neural Networks for Saliency Object Ranking

1 code implementation CVPR 2024 Zijian Wu, Jun Lu, Jing Han, Lianfa Bai, Yi Zhang, Zhuang Zhao, Siyang Song

Then we propose a Shape-Texture Graph Domain Separation (STGDS) module to separate the task-relevant and irrelevant information of target objects by explicitly modelling the relationship between each pair of objects in terms of their shapes and textures respectively.

Graph Neural Network

Advancing Saliency Ranking with Human Fixations: Dataset Models and Benchmarks

1 code implementation CVPR 2024 Bowen Deng, Siyang Song, Andrew P. French, Denis Schluppeck, Michael P. Pound

Saliency ranking detection (SRD) has emerged as a challenging task in computer vision aiming not only to identify salient objects within images but also to rank them based on their degree of saliency.

Saliency Ranking

Multi-level graph learning for audio event classification and human-perceived annoyance rating prediction

1 code implementation15 Dec 2023 Yuanbo Hou, Qiaoqiao Ren, Siyang Song, Yuxin Song, Wenwu Wang, Dick Botteldooren

Specifically, this paper proposes a lightweight multi-level graph learning (MLGL) based on local and global semantic graphs to simultaneously perform audio event classification (AEC) and human annoyance rating prediction (ARP).

Graph Learning

MRecGen: Multimodal Appropriate Reaction Generator

no code implementations5 Jul 2023 Jiaqi Xu, Cheng Luo, Weicheng Xie, Linlin Shen, Xiaofeng Liu, Lu Liu, Hatice Gunes, Siyang Song

Verbal and non-verbal human reaction generation is a challenging task, as different reactions could be appropriate for responding to the same behaviour.

REACT2023: the first Multi-modal Multiple Appropriate Facial Reaction Generation Challenge

1 code implementation11 Jun 2023 Siyang Song, Micol Spitale, Cheng Luo, German Barquero, Cristina Palmero, Sergio Escalera, Michel Valstar, Tobias Baur, Fabien Ringeval, Elisabeth Andre, Hatice Gunes

The Multi-modal Multiple Appropriate Facial Reaction Generation Challenge (REACT2023) is the first competition event focused on evaluating multimedia processing and machine learning techniques for generating human-appropriate facial reactions in various dyadic interaction scenarios, with all participants competing strictly under the same conditions.

ReactFace: Multiple Appropriate Facial Reaction Generation in Dyadic Interactions

1 code implementation25 May 2023 Cheng Luo, Siyang Song, Weicheng Xie, Micol Spitale, Linlin Shen, Hatice Gunes

ReactFace generates multiple different but appropriate photo-realistic human facial reactions by (i) learning an appropriate facial reaction distribution representing multiple appropriate facial reactions; and (ii) synchronizing the generated facial reactions with the speaker's verbal and non-verbal behaviours at each time stamp, resulting in realistic 2D facial reaction sequences.

Reversible Graph Neural Network-based Reaction Distribution Learning for Multiple Appropriate Facial Reactions Generation

1 code implementation24 May 2023 Tong Xu, Micol Spitale, Hao Tang, Lu Liu, Hatice Gunes, Siyang Song

This means that we approach this problem by considering the generation of a distribution of the listener's appropriate facial reactions instead of multiple different appropriate facial reactions, i. e., 'many' appropriate facial reaction labels are summarised as 'one' distribution label during training.

Graph Neural Network

Spatio-Temporal AU Relational Graph Representation Learning For Facial Action Units Detection

1 code implementation19 Mar 2023 Zihan Wang, Siyang Song, Cheng Luo, Yuzhi Zhou, shiling Wu, Weicheng Xie, Linlin Shen

This paper presents our Facial Action Units (AUs) detection submission to the fifth Affective Behavior Analysis in-the-wild Competition (ABAW).

Graph Learning Graph Representation Learning

Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How?

1 code implementation13 Feb 2023 Siyang Song, Micol Spitale, Yiming Luo, Batuhan Bal, Hatice Gunes

However, none attempted to automatically generate multiple appropriate reactions in the context of dyadic interactions and evaluate the appropriateness of those reactions using objective measures.

Shift from Texture-bias to Shape-bias: Edge Deformation-based Augmentation for Robust Object Recognition

no code implementations ICCV 2023 Xilin He, Qinliang Lin, Cheng Luo, Weicheng Xie, Siyang Song, Feng Liu, Linlin Shen

Recent studies have shown the vulnerability of CNNs under perturbation noises, which is partially caused by the reason that the well-trained CNNs are too biased toward the object texture, i. e., they make predictions mainly based on texture cues.

Object Recognition

Fourier-Net: Fast Image Registration with Band-limited Deformation

1 code implementation29 Nov 2022 Xi Jia, Joseph Bartlett, Wei Chen, Siyang Song, Tianyang Zhang, Xinxing Cheng, Wenqi Lu, Zhaowen Qiu, Jinming Duan

Specifically, instead of our Fourier-Net learning to output a full-resolution displacement field in the spatial domain, we learn its low-dimensional representation in a band-limited Fourier domain.

Ranked #3 on Medical Image Registration on OASIS (val dsc metric)

Decoder Medical Image Registration +1

GRATIS: Deep Learning Graph Representation with Task-specific Topology and Multi-dimensional Edge Features

1 code implementation19 Nov 2022 Siyang Song, Yuxin Song, Cheng Luo, Zhiyuan Song, Selim Kuzucu, Xi Jia, Zhijiang Guo, Weicheng Xie, Linlin Shen, Hatice Gunes

Our framework is effective, robust and flexible, and is a plug-and-play module that can be combined with different backbones and Graph Neural Networks (GNNs) to generate a task-specific graph representation from various graph and non-graph data.

Graph Representation Learning

Multi-dimensional Edge-based Audio Event Relational Graph Representation Learning for Acoustic Scene Classification

1 code implementation27 Oct 2022 Yuanbo Hou, Siyang Song, Chuang Yu, Yuxin Song, Wenwu Wang, Dick Botteldooren

Experiments on a polyphonic acoustic scene dataset show that the proposed ERGL achieves competitive performance on ASC by using only a limited number of embeddings of audio events without any data augmentations.

Acoustic Scene Classification Graph Representation Learning +1

An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition

1 code implementation17 Oct 2022 Rongfan Liao, Siyang Song, Hatice Gunes

Personality determines a wide variety of human daily and working behaviours, and is crucial for understanding human internal and external states.

Benchmarking

Scale-free and Task-agnostic Attack: Generating Photo-realistic Adversarial Patterns with Patch Quilting Generator

no code implementations12 Aug 2022 Xiangbo Gao, Cheng Luo, Qinliang Lin, Weicheng Xie, Minmin Liu, Linlin Shen, Keerthy Kusumam, Siyang Song

\noindent Traditional L_p norm-restricted image attack algorithms suffer from poor transferability to black box scenarios and poor robustness to defense algorithms.

Adversarial Attack Image Classification +4

Learning Multi-dimensional Edge Feature-based AU Relation Graph for Facial Action Unit Recognition

2 code implementations2 May 2022 Cheng Luo, Siyang Song, Weicheng Xie, Linlin Shen, Hatice Gunes

While the relationship between a pair of AUs can be complex and unique, existing approaches fail to specifically and explicitly represent such cues for each pair of AUs in each facial display.

Facial Action Unit Detection Relation

Two-stage Temporal Modelling Framework for Video-based Depression Recognition using Graph Representation

no code implementations30 Nov 2021 Jiaqi Xu, Siyang Song, Keerthy Kusumam, Hatice Gunes, Michel Valstar

The short-term depressive behaviour modelling stage first deep learns depression-related facial behavioural features from multiple short temporal scales, where a Depression Feature Enhancement (DFE) module is proposed to enhance the depression-related clues for all temporal scales and remove non-depression noises.

Inferring User Facial Affect in Work-like Settings

no code implementations22 Nov 2021 Chaudhary Muhammad Aqdus Ilyas, Siyang Song, Hatice Gunes

Unlike the six basic emotions of happiness, sadness, fear, anger, disgust and surprise, modelling and predicting dimensional affect in terms of valence (positivity - negativity) and arousal (intensity) has proven to be more flexible, applicable and useful for naturalistic and real-world settings.

Learning Graph Representation of Person-specific Cognitive Processes from Audio-visual Behaviours for Automatic Personality Recognition

no code implementations26 Oct 2021 Siyang Song, Zilong Shao, Shashank Jaiswal, Linlin Shen, Michel Valstar, Hatice Gunes

This approach builds on two following findings in cognitive science: (i) human cognition partially determines expressed behaviour and is directly linked to true personality traits; and (ii) in dyadic interactions individuals' nonverbal behaviours are influenced by their conversational partner behaviours.

Neural Architecture Search

EMOPAIN Challenge 2020: Multimodal Pain Evaluation from Facial and Bodily Expressions

no code implementations21 Jan 2020 Joy O. Egede, Siyang Song, Temitayo A. Olugbade, Chongyang Wang, Amanda Williams, Hongy-ing Meng, Min Aung, Nicholas D. Lane, Michel Valstar, Nadia Bianchi-Berthouze

The EmoPain 2020 Challenge is the first international competition aimed at creating a uniform platform for the comparison of machine learning and multimedia processing methods of automatic chronic pain assessment from human expressive behaviour, and also the identification of pain-related behaviours.

Emotion Recognition

AVEC 2019 Workshop and Challenge: State-of-Mind, Detecting Depression with AI, and Cross-Cultural Affect Recognition

no code implementations10 Jul 2019 Fabien Ringeval, Björn Schuller, Michel Valstar, NIcholas Cummins, Roddy Cowie, Leili Tavabi, Maximilian Schmitt, Sina Alisamir, Shahin Amiriparian, Eva-Maria Messner, Siyang Song, Shuo Liu, Ziping Zhao, Adria Mallol-Ragolta, Zhao Ren, Mohammad Soleymani, Maja Pantic

The Audio/Visual Emotion Challenge and Workshop (AVEC 2019) "State-of-Mind, Detecting Depression with AI, and Cross-cultural Affect Recognition" is the ninth competition event aimed at the comparison of multimedia processing and machine learning methods for automatic audiovisual health and emotion analysis, with all participants competing strictly under the same conditions.

Emotion Recognition

Inferring Dynamic Representations of Facial Actions from a Still Image

no code implementations4 Apr 2019 Siyang Song, Enrique Sánchez-Lozano, Linlin Shen, Alan Johnston, Michel Valstar

We present a novel approach to capture multiple scales of such temporal dynamics, with an application to facial Action Unit (AU) intensity estimation and dimensional affect estimation.

Cannot find the paper you are looking for? You can Submit a new open access paper.