1 code implementation • 27 Sep 2024 • Zhonglin Sun, Siyang Song, Ioannis Patras, Georgios Tzimiropoulos
Privacy is a major concern in the development of face recognition techniques.
no code implementations • 23 Sep 2024 • Xingrui Gu, Chuyi Jiang, Erte Wang, Zekun Wu, Qiang Cui, Leimin Tian, Lianlong Wu, Siyang Song, Chuang Yu
Traditional machine learning methods for movement recognition are limited by poor model interpretability and a shallow understanding of human movement; this study therefore introduces a novel representation learning method based on causal inference to better understand human joint dynamics and complex behaviors.
no code implementations • 22 Jul 2024 • Guanyu Hu, Jie Wei, Siyang Song, Dimitrios Kollias, Xinyu Yang, Zhonglin Sun, Odysseus Kaloidas
In scenarios where speech-modality data is missing, the performance of appropriate generation actually improves, and when facial data is missing, it exhibits only minimal degradation.
1 code implementation • 30 Jun 2024 • Jiongshu Wang, Jing Yang, Jiankang Deng, Hatice Gunes, Siyang Song
Given a set of graphs, or a data sample whose components can be represented by a set of graphs (called a multi-graph data sample), our GIG network starts with a GIG sample generation (GSG) module which encodes the input as a GIG sample, where each GIG vertex includes a graph.
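As a rough, hypothetical illustration of this graph-in-graph structure (the names below are ours, not the paper's API), a GIG sample can be modelled as an outer graph whose every vertex carries its own inner graph:

```python
# Minimal sketch of a graph-in-graph (GIG) sample: an outer graph
# whose vertices each contain an inner graph. Names are illustrative.
from dataclasses import dataclass

import numpy as np


@dataclass
class Graph:
    node_feats: np.ndarray              # (num_nodes, feat_dim)
    edges: list[tuple[int, int]]        # directed edge list over nodes


@dataclass
class GIGSample:
    vertex_graphs: list[Graph]          # one inner graph per GIG vertex
    outer_edges: list[tuple[int, int]]  # edges between GIG vertices


# A two-vertex GIG sample, each vertex holding a small inner graph.
g1 = Graph(np.random.randn(3, 16), [(0, 1), (1, 2)])
g2 = Graph(np.random.randn(4, 16), [(0, 2), (2, 3)])
sample = GIGSample(vertex_graphs=[g1, g2], outer_edges=[(0, 1)])
```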
1 code implementation • 24 Jun 2024 • Xiaowen Ma, Rongrong Lian, Zhenkai Wu, Hongbo Guo, Mengting Ma, Sensen Wu, Zhenhong Du, Siyang Song, Wei Zhang
In particular, we introduce affine transformations in the LCA module for adaptive extraction of local class representations to effectively tolerate scale and orientation variations in remotely sensed images.
1 code implementation • 21 Jun 2024 • Xiaowen Ma, Zhenkai Wu, Rongrong Lian, Wei Zhang, Siyang Song
Consequently, we further propose the instance network CDMaskFormer, customized for the change detection task, which includes: (i) a spatial-temporal convolutional attention-based instantiated change extractor to capture spatio-temporal context simultaneously with lightweight operations; and (ii) a scene-guided axial attention-instantiated transformer decoder to extract more spatial details.
Ranked #3 on Change Detection on SYSU-CD
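For context on the axial-attention component above, here is a generic sketch (not the CDMaskFormer implementation, and without its scene guidance): full 2D self-attention is factorised into one pass along the width axis and one along the height axis, which is what keeps the decoder lightweight.

```python
import torch
import torch.nn as nn


class AxialAttention(nn.Module):
    """Generic axial attention: attend along rows, then along columns."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, H, W, C)
        B, H, W, C = x.shape
        r = x.reshape(B * H, W, C)                        # each row is a sequence
        r, _ = self.row_attn(r, r, r)
        x = r.reshape(B, H, W, C)
        c = x.permute(0, 2, 1, 3).reshape(B * W, H, C)    # each column is a sequence
        c, _ = self.col_attn(c, c, c)
        return c.reshape(B, W, H, C).permute(0, 2, 1, 3)


feat = torch.randn(2, 32, 32, 64)
out = AxialAttention(dim=64)(feat)                        # same shape as input
```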
1 code implementation • CVPR 2024 • Zihan Wang, Siyang Song, Cheng Luo, Songhe Deng, Weicheng Xie, Linlin Shen
Human facial action units (AUs) are mutually related in a hierarchical manner: not only are they associated with each other in both the spatial and temporal domains, but AUs located in the same or close facial regions also show stronger relationships than those in different facial regions.
2 code implementations • 6 Feb 2024 • Qinliang Lin, Cheng Luo, Zenghao Niu, Xilin He, Weicheng Xie, Yuanbo Hou, Linlin Shen, Siyang Song
Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems.
1 code implementation • 10 Jan 2024 • Siyang Song, Micol Spitale, Cheng Luo, Cristina Palmero, German Barquero, Hengde Zhu, Sergio Escalera, Michel Valstar, Tobias Baur, Fabien Ringeval, Elisabeth Andre, Hatice Gunes
In dyadic interactions, humans communicate their intentions and state of mind using verbal and non-verbal cues, where multiple different facial reactions might be appropriate in response to a specific speaker behaviour.
1 code implementation • CVPR 2024 • Zijian Wu, Jun Lu, Jing Han, Lianfa Bai, Yi Zhang, Zhuang Zhao, Siyang Song
Then we propose a Shape-Texture Graph Domain Separation (STGDS) module to separate the task-relevant and irrelevant information of target objects by explicitly modelling the relationship between each pair of objects in terms of their shapes and textures respectively.
1 code implementation • CVPR 2024 • Bowen Deng, Siyang Song, Andrew P. French, Denis Schluppeck, Michael P. Pound
Saliency ranking detection (SRD) has emerged as a challenging task in computer vision aiming not only to identify salient objects within images but also to rank them based on their degree of saliency.
1 code implementation • 15 Dec 2023 • Yuanbo Hou, Qiaoqiao Ren, Siyang Song, Yuxin Song, Wenwu Wang, Dick Botteldooren
Specifically, this paper proposes a lightweight multi-level graph learning (MLGL) approach based on local and global semantic graphs to simultaneously perform audio event classification (AEC) and human annoyance rating prediction (ARP).
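The multi-task coupling can be pictured with a minimal sketch (our own simplification, not MLGL's graph layers): a shared embedding feeds both a multi-label event classifier (AEC) and a scalar annoyance regressor (ARP).

```python
import torch
import torch.nn as nn


class AecArpHeads(nn.Module):
    """Hypothetical joint head: shared features, two task outputs."""

    def __init__(self, embed_dim: int = 128, num_events: int = 25):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(embed_dim, 128), nn.ReLU())
        self.aec = nn.Linear(128, num_events)   # multi-label event logits
        self.arp = nn.Linear(128, 1)            # scalar annoyance rating

    def forward(self, z: torch.Tensor):
        h = self.shared(z)
        return self.aec(h), self.arp(h).squeeze(-1)


model = AecArpHeads()
logits, rating = model(torch.randn(4, 128))
loss = (nn.functional.binary_cross_entropy_with_logits(logits, torch.rand(4, 25).round())
        + nn.functional.mse_loss(rating, torch.rand(4)))
```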
1 code implementation • 5 Oct 2023 • Yuanbo Hou, Siyang Song, Chuang Yu, Wenwu Wang, Dick Botteldooren
The results show the feasibility of recognizing diverse acoustic scenes based on the audio event-relational graph.
Acoustic Scene Classification • Graph Representation Learning • +1
1 code implementation • 23 Aug 2023 • Yuanbo Hou, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren
Sound events in daily life carry rich information about the objective world.
no code implementations • 5 Jul 2023 • Jiaqi Xu, Cheng Luo, Weicheng Xie, Linlin Shen, Xiaofeng Liu, Lu Liu, Hatice Gunes, Siyang Song
Verbal and non-verbal human reaction generation is a challenging task, as different reactions could be appropriate for responding to the same behaviour.
1 code implementation • 11 Jun 2023 • Siyang Song, Micol Spitale, Cheng Luo, German Barquero, Cristina Palmero, Sergio Escalera, Michel Valstar, Tobias Baur, Fabien Ringeval, Elisabeth Andre, Hatice Gunes
The Multi-modal Multiple Appropriate Facial Reaction Generation Challenge (REACT2023) is the first competition event focused on evaluating multimedia processing and machine learning techniques for generating human-appropriate facial reactions in various dyadic interaction scenarios, with all participants competing strictly under the same conditions.
1 code implementation • 25 May 2023 • Cheng Luo, Siyang Song, Weicheng Xie, Micol Spitale, Linlin Shen, Hatice Gunes
ReactFace generates multiple different but appropriate photo-realistic human facial reactions by (i) learning an appropriate facial reaction distribution representing multiple appropriate facial reactions; and (ii) synchronizing the generated facial reactions with the speaker's verbal and non-verbal behaviours at each time stamp, resulting in realistic 2D facial reaction sequences.
1 code implementation • 24 May 2023 • Tong Xu, Micol Spitale, Hao Tang, Lu Liu, Hatice Gunes, Siyang Song
This means that we approach the problem by generating a distribution over the listener's appropriate facial reactions instead of multiple different appropriate facial reactions, i.e., 'many' appropriate facial reaction labels are summarised as 'one' distribution label during training.
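A minimal sketch of that 'many-to-one' summarisation, assuming each appropriate reaction is encoded as a fixed-length feature vector (the Gaussian choice here is ours, for illustration):

```python
import numpy as np


def summarise_reactions(reactions: np.ndarray):
    """reactions: (num_appropriate_reactions, feat_dim) -> one distribution label."""
    mu = reactions.mean(axis=0)
    sigma = reactions.std(axis=0) + 1e-6  # guard against zero variance
    return mu, sigma


# Five appropriate reactions collapse to a single (mu, sigma) training label;
# at inference, sampling yields multiple different but appropriate reactions.
labels = np.random.rand(5, 25)
mu, sigma = summarise_reactions(labels)
samples = np.random.normal(mu, sigma, size=(3, 25))
```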
1 code implementation • 19 Mar 2023 • Zihan Wang, Siyang Song, Cheng Luo, Yuzhi Zhou, Shiling Wu, Weicheng Xie, Linlin Shen
This paper presents our Facial Action Units (AUs) detection submission to the fifth Affective Behavior Analysis in-the-wild Competition (ABAW).
1 code implementation • 13 Feb 2023 • Siyang Song, Micol Spitale, Yiming Luo, Batuhan Bal, Hatice Gunes
However, none attempted to automatically generate multiple appropriate reactions in the context of dyadic interactions and evaluate the appropriateness of those reactions using objective measures.
no code implementations • ICCV 2023 • Xilin He, Qinliang Lin, Cheng Luo, Weicheng Xie, Siyang Song, Feng Liu, Linlin Shen
Recent studies have shown the vulnerability of CNNs to perturbation noises, partially because well-trained CNNs are overly biased toward object texture, i.e., they make predictions mainly based on texture cues.
1 code implementation • 29 Nov 2022 • Xi Jia, Joseph Bartlett, Wei Chen, Siyang Song, Tianyang Zhang, Xinxing Cheng, Wenqi Lu, Zhaowen Qiu, Jinming Duan
Specifically, instead of learning to output a full-resolution displacement field in the spatial domain, our Fourier-Net learns the field's low-dimensional representation in a band-limited Fourier domain.
Ranked #3 on Medical Image Registration on OASIS (val dsc metric)
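The band-limited idea can be sketched as follows (our own illustration, not the authors' code): predict only a small block of low-frequency Fourier coefficients, zero-pad it to the full spectrum, and invert to obtain a smooth full-resolution displacement field.

```python
import numpy as np


def band_limited_to_full(coeffs_low: np.ndarray, full_shape: tuple) -> np.ndarray:
    """Zero-pad centred low-frequency coefficients and inverse-transform."""
    H, W = full_shape
    h, w = coeffs_low.shape
    spectrum = np.zeros((H, W), dtype=complex)
    top, left = (H - h) // 2, (W - w) // 2
    spectrum[top:top + h, left:left + w] = coeffs_low   # low frequencies at centre
    # Move DC back to index (0, 0), then invert; rescale for the padding.
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real * (H * W) / (h * w)


low = np.random.randn(20, 24) + 1j * np.random.randn(20, 24)
disp_x = band_limited_to_full(low, (160, 192))  # one channel of the field
```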
1 code implementation • 19 Nov 2022 • Siyang Song, Yuxin Song, Cheng Luo, Zhiyuan Song, Selim Kuzucu, Xi Jia, Zhijiang Guo, Weicheng Xie, Linlin Shen, Hatice Gunes
Our framework is effective, robust and flexible, and is a plug-and-play module that can be combined with different backbones and Graph Neural Networks (GNNs) to generate a task-specific graph representation from various graph and non-graph data.
1 code implementation • 27 Oct 2022 • Yuanbo Hou, Siyang Song, Chuang Yu, Yuxin Song, Wenwu Wang, Dick Botteldooren
Experiments on a polyphonic acoustic scene dataset show that the proposed ERGL achieves competitive performance on ASC using only a limited number of audio event embeddings, without any data augmentation.
Acoustic Scene Classification • Graph Representation Learning • +1
1 code implementation • 17 Oct 2022 • Rongfan Liao, Siyang Song, Hatice Gunes
Personality determines a wide variety of human daily and working behaviours, and is crucial for understanding human internal and external states.
no code implementations • 12 Aug 2022 • Xiangbo Gao, Cheng Luo, Qinliang Lin, Weicheng Xie, Minmin Liu, Linlin Shen, Keerthy Kusumam, Siyang Song
Traditional L_p norm-restricted image attack algorithms suffer from poor transferability to black-box scenarios and poor robustness to defense algorithms.
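For reference, a minimal sketch of the classic L_inf norm-restricted attack (FGSM) that this line of work moves beyond; this is a generic textbook baseline, not the paper's method:

```python
import torch
import torch.nn.functional as F


def fgsm(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor,
         eps: float = 8 / 255) -> torch.Tensor:
    """One-step attack: the perturbation stays inside an L_inf ball of radius eps."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    x_adv = x + eps * x.grad.sign()    # max-norm step along the loss gradient
    return x_adv.clamp(0, 1).detach()  # keep pixels in the valid image range
```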
2 code implementations • 2 May 2022 • Cheng Luo, Siyang Song, Weicheng Xie, Linlin Shen, Hatice Gunes
While the relationship between a pair of AUs can be complex and unique, existing approaches fail to specifically and explicitly represent such cues for each pair of AUs in each facial display.
Ranked #5 on Facial Action Unit Detection on DISFA
no code implementations • 30 Nov 2021 • Jiaqi Xu, Siyang Song, Keerthy Kusumam, Hatice Gunes, Michel Valstar
The short-term depressive behaviour modelling stage first deep-learns depression-related facial behavioural features at multiple short temporal scales, where a proposed Depression Feature Enhancement (DFE) module enhances the depression-related cues at all temporal scales and removes non-depression noise.
no code implementations • 22 Nov 2021 • Chaudhary Muhammad Aqdus Ilyas, Siyang Song, Hatice Gunes
Unlike the six basic emotions of happiness, sadness, fear, anger, disgust and surprise, modelling and predicting dimensional affect in terms of valence (positivity - negativity) and arousal (intensity) has proven to be more flexible, applicable and useful for naturalistic and real-world settings.
no code implementations • 26 Oct 2021 • Siyang Song, Zilong Shao, Shashank Jaiswal, Linlin Shen, Michel Valstar, Hatice Gunes
This approach builds on the following two findings in cognitive science: (i) human cognition partially determines expressed behaviour and is directly linked to true personality traits; and (ii) in dyadic interactions, individuals' nonverbal behaviours are influenced by their conversational partner's behaviours.
no code implementations • 21 Jan 2020 • Joy O. Egede, Siyang Song, Temitayo A. Olugbade, Chongyang Wang, Amanda Williams, Hongying Meng, Min Aung, Nicholas D. Lane, Michel Valstar, Nadia Bianchi-Berthouze
The EmoPain 2020 Challenge is the first international competition aimed at creating a uniform platform for the comparison of machine learning and multimedia processing methods of automatic chronic pain assessment from human expressive behaviour, and also the identification of pain-related behaviours.
no code implementations • 10 Jul 2019 • Fabien Ringeval, Björn Schuller, Michel Valstar, Nicholas Cummins, Roddy Cowie, Leili Tavabi, Maximilian Schmitt, Sina Alisamir, Shahin Amiriparian, Eva-Maria Messner, Siyang Song, Shuo Liu, Ziping Zhao, Adria Mallol-Ragolta, Zhao Ren, Mohammad Soleymani, Maja Pantic
The Audio/Visual Emotion Challenge and Workshop (AVEC 2019) "State-of-Mind, Detecting Depression with AI, and Cross-cultural Affect Recognition" is the ninth competition event aimed at the comparison of multimedia processing and machine learning methods for automatic audiovisual health and emotion analysis, with all participants competing strictly under the same conditions.
no code implementations • 4 Apr 2019 • Siyang Song, Enrique Sánchez-Lozano, Linlin Shen, Alan Johnston, Michel Valstar
We present a novel approach to capture multiple scales of such temporal dynamics, with an application to facial Action Unit (AU) intensity estimation and dimensional affect estimation.
1 code implementation • 3 May 2018 • Siyang Song, Shuimei Zhang, Björn Schuller, Linlin Shen, Michel Valstar
The performance of speaker-related systems usually degrades heavily in practical applications largely due to the presence of background noise.