Recently, supervised methods, which often require substantial amounts of class labels, have achieved promising results for EEG representation learning.
Recent works have shown that videos or multi-view images carry rich information regarding the hand, allowing for the development of more robust HPE systems.
To develop a system capable of classifying running styles using wearables, we collect a dataset from 10 healthy runners performing 8 different pre-defined running styles.
Experiments are performed on the Oulu-CASIA dataset and the performance is compared to other works in FER.
Classification of human emotions can play an essential role in the design and improvement of human-machine systems.
We validate the performance of our proposed architecture in the context of two multi-perspective visual recognition tasks, namely lip reading and face recognition.
Then, once the teacher network has learned the discriminative features embedded in the capsules, we adopt a lightweight model (student network) that mimics the teacher using the privileged knowledge.
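A minimal sketch of this teacher-student mimicking idea, assuming simple fully connected stand-ins for the two networks (the actual capsule architectures and distillation losses may differ):

```python
import torch
import torch.nn as nn

# Illustrative feature distillation: layer sizes are assumptions, not the
# paper's actual teacher/student capsule networks.
teacher = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
student = nn.Sequential(nn.Linear(128, 32))  # lightweight student

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
mse = nn.MSELoss()

x = torch.randn(16, 128)            # a batch of input features
with torch.no_grad():
    target = teacher(x)             # "privileged" teacher representations
loss = mse(student(x), target)      # student mimics the teacher's features
loss.backward()
optimizer.step()
```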
A novel object detection method is presented that handles freely rotated objects of arbitrary sizes, including tiny objects as small as $2\times 2$ pixels.
A new method is proposed for human motion prediction by learning temporal and spatial dependencies in an end-to-end deep neural network.
Moreover, face recognition experiments demonstrate that our hallucinated depth along with the input RGB images boosts performance across various architectures when compared to a single RGB modality, by average values of +1.2%, +2.6%, and +2.6% for the IIIT-D, EURECOM, and LFW datasets, respectively.
We propose a novel keypoint voting scheme based on intersecting spheres that is more accurate than existing schemes and allows for a smaller set of more dispersed keypoints.
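A toy sketch of the sphere-intersection idea, assuming synthetic seed points with noisily predicted radii and a hypothetical voxel accumulator (the scheme's actual parameterization is not reproduced here):

```python
import numpy as np

# Each surface point predicts its distance (radius) to the keypoint; all the
# resulting sphere shells intersect at the keypoint, so a vote accumulator
# peaks there. Grid resolution, noise level, and counts are illustrative.
rng = np.random.default_rng(0)
keypoint = np.array([0.3, -0.2, 0.5])
seeds = rng.uniform(-1, 1, size=(200, 3))              # visible surface points
radii = np.linalg.norm(seeds - keypoint, axis=1)       # predicted distances
radii += rng.normal(0, 0.01, size=radii.shape)         # prediction noise

res = 64
axis = np.linspace(-1, 1, res)
gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
grid = np.stack([gx, gy, gz], axis=-1)                 # (res, res, res, 3)
acc = np.zeros((res, res, res))
for s, r in zip(seeds, radii):
    d = np.linalg.norm(grid - s, axis=-1)
    acc += (np.abs(d - r) < 0.05)                      # thin spherical shell

est = grid[np.unravel_index(acc.argmax(), acc.shape)]
print("estimated keypoint:", est, "true:", keypoint)
```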
We then investigate the possibility of human behavior being altered as a result of the smart home and the human model adapting to one another.
A subset of the in-the-wild dataset contains facial images with different expressions, annotated for use in facial expression recognition tests.
Our novel attention mechanism directs the deep network "where to look" for visual features in the RGB image by focusing the attention of the network using depth features extracted by a Convolutional Neural Network (CNN).
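A minimal sketch of depth-guided spatial attention, assuming small illustrative channel counts: an attention map computed from depth features gates the RGB feature maps.

```python
import torch
import torch.nn as nn

# Depth-guided spatial attention sketch; channel counts and kernel sizes are
# assumptions, not the paper's exact configuration.
class DepthGuidedAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_cnn = nn.Conv2d(3, 32, 3, padding=1)    # RGB feature extractor
        self.depth_cnn = nn.Conv2d(1, 32, 3, padding=1)  # depth feature extractor
        self.attn = nn.Conv2d(32, 1, 1)                  # 1-channel attention map

    def forward(self, rgb, depth):
        f_rgb = torch.relu(self.rgb_cnn(rgb))
        f_depth = torch.relu(self.depth_cnn(depth))
        a = torch.sigmoid(self.attn(f_depth))           # "where to look"
        return f_rgb * a                                 # attend RGB features

model = DepthGuidedAttention()
out = model(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64))
print(out.shape)  # torch.Size([2, 32, 64, 64])
```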
Our DL models accurately detect the chronic stress exposure group (AUROC=0.982+/-0.002) and predict the individual psychological stress score (R2=0.943+/-0.009), the FSI at 34 weeks of gestation (R2=0.946+/-0.013), and the maternal hair cortisol at birth reflecting chronic stress exposure (0.931+/-0.006).
We propose the use of self-supervised learning for human activity recognition with smartphone accelerometer data.
In this context, we propose a novel deep network, learning to transfer multi-scale partial gait representations using capsules to obtain more discriminative gait features.
Our proposed model has been extensively tested on two large-scale gait datasets, CASIA-B and OU-MVLP, using four different test protocols, and has been compared to a number of state-of-the-art and baseline solutions.
Electrocardiogram (ECG) is the electrical measurement of cardiac activity, whereas Photoplethysmogram (PPG) is the optical measurement of volumetric changes in blood circulation.
Our model uses thin-ResNet for extracting speaker embeddings from utterances, and a Siamese capsule network with dynamic routing as the back-end to calculate a similarity score between the embeddings.
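As a simplified stand-in for the capsule back-end, pairwise scoring of speaker embeddings can be illustrated with plain cosine similarity (the 512-dimensional embedding size is an assumption):

```python
import torch
import torch.nn.functional as F

# Simplified verification back-end: cosine similarity between two utterance
# embeddings, standing in for the Siamese capsule scoring.
def verification_score(emb_a: torch.Tensor, emb_b: torch.Tensor) -> torch.Tensor:
    return F.cosine_similarity(emb_a, emb_b, dim=-1)

emb_a = torch.randn(512)   # thin-ResNet utterance embedding (assumed 512-d)
emb_b = torch.randn(512)
print(verification_score(emb_a, emb_b).item())  # higher score = same speaker
```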
The acquisition of massive data on parcel delivery motivates postal operators to foster the development of predictive systems to improve customer service.
We also evaluate FluentNet on this dataset, showing the strong performance of our model versus a number of benchmark techniques.
Speech representations extracted by deep learning models are being used in a wide range of tasks such as speech recognition, speaker recognition, and speech emotion recognition.
The results show the widespread applicability of stacked convolutional autoencoders combined with machine learning for affective computing.
This paper presents the novel Riemannian Fusion Network (RFNet), a deep neural architecture for learning spatial and temporal information from Electroencephalogram (EEG) for a number of different EEG-based Brain Computer Interface (BCI) tasks and applications.
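The Riemannian ingredients such architectures build on can be sketched briefly: spatial covariance matrices of EEG windows, mapped to a tangent space via the matrix logarithm (channel and sample counts below are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import logm

# Covariance-based Riemannian features for one EEG window.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 500))        # 32 channels x 500 samples
cov = eeg @ eeg.T / eeg.shape[1]            # SPD spatial covariance matrix
cov += 1e-6 * np.eye(32)                    # regularize for numerical stability
tangent = logm(cov)                         # log-map to the tangent space
features = tangent[np.triu_indices(32)]     # vectorized upper triangle
print(features.shape)                       # (528,)
```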
Smart devices in the Internet of Things (IoT) paradigm provide a variety of unobtrusive and pervasive means for continuous monitoring of biometrics and health information.
A novel attention aware method is proposed to fuse two image modalities, RGB and depth, for enhanced RGB-D facial recognition.
Six different signal transformations are applied to the ECG signals, and transformation recognition is performed as pretext tasks.
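A hedged sketch of such a transformation bank follows; the six transformations below are common choices for signal pretext tasks, and their exact parameterization in the paper may differ:

```python
import numpy as np

# Illustrative pretext task: apply one of six transformations to a signal
# and let the network predict which transformation was applied.
rng = np.random.default_rng(0)

def permute(x, n_seg=4):
    segs = np.array_split(x, n_seg)
    return np.concatenate([segs[i] for i in rng.permutation(n_seg)])

def time_warp(x, factor=1.5):
    idx = np.linspace(0, len(x) - 1, int(len(x) * factor))
    warped = np.interp(idx, np.arange(len(x)), x)
    return warped[: len(x)]                      # crop back to original length

transforms = [
    lambda x: x + rng.normal(0, 0.05, x.shape),  # 0: additive noise
    lambda x: 1.5 * x,                           # 1: scaling
    lambda x: -x,                                # 2: negation (vertical flip)
    lambda x: x[::-1],                           # 3: temporal inversion
    permute,                                     # 4: segment permutation
    time_warp,                                   # 5: time-warping
]

ecg = np.sin(np.linspace(0, 20 * np.pi, 1000))   # stand-in ECG segment
label = rng.integers(len(transforms))            # pretext label to predict
x_aug = transforms[label](ecg)
print(label, x_aug.shape)
```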
To enable the system to focus on the most salient parts of the learned multimodal representations, we propose an architecture composed of a capsule attention mechanism following a deep Long Short-Term Memory (LSTM) network.
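As a simplified stand-in for the capsule attention, plain additive attention pooled over the outputs of a deep LSTM illustrates the "focus on the most salient parts" idea (all sizes are assumptions):

```python
import torch
import torch.nn as nn

# Additive attention over LSTM outputs; a simplification of the capsule-based
# attention, shown only to illustrate temporal salience pooling.
class LSTMAttention(nn.Module):
    def __init__(self, in_dim=64, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, x):                        # x: (batch, time, in_dim)
        h, _ = self.lstm(x)                      # (batch, time, hidden)
        w = torch.softmax(self.score(h), dim=1)  # attention over time steps
        return (w * h).sum(dim=1)                # salient summary vector

model = LSTMAttention()
print(model(torch.randn(4, 50, 64)).shape)       # torch.Size([4, 128])
```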
Stuttering is a speech impediment affecting tens of millions of people on a daily basis.
Our proposed architecture consists of two main networks, a signal transformation recognition network and an emotion recognition network.
Recent advances in deep pose estimation models have proven to be effective in a wide range of applications such as health monitoring, sports, animations, and robotics.
Classifying limb movements using brain activity is an important task in Brain-Computer Interfaces (BCI) that has been successfully applied in multiple application domains, ranging from human-computer interaction to medical and biomedical applications.
Simulations are a pedagogical means of enabling a risk-free way for healthcare practitioners to learn, maintain, or enhance their knowledge and skills.
Optical marker-based motion capture is a vital tool in applications such as motion and behavioural analysis, animation, and biomechanics.
In this context, this paper proposes two novel LSTM cell architectures that are able to jointly learn from multiple simultaneously acquired sequences, aiming to create richer and more effective models for recognition tasks.
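One minimal way to realize joint learning from simultaneously acquired sequences is to let a single LSTM-style cell consume both streams at every step; the sketch below concatenates the inputs before the gate computations, a simplification of dedicated multi-sequence cell designs (dimensions are assumptions):

```python
import torch
import torch.nn as nn

# Joint LSTM-style cell: both synchronized input streams feed the same gates.
class JointLSTMCell(nn.Module):
    def __init__(self, in1, in2, hidden):
        super().__init__()
        self.cell = nn.LSTMCell(in1 + in2, hidden)

    def forward(self, x1, x2, state):
        return self.cell(torch.cat([x1, x2], dim=-1), state)

cell = JointLSTMCell(8, 8, 32)
h = torch.zeros(4, 32)
c = torch.zeros(4, 32)
for t in range(10):                              # two synchronized streams
    h, c = cell(torch.randn(4, 8), torch.randn(4, 8), (h, c))
print(h.shape)                                   # torch.Size([4, 32])
```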