Jointly considering multiple camera views (multi-view) is very effective for pedestrian detection under occlusion.
Empathy is a social skill that indicates an individual's ability to understand others.
As such, enforcing a high similarity for positive pairs and a low similarity for negative pairs may not always be achievable, and for some pairs, doing so may even be detrimental to performance.
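The objective being criticised above can be made concrete with a minimal, hypothetical sketch of a margin-based contrastive loss on cosine similarities; the function names and the margin value are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(anchor, positive, negative, margin=0.5):
    """Margin-based contrastive objective: push the positive pair's
    similarity up and the negative pair's similarity down. Enforcing
    this for *every* pair is exactly the behaviour the text argues
    may not always be achievable or beneficial."""
    pos_sim = cosine_sim(anchor, positive)
    neg_sim = cosine_sim(anchor, negative)
    # Zero loss only when pos_sim exceeds neg_sim by at least `margin`.
    return max(0.0, margin - (pos_sim - neg_sim))

rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
positive = anchor + 0.1 * rng.normal(size=8)  # near-duplicate: easy positive
negative = rng.normal(size=8)                 # unrelated sample
loss = contrastive_loss(anchor, positive, negative)
```

Note that when anchor and "positive" are identical to the "negative" (a pair the loss should arguably not penalise), the loss is still the full margin, illustrating the failure mode the sentence describes.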
Because the digital twins individually mimic user bias, the resulting DT training set better reflects the characteristics of the target scenario and allows us to train more effective product detection and tracking models.
To this end, we design an automatic feedback mechanism that integrates Pavlok with a deep learning based model to detect certain behaviours, accessed through an integrated user interface, i.e., a mobile or desktop application.
The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture which effectively captures multimodal manipulations.
We consider a scenario where we have access to the target domain, but cannot afford on-the-fly training data annotation, and instead would like to construct an alternative training set from a large-scale data pool such that a competitive model can be obtained.
The former measures how suitable a training set is for a target domain, while the latter studies how challenging a test set is for a learned model.
Fine-tuning is widely applied in image classification tasks as a transfer learning approach.
We show that under this new taxonomy, many of the applications where transfer learning has been shown to be ineffective or even to hinder performance are to be expected, given the source and target datasets and the techniques used.
To this end, we present a review in the form of a taxonomy on existing works of skeleton-based action recognition.
This article aims to use graphics engines to simulate a large amount of training data that comes with free annotations and closely resembles real-world data.
To this end, we develop a protocol to automatically synthesize large-scale MiE training data, allowing us to train improved recognition models for real-world test data.
Varying the proportions of male and female faces in the training data can substantially affect behavior on the test data: we found that the seemingly obvious choice of 50:50 proportions was not the best for reducing biased behavior on female faces in this dataset, yielding a 71% unbiased rate compared with our top unbiased rate of 84%.
A model is either pre-trained or not pre-trained.
Such GNNs are incapable of learning relative positions between graph nodes within a graph.
The prevalent convolutional neural network (CNN) based image denoising methods extract features of images to restore the clean ground truth, achieving high denoising accuracy.
Recent skeleton-based action recognition methods extract features from 3D joint coordinates as spatial-temporal cues, using these representations in a graph neural network for feature fusion to boost recognition performance.
InvDN transforms the noisy input into a low-resolution clean image and a latent representation containing noise.
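The key property behind such an invertible split is that no information is discarded: the low-resolution image and the latent representation together reconstruct the input exactly. As a toy illustration (not the actual InvDN architecture), a 2x2 Haar transform performs an invertible downscaling that separates a half-resolution "clean-looking" band from detail bands:

```python
import numpy as np

def haar_split(img):
    """Invertibly split an (H, W) image with even H, W into a
    half-resolution low band (2x2 averages) and three detail bands.
    Toy analogue of invertible downscaling; not the real InvDN."""
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    low = (a + b + c + d) / 4.0   # half-resolution image
    dh  = (a - b + c - d) / 4.0   # horizontal detail
    dv  = (a + b - c - d) / 4.0   # vertical detail
    dd  = (a - b - c + d) / 4.0   # diagonal detail
    return low, (dh, dv, dd)

def haar_merge(low, details):
    """Exact inverse of haar_split: no information is lost."""
    dh, dv, dd = details
    a = low + dh + dv + dd
    b = low - dh + dv - dd
    c = low + dh - dv - dd
    d = low - dh - dv + dd
    out = np.empty((low.shape[0] * 2, low.shape[1] * 2))
    out[0::2, 0::2] = a
    out[0::2, 1::2] = b
    out[1::2, 0::2] = c
    out[1::2, 1::2] = d
    return out

x = np.arange(16.0).reshape(4, 4)
low, details = haar_split(x)
recon = haar_merge(low, details)
```

In the denoising setting, the idea is that noise concentrates in the high-frequency (latent/detail) part, so it can be discarded or resampled before inverting the transform.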
Instead of constraining the translation process by using a reference image, the users can command the model to retouch the generated images by involving the semantic information in the generation process.
The personal identity information in the original EEGs is transformed into a disguised form with a CycleGAN-based EEG disguising model.
To illustrate this, we first investigate the performance of our networks with supervised learning, then with unsupervised learning.
Smiles play a vital role in the understanding of social interactions within different communities, and reveal people's underlying state of mind in both genuine and deceptive ways.
Sentence compression is a Natural Language Processing (NLP) task aimed at shortening original sentences and preserving their key information.
This inspired our research, which explores the performance of two pixel-transformation models, Pix2Pix and CycleGAN, in frontal facial synthesis.
Identifying the information lossless condition for deep neural architectures is important, because tasks such as image restoration require preserving as much of the input's detailed information as possible.
For example, acted anger can be expressed when the stimulus does not genuinely provoke anger, with the aim of manipulating the observer.
Between synthetic and real data, there is a two-level domain gap, i.e., the content level and the appearance level.
We show that optimising the parameters of classification neural networks with softmax cross-entropy is equivalent to maximising the mutual information between inputs and labels under the balanced data assumption.
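A minimal numeric sketch of the quantity involved, under the balanced-data assumption stated above: with N equally likely classes, H(Y) = log N, and the softmax cross-entropy upper-bounds H(Y|X), so log N minus the cross-entropy is a lower bound on I(X;Y) that grows as the cross-entropy is minimised. The function names below are illustrative, not from the paper.

```python
import numpy as np

def softmax(z):
    """Numerically stable row-wise softmax."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy in nats."""
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels])))

def mi_lower_bound(logits, labels, num_classes):
    """Assuming balanced labels (H(Y) = log N) and that cross-entropy
    upper-bounds H(Y|X): I(X;Y) >= log N - CE. Minimising CE thus
    maximises this lower bound on the input-label mutual information."""
    return np.log(num_classes) - cross_entropy(logits, labels)

# Balanced toy batch: 2 classes, confident correct predictions.
logits = np.array([[4.0, 0.0], [0.0, 4.0], [5.0, 1.0], [1.0, 5.0]])
labels = np.array([0, 1, 0, 1])
bound = mi_lower_bound(logits, labels, num_classes=2)
```

With uninformative (all-zero) logits the cross-entropy equals log 2 and the bound collapses to zero, as expected when the predictions carry no information about the labels.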
We study the factors that influence the perception of group-level cohesion and propose methods for estimating the human-perceived cohesion on the group cohesiveness scale.
Wagner's modularity inducing problem domain is a key contribution to the study of the evolution of modularity, including both evolutionary theory and evolutionary computation.
This paper describes our approach, called EPUTION, for the open trial of the SemEval-2018 Task 2, Multilingual Emoji Prediction.