In addition, an adversarial learning strategy is used to train a discriminator that separates target-domain known classes from unknown ones.
Various historical languages, which once served as the lingua franca of science and the arts, deserve the attention of current NLP research.
We introduce Net2Brain, a graphical and command-line user interface toolbox for comparing the representational spaces of artificial deep neural networks (DNNs) and human brain recordings.
Therefore, a core challenge of the assessment process is to identify when experts from different disciplines talk about the same problem but use different terminologies.
We use iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment.
In this paper we present MusicVideos (MuVi), a novel dataset for affective multimedia content analysis to study how the auditory and visual modalities contribute to the perceived emotion of media.
Inspired by the notion of generative feature replay, we propose Feature Replay based Incremental Domain Adaptation (FRIDA), a novel framework built on a new incremental generative adversarial network, the domain-generic auxiliary classification GAN (DGAC-GAN), which seamlessly produces domain-specific feature representations.
The models that use all visual, audio, and text features simultaneously as their inputs performed better than those using features extracted from each modality separately.
In this work, we propose different variants of the self-attention based network for emotion prediction from movies, which we call AttendAffectNet.
In this paper, we tackle an open research question in transfer learning: given several pre-trained models, how to select the initialization that achieves high performance on a new task.
Relative to the attention-based (Attn) model, we discover that the connectionist temporal classification (CTC) model is more robust to noise and occlusion, and better at generalizing to different word lengths.
In this work, we present a novel method to learn a local cross-domain descriptor for 2D image and 3D point cloud matching.
When the training data is reduced to the train set alone, our method yields 309 confusions on the Multi-target speaker identification task, which is 46% better than the baseline model.
Interestingly, we also observe that optical flow is more informative than RGB in videos, and overall, models using audio features are more accurate than those based on video features when making the final prediction of evoked emotions.
Recently, researchers of natural intelligence have begun using those AI models to explore how the brain performs such tasks.
We next evaluate the relationship of RSA with the transfer learning performance on Taskonomy tasks and a new task: Pascal VOC semantic segmentation.
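Representational similarity analysis (RSA) can be sketched compactly; the following is a minimal illustration, not the paper's implementation, assuming Pearson-distance RDMs compared by Spearman rank correlation (the function names are my own):

```python
import numpy as np

def rdm(features):
    # features: (n_stimuli, n_dims); dissimilarity = 1 - Pearson correlation
    return 1.0 - np.corrcoef(features)

def rsa_score(feat_a, feat_b):
    # Spearman correlation between the upper triangles of the two RDMs
    iu = np.triu_indices(feat_a.shape[0], k=1)
    a, b = rdm(feat_a)[iu], rdm(feat_b)[iu]
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])
```

A higher `rsa_score` between a model's features and another representation (e.g., features tuned for a target task) would then be read as a proxy for transferability.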
Deep learning techniques have become the go-to models for most vision-related tasks on 2D images.
To the best of our knowledge, our method is the first data augmentation technique focused on improving performance in unsupervised anomaly detection.
Also, for all tested networks trained on targets in isolation, we find that recognition accuracy decreases the closer the flankers are to the target and the more flankers there are.
We show that the algorithm for extracting diverse M solutions from a Conditional Random Field (called divMbest) takes exactly the form of a Herding procedure, i.e., a deterministic dynamical system that produces a sequence of hypotheses respecting a set of observed moment constraints.
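The Herding procedure itself fits in a few lines. Below is a minimal sketch on a discrete state space with an explicit feature map (the names `herding`, `phi`, and `mu` are my own, not from the paper): at each step, pick the state maximizing ⟨w, φ(x)⟩, then update w ← w + μ − φ(x).

```python
import numpy as np

def herding(phi, mu, T):
    # phi: (n_states, n_features) feature map; mu: target moment vector
    w = mu.copy()
    samples = []
    for _ in range(T):
        x = int(np.argmax(phi @ w))   # greedy state maximizing <w, phi(x)>
        samples.append(x)
        w += mu - phi[x]              # herding weight update
    return samples
```

With an identity feature map, the empirical frequencies of the generated states converge to `mu` at an O(1/T) rate, which is the moment-matching property the excerpt refers to.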
To see this, we first report results on ImageNet that lead to a revision of the hypothesis that adversarial perturbations are a consequence of CNNs acting as linear classifiers: CNNs act locally linearly to changes in the image regions containing objects they recognize, while in other regions they may act non-linearly.
Within this processing pipeline, the common trend is to learn the feature coding templates, often referred to as codebook entries, filters, or an over-complete basis.
In a series of papers by Dai and colleagues [1, 2], a feature map (or kernel) was introduced for semi- and unsupervised learning.
We define a robust, fast-to-evaluate energy function based on enforcing color similarity between the boundaries and the superpixel color histogram.
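The excerpt does not give the exact energy, but the idea can be sketched under one assumption: the energy penalizes boundary pixels that are unlikely under their superpixel's color histogram. The helper names and the single-channel, pre-quantized color representation below are illustrative choices of mine, not the paper's formulation:

```python
import numpy as np

def boundary_mask(labels):
    # Pixels whose right or bottom 4-neighbor carries a different superpixel id.
    m = np.zeros(labels.shape, dtype=bool)
    m[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    m[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return m

def color_energy(labels, colors, n_bins=8, eps=1e-9):
    # labels: (H, W) superpixel ids; colors: (H, W) ints in [0, n_bins).
    # Energy = sum over boundary pixels of -log p(color | superpixel histogram);
    # low energy means boundaries fall where colors still fit their superpixel.
    bnd = boundary_mask(labels)
    total = 0.0
    for sp in np.unique(labels):
        mask = labels == sp
        hist = np.bincount(colors[mask], minlength=n_bins).astype(float)
        hist /= hist.sum()
        total += -np.log(hist[colors[mask & bnd]] + eps).sum()
    return total
```

Such a per-pixel histogram lookup is cheap to evaluate, which is consistent with the "fast to evaluate" claim in the excerpt.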
SVMs suffer from several drawbacks: selecting the right kernel depends on the image descriptors, and they can be costly in both computation and memory.