Identifying the worries of individuals and societies plays a crucial role in providing social support and enhancing policy decision-making.
We propose a novel approach to infer the network structure for DQN models operating with high-dimensional continuous actions.
In particular, we adopt a disentangled representation technique to ensure that the features of different factors in each modality are independent of each other.
Given that our framework is model-agnostic, we apply it to the existing popular baselines and validate its effectiveness on the benchmark dataset.
Across four datasets covering the above three tasks, our method yields remarkable performance improvements over the baselines, demonstrating its superiority in reducing the modality bias problem.
First, we theoretically show that an adversary can upper-bound the distributional shift, which guarantees the attack's invisibility.
To explore these issues, we formulate a new semi-supervised continual learning method, which can be generically applied to existing continual learning models.
Raven's Progressive Matrices (RPM) is highly correlated with human intelligence, and it has been widely used to measure the abstract reasoning ability of humans.
Motivated by scenarios where data is used for diverse prediction tasks, we study whether fair representation can be used to guarantee fairness for unknown tasks and for multiple fairness notions simultaneously.
The concept module generates semantically meaningful features for primitive concepts, whereas the visual module extracts visual features for attributes and objects from input images.
A recent adversarial training (AT) study showed that the number of projected gradient descent (PGD) steps needed to successfully attack a point (i.e., find an adversarial example in its proximity) is an effective measure of the robustness of this point.
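As an illustration of the step-count idea, the following is a minimal sketch under stated assumptions: a toy linear classifier sign(w·x + b), an L-infinity ball of radius `eps`, and a fixed step size `alpha`. The function name `pgd_steps_to_attack` and all parameter values are hypothetical and are not taken from the study referred to above.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_steps_to_attack(x0, y, w, b, eps=0.5, alpha=0.1, max_steps=20):
    """Count PGD steps until the toy classifier sign(w.x + b) misclassifies x.

    Returns the step count, or None if no adversarial example is found
    inside the L-infinity ball of radius eps within max_steps.
    """
    x = list(x0)
    for step in range(1, max_steps + 1):
        # Assumed loss: -y * (w.x + b); its gradient w.r.t. x is -y * w.
        grad = [-y * wi for wi in w]
        # Signed gradient ascent step on the loss.
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, grad)]
        # Projection back onto the L-infinity ball centered at x0.
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score <= 0:
            return step
    return None


w = [1.0, 0.0]
near = pgd_steps_to_attack([0.05, 0.0], 1, w, 0.0)  # close to the boundary
far = pgd_steps_to_attack([0.25, 0.0], 1, w, 0.0)   # farther from the boundary
```

In this toy setting, a point closer to the decision boundary needs fewer steps than one farther away, which is the sense in which the step count can act as a robustness proxy.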
Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension.
In this work, we take a step towards training robust models for the cross-domain pose estimation task, bringing together ideas from causal representation learning and generative adversarial networks.
Hyperspectral compressive imaging takes advantage of compressive sensing theory to achieve coded aperture snapshot measurement without temporal scanning, and the entire three-dimensional spatial-spectral data is captured by a two-dimensional projection during a single integration period.
A new model is trained with these labels to generalize reliably despite the label noise.
no code implementations • 22 Sep 2020 • Konstantinos Nikolaidis, Stein Kristiansen, Thomas Plagemann, Vera Goebel, Knut Liestøl, Mohan Kankanhalli, Gunn Marit Traaen, Britt Øverland, Harriet Akre, Lars Aakerøy, Sigurd Steinshamn
In this work, we present an approach for unsupervised domain adaptation (DA) under the constraint that the labeled source data are not directly available; instead, only access to a classifier trained on the source data is provided.
1 code implementation • 21 Sep 2020 • Konstantinos Nikolaidis, Stein Kristiansen, Thomas Plagemann, Vera Goebel, Knut Liestøl, Mohan Kankanhalli, Gunn Marit Traaen, Britt Øverland, Harriet Akre, Lars Aakerøy, Sigurd Steinshamn
We use sleep monitoring data from both an open and a large closed clinical study and evaluate whether (1) end-users can create and successfully use customized classification models for sleep apnea detection, and (2) the identity of participants in the study is protected.
We argue that the key to Byzantine detection is monitoring the gradients of the clients' model parameters.
When federated learning is adopted among competitive agents with siloed datasets, agents are self-interested and participate only if they are fairly rewarded.
We demonstrate the effectiveness of the proposed model on two different large-scale and publicly available datasets, YFCC100M and NUS-WIDE.
In this paper, we corroborate, based on three subjective experiments on a novel image dataset, that objects in natural images are inherently perceived to have varying levels of importance.
Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models.
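For reference, the minimax formulation is commonly written as follows. This is the standard textbook form, given here as a sketch; the symbols ($f_\theta$ for the model, $\ell$ for the loss, $\epsilon$ for the perturbation budget, $\mathcal{D}$ for the data distribution) are notational assumptions, not taken from the listed paper.

```latex
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}}
\left[ \max_{\|\delta\|_{\infty} \le \epsilon} \ell\big(f_{\theta}(x + \delta),\, y\big) \right]
```

The inner maximization searches for the worst-case perturbation $\delta$ of each input, and the outer minimization fits the parameters $\theta$ against those worst cases.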
Therefore, it is important to develop algorithms that can leverage off-the-shelf labeled datasets to learn useful knowledge for the target task.
To enable research in this direction, we introduce 360Action, the first omnidirectional video dataset for multi-person action recognition.
To overcome this limitation, we propose a novel mask transfer network (MTN), which can greatly boost the processing speed of VOS while achieving reasonable accuracy.
Finally, by sequentially examining each state transition in the video graph, our method can detect and explain how those actions are executed with prior knowledge, much as humans reason logically.
To tackle this problem, in this paper, we propose a novel Multimodal Attentive Metric Learning (MAML) method to model users' diverse preferences for various items.
Supervised machine learning applications in the health domain often face the problem of insufficient training datasets.
Benefiting from advances in computer vision, natural language processing, and information retrieval techniques, visual question answering (VQA), which aims to answer questions about an image or a video, has received a lot of attention over the past few years.
In addition, analysis of the intra-class compactness and inter-class separability demonstrates the advantages of the proposed function over the softmax function, which is consistent with the performance improvement.
Advertisements (ads) often contain strong affective content to capture viewer attention and convey an effective message to the audience.
Our analytical studies reveal that the step factor h in the Euler method is able to control the robustness of ResNet in both its training and generalization.
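The link between ResNet and the Euler method can be sketched as the update x_{n+1} = x_n + h * f(x_n), where each residual block plays the role of one Euler step. The snippet below is a toy illustration only: the scalar residual branch `tanh(w * x)`, the function names, and the parameter values are assumptions made for this example and are not the model analyzed in the paper.

```python
import math


def euler_resnet_forward(x, weights, h):
    """Apply a stack of toy residual blocks, each one Euler step of size h."""
    for w in weights:
        # Residual update: identity path plus h times the residual branch.
        x = x + h * math.tanh(w * x)
    return x


def sensitivity(x, weights, h, delta=1e-3):
    """Finite-difference estimate of how much an input perturbation is amplified."""
    perturbed = euler_resnet_forward(x + delta, weights, h)
    clean = euler_resnet_forward(x, weights, h)
    return abs(perturbed - clean) / delta


weights = [1.0] * 5  # five toy blocks with hypothetical weights
for h in (0.1, 0.5, 1.0):
    print(f"h={h}: amplification of a small input perturbation = "
          f"{sensitivity(0.5, weights, h):.3f}")
```

In this toy setup, a smaller step factor h leaves the output closer to the identity mapping, so a small input perturbation is amplified less; whether this carries over to a real network depends on the residual branches actually learned.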
Despite the success of deep neural networks (DNNs) in image classification tasks, their human-level performance relies on massive training data with high-quality manual annotations, which are expensive and time-consuming to collect.
Ranked #14 on Image Classification on Clothing1M (using extra training data)
Then the aspect importance is integrated into a novel aspect-aware latent factor model (ALFM), which learns users' and items' latent factors based on ratings.
Moreover, our method achieves better performance than the best unsupervised offline algorithm on the DAVIS-2016 benchmark dataset.
Ranked #20 on Unsupervised Video Object Segmentation on DAVIS 2016
Contrary to the popular notion that ad affect hinges on the narrative and the clever use of linguistic and social cues, we find that actively attended objects and the coarse scene structure better encode affective information as compared to individual scene objects or conspicuous background elements.
However, due to the domain shift problem, the performance of deep classifiers trained on Web images tends to degrade when they are directly deployed to videos.
To enable the study of this problem, there exist a vast number of action datasets, which are recorded under controlled laboratory settings, captured in real-world surveillance environments, or crawled from the Internet.
This paper addresses the problem of active learning of a multi-output Gaussian process (MOGP) model representing multiple types of coexisting correlated environmental phenomena.