Modern AI tools, such as generative adversarial networks, have transformed our ability to create and modify visual data with photorealistic results.
First, we find that the fusion model is usually both more accurate and more robust against single-source attacks than single-sensor deep neural networks.
In this paper, we propose a confidence segmentation (ConfSeg) module that builds a confidence score for each pixel in the CAM without introducing additional hyper-parameters.
Weakly Supervised Object Localization (WSOL) methods usually rely on fully convolutional networks in order to obtain class activation maps (CAMs) of targeted labels.
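As an illustration of how a CAM is commonly obtained, the sketch below follows the standard formulation in which the map for a class is the weighted sum of the final convolutional feature channels, with the weights taken from that class's row of the classification layer. It is a minimal sketch and not the method of any specific work cited here; the function names `class_activation_map` and `normalize`, and the toy feature values, are assumptions made for illustration.

```python
def class_activation_map(features, class_weights):
    """Weighted sum of feature channels for one class.

    features: K channel maps, each H x W, as nested lists (assumed layout).
    class_weights: K weights of the final linear layer for the target class.
    """
    height = len(features[0])
    width = len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, channel in zip(class_weights, features):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * channel[y][x]
    return cam


def normalize(cam):
    """Min-max rescale to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / span for v in row] for row in cam]


# Toy input: two 2x2 feature channels and one class's weights (made-up values).
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
weights = [0.5, 1.0]
cam = class_activation_map(features, weights)
heatmap = normalize(cam)
```

In WSOL pipelines the normalized map is typically thresholded to produce a localization mask; the threshold is one of the hyper-parameters such pipelines usually have to tune.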
To overcome this limitation, we propose a GAN-based EM learning framework that can maximize the likelihood of images and estimate the latent variables with only the constraint of L-Lipschitz continuity.
Furthermore, we study multiple modalities, including descriptions and transcripts, to improve video understanding.