Our model is trained to predict human images in arbitrary poses, which encourages it to extract disentangled and expressive neural textures representing the appearance of different semantic entities.
The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications.
Finally, we warp the source features using a content-aware sampling method with the obtained local attention coefficients.
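The warping step can be illustrated with a minimal sketch. It rests on assumptions not stated in the text: each target location is taken to attend to a k x k source neighbourhood centred on that same location (no flow offset), the attention coefficients are assumed to be already normalised to sum to one, and borders are handled by clamping. The function name `warp_with_local_attention` and its arguments are hypothetical.

```python
def warp_with_local_attention(source, attention, k=3):
    """Warp a feature map by weighted sampling of local neighbourhoods.

    source:    H x W grid of feature vectors (lists of floats).
    attention: H x W grid; each entry is a list of k*k weights, row-major
               over the k x k neighbourhood, assumed to sum to 1.
    Returns an H x W grid of warped feature vectors.
    """
    h, w = len(source), len(source[0])
    c = len(source[0][0])
    r = k // 2
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            weights = attention[y][x]
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # Clamp coordinates so border pixels reuse edge features.
                    sy = min(max(y + dy, 0), h - 1)
                    sx = min(max(x + dx, 0), w - 1)
                    wgt = weights[(dy + r) * k + (dx + r)]
                    for ch in range(c):
                        out[y][x][ch] += wgt * source[sy][sx][ch]
    return out
```

With a one-hot weight on the centre of every neighbourhood the function returns the source unchanged, and moving that weight to a neighbouring position shifts the features accordingly, so the learned coefficients decide where each output location samples from.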
Image inpainting techniques have recently improved significantly through the use of deep neural networks.
The point cloud is a fundamental 3D representation that is widely used in real-world applications such as autonomous driving.
Remarkably, we obtain a frame-level AUC score of 82.12% on UCF-Crime.
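For readers unfamiliar with the metric, the following is a minimal sketch of how a frame-level AUC can be computed. It assumes that every test frame carries a binary label (1 for anomalous, 0 for normal) and a predicted anomaly score, and that frames from all videos are pooled before evaluation; the text does not confirm this protocol. The function name `frame_level_auc` is hypothetical.

```python
def frame_level_auc(scores, labels):
    """Area under the ROC curve over pooled frames.

    Computed as the probability that a randomly chosen anomalous frame
    receives a higher score than a randomly chosen normal frame,
    with ties counted as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one frame of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of frames and is meant only to make the definition explicit; a rank-based computation would be preferable on a full test set.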
Weakly supervised temporal action detection is a challenging task in understanding untrimmed videos, since no supervisory signal other than the video-level category label is available for the training data.
In this paper, we propose a novel salient object detection algorithm for RGB-D images using center-dark channel priors.
One is the lack of the large amount of annotated data needed to train a network.