Hence, we develop an approach based on intermediate representations of poses and appearance: our pose-guided appearance rendering network first encodes the targets' poses using an encoder-decoder neural network.
Normalizing flows transform a simple base distribution into a complex target distribution and have proved to be powerful models for data generation and density estimation.
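As a minimal illustration of how a flow supports both generation and density estimation, the sketch below uses a single one-dimensional affine map as a stand-in for a learned transformation. The function names and the choice of an affine map with a standard normal base are assumptions made for this example, not part of any specific model described here. The density follows from the change-of-variables formula, log p_x(x) = log p_z(f^{-1}(x)) - log|df/dz|.

```python
import math


def standard_normal_logpdf(z: float) -> float:
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_forward(z: float, scale: float, shift: float) -> float:
    """Generation direction: push a base sample z through x = scale * z + shift."""
    return scale * z + shift


def affine_flow_logpdf(x: float, scale: float, shift: float) -> float:
    """Density-estimation direction: invert the map, then apply change of variables."""
    if scale == 0.0:
        raise ValueError("scale must be non-zero for the map to be invertible")
    z = (x - shift) / scale          # inverse transformation
    log_det = math.log(abs(scale))   # log |dx/dz| of the affine map
    return standard_normal_logpdf(z) - log_det
```

With this particular map the flow density coincides with that of a Gaussian with mean `shift` and standard deviation `|scale|`, which gives a simple sanity check; practical flows compose many such invertible layers with learned parameters.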
In this work, we propose a novel probabilistic sequence model that excels at capturing high variability in time series data, both across sequences and within an individual sequence.
Event sequences can be modeled by temporal point processes (TPPs) to capture their asynchronous and probabilistic nature.
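For intuition, the simplest TPP is the homogeneous Poisson process, in which gaps between events are independent exponential random variables. The sketch below is only an illustration under that assumption; the function names are hypothetical, and richer TPPs replace the constant rate with a history-dependent intensity.

```python
import math
import random


def sample_poisson_process(rate: float, horizon: float, rng: random.Random) -> list:
    """Sample event times on (0, horizon] with exponential inter-event gaps."""
    t = 0.0
    events = []
    while True:
        t += rng.expovariate(rate)  # waiting time until the next event
        if t > horizon:
            return events
        events.append(t)


def poisson_process_loglik(events: list, rate: float, horizon: float) -> float:
    """Log-likelihood of the events under a constant intensity `rate` on [0, horizon]."""
    # One log-intensity term per event, minus the integrated intensity over the window.
    return len(events) * math.log(rate) - rate * horizon
```

The samples arrive at irregular, continuous times rather than on a fixed grid, which is the asynchronous behaviour that TPPs are meant to capture.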
In this paper, we aim to characterize adversarial examples based on spatial context information in semantic segmentation.
We explore a key architectural aspect of deep convolutional neural networks: the pattern of internal skip connections used to aggregate outputs of earlier layers for consumption by deeper layers.
We propose an architecture that combines a hierarchical temporal model for predicting human poses with encoder-decoder convolutional neural networks for rendering target appearances.