Deep neural networks are susceptible to catastrophic forgetting: when trained on a new task, they tend to learn only that task and fail to preserve their ability to accomplish previously learned tasks.
Hence, we develop an approach based on intermediate representations of pose and appearance: our pose-guided appearance rendering network first encodes the targets' poses using an encoder-decoder neural network.
In contrast, we propose a parameter-efficient framework, Piggyback GAN, which learns the current task by building a set of convolutional and deconvolutional filters that are factorized into filters of the models trained on previous tasks.
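The factorization idea can be illustrated with a minimal numpy sketch: the new task's filter bank is formed by linearly combining frozen filters from a previous task through a small trainable weight matrix, plus a few unconstrained filters. All shapes, names, and the split between factorized and unconstrained filters below are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen filters from a model trained on a previous task:
# (num_filters, flattened_kernel_size), e.g. 64 kernels of shape 3x3x3.
prev_filters = rng.standard_normal((64, 3 * 3 * 3))

def piggyback_filters(prev_filters, num_new, num_unconstrained, rng):
    """Build the current task's filter bank: most filters are linear
    combinations of the frozen previous-task filters (only the small
    weight matrix is trained), and a few filters are trained freely."""
    num_factorized = num_new - num_unconstrained
    # Trainable for the new task; prev_filters stay frozen.
    weights = rng.standard_normal((num_factorized, prev_filters.shape[0]))
    factorized = weights @ prev_filters
    unconstrained = rng.standard_normal((num_unconstrained, prev_filters.shape[1]))
    return np.concatenate([factorized, unconstrained], axis=0)

new_bank = piggyback_filters(prev_filters, num_new=64, num_unconstrained=8, rng=rng)
print(new_bank.shape)  # (64, 27)
```

The parameter saving comes from storing only the small weight matrix and the handful of unconstrained filters per new task, rather than a full copy of every filter.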
Human activity videos involve rich, varied interactions between people and objects.
This makes it possible to perform image-conditioned generation tasks in a lifelong learning setting.
An architecture combining a hierarchical temporal model for predicting human poses and encoder-decoder convolutional neural networks for rendering target appearances is proposed.
This paper presents a deep neural-network-based hierarchical graphical model for individual and group activity recognition in surveillance scenes.