Human motion modeling is a classic problem in computer vision and graphics.
Challenges in modeling human motion include high dimensional prediction as well
as extremely complicated dynamics.We present a novel approach to human motion
modeling based on convolutional neural networks (CNN). The hierarchical
structure of CNN makes it capable of capturing both spatial and temporal
correlations effectively. In our proposed approach,a convolutional long-term
encoder is used to encode the whole given motion sequence into a long-term
hidden variable, which is used with a decoder to predict the remainder of the
sequence. The decoder itself also has an encoder-decoder structure, in which
the short-term encoder encodes a shorter sequence to a short-term hidden
variable, and the spatial decoder maps the long and short-term hidden variable
to motion predictions. By using such a model, we are able to capture both
invariant and dynamic information of human motion, which results in more
accurate predictions. Experiments show that our algorithm outperforms the
state-of-the-art methods on the Human3.6M and CMU Motion Capture datasets. Our
code is available at the project website.