Hierarchical feature extractors such as Convolutional Networks (ConvNets)
have achieved impressive performance on a variety of classification tasks using
purely feedforward processing. Feedforward architectures can learn rich
representations of the input space but do not explicitly model dependencies in
the output space, which is highly structured for tasks such as articulated
human pose estimation or object segmentation.
Here we propose a framework that
expands the expressive power of hierarchical feature extractors to encompass
both input and output spaces, by introducing top-down feedback. Instead of
directly predicting the outputs in one go, we use a self-correcting model that
progressively changes an initial solution by feeding back error predictions, in
a process we call Iterative Error Feedback (IEF). IEF shows excellent
performance on the task of articulated pose estimation on the challenging MPII
and LSP benchmarks, matching the state of the art without requiring ground
truth scale annotation.
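
As a rough illustration of the iterative-error-feedback loop described above, the sketch below shows the core idea of repeatedly refining an initial output estimate with predicted corrections rather than regressing the output in one go. The function names, the fixed number of steps, and the simple additive update are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def iterative_error_feedback(features, y_init, predict_correction, num_steps=4):
    """Sketch of an IEF-style loop: start from an initial output estimate and
    repeatedly add a predicted correction instead of predicting the output directly."""
    y = np.asarray(y_init, dtype=float)        # initial guess, e.g. a mean pose
    for _ in range(num_steps):
        eps = predict_correction(features, y)  # learned model predicts a correction ("error feedback")
        y = y + eps                            # progressively refine the current estimate
    return y

# Toy usage with a hypothetical correction predictor that moves the estimate
# a fraction of the way toward a fixed target (a stand-in for a learned ConvNet).
target = np.array([1.0, 2.0])
step_toward_target = lambda feats, y: 0.5 * (target - y)
print(iterative_error_feedback(None, [0.0, 0.0], step_toward_target))
```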