Perceptual judgment of image similarity by humans relies on rich internal representations ranging from low-level features to high-level concepts, scene properties and even cultural associations.
While great advances have been made in pattern recognition and machine learning, the successes of these fields remain restricted to narrow applications and seem to break down when training data is scarce, when the domain shifts, or when intelligent reasoning is required for rapid adaptation to new environments.
We discuss the implications of this intriguing property of deep neural networks and suggest ways to harness it to create more robust representations.
Given an existing trained neural network, it is often desirable to learn new capabilities without degrading the performance of those already learned.
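One simple way to guarantee this property, sketched below with an illustrative NumPy model (not any particular paper's method), is to freeze the shared weights of the trained network and train only a newly added task head: because the frozen parameters never change, outputs for the original task are provably unaffected.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trained network: one frozen feature layer plus a frozen head
# for the original task. All shapes and names here are illustrative.
W_base = rng.normal(size=(16, 8))      # shared weights (frozen)
W_old_head = rng.normal(size=(8, 3))   # existing capability (frozen)
W_new_head = np.zeros((8, 2))          # new capability: the only trainable part

def features(x):
    return np.tanh(x @ W_base)         # frozen, never updated

def old_task(x):
    return features(x) @ W_old_head

def new_task(x):
    return features(x) @ W_new_head

# Train the new head with a few gradient steps on a toy regression target.
X = rng.normal(size=(32, 16))
Y = rng.normal(size=(32, 2))
before = old_task(X).copy()
for _ in range(100):
    grad = features(X).T @ (new_task(X) - Y) / len(X)
    W_new_head -= 0.1 * grad           # only the new head is updated

# Since the shared weights are frozen, old-task outputs are unchanged.
assert np.allclose(before, old_task(X))
```

The trade-off is that a frozen feature extractor cannot adapt to the new task, which is why much of the continual-learning literature studies how far the shared weights can be updated before the old capability degrades.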
Convolutional neural networks have been shown to develop internal representations that correspond closely to semantically meaningful objects and parts, despite being trained solely on class labels.
Action recognition in still images has seen major improvement in recent years due to advances in human pose estimation and object recognition, and to stronger feature representations.
In this paper, we demonstrate how recognition is improved by precisely localizing the action-object and then extracting details of the object's shape together with the actor-object interaction.
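As a toy illustration of how actor-object interaction can be encoded (not the paper's actual pipeline), the geometric relation between a detected actor box and object box can be summarized in a few scalar cues and concatenated with object-shape features before classification. The box format and the particular cues below are assumptions for the sketch.

```python
import numpy as np

def box_interaction_features(actor, obj):
    """Geometric actor-object cues from two [x1, y1, x2, y2] boxes.

    Returns the object-centre offset normalized by actor size, the log
    area ratio, and the intersection-over-union of the two boxes. This
    feature set is an illustrative choice, not a prescribed one.
    """
    acx, acy = (actor[0] + actor[2]) / 2, (actor[1] + actor[3]) / 2
    ocx, ocy = (obj[0] + obj[2]) / 2, (obj[1] + obj[3]) / 2
    aw, ah = actor[2] - actor[0], actor[3] - actor[1]
    ow, oh = obj[2] - obj[0], obj[3] - obj[1]

    dx = (ocx - acx) / aw              # horizontal offset, actor-relative
    dy = (ocy - acy) / ah              # vertical offset, actor-relative
    scale = np.log((ow * oh) / (aw * ah))  # relative object size

    # Intersection-over-union of the two boxes.
    ix1, iy1 = max(actor[0], obj[0]), max(actor[1], obj[1])
    ix2, iy2 = min(actor[2], obj[2]), min(actor[3], obj[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    iou = inter / (aw * ah + ow * oh - inter)
    return np.array([dx, dy, scale, iou])

# Example: a person box and a small held object up and to the right.
feats = box_interaction_features([0, 0, 100, 200], [80, 40, 120, 80])
```

Such low-dimensional geometric cues are cheap to compute from any detector's output and can complement appearance features of the localized object.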