Tinkering Under the Hood: Interactive Zero-Shot Learning with Net Surgery

We consider the task of visual net surgery, in which a CNN can be reconfigured without extra data to recognize novel concepts that may be omitted from the training set. While most prior work makes use of linguistic cues for such "zero-shot" learning, we do so by using a pictorial language representation of the training set, implicitly learned by a CNN, to generalize to new classes. To this end, we introduce a set of visualization techniques that better reveal the activation patterns and relations between groups of CNN filters. We next demonstrate that knowledge of pictorial languages can be used to rewire certain CNN neurons into a part model, which we call a pictorial language classifier (PLC). We demonstrate the robustness of simple PLCs by applying them in a weakly supervised manner: labeling unlabeled concepts for visual classes present in the training data. Specifically, we show that a PLC built on top of a CNN trained for ImageNet classification can localize humans in Graz-02 and determine the pose of birds in PASCAL-VOC without extra labeled data or additional training. We then apply PLCs in an interactive zero-shot manner, demonstrating that pictorial languages are expressive enough to detect a set of visual classes in MS-COCO that never appear in the ImageNet training set.
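
To make the rewiring idea concrete, here is a minimal sketch of a PLC in PyTorch. It is an illustration of the general technique, not the paper's implementation: it scores an image by pooling the peak responses of a few hand-picked convolutional filters from an ImageNet-pretrained AlexNet, treating each filter as a part detector. The filter indices in PART_FILTERS are hypothetical placeholders, and the choice of AlexNet and of a simple sum over part peaks are assumptions for the sketch.

```python
# Minimal PLC sketch: pool the peak responses of a few "part" filters
# from a pretrained CNN into a single detection score, with no retraining.
import torch
from torchvision import models, transforms
from PIL import Image

PART_FILTERS = [12, 87, 151, 200]  # hypothetical indices of part-selective filters

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def plc_score(image: Image.Image) -> float:
    """Score an image by summing the peak responses of the part filters."""
    x = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        feats = model.features(x)          # conv activations, shape (1, 256, 6, 6)
    part_maps = feats[0, PART_FILTERS]     # keep only the chosen part filters
    # Max over spatial locations: each part may appear anywhere in the image.
    peak_per_part = part_maps.flatten(1).max(dim=1).values
    return peak_per_part.sum().item()      # pool evidence across parts
```

Evaluating plc_score over sliding windows or feature-map regions would yield a crude localization map, loosely in the spirit of the weakly supervised Graz-02 experiment described above.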
