A newly proposed work exploits Convolutional-Deconvolutional-Convolutional (CDC) filters to upsample the predictions of 3D ConvNets, making it possible to perform per-frame action predictions and achieving promising performance in terms of temporal action localization.
Object detection is an import task of computer vision. A variety of methods have been proposed, but methods using the weak labels still do not have a satisfactory result. In this paper, we propose a new framework that using the weakly supervised method's output as the pseudo-strong labels to train a strongly supervised model. One weakly supervised method is treated as black-box to generate class-specific bounding boxes on train dataset. A de-noise method is then applied to the noisy bounding boxes. Then the de-noised pseudo-strong labels are used to train a strongly object detection network. The whole framework is still weakly supervised because the entire process only uses the image-level labels. The experiment results on PASCAL VOC 2007 prove the validity of our framework, and we get result 43. 4% on mean average precision compared to 39. 5% of the previous best result and 34. 5% of the initial method, respectively. And this frame work is simple and distinct, and is promising to be applied to other method easily.
This study focuses on human recognition with gait feature obtained by Kinect and shows that gait feature can effectively distinguish from different human beings through a novel representation -- relative distance-based gait features.