Pedestrian detection based on the combination of a convolutional neural network
(CNN) and traditional handcrafted features (i.e., HOG+LUV) has achieved
great success. Generally, HOG+LUV features are used to generate candidate
proposals, and a CNN then classifies these proposals. Despite this success, there is still
room for improvement. For example, the CNN classifies these proposals using the
fully connected layer features, while the proposal scores and the features in the
inner layers of the CNN are ignored. In this paper, we propose a unifying framework
called Multilayer Channel Features (MCF) to overcome this drawback. It first
integrates HOG+LUV with each layer of the CNN into multi-layer image channels.
Based on the multi-layer image channels, a multi-stage cascade AdaBoost is then
learned. The weak classifiers in each stage of the multi-stage cascade are
learned from the image channels of the corresponding layer. With more abundant
features, MCF achieves state-of-the-art performance on the Caltech pedestrian
dataset (i.e., a 10.40% miss rate). Using new and accurate annotations, MCF achieves
a 7.98% miss rate. As many non-pedestrian detection windows can be quickly
rejected by the first few stages, the cascade accelerates detection by 1.43 times.
By eliminating the highly overlapped detection windows with lower scores after
the first stage, detection is 4.07 times faster with negligible performance loss.
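
To illustrate why early rejection in a multi-stage cascade saves computation, the following is a minimal sketch and not the MCF implementation. It assumes decision-stump weak classifiers, a running score accumulated across stages, and per-stage rejection thresholds; the names `Stump`, `Stage`, and `cascade_score` are hypothetical. Each layer's features are supplied as a callable so that deeper layers are computed only if a window survives the earlier stages, which is an assumption about how the speed-up would be realized.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Stump:
    """Hypothetical weak classifier: thresholds one feature of a layer."""
    index: int        # which entry of the layer's feature vector to test
    threshold: float  # decision threshold on that entry
    alpha: float      # vote weight (as AdaBoost would assign)

    def score(self, features: Sequence[float]) -> float:
        return self.alpha if features[self.index] > self.threshold else -self.alpha


@dataclass
class Stage:
    """One cascade stage: weak classifiers learned on one layer's channels."""
    stumps: List[Stump]
    reject_below: float  # window is rejected if the running score drops below this


def cascade_score(
    layer_features: Sequence[Callable[[], Sequence[float]]],
    stages: Sequence[Stage],
) -> Tuple[bool, float, int]:
    """Return (accepted, accumulated score, number of stages evaluated).

    layer_features[k] is called only when stage k is reached, so features
    of deeper layers are never computed for windows rejected early.
    """
    total = 0.0
    for k, stage in enumerate(stages):
        feats = layer_features[k]()  # lazy: compute this layer's features now
        total += sum(stump.score(feats) for stump in stage.stumps)
        if total < stage.reject_below:
            return False, total, k + 1  # early rejection
    return True, total, len(stages)


# Toy two-stage cascade: stage 0 on handcrafted channels, stage 1 on a CNN layer.
stages = [
    Stage(stumps=[Stump(index=0, threshold=0.5, alpha=1.0)], reject_below=-0.5),
    Stage(stumps=[Stump(index=1, threshold=0.2, alpha=2.0)], reject_below=0.0),
]

background = [lambda: [0.1], lambda: [0.0, 0.9]]
pedestrian = [lambda: [0.9], lambda: [0.0, 0.9]]

print(cascade_score(background, stages))  # rejected after the first stage
print(cascade_score(pedestrian, stages))  # passes both stages
```

In this toy setup the background window is discarded after one stage, so the second callable (standing in for a more expensive CNN layer) is never invoked for it.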