libHOG: Energy-Efficient Histogram of Oriented Gradient Computation
Histogram of Oriented Gradients (HOG) features are the underlying representation in automotive computer vision applications such as collision avoidance and lane keeping. In these applications, we have observed that HOG feature computation is often a slow and energy-intensive component of the overall pipeline. In this paper, we focus on reducing both the time taken and the energy used for computing Felzenszwalb HOG features. We achieve our results through a combination of reduced precision, SIMD parallelism, algorithmic changes, and outer-loop parallelism. In particular, we address a bottleneck in histogram accumulation by phrasing the problem as a gather instead of the traditional scatter. Additionally, we explore the tradeoffs of using L1 instead of L2 norms to compute gradient magnitudes, which enables smaller operands and more SIMD parallelism. Overall, we are able to compute multiresolution HOG pyramids at 70 fps for 640x480 images on a multicore CPU. This is a 3.6x speedup over the best known HOG implementation and a 29x speedup over the popular voc-release5 HOG code. It is also a 3.6x to 22x reduction in energy per frame compared to previous HOG implementations. Our open-source implementation is available for download.
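The two algorithmic ideas named above can be illustrated with a minimal Python sketch. This is not the libHOG implementation: it assumes hard assignment of each pixel to a single cell and a single orientation bin (no bilinear interpolation, no block normalization), and every name in it (`gradients`, `magnitude_l1`, `magnitude_l2`, `orientation_bin`, `hog_scatter`, `hog_gather`) is hypothetical and chosen only for illustration.

```python
import math

def gradients(img):
    """Central-difference gradients; border pixels are left at 0 (an assumption)."""
    h, w = len(img), len(img[0])
    gx = [[0] * w for _ in range(h)]
    gy = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = img[y][x + 1] - img[y][x - 1]
            gy[y][x] = img[y + 1][x] - img[y - 1][x]
    return gx, gy

def magnitude_l2(gx, gy):
    # L2 norm: needs squares and a square root.
    return math.sqrt(gx * gx + gy * gy)

def magnitude_l1(gx, gy):
    # L1 norm: integer-only, no multiplication.
    return abs(gx) + abs(gy)

def orientation_bin(gx, gy, n_bins):
    # Map the gradient angle in [0, 2*pi) to one of n_bins bins.
    angle = math.atan2(gy, gx) % (2 * math.pi)
    return int(angle / (2 * math.pi) * n_bins) % n_bins

def hog_scatter(img, cell, n_bins, mag):
    """Scatter: iterate over pixels; each pixel writes into its cell's histogram."""
    gx, gy = gradients(img)
    ch, cw = len(img) // cell, len(img[0]) // cell
    hist = [[[0] * n_bins for _ in range(cw)] for _ in range(ch)]
    for y in range(ch * cell):
        for x in range(cw * cell):
            b = orientation_bin(gx[y][x], gy[y][x], n_bins)
            hist[y // cell][x // cell][b] += mag(gx[y][x], gy[y][x])
    return hist

def hog_gather(img, cell, n_bins, mag):
    """Gather: iterate over cells; each cell reads the pixels it covers."""
    gx, gy = gradients(img)
    ch, cw = len(img) // cell, len(img[0]) // cell
    hist = [[None] * cw for _ in range(ch)]
    for cy in range(ch):
        for cx in range(cw):
            acc = [0] * n_bins  # local accumulator, written by this cell only
            for y in range(cy * cell, (cy + 1) * cell):
                for x in range(cx * cell, (cx + 1) * cell):
                    b = orientation_bin(gx[y][x], gy[y][x], n_bins)
                    acc[b] += mag(gx[y][x], gy[y][x])
            hist[cy][cx] = acc
    return hist

# Deterministic 16x16 test image with 8-bit pixel values.
img = [[(7 * x + 13 * y) % 256 for x in range(16)] for y in range(16)]
same = hog_scatter(img, 4, 9, magnitude_l1) == hog_gather(img, 4, 9, magnitude_l1)
print(same)
```

In the gather form, each output cell is produced by one iteration of the outer loops, so cells can in principle be computed independently without conflicting writes. For 8-bit pixels, each gradient component lies in [-255, 255], so the L1 magnitude is at most 510 and fits in a 16-bit integer, whereas the sum of squares in the L2 form can reach 130050 and needs a wider type before the square root.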