Robust People Detection using Computer Vision

Semantic Scholar, 2013  ·  Endri Dibra

People detection is a challenging and interesting field of research with many applications, one of which is video surveillance. Besides monitoring and the detection of abnormalities, there is an increasing trend of using video surveillance to gather statistics that can support management and planning. In an environment like a restaurant, an automatic system could determine the flow of people and the presence of seated people at tables; this information could help direct table service. The purpose of this thesis was therefore to develop a framework for automatic people detection in restaurants. Adopting state-of-the-art Histograms of Oriented Gradients (HOG) as feature descriptors and linear Support Vector Machines (SVM) to train classifiers, we propose a novel method for people detection in scenes seen from an oblique camera view. Unlike other holistic methods, which train one general people classifier from available datasets of people, such as the INRIA dataset, and search through the whole image by applying the classifier model in a sliding-window, multi-scale fashion, we propose a method based on training multiple specific classifiers at specific positions in the scene. This scheme allows us to overcome the difficulties arising from occlusions and foreshortening that are inherent in other holistic approaches. Furthermore, by training only on data generated from synthetic models of humans placed in synthetic 3D models of the scene, we avoid the problem of acquiring datasets of real images for training. In addition, training only on synthetic models provides us with a rich dataset of humans in various poses and articulations, not only standing but also sitting (our method is general enough to extend to other poses, but detection of standing and sitting people is the focus here).
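As a rough illustration of the HOG-plus-linear-SVM pipeline the abstract describes, a per-position classifier could be trained along the following lines. This is a minimal sketch, not the thesis's implementation: the 128x64 window, the HOG parameters (borrowed from the original Dalal-Triggs detector), and the synthetic striped "person" images standing in for renders of 3D human models are all placeholder assumptions, and `scikit-image`/`scikit-learn` are assumed merely as convenient libraries.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def hog_features(window):
    # Standard HOG parameters: 9 orientation bins, 8x8-pixel cells,
    # 2x2-cell blocks. For a 128x64 window this yields 3780 features.
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

def make_positive():
    # Crude stand-in for a synthetic human rendered at one scene
    # position: low-amplitude noise plus a strong vertical structure.
    img = rng.random((128, 64)) * 0.1
    img[:, 28:36] = 1.0
    return img

def make_negative():
    # Stand-in for an empty-background crop at the same position.
    return rng.random((128, 64)) * 0.1

positives = [make_positive() for _ in range(10)]
negatives = [make_negative() for _ in range(10)]

X = np.array([hog_features(w) for w in positives + negatives])
y = np.array([1] * len(positives) + [0] * len(negatives))

# One linear SVM per scene position; a single example is trained here.
clf = LinearSVC().fit(X, y)
```

In the position-specific scheme, this training step would be repeated once per scene position, each classifier seeing only windows rendered at that position's scale and viewpoint.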
We demonstrate that this method outperforms the state-of-the-art holistic HOG approach by comparing it to four detectors trained on the INRIA dataset and on three different datasets drawn from real footage of our scene. Finally, we show that our detector, which performs single-scale detection only at previously trained scene positions, is twice as fast as sliding-window, multi-scale methods.
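The speed advantage claimed above comes from evaluating each classifier on one fixed-scale window at its own scene position, rather than scanning an image pyramid. A hedged sketch of that detection loop follows; the positions, window size, threshold, and the trivial stand-in classifier are all illustrative assumptions, not values from the thesis.

```python
import numpy as np

class ThresholdClassifier:
    """Stand-in for a trained per-position linear SVM: its decision
    function is just mean brightness minus a bias (placeholder)."""
    def __init__(self, bias):
        self.bias = bias

    def decision_function(self, window):
        return float(window.mean() - self.bias)

def detect_at_positions(frame, positions, classifiers, win=(8, 4), thresh=0.0):
    # Evaluate each position's dedicated classifier on a single
    # fixed-scale window anchored at that position -- no image
    # pyramid and no exhaustive sliding-window scan.
    detections = []
    for (y, x), clf in zip(positions, classifiers):
        window = frame[y:y + win[0], x:x + win[1]]
        if window.shape != win:
            continue  # this position's window falls outside the frame
        score = clf.decision_function(window)
        if score > thresh:
            detections.append((y, x, score))
    return detections

frame = np.zeros((32, 32))
frame[10:18, 5:9] = 1.0          # a bright "person" at position (10, 5)
positions = [(10, 5), (20, 20)]  # two pre-trained scene positions
classifiers = [ThresholdClassifier(0.5), ThresholdClassifier(0.5)]
hits = detect_at_positions(frame, positions, classifiers)
```

Because the number of evaluated windows equals the number of trained positions (two here) instead of the thousands visited by a multi-scale sliding window, the per-frame cost drops accordingly.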
