This paper introduces geometry, object shape, and pose costs for
multi-object tracking in urban driving scenarios. Using images from a monocular
camera alone, we devise pairwise costs for object tracks, based on several 3D
cues such as object pose, shape, and motion. The proposed costs are agnostic to
the data association method and can be incorporated into any optimization
framework to output the pairwise data associations. These costs are easy to
implement, can be computed in real time, and complement each other to account
for possible errors in a tracking-by-detection framework. We perform an
extensive analysis of the designed costs and empirically demonstrate consistent
improvements over the state of the art under varying conditions: across a
range of object detectors, with varied camera and object motion, and, more
importantly, regardless of the choice of association framework.
We also show that, by using the simplest of association frameworks (two-frame
Hungarian assignment), we surpass the state of the art in multi-object tracking
on road scenes. More qualitative and quantitative results can be found at the
following URL: https://junaidcs032.github.io/Geometry_ObjectShape_MOT/.
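
As a rough illustration of how pairwise costs can feed a two-frame association step, the sketch below combines per-cue cost matrices and solves the resulting minimum-cost matching between existing tracks and new detections. Everything in it is an assumption made for illustration rather than the method of the paper: the function names `combine_costs` and `associate`, the weighted-sum combination rule, the gating threshold `max_cost`, and the toy cost values are all hypothetical. For brevity, the matching is found by exhaustive search, which returns the same optimum as the Hungarian algorithm on small inputs but does not scale to many objects.

```python
from itertools import permutations


def combine_costs(cue_costs, weights):
    """Weighted sum of per-cue cost matrices (hypothetical combination rule).

    cue_costs: list of matrices, each of shape [num_tracks][num_detections].
    weights:   one scalar weight per cue.
    """
    rows, cols = len(cue_costs[0]), len(cue_costs[0][0])
    return [
        [sum(w * c[i][j] for w, c in zip(weights, cue_costs)) for j in range(cols)]
        for i in range(rows)
    ]


def associate(cost, max_cost):
    """Minimum-cost one-to-one matching of tracks (rows) to detections (columns).

    Exhaustive search stands in for the Hungarian algorithm here; it assumes
    num_tracks <= num_detections. Matched pairs whose cost exceeds max_cost
    (a hypothetical gating threshold) are dropped from the result.
    """
    n_tracks, n_dets = len(cost), len(cost[0])
    best_total, best_cols = float("inf"), None
    for cols in permutations(range(n_dets), n_tracks):
        total = sum(cost[i][j] for i, j in enumerate(cols))
        if total < best_total:
            best_total, best_cols = total, cols
    return [(i, j) for i, j in enumerate(best_cols) if cost[i][j] <= max_cost]


# Toy example: 2 tracks from the previous frame, 3 detections in the current one.
# The values are made up; lower means "more likely the same object".
pose_cost = [[0.1, 0.9, 0.8], [0.7, 0.2, 0.9]]
shape_cost = [[0.2, 0.8, 0.7], [0.9, 0.1, 0.8]]
motion_cost = [[0.1, 0.7, 0.9], [0.8, 0.3, 0.6]]

total_cost = combine_costs([pose_cost, shape_cost, motion_cost], [1.0, 1.0, 1.0])
matches = associate(total_cost, max_cost=1.5)
print(matches)  # list of (track_index, detection_index) pairs
```

Because the association step only consumes a cost matrix, the same combined costs could be handed to any other solver, which is the sense in which such costs are independent of the association method.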