RESOUND: Towards Action Recognition without Representation Bias

ECCV 2018 · Yingwei Li, Yi Li, Nuno Vasconcelos ·

While large datasets have proven to be a key enabler for progress in computer vision, they can have biases that lead to erroneous conclusions. The notion of the representation bias of a dataset is proposed to combat this problem. It captures the fact that representations other than the ground-truth representation can achieve good performance on any given dataset. When this is the case, the dataset is said not to be well calibrated. Dataset calibration is shown to be a necessary condition for the standard state-of-the-art evaluation practice to converge to the ground-truth representation. A procedure, RESOUND, is proposed to quantify and minimize representation bias. Its application to the problem of action recognition shows that current datasets are biased towards static representations (objects, scenes and people). Two versions of RESOUND are studied. An Explicit RESOUND procedure is proposed to assemble new datasets by sampling existing datasets. An implicit RESOUND procedure is used to guide the creation of a new dataset, Diving48, of over 18,000 video clips of competitive diving actions, spanning 48 ï¬ne-grained dive classes. Experimental evaluation conï¬rms the eï¬ectiveness of RESOUND to reduce the static biases of current datasets.

PDF Abstract