State-of-the-art methods for semantic image segmentation are trained in a
supervised fashion using a large corpus of fully labeled training images.
However, in contrast to unlabeled data, gathering such a corpus is expensive
due to the human annotation effort involved. We propose an active learning-based
strategy, called CEREALS, in which a human only has to hand-label a few
automatically selected regions within an unlabeled image corpus. This
minimizes human annotation effort while maximizing the performance of a
semantic image segmentation method. The automatic selection procedure is
achieved by: a) combining a suitable information measure with an estimate of
human annotation effort, inferred from a learned cost model, and
b) exploiting the spatial coherency of an image. The performance of CEREALS is
demonstrated on Cityscapes, where we are able to reduce the annotation effort
to 17%, while keeping 95% of the mean Intersection over Union (mIoU) of a model
that was trained with the fully annotated training set of Cityscapes.
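The cost-aware selection idea can be illustrated with a minimal sketch: score fixed-size image regions by an information measure (here, pixel-wise prediction entropy, an assumed choice) divided by the cost model's estimated annotation effort, then greedily pick the highest-ratio regions. All function names, the region size, and the entropy-per-cost criterion are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def region_scores(prob_maps, cost_maps, region_size=8):
    """Score non-overlapping regions by information per unit annotation cost.

    prob_maps: (H, W, C) softmax outputs of the segmentation model.
    cost_maps: (H, W) per-pixel annotation effort predicted by a cost model
               (hypothetical interface, assumed for this sketch).
    Returns a list of ((row, col), score) for each region's top-left corner.
    """
    H, W, _ = prob_maps.shape
    # Pixel-wise entropy as the information measure (an assumed choice).
    entropy = -np.sum(prob_maps * np.log(prob_maps + 1e-12), axis=-1)
    scores = []
    for r in range(0, H - region_size + 1, region_size):
        for c in range(0, W - region_size + 1, region_size):
            info = entropy[r:r + region_size, c:c + region_size].sum()
            cost = cost_maps[r:r + region_size, c:c + region_size].sum()
            scores.append(((r, c), info / max(cost, 1e-6)))
    return scores

def select_regions(prob_maps, cost_maps, budget, region_size=8):
    """Greedily pick the `budget` regions with the best information-per-cost ratio."""
    scored = region_scores(prob_maps, cost_maps, region_size)
    scored.sort(key=lambda x: -x[1])
    return [pos for pos, _ in scored[:budget]]
```

Only the selected regions would then be shown to the annotator; the model is retrained on the growing labeled pool, exploiting spatial coherency by labeling contiguous regions rather than scattered pixels.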