In this paper, we study a new task called Unified Perceptual Parsing, which requires the machine vision systems to recognize as many visual concepts as possible from a given image.
Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding, ranking user preferences, ad placement, etc.
Per-pixel ground-truth depth data is challenging to acquire at scale.
#4 best model for Monocular Depth Estimation on KITTI Eigen split
Semantic segmentation is an important tool for visual scene understanding and a meaningful measure of uncertainty is essential for decision making.
We show that SegNet provides good performance with competitive inference time and more efficient inference memory-wise as compared to other architectures.
#2 best model for Scene Segmentation on SUN-RGBD
Although CNN has shown strong capability to extract semantics from raw pixels, its capacity to capture spatial relationships of pixels across rows and columns of an image is not fully explored.
SOTA for Lane Detection on TuSimple
Convolutional networks for image classification progressively reduce resolution until the image is represented by tiny feature maps in which the spatial structure of the scene is no longer discernible.
Real-time scene understanding has become crucial in many applications such as autonomous driving.
#3 best model for Real-Time Object Detection on PASCAL VOC 2007
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.