With the rapid development of data-driven techniques, data has played an essential role in various computer vision tasks.
Single-image room layout reconstruction aims to reconstruct the enclosed 3D structure of a room from a single image.
Then, we leverage the room layout prior, a strong structural constraint of the indoor scene, to guide the generation of target views.
This paper proposes a framework for interactive video object segmentation (VOS) in the wild, where users iteratively choose frames to annotate.
Recently, there has been growing interest in developing learning-based methods to detect and utilize salient semi-global or global structures, such as junctions, lines, planes, cuboids, smooth surfaces, and all types of symmetries, for 3D scene modeling and understanding.
In this paper, we present a novel framework to detect line segments in man-made environments.
In the first stage, we train a CNN to map each pixel to an embedding space where pixels from the same plane instance have similar embeddings.
Ranked #1 on Plane Instance Segmentation on NYU Depth v2
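The first-stage objective described above (pixels from the same plane instance should have similar embeddings) can be illustrated with a pull/push loss. The sketch below is only an assumption about what such an objective might look like: the function name `embedding_loss`, the margins `pull_margin` and `push_margin`, and the hinge form are hypothetical choices borrowed from generic discriminative-loss formulations, not details confirmed by the excerpt. It operates on already-computed per-pixel embeddings rather than on a CNN.

```python
import numpy as np

def embedding_loss(embeddings, labels, pull_margin=0.5, push_margin=1.5):
    """Toy pull/push loss over per-pixel embeddings.

    embeddings: (N, D) array, one row per pixel.
    labels:     (N,) integer array of plane-instance ids.
    Margins are illustrative assumptions, not values from the paper.
    """
    ids = np.unique(labels)
    # Mean embedding of each plane instance.
    centers = np.stack([embeddings[labels == i].mean(axis=0) for i in ids])

    # Pull term: penalize pixels farther than pull_margin from their own center.
    pull = 0.0
    for center, i in zip(centers, ids):
        dist = np.linalg.norm(embeddings[labels == i] - center, axis=1)
        pull += np.mean(np.maximum(dist - pull_margin, 0.0))
    pull /= len(ids)

    # Push term: penalize pairs of centers closer than push_margin.
    push = 0.0
    if len(ids) > 1:
        diff = centers[:, None, :] - centers[None, :, :]
        center_dist = np.linalg.norm(diff, axis=2)
        off_diagonal = ~np.eye(len(ids), dtype=bool)
        push = np.mean(np.maximum(push_margin - center_dist[off_diagonal], 0.0))

    return float(pull), float(push)

# Two tight, well-separated instances: both terms vanish.
good = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 0.0], [10.0, 0.0]])
good_labels = np.array([0, 0, 1, 1])
good_pull, good_push = embedding_loss(good, good_labels)

# Same points with instances mixed across the two locations: both terms are positive.
bad_labels = np.array([0, 1, 0, 1])
bad_pull, bad_push = embedding_loss(good, bad_labels)
```

With a loss of this shape, embeddings that separate instances cleanly incur no penalty, while embeddings that mix instances are penalized by both terms, which is the behavior the excerpt asks of the embedding space.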