In this work, (1) we propose a novel three-decoder architecture as the infrastructure for focused attention; (2) we use the generalized intersection box prediction task to effectively guide our model to focus on occlusion-specific regions; (3) our model achieves new state-of-the-art performance on distance-aware relationship detection.
Detecting 3D keypoints from point clouds is important for shape reconstruction, while this work investigates the dual question: can shape reconstruction benefit 3D keypoint detection?
Multi-task indoor scene understanding is widely considered an intriguing formulation, as the affinity of different tasks may lead to improved performance.
Ranked #27 on Semantic Segmentation on NYU Depth v2
Such a scheme has two limitations: 1) Storing and running several networks for different tasks is expensive for typical robotic platforms.
This paper aims to (1) summarize the fundamental problems and the state of the art associated with scene text recognition; (2) introduce new insights and ideas; (3) provide a comprehensive review of publicly available resources; (4) point out directions for future work.
To remedy this issue, we propose a decoupled attention network (DAN), which decouples the alignment operation from using historical decoding results.
Ranked #3 on Handwritten Text Recognition on IAM
Scene text recognition has attracted particular research interest because it is a very challenging problem and has various applications.