Second, we introduce a novel loss to explicitly enforce consistency across generated views both in space and in time.
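The snippet does not specify the exact form of this loss; a minimal sketch of one plausible instantiation, assuming we have per-view feature vectors for the same content at nearby viewpoints and time steps, would penalize deviation from their mean (the function name and shapes are hypothetical):

```python
import numpy as np

def consistency_loss(views):
    """Hypothetical spatio-temporal consistency loss: given features
    of the same content generated from several viewpoints/time steps
    (shape [n_views, feat_dim]), penalize squared deviation from the
    mean so all generated views agree."""
    v = np.asarray(views, dtype=np.float64)
    mean = v.mean(axis=0, keepdims=True)  # consensus across views
    return float(((v - mean) ** 2).mean())
```

Identical views incur zero loss; disagreeing views are penalized in proportion to their spread.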
To tackle this, we propose Neural Human Performer, a novel approach that learns generalizable neural radiance fields based on a parametric human body model for robust performance capture.
In this paper, we identify that the binary classifiers in existing proposal methods tend to overfit to the training categories.
Temporal correspondence, i.e., linking pixels or objects across frames, is a fundamental supervisory signal for video models.
1 code implementation • 17 Jun 2021 • Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen
DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision.
In this work, we propose Boundary Basis based Instance Segmentation (B2Inst) to learn a global boundary representation that can complement existing global-mask-based methods, which often lack high-frequency details.
In this paper, we propose to explicitly learn to imagine a storyline that bridges the visual gap.
In this paper, we investigate the problem of unpaired video-to-video translation.
Blind video decaptioning is a problem of automatically removing text overlays and inpainting the occluded parts in videos without any input masks.
The proposed variance loss allows a network to predict output scores for each frame with high discrepancy, which enables effective feature learning and significantly improves model performance.
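The snippet does not give the loss formula; one simple sketch consistent with the description, assuming per-frame importance scores, would penalize low variance so the scores cannot collapse to near-uniform values (the function name and the reciprocal form are assumptions):

```python
import numpy as np

def variance_loss(frame_scores, eps=1e-8):
    """Hypothetical variance loss for per-frame scores: a small
    variance (near-uniform predictions) yields a large loss,
    pushing the network toward high-discrepancy scores."""
    scores = np.asarray(frame_scores, dtype=np.float64)
    return 1.0 / (scores.var() + eps)
```

Under this sketch, a flat score vector is penalized far more heavily than one with clearly separated high- and low-importance frames.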
Ranked #1 on Unsupervised Video Summarization on SumMe
Self-supervised tasks such as colorization, inpainting and jigsaw puzzles have been utilized for visual representation learning on still images when labeled images are scarce or entirely absent.
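As a concrete illustration of one such pretext task, a jigsaw sample can be built by tiling an image, shuffling the tiles with one of a fixed set of permutations, and asking the network to classify which permutation was applied. The helper below is a minimal sketch (its name, grid size, and permutation set are illustrative, not from any specific paper):

```python
import itertools
import random
import numpy as np

def make_jigsaw_example(image, grid=2, num_perms=8, rng=None):
    """Hypothetical jigsaw pretext sample: split `image` into a
    grid x grid set of tiles, shuffle them with one of `num_perms`
    fixed permutations, and return (shuffled_tiles, permutation_label).
    A network is then trained to predict the label from the tiles."""
    rng = rng or random.Random(0)
    h, w = image.shape[:2]
    th, tw = h // grid, w // grid
    tiles = [image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
             for r in range(grid) for c in range(grid)]
    # Fixed permutation set shared between data generation and the classifier.
    perms = list(itertools.permutations(range(grid * grid)))[:num_perms]
    label = rng.randrange(len(perms))
    shuffled = [tiles[i] for i in perms[label]]
    return shuffled, label
```

Solving this classification forces the network to learn spatial layout and part relationships without any human labels.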
Ranked #28 on Self-Supervised Action Recognition on HMDB51
In this paper, we present a method that improves scene graph generation by explicitly modeling inter-dependency among all object instances.
Recovering from the aforementioned damage pushes the network to obtain robust and general-purpose representations.
Weakly supervised semantic segmentation and localization suffer from focusing only on the most discriminative parts of an image, since they use only image-level annotations.