To summarize a video's content properly, it is important to grasp the sequential structure of the video as well as the long-term dependencies between frames.
In this per-clip inference scheme, we update the memory at a fixed interval and simultaneously process the set of consecutive frames (i.e., a clip) between memory updates.
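The clip-wise update pattern can be illustrated with a minimal numpy sketch. The function name, the mean-feature memory, and the additive read are all hypothetical placeholders for whatever memory module the method actually uses; the point is only that every frame in a clip reads the same stale memory, which is refreshed once per clip.

```python
import numpy as np

def per_clip_inference(frames, clip_size=4):
    """Process frames clip-by-clip, updating a running memory only
    once per clip (hypothetical mean-feature memory for illustration).
    frames: (T, D) array of per-frame features."""
    memory = np.zeros(frames.shape[1])
    outputs = []
    for start in range(0, len(frames), clip_size):
        clip = frames[start:start + clip_size]
        # every frame in the clip reads the SAME (stale) memory
        outputs.extend(f + memory for f in clip)
        # the memory is updated once, after the whole clip
        memory = clip.mean(axis=0)
    return np.stack(outputs)
```

Compared with per-frame memory updates, this amortizes the update cost over `clip_size` frames at the price of slightly staler memory within a clip.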
The scheme first clusters compound target data based on style, discovering multiple latent domains (discover).
To find the uncertain points, we generate an inconsistency mask using the proposed adaptive pixel selector, and we label these segment-based regions to achieve near-supervised performance with only a small fraction (about 2.2%) of ground-truth points; we call this scheme Segment-based Pixel-Labeling (SPL).
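The core idea of the inconsistency mask can be sketched in a few lines of numpy. This is a simplification under assumed inputs (two class-prediction maps, e.g. from a frame and its warped neighbor); the actual adaptive pixel selector is more involved.

```python
import numpy as np

def inconsistency_mask(pred_a, pred_b):
    """Mark pixels where two class-prediction maps disagree;
    these are the uncertain points worth spending labels on."""
    return pred_a != pred_b

# hypothetical predictions on a 4x4 image with 2 classes
a = np.array([[0, 0, 1, 1]] * 4)
b = np.array([[0, 1, 1, 1]] * 4)
mask = inconsistency_mask(a, b)
```

Only the masked pixels (here 4 of 16) would be sent for annotation, which is how the labeling budget stays at a small fraction of all points.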
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
Temporal correspondence - linking pixels or objects across frames - is a fundamental supervisory signal for video models.
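A common way to instantiate temporal correspondence is nearest-neighbor matching of per-pixel features across frames. The sketch below is a generic cosine-similarity matcher under assumed flattened feature maps, not the specific formulation of any one paper.

```python
import numpy as np

def match_pixels(feat_t, feat_t1):
    """Link each pixel in frame t to its nearest neighbor in frame t+1
    by cosine similarity. feat_t, feat_t1: (N, D) per-pixel features."""
    a = feat_t / np.linalg.norm(feat_t, axis=1, keepdims=True)
    b = feat_t1 / np.linalg.norm(feat_t1, axis=1, keepdims=True)
    sim = a @ b.T            # (N, N) pairwise cosine similarity
    return sim.argmax(axis=1)  # index of the best match per pixel
```

Cycle-consistency of such matches (t → t+1 → t) is what many self-supervised video methods turn into a training signal.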
However, since only confident predictions are taken as pseudo labels, existing self-training approaches inevitably produce sparse pseudo labels in practice.
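Why confidence thresholding yields sparse labels is easy to see in code. This is a generic sketch of the standard pseudo-labeling recipe, with a hypothetical threshold value; `-1` marks pixels that receive no label at all.

```python
import numpy as np

def pseudo_labels(probs, threshold=0.9):
    """Keep only confident argmax predictions.
    probs: (N, C) class probabilities; returns (N,) labels,
    with -1 for pixels below the confidence threshold."""
    conf = probs.max(axis=-1)
    labels = probs.argmax(axis=-1)
    labels[conf < threshold] = -1  # these pixels stay unlabeled
    return labels
```

Every pixel whose top probability falls under the threshold is dropped, so the harder (and often most informative) regions end up unsupervised.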
In this work, we propose Boundary Basis based Instance Segmentation (B2Inst) to learn a global boundary representation that can complement existing global-mask-based methods, which often lack high-frequency details.
In this paper, we propose to explicitly learn to imagine a storyline that bridges the visual gap.
In this paper, we investigate the problem of unpaired video-to-video translation.
Blind video decaptioning is a problem of automatically removing text overlays and inpainting the occluded parts in videos without any input masks.
The proposed variance loss allows a network to predict output scores with high discrepancy across frames, which enables effective feature learning and significantly improves model performance.
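A minimal version of such a loss is the negative variance of the per-frame scores: minimizing it pushes scores apart and rules out the degenerate flat-score solution. This sketch assumes the loss acts directly on a score vector; the actual loss in the paper may differ in form.

```python
import numpy as np

def variance_loss(scores):
    """Negative variance of per-frame importance scores.
    Minimizing this value encourages high discrepancy between
    frame scores instead of a uniform (uninformative) output."""
    return -np.var(scores)
```

For example, a flat prediction `[0.5, 0.5]` gives loss 0, while the more discriminative `[0.0, 1.0]` gives a lower (better) loss of -0.25.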
In this paper, we present a method that improves scene graph generation by explicitly modeling inter-dependency among all object instances.
We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks.
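CBAM sequentially applies channel attention and then spatial attention to a feature map. The numpy sketch below follows that two-stage structure but drops the learned parts (the shared MLP and the 7x7 convolution are replaced by identities), so it is a structural illustration, not the trained module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_like(feat):
    """CBAM-style refinement of a (C, H, W) feature map:
    channel attention from avg/max-pooled descriptors, then
    spatial attention from channel-pooled maps. The learned
    MLP and 7x7 conv of real CBAM are omitted (identity)."""
    # channel attention: pool over spatial dims, gate each channel
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    ch_att = sigmoid(avg + mx)[:, None, None]
    feat = feat * ch_att
    # spatial attention: pool over channels, gate each location
    s_avg = feat.mean(axis=0)
    s_max = feat.max(axis=0)
    sp_att = sigmoid(s_avg + s_max)[None]
    return feat * sp_att
```

Because both attention maps are elementwise gates in (0, 1), the module refines features without changing their shape, which is what lets CBAM drop into existing feed-forward CNNs.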