Motion-based video frame interpolation commonly relies on optical flow to warp pixels from the inputs to the desired interpolation instant.
Specifically, splatting can be used to warp the input images to an arbitrary temporal location based on an optical flow estimate.
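To make the idea of splatting concrete, here is a minimal sketch of forward warping to an intermediate time t. It is not the method from the abstract. It assumes linear motion (the flow is simply scaled by t), rounds each target to the nearest pixel, and averages pixels that collide. The name `splat_forward` and the list-based image layout are hypothetical choices made for illustration; practical systems typically use bilinear kernels and a more careful weighting of colliding pixels.

```python
def splat_forward(image, flow, t):
    """Forward-warp `image` to time t in [0, 1] by average splatting.

    image: H x W nested list of floats.
    flow:  H x W nested list of (dx, dy) tuples, in pixels, describing
           the motion from this frame to the next one.
    """
    height, width = len(image), len(image[0])
    accum = [[0.0] * width for _ in range(height)]
    hits = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            # Linear motion assumption: at time t a pixel has moved t * flow.
            tx = int(round(x + t * dx))
            ty = int(round(y + t * dy))
            if 0 <= tx < width and 0 <= ty < height:
                accum[ty][tx] += image[y][x]
                hits[ty][tx] += 1
    # Average colliding contributions; pixels that received nothing are holes (None).
    return [
        [accum[y][x] / hits[y][x] if hits[y][x] else None for x in range(width)]
        for y in range(height)
    ]


frame = [[1.0, 2.0, 3.0, 4.0]]
flow = [[(2.0, 0.0)] * 4]               # every pixel moves 2 px to the right
halfway = splat_forward(frame, flow, 0.5)
print(halfway)                           # [[None, 1.0, 2.0, 3.0]]
```

The hole at the left edge and the pixel pushed out on the right illustrate why splatting-based interpolation usually warps both input frames and fuses the results.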
LDET leads to significant improvements on many datasets in the open-world instance segmentation task, outperforming baselines on cross-category generalization on COCO, as well as cross-dataset evaluation on UVO and Cityscapes.
Recyclable waste detection poses a unique computer vision challenge as it requires detection of highly deformable and often translucent objects in cluttered scenes without the kind of context information usually present in human-centric datasets.
Recently, most ZSS methods focus on learning the visual-semantic correspondence to transfer knowledge from seen classes to unseen classes at the pixel level.
The proposed architecture relies on our fast spatial attention, a simple yet efficient modification of the popular self-attention mechanism that captures the same rich spatial context at a small fraction of the computational cost by changing the order of operations.
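The saving from reordering operations can be illustrated with plain matrix algebra. The sketch below is an illustration under an assumption, not the architecture's actual layer: it assumes the softmax of standard self-attention is replaced by a normalization that keeps the product linear (omitted here), so that (Q Kᵀ) V can be regrouped as Q (Kᵀ V). For n spatial positions and c channels, the first grouping builds an n x n matrix at a cost on the order of n²c, while the second builds a c x c matrix at a cost on the order of nc². All function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention_quadratic(q, k, v):
    # (Q K^T) V: materializes an n x n affinity matrix first.
    return matmul(matmul(q, transpose(k)), v)


def attention_reordered(q, k, v):
    # Q (K^T V): materializes only a c x c matrix first.
    return matmul(q, matmul(transpose(k), v))


# n = 3 positions, c = 2 channels; integer values keep the comparison exact.
q = [[1, 0], [0, 1], [1, 1]]
k = [[1, 2], [3, 4], [5, 6]]
v = [[1, 0], [0, 1], [2, 2]]

out_quadratic = attention_quadratic(q, k, v)
out_reordered = attention_reordered(q, k, v)
print(out_reordered)  # [[11, 13], [14, 16], [25, 29]]
```

Both groupings return the same result by associativity of matrix multiplication; the difference is only in the size of the intermediate matrix, which matters when n (the number of pixels) is far larger than c.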
Ranked #26 on Semantic Segmentation on DensePASS
We present TDNet, a temporally distributed network designed for fast and accurate video semantic segmentation.
Ranked #2 on Video Semantic Segmentation on Cityscapes val
Learning from a few examples is a challenging task for machine learning.
Then, the proposed Cascaded Refinement Network (CRN) takes the coarse segmentation as guidance to generate an accurate full-resolution segmentation.
Our method drives the network to learn a Level Set function for salient objects so it can output more accurate boundaries and compact saliency.
Hence we propose a new class of LSTM network, Global Context-Aware Attention LSTM (GCA-LSTM), for 3D action recognition, which is able to selectively focus on the informative joints in the action sequence with the assistance of global contextual information.
Ranked #5 on One-Shot 3D Action Recognition on NTU RGB+D 120