Self-Attention has become prevalent in computer vision models.
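Since the excerpt mentions self-attention without detail, here is a minimal NumPy sketch of scaled dot-product self-attention; the function name, shapes, and projection matrices are illustrative assumptions, not taken from any particular paper above.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """x: (n, d) token features; wq/wk/wv: (d, d) projection matrices.
    Illustrative sketch of scaled dot-product self-attention."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])           # pairwise similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over keys
    return weights @ v                               # weighted sum of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (4, 8): one attended feature per input token
```

Each output row is a convex combination of the value vectors, which is what lets attention aggregate context from anywhere in the input, unlike a spatially local convolution.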
1 code implementation • 17 Jun 2021 • Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen
DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision.
We name this joint task Depth-aware Video Panoptic Segmentation, and propose a new evaluation metric, along with two derived datasets, which will be made available to the public.
Batch normalization (BN) is a fundamental unit in modern deep networks, in which a linear transformation module is designed to improve BN's flexibility in fitting complex data distributions.
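To make the mechanism concrete, here is a hedged NumPy sketch of batch normalization followed by the learnable linear (affine) transformation the excerpt refers to; the function and parameter names are illustrative assumptions.

```python
import numpy as np

def batch_norm_affine(x, gamma, beta, eps=1e-5):
    """Normalize each channel over the batch dimension, then apply the
    learnable linear transformation gamma * x_hat + beta.
    x: (batch, channels); gamma, beta: (channels,). Illustrative sketch."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance per channel
    return gamma * x_hat + beta             # linear module restores flexibility

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
out = batch_norm_affine(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))  # close to 0 for every channel
```

With gamma = 1 and beta = 0 the output is simply the normalized activations; training gamma and beta lets the layer recover any per-channel scale and shift the normalization removed.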
Wide Residual Networks (Wide-ResNets), a shallow but wide variant of Residual Networks (ResNets) built by stacking a small number of residual blocks with large channel sizes, have demonstrated outstanding performance on multiple dense prediction tasks.
Ranked #1 on Panoptic Segmentation on Cityscapes val (using extra training data)
In this paper, we explore this mechanism in the backbone design for object detection.
Ranked #8 on Panoptic Segmentation on COCO test-dev
The key to a successful cascade architecture for precise instance segmentation is to fully leverage the relationship between bounding box detection and mask segmentation across multiple stages.
To address this issue, we propose Batch-Channel Normalization (BCN), which uses batch knowledge to avoid the elimination singularities in the training of channel-normalized models.
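The excerpt names BCN without detail; a minimal NumPy sketch of the idea, assuming it composes a batch-normalization step with a channel-wise normalization step (a simplification of the actual method):

```python
import numpy as np

def batch_channel_norm(x, eps=1e-5):
    """Hedged sketch of a batch step followed by a channel step.
    x: (batch, channels). Real BCN also has learnable parameters and
    running statistics, which are omitted here for clarity."""
    # Batch step: normalize each channel over the batch,
    # injecting batch knowledge into the statistics.
    xb = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    # Channel step: normalize each sample over its channels,
    # as a channel-normalized model would.
    xc = (xb - xb.mean(axis=1, keepdims=True)) / np.sqrt(
        xb.var(axis=1, keepdims=True) + eps)
    return xc

rng = np.random.default_rng(2)
x = rng.normal(size=(32, 8))
out = batch_channel_norm(x)
```

The batch step keeps channel statistics tied to the data distribution, which is the kind of batch knowledge the text credits with avoiding elimination singularities in purely channel-normalized training.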
Despite deep convolutional neural networks' great success in object classification, they suffer from a severe drop in generalization performance under occlusion due to the inconsistency between training and testing data.
Batch Normalization (BN) has become an out-of-the-box technique for improving deep network training.
Ranked #38 on Instance Segmentation on COCO minival
By simply replacing standard optimizers with Neural Rejuvenation, we are able to improve the performance of neural networks by a very large margin while using similar training effort and maintaining their original resource usage.
Computer vision is difficult, partly because the desired mathematical function connecting input and output data is often complex, fuzzy and thus hard to learn.
In this paper, we propose that the robustness of a face detector against hard faces can be improved by learning small faces on hard images.
Ranked #2 on Face Detection on PASCAL Face
We focus on the problem of training a deep neural network in generations.
Convolution is spatially symmetric, i.e., visual features are independent of their position in the image, which limits convolution's ability to utilize contextual cues for visual recognition.
Our models are based on the idea of encoding objects in terms of visual concepts, which are interpretable visual cues represented by the feature vectors within CNNs.
Our motivation is to enrich the semantics of object detection features within a typical deep detector, via a semantic segmentation branch and a global activation module.
Our method introduces computation orderings for the channels within convolutional layers or blocks, based on which the outputs are gradually computed in a channel-wise manner.
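A minimal NumPy sketch of channel-wise progressive computation under an ordering; the function name, the linear layer, and the permutation are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def progressive_channels(x, w, order):
    """Compute a layer's output channels one at a time, following a given
    ordering, so computation could stop early with a partial result.
    x: (d,) input; w: (c, d) channel weights; order: permutation of range(c)."""
    out = np.zeros(w.shape[0])
    for ch in order:
        out[ch] = w[ch] @ x   # each channel is computed independently, in order
        # ... an early-exit criterion could break here with a partial output
    return out

rng = np.random.default_rng(3)
x = rng.normal(size=(5,))
w = rng.normal(size=(3, 5))
full = progressive_channels(x, w, order=[2, 0, 1])
```

Run to completion, the result matches the ordinary matrix product `w @ x`; the ordering only changes which channels are available first if computation is truncated.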
In this work, we address these limitations of CNNs by developing novel, flexible, and interpretable models for few-shot learning.
In this paper, we are interested in the few-shot learning problem.
We argue that estimation of object scales in images is helpful for generating object proposals, especially for supermarket images where object scales are usually within a small range.