When fine-tuning on downstream tasks, a modality-specific adapter is used to inject prior information about the data and tasks into the model, making it suitable for these tasks.
Ranked #1 on Semantic Segmentation on ADE20K val
The sensor-data detection model, based on Gaussian and Bayesian algorithms, can detect anomalous sensor data in real time and upload it to the cloud for further analysis, while filtering out normal sensor data to reduce traffic load.
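As a rough illustration of the Gaussian side of such a detector, the sketch below flags readings whose z-score under a fitted normal distribution exceeds a threshold; the threshold value and the split into "upload" vs. "keep local" are assumptions for illustration, not the paper's exact algorithm.

```python
import statistics

def detect_anomalies(readings, threshold=2.0):
    """Flag readings whose z-score under a fitted Gaussian exceeds threshold.

    A minimal sketch of Gaussian-based sensor filtering: anomalous readings
    would be uploaded to the cloud, normal ones kept locally. The threshold
    of 2.0 is an assumed value, not taken from the paper.
    """
    mean = statistics.mean(readings)
    std = statistics.pstdev(readings)
    anomalies, normal = [], []
    for x in readings:
        z = abs(x - mean) / std if std > 0 else 0.0
        (anomalies if z > threshold else normal).append(x)
    return anomalies, normal

# A temperature trace with one spike: only the spike is flagged.
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 55.0, 20.1]
anomalies, normal = detect_anomalies(readings)
```

A Bayesian variant would replace the hard z-score cut with a posterior probability of the anomaly class, but the filtering structure stays the same.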
In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries.
Ranked #74 on 3D Object Detection on nuScenes
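To make the "predefined grid-shaped BEV queries" concrete, the sketch below builds a small grid of query embeddings, each tied to a metric (x, y) reference point on the bird's-eye-view plane; the grid size, embedding width, and perception range are assumed placeholder values, not BEVFormer's actual configuration.

```python
import numpy as np

def make_bev_queries(bev_h=4, bev_w=4, embed_dim=8, pc_range=(-51.2, 51.2)):
    """Sketch of grid-shaped BEV queries (all sizes are assumptions).

    Each query is an embedding tied to one cell of a bird's-eye-view grid;
    its (x, y) reference point is where spatial cross-attention would sample
    image features, while temporal self-attention would align the query with
    the previous frame's BEV features.
    """
    rng = np.random.default_rng(0)
    queries = rng.normal(size=(bev_h * bev_w, embed_dim))  # one query per BEV cell
    lo, hi = pc_range
    ys, xs = np.meshgrid(
        np.linspace(lo, hi, bev_h), np.linspace(lo, hi, bev_w), indexing="ij"
    )
    ref_points = np.stack([xs.ravel(), ys.ravel()], axis=-1)  # metric (x, y) per query
    return queries, ref_points

q, ref = make_bev_queries()
```

The key design point is that the query grid, not the image, defines the output space, so features from any number of cameras and timestamps can be aggregated into the same BEV representation.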
Although convolutional neural networks (CNNs) have achieved remarkable progress in weakly supervised semantic segmentation (WSSS), the effective receptive field of CNN is insufficient to capture global context information, leading to sub-optimal results.
Deep learning-based models encounter challenges when processing long-tailed data in the real world.
Ranked #1 on Long-tail Learning on ImageNet-LT (using extra training data)
We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector).
Ranked #2 on Scene Text Detection on SCUT-CTW1500
Recent approaches for end-to-end text spotting have achieved promising results.
Specifically, we supervise the attention modules in the mask decoder in a layer-wise manner.
Ranked #2 on Panoptic Segmentation on COCO test-dev
The results presented in this study demonstrate the importance of multi-head models and attention mechanisms for improving estimates of the remaining useful life of industrial assets.
Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features; and 2) designing an effective mechanism for fusing these features.
Ranked #2 on Medical Image Segmentation on CVC-ColonDB
We hope this work will facilitate state-of-the-art Transformer research in computer vision.
Ranked #39 on Object Detection on COCO minival
We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders.
Ranked #1 on Semantic Segmentation on COCO-Stuff full
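The lightweight all-MLP decoder can be sketched as follows: multi-level features are each linearly projected to a common width, upsampled to the finest resolution, concatenated, and fused by one more linear layer. All channel widths and level shapes here are assumed toy values, and nearest-neighbor upsampling stands in for the interpolation a real head would use.

```python
import numpy as np

def mlp_decode(features, out_dim=4, seed=0):
    """Sketch of an all-MLP decoder head (dimensions are assumptions).

    Takes multi-level features of shape (C_i, H_i, W_i), projects each with a
    per-pixel linear layer, upsamples to the finest level, concatenates, and
    fuses with one more linear layer -- no convolutions anywhere.
    """
    rng = np.random.default_rng(seed)
    th, tw = features[0].shape[1:]            # target size: the finest level
    projected = []
    for f in features:
        c, h, w = f.shape
        proj = rng.normal(size=(out_dim, c)) @ f.reshape(c, -1)    # per-pixel linear
        proj = proj.reshape(out_dim, h, w)
        up = proj.repeat(th // h, axis=1).repeat(tw // w, axis=2)  # nearest upsample
        projected.append(up)
    cat = np.concatenate(projected)           # (levels * out_dim, th, tw)
    fuse = rng.normal(size=(out_dim, cat.shape[0]))
    return (fuse @ cat.reshape(cat.shape[0], -1)).reshape(out_dim, th, tw)

# Three pyramid levels with growing channels and shrinking resolution.
feats = [np.ones((8, 16, 16)), np.ones((16, 8, 8)), np.ones((32, 4, 4))]
seg = mlp_decode(feats)
```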
Extensive experiments demonstrate the effectiveness of both PolarMask and PolarMask++, which achieve competitive results on instance segmentation in the challenging COCO dataset with single-model and single-scale training and testing, as well as new state-of-the-art results on rotated text detection and cell segmentation.
Ranked #47 on Instance Segmentation on COCO test-dev (using extra training data)
By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text.
StarCraft II (SC2) is a real-time strategy game in which players produce and control multiple units to fight against the opponent's units.
(1) We divide the input image into small patches and adopt TIN, successfully transferring image style at arbitrarily high resolution.
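The patch-splitting step can be sketched as below: the image is zero-padded to a tile multiple, cut into non-overlapping tiles that can be stylized independently, and reassembled afterward. The patch size and padding scheme are assumptions for illustration.

```python
import numpy as np

def split_into_patches(image, patch=64):
    """Split an H x W x C image into non-overlapping patch x patch tiles.

    Sketch of the patch-wise strategy (patch size is an assumed value): each
    tile can be processed independently, so arbitrarily high-resolution
    inputs fit in memory.
    """
    h, w, c = image.shape
    ph, pw = -(-h // patch), -(-w // patch)   # ceil-divide to count tiles
    padded = np.zeros((ph * patch, pw * patch, c), image.dtype)
    padded[:h, :w] = image                    # zero-pad to a tile multiple
    tiles = (padded.reshape(ph, patch, pw, patch, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(-1, patch, patch, c))
    return tiles, (ph, pw, h, w)

def merge_patches(tiles, meta, patch=64):
    """Inverse of split_into_patches: reassemble tiles and crop the padding."""
    ph, pw, h, w = meta
    c = tiles.shape[-1]
    grid = (tiles.reshape(ph, pw, patch, patch, c)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(ph * patch, pw * patch, c))
    return grid[:h, :w]

img = np.arange(100 * 130 * 3, dtype=np.float32).reshape(100, 130, 3)
tiles, meta = split_into_patches(img)
```

In a real pipeline a component such as TIN would also have to keep style statistics consistent across tiles, since naive per-tile stylization produces visible seams.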
Unlike the recently-proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose the Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting the Transformer to various dense prediction tasks.
Ranked #20 on Semantic Segmentation on DensePASS
Unlike most recent methods that focus on improving the accuracy of image classification, we present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between the global image and local image patches to learn discriminative representations for object detection.
This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset.
Ranked #3 on Semantic Segmentation on Trans10K
Although a polygon is a more accurate representation than an upright bounding box for text detection, the annotations of polygons are extremely expensive and challenging.
Such a property makes the distribution statistics of a bounding box highly correlated to its real localization quality.
Ranked #37 on Object Detection on COCO test-dev
Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.
The modules of HGG can be trained end-to-end with the keypoint detection network and are able to supervise the grouping process in a hierarchical manner.
Ranked #3 on 2D Human Pose Estimation on OCHuman
Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent arbitrary distribution of box locations.
Ranked #76 on Object Detection on COCO test-dev
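The "arbitrary distribution of box locations" can be sketched as follows: instead of regressing one scalar per box side, the head predicts logits over a set of discrete offsets, and the softmax-weighted expectation gives the final offset. The number of bins and the example logits below are assumed values for illustration.

```python
import numpy as np

def expected_offset(logits, reg_max=7):
    """Sketch of a distribution-based box side representation.

    The head predicts logits over reg_max + 1 discrete offsets; softmax turns
    them into a distribution whose expectation is the regressed offset. A flat
    distribution then signals uncertain localization, which is what makes the
    distribution statistics correlate with localization quality.
    """
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    bins = np.arange(reg_max + 1)
    return float((probs * bins).sum())     # expected offset, in bin units

# A sharply peaked distribution yields a confident offset near its mode.
peaked = np.array([0., 0., 0., 8., 0., 0., 0., 0.])
off = expected_offset(peaked)
```

Merging this with the classification branch then amounts to using the IoU-aware quality score directly as the class confidence, so one joint vector serves both ranking and localization.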
To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations; it is 10 times larger than existing datasets.
Ranked #4 on Semantic Segmentation on Trans10K
In this paper, we consider a hierarchical-control-based DC microgrid (DCmG) equipped with unknown input observer (UIO) based detectors, and investigate potential false data injection (FDI) attacks and a distributed countermeasure.
In this paper, we introduce an anchor-box-free, single-shot instance segmentation method that is conceptually simple, fully convolutional, and can be used as a mask prediction module for instance segmentation by easily embedding it into most off-the-shelf detection methods.
Ranked #61 on Instance Segmentation on COCO test-dev
Nonetheless, most previous methods may not work well in recognizing low-resolution text, which is often seen in natural scene images.
Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low-computational-cost segmentation head and learnable post-processing.
Ranked #1 on Scene Text Detection on MSRA-TD500
Because there are large geometric margins among the minimal-scale kernels, our method is effective in splitting close text instances, making it easier for segmentation-based methods to detect arbitrarily-shaped text instances.
Ranked #9 on Scene Text Detection on SCUT-CTW1500
A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches.
Ranked #82 on Image Classification on CIFAR-100
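The branch-fusion idea in the SK unit can be sketched numerically: pooled branch descriptors are summed, squeezed through a small layer, and a softmax across branches produces per-channel selection weights. The shapes, the tiny fully-connected layers, and the tanh nonlinearity below are assumptions standing in for the paper's actual squeeze design.

```python
import numpy as np

def sk_fuse(branch_feats, z_dim=4, seed=0):
    """Sketch of Selective Kernel fusion (shapes and layers are assumptions).

    branch_feats: list of per-branch, globally pooled features of shape (C,),
    e.g. from 3x3 and 5x5 convolution branches. The branches are summed,
    squeezed to a compact vector z, and a softmax across branches yields
    per-channel attention that selects how much each kernel size contributes.
    """
    feats = np.stack(branch_feats)                      # (num_branches, C)
    rng = np.random.default_rng(seed)
    c = feats.shape[1]
    s = feats.sum(axis=0)                               # fuse: element-wise sum
    z = np.tanh(rng.normal(size=(z_dim, c)) @ s)        # squeeze to compact z
    logits = rng.normal(size=(feats.shape[0], c, z_dim)) @ z   # (branches, C)
    att = np.exp(logits) / np.exp(logits).sum(axis=0)   # softmax across branches
    return (att * feats).sum(axis=0)                    # attention-weighted select

# Two branches with constant features 1 and 2: the fused result is a
# per-channel convex combination, so every value lies between 1 and 2.
fused = sk_fuse([np.ones(8), 2 * np.ones(8)])
```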
To address these problems, we propose a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance.
Ranked #10 on Scene Text Detection on SCUT-CTW1500
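The progressive scale expansion step can be sketched as a breadth-first label growth on binary kernel maps: connected components of the smallest kernel seed the labels, which then expand into each successively larger kernel, and a pixel keeps the first label that reaches it. The tiny 1x5 example is an assumed toy input.

```python
from collections import deque

def progressive_scale_expansion(kernels):
    """Sketch of progressive scale expansion on binary kernel maps.

    `kernels` is ordered smallest to largest. Labels seeded from the minimal
    kernel's connected components grow one larger kernel at a time via BFS;
    a pixel keeps the first label that reaches it, which separates adjacent
    text instances even when their full masks touch.
    """
    h, w = len(kernels[0]), len(kernels[0][0])
    labels = [[0] * w for _ in range(h)]
    next_label = 1
    for y in range(h):                       # seed: label components of kernel 0
        for x in range(w):
            if kernels[0][y][x] and not labels[y][x]:
                labels[y][x] = next_label
                q = deque([(y, x)])
                while q:
                    cy, cx = q.popleft()
                    for ny, nx in ((cy-1,cx),(cy+1,cx),(cy,cx-1),(cy,cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and kernels[0][ny][nx] \
                                and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            q.append((ny, nx))
                next_label += 1
    for k in kernels[1:]:                    # expand into each larger kernel
        q = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while q:
            cy, cx = q.popleft()
            for ny, nx in ((cy-1,cx),(cy+1,cx),(cy,cx-1),(cy,cx+1)):
                if 0 <= ny < h and 0 <= nx < w and k[ny][nx] and not labels[ny][nx]:
                    labels[ny][nx] = labels[cy][cx]
                    q.append((ny, nx))
    return labels

# Two seeds whose full-size masks touch: expansion keeps them distinct.
kernels = [[[1, 0, 0, 0, 1]],   # minimal kernel: two separate seeds
           [[1, 1, 1, 1, 1]]]   # full mask: a single connected component
labels = progressive_scale_expansion(kernels)
```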
Based on an analysis revealing the equivalence of modern networks, we find that both ResNet and DenseNet are essentially derived from the same "dense topology", and differ only in the form of connection -- addition (dubbed "inner link") vs. concatenation (dubbed "outer link").
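The two connection forms reduce to a one-line difference, sketched here on toy feature vectors: an inner link adds the layer output to the incoming feature, while an outer link concatenates them, growing the channel dimension.

```python
import numpy as np

# Toy illustration of the "dense topology" view: both link types pass the
# earlier feature forward; they differ only in how features are combined.
x_prev = np.ones(4)        # feature arriving from an earlier layer
f_x = 2 * np.ones(4)       # output of the current layer applied to x_prev

inner = x_prev + f_x                   # ResNet-style addition ("inner link")
outer = np.concatenate([x_prev, f_x])  # DenseNet-style concat ("outer link")
```

Addition keeps the channel width constant but mixes features irreversibly, whereas concatenation preserves every earlier feature at the cost of linearly growing width, which is exactly the trade-off the dense-topology view makes explicit.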