While image classification models have recently continued to advance, most downstream applications such as object detection and semantic segmentation still employ ResNet variants as the backbone network due to their simple and modular structure.
IMAGE CLASSIFICATION, INSTANCE SEGMENTATION, OBJECT DETECTION, SEMANTIC SEGMENTATION
We propose a method for converting a single RGB-D input image into a 3D photo: a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view.
Model efficiency has become increasingly important in computer vision.
#2 best model for Object Detection on COCO test-dev
Imaging in low light is challenging due to low photon count and low SNR.
Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem.
Object detection and re-identification, the core components of multi-object tracking, have seen remarkable progress in recent years.
SOTA for Multi-Object Tracking on MOT16 (using extra training data)
To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer.
#6 best model for Language Modelling on enwiki8
Experiments on several challenging datasets demonstrate the superiority of GroupDNet on the SMIS task.
Neural-based end-to-end approaches to natural language generation (NLG) from structured data or knowledge are data-hungry, which makes them difficult to adopt in real-world applications where data is limited.