This is the first use of sparse convolution for 2D masked modeling.
Ranked #1 on
Instance Segmentation
on COCO 2017 val
Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
In this paper, we demonstrate that a learned discrete codebook prior in a small proxy space largely reduces the uncertainty and ambiguity of restoration mapping by casting blind face restoration as a code prediction task, while providing rich visual atoms for generating high-quality faces.
Ranked #1 on
Blind Face Restoration
on WIDER
In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series.
Ranked #12 on
Real-Time Object Detection
on COCO
This paper deals with the problem of audio source separation.
Deep learning methods are particularly promising for modeling many remote sensing tasks given the success of deep neural networks in similar computer vision tasks and the sheer volume of remotely sensed imagery available.
We propose WOFT -- a novel method for planar object tracking that estimates a full 8 degrees-of-freedom pose, i. e. the homography w. r. t.
The expressive power of Graph Neural Networks (GNNs) has been studied extensively through the Weisfeiler-Leman (WL) graph isomorphism test.
Many algorithms also rely solely on annotator statistics, ignoring the features of the examples from which the annotations derive.
Large-scale text-to-image diffusion models have made amazing advances.
Ranked #1 on
Text-to-Image Generation
on COCO