In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series.
Ranked #12 on
Real-Time Object Detection
on COCO
Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
This paper deals with the problem of audio source separation.
We introduce a simple framework that operates on 3D points of single objects or whole scenes coupled with category-agnostic large-scale training from diverse RGB-D videos.
The expressive power of Graph Neural Networks (GNNs) has been studied extensively through the Weisfeiler-Leman (WL) graph isomorphism test.
Deep learning methods are particularly promising for modeling many remote sensing tasks given the success of deep neural networks in similar computer vision tasks and the sheer volume of remotely sensed imagery available.
Many algorithms also rely solely on annotator statistics, ignoring the features of the examples from which the annotations derive.
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.
Ranked #1 on
Language Modelling
on CLUE (CMRC2018)
Large-scale text-to-image diffusion models have made amazing advances.
Ranked #1 on
Text-to-Image Generation
on COCO
We present a simple, efficient, and scalable unfolding network, SAUNet, to simplify the network design with an adaptive alternate optimization framework for hyperspectral image (HSI) reconstruction.