Images Speak in Images: A Generalist Painter for In-Context Visual Learning

baaivision/painter 5 Dec 2022

In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and to specify task prompts as images as well.

Keypoint Detection, Semantic Segmentation

111
1.33 stars / hour

DAMO-YOLO : A Report on Real-Time Object Detection Design

tinyvision/damo-yolo 23 Nov 2022

In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series.

Neural Architecture Search, object-detection, +1

412
0.98 stars / hour

DiffusionInst: Diffusion Model for Instance Segmentation

chenhaoxing/DiffusionInst 6 Dec 2022

This paper proposes DiffusionInst, a novel framework that represents instances as instance-aware filters and formulates instance segmentation as a noise-to-filter denoising process.

Image Denoising, Image Generation, +3

64
0.95 stars / hour

Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model

wyhuai/ddnm 1 Dec 2022

Most existing Image Restoration (IR) models are task-specific and cannot be generalized to different degradation operators.

Colorization, Deblurring, +7

176
0.90 stars / hour

SimVTP: Simple Video Text Pre-training with Masked Autoencoders

mayuelala/simvtp 7 Dec 2022

This paper presents SimVTP: a Simple Video-Text Pretraining framework via masked autoencoders.

Contrastive Learning, Text Matching

48
0.83 stars / hour

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

opengvlab/internvideo 6 Dec 2022

Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications.

Action Classification, Action Recognition, +6

44
0.63 stars / hour

Melody transcription via generative pre-training

chrisdonahue/sheetsage 4 Dec 2022

The combination of generative pre-training and a new dataset for this task results in 77% stronger performance on melody transcription relative to the strongest available baseline.

Chord Recognition, Information Retrieval, +2

55
0.61 stars / hour

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

baaivision/eva 14 Nov 2022

We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data.

Action Classification, Action Recognition, +7

346
0.59 stars / hour

ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT

extreme-bert/extreme-bert 30 Nov 2022

In this paper, we present ExtremeBERT, a toolkit for accelerating and customizing BERT pretraining.

Molecular System Prediction, Sentence Classification

238
0.51 stars / hour