Images Speak in Images: A Generalist Painter for In-Context Visual Learning

baaivision/painter 5 Dec 2022

In this work, we present Painter, a generalist model that addresses these obstacles with an "image"-centric solution: it redefines the output of core vision tasks as images and specifies task prompts as images as well.

Keypoint Detection, Semantic Segmentation

106
1.33 stars / hour

DAMO-YOLO : A Report on Real-Time Object Detection Design

tinyvision/damo-yolo 23 Nov 2022

In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series.

Neural Architecture Search, object-detection, +1

394
0.98 stars / hour

DiffusionInst: Diffusion Model for Instance Segmentation

chenhaoxing/DiffusionInst 6 Dec 2022

This paper proposes DiffusionInst, a novel framework that represents instances as instance-aware filters and formulates instance segmentation as a noise-to-filter denoising process.

Image Denoising, Image Generation, +3

44
0.95 stars / hour

Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model

wyhuai/ddnm 1 Dec 2022

Most existing Image Restoration (IR) models are task-specific and cannot be generalized to different degradation operators.

Colorization, Deblurring, +7

171
0.90 stars / hour

SimVTP: Simple Video Text Pre-training with Masked Autoencoders

mayuelala/simvtp 7 Dec 2022

This paper presents SimVTP: a Simple Video-Text Pretraining framework via masked autoencoders.

Contrastive Learning, Text Matching

39
0.83 stars / hour

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

opengvlab/internvideo 6 Dec 2022

Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications.

Action Classification, Action Recognition, +6

41
0.63 stars / hour

Melody transcription via generative pre-training

chrisdonahue/sheetsage 4 Dec 2022

The combination of generative pre-training and a new dataset for this task results in 77% stronger performance on melody transcription relative to the strongest available baseline.

Chord Recognition, Information Retrieval, +2

45
0.61 stars / hour

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

baaivision/eva 14 Nov 2022

We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data.

Ranked #1 on Self-Supervised Image Classification on ImageNet (using extra training data)

Action Classification, Action Recognition, +7

338
0.59 stars / hour

Robust Speech Recognition via Large-Scale Weak Supervision

openai/whisper Preprint 2022

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.

Robust Speech Recognition, Spoken Language Identification

17,876
0.53 stars / hour