Learning Video Representations from Large Language Models

facebookresearch/lavila 8 Dec 2022

We introduce LaViLa, a new approach to learning video-language representations by leveraging Large Language Models (LLMs).

Multi-Instance Retrieval, Retrieval

72
1.29 stars / hour

Images Speak in Images: A Generalist Painter for In-Context Visual Learning

baaivision/painter 5 Dec 2022

In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images and to specify task prompts as images as well.

Keypoint Detection, Semantic Segmentation

115
1.06 stars / hour

DiffusionInst: Diffusion Model for Instance Segmentation

chenhaoxing/DiffusionInst 6 Dec 2022

This paper proposes DiffusionInst, a novel framework that represents instances as instance-aware filters and formulates instance segmentation as a noise-to-filter denoising process.

Image Denoising, Image Generation, +3

66
0.96 stars / hour

DAMO-YOLO : A Report on Real-Time Object Detection Design

tinyvision/damo-yolo 23 Nov 2022

In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series.

Neural Architecture Search, object-detection, +1

417
0.96 stars / hour

SimVTP: Simple Video Text Pre-training with Masked Autoencoders

mayuelala/simvtp 7 Dec 2022

This paper presents SimVTP: a Simple Video-Text Pretraining framework via masked autoencoders.

Contrastive Learning, Text Matching

48
0.86 stars / hour

UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation

amshaker/unetr_plus_plus 8 Dec 2022

Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks.

Image Segmentation, Medical Image Segmentation, +1

20
0.67 stars / hour

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

baaivision/eva 14 Nov 2022

We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data.

Action Classification, Action Recognition, +7

353
0.61 stars / hour

Robust Speech Recognition via Large-Scale Weak Supervision

openai/whisper Preprint 2022

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.

Robust Speech Recognition, speech-recognition

18,047
0.60 stars / hour

Melody transcription via generative pre-training

chrisdonahue/sheetsage 4 Dec 2022

The combination of generative pre-training and a new dataset for this task results in 77% stronger performance on melody transcription relative to the strongest available baseline.

Chord Recognition, Information Retrieval, +2

56
0.59 stars / hour

Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model

wyhuai/ddnm 1 Dec 2022

Most existing Image Restoration (IR) models are task-specific and cannot be generalized to different degradation operators.

Colorization, Deblurring, +7

182
0.57 stars / hour