Robust Speech Recognition via Large-Scale Weak Supervision

openai/whisper Preprint 2022

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.

Robust Speech Recognition speech-recognition

9,303
9.04 stars / hour

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

XavierXiao/Dreambooth-Stable-Diffusion 25 Aug 2022

Once the subject is embedded in the output domain of the model, the unique identifier can then be used to synthesize fully-novel photorealistic images of the subject contextualized in different scenes.

Image Generation

1,052
2.38 stars / hour

VToonify: Controllable High-Resolution Portrait Video Style Transfer

williamyang1991/vtoonify 22 Sep 2022

Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency.

Face Alignment Style Transfer +1

428
2.21 stars / hour

LAVIS: A Library for Language-Vision Intelligence

salesforce/lavis 15 Sep 2022

We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.

Image Captioning Image Retrieval +6

622
1.84 stars / hour

Plenoxels: Radiance Fields without Neural Networks

kakaobrain/NeRF-Factory CVPR 2022

We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis.

589
1.68 stars / hour

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

IDEA-Research/detrex 7 Mar 2022

Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results.

Ranked #1 on Object Detection on COCO minival (using extra training data)

object-detection Real-Time Object Detection

283
1.54 stars / hour

Poisson Flow Generative Models

newbeeer/poisson_flow 22 Sep 2022

We interpret the data points as electrical charges on the $z=0$ hyperplane in a space augmented with an additional dimension $z$, generating a high-dimensional electric field (the gradient of the solution to the Poisson equation).

Image Generation

129
1.26 stars / hour

Text2Light: Zero-Shot Text-Driven HDR Panorama Generation

frozenburning/text2light 20 Sep 2022

To achieve super-resolution inverse tone mapping, we derive a continuous representation of 360-degree imaging from the LDR panorama as a set of structured latent codes anchored to the sphere.

inverse tone mapping Inverse-Tone-Mapping +2

200
1.02 stars / hour

towhee

towhee-io/towhee 22 Oct 2020

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

Audio Fingerprint Contrastive Learning +1

1,364
0.91 stars / hour

SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation

visual-attention-network/segnext 18 Sep 2022

Notably, SegNeXt outperforms EfficientNet-L2 w/ NAS-FPN and achieves 90.6% mIoU on the Pascal VOC 2012 test leaderboard using only 1/10 of its parameters.

Semantic Segmentation

322
0.86 stars / hour