Robust Speech Recognition via Large-Scale Weak Supervision

openai/whisper Preprint 2022

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.

Robust Speech Recognition speech-recognition

VToonify: Controllable High-Resolution Portrait Video Style Transfer

williamyang1991/vtoonify 22 Sep 2022

Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency.

Face Alignment Style Transfer

Plenoxels: Radiance Fields without Neural Networks

kakaobrain/NeRF-Factory CVPR 2022

We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis.

LAVIS: A Library for Language-Vision Intelligence

salesforce/lavis 15 Sep 2022

We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.

Image Captioning Image Retrieval

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

IDEA-Research/detrex 7 Mar 2022

Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results.

Ranked #1 on Object Detection on COCO minival (using extra training data)

object-detection Real-Time Object Detection

Poisson Flow Generative Models

newbeeer/poisson_flow 22 Sep 2022

We interpret the data points as electrical charges on the $z=0$ hyperplane in a space augmented with an additional dimension $z$, generating a high-dimensional electric field (the gradient of the solution to the Poisson equation).

Image Generation
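A minimal sketch of the empirical field described above, assuming unit charges, averaging over the data points, and dropping constant factors and the overall sign. In $D = N + 1$ dimensions the field of a point charge is proportional to $(\tilde{x} - \tilde{y}) / \lVert \tilde{x} - \tilde{y} \rVert^{D}$. This is an illustration of the idea, not the authors' implementation; the function name `poisson_field` is hypothetical.

```python
import math
from typing import List, Sequence


def poisson_field(query: Sequence[float], data: List[Sequence[float]]) -> List[float]:
    """Empirical Poisson field at an augmented point (x, z) with z > 0.

    Each N-dimensional data point y is treated as a unit charge placed at
    (y, 0) on the z = 0 hyperplane of the (N + 1)-dimensional augmented space.
    In D = N + 1 dimensions a point charge's field is proportional to
    (q - c) / ||q - c||**D; constants and the overall sign are dropped here.
    """
    dim = len(query)  # D = N + 1 (data dimensions plus the extra z axis)
    field = [0.0] * dim
    for y in data:
        if len(y) != dim - 1:
            raise ValueError("data points must have one fewer dimension than the query")
        charge = list(y) + [0.0]  # lift the data point onto the z = 0 hyperplane
        diff = [q - c for q, c in zip(query, charge)]
        dist = math.sqrt(sum(d * d for d in diff))
        scale = 1.0 / dist ** dim  # inverse-power law of a point charge in D dims
        for i, d in enumerate(diff):
            field[i] += d * scale
    # Average over charges so the field reflects the empirical data distribution.
    return [f / len(data) for f in field]
```

For example, `poisson_field([0.0, 1.0], [[-1.0], [1.0]])` places two 1-D data points symmetrically about the origin. The horizontal components cancel, and the remaining field points straight up along $z$, away from the charges.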

Text2Light: Zero-Shot Text-Driven HDR Panorama Generation

frozenburning/text2light 20 Sep 2022

To achieve super-resolution inverse tone mapping, we derive a continuous representation of 360-degree imaging from the LDR panorama as a set of structured latent codes anchored to the sphere.

Inverse Tone Mapping
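The abstract does not specify how Text2Light parameterizes the sphere. Assuming the common equirectangular layout for a 360-degree panorama, anchoring codes to the sphere requires knowing which viewing direction each pixel (or latent cell) covers. The sketch below shows one such mapping from a pixel to a unit direction. The function name and the axis convention (+y up, longitude increasing left to right) are assumptions for illustration, not the paper's code.

```python
import math
from typing import Tuple


def equirect_to_direction(u: int, v: int, width: int, height: int) -> Tuple[float, float, float]:
    """Map pixel (u, v) of an equirectangular panorama to a unit vector on the sphere.

    Assumed convention: u runs left to right over longitude [-pi, pi),
    v runs top to bottom over latitude [pi/2, -pi/2], and +y points up.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi  # longitude of the pixel centre
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi  # latitude of the pixel centre
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)
```

Using pixel centres (the `+ 0.5` offsets) keeps the top and bottom rows from collapsing exactly onto the poles.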

SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation

visual-attention-network/segnext 18 Sep 2022

Notably, SegNeXt outperforms EfficientNet-L2 w/ NAS-FPN and achieves 90.6% mIoU on the Pascal VOC 2012 test leaderboard using only one-tenth of its parameters.

Semantic Segmentation
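mIoU (mean intersection-over-union), the metric behind the figure above, averages the per-class score $\mathrm{IoU}_c = \mathrm{TP}_c / (\mathrm{TP}_c + \mathrm{FP}_c + \mathrm{FN}_c)$ over classes. The sketch below computes it from a pixel-level confusion matrix over flattened label maps. Concatenating all images before calling it accumulates counts over the whole evaluation set. As simplifying assumptions, it skips classes that are absent from both prediction and ground truth and does not handle ignore labels such as the void label used in Pascal VOC.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union over classes, from flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same number of pixels")
    # Confusion matrix: conf[t][p] counts pixels with ground truth t predicted as p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        conf[t][p] += 1
    ious: List[float] = []
    for c in range(num_classes):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp  # class-c pixels predicted as another class
        fp = sum(conf[r][c] for r in range(num_classes)) - tp  # other pixels predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    if not ious:
        raise ValueError("no pixels to evaluate")
    return sum(ious) / len(ious)
```

For example, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, giving an mIoU of 7/12.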

Deep Learning for Medical Image Segmentation: Tricks, Challenges and Future Directions

hust-linyi/seg_trick 21 Sep 2022

Over the past few years, the rapid development of deep learning technologies for computer vision has greatly promoted the performance of medical image segmentation (MedISeg).

Data Augmentation Domain Adaptation

USB: A Unified Semi-supervised Learning Benchmark

microsoft/semi-supervised-learning 12 Aug 2022

Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples.
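The abstract does not say which SSL algorithms USB covers. Purely as an illustration of "leveraging unlabeled data", the sketch below shows confidence-thresholded pseudo-labeling, the idea behind FixMatch-style methods. Unlabeled examples whose top predicted class probability reaches a threshold are kept with that class as a training target. The function names and the 0.95 default threshold are assumptions, and this is not USB's API.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one example's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(
    unlabeled_logits: Sequence[Sequence[float]], threshold: float = 0.95
) -> List[Tuple[int, int]]:
    """Return (example index, pseudo-label) for confident unlabeled predictions."""
    selected: List[Tuple[int, int]] = []
    for i, logits in enumerate(unlabeled_logits):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)  # most likely class
        if probs[label] >= threshold:  # keep only high-confidence predictions
            selected.append((i, label))
    return selected
```

The threshold trades coverage for label quality. A lower value uses more unlabeled data but admits more wrong pseudo-labels.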

