Grounded Language-Image Pre-training

microsoft/GLIP 7 Dec 2021

The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich.

Phrase Grounding on Flickr30k Entities Test

2D Object Detection, Phrase Grounding
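
The self-training step described above can be pictured as a filter over a teacher model's predictions: confident grounded boxes become pseudo-labels for an image-text pair. The following is only a minimal sketch under stated assumptions. The `GroundedBox` type, the `pseudo_label` helper, and the `min_score` threshold are hypothetical and are not taken from the GLIP codebase.

```python
from dataclasses import dataclass


@dataclass
class GroundedBox:
    """A hypothetical teacher prediction: a caption phrase tied to a box."""
    phrase: str
    box: tuple      # (x1, y1, x2, y2) in pixels
    score: float    # teacher confidence in [0, 1]


def pseudo_label(caption, predictions, min_score=0.5):
    """Keep confident predictions whose phrase occurs in the caption.

    The threshold is an assumed hyperparameter for illustration only.
    """
    kept = [
        p for p in predictions
        if p.score >= min_score and p.phrase in caption
    ]
    return {"caption": caption, "boxes": kept}
```

In a setup like this, the retained boxes would be added to the training set as if they were human grounding annotations, while low-confidence predictions are discarded.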

WantWords: An Open-source Online Reverse Dictionary System

thunlp/WantWords EMNLP 2020

A reverse dictionary takes descriptions of words as input and outputs words semantically matching the input descriptions.
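
To make the input/output contract concrete, here is a toy reverse dictionary that ranks words by bag-of-words cosine similarity between the query and each stored definition. This is a sketch for illustration only: the `LEXICON` entries and the `reverse_lookup` helper are hypothetical, and word overlap is a stand-in for whatever semantic matching the WantWords system actually performs.

```python
import math
from collections import Counter

# Hypothetical three-word lexicon used only for this illustration.
LEXICON = {
    "ocean": "a very large expanse of salt water",
    "library": "a building that holds books for people to read or borrow",
    "surgeon": "a doctor who performs medical operations",
}


def bag_of_words(text):
    """Lowercased token counts; no stemming or stop-word removal."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reverse_lookup(description, top_k=1):
    """Return the top_k words whose definitions best match the description."""
    query = bag_of_words(description)
    ranked = sorted(
        LEXICON,
        key=lambda word: cosine(query, bag_of_words(LEXICON[word])),
        reverse=True,
    )
    return ranked[:top_k]
```

For example, `reverse_lookup("a place with books to borrow")` returns `['library']` with this lexicon.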

Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling

rezazad68/deep-intervertebral-disc-labeling 14 Aug 2021

To further improve the performance of the proposed method, we propose a skeleton-based search space to reduce false-positive detections.

Pose Estimation, Semantic Segmentation

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

edresson/yourtts 4 Dec 2021

YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS.

Speech Synthesis, Voice Conversion

Towards Real-World Blind Face Restoration with Generative Facial Prior


Blind face restoration usually relies on facial priors, such as facial geometry prior or reference prior, to restore realistic and faithful details.

Blind Face Restoration, GAN inversion

Masked-attention Mask Transformer for Universal Image Segmentation

facebookresearch/Mask2Former 2 Dec 2021

While only the semantics of each task differ, current research focuses on designing specialized architectures for each task.

Instance Segmentation, Panoptic Segmentation

Text2Mesh: Text-Driven Neural Stylization for Meshes

threedle/text2mesh 6 Dec 2021

In order to modify style, we obtain a similarity score between a text prompt (describing style) and a stylized mesh by harnessing the representational power of CLIP.

Neural Stylization
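
The similarity score mentioned above can be illustrated as a cosine similarity between a text embedding and the embeddings of rendered views of the mesh. The sketch below assumes the embeddings have already been produced by CLIP's text and image encoders and are passed in as plain lists of floats. Averaging the per-view similarities is one plausible aggregation chosen for this example, not necessarily the one used by Text2Mesh, and `style_score` is a hypothetical name.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def style_score(text_embedding, view_embeddings):
    """Mean similarity between the prompt and each rendered view.

    Both arguments are assumed to be precomputed CLIP embeddings.
    """
    sims = [cosine_similarity(text_embedding, v) for v in view_embeddings]
    return sum(sims) / len(sims)
```

A stylization loop would then adjust the mesh's appearance parameters to increase this score for the target prompt.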

VocBench: A Neural Vocoder Benchmark for Speech Synthesis

facebookresearch/vocoder-benchmark 6 Dec 2021

We perform subjective and objective evaluations to compare the performance of each vocoder along different axes.

Speech Synthesis

Steerable discovery of neural audio effects

csteinmetz1/steerable-nafx 6 Dec 2021

Applications of deep learning for audio effects often focus on modeling analog effects or learning to control effects to emulate a trained audio engineer.

FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

gnobitab/fusedream 2 Dec 2021

We approach text-to-image generation by combining the power of the pretrained CLIP representation with an off-the-shelf image generator (GANs), optimizing in the latent space of the GAN to find images that achieve the maximum CLIP score with the given input text.

Zero-Shot Text-to-Image Generation
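
The latent-space optimization described above can be sketched with toy stand-ins: a fixed linear map plays the role of the GAN generator, cosine similarity plays the role of the CLIP score, and the latent vector is updated by gradient ascent. Everything here is an assumption made for illustration. `toy_generator`, `toy_clip_score`, and `optimize_latent` are hypothetical names, and the finite-difference gradient replaces the automatic differentiation a real implementation would use. It does not reproduce FuseDream's actual optimizer.

```python
import math


def toy_generator(z):
    """Stand-in for a GAN: maps a 2-D latent to a 3-D 'image feature'."""
    return [z[0] + z[1], z[0] - z[1], 0.5 * z[0]]


def toy_clip_score(image_feat, text_feat):
    """Stand-in for a CLIP score: cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(image_feat, text_feat))
    norm_i = math.sqrt(sum(a * a for a in image_feat))
    norm_t = math.sqrt(sum(b * b for b in text_feat))
    return dot / (norm_i * norm_t)


def optimize_latent(text_feat, z0, steps=200, lr=0.1, eps=1e-4):
    """Gradient ascent on the score using central finite differences."""
    def score(z):
        return toy_clip_score(toy_generator(z), text_feat)

    z = list(z0)
    for _ in range(steps):
        grad = []
        for i in range(len(z)):
            z_plus, z_minus = z[:], z[:]
            z_plus[i] += eps
            z_minus[i] -= eps
            grad.append((score(z_plus) - score(z_minus)) / (2 * eps))
        # Move the latent in the direction that increases the score.
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z
```

The generator and the scoring model stay frozen throughout; only the latent vector changes, which is what makes an approach of this kind training-free.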

