YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS.
A reverse dictionary takes descriptions of words as input and outputs words semantically matching the input descriptions.
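The description-to-word lookup can be sketched as nearest-neighbor search over word embeddings. This is a minimal toy sketch, assuming averaged random word vectors stand in for a learned description encoder; the vocabulary and embeddings here are hypothetical, not the paper's model.

```python
import numpy as np

# Toy reverse dictionary: rank vocabulary words by cosine similarity
# between their embeddings and an encoded description.
# (Assumption: random vectors stand in for learned embeddings.)
rng = np.random.default_rng(0)
vocab = ["cat", "dog", "feline", "canine", "house"]
emb = {w: rng.standard_normal(8) for w in vocab}
# Nudge related words together so the toy lookup is meaningful.
emb["feline"] = emb["cat"] + 0.1 * rng.standard_normal(8)
emb["canine"] = emb["dog"] + 0.1 * rng.standard_normal(8)

def encode(description):
    """Encode a description as the mean of its in-vocabulary word vectors."""
    vecs = [emb[w] for w in description.split() if w in emb]
    return np.mean(vecs, axis=0)

def reverse_dictionary(description, k=2):
    """Return the k vocabulary words closest (by cosine) to the description."""
    q = encode(description)
    def cos(w):
        v = emb[w]
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
    return sorted(vocab, key=cos, reverse=True)[:k]

result = reverse_dictionary("feline animal")
```

A real system replaces the averaging encoder with a trained sequence model, but the retrieval step is the same ranking by similarity.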
While the tasks differ only in their semantics, current research focuses on designing a specialized architecture for each one.
Ranked #1 on Panoptic Segmentation on COCO minival
Blind face restoration usually relies on facial priors, such as facial geometry prior or reference prior, to restore realistic and faithful details.
Ranked #1 on Blind Face Restoration on CelebA-Test
In order to modify style, we obtain a similarity score between a text prompt (describing style) and a stylized mesh by harnessing the representational power of CLIP.
Ranked #1 on Neural Stylization on Meshes
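The style-matching signal reduces to a cosine similarity in CLIP's joint embedding space: the text prompt is embedded by the text encoder, a rendered view of the stylized mesh by the image encoder. A minimal sketch, with random placeholder vectors standing in for the real CLIP encoder outputs (an assumption, since the encoders themselves are not reproduced here):

```python
import numpy as np

def clip_similarity(text_feat, render_feat):
    """Cosine similarity between two embedding vectors."""
    return float(
        np.dot(text_feat, render_feat)
        / (np.linalg.norm(text_feat) * np.linalg.norm(render_feat))
    )

rng = np.random.default_rng(1)
text_feat = rng.standard_normal(512)    # placeholder for CLIP text-encoder output
render_feat = rng.standard_normal(512)  # placeholder for CLIP image-encoder output
score = clip_similarity(text_feat, render_feat)
```

In the actual method this score serves as the objective: stylization parameters are updated to push the rendered mesh's embedding toward the prompt's embedding.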
To further improve the proposed method, we introduce a skeleton-based search space that reduces false-positive detections.
We perform subjective and objective evaluations to compare the performance of each vocoder along different axes.
Despite the initial belief that Convolutional Neural Networks (CNNs) are driven by shapes to perform visual recognition tasks, recent evidence suggests that texture bias in CNNs yields higher-performing models when learning on large labeled training datasets.
Ranked #2 on Few-Shot Semantic Segmentation on FSS-1000
We approach text-to-image generation by combining the power of the pretrained CLIP representation with an off-the-shelf image generator (a GAN), optimizing in the GAN's latent space to find images that maximize the CLIP score for the given input text.
Ranked #1 on Zero-Shot Text-to-Image Generation on COCO
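The latent-space optimization can be sketched as gradient ascent on a CLIP-style score. In this minimal sketch (all names are assumptions): a random linear map `G` stands in for the frozen generator plus CLIP image encoder, a fixed unit vector stands in for the prompt's text embedding, and a finite-difference gradient stands in for backpropagation through the frozen models.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((64, 16))     # toy stand-in: latent (16,) -> image feature (64,)
text_emb = rng.standard_normal(64)
text_emb /= np.linalg.norm(text_emb)  # CLIP embeddings are unit-normalized

def clip_score(z):
    """Cosine similarity between the generated image feature and the text embedding."""
    img = G @ z
    return float(img @ text_emb / (np.linalg.norm(img) + 1e-8))

def optimize_latent(z, steps=200, lr=0.5, eps=1e-4):
    """Gradient ascent on z to maximize the score, using a finite-difference
    gradient in place of backprop through the frozen generator and CLIP."""
    z = z.copy()
    for _ in range(steps):
        grad = np.zeros_like(z)
        for i in range(z.size):
            dz = np.zeros_like(z)
            dz[i] = eps
            grad[i] = (clip_score(z + dz) - clip_score(z - dz)) / (2 * eps)
        z += lr * grad
    return z

z0 = rng.standard_normal(16)
z_opt = optimize_latent(z0)
```

The real pipeline differs mainly in scale: backprop replaces finite differences, and the generator and CLIP encoders are deep networks, but both are held frozen while only the latent is updated.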