Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain.
Ranked #1 on
Question Answering
on PubMedQA
By incorporating the vision features in both stages, the model is able to generate effective rationales that contribute to answer inference.
In this paper, we present TEXTure, a novel method for text-guided generation, editing, and transfer of textures for 3D shapes.
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
Ranked #1 on
Image Retrieval
on COCO
Vizier is the de-facto blackbox and hyperparameter optimization service across Google, having optimized some of Google's largest products and research efforts.
In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
Ranked #1 on
Text-To-Speech Synthesis
on LJSpeech
By fitting a bridge-shaped curve to the illumination map distribution, both regions are suppressed and two tasks are bridged naturally.
Furthermore, we propose a latent-mapping algorithm in the latent space to convert the amateur vocal tone to the professional one.
We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image.
In this paper, we demonstrate that a learned discrete codebook prior in a small proxy space largely reduces the uncertainty and ambiguity of restoration mapping by casting blind face restoration as a code prediction task, while providing rich visual atoms for generating high-quality faces.
Ranked #1 on
Blind Face Restoration
on WIDER