Inspired by their great success in the general natural language domain, pre-trained language models have attracted increasing attention in the biomedical domain.
Ranked #1 on Question Answering on PubMedQA
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.
By incorporating vision features in both stages, rationale generation and answer inference, the model is able to generate effective rationales that contribute to answer inference.
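As a rough illustration of that two-stage design, here is a minimal sketch; `rationale_model`, `answer_model`, and their `generate` interface are hypothetical stand-ins, not the paper's actual implementation:

```python
# Minimal sketch of two-stage multimodal chain-of-thought inference.
# `rationale_model` and `answer_model` are hypothetical vision-language
# seq2seq models; the point is that vision features condition BOTH stages.

def multimodal_cot(question, context, vision_features,
                   rationale_model, answer_model):
    # Stage 1: generate a rationale from the text input plus vision features.
    rationale = rationale_model.generate(
        text=f"{context}\n{question}", vision=vision_features)
    # Stage 2: infer the answer, conditioning on the generated rationale
    # and, again, the vision features.
    answer = answer_model.generate(
        text=f"{context}\n{question}\nRationale: {rationale}",
        vision=vision_features)
    return answer
```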
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
Ranked #1 on Image Retrieval on COCO
To reproduce the success of text-to-image (T2I) generation, recent works in text-to-video (T2V) generation employ large-scale text-video datasets for fine-tuning.
We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows that instruction to edit the image.
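For a concrete usage sketch, instruction-guided editing of this kind is exposed in the Hugging Face `diffusers` library; the checkpoint name and parameter values below are illustrative choices, not settings prescribed by the paper:

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

# Load a publicly released InstructPix2Pix checkpoint.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.png").convert("RGB")

# The written instruction drives the edit; image_guidance_scale trades
# faithfulness to the input image against adherence to the instruction.
edited = pipe(
    "turn the sky into a sunset",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,
).images[0]
edited.save("edited.png")
```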
By fitting a bridge-shaped curve to the illumination-map distribution, both under-exposed and over-exposed regions are suppressed and the two tasks are bridged naturally.
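A minimal sketch of fitting a curve to an illumination-map histogram follows; the channel-max illumination estimate and the quadratic ("bridge-shaped") fit are assumptions for illustration, as the paper's exact parameterization is not given here:

```python
import numpy as np

def fit_bridge_curve(image, bins=64):
    """Fit a smooth curve to the histogram of an illumination map.

    `image` is an HxWx3 float array in [0, 1]. The illumination map is
    estimated as the per-pixel channel maximum (an assumed, common choice);
    the curve is a least-squares quadratic over the normalized histogram.
    """
    illumination = image.max(axis=2)
    hist, edges = np.histogram(illumination, bins=bins,
                               range=(0.0, 1.0), density=True)
    centers = (edges[:-1] + edges[1:]) / 2.0
    coeffs = np.polyfit(centers, hist, deg=2)  # bridge-shaped quadratic fit
    return np.poly1d(coeffs)
```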
In this paper, we answer these questions by first defining human-level quality based on the statistical significance of a subjective measure and introducing appropriate guidelines to judge it, and then by developing a TTS system, NaturalSpeech, that achieves human-level quality on a benchmark dataset.
Ranked #1 on Text-To-Speech Synthesis on LJSpeech
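The notion of judging human-level quality by statistical significance can be illustrated with a paired test on per-utterance listener scores; here is a minimal sketch using `scipy.stats.wilcoxon`, with placeholder scores (the paper's exact listening-test protocol may differ):

```python
import numpy as np
from scipy.stats import wilcoxon

# Paired per-utterance quality scores from the same listeners
# (placeholder data: one array for TTS output, one for human recordings).
tts_scores = np.array([4.21, 4.48, 3.87, 4.38, 4.13, 4.26, 3.96, 4.61])
human_scores = np.array([4.30, 4.40, 4.00, 4.50, 4.20, 4.20, 4.10, 4.50])

# Wilcoxon signed-rank test on the paired differences: failing to reject
# the null hypothesis of no difference (p >= 0.05) means the TTS output is
# statistically indistinguishable from human speech under this measure.
stat, p_value = wilcoxon(tts_scores, human_scores)
print(f"p = {p_value:.3f}:",
      "no significant gap" if p_value >= 0.05 else "significant gap")
```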
The recent surge in popularity of diffusion models for image generation has brought new attention to the potential of these models in other areas of media generation.
Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs.
Ranked #1 on Code Generation on APPS