Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models.
Ranked #9 on Text-to-Image Generation on COCO
Most work on reward learning has used simulated environments, but complex information about values is often expressed in natural language, and we believe reward learning for language is a key to making RL practical and safe for real-world tasks.
This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time.
Ranked #38 on Language Modelling on enwik8
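To illustrate "predicting one data point at a time" from the LSTM entry above, here is a minimal sketch of an autoregressive sampling loop. It is not the paper's model: a character-level bigram counter stands in for the trained LSTM, and the names `fit_bigram`, `generate`, and the toy corpus are assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict


def fit_bigram(text):
    """Count next-character frequencies (toy stand-in for a trained LSTM)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, seed_char, length, rng):
    """Sample one character at a time, feeding each sample back as the next input."""
    out = [seed_char]
    for _ in range(length - 1):
        options = counts.get(out[-1])
        if not options:  # no observed successor for this character: stop early
            break
        chars, weights = zip(*options.items())
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


corpus = "the theory then thins the thread"  # hypothetical training text
model = fit_bigram(corpus)
sample = generate(model, "t", 20, random.Random(0))
print(sample)
```

An LSTM would replace the bigram lookup with a learned distribution conditioned on the whole history through its hidden state, but the outer loop (predict, sample, feed back) has the same shape.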
We propose a method for editing images from human instructions: given an input image and a written instruction describing the desired change, our model follows the instruction to edit the image.
Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LMs) and retrieval models (RMs).
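The general idea of retrieval-augmented in-context learning can be sketched as follows: retrieve passages relevant to a query, then place them in the prompt given to a frozen language model. This is only an assumption-laden toy, not the paper's method: word overlap stands in for a real retrieval model, no language model is actually called, and `score`, `retrieve`, `build_prompt`, and the example passages are hypothetical.

```python
import re


def tokens(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def score(query, passage):
    """Word-overlap score (toy stand-in for a retrieval model)."""
    return len(tokens(query) & tokens(passage))


def retrieve(query, passages, k=1):
    """Return the k passages with the highest overlap score."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]


def build_prompt(query, passages, k=1):
    """Assemble the text that would be sent to a frozen language model."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


passages = [  # hypothetical knowledge source
    "The Nile flows through Egypt.",
    "Paris is the capital of France.",
    "Mount Everest is in the Himalayas.",
]
question = "What is the capital of France?"
prompt = build_prompt(question, passages, k=1)
print(prompt)
```

Neither model is trained in this setup; only the prompt changes, which is what makes the approach usable with frozen components.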
Furthermore, we propose a latent-mapping algorithm that converts an amateur vocal tone to a professional one in the latent space.
We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions.
Ranked #1 on Novel View Synthesis on LLFF
In the first stage, we perform self-supervised representation learning on unlabeled points with the proposed Viewpoint Bottleneck loss function.