We introduce generative infinite-vocabulary transformers (GIVT) which generate vector sequences with real-valued entries, instead of discrete tokens from a finite vocabulary.
Ranked #13 on Image Generation on ImageNet 256x256
Therefore, we propose Latent Optimization of Hairstyles via Orthogonalization (LOHO), an optimization-based approach using GAN inversion to infill missing hair structure details in latent space during hairstyle transfer.
We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability.
We first decompose the video scene by using open-vocabulary mask trackers and an adapted image diffusion model to segment, track, and amodally complete the objects and background in the video.
We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size.
To investigate these aspects, we create and publish a novel TQA evaluation benchmark in English.
Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (\textit{e. g.,} BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.
Image diffusion models have been utilized in various tasks, such as text-to-image generation and controllable image synthesis.
While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge.
Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from various LMs.