In an effort to further advance semi-supervised generative and classification tasks, we propose a simple yet effective training strategy called dual pseudo training (DPT), built upon strong semi-supervised learners and diffusion models.
In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in curernt human generative techniques.
This exploration holds the potential to unveil new insights and challenges for low-bit quantization of LLaMA3 and other forthcoming LLMs, especially in addressing performance degradation problems that suffer in LLM compression.
This work presents BAdam, an optimizer that leverages the block coordinate optimization framework with Adam as the inner solver.
Image inpainting, the process of restoring corrupted images, has seen significant advancements with the advent of diffusion models (DMs).
Hierarchically gated linear RNN (HGRN, Qin et al. 2023) has demonstrated competitive training speed and performance in language modeling, while offering efficient inference.
In this paper, we are the first to propose a hybrid model, dubbed as Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.
Ranked #2 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)
In this work, we propose CharacterFactory, a framework that allows sampling new characters with consistent identities in the latent space of GANs for diffusion models.
We present BenchmarkName, a novel benchmark to quantify LLM security risks and capabilities.
Current scene reconstruction techniques frequently result in the loss of object detail textures and are unable to reconstruct object portions that are occluded or unseen in views.