Existing methodologies for animating portrait images face significant challenges, particularly in handling non-frontal perspectives, rendering dynamic objects around the portrait, and generating immersive, realistic backgrounds.
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
In this paper, we propose a novel approach for the Generalized Video Face Restoration (GVFR) task, which integrates video blind face restoration (BFR), inpainting, and colorization — tasks that we empirically show benefit each other.
We introduce OpenDevin, a platform for the development of powerful and flexible AI agents that interact with the world in the same ways a human developer does: by writing code, interacting with a command line, and browsing the web.
We present Magic Mirror, a framework for generating identity-preserved videos with cinematic-level quality and dynamic motion.
Specifically, we propose an iterative paradigm to refine each generated image, leveraging both the text prompt and all generated images from the previous iteration.
In this paper, we present an attempt at an architecture that operates on an explicit higher-level semantic representation, which we name a concept.
We introduce TangoFlux, an efficient Text-to-Audio (TTA) generative model with 515M parameters, capable of generating up to 30 seconds of 44.1 kHz audio in just 3.7 seconds on a single A40 GPU.
This work presents Sa2VA, the first unified model for dense grounded understanding of both images and videos.
Automatically generating presentations from documents is a challenging task that requires balancing content quality, visual design, and structural coherence.