Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks

fudan-generative-vision/hallo3 1 Dec 2024

Existing methodologies for animating portrait images face significant challenges, particularly in handling non-frontal perspectives, rendering dynamic objects around the portrait, and generating immersive, realistic backgrounds.

Image Animation Portrait Animation

404
1.28 stars / hour

DeepSeek-V3 Technical Report

deepseek-ai/deepseek-v3 27 Dec 2024

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

Language Modeling Language Modelling

18,728
0.98 stars / hour

SVFR: A Unified Framework for Generalized Video Face Restoration

wangzhiyaoo/svfr 2 Jan 2025

In this paper, we propose a novel approach for the Generalized Video Face Restoration (GVFR) task, which integrates video BFR, inpainting, and colorization tasks that we empirically show to benefit each other.

Colorization Representation Learning

422
1.42 stars / hour

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

all-hands-ai/openhands 23 Jul 2024

We introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with a command line, and browsing the web.

43,247
0.94 stars / hour

Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers

dvlab-research/magicmirror 7 Jan 2025

We present Magic Mirror, a framework for generating identity-preserved videos with cinematic-level quality and dynamic motion.

Diversity Text-to-Video Generation +1

81
0.57 stars / hour

Story-Adapter: A Training-free Iterative Framework for Long Story Visualization

jwmao1/story-adapter 8 Oct 2024

Specifically, we propose an iterative paradigm to refine each generated image, leveraging both the text prompt and all generated images from the previous iteration.

Image Generation Story Visualization

632
0.79 stars / hour

Large Concept Models: Language Modeling in a Sentence Representation Space

facebookresearch/large_concept_model 11 Dec 2024

In this paper, we present an attempt at an architecture which operates on an explicit higher-level semantic representation, which we name a concept.

Language Modeling Language Modelling +4

1,668
0.73 stars / hour

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

declare-lab/TangoFlux 30 Dec 2024

We introduce TangoFlux, an efficient Text-to-Audio (TTA) generative model with 515M parameters, capable of generating up to 30 seconds of 44.1kHz audio in just 3.7 seconds on a single A40 GPU.

Audio Generation

528
0.60 stars / hour

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

magic-research/Sa2VA 7 Jan 2025

This work presents Sa2VA, the first unified model for dense grounded understanding of both images and videos.

2k Language Modeling +7

444
1.61 stars / hour

PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides

icip-cas/pptagent 7 Jan 2025

Automatically generating presentations from documents is a challenging task that requires balancing content quality, visual design, and structural coherence.

99
0.69 stars / hour