Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

paper2poster/paper2poster 27 May 2025

To address this challenge, we introduce the first benchmark and metric suite for poster generation, which pairs recent conference papers with author-designed posters and evaluates outputs on (i) Visual Quality: semantic alignment with human posters, (ii) Textual Coherence: language fluency, (iii) Holistic Assessment: six fine-grained aesthetic and informational criteria scored by a VLM-as-judge, and notably (iv) PaperQuiz: the poster's ability to convey core paper content as measured by VLMs answering generated quizzes.

2,001
0.59 stars / hour

HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters

tencent-hunyuan/hunyuanvideo-avatar 26 May 2025

This ensures dynamic motion and strong character consistency; (ii) An Audio Emotion Module (AEM) is introduced to extract and transfer the emotional cues from an emotion reference image to the target generated video, enabling fine-grained and accurate emotion style control; (iii) A Face-Aware Audio Adapter (FAA) is proposed to isolate the audio-driven character with a latent-level face mask, enabling independent audio injection via cross-attention for multi-character scenarios.

Human Animation

1,186
0.57 stars / hour

QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos

nvlabs/queen 5 Dec 2024

Online free-viewpoint video (FVV) streaming is a challenging problem, which is relatively under-explored.

Attribute Quantization

88
0.54 stars / hour

RWKV-7 "Goose" with Expressive Dynamic State Evolution

fla-org/flash-linear-attention 18 Mar 2025

We present RWKV-7 "Goose", a new sequence modeling architecture with constant memory usage and constant inference time per token.

In-Context Learning, Language Modeling

2,671
0.53 stars / hour

Representing Long Volumetric Video with Temporal Gaussian Hierarchy

dendenxu/fast-gaussian-rasterization 12 Dec 2024

In addition, the tree-like structure of the Gaussian hierarchy allows us to efficiently represent the scene at a particular moment with a subset of Gaussian primitives, leading to nearly constant GPU memory usage during the training or rendering regardless of the video length.

863
0.51 stars / hour

OmniAudio: Generating Spatial Audio from 360-Degree Video

liuhuadai/omniaudio 21 Apr 2025

To generate spatial audio from 360-degree video, we propose a novel framework OmniAudio, which leverages self-supervised pre-training using both spatial audio data (in FOA format) and large-scale non-spatial data.

Audio Generation

222
0.50 stars / hour

Advanced long-term earth system forecasting by learning the small-scale nature

easylearningscores/triton_ai4earth 26 May 2025

Reliable long-term forecast of Earth system dynamics is heavily hampered by instabilities in current AI models during extended autoregressive simulations.

209
0.47 stars / hour

SEW: Self-Evolving Agentic Workflows for Automated Code Generation

evoagentx/evoagentx 24 May 2025

Large Language Models (LLMs) have demonstrated effectiveness in code generation tasks.

Code Generation

823
0.47 stars / hour

VACE: All-in-One Video Creation and Editing

ali-vilab/vace 10 Mar 2025

Further pursuing the unification of generation and editing tasks has yielded significant progress in the domain of image content creation.

All Video Editing

2,544
0.45 stars / hour

Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems

mingyin1/agents_failure_attribution 30 Apr 2025

In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems.

176
0.45 stars / hour