AnyTop: Character Animation Diffusion with Any Topology

anytop2025/anytop 24 Feb 2025

Generating motion for arbitrary skeletons is a longstanding challenge in computer graphics, remaining largely unexplored due to the scarcity of diverse datasets and the irregular nature of the data.

Denoising

199
0.28 stars / hour

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

aidc-ai/awesome-unified-multimodal-models 5 May 2025

Despite their respective successes, these two domains have evolved independently, leading to distinct architectural paradigms: While autoregressive-based architectures have dominated multimodal understanding, diffusion-based models have become the cornerstone of image generation.

Survey, Text-to-Image Generation

200
0.28 stars / hour

TabularARGN: A Flexible and Efficient Auto-Regressive Framework for Generating High-Fidelity Synthetic Data

mostly-ai/mostlyai arXiv:2501.12012v1 2025

Synthetic data generation for tabular datasets must balance fidelity, efficiency, and versatility to meet the demands of real-world applications.

Fairness, Imputation, +2

521
0.27 stars / hour

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

inclusionai/ming 5 May 2025

We introduce Ming-Lite-Uni, an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language.

multimodal interaction, Text-to-Image Generation

95
0.27 stars / hour

Superposition Yields Robust Neural Scaling

liuyz0/superpositionscaling 15 May 2025

We found that when superposition is weak, meaning only the most frequent features are represented without interference, the scaling of loss with model size depends on the underlying feature frequency; if feature frequencies follow a power law, so does the loss.

14
0.26 stars / hour

OmniAudio: Generating Spatial Audio from 360-Degree Video

liuhuadai/omniaudio 21 Apr 2025

To generate spatial audio from 360-degree video, we propose a novel framework OmniAudio, which leverages self-supervised pre-training using both spatial audio data (in FOA format) and large-scale non-spatial data.

Audio Generation

39
0.26 stars / hour

Multi-Camera Hand-Eye Calibration for Human-Robot Collaboration in Industrial Robotic Workcells

davidea97/multi-camera-hand-eye-calibration 17 Jun 2024

In industrial scenarios, effective human-robot collaboration relies on multi-camera systems to robustly monitor human operators despite the occlusions that typically show up in a robotic workcell.

17
0.24 stars / hour

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

hitsz-tmg/awesome-large-multimodal-reasoning-models 8 May 2025

Large Multimodal Reasoning Models (LMRMs) have emerged as a promising paradigm, integrating modalities such as text, images, audio, and video to support complex reasoning capabilities and aiming to achieve comprehensive perception, precise understanding, and deep reasoning.

Multimodal Reasoning

279
0.24 stars / hour

Generative AI for Autonomous Driving: Frontiers and Opportunities

taco-group/genai4ad 13 May 2025

This survey delivers a comprehensive and critical synthesis of the emerging role of GenAI across the autonomous driving stack.

Autonomous Driving, Video Generation

38
0.24 stars / hour

VGGT: Visual Geometry Grounded Transformer

facebookresearch/vggt 14 Mar 2025

We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views.

Depth Estimation, Novel View Synthesis, +3

6,768
0.23 stars / hour