MUSt3R: Multi-view Network for Stereo 3D Reconstruction

naver/must3r CVPR 2025

DUSt3R introduced a novel paradigm in geometric computer vision by proposing a model that can provide dense and unconstrained Stereo 3D Reconstruction of arbitrary image collections with no prior information about camera calibration or viewpoint poses.

3D Reconstruction Articles +3

135
0.68 stars / hour

Practical Efficiency of Muon for Pretraining

KellerJordan/Muon 4 May 2025

We demonstrate that Muon, the simplest instantiation of a second-order optimizer, explicitly expands the Pareto frontier over AdamW on the compute-time tradeoff.

893
0.67 stars / hour

ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering

sakanaai/ale-bench 10 Jun 2025

How well do AI systems perform in algorithm engineering for hard optimization problems in domains such as package-delivery routing, crew scheduling, factory production planning, and power-grid balancing?

Scheduling

61
0.57 stars / hour

MAGREF: Masked Guidance for Any-Reference Video Generation

magref-video/magref 29 May 2025

Video generation has made substantial strides with the emergence of deep generative models, especially diffusion-based approaches.

Video Generation

151
0.55 stars / hour

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

meigen-ai/multitalk 28 May 2025

Audio-driven human animation methods, such as talking head and talking body generation, have made remarkable progress in generating synchronized facial movements and visually appealing videos.

Human Animation Instruction Following +1

365
0.53 stars / hour

HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters

tencent-hunyuan/hunyuanvideo-avatar 26 May 2025

This ensures dynamic motion and strong character consistency; (ii) An Audio Emotion Module (AEM) is introduced to extract and transfer the emotional cues from an emotion reference image to the target generated video, enabling fine-grained and accurate emotion style control; (iii) A Face-Aware Audio Adapter (FAA) is proposed to isolate the audio-driven character with a latent-level face mask, enabling independent audio injection via cross-attention for multi-character scenarios.

Human Animation

1,350
0.53 stars / hour

MagCache: Fast Video Generation with Magnitude-Aware Cache

Zehong-Ma/ComfyUI-MagCache 10 Jun 2025

Existing acceleration techniques for video diffusion models often rely on uniform heuristics or time-embedding variants to skip timesteps and reuse cached features.

SSIM Video Generation

127
0.52 stars / hour

Logits-Based Finetuning

dvlab-research/logits-based-finetuning 30 May 2025

We deeply explore the main contributors of OOD detection and find that reconstruction-based pretext tasks have the potential to provide a generally applicable and efficacious prior, which benefits the model in learning intrinsic data distributions of the ID dataset.

Out of Distribution (OOD) Detection

82
0.52 stars / hour

AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

ucla-mobility/AutoVLA 16 Jun 2025

Recent advancements in Vision-Language-Action (VLA) models have shown promise for end-to-end autonomous driving by leveraging world knowledge and reasoning capabilities.

Action Generation Bench2Drive +3

61
0.50 stars / hour

Benchmarking Laparoscopic Surgical Image Restoration and Beyond

pjlallen/surgical-image-restoration 25 May 2025

To systematically investigate and address various forms of surgical scene degradation, we introduce a real-world open-source surgical image restoration dataset covering laparoscopic environments, called SurgClean, which involves multi-type image restoration tasks, e.g., desmoking, defogging, and desplashing.

Benchmarking Image Restoration

108
0.50 stars / hour