Kimi-VL Technical Report

moonshotai/kimi-vl 10 Apr 2025

We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities, all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B).

Long-Context Understanding, Mathematical Reasoning, +3 more

743 stars
0.48 stars / hour
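Kimi-VL's efficiency claim rests on sparse activation: a router selects a few experts per token, so only a fraction of the decoder's parameters run. The sketch below is a generic top-k Mixture-of-Experts layer in plain Python, not Kimi-VL's implementation. The expert count (8), k = 2, the toy scaling experts, and the renormalized-softmax gating are all assumptions chosen for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    # Pick the k highest-scoring experts; gate weights are a softmax
    # renormalized over the selected logits only (an assumed gating scheme).
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))

def moe_forward(x, experts, router_logits, k=2):
    # Only the k routed experts are evaluated; all others stay inactive.
    out = [0.0] * len(x)
    for idx, gate in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out

# Hypothetical toy setup: expert i multiplies its input by (i + 1).
experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

routes = top_k_route(logits, k=2)  # experts 4 and 1 are selected
out = moe_forward([1.0, 2.0], experts, logits, k=2)
```

With 8 experts and k = 2, each token touches only a quarter of the expert parameters, which is the general mechanism by which a large MoE model can have a small activated-parameter count.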

Relative Pose Estimation through Affine Corrections of Monocular Depth Priors

markyu98/madpose 9 Jan 2025

In this paper, we develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities, covering both calibrated and uncalibrated conditions.

Monocular Depth Estimation, Pose Estimation

172 stars
0.48 stars / hour
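The affine ambiguity mentioned above means a monocular depth map is only known up to an unknown scale a and shift b, i.e. true depth ≈ a · d + b. The sketch below is not one of the paper's pose solvers (it estimates no pose); it only illustrates the ambiguity by recovering a and b in closed form by least squares, assuming some reference depths for the same pixels are available. The function name `fit_scale_shift` is hypothetical.

```python
def fit_scale_shift(pred, target):
    # Least-squares fit of target ≈ a * pred + b (ordinary linear regression).
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need at least two paired depth samples")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("predicted depths are constant; scale is unobservable")
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    a = cov / var_p
    b = mean_t - a * mean_p
    return a, b

def apply_affine(pred, a, b):
    # Map relative (affine-ambiguous) depths to the reference frame.
    return [a * p + b for p in pred]

# Toy example: reference depths are exactly 2 * relative + 0.5.
relative = [1.0, 2.0, 3.0, 4.0]
reference = [2.5, 4.5, 6.5, 8.5]
a, b = fit_scale_shift(relative, reference)
aligned = apply_affine(relative, a, b)
```

Two images generally carry different (a, b) pairs, which is why a relative-pose solver that uses monocular depth priors has to treat each image's scale and shift as separate unknowns rather than assuming metric depth.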

3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

nv-tlabs/3dgrut 17 Dec 2024

3D Gaussian Splatting (3DGS) enables efficient reconstruction and high-fidelity real-time rendering of complex scenes on consumer hardware.

3DGS

641 stars
0.47 stars / hour

VGGT: Visual Geometry Grounded Transformer

facebookresearch/vggt 14 Mar 2025

We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views.

Depth Estimation, Novel View Synthesis, +3 more

5,140 stars
0.46 stars / hour

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

xufangzhi/genius 11 Apr 2025

This motivates us to enhance LLM reasoning without the need for external supervision.

46 stars
0.43 stars / hour

Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning

jincan333/MAS-TTS 14 Apr 2025

In this work, we introduce an adaptive multi-agent framework designed to enhance collaborative reasoning through both model-level training and system-level coordination.

Code Generation, Mathematical Reasoning

23 stars
0.40 stars / hour

LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds

aigc3d/LHM 13 Mar 2025

Animatable 3D human reconstruction from a single image is a challenging problem due to the ambiguity in decoupling geometry, appearance, and deformation.

3D Human Reconstruction

1,853 stars
0.40 stars / hour

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

getzep/graphiti 20 Jan 2025

We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, in the Deep Memory Retrieval (DMR) benchmark.

RAG, Retrieval

4,041 stars
0.38 stars / hour
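A temporal knowledge graph attaches validity intervals to its edges, so an agent can ask what was true at a given moment rather than only what is true now. The sketch below illustrates that general idea with a hypothetical in-memory `TemporalGraph`; it is not Zep's or graphiti's API. It assumes each (source, relation) pair is single-valued, so a new fact closes the validity interval of the previous one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class Edge:
    source: str
    relation: str
    target: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the fact is still valid

class TemporalGraph:
    """Hypothetical minimal temporal graph: edges are invalidated, never deleted."""

    def __init__(self) -> None:
        self.edges: List[Edge] = []

    def add_fact(self, source: str, relation: str, target: str, at: datetime) -> None:
        # Assumption: a relation holds one value at a time, so close the open edge.
        for e in self.edges:
            if e.source == source and e.relation == relation and e.valid_to is None:
                e.valid_to = at
        self.edges.append(Edge(source, relation, target, at))

    def facts_at(self, source: str, relation: str, t: datetime) -> List[str]:
        # Return targets whose half-open interval [valid_from, valid_to) contains t.
        return [
            e.target
            for e in self.edges
            if e.source == source
            and e.relation == relation
            and e.valid_from <= t
            and (e.valid_to is None or t < e.valid_to)
        ]

g = TemporalGraph()
g.add_fact("alice", "works_at", "Acme", datetime(2023, 1, 1))
g.add_fact("alice", "works_at", "Globex", datetime(2024, 6, 1))
```

Keeping superseded edges instead of overwriting them is what lets the memory answer point-in-time questions such as where Alice worked in late 2023.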

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

mit-han-lab/nunchaku 7 Nov 2024

To address this, we co-design an inference engine Nunchaku that fuses the kernels of the low-rank branch into those of the low-bit branch to cut off redundant memory access.

Quantization

1,445 stars
0.36 stars / hour
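As the title puts it, SVDQuant absorbs outliers with low-rank components so that what remains quantizes well at 4 bits. The pure-Python sketch below illustrates only that core idea on a toy weight matrix: it keeps a rank-1 component (found by power iteration) in full precision and fake-quantizes the residual with a symmetric per-tensor 4-bit scale. The rank, the quantizer, and the toy matrix are assumptions; the method described in the paper also quantizes activations and relies on the fused Nunchaku kernels, neither of which is shown here.

```python
import random

def matvec(M, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def top_singular(M, iters=50):
    # Leading singular triplet via power iteration on M^T M.
    Mt = transpose(M)
    v = [1.0] * len(M[0])
    for _ in range(iters):
        w = matvec(Mt, matvec(M, v))
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    Mv = matvec(M, v)
    sigma = sum(x * x for x in Mv) ** 0.5
    u = [x / sigma for x in Mv]
    return sigma, u, v

def quantize_int4(M):
    # Symmetric per-tensor fake quantization to integers in [-8, 7], then dequantize.
    amax = max(abs(x) for row in M for x in row)
    scale = amax / 7 if amax > 0 else 1.0
    return [[max(-8, min(7, round(x / scale))) * scale for x in row] for row in M]

def lowrank_plus_int4(W):
    # High-precision rank-1 branch plus a 4-bit residual branch.
    sigma, u, v = top_singular(W)
    L = [[sigma * ui * vj for vj in v] for ui in u]
    R = [[w - l for w, l in zip(wr, lr)] for wr, lr in zip(W, L)]
    Rq = quantize_int4(R)
    return [[l + r for l, r in zip(lr, rr)] for lr, rr in zip(L, Rq)]

def max_abs_error(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Toy weights: a large rank-1 "outlier" structure plus small noise.
rng = random.Random(0)
n = 16
u0 = [rng.uniform(0.5, 1.5) for _ in range(n)]
v0 = [rng.uniform(0.5, 1.5) for _ in range(n)]
W = [[10 * a * b + rng.uniform(-0.1, 0.1) for b in v0] for a in u0]

err_plain = max_abs_error(W, quantize_int4(W))
err_svd = max_abs_error(W, lowrank_plus_int4(W))
```

Because the large-magnitude structure sits in the full-precision branch, the residual's range (and hence its quantization step) is much smaller, so `err_svd` comes out well below `err_plain` on this toy input.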

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

linkangheng/pr1 10 Apr 2025

In this work, we return to the fundamentals and explore the effects of RL on different perception tasks.

Reinforcement Learning, +1 more

98 stars
0.34 stars / hour