Representing Long Volumetric Video with Temporal Gaussian Hierarchy

dendenxu/fast-gaussian-rasterization 12 Dec 2024

In addition, the tree-like structure of the Gaussian hierarchy allows us to efficiently represent the scene at a particular moment with a subset of Gaussian primitives, leading to nearly constant GPU memory usage during training or rendering regardless of the video length.

841
1.17 stars / hour

Self-Prompting Analogical Reasoning for UAV Object Detection

lnxwow/Analogical-Reasoning Proceedings of the AAAI Conference on Artificial Intelligence 2025

For the analogical reasoning module, graph nodes consist of category-level prompt nodes and pixel-level image feature nodes. Analogical inference is based on graph convolution.

graph construction, object-detection, +2

119
1.08 stars / hour

Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents

jennyzzt/dgm 29 May 2025

The Gödel machine proposed a theoretical alternative: a self-improving AI that repeatedly modifies itself in a provably beneficial manner.

Meta-Learning

1,188
0.86 stars / hour

EasyVolcap: Accelerating Neural Volumetric Video Research

zju3dv/easyvolcap 11 Dec 2023

Volumetric video is a technology that digitally records dynamic events such as artistic performances, sporting events, and remote conversations.

1,338
0.81 stars / hour

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

DreamTechAI/Direct3D-S2 23 May 2025

Generating high-resolution 3D shapes using volumetric representations such as Signed Distance Functions (SDFs) presents substantial computational and memory challenges.

3D Generation, 3D geometry, +5

712
0.79 stars / hour

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

paper2poster/paper2poster 27 May 2025

To address this challenge, we introduce the first benchmark and metric suite for poster generation, which pairs recent conference papers with author-designed posters and evaluates outputs on (i) Visual Quality: semantic alignment with human posters; (ii) Textual Coherence: language fluency; (iii) Holistic Assessment: six fine-grained aesthetic and informational criteria scored by a VLM-as-judge; and notably (iv) PaperQuiz: the poster's ability to convey core paper content as measured by VLMs answering generated quizzes.

1,919
0.68 stars / hour

R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration

zefan-cai/r-kv 30 May 2025

To address this, we propose Redundancy-aware KV Cache Compression for Reasoning models (R-KV), a novel method specifically targeting redundant tokens in reasoning models.

Mathematical Reasoning

261
0.64 stars / hour

RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination

microsoft/renderformer 28 May 2025

We present RenderFormer, a neural rendering pipeline that directly renders an image from a triangle-based representation of a scene with full global illumination effects and that does not require per-scene training or fine-tuning.

Neural Rendering

522
0.63 stars / hour

OmniAudio: Generating Spatial Audio from 360-Degree Video

liuhuadai/omniaudio 21 Apr 2025

To generate spatial audio from 360-degree video, we propose a novel framework OmniAudio, which leverages self-supervised pre-training using both spatial audio data (in FOA format) and large-scale non-spatial data.

Audio Generation

184
0.62 stars / hour

Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems

mingyin1/agents_failure_attribution 30 Apr 2025

In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems.

160
0.56 stars / hour