EasyVolcap: Accelerating Neural Volumetric Video Research

zju3dv/easyvolcap 11 Dec 2023

Volumetric video is a technology that digitally records dynamic events such as artistic performances, sporting events, and remote conversations.

1,349
0.44 stars / hour

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

index-tts/index-tts 8 Feb 2025

Recently, large language model (LLM) based text-to-speech (TTS) systems have gradually become mainstream in the industry due to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise models.

Decoder, Language Modeling +6

2,589
0.44 stars / hour

Training Long-Context LLMs Efficiently via Chunk-wise Optimization

wenhaoli-xmu/seco 22 May 2025

While long-context large language models (LLMs) exhibit remarkable document processing capabilities, their prohibitively high training costs often hinder customized applications.

16k

136
0.43 stars / hour

MiniCPM4: Ultra-Efficient LLMs on End Devices

openbmb/minicpm 9 Jun 2025

Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing.

Large Language Model

7,892
0.43 stars / hour

DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products

sustcsonglin/flash-linear-attention 14 Feb 2025

Linear Recurrent Neural Networks (linear RNNs) have emerged as competitive alternatives to Transformers for sequence modeling, offering efficient training and linear-time inference.

Language Modeling, Language Modelling +1

2,650
0.41 stars / hour

MASKSEARCH: A Universal Pre-Training Framework to Enhance Agentic Search Capability

alibaba-nlp/masksearch 26 May 2025

In the pre-training stage, we introduce the Retrieval Augmented Mask Prediction (RAMP) task, where the model learns to leverage search tools to fill masked spans on a large amount of pre-training data, thus acquiring universal retrieval and reasoning capabilities for LLMs.

Multi-hop Question Answering, Question Answering +2

107
0.40 stars / hour

Digital Player: Evaluating Large Language Models based Human-like Agent in Games

fuxiailab/civagent 28 Feb 2025

With the rapid advancement of Large Language Models (LLMs), LLM-based autonomous agents have shown the potential to function as digital employees, such as digital analysts, teachers, and programmers.

Decision Making

98
0.40 stars / hour

Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting

bigcileng/bilateral-driving 5 Jun 2025

In this paper, we propose a novel multi-scale bilateral grid that unifies appearance codes and bilateral grids.

Autonomous Driving, NeRF +1

91
0.39 stars / hour

AlphaEvolve: A Learning Framework to Discover Novel Alphas in Quantitative Investment

codelion/openevolve 30 Mar 2021

In this paper, we introduce a new class of alphas to model scalar, vector, and matrix features which possess the strengths of these two existing classes.

AutoML, Stock Prediction

2,401
0.39 stars / hour

VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning

alibaba-nlp/vrag 28 May 2025

As RL has been proven to be beneficial for model reasoning, we introduce VRAG-RL, a novel RL framework tailored for complex reasoning across visually rich information.

RAG

216
0.39 stars / hour