MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm

yuliang-liu/monkeyocr 5 Jun 2025

We introduce MonkeyOCR, a vision-language model for document parsing that advances the state of the art by leveraging a Structure-Recognition-Relation (SRR) triplet paradigm.

GPU Relation +1

4,958
3.13 stars / hour

WebDancer: Towards Autonomous Information Seeking Agency

alibaba-nlp/webagent 28 May 2025

We instantiate this framework in WebDancer, a web agent based on ReAct.

4,432
3.11 stars / hour

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing

FunAudioLLM/ThinkSound 26 Jun 2025

While end-to-end video-to-audio generation has greatly improved, producing high-fidelity audio that authentically captures the nuances of visual content remains challenging.

Audio Generation Large Language Model +1

788
1.35 stars / hour

Do Large Language Models Need a Content Delivery Network?

lmcache/lmcache 16 Sep 2024

As the use of large language models (LLMs) expands rapidly, so does the range of knowledge needed to supplement various LLM queries.

In-Context Learning

3,193
1.22 stars / hour

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

index-tts/index-tts 8 Feb 2025

Recently, large language model (LLM) based text-to-speech (TTS) systems have gradually become mainstream in the industry due to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise models.

Decoder Language Modeling +6

3,757
0.70 stars / hour

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

getzep/graphiti 20 Jan 2025

We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, in the Deep Memory Retrieval (DMR) benchmark.

RAG Retrieval +1

13,997
0.63 stars / hour

TradingAgents: Multi-Agents LLM Financial Trading Framework

tauricresearch/tradingagents 28 Dec 2024

Significant progress has been made in automated problem-solving using societies of agents powered by large language models (LLMs).

Management

15,605
0.58 stars / hour

Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts

cocowy1/smoe-stereo 7 Jul 2025

To address this, we propose SMoEStereo, a novel framework that adapts VFMs for stereo matching through a tailored, scene-specific fusion of Low-Rank Adaptation (LoRA) and Mixture-of-Experts (MoE) modules.

Inductive Bias Mixture-of-Experts +1

122
0.56 stars / hour

Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting

bytedance/dolphin 20 May 2025

Document image parsing is challenging due to its complexly intertwined elements such as text paragraphs, figures, formulas, and tables.

4,295
0.56 stars / hour