MinerU: An Open-Source Solution for Precise Document Content Extraction

opendatalab/mineru 27 Sep 2024

Document content analysis has been a crucial research area in computer vision.

Diversity Optical Character Recognition (OCR)

31,125
0.38 stars / hour

Video-R1: Reinforcing Video Reasoning in MLLMs

tulerfeng/video-r1 27 Mar 2025

However, directly applying RL training with the GRPO algorithm to video reasoning presents two primary challenges: (i) a lack of temporal modeling for video reasoning, and (ii) the scarcity of high-quality video-reasoning data.

MVBench Reinforcement Learning (RL) +1

439
0.38 stars / hour

TextArena

leonguertler/textarena 15 Apr 2025

TextArena is an open-source collection of competitive text-based games for training and evaluation of agentic behavior in Large Language Models (LLMs).

text-based games

126
0.37 stars / hour

DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments

gair-nlp/deepresearcher 4 Apr 2025

In this paper, we introduce DeepResearcher, the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions.

Navigate Prompt Engineering +2

234
0.37 stars / hour

DDT: Decoupled Diffusion Transformer

MCG-NJU/DDT 8 Apr 2025

For ImageNet $256\times256$, our DDT-XL/2 achieves a new state-of-the-art performance of 1.31 FID (nearly $4\times$ faster training convergence compared to previous diffusion transformers).

Denoising

191
0.36 stars / hour

MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data

PaulBorneP/MESA 9 Apr 2025

Terrain modeling has traditionally relied on procedural techniques, which often require extensive domain expertise and handcrafted rules.

13
0.36 stars / hour

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

sparkaudio/spark-tts 3 Mar 2025

Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis.

Attribute Text to Speech +1

8,552
0.36 stars / hour

Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory

suzgunmirac/dynamic-cheatsheet 10 Apr 2025

Overall, our findings present DC as a promising approach for augmenting LMs with persistent memory, bridging the divide between isolated inference events and the cumulative, experience-driven learning characteristic of human cognition.

Math MMLU

46
0.35 stars / hour

TikZero: Zero-Shot Text-Guided Graphics Program Synthesis

potamides/detikzify 14 Mar 2025

Meanwhile, large amounts of unaligned graphics programs and captioned raster images are more readily available.

Program Synthesis

969
0.35 stars / hour