olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models

allenai/olmocr 25 Feb 2025

PDF documents have the potential to provide trillions of novel, high-quality tokens for training language models.

Diversity Language Modeling

11,896
0.36 stars / hour

LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds

aigc3d/LHM 13 Mar 2025

Animatable 3D human reconstruction from a single image is a challenging problem due to the ambiguity in decoupling geometry, appearance, and deformation.

3D Human Reconstruction

1,970
0.36 stars / hour

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

getzep/graphiti 20 Jan 2025

We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, in the Deep Memory Retrieval (DMR) benchmark.

RAG Retrieval

4,281
0.36 stars / hour

Olympus: A Universal Task Router for Computer Vision Tasks

yuanze-lin/Olympus 12 Dec 2024

We introduce Olympus, a new approach that transforms Multimodal Large Language Models (MLLMs) into a unified framework capable of handling a wide array of computer vision tasks.

369
0.34 stars / hour

3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

nv-tlabs/3dgrut 17 Dec 2024

3D Gaussian Splatting (3DGS) enables efficient reconstruction and high-fidelity real-time rendering of complex scenes on consumer hardware.

3DGS

681
0.33 stars / hour

CHORDONOMICON: A Dataset of 666,000 Songs and their Chord Progressions

spyroskantarelis/chordonomicon 29 Oct 2024

Chord progressions encapsulate important information about music, pertaining to its structure and conveyed emotions.

71
0.21 stars / hour

What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph

jytmelon/g-prune 4 Jan 2025

Recent Multimodal Large Language Models (MLLMs) often use a large number of visual tokens to compensate for their visual shortcomings, leading to excessive computation and obvious visual redundancy.

TextVQA

62
0.31 stars / hour

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

simular-ai/agent-s 1 Apr 2025

Computer use agents automate digital tasks by directly interacting with graphical user interfaces (GUIs) on computers and mobile devices, offering significant potential to enhance human productivity by completing an open-ended space of user queries.

AI Agent Task Planning

2,429
0.32 stars / hour

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

luo-junyu/awesome-agent-papers 27 Mar 2025

The era of intelligent agents is upon us, driven by revolutionary advancements in large language models.

Language Modeling

483
0.30 stars / hour

Sleep-time Compute: Beyond Inference Scaling at Test-time

letta-ai/sleep-time-compute 17 Apr 2025

Scaling test-time compute has emerged as a key ingredient for enabling large language models (LLMs) to solve difficult problems, but comes with high latency and inference cost.

49
0.30 stars / hour