Olympus: A Universal Task Router for Computer Vision Tasks

yuanze-lin/Olympus 12 Dec 2024

We introduce Olympus, a new approach that transforms Multimodal Large Language Models (MLLMs) into a unified framework capable of handling a wide array of computer vision tasks.

281 stars (0.42 stars / hour)

RARE: Retrieval-Augmented Reasoning Modeling

open-dataflow/rare 30 Mar 2025

Domain-specific intelligence demands specialized knowledge and sophisticated reasoning for problem-solving, posing significant challenges for large language models (LLMs) that struggle with knowledge hallucination and inadequate reasoning capabilities under constrained parameter budgets.

Hallucination, Memorization

71 stars (0.40 stars / hour)

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

sparkaudio/spark-tts 3 Mar 2025

Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis.

Attribute, Text to Speech

8,493 stars (0.37 stars / hour)

TrackOcc: Camera-based 4D Panoptic Occupancy Tracking

tsinghua-mars-lab/trackocc 11 Mar 2025

In this work, we introduce a brand-new task, Camera-based 4D Panoptic Occupancy Tracking, which simultaneously addresses panoptic occupancy segmentation and object tracking from camera-only input.

3D Object Tracking, Object Tracking

23 stars (0.33 stars / hour)

Efficient Memory Management for Large Language Model Serving with PagedAttention

vllm-project/vllm 12 Sep 2023

On top of PagedAttention, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage.

Language Modeling

45,042 stars (0.32 stars / hour)
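PagedAttention stores each sequence's KV cache in fixed-size blocks that need not be contiguous in memory. A per-sequence block table maps logical blocks to physical ones, so memory is reserved only as tokens arrive. Blocks can also be shared by reference count, with copy-on-write when a shared block is modified. The sketch below models only that bookkeeping and rests on several assumptions. The class and method names (`BlockAllocator`, `PagedKVCache`, `append_token`, `fork`) are hypothetical and are not vLLM's API. The KV tensors themselves are omitted, so copy-on-write appears as a block swap with no data copy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class BlockAllocator:
    """Hands out fixed-size physical KV-cache blocks and reference-counts them."""

    def __init__(self, num_blocks: int) -> None:
        # Stack of free block ids; pop() returns the lowest id first.
        self.free_blocks = list(range(num_blocks - 1, -1, -1))
        self.refcount = [0] * num_blocks

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("out of KV-cache blocks")
        block = self.free_blocks.pop()
        self.refcount[block] = 1
        return block

    def share(self, block: int) -> int:
        self.refcount[block] += 1
        return block

    def release(self, block: int) -> None:
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            self.free_blocks.append(block)


@dataclass
class Sequence:
    # block_table[i] is the physical block holding logical block i.
    block_table: list[int] = field(default_factory=list)
    num_tokens: int = 0


class PagedKVCache:
    """Hypothetical bookkeeping layer; real KV data is not stored here."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.allocator = BlockAllocator(num_blocks)
        self.block_size = block_size

    def append_token(self, seq: Sequence) -> tuple[int, int]:
        """Reserve a slot for one new token; returns (physical_block, offset)."""
        offset = seq.num_tokens % self.block_size
        if offset == 0:
            # Last block is full (or there is none yet): allocate only now.
            seq.block_table.append(self.allocator.allocate())
        else:
            last = seq.block_table[-1]
            if self.allocator.refcount[last] > 1:
                # Copy-on-write: the partially filled block is shared, so this
                # sequence gets a private copy (data copy omitted here).
                new = self.allocator.allocate()
                self.allocator.release(last)
                seq.block_table[-1] = new
        seq.num_tokens += 1
        return seq.block_table[-1], offset

    def fork(self, parent: Sequence) -> Sequence:
        """Share every block of `parent` with a new child (e.g. parallel sampling)."""
        table = [self.allocator.share(b) for b in parent.block_table]
        return Sequence(block_table=table, num_tokens=parent.num_tokens)

    def free(self, seq: Sequence) -> None:
        for b in seq.block_table:
            self.allocator.release(b)
        seq.block_table.clear()
        seq.num_tokens = 0

    def wasted_slots(self, seq: Sequence) -> int:
        """Reserved but unused slots; at most block_size - 1 per sequence."""
        return len(seq.block_table) * self.block_size - seq.num_tokens


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=8, block_size=4)
    seq = Sequence()
    for _ in range(6):
        cache.append_token(seq)
    child = cache.fork(seq)
    cache.append_token(child)  # triggers copy-on-write of the shared last block
    print(seq.block_table, child.block_table, cache.wasted_slots(seq))
```

In this toy run, the parent ends with blocks `[0, 1]` and 2 unused slots. The child keeps sharing the full block 0 but receives its own copy of block 1. Waste is bounded by a single partially filled block per sequence rather than a worst-case contiguous reservation, which is the property the abstract refers to as near-zero waste.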

Optimal Stepsize for Diffusion Sampling

bebebe666/optimalsteps 27 Mar 2025

Diffusion models achieve remarkable generation quality but suffer from computationally intensive sampling due to suboptimal step discretization.

Denoising, Text-to-Image Generation

131 stars (0.32 stars / hour)

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

bytedance/infiniteyou 20 Mar 2025

Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX.

Image Generation

1,970 stars (0.31 stars / hour)

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

multi-swe-bench/multi-swe-bench 3 Apr 2025

The task of issue resolving is to modify a codebase to generate a patch that addresses a given issue.

Reinforcement Learning (RL)

108 stars (0.31 stars / hour)

MegaMath: Pushing the Limits of Open Math Corpora

llm360/megamath 3 Apr 2025

(3) Exploring synthetic data: we synthesized QA-style text, math-related code, and interleaved text-code blocks from web data or code data.

Diversity, Math

50 stars (0.30 stars / hour)