SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator

HKUDS/SepLLM 16 Dec 2024

This observation suggests that information of the segments between these separator tokens can be effectively condensed into the separator tokens themselves without significant information loss.

GSM8K Language Modeling +1

307
0.37 stars / hour

qlib

microsoft/qlib 10 Jul 2020

Qlib is an AI-oriented quantitative investment platform that aims to use AI to empower quantitative research, from exploring ideas to deploying them in production.

Machine Translation Scheduling +1

27,160
0.32 stars / hour

s3: You Don't Need That Much Data to Train a Search Agent via RL

pat-jj/s3 20 May 2025

Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference.

RAG Reinforcement Learning (RL) +2

469
0.30 stars / hour

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

lukas-blecher/LaTeX-OCR ICLR 2021

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited.

Ranked #1 on Image Classification on CIFAR-10 (using extra training data)

Image Classification Semantic Segmentation

15,002
0.29 stars / hour

OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks

facebookresearch/olla 24 Oct 2022

We present OLLA, an algorithm that optimizes the lifetime and memory location of the tensors used to train neural networks.

97
0.29 stars / hour

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

yfzhang114/r1_reward 5 May 2025

Our reward model, R1-Reward, trained using the StableReinforce algorithm on this dataset, significantly improves performance on multimodal reward modeling benchmarks.

Reinforcement Learning (RL)

232
0.29 stars / hour

Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

charlesq9/alita 26 May 2025

For maximal self-evolution, we enable the creativity of Alita by providing a suite of general-purpose components with which it can autonomously construct, refine, and reuse external capabilities by generating task-related model context protocols (MCPs) from open source, which contributes to scalable agentic reasoning.

692
0.28 stars / hour

Fine-grained and accurate source code differencing

GumTreeDiff/gumtree ACM/IEEE International Conference on Automated Software Engineering 2014

At the heart of software evolution is a sequence of edit actions, called an edit script, made to a source code file.

1,140
0.27 stars / hour

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

meigen-ai/multitalk 28 May 2025

Audio-driven human animation methods, such as talking head and talking body generation, have made remarkable progress in generating videos with synchronized facial movements and appealing visual quality.

Human Animation Instruction Following +1

1,347
0.27 stars / hour

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

Zhangwenyao1/DreamVLA 7 Jul 2025

However, existing methods are limited to challenging image-based forecasting, which suffers from redundant information and lacks comprehensive and critical world knowledge, including dynamic, spatial and semantic information.

Image Generation Multimodal Reasoning +3

92
0.26 stars / hour