REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

openrlhf/openrlhf 4 Jan 2025

Reinforcement Learning from Human Feedback (RLHF) has emerged as a critical approach for aligning large language models with human preferences, witnessing rapid algorithmic evolution through methods such as Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), REINFORCE Leave-One-Out (RLOO), ReMax, and Group Relative Policy Optimization (GRPO).

Computational Efficiency

4,619
0.66 stars / hour

Kimi k1.5: Scaling Reinforcement Learning with LLMs

moonshotai/kimi-k1.5 22 Jan 2025

Moreover, we present effective long2short methods that use long-CoT techniques to improve short-CoT models, yielding state-of-the-art short-CoT reasoning results (e.g., 60.8 on AIME, 94.6 on MATH500, 47.3 on LiveCodeBench), outperforming existing short-CoT models such as GPT-4o and Claude Sonnet 3.5 by a large margin (up to +550%).

Math, reinforcement-learning, +2

2,853
0.57 stars / hour

IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

plurai-ai/intellagent 19 Jan 2025

IntellAgent represents a paradigm shift in evaluating conversational AI.

Navigate

720
0.56 stars / hour

Process Reinforcement through Implicit Rewards

prime-rl/prime 3 Feb 2025

Dense rewards are an appealing choice for the reinforcement learning (RL) of LLMs, since their fine-grained feedback has the potential to address some inherent issues of outcome rewards, such as training efficiency and credit assignment. This potential, however, remains largely unrealized.

Math, Reinforcement Learning (RL)

1,212
0.55 stars / hour

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

bytedance/ui-tars 21 Jan 2025

This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations).

2,317
0.55 stars / hour

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step

ziyuguo99/image-generation-cot 23 Jan 2025

We hope our study provides unique insights and paves a new path for integrating CoT reasoning with autoregressive image generation.

Image Generation

433
0.53 stars / hour

ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills

lecar-lab/asap 3 Feb 2025

In the second stage, we deploy the policies in the real world and collect real-world data to train a delta (residual) action model that compensates for the dynamics mismatch.

530
0.51 stars / hour

Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding

xid32/naacl_2025_twm 9 Feb 2025

To overcome these challenges, we introduce a specialized cognitive module, temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of MFMs.

Image Captioning, Image-text Retrieval, +5

49
0.50 stars / hour

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

gusye1234/nano-graphrag 24 Apr 2024

To combine the strengths of these contrasting methods, we propose a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed.

Query-focused Summarization, Question Answering, +2

2,359
0.46 stars / hour