Agent S: An Open Agentic Framework that Uses Computers Like a Human

simular-ai/agent-s 10 Oct 2024

We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), aimed at transforming human-computer interaction by automating complex, multi-step tasks.

AI Agent Task Planning

1,142
1.02 stars / hour

Visual-RFT: Visual Reinforcement Fine-Tuning

liuziyu77/visual-rft 3 Mar 2025

Reinforcement Fine-Tuning (RFT) in Large Reasoning Models like OpenAI o1 learns from feedback on its answers, which is especially useful in applications where fine-tuning data is scarce.

Few-Shot Object Detection Fine-Grained Image Classification +4

1,245
0.99 stars / hour

Executable Code Actions Elicit Better LLM Agents

xingyaoww/code-act 1 Feb 2024

LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools).

Language Modelling Large Language Model

798
0.95 stars / hour

GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control

nv-tlabs/GEN3C 5 Mar 2025

Our results demonstrate more precise camera control than prior work, as well as state-of-the-art results in sparse-view novel view synthesis, even in challenging settings such as driving scenes and monocular dynamic video.

Novel View Synthesis Video Generation

387
0.94 stars / hour

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

dvlab-research/Seg-Zero 9 Mar 2025

Traditional methods for reasoning segmentation rely on supervised fine-tuning with categorical labels and simple descriptions, limiting their out-of-domain generalization and lacking explicit reasoning processes.

Domain Generalization Open Vocabulary Object Detection +6

189
0.87 stars / hour

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

hustvl/alphadrive 10 Mar 2025

Some studies integrate vision-language models (VLMs) into autonomous driving, but they typically rely on pre-trained models with simple supervised fine-tuning (SFT) on driving data, without further exploration of training strategies or optimizations specifically tailored for planning.

Autonomous Driving Common Sense Reasoning +1

107
0.72 stars / hour

LLM4AD: A Platform for Algorithm Design with Large Language Model

optima-cityu/llm4ad 23 Dec 2024

We introduce LLM4AD, a unified Python platform for algorithm design (AD) with large language models (LLMs).

Language Modeling Language Modelling +2

250
0.70 stars / hour

Self-rewarding correction for mathematical reasoning

volcengine/verl 26 Feb 2025

We study self-rewarding reasoning large language models (LLMs), which can simultaneously generate step-by-step reasoning and evaluate the correctness of their outputs at inference time, without external feedback.

Mathematical Reasoning

4,803
0.67 stars / hour

LanPaint: Training-Free Diffusion Inpainting with Exact and Fast Conditional Inference

scraed/LanPaint 5 Feb 2025

Diffusion models generate high-quality images but often lack efficient and universally applicable inpainting capabilities, particularly in community-trained models.

158
0.66 stars / hour