Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

hpcaitech/open-sora 12 Mar 2025

With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable.

Video Generation

25,569 stars
0.48 stars / hour

Visual-RFT: Visual Reinforcement Fine-Tuning

liuziyu77/visual-rft 3 Mar 2025

Reinforcement Fine-Tuning (RFT) in Large Reasoning Models like OpenAI o1 learns from feedback on its answers, which is especially useful in applications where fine-tuning data is scarce.

Few-Shot Object Detection, Fine-Grained Image Classification +4

1,356 stars
0.46 stars / hour

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

yvanyin/goalflow 7 Mar 2025

Furthermore, GoalFlow employs an efficient generative method, Flow Matching, to generate multimodal trajectories, and incorporates a refined scoring mechanism to select the optimal trajectory from the candidates.

Autonomous Driving, Denoising

75 stars
0.45 stars / hour

Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation

EnVision-Research/Kiss3DGen 3 Mar 2025

The normal maps are then used to reconstruct a 3D mesh, and the multi-view images provide texture mapping, resulting in a complete 3D model.

3D Generation, 3D Reconstruction +2

223 stars
0.45 stars / hour

Scaling Synthetic Data Creation with 1,000,000,000 Personas

camel-ai/camel 28 Jun 2024

We propose a novel persona-driven data synthesis methodology that leverages various perspectives within a large language model (LLM) to create diverse synthetic data.

Language Modeling, Language Modelling +3

10,861 stars
0.43 stars / hour

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

tidedra/lmm-r1 10 Mar 2025

Enhancing reasoning in Large Multimodal Models (LMMs) faces unique challenges from the complex interplay between visual perception and logical reasoning, particularly in compact 3B-parameter architectures where architectural constraints limit reasoning capacity and modality alignment.

Logical Reasoning, Multimodal Reasoning +1

624 stars
0.43 stars / hour

PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

x-plug/mobileagent 20 Feb 2025

From the decision-making perspective, to handle complex user instructions and interdependent subtasks more effectively, we propose a hierarchical multi-agent collaboration architecture that decomposes decision-making processes into Instruction-Subtask-Action levels.

Decision Making

3,881 stars
0.42 stars / hour

Evaluating LLM Reasoning in the Operations Research Domain with ORQA

nl4opt/ORQA 22 Dec 2024

In this paper, we introduce and apply Operations Research Question Answering (ORQA), a new benchmark designed to assess the generalization capabilities of Large Language Models (LLMs) in the specialized technical domain of Operations Research (OR).

Question Answering

44 stars
0.42 stars / hour

From System 1 to System 2: A Survey of Reasoning Large Language Models

zzli2022/awesome-slow-reason-system 24 Feb 2025

Achieving human-level intelligence requires refining the transition from the fast, intuitive System 1 to the slower, more deliberate System 2 reasoning.

Logical Reasoning

810 stars
0.42 stars / hour

VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control

TencentARC/VideoPainter 7 Mar 2025

Video inpainting, which aims to restore corrupted video content, has experienced substantial progress.

Image Inpainting, Optical Flow Estimation +3

265 stars
0.41 stars / hour