SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video

pku-yuangroup/swapanyone 12 Mar 2025

Video body-swapping, which aims to replace the body in an existing video with a new body from an arbitrary source, has garnered increasing attention in recent years.

Video Inpainting

53
0.43 stars / hour

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities

NVIDIA/audio-flamingo 6 Mar 2025

Fine-tuning AF2 on LongAudio leads to exceptional performance on our proposed LongAudioBench, an expert annotated benchmark for evaluating ALMs on long audio understanding capabilities.

Audio captioning Language Modeling +2

390
0.40 stars / hour

Visual-RFT: Visual Reinforcement Fine-Tuning

liuziyu77/visual-rft 3 Mar 2025

Reinforcement Fine-Tuning (RFT) in Large Reasoning Models like OpenAI o1 learns from feedback on its answers, which is especially useful in applications where fine-tuning data is scarce.

Few-Shot Object Detection Fine-Grained Image Classification +4

1,327
0.40 stars / hour

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

pku-alignment/align-anything 20 Dec 2024

In this work, we make the first attempt to fine-tune all-modality models (i.e., input and output with any modality, also named any-to-any models) using human preference data across all modalities (including text, image, audio, and video), ensuring their behavior aligns with human intentions.

All Instruction Following

2,888
0.38 stars / hour

LanPaint: Training-Free Diffusion Inpainting with Exact and Fast Conditional Inference

scraed/LanPaint 5 Feb 2025

Diffusion models generate high-quality images but often lack efficient and universally applicable inpainting capabilities, particularly in community-trained models.

179
0.37 stars / hour

PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

x-plug/mobileagent 20 Feb 2025

From the decision-making perspective, to handle complex user instructions and interdependent subtasks more effectively, we propose a hierarchical multi-agent collaboration architecture that decomposes decision-making processes into Instruction-Subtask-Action levels.

Decision Making

3,823
0.37 stars / hour

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

modalminds/mm-eureka 10 Mar 2025

We present MM-Eureka, a multimodal reasoning model that successfully extends large-scale rule-based reinforcement learning (RL) to multimodal reasoning.

Multimodal Reasoning Reinforcement Learning (RL)

380
0.36 stars / hour

Scaling Synthetic Data Creation with 1,000,000,000 Personas

lightaime/camel 28 Jun 2024

We propose a novel persona-driven data synthesis methodology that leverages various perspectives within a large language model (LLM) to create diverse synthetic data.

Language Modeling Language Modelling +3

10,573
0.36 stars / hour

VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control

TencentARC/VideoPainter 7 Mar 2025

Video inpainting, which aims to restore corrupted video content, has experienced substantial progress.

Image Inpainting Optical Flow Estimation +3

251
0.35 stars / hour

Atom of Thoughts for Markov LLM Test-Time Scaling

qixucen/atom 17 Feb 2025

Based on this observation, we propose Atom of Thoughts (AoT), where each state transition in the reasoning process consists of decomposing the current question into a dependency-based directed acyclic graph and contracting its subquestions, forming a new atomic question state.

500
0.35 stars / hour