Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

thunlp/proactiveagent 16 Oct 2024

The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents.

170
0.76 stars / hour

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

antgroup/echomimic_v2 15 Nov 2024

Recent work on human animation usually involves audio, pose, or movement map conditions, thereby achieving vivid animation quality.

Audio-Driven Body Animation Human Animation +1

1,590
0.73 stars / hour

SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition

topdu/openocr 24 Nov 2024

In this paper, we propose SVTRv2, a CTC model that beats leading encoder-decoder-based models (EDTRs) in both accuracy and inference speed.

Decoder Optical Character Recognition (OCR) +1

317
0.66 stars / hour

OminiControl: Minimal and Universal Control for Diffusion Transformer

Yuanshi9815/OminiControl 22 Nov 2024

In this paper, we introduce OminiControl, a highly versatile and parameter-efficient framework that integrates image conditions into pre-trained Diffusion Transformer (DiT) models.

769
0.62 stars / hour

StableAnimator: High-Quality Identity-Preserving Human Image Animation

Francis-Rings/StableAnimator 26 Nov 2024

During inference, we propose a novel Hamilton-Jacobi-Bellman (HJB) equation-based optimization to further enhance the face quality.

Denoising Face Reenactment +3

210
0.60 stars / hour

MinerU: An Open-Source Solution for Precise Document Content Extraction

opendatalab/mineru 27 Sep 2024

Document content analysis has been a crucial research area in computer vision.

Diversity Optical Character Recognition (OCR)

20,511
0.59 stars / hour

MARS: Unleashing the Power of Variance Reduction for Training Large Models

AGI-Arena/MARS 15 Nov 2024

Despite the development of numerous variance reduction algorithms in the past decade aimed at accelerating stochastic optimization in both convex and nonconvex settings, variance reduction has not found widespread success in training deep neural networks or large language models.

Stochastic Optimization

357
0.54 stars / hour

GraphCast: Learning skillful medium-range global weather forecasting

google-deepmind/graphcast 24 Dec 2022

Global medium-range weather forecasting is critical to decision-making across many social and economic domains.

Decision Making Weather Forecasting

5,021
0.53 stars / hour

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

yangchris11/samurai 18 Nov 2024

The Segment Anything Model 2 (SAM 2) has demonstrated strong performance in object segmentation tasks but faces challenges in visual object tracking, particularly when managing crowded scenes with fast-moving or self-occluding objects.

Visual Object Tracking Visual Tracking

5,686
0.52 stars / hour

XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL

XGenerationLab/XiYan-SQL 13 Nov 2024

On the other hand, we implement the in-context learning (ICL) approach with an example selection method based on named entity recognition to prevent overemphasis on entities.

Diversity In-Context Learning +3

143
0.51 stars / hour