Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

thunlp/proactiveagent 16 Oct 2024

The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents.

97
0.82 stars / hour

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

yangchris11/samurai 18 Nov 2024

The Segment Anything Model 2 (SAM 2) has demonstrated strong performance in object segmentation tasks but faces challenges in visual object tracking, particularly when managing crowded scenes with fast-moving or self-occluding objects.

Visual Object Tracking, Visual Tracking

5,421
0.82 stars / hour

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

aidc-ai/marco-o1 21 Nov 2024

OpenAI o1 has recently sparked a surge of interest in the study of large reasoning models (LRMs).

Reinforcement Learning (RL)

1,033
0.78 stars / hour

MinerU: An Open-Source Solution for Precise Document Content Extraction

opendatalab/mineru 27 Sep 2024

Document content analysis has been a crucial research area in computer vision.

Diversity, Optical Character Recognition (OCR)

19,911
0.77 stars / hour

StableAnimator: High-Quality Identity-Preserving Human Image Animation

Francis-Rings/StableAnimator 26 Nov 2024

During inference, we propose a novel Hamilton-Jacobi-Bellman (HJB) equation-based optimization to further enhance the face quality.

Denoising, Face Reenactment, +3

144
0.74 stars / hour

DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving

hustvl/diffusiondrive 22 Nov 2024

However, the numerous denoising steps in the robotic diffusion policy and the more dynamic, open-world nature of traffic scenes pose substantial challenges for generating diverse driving actions at real-time speed.

Autonomous Driving, Denoising

204
0.69 stars / hour

DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting

chenhoy/droid-splat 26 Nov 2024

Recent progress in scene synthesis makes it possible to build standalone SLAM systems based purely on optimizing hyperprimitives with a rendering objective.

Camera Calibration, Depth Estimation, +1

164
0.68 stars / hour

MureObjectStitch: Multi-reference Image Composition

bcmi/mureobjectstitch-image-composition 12 Nov 2024

Generative image composition aims to regenerate the given foreground object in the background image to produce a realistic composite image.

Object

104
0.65 stars / hour

3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

convexsplatting/convex-splatting 22 Nov 2024

Our results highlight the potential of 3D Convex Splatting to become the new standard for high-quality scene reconstruction and novel view synthesis.

Novel View Synthesis

158
0.61 stars / hour

One Diffusion to Generate Them All

lehduong/onediffusion 25 Nov 2024

Experimental results demonstrate competitive performance across both generation and prediction tasks, such as text-to-image, multiview generation, ID preservation, depth estimation, and camera pose estimation, despite a relatively small training dataset.

Camera Pose Estimation, Deblurring, +4

248
0.57 stars / hour