GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

yvanyin/goalflow 7 Mar 2025

GoalFlow employs an efficient generative method, Flow Matching, to generate multimodal trajectories, and incorporates a refined scoring mechanism to select the optimal trajectory from the candidates.

Autonomous Driving, Denoising

65
0.34 stars / hour

PE3R: Perception-Efficient 3D Reconstruction

hujiecpp/pe3r 10 Mar 2025

PE3R employs a feed-forward architecture to enable rapid 3D semantic field reconstruction.

3D Reconstruction, Zero-shot Generalization

260
0.34 stars / hour

Inductive Moment Matching

lumalabs/imm 10 Mar 2025

Diffusion models and Flow Matching generate high-quality samples but are slow at inference, and distilling them into few-step models often leads to instability and extensive tuning.

393
0.34 stars / hour

AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

opendrivelab/agibot-world 9 Mar 2025

Introducing AgiBot World, a large-scale platform comprising over 1 million trajectories across 217 tasks in five deployment scenarios, we achieve an order-of-magnitude increase in data scale compared to existing datasets.

1,792
0.34 stars / hour

Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering

nvlabs/svraster 5 Dec 2024

We propose an efficient radiance field rendering algorithm that incorporates a rasterization process on adaptive sparse voxels without neural networks or 3D Gaussians.

Novel View Synthesis

438
0.34 stars / hour

A Distractor-Aware Memory for Visual Object Tracking with SAM2

jovanavidenovic/dam4sam 26 Nov 2024

We argue that a more sophisticated memory model is required, and propose a new distractor-aware memory model for SAM2 and an introspection-based update strategy that jointly address segmentation accuracy as well as tracking robustness.

Semi-Supervised Video Object Segmentation, Visual Object Tracking

226
0.33 stars / hour

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

hjyao00/mulberry 24 Dec 2024

Using CoMCTS, we construct Mulberry-260k, a multimodal dataset with a tree of rich, explicit and well-defined reasoning nodes for each question.

978
0.33 stars / hour

HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States

leigest519/hiddendetect 20 Feb 2025

The integration of additional modalities increases the susceptibility of large vision-language models (LVLMs) to safety risks, such as jailbreak attacks, compared to their language-only counterparts.

108
0.33 stars / hour

MOAT: Evaluating LMMs for Capability Integration and Instruction Grounding

Cambrian-yzt/MOAT 12 Mar 2025

There remains a significant gap between state-of-the-art LMMs and human performance when it comes to complex tasks that require a combination of fundamental VL capabilities, as well as tasks involving the grounding of complex instructions.

19
0.32 stars / hour

Chain of Draft: Thinking Faster by Writing Less

sileix/chain-of-draft 25 Feb 2025

Large Language Models (LLMs) have demonstrated remarkable performance in solving complex reasoning tasks through mechanisms like Chain-of-Thought (CoT) prompting, which emphasizes verbose, step-by-step reasoning.

190
0.32 stars / hour