Relative Pose Estimation through Affine Corrections of Monocular Depth Priors

markyu98/madpose 9 Jan 2025

In this paper, we develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities, covering both calibrated and uncalibrated conditions.

Monocular Depth Estimation Pose Estimation

167
0.46 stars / hour

Holistic Fusion: Task- and Setup-Agnostic Robot Localization and State Estimation with Factor Graphs

leggedrobotics/holistic_fusion 8 Apr 2025

Seamless operation of mobile robots in challenging environments requires low-latency local motion estimation (e.g., dynamic maneuvers) and accurate global localization (e.g., wayfinding).

Motion Estimation Sensor Fusion

75
0.45 stars / hour

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

bytedance/ui-tars 21 Jan 2025

This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations).

4,011
0.42 stars / hour

TerraTorch: The Geospatial Foundation Models Toolkit

IBM/terratorch 26 Mar 2025

TerraTorch is a fine-tuning and benchmarking toolkit for Geospatial Foundation Models built on PyTorch Lightning and tailored for satellite, weather, and climate data.

Benchmarking Decoder +2

413
0.41 stars / hour

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

foundationagents/awesome-foundation-agents 31 Mar 2025

The advent of large language models (LLMs) has catalyzed a transformative shift in artificial intelligence, paving the way for advanced intelligent agents capable of sophisticated reasoning, robust perception, and versatile action across diverse domains.

Ranked #1 on Continual Learning on AIDS (using extra training data)

AutoML Continual Learning

774
0.40 stars / hour

MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse

hiyouga/easyr1 24 Mar 2025

We present MetaSpatial, the first reinforcement learning (RL)-based framework designed to enhance 3D spatial reasoning in vision-language models (VLMs), enabling real-time 3D scene generation without the need for hard-coded optimizations.

Layout Generation Reinforcement Learning (RL) +2

2,049
0.40 stars / hour

Bitnet.cpp: Efficient Edge Inference for Ternary LLMs

microsoft/bitnet 17 Feb 2025

The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs.

13,388
0.40 stars / hour

Olympus: A Universal Task Router for Computer Vision Tasks

yuanze-lin/Olympus 12 Dec 2024

We introduce Olympus, a new approach that transforms Multimodal Large Language Models (MLLMs) into a unified framework capable of handling a wide array of computer vision tasks.

292
0.39 stars / hour

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

mit-han-lab/nunchaku 7 Nov 2024

To address this, we co-design an inference engine Nunchaku that fuses the kernels of the low-rank branch into those of the low-bit branch to cut off redundant memory access.

Quantization

1,426
0.39 stars / hour