FAST-LIVO2: Fast, Direct LiDAR-Inertial-Visual Odometry

hku-mars/fast-livo2 26 Aug 2024

The fusion of both visual and LiDAR measurements is based on a single unified voxel map where the LiDAR module constructs the geometric structure for registering new LiDAR scans and the visual module attaches image patches to the LiDAR points.

Visual Odometry

1,020
0.33 stars / hour

MatMamba: A Matryoshka State Space Model

scaledfoundations/matmamba 9 Oct 2024

In this work, we present MatMamba: a state space model which combines Matryoshka-style learning with Mamba2, by modifying the block to contain nested dimensions to enable joint training and adaptive inference.

Representation Learning, State Space Models

34
0.33 stars / hour

How to Train Long-Context Language Models (Effectively)

princeton-nlp/prolong 3 Oct 2024

We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information.

85
0.33 stars / hour

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

thu-ml/SageAttention 3 Oct 2024

Although quantization has proven to be an effective method for accelerating model inference, existing quantization methods primarily focus on optimizing the linear layer.

Image Generation, Quantization, +1

158
0.32 stars / hour

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

FunAudioLLM/SenseVoice 4 Jul 2024

This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs).

Emotion Recognition, Event Detection, +6

2,996
0.32 stars / hour

PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

stevenlsw/physgen 27 Sep 2024

We present PhysGen, a novel image-to-video generation method that converts a single image and an input condition (e.g., force and torque applied to an object in the image) to produce a realistic, physically plausible, and temporally consistent video.

Image to Video Generation

173
0.32 stars / hour

PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation

nvlabs/radio 2 Oct 2024

Various visual foundation models have distinct strengths and weaknesses, both of which can be improved through heterogeneous multi-teacher knowledge distillation without labels, termed "agglomerative models."

Knowledge Distillation

709
0.32 stars / hour

Q-VLM: Post-training Quantization for Large Vision-Language Models

changyuanwang17/qvlm 10 Oct 2024

We mine the cross-layer dependency that significantly influences discretization errors of the entire vision-language model, and embed this dependency into optimal quantization strategy searching with low search cost.

Language Modelling, Quantization

29
0.31 stars / hour

A Multi-Level Superoptimizer for Tensor Programs

mirage-project/mirage 9 May 2024

We introduce Mirage, the first multi-level superoptimizer for tensor programs.

Navigate

518
0.30 stars / hour

Grounding Image Matching in 3D with MASt3R

naver/mast3r 14 Jun 2024

Image Matching is a core component of all best-performing algorithms and pipelines in 3D vision.

3D Reconstruction

1,021
0.28 stars / hour