SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

mit-han-lab/nunchaku 7 Nov 2024

To address this, we co-design an inference engine Nunchaku that fuses the kernels of the low-rank branch into those of the low-bit branch to cut off redundant memory access.

Quantization

465
0.33 stars / hour

FilterNet: Harnessing Frequency Filters for Time Series Forecasting

aikunyi/filternet 3 Nov 2024

While numerous forecasters have been proposed using different network architectures, Transformer-based models achieve state-of-the-art performance in time series forecasting.

Time Series, Time Series Forecasting

91
0.33 stars / hour

Docling Technical Report

DS4SD/docling 19 Aug 2024

This technical report introduces Docling, an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion.

12,894
0.32 stars / hour

How to Correctly do Semantic Backpropagation on Language-based Agentic Systems

hishamalyahya/semantic_backprop 4 Dec 2024

Language-based agentic systems have shown great promise in recent years, transitioning from solving small-scale research problems to being deployed in challenging real-world tasks.

GSM8K

24
0.32 stars / hour

DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting

chenhoy/droid-splat 26 Nov 2024

Recent progress in scene synthesis makes standalone SLAM systems purely based on optimizing hyperprimitives with a Rendering objective possible.

Camera Calibration, Depth Estimation, +1

237
0.32 stars / hour

DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding

idea-research/dino-x-api 21 Nov 2024

DINO-X employs the same Transformer-based encoder-decoder architecture as Grounding DINO 1.5 to pursue an object-level representation for open-world object understanding.

Long-tailed Object Detection, Object, +4

494
0.31 stars / hour

X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models

sunzey/x-prompt 2 Dec 2024

Extensive experiments validate the model's performance across diverse seen image generation tasks and its capacity to generalize to previously unseen tasks.

In-Context Learning, Language Modelling, +1

87
0.31 stars / hour

MinerU: An Open-Source Solution for Precise Document Content Extraction

opendatalab/mineru 27 Sep 2024

Document content analysis has been a crucial research area in computer vision.

Diversity, Optical Character Recognition (OCR)

20,794
0.30 stars / hour

Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail

bartn8/stereoanywhere 5 Dec 2024

We introduce Stereo Anywhere, a novel stereo-matching framework that combines geometric constraints with robust priors from monocular depth Vision Foundation Models (VFMs).

Stereo Matching, Zero-shot Generalization

61
0.30 stars / hour

StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows

ag2ai/ag2 17 Mar 2024

In StateFlow, we distinguish between "process grounding" (via state and state transitions) and "sub-task solving" (through actions within a state), enhancing control and interpretability of the task-solving procedure.

Management

847
0.30 stars / hour