D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement

Peterande/D-FINE 17 Oct 2024

When pretrained on Objects365, D-FINE-L / X attains 57.1% / 59.3% AP, surpassing all existing real-time detectors.

Ranked #1 on Real-Time Object Detection on MS COCO (using extra training data)

Real-Time Object Detection, regression

651 stars, 0.48 stars / hour

KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation

openspg/kag 10 Sep 2024

The recently developed retrieval-augmented generation (RAG) technology has enabled the efficient construction of domain-specific applications.

Knowledge Graphs, Question Answering

521 stars, 0.47 stars / hour

Extended Agriculture-Vision: An Extension of a Large Aerial Image Dataset for Agricultural Pattern Analysis

jingwu6/extended-agriculture-vision-dataset 4 Mar 2023

First, we generate and release an improved version of the Agriculture-Vision dataset (Chiu et al., 2020b) to include raw, full-field imagery for greater experimental flexibility.

Benchmarking, Contrastive Learning

160 stars, 0.44 stars / hour

Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks

jill0001/leopard 2 Oct 2024

Tasks involving multiple text-rich images are especially challenging, as they require not only understanding the content of individual images but reasoning about inter-relationships and logical flows across multiple visual inputs.

Language Modelling

122 stars, 0.44 stars / hour

HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models

HelloVision/HelloMeme 30 Oct 2024

We propose an effective method for inserting adapters into text-to-image foundation models, which enables the execution of complex downstream tasks while preserving the generalization ability of the base model.

Video Generation

114 stars, 0.43 stars / hour

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

SWivid/F5-TTS 9 Oct 2024

Its inference-time sampling strategy for flow steps (Sway Sampling) can be easily applied to existing flow-matching-based models without retraining.

Denoising, Text to Speech

6,810 stars, 0.43 stars / hour

Degradation-Aware Residual-Conditioned Optimal Transport for Unified Image Restoration

xl-tang3/DA-RCOT 3 Nov 2024

More crucially, we design the transport map for restoration as a two-pass DA-RCOT map, in which the transport residual is computed in the first pass and then encoded as multi-scale residual embeddings to condition the second-pass restoration.

Unified Image Restoration

36 stars, 0.43 stars / hour

CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes

DekuLiuTesla/CityGaussian 1 Nov 2024

Recently, 3D Gaussian Splatting (3DGS) has revolutionized radiance field reconstruction, enabling efficient, high-fidelity novel view synthesis.

Novel View Synthesis

474 stars, 0.42 stars / hour

Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities

gpt-omni/mini-omni2 15 Oct 2024

Mini-Omni2 can understand visual, auditory, and textual modalities, directly output audio, and support flexible duplex interaction.

Language Modelling

1,514 stars, 0.41 stars / hour