AgentSquare: Automatic LLM Agent Search in Modular Design Space

tsinghua-fib-lab/agentsquare 8 Oct 2024

We believe that the modular design space and AgentSquare search framework offer a platform for fully exploiting the potential of prior successful designs and consolidating the collective efforts of research community.

107
0.49 stars / hour

D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement

Peterande/D-FINE 17 Oct 2024

When pretrained on Objects365, D-FINE-L / X attains 57. 1% / 59. 3% AP, surpassing all existing real-time detectors.

 Ranked #1 on Real-Time Object Detection on MS COCO (using extra training data)

Real-Time Object Detection regression

649
0.48 stars / hour

KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation

openspg/kag 10 Sep 2024

The recently developed retrieval-augmented generation (RAG) technology has enabled the efficient construction of domain-specific applications.

Knowledge Graphs Question Answering +2

519
0.47 stars / hour

Extended Agriculture-Vision: An Extension of a Large Aerial Image Dataset for Agricultural Pattern Analysis

jingwu6/extended-agriculture-vision-dataset 4 Mar 2023

First, we generate and release an improved version of the Agriculture-Vision dataset (Chiu et al., 2020b) to include raw, full-field imagery for greater experimental flexibility.

Benchmarking Contrastive Learning +2

115
0.44 stars / hour

Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks

jill0001/leopard 2 Oct 2024

Tasks involving multiple text-rich images are especially challenging, as they require not only understanding the content of individual images but reasoning about inter-relationships and logical flows across multiple visual inputs.

Language Modelling

122
0.44 stars / hour

HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models

HelloVision/HelloMeme 30 Oct 2024

We propose an effective method for inserting adapters into text-to-image foundation models, which enables the execution of complex downstream tasks while preserving the generalization ability of the base model.

Video Generation

112
0.43 stars / hour

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

SWivid/F5-TTS 9 Oct 2024

This sampling strategy for flow step can be easily applied to existing flow matching based models without retraining.

Denoising Text to Speech

6,771
0.43 stars / hour

Degradation-Aware Residual-Conditioned Optimal Transport for Unified Image Restoration

xl-tang3/DA-RCOT 3 Nov 2024

More crucially, we design the transport map for restoration as a two-pass DA-RCOT map, in which the transport residual is computed in the first pass and then encoded as multi-scale residual embeddings to condition the second-pass restoration.

Unified Image Restoration

33
0.43 stars / hour

CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes

DekuLiuTesla/CityGaussian 1 Nov 2024

Recently, 3D Gaussian Splatting (3DGS) has revolutionized radiance field reconstruction, manifesting efficient and high-fidelity novel view synthesis.

Novel View Synthesis

469
0.42 stars / hour