Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

youngsheen/SimVQ 4 Nov 2024

VQ models are often hindered by representation collapse in the latent space, which leads to low codebook utilization and limits the scalability of the codebook for large-scale training.

Quantization Representation Learning

74
0.58 stars / hour

LightRAG: Simple and Fast Retrieval-Augmented Generation

hkuds/lightrag 8 Oct 2024

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs.

Information Retrieval RAG +1

7,365
0.61 stars / hour

PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting

cvlab-kaist/PF3plat 29 Oct 2024

We then introduce lightweight, learnable modules to refine depth and pose estimates from the coarse alignments, improving the quality of 3D reconstruction and novel view synthesis.

3D Reconstruction Monocular Depth Estimation +1

119
0.55 stars / hour

HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models

HelloVision/HelloMeme 30 Oct 2024

We propose an effective method for inserting adapters into text-to-image foundation models, which enables the execution of complex downstream tasks while preserving the generalization ability of the base model.

Video Generation

106
0.48 stars / hour

AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

THUDM/Android-Lab 31 Oct 2024

AndroidLab supports both large language models (LLMs) and multimodal models (LMMs) in the same action space.

Benchmarking

92
0.48 stars / hour

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

SWivid/F5-TTS 9 Oct 2024

This sampling strategy for flow steps can be easily applied to existing flow-matching-based models without retraining.

Denoising Text to Speech

6,707
0.49 stars / hour

LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models

Fsoft-AIC/LibMoE 1 Nov 2024

Built upon three core principles, (i) modular design, (ii) efficient training, and (iii) comprehensive evaluation, LibMoE makes MoE in LLMs more accessible to a wide range of researchers by standardizing the training and evaluation pipelines.

Benchmarking

25
0.47 stars / hour

Agent S: An Open Agentic Framework that Uses Computers Like a Human

simular-ai/agent-s 10 Oct 2024

We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), aimed at transforming human-computer interaction by automating complex, multi-step tasks.

AI Agent

557
0.30 stars / hour

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

haiyang-w/tokenformer 30 Oct 2024

By treating model parameters as tokens, we replace all the linear projections in Transformers with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values.

164
0.44 stars / hour
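
The token-parameter attention idea in the TokenFormer entry above (input tokens as queries, model parameters as keys and values) can be sketched roughly as follows. This is a minimal illustration that assumes plain scaled softmax attention; the function names, shapes, and normalization are hypothetical and are not taken from the haiyang-w/tokenformer repository, whose actual layer may differ.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_parameter_attention(tokens, key_params, value_params):
    """Hypothetical stand-in for a linear projection.

    tokens:       n_tokens x d_in   (input tokens, used as queries)
    key_params:   n_params x d_in   (learnable parameter tokens, used as keys)
    value_params: n_params x d_out  (learnable parameter tokens, used as values)
    Returns (outputs, weights): n_tokens x d_out and n_tokens x n_params.
    """
    d_in = len(tokens[0])
    d_out = len(value_params[0])
    outputs, all_weights = [], []
    for query in tokens:
        # Score the query against every key parameter token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d_in)
            for key in key_params
        ]
        # Assumption: standard softmax normalization over parameter tokens.
        weights = softmax(scores)
        # Output is the attention-weighted mix of the value parameter tokens.
        out = [
            sum(w * value[j] for w, value in zip(weights, value_params))
            for j in range(d_out)
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


random.seed(0)
n_tokens, n_params, d_in, d_out = 4, 16, 8, 32
tokens = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n_tokens)]
key_params = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n_params)]
value_params = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(n_params)]

outputs, weights = token_parameter_attention(tokens, key_params, value_params)
print(len(outputs), len(outputs[0]))  # 4 32
```

In this sketch the layer maps d_in to d_out like a linear projection would, while the number of parameter tokens (n_params) is chosen independently of both dimensions.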

Extended Agriculture-Vision: An Extension of a Large Aerial Image Dataset for Agricultural Pattern Analysis

jingwu6/extended-agriculture-vision-dataset 4 Mar 2023

First, we generate and release an improved version of the Agriculture-Vision dataset (Chiu et al., 2020b) to include raw, full-field imagery for greater experimental flexibility.

Benchmarking Contrastive Learning +2

97
0.43 stars / hour