Taming Rectified Flow for Inversion and Editing

wangjiangshan0725/rf-solver-edit 7 Nov 2024

Rectified-flow-based diffusion transformers, such as FLUX and OpenSora, have demonstrated exceptional performance in the field of image and video generation.

Text-to-Image Generation, Video Editing

55
0.85 stars / hour

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

fudan-generative-vision/hallo2 10 Oct 2024

To the best of our knowledge, Hallo2, proposed in this paper, is the first method to achieve 4K resolution and generate hour-long, audio-driven portrait image animations enhanced with textual prompts.

4k, Image Animation

3,708
0.80 stars / hour

Classification Done Right for Vision-Language Pre-Training

x-cls/superclass 5 Nov 2024

Due to the absence of text encoding as a contrastive target, SuperClass does not require a text encoder and does not need to maintain a large batch size as CLIP does.

Classification

79
0.80 stars / hour

AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

THUDM/Android-Lab 31 Oct 2024

AndroidLab supports both large language models (LLMs) and multimodal models (LMMs) in the same action space.

Benchmarking

124
0.79 stars / hour

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

mit-han-lab/deepcompressor 7 Nov 2024

To address this, we co-design an inference engine Nunchaku that fuses the kernels of the low-rank branch into those of the low-bit branch to cut off redundant memory access.

Quantization

166
0.79 stars / hour

Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

youngsheen/SimVQ 4 Nov 2024

However, VQ models are often hindered by the problem of representation collapse in the latent space, which leads to low codebook utilization and limits the scalability of the codebook for large-scale training.

Quantization, Representation Learning

93
0.76 stars / hour

LightRAG: Simple and Fast Retrieval-Augmented Generation

hkuds/lightrag 8 Oct 2024

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs.

Information Retrieval, RAG

7,788
0.73 stars / hour

GameGen-X: Interactive Open-world Game Video Generation

gamegen-x/gamegen-x 1 Nov 2024

To realize this vision, we first collected and built an Open-World Video Game Dataset from scratch.

Text-to-Video Generation, Video Generation

144
0.67 stars / hour

Domain-Controlled Prompt Learning

caoql98/dcpl 30 Sep 2023

Existing prompt learning methods often lack domain-awareness or domain-transfer mechanisms, leading to suboptimal performance due to the misinterpretation of specific images in natural image patterns.

113
0.66 stars / hour

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

all-hands-ai/openhands 23 Jul 2024

We introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with a command line, and browsing the web.

35,510
0.66 stars / hour