Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task

360cvgroup/qihoo-t2x 6 Sep 2024

The global self-attention mechanism in diffusion transformers involves redundant computation due to the sparse and redundant nature of visual information, and the attention map of tokens within a spatial window shows significant similarity.

Video Generation

58
0.34 stars / hour

SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning

lamm-mit/SciAgentsDiscovery 9 Sep 2024

A key challenge in artificial intelligence is the creation of systems capable of autonomously advancing scientific understanding by exploring novel domains, identifying complex patterns, and uncovering previously unseen connections in vast scientific data.

Knowledge Graphs scientific discovery

125
0.32 stars / hour

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

openbmb/minicpm 9 Apr 2024

For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation.

Domain Adaptation

6,799
0.31 stars / hour
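
The Warmup-Stable-Decay (WSD) scheduler named in the MiniCPM entry above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `wsd_lr`, its parameters, and the linear shape of the decay phase are all assumptions made for illustration, and the paper's exact decay function may differ. The sketch only shows the three phases the name implies: a ramp up to the peak rate, a constant plateau, and a final decay.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Hypothetical Warmup-Stable-Decay schedule (illustrative sketch only).

    Assumption: linear warmup and linear decay; the real schedule may use
    a different decay shape.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak learning rate constant.
        return max_lr
    # Decay: move linearly from max_lr to min_lr, then stay at min_lr.
    progress = min(step - warmup_steps - stable_steps, decay_steps) / decay_steps
    return max_lr + (min_lr - max_lr) * progress


if __name__ == "__main__":
    # Print the rate at a few steps of a 10/20/10-step schedule.
    for s in (0, 9, 15, 30, 35, 40):
        print(s, wsd_lr(s, max_lr=1.0, warmup_steps=10, stable_steps=20, decay_steps=10))
```

Because the plateau holds a constant rate, a checkpoint taken during the stable phase could in principle be resumed with more data before the decay phase is applied, which is one way to read the entry's claim that the schedule suits continuous training.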

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

fudan-generative-vision/champ 21 Mar 2024

In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in current human generative techniques.

Animated GIF Generation Image Animation +1

3,920
0.30 stars / hour

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

openbmb/ioa 9 Jul 2024

The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents.

542
0.30 stars / hour

DTL: Disentangled Transfer Learning for Visual Recognition

heekhero/DTL 13 Dec 2023

As pre-trained models grow rapidly larger, the cost of fine-tuning them on downstream tasks steadily increases as well.

Transfer Learning

56
0.30 stars / hour

MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving

emzucas/minidrive 11 Sep 2024

Meanwhile, most existing VLMs lack the ability to process multiple images, making it difficult to adapt to multi-camera perception in autonomous driving.

Autonomous Driving Feature Engineering +1

20
0.29 stars / hour

FLUX that Plays Music

feizc/fluxmusic 1 Sep 2024

This paper explores a simple extension of diffusion-based rectified flow Transformers for text-to-music generation, termed FluxMusic.

Music Generation Text-to-Music Generation

1,413
0.28 stars / hour

AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction

lightblues/agentre 3 Sep 2024

Relation extraction (RE) in complex scenarios faces challenges such as diverse relation types and ambiguous relations between entities within a single sentence, leading to poor performance of pure "text-in, text-out" language models (LMs).

Relation Relation Extraction +1

29
0.27 stars / hour