Domain-Controlled Prompt Learning

caoql98/dcpl 30 Sep 2023

Existing prompt learning methods often lack domain-awareness or domain-transfer mechanisms, leading to suboptimal performance because domain-specific images are misinterpreted in terms of natural image patterns.

111
0.79 stars / hour

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

fudan-generative-vision/hallo2 10 Oct 2024

To the best of our knowledge, Hallo2, proposed in this paper, is the first method to achieve 4K resolution and generate hour-long, audio-driven portrait image animations enhanced with textual prompts.

4k Image Animation +2

3,608
0.76 stars / hour

LightRAG: Simple and Fast Retrieval-Augmented Generation

hkuds/lightrag 8 Oct 2024

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs.

Information Retrieval RAG +1

7,558
0.72 stars / hour

Geometric Transformer with Interatomic Positional Encoding

microsoft/AI2BMD NeurIPS 2023

The widespread adoption of Transformer architectures across data modalities has opened new avenues for applications in molecular modeling.

299
0.68 stars / hour

Taming Rectified Flow for Inversion and Editing

wangjiangshan0725/rf-solver-edit 7 Nov 2024

Rectified-flow-based diffusion transformers, such as FLUX and OpenSora, have demonstrated exceptional performance in the field of image and video generation.

Text-to-Image Generation Video Editing +1

29
0.67 stars / hour

Moonshine: Speech Recognition for Live Transcription and Voice Commands

usefulsensors/moonshine 21 Oct 2024

This paper introduces Moonshine, a family of speech recognition models optimized for live transcription and voice command processing.

Decoder Position +2

2,090
0.66 stars / hour

Adaptive Length Image Tokenization via Recurrent Allocation

shivamduggal4/adaptive-length-tokenizer 4 Nov 2024

Our encoder-decoder architecture recursively processes 2D image tokens, distilling them into 1D latent tokens over multiple iterations of recurrent rollouts.

Decoder

56
0.65 stars / hour

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

all-hands-ai/openhands 23 Jul 2024

OpenHands (formerly OpenDevin) is a platform for developing powerful and flexible AI agents that interact with the world much as a human developer does: by writing code, interacting with a command line, and browsing the web.

35,049
0.64 stars / hour

A Scalable Communication Protocol for Networks of Large Language Models

agora-protocol/paper-demo 14 Oct 2024

These requisites, which we refer to as the Agent Communication Trilemma, are hard to achieve in large networks of agents.

73
0.50 stars / hour

Melody Is All You Need For Music Generation

shaopengw/Awesome-Music-Generation 30 Sep 2024

We present the Melody Guided Music Generation (MMGen) model, the first approach to use melody to guide music generation; despite a simple method and extremely limited resources, it achieves excellent performance.

cross-modal alignment Music Generation +1

96
0.49 stars / hour