CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

thudm/cogvideo 12 Aug 2024

We present CogVideoX, a large-scale text-to-video generation model based on a diffusion transformer, which can generate 10-second continuous videos aligned with the text prompt, at a frame rate of 16 fps and a resolution of 768 × 1360 pixels.

Text-to-Video Generation, Video Alignment, +2

9,057
0.60 stars / hour

EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

aigc-apps/easyanimate 29 May 2024

The motion module can be adapted to various DiT baseline methods to generate video with different styles.

Image Generation, Video Generation

1,423
0.59 stars / hour

Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement

nju-pcalab/rag-diffusion 10 Nov 2024

To address these limitations, we decouple multi-region generation into two sub-tasks: the construction of individual regions (Regional Hard Binding), which ensures that each regional prompt is properly executed, and overall detail refinement across regions (Regional Soft Refinement), which dismisses the visual boundaries and enhances interactions between adjacent regions.

Attribute, RAG, +1

57
0.54 stars / hour

FinanceBench: A New Benchmark for Financial Question Answering

SuperpoweredAI/spRAG 20 Nov 2023

We test 16 state-of-the-art model configurations (including GPT-4-Turbo, Llama2 and Claude2, with vector stores and long context prompts) on a sample of 150 cases from FinanceBench, and manually review their answers (n=2,400).

Question Answering, Retrieval, +1

962
0.52 stars / hour

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

opendevin/opendevin 23 Jul 2024

We introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with a command line, and browsing the web.

36,368
0.52 stars / hour

AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems

microsoft/autogen 9 Aug 2024

Multi-agent systems, where multiple agents (generative AI models + tools) collaborate, are emerging as an effective pattern for solving long-running, complex tasks in numerous domains.

33,970
0.49 stars / hour

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

haiyang-w/tokenformer 30 Oct 2024

By treating model parameters as tokens, we replace all the linear projections in Transformers with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values.

259
0.47 stars / hour
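
The token-parameter attention described in the TokenFormer entry above can be illustrated with a minimal sketch. It assumes plain scaled dot-product attention with a standard softmax, which may differ from the normalization the paper actually uses; the function name, shapes, and random parameters are hypothetical and chosen only for illustration. The point is that a linear projection is replaced by attention in which each input token is a query and learnable parameter tokens supply the keys and values.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_parameter_attention(tokens, key_params, value_params):
    """Map input tokens (n x d_in) to outputs (n x d_out).

    key_params (p x d_in) and value_params (p x d_out) play the role of
    learnable parameter tokens. Assumption: standard scaled softmax
    attention is used here for simplicity.
    """
    d_in = len(key_params[0])
    d_out = len(value_params[0])
    outputs = []
    for query in tokens:
        # Each input token attends over all parameter tokens.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d_in)
            for key in key_params
        ]
        weights = softmax(scores)
        # Output is a weighted mix of the value parameter tokens.
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, value_params))
            for j in range(d_out)
        ])
    return outputs


random.seed(0)
n_tokens, n_params, d_in, d_out = 4, 6, 5, 3
tokens = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n_tokens)]
key_params = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n_params)]
value_params = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(n_params)]

out = token_parameter_attention(tokens, key_params, value_params)
print(len(out), len(out[0]))
```

In this sketch, adding more rows to key_params and value_params increases the number of parameters without changing the input or output dimensions.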

MinerU: An Open-Source Solution for Precise Document Content Extraction

opendatalab/mineru 27 Sep 2024

Document content analysis has been a crucial research area in computer vision.

Diversity, Optical Character Recognition (OCR)

14,532
0.47 stars / hour

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

tencent/tencent-hunyuan-large 4 Nov 2024

In this paper, we introduce Hunyuan-Large, currently the largest open-source Transformer-based mixture-of-experts model, with 389 billion total parameters and 52 billion activated parameters, capable of handling up to 256K tokens.

Logical Reasoning, Mathematical Problem-Solving

1,055
0.45 stars / hour

FAN: Fourier Analysis Networks

yihongdong/fan 3 Oct 2024

Despite the remarkable success achieved by neural networks, particularly those represented by MLP and Transformer, we reveal that they exhibit potential flaws in the modeling and reasoning of periodicity, i.e., they tend to memorize the periodic data rather than genuinely understand the underlying principles of periodicity.

Language Modelling, Time Series Forecasting

77
0.45 stars / hour