OmniGen: Unified Image Generation

vectorspacelab/omnigen 17 Sep 2024

In this work, we introduce OmniGen, a new diffusion model for unified image generation.

Edge Detection Pose Estimation +2

2,486
1.66 stars / hour

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

mit-han-lab/nunchaku 7 Nov 2024

To address this, we co-design an inference engine Nunchaku that fuses the kernels of the low-rank branch into those of the low-bit branch to cut off redundant memory access.

Quantization

267
1.57 stars / hour

Docling Technical Report

DS4SD/docling 19 Aug 2024

This technical report introduces Docling, an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion.

8,635
1.46 stars / hour

ADOPT: Modified Adam Can Converge with Any $β_2$ with the Optimal Rate

ishohei220/adopt 5 Nov 2024

Adam is one of the most popular optimization algorithms in deep learning.

Image Classification

241
1.19 stars / hour

Taming Rectified Flow for Inversion and Editing

wangjiangshan0725/rf-solver-edit 7 Nov 2024

Rectified-flow-based diffusion transformers, such as FLUX and OpenSora, have demonstrated exceptional performance in the field of image and video generation.

Text-to-Image Generation Video Editing +1

110
0.82 stars / hour

Qwen2.5-Coder Technical Report

qwenlm/qwen2.5-coder 18 Sep 2024

In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5.

Code Generation +2

1,338
0.76 stars / hour

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

fudan-generative-vision/hallo2 10 Oct 2024

To the best of our knowledge, Hallo2, proposed in this paper, is the first method to achieve 4K resolution and generate hour-long, audio-driven portrait image animations enhanced with textual prompts.

4k Image Animation +2

3,851
0.71 stars / hour

MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views

donydchen/mvsplat360 7 Nov 2024

To evaluate MVSplat360's performance, we introduce a new benchmark using the challenging DL3DV-10K dataset, where MVSplat360 achieves superior visual quality compared to state-of-the-art methods on wide-sweeping or even 360° NVS tasks.

3D Reconstruction Denoising +2

83
0.70 stars / hour

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

thudm/cogvideo 12 Aug 2024

We present CogVideoX, a large-scale text-to-video generation model based on a diffusion transformer, which can generate 10-second continuous videos aligned with the text prompt, with a frame rate of 16 fps and a resolution of 768 × 1360 pixels.

Text-to-Video Generation Video Alignment +2

9,026
0.63 stars / hour

FinanceBench: A New Benchmark for Financial Question Answering

SuperpoweredAI/spRAG 20 Nov 2023

We test 16 state-of-the-art model configurations (including GPT-4-Turbo, Llama2 and Claude2, with vector stores and long context prompts) on a sample of 150 cases from FinanceBench, and manually review their answers (n=2,400).

Question Answering Retrieval +1

957
0.63 stars / hour