Qwen2.5-Coder Technical Report

qwenlm/qwen2.5-coder 18 Sep 2024

In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5.

Code Generation +2

1,874
4.13 stars / hour

Docling Technical Report

DS4SD/docling 19 Aug 2024

This technical report introduces Docling, an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion.

8,963
1.63 stars / hour

OmniGen: Unified Image Generation

vectorspacelab/omnigen 17 Sep 2024

In this work, we introduce OmniGen, a new diffusion model for unified image generation.

Edge Detection Pose Estimation +2

2,563
1.54 stars / hour

Scaling Mesh Generation via Compressive Tokenization

whaohan/bpt 11 Nov 2024

We propose a compressive yet effective mesh representation, Blocked and Patchified Tokenization (BPT), facilitating the generation of meshes exceeding 8k faces.

8k

71
1.17 stars / hour

Autoregressive Models in Vision: A Survey

chaofantao/autoregressive-models-in-vision-survey 8 Nov 2024

Autoregressive modeling has been a huge success in the field of natural language processing (NLP).

3D Generation Survey +1

169
1.13 stars / hour

In-Context LoRA for Diffusion Transformers

ali-vilab/In-Context-LoRA 31 Oct 2024

While task-specific in terms of tuning data, our framework remains task-agnostic in architecture and pipeline, offering a powerful tool for the community and providing valuable insights for further research on product-level task-agnostic generation systems.

Image Generation

567
1.12 stars / hour

SplatFormer: Point Transformer for Robust 3D Gaussian Splatting

ChenYutongTHU/SplatFormer 10 Nov 2024

To our knowledge, this is the first successful application of point transformers directly on 3DGS sets, surpassing the limitations of previous multi-scene training methods, which could handle only a restricted number of input views during inference.

Novel View Synthesis

53
1.04 stars / hour

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

mit-han-lab/nunchaku 7 Nov 2024

To address this, we co-design an inference engine Nunchaku that fuses the kernels of the low-rank branch into those of the low-bit branch to cut off redundant memory access.

Quantization

285
1.04 stars / hour

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

fudan-generative-vision/hallo2 10 Oct 2024

To the best of our knowledge, Hallo2, proposed in this paper, is the first method to achieve 4K resolution and generate hour-long, audio-driven portrait image animations enhanced with textual prompts.

4k Image Animation +2

3,930
0.82 stars / hour

The Surprising Effectiveness of Test-Time Training for Abstract Reasoning

ekinakyurek/marc 11 Nov 2024

TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models; applying TTT to an 8B-parameter language model, we achieve 53% accuracy on ARC's public validation set, improving the state of the art by nearly 25% for public and purely neural approaches.

ARC Language Modelling

116
0.79 stars / hour