DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

shallowdream204/dreamclear 24 Oct 2024

Our second contribution, DreamClear, is a DiT-based image restoration model.

Image Restoration

663
2.59 stars / hour

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

open-mmlab/Amphion 7 Jul 2024

To facilitate the scale-up of Emilia, we also present Emilia-Pipe, the first open-source preprocessing pipeline designed to efficiently transform raw, in-the-wild speech data into high-quality training data with speech annotations.

7,288
2.33 stars / hour

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

open-mmlab/amphion 1 Sep 2024

The recent large-scale text-to-speech (TTS) systems are usually grouped as autoregressive and non-autoregressive systems.

Self-Supervised Learning Text to Speech

7,234
2.23 stars / hour

Docling Technical Report

DS4SD/docling 19 Aug 2024

This technical report introduces Docling, an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion.

5,728
2.03 stars / hour

OmniGen: Unified Image Generation

vectorspacelab/omnigen 17 Sep 2024

In this work, we introduce OmniGen, a new diffusion model for unified image generation.

Edge Detection Pose Estimation +2

1,861
1.93 stars / hour

KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation

openspg/kag 10 Sep 2024

The recently developed retrieval-augmented generation (RAG) technology has enabled the efficient construction of domain-specific applications.

Knowledge Graphs Question Answering +2

485
1.81 stars / hour

Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving

hustvl/senna 29 Oct 2024

In contrast, Large Vision-Language Models (LVLMs) excel in scene understanding and reasoning.

Autonomous Driving Scene Understanding +1

136
1.80 stars / hour

D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement

Peterande/D-FINE 17 Oct 2024

When pretrained on Objects365, D-FINE-L / X attains 57.1% / 59.3% AP, surpassing all existing real-time detectors.

Ranked #1 on Real-Time Object Detection on MS COCO (using extra training data)

Real-Time Object Detection regression

626
1.76 stars / hour

Data Formulator 2: Iteratively Creating Rich Visualizations with AI

microsoft/data-formulator 28 Aug 2024

To create rich visualizations, data analysts often need to iterate back and forth among data processing and chart specification to achieve their goals.

Code Generation Navigate

1,164
1.60 stars / hour

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

haiyang-w/tokenformer 30 Oct 2024

By treating model parameters as tokens, we replace all the linear projections in Transformers with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values.

151
1.46 stars / hour
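
The TokenFormer excerpt above describes replacing a linear projection with attention between input tokens (queries) and learnable parameter tokens (keys and values). A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: the function name `token_parameter_attention` and the toy parameter values are hypothetical, and it uses a plain softmax for normalization, which may differ from what the paper actually uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_parameter_attention(inputs, key_params, value_params):
    """Map each input token to an output vector by attending over parameter tokens.

    inputs:       list of input tokens, each a list of d_in floats (the queries)
    key_params:   n parameter tokens, each of length d_in (the keys)
    value_params: n parameter tokens, each of length d_out (the values)

    Assumption: plain softmax normalization, chosen here for simplicity.
    """
    d_out = len(value_params[0])
    outputs = []
    for x in inputs:
        # Similarity between the input token and every key parameter token.
        scores = [sum(xi * ki for xi, ki in zip(x, k)) for k in key_params]
        weights = softmax(scores)
        # Weighted sum of value parameter tokens gives the projected output.
        out = [
            sum(w * v[j] for w, v in zip(weights, value_params))
            for j in range(d_out)
        ]
        outputs.append(out)
    return outputs


# Toy usage: two parameter tokens, d_in = 2, d_out = 2 (values are made up).
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 4.0]]
result = token_parameter_attention([[0.0, 0.0], [3.0, 0.0]], keys, values)
```

In this sketch the number of parameter tokens (the length of `key_params` and `value_params`) can grow without changing the input or output dimensions, which is the property the excerpt's "parameters as tokens" framing relies on.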