Our second contribution, DreamClear, is a DiT-based image restoration model.
To facilitate the scale-up of Emilia, we also present Emilia-Pipe, the first open-source preprocessing pipeline designed to efficiently transform raw, in-the-wild speech data into high-quality training data with speech annotations.
Recent large-scale text-to-speech (TTS) systems are usually grouped into autoregressive and non-autoregressive systems.
This technical report introduces Docling, an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion.
In this work, we introduce OmniGen, a new diffusion model for unified image generation.
The recently developed retrieval-augmented generation (RAG) technology has enabled the efficient construction of domain-specific applications.
In contrast, Large Vision-Language Models (LVLMs) excel in scene understanding and reasoning.
When pretrained on Objects365, D-FINE-L / X attains 57.1% / 59.3% AP, surpassing all existing real-time detectors.
To create rich visualizations, data analysts often need to iterate back and forth between data processing and chart specification to achieve their goals.
By treating model parameters as tokens, we replace all the linear projections in Transformers with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values.
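The token-parameter attention described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, shapes, and the use of a standard scaled softmax (rather than whatever normalization the actual model uses) are assumptions for clarity.

```python
import numpy as np

def token_parameter_attention(x, key_params, value_params):
    """Sketch of token-parameter attention: input tokens act as queries,
    while learnable parameter tokens serve as keys and values.

    Assumed shapes (illustrative only):
      x:            (n_tokens, d_in)   -- input token queries
      key_params:   (n_params, d_in)   -- learnable key parameter tokens
      value_params: (n_params, d_out)  -- learnable value parameter tokens
    """
    # Attention scores between each input token and each parameter token
    scores = x @ key_params.T / np.sqrt(x.shape[-1])        # (n_tokens, n_params)
    # Row-wise softmax over the parameter tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted combination of parameter value tokens replaces a linear projection
    return weights @ value_params                           # (n_tokens, d_out)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # 4 input tokens of dimension 8
K = rng.standard_normal((16, 8))   # 16 parameter tokens as keys
V = rng.standard_normal((16, 32))  # the same parameter tokens as values
out = token_parameter_attention(x, K, V)
print(out.shape)  # (4, 32)
```

Note that the output dimension is set by the number of columns in `value_params`, so the layer can scale its parameter count (the number of parameter tokens) without changing the input/output interface.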