In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activation parameters, capable of handling up to 256K tokens.
This technical report introduces Docling, an easy to use, self-contained, MIT-licensed open-source package for PDF document conversion.
In response, we introduce TableGPT2, a model rigorously pre-trained and fine-tuned with over 593. 8K tables and 2. 36M high-quality query-table-output tuples, a scale of table-related data unprecedented in prior research.
Adam is one of the most popular optimization algorithms in deep learning.
To alleviate this problem, we propose HtmlRAG, which uses HTML instead of plain text as the format of retrieved knowledge in RAG.
To address these limitations, we propose PromptFix, a comprehensive framework that enables diffusion models to follow human instructions to perform a wide variety of image-processing tasks.
While task-specific in terms of tuning data, our framework remains task-agnostic in architecture and pipeline, offering a powerful tool for the community and providing valuable insights for further research on product-level task-agnostic generation systems.
In this work, we introduce OmniGen, a new diffusion model for unified image generation.
PiML (read $\pi$-ML, /`pai`em`el/) is an integrated and open-access Python toolbox for interpretable machine learning model development and model diagnostics.
To evaluate MVSplat360's performance, we introduce a new benchmark using the challenging DL3DV-10K dataset, where MVSplat360 achieves superior visual quality compared to state-of-the-art methods on wide-sweeping or even 360{\deg} NVS tasks.