Docling Technical Report

DS4SD/docling 19 Aug 2024

This technical report introduces Docling, an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion.

8,233
3.07 stars / hour

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

tencent/tencent-hunyuan-large 4 Nov 2024

In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture-of-experts model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling up to 256K tokens.

Logical Reasoning Mathematical Problem-Solving

1,000
2.39 stars / hour

ADOPT: Modified Adam Can Converge with Any $β_2$ with the Optimal Rate

ishohei220/adopt 5 Nov 2024

Adam is one of the most popular optimization algorithms in deep learning.

Image Classification

224
1.65 stars / hour

TableGPT2: A Large Multimodal Model with Tabular Data Integration

tablegpt/tablegpt-agent 4 Nov 2024

In response, we introduce TableGPT2, a model rigorously pre-trained and fine-tuned with over 593.8K tables and 2.36M high-quality query-table-output tuples, a scale of table-related data unprecedented in prior research.

Benchmarking Data Integration

169
1.65 stars / hour

OmniGen: Unified Image Generation

vectorspacelab/omnigen 17 Sep 2024

In this work, we introduce OmniGen, a new diffusion model for unified image generation.

Edge Detection Pose Estimation +2

2,397
1.28 stars / hour

HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

plageon/HtmlRAG 5 Nov 2024

To alleviate this problem, we propose HtmlRAG, which uses HTML instead of plain text as the format of retrieved knowledge in RAG.

Hallucination RAG +1

135
1.07 stars / hour

MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views

donydchen/mvsplat360 7 Nov 2024

To evaluate MVSplat360's performance, we introduce a new benchmark using the challenging DL3DV-10K dataset, where MVSplat360 achieves superior visual quality compared to state-of-the-art methods on wide-sweeping or even 360° NVS tasks.

3D Reconstruction Denoising +2

73
1.06 stars / hour

Geometric Transformer with Interatomic Positional Encoding

microsoft/AI2BMD NeurIPS 2023

The widespread adoption of Transformer architectures in various data modalities has opened new avenues for applications in molecular modeling.

332
1.05 stars / hour

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

facebookresearch/optimizers 12 Sep 2023

It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network.

Stochastic Optimization

420
1.03 stars / hour

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

THUDM/WebRL 4 Nov 2024

Specifically, WebRL incorporates 1) a self-evolving curriculum that generates new tasks from unsuccessful attempts, 2) a robust outcome-supervised reward model (ORM), and 3) adaptive reinforcement learning strategies to ensure consistent improvements.

177
0.86 stars / hour