OCR-free Document Understanding Transformer

clovaai/donut 30 Nov 2021

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.

Optical Character Recognition

360
0.37 stars / hour

Ivy: Templated Deep Learning for Inter-Framework Portability

ivy-dl/ivy 4 Feb 2021

We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.

4,998
0.36 stars / hour

Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models

compvis/latent-diffusion 26 Jul 2022

In RDMs, a set of nearest neighbors is retrieved from an external database during training for each training instance, and the diffusion model is conditioned on these informative samples.

Image Generation, Prompt Engineering

2,687
0.35 stars / hour

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

labmlai/annotated_deep_learning_paper_implementations BigScience (ACL) 2022

We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license.

Language Modelling

11,582
0.34 stars / hour

Elucidating the Design Space of Diffusion-Based Generative Models

lucidrains/imagen-pytorch 1 Jun 2022

We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices.

Image Generation

4,704
0.32 stars / hour

MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

google-research/jax3d 30 Jul 2022

Neural Radiance Fields (NeRFs) have demonstrated amazing ability to synthesize images of 3D scenes from novel views.

Novel View Synthesis

225
0.31 stars / hour

Learning the Beauty in Songs: Neural Singing Voice Beautifier

MoonInTheRiver/DiffSinger ACL 2022

Furthermore, we propose a latent-mapping algorithm in the latent space to convert the amateur vocal tone to the professional one.

Dynamic Time Warping

455
0.30 stars / hour

Patch Similarity Aware Data-Free Quantization for Vision Transformers

zkkli/psaq-vit 4 Mar 2022

Vision transformers have recently gained great success on various computer vision tasks; nevertheless, their high model complexity makes it challenging to deploy on resource-constrained devices.

Data Free Quantization

35
0.30 stars / hour

Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

nvidiagameworks/kaolin-wisp 16 Jan 2022

Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate.

3D Reconstruction, 3D Shape Reconstruction (+2 more)

634
0.27 stars / hour

Speech Enhancement and Dereverberation with Diffusion-based Generative Models

sp-uhh/sgmse 11 Aug 2022

Furthermore, we show that the proposed method achieves remarkable state-of-the-art performance in single-channel speech dereverberation.

Speech Dereverberation

26
0.26 stars / hour