StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

autonomousvision/stylegan-t 23 Jan 2023

Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models.

Pretrained Language Models · Text-to-Image Generation

339 stars (0.83 stars / hour)

OCR-free Document Understanding Transformer

clovaai/donut 30 Nov 2021

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.

Optical Character Recognition

1,184 stars (0.78 stars / hour)

BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining

microsoft/biogpt 19 Oct 2022

Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain.

Document Classification · Language Modelling +3

206 stars (0.60 stars / hour)

DAMO-YOLO: A Report on Real-Time Object Detection Design

tinyvision/damo-yolo 23 Nov 2022

In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series.

Neural Architecture Search · Object Detection +1

1,093 stars (0.59 stars / hour)

Towards Robust Blind Face Restoration with Codebook Lookup Transformer

sczhou/codeformer 22 Jun 2022

In this paper, we demonstrate that a learned discrete codebook prior in a small proxy space largely reduces the uncertainty and ambiguity of restoration mapping by casting blind face restoration as a code prediction task, while providing rich visual atoms for generating high-quality faces.
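The idea of casting restoration as code prediction can be illustrated with a plain vector-quantization lookup: each degraded feature vector is replaced by its nearest entry ("visual atom") in a learned codebook. This is a minimal sketch of that lookup only; the codebook size, feature dimension, and names below are assumptions, not CodeFormer's actual architecture, which predicts codes with a Transformer rather than by nearest-neighbor search.

```python
import numpy as np

# Hypothetical codebook of "visual atoms": 1024 entries, 256-dim each.
# (Sizes are illustrative assumptions, not CodeFormer's.)
rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 256))

def lookup_codes(features):
    """Quantize each feature vector to the index of its nearest codebook entry."""
    # Squared Euclidean distance between every feature and every atom.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

feats = rng.normal(size=(16, 256))   # e.g. 16 spatial positions of a degraded face
codes = lookup_codes(feats)          # discrete code indices into the codebook
restored = codebook[codes]           # swap features for clean codebook atoms
```

Because the output is restricted to a small discrete set of codebook entries, the restoration mapping becomes a classification over indices rather than an unconstrained regression, which is the uncertainty reduction the abstract describes.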

Blind Face Restoration

4,108 stars (0.49 stars / hour)

Scaling Language-Image Pre-training via Masking

ofa-sys/chinese-clip 1 Dec 2022

We present Fast Language-Image Pre-training (FLIP), a simple and more efficient method for training CLIP.
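FLIP's efficiency gain comes from randomly masking out a large fraction of image patches during training, so the image encoder only processes the remaining visible patches. A minimal sketch of that masking step, assuming a 50% mask ratio and a 196-patch grid (as for ViT-B/16 on 224px inputs); the exact ratios and layout are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, mask_ratio = 196, 0.5           # assumed: 14x14 grid, mask half

patches = rng.normal(size=(num_patches, 768))            # flattened image patches
n_keep = int(num_patches * (1 - mask_ratio))
keep = np.sort(rng.permutation(num_patches)[:n_keep])    # random subset to keep
visible = patches[keep]                                  # only these reach the encoder
```

Halving the visible patches roughly halves the encoder's per-step compute, letting the same budget see more image-text pairs.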

765 stars (0.47 stars / hour)

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

hazyresearch/h3 28 Dec 2022

First, we use synthetic language modeling tasks to understand the gap between SSMs and attention.
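The SSMs in question process a sequence through a linear recurrence, x_{k+1} = A x_k + B u_k with readout y_k = C x_k, instead of pairwise attention. A toy scan of that recurrence, with illustrative dimensions and matrices rather than H3's actual parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 4, 10                       # state size, sequence length (illustrative)
A = 0.9 * np.eye(n)                # stable state-transition matrix
B = rng.normal(size=(n, 1))
C = rng.normal(size=(1, n))

def ssm_scan(u):
    """Run the linear recurrence x_{k+1} = A x_k + B u_k, y_k = C x_k over u."""
    x = np.zeros((n, 1))
    ys = []
    for u_k in u:
        x = A @ x + B * u_k        # state update
        ys.append((C @ x).item())  # scalar readout
    return np.array(ys)

y = ssm_scan(rng.normal(size=T))
```

The fixed-size state x is what makes SSMs cheap at long sequence lengths, but it is also why tasks like recalling an earlier token, which attention handles trivially, expose a gap that the paper's synthetic tasks probe.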

Few-Shot Learning · Language Modelling

185 stars (0.40 stars / hour)

ProGen2: Exploring the Boundaries of Protein Language Models

salesforce/progen 27 Jun 2022

Attention-based models trained on protein sequences have demonstrated incredible success at classification and generation tasks relevant for artificial intelligence-driven protein design.

173 stars (0.39 stars / hour)

MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning

MTLab/MorphMLP 24 Nov 2021

With such multi-dimension and multi-scale factorization, our MorphMLP block can achieve a great accuracy-computation balance.
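The accuracy-computation balance comes from factorizing the mixing: instead of one fully connected layer over all token-channel pairs, separate small MLPs mix along each dimension in turn. A hedged MLP-Mixer-style sketch of that factorization, as an assumption about the general pattern; the actual MorphMLP block additionally morphs along temporal and scale axes:

```python
import numpy as np

rng = np.random.default_rng(0)
S, C = 49, 64                                      # spatial tokens, channels (illustrative)
x = rng.normal(size=(S, C))

W_spatial = rng.normal(size=(S, S)) / np.sqrt(S)   # mixes across tokens, per channel
W_channel = rng.normal(size=(C, C)) / np.sqrt(C)   # mixes across channels, per token

def factorized_block(x):
    """Residual token mixing, then residual channel mixing, each its own MLP."""
    x = x + W_spatial @ x
    x = x + x @ W_channel
    return x

y = factorized_block(x)
```

The factorized block costs S^2 + C^2 weights instead of the (S*C)^2 a fully connected mixer over all positions and channels would need, which is the balance the sentence refers to.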

Ranked #18 on Action Recognition on Something-Something V2 (using extra training data)

Action Recognition · Image Classification +2

93 stars (0.37 stars / hour)