# OCR-free Document Understanding Transformer

30 Nov 2021

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.

# Flow-Guided Transformer for Video Inpainting

14 Aug 2022

In the spatial transformer in particular, we design a dual-perspective spatial MHSA, which integrates global tokens into the window-based attention.

# StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3

16 Aug 2022

Notably, StyleFaceV is capable of generating realistic $1024\times1024$ face videos even without high-resolution training videos.

# LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

15 Aug 2022

We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cuts the memory needed for inference by half while retaining full-precision performance.
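To make the idea of an Int8 matrix multiplication concrete, here is a minimal sketch of a generic absmax quantization scheme in pure Python. It is an illustration under stated assumptions, not the paper's procedure: the function names (`quantize_rows`, `int8_matmul`) are hypothetical, and the scaling scheme (one scale per row of the left matrix and per column of the right matrix) is chosen here for simplicity.

```python
def quantize_rows(matrix):
    """Quantize each row to integers in [-127, 127] using its own absmax scale.

    Assumption: a simple symmetric absmax scheme, chosen for illustration.
    """
    quantized, scales = [], []
    for row in matrix:
        absmax = max(abs(v) for v in row) or 1.0  # avoid dividing by zero
        scale = absmax / 127.0
        quantized.append([round(v / scale) for v in row])
        scales.append(scale)
    return quantized, scales


def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def int8_matmul(a, b):
    """Approximate a @ b by multiplying quantized integers, then rescaling."""
    a_q, a_scales = quantize_rows(a)             # one scale per row of a
    bt_q, b_scales = quantize_rows(transpose(b))  # one scale per column of b
    out = []
    for i, row in enumerate(a_q):
        out_row = []
        for j, col in enumerate(bt_q):
            acc = sum(x * y for x, y in zip(row, col))  # integer accumulation
            out_row.append(acc * a_scales[i] * b_scales[j])  # dequantize
        out.append(out_row)
    return out


a = [[1.0, -2.0], [0.5, 4.0]]
b = [[3.0, 0.0], [-1.0, 2.0]]
approx = int8_matmul(a, b)  # close to the exact product [[5, -4], [-2.5, 8]]
```

The memory saving comes from storing the quantized integers in one byte each instead of two bytes for a 16-bit float; Python integers are used here only to keep the sketch dependency-free.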

# Learning to Count Anything: Reference-less Class-agnostic Counting with Weak Supervision

20 May 2022

While there are class-agnostic counting methods that can generalise to unseen classes, these methods require reference images to define the type of object to be counted, as well as instance annotations during training.

# Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing

17 Aug 2022

In particular, we propose a novel approach paint2pix, which learns to predict (and adapt) "what a user wants to draw" from rudimentary brushstroke inputs, by learning a mapping from the manifold of incomplete human paintings to their realistic renderings.

# YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy, 56.8% AP, among all known real-time object detectors with 30 FPS or higher on GPU V100.

# Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models

26 Jul 2022

In RDMs, a set of nearest neighbors is retrieved from an external database during training for each training instance, and the diffusion model is conditioned on these informative samples.
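The retrieval step can be pictured with a small nearest-neighbor lookup. The sketch below is an assumption-laden illustration, not the authors' implementation: it treats the database as a list of embedding vectors, uses cosine similarity as the distance measure, and the names `cosine_similarity` and `retrieve_neighbors` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_neighbors(query, database, k):
    """Return indices of the k database vectors most similar to the query."""
    ranked = sorted(
        range(len(database)),
        key=lambda i: cosine_similarity(query, database[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical embeddings for illustration only.
database = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
neighbors = retrieve_neighbors([1.0, 0.0], database, k=2)
```

In a setup like the one described, the vectors at the returned indices would serve as the conditioning input for the training instance; a real system would likely use an approximate nearest-neighbor index rather than a full sort.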

# TotalSegmentator: robust segmentation of 104 anatomical structures in CT images

11 Aug 2022

Finally, we train a segmentation algorithm on this new dataset.

# Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

28 Jan 2022

Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.

