# OCR-free Document Understanding Transformer

30 Nov 2021

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.

564
1.18 stars / hour

# PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

The key of PAConv is to construct the convolution kernel by dynamically assembling basic weight matrices stored in Weight Bank, where the coefficients of these weight matrices are self-adaptively learned from point positions through ScoreNet.
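The kernel assembly described above can be illustrated with a minimal sketch. It is not the authors' implementation: the ScoreNet stand-in here is a single hypothetical linear layer followed by a softmax, and the names `score_net`, `assemble_kernel`, `apply_kernel`, and `proj` are assumptions introduced only for illustration. The sketch shows a per-point kernel formed as a score-weighted sum of the matrices in a weight bank, with the scores computed from a relative point position.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score_net(rel_pos, proj):
    # Hypothetical stand-in for ScoreNet: one linear layer plus softmax.
    # rel_pos: relative position of a neighbour, e.g. [dx, dy, dz].
    # proj: one row of weights per matrix in the weight bank.
    logits = [sum(w * p for w, p in zip(row, rel_pos)) for row in proj]
    return softmax(logits)

def assemble_kernel(weight_bank, scores):
    # Kernel = sum over m of scores[m] * weight_bank[m], element-wise.
    rows = len(weight_bank[0])
    cols = len(weight_bank[0][0])
    return [
        [sum(s * mat[r][c] for s, mat in zip(scores, weight_bank))
         for c in range(cols)]
        for r in range(rows)
    ]

def apply_kernel(kernel, feature):
    # Multiply an input feature (length C_in) by the kernel (C_in x C_out).
    return [
        sum(feature[r] * kernel[r][c] for r in range(len(feature)))
        for c in range(len(kernel[0]))
    ]

# Toy weight bank with two 2x2 matrices (illustrative values only).
weight_bank = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 0.0], [0.0, 3.0]],
]
proj = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # zero weights give uniform scores
scores = score_net([0.1, -0.2, 0.3], proj)
kernel = assemble_kernel(weight_bank, scores)
output = apply_kernel(kernel, [1.0, 2.0])
```

Because the scores depend on the relative position, two neighbours at different offsets generally receive different kernels, even though both draw on the same shared weight bank.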

74
1.08 stars / hour

# LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

15 Aug 2022

We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cuts the memory needed for inference by half while retaining full precision performance.
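To make the idea of an Int8 matrix multiplication concrete, here is a simplified sketch using generic absmax quantization. It is an assumption for illustration, not necessarily the procedure the paper develops, and the function names `quantize_absmax` and `int8_matmul` are hypothetical. Each row of the left matrix and each column of the right matrix is scaled into the range [-127, 127], the dot products are computed on integers, and the result is rescaled to floats. Storing one byte per value instead of two for a 16-bit format is where a halving of memory would come from.

```python
def quantize_absmax(vec):
    # Scale so the largest magnitude maps to 127; fall back to 1.0 for all-zero input.
    scale = max(abs(v) for v in vec) / 127 or 1.0
    return [round(v / scale) for v in vec], scale

def int8_matmul(a, b):
    # Quantize each row of a and each column of b independently.
    q_rows = [quantize_absmax(row) for row in a]
    q_cols = [quantize_absmax(list(col)) for col in zip(*b)]
    # Integer dot product, then dequantize with the product of the two scales.
    return [
        [sum(x * y for x, y in zip(q_row, q_col)) * s_row * s_col
         for q_col, s_col in q_cols]
        for q_row, s_row in q_rows
    ]

a = [[1.0, -2.0], [0.5, 4.0]]
b = [[2.0, 0.0], [1.0, -1.0]]
approx = int8_matmul(a, b)

# Full-precision reference for comparison.
exact = [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
```

On this toy input the quantized product stays close to the full-precision one; the gap comes from rounding each value to the nearest of 255 integer levels.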

135
0.96 stars / hour

# Flow-Guided Transformer for Video Inpainting

14 Aug 2022

In the spatial transformer in particular, we design a dual-perspective spatial MHSA, which integrates global tokens into the window-based attention.

84
0.87 stars / hour

# Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing

17 Aug 2022

In particular, we propose a novel approach paint2pix, which learns to predict (and adapt) "what a user wants to draw" from rudimentary brushstroke inputs, by learning a mapping from the manifold of incomplete human paintings to their realistic renderings.

27
0.83 stars / hour

# StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3

16 Aug 2022

Notably, StyleFaceV is capable of generating realistic $1024\times1024$ face videos even without high-resolution training videos.

68
0.78 stars / hour

# Learning to Count Anything: Reference-less Class-agnostic Counting with Weak Supervision

20 May 2022

While there are class-agnostic counting methods that can generalise to unseen classes, these methods require reference images to define the type of object to be counted, as well as instance annotations during training.

83
0.71 stars / hour

# TotalSegmentator: robust segmentation of 104 anatomical structures in CT images

11 Aug 2022

Finally, we train a segmentation algorithm on this new dataset.

64
0.59 stars / hour

# YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100.

4,598
0.58 stars / hour

# Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk

2 Jul 2022

However, the humor aspect of natural language is relatively under-investigated, especially in the age of pre-trained language models.

66
0.56 stars / hour