Flow-Guided Transformer for Video Inpainting

hitachinsk/fgt 14 Aug 2022

In the spatial transformer in particular, we design a dual-perspective spatial MHSA, which integrates global tokens into the window-based attention.

Video Inpainting

55
1.54 stars / hour

OCR-free Document Understanding Transformer

clovaai/donut 30 Nov 2021

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.

Optical Character Recognition

520
1.12 stars / hour

Collaborative Neural Rendering using Anime Character Sheets

megvii-research/conr 12 Jul 2022

Drawing images of characters with desired poses is an essential but laborious task in anime production.

Neural Rendering

365
0.76 stars / hour

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

timdettmers/bitsandbytes 15 Aug 2022

We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cuts the memory needed for inference by half while retaining full-precision performance.

Language Modelling, Linguistic Acceptability, +4

72
0.61 stars / hour

KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints

facebookresearch/KeypointNeRF 10 May 2022

In this work, we investigate common issues with existing spatial encodings and propose a simple yet highly effective approach to modeling high-fidelity volumetric humans from sparse views.

3D Face Reconstruction, 3D Human Reconstruction, +2

121
0.59 stars / hour

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

alpa-projects/alpa 28 Jan 2022

Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.

755
0.57 stars / hour

Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models

compvis/latent-diffusion 26 Jul 2022

In RDMs, a set of nearest neighbors is retrieved from an external database during training for each training instance, and the diffusion model is conditioned on these informative samples.

Image Generation, Prompt Engineering

2,766
0.45 stars / hour

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy, 56.8% AP, among all known real-time object detectors with 30 FPS or higher on GPU V100.

Real-Time Object Detection

4,507
0.44 stars / hour

Deep Patch Visual Odometry

princeton-vl/dpvo 8 Aug 2022

We propose Deep Patch Visual Odometry (DPVO), a new deep learning system for monocular Visual Odometry (VO).

Monocular Visual Odometry

120
0.42 stars / hour

StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3

arthur-qiu/stylefacev 16 Aug 2022

Notably, StyleFaceV is capable of generating realistic $1024\times1024$ face videos even without high-resolution training videos.

Image Generation, Video Generation

35
0.42 stars / hour