Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality

implus/um-mae 20 May 2022

Masked AutoEncoder (MAE) has recently led the trend in visual self-supervised learning with an elegant asymmetric encoder-decoder design, which significantly improves both pre-training efficiency and fine-tuning accuracy.

Object Detection

70 stars (1.01 stars / hour)

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

zhangyp15/beverse 19 May 2022

BEVerse first performs shared feature extraction and lifting to generate 4D BEV representations from multi-timestamp, multi-view images.

3D Object Detection, Autonomous Driving, +3 more

95 stars (0.84 stars / hour)

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

hpcaitech/colossalai 28 Oct 2021

The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing.

3,550 stars (0.82 stars / hour)

PaddleNLP

PaddlePaddle/PaddleNLP ACL ARR January 2022

An easy-to-use and powerful NLP library with a comprehensive model zoo, supporting a wide range of NLP tasks from research to industrial applications, including end-to-end systems for neural search, question answering, information extraction, and sentiment analysis.

Few-Shot Learning, Link Prediction, +2 more

4,184 stars (0.82 stars / hour)

Ivy: Templated Deep Learning for Inter-Framework Portability

ivy-dl/ivy 4 Feb 2021

We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.

1,848 stars (0.72 stars / hour)

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

hongfz16/avatarclip 17 May 2022

Our key insight is to take advantage of the powerful vision-language model CLIP for supervising neural human generation, in terms of 3D geometry, texture and animation.

Language Modelling, Motion Synthesis, +1 more

331 stars (0.69 stars / hour)

Extracting Triangular 3D Models, Materials, and Lighting From Images

NVlabs/nvdiffrec 24 Nov 2021

We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations.

711 stars (0.64 stars / hour)

Automated Crossword Solving

albertkx/berkeley-crossword-solver ACL 2022

We present the Berkeley Crossword Solver, a state-of-the-art approach for automatically solving crossword puzzles.

Question Answering

57 stars (0.62 stars / hour)

Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors

hsouri/bayesiantransferlearning 20 May 2022

Deep learning is increasingly moving towards a transfer learning paradigm whereby large foundation models are fine-tuned on downstream tasks, starting from an initialization learned on the source task.

Transfer Learning

29 stars (0.59 stars / hour)

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

PaddlePaddle/DeepSpeech 10 Dec 2020

In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.

Speech Recognition

3,237 stars (0.57 stars / hour)