Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality

implus/um-mae 20 May 2022

Masked AutoEncoder (MAE) has recently led the trend in visual self-supervised learning with an elegant asymmetric encoder-decoder design, which significantly improves both pre-training efficiency and fine-tuning accuracy.

Object Detection

0.91 stars / hour

Conformer: Convolution-augmented Transformer for Speech Recognition

PaddlePaddle/DeepSpeech 16 May 2020

Recently, Transformer- and convolutional neural network (CNN)-based models have shown promising results in Automatic Speech Recognition (ASR), outperforming recurrent neural networks (RNNs).

Automatic Speech Recognition

0.73 stars / hour

Ivy: Templated Deep Learning for Inter-Framework Portability

ivy-dl/ivy 4 Feb 2021

We introduce Ivy, a templated Deep Learning (DL) framework that abstracts existing DL frameworks.

0.72 stars / hour

MuJoCo: A physics engine for model-based control

deepmind/mujoco IEEE/RSJ IROS 2012

To facilitate optimal control applications and in particular sampling and finite differencing, the dynamics can be evaluated for different states and controls in parallel.

0.65 stars / hour

Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors

hsouri/bayesiantransferlearning 20 May 2022

Deep learning is increasingly moving towards a transfer learning paradigm whereby large foundation models are fine-tuned on downstream tasks, starting from an initialization learned on the source task.

Transfer Learning

0.63 stars / hour

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

zhangyp15/beverse 19 May 2022

Specifically, BEVerse first performs shared feature extraction and lifting to generate 4D BEV representations from multi-timestamp and multi-view images.

3D Object Detection, Autonomous Driving

0.58 stars / hour

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

Rongjiehuang/FastDiff 21 Apr 2022

Also, FastDiff enables a sampling speed 58x faster than real-time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time.

Denoising, Speech Synthesis

0.55 stars / hour

Automated Crossword Solving

albertkx/berkeley-crossword-solver ACL 2022

We present the Berkeley Crossword Solver, a state-of-the-art approach for automatically solving crossword puzzles.

Question Answering

0.55 stars / hour

Extracting Triangular 3D Models, Materials, and Lighting From Images

NVlabs/nvdiffrec 24 Nov 2021

We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations.

0.54 stars / hour