Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality

implus/um-mae 20 May 2022

Masked AutoEncoder (MAE) has recently led the trend in visual self-supervision with an elegant asymmetric encoder-decoder design, which significantly improves both pre-training efficiency and fine-tuning accuracy.

Object Detection

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

PaddlePaddle/DeepSpeech 10 Dec 2020

In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.

Speech Recognition

Ivy: Templated Deep Learning for Inter-Framework Portability

ivy-dl/ivy 4 Feb 2021

We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.

PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit

PaddlePaddle/PaddleSpeech 20 May 2022

PaddleSpeech is an open-source all-in-one speech toolkit.

MuJoCo: A physics engine for model-based control

deepmind/mujoco IEEE/RSJ IROS 2012

To facilitate optimal control applications and in particular sampling and finite differencing, the dynamics can be evaluated for different states and controls in parallel.

Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors

hsouri/bayesiantransferlearning 20 May 2022

Deep learning is increasingly moving towards a transfer learning paradigm whereby large foundation models are fine-tuned on downstream tasks, starting from an initialization learned on the source task.

Transfer Learning

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

zhangyp15/beverse 19 May 2022

BEVerse first performs shared feature extraction and lifting to generate 4D BEV representations from multi-timestamp and multi-view images.

3D Object Detection, Autonomous Driving

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

Rongjiehuang/FastDiff 21 Apr 2022

FastDiff enables a sampling speed 58x faster than real-time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time.

Denoising, Speech Synthesis

Automated Crossword Solving

albertkx/berkeley-crossword-solver ACL 2022

We present the Berkeley Crossword Solver, a state-of-the-art approach for automatically solving crossword puzzles.

Question Answering

Extracting Triangular 3D Models, Materials, and Lighting From Images

NVlabs/nvdiffrec 24 Nov 2021

We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations.

