Masked AutoEncoder (MAE) has recently led the trend in visual self-supervised learning with an elegant asymmetric encoder-decoder design, which significantly improves both pre-training efficiency and fine-tuning accuracy. A rough sketch of this design follows below.
Ranked #16 on Object Detection on COCO minival
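As an illustration of the asymmetric design described above, here is a minimal, hypothetical PyTorch sketch (not the paper's implementation): most patches are masked, a heavier encoder processes only the visible patches, and a lightweight decoder reconstructs the full patch sequence, with the loss computed on the masked positions. All module sizes are placeholder choices, and positional embeddings are omitted for brevity.

```python
# Minimal sketch of MAE-style asymmetric masked autoencoding (hypothetical,
# not the authors' code). The encoder sees only the visible ~25% of patches;
# a small decoder reconstructs every patch from latents plus shared mask tokens.
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    def __init__(self, patch_dim=768, enc_dim=512, dec_dim=128, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_dim, enc_dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(enc_dim, nhead=8, batch_first=True), num_layers=4)
        self.enc_to_dec = nn.Linear(enc_dim, dec_dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dec_dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dec_dim, nhead=4, batch_first=True), num_layers=1)
        self.head = nn.Linear(dec_dim, patch_dim)          # reconstruct raw patch values

    def forward(self, patches):                            # patches: (B, N, patch_dim)
        B, N, D = patches.shape
        n_keep = int(N * (1 - self.mask_ratio))
        perm = torch.rand(B, N, device=patches.device).argsort(dim=1)   # random shuffle per sample
        keep = perm[:, :n_keep]
        visible = torch.gather(patches, 1, keep.unsqueeze(-1).expand(-1, -1, D))

        latent = self.encoder(self.embed(visible))         # heavy encoder, visible patches only
        dec_in = torch.cat([self.enc_to_dec(latent),
                            self.mask_token.expand(B, N - n_keep, -1)], dim=1)
        recon = self.head(self.decoder(dec_in))            # (B, N, patch_dim), in shuffled order

        target = torch.gather(patches, 1, perm.unsqueeze(-1).expand(-1, -1, D))
        return ((recon - target) ** 2)[:, n_keep:].mean()  # loss on masked patches only

loss = TinyMAE()(torch.randn(2, 196, 768))
loss.backward()
```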
Recently, Transformer- and Convolutional Neural Network (CNN)-based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent Neural Networks (RNNs).
Ranked #9 on Speech Recognition on LibriSpeech test-clean
We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.
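The core idea of such an abstraction layer can be sketched as a toy dispatcher, shown below. This is a hypothetical illustration only, not Ivy's actual API: the backend table, `set_backend`, and the wrapper functions are all made up for the example.

```python
# Toy sketch of a framework-abstraction layer (hypothetical; not Ivy's real API).
# High-level code is written once and dispatched to a backend chosen at runtime.
import numpy as np

_BACKENDS = {"numpy": np}
try:                                  # optional backends register only if installed
    import torch
    _BACKENDS["torch"] = torch
except ImportError:
    pass

_active = "numpy"

def set_backend(name):
    global _active
    assert name in _BACKENDS, f"unknown backend: {name}"
    _active = name

def matmul(a, b):
    # numpy and torch both expose `matmul`; the wrapper simply dispatches.
    return _BACKENDS[_active].matmul(a, b)

def mean(x):
    return _BACKENDS[_active].mean(x)

# The same downstream code then runs unchanged on any registered backend.
def mse(pred, target):
    return mean((pred - target) ** 2)

set_backend("numpy")
print(mse(np.ones((2, 3)), np.zeros((2, 3))))   # 1.0
```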
PaddleSpeech is an open-source all-in-one speech toolkit.
Tasks: Automatic Speech Recognition, Environmental Sound Classification, +8
To facilitate optimal control applications, in particular sampling and finite differencing, the dynamics can be evaluated in parallel for different states and controls.
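To make the benefit for finite differencing concrete, the sketch below uses a generic, assumed `step(x, u)` toy dynamics function (not the toolkit's actual API) and evaluates all perturbed states in one batched call, from which a finite-difference Jacobian is assembled.

```python
# Generic sketch: batched dynamics evaluation for finite differencing
# (toy dynamics; not the simulator's actual API).
import numpy as np

def step(x, u):
    """Toy dynamics x' = f(x, u), vectorized over a leading batch dimension."""
    return x + 0.01 * np.concatenate([x[..., 1:], u], axis=-1)

def fd_jacobian_x(x, u, eps=1e-6):
    """Finite-difference df/dx by stepping all perturbed states in a single batch."""
    n = x.shape[0]
    X = x[None, :] + eps * np.eye(n)           # (n, n): one perturbed state per row
    U = np.repeat(u[None, :], n, axis=0)       # same control for every perturbation
    f_plus = step(X, U)                        # all n dynamics evaluations at once
    f0 = step(x[None, :], u[None, :])[0]
    return (f_plus - f0[None, :]).T / eps      # J[j, i] ~= d f_j / d x_i

x, u = np.array([0.0, 1.0, 0.5]), np.array([0.2])
print(fd_jacobian_x(x, u))
```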
Deep learning is increasingly moving towards a transfer learning paradigm whereby large foundation models are fine-tuned on downstream tasks, starting from an initialization learned on the source task.
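The paradigm can be illustrated with a minimal fine-tuning loop; this is a hypothetical PyTorch sketch where the model, data, and checkpoint path are placeholders, not anything from the paper. The point is simply that the backbone starts from an initialization learned on the source task and is then trained further on the downstream task, here with a smaller learning rate than the freshly initialized head.

```python
# Minimal fine-tuning sketch (hypothetical; model, data and checkpoint are placeholders).
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256))
# Initialization learned on the source task (placeholder path, left commented out):
# backbone.load_state_dict(torch.load("source_task_pretrained.pt"))

head = nn.Linear(256, 10)                   # new head for the downstream task
model = nn.Sequential(backbone, head)

# Common recipe: lower learning rate for the pretrained backbone than for the new head.
opt = torch.optim.AdamW([
    {"params": backbone.parameters(), "lr": 1e-5},
    {"params": head.parameters(), "lr": 1e-3},
])

x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))   # dummy downstream batch
for _ in range(3):
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```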
Specifically, BEVerse first performs shared feature extraction and lifting to generate 4D BEV representations from multi-timestamp and multi-view images.
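A rough shape-level sketch of this step is given below. The tensors are hypothetical and the "lift" is a toy resampling stand-in rather than BEVerse's actual geometric view transform; it only shows how per-camera features from several timestamps end up stacked into a 4D (time, channel, X, Y) BEV representation per sample.

```python
# Shape-level sketch of multi-timestamp, multi-view BEV lifting (hypothetical;
# the "lift" is a toy resampling stand-in for a real geometric view transform).
import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, N_CAM, C, H, W = 1, 3, 6, 3, 128, 352      # batch, timestamps, cameras, image shape
C_FEAT, BEV_X, BEV_Y = 64, 128, 128              # feature channels, BEV grid size

backbone = nn.Conv2d(C, C_FEAT, kernel_size=3, stride=2, padding=1)

imgs = torch.randn(B, T, N_CAM, C, H, W)         # multi-timestamp, multi-view input

# 1) Shared feature extraction over all (timestamp, camera) images.
feats = backbone(imgs.flatten(0, 2))             # (B*T*N_CAM, C_FEAT, H/2, W/2)

# 2) Toy "lift" into a common BEV grid: resample each feature map onto the grid
#    and average over cameras (a real model would use camera geometry / depth here).
bev = F.interpolate(feats, size=(BEV_X, BEV_Y), mode="bilinear", align_corners=False)
bev = bev.view(B, T, N_CAM, C_FEAT, BEV_X, BEV_Y).mean(dim=2)

print(bev.shape)                                 # (B, T, C_FEAT, X, Y): 4D BEV per sample
```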
Also, FastDiff enables sampling 58x faster than real-time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time. A quick arithmetic illustration of this speedup follows below.
Ranked #6 on Text-To-Speech Synthesis on LJSpeech
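As a sanity check on what "58x faster than real-time" means in deployment terms, the snippet below does the arithmetic; only the 58x factor comes from the text, the utterance length is an illustrative assumption.

```python
# "58x faster than real-time" means 1 second of audio is synthesized in roughly 1/58 s.
speedup = 58.0
audio_seconds = 10.0                          # example utterance length (illustrative)
synthesis_seconds = audio_seconds / speedup   # ~0.172 s of compute in this scenario
rtf = synthesis_seconds / audio_seconds       # real-time factor ~0.017 (< 1 is faster than real-time)
print(f"{synthesis_seconds:.3f} s to synthesize {audio_seconds:.0f} s of audio (RTF {rtf:.3f})")
```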
We present the Berkeley Crossword Solver, a state-of-the-art approach for automatically solving crossword puzzles.
We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations.
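At a high level, such a joint optimization can be sketched as a differentiable-rendering loop: geometry, material, and lighting parameters are all updated by gradient descent on an image-space loss against the multi-view observations. The sketch below uses a placeholder renderer and placeholder parameterizations; it is not the paper's method, which relies on a real differentiable rasterizer.

```python
# High-level sketch of joint inverse rendering (hypothetical toy renderer; a real
# method would use a differentiable rasterizer or ray tracer, not this placeholder).
import torch

NUM_VIEWS, H, W = 8, 64, 64
target_imgs = torch.rand(NUM_VIEWS, H, W, 3)       # multi-view image observations

# Jointly optimized parameters: geometry (a small SDF-like grid as a
# topology-friendly shape proxy), per-texel material, and a light proxy.
geometry = torch.randn(32, 32, 32, requires_grad=True)
material = torch.rand(H, W, 3, requires_grad=True)
lighting = torch.rand(NUM_VIEWS, 3, requires_grad=True)

def render(view_idx, geometry, material, lighting):
    """Placeholder differentiable renderer: any function of the parameters works
    for the sketch, as long as gradients flow back to all three."""
    shade = lighting[view_idx] * geometry.tanh().mean()
    return material * shade                        # (H, W, 3) predicted image

opt = torch.optim.Adam([geometry, material, lighting], lr=1e-2)
for it in range(100):
    view = it % NUM_VIEWS
    loss = (render(view, geometry, material, lighting) - target_imgs[view]).abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```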