We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.
First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that the LSTM yields a significant improvement over feed-forward DNN and CNN models on our dataset.
Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations.
Then, for each class, the predicted probabilities of that class are averaged to form a mean vector, which we refer to as the mean soft labels.
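The mean-soft-label computation described above can be sketched as follows; this is a minimal illustration, not the paper's implementation, and the function name, array shapes (`probs` as an `(N, C)` matrix of softmax outputs, `labels` as hard class assignments), are assumptions for the example.

```python
import numpy as np

def mean_soft_labels(probs, labels, num_classes):
    """For each class, average the predicted probability vectors of the
    samples assigned to that class, yielding one "mean soft label" per class.

    probs:  (N, C) array of per-sample class probabilities (softmax outputs)
    labels: (N,) array of hard class assignments in [0, num_classes)
    returns: (num_classes, C) array of mean soft labels
    """
    means = np.zeros((num_classes, probs.shape[1]))
    for c in range(num_classes):
        # Average the probability vectors of all samples labeled as class c.
        means[c] = probs[labels == c].mean(axis=0)
    return means
```

Each row of the result summarizes how the model's probability mass is typically distributed for samples of that class, which is what makes it usable as a soft target.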
On the Aurora 4 task, the very deep CNN achieves a WER of 8.81%, which improves to 7.99% with auxiliary feature joint training and to 7.09% with LSTM-RNN joint decoding.