Tacotron 2 is a neural network architecture for speech synthesis directly from text. It consists of two components: a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, and a modified WaveNet vocoder that synthesizes time-domain waveforms from those spectrograms.
In contrast to the original Tacotron, Tacotron 2 uses simpler building blocks: vanilla LSTM and convolutional layers in the encoder and decoder replace the CBHG stacks and GRU recurrent layers. Tacotron 2 also does not use a “reduction factor”, i.e., each decoder step corresponds to a single spectrogram frame, and location-sensitive attention is used instead of additive attention.
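Location-sensitive attention extends additive attention with features computed from the cumulative attention weights of previous decoder steps, which encourages the alignment to move monotonically forward through the text. A minimal NumPy sketch of one attention step is shown below; all weight names and shapes here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def location_sensitive_attention(query, memory, cum_align,
                                 loc_filters, W_q, W_m, W_f, v):
    """One step of location-sensitive attention (illustrative sketch).

    query:       (d_q,)    current decoder state
    memory:      (T, d_m)  encoder outputs
    cum_align:   (T,)      cumulative attention weights from earlier steps
    loc_filters: (k, n_f)  1-D conv filters applied over cum_align
    W_q, W_m, W_f, v       projection weights (shapes assumed for the demo)
    """
    T = memory.shape[0]
    k, n_f = loc_filters.shape
    pad = k // 2
    padded = np.pad(cum_align, pad)
    # Location features: convolve the cumulative alignments with each filter.
    loc_feat = np.stack(
        [np.array([padded[t:t + k] @ loc_filters[:, f] for t in range(T)])
         for f in range(n_f)],
        axis=1)                                             # (T, n_f)
    # Additive energies with the extra location-feature term.
    energies = np.tanh(query @ W_q + memory @ W_m + loc_feat @ W_f) @ v
    # Softmax over encoder timesteps gives the alignment.
    align = np.exp(energies - energies.max())
    align = align / align.sum()
    context = align @ memory                                # (d_m,)
    return context, align
```

In a full decoder, `cum_align` would be the running sum of the `align` vectors from all previous steps, so the location features tell the attention where it has already looked.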
Source: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Task | Papers | Share |
---|---|---|
Speech Synthesis | 14 | 43.75% |
Text-To-Speech Synthesis | 4 | 12.50% |
Voice Cloning | 2 | 6.25% |
Style Transfer | 2 | 6.25% |
Acoustic Modelling | 1 | 3.13% |
Voice Conversion | 1 | 3.13% |
Transliteration | 1 | 3.13% |
Zero-Shot Learning | 1 | 3.13% |
Classification | 1 | 3.13% |