Ainu is an unwritten language spoken by the Ainu people, one of the ethnic groups in Japan.
More specifically, a time-frequency bin is masked if the filterbank energy in this bin is less than a certain energy threshold.
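The thresholding rule described above can be illustrated with a minimal NumPy sketch. The array layout (frames by filterbank channels), the threshold value, and the choice to replace masked bins with a constant `mask_value` are all assumptions made for illustration; the source does not say how the threshold is chosen or what a masked bin is replaced with. The helper name `mask_low_energy_bins` is hypothetical.

```python
import numpy as np

def mask_low_energy_bins(fbank, threshold, mask_value=0.0):
    """Mask time-frequency bins whose filterbank energy is below `threshold`.

    fbank: 2-D array, assumed shape (num_frames, num_filterbank_channels).
    Returns the masked features and the boolean mask (True = bin was masked).
    """
    fbank = np.asarray(fbank, dtype=float)
    mask = fbank < threshold                    # strictly below the threshold
    masked = np.where(mask, mask_value, fbank)  # keep bins at or above it
    return masked, mask

# Toy example: 2 frames, 3 filterbank channels (values are made up).
features = np.array([[0.2, 1.5, 3.0],
                     [4.0, 0.1, 0.9]])
masked, mask = mask_low_energy_bins(features, threshold=1.0)
print(masked)
```

With a threshold of 1.0, the bins holding 0.2, 0.1, and 0.9 are masked and the rest pass through unchanged.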
We propose an unsupervised speaker adaptation method inspired by the neural Turing machine for end-to-end (E2E) automatic speech recognition (ASR).
We present results on the LibriSpeech dataset showing that limiting the left context for self-attention in the Transformer layers makes decoding computationally tractable for streaming, with only a slight degradation in accuracy.
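One way to picture a limited left context is as a banded attention mask: each frame may attend to itself and to a fixed number of preceding frames only. The sketch below is a single-head, projection-free illustration in NumPy, not the architecture from the paper. The function names, the `left_context` parameter, and the assumption of zero right context are hypothetical choices made for the example.

```python
import numpy as np

def limited_left_context_mask(num_frames, left_context):
    """Boolean mask: entry [q, k] is True if query frame q may attend to key frame k.

    Assumes no right context: a frame sees itself and at most
    `left_context` frames before it.
    """
    idx = np.arange(num_frames)
    offset = idx[:, None] - idx[None, :]  # query index minus key index
    return (offset >= 0) & (offset <= left_context)

def masked_self_attention(x, left_context):
    """Scaled dot-product self-attention over x (num_frames, dim) with the mask applied."""
    num_frames, dim = x.shape
    scores = x @ x.T / np.sqrt(dim)
    allowed = limited_left_context_mask(num_frames, left_context)
    scores = np.where(allowed, scores, -np.inf)   # disallowed keys get zero weight
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights

band = limited_left_context_mask(4, left_context=1)
print(band.astype(int))
```

Because every row of the mask has at most `left_context + 1` allowed keys, the per-frame cost stays constant as the utterance grows, which is what makes this kind of restriction attractive for streaming.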
We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC).
To support online recognition, we integrate the state reuse chunk-SAE and the MTA-based SAD into an online CTC/attention architecture.
Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art results in end-to-end automatic speech recognition (ASR).