First, we apply prosody-based data augmentation to supplement the audio data.
Using this corpus, we also construct a retrieval-based evaluation task for Finnish chatbot development.
On ComParE 2020 tasks, we investigate applying an ensemble of E2E models for robust performance and developing task-specific modifications for each task.
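As an illustration of how an ensemble of models might be combined, the following minimal sketch averages class posteriors across models and picks the highest-scoring class. The combination rule (unweighted posterior averaging) and the function name `ensemble_average` are assumptions made for this example; the source does not state how the E2E models are fused.

```python
def ensemble_average(posteriors):
    """Average per-model class posteriors and return (averaged, best_class).

    `posteriors` is a list with one entry per model; each entry is a list of
    class probabilities over the same, identically ordered set of classes.
    Unweighted averaging is an assumption of this sketch.
    """
    if not posteriors:
        raise ValueError("at least one model output is required")
    n_models = len(posteriors)
    n_classes = len(posteriors[0])
    if any(len(p) != n_classes for p in posteriors):
        raise ValueError("all models must score the same classes")
    # Mean probability of each class over all models.
    averaged = [sum(p[c] for p in posteriors) / n_models for c in range(n_classes)]
    # Index of the class with the highest averaged probability.
    best_class = max(range(n_classes), key=averaged.__getitem__)
    return averaged, best_class
```

Averaging tends to smooth out the errors of individual models, which is one common motivation for using an ensemble when robustness matters more than the cost of running several models.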
However, we show that for character-based NNLMs, only pretraining with a related language improves ASR performance, whereas using an unrelated language may degrade it.
On these tasks, interpolating the baseline RNNLM approximation and a conventional LM outperforms the conventional LM in terms of the Maximum Term Weighted Value for single-character subwords.
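Interpolating two language models can be sketched as follows, assuming linear interpolation in the probability domain with a single fixed weight. The function name `interpolate_log_probs` and the weight parameter are hypothetical; the source does not specify the interpolation scheme or how the weight is chosen.

```python
import math


def interpolate_log_probs(logp_a: float, logp_b: float, weight_a: float) -> float:
    """Linearly interpolate two LM probabilities given as natural-log values.

    Computes log(weight_a * P_a + (1 - weight_a) * P_b). Linear interpolation
    with one fixed weight is an assumption of this sketch.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    # Mix the two probabilities in the probability domain, then return to logs.
    mixed = weight_a * math.exp(logp_a) + (1.0 - weight_a) * math.exp(logp_b)
    return math.log(mixed)
```

In practice the weight would typically be tuned on held-out data, and a numerically stable log-sum-exp formulation is preferable when the log probabilities are very small.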
There are several approaches to improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora for related language pairs can be used via parameter sharing or transfer learning in multilingual models; and subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary.
Transformers have recently taken center stage in language modeling, after LSTMs had long been considered the dominant model architecture.
Using English, Finnish, North Sami, and Turkish data sets, we show that this approach is able to find better solutions to the optimization problem defined by the Morfessor Baseline model than its original recursive training algorithm.
On the mobile device, augmented reality (AR) was used to help the hearing impaired observe the speaker's gestures and lip movements simultaneously with the transcriptions.
In this paper, we also describe the experiments leading up to our final systems.
Our experiments show that the effect of the visual features in our system is small.
Today, the vocabulary size for language models in large-vocabulary speech recognition is typically several hundred thousand words.
We present a new tool for training neural network language models (NNLMs), scoring sentences, and generating text.