Kiosks are a popular self-service option in many fast-food restaurants; they save time for visitors and labor for the chains.
In this memo, we show that a convolutional neural network with a Self-Attentive Pooling layer achieves promising results on the language identification task in a low-resource setting and sets a new state of the art on the Low Resource ASR challenge dataset.
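As a rough illustration of the pooling idea mentioned above, the sketch below collapses a variable-length sequence of frame features into one fixed-size vector by taking a softmax-weighted average over time. It is a simplified sketch, not the layer from the memo: the function name `self_attentive_pool`, the attention vector `w`, and the linear scoring function are all assumptions made for illustration (published variants typically score frames through a small learned network).

```python
import math

def self_attentive_pool(frames, w):
    """Pool T frame vectors (each of length D) into one vector of length D.

    `w` is a hypothetical learned attention vector of length D; the linear
    score w . x_t is an assumed simplification of the scoring network.
    """
    # One scalar relevance score per frame.
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) for frame in frames]
    # Numerically stable softmax over the time axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted average of the frames.
    dim = len(frames[0])
    pooled = [sum(a * frame[d] for a, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights

# With a zero attention vector every frame gets the same weight,
# so the result reduces to plain mean pooling.
pooled, weights = self_attentive_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Because the weights always sum to one, the output has the same dimensionality regardless of how many frames the utterance contains, which is what lets a classifier head sit on top of variable-length audio.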
This memo describes the NTR-TSU submission to the SIGTYP 2021 Shared Task on predicting language IDs from speech.
The performance of automated speech recognition (ASR) systems is well known to vary across application domains.
Ranked #1 on Speech Recognition on MediaSpeech
In the past few years, triplet loss-based metric embeddings have become a de facto standard for several important computer vision problems, most notably person re-identification.
Ranked #1 on Keyword Spotting on Google Speech Commands
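For readers unfamiliar with the triplet loss named in the entry above, here is a minimal sketch of its standard hinge form, max(0, d(a, p) - d(a, n) + margin), using Euclidean distance. The function names, the margin value, and the toy embeddings are assumptions for illustration and do not come from the paper.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Well-separated triplet: positive at distance 1, negative at distance 5.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Violating triplet: the roles of positive and negative are swapped.
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

Only triplets that violate the margin contribute to the loss, which is why training pipelines built on it usually pay attention to how triplets are selected.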