The Conformer, which combines convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as state-of-the-art for automatic speech recognition (ASR).
Ranked #9 on Speech Recognition on LibriSpeech test-other
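As a rough illustration of that structure, here is a minimal PyTorch sketch of a Conformer block: a macaron-style pair of half-step feed-forward modules sandwiching self-attention (global context) and a depthwise-convolution module (local context). The dimensions and kernel size are illustrative defaults, and details such as relative positional encoding and dropout are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvModule(nn.Module):
    """Convolution module: a depthwise convolution captures local context."""
    def __init__(self, d_model: int, kernel_size: int = 31):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.pw1 = nn.Conv1d(d_model, 2 * d_model, 1)   # pointwise, feeds the GLU
        self.dw = nn.Conv1d(d_model, d_model, kernel_size,
                            padding=kernel_size // 2, groups=d_model)
        self.bn = nn.BatchNorm1d(d_model)
        self.pw2 = nn.Conv1d(d_model, d_model, 1)

    def forward(self, x):                     # x: (batch, time, d_model)
        y = self.norm(x).transpose(1, 2)      # Conv1d expects (batch, d_model, time)
        y = F.glu(self.pw1(y), dim=1)         # gated linear unit
        y = self.pw2(F.silu(self.bn(self.dw(y))))
        return x + y.transpose(1, 2)          # residual connection

class ConformerBlock(nn.Module):
    """Half-step FFN -> self-attention (global) -> convolution (local) -> half-step FFN."""
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        def ffn():
            return nn.Sequential(nn.LayerNorm(d_model),
                                 nn.Linear(d_model, 4 * d_model), nn.SiLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ff1, self.ff2 = ffn(), ffn()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = ConvModule(d_model)
        self.out_norm = nn.LayerNorm(d_model)

    def forward(self, x):                     # x: (batch, time, d_model)
        x = x + 0.5 * self.ff1(x)             # half-step feed-forward
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        x = self.conv(x)                      # residual applied inside
        x = x + 0.5 * self.ff2(x)             # half-step feed-forward
        return self.out_norm(x)
```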
In this work we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task?
Historically, such benchmarks have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks.
Ranked #1 on Named Entity Recognition (NER) on SLUE
A multi-mode ASR model can fulfill various latency requirements during inference: when a larger latency is acceptable, the model can process longer future context to achieve higher accuracy, and when the latency budget is tight, the model can depend less on future context while still achieving reliable accuracy.
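One common way to realize this kind of multi-mode behavior is to bound how many future frames the encoder's self-attention may see at inference time. The sketch below builds such a lookahead mask; it illustrates the general mechanism under that assumption, not this paper's actual implementation, and the names are hypothetical.

```python
import torch

def future_context_mask(n_frames: int, right_context: int) -> torch.Tensor:
    """Boolean mask where True marks positions attention must NOT see.

    right_context = 0  -> fully streaming (no future frames, lowest latency)
    larger values      -> more lookahead, higher latency, higher accuracy
    """
    idx = torch.arange(n_frames)
    # frame i may attend to frames j with j <= i + right_context
    return idx[None, :] > (idx[:, None] + right_context)

# Usable as attn_mask in torch.nn.MultiheadAttention, e.g.:
# low-latency mode: future_context_mask(T, 0)
# accurate mode:    future_context_mask(T, 16)
```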
In this paper, we explore the use of pre-trained language models to learn sentiment information from written texts for speech sentiment analysis.
Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when".
When combined with self-attentive SRU LM rescoring, the multistream CNN helps ASAPP achieve the best WER of 1.75% on test-clean in LibriSpeech.
In this paper we present state-of-the-art (SOTA) performance on the LibriSpeech corpus with two novel neural network architectures: a multistream CNN for acoustic modeling and a self-attentive simple recurrent unit (SRU) for language modeling.
Ranked #7 on Speech Recognition on LibriSpeech test-clean
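The key property of the SRU is that, unlike a conventional RNN, none of its matrix multiplications depend on the previous hidden state, so they can be batched across the whole sequence and only a cheap elementwise recurrence remains serial. A minimal NumPy sketch of one layer follows, based on the published SRU equations; parameter names and the identity output activation are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_layer(x, W, Wf, Wr, vf, vr, bf, br):
    """One SRU layer over a sequence x of shape (T, d).

    W, Wf, Wr are (d, d) weight matrices; vf, vr, bf, br are (d,) vectors.
    All matrix multiplies depend only on the input, so they are computed
    for the whole sequence at once; the time loop is purely elementwise.
    """
    xt = x @ W.T                        # candidate values, batched over time
    f_in, r_in = x @ Wf.T, x @ Wr.T     # gate pre-activations, batched over time
    c = np.zeros(x.shape[1])
    h = np.empty_like(x)
    for t in range(x.shape[0]):
        f = sigmoid(f_in[t] + vf * c + bf)   # forget gate
        c = f * c + (1.0 - f) * xt[t]        # elementwise cell-state update
        r = sigmoid(r_in[t] + vr * c + br)   # highway/reset gate
        h[t] = r * c + (1.0 - r) * x[t]      # g = identity; tanh is also common
    return h
```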
This work presents a novel approach to speaker diarization that leverages lexical information provided by automatic speech recognition.
In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization.
Ranked #1 on Speaker Diarization on CALLHOME (DER, ignoring overlapped speech)
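To give a feel for how such auto-tuning can work, here is a minimal NumPy sketch of the eigengap heuristic for estimating the number of speakers from a segment-affinity matrix. It illustrates the general idea rather than the paper's exact procedure; the function name and the threshold-free formulation are assumptions.

```python
import numpy as np

def estimate_num_speakers(affinity: np.ndarray, max_speakers: int = 8) -> int:
    """Pick the number of clusters via the largest Laplacian eigengap.

    affinity: (n, n) symmetric matrix of pairwise segment similarities,
    with n > max_speakers.
    """
    d = affinity.sum(axis=1)
    laplacian = np.diag(d) - affinity             # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(laplacian)       # ascending order
    gaps = np.diff(eigvals[:max_speakers + 1])    # gaps between adjacent eigenvalues
    return int(np.argmax(gaps)) + 1               # largest gap after the k-th eigenvalue
```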
Self-attention has been a huge success for many downstream tasks in NLP, which has led to exploration of applying self-attention to speech problems as well.
Ranked #24 on Speech Recognition on LibriSpeech test-clean
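For reference, the scaled dot-product self-attention these models build on can be written in a few lines. The NumPy sketch below shows the single-head case; the projection matrices are assumed inputs.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V.

    X: (T, d_model) sequence of frames; Wq, Wk, Wv: (d_model, d_k) projections.
    Every frame attends to every other frame, giving global context.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V
```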
This method was applied with the CallHome training corpus and improved individual system performances by an average of 6.1% (relative) on the CallHome portion of the evaluation set, with no performance loss on the Switchboard portion.