Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even when the pre-training objective is designed for speech recognition.
Neural transducers have been widely used in automatic speech recognition (ASR).
The proposed model builds on a state-of-the-art recognition system using a bi-directional long short-term memory (BLSTM) model with utterance-wise dropout and iterative speaker adaptation, but employs a Conformer encoder instead of the BLSTM network.
Atlantic Multidecadal Variability (AMV) describes variations of North Atlantic sea surface temperature with a typical cycle of between 60 and 70 years.
In this paper, we propose applying a recurrent selective attention network (RSAN) to continuous speech separation (CSS); the network generates a variable number of output channels based on active speaker counting.
To promote practical applications of autonomous floating-waste cleaning, we present FloW, the first dataset for floating waste detection in inland water areas.
We propose speaker separation using speaker inventories and estimated speech (SSUSIES), a framework leveraging speaker profiles and estimated speech for speaker separation.
Although our system is trained on simulated room impulse responses (RIRs) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry.
In this study, we analyze the distortion problem, compare different acoustic models, and investigate a distortion-independent training scheme for monaural speech recognition.
This paper presented our work on applying Recurrent Deep Stacking Networks (RDSNs) to robust automatic speech recognition (ASR) tasks.
This paper proposed a class of novel Deep Recurrent Neural Networks which can incorporate language-level information into acoustic models.