PASE+ is a problem-agnostic speech encoder that combines a convolutional encoder with multiple neural networks, called workers, each tasked with solving a self-supervised problem (i.e., one that does not require manual annotations as ground truth). An online speech distortion module contaminates the input signals with a variety of random disturbances. A revised encoder is also proposed that better learns short- and long-term speech dynamics through an efficient combination of recurrent and convolutional networks. Finally, the authors refine the set of workers used in self-supervision to encourage better cooperation.
Source: Multi-task self-supervised learning for Robust Speech Recognition
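The online distortion idea described above can be illustrated with a minimal sketch. It assumes three hypothetical disturbances (additive noise, clipping, and temporal masking) with made-up probabilities and parameters. The description does not list the disturbances or their settings, so the function names, the `DISTORTIONS` table, and all numeric values here are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def add_noise(signal, rng, scale=0.05):
    """Add zero-mean Gaussian noise to every sample (hypothetical disturbance)."""
    return [x + rng.gauss(0.0, scale) for x in signal]

def clip(signal, rng, low=0.3, high=0.8):
    """Clip amplitudes at a randomly drawn threshold (hypothetical disturbance)."""
    limit = rng.uniform(low, high)
    return [max(-limit, min(limit, x)) for x in signal]

def time_mask(signal, rng, max_width=20):
    """Zero out one random contiguous span of samples (hypothetical disturbance)."""
    if not signal:
        return []
    width = rng.randint(1, min(max_width, len(signal)))
    start = rng.randint(0, len(signal) - width)
    return [0.0 if start <= i < start + width else x
            for i, x in enumerate(signal)]

# Each disturbance is paired with an assumed activation probability.
DISTORTIONS = [(add_noise, 0.5), (clip, 0.3), (time_mask, 0.3)]

def distort(signal, rng):
    """Apply each disturbance independently with its own probability.

    Called once per training example, so every pass over the data
    sees a differently contaminated version of the same utterance.
    """
    out = list(signal)  # copy so the clean signal is left untouched
    for fn, prob in DISTORTIONS:
        if rng.random() < prob:
            out = fn(out, rng)
    return out

# Usage: contaminate a toy sine wave standing in for a speech waveform.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * i / 50) for i in range(200)]
noisy = distort(clean, rng)
```

Because the distortions are sampled on the fly rather than precomputed, the clean signal remains available, and the encoder can be fed `noisy` while `clean` is kept for whatever targets the workers need.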
Task | Papers | Share |
---|---|---|
Denoising | 1 | 14.29% |
Emotion Classification | 1 | 14.29% |
Multi-Task Learning | 1 | 14.29% |
Speech Denoising | 1 | 14.29% |
Robust Speech Recognition | 1 | 14.29% |
Self-Supervised Learning | 1 | 14.29% |
Speech Recognition | 1 | 14.29% |