Speech Separation Models

VoiceFilter-Lite is a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system. In this architecture, the voice filtering model operates as a frame-by-frame frontend signal processor to enhance the features consumed by the speech recognizer, without reconstructing audio signals from the features. The key contributions are (1) A system to perform speech separation directly on ASR input features; (2) An asymmetric loss function to penalize oversuppression during training, to make the model harmless under various acoustic environments, (3) An adaptive suppression strength mechanism to adapt to different noise conditions.

Source: VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

Papers


Paper Code Results Date Stars

Components


Component Type
🤖 No Components Found You can add them if they exist; e.g. Mask R-CNN uses RoIAlign

Categories