Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network

ISCA Interspeech 2018 · Yi Luo, Nima Mesgarani

We investigate the recently proposed Time-domain Audio Separation Network (TasNet) in the task of real-time single-channel speech dereverberation. Unlike systems that take a time-frequency representation of the audio as input, TasNet learns an adaptive front-end in replacement of the time-frequency representation by a time-domain convolutional non-negative autoencoder. We show that by formulating the dereverberation problem as a denoising problem where the direct path is separated from the reverberations, a TasNet denoising autoencoder can outperform a deep LSTM baseline on log-power magnitude spectrogram input in both causal and non-causal settings. We further show that adjusting the stride size in the convolutional autoencoder helps both the dereverberation and separation performance.
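To make the role of the stride concrete, the following is a minimal sketch of a strided, non-negative 1-D convolutional encoder. It is not the authors' implementation: the function name `encode`, the random `basis` filters, the window length of 16 samples, and the use of a plain ReLU to enforce non-negativity are all assumptions chosen for illustration. It only shows how a smaller stride yields more, more heavily overlapping frames from the same waveform.

```python
import random


def encode(signal, basis, stride):
    """Strided 1-D convolution followed by ReLU (illustrative only).

    signal: list of floats, the time-domain waveform
    basis:  list of N filters, each a list of L floats
            (stand-ins for learned weights; random here)
    stride: hop between consecutive windows, in samples
    Returns one list of N non-negative weights per window.
    """
    win = len(basis[0])
    frames = []
    for start in range(0, len(signal) - win + 1, stride):
        segment = signal[start:start + win]
        # Dot product with each filter, clipped at zero (ReLU).
        frames.append([
            max(0.0, sum(w * x for w, x in zip(filt, segment)))
            for filt in basis
        ])
    return frames


random.seed(0)
signal = [random.uniform(-1.0, 1.0) for _ in range(64)]
basis = [[random.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(4)]

frames_half = encode(signal, basis, stride=8)   # windows overlap by 50%
frames_dense = encode(signal, basis, stride=4)  # windows overlap by 75%
print(len(frames_half), len(frames_dense))
```

Halving the stride roughly doubles the number of frames the downstream network sees for the same input, at a corresponding increase in computation.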



Results from the Paper

Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Speech Separation | WSJ0-2mix | TasNet v2 | SI-SDRi | 13.2 | #19
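
For readers unfamiliar with the metric in the table, SI-SDRi is the improvement in scale-invariant signal-to-distortion ratio (in dB) of the separated output over the unprocessed mixture. The sketch below uses the standard SI-SDR definition and assumes the signals are already zero-mean and time-aligned; the function names and the toy signals are hypothetical and are not taken from the paper's evaluation code.

```python
import math


def si_sdr(estimate, target):
    """Scale-invariant SDR in dB, assuming zero-mean, equal-length signals."""
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    alpha = dot / target_energy                    # optimal scaling of the target
    projection = [alpha * t for t in target]       # part of the estimate explained by the target
    residual = [e - p for e, p in zip(estimate, projection)]
    signal_power = sum(p * p for p in projection)
    residual_power = sum(r * r for r in residual)
    return 10.0 * math.log10(signal_power / residual_power)


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


# Toy signals, chosen only to make the arithmetic easy to follow.
target = [1.0, 0.0, -1.0, 0.0]
mixture = [1.0, 1.0, -1.0, -1.0]
estimate = [1.0, 0.1, -1.0, -0.1]

improvement = si_sdr_improvement(estimate, mixture, target)
print(round(improvement, 2))
```

Because the target is rescaled by `alpha` before the residual is computed, multiplying the estimate by any non-zero constant leaves the score unchanged, which is what makes the metric scale-invariant.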