Paper

PoLyScriber: Integrated Fine-tuning of Extractor and Lyrics Transcriber for Polyphonic Music

Lyrics transcription of polyphonic music is challenging as the background music affects lyrics intelligibility. Typically, lyrics transcription can be performed by a two-step pipeline, i.e. a singing vocal extraction front end, followed by a lyrics transcriber back end, where the front end and back end are trained separately. Such a two-step pipeline suffers from both imperfect vocal extraction and mismatch between front end and back end. In this work, we propose a novel end-to-end integrated fine-tuning framework, that we call PoLyScriber, to globally optimize the vocal extractor front end and lyrics transcriber back end for lyrics transcription in polyphonic music. The experimental results show that our proposed PoLyScriber achieves substantial improvements over the existing approaches on publicly available test datasets.

Results in Papers With Code
(↓ scroll down to see all results)