Search Results for author: Xiaoxue Gao

Found 7 papers, 1 paper with code

Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training

no code implementations • 1 Apr 2024 • Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang, Haizhou Li

Audio-visual active speaker detection (AV-ASD) aims to identify which visible face is speaking in a scene with one or more persons.

Audio-Visual Active Speaker Detection, Denoising +1

Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

no code implementations • 24 Feb 2024 • Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

In this paper, we investigate a new way to pre-train a joint speech-text model to learn enhanced speech representations and benefit various speech-related downstream tasks.

Pseudo Label, Self-Supervised Learning

Self-Transcriber: Few-shot Lyrics Transcription with Self-training

no code implementations • 18 Nov 2022 • Xiaoxue Gao, Xianghu Yue, Haizhou Li

Current lyrics transcription approaches rely heavily on supervised learning with labeled data, but such data are scarce and manual labeling of singing is expensive.

Few-Shot Learning

token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text

no code implementations • 30 Oct 2022 • Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

Because speech is continuous while text is discrete, we first discretize speech into a sequence of discrete speech tokens to solve the modality mismatch problem.

Intent Classification +1

PoLyScriber: Integrated Fine-tuning of Extractor and Lyrics Transcriber for Polyphonic Music

no code implementations • 15 Jul 2022 • Xiaoxue Gao, Chitralekha Gupta, Haizhou Li

Lyrics transcription of polyphonic music is challenging as the background music affects lyrics intelligibility.

Music-robust Automatic Lyrics Transcription of Polyphonic Music

1 code implementation • 7 Apr 2022 • Xiaoxue Gao, Chitralekha Gupta, Haizhou Li

To improve the robustness of lyrics transcription to background music, we propose combining features that emphasize the singing vocals, i.e., music-removed features representing the extracted singing vocals, with features that capture both the singing vocals and the background music, i.e., music-present features.

Automatic Lyrics Transcription, Language Modelling

Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music

no code implementations • 7 Apr 2022 • Xiaoxue Gao, Chitralekha Gupta, Haizhou Li

Lyrics transcription of polyphonic music is challenging not only because the singing vocals are corrupted by the background music, but also because the background music and the singing style vary across music genres, such as pop, metal, and hip hop, which affects lyrics intelligibility of the song in different ways.

Automatic Lyrics Transcription
