Search Results for author: Xianghu Yue

Found 7 papers, 0 papers with code

Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

no code implementations • 24 Feb 2024 • Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

In this paper, we investigate a new way to pre-train such a joint speech-text model to learn enhanced speech representations and benefit various speech-related downstream tasks.

Pseudo Label Self-Supervised Learning

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

no code implementations • 22 Jan 2024 • Xianghu Yue, Xiaohai Tian, Lu Lu, Malu Zhang, Zhizheng Wu, Haizhou Li

To bridge the gap between modalities, CoAVT employs a query encoder, which contains a set of learnable query embeddings, and extracts the most informative audiovisual features of the corresponding text.

AudioCaps Audio-Visual Synchronization +4
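The snippet above mentions a query encoder whose learnable query embeddings extract the audiovisual features most relevant to the text. A minimal numpy sketch of that mechanism, cross-attention from a small set of learnable queries over a feature sequence, is shown below; all shapes and names are illustrative assumptions, not CoAVT's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_cross_attention(queries, features):
    """Learnable queries attend over audio-visual features.

    queries:  (num_queries, d) -- learnable query embeddings
    features: (seq_len, d)     -- audio-visual encoder outputs
    returns:  (num_queries, d) -- condensed features, one per query
    """
    d = queries.shape[-1]
    scores = queries @ features.T / np.sqrt(d)  # (num_queries, seq_len)
    weights = softmax(scores, axis=-1)          # attention distribution
    return weights @ features

rng = np.random.default_rng(0)
num_queries, seq_len, d = 4, 16, 8
queries = rng.normal(size=(num_queries, d))   # trained in practice
features = rng.normal(size=(seq_len, d))
out = query_cross_attention(queries, features)
print(out.shape)  # (4, 8)
```

Each query row produces one pooled feature vector, so a handful of queries summarizes an arbitrarily long audiovisual sequence into a fixed-size representation.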

Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder

no code implementations • 19 Jul 2023 • Jingru Lin, Xianghu Yue, Junyi Ao, Haizhou Li

We train the model based on the idea that different realisations of the same word should be close in the underlying embedding space.

Word Embeddings
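The training idea stated above, that different realisations of the same word should be close in the embedding space, can be sketched as a simple correspondence loss. The cosine-distance formulation below is an illustrative assumption, not the paper's actual objective.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity; small when two embeddings are close."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return 1.0 - (a * b).sum(axis=-1)

def correspondence_loss(emb_a, emb_b):
    """Mean distance between paired realisations of the same word."""
    return cosine_distance(emb_a, emb_b).mean()

rng = np.random.default_rng(0)
base = rng.normal(size=(5, 16))                    # per-word "true" embeddings
emb_a = base + 0.01 * rng.normal(size=base.shape)  # realisation 1
emb_b = base + 0.01 * rng.normal(size=base.shape)  # realisation 2
near = correspondence_loss(emb_a, emb_b)           # matched pairs: small loss
far = correspondence_loss(emb_a, rng.normal(size=base.shape))  # mismatched: large
```

Minimizing such a loss pulls embeddings of the same word's different acoustic realisations together while leaving unrelated words apart.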

Self-Transriber: Few-shot Lyrics Transcription with Self-training

no code implementations • 18 Nov 2022 • Xiaoxue Gao, Xianghu Yue, Haizhou Li

The current lyrics transcription approaches heavily rely on supervised learning with labeled data, but such data are scarce and manual labeling of singing is expensive.

Few-Shot Learning
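Since labeled singing data are scarce, the paper turns to self-training. The generic recipe, which is all the snippet above implies, is: fit a model on the few labeled examples, pseudo-label unlabeled data, keep only confident pseudo-labels, and refit. A toy nearest-centroid sketch of that loop follows; the classifier, thresholds, and data are illustrative assumptions, not the paper's transcription model.

```python
import numpy as np

def centroids(X, y, k):
    return np.stack([X[y == c].mean(axis=0) for c in range(k)])

def predict(C, X):
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1)
    return d.argmin(axis=1), d.min(axis=1)  # labels, confidence (distance)

rng = np.random.default_rng(0)
k, dim = 2, 4
true_c = rng.normal(size=(k, dim)) * 5            # well-separated classes
# few labeled examples per class, many unlabeled
Xl = np.concatenate([true_c[c] + rng.normal(size=(3, dim)) for c in range(k)])
yl = np.repeat(np.arange(k), 3)
Xu = np.concatenate([true_c[c] + rng.normal(size=(50, dim)) for c in range(k)])

C = centroids(Xl, yl, k)
for _ in range(3):                       # self-training rounds
    yu, dist = predict(C, Xu)
    keep = dist < np.percentile(dist, 80)  # keep confident pseudo-labels
    Xa = np.concatenate([Xl, Xu[keep]])    # labeled + pseudo-labeled pool
    ya = np.concatenate([yl, yu[keep]])
    C = centroids(Xa, ya, k)               # refit on the enlarged pool
```

The labeled set is always kept in the pool, so every class retains at least its original supervised anchors across rounds.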

token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text

no code implementations • 30 Oct 2022 • Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

Due to the distinct characteristics of the speech and text modalities, where speech is continuous while text is discrete, we first discretize speech into a sequence of discrete speech tokens to solve the modality mismatch problem.

Intent Classification +1
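The discretization step described above, mapping continuous speech into a sequence of discrete speech tokens, is commonly done with a k-means-style quantizer: each frame is assigned the id of its nearest codebook entry. The sketch below illustrates that generic idea under assumed shapes and a random codebook; it is not token2vec's actual tokenizer.

```python
import numpy as np

def quantize(frames, codebook):
    """Map continuous speech frames to discrete token ids
    via nearest codebook entry (a k-means-style quantizer)."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
    return d.argmin(axis=1)  # one token id per frame

rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 8))   # 32 discrete "speech tokens"
frames = rng.normal(size=(100, 8))    # continuous acoustic features
tokens = quantize(frames, codebook)   # discrete sequence, same length
```

Once speech is a token sequence, it can be handled with the same modeling machinery as text, which is the modality-mismatch fix the abstract refers to.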

End-to-End Code-Switching ASR for Low-Resourced Language Pairs

no code implementations • 27 Sep 2019 • Xianghu Yue, Grandee Lee, Emre Yilmaz, Fang Deng, Haizhou Li

In this work, we describe an E2E ASR pipeline for the recognition of CS speech in which a low-resourced language is mixed with a high resourced language.

Automatic Speech Recognition (ASR) +2

Multi-Graph Decoding for Code-Switching ASR

no code implementations • 18 Jun 2019 • Emre Yilmaz, Samuel Cohen, Xianghu Yue, David van Leeuwen, Haizhou Li

The archive contains recordings with monolingual Frisian and Dutch speech segments as well as Frisian-Dutch CS speech, so recognition performance on the monolingual segments is also vital for accurate transcriptions.

Automatic Speech Recognition (ASR) +2
