Search Results for author: Junyi Ao

Found 13 papers, 6 papers with code

The YiTrans Speech Translation System for IWSLT 2022 Offline Shared Task

no code implementations • IWSLT (ACL) 2022 • Ziqiang Zhang, Junyi Ao

This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task, which translates from English audio to German, Chinese, and Japanese.

Data Augmentation · Decoder · +1

SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech

no code implementations • 3 Jul 2024 • Jingru Lin, Meng Ge, Junyi Ao, Liqun Deng, Haizhou Li

Specifically, SA-WavLM follows an "extract-merge-predict" pipeline in which the representations of each speaker in the input mixture are first extracted individually and then merged before the final prediction.

Self-Supervised Learning
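The abstract only names the pipeline stages; below is a minimal sketch of what "extract-merge-predict" could look like in PyTorch. The per-speaker extractors, module shapes, and vocabulary size are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ExtractMergePredict(nn.Module):
    """Hypothetical "extract-merge-predict" pipeline: per-speaker
    representations are extracted from the mixture, merged, and fed
    to a shared prediction head (e.g., over pseudo-label units)."""

    def __init__(self, dim: int = 768, num_speakers: int = 2, vocab: int = 504):
        super().__init__()
        # One extractor per speaker slot; the paper may instead share
        # weights and condition on speaker embeddings.
        self.extractors = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
             for _ in range(num_speakers)]
        )
        self.merge = nn.Linear(num_speakers * dim, dim)  # merge step
        self.predict = nn.Linear(dim, vocab)             # predict step

    def forward(self, mixture: torch.Tensor) -> torch.Tensor:
        # mixture: (batch, frames, dim) features of the overlapped speech
        per_speaker = [ext(mixture) for ext in self.extractors]  # extract
        merged = self.merge(torch.cat(per_speaker, dim=-1))      # merge
        return self.predict(merged)                              # predict

logits = ExtractMergePredict()(torch.randn(2, 100, 768))  # (2, 100, 504)
```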

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

1 code implementation • 19 Jun 2024 • Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu

We also conduct a comprehensive evaluation of the generated responses using objective methods (e.g., BLEU and ROUGE), subjective evaluations, and LLM-based metrics.

Dialogue Understanding
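For the objective metrics mentioned (BLEU and ROUGE), a minimal scoring sketch with the sacrebleu and rouge-score packages follows; the hypothesis/reference strings are placeholders, not SD-Eval data.

```python
import sacrebleu
from rouge_score import rouge_scorer

hypothesis = "the assistant responds with a sympathetic tone"
reference = "the assistant replies in a sympathetic tone"

# Corpus-level BLEU over one hypothesis and one reference stream.
bleu = sacrebleu.corpus_bleu([hypothesis], [[reference]])
# ROUGE-L between the reference and the hypothesis.
rouge = rouge_scorer.RougeScorer(["rougeL"]).score(reference, hypothesis)

print(f"BLEU: {bleu.score:.1f}")
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```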

Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

no code implementations • 24 Feb 2024 • Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

In this paper, we investigate a new way to pre-train such a joint speech-text model to learn enhanced speech representations and benefit various speech-related downstream tasks.

Pseudo Label · Self-Supervised Learning

The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

no code implementations • 26 Dec 2023 • Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng

This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition.

Automatic Speech Recognition · Data Augmentation · +2

Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder

no code implementations • 19 Jul 2023 • Jingru Lin, Xianghu Yue, Junyi Ao, Haizhou Li

We train the model based on the idea that different realisations of the same word should be close in the underlying embedding space.

Word Embeddings
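The stated training idea, pulling together embeddings of different realisations of the same word, can be sketched as a batch-wise contrastive loss; this NT-Xent-style formulation is an assumption for illustration, not necessarily the paper's correspondence objective.

```python
import torch
import torch.nn.functional as F

def correspondence_loss(emb_a, emb_b, temperature: float = 0.1):
    """emb_a[i] and emb_b[i] embed two spoken realisations of word i;
    matching pairs sit on the diagonal of the similarity matrix."""
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature   # (batch, batch) cosine sims
    targets = torch.arange(emb_a.size(0))      # i-th row matches i-th column
    return F.cross_entropy(logits, targets)

loss = correspondence_loss(torch.randn(8, 256), torch.randn(8, 256))
```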

token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text

no code implementations • 30 Oct 2022 • Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

Due to the distinct characteristics of the speech and text modalities, where speech is continuous while text is discrete, we first discretize speech into a sequence of discrete speech tokens to solve the modality mismatch problem.

intent-classification · Intent Classification · +1
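The discretization step described above, mapping continuous speech features to discrete token ids, is commonly implemented with k-means over frame-level features; the sketch below uses scikit-learn with random stand-in features, and the cluster count is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Fit a codebook on pooled frame-level features (random stand-ins here).
features = np.random.randn(1000, 768)
kmeans = KMeans(n_clusters=100, n_init=10).fit(features)

# Discretize one utterance: each frame becomes a token id.
utterance = np.random.randn(120, 768)
speech_tokens = kmeans.predict(utterance)  # shape (120,), values in [0, 100)
print(speech_tokens[:10])
```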

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

1 code implementation • 29 Mar 2022 • Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li

LightHuBERT outperforms the original HuBERT on ASR and five SUPERB tasks at the HuBERT size, achieves performance comparable to the teacher model on most tasks with a 29% reduction in parameters, and obtains a $3.5\times$ compression ratio on three SUPERB tasks, i.e., automatic speaker verification, keyword spotting, and intent classification, with a slight accuracy loss.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +7
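The "once-for-all" idea in the title refers to training a single supernet whose sub-networks of varying depth and width can all be extracted after one distillation run; the search space below is a toy assumption to illustrate per-step subnet sampling, not LightHuBERT's actual dimensions.

```python
import random

# Hypothetical elastic search space over depth and width.
search_space = {"layers": [6, 8, 10, 12], "hidden_dim": [256, 384, 512, 768]}

def sample_subnet() -> dict:
    """Draw one sub-network configuration from the shared supernet."""
    return {name: random.choice(choices) for name, choices in search_space.items()}

for step in range(3):
    config = sample_subnet()
    print(f"step {step}: distill teacher into subnet {config}")
```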

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

4 code implementations • ACL 2022 • Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores encoder-decoder pre-training for self-supervised speech/text representation learning.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +8
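Pre-trained SpeechT5 checkpoints are distributed through Hugging Face transformers; a minimal text-to-speech example follows. The zero speaker embedding is a placeholder; real use supplies a 512-dimensional x-vector.

```python
import torch
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Hello from SpeechT5.", return_tensors="pt")
speaker_embeddings = torch.zeros(1, 512)  # placeholder x-vector
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
print(speech.shape)  # 1-D waveform tensor at 16 kHz
```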

Multi-View Self-Attention Based Transformer for Speaker Recognition

no code implementations • 11 Oct 2021 • Rui Wang, Junyi Ao, Long Zhou, Shujie Liu, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang

In this work, we propose a novel multi-view self-attention mechanism and present an empirical study of different Transformer variants with or without the proposed attention mechanism for speaker recognition.

Speaker Recognition
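A hedged sketch of what a "multi-view" self-attention could look like: half of the heads attend globally over the utterance while the other half are restricted to a local window via a banded mask. The head split and window size are assumptions for illustration, not the paper's exact design.

```python
import torch

def multi_view_attention(q, k, v, window: int = 16):
    """q, k, v: (heads, frames, dim). First half of heads: global view;
    second half: local view restricted to |i - j| <= window."""
    heads, frames, dim = q.shape
    scores = q @ k.transpose(-2, -1) / dim ** 0.5   # (heads, frames, frames)
    idx = torch.arange(frames)
    local_mask = (idx[None, :] - idx[:, None]).abs() > window
    scores[heads // 2:] = scores[heads // 2:].masked_fill(local_mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q, k, v = torch.randn(3, 8, 100, 64).unbind(0)
out = multi_view_attention(q, k, v)  # (8, 100, 64)
```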
