2 code implementations • 22 Aug 2023 • Shansong Liu, Atin Sakkeer Hussain, Chenshuo Sun, Ying Shan
To fill this gap, we present a methodology for generating question-answer pairs from existing audio captioning datasets and introduce the MusicQA Dataset designed for answering open-ended music-related questions.
Ranked #1 on Music Question Answering on MusicQA
no code implementations • 28 Jun 2022 • Xu Li, Shansong Liu, Ying Shan
It is suspected that a single embedding vector may only capture averaged and coarse-grained speaker characteristics, which is insufficient for the SVC task.
no code implementations • 19 Mar 2022 • Shujie Hu, Shansong Liu, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shoukang Hu, Mingyu Cui, Xunying Liu, Helen Meng
Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems for normal speech.
no code implementations • 15 Jan 2022 • Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng
Despite the rapid progress of automatic speech recognition (ASR) technologies in the past few decades, recognition of disordered speech remains a highly challenging task to date.
no code implementations • 14 Jan 2022 • Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng
This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation.
no code implementations • 14 Jan 2022 • Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye, Zengrui Jin, Xunying Liu, Helen Meng
Automatic recognition of disordered speech remains a highly challenging task to date.
1 code implementation • 8 Jan 2022 • Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng
State-of-the-art automatic speech recognition (ASR) system development is data and computation intensive.
no code implementations • 2 Aug 2021 • Zengrui Jin, Mengzhe Geng, Xurong Xie, Jianwei Yu, Shansong Liu, Xunying Liu, Helen Meng
Automatic recognition of disordered speech remains a highly challenging task to date.
no code implementations • 9 Feb 2021 • Boyang Xue, Jianwei Yu, Junhao Xu, Shansong Liu, Shoukang Hu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng
Performance improvements were also obtained on a cross domain LM adaptation task requiring porting a Transformer LM trained on the Switchboard and Fisher data to a low-resource DementiaBank elderly speech corpus.
no code implementations • 8 Dec 2020 • Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng
On a third cross domain adaptation task requiring rapidly porting a 1000 hour LibriSpeech data trained system to a small DementiaBank elderly speech corpus, the proposed Bayesian TDNN LF-MMI systems outperformed the baseline system using direct weight fine-tuning by up to 2.5% absolute WER reduction.
no code implementations • 16 Nov 2020 • Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu
Automatic speech recognition (ASR) technologies have been significantly advanced in the past few decades.
no code implementations • 17 Jul 2020 • Shoukang Hu, Xurong Xie, Shansong Liu, Mingyu Cui, Mengzhe Geng, Xunying Liu, Helen Meng
Deep neural networks (DNNs) based automatic speech recognition (ASR) systems are often designed using expert knowledge and empirical evaluation.
no code implementations • 6 Jan 2020 • Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu
Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system.
Ranked #4 on Audio-Visual Speech Recognition on LRS2