no code implementations • 18 Dec 2023 • Peng Shen, Xugang Lu, Hisashi Kawai
Effective extraction and application of linguistic features are central to the enhancement of spoken Language IDentification (LID) performance.
no code implementations • 18 Dec 2023 • Peng Shen, Xugang Lu, Hisashi Kawai
Multi-talker overlapped speech recognition remains a significant challenge, requiring not only speech recognition but also speaker diarization tasks to be addressed.
no code implementations • 20 Oct 2023 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
Our previous study found that completely aligning the distributions of the source and target domains can introduce negative transfer, where classes or irrelevant classes from the source domain map to a different class in the target domain during distribution alignment.
no code implementations • 28 Sep 2023 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
Due to the modality discrepancy between textual and acoustic modeling, efficiently transferring linguistic knowledge from a pretrained language model (PLM) to acoustic encoding for automatic speech recognition (ASR) remains a challenging task.
Automatic Speech Recognition (ASR)
no code implementations • 24 Sep 2023 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
Since the PLM is built from text while the acoustic model is trained with speech, a cross-modal alignment is required in order to transfer the context dependent linguistic knowledge from the PLM to acoustic encoding.
Automatic Speech Recognition (ASR)
no code implementations • 29 Jul 2022 • Peng Shen, Xugang Lu, Hisashi Kawai
For Mandarin end-to-end (E2E) automatic speech recognition (ASR) tasks, pronunciation-based modeling units can improve the sharing of modeling units during training compared with character-based modeling units, but they suffer from homophone problems.
Automatic Speech Recognition (ASR)
no code implementations • 8 Apr 2022 • Peng Shen, Xugang Lu, Hisashi Kawai
Acoustic and linguistic features are important cues for the spoken language identification (LID) task.
no code implementations • 31 Mar 2022 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
To reduce domain discrepancy and improve the performance of cross-domain spoken language identification (SLID) systems, we have proposed a joint distribution alignment (JDA) model based on optimal transport (OT) as an unsupervised domain adaptation (UDA) method.
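As a rough illustration of what an OT-based alignment cost looks like, the sketch below computes an entropy-regularized transport plan between two small feature sets with Sinkhorn iterations. It is not the JDA model from the abstract: it aligns features only, not the joint feature and label distribution. The function name `sinkhorn_plan`, the regularization strength, the uniform sample weights, and the random features are all assumptions made for the example.

```python
import numpy as np

def sinkhorn_plan(source, target, reg=0.5, n_iters=200):
    """Entropy-regularized OT plan between two feature sets (uniform weights assumed)."""
    n, m = len(source), len(target)
    a = np.full(n, 1.0 / n)  # source sample weights
    b = np.full(m, 1.0 / m)  # target sample weights
    # Pairwise squared Euclidean cost, rescaled to [0, 1] for numerical stability.
    cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()
    kernel = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)  # rescale to match target marginals
        u = a / (kernel @ v)    # rescale to match source marginals
    plan = u[:, None] * kernel * v[None, :]
    return plan, cost

# Hypothetical source-domain and target-domain embeddings.
rng = np.random.default_rng(0)
source_feats = rng.normal(0.0, 1.0, size=(6, 4))
target_feats = rng.normal(0.5, 1.0, size=(8, 4))

plan, cost = sinkhorn_plan(source_feats, target_feats)
ot_loss = float((plan * cost).sum())  # transport cost usable as an alignment loss
```

In a UDA setting, a quantity like `ot_loss` would typically be minimized with respect to the feature extractor so that source and target embeddings move closer together.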
no code implementations • 7 Apr 2021 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
However, in most discriminative training for SiamNN, only the distribution of pairwise sample distances is considered, and the additional discriminative information in the joint distribution of samples is ignored.
no code implementations • 9 Jan 2021 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
By initializing the two-branch neural network with the generatively learned model parameters of the JB model, we train the model parameters with the pairwise samples as a binary discrimination task.
no code implementations • 24 Dec 2020 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
By minimizing the classification loss on the training data set with the adaptation loss on both training and testing data sets, the statistical distribution difference between training and testing domains is reduced.
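The combined objective described above can be sketched as a classification loss on labeled training data plus a weighted adaptation loss computed from both training and testing features. The abstract does not say which adaptation loss is used, so the squared difference of feature means below is only a stand-in, and the function names and the `weight` parameter are hypothetical.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def mean_discrepancy(train_feats, test_feats):
    """Stand-in adaptation loss: squared distance between the two feature means."""
    diff = train_feats.mean(axis=0) - test_feats.mean(axis=0)
    return float(diff @ diff)

def joint_objective(train_logits, train_labels, train_feats, test_feats, weight=1.0):
    """Classification loss on training data plus weighted adaptation loss."""
    return (cross_entropy(train_logits, train_labels)
            + weight * mean_discrepancy(train_feats, test_feats))

# Hypothetical batch: 4 labeled training samples, 5 unlabeled testing samples.
train_logits = np.zeros((4, 3))
train_labels = np.array([0, 1, 2, 0])
train_feats = np.ones((4, 2))
test_feats = np.full((5, 2), 3.0)

loss = joint_objective(train_logits, train_labels, train_feats, test_feats, weight=0.5)
```

Testing labels are never used here, which is what makes the adaptation term usable in the unsupervised setting.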
no code implementations • 27 Dec 2019 • Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai
However, a potential limitation of the network is that the discriminative features from the bottom layers (which can model the short-range dependency) are smoothed out in the final representation.