Search Results for author: Peng Shen

Found 12 papers, 0 papers with code

Generative linguistic representation for spoken language identification

no code implementations • 18 Dec 2023 • Peng Shen, Xugang Lu, Hisashi Kawai

Effective extraction and application of linguistic features are central to the enhancement of spoken Language IDentification (LID) performance.

Language Identification, Speech Recognition, +2

Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition

no code implementations • 18 Dec 2023 • Peng Shen, Xugang Lu, Hisashi Kawai

Multi-talker overlapped speech recognition remains a significant challenge, requiring not only speech recognition but also speaker diarization tasks to be addressed.

Speaker Diarization, +2

Neural domain alignment for spoken language recognition based on optimal transport

no code implementations • 20 Oct 2023 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

Our previous study discovered that completely aligning the distributions between the source and target domains can introduce a negative transfer, where classes or irrelevant classes from the source domain map to a different class in the target domain during distribution alignment.

Unsupervised Domain Adaptation

Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR

no code implementations • 28 Sep 2023 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

Due to the modality discrepancy between textual and acoustic modeling, efficiently transferring linguistic knowledge from a pretrained language model (PLM) to acoustic encoding for automatic speech recognition (ASR) still remains a challenging task.

Automatic Speech Recognition (ASR), +3

Cross-modal Alignment with Optimal Transport for CTC-based ASR

no code implementations • 24 Sep 2023 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

Since the PLM is built from text while the acoustic model is trained with speech, a cross-modal alignment is required in order to transfer the context dependent linguistic knowledge from the PLM to acoustic encoding.

Automatic Speech Recognition (ASR), +3

Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition

no code implementations • 29 Jul 2022 • Peng Shen, Xugang Lu, Hisashi Kawai

For Mandarin end-to-end (E2E) automatic speech recognition (ASR) tasks, compared to character-based modeling units, pronunciation-based modeling units could improve the sharing of modeling units in model training but meet homophone problems.

Automatic Speech Recognition (ASR), +1

Partial Coupling of Optimal Transport for Spoken Language Identification

no code implementations • 31 Mar 2022 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

In order to reduce domain discrepancy to improve the performance of cross-domain spoken language identification (SLID) system, as an unsupervised domain adaptation (UDA) method, we have proposed a joint distribution alignment (JDA) model based on optimal transport (OT).

Language Identification, Spoken Language Identification, +1

Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification

no code implementations • 7 Apr 2021 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

However, in most discriminative training for SiamNN, only the distribution of pairwise sample distances is considered, and the additional discriminative information in the joint distribution of samples is ignored.

Binary Classification, Feature Selection, +1

Coupling a generative model with a discriminative learning framework for speaker verification

no code implementations • 9 Jan 2021 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

By initializing the two-branch neural network with the generatively learned model parameters of the JB model, we train the model parameters with the pairwise samples as a binary discrimination task.

Decision Making, Feature Selection, +1

Unsupervised neural adaptation model based on optimal transport for spoken language identification

no code implementations • 24 Dec 2020 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

By minimizing the classification loss on the training data set with the adaptation loss on both training and testing data sets, the statistical distribution difference between training and testing domains is reduced.

Language Identification, Spoken Language Identification

Cross-scale Attention Model for Acoustic Event Classification

no code implementations • 27 Dec 2019 • Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai

However, a potential limitation of the network is that the discriminative features from the bottom layers (which can model the short-range dependency) are smoothed out in the final representation.

Classification, General Classification
