Search Results for author: Zhiyuan Tang

Found 24 papers, 2 papers with code

A Two-Stage Framework to Generate Video Chapter

no code implementations • 29 Sep 2021 • Canyu Le, Zhiyuan Tang, Ke Li, Jiandong Yang

On top of this dataset, we propose a two-stage framework to perform chapter localization and chapter title generation.

Vocal Bursts Valence Prediction

Can We Trust Deep Speech Prior?

no code implementations • 4 Nov 2020 • Ying Shi, Haolin Chen, Zhiyuan Tang, Lantian Li, Dong Wang, Jiqing Han

Recently, speech enhancement (SE) based on deep speech prior has attracted much attention, such as the variational auto-encoder with non-negative matrix factorization (VAE-NMF) architecture.

Speech Enhancement

AP20-OLR Challenge: Three Tasks and Their Baselines

no code implementations • 4 Jun 2020 • Zheng Li, Miao Zhao, Qingyang Hong, Lin Li, Zhiyuan Tang, Dong Wang, Li-Ming Song, Cheng Yang

Based on Kaldi and Pytorch, recipes for i-vector and x-vector systems are also conducted as baselines for the three tasks.

Dialect Identification

AP19-OLR Challenge: Three Tasks and Their Baselines

no code implementations • 16 Jul 2019 • Zhiyuan Tang, Dong Wang, Li-Ming Song

The participants can refer to these online-published recipes to deploy LID systems for convenience.

Phonetic-attention scoring for deep speaker features in speaker verification

no code implementations • 8 Nov 2018 • Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang

This score reflects the similarity of the two frames in phonetic content, and is used to weigh the contribution of this frame pair in the utterance-based scoring.

Machine Translation, Speaker Verification +1

Gaussian-Constrained training for speaker verification

no code implementations • 8 Nov 2018 • Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang

This paper proposes a Gaussian-constrained training approach that (1) discards the parametric classifier, and (2) enforces the distribution of the derived speaker vectors to be Gaussian.

Speaker Verification

AP18-OLR Challenge: Three Tasks and Their Baselines

1 code implementation • 2 Jun 2018 • Zhiyuan Tang, Dong Wang, Qing Chen

The third oriental language recognition (OLR) challenge AP18-OLR is introduced in this paper, including the data profile, the tasks and the evaluation principles.

Open Set Learning

Deep factorization for speech signal

no code implementations • 27 Feb 2018 • Lantian Li, Dong Wang, Yixiang Chen, Ying Shi, Zhiyuan Tang, Thomas Fang Zheng

Various informative factors are mixed in speech signals, leading to great difficulty when decoding any of the factors.

Emotion Recognition, Speaker Recognition

Full-info Training for Deep Speaker Feature Learning

no code implementations • 31 Oct 2017 • Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng

In recent studies, it has been shown that speaker patterns can be learned from very short speech segments (e.g., 0.3 seconds) by a carefully designed convolutional & time-delay deep neural network (CT-DNN) model.

Speaker Verification

AP17-OLR Challenge: Data, Plan, and Baseline

1 code implementation • 28 Jun 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Qing Chen

We present the data profile and the evaluation plan of the second oriental language recognition (OLR) challenge AP17-OLR.

Deep Speaker Verification: Do We Need End to End?

no code implementations • 22 Jun 2017 • Dong Wang, Lantian Li, Zhiyuan Tang, Thomas Fang Zheng

This principle has recently been applied in several prototype studies on speaker verification (SV), where the feature extractor and classifier are learned together with an objective function that is consistent with the evaluation metric.

Speaker Verification

Deep Factorization for Speech Signal

no code implementations • 5 Jun 2017 • Dong Wang, Lantian Li, Ying Shi, Yixiang Chen, Zhiyuan Tang

In this paper, we demonstrated that the speaker factor is also a short-time spectral pattern and can be largely identified with just a few frames using a simple deep neural network (DNN).

Emotion Recognition

Phonetic Temporal Neural Model for Language Identification

no code implementations • 9 May 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Lantian Li, Andrew Abel

Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID).

Language Identification

Phone-aware Neural Language Identification

no code implementations • 9 May 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Ying Shi, Lantian Li

Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID).

Language Identification

Memory Visualization for Gated Recurrent Neural Networks in Speech Recognition

no code implementations • 28 Sep 2016 • Zhiyuan Tang, Ying Shi, Dong Wang, Yang Feng, Shiyue Zhang

Recurrent neural networks (RNNs) have shown clear superiority in sequence modeling, particularly the ones with gated units, such as long short-term memory (LSTM) and gated recurrent unit (GRU).

Automatic Speech Recognition (ASR) +1

Collaborative Learning for Language and Speaker Recognition

no code implementations • 27 Sep 2016 • Lantian Li, Zhiyuan Tang, Dong Wang, Andrew Abel, Yang Feng, Shiyue Zhang

This paper presents a unified model to perform language and speaker recognition simultaneously and altogether.

Speaker Recognition

OC16-CE80: A Chinese-English Mixlingual Database and A Speech Recognition Baseline

no code implementations • 27 Sep 2016 • Dong Wang, Zhiyuan Tang, Difei Tang, Qing Chen

We present the OC16-CE80 Chinese-English mixlingual speech database, which was released as the main resource for training, development and testing in the Chinese-English mixlingual speech recognition (MixASR-CHEN) challenge at O-COCOSDA 2016.

Speech Recognition

Multi-task Recurrent Model for Speech and Speaker Recognition

no code implementations • 31 Mar 2016 • Zhiyuan Tang, Lantian Li, Dong Wang

Although highly correlated, speech and speaker recognition have been regarded as two independent tasks and studied by two communities.

Speaker Recognition

Knowledge Transfer Pre-training

no code implementations • 7 Jun 2015 • Zhiyuan Tang, Dong Wang, Yiqiao Pan, Zhiyong Zhang

Compared to conventional layer-wise methods, this new method does not depend on the model structure, so it can be used to pre-train very complex models.

Speech Recognition +1

Recurrent Neural Network Training with Dark Knowledge Transfer

no code implementations • 18 May 2015 • Zhiyuan Tang, Dong Wang, Zhiyong Zhang

Recent research found that a well-trained model can be used as a teacher to train other child models, by using the predictions generated by the teacher model as supervision.

Automatic Speech Recognition (ASR) +2
