Search Results for author: Zhiyuan Tang

Found 24 papers, 2 papers with code

A Two-Stage Framework to Generate Video Chapter

no code implementations • 29 Sep 2021 • Canyu Le, Zhiyuan Tang, Ke Li, Jiandong Yang

On top of this dataset, we propose a two-stage framework to perform chapter localization and chapter title generation.

Semantic Data Augmentation for End-to-End Mandarin Speech Recognition

no code implementations • 26 Apr 2021 • Jianwei Sun, Zhiyuan Tang, Hengxin Yin, Wei Wang, Xi Zhao, Shuaijiang Zhao, Xiaoning Lei, Wei Zou, Xiangang Li

Augmentation-related issues, such as the comparison of different strategies and ratios for data combination, are also investigated.

Automatic Speech Recognition (ASR) +2

Can We Trust Deep Speech Prior?

no code implementations • 4 Nov 2020 • Ying Shi, Haolin Chen, Zhiyuan Tang, Lantian Li, Dong Wang, Jiqing Han

Recently, speech enhancement (SE) based on deep speech prior has attracted much attention, such as the variational auto-encoder with non-negative matrix factorization (VAE-NMF) architecture.

Speech Enhancement

AP20-OLR Challenge: Three Tasks and Their Baselines

no code implementations • 4 Jun 2020 • Zheng Li, Miao Zhao, Qingyang Hong, Lin Li, Zhiyuan Tang, Dong Wang, Li-Ming Song, Cheng Yang

Based on Kaldi and PyTorch, recipes for i-vector and x-vector systems are also provided as baselines for the three tasks.

Dialect Identification

AP19-OLR Challenge: Three Tasks and Their Baselines

no code implementations • 16 Jul 2019 • Zhiyuan Tang, Dong Wang, Li-Ming Song

The participants can refer to these online-published recipes to deploy LID systems for convenience.

Phonetic-attention scoring for deep speaker features in speaker verification

no code implementations • 8 Nov 2018 • Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang

This score reflects the similarity of the two frames in phonetic content, and is used to weigh the contribution of this frame pair in the utterance-based scoring.

Machine Translation, Speaker Verification +1

Gaussian-Constrained training for speaker verification

no code implementations • 8 Nov 2018 • Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang

This paper proposes a Gaussian-constrained training approach that (1) discards the parametric classifier, and (2) enforces the distribution of the derived speaker vectors to be Gaussian.

Speaker Verification

AP18-OLR Challenge: Three Tasks and Their Baselines

1 code implementation • 2 Jun 2018 • Zhiyuan Tang, Dong Wang, Qing Chen

The third oriental language recognition (OLR) challenge AP18-OLR is introduced in this paper, including the data profile, the tasks and the evaluation principles.

Open Set Learning

Deep factorization for speech signal

no code implementations • 27 Feb 2018 • Lantian Li, Dong Wang, Yixiang Chen, Ying Shi, Zhiyuan Tang, Thomas Fang Zheng

Various informative factors are mixed in speech signals, leading to great difficulty when decoding any of the factors.

Emotion Recognition, Speaker Recognition

Human and Machine Speaker Recognition Based on Short Trivial Events

no code implementations • 15 Nov 2017 • Miao Zhang, Xiaofei Kang, Yanqing Wang, Lantian Li, Zhiyuan Tang, Haisheng Dai, Dong Wang

Trivial events are ubiquitous in human-to-human conversations, e.g., cough, laugh and sniff.

Speaker Recognition

Full-info Training for Deep Speaker Feature Learning

no code implementations • 31 Oct 2017 • Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng

Recent studies have shown that speaker patterns can be learned from very short speech segments (e.g., 0.3 seconds) by a carefully designed convolutional & time-delay deep neural network (CT-DNN) model.

Speaker Verification

AP17-OLR Challenge: Data, Plan, and Baseline

1 code implementation • 28 Jun 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Qing Chen

We present the data profile and the evaluation plan of the second oriental language recognition (OLR) challenge AP17-OLR.

Deep Speaker Verification: Do We Need End to End?

no code implementations • 22 Jun 2017 • Dong Wang, Lantian Li, Zhiyuan Tang, Thomas Fang Zheng

This principle has recently been applied in several prototype studies on speaker verification (SV), where the feature extractor and classifier are learned together with an objective function that is consistent with the evaluation metric.

Speaker Verification

Deep Factorization for Speech Signal

no code implementations • 5 Jun 2017 • Dong Wang, Lantian Li, Ying Shi, Yixiang Chen, Zhiyuan Tang

In this paper, we demonstrated that the speaker factor is also a short-time spectral pattern and can be largely identified with just a few frames using a simple deep neural network (DNN).

Emotion Recognition

Deep Speaker Feature Learning for Text-independent Speaker Verification

no code implementations • 10 May 2017 • Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang

Recently, deep neural networks (DNNs) have been used to learn speaker features.

Text-Independent Speaker Verification

Phonetic Temporal Neural Model for Language Identification

no code implementations • 9 May 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Lantian Li, Andrew Abel

Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID).

Language Identification

Phone-aware Neural Language Identification

no code implementations • 9 May 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Ying Shi, Lantian Li

Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID).

Language Identification

Memory Visualization for Gated Recurrent Neural Networks in Speech Recognition

no code implementations • 28 Sep 2016 • Zhiyuan Tang, Ying Shi, Dong Wang, Yang Feng, Shiyue Zhang

Recurrent neural networks (RNNs) have shown clear superiority in sequence modeling, particularly the ones with gated units, such as long short-term memory (LSTM) and gated recurrent unit (GRU).

Automatic Speech Recognition (ASR) +1

Multi-task Recurrent Model for True Multilingual Speech Recognition

no code implementations • 27 Sep 2016 • Zhiyuan Tang, Lantian Li, Dong Wang

Research on multilingual speech recognition remains attractive yet challenging.

Speech Recognition

Collaborative Learning for Language and Speaker Recognition

no code implementations • 27 Sep 2016 • Lantian Li, Zhiyuan Tang, Dong Wang, Andrew Abel, Yang Feng, Shiyue Zhang

This paper presents a unified model to perform language and speaker recognition simultaneously and altogether.

Speaker Recognition

OC16-CE80: A Chinese-English Mixlingual Database and A Speech Recognition Baseline

no code implementations • 27 Sep 2016 • Dong Wang, Zhiyuan Tang, Difei Tang, Qing Chen

We present the OC16-CE80 Chinese-English mixlingual speech database, which was released as a main resource for training, development and test for the Chinese-English mixlingual speech recognition (MixASR-CHEN) challenge at O-COCOSDA 2016.

Speech Recognition

Multi-task Recurrent Model for Speech and Speaker Recognition

no code implementations • 31 Mar 2016 • Zhiyuan Tang, Lantian Li, Dong Wang

Although highly correlated, speech and speaker recognition have been regarded as two independent tasks and studied by two communities.

Speaker Recognition

Knowledge Transfer Pre-training

no code implementations • 7 Jun 2015 • Zhiyuan Tang, Dong Wang, Yiqiao Pan, Zhiyong Zhang

Compared to the conventional layer-wise methods, this new method does not depend on the model structure, so it can be used to pre-train very complex models.

Speech Recognition +1

Recurrent Neural Network Training with Dark Knowledge Transfer

no code implementations • 18 May 2015 • Zhiyuan Tang, Dong Wang, Zhiyong Zhang

Recent research found that a well-trained model can be used as a teacher to train other child models, by using the predictions generated by the teacher model as supervision.

Automatic Speech Recognition (ASR) +2
