no code implementations • 17 Dec 2024 • Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Yanyu Li, Yifan Gong, Kai Zhang, Hao Tan, Jason Kuen, Henghui Ding, Zhihao Shu, Wei Niu, Pu Zhao, Yanzhi Wang, Jiuxiang Gu
In this paper, we show that performing the full computation of the model at each diffusion step is unnecessary, as some computations can be skipped by lazily reusing the results of previous steps.
1 code implementation • 2 Nov 2024 • Zheng Zhan, Yushu Wu, Yifan Gong, Zichong Meng, Zhenglun Kong, Changdi Yang, Geng Yuan, Pu Zhao, Wei Niu, Yanzhi Wang
The rapid progress in artificial intelligence-generated content (AIGC), especially with diffusion models, has significantly advanced development of high-quality video generation.
1 code implementation • 16 Oct 2024 • Zheng Zhan, Yushu Wu, Zhenglun Kong, Changdi Yang, Yifan Gong, Xuan Shen, Xue Lin, Pu Zhao, Yanzhi Wang
While token reduction techniques offer a straightforward post-training strategy, we find that applying existing methods directly to SSMs leads to substantial performance drops.
1 code implementation • 1 Oct 2024 • Yifan Gong, Yushu Wu, Zheng Zhan, Pu Zhao, Liangkai Liu, Chao Wu, Xulong Tang, Yanzhi Wang
Two-stage object detectors exhibit high accuracy and precise localization, especially for identifying small objects that are favorable for various edge applications.
no code implementations • 27 Sep 2024 • Zheng Zhan, Zhenglun Kong, Yifan Gong, Yushu Wu, Zichong Meng, Hangyu Zheng, Xuan Shen, Stratis Ioannidis, Wei Niu, Pu Zhao, Yanzhi Wang
Inspired by the observations that the final prediction in vision transformers (ViTs) is only based on a subset of most informative tokens, we take the novel step of enhancing the efficiency of SSM-based vision models through token-based pruning.
1 code implementation • 25 Sep 2024 • Xuan Shen, Pu Zhao, Yifan Gong, Zhenglun Kong, Zheng Zhan, Yushu Wu, Ming Lin, Chao Wu, Xue Lin, Yanzhi Wang
Large Language Models (LLMs) have long held sway in the realms of artificial intelligence research.
no code implementations • 16 Jul 2024 • Yifan Gong, Zheng Zhan, Yanyu Li, Yerlan Idelbayev, Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren
Good weight initialization serves as an effective measure to reduce the training cost of a deep neural network (DNN) model.
no code implementations • 16 Jan 2024 • Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe`er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka
The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets: First, a benchmarking dataset of 315 meetings, averaging 6 minutes each, capturing a broad spectrum of real-world acoustic conditions and conversational dynamics.
no code implementations • 11 Jan 2024 • Yifan Gong, Zheng Zhan, Qing Jin, Yanyu Li, Yerlan Idelbayev, Xian Liu, Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren
One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models to generate paired datasets used for training generative adversarial networks (GANs).
no code implementations • 15 Nov 2023 • Jing Yao, Xiaoyuan Yi, Xiting Wang, Yifan Gong, Xing Xie
The rapid advancement of Large Language Models (LLMs) has attracted much attention to value alignment for their responsible development.
no code implementations • 14 Sep 2023 • Shaoshi Ling, Guoli Ye, Rui Zhao, Yifan Gong
The attention-based encoder-decoder (AED) speech recognition model has been widely successful in recent years.
no code implementations • 30 Jul 2023 • Eric Sun, Jinyu Li, Jian Xue, Yifan Gong
When mixing 20, 000 hours augmented speech data generated by our method with 12, 500 hours original transcribed speech data for Italian Transformer transducer model pre-training, we achieve 8. 7% relative word error rate reduction.
1 code implementation • 17 Jul 2023 • Shaoshi Ling, Yuxuan Hu, Shuangbei Qian, Guoli Ye, Yao Qian, Yifan Gong, Ed Lin, Michael Zeng
Most end-to-end (E2E) speech recognition models are composed of encoder and decoder blocks that perform acoustic and language modeling functions.
no code implementations • 30 Apr 2023 • Zifeng Wang, Zheng Zhan, Yifan Gong, Yucai Shao, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy
Rehearsal-based approaches are a mainstay of continual learning (CL).
1 code implementation • 13 Mar 2023 • Yuguang Yao, Jiancheng Liu, Yifan Gong, Xiaoming Liu, Yanzhi Wang, Xue Lin, Sijia Liu
We call this 'model parsing of adversarial attacks' - a task to uncover 'arcana' in terms of the concealed VM information in attacks.
no code implementations • 1 Mar 2023 • Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong
We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference.
no code implementations • 9 Dec 2022 • Yifan Gong, Zheng Zhan, Pu Zhao, Yushu Wu, Chao Wu, Caiwen Ding, Weiwen Jiang, Minghai Qin, Yanzhi Wang
By re-configuring the model to the corresponding pruning ratio for a specific execution frequency (and voltage), we are able to achieve stable inference speed, i. e., keeping the difference in speed performance under various execution frequencies as small as possible.
1 code implementation • 22 Nov 2022 • Sizhe Chen, Geng Yuan, Xinwen Cheng, Yifan Gong, Minghai Qin, Yanzhi Wang, Xiaolin Huang
In this paper, we uncover them by model checkpoints' gradients, forming the proposed self-ensemble protection (SEP), which is very effective because (1) learning on examples ignored during normal training tends to yield DNNs ignoring normal examples; (2) checkpoints' cross-model gradients are close to orthogonal, meaning that they are as diverse as DNNs with different architectures.
no code implementations • 7 Nov 2022 • Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong
Automatic Speech Recognition (ASR) systems typically yield output in lexical form.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
1 code implementation • 20 Sep 2022 • Zifeng Wang, Zheng Zhan, Yifan Gong, Geng Yuan, Wei Niu, Tong Jian, Bin Ren, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy
SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity.
1 code implementation • 25 Jul 2022 • Yushu Wu, Yifan Gong, Pu Zhao, Yanyu Li, Zheng Zhan, Wei Niu, Hao Tang, Minghai Qin, Bin Ren, Yanzhi Wang
Instead of measuring the speed on mobile devices at each iteration during the search process, a speed model incorporated with compiler optimizations is leveraged to predict the inference latency of the SR block with various width configurations for faster convergence.
2 code implementations • ICLR 2022 • Yifan Gong, Yuguang Yao, Yize Li, Yimeng Zhang, Xiaoming Liu, Xue Lin, Sijia Liu
However, carefully crafted, tiny adversarial perturbations are difficult to recover by optimizing a unilateral RED objective.
no code implementations • 24 Jan 2022 • Liang Lu, Jinyu Li, Yifan Gong
Our experimental results based on the 2-speaker LibrispeechMix dataset show that the SURT model can achieve promising EP detection without significantly degradation of the recognition accuracy.
no code implementations • 22 Nov 2021 • Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang
Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices.
1 code implementation • NeurIPS 2021 • Geng Yuan, Xiaolong Ma, Wei Niu, Zhengang Li, Zhenglun Kong, Ning Liu, Yifan Gong, Zheng Zhan, Chaoyang He, Qing Jin, Siyue Wang, Minghai Qin, Bin Ren, Yanzhi Wang, Sijia Liu, Xue Lin
Systematical evaluation on accuracy, training speed, and memory footprint are conducted, where the proposed MEST framework consistently outperforms representative SOTA works.
no code implementations • 10 Oct 2021 • Guoli Ye, Vadim Mazalov, Jinyu Li, Yifan Gong
Hybrid and end-to-end (E2E) systems have their individual advantages, with different error patterns in the speech recognition results.
no code implementations • 6 Oct 2021 • Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li, Xie Chen, Yu Wu, Yifan Gong
ILMA enables a fast text-only adaptation of the E2E model without increasing the run-time computational cost.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+3
no code implementations • 23 Sep 2021 • Jeremy H. M. Wong, Yifan Gong
Speakers may move around while diarisation is being performed.
no code implementations • 22 Sep 2021 • Jeremy H. M. Wong, Igor Abramovski, Xiong Xiao, Yifan Gong
Previous works have shown that spatial location information can be complementary to speaker embeddings for a speaker diarisation task.
no code implementations • ICCV 2021 • Zheng Zhan, Yifan Gong, Pu Zhao, Geng Yuan, Wei Niu, Yushu Wu, Tianyun Zhang, Malith Jayaweera, David Kaeli, Bin Ren, Xue Lin, Yanzhi Wang
Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), the deep learning methods are confronted with the computation and memory consumption issues in practice, especially for resource-limited platforms such as mobile devices.
no code implementations • 4 Jun 2021 • Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong
In this work, we perform LM fusion in the minimum WER (MWER) training of an E2E model to obviate the need for LM weights tuning during inference.
no code implementations • 27 Apr 2021 • Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong
The first challenge is solved with a splicing data method which concatenates the speech segments extracted from the source domain data.
no code implementations • 5 Apr 2021 • Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong
In multi-talker scenarios such as meetings and conversations, speech processing systems are usually required to transcribe the audio as well as identify the speakers for downstream applications.
no code implementations • 2 Feb 2021 • Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong
The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+4
no code implementations • 26 Nov 2020 • Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong
End-to-end multi-talker speech recognition is an emerging research trend in the speech community due to its vast potential in applications such as conversation and meeting transcriptions.
no code implementations • 3 Nov 2020 • Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong
The external language models (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR) which has no clear division between acoustic and language models.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+4
no code implementations • 23 Oct 2020 • Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong
Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end acoustic model that extends the standard Recurrent Neural Network Transducer (RNN-T) for the purpose of the external language model (LM) fusion.
no code implementations • 20 Oct 2020 • Peidong Wang, Zhuo Chen, DeLiang Wang, Jinyu Li, Yifan Gong
We propose speaker separation using speaker inventories and estimated speech (SSUSIES), a framework leveraging speaker profiles and estimated speech for speaker separation.
no code implementations • 30 Jul 2020 • Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong
Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 19 May 2020 • Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong
While recurrent neural networks still largely define state-of-the-art speech recognition systems, the Transformer network has been proven to be a competitive alternative, especially in the offline condition.
no code implementations • 1 May 2020 • Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong
Recently, the recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research due to its advantages of being capable for online streaming speech recognition.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+3
no code implementations • 25 Apr 2020 • Zhong Meng, Hu Hu, Jinyu Li, Changliang Liu, Yan Huang, Yifan Gong, Chin-Hui Lee
We propose a novel neural label embedding (NLE) scheme for the domain adaptation of a deep neural network (DNN) acoustic model with unpaired data samples from source and target domains.
no code implementations • 10 Apr 2020 • Hirofumi Inaguma, Yashesh Gaur, Liang Lu, Jinyu Li, Yifan Gong
This leads to an inevitable latency during inference.
no code implementations • 17 Mar 2020 • Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong
While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion, we argue that such conventional hybrid models can still be significantly improved.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 13 Mar 2020 • Yifan Gong, Zheng Zhan, Zhengang Li, Wei Niu, Xiaolong Ma, Wenhao Wang, Bin Ren, Caiwen Ding, Xue Lin, Xiao-Lin Xu, Yanzhi Wang
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.
no code implementations • 19 Feb 2020 • Peiyan Dong, Siyue Wang, Wei Niu, Chengming Zhang, Sheng Lin, Zhengang Li, Yifan Gong, Bin Ren, Xue Lin, Yanzhi Wang, Dingwen Tao
Recurrent neural networks (RNNs) based automatic speech recognition has nowadays become prevalent on mobile devices such as smart phones.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 23 Jan 2020 • Zhengang Li, Yifan Gong, Xiaolong Ma, Sijia Liu, Mengshu Sun, Zheng Zhan, Zhenglun Kong, Geng Yuan, Yanzhi Wang
Structured weight pruning is a representative model compression technique of DNNs for hardware efficiency and inference accelerations.
no code implementations • 23 Jan 2020 • Xiaolong Ma, Zhengang Li, Yifan Gong, Tianyun Zhang, Wei Niu, Zheng Zhan, Pu Zhao, Jian Tang, Xue Lin, Bin Ren, Yanzhi Wang
Accelerating DNN execution on various resource-limited computing platforms has been a long-standing problem.
no code implementations • 6 Jan 2020 • Zhong Meng, Jinyu Li, Yashesh Gaur, Yifan Gong
In this work, we extend the T/S learning to large-scale unsupervised domain adaptation of an attention-based end-to-end (E2E) model through two levels of knowledge transfer: teacher's token posteriors as soft labels and one-best predictions as decoder guidance.
no code implementations • 6 Jan 2020 • Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong
However, as one input to the decoder recurrent neural network (RNN), each WSU embedding is learned independently through context and acoustic information in a purely data-driven fashion.
no code implementations • 10 Dec 2019 • Takuya Yoshioka, Igor Abramovski, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou
This increases marginally to 1. 6% when 50% of the attendees are unknown to the system.
no code implementations • 9 Nov 2019 • Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong
We propose three regularization-based speaker adaptation approaches to adapt the attention-based encoder-decoder (AED) model with very limited adaptation data from target speakers for end-to-end automatic speech recognition.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+3
1 code implementation • 26 Sep 2019 • Jinyu Li, Rui Zhao, Hu Hu, Yifan Gong
In this paper, we improve the RNN-T training in two aspects.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 9 Sep 2019 • Liang Lu, Eric Sun, Yifan Gong
Furthermore, the auxiliary loss also works as a regularizer, which improves the generalization capacity of the network.
1 code implementation • 12 Jul 2019 • Liang Lu, Xiong Xiao, Zhuo Chen, Yifan Gong
While similar toolkits are available built on top of the two, a key feature of PyKaldi2 is sequence training with criteria such as MMI, sMBR and MPE.
no code implementations • 11 May 2019 • Shi-Xiong Zhang, Yifan Gong, Dong Yu
One good property of the DPN is that it can be trained on unencrypted speech features in the traditional way.
no code implementations • 29 Apr 2019 • Zhong Meng, Jinyu Li, Yifan Gong
We propose a novel adversarial speaker adaptation (ASA) scheme, in which adversarial learning is applied to regularize the distribution of deep hidden features in a speaker-dependent (SD) deep neural network (DNN) acoustic model to be close to that of a fixed speaker-independent (SI) DNN acoustic model during adaptation.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 29 Apr 2019 • Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong
The use of deep networks to extract embeddings for speaker recognition has proven successfully.
no code implementations • 28 Apr 2019 • Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong
To overcome this problem, we propose a conditional T/S learning scheme, in which a "smart" student model selectively chooses to learn from either the teacher model or the ground truth labels conditioned on whether the teacher can correctly predict the ground truth.
no code implementations • 28 Apr 2019 • Zhong Meng, Jinyu Li, Yifan Gong
Adversarial domain-invariant training (ADIT) proves to be effective in suppressing the effects of domain variability in acoustic modeling and has led to improved performance in automatic speech recognition (ASR).
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 4 Jan 2019 • Ke Li, Jinyu Li, Yong Zhao, Kshitiz Kumar, Yifan Gong
We propose two approaches for speaker adaptation in end-to-end (E2E) automatic speech recognition systems.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 31 Dec 2018 • Amit Das, Jinyu Li, Guoli Ye, Rui Zhao, Yifan Gong
In particular, we introduce Attention CTC, Self-Attention CTC, Hybrid CTC, and Mixed-unit CTC.
no code implementations • 6 Sep 2018 • Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang, Juang
To achieve better performance on ASR task, senone-aware (SA) AFM is further proposed in which an acoustic model network is jointly trained with the feature-mapping and discriminator networks to optimize the senone classification loss in addition to the AFM losses.
no code implementations • 6 Sep 2018 • Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang, Juang
In this paper, we propose a cycle-consistent speech enhancement (CSE) in which an additional inverse mapping network is introduced to reconstruct the noisy features from the enhanced ones.
no code implementations • 28 Aug 2018 • Jinyu Li, Changliang Liu, Yifan Gong
In this paper, we propose a layer trajectory LSTM (ltLSTM) which builds a layer-LSTM using all the layer outputs from a standard multi-layer time-LSTM.
no code implementations • 14 Apr 2018 • Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong
In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components in a far-field speaker system.
no code implementations • 2 Apr 2018 • Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang, Juang
In this method, a student acoustic model and a condition classifier are jointly optimized to minimize the Kullback-Leibler divergence between the output distributions of the teacher and student models, and simultaneously, to min-maximize the condition classification loss.
no code implementations • 2 Apr 2018 • Zhong Meng, Jinyu Li, Zhuo Chen, Yong Zhao, Vadim Mazalov, Yifan Gong, Biing-Hwang, Juang
We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system.
no code implementations • 15 Mar 2018 • Jinyu Li, Guoli Ye, Amit Das, Rui Zhao, Yifan Gong
However, the word-based CTC model suffers from the out-of-vocabulary (OOV) issue as it can only model limited number of words in the output layer and maps all the remaining words into an OOV output node.
no code implementations • 15 Mar 2018 • Amit Das, Jinyu Li, Rui Zhao, Yifan Gong
In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework.
no code implementations • 28 Nov 2017 • Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo, Yifan Gong
However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue as it can only model limited number of words in the output layer and maps all the remaining words into an OOV output node.
no code implementations • 21 Nov 2017 • Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong
Unsupervised domain adaptation of speech signal aims at adapting a well-trained source-domain acoustic model to the unlabeled data from target domain.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+5
no code implementations • 17 Aug 2017 • Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong
High accuracy speech recognition requires a large amount of transcribed data for supervised training.
no code implementations • 3 Jan 2017 • Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li, Yifan Gong
A new type of End-to-End system for text-dependent speaker verification is presented in this paper.