no code implementations • 9 Oct 2023 • Ying Shi, Dong Wang, Lantian Li, Jiqing Han
This paper investigates the possibility of extracting a target sentence from multi-talker speech using only a keyword as input.
no code implementations • 28 May 2023 • Ying Shi, Dong Wang, Lantian Li, Jiqing Han, Shi Yin
We propose a novel Mix Training (MT) strategy that encourages the model to discover low-energy keywords from noisy and mixed speech.
no code implementations • 4 Nov 2020 • Ying Shi, Haolin Chen, Zhiyuan Tang, Lantian Li, Dong Wang, Jiqing Han
Speech enhancement (SE) based on a deep speech prior, such as the variational auto-encoder with non-negative matrix factorization (VAE-NMF) architecture, has recently attracted much attention.
no code implementations • 18 Jul 2019 • Ying Shi, Wei Wei, Zhiming Zheng
Zero-shot learning (ZSL) aims to recognize novel object categories using the semantic representation of categories, and the key idea is to explore the knowledge of how a novel class is semantically related to the familiar classes.
no code implementations • 8 Nov 2018 • Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang
This paper proposes a Gaussian-constrained training approach that (1) discards the parametric classifier, and (2) enforces the distribution of the derived speaker vectors to be Gaussian.
no code implementations • 8 Nov 2018 • Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang
This score reflects the similarity of the two frames in phonetic content, and is used to weigh the contribution of this frame pair in the utterance-based scoring.
no code implementations • 27 Feb 2018 • Lantian Li, Dong Wang, Yixiang Chen, Ying Shi, Zhiyuan Tang, Thomas Fang Zheng
Various informative factors are mixed in speech signals, leading to great difficulty when decoding any of the factors.
no code implementations • 5 Jun 2017 • Dong Wang, Lantian Li, Ying Shi, Yixiang Chen, Zhiyuan Tang
In this paper, we demonstrated that the speaker factor is also a short-time spectral pattern and can be largely identified with just a few frames using a simple deep neural network (DNN).
no code implementations • 10 May 2017 • Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang
Recently, deep neural networks (DNNs) have been used to learn speaker features.
no code implementations • 9 May 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Ying Shi, Lantian Li
Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID).
no code implementations • 28 Sep 2016 • Zhiyuan Tang, Ying Shi, Dong Wang, Yang Feng, Shiyue Zhang
Recurrent neural networks (RNNs) have shown clear superiority in sequence modeling, particularly the ones with gated units, such as long short-term memory (LSTM) and gated recurrent unit (GRU).
Automatic Speech Recognition (ASR)
no code implementations • LREC 2012 • Marc Tomlinson, David Bracewell, Mary Draper, Zewar Almissour, Ying Shi, Jeremy Bensley
An analysis of our annotations reflects a high degree of overlap between current theories on power and conflict within a group and the behavior of individuals within the transcripts.