no code implementations • 26 Nov 2022 • Wenbin Li, Meihao Kong, Xuesong Yang, Lei Wang, Jing Huo, Yang Gao, Jiebo Luo
In this study, we present a new unified contrastive learning representation framework (named UniCLR) suitable for all of the above four kinds of methods, from the novel perspective of a basic affinity matrix.
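The common thread across contrastive methods is a pairwise affinity between embeddings. A minimal sketch of one such affinity matrix, assuming L2-normalized embeddings and cosine similarity (the paper's exact definition may differ):

```python
import numpy as np

def affinity_matrix(z):
    # z: (n, d) batch of embeddings. L2-normalize the rows so that
    # the inner product reduces to cosine similarity -- a common
    # choice in contrastive learning; illustrative only.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return z @ z.T  # (n, n) symmetric affinity matrix

rng = np.random.default_rng(0)
A = affinity_matrix(rng.normal(size=(4, 8)))
print(A.shape)  # (4, 4); the diagonal is each sample's self-similarity (1.0)
```

Different contrastive objectives can then be read as different targets imposed on this matrix (e.g., pulling the diagonal of a cross-view affinity toward 1).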
no code implementations • 11 May 2022 • Ye Tang, Xuesong Yang, Xinrui Liu, Xiwei Zhao, Zhangang Lin, Changping Peng
Graph Neural Networks (GNNs) are an architecture for structured data and have been adopted in a wide range of tasks with impressive results, such as link prediction, node classification, and graph classification.
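The core GNN operation is neighborhood aggregation. A minimal numpy sketch of one graph-convolution layer in the spirit of Kipf & Welling's GCN (not the architecture of this particular paper):

```python
import numpy as np

def gcn_layer(A, X, W):
    # One message-passing step: add self-loops, symmetrically
    # normalize the adjacency, aggregate neighbor features,
    # then apply a linear map and ReLU.
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

# Toy graph: 3 nodes in a path (0-1-2), 2-dim input features, 4 hidden units.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
rng = np.random.default_rng(0)
H = gcn_layer(A, X, rng.normal(size=(2, 4)))
print(H.shape)  # (3, 4): one hidden vector per node
```

Stacking such layers lets each node's representation incorporate increasingly distant neighborhoods, which is what downstream heads for link prediction or node classification consume.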
2 code implementations • 10 Sep 2021 • Wenbin Li, Ziyi Wang, Xuesong Yang, Chuanqi Dong, Pinzhuo Tian, Tiexin Qin, Jing Huo, Yinghuan Shi, Lei Wang, Yang Gao, Jiebo Luo
Furthermore, based on LibFewShot, we provide comprehensive evaluations on multiple benchmarks with various backbone architectures to evaluate common pitfalls and effects of different training tricks.
1 code implementation • 22 Jul 2021 • Wenbin Li, Xuesong Yang, Meihao Kong, Lei Wang, Jing Huo, Yang Gao, Jiebo Luo
However, in small data regimes, we cannot obtain a sufficient number of negative pairs, nor can we effectively avoid over-fitting when negatives are not used at all.
no code implementations • 14 Dec 2020 • Hu Hu, Xuesong Yang, Zeynab Raeesy, Jinxi Guo, Gokce Keskin, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Roland Maas
Accent mismatch is a critical problem for end-to-end ASR.
11 code implementations • 14 May 2019 • Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson
On the other hand, CVAE training is simple but does not come with the distribution-matching property of a GAN.
no code implementations • 5 Nov 2018 • Di He, Xuesong Yang, Boon Pang Lim, Yi Liang, Mark Hasegawa-Johnson, Deming Chen
In this paper, the convergence properties of CTC are improved by incorporating acoustic landmarks.
no code implementations • 15 May 2018 • Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen
Furui first demonstrated that the identities of both consonants and vowels can be perceived from the C-V transition; later, Stevens proposed that acoustic landmarks are the primary cues for speech perception, and that steady-state regions are secondary or supplemental.
Automatic Speech Recognition (ASR) +2
no code implementations • 15 Feb 2018 • Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florencio, Mark Hasegawa-Johnson
On the other hand, deep learning based enhancement approaches are able to learn complicated speech distributions and perform efficient inference, but they are unable to deal with a variable number of input channels.
no code implementations • 7 Feb 2018 • Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Mark Hasegawa-Johnson
The performance of automatic speech recognition systems degrades with increasing mismatch between the training and testing scenarios.
Automatic Speech Recognition (ASR) +1
1 code implementation • 3 Dec 2016 • Xuesong Yang, Yun-Nung Chen, Dilek Hakkani-Tur, Paul Crook, Xiujun Li, Jianfeng Gao, Li Deng
Natural language understanding and dialogue policy learning are both essential in conversational systems that predict the next system actions in response to a current user utterance.
no code implementations • 10 Nov 2016 • Xiang Kong, Xuesong Yang, Mark Hasegawa-Johnson, Jeung-Yoon Choi, Stefanie Shattuck-Hufnagel
Three consonant voicing classifiers were developed: (1) manually selected acoustic features anchored at a phonetic landmark, (2) MFCCs (either averaged across the segment or anchored at the landmark), and (3) acoustic features computed using a convolutional neural network (CNN).
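The two MFCC pooling strategies in (2) can be sketched with numpy; the frame matrix, segment bounds, and landmark index below are hypothetical stand-ins for the paper's actual features:

```python
import numpy as np

def segment_features(frames, start, end, landmark):
    # frames: (T, d) per-frame acoustic features (e.g., 13-dim MFCCs).
    # Two pooling strategies compared in the paper (illustrative sketch):
    #   (a) average the frames across the whole segment;
    #   (b) take the feature vector anchored at the landmark frame.
    averaged = frames[start:end].mean(axis=0)
    anchored = frames[landmark]
    return averaged, anchored

T, d = 50, 13
frames = np.random.default_rng(1).normal(size=(T, d))
avg, anc = segment_features(frames, start=10, end=30, landmark=20)
print(avg.shape, anc.shape)  # (13,) (13,)
```

Either pooled vector can then be fed to a standard classifier; the comparison in the paper is precisely about which anchoring preserves the voicing cue best.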