no code implementations • 17 Feb 2023 • Mufan Sang, Yong Zhao, Gang Liu, John H. L. Hansen, Jian Wu
The proposed models achieve 0.75% EER on the VoxCeleb1 test set, outperforming previously proposed Transformer-based and CNN-based models such as ResNet34 and ECAPA-TDNN.
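For reference, EER is the operating point where the false-acceptance and false-rejection rates coincide; a minimal NumPy sketch of the metric (illustrative, not the authors' evaluation code):

import numpy as np

def equal_error_rate(scores, labels):
    """EER: threshold where false-acceptance rate == false-rejection rate.

    scores: similarity scores, higher = more likely same speaker.
    labels: 1 for target (same-speaker) trials, 0 for impostor trials.
    """
    scores, labels = np.asarray(scores), np.asarray(labels)
    fars, frrs = [], []
    for t in np.sort(np.unique(scores)):
        accept = scores >= t
        fars.append(np.mean(accept[labels == 0]))   # impostors accepted
        frrs.append(np.mean(~accept[labels == 1]))  # targets rejected
    fars, frrs = np.array(fars), np.array(frrs)
    idx = np.argmin(np.abs(fars - frrs))
    return (fars[idx] + frrs[idx]) / 2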
no code implementations • 22 Nov 2022 • Vinay Kothapally, John H. L. Hansen
Several speech processing systems have demonstrated considerable performance improvements when deep complex neural networks (DCNN) are coupled with self-attention (SA) networks.
Automatic Speech Recognition (ASR)
no code implementations • 19 Nov 2022 • Iván López-Espejo, Ram C. M. C. Shekar, Zheng-Hua Tan, Jesper Jensen, John H. L. Hansen
In the context of keyword spotting (KWS), the replacement of handcrafted speech features by learnable features has not yielded superior KWS performance.
no code implementations • 17 Nov 2022 • Zhenyu Wang, John H. L. Hansen
Automatic speaker verification systems are vulnerable to a variety of access threats, prompting research into effective spoofing detection systems that act as a gate to filter out such attacks.
no code implementations • 17 Nov 2022 • Zhenyu Wang, John H. L. Hansen
A comprehensive set of experiments is conducted to demonstrate that: 1) diverse acoustic environments do impact speaker recognition performance, a finding that could advance research in audio forensics; 2) domain adversarial training learns discriminative features that are also invariant to shifts between domains; 3) discrepancy-minimizing adaptation achieves effective performance simultaneously across multiple acoustic domains; and 4) moment-matching adaptation with dynamic distribution alignment also significantly improves speaker recognition performance on each domain, especially on the noisy LENA-field domain, where it outperforms all other systems.
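Domain adversarial training of this kind is commonly implemented with a gradient reversal layer between the feature extractor and a domain classifier; a minimal PyTorch sketch of that mechanism (illustrative, not the paper's exact setup):

import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients on
    backward, so the feature extractor is pushed toward domain-invariant
    embeddings while the domain classifier tries to tell domains apart."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# features -> speaker classifier        (ordinary cross-entropy loss)
# features -> grad_reverse -> domain classifier (adversarial loss)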
no code implementations • 3 Nov 2022 • Aditya Joglekar, John H. L. Hansen
The Fearless Steps Challenge 2019 Phase-1 (FSC-P1) is the inaugural Challenge of the Fearless Steps Initiative hosted by the Center for Robust Speech Systems (CRSS) at the University of Texas at Dallas.
no code implementations • 4 Aug 2022 • Wei Xia, John H. L. Hansen
In this study, a general global time-frequency context modeling framework is proposed to leverage the context information specifically for speaker representation modeling.
no code implementations • 10 Jul 2022 • Mufan Sang, John H. L. Hansen
In this study, we show mathematically that GAP is a special case of a discrete cosine transform (DCT) on the time-frequency domain, one that retains only the lowest-frequency component of the frequency decomposition.
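This relationship is easy to check numerically: the DC coefficient of an orthonormal 2-D DCT equals the global average times the square root of the map size. A minimal NumPy/SciPy sketch, with a random 8x8 map standing in for a time-frequency feature:

import numpy as np
from scipy.fft import dctn

feat = np.random.rand(8, 8)          # stand-in time-frequency feature map
gap = feat.mean()                    # global average pooling
dc = dctn(feat, norm='ortho')[0, 0]  # lowest-frequency 2-D DCT coefficient
# For an orthonormal DCT, the DC term is mean * sqrt(T * F):
assert np.isclose(dc, gap * np.sqrt(feat.size))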
no code implementations • 4 Jul 2022 • Jiamin Xie, John H. L. Hansen
Convolutional neural networks (CNNs) have greatly improved speech recognition performance by exploiting localized time-frequency patterns.
no code implementations • 30 Jun 2022 • Szu-Jui Chen, Jiamin Xie, John H. L. Hansen
As such, we further propose a feature refinement loss for decorrelation to efficiently combine the set of input features.
Automatic Speech Recognition (ASR)
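The feature refinement loss for decorrelation mentioned above can take several forms; a minimal PyTorch sketch of one plausible variant, penalizing off-diagonal cross-correlation between two feature streams (an assumption for illustration, not the paper's exact objective):

import torch

def decorrelation_loss(a, b, eps=1e-8):
    """Penalize off-diagonal cross-correlation between two feature
    streams of shape (batch, dim), so the combined features carry
    complementary rather than redundant information."""
    a = (a - a.mean(0)) / (a.std(0) + eps)   # standardize per dimension
    b = (b - b.mean(0)) / (b.std(0) + eps)
    c = (a.T @ b) / a.shape[0]               # cross-correlation matrix
    off_diag = c - torch.diag(torch.diagonal(c))
    return (off_diag ** 2).sum()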
1 code implementation • 29 Mar 2022 • Mu Yang, Kevin Hirschi, Stephen D. Looney, Okim Kang, John H. L. Hansen
We show that fine-tuning with pseudo labels achieves a 5.35% phoneme error rate reduction and 2.48% MDD F1 score improvement over a labeled-samples-only fine-tuning baseline.
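Phoneme error rate is an edit-distance metric over phoneme sequences; a minimal sketch of the computation (not the authors' scoring code):

def phoneme_error_rate(ref, hyp):
    """PER = (substitutions + deletions + insertions) / len(ref),
    computed via Levenshtein distance over phoneme sequences."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# e.g. phoneme_error_rate("k ae t".split(), "k ah t".split()) -> 1/3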
no code implementations • 28 Jan 2022 • Zhenyu Wang, John H. L. Hansen
Audio analysis for forensic speaker verification poses unique challenges to system performance, due in part to data collected in naturalistic field acoustic environments where location/scenario uncertainty is common.
no code implementations • 7 Dec 2021 • Yongkang Liu, Ziran Wang, Kyungtae Han, Zhenyu Shou, Prashant Tiwari, John H. L. Hansen
To advance the development of visual guidance systems, we introduce a novel vision-cloud data fusion methodology, integrating camera image and Digital Twin information from the cloud to help intelligent vehicles make better decisions.
no code implementations • 16 Nov 2021 • Midia Yousefi, John H. L. Hansen
A long-lasting problem in supervised speech separation is finding the correct label for each separated speech signal, referred to as label permutation ambiguity.
no code implementations • 30 Oct 2021 • Midia Yousefi, John H. L. Hansen
Most current speech technology systems are designed to operate well even in the presence of multiple active speakers.
no code implementations • 23 Sep 2021 • Szu-Jui Chen, Wei Xia, John H. L. Hansen
With additional techniques such as pronunciation and silence probability modeling, plus multi-style training, we achieve relative WER improvements of 5.42% and 3.18% on the development and evaluation sets of the Fearless Steps Corpus, respectively.
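For reference, relative WER improvement is measured against the baseline WER rather than in absolute points; a one-line sketch with illustrative numbers (not the paper's actual WER values):

def relative_improvement(wer_base, wer_new):
    """Relative WER reduction: (baseline - new) / baseline, in percent."""
    return 100.0 * (wer_base - wer_new) / wer_base

# Illustrative only: a 20.0% -> 18.92% WER change is ~5.4% relative.
print(relative_improvement(20.0, 18.92))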
no code implementations • 12 Dec 2020 • Mufan Sang, Wei Xia, John H. L. Hansen
Although speaker verification has achieved significant performance improvements with the development of deep neural networks, domain mismatch remains a challenging problem in this field.
no code implementations • 15 Nov 2020 • Meemnur Rashid, Kaisar Ahmed Alman, Khaled Hasan, John H. L. Hansen, Taufiq Hasan
To capture these variations, we utilize a set of well-known acoustic and prosodic features with a Support Vector Machine (SVM) classifier for detecting the presence of respiratory distress.
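A classical pipeline of this kind pairs per-recording feature statistics with an SVM; a minimal scikit-learn sketch, where the feature set and data are illustrative stand-ins rather than the paper's exact configuration:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: one row of acoustic/prosodic features per recording (e.g. MFCC
# statistics, pitch, energy); y: 1 = respiratory distress, 0 = control.
# Random data here only makes the sketch runnable.
X, y = np.random.rand(100, 40), np.random.randint(0, 2, 100)

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0))
clf.fit(X, y)
print(clf.predict(X[:5]))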
no code implementations • 21 Sep 2020 • Mufan Sang, Wei Xia, John H. L. Hansen
In forensic applications, it is very common that only small naturalistic datasets consisting of short utterances in complex or unknown acoustic environments are available.
no code implementations • 5 Sep 2020 • Zhenyu Wang, Wei Xia, John H. L. Hansen
Forensic audio analysis for speaker verification offers unique challenges due to location/scenario uncertainty and diversity mismatch between reference and naturalistic field recordings.
no code implementations • 2 Sep 2020 • Wei Xia, John H. L. Hansen
In this study, we propose the global context guided channel and time-frequency transformations to model the long-range, non-local time-frequency dependencies and channel variances in speaker representations.
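One common way to realize a global context guided channel transformation is a squeeze-and-excitation-style module over the time-frequency map; the PyTorch sketch below is in that spirit and is not the paper's exact architecture:

import torch
import torch.nn as nn

class GlobalChannelTransform(nn.Module):
    """Channel recalibration driven by global time-frequency context;
    a squeeze-and-excitation-style sketch, assumed for illustration."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):              # x: (batch, C, freq, time)
        ctx = x.mean(dim=(2, 3))       # squeeze: global context per channel
        w = self.fc(ctx)[:, :, None, None]
        return x * w                   # excite: reweight channels by context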
no code implementations • 17 Jul 2020 • Vinay Kothapally, Wei Xia, Shahram Ghorbani, John H. L. Hansen, Wei Xue, Jing Huang
Recent studies have successfully demonstrated the reliability of fully convolutional networks (FCNs) in many speech applications.
no code implementations • 8 Jul 2020 • Yongkang Liu, Ziran Wang, Kyungtae Han, Zhenyu Shou, Prashant Tiwari, John H. L. Hansen
With the rapid development of intelligent vehicles and Advanced Driving Assistance Systems (ADAS), a mixed level of human driver engagements is involved in the transportation system.
no code implementations • 5 Mar 2020 • Kazi Nazmul Haque, Rajib Rana, John H. L. Hansen, Björn Schuller
However, the model can become redundant if it is intended for a specific task.
no code implementations • 17 Dec 2019 • Fahimeh Bahmaninezhad, Shi-Xiong Zhang, Yong Xu, Meng Yu, John H. L. Hansen, Dong Yu
The initial solutions introduced for deep-learning-based speech separation analyzed speech signals in the time-frequency domain with the STFT; the encoded mixed signals were then fed into a deep neural network based separator.
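That classic pipeline (STFT analysis, per-speaker mask estimation, inverse STFT synthesis) can be sketched in a few lines of PyTorch; the random masks below are stand-ins for a separator network's output:

import torch

def masked_separation(mixture_stft, masks):
    """Time-frequency masking: one mask per speaker is applied to the
    mixture STFT; waveforms are recovered with the inverse STFT.
    Shapes: mixture_stft (freq, time) complex; masks (n_spk, freq, time)."""
    return masks * mixture_stft        # separated spectrograms per speaker

win = torch.hann_window(512)
mix = torch.stft(torch.randn(16000), n_fft=512, window=win,
                 return_complex=True)
masks = torch.rand(2, *mix.shape)      # stand-in for network-predicted masks
sources = masked_separation(mix, masks)
wavs = [torch.istft(s, n_fft=512, window=win) for s in sources]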
no code implementations • 15 Oct 2019 • Salar Jafarlou, Soheil Khorram, Vinay Kothapally, John H. L. Hansen
In the present study, we address this issue by investigating variants of large receptive field CNNs (LRF-CNNs) which include deeply recursive networks, dilated convolutional neural networks, and stacked hourglass networks.
Automatic Speech Recognition (ASR)
no code implementations • 1 Oct 2019 • Shahram Ghorbani, Soheil Khorram, John H. L. Hansen
An obvious approach to leveraging data from a new domain (e.g., new accented speech) is to first generate a comprehensive dataset of all domains by combining all available data, and then use this dataset to retrain the acoustic models.
no code implementations • 5 Aug 2019 • Wei Xia, Jing Huang, John H. L. Hansen
Speaker verification systems often degrade significantly when there is a language mismatch between training and testing data.
no code implementations • 4 Aug 2019 • Midia Yousefi, Soheil Khorram, John H. L. Hansen
Recently proposed Permutation Invariant Training (PIT) addresses this problem by determining the output-label assignment which minimizes the separation error.
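PIT scores every possible output-to-label assignment and trains on the cheapest one; a minimal PyTorch sketch of the objective, using MSE as the per-pair loss purely for illustration (not the authors' implementation):

import itertools
import torch
import torch.nn.functional as F

def pit_loss(estimates, targets):
    """Permutation Invariant Training loss over all n_spk! assignments.
    estimates, targets: tensors of shape (n_spk, samples)."""
    n = estimates.shape[0]
    losses = [
        sum(F.mse_loss(estimates[i], targets[p[i]]) for i in range(n)) / n
        for p in itertools.permutations(range(n))
    ]
    return torch.stack(losses).min()   # train on the best assignment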
no code implementations • 3 Jul 2019 • Nursadul Mamun, Soheil Khorram, John H. L. Hansen
To improve speech enhancement methods for CI users, we propose to perform speech enhancement in a cochlear filter-bank feature space, a feature-set specifically designed for CI users based on CI auditory stimuli.
3 code implementations • 7 Jun 2019 • Ekim Yurtsever, Yongkang Liu, Jacob Lambert, Chiyomi Miyajima, Eijiro Takeuchi, Kazuya Takeda, John H. L. Hansen
The best result, with a 0.937 AUC score, was obtained with the proposed network.
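AUC here is the area under the ROC curve of the detector's scores; a minimal scikit-learn sketch with illustrative data (not the paper's results):

import numpy as np
from sklearn.metrics import roc_auc_score

# Illustrative only: detector scores vs ground-truth risk labels.
labels = np.array([0, 0, 1, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2])
print(roc_auc_score(labels, scores))   # area under the ROC curve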
no code implementations • 16 Apr 2019 • Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Cheng-Lin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans
The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE).
no code implementations • 11 Mar 2019 • Yang Zheng, Izzat H. Izzat, John H. L. Hansen
An intelligent vehicle should be able to understand the driver's perception of the environment as well as the controlling behavior of the vehicle.
no code implementations • 24 Oct 2016 • Chunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H. L. Hansen
This document briefly describes the systems submitted by the Center for Robust Speech Systems (CRSS) from The University of Texas at Dallas (UTD) to the 2016 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE).
no code implementations • 21 Sep 2016 • Suwon Shon, Seongkyu Mun, John H. L. Hansen, Hanseok Ko
The experimental results show that using duration and score fusion improves language recognition performance by a relative 5% in the LRiMLC15 cost.
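Score fusion of this kind is often a weighted combination of per-system scores; a minimal sketch of one common form (the duration-conditioned calibration used in the paper is not reproduced here):

import numpy as np

def fuse_scores(score_a, score_b, w=0.5):
    """Weighted score-level fusion of two language-recognition systems;
    the weight w would normally be tuned on a development set."""
    return w * np.asarray(score_a) + (1 - w) * np.asarray(score_b)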