Search Results for author: John H. L. Hansen

Found 40 papers, 3 papers with code

Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification

no code implementations • 1 Mar 2024 • Mufan Sang, John H. L. Hansen

With excellent generalization ability, self-supervised speech models have shown impressive performance on various downstream speech tasks in the pre-training and fine-tuning paradigm.

Speaker Verification Transfer Learning

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model

no code implementations • 15 Nov 2023 • Hsin-Tien Chiang, Szu-Wei Fu, Hsin-Min Wang, Yu Tsao, John H. L. Hansen

Furthermore, we demonstrated that incorporating SSL models resulted in greater transferability to OOD dataset.

MixRep: Hidden Representation Mixup for Low-Resource Speech Recognition

1 code implementation • 27 Oct 2023 • Jiamin Xie, John H. L. Hansen

In this paper, we present MixRep, a simple and effective data augmentation strategy based on mixup for low-resource ASR.

Data Augmentation speech-recognition +1

Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition

no code implementations • 17 Oct 2023 • Shahram Ghorbani, John H. L. Hansen

In this study, embeddings from advanced pre-trained language identification (LID) and speaker identification (SID) models are leveraged to improve the accuracy of accent classification and non-native accentedness assessment.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model

no code implementations • 10 Jun 2023 • Mu Yang, Ram C. M. C. Shekar, Okim Kang, John H. L. Hansen

This study is focused on understanding and quantifying the change in phoneme and prosody information encoded in the Self-Supervised Learning (SSL) model, brought by an accent identification (AID) fine-tuning task.

Automatic Speech Recognition Prosody Prediction +3

Improving Transformer-based Networks With Locality For Automatic Speaker Verification

no code implementations • 17 Feb 2023 • Mufan Sang, Yong Zhao, Gang Liu, John H. L. Hansen, Jian Wu

The proposed models achieve 0.75% EER on VoxCeleb 1 test set, outperforming the previously proposed Transformer-based models and CNN-based models, such as ResNet34 and ECAPA-TDNN.

Speaker Verification

Complex-Valued Time-Frequency Self-Attention for Speech Dereverberation

no code implementations • 22 Nov 2022 • Vinay Kothapally, John H. L. Hansen

Several speech processing systems have demonstrated considerable performance improvements when deep complex neural networks (DCNN) are coupled with self-attention (SA) networks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting

no code implementations • 19 Nov 2022 • Iván López-Espejo, Ram C. M. C. Shekar, Zheng-Hua Tan, Jesper Jensen, John H. L. Hansen

In the context of keyword spotting (KWS), the replacement of handcrafted speech features by learnable features has not yielded superior KWS performance.

Small-Footprint Keyword Spotting

Audio Anti-spoofing Using a Simple Attention Module and Joint Optimization Based on Additive Angular Margin Loss and Meta-learning

no code implementations • 17 Nov 2022 • Zhenyu Wang, John H. L. Hansen

Automatic speaker verification systems are vulnerable to a variety of access threats, prompting research into the formulation of effective spoofing detection systems to act as a gate to filter out such spoofing attacks.

Binary Classification Meta-Learning +3

Multi-source Domain Adaptation for Text-independent Forensic Speaker Recognition

no code implementations • 17 Nov 2022 • Zhenyu Wang, John H. L. Hansen

A comprehensive set of experiments is conducted to demonstrate that: 1) diverse acoustic environments do impact speaker recognition performance, which could advance research in audio forensics, 2) domain adversarial training learns the discriminative features which are also invariant to shifts between domains, 3) discrepancy-minimizing adaptation achieves effective performance simultaneously across multiple acoustic domains, and 4) moment-matching adaptation along with dynamic distribution alignment also significantly promotes speaker recognition performance on each domain, especially for the LENA-field domain with noise compared to all other systems.

Domain Adaptation Speaker Recognition

Fearless Steps Challenge Phase-1 Evaluation Plan

no code implementations • 3 Nov 2022 • Aditya Joglekar, John H. L. Hansen

The Fearless Steps Challenge 2019 Phase-1 (FSC-P1) is the inaugural Challenge of the Fearless Steps Initiative hosted by the Center for Robust Speech Systems (CRSS) at the University of Texas at Dallas.

Attention and DCT based Global Context Modeling for Text-independent Speaker Recognition

no code implementations • 4 Aug 2022 • Wei Xia, John H. L. Hansen

Second, a 2D-DCT based context model is proposed to improve model efficiency and examine the benefits of signal modeling.

Speaker Recognition Speaker Verification +1

Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning

no code implementations • 10 Jul 2022 • Mufan Sang, John H. L. Hansen

In this study, we show that GAP is a special case of a discrete cosine transform (DCT) on time-frequency domain mathematically using only the lowest frequency component in frequency decomposition.

Representation Learning Speaker Verification
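
The claim in the snippet above, that global average pooling (GAP) corresponds to the lowest-frequency DCT component, can be checked with a short derivation. This is a sketch under assumptions not stated in the listing: an unnormalized 2D DCT-II over a feature map X of size T × F, with the symbols X, T, F, u, v and B introduced here for illustration only.

```latex
% Unnormalized 2D DCT-II of a time-frequency feature map X (T frames, F bins)
B_{u,v} = \sum_{t=0}^{T-1} \sum_{f=0}^{F-1} X_{t,f}
          \cos\!\left(\frac{\pi u}{T}\left(t + \tfrac{1}{2}\right)\right)
          \cos\!\left(\frac{\pi v}{F}\left(f + \tfrac{1}{2}\right)\right)

% At the lowest frequency (u = v = 0) both cosines equal 1, so
B_{0,0} = \sum_{t=0}^{T-1} \sum_{f=0}^{F-1} X_{t,f}
        = T F \cdot \mathrm{GAP}(X),
\qquad
\mathrm{GAP}(X) = \frac{1}{T F} \sum_{t=0}^{T-1} \sum_{f=0}^{F-1} X_{t,f}
```

Under this convention GAP equals the (0, 0) DCT coefficient up to the constant factor TF; the paper's own normalization may differ.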

DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition

no code implementations • 4 Jul 2022 • Jiamin Xie, John H. L. Hansen

Convolutional neural networks (CNN) have improved speech recognition performance greatly by exploiting localized time-frequency patterns.

speech-recognition Speech Recognition

Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment

1 code implementation • 29 Mar 2022 • Mu Yang, Kevin Hirschi, Stephen D. Looney, Okim Kang, John H. L. Hansen

We show that fine-tuning with pseudo labels achieves a 5.35% phoneme error rate reduction and 2.48% MDD F1 score improvement over a labeled-samples-only fine-tuning baseline.

Pseudo Label Self-Supervised Learning

Impact of Naturalistic Field Acoustic Environments on Forensic Text-independent Speaker Verification System

no code implementations • 28 Jan 2022 • Zhenyu Wang, John H. L. Hansen

Audio analysis for forensic speaker verification offers unique challenges in system performance due in part to data collected in naturalistic field acoustic environments where location/scenario uncertainty is common in the forensic data collection process.

Text-Independent Speaker Verification

Vision-Cloud Data Fusion for ADAS: A Lane Change Prediction Case Study

no code implementations • 7 Dec 2021 • Yongkang Liu, Ziran Wang, Kyungtae Han, Zhenyu Shou, Prashant Tiwari, John H. L. Hansen

To advance the development of visual guidance systems, we introduce a novel vision-cloud data fusion methodology, integrating camera image and Digital Twin information from the cloud to help intelligent vehicles make better decisions.


Single-channel speech separation using Soft-minimum Permutation Invariant Training

no code implementations • 16 Nov 2021 • Midia Yousefi, John H. L. Hansen

A long-lasting problem in supervised speech separation is finding the correct label for each separated speech signal, referred to as label permutation ambiguity.

Speech Separation

Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network

no code implementations • 30 Oct 2021 • Midia Yousefi, John H. L. Hansen

Most current speech technology systems are designed to operate well even in the presence of multiple active speakers.

Scenario Aware Speech Recognition: Advancements for Apollo Fearless Steps & CHiME-4 Corpora

no code implementations • 23 Sep 2021 • Szu-Jui Chen, Wei Xia, John H. L. Hansen

With additional techniques such as pronunciation and silence probability modeling, plus multi-style training, we achieve a +5.42% and +3.18% relative WER improvement for the development and evaluation sets of the Fearless Steps Corpus.

speech-recognition Speech Recognition

DEAAN: Disentangled Embedding and Adversarial Adaptation Network for Robust Speaker Representation Learning

no code implementations • 12 Dec 2020 • Mufan Sang, Wei Xia, John H. L. Hansen

Although speaker verification has achieved significant performance improvement with the development of deep neural networks, domain mismatch is still a challenging problem in this field.

Disentanglement Domain Adaptation +1

Respiratory Distress Detection from Telephone Speech using Acoustic and Prosodic Features

no code implementations • 15 Nov 2020 • Meemnur Rashid, Kaisar Ahmed Alman, Khaled Hasan, John H. L. Hansen, Taufiq Hasan

To capture these variations, we utilize a set of well-known acoustic and prosodic features with a Support Vector Machine (SVM) classifier for detecting the presence of respiratory distress.

Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias

no code implementations • 21 Sep 2020 • Mufan Sang, Wei Xia, John H. L. Hansen

In forensic applications, it is very common that only small naturalistic datasets consisting of short utterances in complex or unknown acoustic environments are available.

Inductive Bias Knowledge Distillation +1

Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification

no code implementations • 5 Sep 2020 • Zhenyu Wang, Wei Xia, John H. L. Hansen

Forensic audio analysis for speaker verification offers unique challenges due to location/scenario uncertainty and diversity mismatch between reference and naturalistic field recordings.

Domain Adaptation Speaker Verification

Speaker Representation Learning using Global Context Guided Channel and Time-Frequency Transformations

no code implementations • 2 Sep 2020 • Wei Xia, John H. L. Hansen

In this study, we propose the global context guided channel and time-frequency transformations to model the long-range, non-local time-frequency dependencies and channel variances in speaker representations.

Representation Learning Speaker Verification

Sensor Fusion of Camera and Cloud Digital Twin Information for Intelligent Vehicles

no code implementations • 8 Jul 2020 • Yongkang Liu, Ziran Wang, Kyungtae Han, Zhenyu Shou, Prashant Tiwari, John H. L. Hansen

With the rapid development of intelligent vehicles and Advanced Driving Assistance Systems (ADAS), a mixed level of human driver engagements is involved in the transportation system.

Position Sensor Fusion

A Unified Framework for Speech Separation

no code implementations • 17 Dec 2019 • Fahimeh Bahmaninezhad, Shi-Xiong Zhang, Yong Xu, Meng Yu, John H. L. Hansen, Dong Yu

The initial solutions introduced for deep learning based speech separation analyzed the speech signals into time-frequency domain with STFT; and then encoded mixed signals were fed into a deep neural network based separator.

Speech Separation

Analyzing Large Receptive Field Convolutional Networks for Distant Speech Recognition

no code implementations • 15 Oct 2019 • Salar Jafarlou, Soheil Khorram, Vinay Kothapally, John H. L. Hansen

In the present study, we address this issue by investigating variants of large receptive field CNNs (LRF-CNNs) which include deeply recursive networks, dilated convolutional neural networks, and stacked hourglass networks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Domain Expansion in DNN-based Acoustic Models for Robust Speech Recognition

no code implementations • 1 Oct 2019 • Shahram Ghorbani, Soheil Khorram, John H. L. Hansen

An obvious approach to leverage data from a new domain (e.g., new accented speech) is to first generate a comprehensive dataset of all domains, by combining all available data, and then use this dataset to retrain the acoustic models.

Robust Speech Recognition speech-recognition

Probabilistic Permutation Invariant Training for Speech Separation

no code implementations • 4 Aug 2019 • Midia Yousefi, Soheil Khorram, John H. L. Hansen

Recently proposed Permutation Invariant Training (PIT) addresses this problem by determining the output-label assignment which minimizes the separation error.

Speech Separation
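
As an illustration of the output-label assignment described in the snippet above, here is a minimal sketch of standard Permutation Invariant Training with a mean-squared-error criterion. It is not the probabilistic variant this paper proposes, and the function name `pit_mse` and the toy arrays are hypothetical.

```python
from itertools import permutations

import numpy as np


def pit_mse(estimates, targets):
    """Return the lowest MSE over all output-to-label assignments, with that assignment.

    estimates, targets: arrays of shape (n_sources, n_samples).
    """
    n_sources = estimates.shape[0]
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_sources)):
        # perm[i] is the index of the target assigned to estimate i
        loss = float(np.mean((estimates - targets[list(perm)]) ** 2))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: the separator emits the two sources in swapped order,
# each with a constant offset of 0.1.
targets = np.array([[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]])
estimates = targets[::-1] + 0.1
loss, perm = pit_mse(estimates, targets)
```

The exhaustive search covers every permutation of the sources, which is practical for two or three speakers but grows factorially with the source count.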

Convolutional Neural Network-based Speech Enhancement for Cochlear Implant Recipients

no code implementations • 3 Jul 2019 • Nursadul Mamun, Soheil Khorram, John H. L. Hansen

To improve speech enhancement methods for CI users, we propose to perform speech enhancement in a cochlear filter-bank feature space, a feature-set specifically designed for CI users based on CI auditory stimuli.

Speech Enhancement

Exploring OpenStreetMap Availability for Driving Environment Understanding

no code implementations • 11 Mar 2019 • Yang Zheng, Izzat H. Izzat, John H. L. Hansen

An intelligent vehicle should be able to understand the driver's perception of the environment as well as controlling behavior of the vehicle.

Autonomous Driving Semantic Segmentation

UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation

no code implementations • 24 Oct 2016 • Chunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H. L. Hansen

This document briefly describes the systems submitted by the Center for Robust Speech Systems (CRSS) from The University of Texas at Dallas (UTD) to the 2016 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE).

Clustering Dimensionality Reduction +1

KU-ISPL Language Recognition System for NIST 2015 i-Vector Machine Learning Challenge

no code implementations • 21 Sep 2016 • Suwon Shon, Seongkyu Mun, John H. L. Hansen, Hanseok Ko

The experimental results show that the use of duration and score fusion improves language recognition performance by 5% relative in LRiMLC15 cost.

BIG-bench Machine Learning
