Search Results for author: Xiaohui Zhang

Found 31 papers, 8 papers with code

A Bionic Data-driven Approach for Long-distance Underwater Navigation with Anomaly Resistance

no code implementations • 6 Feb 2024 • Songnan Yang, Xiaohui Zhang, Shiliang Zhang, Xuehui Ma, Wenqi Bai, Yushuai Li, TingWen Huang

We integrate the developed mechanism with the TA-LSTM, and calibrate the predicted heading angles to gain resistance against geomagnetic anomalies.

Attention-Based CNN-BiLSTM for Sleep State Classification of Spatiotemporal Wide-Field Calcium Imaging Data

1 code implementation • 16 Jan 2024 • Xiaohui Zhang, Eric C. Landsness, Hanyang Miao, Wei Chen, Michelle Tang, Lindsey M. Brier, Joseph P. Culver, Jin-Moo Lee, Mark A. Anastasio

Comparison with Existing Method: On a 3-hour WFCI recording, the CNN-BiLSTM achieved a kappa of 0.67, comparable to a kappa of 0.65 corresponding to the human EEG/EMG-based scoring.

EEG

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection

1 code implementation • 15 Dec 2023 • Xiaohui Zhang, Jiangyan Yi, Chenglong Wang, Chuyuan Zhang, Siding Zeng, JianHua Tao

The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms.

Continual Learning, DeepFake Detection, +3

Multimodal Representation Learning by Alternating Unimodal Adaptation

no code implementations • 17 Nov 2023 • Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao

This optimization process is controlled by a gradient modification mechanism to prevent the shared head from losing previously acquired information.

Representation Learning

Deep learning-based image super-resolution of a novel end-expandable optical fiber probe for application in esophageal cancer diagnostics

no code implementations • 3 Oct 2023 • Xiaohui Zhang, Mimi Tan, Mansour Nabil, Richa Shukla, Shaleen Vasavada, Sharmila Anandasabapathy, Mark A. Anastasio, Elena Petrova

Aim: To improve the efficiency of endoscopic screening, we proposed a novel end-expandable endoscopic optical fiber probe for a larger field of visualization and employed a deep learning-based image super-resolution (DL-SR) method to overcome the issue of limited sampling capability.

Image Super-Resolution, Specificity

Exploring Speech Enhancement for Low-resource Speech Synthesis

no code implementations • 19 Sep 2023 • Zhaoheng Ni, Sravya Popuri, Ning Dong, Kohei Saijo, Xiaohui Zhang, Gael Le Lan, Yangyang Shi, Vikas Chandra, Changhan Wang

High-quality and intelligible speech is essential to text-to-speech (TTS) model training; however, obtaining high-quality data for low-resource languages is challenging and expensive.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection

1 code implementation • 7 Aug 2023 • Xiaohui Zhang, Jiangyan Yi, JianHua Tao, Chenglong Wang, Chuyuan Zhang

The orthogonal weight modification to overcome catastrophic forgetting does not consider the similarity of genuine audio across different datasets.

Continual Learning, Speech Emotion Recognition

Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection

no code implementations • 9 Jun 2023 • Chenglong Wang, Jiangyan Yi, Xiaohui Zhang, JianHua Tao, Le Xu, Ruibo Fu

Self-supervised speech models are a rapidly developing research topic in fake audio detection.

Adaptive Fake Audio Detection with Low-Rank Model Squeezing

no code implementations • 8 Jun 2023 • Xiaohui Zhang, Jiangyan Yi, JianHua Tao, Chenlong Wang, Le Xu, Ruibo Fu

During the inference stage, these adaptation matrices are combined with the existing model to generate the final prediction output.

TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio

no code implementations • 4 Apr 2023 • Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson, Buye Xu

To enable this, a variety of metrics to measure quality and intelligibility under different assumptions have been developed.

Anchored Speech Recognition with Neural Transducers

no code implementations • 20 Oct 2022 • Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli

In this paper, we investigate anchored speech recognition to make neural transducers robust to background speech.

speech-recognition, Speech Recognition

Joint localization and classification of breast tumors on ultrasound images using a novel auxiliary attention-based framework

no code implementations • 11 Oct 2022 • Zong Fan, Ping Gong, Shanshan Tang, Christine U. Lee, Xiaohui Zhang, Pengfei Song, Shigao Chen, Hua Li

By use of the attention mechanism, the auxiliary lesion-aware network can optimize multi-scale intermediate feature maps and extract rich semantic information to improve classification and localization performance.

Classification, Lesion Detection

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

no code implementations • 18 Nov 2021 • Chunxi Liu, Michael Picheny, Leda Sari, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf

This paper presents initial speech recognition results on "Casual Conversations", a publicly released 846-hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

Impact of deep learning-based image super-resolution on binary signal detection

no code implementations • 6 Jul 2021 • Xiaohui Zhang, Varun A. Kelkar, Jason Granstedt, Hua Li, Mark A. Anastasio

The presented study highlights the urgent need for the objective assessment of DL-SR methods and suggests avenues for improving their efficacy in medical imaging applications.

Generative Adversarial Network, Image Super-Resolution

Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR

no code implementations • 9 Nov 2020 • Xiaohui Zhang, Frank Zhang, Chunxi Liu, Kjell Schubert, Julian Chan, Pradyot Prakash, Jun Liu, Ching-Feng Yeh, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig

In this work, to measure the accuracy and efficiency for a latency-controlled streaming automatic speech recognition (ASR) application, we perform comprehensive evaluations on three popular training criteria: LF-MMI, CTC and RNN-T.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces

no code implementations • 19 May 2020 • Frank Zhang, Yongqiang Wang, Xiaohui Zhang, Chunxi Liu, Yatharth Saraf, Geoffrey Zweig

In this work, we first show that on the widely used LibriSpeech benchmark, our transformer-based context-dependent connectionist temporal classification (CTC) system produces state-of-the-art results.

Speech Recognition

Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks

1 code implementation • 23 Oct 2019 • Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig

As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses, we introduce intermediate model heads and loss functions.

From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition

no code implementations • 2 Oct 2019 • Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fügen, Geoffrey Zweig, Michael L. Seltzer

There is an implicit assumption that traditional hybrid approaches for automatic speech recognition (ASR) cannot directly model graphemes and need to rely on phonetic lexicons to get competitive performance, especially on English, which has poor grapheme-phoneme correspondence.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

Multilingual Graphemic Hybrid ASR with Massive Data Augmentation

no code implementations • LREC 2020 • Chunxi Liu, Qiaochu Zhang, Xiaohui Zhang, Kritika Singh, Yatharth Saraf, Geoffrey Zweig

Towards developing high-performing ASR for low-resource languages, approaches to address the lack of resources are to make use of data from multiple languages, and to augment the training data by creating acoustic variations.

Data Augmentation

Parallel training of DNNs with Natural Gradient and Parameter Averaging

1 code implementation • 27 Oct 2014 • Daniel Povey, Xiaohui Zhang, Sanjeev Khudanpur

However, we have another method, an approximate and efficient implementation of Natural Gradient for Stochastic Gradient Descent (NG-SGD), which seems to allow our periodic-averaging method to work well, as well as substantially improving the convergence of SGD on a single machine.

speech-recognition, Speech Recognition
