Search Results for author: Xiaohui Zhang

Found 41 papers, 11 papers with code

Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement

no code implementations11 Feb 2025 Xueyao Zhang, Xiaohui Zhang, Kainan Peng, Zhenyu Tang, Vimal Manohar, Yingru Liu, Jeff Hwang, Dangna Li, Yuhao Wang, Julian Chan, Yuan Huang, Zhizheng Wu, Mingbo Ma

However, existing methods rely heavily on annotated data, and struggle with effectively disentangling timbre and style, leading to challenges in achieving controllable generation, especially in zero-shot scenarios.

Disentanglement Text to Speech +1

Geomagnetic and Inertial Combined Navigation Approach Based on Flexible Correction-Model Predictive Control Algorithm

no code implementations8 Dec 2024 Xiaohui Zhang, Xingming Li, Songnan Yang, Wenqi Bai, Yirong Lan

This paper proposes a geomagnetic and inertial combined navigation approach based on the flexible correction-model predictive control algorithm (Fc-MPC).

Model Predictive Control

Long-distance Geomagnetic Navigation in GNSS-denied Environments with Deep Reinforcement Learning

no code implementations21 Oct 2024 Wenqi Bai, Xiaohui Zhang, Shiliang Zhang, Songnan Yang, Yushuai Li, TingWen Huang

Geomagnetic navigation has drawn increasing attention with its capacity in navigating through complex environments and its independence from external navigation services like global navigation satellite systems (GNSS).

Deep Reinforcement Learning

Efficient Streaming LLM for Speech Recognition

no code implementations2 Oct 2024 Junteng Jia, Gil Keren, Wei Zhou, Egor Lakomkin, Xiaohui Zhang, Chunyang Wu, Frank Seide, Jay Mahadeokar, Ozlem Kalinli

Recent works have shown that prompting large language models with audio encodings can unlock speech recognition capabilities.

Decoder speech-recognition +1

Efficient Neural Common Neighbor for Temporal Graph Link Prediction

1 code implementation12 Jun 2024 Xiaohui Zhang, Yanbo Wang, Xiyuan Wang, Muhan Zhang

However, such methods focus on learning individual node representations, but overlook the pairwise representation learning nature of link prediction and fail to capture the important pairwise features of links such as common neighbors (CN).

Link Prediction Representation Learning

Identifying Functional Brain Networks of Spatiotemporal Wide-Field Calcium Imaging Data via a Long Short-Term Memory Autoencoder

no code implementations30 May 2024 Xiaohui Zhang, Eric C Landsness, Lindsey M Brier, Wei Chen, Michelle J. Tang, Hanyang Miao, Jin-Moo Lee, Mark A. Anastasio, Joseph P. Culver

The goal of this study is to elucidate and illustrate, qualitatively and quantitatively, the FBNs identified by use of the LSTM-AER method and compare them to those from traditional SBC and ICA.

Representation Learning

A Bionic Data-driven Approach for Long-distance Underwater Navigation with Anomaly Resistance

no code implementations6 Feb 2024 Songnan Yang, Xiaohui Zhang, Shiliang Zhang, Xuehui Ma, Wenqi Bai, Yushuai Li, TingWen Huang

We integrate the developed mechanism with the TA-LSTM, and calibrate the predicted heading angles to gain resistance against geomagnetic anomalies.

Attention-Based CNN-BiLSTM for Sleep State Classification of Spatiotemporal Wide-Field Calcium Imaging Data

1 code implementation16 Jan 2024 Xiaohui Zhang, Eric C. Landsness, Hanyang Miao, Wei Chen, Michelle Tang, Lindsey M. Brier, Joseph P. Culver, Jin-Moo Lee, Mark A. Anastasio

Comparison with Existing Method: On a 3-hour WFCI recording, the CNN-BiLSTM achieved a kappa of 0. 67, comparable to a kappa of 0. 65 corresponding to the human EEG/EMG-based scoring.

EEG

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection

1 code implementation15 Dec 2023 Xiaohui Zhang, Jiangyan Yi, Chenglong Wang, Chuyuan Zhang, Siding Zeng, JianHua Tao

The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms.

Audio Deepfake Detection Continual Learning +3

Multimodal Representation Learning by Alternating Unimodal Adaptation

no code implementations CVPR 2024 Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao

This optimization process is controlled by a gradient modification mechanism to prevent the shared head from losing previously acquired information.

Representation Learning

Deep learning-based image super-resolution of a novel end-expandable optical fiber probe for application in esophageal cancer diagnostics

no code implementations3 Oct 2023 Xiaohui Zhang, Mimi Tan, Mansour Nabil, Richa Shukla, Shaleen Vasavada, Sharmila Anandasabapathy, Mark A. Anastasio, Elena Petrova

Aim: To improve the efficiency of endoscopic screening, we proposed a novel end-expandable endoscopic optical fiber probe for larger field of visualization and employed a deep learning-based image super-resolution (DL-SR) method to overcome the issue of limited sampling capability.

Image Super-Resolution Specificity

Exploring Speech Enhancement for Low-resource Speech Synthesis

no code implementations19 Sep 2023 Zhaoheng Ni, Sravya Popuri, Ning Dong, Kohei Saijo, Xiaohui Zhang, Gael Le Lan, Yangyang Shi, Vikas Chandra, Changhan Wang

High-quality and intelligible speech is essential to text-to-speech (TTS) model training, however, obtaining high-quality data for low-resource languages is challenging and expensive.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection

1 code implementation7 Aug 2023 Xiaohui Zhang, Jiangyan Yi, JianHua Tao, Chenglong Wang, Chuyuan Zhang

The orthogonal weight modification to overcome catastrophic forgetting does not consider the similarity of genuine audio across different datasets.

Continual Learning Speech Emotion Recognition

Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection

no code implementations9 Jun 2023 Chenglong Wang, Jiangyan Yi, Xiaohui Zhang, JianHua Tao, Le Xu, Ruibo Fu

Self-supervised speech models are a rapidly developing research topic in fake audio detection.

Adaptive Fake Audio Detection with Low-Rank Model Squeezing

no code implementations8 Jun 2023 Xiaohui Zhang, Jiangyan Yi, JianHua Tao, Chenlong Wang, Le Xu, Ruibo Fu

During the inference stage, these adaptation matrices are combined with the existing model to generate the final prediction output.

TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio

no code implementations4 Apr 2023 Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson, Buye Xu

To enable this, a variety of metrics to measure quality and intelligibility under different assumptions have been developed.

Anchored Speech Recognition with Neural Transducers

no code implementations20 Oct 2022 Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli

In this paper, we investigate anchored speech recognition to make neural transducers robust to background speech.

speech-recognition Speech Recognition

Joint localization and classification of breast tumors on ultrasound images using a novel auxiliary attention-based framework

no code implementations11 Oct 2022 Zong Fan, Ping Gong, Shanshan Tang, Christine U. Lee, Xiaohui Zhang, Pengfei Song, Shigao Chen, Hua Li

By use of the attention mechanism, the auxiliary lesion-aware network can optimize multi-scale intermediate feature maps and extract rich semantic information to improve classification and localization performance.

Classification Lesion Detection

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

no code implementations18 Nov 2021 Chunxi Liu, Michael Picheny, Leda Sari, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf

This paper presents initial Speech Recognition results on "Casual Conversations" -- a publicly released 846 hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Impact of deep learning-based image super-resolution on binary signal detection

no code implementations6 Jul 2021 Xiaohui Zhang, Varun A. Kelkar, Jason Granstedt, Hua Li, Mark A. Anastasio

The presented study highlights the urgent need for the objective assessment of DL-SR methods and suggests avenues for improving their efficacy in medical imaging applications.

Generative Adversarial Network Image Super-Resolution

Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR

no code implementations9 Nov 2020 Xiaohui Zhang, Frank Zhang, Chunxi Liu, Kjell Schubert, Julian Chan, Pradyot Prakash, Jun Liu, Ching-Feng Yeh, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig

In this work, to measure the accuracy and efficiency for a latency-controlled streaming automatic speech recognition (ASR) application, we perform comprehensive evaluations on three popular training criteria: LF-MMI, CTC and RNN-T.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces

no code implementations19 May 2020 Frank Zhang, Yongqiang Wang, Xiaohui Zhang, Chunxi Liu, Yatharth Saraf, Geoffrey Zweig

In this work, we first show that on the widely used LibriSpeech benchmark, our transformer-based context-dependent connectionist temporal classification (CTC) system produces state-of-the-art results.

Ranked #20 on Speech Recognition on LibriSpeech test-other (using extra training data)

Speech Recognition

Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks

2 code implementations23 Oct 2019 Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig

As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses we introduce intermediate model heads and loss function.

From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition

no code implementations2 Oct 2019 Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fügen, Geoffrey Zweig, Michael L. Seltzer

There is an implicit assumption that traditional hybrid approaches for automatic speech recognition (ASR) cannot directly model graphemes and need to rely on phonetic lexicons to get competitive performance, especially on English which has poor grapheme-phoneme correspondence.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Multilingual Graphemic Hybrid ASR with Massive Data Augmentation

no code implementations LREC 2020 Chunxi Liu, Qiaochu Zhang, Xiaohui Zhang, Kritika Singh, Yatharth Saraf, Geoffrey Zweig

Towards developing high-performing ASR for low-resource languages, approaches to address the lack of resources are to make use of data from multiple languages, and to augment the training data by creating acoustic variations.

Data Augmentation

Parallel training of DNNs with Natural Gradient and Parameter Averaging

1 code implementation27 Oct 2014 Daniel Povey, Xiaohui Zhang, Sanjeev Khudanpur

However, we have another method, an approximate and efficient implementation of Natural Gradient for Stochastic Gradient Descent (NG-SGD), which seems to allow our periodic-averaging method to work well, as well as substantially improving the convergence of SGD on a single machine.

speech-recognition Speech Recognition

Cannot find the paper you are looking for? You can Submit a new open access paper.