Search Results for author: Xiaohui Zhang

Found 31 papers, 9 papers with code

Parallel training of DNNs with Natural Gradient and Parameter Averaging

1 code implementation • 27 Oct 2014 • Daniel Povey, Xiaohui Zhang, Sanjeev Khudanpur

However, we have another method, an approximate and efficient implementation of Natural Gradient for Stochastic Gradient Descent (NG-SGD), which seems to allow our periodic-averaging method to work well, as well as substantially improving the convergence of SGD on a single machine.

Speech Recognition

Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework

no code implementations • 12 Jun 2017 • Xiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur

Speech recognition systems for irregularly-spelled languages like English normally require hand-written pronunciations.

Speech Recognition

Automated vehicle's behavior decision making using deep reinforcement learning and high-fidelity simulation environment

no code implementations • 17 Apr 2018 • Yingjun Ye, Xiaohui Zhang, Jian Sun

Therefore, a framework of the decision-making training and learning is put forward in this paper.

Decision Making reinforcement-learning +1

Multilingual Graphemic Hybrid ASR with Massive Data Augmentation

no code implementations • LREC 2020 • Chunxi Liu, Qiaochu Zhang, Xiaohui Zhang, Kritika Singh, Yatharth Saraf, Geoffrey Zweig

Towards developing high-performing ASR for low-resource languages, approaches to address the lack of resources are to make use of data from multiple languages, and to augment the training data by creating acoustic variations.

Data Augmentation

From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition

no code implementations • 2 Oct 2019 • Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fügen, Geoffrey Zweig, Michael L. Seltzer

There is an implicit assumption that traditional hybrid approaches for automatic speech recognition (ASR) cannot directly model graphemes and need to rely on phonetic lexicons to get competitive performance, especially on English which has poor grapheme-phoneme correspondence.

Automatic Speech Recognition (ASR) +2

Transformer-based Acoustic Modeling for Hybrid Speech Recognition

no code implementations • 22 Oct 2019 • Yongqiang Wang, Abdel-rahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer

We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition.

Ranked #23 on Speech Recognition on LibriSpeech test-other (using extra training data)

Language Modelling speech-recognition +1

Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks

1 code implementation • 23 Oct 2019 • Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig

As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses, we introduce intermediate model heads and loss function.

Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces

no code implementations • 19 May 2020 • Frank Zhang, Yongqiang Wang, Xiaohui Zhang, Chunxi Liu, Yatharth Saraf, Geoffrey Zweig

In this work, we first show that on the widely used LibriSpeech benchmark, our transformer-based context-dependent connectionist temporal classification (CTC) system produces state-of-the-art results.

Ranked #17 on Speech Recognition on LibriSpeech test-other (using extra training data)

Speech Recognition

Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR

no code implementations • 9 Nov 2020 • Xiaohui Zhang, Frank Zhang, Chunxi Liu, Kjell Schubert, Julian Chan, Pradyot Prakash, Jun Liu, Ching-Feng Yeh, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig

In this work, to measure the accuracy and efficiency for a latency-controlled streaming automatic speech recognition (ASR) application, we perform comprehensive evaluations on three popular training criteria: LF-MMI, CTC and RNN-T.

Automatic Speech Recognition (ASR) +2

Impact of deep learning-based image super-resolution on binary signal detection

no code implementations • 6 Jul 2021 • Xiaohui Zhang, Varun A. Kelkar, Jason Granstedt, Hua Li, Mark A. Anastasio

The presented study highlights the urgent need for the objective assessment of DL-SR methods and suggests avenues for improving their efficacy in medical imaging applications.

Generative Adversarial Network Image Super-Resolution

On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models

no code implementations • 9 Jul 2021 • Xiaohui Zhang, Vimal Manohar, David Zhang, Frank Zhang, Yangyang Shi, Nayan Singhal, Julian Chan, Fuchun Peng, Yatharth Saraf, Mike Seltzer

Hybrid automatic speech recognition (ASR) models are typically sequentially trained with CTC or LF-MMI criteria.

Automatic Speech Recognition (ASR) +1

Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution

no code implementations • 7 Oct 2021 • Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, Xiaohui Zhang, Chunxi Liu, Ke Li, Yuan Shangguan, Varun Nagaraja, Ozlem Kalinli, Mike Seltzer

This paper improves the streaming transformer transducer for speech recognition by using non-causal convolution.

Speech Recognition

Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings

no code implementations • 7 Oct 2021 • Jialu Li, Vimal Manohar, Pooja Chitkara, Andros Tjandra, Michael Picheny, Frank Zhang, Xiaohui Zhang, Yatharth Saraf

Domain-adversarial training (DAT) and multi-task learning (MTL) are two common approaches for building accent-robust ASR models.

Automatic Speech Recognition (ASR) +2

Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet

no code implementations • 15 Oct 2021 • Haichuan Yang, Yuan Shangguan, Dilin Wang, Meng Li, Pierce Chuang, Xiaohui Zhang, Ganesh Venkatesh, Ozlem Kalinli, Vikas Chandra

From wearables to powerful smart devices, modern automatic speech recognition (ASR) models run on a variety of edge devices with different computational budgets.

Automatic Speech Recognition (ASR) +2

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

no code implementations • 18 Nov 2021 • Chunxi Liu, Michael Picheny, Leda Sari, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf

This paper presents initial Speech Recognition results on "Casual Conversations" -- a publicly released 846 hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone.

Automatic Speech Recognition (ASR) +2

A novel adversarial learning strategy for medical image classification

no code implementations • 23 Jun 2022 • Zong Fan, Xiaohui Zhang, Jacob A. Gasienica, Jennifer Potts, Su Ruan, Wade Thorstad, Hiram Gay, Pengfei Song, Xiaowei Wang, Hua Li

Deep learning (DL) techniques have been extensively utilized for medical image classification.

Generative Adversarial Network Image Classification +1

Joint localization and classification of breast tumors on ultrasound images using a novel auxiliary attention-based framework

no code implementations • 11 Oct 2022 • Zong Fan, Ping Gong, Shanshan Tang, Christine U. Lee, Xiaohui Zhang, Pengfei Song, Shigao Chen, Hua Li

By use of the attention mechanism, the auxiliary lesion-aware network can optimize multi-scale intermediate feature maps and extract rich semantic information to improve classification and localization performance.

Classification Lesion Detection

Anchored Speech Recognition with Neural Transducers

no code implementations • 20 Oct 2022 • Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli

In this paper, we investigate anchored speech recognition to make neural transducers robust to background speech.

Speech Recognition

TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio

no code implementations • 4 Apr 2023 • Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson, Buye Xu

To enable this, a variety of metrics to measure quality and intelligibility under different assumptions have been developed.

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

1 code implementation • 10 Apr 2023 • Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the broadening interests of the spoken language translation community.

Benchmarking Simultaneous Speech-to-Text Translation +2

Scaling Speech Technology to 1,000+ Languages

3 code implementations • arXiv 2023 • Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli

Expanding the language coverage of speech technology has the potential to improve access to information for many more people.

Automatic Speech Recognition Language Identification +4

Adaptive Fake Audio Detection with Low-Rank Model Squeezing

no code implementations • 8 Jun 2023 • Xiaohui Zhang, Jiangyan Yi, JianHua Tao, Chenlong Wang, Le Xu, Ruibo Fu

During the inference stage, these adaptation matrices are combined with the existing model to generate the final prediction output.

Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection

no code implementations • 9 Jun 2023 • Chenglong Wang, Jiangyan Yi, Xiaohui Zhang, JianHua Tao, Le Xu, Ruibo Fu

Self-supervised speech models are a rapidly developing research topic in fake audio detection.

Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection

1 code implementation • 7 Aug 2023 • Xiaohui Zhang, Jiangyan Yi, JianHua Tao, Chenglong Wang, Chuyuan Zhang

The orthogonal weight modification to overcome catastrophic forgetting does not consider the similarity of genuine audio across different datasets.

Continual Learning Speech Emotion Recognition

Exploring Speech Enhancement for Low-resource Speech Synthesis

no code implementations • 19 Sep 2023 • Zhaoheng Ni, Sravya Popuri, Ning Dong, Kohei Saijo, Xiaohui Zhang, Gael Le Lan, Yangyang Shi, Vikas Chandra, Changhan Wang

High-quality and intelligible speech is essential to text-to-speech (TTS) model training; however, obtaining high-quality data for low-resource languages is challenging and expensive.

Automatic Speech Recognition (ASR) +3

Deep learning-based image super-resolution of a novel end-expandable optical fiber probe for application in esophageal cancer diagnostics

no code implementations • 3 Oct 2023 • Xiaohui Zhang, Mimi Tan, Mansour Nabil, Richa Shukla, Shaleen Vasavada, Sharmila Anandasabapathy, Mark A. Anastasio, Elena Petrova

Aim: To improve the efficiency of endoscopic screening, we proposed a novel end-expandable endoscopic optical fiber probe for larger field of visualization and employed a deep learning-based image super-resolution (DL-SR) method to overcome the issue of limited sampling capability.

Image Super-Resolution Specificity

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

1 code implementation • 27 Oct 2023 • Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

TorchAudio is an open-source audio and speech processing library built for PyTorch.

Self-Supervised Learning Speech Enhancement +2

Multimodal Representation Learning by Alternating Unimodal Adaptation

1 code implementation • 17 Nov 2023 • Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao

This optimization process is controlled by a gradient modification mechanism to prevent the shared head from losing previously acquired information.

Representation Learning

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection

1 code implementation • 15 Dec 2023 • Xiaohui Zhang, Jiangyan Yi, Chenglong Wang, Chuyuan Zhang, Siding Zeng, JianHua Tao

The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms.

Continual Learning DeepFake Detection +3

Attention-Based CNN-BiLSTM for Sleep State Classification of Spatiotemporal Wide-Field Calcium Imaging Data

1 code implementation • 16 Jan 2024 • Xiaohui Zhang, Eric C. Landsness, Hanyang Miao, Wei Chen, Michelle Tang, Lindsey M. Brier, Joseph P. Culver, Jin-Moo Lee, Mark A. Anastasio

Comparison with Existing Method: On a 3-hour WFCI recording, the CNN-BiLSTM achieved a kappa of 0.67, comparable to a kappa of 0.65 corresponding to the human EEG/EMG-based scoring.

EEG

A Bionic Data-driven Approach for Long-distance Underwater Navigation with Anomaly Resistance

no code implementations • 6 Feb 2024 • Songnan Yang, Xiaohui Zhang, Shiliang Zhang, Xuehui Ma, Wenqi Bai, Yushuai Li, TingWen Huang

We integrate the developed mechanism with the TA-LSTM, and calibrate the predicted heading angles to gain resistance against geomagnetic anomalies.
