Search Results for author: Xiaohui Zhang

Found 15 papers, 1 paper with code

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

no code implementations18 Nov 2021 Chunxi Liu, Michael Picheny, Leda Sari, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf

This paper presents initial Speech Recognition results on "Casual Conversations" -- a publicly released 846-hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone.

Fairness Speech Recognition

Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet

no code implementations15 Oct 2021 Haichuan Yang, Yuan Shangguan, Dilin Wang, Meng Li, Pierce Chuang, Xiaohui Zhang, Ganesh Venkatesh, Ozlem Kalinli, Vikas Chandra

From wearables to powerful smart devices, modern automatic speech recognition (ASR) models run on a variety of edge devices with different computational budgets.

Speech Recognition

Impact of deep learning-based image super-resolution on binary signal detection

no code implementations6 Jul 2021 Xiaohui Zhang, Varun A. Kelkar, Jason Granstedt, Hua Li, Mark A. Anastasio

The presented study highlights the urgent need for the objective assessment of DL-SR methods and suggests avenues for improving their efficacy in medical imaging applications.

Image Super-Resolution

Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR

no code implementations9 Nov 2020 Xiaohui Zhang, Frank Zhang, Chunxi Liu, Kjell Schubert, Julian Chan, Pradyot Prakash, Jun Liu, Ching-Feng Yeh, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig

In this work, to measure the accuracy and efficiency for a latency-controlled streaming automatic speech recognition (ASR) application, we perform comprehensive evaluations on three popular training criteria: LF-MMI, CTC and RNN-T.

Speech Recognition
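Of the three criteria benchmarked above, CTC is the most direct to reproduce; a minimal sketch using PyTorch's built-in `CTCLoss` (the shapes, vocabulary size, and random inputs here are illustrative assumptions, not the paper's setup):

```python
import torch
import torch.nn as nn

# Toy dimensions (illustrative only): 50 frames, batch of 2, 29 output symbols
T, N, C = 50, 2, 29
ctc_loss = nn.CTCLoss(blank=0)  # index 0 reserved for the CTC blank symbol

# Per-frame log-probabilities over output symbols, shape (T, N, C)
log_probs = torch.randn(T, N, C).log_softmax(dim=2)

# Random target label sequences (values 1..C-1, avoiding the blank index)
targets = torch.randint(1, C, (N, 20), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

# CTC marginalizes over all alignments of targets to the T frames
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
```

RNN-T and LF-MMI losses require alignment lattices or a joint network and are not part of core PyTorch, which is one practical reason CTC is often the baseline criterion.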

Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces

no code implementations19 May 2020 Frank Zhang, Yongqiang Wang, Xiaohui Zhang, Chunxi Liu, Yatharth Saraf, Geoffrey Zweig

In this work, we first show that on the widely used LibriSpeech benchmark, our transformer-based context-dependent connectionist temporal classification (CTC) system produces state-of-the-art results.

Speech Recognition

Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks

no code implementations23 Oct 2019 Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig

As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses, we introduce intermediate model heads and loss functions.
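The intermediate-head idea can be sketched as follows: an auxiliary classifier is attached partway through the encoder stack and its loss is summed with the final loss. This is a hypothetical toy model for illustration, not the paper's architecture or hyperparameters:

```python
import torch
import torch.nn as nn

class IteratedLossEncoder(nn.Module):
    """Toy transformer encoder with one auxiliary head at an
    intermediate layer, illustrating the iterated-loss idea."""
    def __init__(self, dim=64, num_classes=29, num_layers=4, aux_layer=2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        self.aux_layer = aux_layer
        self.aux_head = nn.Linear(dim, num_classes)    # intermediate head
        self.final_head = nn.Linear(dim, num_classes)  # final head

    def forward(self, x):
        aux_logits = None
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i + 1 == self.aux_layer:
                aux_logits = self.aux_head(x)  # partial-hypothesis prediction
        return aux_logits, self.final_head(x)

model = IteratedLossEncoder()
x = torch.randn(2, 10, 64)  # (batch, time, dim)
aux_logits, final_logits = model(x)

targets = torch.randint(0, 29, (2, 10))
ce = nn.CrossEntropyLoss()
# Iterated loss: weighted sum of the intermediate and final losses
# (the 0.3 weight is an arbitrary choice for this sketch)
loss = 0.3 * ce(aux_logits.transpose(1, 2), targets) + ce(final_logits.transpose(1, 2), targets)
```

Because gradients flow into the middle of the stack through the auxiliary head, lower layers receive a more direct training signal, which is the usual motivation for this family of techniques.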

From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition

no code implementations2 Oct 2019 Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fügen, Geoffrey Zweig, Michael L. Seltzer

There is an implicit assumption that traditional hybrid approaches for automatic speech recognition (ASR) cannot directly model graphemes and need to rely on phonetic lexicons to get competitive performance, especially on English, which has poor grapheme-phoneme correspondence.

Speech Recognition

Multilingual Graphemic Hybrid ASR with Massive Data Augmentation

no code implementations LREC 2020 Chunxi Liu, Qiaochu Zhang, Xiaohui Zhang, Kritika Singh, Yatharth Saraf, Geoffrey Zweig

Towards developing high-performing ASR for low-resource languages, two approaches to addressing the lack of resources are to make use of data from multiple languages and to augment the training data by creating acoustic variations.

Data Augmentation

Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework

no code implementations12 Jun 2017 Xiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur

Speech recognition systems for irregularly-spelled languages like English normally require hand-written pronunciations.

Speech Recognition

Parallel training of DNNs with Natural Gradient and Parameter Averaging

1 code implementation27 Oct 2014 Daniel Povey, Xiaohui Zhang, Sanjeev Khudanpur

However, we have another method, an approximate and efficient implementation of Natural Gradient for Stochastic Gradient Descent (NG-SGD), which appears to allow our periodic-averaging method to work well, while also substantially improving the convergence of SGD on a single machine.

Speech Recognition
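The periodic parameter-averaging part of this scheme can be sketched as follows (a simplified illustration that omits the natural-gradient preconditioning, which is the paper's key ingredient; the worker setup and update stand-in are hypothetical):

```python
import copy
import torch
import torch.nn as nn

def average_parameters(workers):
    """Average the parameters of several model replicas in place,
    as in periodic model averaging for data-parallel training."""
    with torch.no_grad():
        for params in zip(*(w.parameters() for w in workers)):
            mean = torch.stack(list(params)).mean(dim=0)
            for p in params:
                p.copy_(mean)

# Hypothetical setup: 4 worker replicas initialized from the same model
base = nn.Linear(8, 2)
workers = [copy.deepcopy(base) for _ in range(4)]

# Each worker would run SGD on its own data shard between averaging
# rounds; random perturbations stand in for those local updates here.
with torch.no_grad():
    for w in workers:
        for p in w.parameters():
            p.add_(torch.randn_like(p) * 0.01)

average_parameters(workers)  # synchronize: all replicas now share averaged weights
```

Averaging only periodically (rather than synchronizing gradients every step) keeps communication cost low, and the NG-SGD preconditioning is what the authors credit with making this infrequent averaging converge well.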
