Search Results for author: Minhua Wu

Found 9 papers, 0 papers with code

Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning

no code implementations • 5 Jan 2019 • Ladislav Mošner, Minhua Wu, Anirudh Raju, Sree Hari Krishnan Parthasarathi, Kenichi Kumatani, Shiva Sundaram, Roland Maas, Björn Hoffmeister

For real-world speech recognition applications, noise robustness is still a challenge.

Automatic Speech Recognition (ASR) +1

Fully Learnable Front-End for Multi-Channel Acoustic Modeling using Semi-Supervised Learning

no code implementations • 1 Feb 2020 • Sanna Wager, Aparna Khare, Minhua Wu, Kenichi Kumatani, Shiva Sundaram

Using a large offline teacher model trained on beamformed audio, we trained a simpler multi-channel student acoustic model used in the speech recognition system.

Automatic Speech Recognition (ASR) +1

Robust Multi-channel Speech Recognition using Frequency Aligned Network

no code implementations • 6 Feb 2020 • Taejin Park, Kenichi Kumatani, Minhua Wu, Shiva Sundaram

In this paper, we further develop this idea and use a frequency aligned network for robust multi-channel automatic speech recognition (ASR).

Automatic Speech Recognition (ASR) +2

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

no code implementations • 9 Mar 2021 • Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations in a way similar to a VQ-VAE model.

Quantization Representation Learning +1

Attention-based Neural Beamforming Layers for Multi-channel Speech Recognition

no code implementations • 12 May 2021 • Bhargav Pulugundla, Yang Gao, Brian King, Gokce Keskin, Harish Mallidi, Minhua Wu, Jasha Droppo, Roland Maas

The end-to-end 2D Conv-Attention model is compared with multi-head self-attention and superdirective-based neural beamformers.

Speech Recognition

Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End

no code implementations • 14 May 2021 • Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo

On the other hand, a streaming system using per-frame intent posteriors as extra inputs for the RNN-T ASR system yields a 3.33% relative WERR.

Automatic Speech Recognition (ASR) +1

Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio

no code implementations • 4 Jun 2021 • Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas

An ASR model that operates on both primary and auxiliary data can achieve better accuracy than a primary-only solution, and a model that can serve both primary-only (PO) and primary-plus-auxiliary (PPA) modes is highly desirable.

Automatic Speech Recognition (ASR) +1

Guided contrastive self-supervised pre-training for automatic speech recognition

no code implementations • 22 Oct 2022 • Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas

Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model.

Automatic Speech Recognition (ASR) +2

Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

no code implementations • 26 Jan 2024 • Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran

We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM).

Language Modelling Large Language Model
