Search Results for author: Minhua Wu

Found 9 papers, 0 papers with code

Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

no code implementations26 Jan 2024 Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran

We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM).

Language Modelling Large Language Model

Guided contrastive self-supervised pre-training for automatic speech recognition

no code implementations22 Oct 2022 Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas

Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio

no code implementations4 Jun 2021 Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas

An ASR model that operates on both primary and auxiliary data can achieve better accuracy compared to a primary-only solution; and a model that can serve both primary-only (PO) and primary-plus-auxiliary (PPA) modes is highly desirable.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

no code implementations9 Mar 2021 Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2. 0 network from the quantized representations in a way similar to a VQ-VAE model.

Quantization Representation Learning +1

Robust Multi-channel Speech Recognition using Frequency Aligned Network

no code implementations6 Feb 2020 Taejin Park, Kenichi Kumatani, Minhua Wu, Shiva Sundaram

In this paper, we further develop this idea and use frequency aligned network for robust multi-channel automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Fully Learnable Front-End for Multi-Channel Acoustic Modeling using Semi-Supervised Learning

no code implementations1 Feb 2020 Sanna Wager, Aparna Khare, Minhua Wu, Kenichi Kumatani, Shiva Sundaram

Using a large offline teacher model trained on beamformed audio, we trained a simpler multi-channel student acoustic model used in the speech recognition system.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Cannot find the paper you are looking for? You can Submit a new open access paper.