Search Results for author: Duc Le

Found 36 papers, 7 papers with code

Seq2seq for Automatic Paraphasia Detection in Aphasic Speech

1 code implementation • 16 Dec 2023 • Matthew Perez, Duc Le, Amrit Romana, Elise Jones, Keli Licata, Emily Mower Provost

In this paper, we propose a novel, sequence-to-sequence (seq2seq) model that is trained end-to-end (E2E) to perform both ASR and paraphasia detection tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Code

StemGen: A music generation model that listens

no code implementations • 14 Dec 2023 • Julian D. Parker, Janne Spijkervet, Katerina Kosta, Furkan Yesiler, Boris Kuznetsov, Ju-Chiang Wang, Matt Avent, Jitong Chen, Duc Le

End-to-end generation of musical audio using deep learning techniques has seen an explosion of activity recently.

Information Retrieval Music Generation +2

Paper
Add Code

A Foundation Model for Music Informatics

1 code implementation • 6 Nov 2023 • Minz Won, Yun-Ning Hung, Duc Le

This paper investigates foundation models tailored for music informatics, a domain currently challenged by the scarcity of labeled data and generalization issues.

Information Retrieval Music Information Retrieval +2

126

Paper
Code

Scaling Up Music Information Retrieval Training with Semi-Supervised Learning

no code implementations • 2 Oct 2023 • Yun-Ning Hung, Ju-Chiang Wang, Minz Won, Duc Le

To our knowledge, this is the first attempt to study the effects of scaling up both model and training data for a variety of MIR tasks.

Information Retrieval Music Information Retrieval +1

Paper
Add Code

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

no code implementations • 22 Jul 2023 • Suyoun Kim, Akshat Shrivastava, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer

End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic parse from speech have become more promising recently.

speech-recognition Speech Recognition +1

Paper
Add Code

Text Generation with Speech Synthesis for ASR Data Augmentation

no code implementations • 22 May 2023 • Zhuangqun Huang, Gil Keren, Ziran Jiang, Shashank Jain, David Goss-Grubbs, Nelson Cheng, Farnaz Abtahi, Duc Le, David Zhang, Antony D'Avirro, Ethan Campbell-Taylor, Jessie Salas, Irina-Elena Veliche, Xi Chen

In this work, we explore text augmentation for ASR using large-scale pre-trained neural networks, and systematically compare those to traditional text augmentation methods.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Improving Fast-slow Encoder based Transducer with Streaming Deliberation

no code implementations • 15 Dec 2022 • Ke Li, Jay Mahadeokar, Jinxi Guo, Yangyang Shi, Gil Keren, Ozlem Kalinli, Michael L. Seltzer, Duc Le

Experiments on Librispeech and in-house data show relative WER reductions (WERRs) from 3% to 5% with a slight increase in model size and negligible extra token emission latency compared with fast-slow encoder based transducer.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities

no code implementations • 10 Nov 2022 • Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Duc Le, Michael L. Seltzer

Later, we use our optimal tokenization strategy to train multiple embedding and output model to further improve our result.

Paper
Add Code

Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers

no code implementations • 2 Nov 2022 • Duc Le, Frank Seide, Yuhao Wang, Yang Li, Kjell Schubert, Ozlem Kalinli, Michael L. Seltzer

We show how factoring the RNN-T's output distribution can significantly reduce the computation cost and power consumption for on-device ASR inference with no loss in accuracy.

Paper
Add Code

Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition

no code implementations • 31 Oct 2022 • Suyoun Kim, Ke Li, Lucas Kabela, Rongqing Huang, Jiedan Zhu, Ozlem Kalinli, Duc Le

In this work, we present our Joint Audio/Text training method for Transformer Rescorer, to leverage unpaired text-only data which is relatively cheaper than paired audio-text data.

speech-recognition Speech Recognition

Paper
Add Code

Robust Singular Values based on L1-norm PCA

no code implementations • 21 Oct 2022 • Duc Le, Panos P. Markopoulos

The L2-norm (sum of squared values) formulation of PCA promotes peripheral data points and, thus, makes PCA sensitive against outliers.

Image Compression

Paper
Add Code

Multimodality Multi-Lead ECG Arrhythmia Classification using Self-Supervised Learning

1 code implementation • 30 Sep 2022 • Thinh Phan, Duc Le, Patel Brijesh, Donald Adjeroh, Jingxian Wu, Morten Olgaard Jensen, Ngan Le

Electrocardiogram (ECG) signal is one of the most effective sources of information mainly employed for the diagnosis and prediction of cardiovascular diseases (CVDs) connected with the abnormalities in heart rhythm.

ECG Classification Self-Knowledge Distillation +3

Paper
Code

Learning ASR pathways: A sparse multilingual ASR model

no code implementations • 13 Sep 2022 • Mu Yang, Andros Tjandra, Chunxi Liu, David Zhang, Duc Le, Ozlem Kalinli

Neural network pruning compresses automatic speech recognition (ASR) models effectively.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

STOP: A dataset for Spoken Task Oriented Semantic Parsing

1 code implementation • 29 Jun 2022 • Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Ahn Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed

Furthermore, in addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance for low-resource domain adaptation of end-to-end SLU systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

29,237

Paper
Code

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

no code implementations • 19 Apr 2022 • Niko Moritz, Frank Seide, Duc Le, Jay Mahadeokar, Christian Fuegen

The two most popular loss functions for streaming end-to-end automatic speech recognition (ASR) are RNN-Transducer (RNN-T) and connectionist temporal classification (CTC).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Deliberation Model for On-Device Spoken Language Understanding

no code implementations • 4 Apr 2022 • Duc Le, Akshat Shrivastava, Paden Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer

We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and a second-pass natural language understanding (NLU) component generates the semantic parse by conditioning on both ASR's text and audio embeddings.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Streaming parallel transducer beam search with fast-slow cascaded encoders

no code implementations • 29 Mar 2022 • Jay Mahadeokar, Yangyang Shi, Ke Li, Duc Le, Jiedan Zhu, Vikas Chandra, Ozlem Kalinli, Michael L Seltzer

Streaming ASR with strict latency constraints is required in many speech recognition applications.

Low-latency processing speech-recognition +1

Paper
Add Code

Neural-FST Class Language Model for End-to-End Speech Recognition

no code implementations • 28 Jan 2022 • Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu, Bo wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer

We propose Neural-FST Class Language Model (NFCLM) for end-to-end speech recognition, a novel method that combines neural network language models (NNLMs) and finite state transducers (FSTs) in a mathematically consistent framework.

Language Modelling speech-recognition +1

Paper
Add Code

Scaling ASR Improves Zero and Few Shot Learning

no code implementations • 10 Nov 2021 • Alex Xiao, Weiyi Zheng, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed

With 4. 5 million hours of English speech from 10 different sources across 120 countries and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric

no code implementations • 11 Oct 2021 • Suyoun Kim, Duc Le, Weiyi Zheng, Tarun Singh, Abhinav Arora, Xiaoyu Zhai, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer

Measuring automatic speech recognition (ASR) system quality is critical for creating user-satisfying voice-driven applications.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition

no code implementations • 6 Apr 2021 • Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer

As speech-enabled devices such as smartphones and smart speakers become increasingly ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems that can run directly on-device; end-to-end (E2E) speech recognition models such as recurrent neural network transducers and their variants have recently emerged as prime candidates for this task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Flexi-Transducer: Optimizing Latency, Accuracy and Compute forMulti-Domain On-Device Scenarios

no code implementations • 6 Apr 2021 • Jay Mahadeokar, Yangyang Shi, Yuan Shangguan, Chunyang Wu, Alex Xiao, Hang Su, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer

In order to achieve flexible and better accuracy and latency trade-offs, the following techniques are used.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding

no code implementations • 5 Apr 2021 • Suyoun Kim, Abhinav Arora, Duc Le, Ching-Feng Yeh, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer

We define SemDist as the distance between a reference and hypothesis pair in a sentence-level embedding space.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +14

Paper
Add Code

Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion

no code implementations • 5 Apr 2021 • Duc Le, Mahaveer Jain, Gil Keren, Suyoun Kim, Yangyang Shi, Jay Mahadeokar, Julian Chan, Yuan Shangguan, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Michael L. Seltzer

How to leverage dynamic contextual information in end-to-end speech recognition has remained an active research area.

Language Modelling speech-recognition +1

Paper
Add Code

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

no code implementations • 5 Apr 2021 • Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer

DET gets similar accuracy as a baseline model with better latency on a large in-house data set by assigning a lightweight encoder for the beginning part of one utterance and a full-size encoder for the rest.

speech-recognition Speech Recognition

Paper
Add Code

Deep Shallow Fusion for RNN-T Personalization

no code implementations • 16 Nov 2020 • Duc Le, Gil Keren, Julian Chan, Jay Mahadeokar, Christian Fuegen, Michael L. Seltzer

End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in particular, have gained significant traction in the automatic speech recognition community in the last few years due to their simplicity, compactness, and excellent performance on generic transcription tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Alignment Restricted Streaming Recurrent Neural Network Transducer

no code implementations • 5 Nov 2020 • Jay Mahadeokar, Yuan Shangguan, Duc Le, Gil Keren, Hang Su, Thong Le, Ching-Feng Yeh, Christian Fuegen, Michael L. Seltzer

There is a growing interest in the speech community in developing Recurrent Neural Network Transducer (RNN-T) models for automatic speech recognition (ASR) applications.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Improving RNN Transducer Based ASR with Auxiliary Tasks

1 code implementation • 5 Nov 2020 • Chunxi Liu, Frank Zhang, Duc Le, Suyoun Kim, Yatharth Saraf, Geoffrey Zweig

End-to-end automatic speech recognition (ASR) models with a single neural network have recently demonstrated state-of-the-art results compared to conventional hybrid speech recognizers.

Ranked #15 on Speech Recognition on LibriSpeech test-clean

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Code

Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer

no code implementations • 26 Oct 2020 • Suyoun Kim, Yuan Shangguan, Jay Mahadeokar, Antoine Bruguier, Christian Fuegen, Michael L. Seltzer, Duc Le

Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech recognition model architectures, has an implicit neural network language model (NNLM) and cannot easily leverage unpaired text data during training.

Language Modelling speech-recognition +1

Paper
Add Code

Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition

1 code implementation • 21 Oct 2020 • Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, Mike Seltzer

For a low latency scenario with an average latency of 80 ms, Emformer achieves WER $3. 01\%$ on test-clean and $7. 09\%$ on test-other.

speech-recognition Speech Recognition

Paper
Code

Classification of Huntington Disease using Acoustic and Lexical Features

no code implementations • 7 Aug 2020 • Matthew Perez, Wenyu Jin, Duc Le, Noelle Carlozzi, Praveen Dayalu, Angela Roberts, Emily Mower Provost

Speech is a critical biomarker for Huntington Disease (HD), with changes in speech increasing in severity as the disease progresses.

Classification General Classification

Paper
Add Code

Weak-Attention Suppression For Transformer Based Speech Recognition

no code implementations • 18 May 2020 • Yangyang Shi, Yongqiang Wang, Chunyang Wu, Christian Fuegen, Frank Zhang, Duc Le, Ching-Feng Yeh, Michael L. Seltzer

Transformers, originally proposed for natural language processing (NLP) tasks, have recently achieved great success in automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Transformer-Transducer: End-to-End Speech Recognition with Self-Attention

1 code implementation • 28 Oct 2019 • Ching-Feng Yeh, Jay Mahadeokar, Kaustubh Kalgaonkar, Yongqiang Wang, Duc Le, Mahaveer Jain, Kjell Schubert, Christian Fuegen, Michael L. Seltzer

We explore options to use Transformer networks in neural transducer for end-to-end speech recognition.

speech-recognition Speech Recognition

Paper
Code

G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR

no code implementations • 22 Oct 2019 • Duc Le, Thilo Koehler, Christian Fuegen, Michael L. Seltzer

Grapheme-based acoustic modeling has recently been shown to outperform phoneme-based approaches in both hybrid and end-to-end automatic speech recognition (ASR), even on non-phonemic languages like English.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Transformer-based Acoustic Modeling for Hybrid Speech Recognition

no code implementations • 22 Oct 2019 • Yongqiang Wang, Abdel-rahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer

We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition.

Ranked #23 on Speech Recognition on LibriSpeech test-other (using extra training data)

Language Modelling speech-recognition +1

Paper
Add Code

From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition

no code implementations • 2 Oct 2019 • Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fügen, Geoffrey Zweig, Michael L. Seltzer

There is an implicit assumption that traditional hybrid approaches for automatic speech recognition (ASR) cannot directly model graphemes and need to rely on phonetic lexicons to get competitive performance, especially on English which has poor grapheme-phoneme correspondence.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.