Search Results for author: Frank Zhang

Found 19 papers, 3 papers with code

Improving RNN Transducer Based ASR with Auxiliary Tasks

1 code implementation • 5 Nov 2020 • Chunxi Liu, Frank Zhang, Duc Le, Suyoun Kim, Yatharth Saraf, Geoffrey Zweig

End-to-end automatic speech recognition (ASR) models with a single neural network have recently demonstrated state-of-the-art results compared to conventional hybrid speech recognizers.

Ranked #15 on Speech Recognition on LibriSpeech test-clean

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Code

Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition

1 code implementation • 21 Oct 2020 • Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, Mike Seltzer

For a low latency scenario with an average latency of 80 ms, Emformer achieves WER $3. 01\%$ on test-clean and $7. 09\%$ on test-other.

speech-recognition Speech Recognition

Paper
Code

Transformer-based Acoustic Modeling for Hybrid Speech Recognition

no code implementations • 22 Oct 2019 • Yongqiang Wang, Abdel-rahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer

We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition.

Ranked #23 on Speech Recognition on LibriSpeech test-other (using extra training data)

Language Modelling speech-recognition +1

Paper
Add Code

Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks

1 code implementation • 23 Oct 2019 • Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig

As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses we introduce intermediate model heads and loss function.

Paper
Code

Training ASR models by Generation of Contextual Information

no code implementations • 27 Oct 2019 • Kritika Singh, Dmytro Okhonko, Jun Liu, Yongqiang Wang, Frank Zhang, Ross Girshick, Sergey Edunov, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed

Supervised ASR models have reached unprecedented levels of accuracy, thanks in part to ever-increasing amounts of labelled training data.

speech-recognition Speech Recognition +2

Paper
Add Code

Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model

no code implementations • 15 May 2020 • Da-Rong Liu, Chunxi Liu, Frank Zhang, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig

Videos uploaded on social media are often accompanied with textual descriptions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory

no code implementations • 16 May 2020 • Chunyang Wu, Yongqiang Wang, Yangyang Shi, Ching-Feng Yeh, Frank Zhang

The memory bankstores the embedding information for all the processed seg-ments.

Paper
Add Code

Weak-Attention Suppression For Transformer Based Speech Recognition

no code implementations • 18 May 2020 • Yangyang Shi, Yongqiang Wang, Chunyang Wu, Christian Fuegen, Frank Zhang, Duc Le, Ching-Feng Yeh, Michael L. Seltzer

Transformers, originally proposed for natural language processing (NLP) tasks, have recently achieved great success in automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces

no code implementations • 19 May 2020 • Frank Zhang, Yongqiang Wang, Xiaohui Zhang, Chunxi Liu, Yatharth Saraf, Geoffrey Zweig

In this work, we first show that on the widely used LibriSpeech benchmark, our transformer-based context-dependent connectionist temporal classification (CTC) system produces state-of-the-art results.

Ranked #17 on Speech Recognition on LibriSpeech test-other (using extra training data)

Speech Recognition

Paper
Add Code

Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications

no code implementations • 27 Oct 2020 • Yongqiang Wang, Yangyang Shi, Frank Zhang, Chunyang Wu, Julian Chan, Ching-Feng Yeh, Alex Xiao

We compare the transformer based acoustic models with their LSTM counterparts on industrial scale tasks.

speech-recognition Speech Recognition +1

Paper
Add Code

Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition

no code implementations • 3 Nov 2020 • Ching-Feng Yeh, Yongqiang Wang, Yangyang Shi, Chunyang Wu, Frank Zhang, Julian Chan, Michael L. Seltzer

Attention-based models have been gaining popularity recently for their strong performance demonstrated in fields such as machine translation and automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR

no code implementations • 9 Nov 2020 • Xiaohui Zhang, Frank Zhang, Chunxi Liu, Kjell Schubert, Julian Chan, Pradyot Prakash, Jun Liu, Ching-Feng Yeh, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig

In this work, to measure the accuracy and efficiency for a latency-controlled streaming automatic speech recognition (ASR) application, we perform comprehensive evaluations on three popular training criteria: LF-MMI, CTC and RNN-T.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models

no code implementations • 9 Jul 2021 • Xiaohui Zhang, Vimal Manohar, David Zhang, Frank Zhang, Yangyang Shi, Nayan Singhal, Julian Chan, Fuchun Peng, Yatharth Saraf, Mike Seltzer

Hybrid automatic speech recognition (ASR) models are typically sequentially trained with CTC or LF-MMI criteria.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Improved Language Identification Through Cross-Lingual Self-Supervised Learning

no code implementations • 8 Jul 2021 • Andros Tjandra, Diptanu Gon Choudhury, Frank Zhang, Kritika Singh, Alexis Conneau, Alexei Baevski, Assaf Sela, Yatharth Saraf, Michael Auli

Language identification greatly impacts the success of downstream tasks such as automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings

no code implementations • 7 Oct 2021 • Jialu Li, Vimal Manohar, Pooja Chitkara, Andros Tjandra, Michael Picheny, Frank Zhang, Xiaohui Zhang, Yatharth Saraf

Domain-adversarial training (DAT) and multi-task learning (MTL) are two common approaches for building accent-robust ASR models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Scaling ASR Improves Zero and Few Shot Learning

no code implementations • 10 Nov 2021 • Alex Xiao, Weiyi Zheng, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed

With 4. 5 million hours of English speech from 10 different sources across 120 countries and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Pushing the performances of ASR models on English and Spanish accents

no code implementations • 22 Dec 2022 • Pooja Chitkara, Morgane Riviere, Jade Copet, Frank Zhang, Yatharth Saraf

Speech to text models tend to be trained and evaluated against a single target accent.

Paper
Add Code

Short-Term Stock Price Forecasting using exogenous variables and Machine Learning Algorithms

no code implementations • 17 May 2023 • Albert Wong, Steven Whang, Emilio Sagre, Niha Sachin, Gustavo Dutra, Yew-Wei Lim, Gaetan Hains, Youry Khmelevsky, Frank Zhang

Creating accurate predictions in the stock market has always been a significant challenge in finance.

Paper
Add Code

3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

no code implementations • 14 Mar 2024 • Frank Zhang, Yibo Zhang, Quan Zheng, Rui Ma, Wei Hua, Hujun Bao, Weiwei Xu, Changqing Zou

Text-driven 3D scene generation techniques have made rapid progress in recent years.

Scene Generation

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.