Search Results for author: Geoffrey Zweig

Found 28 papers, 5 papers with code

Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR

no code implementations 9 Nov 2020 Xiaohui Zhang, Frank Zhang, Chunxi Liu, Kjell Schubert, Julian Chan, Pradyot Prakash, Jun Liu, Ching-Feng Yeh, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig

In this work, to measure the accuracy and efficiency for a latency-controlled streaming automatic speech recognition (ASR) application, we perform comprehensive evaluations on three popular training criteria: LF-MMI, CTC and RNN-T.

Automatic Speech Recognition (ASR) +2

Improving RNN Transducer Based ASR with Auxiliary Tasks

1 code implementation 5 Nov 2020 Chunxi Liu, Frank Zhang, Duc Le, Suyoun Kim, Yatharth Saraf, Geoffrey Zweig

End-to-end automatic speech recognition (ASR) models with a single neural network have recently demonstrated state-of-the-art results compared to conventional hybrid speech recognizers.

Automatic Speech Recognition (ASR) +1

Multi-modal Self-Supervision from Generalized Data Transformations

no code implementations 28 Sep 2020 Mandela Patrick, Yuki Asano, Polina Kuznetsova, Ruth Fong, Joao F. Henriques, Geoffrey Zweig, Andrea Vedaldi

In this paper, we show that, for videos, the answer is more complex, and that better results can be obtained by accounting for the interplay between invariance, distinctiveness, multiple modalities and time.

Audio Classification, Retrieval +1

Contextual RNN-T For Open Domain ASR

no code implementations 4 Jun 2020 Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf

By using an attention model and a biasing model to leverage the contextual metadata that accompanies a video, we observe a relative improvement of about 16% in Word Error Rate on Named Entities (WER-NE) for videos with related metadata.

Automatic Speech Recognition (ASR) +2

Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces

no code implementations 19 May 2020 Frank Zhang, Yongqiang Wang, Xiaohui Zhang, Chunxi Liu, Yatharth Saraf, Geoffrey Zweig

In this work, we first show that on the widely used LibriSpeech benchmark, our transformer-based context-dependent connectionist temporal classification (CTC) system produces state-of-the-art results.

Ranked #18 on Speech Recognition on LibriSpeech test-other (using extra training data)

Speech Recognition

Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks

1 code implementation 23 Oct 2019 Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig

As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses, we introduce intermediate model heads and loss functions.

From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition

no code implementations 2 Oct 2019 Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fügen, Geoffrey Zweig, Michael L. Seltzer

There is an implicit assumption that traditional hybrid approaches for automatic speech recognition (ASR) cannot directly model graphemes and need to rely on phonetic lexicons to get competitive performance, especially on English which has poor grapheme-phoneme correspondence.

Automatic Speech Recognition (ASR) +2

Multilingual Graphemic Hybrid ASR with Massive Data Augmentation

no code implementations LREC 2020 Chunxi Liu, Qiaochu Zhang, Xiaohui Zhang, Kritika Singh, Yatharth Saraf, Geoffrey Zweig

To develop high-performing ASR for low-resource languages, two approaches to the lack of resources are to use data from multiple languages and to augment the training data by creating acoustic variations.

Data Augmentation

Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

3 code implementations ACL 2017 Jason D. Williams, Kavosh Asadi, Geoffrey Zweig

End-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors.

Reinforcement Learning (RL)

An Attentional Neural Conversation Model with Improved Specificity

no code implementations 3 Jun 2016 Kaisheng Yao, Baolin Peng, Geoffrey Zweig, Kam-Fai Wong

Experimental results indicate that the model outperforms previously proposed neural conversation architectures, and that using specificity in the objective function significantly improves performance for both generation and retrieval.

Retrieval, Specificity

Attention with Intention for a Neural Network Conversation Model

no code implementations 29 Oct 2015 Kaisheng Yao, Geoffrey Zweig, Baolin Peng

The intention network is a recurrent network that models the dynamics of the intention process.

Decoder, Language Modelling

Sequence-to-Sequence Neural Net Models for Grapheme-to-Phoneme Conversion

no code implementations 31 May 2015 Kaisheng Yao, Geoffrey Zweig

We find that the simple side-conditioned generation approach is able to rival the state-of-the-art, and we are able to significantly advance the state-of-the-art with bi-directional long short-term memory (LSTM) neural networks that use the same alignment information that is used in conventional approaches.

Image Captioning, Language Modelling +2
