Search Results for author: Frank Seide

Found 9 papers, 3 papers with code

Effective internal language model training and fusion for factorized transducer model

no code implementations • 2 Apr 2024 • Jinxi Guo, Niko Moritz, Yingyi Ma, Frank Seide, Chunyang Wu, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

However, even with the adoption of factorized transducer models, limited improvement has been observed compared to shallow fusion.

Language Modelling

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

no code implementations • 18 Jan 2024 • Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide

Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations.

Automatic Speech Recognition (ASR)

DISGO: Automatic End-to-End Evaluation for Scene Text OCR

no code implementations • 25 Aug 2023 • Mei-Yuh Hwang, Yangyang Shi, Ankit Ramchandani, Guan Pang, Praveen Krishnan, Lucas Kabela, Frank Seide, Samyak Datta, Jun Liu

This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents due to the wild content and various image backgrounds.

Machine Translation, Optical Character Recognition

Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers

no code implementations • 2 Nov 2022 • Duc Le, Frank Seide, Yuhao Wang, Yang Li, Kjell Schubert, Ozlem Kalinli, Michael L. Seltzer

We show how factoring the RNN-T's output distribution can significantly reduce the computation cost and power consumption for on-device ASR inference with no loss in accuracy.

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

no code implementations • 19 Apr 2022 • Niko Moritz, Frank Seide, Duc Le, Jay Mahadeokar, Christian Fuegen

The two most popular loss functions for streaming end-to-end automatic speech recognition (ASR) are RNN-Transducer (RNN-T) and connectionist temporal classification (CTC).

Automatic Speech Recognition (ASR)

Federated Domain Adaptation for ASR with Full Self-Supervision

no code implementations • 30 Mar 2022 • Junteng Jia, Jay Mahadeokar, Weiyi Zheng, Yuan Shangguan, Ozlem Kalinli, Frank Seide

Cross-device federated learning (FL) protects user privacy by collaboratively training a model on user devices, therefore eliminating the need for collecting, storing, and manually labeling user data.

Automatic Speech Recognition (ASR), Data Augmentation

Marian: Fast Neural Machine Translation in C++

2 code implementations • ACL 2018 • Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, Alexandra Birch

We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs.

Decoder, Machine Translation

CNTK: Microsoft's Open-Source Deep-Learning Toolkit

1 code implementation • ACM SIGKDD 2016 • Frank Seide, Amit Agarwal

This tutorial will introduce the Computational Network Toolkit, or CNTK, Microsoft's cutting-edge open-source deep-learning toolkit for Windows and Linux.

Clustering, Dimensionality Reduction
