Non-autoregressive (NAR) models generate multiple output tokens in a sequence simultaneously, which significantly speeds up inference at the cost of an accuracy drop compared to autoregressive baselines.
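The speed difference can be sketched with a toy decoder. This is a hypothetical illustration, not a real model: the scoring function and token ids are made up, and the point is only the number of sequential decoding steps each strategy needs.

```python
# Toy contrast between autoregressive (AR) and non-autoregressive (NAR)
# decoding. `scores_fn` is a hypothetical stand-in for a model that
# returns the best token for a position given the already-decoded prefix.

def ar_decode(scores_fn, length):
    """AR: each position conditions on the prefix, so decoding
    takes `length` sequential steps."""
    out, steps = [], 0
    for t in range(length):
        out.append(scores_fn(tuple(out), t))
        steps += 1
    return out, steps

def nar_decode(scores_fn, length):
    """NAR: positions are predicted independently of each other,
    so all tokens can be produced in a single parallel step."""
    out = [scores_fn((), t) for t in range(length)]
    return out, 1  # one parallel step

# Hypothetical scoring function: token id equals the position index.
toy = lambda prefix, t: t

print(ar_decode(toy, 5))   # ([0, 1, 2, 3, 4], 5)
print(nar_decode(toy, 5))  # ([0, 1, 2, 3, 4], 1)
```

The accuracy cost in real NAR models comes from dropping the conditioning on the prefix, which the `()` argument above makes explicit.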
Deploying an end-to-end automatic speech recognition (ASR) model on mobile and embedded devices is a challenging task, since the available computational power and the energy consumption requirements of the device change dynamically in practice.
In addition, we propose to combine this intermediate CTC loss with stochastic depth training, and we apply the combination to the recently proposed Conformer network.
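The two ingredients named above can be sketched in a few lines. This is a minimal, hypothetical illustration: the "layers" are toy functions and the loss values are plain scalars rather than real CTC losses; the weight `w` and drop probability `p_drop` are assumed hyperparameters, not values from the paper.

```python
import random

def combined_loss(final_loss, inter_loss, w=0.3):
    # Weighted sum of the final-layer objective and the
    # intermediate-layer objective (intermediate-CTC style).
    return (1 - w) * final_loss + w * inter_loss

def forward_with_stochastic_depth(x, layers, p_drop, rng):
    # Stochastic depth: during training, each layer is skipped
    # (treated as identity) with probability `p_drop`.
    for layer in layers:
        if rng.random() >= p_drop:
            x = layer(x)
    return x

rng = random.Random(0)
layers = [lambda v: v + 1] * 4  # toy stand-ins for encoder layers
y = forward_with_stochastic_depth(0, layers, p_drop=0.5, rng=rng)

loss = combined_loss(final_loss=2.0, inter_loss=1.0, w=0.3)
print(loss)  # 1.7
```

At inference time stochastic depth is disabled (all layers are applied), which the sketch would model by setting `p_drop=0`.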
While neural machine translation (NMT) produces high-quality translations, its behavior is still hard to interpret and analyze.
In this paper, we introduce Papago, a translator for mobile devices that is equipped with new features providing convenience for users.