Recent work on self-supervised training with unlabeled data opens new perspectives in terms of performance for automatic speech recognition and natural language processing.
In this paper, we propose a novel end-to-end sequence-to-sequence spoken language understanding model using an attention mechanism.
This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020: offline speech translation and simultaneous speech translation.
We present an end-to-end approach to extract semantic concepts directly from the speech audio signal.
Until now, NER from speech has been performed through a pipeline process that consists of first applying automatic speech recognition (ASR) to the audio and then applying NER to the ASR outputs.
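The cascade described above can be sketched as follows. This is a minimal illustration, not any system from the paper: `transcribe` and `ner` are hypothetical stubs (a fixed transcript and a tiny gazetteer matcher) standing in for a real ASR decoder and a trained NER tagger.

```python
import re

def transcribe(audio_path: str) -> str:
    """Stub ASR step: a real system would decode the audio file.
    Here we return a fixed transcript for illustration."""
    return "barack obama visited paris"

def ner(text: str) -> list:
    """Stub NER step over the ASR output: match against a small
    gazetteer instead of running a trained tagger."""
    gazetteer = {"barack obama": "PER", "paris": "LOC"}
    entities = []
    for surface, label in gazetteer.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", text):
            entities.append((surface, label))
    return entities

def pipeline(audio_path: str) -> list:
    # Cascade: ASR first, then NER on the (possibly errorful) transcript.
    return ner(transcribe(audio_path))

print(pipeline("speech.wav"))
```

The key property of this cascade is that NER only ever sees the ASR hypothesis, so recognition errors propagate into entity extraction; this is precisely the limitation that end-to-end approaches aim to avoid.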