no code implementations • EMNLP (IWSLT) 2019 • Jan Niehues, Rolando Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loïc Barrault, Lucia Specia, Marcello Federico
The IWSLT 2019 evaluation campaign featured three tasks: speech translation of (i) TED talks and (ii) How2 instructional videos from English into German and Portuguese, and (iii) text translation of TED talks from English into Czech.
no code implementations • EMNLP (IWSLT) 2019 • Tejas Srinivasan, Ramon Sanabria, Florian Metze
In Neural Machine Translation (NMT) the usage of sub-words and characters as source and target units offers a simple and flexible solution for translation of rare and unseen words.
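As a rough illustration of why subword units help with rare and unseen words, the sketch below learns byte-pair-encoding (BPE) style merges from a toy corpus and applies them to a word it never saw. BPE is assumed here as one common way to build a subword vocabulary; it is not necessarily the segmentation used in this paper, and the function names and corpus are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]
END = "</w>"  # end-of-word marker (a common BPE convention)


def _merge(symbols: Sequence[str], pair: Pair) -> Tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` in `symbols` with one joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words: Sequence[str], num_merges: int) -> List[Pair]:
    """Greedily learn up to `num_merges` merges, always joining the most frequent pair."""
    vocab = Counter(tuple(w) + (END,) for w in words)
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.__getitem__)  # ties: first pair seen wins
        merges.append(best)
        merged: Counter = Counter()
        for symbols, freq in vocab.items():
            merged[_merge(symbols, best)] += freq
        vocab = merged
    return merges


def segment(word: str, merges: Sequence[Pair]) -> List[str]:
    """Apply the learned merges, in order, to a (possibly unseen) word."""
    symbols: Tuple[str, ...] = tuple(word) + (END,)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return list(symbols)


corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges = learn_bpe(corpus, num_merges=5)
print(segment("lowest", merges))  # ['low', 'est</w>']
```

Since any word built from characters seen in training can still be segmented, the unseen word "lowest" becomes two known units instead of an unknown token.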
no code implementations • 2 Apr 2024 • Frank Palma Gomez, Ramon Sanabria, Yun-Hsuan Sung, Daniel Cer, Siddharth Dalmia, Gustavo Hernandez Abrego
Our multi-modal LLM-based retrieval system is capable of matching speech and text in 102 languages despite being trained on only 21 languages.
no code implementations • 4 Feb 2024 • Alexandra Saliba, Yuanchao Li, Ramon Sanabria, Catherine Lai
Through a comparative experiment and a layer-wise accuracy analysis on two distinct corpora, IEMOCAP and ESD, we explore differences between AWEs and raw self-supervised representations, as well as the proper utilization of AWEs alone and in combination with word embeddings.
no code implementations • 3 Jun 2023 • Ramon Sanabria, Ondrej Klejch, Hao Tang, Sharon Goldwater
Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units.
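To make the idea of a pooling function concrete, here is a minimal pure-Python sketch, assuming frame-level features are plain lists of floats. Mean pooling is a parameter-free stand-in for the learned pooling function the entry describes, and the triplet-style hinge loss is one common pair-based training signal, assumed here for illustration rather than taken from the paper. All names are hypothetical.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def mean_pool(frames: Sequence[Frame]) -> List[float]:
    """Map a variable-length segment of feature frames to one fixed-size vector.

    Stand-in for a learned pooling function (e.g. a neural encoder)."""
    if not frames:
        raise ValueError("a segment needs at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def triplet_hinge(anchor, positive, negative, margin: float = 0.5) -> float:
    """Penalize cases where a same-word pair is not at least `margin` more similar
    than a different-word pair. Training a learned pooler would minimize this."""
    return max(0.0, margin + cosine(anchor, negative) - cosine(anchor, positive))


# Two tokens of the same word with different durations, and one different word.
word_a_1 = mean_pool([[1.0, 0.0], [3.0, 0.0]])
word_a_2 = mean_pool([[2.0, 0.1], [2.0, -0.1], [2.0, 0.0]])
word_b = mean_pool([[0.0, 1.0]])
print(triplet_hinge(word_a_1, word_a_2, word_b))  # 0.0: already separated by the margin
```

Segments of two and three frames both yield two-dimensional vectors, which is the fixed-dimensional property that makes such embeddings comparable with a simple similarity function.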
no code implementations • 31 Mar 2023 • Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell
Despite the many advances in English automatic speech recognition (ASR) over the past decades, results are usually reported on test datasets that fail to represent the diversity of English as spoken around the globe today.
Automatic Speech Recognition (ASR)
no code implementations • 28 Oct 2022 • Ramon Sanabria, Hao Tang, Sharon Goldwater
Given the strong results of self-supervised models on various tasks, there have been surprisingly few studies exploring self-supervised representations for acoustic word embeddings (AWE), fixed-dimensional vectors representing variable-length spoken word segments.
no code implementations • 1 Mar 2022 • Ramon Sanabria, Wei-Ning Hsu, Alexei Baevski, Michael Auli
In this paper, we present a controlled study to better understand the effect of such factors on the performance of pre-trained representations on automatic speech recognition.
Automatic Speech Recognition (ASR)
no code implementations • EMNLP (insights) 2021 • Ramon Sanabria, Hao Tang, Sharon Goldwater
Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks.
no code implementations • 5 Apr 2021 • Ramon Sanabria, Austin Waters, Jason Baldridge
Speech-based image retrieval has been studied as a proxy for joint representation learning, usually without emphasis on retrieval itself.
Automatic Speech Recognition (ASR)
no code implementations • EMNLP (nlpbt) 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott
Our experiments on the Flickr 8K Audio Captions Corpus show that multimodal ASR can generalize to recover different types of masked words in this unstructured masking setting.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott
In experiments on the Flickr8K Audio Captions Corpus, we find that our model improves over approaches that use global visual features, that the proposals enable the model to recover entities and other related words, such as adjectives, and that improvements are due to the model's ability to localize the correct proposals.
Automatic Speech Recognition (ASR)
no code implementations • 13 Feb 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze
Speech is understood better by using visual context; for this reason, there have been many attempts to use images to adapt automatic speech recognition (ASR) systems.
Automatic Speech Recognition (ASR)
no code implementations • 30 Jun 2019 • Tejas Srinivasan, Ramon Sanabria, Florian Metze
Multimodal learning allows us to leverage information from multiple sources (visual, acoustic and text), similar to our experience of the real world.
Automatic Speech Recognition (ASR)
1 code implementation • 9 Nov 2018 • Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze
Specifically, in our previous work, we proposed a multistep visual adaptive training approach which improves the accuracy of an audio-based Automatic Speech Recognition (ASR) system.
Automatic Speech Recognition (ASR)
2 code implementations • 1 Nov 2018 • Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze
In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations.
Automatic Speech Recognition (ASR)
no code implementations • 18 Jul 2018 • Ramon Sanabria, Florian Metze
Our model obtains a 14.0% Word Error Rate on the Eval2000 Switchboard subset without any decoder or language model, outperforming the current state of the art for acoustic-to-word models.
Automatic Speech Recognition (ASR)
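The word error rate (WER) quoted above is the minimum number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal sketch using word-level edit distance follows; official Eval2000 scoring is typically done with NIST's sclite tool plus text normalization, which this hypothetical helper does not reproduce.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row dynamic programme: prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1        # drop ref[i-1]
            insertion = cur[j - 1] + 1    # add hyp[j-1]
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # one deletion: 1/6
```

Because insertions are counted, WER can exceed 100%: word_error_rate("a b", "x y z") is 1.5.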
no code implementations • 25 Apr 2018 • Shruti Palaskar, Ramon Sanabria, Florian Metze
Transcription or subtitling of open-domain videos remains a challenging task for Automatic Speech Recognition (ASR) due to the data's difficult acoustics, variable signal processing, and essentially unrestricted domain.
Automatic Speech Recognition (ASR)
no code implementations • 21 Feb 2018 • Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black
Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains.
no code implementations • 19 Dec 2017 • Thomas Zenkel, Ramon Sanabria, Florian Metze, Alex Waibel
This paper proposes a novel approach to create a unit set for CTC-based speech recognition systems.
no code implementations • 15 Aug 2017 • Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel
The CTC loss function maps an input sequence of observable feature vectors to an output sequence of symbols.
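The mapping from frames to symbols can be made concrete with the standard CTC collapsing rule: merge repeated labels, then drop blanks. The sketch below, assuming per-frame label probabilities given as plain lists and "-" as a hypothetical blank symbol, implements that rule, best-path (greedy) decoding, and a brute-force CTC probability that sums over every frame-level alignment collapsing to the target. Practical implementations use the forward-backward algorithm instead; enumeration is only feasible for a handful of frames.

```python
import itertools
import math
from typing import List, Sequence

BLANK = "-"  # hypothetical blank label


def collapse(path: Sequence[str], blank: str = BLANK) -> str:
    """CTC many-to-one map: merge consecutive repeats, then remove blanks."""
    out: List[str] = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)


def greedy_decode(probs: Sequence[Sequence[float]], alphabet: Sequence[str]) -> str:
    """Best-path decoding: take the most likely label per frame, then collapse."""
    path = [alphabet[max(range(len(frame)), key=frame.__getitem__)] for frame in probs]
    return collapse(path)


def ctc_prob_bruteforce(probs, alphabet, target: str) -> float:
    """p(target | x): sum over all alignments that collapse to `target`.

    Exponential in the number of frames; for illustration only."""
    total = 0.0
    for idx in itertools.product(range(len(alphabet)), repeat=len(probs)):
        if collapse([alphabet[k] for k in idx]) == target:
            total += math.prod(probs[t][k] for t, k in enumerate(idx))
    return total


def ctc_loss(probs, alphabet, target: str) -> float:
    """Negative log-probability of the target; raises ValueError if unreachable."""
    return -math.log(ctc_prob_bruteforce(probs, alphabet, target))


alphabet = [BLANK, "a"]
uniform = [[0.5, 0.5], [0.5, 0.5]]
# Alignments "aa", "a-" and "-a" all collapse to "a": 3 * 0.25 = 0.75
print(ctc_prob_bruteforce(uniform, alphabet, "a"))
```

Note that the blank is what lets CTC emit a genuine repeat such as "aa": the path "a-a" collapses to "aa", whereas "aaa" collapses to a single "a".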
no code implementations • 21 Nov 2016 • Ramon Sanabria, Florian Metze, Fernando de la Torre
Speech is one of the most effective ways of communication among humans.