Search Results for author: Ramon Sanabria

Found 23 papers, 3 papers with code

The IWSLT 2019 Evaluation Campaign

no code implementations • EMNLP (IWSLT) 2019 • Jan Niehues, Rolando Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loic Barrault, Lucia Specia, Marcello Federico

The IWSLT 2019 evaluation campaign featured three tasks: speech translation of (i) TED talks and (ii) How2 instructional videos from English into German and Portuguese, and (iii) text translation of TED talks from English into Czech.

Translation

CMU’s Machine Translation System for IWSLT 2019

no code implementations • EMNLP (IWSLT) 2019 • Tejas Srinivasan, Ramon Sanabria, Florian Metze

In Neural Machine Translation (NMT) the usage of sub-words and characters as source and target units offers a simple and flexible solution for translation of rare and unseen words.

Machine Translation NMT +1

Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems

no code implementations • 2 Apr 2024 • Frank Palma Gomez, Ramon Sanabria, Yun-Hsuan Sung, Daniel Cer, Siddharth Dalmia, Gustavo Hernandez Abrego

Our multi-modal LLM-based retrieval system is capable of matching speech and text in 102 languages despite only training on 21 languages.

Machine Translation Retrieval +1

Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition

no code implementations • 4 Feb 2024 • Alexandra Saliba, Yuanchao Li, Ramon Sanabria, Catherine Lai

Through a comparative experiment and a layer-wise accuracy analysis on two distinct corpora, IEMOCAP and ESD, we explore differences between AWEs and raw self-supervised representations, as well as the proper utilization of AWEs alone and in combination with word embeddings.

Speech Emotion Recognition Word Embeddings

Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling

no code implementations • 3 Jun 2023 • Ramon Sanabria, Ondrej Klejch, Hao Tang, Sharon Goldwater

Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units.

Word Embeddings

The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR

no code implementations • 31 Mar 2023 • Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell

Despite the many advances in English automatic speech recognition (ASR) over the past decades, results are usually reported based on test datasets which fail to represent the diversity of English as spoken today around the globe.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Analyzing Acoustic Word Embeddings from Pre-trained Self-supervised Speech Models

no code implementations • 28 Oct 2022 • Ramon Sanabria, Hao Tang, Sharon Goldwater

Given the strong results of self-supervised models on various tasks, there have been surprisingly few studies exploring self-supervised representations for acoustic word embeddings (AWE), fixed-dimensional vectors representing variable-length spoken word segments.

Word Embeddings

Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training

no code implementations • 1 Mar 2022 • Ramon Sanabria, Wei-Ning Hsu, Alexei Baevski, Michael Auli

In this paper, we present a controlled study to better understand the effect of such factors on the performance of pre-trained representations on automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

On the Difficulty of Segmenting Words with Attention

no code implementations • EMNLP (insights) 2021 • Ramon Sanabria, Hao Tang, Sharon Goldwater

Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks.

Segmentation speech-recognition +2

Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval

no code implementations • 5 Apr 2021 • Ramon Sanabria, Austin Waters, Jason Baldridge

Speech-based image retrieval has been studied as a proxy for joint representation learning, usually without emphasis on retrieval itself.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Multimodal Speech Recognition with Unstructured Audio Masking

no code implementations • EMNLP (nlpbt) 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott

Our experiments on the Flickr 8K Audio Captions Corpus show that multimodal ASR can generalize to recover different types of masked words in this unstructured masking setting.

8k Automatic Speech Recognition +2

Fine-Grained Grounding for Multimodal Speech Recognition

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott

In experiments on the Flickr8K Audio Captions Corpus, we find that our model improves over approaches that use global visual features, that the proposals enable the model to recover entities and other related words, such as adjectives, and that improvements are due to the model's ability to localize the correct proposals.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Looking Enhances Listening: Recovering Missing Speech Using Images

no code implementations • 13 Feb 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze

Speech is understood better by using visual context; for this reason, there have been many attempts to use images to adapt automatic speech recognition (ASR) systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Multitask Learning For Different Subword Segmentations In Neural Machine Translation

no code implementations • EMNLP (IWSLT) 2019 • Tejas Srinivasan, Ramon Sanabria, Florian Metze

In Neural Machine Translation (NMT) the usage of subwords and characters as source and target units offers a simple and flexible solution for translation of rare and unseen words.

Machine Translation NMT +2

Analyzing Utility of Visual Context in Multimodal Speech Recognition Under Noisy Conditions

no code implementations • 30 Jun 2019 • Tejas Srinivasan, Ramon Sanabria, Florian Metze

Multimodal learning allows us to leverage information from multiple sources (visual, acoustic and text), similar to our experience of the real world.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Multimodal Grounding for Sequence-to-Sequence Speech Recognition

1 code implementation • 9 Nov 2018 • Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze

Specifically, in our previous work, we propose a multistep visual adaptive training approach which improves the accuracy of an audio-based Automatic Speech Recognition (ASR) system.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

How2: A Large-scale Dataset for Multimodal Language Understanding

2 code implementations • 1 Nov 2018 • Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze

In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Hierarchical Multi Task Learning With CTC

no code implementations • 18 Jul 2018 • Ramon Sanabria, Florian Metze

Our model obtains 14.0% Word Error Rate on the Eval2000 Switchboard subset without any decoder or language model, outperforming the current state-of-the-art on acoustic-to-word models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

End-to-End Multimodal Speech Recognition

no code implementations • 25 Apr 2018 • Shruti Palaskar, Ramon Sanabria, Florian Metze

Transcription or sub-titling of open-domain videos is still a challenging domain for Automatic Speech Recognition (ASR) due to the data's challenging acoustics, variable signal processing and the essentially unrestricted domain of the data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Sequence-based Multi-lingual Low Resource Speech Recognition

no code implementations • 21 Feb 2018 • Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black

Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains.

speech-recognition Speech Recognition