Search Results for author: Ramon Sanabria

Found 23 papers, 3 papers with code

The IWSLT 2019 Evaluation Campaign

no code implementations EMNLP (IWSLT) 2019 Jan Niehues, Rolando Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loic Barrault, Lucia Specia, Marcello Federico

The IWSLT 2019 evaluation campaign featured three tasks: speech translation of (i) TED talks and (ii) How2 instructional videos from English into German and Portuguese, and (iii) text translation of TED talks from English into Czech.

Translation

CMU’s Machine Translation System for IWSLT 2019

no code implementations EMNLP (IWSLT) 2019 Tejas Srinivasan, Ramon Sanabria, Florian Metze

In Neural Machine Translation (NMT) the usage of sub-words and characters as source and target units offers a simple and flexible solution for translation of rare and unseen words.

Machine Translation NMT +1

Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems

no code implementations2 Apr 2024 Frank Palma Gomez, Ramon Sanabria, Yun-Hsuan Sung, Daniel Cer, Siddharth Dalmia, Gustavo Hernandez Abrego

Our multi-modal LLM-based retrieval system is capable of matching speech and text in 102 languages despite only training on 21 languages.

Machine Translation Retrieval +1
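The paper's LLM-based dual encoder is not reproduced here; purely as an illustration, a minimal sketch of the retrieval step such systems rely on: a speech query embedding is matched against text embeddings in a shared space by cosine similarity. The vectors below are random stand-ins, not real model outputs.

```python
import numpy as np

def retrieve(query: np.ndarray, corpus: np.ndarray) -> int:
    """Return the index of the corpus row most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return int(np.argmax(c @ q))

# Toy shared-space embeddings: three "text" vectors and one "speech" query.
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([0.1, 1.0])
best = retrieve(query, corpus)
print(best)  # index of the nearest text embedding
```

In a real cross-modal system the same ranking operation runs over millions of candidates, typically with an approximate nearest-neighbor index instead of a dense matrix product.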

Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition

no code implementations4 Feb 2024 Alexandra Saliba, Yuanchao Li, Ramon Sanabria, Catherine Lai

Through a comparative experiment and a layer-wise accuracy analysis on two distinct corpora, IEMOCAP and ESD, we explore differences between AWEs and raw self-supervised representations, as well as the proper utilization of AWEs alone and in combination with word embeddings.

Speech Emotion Recognition Word Embeddings

The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR

no code implementations31 Mar 2023 Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell

Despite the many advances in English automatic speech recognition (ASR) over the past decades, results are usually reported on test datasets which fail to represent the diversity of English as spoken today around the globe.

Automatic Speech Recognition (ASR) +1

Analyzing Acoustic Word Embeddings from Pre-trained Self-supervised Speech Models

no code implementations28 Oct 2022 Ramon Sanabria, Hao Tang, Sharon Goldwater

Given the strong results of self-supervised models on various tasks, there have been surprisingly few studies exploring self-supervised representations for acoustic word embeddings (AWE), fixed-dimensional vectors representing variable-length spoken word segments.

Word Embeddings
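The core idea of an acoustic word embedding, as defined in the abstract above, can be sketched in a few lines: frame-level features from a self-supervised model are pooled over a word segment into one fixed-dimensional vector. The feature arrays below are random placeholders standing in for real model outputs (e.g. from a HuBERT- or wav2vec-style encoder); mean pooling is just one simple choice of pooling function.

```python
import numpy as np

def acoustic_word_embedding(frame_features: np.ndarray) -> np.ndarray:
    """Mean-pool variable-length (T, D) frame features into a fixed (D,) vector."""
    return frame_features.mean(axis=0)

# Two spoken words of different durations map to vectors of the same size.
word_a = np.random.rand(37, 768)  # 37 frames, 768-dim features
word_b = np.random.rand(12, 768)  # 12 frames, same feature dimension
emb_a = acoustic_word_embedding(word_a)
emb_b = acoustic_word_embedding(word_b)
print(emb_a.shape, emb_b.shape)  # both (768,)
```

The fixed dimensionality is what makes AWEs directly comparable with distance metrics, regardless of how long each spoken word is.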

Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training

no code implementations1 Mar 2022 Ramon Sanabria, Wei-Ning Hsu, Alexei Baevski, Michael Auli

In this paper, we present a controlled study to better understand the effect of such factors on the performance of pre-trained representations on automatic speech recognition.

Automatic Speech Recognition (ASR) +1

Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval

no code implementations5 Apr 2021 Ramon Sanabria, Austin Waters, Jason Baldridge

Speech-based image retrieval has been studied as a proxy for joint representation learning, usually without emphasis on retrieval itself.

Automatic Speech Recognition (ASR) +4

Multimodal Speech Recognition with Unstructured Audio Masking

no code implementations EMNLP (nlpbt) 2020 Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott

Our experiments on the Flickr 8K Audio Captions Corpus show that multimodal ASR can generalize to recover different types of masked words in this unstructured masking setting.

8k Automatic Speech Recognition +2

Fine-Grained Grounding for Multimodal Speech Recognition

1 code implementation Findings of the Association for Computational Linguistics 2020 Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott

In experiments on the Flickr8K Audio Captions Corpus, we find that our model improves over approaches that use global visual features, that the proposals enable the model to recover entities and other related words, such as adjectives, and that improvements are due to the model's ability to localize the correct proposals.

Automatic Speech Recognition (ASR) +1

Looking Enhances Listening: Recovering Missing Speech Using Images

no code implementations13 Feb 2020 Tejas Srinivasan, Ramon Sanabria, Florian Metze

Speech is understood better by using visual context; for this reason, there have been many attempts to use images to adapt automatic speech recognition (ASR) systems.

Automatic Speech Recognition (ASR) +1

Multitask Learning For Different Subword Segmentations In Neural Machine Translation

no code implementations EMNLP (IWSLT) 2019 Tejas Srinivasan, Ramon Sanabria, Florian Metze

In Neural Machine Translation (NMT) the usage of subwords and characters as source and target units offers a simple and flexible solution for translation of rare and unseen words.

Machine Translation NMT +2
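To make concrete why subword units help with rare and unseen words, here is a toy greedy longest-match segmenter with a hypothetical subword vocabulary (this is an illustration of the general idea, not the segmentation method used in the paper): known words decompose into learned pieces, and fully unknown words fall back to characters.

```python
def segment(word: str, vocab: set) -> list:
    """Greedy longest-match segmentation into subword units,
    falling back to single characters for out-of-vocabulary material."""
    units, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j - i == 1:
                units.append(piece)
                i = j
                break
    return units

vocab = {"trans", "lat", "ion", "un"}       # hypothetical learned subwords
print(segment("translation", vocab))        # ['trans', 'lat', 'ion']
print(segment("xyz", vocab))                # ['x', 'y', 'z']
```

Because every word can be expressed with some unit sequence, the open-vocabulary problem reduces to translating sequences over a small, closed unit inventory.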

Analyzing Utility of Visual Context in Multimodal Speech Recognition Under Noisy Conditions

no code implementations30 Jun 2019 Tejas Srinivasan, Ramon Sanabria, Florian Metze

Multimodal learning allows us to leverage information from multiple sources (visual, acoustic and text), similar to our experience of the real world.

Automatic Speech Recognition (ASR) +1

Multimodal Grounding for Sequence-to-Sequence Speech Recognition

1 code implementation9 Nov 2018 Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze

Specifically, in our previous work, we propose a multistep visual adaptive training approach which improves the accuracy of an audio-based Automatic Speech Recognition (ASR) system.

Automatic Speech Recognition (ASR) +2

How2: A Large-scale Dataset for Multimodal Language Understanding

2 code implementations1 Nov 2018 Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze

In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations.

Automatic Speech Recognition (ASR) +3

Hierarchical Multi Task Learning With CTC

no code implementations18 Jul 2018 Ramon Sanabria, Florian Metze

Our model obtains 14.0% Word Error Rate on the Eval2000 Switchboard subset without any decoder or language model, outperforming current state-of-the-art acoustic-to-word models.

Automatic Speech Recognition (ASR) +3
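The Word Error Rate quoted above is the standard ASR metric: the word-level Levenshtein distance between reference and hypothesis, divided by the reference length. A minimal reference implementation:

```python
def word_error_rate(ref: str, hyp: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count,
    via standard dynamic-programming edit distance over words."""
    r, h = ref.split(), hyp.split()
    # dp[i][j]: edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[len(r)][len(h)] / len(r)

# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(wer)
```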

End-to-End Multimodal Speech Recognition

no code implementations25 Apr 2018 Shruti Palaskar, Ramon Sanabria, Florian Metze

Transcription or sub-titling of open-domain videos remains challenging for Automatic Speech Recognition (ASR) due to the data's difficult acoustics, variable signal processing, and essentially unrestricted domain.

Automatic Speech Recognition (ASR) +2

Sequence-based Multi-lingual Low Resource Speech Recognition

no code implementations 21 Feb 2018 Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black

Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains.

Speech Recognition

Subword and Crossword Units for CTC Acoustic Models

no code implementations19 Dec 2017 Thomas Zenkel, Ramon Sanabria, Florian Metze, Alex Waibel

This paper proposes a novel approach to creating a unit set for CTC-based speech recognition systems.

Language Modelling Speech Recognition +1
