Search Results for author: Juan Zuluaga-Gomez

Found 22 papers, 12 papers with code

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

1 code implementation · 1 Nov 2023 · Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico

Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations among multiple speakers.

Tasks: Automatic Speech Recognition, +3 more

HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition

2 code implementations · 29 May 2023 · Florian Mai, Juan Zuluaga-Gomez, Titouan Parcollet, Petr Motlicek

In particular, multi-head HyperConformer achieves comparable or higher recognition performance while being more efficient than Conformer in terms of inference speed, memory, parameter count, and available training data.

Tasks: Speech Recognition

CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice

1 code implementation · 29 May 2023 · Juan Zuluaga-Gomez, Sara Ahmed, Danielius Visockas, Cem Subakan

We introduce a simple-to-follow recipe aligned to the SpeechBrain toolkit for accent classification based on Common Voice 7.0 (English) and Common Voice 11.0 (Italian, German, and Spanish).

Tasks: Automatic Speech Recognition (ASR), +2 more

Breast Cancer Diagnosis Using Machine Learning Techniques

No code implementations · 4 May 2023 · Juan Zuluaga-Gomez

Breast cancer is one of the most serious threats to women's lives; thus, early and accurate diagnosis plays a key role in reducing a patient's risk of death.

Tasks: Image Classification

A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers

No code implementations · 16 Apr 2023 · Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlicek, Matthias Kleinert

The overall pipeline is composed of the following submodules: (i) an automatic speech recognition (ASR) system that transforms audio into a sequence of words; (ii) a high-level parser for air traffic control (ATC) entities that understands the transcribed voice communication; and (iii) a text-to-speech submodule that generates a spoken utterance resembling a pilot's, based on the state of the dialogue.

Tasks: Automatic Speech Recognition (ASR), +1 more

Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator

No code implementations · 14 Dec 2022 · Amrutha Prasad, Juan Zuluaga-Gomez, Petr Motlicek, Saeed Sarfjoo, Iuliia Nigmatulina, Karel Vesely

The system understands the voice communications issued by the air traffic controller (ATCo) and, in turn, generates a spoken reply to that initial communication following pilot phraseology.

Tasks: Automatic Speech Recognition (ASR), +1 more

How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications

2 code implementations · 31 Mar 2022 · Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Saeed Sarfjoo, Petr Motlicek, Matthias Kleinert, Hartmut Helmke, Oliver Ohneiser, Qingran Zhan

Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AM) that can later be fine-tuned on downstream tasks, e.g., automatic speech recognition (ASR).

Tasks: Automatic Speech Recognition (ASR), +1 more

A Survey of Breast Cancer Screening Techniques: Thermography and Electrical Impedance Tomography

No code implementations · 8 Feb 2022 · Juan Zuluaga-Gomez, N. Zerhouni, Z. Al Masry, C. Devalland, C. Varnier

Breast cancer is a disease that threatens the lives of many women; thus, early and accurate detection plays a key role in reducing the mortality rate.

A two-step approach to leverage contextual data: speech recognition in air-traffic communications

No code implementations · 8 Feb 2022 · Iuliia Nigmatulina, Juan Zuluaga-Gomez, Amrutha Prasad, Seyyed Saeed Sarfjoo, Petr Motlicek

Automatic Speech Recognition (ASR), used to assist speech communication between pilots and air-traffic controllers, can significantly reduce the complexity of the task and increase the reliability of transmitted information.

Tasks: Automatic Speech Recognition (ASR), +5 more

BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications

2 code implementations · 12 Oct 2021 · Juan Zuluaga-Gomez, Seyyed Saeed Sarfjoo, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlicek, Karel Ondrej, Oliver Ohneiser, Hartmut Helmke

We propose a system that combines speech activity detection (SAD) and a BERT model to perform speaker change detection and speaker role detection (SRD) by chunking ASR transcripts, i.e., speaker diarization (SD) with a defined number of speakers together with SRD.

Tasks: Action Detection, Activity Detection, +7 more

Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition

No code implementations · 27 Aug 2021 · Amrutha Prasad, Juan Zuluaga-Gomez, Petr Motlicek, Saeed Sarfjoo, Iuliia Nigmatulina, Oliver Ohneiser, Hartmut Helmke

In this work, we propose to (1) automatically segment the ATCO and pilot data using an intuitive approach that exploits ASR transcripts and (2) subsequently treat automatic recognition of ATCOs' and pilots' voices as two separate tasks.

Tasks: Automatic Speech Recognition (ASR), +1 more

Pkwrap: a PyTorch Package for LF-MMI Training of Acoustic Models

1 code implementation · 7 Oct 2020 · Srikanth Madikeri, Sibo Tong, Juan Zuluaga-Gomez, Apoorv Vyas, Petr Motlicek, Hervé Bourlard

We present a simple wrapper that is useful to train acoustic models in PyTorch using Kaldi's LF-MMI training framework.

Subjects: Audio and Speech Processing, Sound

Automatic Speech Recognition Benchmark for Air-Traffic Communications

3 code implementations · 18 Jun 2020 · Juan Zuluaga-Gomez, Petr Motlicek, Qingran Zhan, Karel Vesely, Rudolf Braun

We demonstrate that cross-accent errors caused by speakers' accents are minimized given the amount of training data, making the system feasible for ATC environments.

Tasks: Automatic Speech Recognition (ASR), +1 more

A CNN-based methodology for breast cancer diagnosis using thermal images

No code implementations · 30 Oct 2019 · Juan Zuluaga-Gomez, Zeina Al Masry, Khaled Benaggoune, Safa Meraghni, Noureddine Zerhouni

Methods: We studied the influence of data pre-processing, data augmentation, and database size on the performance of a proposed set of CNN models.

Tasks: Data Augmentation
