Search Results for author: Tamás Grósz

Found 9 papers, 3 papers with code

Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference

no code implementations • 16 Oct 2023 • Dejan Porjazovski, Yaroslav Getman, Tamás Grósz, Mikko Kurimo

In this paper, we employ large pre-trained models for the ACM Multimedia Computational Paralinguistics Challenge, addressing the Requests and Emotion Share tasks.

Bayesian Inference, Emotion Recognition, +1

Topic Identification For Spontaneous Speech: Enriching Audio Features With Embedded Linguistic Information

1 code implementation • 21 Jul 2023 • Dejan Porjazovski, Tamás Grósz, Mikko Kurimo

Traditional topic identification solutions from audio rely on an automatic speech recognition system (ASR) to produce transcripts used as input to a text-based model.

Automatic Speech Recognition, speech-recognition, +1

End-to-end Ensemble-based Feature Selection for Paralinguistics Tasks

no code implementations • 28 Oct 2022 • Tamás Grósz, Mittul Singh, Sudarsana Reddy Kadiri, Hemant Kathania, Mikko Kurimo

The current state-of-the-art methods proposed for these tasks are ensembles based on deep neural networks like ResNets in conjunction with feature engineering.

Feature Engineering, feature selection

Comparison and Analysis of New Curriculum Criteria for End-to-End ASR

1 code implementation • 10 Aug 2022 • Georgios Karakasidis, Tamás Grósz, Mikko Kurimo

We hypothesize that end-to-end models can achieve better performance when provided with an organized training set consisting of examples that exhibit an increasing level of difficulty (i.e., a curriculum).

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Lahjoita puhetta -- a large-scale corpus of spoken Finnish with some benchmarks

no code implementations • 24 Mar 2022 • Anssi Moisio, Dejan Porjazovski, Aku Rouhe, Yaroslav Getman, Anja Virkkunen, Tamás Grósz, Krister Lindén, Mikko Kurimo

The Donate Speech campaign has so far succeeded in gathering approximately 3600 hours of ordinary, colloquial Finnish speech into the Lahjoita puhetta (Donate Speech) corpus.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Data augmentation using prosody and false starts to recognize non-native children's speech

1 code implementation • 29 Aug 2020 • Hemant Kathania, Mittul Singh, Tamás Grósz, Mikko Kurimo

Firstly, we apply the prosody-based data augmentation to supplement the audio data.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Aalto's End-to-End DNN systems for the INTERSPEECH 2020 Computational Paralinguistics Challenge

no code implementations • 6 Aug 2020 • Tamás Grósz, Mittul Singh, Sudarsana Reddy Kadiri, Hemant Kathania, Mikko Kurimo

On ComParE 2020 tasks, we investigate applying an ensemble of E2E models for robust performance and developing task-specific modifications for each task.

Feature Engineering

Ultrasound-based Silent Speech Interface Built on a Continuous Vocoder

no code implementations • 24 Jun 2019 • Tamás Gábor Csapó, Mohammed Salah Al-Radhi, Géza Németh, Gábor Gosztolya, Tamás Grósz, László Tóth, Alexandra Markó

Recently it was shown that within the Silent Speech Interface (SSI) field, the prediction of F0 is possible from Ultrasound Tongue Images (UTI) as the articulatory input, using Deep Neural Networks for articulatory-to-acoustic mapping.

Sound, Audio and Speech Processing

GMM-Free Flat Start Sequence-Discriminative DNN Training

no code implementations • 11 Oct 2016 • Gábor Gosztolya, Tamás Grósz, László Tóth

Recently, attempts have been made to remove Gaussian mixture models (GMM) from the training process of deep neural network-based hidden Markov models (HMM/DNN).

Clustering

