Search Results for author: Themos Stafylakis

Found 24 papers, 9 papers with code

Improving Speaker Verification with Self-Pretrained Transformer Models

no code implementations • 17 May 2023 • Junyi Peng, Oldřich Plchot, Themos Stafylakis, Ladislav Mošner, Lukáš Burget, Jan Černocký

Recently, fine-tuning large pre-trained Transformer models on downstream datasets has attracted increasing interest.

Speaker Verification

Speech-based emotion recognition with self-supervised models using attentive channel-wise correlations and label smoothing

no code implementations • 3 Nov 2022 • Sofoklis Kakouros, Themos Stafylakis, Ladislav Mosner, Lukas Burget

When recognizing emotions from speech, we encounter two common problems: how to optimally capture emotion-relevant information from the speech signal and how to best quantify or categorize the noisy subjective emotion labels.

Emotion Recognition
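
The title above mentions label smoothing as one way to handle the noisy, subjective emotion labels discussed in the abstract. Below is a minimal PyTorch sketch of label-smoothed cross-entropy; it is illustrative only, not necessarily the authors' exact formulation, and the smoothing value 0.1 is an assumption.

import torch
import torch.nn.functional as F

def label_smoothed_ce(logits, targets, smoothing=0.1):
    # logits: (batch, num_classes); targets: (batch,) integer emotion labels.
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    # Soft targets: most mass on the annotated label, the rest spread uniformly.
    soft = torch.full_like(log_probs, smoothing / num_classes)
    soft.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing + smoothing / num_classes)
    return -(soft * log_probs).sum(dim=-1).mean()

Recent PyTorch versions also expose the same behaviour directly via torch.nn.CrossEntropyLoss(label_smoothing=0.1).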

Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters

no code implementations • 28 Oct 2022 • Junyi Peng, Themos Stafylakis, Rongzhi Gu, Oldřich Plchot, Ladislav Mošner, Lukáš Burget, Jan Černocký

Recently, pre-trained Transformer models have attracted increasing interest in the field of speech processing, thanks to their great success in various downstream tasks.

Speaker Verification, Transfer Learning
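
Adapter-based transfer learning, as referenced in the title above, typically keeps the pre-trained Transformer frozen and trains only small bottleneck modules inserted between its layers. A minimal sketch under that assumption follows; the module name, hidden sizes, and placement are illustrative, not the paper's exact configuration.

import torch.nn as nn

class BottleneckAdapter(nn.Module):
    # Small residual bottleneck inserted after a frozen Transformer sub-layer.
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        # Residual form: at initialization the adapter barely perturbs the
        # frozen model's hidden states.
        return x + self.up(self.act(self.down(x)))

# Typical usage: freeze the backbone and train only the adapters (plus the
# speaker-verification head), e.g.
# for p in backbone.parameters():
#     p.requires_grad = False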

Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations

no code implementations • 15 Oct 2022 • Themos Stafylakis, Ladislav Mosner, Sofoklis Kakouros, Oldrich Plchot, Lukas Burget, Jan Cernocky

Self-supervised learning of speech representations from large amounts of unlabeled data has enabled state-of-the-art results in several speech processing tasks.

Descriptive, Self-Supervised Learning
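
The title above refers to channel-wise correlations of self-supervised speech features. A minimal, hypothetical sketch of computing a channel-by-channel correlation matrix from a (time, channels) feature sequence is shown below; it is not necessarily the paper's exact formulation.

import torch

def channel_correlation(feats, eps=1e-8):
    # feats: (time, channels) frame-level features from a self-supervised model.
    x = feats - feats.mean(dim=0, keepdim=True)          # center each channel
    cov = x.t() @ x / (feats.size(0) - 1)                # channel covariance
    std = cov.diagonal().clamp_min(eps).sqrt()
    corr = cov / (std.unsqueeze(0) * std.unsqueeze(1))   # normalize to correlations
    # The (flattened) upper triangle gives a fixed-size utterance representation.
    return corr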

Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings

3 code implementations • 28 Mar 2022 • Niko Brümmer, Albert Swart, Ladislav Mošner, Anna Silnova, Oldřich Plchot, Themos Stafylakis, Lukáš Burget

In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring backends are commonly used, namely cosine scoring and PLDA.

Speaker Recognition
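
For length-normalized embeddings on the unit hypersphere, the cosine backend mentioned in the abstract reduces to a dot product. A minimal NumPy sketch of that baseline is given below; it is illustrative and does not implement the proposed PSDA scoring.

import numpy as np

def length_normalize(x):
    # Project an embedding onto the unit hypersphere.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def cosine_score(enroll, test):
    # After length normalization, cosine similarity is just a dot product.
    return float(np.dot(length_normalize(enroll), length_normalize(test)))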

Speaker embeddings by modeling channel-wise correlations

1 code implementation • 6 Apr 2021 • Themos Stafylakis, Johan Rohdin, Lukas Burget

Speaker embeddings extracted with deep 2D convolutional neural networks are typically modeled as projections of first- and second-order statistics of channel-frequency pairs onto a linear layer, using either average or attentive pooling along the time axis.

Speaker Recognition, Style Transfer
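
The abstract above describes the common baseline: pooling first- and second-order statistics of channel-frequency pairs over time before a linear embedding layer. A minimal PyTorch sketch of that average statistics pooling follows; it is illustrative, and the paper's contribution replaces it with channel-wise correlation modeling.

import torch

def stats_pooling(feats):
    # feats: (batch, channels, freq, time), the output of a 2D CNN.
    b, c, f, t = feats.shape
    x = feats.reshape(b, c * f, t)         # channel-frequency pairs along dim 1
    mean = x.mean(dim=-1)                  # first-order statistics over time
    std = x.std(dim=-1)                    # second-order statistics over time
    return torch.cat([mean, std], dim=-1)  # projected by a linear layer downstream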

Seeing wake words: Audio-visual Keyword Spotting

1 code implementation • 2 Sep 2020 • Liliane Momeni, Triantafyllos Afouras, Themos Stafylakis, Samuel Albanie, Andrew Zisserman

The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio.

Lip Reading, Visual Keyword Spotting

Probabilistic embeddings for speaker diarization

1 code implementation • 6 Apr 2020 • Anna Silnova, Niko Brümmer, Johan Rohdin, Themos Stafylakis, Lukáš Burget

We apply the proposed probabilistic embeddings as input to an agglomerative hierarchical clustering (AHC) algorithm to do diarization in the DIHARD'19 evaluation set.

Clustering, speaker-diarization +1
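
Agglomerative hierarchical clustering of per-segment embeddings, mentioned in the abstract, can be sketched with SciPy as below. This is a plain deterministic-embedding baseline with an assumed cosine distance and stopping threshold; the paper's contribution is the probabilistic embedding scoring that is fed into AHC.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def ahc_diarize(embeddings, threshold=0.5):
    # embeddings: (num_segments, dim) speaker embeddings from one recording.
    dists = pdist(embeddings, metric="cosine")     # pairwise segment distances
    tree = linkage(dists, method="average")        # agglomerative clustering
    labels = fcluster(tree, t=threshold, criterion="distance")
    return labels                                  # cluster index = hypothesized speaker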

Detecting Spoofing Attacks Using VGG and SincNet: BUT-Omilia Submission to ASVspoof 2019 Challenge

no code implementations • 13 Jul 2019 • Hossein Zeinali, Themos Stafylakis, Georgia Athanasopoulou, Johan Rohdin, Ioannis Gkinis, Lukáš Burget, Jan "Honza" Černocký

In this paper, we present the system description of the joint efforts of Brno University of Technology (BUT) and Omilia -- Conversational Intelligence for the ASVSpoof2019 Spoofing and Countermeasures Challenge.

Self-supervised speaker embeddings

no code implementations • 6 Apr 2019 • Themos Stafylakis, Johan Rohdin, Oldrich Plchot, Petr Mizera, Lukas Burget

Unlike i-vectors, speaker embeddings such as x-vectors cannot leverage unlabelled utterances, because they are trained with a classification loss over the training speakers.

General Classification

How to Improve Your Speaker Embeddings Extractor in Generic Toolkits

no code implementations • 5 Nov 2018 • Hossein Zeinali, Lukas Burget, Johan Rohdin, Themos Stafylakis, Jan Cernocky

Recently, speaker embeddings extracted with deep neural networks became the state-of-the-art method for speaker verification.

Speaker Verification

Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs

no code implementations • 3 Nov 2018 • Themos Stafylakis, Muhammad Haris Khan, Georgios Tzimiropoulos

A further analysis on the utility of target word boundaries is provided, as well as on the capacity of the network in modeling the linguistic context of the target word.

Lipreading, speech-recognition +1

Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture

no code implementations • 28 Sep 2018 • Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Georgios Tzimiropoulos, Maja Pantic

Therefore, we can use a CTC loss in combination with an attention-based model to force monotonic alignments and, at the same time, remove the conditional independence assumption.

Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) +3
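
The abstract describes combining a CTC loss with an attention-based decoder. A common way to realize this is to interpolate the two objectives with a weight lambda; the PyTorch sketch below assumes that formulation, with illustrative shapes and an assumed weight of 0.2.

import torch.nn as nn

ctc_criterion = nn.CTCLoss(blank=0, zero_infinity=True)
att_criterion = nn.CrossEntropyLoss(ignore_index=-100)

def hybrid_loss(ctc_log_probs, input_lengths, ctc_targets, target_lengths,
                att_logits, dec_targets, lam=0.2):
    # ctc_log_probs: (time, batch, vocab) encoder log-probabilities.
    # att_logits:    (batch, dec_len, vocab) attention-decoder logits.
    l_ctc = ctc_criterion(ctc_log_probs, ctc_targets, input_lengths, target_lengths)
    l_att = att_criterion(att_logits.transpose(1, 2), dec_targets)
    # CTC enforces monotonic alignments; the attention decoder removes the
    # conditional independence assumption. The two losses are interpolated.
    return lam * l_ctc + (1.0 - lam) * l_att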

Gaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA model

no code implementations • 27 Feb 2018 • Niko Brummer, Anna Silnova, Lukas Burget, Themos Stafylakis

Embeddings in machine learning are low-dimensional representations of complex input patterns, with the property that simple geometric operations like Euclidean distances and dot products can be used for classification and comparison tasks.

Speaker Recognition

End-to-end Audiovisual Speech Recognition

2 code implementations • 18 Feb 2018 • Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Feipeng Cai, Georgios Tzimiropoulos, Maja Pantic

In the presence of high levels of noise, the end-to-end audiovisual model significantly outperforms both audio-only models.

Lipreading, speech-recognition +1

Deep word embeddings for visual speech recognition

1 code implementation • 30 Oct 2017 • Themos Stafylakis, Georgios Tzimiropoulos

In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition.

Lipreading, speech-recognition +2
