Search Results for author: Themos Stafylakis

Found 24 papers, 9 papers with code

Improving Speaker Verification with Self-Pretrained Transformer Models

no code implementations • 17 May 2023 • Junyi Peng, Oldřich Plchot, Themos Stafylakis, Ladislav Mošner, Lukáš Burget, Jan Černocký

Recently, fine-tuning large pre-trained Transformer models on downstream datasets has attracted increasing interest.

Speaker Verification

Speech-based emotion recognition with self-supervised models using attentive channel-wise correlations and label smoothing

no code implementations • 3 Nov 2022 • Sofoklis Kakouros, Themos Stafylakis, Ladislav Mosner, Lukas Burget

When recognizing emotions from speech, we encounter two common problems: how to optimally capture emotion-relevant information from the speech signal and how to best quantify or categorize the noisy subjective emotion labels.

Emotion Recognition
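
The title above mentions label smoothing as one way to handle the noisy, subjective emotion labels discussed in the abstract. Below is a minimal PyTorch sketch of label-smoothed cross-entropy; it is illustrative only, not necessarily the authors' exact formulation, and the smoothing value 0.1 is an assumption.

import torch
import torch.nn.functional as F

def label_smoothed_ce(logits, targets, smoothing=0.1):
    # logits: (batch, num_classes); targets: (batch,) integer emotion labels.
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    # Soft targets: most mass on the annotated label, the rest spread uniformly.
    soft = torch.full_like(log_probs, smoothing / num_classes)
    soft.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing + smoothing / num_classes)
    return -(soft * log_probs).sum(dim=-1).mean()

Recent PyTorch versions also expose the same behaviour directly via torch.nn.CrossEntropyLoss(label_smoothing=0.1).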

Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters

no code implementations • 28 Oct 2022 • Junyi Peng, Themos Stafylakis, Rongzhi Gu, Oldřich Plchot, Ladislav Mošner, Lukáš Burget, Jan Černocký

Recently, pre-trained Transformer models have attracted increasing interest in the field of speech processing, thanks to their great success in various downstream tasks.

Speaker Verification, Transfer Learning
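
Adapter-based transfer learning, as referenced in the title above, typically keeps the pre-trained Transformer frozen and trains only small bottleneck modules inserted between its layers. A minimal sketch under that assumption follows; the module name, hidden sizes, and placement are illustrative, not the paper's exact configuration.

import torch.nn as nn

class BottleneckAdapter(nn.Module):
    # Small residual bottleneck inserted after a frozen Transformer sub-layer.
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        # Residual form: at initialization the adapter barely perturbs the
        # frozen model's hidden states.
        return x + self.up(self.act(self.down(x)))

# Typical usage: freeze the backbone and train only the adapters (plus the
# speaker-verification head), e.g.
# for p in backbone.parameters():
#     p.requires_grad = False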

Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations

no code implementations • 15 Oct 2022 • Themos Stafylakis, Ladislav Mosner, Sofoklis Kakouros, Oldrich Plchot, Lukas Burget, Jan Cernocky

Self-supervised learning of speech representations from large amounts of unlabeled data has enabled state-of-the-art results in several speech processing tasks.

Descriptive, Self-Supervised Learning
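
The title above refers to channel-wise correlations of self-supervised speech features. A minimal, hypothetical sketch of computing a channel-by-channel correlation matrix from a (time, channels) feature sequence is shown below; it is not necessarily the paper's exact formulation.

import torch

def channel_correlation(feats, eps=1e-8):
    # feats: (time, channels) frame-level features from a self-supervised model.
    x = feats - feats.mean(dim=0, keepdim=True)          # center each channel
    cov = x.t() @ x / (feats.size(0) - 1)                # channel covariance
    std = cov.diagonal().clamp_min(eps).sqrt()
    corr = cov / (std.unsqueeze(0) * std.unsqueeze(1))   # normalize to correlations
    # The (flattened) upper triangle gives a fixed-size utterance representation.
    return corr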

Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings

3 code implementations • 28 Mar 2022 • Niko Brümmer, Albert Swart, Ladislav Mošner, Anna Silnova, Oldřich Plchot, Themos Stafylakis, Lukáš Burget

In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring backends are commonly used, namely cosine scoring and PLDA.

Speaker Recognition
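
For length-normalized embeddings on the unit hypersphere, the cosine backend mentioned in the abstract reduces to a dot product. A minimal NumPy sketch of that baseline is given below; it is illustrative and does not implement the proposed PSDA scoring.

import numpy as np

def length_normalize(x):
    # Project an embedding onto the unit hypersphere.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def cosine_score(enroll, test):
    # After length normalization, cosine similarity is just a dot product.
    return float(np.dot(length_normalize(enroll), length_normalize(test)))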

Speaker embeddings by modeling channel-wise correlations

1 code implementation • 6 Apr 2021 • Themos Stafylakis, Johan Rohdin, Lukas Burget

Speaker embeddings extracted with deep 2D convolutional neural networks are typically modeled as projections of first- and second-order statistics of channel-frequency pairs onto a linear layer, using either average or attentive pooling along the time axis.

Speaker Recognition, Style Transfer
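
The abstract above describes the common baseline: pooling first- and second-order statistics of channel-frequency pairs over time before a linear embedding layer. A minimal PyTorch sketch of that average statistics pooling follows; it is illustrative, and the paper's contribution replaces it with channel-wise correlation modeling.

import torch

def stats_pooling(feats):
    # feats: (batch, channels, freq, time), the output of a 2D CNN.
    b, c, f, t = feats.shape
    x = feats.reshape(b, c * f, t)         # channel-frequency pairs along dim 1
    mean = x.mean(dim=-1)                  # first-order statistics over time
    std = x.std(dim=-1)                    # second-order statistics over time
    return torch.cat([mean, std], dim=-1)  # projected by a linear layer downstream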

Seeing wake words: Audio-visual Keyword Spotting

1 code implementation • 2 Sep 2020 • Liliane Momeni, Triantafyllos Afouras, Themos Stafylakis, Samuel Albanie, Andrew Zisserman

The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio.

Lip Reading, Visual Keyword Spotting

Probabilistic embeddings for speaker diarization

1 code implementation • 6 Apr 2020 • Anna Silnova, Niko Brümmer, Johan Rohdin, Themos Stafylakis, Lukáš Burget

We apply the proposed probabilistic embeddings as input to an agglomerative hierarchical clustering (AHC) algorithm to do diarization in the DIHARD'19 evaluation set.

Clustering, speaker-diarization +1
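
Agglomerative hierarchical clustering of per-segment embeddings, mentioned in the abstract, can be sketched with SciPy as below. This is a plain deterministic-embedding baseline with an assumed cosine distance and stopping threshold; the paper's contribution is the probabilistic embedding scoring that is fed into AHC.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def ahc_diarize(embeddings, threshold=0.5):
    # embeddings: (num_segments, dim) speaker embeddings from one recording.
    dists = pdist(embeddings, metric="cosine")     # pairwise segment distances
    tree = linkage(dists, method="average")        # agglomerative clustering
    labels = fcluster(tree, t=threshold, criterion="distance")
    return labels                                  # cluster index = hypothesized speaker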

Detecting Spoofing Attacks Using VGG and SincNet: BUT-Omilia Submission to ASVspoof 2019 Challenge

no code implementations • 13 Jul 2019 • Hossein Zeinali, Themos Stafylakis, Georgia Athanasopoulou, Johan Rohdin, Ioannis Gkinis, Lukáš Burget, Jan "Honza" Černocký

In this paper, we present the system description of the joint efforts of Brno University of Technology (BUT) and Omilia -- Conversational Intelligence for the ASVSpoof2019 Spoofing and Countermeasures Challenge.

Self-supervised speaker embeddings

no code implementations • 6 Apr 2019 • Themos Stafylakis, Johan Rohdin, Oldrich Plchot, Petr Mizera, Lukas Burget

Unlike i-vectors, speaker embeddings such as x-vectors cannot leverage unlabelled utterances, because they are trained with a classification loss over the training speakers.

General Classification

How to Improve Your Speaker Embeddings Extractor in Generic Toolkits

no code implementations • 5 Nov 2018 • Hossein Zeinali, Lukas Burget, Johan Rohdin, Themos Stafylakis, Jan Cernocky

Recently, speaker embeddings extracted with deep neural networks became the state-of-the-art method for speaker verification.

Speaker Verification

Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs

no code implementations • 3 Nov 2018 • Themos Stafylakis, Muhammad Haris Khan, Georgios Tzimiropoulos

A further analysis on the utility of target word boundaries is provided, as well as on the capacity of the network in modeling the linguistic context of the target word.

Lipreading, speech-recognition +1

Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture

no code implementations • 28 Sep 2018 • Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Georgios Tzimiropoulos, Maja Pantic

Therefore, we can use a CTC loss in combination with an attention-based model to force monotonic alignments and, at the same time, remove the conditional independence assumption.

Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) +3
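
The abstract describes combining a CTC loss with an attention-based decoder. A common way to realize this is to interpolate the two objectives with a weight lambda; the PyTorch sketch below assumes that formulation, with illustrative shapes and an assumed weight of 0.2.

import torch.nn as nn

ctc_criterion = nn.CTCLoss(blank=0, zero_infinity=True)
att_criterion = nn.CrossEntropyLoss(ignore_index=-100)

def hybrid_loss(ctc_log_probs, input_lengths, ctc_targets, target_lengths,
                att_logits, dec_targets, lam=0.2):
    # ctc_log_probs: (time, batch, vocab) encoder log-probabilities.
    # att_logits:    (batch, dec_len, vocab) attention-decoder logits.
    l_ctc = ctc_criterion(ctc_log_probs, ctc_targets, input_lengths, target_lengths)
    l_att = att_criterion(att_logits.transpose(1, 2), dec_targets)
    # CTC enforces monotonic alignments; the attention decoder removes the
    # conditional independence assumption. The two losses are interpolated.
    return lam * l_ctc + (1.0 - lam) * l_att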

Gaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA model

no code implementations • 27 Feb 2018 • Niko Brummer, Anna Silnova, Lukas Burget, Themos Stafylakis

Embeddings in machine learning are low-dimensional representations of complex input patterns, with the property that simple geometric operations like Euclidean distances and dot products can be used for classification and comparison tasks.

Speaker Recognition

End-to-end Audiovisual Speech Recognition

2 code implementations • 18 Feb 2018 • Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Feipeng Cai, Georgios Tzimiropoulos, Maja Pantic

In the presence of high levels of noise, the end-to-end audiovisual model significantly outperforms both audio-only models.

Lipreading, speech-recognition +1

Deep word embeddings for visual speech recognition

1 code implementation • 30 Oct 2017 • Themos Stafylakis, Georgios Tzimiropoulos

In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition.

Lipreading, speech-recognition +2
