no code implementations • 18 Jun 2024 • Themos Stafylakis, Anna Silnova, Johan Rohdin, Oldrich Plchot, Lukas Burget
Speaker embedding extractors are typically trained using a classification loss over the training speakers.
no code implementations • 10 Jun 2024 • Christos Vlachos, Themos Stafylakis, Ion Androutsopoulos
Data augmentation (DA), whereby synthetic training examples are added to the training data, has been successful in other NLP systems, but has not been explored as extensively in ToDSs.
1 code implementation • 7 Dec 2023 • Federico Landini, Mireia Diez, Themos Stafylakis, Lukáš Burget
Until recently, the field of speaker diarization was dominated by cascaded systems.
no code implementations • 20 Oct 2023 • Alexandros Xenos, Themos Stafylakis, Ioannis Patras, Georgios Tzimiropoulos
This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA).
Ranked #5 on Visual Question Answering (VQA) on A-OKVQA (DA VQA Score metric)
no code implementations • 17 May 2023 • Junyi Peng, Oldřich Plchot, Themos Stafylakis, Ladislav Mošner, Lukáš Burget, Jan Černocký
Recently, fine-tuning large pre-trained Transformer models on downstream datasets has attracted rising interest.
no code implementations • 3 Nov 2022 • Sofoklis Kakouros, Themos Stafylakis, Ladislav Mosner, Lukas Burget
When recognizing emotions from speech, we encounter two common problems: how to optimally capture emotion-relevant information from the speech signal and how to best quantify or categorize the noisy subjective emotion labels.
no code implementations • 28 Oct 2022 • Junyi Peng, Themos Stafylakis, Rongzhi Gu, Oldřich Plchot, Ladislav Mošner, Lukáš Burget, Jan Černocký
Recently, pre-trained Transformer models have attracted rising interest in speech processing, thanks to their success in various downstream tasks.
no code implementations • 15 Oct 2022 • Themos Stafylakis, Ladislav Mosner, Sofoklis Kakouros, Oldrich Plchot, Lukas Burget, Jan Cernocky
Self-supervised learning of speech representations from large amounts of unlabeled data has enabled state-of-the-art results in several speech processing tasks.
no code implementations • 11 Oct 2022 • Gaëlle Laperrière, Valentin Pelloin, Mickaël Rouvier, Themos Stafylakis, Yannick Estève
In this paper we examine the use of semantically-aligned speech representations for end-to-end spoken language understanding (SLU).
no code implementations • 3 Oct 2022 • Junyi Peng, Oldrich Plchot, Themos Stafylakis, Ladislav Mosner, Lukas Burget, Jan Cernocky
In recent years, the self-supervised learning paradigm has received extensive attention due to its great success in various downstream tasks.
no code implementations • 29 Mar 2022 • Themos Stafylakis, Ladislav Mošner, Oldřich Plchot, Johan Rohdin, Anna Silnova, Lukáš Burget, Jan "Honza'' Černocký
In this paper, we demonstrate a method for training speaker embedding extractors using weak annotation.
3 code implementations • 28 Mar 2022 • Niko Brümmer, Albert Swart, Ladislav Mošner, Anna Silnova, Oldřich Plchot, Themos Stafylakis, Lukáš Burget
In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring backends are commonly used: cosine scoring and probabilistic linear discriminant analysis (PLDA).
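Of the two backends, cosine scoring is the simpler one: embeddings are length-normalized onto the unit hypersphere and compared with a dot product. A minimal sketch (the function name and toy vectors are illustrative, not from the paper):

```python
import numpy as np

def cosine_score(e1, e2):
    # Length-normalize so both embeddings lie on the unit hypersphere;
    # the dot product of unit vectors is the cosine similarity.
    e1 = e1 / np.linalg.norm(e1)
    e2 = e2 / np.linalg.norm(e2)
    return float(np.dot(e1, e2))

a = np.array([1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 0.0])
print(round(cosine_score(a, b), 4))  # 0.7071
```

PLDA, by contrast, is a generative model with trained within- and between-speaker covariances, which is what motivates the paper's comparison of the two backends.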
no code implementations • 19 Mar 2022 • Anna Silnova, Themos Stafylakis, Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Pavel Matejka, Lukas Burget, Ondrej Glembek, Niko Brummer
In this paper, we analyze the behavior and performance of speaker embeddings and the back-end scoring model under domain and language mismatch.
1 code implementation • 6 Apr 2021 • Themos Stafylakis, Johan Rohdin, Lukas Burget
Speaker embeddings extracted with deep 2D convolutional neural networks are typically modeled as projections of first and second order statistics of channel-frequency pairs onto a linear layer, using either average or attentive pooling along the time axis.
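Attentive statistics pooling of the kind mentioned here can be sketched as follows; this is a simplified stand-in (a single linear attention layer over generic frame features, with illustrative shapes), not the paper's 2D-convolutional architecture:

```python
import numpy as np

def attentive_stats_pooling(frames, w):
    """Pool (T, D) frame-level features into a fixed 2*D-dim vector.

    frames: (T, D) features; w: (D,) parameters of a single linear
    attention scorer (a simplification for illustration).
    """
    scores = frames @ w                       # (T,) unnormalized attention
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()               # softmax over the time axis
    mu = alpha @ frames                       # weighted first-order stats
    var = alpha @ (frames - mu) ** 2          # weighted second-order stats
    return np.concatenate([mu, np.sqrt(np.maximum(var, 1e-12))])

T, D = 50, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((T, D))
emb = attentive_stats_pooling(x, rng.standard_normal(D))
print(emb.shape)  # (16,)
```

Setting the attention parameters to zero makes the weights uniform, which recovers plain average pooling of the same statistics.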
1 code implementation • 2 Sep 2020 • Liliane Momeni, Triantafyllos Afouras, Themos Stafylakis, Samuel Albanie, Andrew Zisserman
The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio.
1 code implementation • 6 Apr 2020 • Anna Silnova, Niko Brümmer, Johan Rohdin, Themos Stafylakis, Lukáš Burget
We apply the proposed probabilistic embeddings as input to an agglomerative hierarchical clustering (AHC) algorithm to do diarization in the DIHARD'19 evaluation set.
no code implementations • 13 Jul 2019 • Hossein Zeinali, Themos Stafylakis, Georgia Athanasopoulou, Johan Rohdin, Ioannis Gkinis, Lukáš Burget, Jan "Honza'' Černocký
In this paper, we present the system description of the joint efforts of Brno University of Technology (BUT) and Omilia -- Conversational Intelligence for the ASVspoof 2019 Spoofing and Countermeasures Challenge.
no code implementations • 6 Apr 2019 • Themos Stafylakis, Johan Rohdin, Oldrich Plchot, Petr Mizera, Lukas Burget
Contrary to i-vectors, speaker embeddings such as x-vectors are incapable of leveraging unlabelled utterances, due to the classification loss over training speakers.
no code implementations • 5 Nov 2018 • Hossein Zeinali, Lukas Burget, Johan Rohdin, Themos Stafylakis, Jan Cernocky
Recently, speaker embeddings extracted with deep neural networks became the state-of-the-art method for speaker verification.
no code implementations • 3 Nov 2018 • Themos Stafylakis, Muhammad Haris Khan, Georgios Tzimiropoulos
A further analysis on the utility of target word boundaries is provided, as well as on the capacity of the network in modeling the linguistic context of the target word.
no code implementations • 28 Sep 2018 • Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Georgios Tzimiropoulos, Maja Pantic
Therefore, we can use a CTC loss in combination with an attention-based model to enforce monotonic alignments while, at the same time, removing the conditional independence assumption.
Ranked #5 on Audio-Visual Speech Recognition on LRS2
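Hybrid CTC/attention training of this kind is commonly implemented as a weighted sum of the two per-utterance losses; a minimal sketch (the weight `lam` and the loss values are illustrative, not taken from the paper):

```python
def hybrid_ctc_attention_loss(ctc_loss, attention_loss, lam=0.3):
    # lam trades off the CTC term (which enforces monotonic alignments)
    # against the attention sequence-to-sequence cross-entropy term.
    assert 0.0 <= lam <= 1.0
    return lam * ctc_loss + (1.0 - lam) * attention_loss

print(hybrid_ctc_attention_loss(2.0, 1.0, lam=0.5))  # 1.5
```

In practice both terms are computed from the same shared encoder, so the CTC branch regularizes the attention decoder toward monotonic alignments during training.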
1 code implementation • ECCV 2018 • Themos Stafylakis, Georgios Tzimiropoulos
Visual keyword spotting (KWS) is the problem of estimating whether a text query occurs in a given recording using only video information.
no code implementations • 27 Feb 2018 • Niko Brummer, Anna Silnova, Lukas Burget, Themos Stafylakis
Embeddings in machine learning are low-dimensional representations of complex input patterns, with the property that simple geometric operations like Euclidean distances and dot products can be used for classification and comparison tasks.
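For length-normalized embeddings, the two geometric operations mentioned here are interchangeable for comparison: squared Euclidean distance is a monotone function of the dot product, since ||a - b||^2 = 2 - 2 a.b for unit vectors. A quick numeric check (the vectors are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(5); a /= np.linalg.norm(a)
b = rng.standard_normal(5); b /= np.linalg.norm(b)

# For unit-norm vectors, ||a - b||^2 = 2 - 2 * (a . b), so rankings by
# Euclidean distance and by dot product coincide.
lhs = np.sum((a - b) ** 2)
rhs = 2.0 - 2.0 * np.dot(a, b)
print(np.isclose(lhs, rhs))  # True
```

This equivalence is why simple geometric scoring suffices once embeddings are constrained to the unit sphere.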
2 code implementations • IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018 • Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Feipeng Cai, Georgios Tzimiropoulos, Maja Pantic
In the presence of high levels of noise, the end-to-end audiovisual model significantly outperforms both audio-only models.
Ranked #20 on Lipreading on Lip Reading in the Wild
1 code implementation • 30 Oct 2017 • Themos Stafylakis, Georgios Tzimiropoulos
In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition.
4 code implementations • 12 Mar 2017 • Themos Stafylakis, Georgios Tzimiropoulos
We propose an end-to-end deep learning architecture for word-level visual speech recognition.
Ranked #22 on Lipreading on Lip Reading in the Wild