Search Results for author: Stavros Petridis

Found 41 papers, 12 papers with code

Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation

no code implementations · 6 Jan 2023 · Michał Stypułkowski, Konstantinos Vougioukas, Sen He, Maciej Zięba, Stavros Petridis, Maja Pantic

Talking face generation has historically struggled to produce head movements and natural facial expressions without guidance from additional reference videos.

Talking Face Generation · Video Generation

Jointly Learning Visual and Auditory Speech Representations from Raw Data

no code implementations · 12 Dec 2022 · Alexandros Haliassos, Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Maja Pantic

We observe strong results in low- and high-resource labelled data settings when fine-tuning the visual and auditory encoders resulting from a single pre-training stage, in which the encoders are jointly trained.

Ranked #1 on Speech Recognition on LRS2 (using extra training data)

Audio-Visual Speech Recognition · Lipreading +2

LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders

no code implementations · 20 Nov 2022 · Rodrigo Mira, Buye Xu, Jacob Donley, Anurag Kumar, Stavros Petridis, Vamsi Krishna Ithapu, Maja Pantic

Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements.

Speech Enhancement · Speech Synthesis

Streaming Audio-Visual Speech Recognition with Alignment Regularization

no code implementations · 3 Nov 2022 · Pingchuan Ma, Niko Moritz, Stavros Petridis, Christian Fuegen, Maja Pantic

The audio and the visual encoder neural networks are both based on the conformer architecture, which is made streamable using chunk-wise self-attention (CSA) and causal convolution.

Audio-Visual Speech Recognition · Automatic Speech Recognition +5
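The chunk-wise self-attention mentioned in the abstract above restricts each frame to attending within its own chunk and to earlier chunks, which is what makes the encoder streamable. A minimal sketch of such an attention mask (the chunk size and sequence length here are illustrative assumptions, not the paper's settings):

```python
def chunkwise_causal_mask(seq_len, chunk_size):
    """Boolean mask for chunk-wise self-attention (CSA).

    mask[i][j] is True when query frame i may attend to key frame j:
    each frame sees all frames in its own chunk and in earlier chunks,
    but nothing from future chunks.
    """
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        chunk_end = ((i // chunk_size) + 1) * chunk_size  # end of i's chunk
        for j in range(min(chunk_end, seq_len)):
            mask[i][j] = True
    return mask

mask = chunkwise_causal_mask(seq_len=6, chunk_size=2)
# frame 0 (chunk 0) may attend to frames 0-1; frame 4 (chunk 2) to frames 0-5
```

Variants of CSA also bound the amount of left context per chunk; this sketch keeps the full past for simplicity.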

SS-VAERR: Self-Supervised Apparent Emotional Reaction Recognition from Video

no code implementations · 20 Oct 2022 · Marija Jegorova, Stavros Petridis, Maja Pantic

This work focuses on apparent emotional reaction recognition (AERR) from video-only input, conducted in a self-supervised fashion.

Self-Supervised Learning

Training Strategies for Improved Lip-reading

1 code implementation · 3 Sep 2022 · Pingchuan Ma, Yujiang Wang, Stavros Petridis, Jie Shen, Maja Pantic

In this paper, we systematically investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies, like self-distillation and using word boundary indicators.

Ranked #1 on Lipreading on Lip Reading in the Wild (using extra training data)

Data Augmentation · Lipreading +1

SVTS: Scalable Video-to-Speech Synthesis

no code implementations · 4 May 2022 · Rodrigo Mira, Alexandros Haliassos, Stavros Petridis, Björn W. Schuller, Maja Pantic

Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip movements into the corresponding audio.

Speech Synthesis

Self-supervised Video-centralised Transformer for Video Face Clustering

no code implementations · 24 Mar 2022 · Yujiang Wang, Mingzhi Dong, Jie Shen, Yiming Luo, Yiming Lin, Pingchuan Ma, Stavros Petridis, Maja Pantic

We also investigate face clustering in egocentric videos, a fast-emerging field that has not yet been studied in prior work on face clustering.

Contrastive Learning · Face Clustering

Visual Speech Recognition for Multiple Languages in the Wild

2 code implementations · 26 Feb 2022 · Pingchuan Ma, Stavros Petridis, Maja Pantic

However, these advances are usually due to larger training sets rather than to model design.

Ranked #1 on Lipreading on GRID corpus (mixed-speech) (using extra training data)

Hyperparameter Optimization · Lipreading +2

Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection

1 code implementation · CVPR 2022 · Alexandros Haliassos, Rodrigo Mira, Stavros Petridis, Maja Pantic

One of the most pressing challenges for the detection of face-manipulated videos is generalising to forgery methods not seen during training while remaining effective under common corruptions such as compression.

Domain Generalisation for Apparent Emotional Facial Expression Recognition across Age-Groups

no code implementations · 18 Oct 2021 · Rafael Poyiadzi, Jie Shen, Stavros Petridis, Yujiang Wang, Maja Pantic

We then study how the variety and number of age groups used during training affect generalisation to unseen age groups, and observe that increasing the number of training age groups tends to improve apparent emotional facial expression recognition on unseen age groups.

Facial Expression Recognition (FER)

LiRA: Learning Visual Speech Representations from Audio through Self-supervision

no code implementations · 16 Jun 2021 · Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic

The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning.

Lip Reading · Self-Supervised Learning

End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks

no code implementations · 27 Apr 2021 · Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic

In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video to waveform end-to-end without using any intermediate representation or separate waveform synthesis algorithm.

Lip Reading · Speech Synthesis

DINO: A Conditional Energy-Based GAN for Domain Translation

1 code implementation · ICLR 2021 · Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

Domain translation is the process of transforming data from one domain to another while preserving the common semantics.

Translation

End-to-end Audio-visual Speech Recognition with Conformers

1 code implementation · 12 Feb 2021 · Pingchuan Ma, Stavros Petridis, Maja Pantic

In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and a Convolution-augmented Transformer (Conformer) that can be trained in an end-to-end manner.

Audio-Visual Speech Recognition · Automatic Speech Recognition (ASR) +5
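Hybrid CTC/Attention training, as in the model described above, typically optimises a weighted sum of the CTC loss and the attention decoder's cross-entropy. A minimal sketch of that combination (the weight value below is an illustrative assumption, not the paper's setting):

```python
def hybrid_loss(ctc_loss, attention_loss, ctc_weight=0.1):
    """Weighted objective used in hybrid CTC/attention training.

    ctc_weight balances the monotonic, frame-synchronous CTC objective
    against the attention decoder's cross-entropy. The default of 0.1
    is illustrative; the weight is a tunable hyperparameter.
    """
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss

loss = hybrid_loss(ctc_loss=2.0, attention_loss=1.0, ctc_weight=0.1)
# 0.1 * 2.0 + 0.9 * 1.0 = 1.1
```

The same interpolation is commonly reused at decoding time to rescore hypotheses with both branches.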

Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection

1 code implementation · CVPR 2021 · Alexandros Haliassos, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

Extensive experiments show that this simple approach significantly surpasses the state-of-the-art in generalisation to unseen manipulations and robustness to perturbations, and shed light on the factors responsible for its performance.

Lipreading · speech-recognition +1

Domain Adversarial Neural Networks for Dysarthric Speech Recognition

no code implementations · 7 Oct 2020 · Dominika Woszczyk, Stavros Petridis, David Millard

The results are compared to a speaker-adaptive (SA) model as well as speaker-dependent (SD) and multi-task learning models (MTL).

Multi-Task Learning · speech-recognition +1

Lip-reading with Densely Connected Temporal Convolutional Networks

1 code implementation · 29 Sep 2020 · Pingchuan Ma, Yujiang Wang, Jie Shen, Stavros Petridis, Maja Pantic

In this work, we present the Densely Connected Temporal Convolutional Network (DC-TCN) for lip-reading of isolated words.

Lip Reading

Towards Practical Lipreading with Distilled and Efficient Models

1 code implementation · 13 Jul 2020 · Pingchuan Ma, Brais Martinez, Stavros Petridis, Maja Pantic

However, our most promising lightweight models are on par with the current state-of-the-art while showing reductions of 8.2x and 3.9x in computational cost and number of parameters, respectively, which we hope will enable the deployment of lipreading models in practical applications.

Knowledge Distillation · Lipreading

Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision

no code implementations · 8 Jul 2020 · Abhinav Shukla, Stavros Petridis, Maja Pantic

This enriches the audio encoder with visual information and the encoder can be used for evaluation without the visual modality.

Acoustic Scene Classification · Action Recognition +3

Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?

no code implementations · 4 May 2020 · Abhinav Shukla, Stavros Petridis, Maja Pantic

Our results demonstrate the potential of visual self-supervision for audio feature learning and suggest that joint visual and audio self-supervision leads to more informative audio representations for speech and emotion recognition.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +5

Lipreading using Temporal Convolutional Networks

2 code implementations · 23 Jan 2020 · Brais Martinez, Pingchuan Ma, Stavros Petridis, Maja Pantic

We present results on the largest publicly-available datasets for isolated word recognition in English and Mandarin, LRW and LRW1000, respectively.

Lipreading · Lip Reading

Detecting Adversarial Attacks On Audiovisual Speech Recognition

no code implementations · 18 Dec 2019 · Pingchuan Ma, Stavros Petridis, Maja Pantic

In this work, we propose an efficient and straightforward detection method based on the temporal correlation between audio and video streams.

Audio-Visual Speech Recognition · speech-recognition +1
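The detection idea described above rests on genuine audio and video streams being temporally correlated, while an adversarial perturbation on one modality breaks that correlation. A minimal sketch using plain Pearson correlation on 1-D feature streams (the fixed threshold is an illustrative assumption; the paper's detector is learned from synchronisation features rather than a hand-set cutoff):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length feature streams."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def flag_adversarial(audio_feats, video_feats, threshold=0.5):
    """Flag an input whose audio and video streams correlate poorly:
    low temporal correlation suggests one modality was perturbed."""
    return pearson(audio_feats, video_feats) < threshold

flag_adversarial([1, 2, 3, 4], [1, 2, 3, 4])  # well-synchronised -> False
flag_adversarial([1, 2, 3, 4], [4, 3, 2, 1])  # anti-correlated -> True
```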

Towards Pose-invariant Lip-Reading

no code implementations · 14 Nov 2019 · Shiyang Cheng, Pingchuan Ma, Georgios Tzimiropoulos, Stavros Petridis, Adrian Bulat, Jie Shen, Maja Pantic

The proposed model significantly outperforms previous approaches on non-frontal views while retaining the superior performance on frontal and near frontal mouth views.

Lip Reading

Realistic Speech-Driven Facial Animation with GANs

no code implementations · 14 Jun 2019 · Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

We present an end-to-end system that generates videos of a talking head, using only a still image of a person and an audio clip containing speech, without relying on handcrafted intermediate features.

Audio-Visual Synchronization · Lip Reading

Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition

no code implementations · 5 Jun 2019 · Pingchuan Ma, Stavros Petridis, Maja Pantic

Several audio-visual speech recognition models have been recently proposed which aim to improve the robustness over audio-only models in the presence of noise.

Audio-Visual Speech Recognition · speech-recognition +1

End-to-End Visual Speech Recognition for Small-Scale Datasets

no code implementations · 2 Apr 2019 · Stavros Petridis, Yujiang Wang, Pingchuan Ma, Zuwei Li, Maja Pantic

In this work, we present an end-to-end visual speech recognition system based on fully-connected layers and Long Short-Term Memory (LSTM) networks which is suitable for small-scale datasets.

General Classification · speech-recognition +1

Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture

no code implementations · 28 Sep 2018 · Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Georgios Tzimiropoulos, Maja Pantic

Therefore, we could use a CTC loss in combination with an attention-based model in order to force monotonic alignments and at the same time get rid of the conditional independence assumption.

Audio-Visual Speech Recognition · Automatic Speech Recognition (ASR) +3
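The monotonic alignments that CTC enforces, as noted in the abstract above, come from collapsing a per-frame label path into a label sequence. A minimal sketch of the standard collapse rule (merge consecutive repeats, then drop blanks); the example string is illustrative:

```python
def ctc_collapse(path, blank="-"):
    """Map a frame-level CTC path to its label sequence.

    Consecutive repeats are merged first, then blank symbols are
    dropped; a blank between identical labels keeps them distinct.
    """
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)

ctc_collapse("--cc-aa-t-")  # -> "cat"
ctc_collapse("cc-c")        # -> "cc" (blank separates the repeated label)
```

Because each frame's label is emitted independently under CTC, the attention branch is what removes the conditional independence assumption in the hybrid model.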

Transfer Learning for Action Unit Recognition

no code implementations · 19 Jul 2018 · Yen Khye Lim, Zukang Liao, Stavros Petridis, Maja Pantic

This paper presents a classifier ensemble for Facial Expression Recognition (FER) based on models derived from transfer learning.

Action Unit Detection · Facial Action Unit Detection +2

End-to-End Speech-Driven Facial Animation with Temporal GANs

1 code implementation · 23 May 2018 · Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

To the best of our knowledge, this is the first method capable of generating subject independent realistic videos directly from raw audio.

Lip Reading

A real-time and unsupervised face Re-Identification system for Human-Robot Interaction

1 code implementation · 10 Apr 2018 · Yujiang Wang, Jie Shen, Stavros Petridis, Maja Pantic

In this paper, we present an effective and unsupervised face Re-ID system which simultaneously re-identifies multiple faces for HRI.

Face Recognition · Online Clustering

Visual-Only Recognition of Normal, Whispered and Silent Speech

no code implementations · 18 Feb 2018 · Stavros Petridis, Jie Shen, Doruk Cetin, Maja Pantic

We show that an absolute decrease in classification rate of up to 3.7% is observed when training and testing on normal and whispered speech, respectively, and vice versa.

speech-recognition · Visual Speech Recognition

End-to-end Audiovisual Speech Recognition

2 code implementations · 18 Feb 2018 · Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Feipeng Cai, Georgios Tzimiropoulos, Maja Pantic

In presence of high levels of noise, the end-to-end audiovisual model significantly outperforms both audio-only models.

Lipreading · speech-recognition +1

End-to-End Audiovisual Fusion with LSTMs

no code implementations · 12 Sep 2017 · Stavros Petridis, Yujiang Wang, Zuwei Li, Maja Pantic

To the best of our knowledge, this is the first audiovisual fusion model which simultaneously learns to extract features directly from the pixels and spectrograms and perform classification of speech and nonlinguistic vocalisations.

Classification · General Classification +2

End-to-End Multi-View Lipreading

no code implementations · 1 Sep 2017 · Stavros Petridis, Yujiang Wang, Zuwei Li, Maja Pantic

To the best of our knowledge, this is the first model which simultaneously learns to extract features directly from the pixels and performs visual speech classification from multiple views and also achieves state-of-the-art performance.

General Classification · Lipreading

Local Deep Neural Networks for Age and Gender Classification

no code implementations · 24 Mar 2017 · Zukang Liao, Stavros Petridis, Maja Pantic

We tested the proposed modified local deep neural networks approach on the LFW and Adience databases for the task of gender and age classification.

Age And Gender Classification · Classification +1

End-To-End Visual Speech Recognition With LSTMs

no code implementations · 20 Jan 2017 · Stavros Petridis, Zuwei Li, Maja Pantic

Recently, several deep learning approaches have been presented which automatically extract features from the mouth images and aim to replace the feature extraction stage.

Classification · General Classification +2
