Search Results for author: Maja Pantic

Found 101 papers, 26 papers with code

BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition

1 code implementation2 Apr 2024 Alexandros Haliassos, Andreas Zinonos, Rodrigo Mira, Stavros Petridis, Maja Pantic

In this work, we propose BRAVEn, an extension to the recent RAVEn method, which learns speech representations entirely from raw audio-visual data.

speech-recognition Speech Recognition

Audio-visual video-to-speech synthesis with synthesized input audio

no code implementations31 Jul 2023 Triantafyllos Kefalas, Yannis Panagakis, Maja Pantic

The implicit assumption of this task is that the sound signal is either missing or contains a high amount of noise/corruption such that it is not useful for processing.

Speech Synthesis

SparseVSR: Lightweight and Noise Robust Visual Speech Recognition

no code implementations10 Jul 2023 Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Alexandros Haliassos, Stavros Petridis, Maja Pantic

We evaluate our 50% sparse model on 7 different visual noise types and achieve an overall absolute improvement of more than 2% WER compared to the dense equivalent.

speech-recognition Visual Speech Recognition

Large-scale unsupervised audio pre-training for video-to-speech synthesis

no code implementations27 Jun 2023 Triantafyllos Kefalas, Yannis Panagakis, Maja Pantic

Most established approaches to date involve a two-step process, whereby an intermediate representation from the video, such as a spectrogram, is extracted first and then passed to a vocoder to produce the raw audio.

speech-recognition Speech Recognition +1

Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models

1 code implementation15 May 2023 Antoni Bigata Casademunt, Rodrigo Mira, Nikita Drobyshev, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

Speech-driven animation has gained significant traction in recent years, with current methods achieving near-photorealistic results.

Face Generation

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

no code implementations CVPR 2023 Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen

Furthermore, when combined with large-scale pseudo-labeled audio-visual data, SynthVSR yields a new state-of-the-art VSR WER of 16.9% using publicly available data only, surpassing the recent state-of-the-art approaches trained with 29 times more non-public machine-transcribed video data (90,000 hours).

Lip Reading speech-recognition +1

Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels

1 code implementation25 Mar 2023 Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic

Recently, the performance of automatic, visual, and audio-visual speech recognition (ASR, VSR, and AV-ASR, respectively) has been substantially improved, mainly due to the use of larger models and training sets.

Audio-Visual Speech Recognition Automatic Speech Recognition +4

Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation

no code implementations6 Jan 2023 Michał Stypułkowski, Konstantinos Vougioukas, Sen He, Maciej Zięba, Stavros Petridis, Maja Pantic

Talking face generation has historically struggled to produce head movements and natural facial expressions without guidance from additional reference videos.

Talking Face Generation Video Generation

Jointly Learning Visual and Auditory Speech Representations from Raw Data

1 code implementation12 Dec 2022 Alexandros Haliassos, Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Maja Pantic

We observe strong results in low- and high-resource labelled data settings when fine-tuning the visual and auditory encoders resulting from a single pre-training stage, in which the encoders are jointly trained.

 Ranked #1 on Speech Recognition on LRS2 (using extra training data)

Audio-Visual Speech Recognition Lipreading +2

LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders

no code implementations20 Nov 2022 Rodrigo Mira, Buye Xu, Jacob Donley, Anurag Kumar, Stavros Petridis, Vamsi Krishna Ithapu, Maja Pantic

Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements.

Speech Enhancement Speech Synthesis

FAN-Trans: Online Knowledge Distillation for Facial Action Unit Detection

no code implementations11 Nov 2022 Jing Yang, Jie Shen, Yiming Lin, Yordan Hristov, Maja Pantic

Our model consists of a hybrid network of convolution and transformer blocks to learn per-AU features and to model AU co-occurrences.

Action Unit Detection Face Alignment +2

Streaming Audio-Visual Speech Recognition with Alignment Regularization

no code implementations3 Nov 2022 Pingchuan Ma, Niko Moritz, Stavros Petridis, Christian Fuegen, Maja Pantic

In this work, we propose a streaming AV-ASR system based on a hybrid connectionist temporal classification (CTC)/attention neural network architecture.

Audio-Visual Speech Recognition Automatic Speech Recognition +5

SS-VAERR: Self-Supervised Apparent Emotional Reaction Recognition from Video

no code implementations20 Oct 2022 Marija Jegorova, Stavros Petridis, Maja Pantic

This work focuses on the apparent emotional reaction recognition (AERR) from the video-only input, conducted in a self-supervised fashion.

Self-Supervised Learning

Training Strategies for Improved Lip-reading

1 code implementation3 Sep 2022 Pingchuan Ma, Yujiang Wang, Stavros Petridis, Jie Shen, Maja Pantic

In this paper, we systematically investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies, like self-distillation and using word boundary indicators.

 Ranked #1 on Lipreading on Lip Reading in the Wild (using extra training data)

Data Augmentation Lipreading +1

SVTS: Scalable Video-to-Speech Synthesis

2 code implementations4 May 2022 Rodrigo Mira, Alexandros Haliassos, Stavros Petridis, Björn W. Schuller, Maja Pantic

Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip movements into the corresponding audio.

Speech Synthesis

Self-supervised Video-centralised Transformer for Video Face Clustering

no code implementations24 Mar 2022 Yujiang Wang, Mingzhi Dong, Jie Shen, Yiming Luo, Yiming Lin, Pingchuan Ma, Stavros Petridis, Maja Pantic

We also investigate face clustering in egocentric videos, a fast-emerging field that has not been studied yet in works related to face clustering.

Clustering Contrastive Learning +1

Visual Speech Recognition for Multiple Languages in the Wild

2 code implementations26 Feb 2022 Pingchuan Ma, Stavros Petridis, Maja Pantic

However, these advances are usually due to the larger training sets rather than the model design.

 Ranked #1 on Lipreading on GRID corpus (mixed-speech) (using extra training data)

Hyperparameter Optimization Lipreading +2

Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection

1 code implementation CVPR 2022 Alexandros Haliassos, Rodrigo Mira, Stavros Petridis, Maja Pantic

One of the most pressing challenges for the detection of face-manipulated videos is generalising to forgery methods not seen during training while remaining effective under common corruptions such as compression.

DeepFake Detection

Defensive Tensorization

no code implementations26 Oct 2021 Adrian Bulat, Jean Kossaifi, Sourav Bhattacharya, Yannis Panagakis, Timothy Hospedales, Georgios Tzimiropoulos, Nicholas D Lane, Maja Pantic

We propose defensive tensorization, an adversarial defence technique that leverages a latent high-order factorization of the network.

Audio Classification Image Classification

Domain Generalisation for Apparent Emotional Facial Expression Recognition across Age-Groups

no code implementations18 Oct 2021 Rafael Poyiadzi, Jie Shen, Stavros Petridis, Yujiang Wang, Maja Pantic

We then study the effect of variety and number of age-groups used during training on generalisation to unseen age-groups and observe that an increase in the number of training age-groups tends to increase the apparent emotional facial expression recognition performance on unseen age-groups.

Facial Expression Recognition Facial Expression Recognition (FER)

EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments

1 code implementation9 Jul 2021 Jacob Donley, Vladimir Tourbabin, Jung-Suk Lee, Mark Broyles, Hao Jiang, Jie Shen, Maja Pantic, Vamsi Krishna Ithapu, Ravish Mehra

In this work, we describe, evaluate and release a dataset that contains over 5 hours of multi-modal data useful for training and testing algorithms for the application of improving conversations for an AR glasses wearer.

Speech Enhancement

FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild

1 code implementation21 Jun 2021 Yiming Lin, Jie Shen, Yujiang Wang, Maja Pantic

To evaluate our method on in-the-wild data, we also introduce a new challenging large-scale benchmark called IMDB-Clean.

Age Estimation Constrained Clustering +1

LiRA: Learning Visual Speech Representations from Audio through Self-supervision

no code implementations16 Jun 2021 Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic

The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning.

Lip Reading Self-Supervised Learning +1

End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks

no code implementations27 Apr 2021 Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic

In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video to waveform end-to-end without using any intermediate representation or separate waveform synthesis algorithm.

Lip Reading Speech Synthesis

DINO: A Conditional Energy-Based GAN for Domain Translation

1 code implementation ICLR 2021 Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

Domain translation is the process of transforming data from one domain to another while preserving the common semantics.

Translation

End-to-end Audio-visual Speech Recognition with Conformers

3 code implementations12 Feb 2021 Pingchuan Ma, Stavros Petridis, Maja Pantic

In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and Convolution-augmented transformer (Conformer), that can be trained in an end-to-end manner.

Audio-Visual Speech Recognition Automatic Speech Recognition (ASR) +6

RoI Tanh-polar Transformer Network for Face Parsing in the Wild

2 code implementations4 Feb 2021 Yiming Lin, Jie Shen, Yujiang Wang, Maja Pantic

Face parsing aims to predict pixel-wise labels for facial components of a target face in an image.

Face Parsing

Cauchy-Schwarz Regularized Autoencoder

no code implementations6 Jan 2021 Linh Tran, Maja Pantic, Marc Peter Deisenroth

To perform efficient inference for GMM priors, we introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.

Clustering Density Estimation

Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection

1 code implementation CVPR 2021 Alexandros Haliassos, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

Extensive experiments show that this simple approach significantly surpasses the state-of-the-art in terms of generalisation to unseen manipulations and robustness to perturbations, and also shed light on the factors responsible for its performance.

DeepFake Detection Lipreading +2

Lip-reading with Densely Connected Temporal Convolutional Networks

1 code implementation29 Sep 2020 Pingchuan Ma, Yujiang Wang, Jie Shen, Stavros Petridis, Maja Pantic

In this work, we present the Densely Connected Temporal Convolutional Network (DC-TCN) for lip-reading of isolated words.

Lip Reading

Multilinear Latent Conditioning for Generating Unseen Attribute Combinations

no code implementations ICML 2020 Markos Georgopoulos, Grigorios Chrysos, Maja Pantic, Yannis Panagakis

Deep generative models rely on their inductive bias to facilitate generalization, especially for problems with high dimensional data, like images.

Attribute Inductive Bias

Towards Practical Lipreading with Distilled and Efficient Models

1 code implementation13 Jul 2020 Pingchuan Ma, Brais Martinez, Stavros Petridis, Maja Pantic

However, our most promising lightweight models are on par with the current state-of-the-art while showing a reduction of 8.2x and 3.9x in terms of computational cost and number of parameters, respectively, which we hope will enable the deployment of lipreading models in practical applications.

Knowledge Distillation Lipreading

Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision

no code implementations8 Jul 2020 Abhinav Shukla, Stavros Petridis, Maja Pantic

This enriches the audio encoder with visual information and the encoder can be used for evaluation without the visual modality.

Acoustic Scene Classification Action Recognition +3

Enhancing Facial Data Diversity with Style-based Face Aging

no code implementations6 Jun 2020 Markos Georgopoulos, James Oldfield, Mihalis A. Nicolaou, Yannis Panagakis, Maja Pantic

By evaluating on several age-annotated datasets in both single- and cross-database experiments, we show that the proposed method outperforms state-of-the-art algorithms for age transfer, especially in the case of age groups that lie in the tails of the label distribution.

Data Augmentation

Investigating Bias in Deep Face Analysis: The KANFace Dataset and Empirical Study

no code implementations15 May 2020 Markos Georgopoulos, Yannis Panagakis, Maja Pantic

In this work, we investigate the demographic bias of deep learning models in face recognition, age estimation, gender recognition and kinship verification.

Age Estimation Face Recognition +1

Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?

no code implementations4 May 2020 Abhinav Shukla, Stavros Petridis, Maja Pantic

Our results demonstrate the potential of visual self-supervision for audio feature learning and suggest that joint visual and audio self-supervision leads to more informative audio representations for speech and emotion recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Toward fast and accurate human pose estimation via soft-gated skip connections

3 code implementations25 Feb 2020 Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic

In addition, with a reduction of 3x in model size and complexity, we show no decrease in performance when compared to the original HourGlass network.

Pose Estimation

Lipreading using Temporal Convolutional Networks

2 code implementations23 Jan 2020 Brais Martinez, Pingchuan Ma, Stavros Petridis, Maja Pantic

We present results on the largest publicly-available datasets for isolated word recognition in English and Mandarin, LRW and LRW1000, respectively.

Lipreading Lip Reading

Detecting Adversarial Attacks On Audiovisual Speech Recognition

no code implementations18 Dec 2019 Pingchuan Ma, Stavros Petridis, Maja Pantic

In this work, we propose an efficient and straightforward detection method based on the temporal correlation between audio and video streams.

Audio-Visual Speech Recognition speech-recognition +1

Towards Pose-invariant Lip-Reading

no code implementations14 Nov 2019 Shiyang Cheng, Pingchuan Ma, Georgios Tzimiropoulos, Stavros Petridis, Adrian Bulat, Jie Shen, Maja Pantic

The proposed model significantly outperforms previous approaches on non-frontal views while retaining the superior performance on frontal and near frontal mouth views.

Lip Reading

Shape Constrained Network for Eye Segmentation in the Wild

no code implementations11 Oct 2019 Bingnan Luo, Jie Shen, Shiyang Cheng, Yujiang Wang, Maja Pantic

Specifically, we learn the shape prior from our dataset using VAE-GAN, and leverage the pre-trained encoder and discriminator to regularise the training of SegNet.

Segmentation Semantic Segmentation

Fast and Effective Adaptation of Facial Action Unit Detection Deep Model

no code implementations26 Sep 2019 Mihee Lee, Ognjen Rudovic, Vladimir Pavlovic, Maja Pantic

In this paper, we propose a deep learning approach for facial AU detection that can quickly and easily adapt to a new AU or target subject by leveraging only a few labeled samples from the new task (either an AU or subject).

Action Unit Detection Facial Action Unit Detection +1

Defensive Tensorization: Randomized Tensor Parametrization for Robust Neural Networks

no code implementations25 Sep 2019 Adrian Bulat, Jean Kossaifi, Sourav Bhattacharya, Yannis Panagakis, Georgios Tzimiropoulos, Nicholas D. Lane, Maja Pantic

As deep neural networks become widely adopted for solving most problems in computer vision and audio-understanding, there are rising concerns about their potential vulnerability.

Adversarial Defense Audio Classification +1

AVEC 2019 Workshop and Challenge: State-of-Mind, Detecting Depression with AI, and Cross-Cultural Affect Recognition

no code implementations10 Jul 2019 Fabien Ringeval, Björn Schuller, Michel Valstar, Nicholas Cummins, Roddy Cowie, Leili Tavabi, Maximilian Schmitt, Sina Alisamir, Shahin Amiriparian, Eva-Maria Messner, Siyang Song, Shuo Liu, Ziping Zhao, Adria Mallol-Ragolta, Zhao Ren, Mohammad Soleymani, Maja Pantic

The Audio/Visual Emotion Challenge and Workshop (AVEC 2019) "State-of-Mind, Detecting Depression with AI, and Cross-cultural Affect Recognition" is the ninth competition event aimed at the comparison of multimedia processing and machine learning methods for automatic audiovisual health and emotion analysis, with all participants competing strictly under the same conditions.

Emotion Recognition

Dynamic Face Video Segmentation via Reinforcement Learning

no code implementations CVPR 2020 Yujiang Wang, Mingzhi Dong, Jie Shen, Yang Wu, Shiyang Cheng, Maja Pantic

To the best of our knowledge, this is the first work to use reinforcement learning for online key-frame decision in dynamic video segmentation, and also the first work on its application on face videos.

reinforcement-learning Reinforcement Learning (RL) +4

Realistic Speech-Driven Facial Animation with GANs

no code implementations14 Jun 2019 Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

We present an end-to-end system that generates videos of a talking head, using only a still image of a person and an audio clip containing speech, without relying on handcrafted intermediate features.

Audio-Visual Synchronization Lip Reading

Factorized Higher-Order CNNs with an Application to Spatio-Temporal Emotion Estimation

no code implementations CVPR 2020 Jean Kossaifi, Antoine Toisoul, Adrian Bulat, Yannis Panagakis, Timothy Hospedales, Maja Pantic

To alleviate this, one approach is to apply low-rank tensor decompositions to convolution kernels in order to compress the network and reduce its number of parameters.

Emotion Recognition Image Classification

Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition

no code implementations5 Jun 2019 Pingchuan Ma, Stavros Petridis, Maja Pantic

Several audio-visual speech recognition models have been recently proposed which aim to improve the robustness over audio-only models in the presence of noise.

Audio-Visual Speech Recognition speech-recognition +1

Incremental multi-domain learning with network latent tensor factorization

no code implementations12 Apr 2019 Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic

Adapting the learned classification to new domains is a hard problem for at least three reasons: (1) the new domains and the tasks might be drastically different; (2) there might be a very limited amount of annotated data on the new domain; and (3) full training of a new model for each new task is prohibitive in terms of computation and memory, due to the sheer number of parameters of deep CNNs.

General Classification Image Classification +2

Improved training of binary networks for human pose estimation and image recognition

1 code implementation11 Apr 2019 Adrian Bulat, Georgios Tzimiropoulos, Jean Kossaifi, Maja Pantic

Big neural networks trained on large datasets have advanced the state-of-the-art for a large variety of challenging problems, improving performance by a large margin.

Binarization Classification with Binary Neural Network +4

End-to-End Visual Speech Recognition for Small-Scale Datasets

no code implementations2 Apr 2019 Stavros Petridis, Yujiang Wang, Pingchuan Ma, Zuwei Li, Maja Pantic

In this work, we present an end-to-end visual speech recognition system based on fully-connected layers and Long Short-Term Memory (LSTM) networks which is suitable for small-scale datasets.

General Classification speech-recognition +1

MeshGAN: Non-linear 3D Morphable Models of Faces

no code implementations25 Mar 2019 Shiyang Cheng, Michael Bronstein, Yuxiang Zhou, Irene Kotsia, Maja Pantic, Stefanos Zafeiriou

Generative Adversarial Networks (GANs) are currently the method of choice for generating visual data.

SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild

no code implementations9 Jan 2019 Jean Kossaifi, Robert Walecki, Yannis Panagakis, Jie Shen, Maximilian Schmitt, Fabien Ringeval, Jing Han, Vedhas Pandit, Antoine Toisoul, Bjorn Schuller, Kam Star, Elnar Hajiyev, Maja Pantic

Natural human-computer interaction and audio-visual human behaviour sensing systems that achieve robust performance in the wild are needed more than ever, as digital devices are increasingly becoming an indispensable part of our lives.

Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture

no code implementations28 Sep 2018 Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Georgios Tzimiropoulos, Maja Pantic

Therefore, we could use a CTC loss in combination with an attention-based model in order to force monotonic alignments and at the same time get rid of the conditional independence assumption.

Audio-Visual Speech Recognition Automatic Speech Recognition (ASR) +3

Face Mask Extraction in Video Sequence

no code implementations24 Jul 2018 Yujiang Wang, Bingnan Luo, Jie Shen, Maja Pantic

Inspired by the recent development of deep network-based methods in semantic image segmentation, we introduce an end-to-end trainable model for face mask extraction in video sequences.

Image Segmentation Segmentation +1

Transfer Learning for Action Unit Recognition

no code implementations19 Jul 2018 Yen Khye Lim, Zukang Liao, Stavros Petridis, Maja Pantic

This paper presents a classifier ensemble for Facial Expression Recognition (FER) based on models derived from transfer learning.

Action Unit Detection Facial Action Unit Detection +3

4DFAB: A Large Scale 4D Database for Facial Expression Analysis and Biometric Applications

no code implementations CVPR 2018 Shiyang Cheng, Irene Kotsia, Maja Pantic, Stefanos Zafeiriou

The progress we are currently witnessing in many computer vision applications, including automatic face analysis, would not have been possible without tremendous efforts in collecting and annotating large scale visual databases.

Facial Expression Recognition Facial Expression Recognition (FER)

MobiFace: A Novel Dataset for Mobile Face Tracking in the Wild

1 code implementation24 May 2018 Yiming Lin, Shiyang Cheng, Jie Shen, Maja Pantic

36 state-of-the-art trackers, including facial landmark trackers, generic object trackers and trackers that we have fine-tuned or improved, are evaluated.

Face Detection Object Tracking +1

End-to-End Speech-Driven Facial Animation with Temporal GANs

1 code implementation23 May 2018 Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

To the best of our knowledge, this is the first method capable of generating subject independent realistic videos directly from raw audio.

Lip Reading

A real-time and unsupervised face Re-Identification system for Human-Robot Interaction

1 code implementation10 Apr 2018 Yujiang Wang, Jie Shen, Stavros Petridis, Maja Pantic

In this paper, we present an effective and unsupervised face Re-ID system which simultaneously re-identifies multiple faces for HRI.

Clustering Face Recognition +1

Visual-Only Recognition of Normal, Whispered and Silent Speech

no code implementations18 Feb 2018 Stavros Petridis, Jie Shen, Doruk Cetin, Maja Pantic

We show that an absolute decrease in classification rate of up to 3.7% is observed when training and testing on normal and whispered, respectively, and vice versa.

speech-recognition Visual Speech Recognition

End-to-end Audiovisual Speech Recognition

2 code implementations18 Feb 2018 Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Feipeng Cai, Georgios Tzimiropoulos, Maja Pantic

In the presence of high levels of noise, the end-to-end audiovisual model significantly outperforms both audio-only models.

Lipreading speech-recognition +1

Modeling of Facial Aging and Kinship: A Survey

no code implementations13 Feb 2018 Markos Georgopoulos, Yannis Panagakis, Maja Pantic

Computational facial models that capture properties of facial cues related to aging and kinship increasingly attract the attention of the research community, enabling the development of reliable methods for age progression, age estimation, age-invariant facial characterization, and kinship verification from visual data.

Age Estimation Kinship Verification

GAGAN: Geometry-Aware Generative Adversarial Networks

no code implementations CVPR 2018 Jean Kossaifi, Linh Tran, Yannis Panagakis, Maja Pantic

Deep generative models learned through adversarial training have become increasingly popular for their ability to generate naturalistic image textures.

Face Generation

End-to-End Audiovisual Fusion with LSTMs

no code implementations12 Sep 2017 Stavros Petridis, Yujiang Wang, Zuwei Li, Maja Pantic

To the best of our knowledge, this is the first audiovisual fusion model which simultaneously learns to extract features directly from the pixels and spectrograms and perform classification of speech and nonlinguistic vocalisations.

Classification General Classification +2

End-to-End Multi-View Lipreading

no code implementations1 Sep 2017 Stavros Petridis, Yujiang Wang, Zuwei Li, Maja Pantic

To the best of our knowledge, this is the first model which simultaneously learns to extract features directly from the pixels and performs visual speech classification from multiple views and also achieves state-of-the-art performance.

General Classification Lipreading

Deep Structured Learning for Facial Action Unit Intensity Estimation

no code implementations CVPR 2017 Robert Walecki, Ognjen Rudovic, Vladimir Pavlovic, Björn Schuller, Maja Pantic

The goal of this paper is to model these structures and estimate complex feature representations simultaneously by combining conditional random field (CRF) encoded AU dependencies with deep learning.

Local Deep Neural Networks for Age and Gender Classification

no code implementations24 Mar 2017 Zukang Liao, Stavros Petridis, Maja Pantic

We tested the proposed modified local deep neural networks approach on the LFW and Adience databases for the task of gender and age classification.

Age And Gender Classification Age Classification +3

FERA 2017 - Addressing Head Pose in the Third Facial Expression Recognition and Analysis Challenge

no code implementations14 Feb 2017 Michel F. Valstar, Enrique Sánchez-Lozano, Jeffrey F. Cohn, László A. Jeni, Jeffrey M. Girard, Zheng Zhang, Lijun Yin, Maja Pantic

The FG 2017 Facial Expression Recognition and Analysis challenge (FERA 2017) extends FERA 2015 to the estimation of Action Units occurrence and intensity under different camera views.

Benchmarking Facial Action Unit Detection +4

End-To-End Visual Speech Recognition With LSTMs

no code implementations20 Jan 2017 Stavros Petridis, Zuwei Li, Maja Pantic

Recently, several deep learning approaches have been presented which automatically extract features from the mouth images and aim to replace the feature extraction stage.

Classification General Classification +2

TensorLy: Tensor Learning in Python

1 code implementation29 Oct 2016 Jean Kossaifi, Yannis Panagakis, Anima Anandkumar, Maja Pantic

In addition, using the deep-learning frameworks as backend allows users to easily design and train deep tensorized neural networks.

Multi-instance Dynamic Ordinal Random Fields for Weakly-Supervised Pain Intensity Estimation

no code implementations6 Sep 2016 Adria Ruiz, Ognjen Rudovic, Xavier Binefa, Maja Pantic

In this paper, we address the Multi-Instance-Learning (MIL) problem when bag labels are naturally represented as ordinal variables (Multi-Instance-Ordinal Regression).

Temporal Sequences

Variational Gaussian Process Auto-Encoder for Ordinal Prediction of Facial Action Units

no code implementations16 Aug 2016 Stefanos Eleftheriadis, Ognjen Rudovic, Marc P. Deisenroth, Maja Pantic

In particular, we introduce GP encoders to project multiple observed features onto a latent space, while GP decoders are responsible for reconstructing the original features.

Copula Ordinal Regression for Joint Estimation of Facial Action Unit Intensity

no code implementations CVPR 2016 Robert Walecki, Ognjen Rudovic, Vladimir Pavlovic, Maja Pantic

Joint modeling of the intensity of facial action units (AUs) from face images is challenging due to the large number of AUs (30+) and their intensity levels (6).

regression

AVEC 2016 - Depression, Mood, and Emotion Recognition Workshop and Challenge

no code implementations5 May 2016 Michel Valstar, Jonathan Gratch, Bjorn Schuller, Fabien Ringeval, Denis Lalanne, Mercedes Torres Torres, Stefan Scherer, Guiota Stratou, Roddy Cowie, Maja Pantic

The Audio/Visual Emotion Challenge and Workshop (AVEC 2016) "Depression, Mood and Emotion" will be the sixth competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and physiological depression and emotion analysis, with all participants competing under strictly the same conditions.

Emotion Recognition

Gaussian Process Domain Experts for Model Adaptation in Facial Behavior Analysis

no code implementations11 Apr 2016 Stefanos Eleftheriadis, Ognjen Rudovic, Marc P. Deisenroth, Maja Pantic

The adaptation of the classifier is facilitated in a probabilistic fashion by conditioning the target expert on multiple source experts.

Domain Adaptation Gaussian Processes +1

Robust Statistical Face Frontalization

no code implementations ICCV 2015 Christos Sagonas, Yannis Panagakis, Stefanos Zafeiriou, Maja Pantic

The proposed method is assessed in frontal face reconstruction, face landmark localization, pose-invariant face recognition, and face verification in unconstrained conditions.

Face Alignment Face Recognition +3

Variable-state Latent Conditional Random Fields for Facial Expression Recognition and Action Unit Detection

no code implementations13 Oct 2015 Robert Walecki, Ognjen Rudovic, Vladimir Pavlovic, Maja Pantic

For instance, in the case of AU detection, the goal is to discriminate between the segments of an image sequence in which this AU is active or inactive.

Action Unit Detection Facial Expression Recognition +1

Latent Trees for Estimating Intensity of Facial Action Units

no code implementations CVPR 2015 Sebastian Kaltwang, Sinisa Todorovic, Maja Pantic

Our model is a latent tree (LT) that represents input features of facial landmark points and FAU intensities as leaf nodes, and encodes their higher-order dependencies with latent nodes at tree levels closer to the root.

Face frontalization for Alignment and Recognition

no code implementations 3 Feb 2015 Christos Sagonas, Yannis Panagakis, Stefanos Zafeiriou, Maja Pantic

The proposed method is assessed in frontal face reconstruction (pose correction), face landmark localization, and pose-invariant face recognition and verification by conducting experiments on 6 facial image databases.

Face Recognition Face Reconstruction +1

RAPS: Robust and Efficient Automatic Construction of Person-Specific Deformable Models

no code implementations CVPR 2014 Christos Sagonas, Yannis Panagakis, Stefanos Zafeiriou, Maja Pantic

Next, to correct the fittings of a generic model, image congealing (i.e., batch image alignment) is performed by employing only the learnt orthonormal subspace.

Face Alignment Image Reconstruction

Merging SVMs with Linear Discriminant Analysis: A Combined Model

no code implementations CVPR 2014 Symeon Nikitidis, Stefanos Zafeiriou, Maja Pantic

A key problem often encountered by many learning algorithms in computer vision dealing with high-dimensional data is the so-called "curse of dimensionality", which arises when the available training samples are fewer than the input feature space dimensionality.

Dimensionality Reduction Object Recognition

Gauss-Newton Deformable Part Models for Face Alignment in-the-Wild

no code implementations CVPR 2014 Georgios Tzimiropoulos, Maja Pantic

To address this limitation, in this paper, we propose to jointly optimize a part-based, trained in-the-wild, flexible appearance model along with a global shape model which results in a joint translational motion model for the model parts via Gauss-Newton (GN) optimization.

Face Alignment

Incremental Face Alignment in the Wild

no code implementations CVPR 2014 Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic

We propose very efficient strategies to update the model and we show that it is possible to automatically construct robust discriminative person and imaging condition specific models 'in-the-wild' that outperform state-of-the-art generic face alignment strategies.

Face Alignment

Robust Discriminative Response Map Fitting with Constrained Local Models

no code implementations CVPR 2013 Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic

We present a novel discriminative regression based approach for the Constrained Local Models (CLMs) framework, referred to as the Discriminative Response Map Fitting (DRMF) method, which shows impressive performance in the generic face fitting scenario.

regression

Robust Canonical Time Warping for the Alignment of Grossly Corrupted Sequences

no code implementations CVPR 2013 Yannis Panagakis, Mihalis A. Nicolaou, Stefanos Zafeiriou, Maja Pantic

The superiority of the proposed method against the state-of-the-art time alignment methods, namely the canonical time warping and the generalized time warping, is indicated by the experimental results on both synthetic and real datasets.

Compressive Sensing Dynamic Time Warping

A Unified Framework for Probabilistic Component Analysis

no code implementations 13 Mar 2013 Mihalis A. Nicolaou, Stefanos Zafeiriou, Maja Pantic

We present a unifying framework which reduces the construction of probabilistic component analysis techniques to a mere selection of the latent neighbourhood, thus providing an elegant and principled framework for creating novel component analysis models as well as constructing probabilistic equivalents of deterministic component analysis methods.

Heteroscedastic Conditional Ordinal Random Fields for Pain Intensity Estimation from Facial Images

no code implementations 22 Jan 2013 Ognjen Rudovic, Maja Pantic, Vladimir Pavlovic

We propose a novel method for automatic pain intensity estimation from facial images based on the framework of kernel Conditional Ordinal Random Fields (KCORF).

General Classification regression
