Search Results for author: Alexandros Haliassos

Found 9 papers, 7 papers with code

BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition

1 code implementation · 2 Apr 2024 · Alexandros Haliassos, Andreas Zinonos, Rodrigo Mira, Stavros Petridis, Maja Pantic

In this work, we propose BRAVEn, an extension to the recent RAVEn method, which learns speech representations entirely from raw audio-visual data.

Speech Recognition

SparseVSR: Lightweight and Noise Robust Visual Speech Recognition

no code implementations · 10 Jul 2023 · Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Alexandros Haliassos, Stavros Petridis, Maja Pantic

We evaluate our 50% sparse model on 7 different visual noise types and achieve an overall absolute improvement of more than 2% WER compared to the dense equivalent.

Speech Recognition · Visual Speech Recognition
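The "50% sparse model" above suggests unstructured weight pruning. Below is a minimal, hedged sketch of how such sparsity is commonly obtained with magnitude pruning in PyTorch; the toy model and layer sizes are placeholders, and this illustrates the general technique rather than SparseVSR's exact pruning procedure.

```python
# Illustrative only: a generic recipe for a 50% unstructured-sparse model via
# magnitude pruning with torch.nn.utils.prune. This shows the general
# technique behind sparse models, not SparseVSR's exact schedule.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for a VSR backbone (hypothetical toy model).
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 500))

# Zero out the 50% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the mask into the weights

# Verify the overall weight sparsity.
total = sum(p.numel() for n, p in model.named_parameters() if n.endswith("weight"))
zeros = sum((p == 0).sum().item() for n, p in model.named_parameters() if n.endswith("weight"))
print(f"weight sparsity: {zeros / total:.2%}")
```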

Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels

1 code implementation · 25 Mar 2023 · Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic

Recently, the performance of automatic, visual, and audio-visual speech recognition (ASR, VSR, and AV-ASR, respectively) has been substantially improved, mainly due to the use of larger models and training sets.

Audio-Visual Speech Recognition · Automatic Speech Recognition · +4

Jointly Learning Visual and Auditory Speech Representations from Raw Data

1 code implementation · 12 Dec 2022 · Alexandros Haliassos, Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Maja Pantic

We observe strong results in low- and high-resource labelled data settings when fine-tuning the visual and auditory encoders resulting from a single pre-training stage, in which the encoders are jointly trained.

 Ranked #1 on Speech Recognition on LRS2 (using extra training data)

Audio-Visual Speech Recognition · Lipreading · +2
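As a reading aid, here is a minimal sketch of the kind of joint pre-training the abstract describes: student encoders for each modality are trained to predict targets produced by momentum (EMA) teachers, with each modality predicting the other's representations. All module names, sizes, predictors, and the cosine loss are assumptions for illustration; consult the released code for the actual objective.

```python
# A minimal sketch of joint audio-visual self-supervised pre-training:
# students predict the representations of EMA teachers across modalities.
# Architectures, dimensions, and the loss are illustrative assumptions,
# not the paper's implementation.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_encoder(in_dim, out_dim=256):
    # Placeholder encoder; the paper uses far larger audio/video backbones.
    return nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, out_dim))

audio_student = make_encoder(in_dim=80)    # e.g. per-frame filterbank features
video_student = make_encoder(in_dim=768)   # e.g. per-frame visual features
audio_teacher = copy.deepcopy(audio_student).requires_grad_(False)
video_teacher = copy.deepcopy(video_student).requires_grad_(False)
# Lightweight cross-modal predictors on top of the students (illustrative).
audio_to_video = nn.Linear(256, 256)
video_to_audio = nn.Linear(256, 256)

def cross_modal_loss(audio, video):
    za, zv = audio_student(audio), video_student(video)
    with torch.no_grad():  # teachers provide fixed targets
        ta, tv = audio_teacher(audio), video_teacher(video)
    # Negative cosine similarity between predictions and cross-modal targets.
    loss_a2v = -F.cosine_similarity(audio_to_video(za), tv, dim=-1).mean()
    loss_v2a = -F.cosine_similarity(video_to_audio(zv), ta, dim=-1).mean()
    return loss_a2v + loss_v2a

@torch.no_grad()
def ema_update(student, teacher, m=0.999):
    # Teachers track the students as an exponential moving average.
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

# One toy step on random frame-level features (batch x time x dim).
audio, video = torch.randn(4, 50, 80), torch.randn(4, 50, 768)
loss = cross_modal_loss(audio, video)
loss.backward()
ema_update(audio_student, audio_teacher)
ema_update(video_student, video_teacher)
print(float(loss))
```

After pre-training in this fashion, the two student encoders are fine-tuned separately (or together) on labelled data, which is the low-/high-resource setting the abstract refers to.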

SVTS: Scalable Video-to-Speech Synthesis

2 code implementations · 4 May 2022 · Rodrigo Mira, Alexandros Haliassos, Stavros Petridis, Björn W. Schuller, Maja Pantic

Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip movements into the corresponding audio.

Speech Synthesis

Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection

1 code implementation · CVPR 2022 · Alexandros Haliassos, Rodrigo Mira, Stavros Petridis, Maja Pantic

One of the most pressing challenges for the detection of face-manipulated videos is generalising to forgery methods not seen during training while remaining effective under common corruptions such as compression.

DeepFake Detection

Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection

1 code implementation · CVPR 2021 · Alexandros Haliassos, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

Extensive experiments show that this simple approach significantly surpasses the state of the art in generalisation to unseen manipulations and robustness to perturbations, and sheds light on the factors responsible for its performance.

DeepFake Detection · Lipreading · +2

Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach

1 code implementation · 27 Jan 2020 · Alexandros Haliassos, Kriton Konstantinidis, Danilo P. Mandic

However, both the Tensor Train (TT) format and other Tensor Networks (TNs), such as Tensor Ring and Hierarchical Tucker, are sensitive to the ordering of their indices (and hence to the features).

Recommendation Systems · Tensor Networks
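To make the index-ordering point concrete, here is a small worked example in NumPy (with illustrative sizes and rank) showing that a canonical polyadic decomposition (CPD) is invariant to mode permutation up to reordering its factor matrices, which appears to motivate the CPD approach in the title.

```python
# Worked example: a CPD represents a tensor as a sum of rank-1 terms,
#   T[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r].
# Permuting the tensor's modes merely permutes the factor matrices, so a
# CPD-based model is insensitive to feature ordering, unlike TT/TR/HT,
# whose ranks depend on the index order. Sizes and rank are illustrative.
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3
A, B, C = rng.normal(size=(I, R)), rng.normal(size=(J, R)), rng.normal(size=(K, R))

# Reconstruct T from its CPD factors.
T = np.einsum("ir,jr,kr->ijk", A, B, C)

# Permute the modes of T: (i, j, k) -> (k, i, j).
T_perm = np.transpose(T, (2, 0, 1))

# The permuted tensor has the same CPD factors, just reordered.
T_perm_cpd = np.einsum("kr,ir,jr->kij", C, A, B)
print(np.allclose(T_perm, T_perm_cpd))  # True
```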
