1 code implementation • 2 Apr 2024 • Alexandros Haliassos, Andreas Zinonos, Rodrigo Mira, Stavros Petridis, Maja Pantic
In this work, we propose BRAVEn, an extension to the recent RAVEn method, which learns speech representations entirely from raw audio-visual data.
no code implementations • 10 Jul 2023 • Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Alexandros Haliassos, Stavros Petridis, Maja Pantic
We evaluate our 50% sparse model on 7 different visual noise types and achieve an overall absolute improvement of more than 2% WER compared to the dense equivalent.
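WER (word error rate) in the entry above is conventionally the word-level edit distance between a reference transcript and the recogniser output, divided by the number of reference words. The sketch below is illustrative only: the function name `word_error_rate` and the toy transcripts are assumptions, not taken from the paper. It shows how an absolute WER improvement (a difference in percentage points) differs from a relative one.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Toy transcripts (hypothetical, for illustration only).
reference = "the cat sat on the mat"
hyp_a = "the cat sit on mat"  # one substitution and one deletion
hyp_b = "the cat sat on mat"  # one deletion

wer_a = word_error_rate(reference, hyp_a)
wer_b = word_error_rate(reference, hyp_b)

absolute_improvement = wer_a - wer_b            # difference in WER itself
relative_improvement = (wer_a - wer_b) / wer_a  # fraction of the baseline WER

print(f"WER a: {wer_a:.3f}, WER b: {wer_b:.3f}")
print(f"absolute: {absolute_improvement:.3f}, relative: {relative_improvement:.3f}")
```

An "absolute improvement of more than 2% WER" therefore refers to the first quantity: the two WER values differ by more than two percentage points.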
1 code implementation • 25 Mar 2023 • Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic
Recently, the performance of automatic, visual, and audio-visual speech recognition (ASR, VSR, and AV-ASR, respectively) has been substantially improved, mainly due to the use of larger models and training sets.
Ranked #1 on Automatic Speech Recognition (ASR) on LRS3-TED
Audio-Visual Speech Recognition, Automatic Speech Recognition, +4 more
no code implementations • 14 Mar 2023 • Andreas Zinonos, Alexandros Haliassos, Pingchuan Ma, Stavros Petridis, Maja Pantic
Cross-lingual self-supervised learning has been a growing research topic in the last few years.
1 code implementation • 12 Dec 2022 • Alexandros Haliassos, Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Maja Pantic
We observe strong results in low- and high-resource labelled data settings when fine-tuning the visual and auditory encoders resulting from a single pre-training stage, in which the encoders are jointly trained.
Ranked #1 on Speech Recognition on LRS2 (using extra training data)
2 code implementations • 4 May 2022 • Rodrigo Mira, Alexandros Haliassos, Stavros Petridis, Björn W. Schuller, Maja Pantic
Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip movements into the corresponding audio.
1 code implementation • CVPR 2022 • Alexandros Haliassos, Rodrigo Mira, Stavros Petridis, Maja Pantic
One of the most pressing challenges for the detection of face-manipulated videos is generalising to forgery methods not seen during training while remaining effective under common corruptions such as compression.
Ranked #2 on DeepFake Detection on FakeAVCeleb
1 code implementation • CVPR 2021 • Alexandros Haliassos, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic
Extensive experiments show that this simple approach significantly surpasses the state of the art in generalisation to unseen manipulations and robustness to perturbations, and they shed light on the factors responsible for its performance.
Ranked #5 on DeepFake Detection on FakeAVCeleb
1 code implementation • 27 Jan 2020 • Alexandros Haliassos, Kriton Konstantinidis, Danilo P. Mandic
However, both Tensor Train (TT) and other Tensor Networks (TNs), such as Tensor Ring and Hierarchical Tucker, are sensitive to the ordering of their indices (and hence to the ordering of the features).
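The ordering sensitivity can be seen on a small example. The ranks of an exact Tensor Train decomposition equal the ranks of the tensor's sequential unfoldings, so permuting the indices can change them. The sketch below is an illustration under assumed inputs, not code from the paper: the helper `unfolding_ranks` and the matrices `A` and `B` are hypothetical. It builds a 4-way tensor from two full-rank matrices and compares two index orderings.

```python
import numpy as np


def unfolding_ranks(tensor: np.ndarray) -> list[int]:
    """Ranks of the sequential unfoldings, i.e. the exact TT-ranks."""
    dims = tensor.shape
    ranks = []
    for split in range(1, len(dims)):
        rows = int(np.prod(dims[:split]))
        # Group the first `split` indices as rows and the rest as columns.
        ranks.append(int(np.linalg.matrix_rank(tensor.reshape(rows, -1))))
    return ranks


n = 3
# Two full-rank matrices (hypothetical choice for the illustration).
A = 3.0 * np.eye(n) + np.ones((n, n))
B = 2.0 * np.eye(n) + np.ones((n, n))

# Ordering (i, k, j, l): coupled indices are adjacent.
paired = np.einsum("ik,jl->ikjl", A, B)
# Same entries with the indices reordered to (i, j, k, l).
interleaved = paired.transpose(0, 2, 1, 3)

print(unfolding_ranks(paired))       # [3, 1, 3]
print(unfolding_ranks(interleaved))  # [3, 9, 3]
```

Both tensors hold the same entries, yet the middle TT-rank grows from 1 to 9 when the coupled indices are moved apart, which is the kind of dependence on feature ordering the entry describes.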