Search Results for author: Triantafyllos Afouras

Found 22 papers, 8 papers with code

Scaling up sign spotting through sign language dictionaries

no code implementations · 9 May 2022 · Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

The focus of this work is sign spotting - given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.

Multiple Instance Learning

Audio-Visual Synchronisation in the wild

no code implementations · 8 Dec 2021 · Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

Finally, we set the first benchmark for general audio-visual synchronisation with over 160 diverse classes in the new VGG-Sound Sync video dataset.

Lip Reading

BBC-Oxford British Sign Language Dataset

no code implementations · 5 Nov 2021 · Samuel Albanie, Gül Varol, Liliane Momeni, Hannah Bull, Triantafyllos Afouras, Himel Chowdhury, Neil Fox, Bencie Woll, Rob Cooper, Andrew McParland, Andrew Zisserman

In this work, we introduce the BBC-Oxford British Sign Language (BOBSL) dataset, a large-scale video collection of British Sign Language (BSL).

Sign Language Translation · Translation

Visual Keyword Spotting with Attention

1 code implementation · 29 Oct 2021 · K R Prajwal, Liliane Momeni, Triantafyllos Afouras, Andrew Zisserman

In this paper, we consider the task of spotting spoken keywords in silent video sequences -- also known as visual keyword spotting.

Lip Reading · Visual Keyword Spotting

Sub-word Level Lip Reading With Visual Attention

no code implementations · 14 Oct 2021 · K R Prajwal, Triantafyllos Afouras, Andrew Zisserman

To this end, we make the following contributions: (1) we propose an attention-based pooling mechanism to aggregate visual speech representations; (2) we use sub-word units for lip reading for the first time and show that this allows us to better model the ambiguities of the task; (3) we propose a model for Visual Speech Detection (VSD), trained on top of the lip reading network.

Ranked #1 on Lipreading on LRS2 (using extra training data)

Audio-Visual Active Speaker Detection · Automatic Speech Recognition · +3
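The attention-based pooling mechanism named in contribution (1) above is not detailed in the snippet, but its general shape can be sketched as follows. This is a minimal illustration: the scoring vector `w`, the toy frame features, and the function name `attention_pool` are invented for the example, not taken from the paper, which learns these components end to end.

```python
import math

def attention_pool(frames, w):
    """Aggregate per-frame feature vectors into one vector:
    score each frame by a dot product against w, softmax the
    scores over time, then take the weighted sum of frames."""
    scores = [sum(f_i * w_i for f_i, w_i in zip(f, w)) for f in frames]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    return [sum(weights[t] * frames[t][d] for t in range(len(frames)))
            for d in range(dim)]

# Toy example: three 2-D frame features, one scoring vector.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [1.0, 1.0]
pooled = attention_pool(frames, w)
```

Unlike a uniform temporal average, the softmax weighting lets informative frames dominate the pooled representation.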

Read and Attend: Temporal Localisation in Sign Language Videos

no code implementations · CVPR 2021 · Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

Our contributions are as follows: (1) we demonstrate the ability to leverage large quantities of continuous signing videos with weakly-aligned subtitles to localise signs in continuous sign language; (2) we employ the learned attention to automatically generate hundreds of thousands of annotations for a large sign vocabulary; (3) we collect a set of 37K manually verified sign instances across a vocabulary of 950 sign classes to support our study of sign language recognition; (4) by training on the newly annotated data from our method, we outperform the prior state of the art on the BSL-1K sign language recognition benchmark.

Sign Language Recognition

Watch, read and lookup: learning to spot signs from multiple supervisors

1 code implementation · 8 Oct 2020 · Liliane Momeni, Gül Varol, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

The focus of this work is sign spotting - given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.

Multiple Instance Learning
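The sign-spotting task stated above (decide whether a query sign occurs in a continuous video and, if so, where) can be caricatured as nearest-window matching over embeddings. The sketch below is a generic illustration, not the paper's method: the fixed-size embeddings, the cosine-similarity threshold, and the name `spot_sign` are hypothetical stand-ins for the learned components.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def spot_sign(query, windows, threshold=0.9):
    """Slide over per-window embeddings of the continuous video and
    return (found, best_index): whether the best match to the query
    embedding clears the threshold, and where it occurs."""
    best_idx, best_sim = -1, -1.0
    for i, window in enumerate(windows):
        sim = cosine(query, window)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_sim >= threshold, best_idx

# Toy embeddings: the queried sign appears around window 2.
query = [0.9, 0.1]
video = [[0.1, 0.9], [0.5, 0.5], [0.88, 0.12], [0.2, 0.8]]
found, where = spot_sign(query, video)
```

The paper's multiple-instance-learning framing handles the fact that supervision says only that the sign occurs somewhere in the video, not in which window.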

Seeing wake words: Audio-visual Keyword Spotting

1 code implementation · 2 Sep 2020 · Liliane Momeni, Triantafyllos Afouras, Themos Stafylakis, Samuel Albanie, Andrew Zisserman

The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio.

Lip Reading · Visual Keyword Spotting

BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues

1 code implementation · ECCV 2020 · Samuel Albanie, Gül Varol, Liliane Momeni, Triantafyllos Afouras, Joon Son Chung, Neil Fox, Andrew Zisserman

Recent progress in fine-grained gesture and action classification, and machine translation, points to the possibility of automated sign language recognition becoming a reality.

Action Classification · Keyword Spotting · +2

Spot the conversation: speaker diarisation in the wild

no code implementations · 2 Jul 2020 · Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras, Andrew Zisserman

Finally, we use this pipeline to create a large-scale diarisation dataset called VoxConverse, collected from 'in the wild' videos, which we will release publicly to the research community.

Speaker Verification

ASR is all you need: cross-modal distillation for lip reading

no code implementations · 28 Nov 2019 · Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

The goal of this work is to train strong models for visual speech recognition without requiring human annotated ground truth data.

Ranked #10 on Lipreading on LRS2 (using extra training data)

Automatic Speech Recognition · Frame · +3

My lips are concealed: Audio-visual speech enhancement through obstructions

no code implementations · 11 Jul 2019 · Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

To this end we introduce a deep audio-visual speech enhancement network that is able to separate a speaker's voice by conditioning on the speaker's lip movements and/or a representation of their voice.

Speech Enhancement

Deep Lip Reading: a comparison of models and an online application

no code implementations · 15 Jun 2018 · Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

The goal of this paper is to develop state-of-the-art models for lip reading -- visual speech recognition.

Lip Reading · Visual Speech Recognition

The Conversation: Deep Audio-Visual Speech Enhancement

no code implementations · 11 Apr 2018 · Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

Our goal is to isolate individual speakers from multi-talker simultaneous speech in videos.

Speech Enhancement
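A standard formulation for speaker isolation of the kind described above is to predict a soft time-frequency mask for the target speaker, conditioned on visual features, and apply it elementwise to the mixture spectrogram. Whether this paper's network predicts masks in exactly this form is not stated in the snippet, so the sketch below is a generic illustration with invented toy values; the conditioning network itself is not shown.

```python
def apply_mask(mixture_mag, mask):
    """Elementwise soft-masking of a magnitude spectrogram:
    a network (not shown) would predict `mask` in [0, 1] per
    time-frequency bin for the target speaker; the enhanced
    magnitude is mixture * mask, bin by bin."""
    return [[m * g for m, g in zip(frame, mask_frame)]
            for frame, mask_frame in zip(mixture_mag, mask)]

# Toy 2-frame, 3-bin mixture; the mask keeps bin 0, drops bin 1,
# and halves bin 2.
mix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
mask = [[1.0, 0.0, 0.5], [1.0, 0.0, 0.5]]
enhanced = apply_mask(mix, mask)
```

The enhanced waveform would then be recovered by inverting the masked spectrogram with the mixture's phase.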

Counterfactual Multi-Agent Policy Gradients

6 code implementations · 24 May 2017 · Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson

COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.

Autonomous Vehicles · Starcraft
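The centralised-critic idea above can be illustrated with COMA's counterfactual baseline: one agent's advantage is the critic's Q-value for the taken joint action, minus the expectation of Q over that agent's own actions (under its policy) with the other agents' actions held fixed. The toy Q-table, policy, and function name below are invented for illustration; in the paper the critic is a learned network over the state and joint action.

```python
def counterfactual_advantage(q, joint_action, agent, policy):
    """COMA-style advantage for one agent: Q of the taken joint
    action minus a counterfactual baseline that marginalises out
    this agent's action under its own policy, keeping the other
    agents' actions fixed."""
    taken = q[joint_action]
    baseline = 0.0
    for alt_action, prob in policy.items():
        counterfactual = list(joint_action)
        counterfactual[agent] = alt_action
        baseline += prob * q[tuple(counterfactual)]
    return taken - baseline

# Toy 2-agent game: joint actions -> centralised critic Q-values.
q = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 3.0, (1, 1): 2.0}
policy_agent0 = {0: 0.5, 1: 0.5}  # agent 0's action distribution
adv = counterfactual_advantage(q, (1, 0), agent=0, policy=policy_agent0)
```

Because the baseline varies only this agent's action, the advantage credits each agent for its own contribution to the joint return, which is the paper's answer to multi-agent credit assignment.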
