Search Results for author: Triantafyllos Afouras

Found 25 papers, 9 papers with code

Counterfactual Multi-Agent Policy Gradients

6 code implementations · 24 May 2017 · Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson

COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.

Autonomous Vehicles · counterfactual · +2
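
For illustration of the centralised-critic / decentralised-actor split described above, here is a minimal sketch in PyTorch. It is not the authors' COMA implementation: the module names, layer sizes, and the way observations and actions are fused are all assumptions.

```python
# Minimal sketch of a centralised critic with decentralised actors
# (illustrative only -- not the authors' COMA implementation).
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, N_ACTIONS = 3, 16, 5          # hypothetical sizes


class DecentralisedActor(nn.Module):
    """Each agent maps only its own local observation to a policy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))

    def forward(self, obs):                      # obs: (batch, OBS_DIM)
        return torch.softmax(self.net(obs), dim=-1)


class CentralisedCritic(nn.Module):
    """Sees the joint observation plus the other agents' actions and returns
    a Q-value for every candidate action of the agent being evaluated."""
    def __init__(self):
        super().__init__()
        in_dim = N_AGENTS * OBS_DIM + (N_AGENTS - 1)
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, N_ACTIONS))

    def forward(self, joint_obs, other_actions):
        x = torch.cat([joint_obs, other_actions.float()], dim=-1)
        return self.net(x)                       # (batch, N_ACTIONS)


actors = [DecentralisedActor() for _ in range(N_AGENTS)]
critic = CentralisedCritic()

obs = torch.randn(4, N_AGENTS, OBS_DIM)          # a batch of joint observations
policies = [actors[i](obs[:, i]) for i in range(N_AGENTS)]
actions = torch.stack([p.multinomial(1).squeeze(-1) for p in policies], dim=-1)

# Evaluate agent 0: Q over its candidate actions, other agents' actions fixed.
q_agent0 = critic(obs.flatten(1), actions[:, 1:])

# Counterfactual-style baseline: marginalise agent 0's action under its own
# policy while keeping the other agents' actions fixed.
baseline = (policies[0] * q_agent0).sum(-1, keepdim=True)
advantage = q_agent0.gather(-1, actions[:, :1]) - baseline
```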

The Conversation: Deep Audio-Visual Speech Enhancement

no code implementations · 11 Apr 2018 · Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

Our goal is to isolate individual speakers from multi-talker simultaneous speech in videos.

Speech Enhancement

Deep Lip Reading: a comparison of models and an online application

no code implementations · 15 Jun 2018 · Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

The goal of this paper is to develop state-of-the-art models for lip reading -- visual speech recognition.

Language Modelling · Lip Reading · +2

My lips are concealed: Audio-visual speech enhancement through obstructions

no code implementations · 11 Jul 2019 · Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

To this end, we introduce a deep audio-visual speech enhancement network that is able to separate a speaker's voice by conditioning on the speaker's lip movements, a representation of their voice, or both.

Speech Enhancement
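
As a rough illustration of the conditioning described above (lip movements and/or a representation of the voice), here is a minimal mask-based sketch. The module, feature dimensions, and fusion by concatenation are assumptions for illustration, not the paper's architecture.

```python
# Illustrative sketch: enhance a target speaker's voice by conditioning on
# lip features and/or a voice embedding (not the paper's architecture).
import torch
import torch.nn as nn

FREQ_BINS, T, LIP_DIM, VOICE_DIM = 257, 100, 512, 256   # hypothetical sizes


class ConditionedEnhancer(nn.Module):
    def __init__(self):
        super().__init__()
        in_dim = FREQ_BINS + LIP_DIM + VOICE_DIM
        self.fuse = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                  nn.Linear(512, FREQ_BINS))

    def forward(self, mixture_spec, lip_feats=None, voice_emb=None):
        # mixture_spec: (batch, T, FREQ_BINS) magnitude spectrogram of the mix.
        b, t, _ = mixture_spec.shape
        # A missing conditioning stream is replaced with zeros, so the model
        # can run with lips only, voice only, or both.
        if lip_feats is None:
            lip_feats = mixture_spec.new_zeros(b, t, LIP_DIM)
        if voice_emb is None:
            voice_emb = mixture_spec.new_zeros(b, VOICE_DIM)
        voice_emb = voice_emb.unsqueeze(1).expand(b, t, VOICE_DIM)
        x = torch.cat([mixture_spec, lip_feats, voice_emb], dim=-1)
        mask = torch.sigmoid(self.fuse(x))       # per time-frequency soft mask
        return mask * mixture_spec               # enhanced spectrogram


model = ConditionedEnhancer()
mix = torch.rand(2, T, FREQ_BINS)
lips = torch.randn(2, T, LIP_DIM)                # hypothetical lip features
voice = torch.randn(2, VOICE_DIM)                # hypothetical voice embedding
enhanced = model(mix, lip_feats=lips, voice_emb=voice)   # (2, T, FREQ_BINS)
```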

ASR is all you need: cross-modal distillation for lip reading

no code implementations · 28 Nov 2019 · Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

The goal of this work is to train strong models for visual speech recognition without requiring human annotated ground truth data.

Ranked #14 on Lipreading on LRS3-TED (using extra training data)

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +4
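
A minimal sketch of cross-modal distillation in the spirit of the title above, assuming an audio ASR teacher and a visual lip-reading student that both emit per-frame posteriors over a shared vocabulary. The shapes and the KL-divergence objective are illustrative assumptions, not necessarily the paper's exact losses.

```python
# Illustrative cross-modal distillation step: an audio ASR teacher provides
# soft per-frame targets for a visual (lip-reading) student.
import torch
import torch.nn.functional as F

BATCH, FRAMES, VOCAB = 2, 75, 40                 # hypothetical sizes

# Hypothetical per-frame logits from the two models.
teacher_logits = torch.randn(BATCH, FRAMES, VOCAB)                      # frozen ASR
student_logits = torch.randn(BATCH, FRAMES, VOCAB, requires_grad=True)  # lip reader

# KL divergence between teacher and student posteriors.
teacher_probs = F.softmax(teacher_logits, dim=-1).detach()
student_log_probs = F.log_softmax(student_logits, dim=-1)
distill_loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

distill_loss.backward()      # gradients flow only into the student
print(float(distill_loss))
```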

Spot the conversation: speaker diarisation in the wild

no code implementations · 2 Jul 2020 · Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras, Andrew Zisserman

Finally, we use this pipeline to create a large-scale diarisation dataset called VoxConverse, collected from 'in the wild' videos, which we will release publicly to the research community.

Speaker Verification

BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues

1 code implementation · ECCV 2020 · Samuel Albanie, Gül Varol, Liliane Momeni, Triantafyllos Afouras, Joon Son Chung, Neil Fox, Andrew Zisserman

Recent progress in fine-grained gesture and action classification, and in machine translation, points to the possibility of automated sign language recognition becoming a reality.

Action Classification · Keyword Spotting · +2

Seeing wake words: Audio-visual Keyword Spotting

1 code implementation · 2 Sep 2020 · Liliane Momeni, Triantafyllos Afouras, Themos Stafylakis, Samuel Albanie, Andrew Zisserman

The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio.

Lip Reading · Visual Keyword Spotting
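
To make the "whether and when" decision above concrete, here is a minimal post-processing sketch. It assumes some model already produces a per-frame probability that the query keyword is being spoken; the threshold, frame rate, and peak-picking rule are assumptions for illustration, not the paper's method.

```python
# Illustrative post-processing for keyword spotting: given per-frame keyword
# probabilities, decide whether the word occurs and roughly when.
import torch

FPS = 25
frame_probs = torch.rand(200)          # hypothetical per-frame probabilities


def spot_keyword(probs, threshold=0.8):
    """Return (present, time_in_seconds) based on the most confident frame."""
    peak_prob, peak_idx = probs.max(dim=0)
    if peak_prob.item() < threshold:
        return False, None
    return True, peak_idx.item() / FPS


present, when = spot_keyword(frame_probs)
print(f"keyword spotted at {when:.2f}s" if present else "keyword not spotted")
```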

Watch, read and lookup: learning to spot signs from multiple supervisors

1 code implementation · 8 Oct 2020 · Liliane Momeni, Gül Varol, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

The focus of this work is sign spotting - given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.

Multiple Instance Learning

Read and Attend: Temporal Localisation in Sign Language Videos

no code implementations · CVPR 2021 · Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

Our contributions are as follows: (1) we demonstrate the ability to leverage large quantities of continuous signing videos with weakly-aligned subtitles to localise signs in continuous sign language; (2) we employ the learned attention to automatically generate hundreds of thousands of annotations for a large sign vocabulary; (3) we collect a set of 37K manually verified sign instances across a vocabulary of 950 sign classes to support our study of sign language recognition; (4) by training on the newly annotated data from our method, we outperform the prior state of the art on the BSL-1K sign language recognition benchmark.

Sign Language Recognition
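
Contribution (2) above turns learned attention into temporal localisations. Below is a minimal sketch of that idea, assuming cross-attention weights over the video frames are available from a trained model; the peak-picking window, frame rate, and sizes are illustrative assumptions, not the authors' exact procedure.

```python
# Illustrative sketch: turn cross-attention over video frames into temporal
# localisations for the decoded signs (assumed setup, not the paper's method).
import torch

FRAMES, QUERIES, FPS = 200, 4, 25      # hypothetical sizes

# Hypothetical attention weights: one row per decoded sign, normalised over frames.
attn = torch.softmax(torch.randn(QUERIES, FRAMES), dim=-1)


def localise(attn_row, window=8):
    """Take the attention peak and return a short window around it, in seconds."""
    centre = int(attn_row.argmax())
    start = max(0, centre - window // 2)
    end = min(FRAMES, centre + window // 2)
    return start / FPS, end / FPS


for q in range(QUERIES):
    t0, t1 = localise(attn[q])
    print(f"sign {q}: localised to {t0:.2f}s - {t1:.2f}s")
```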

Sub-word Level Lip Reading With Visual Attention

no code implementations · CVPR 2022 · K R Prajwal, Triantafyllos Afouras, Andrew Zisserman

To this end, we make the following contributions: (1) we propose an attention-based pooling mechanism to aggregate visual speech representations; (2) we use sub-word units for lip reading for the first time and show that this allows us to better model the ambiguities of the task; (3) we propose a model for Visual Speech Detection (VSD), trained on top of the lip reading network.

 Ranked #1 on Visual Speech Recognition on LRS2 (using extra training data)

Audio-Visual Active Speaker Detection · Automatic Speech Recognition · +5
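
Here is a minimal sketch of an attention-based pooling layer like the one named in contribution (1) above: each frame's visual feature receives a learned score, and the clip-level representation is the attention-weighted sum. The single-score formulation and dimensions are assumptions, not the paper's exact module.

```python
# Illustrative attention-based pooling over per-frame visual speech features.
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Weights each frame by a learned score and returns the weighted sum."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, frames, mask=None):
        # frames: (batch, time, feat_dim); mask: (batch, time), 1 = valid frame.
        scores = self.score(frames).squeeze(-1)          # (batch, time)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)          # attention over frames
        return torch.einsum("bt,btd->bd", weights, frames)


pool = AttentionPool(feat_dim=512)
visual_feats = torch.randn(2, 75, 512)                   # hypothetical clip features
pooled = pool(visual_feats)                              # (2, 512)
```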

Visual Keyword Spotting with Attention

1 code implementation · 29 Oct 2021 · K R Prajwal, Liliane Momeni, Triantafyllos Afouras, Andrew Zisserman

In this paper, we consider the task of spotting spoken keywords in silent video sequences -- also known as visual keyword spotting.

Lip Reading · Visual Keyword Spotting

BBC-Oxford British Sign Language Dataset

no code implementations · 5 Nov 2021 · Samuel Albanie, Gül Varol, Liliane Momeni, Hannah Bull, Triantafyllos Afouras, Himel Chowdhury, Neil Fox, Bencie Woll, Rob Cooper, Andrew McParland, Andrew Zisserman

In this work, we introduce the BBC-Oxford British Sign Language (BOBSL) dataset, a large-scale video collection of British Sign Language (BSL).

Sign Language Translation · Translation

Audio-Visual Synchronisation in the wild

no code implementations · 8 Dec 2021 · Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

Finally, we set the first benchmark for general audio-visual synchronisation with over 160 diverse classes in the new VGG-Sound Sync video dataset.

Lip Reading

Reading To Listen at the Cocktail Party: Multi-Modal Speech Separation

no code implementations · CVPR 2022 · Akam Rahimi, Triantafyllos Afouras, Andrew Zisserman

The goal of this paper is speech separation and enhancement in multi-speaker and noisy environments using a combination of different modalities.

Sentence · Speech Separation

Scaling up sign spotting through sign language dictionaries

no code implementations · 9 May 2022 · Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

The focus of this work is sign spotting - given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.

Multiple Instance Learning

Learning to Ground Instructional Articles in Videos through Narrations

no code implementations · ICCV 2023 · Effrosyni Mavroudi, Triantafyllos Afouras, Lorenzo Torresani

To deal with the scarcity of labeled data at scale, we source the step descriptions from a language knowledge base (wikiHow) containing instructional articles for a large variety of procedural tasks.

Video Alignment

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

no code implementations · 30 Nov 2023 · Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.

Video Understanding
