About

Benchmarks

You can find evaluation results in the subtasks. You can also submit evaluation metrics for this task.

Subtasks

Datasets

Greatest papers with code

LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

16 Oct 2018 Fengdalu/Lipreading-DenseNet3D

The benchmark shows large variation in several aspects, including the number of samples in each class, video resolution, lighting conditions, and speakers' attributes such as pose, age, gender, and make-up.

LIPREADING, LIP READING, VISUAL SPEECH RECOGNITION

Deep word embeddings for visual speech recognition

30 Oct 2017 tstafylakis/Lipreading-ResNet

In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition.

LIPREADING, VISUAL SPEECH RECOGNITION, WORD EMBEDDINGS

Combining Residual Networks with LSTMs for Lipreading

12 Mar 2017 tstafylakis/Lipreading-ResNet

We propose an end-to-end deep learning architecture for word-level visual speech recognition.

LIPREADING, LIP READING, VISUAL SPEECH RECOGNITION

How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition

17 Apr 2020 georgesterpu/Sigmedia-AVSR

A recently proposed multimodal fusion strategy, AV Align, based on state-of-the-art sequence to sequence neural networks, attempts to model this relationship by explicitly aligning the acoustic and visual representations of speech.

AUDIO-VISUAL SPEECH RECOGNITION, VISUAL SPEECH RECOGNITION

Learn an Effective Lip Reading Model without Pains

15 Nov 2020 Fengdalu/learn-an-effective-lip-reading-model-without-pains

Considering the non-negligible effects of these strategies and the existing tough status to train an effective lip reading model, we perform a comprehensive quantitative study and comparative analysis, for the first time, to show the effects of several different choices for lip reading.

Ranked #1 on Lipreading on CAS-VSR-W1k (LRW-1000) (using extra training data)

LIPREADING, LIP READING, VISUAL SPEECH RECOGNITION

Zero-shot keyword spotting for visual speech recognition in-the-wild

ECCV 2018 lilianemomeni/KWS-Net

Visual keyword spotting (KWS) is the problem of estimating whether a text query occurs in a given recording using only video information.

KEYWORD SPOTTING, VISUAL SPEECH RECOGNITION

AV Taris: Online Audio-Visual Speech Recognition

14 Dec 2020 georgesterpu/Taris

In recent years, Automatic Speech Recognition (ASR) technology has approached human-level performance on conversational speech under relatively clean listening conditions.

ACTION DETECTION, ACTIVITY DETECTION, AUDIO-VISUAL SPEECH RECOGNITION, VISUAL SPEECH RECOGNITION

Should we hard-code the recurrence concept or learn it instead ? Exploring the Transformer architecture for Audio-Visual Speech Recognition

19 May 2020 georgesterpu/Taris

The audio-visual speech fusion strategy AV Align has shown significant performance improvements in audio-visual speech recognition (AVSR) on the challenging LRS2 dataset.

AUDIO-VISUAL SPEECH RECOGNITION, VISUAL SPEECH RECOGNITION

Harnessing GANs for Zero-shot Learning of New Classes in Visual Speech Recognition

29 Jan 2019 midas-research/DECA

To solve this problem, we present a novel approach to zero-shot learning by generating new classes using Generative Adversarial Networks (GANs), and show how the addition of unseen class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases.

VISUAL SPEECH RECOGNITION, ZERO-SHOT LEARNING

Deep Audio-Visual Speech Recognition

6 Sep 2018 amitai1992/AutomatedLipReading

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.

Ranked #1 on Lipreading on LRS2 (using extra training data)

AUDIO-VISUAL SPEECH RECOGNITION, LIPREADING, LIP READING, VISUAL SPEECH RECOGNITION