Visual Speech Recognition

40 papers with code • 2 benchmarks • 5 datasets

Benchmarks

These leaderboards track progress in Visual Speech Recognition.

Dataset	Best Model
LRS3-TED	CTC/Attention
LRS2	VTP with more data

Datasets

Subtasks

Lip to Speech Synthesis

Most implemented papers

Combining Residual Networks with LSTMs for Lipreading

tstafylakis/Lipreading-ResNet • 12 Mar 2017

We propose an end-to-end deep learning architecture for word-level visual speech recognition.

Deep Audio-Visual Speech Recognition

lordmartian/deep_avsr • 6 Sep 2018

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.

End-to-end Audio-visual Speech Recognition with Conformers

zziz/pwc • 12 Feb 2021

In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and a convolution-augmented transformer (Conformer) that can be trained end-to-end.
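
As a rough illustration of the hybrid CTC/Attention objective named above, the sketch below combines a CTC negative log-likelihood with an attention-decoder cross-entropy as a weighted sum. It is not the authors' implementation: the function names, the plain-list inputs, and the default `ctc_weight=0.1` are assumptions made for this example, and the ResNet-18 front-end and Conformer encoder are omitted entirely.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over the given values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC loss via the forward algorithm.

    log_probs: T x V nested list of per-frame log-probabilities.
    target:    list of label ids (must not contain the blank id).
    """
    # Interleave blanks: [b, y1, b, y2, ..., b]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S = len(ext)

    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]                      # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])          # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])          # skip the blank in between
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or the trailing blank.
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


def attention_neg_log_likelihood(dec_log_probs, target):
    """Cross-entropy of the decoder: one log-probability row per target token."""
    return -sum(row[label] for row, label in zip(dec_log_probs, target))


def hybrid_loss(ctc_log_probs, dec_log_probs, target, ctc_weight=0.1):
    """Weighted sum of the two objectives (ctc_weight is an assumed value)."""
    l_ctc = ctc_neg_log_likelihood(ctc_log_probs, target)
    l_att = attention_neg_log_likelihood(dec_log_probs, target)
    return ctc_weight * l_ctc + (1.0 - ctc_weight) * l_att
```

In practice both branches would share one encoder and the loss would be computed on batched tensors by a deep learning framework; the list-based version here only makes the arithmetic of the weighted sum explicit.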

LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

Fengdalu/Lipreading-DenseNet3D • 16 Oct 2018

The benchmark shows large variation in several aspects, including the number of samples in each class, video resolution, lighting conditions, and speakers' attributes such as pose, age, gender, and make-up.

Visual Speech Recognition for Multiple Languages in the Wild

mpc001/Visual_Speech_Recognition_for_Multiple_Languages • 26 Feb 2022

Recent advances in visual speech recognition are usually due to larger training sets rather than model design.

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition

exgc/avmust-ted • ICCV 2023

Although researchers have explored cross-lingual translation techniques such as machine translation and audio speech translation to overcome language barriers, cross-lingual studies on visual speech remain scarce.

The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023

mkt-dataoceanai/cnvsrc2023baseline • 7 Jan 2024

This paper describes the visual speech recognition (VSR) system submitted by NPU-ASLP-LiAuto (Team 237) to the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023, where it competed in the fixed and open tracks of the Single-Speaker VSR Task and the open track of the Multi-Speaker VSR Task.
