Visual Speech Recognition

41 papers with code • 2 benchmarks • 5 datasets

Most implemented papers

Combining Residual Networks with LSTMs for Lipreading

tstafylakis/Lipreading-ResNet 12 Mar 2017

We propose an end-to-end deep learning architecture for word-level visual speech recognition.

Deep Audio-Visual Speech Recognition

lordmartian/deep_avsr 6 Sep 2018

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.

End-to-end Audio-visual Speech Recognition with Conformers

zziz/pwc 12 Feb 2021

In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and a convolution-augmented transformer (Conformer) that can be trained in an end-to-end manner.

LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

Fengdalu/Lipreading-DenseNet3D 16 Oct 2018

The benchmark shows large variation in several aspects, including the number of samples in each class, video resolution, lighting conditions, and speakers' attributes such as pose, age, gender, and make-up.

Visual Speech Recognition for Multiple Languages in the Wild

mpc001/Visual_Speech_Recognition_for_Multiple_Languages 26 Feb 2022

Recent advances in visual speech recognition are usually due to larger training sets rather than model design.

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition

exgc/avmust-ted ICCV 2023

Although researchers have explored cross-lingual translation techniques such as machine translation and audio speech translation to overcome language barriers, there is still a shortage of cross-lingual studies on visual speech.

The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023

mkt-dataoceanai/cnvsrc2023baseline 7 Jan 2024

This paper describes the visual speech recognition (VSR) system introduced by NPU-ASLP-LiAuto (Team 237) in the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023, which took part in the fixed and open tracks of the Single-Speaker VSR Task and the open track of the Multi-Speaker VSR Task.

Lip Reading Sentences in the Wild

parambadiger/Lip-Reading CVPR 2017

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.

Deep word embeddings for visual speech recognition

tstafylakis/Lipreading-ResNet 30 Oct 2017

In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition.

Zero-shot keyword spotting for visual speech recognition in-the-wild

lilianemomeni/KWS-Net ECCV 2018

Visual keyword spotting (KWS) is the problem of estimating whether a text query occurs in a given recording using only video information.