About

Benchmarks

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Datasets

Greatest papers with code

How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition

17 Apr 2020 georgesterpu/Sigmedia-AVSR

A recently proposed multimodal fusion strategy, AV Align, based on state-of-the-art sequence-to-sequence neural networks, attempts to model the relationship between the two modalities by explicitly aligning the acoustic and visual representations of speech.

AUDIO-VISUAL SPEECH RECOGNITION, VISUAL SPEECH RECOGNITION

Discriminative Multi-modality Speech Recognition

CVPR 2020 JackSyu/Discriminative-Multi-modality-Speech-Recognition

Vision is often used as a complementary modality for audio speech recognition (ASR), especially in noisy environments, where the performance of the audio modality alone deteriorates significantly.

AUDIO-VISUAL SPEECH RECOGNITION, LIPREADING, SPEECH RECOGNITION

AV Taris: Online Audio-Visual Speech Recognition

14 Dec 2020 georgesterpu/Taris

In recent years, Automatic Speech Recognition (ASR) technology has approached human-level performance on conversational speech under relatively clean listening conditions.

ACTION DETECTION, ACTIVITY DETECTION, AUDIO-VISUAL SPEECH RECOGNITION, VISUAL SPEECH RECOGNITION

Should we hard-code the recurrence concept or learn it instead? Exploring the Transformer architecture for Audio-Visual Speech Recognition

19 May 2020 georgesterpu/Taris

The audio-visual speech fusion strategy AV Align has shown significant performance improvements in audio-visual speech recognition (AVSR) on the challenging LRS2 dataset.

AUDIO-VISUAL SPEECH RECOGNITION, VISUAL SPEECH RECOGNITION

Deep Audio-Visual Speech Recognition

6 Sep 2018 amitai1992/AutomatedLipReading

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.

Ranked #1 on Lipreading on LRS2 (using extra training data)

AUDIO-VISUAL SPEECH RECOGNITION, LIPREADING, LIP READING, VISUAL SPEECH RECOGNITION

Recurrent Neural Network Transducer for Audio-Visual Speech Recognition

8 Nov 2019 around-star/Speech-Recognition

This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture.

AUDIO-VISUAL SPEECH RECOGNITION, VISUAL SPEECH RECOGNITION