TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK	REMOVE
Lipreading	Lip Reading in the Wild	3D Conv + ResNet-34 + Bi-LSTM	Top-1 Accuracy	83.00	# 19

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/combining-residual-networks-with-lstms-for/lipreading-on-lip-reading-in-the-wild)](https://paperswithcode.com/sota/lipreading-on-lip-reading-in-the-wild?p=combining-residual-networks-with-lstms-for)`

Combining Residual Networks with LSTMs for Lipreading

12 Mar 2017 · Themos Stafylakis, Georgios Tzimiropoulos ·

We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system is a combination of spatiotemporal convolutional, residual and bidirectional Long Short-Term Memory networks. We train and evaluate it on the Lipreading In-The-Wild benchmark, a challenging database of 500-size target-words consisting of 1.28sec video excerpts from BBC TV broadcasts. The proposed network attains word accuracy equal to 83.0, yielding 6.8 absolute improvement over the current state-of-the-art, without using information about word boundaries during training or testing.

PDF Abstract