Combining Residual Networks with LSTMs for Lipreading

12 Mar 2017 · Themos Stafylakis, Georgios Tzimiropoulos

We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system is a combination of spatiotemporal convolutional, residual and bidirectional Long Short-Term Memory networks...

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Lipreading | Lip Reading in the Wild | 3D Conv + ResNet-34 + BLSTM | Top-1 Accuracy | 83.00 | # 7 |
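
For readers unfamiliar with the metric, Top-1 Accuracy is the share of test samples for which the highest-scoring predicted class matches the true label. The following is a minimal sketch of that computation, assuming the reported value is a percentage and that the model emits one score per word class for each clip. The function name `top1_accuracy` and the toy scores and labels are hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of samples whose top-scoring class equals the label.

    scores: one row per sample, one score per class (assumed layout).
    labels: the true class index for each sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest score in this row (first one wins on ties).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy data (hypothetical): 4 clips, 3 word classes.
toy_scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.6, 0.3, 0.1],  # predicted class 0
    [0.2, 0.2, 0.6],  # predicted class 2
    [0.5, 0.4, 0.1],  # predicted class 0
]
toy_labels = [1, 0, 2, 1]
toy_result = top1_accuracy(toy_scores, toy_labels)
print(toy_result)
```

In this toy case three of the four predictions match their labels, so the function returns 75.0. A listed value of 83.00 would, under the same assumption, correspond to 83% of test clips being classified correctly.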