Mutual Information Maximization for Effective Lip Reading

13 Mar 2020 · Xing Zhao, Shuang Yang, Shiguang Shan, Xilin Chen

Lip reading has received increasing research interest in recent years due to the rapid development of deep learning and its wide range of potential applications. Good performance on the lip reading task depends heavily on how effectively the representation captures lip movement information while resisting the noise that results from changes in pose, lighting conditions, speaker appearance, and so on. To this end, we propose introducing mutual information constraints at both the local feature level and the global sequence level to strengthen the relation between the features and the speech content. On the one hand, we constrain the features generated at each time step to carry a strong relation with the speech content by imposing a local mutual information maximization constraint (LMIM), which improves the model's ability to discover fine-grained lip movements and fine-grained differences among words with similar pronunciation, such as "spend" and "spending". On the other hand, we introduce a mutual information maximization constraint at the global sequence level (GMIM), so that the model pays more attention to discriminative key frames related to the speech content and less to the various kinds of noise that appear during speaking. By combining these two advantages, the proposed method is expected to be both discriminative and robust for effective lip reading. To verify the method, we evaluate it on two large-scale benchmarks. We perform a detailed analysis and comparison on several aspects, including the comparison of LMIM and GMIM with the baseline, the visualization of the learned representation, and so on. The results not only demonstrate the effectiveness of the proposed method but also set a new state of the art on both benchmarks.
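
The two constraints described in the abstract can be illustrated with a minimal sketch. It assumes a Jensen-Shannon style mutual information estimate computed from discriminator scores, and a softmax over hypothetical per-frame logits for the sequence-level weighting. The abstract specifies neither, so the function names (`jsd_mi_estimate`, `weighted_global_feature`) and both design choices are assumptions for illustration, not the authors' implementation.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def jsd_mi_estimate(pos_scores, neg_scores):
    """Jensen-Shannon style MI estimate (assumed estimator, not from the abstract).

    pos_scores: discriminator scores for features paired with the true word label.
    neg_scores: discriminator scores for features paired with a wrong label.
    Higher values mean the discriminator separates matched from mismatched pairs
    better; a training loss would be the negative of this value.
    """
    e_pos = sum(-softplus(-s) for s in pos_scores) / len(pos_scores)
    e_neg = sum(softplus(s) for s in neg_scores) / len(neg_scores)
    return e_pos - e_neg


def weighted_global_feature(frame_features, frame_logits):
    """Pool per-frame features with softmax weights (hypothetical key-frame weighting)."""
    peak = max(frame_logits)
    exps = [math.exp(logit - peak) for logit in frame_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    # An uninformative discriminator (all scores 0) versus a confident one.
    print(jsd_mi_estimate([0.0, 0.0], [0.0, 0.0]))
    print(jsd_mi_estimate([5.0, 4.0], [-5.0, -4.0]))
    # Frames with larger logits contribute more to the pooled sequence feature.
    print(weighted_global_feature([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0]))
```

In this sketch, a local constraint would apply `jsd_mi_estimate` to scores computed at every time step, while a global constraint would apply it to scores computed from the pooled sequence feature.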

Code

xing96/MIM-lipreading official

Tasks

Lipreading

Lip Reading

Datasets

LRW

CAS-VSR-W1k (LRW-1000)

Results from the Paper

Ranked #8 on Lipreading on CAS-VSR-W1k (LRW-1000)

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Lipreading	CAS-VSR-W1k (LRW-1000)	GLMIM	Top-1 Accuracy	38.79%	# 8
Lipreading	Lip Reading in the Wild	3D Conv + ResNet-18 + Bi-GRU	Top-1 Accuracy	84.41%	# 13
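
For reference, Top-1 accuracy is the percentage of test clips whose highest-scoring predicted word matches the ground-truth word. The helper below is a small illustrative sketch; the function name `top1_accuracy` and the toy scores are hypothetical and are not taken from the paper or its code.

```python
def top1_accuracy(scores, labels):
    """Return Top-1 accuracy in percent.

    scores: one list of per-class scores for each sample.
    labels: the ground-truth class index for each sample.
    """
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest-scoring class for this sample.
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    toy_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    toy_labels = [1, 0, 0, 0]
    print(top1_accuracy(toy_scores, toy_labels))  # 3 of 4 correct -> 75.0
```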
