This benchmark shows large variation in several aspects, including the number of samples in each class, video resolution, lighting conditions, and speakers' attributes such as pose, age, gender, and make-up.
Ranked #8 on Lipreading on CAS-VSR-W1k (LRW-1000)
Considering the non-negligible effects of these strategies and the current difficulty of training an effective lip reading model, we perform, for the first time, a comprehensive quantitative study and comparative analysis of the effects of several different choices for lip reading.
Ranked #1 on Lipreading on CAS-VSR-W1k (LRW-1000) (using extra training data)
Vision is often used as a complementary modality for audio speech recognition (ASR), especially in noisy environments where the performance of the audio modality alone deteriorates significantly.
Ranked #5 on Lipreading on Lip Reading in the Wild
Observing the continuity between adjacent frames during speech, and the consistency of motion patterns across different speakers pronouncing the same phoneme, we model the lip movements during speech as a sequence of apparent deformations in the lip region.
Ranked #4 on Lipreading on CAS-VSR-W1k (LRW-1000)