VOCASET is a 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio.
43 PAPERS • 1 BENCHMARK
The German Lipreading dataset consists of 250,000 publicly available videos of the faces of speakers of the Hessian Parliament, which were processed for word-level lip reading using an automatic pipeline.
5 PAPERS • NO BENCHMARKS YET