1 code implementation • 20 Sep 2022 • Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt
The powerful modeling capabilities of all-attention-based transformer architectures often cause overfitting and, for natural language processing tasks, lead to an implicitly learned internal language model in the autoregressive transformer decoder, which complicates the integration of external language models.
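This integration difficulty is easiest to see in shallow fusion, a common way to combine an external language model with the decoder at inference time. The following is a minimal, self-contained Python sketch of one shallow-fusion decoding step; the toy vocabulary, the stand-in probability distributions, and the `lm_weight` value are illustrative assumptions and not the paper's implementation.

```python
import math

# Toy vocabulary and hypothetical per-step distributions; all values below
# are illustrative assumptions, not taken from the paper or its experiments.
VOCAB = ["<eos>", "hello", "world", "word"]

def decoder_log_probs(prefix):
    """Hypothetical autoregressive decoder distribution p(y_t | x, y_<t).

    A real transformer decoder also conditions on the encoder output x;
    here a fixed toy distribution stands in for it.
    """
    probs = [0.1, 0.2, 0.6, 0.1] if prefix else [0.05, 0.8, 0.1, 0.05]
    return [math.log(p) for p in probs]

def external_lm_log_probs(prefix):
    """Hypothetical external language model distribution p_LM(y_t | y_<t)."""
    probs = [0.1, 0.3, 0.3, 0.3] if prefix else [0.1, 0.6, 0.2, 0.1]
    return [math.log(p) for p in probs]

def shallow_fusion_step(prefix, lm_weight=0.3):
    """One greedy decoding step with shallow fusion:

        score(y) = log p_dec(y | x, y_<t) + lm_weight * log p_LM(y | y_<t)

    Because the decoder's probabilities already contain an implicitly
    learned internal language model, the external LM partly re-scores the
    same prior, which is the integration difficulty described above.
    """
    dec = decoder_log_probs(prefix)
    lm = external_lm_log_probs(prefix)
    scores = [d + lm_weight * l for d, l in zip(dec, lm)]
    best = max(range(len(VOCAB)), key=lambda i: scores[i])
    return VOCAB[best], scores[best]

if __name__ == "__main__":
    prefix = []
    for _ in range(3):
        token, score = shallow_fusion_step(prefix)
        print(f"prefix={prefix!r} -> {token} (score {score:.3f})")
        if token == "<eos>":
            break
        prefix.append(token)
```

The sketch only illustrates why an internal language model inside the decoder interferes with this kind of fusion; the paper itself addresses the problem on the training side rather than through the decoding rule shown here.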
Ranked #3 on Lipreading on LRS3-TED (using extra training data)