2 code implementations • ICCV 2021 • Yuecong Min, Aiming Hao, Xiujuan Chai, Xilin Chen
Specifically, the proposed visual alignment constraint (VAC) comprises two auxiliary losses: one focuses on visual features only, and the other enforces prediction alignment between the feature extractor and the alignment module.
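The second auxiliary loss can be illustrated with a small sketch. The snippet does not say how prediction alignment is measured, so the code below assumes a per-frame KL divergence between the two modules' softmax outputs; the function names (`softmax`, `kl_divergence`, `alignment_loss`) and the direction of the divergence are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(logits):
    """Turn one frame's logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def alignment_loss(visual_logits, aligned_logits):
    """Mean per-frame KL(aligned || visual).

    Assumption: the alignment module's prediction is treated as the
    reference distribution that the feature extractor should match.
    Both arguments are lists of per-frame logit lists of equal shape.
    """
    per_frame = [
        kl_divergence(softmax(a), softmax(v))
        for v, a in zip(visual_logits, aligned_logits)
    ]
    return sum(per_frame) / len(per_frame)
```

Under this assumption the loss is zero when both modules predict the same distribution on every frame and grows as their predictions diverge.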
1 code implementation • ICCV 2021 • Aiming Hao, Yuecong Min, Xilin Chen
Currently, a typical network for continuous sign language recognition (CSLR) consists of a visual module, which focuses on spatial and short-term temporal information, followed by a contextual module, which focuses on long-term temporal information. The network is trained with the Connectionist Temporal Classification (CTC) loss.
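To make the role of the CTC loss concrete, here is a minimal pure-Python sketch of the standard CTC forward recursion, which sums the probability of every frame-level alignment that collapses to the target label sequence. The function name `ctc_neg_log_likelihood` and the choice of index 0 for the blank symbol are assumptions made for illustration; a real system would use a framework's built-in, log-space CTC implementation instead.

```python
import math

def ctc_neg_log_likelihood(probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC.

    probs:  list of T per-frame probability distributions over the vocabulary
    target: list of label ids without blanks
    blank:  id of the blank symbol (assumed to be 0 here)
    """
    # Extended target: a blank before, between, and after every label.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # alpha[s] = total probability of all partial alignments ending at ext[s].
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]             # skip the blank between distinct labels
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else math.inf
```

For example, with two frames, a uniform distribution over `{blank, 1}`, and target `[1]`, three of the four possible paths collapse to the target, so the likelihood is 0.75. This version works in probability space for readability and would underflow on long sequences.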