27 Nov 2020 • Sijie Mai, Songlong Xing, Jiaxuan He, Ying Zeng, Haifeng Hu
Most existing works accomplish this task by fusing the three modalities after aligning them, mostly at the word level. Such alignment is impractical in real-world scenarios.