no code implementations • 12 Mar 2022 • Fuhai Chen, Xuri Ge, Xiaoshuai Sun, Yue Gao, Jianzhuang Liu, Fufeng Chen, Wenjie Li
The key to referring expression comprehension lies in capturing cross-modal visual-linguistic relevance.