no code implementations • CVPR 2018 • Qingxing Cao, Xiaodan Liang, Bailing Li, Guanbin Li, Liang Lin
This network comprises two collaborative modules: i) an adversarial attention module that exploits the local visual evidence for each word parsed from the question; ii) a residual composition module that composes the previously mined evidence.
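To make the division of labor between the two modules concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the parsed question is a tree of words, that attention is a softmax over dot-product scores between a word vector and image-region features, and that "residual composition" means adding the children's evidence to the node's own. The adversarial component is omitted entirely. All names (`attend`, `compose`, `reason`) and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(word_vec, regions):
    """Attention step (assumed form): weight each region feature by its
    dot-product similarity to the word, then return the weighted sum."""
    scores = [sum(w * r for w, r in zip(word_vec, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    return [sum(wt * region[d] for wt, region in zip(weights, regions))
            for d in range(dim)]


def compose(own_evidence, child_evidences):
    """Composition step (assumed form): a residual-style sum that adds the
    evidence mined by child nodes to the node's own evidence."""
    out = list(own_evidence)
    for child in child_evidences:
        out = [o + c for o, c in zip(out, child)]
    return out


def reason(node, word_vecs, regions):
    """Walk the parse tree bottom-up. A node is (word, [child nodes])."""
    word, children = node
    child_evidences = [reason(child, word_vecs, regions) for child in children]
    return compose(attend(word_vecs[word], regions), child_evidences)


# Toy inputs: two image regions and two words, each in a 2-d feature space.
regions = [[1.0, 0.0], [0.0, 1.0]]
word_vecs = {"cat": [5.0, 0.0], "left": [0.0, 5.0]}
tree = ("cat", [("left", [])])
evidence = reason(tree, word_vecs, regions)
```

In this toy setup each word attends mostly to the region it matches, and the root's evidence is the sum of both attended features. A learned model would replace the dot-product scoring and the plain sum with trained layers.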