1 code implementation • 9 Aug 2021 • Sundesh Gupta, Aditya Shah, Miten Shah, Laribok Syiemlieh, Chandresh Maurya
We propose a novel architecture that uses the RoBERTa model with a co-attention layer on top to incorporate context incongruity between input text and image attributes.