ICCV 2023 • Jiashuo Fan, Yaoyuan Liang, Leyao Liu, ShaoLun Huang, Lei Zhang
We evaluate RCA-NOC on two datasets and show that it outperforms state-of-the-art methods by a large margin, demonstrating its effectiveness in improving vision-language representation for novel object captioning.