no code implementations • 5 Jun 2020 • Bofan Xue, David Chan, John Canny
We present a new publicly available dataset with the goal of advancing multi-modality learning by offering vision and language data within the same context.
Image Captioning • Image Classification • +2