no code implementations • CVPR 2023 • Duc Minh Vo, Quoc-An Luong, Akihiro Sugimoto, Hideki Nakayama
Humans possess the capacity to reason about the future based on a sparse collection of visual cues acquired over time.
Image Captioning • Language Modelling • +1