no code implementations • ICCV 2019 • Gilad Vered, Gal Oren, Yuval Atzmon, Gal Chechik
This can be achieved by training two networks: a "speaker" that generates sentences given an image and a "listener" that uses them to perform a task.
no code implementations • 26 Jul 2019 • Gilad Vered, Gal Oren, Yuval Atzmon, Gal Chechik
Second, we show that the generated descriptions can be kept close to natural language by constraining them to be similar to human descriptions.