1 code implementation • 22 Jul 2022 • Yoad Tewel, Yoav Shalev, Roy Nadler, Idan Schwartz, Lior Wolf
We introduce a zero-shot video captioning method that employs two frozen networks: the GPT-2 language model and the CLIP image-text matching model.
Image Captioning Image-text matching +6