no code implementations • EMNLP 2021 • Zhen Han, Gengyuan Zhang, Yunpu Ma, Volker Tresp
Various temporal knowledge graph (KG) completion models have been proposed in the recent literature.
no code implementations • 21 Nov 2023 • Gengyuan Zhang, Jinhe Bi, Jindong Gu, Yanyu Chen, Volker Tresp
This raises a question: with such weak supervision, can video representations in video-language models learn to distinguish even factual discrepancies in textual descriptions and understand fine-grained events?
1 code implementation • ICCV 2023 • Gengyuan Zhang, Jisen Ren, Jindong Gu, Volker Tresp
In this study, we introduce the Multi-event Video-Text Retrieval (MeVTR) task, which addresses scenarios in which each video contains multiple distinct events, a niche variant of the conventional Video-Text Retrieval task.
1 code implementation • 24 Jul 2023 • Jindong Gu, Zhen Han, Shuo Chen, Ahmad Beirami, Bailan He, Gengyuan Zhang, Ruotong Liao, Yao Qin, Volker Tresp, Philip Torr
This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models: multimodal-to-text generation models (e.g. Flamingo), image-text matching models (e.g.
1 code implementation • 12 Jul 2023 • Gengyuan Zhang, Yurui Zhang, Kerui Zhang, Volker Tresp
This makes us wonder whether, based on visual cues, Vision-Language Models that are pre-trained on large-scale image-text resources can match or even outperform humans' capability in reasoning about times and locations.
no code implementations • 19 Nov 2022 • Yao Zhang, Haokun Chen, Ahmed Frikha, Yezi Yang, Denis Krompass, Gengyuan Zhang, Jindong Gu, Volker Tresp
Visual Question Answering (VQA) is a multidisciplinary research task.