Search Results for author: Gengyuan Zhang

Found 6 papers, 3 papers with code

SPOT! Revisiting Video-Language Models for Event Understanding

no code implementations • 21 Nov 2023 • Gengyuan Zhang, Jinhe Bi, Jindong Gu, Yanyu Chen, Volker Tresp

This raises a question: with such weak supervision, can the video representations in video-language models learn to distinguish even factual discrepancies in textual descriptions and to understand fine-grained events?

Attribute • Video Understanding
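One illustrative way to probe this question is sketched below, under assumptions not taken from the paper: CLIP as the backbone, mean-pooled frame embeddings as the video representation, and a hand-written factual/manipulated caption pair. The check is simply whether the video embedding scores the factual caption higher.

```python
# Hedged sketch (not the paper's protocol) of probing whether a CLIP-style
# video representation separates a caption from a factually manipulated one.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Stand-in for sampled video frames; replace with real decoded frames.
frames = [Image.new("RGB", (224, 224)) for _ in range(4)]
captions = [
    "a man opens a door and walks into the room",  # factual caption
    "a man closes a door and leaves the room",     # manipulated caption
]

inputs = processor(text=captions, images=frames, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Mean-pool frame embeddings into one video embedding, then compare captions.
video_emb = out.image_embeds.mean(dim=0, keepdim=True)
video_emb = video_emb / video_emb.norm(dim=-1, keepdim=True)
text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
sims = (video_emb @ text_emb.T).squeeze(0)
print({c: round(s.item(), 4) for c, s in zip(captions, sims)})
```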

Multi-event Video-Text Retrieval

1 code implementation • ICCV 2023 • Gengyuan Zhang, Jisen Ren, Jindong Gu, Volker Tresp

In this study, we introduce the Multi-event Video-Text Retrieval (MeVTR) task, which addresses scenarios in which each video contains multiple different events, a niche variant of the conventional Video-Text Retrieval task.

Language Modelling • Retrieval • +2
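As a rough illustration of the task setup, the sketch below scores a text query against several per-event embeddings of one video. The max-over-events aggregation and all names are assumptions for illustration, not the MeVTR method or its released code.

```python
# Hedged sketch of multi-event video-text scoring: a video is represented by
# several event embeddings instead of a single clip-level vector.
import torch
import torch.nn.functional as F

def score(event_embs: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """event_embs: (num_events, dim); text_emb: (dim,).
    Returns the best event-to-text cosine similarity for this video."""
    e = F.normalize(event_embs, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    return (e @ t).max()  # illustrative aggregation, not MeVTR's

# Toy example: one video with 3 events, two candidate text queries.
torch.manual_seed(0)
video_events = torch.randn(3, 512)
queries = torch.randn(2, 512)
for i, q in enumerate(queries):
    print(f"query {i}: similarity = {score(video_events, q):.4f}")
```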

A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

1 code implementation • 24 Jul 2023 • Jindong Gu, Zhen Han, Shuo Chen, Ahmad Beirami, Bailan He, Gengyuan Zhang, Ruotong Liao, Yao Qin, Volker Tresp, Philip Torr

This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models: multimodal-to-text generation models (e.g., Flamingo), image-text matching models (e.g. …

Image-text matching • Language Modelling • +4
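Below is a minimal sketch of text-side prompt engineering on an image-text matching model, using CLIP via Hugging Face transformers. The prompt templates and label set are illustrative assumptions, not examples drawn from the survey.

```python
# Hedged sketch: varying the text prompt template changes the zero-shot
# label distribution an image-text matching model assigns to an image.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["dog", "cat", "car"]
templates = ["a photo of a {}.", "a blurry photo of a {}.", "a drawing of a {}."]

image = Image.new("RGB", (224, 224))  # stand-in; use a real image in practice
for template in templates:
    prompts = [template.format(label) for label in labels]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1).squeeze(0)
    print(template, {l: round(p.item(), 3) for l, p in zip(labels, probs)})
```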

Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning

1 code implementation • 12 Jul 2023 • Gengyuan Zhang, Yurui Zhang, Kerui Zhang, Volker Tresp

This makes us wonder whether, based on visual cues, Vision-Language Models pre-trained on large-scale image-text resources can match or even outperform humans' capability in reasoning about times and location.
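One hedged way to exercise this kind of probing is to pose time and location questions to an off-the-shelf VQA model. The sketch below uses BLIP, which is an assumption for illustration, not the benchmark or the models evaluated in the paper.

```python
# Hedged sketch of probing a VLM for times/location reasoning via VQA.
from PIL import Image
from transformers import BlipForQuestionAnswering, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.new("RGB", (384, 384))  # stand-in; use a real photo in practice
for question in ["In which decade was this photo taken?",
                 "In which country was this photo taken?"]:
    inputs = processor(image, question, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=10)
    print(question, "->", processor.decode(out[0], skip_special_tokens=True))
```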
