1 code implementation • 11 Feb 2024 • Simon Ging, María A. Bravo, Thomas Brox
The evaluation of text-generative vision-language models is a challenging yet crucial endeavor.
1 code implementation • CVPR 2023 • María A. Bravo, Sudhanshu Mittal, Simon Ging, Thomas Brox
The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models.
1 code implementation • NeurIPS 2020 • Simon Ging, Mohammadreza Zolfaghari, Hamed Pirsiavash, Thomas Brox
Many real-world video-text tasks involve different levels of granularity, such as frames and words, clips and sentences, or videos and paragraphs, each with distinct semantics.
Ranked #4 on Video Captioning on ActivityNet Captions