no code implementations • 25 Jul 2024 • Xinyu Pi, Mingyuan Wu, Jize Jiang, Haozhen Zheng, Beitong Tian, ChengXiang Zhai, Klara Nahrstedt, Zhiting Hu
Smaller-scale Vision-Language Models (VLMs) often claim to perform on par with larger models in general-domain visual grounding and question-answering benchmarks while offering advantages in computational efficiency and storage.
no code implementations • 21 Jun 2024 • Mingyuan Wu, Zichuan Liu, Haozhen Zheng, Hongpeng Guo, Bo Chen, Xin Lu, Klara Nahrstedt
To address this, we propose and formulate a one-tap-driven single-instance segmentation task that segments a single instance selected by a user via a positive tap.