no code implementations • 14 Jul 2023 • Zixin Guo, Tzu-Jui Julius Wang, Selen Pehlivan, Abduljalil Radman, Jorma Laaksonen
To further reduce the amount of supervision, we propose Prompts-in-The-Loop (PiTL), which elicits knowledge from large language models (LLMs) to describe images.
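A minimal sketch of the idea described above: prompt an LLM for a description keyed by an image-level label, yielding image-text pairs without human caption supervision. The prompt template, the stand-in model, and the `describe` helper are illustrative assumptions, not the paper's exact setup.

```python
from transformers import pipeline

# Stand-in LLM; PiTL would use a stronger model in practice (assumption).
generator = pipeline("text-generation", model="gpt2")

def describe(label: str, max_new_tokens: int = 40) -> str:
    """Elicit a short visual description for an image labeled `label`."""
    prompt = f"Describe an image of a {label} in one sentence:"
    out = generator(prompt, max_new_tokens=max_new_tokens, num_return_sequences=1)
    # The pipeline echoes the prompt, so strip it off.
    return out[0]["generated_text"][len(prompt):].strip()

# Pair weakly labeled images with LLM-written descriptions (toy example).
pairs = [(path, describe(label)) for path, label in [("cat.jpg", "tabby cat")]]
```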
no code implementations • 24 Oct 2022 • Tzu-Jui Julius Wang, Jorma Laaksonen, Tomas Langer, Heikki Arponen, Tom E. Bishop
Moreover, on the other vision-language (V-L) downstream tasks considered, our WFH models perform on par with models trained on paired V-L data, demonstrating the utility of unpaired data.
1 code implementation • 1 Jun 2022 • Zixin Guo, Tzu-Jui Julius Wang, Jorma Laaksonen
Rather than directly fine-tuning CLIP to generate sentences, we introduce an adaptation training process that tunes CLIP's visual encoder to capture and align differences in image pairs based on their textual descriptions.
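One way such an adaptation objective could look, as a hedged sketch rather than the paper's exact procedure: align the *difference* of the two CLIP image embeddings with the embedding of the text describing the change. The difference operator and cosine loss here are assumptions for illustration; the HuggingFace CLIP calls are standard.

```python
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def difference_loss(image_before, image_after, change_text: str) -> torch.Tensor:
    """Cosine distance between the image-pair difference and the change text."""
    inputs = processor(text=[change_text], images=[image_before, image_after],
                       return_tensors="pt", padding=True)
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    # Represent the pair by its normalized embedding difference (assumption).
    diff = F.normalize(img_emb[1] - img_emb[0], dim=-1)
    txt = F.normalize(txt_emb[0], dim=-1)
    return 1.0 - (diff * txt).sum()

# Adaptation would backprop this loss into the visual encoder only,
# e.g., keeping model.text_model frozen.
```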
no code implementations • 18 Aug 2020 • Tzu-Jui Julius Wang, Selen Pehlivan, Jorma Laaksonen
Recent scene graph generation (SGG) models have demonstrated their ability to capture the most frequent relations among visual entities.
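For readers unfamiliar with the task, a scene graph represents visual entities as nodes and their relations as (subject, predicate, object) triples. The sketch below (entity and predicate names are illustrative, not from the paper) shows the structure and the predicate-frequency counting behind the head/tail imbalance that SGG work grapples with.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    object: str

# Toy scene graph: "riding" is a frequent (head) predicate,
# "eating" a rarer (tail) one.
scene_graph = [
    Relation("man", "riding", "horse"),
    Relation("horse", "eating", "carrot"),
]

predicate_freq = Counter(r.predicate for r in scene_graph)
print(predicate_freq.most_common())
```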