1 code implementation • COLING 2020 • Caren Han, Siqu Long, Siwen Luo, Kunze Wang, Josiah Poon
We propose a new visual contextual text representation for text-to-image multimodal tasks, VICTR, which captures rich visual semantic information of objects from the text input.
1 code implementation • Findings (NAACL) 2022 • Eileen Wang, Caren Han, Josiah Poon
We measure the reliability of our metric sets by analysing their correlation with human judgement scores on a sample of machine stories obtained from 4 state-of-the-art models trained on the Visual Storytelling Dataset (VIST).
1 code implementation • 21 Apr 2024 • Feiqi Cao, Caren Han, Hyunsuk Chung
In this work, we propose a novel tree-based explanation technique, PEACH (Pretrained-embedding Explanation Across Contextual and Hierarchical Structure), that can explain how text-based documents are classified by using any pretrained contextual embeddings in a tree-based human-interpretable manner.
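The core PEACH idea — classifying documents via a decision tree fitted over pretrained contextual embeddings, so each split becomes a human-readable rule — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the synthetic Gaussian vectors stand in for real pretrained embeddings (e.g. BERT [CLS] vectors), and the feature names are hypothetical.

```python
# Hedged sketch of a tree-based explanation over contextual embeddings
# (assumption: PEACH's actual feature construction and tree method differ).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Stand-in for pretrained contextual document embeddings: two synthetic
# classes drawn from shifted Gaussians in an 8-dimensional space.
X_pos = rng.normal(loc=0.5, scale=1.0, size=(50, 8))
X_neg = rng.normal(loc=-0.5, scale=1.0, size=(50, 8))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [0] * 50)

# A shallow tree keeps the explanation human-interpretable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=[f"dim_{i}" for i in range(8)]))
```

The exported text shows the classification as a sequence of threshold rules over embedding dimensions, which is the kind of hierarchical, human-readable structure the paper targets.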
no code implementations • 20 Mar 2021 • Siwen Luo, Hamish Ivison, Caren Han, Josiah Poon
As the use of deep learning techniques has grown across various fields over the past decade, complaints about the opaqueness of black-box models have increased, leading to a greater focus on transparency in deep learning models.
no code implementations • 9 Sep 2022 • Chen Chen, Yue Dai, Josiah Poon, Caren Han
Text-based games (TBG) are complex environments that allow users or computer agents to make textual interactions and achieve game goals. In TBG agent design and training, balancing the efficiency and performance of the agent models is a major challenge.
no code implementations • 16 Dec 2022 • Feiqi Cao, Siwen Luo, Felipe Nunez, Zean Wen, Josiah Poon, Caren Han
To explicitly teach the relations between the two modalities, we propose and integrate two attention modules, namely a scene graph-based semantic relation-aware attention and a positional relation-aware attention.
Optical Character Recognition (OCR) +3
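One common way to realise such relation-aware attention is to add a pairwise relation bias to the attention logits before the softmax. The sketch below shows that general pattern; it is an assumption-laden illustration, not the paper's exact formulation — the `rel_bias` matrix stands in for scores derived from scene-graph edges or relative positions.

```python
# Hedged sketch: scaled dot-product attention with an additive relation bias
# (assumption: the paper's two attention modules are formulated differently).
import numpy as np

def relation_aware_attention(Q, K, V, rel_bias):
    # Q, K, V: (n, d) query/key/value matrices; rel_bias: (n, n) pairwise
    # relation scores (e.g. from scene-graph edges or relative positions).
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d) + rel_bias  # inject relation information
    # Numerically stable row-wise softmax over the biased logits.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 4, 8
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
rel_bias = np.zeros((n, n))
rel_bias[0, 2] = 5.0  # pretend tokens 0 and 2 share a strong relation
out = relation_aware_attention(Q, K, V, rel_bias)
print(out.shape)
```

A large bias entry pulls token 0's attention toward token 2, which is how relational structure from a scene graph or positional layout can steer the attention distribution.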
no code implementations • 28 Feb 2024 • Yihao Ding, Lorenzo Vaiani, Caren Han, Jean Lee, Paolo Garza, Josiah Poon, Luca Cagliero
This paper presents a groundbreaking multimodal, multi-task, multi-teacher joint-grained knowledge distillation model for visually-rich form document understanding.