Search Results for author: Caren Han

Found 8 papers, 4 papers with code

VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks

1 code implementation COLING 2020 Caren Han, Siqu Long, Siwen Luo, Kunze Wang, Josiah Poon

We propose a new visual contextual text representation for text-to-image multimodal tasks, VICTR, which captures rich visual semantic information of objects from the text input.

Dependency Parsing Sentence

RoViST: Learning Robust Metrics for Visual Storytelling

1 code implementation Findings (NAACL) 2022 (preprint 8 May 2022) Eileen Wang, Caren Han, Josiah Poon

We measure the reliability of our metric sets by analysing their correlation with human judgement scores on a sample of machine stories obtained from 4 state-of-the-art models trained on the Visual Storytelling Dataset (VIST).

Sentence Visual Grounding +1

PEACH: Pretrained-embedding Explanation Across Contextual and Hierarchical Structure

1 code implementation21 Apr 2024 Feiqi Cao, Caren Han, Hyunsuk Chung

In this work, we propose a novel tree-based explanation technique, PEACH (Pretrained-embedding Explanation Across Contextual and Hierarchical Structure), that can explain how text-based documents are classified by using any pretrained contextual embeddings in a tree-based human-interpretable manner.

Local Interpretations for Explainable Natural Language Processing: A Survey

no code implementations20 Mar 2021 Siwen Luo, Hamish Ivison, Caren Han, Josiah Poon

As the use of deep learning techniques has grown across various fields over the past decade, complaints about the opaqueness of these black-box models have mounted, prompting a greater focus on transparency in deep learning models.

Machine Translation Sentiment Analysis +1

An Analysis of Deep Reinforcement Learning Agents for Text-based Games

no code implementations9 Sep 2022 Chen Chen, Yue Dai, Josiah Poon, Caren Han

Text-based games (TBG) are complex environments that allow users or computer agents to interact through text and achieve game goals. In the design and training of TBG agents, balancing the efficiency and performance of the agent models is a major challenge.

reinforcement-learning Reinforcement Learning (RL) +1

SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering

no code implementations16 Dec 2022 Feiqi Cao, Siwen Luo, Felipe Nunez, Zean Wen, Josiah Poon, Caren Han

To explicitly teach the relations between the two modalities, we propose and integrate two attention modules: a scene-graph-based semantic relation-aware attention and a positional relation-aware attention.

Optical Character Recognition Optical Character Recognition (OCR) +3

M3-VRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding

no code implementations28 Feb 2024 Yihao Ding, Lorenzo Vaiani, Caren Han, Jean Lee, Paolo Garza, Josiah Poon, Luca Cagliero

This paper presents a groundbreaking multimodal, multi-task, multi-teacher joint-grained knowledge distillation model for visually-rich form document understanding.

document understanding Knowledge Distillation
