Search Results for author: Siwen Luo

Found 10 papers, 4 papers with code

REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering

1 code implementation 27 Jul 2020 Siwen Luo, Soyeon Caren Han, Kaiyuan Sun, Josiah Poon

Visual question answering (VQA) is a challenging multi-modal task that requires not only the semantic understanding of both images and questions, but also the sound perception of a step-by-step reasoning process that would lead to the correct answer.

Question Answering, Visual Question Answering

VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks

1 code implementation 7 Oct 2020 Soyeon Caren Han, Siqu Long, Siwen Luo, Kunze Wang, Josiah Poon

We propose a new visual contextual text representation for text-to-image multimodal tasks, VICTR, which captures rich visual semantic information of objects from the text input.

Ranked #24 on Text-to-Image Generation on MS COCO (Inception score metric)

Dependency Parsing, Sentence, +1

VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks

1 code implementation COLING 2020 Caren Han, Siqu Long, Siwen Luo, Kunze Wang, Josiah Poon

We propose a new visual contextual text representation for text-to-image multimodal tasks, VICTR, which captures rich visual semantic information of objects from the text input.

Dependency Parsing, Sentence

Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images

no code implementations 20 Feb 2021 Siwen Luo, Mengting Wu, Yiwen Gong, Wanying Zhou, Josiah Poon

The main contributions of this paper are the Financial Documents dataset with table-area annotations, a superior detection model, and a rule-based layout segmentation technique for tabular data extraction from PDF files.

Optical Character Recognition, Optical Character Recognition (OCR), +1

Local Interpretations for Explainable Natural Language Processing: A Survey

no code implementations 20 Mar 2021 Siwen Luo, Hamish Ivison, Caren Han, Josiah Poon

As the use of deep learning techniques has grown across various fields over the past decade, complaints about the opaqueness of the black-box models have increased, resulting in an increased focus on transparency in deep learning models.

Machine Translation, Sentiment Analysis, +1

Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

1 code implementation COLING 2022 Siwen Luo, Yihao Ding, Siqu Long, Josiah Poon, Soyeon Caren Han

Recognizing the layout of unstructured digital documents is crucial when parsing the documents into a structured, machine-readable format for downstream applications.

Component Classification, Document Layout Analysis

PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals

no code implementations 29 Nov 2022 Zhihao Zhang, Siwen Luo, Junyi Chen, Sijia Lai, Siqu Long, Hyunsuk Chung, Soyeon Caren Han

We propose PiggyBack, a Visual Question Answering platform that allows users to easily apply state-of-the-art visual-language pretrained models.

Question Answering, Visual Question Answering

SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering

no code implementations 16 Dec 2022 Feiqi Cao, Siwen Luo, Felipe Nunez, Zean Wen, Josiah Poon, Caren Han

To explicitly teach the relations between the two modalities, we propose and integrate two attention modules: a scene graph-based semantic relation-aware attention and a positional relation-aware attention.

Optical Character Recognition, Optical Character Recognition (OCR), +3

PDFVQA: A New Dataset for Real-World VQA on PDF Documents

no code implementations 13 Apr 2023 Yihao Ding, Siwen Luo, Hyunsuk Chung, Soyeon Caren Han

Document-based Visual Question Answering examines the understanding of document images conditioned on natural language questions.

document understanding, Key Information Extraction, +2

Workshop on Document Intelligence Understanding

no code implementations 31 Jul 2023 Soyeon Caren Han, Yihao Ding, Siwen Luo, Josiah Poon, HeeGuen Yoon, Zhe Huang, Paul Duuring, Eun Jung Holden

Document understanding and information extraction include different tasks to understand a document and extract valuable information automatically.

document understanding, Visual Question Answering (VQA)
