Search Results for author: Siwen Luo

Found 9 papers, 4 papers with code

PDFVQA: A New Dataset for Real-World VQA on PDF Documents

no code implementations • 13 Apr 2023 • Yihao Ding, Siwen Luo, Hyunsuk Chung, Soyeon Caren Han

Document-based Visual Question Answering examines the understanding of document images conditioned on natural language questions.

Key Information Extraction • Question Answering • +1

SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering

no code implementations • 16 Dec 2022 • Feiqi Cao, Siwen Luo, Felipe Nunez, Zean Wen, Josiah Poon, Caren Han

To explicitly teach the relations between the two modalities, we propose and integrate two attention modules: a scene-graph-based semantic relation-aware attention and a positional relation-aware attention (a generic sketch of relation-aware attention follows this entry).

Optical Character Recognition (OCR) • Question Answering • +1
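The entry above describes fusing two relation-aware attention branches. Below is a minimal, hypothetical NumPy sketch of generic relation-aware attention (an additive relation bias on scaled dot-product attention), with one branch biased by a scene-graph adjacency matrix and one by pairwise positions; all names, shapes, and the fusion-by-sum step are illustrative assumptions, not the SceneGATE implementation.

```python
# Hypothetical sketch (not the SceneGATE code): combining a semantic
# relation-aware attention branch and a positional relation-aware branch.
# Names, shapes, and the fusion step are illustrative assumptions only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relation_aware_attention(queries, keys, values, relation_bias):
    """Scaled dot-product attention with an additive relation bias.

    queries, keys, values: (n, d) object/token features
    relation_bias: (n, n) scores derived from a relation source,
                   e.g. scene-graph edges or relative positions.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d) + relation_bias
    return softmax(scores) @ values

# Toy example: 4 objects with 8-dim features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))

# Assumed relation sources: a scene-graph adjacency matrix for the
# semantic branch, and pairwise spatial distances for the positional one.
scene_graph_adj = np.array([[0, 1, 0, 0],
                            [1, 0, 1, 0],
                            [0, 1, 0, 1],
                            [0, 0, 1, 0]], dtype=float)
positions = rng.uniform(size=(4, 2))
dist = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)

semantic_out = relation_aware_attention(feats, feats, feats, scene_graph_adj)
positional_out = relation_aware_attention(feats, feats, feats, -dist)

# One simple way to fuse the two branches; the published model may differ.
fused = semantic_out + positional_out
print(fused.shape)  # (4, 8)
```

Using an additive bias keeps both relation sources in the same score space as the content attention; the actual paper may weight or gate the two branches differently.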

PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals

no code implementations • 29 Nov 2022 • Zhihao Zhang, Siwen Luo, Junyi Chen, Sijia Lai, Siqu Long, Hyunsuk Chung, Soyeon Caren Han

We propose PiggyBack, a Visual Question Answering platform that allows users to easily apply state-of-the-art visual-language pretrained models.

Question Answering • Visual Question Answering

Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

1 code implementation • COLING 2022 • Siwen Luo, Yihao Ding, Siqu Long, Josiah Poon, Soyeon Caren Han

Recognizing the layout of unstructured digital documents is crucial for parsing them into a structured, machine-readable format for downstream applications.

Component Classification • Document Layout Analysis

Local Interpretations for Explainable Natural Language Processing: A Survey

no code implementations • 20 Mar 2021 • Siwen Luo, Hamish Ivison, Caren Han, Josiah Poon

As the use of deep learning techniques has grown across various fields over the past decade, so have concerns about the opaqueness of these black-box models, driving a greater focus on transparency in deep learning.

Machine Translation • Sentiment Analysis • +1

Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images

no code implementations • 20 Feb 2021 • Siwen Luo, Mengting Wu, Yiwen Gong, Wanying Zhou, Josiah Poon

The main contributions of this paper are the Financial Documents dataset with table-area annotations, the proposed detection model, and a rule-based layout segmentation technique for extracting tabular data from PDF files.

Optical Character Recognition (OCR) • Table Detection

VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks

1 code implementation • COLING 2020 • Caren Han, Siqu Long, Siwen Luo, Kunze Wang, Josiah Poon

We propose a new visual contextual text representation for text-to-image multimodal tasks, VICTR, which captures rich visual semantic information of objects from the text input.

Dependency Parsing

VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks

1 code implementation • 7 Oct 2020 • Soyeon Caren Han, Siqu Long, Siwen Luo, Kunze Wang, Josiah Poon

We propose a new visual contextual text representation for text-to-image multimodal tasks, VICTR, which captures rich visual semantic information of objects from the text input.

Ranked #24 on Text-to-Image Generation on COCO (Inception score metric)

Dependency Parsing • Text-to-Image Generation

REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering

1 code implementation • 27 Jul 2020 • Siwen Luo, Soyeon Caren Han, Kaiyuan Sun, Josiah Poon

Visual question answering (VQA) is a challenging multi-modal task that requires not only semantic understanding of both images and questions, but also a sound step-by-step reasoning process that leads to the correct answer.

Question Answering • Visual Question Answering
