Search Results for author: Siwen Luo

Found 10 papers, 4 papers with code

REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering

1 code implementation • 27 Jul 2020 • Siwen Luo, Soyeon Caren Han, Kaiyuan Sun, Josiah Poon

Visual question answering (VQA) is a challenging multi-modal task that requires not only the semantic understanding of both images and questions, but also the sound perception of a step-by-step reasoning process that would lead to the correct answer.

Question Answering • Visual Question Answering

VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks

1 code implementation • 7 Oct 2020 • Soyeon Caren Han, Siqu Long, Siwen Luo, Kunze Wang, Josiah Poon

We propose a new visual contextual text representation for text-to-image multimodal tasks, VICTR, which captures rich visual semantic information of objects from the text input.

Ranked #24 on Text-to-Image Generation on MS COCO (Inception score metric)

Dependency Parsing • Sentence • +1 more

VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks

1 code implementation • COLING 2020 • Caren Han, Siqu Long, Siwen Luo, Kunze Wang, Josiah Poon

We propose a new visual contextual text representation for text-to-image multimodal tasks, VICTR, which captures rich visual semantic information of objects from the text input.

Dependency Parsing • Sentence

Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images

no code implementations • 20 Feb 2021 • Siwen Luo, Mengting Wu, Yiwen Gong, Wanying Zhou, Josiah Poon

The main contributions of this paper are the Financial Documents dataset with table-area annotations, a superior detection model, and a rule-based layout segmentation technique for extracting tabular data from PDF files.

Optical Character Recognition (OCR) • +1 more

Local Interpretations for Explainable Natural Language Processing: A Survey

no code implementations • 20 Mar 2021 • Siwen Luo, Hamish Ivison, Caren Han, Josiah Poon

As the use of deep learning techniques has grown across various fields over the past decade, complaints about the opaqueness of black-box models have grown as well, prompting a stronger focus on transparency in deep learning models.

Machine Translation • Sentiment Analysis • +1 more

Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

1 code implementation • COLING 2022 • Siwen Luo, Yihao Ding, Siqu Long, Josiah Poon, Soyeon Caren Han

Recognizing the layout of unstructured digital documents is crucial when parsing the documents into a structured, machine-readable format for downstream applications.

Component Classification • Document Layout Analysis

PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals

no code implementations • 29 Nov 2022 • Zhihao Zhang, Siwen Luo, Junyi Chen, Sijia Lai, Siqu Long, Hyunsuk Chung, Soyeon Caren Han

We propose PiggyBack, a Visual Question Answering platform that allows users to easily apply state-of-the-art vision-language pretrained models.

Question Answering • Visual Question Answering

SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering

no code implementations • 16 Dec 2022 • Feiqi Cao, Siwen Luo, Felipe Nunez, Zean Wen, Josiah Poon, Caren Han

To explicitly teach the relations between the two modalities, we propose and integrate two attention modules: a scene graph-based semantic relation-aware attention and a positional relation-aware attention.

Optical Character Recognition (OCR) • +3 more

PDFVQA: A New Dataset for Real-World VQA on PDF Documents

no code implementations • 13 Apr 2023 • Yihao Ding, Siwen Luo, Hyunsuk Chung, Soyeon Caren Han

Document-based Visual Question Answering evaluates the understanding of document images conditioned on natural language questions.

Document Understanding • Key Information Extraction • +2 more

Workshop on Document Intelligence Understanding

no code implementations • 31 Jul 2023 • Soyeon Caren Han, Yihao Ding, Siwen Luo, Josiah Poon, HeeGuen Yoon, Zhe Huang, Paul Duuring, Eun Jung Holden

Document understanding and information extraction include different tasks to understand a document and extract valuable information automatically.

Document Understanding • Visual Question Answering (VQA)

