The output structure of database-like tables, with values arranged in horizontal rows and in vertical columns identifiable by name, can cover a wide range of NLP tasks.
We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture, which simultaneously learns layout information, visual features, and textual semantics.
Ranked #1 on Visual Question Answering on DocVQA (using extra training data)
This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading Comprehension dataset.
Quadratic time and memory complexity is reduced to sublinear thanks to a robust trainable top-$k$ operator.
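To make the idea concrete, here is a minimal NumPy sketch of attention restricted to each query's top-$k$ keys. It uses a hard, non-differentiable top-$k$ mask for illustration only; the trainable top-$k$ operator described above is a learned, differentiable mechanism that this sketch does not reproduce, and the function and parameter names are hypothetical.

```python
import numpy as np

def topk_sparse_attention(q, k, v, kk=8):
    """Attention where each query attends only to its kk highest-scoring keys.

    Hard top-k illustration; the paper's trainable top-k operator is a
    learned, differentiable mechanism and is NOT reproduced here.
    """
    # Scaled dot-product scores, shape (n_queries, n_keys)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Indices of the kk largest scores per query row
    idx = np.argpartition(scores, -kk, axis=-1)[:, -kk:]
    # Additive mask: 0 for kept positions, -inf elsewhere
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)
    masked = scores + mask
    # Row-wise softmax; exp(-inf) = 0 zeroes out the masked positions
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With a fixed kk, each query touches only kk keys, which is the source of the sub-quadratic cost once combined with an efficient candidate-selection scheme.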
Ranked #2 on Text Summarization on arXiv Summarization Dataset