Database-like tables, whose values are organized in horizontal rows and vertical columns identifiable by name, provide an output structure expressive enough to cover a wide range of NLP tasks.
We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture which simultaneously learns layout information, visual features, and textual semantics.
Ranked #1 on Visual Question Answering on DocVQA (using extra training data)
This paper investigates various Transformer architectures on the WikiReading information extraction and machine reading comprehension dataset.
The paper presents a novel method for finding a fragment of a long temporal sequence that is similar to a set of shorter sequences.
Ranked #2 on Semantic Retrieval on Contract Discovery
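To make the problem setting concrete, here is a minimal brute-force baseline, not the paper's method: slide a window over the long sequence of embeddings and score each window by its mean cosine similarity to the (mean-pooled) query sequences. All names, shapes, and the pooling choice are illustrative assumptions.

```python
import numpy as np

def best_fragment(long_seq, queries, win):
    # Illustrative sketch only: the paper proposes a more sophisticated
    # method; this brute-force scan merely demonstrates the task.
    q = np.stack([s.mean(axis=0) for s in queries])      # pool each query
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    best_i, best_score = 0, -np.inf
    for i in range(len(long_seq) - win + 1):
        w = long_seq[i:i + win].mean(axis=0)             # pool the window
        w /= np.linalg.norm(w)
        score = float((q @ w).mean())                    # mean cosine similarity
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score
```

Note that this scan is linear in the sequence length for a fixed window size, but it is not trainable and ignores temporal alignment within the window.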
Quadratic time and memory complexity is reduced to sublinear thanks to a robust, trainable top-$k$ operator.
Ranked #2 on Text Summarization on arXiv Summarization Dataset
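To illustrate why a top-$k$ operator helps, here is a sketch of *hard* top-$k$ attention in numpy: each query keeps only its $k$ highest-scoring keys, so the cost after selection grows with $k$ rather than the full key count. The paper's contribution is a trainable, differentiable relaxation of this hard selection; this sketch only shows the underlying idea, and all names and shapes are assumptions.

```python
import numpy as np

def topk_attention(q, k_mat, v, k=4):
    # Hard top-k attention sketch: select k best keys per query,
    # softmax over only those, and mix the corresponding values.
    scores = q @ k_mat.T / np.sqrt(q.shape[-1])          # (n_q, n_k)
    idx = np.argpartition(scores, -k, axis=-1)[:, -k:]   # top-k key indices
    top = np.take_along_axis(scores, idx, axis=-1)       # (n_q, k)
    w = np.exp(top - top.max(axis=-1, keepdims=True))    # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return np.einsum('qk,qkd->qd', w, v[idx])            # (n_q, d)
```

The hard `argpartition` step has no useful gradient, which is exactly the obstacle a trainable top-$k$ operator addresses.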
In this paper, we investigate the Dual-source Transformer architecture on the WikiReading information extraction and machine reading comprehension dataset.
This paper presents the winning system for the Propaganda Technique Classification (TC) task and the second-placed system for the Propaganda Span Identification (SI) task.
1 code implementation • Łukasz Borchmann, Dawid Wiśniewski, Andrzej Gretkowski, Izabela Kosmala, Dawid Jurkiewicz, Łukasz Szałkiewicz, Gabriela Pałka, Karol Kaczmarek, Agnieszka Kaliska, Filip Graliński
We propose a new shared task of semantic retrieval from legal texts, so-called contract discovery: legal clauses are to be extracted from documents, given only a few examples of similar clauses from other legal acts.
Ranked #1 on Semantic Retrieval on Contract Discovery
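A simple way to picture the few-shot retrieval setting is an embedding-similarity baseline: score each candidate span by its mean cosine similarity to the example clause embeddings and return the best-scoring spans. The encoder, names, and scoring rule here are illustrative assumptions, not the task's reference solution.

```python
import numpy as np

def retrieve_clauses(candidate_vecs, example_vecs, top_n=3):
    # Hypothetical sketch: rank candidate clause embeddings by their
    # mean cosine similarity to a handful of example clause embeddings.
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    e = example_vecs / np.linalg.norm(example_vecs, axis=1, keepdims=True)
    scores = (c @ e.T).mean(axis=1)            # mean similarity to examples
    return np.argsort(scores)[::-1][:top_n]    # indices of best candidates
```

Any sentence encoder could supply the embeddings; the point is that only a few labeled examples are needed to define the query.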