Lawyers, for instance, search for appropriate precedents favorable to their clients, while the number of legal precedents is ever-growing.
Here we present the first large-scale benchmark of Korean legal AI datasets, LBOX OPEN, that consists of one legal corpus, two classification tasks, two legal judgement prediction (LJP) tasks, and one summarization task.
Semi-structured query systems for document-oriented databases have many real applications.
Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
Ranked #10 on Document Image Classification on RVL-CDIP
On the other hand, this paper tackles the problem by going back to the basics: an effective combination of text and layout.
Ranked #3 on Relation Extraction on FUNSD
A real-world information extraction (IE) system for semi-structured document images often involves a long pipeline of multiple modules, whose complexity dramatically increases its development and maintenance cost.
Although recent advances in OCR enable the accurate extraction of text segments, it is still challenging to extract key information from documents due to the diversity of layouts.
The restricted Boltzmann machine (RBM) is a representative generative model based on the concept of statistical mechanics.
Deep learning approaches to semantic parsing require a large amount of labeled data, but annotating complex logical forms is costly.
Information Extraction (IE) for semi-structured document images is often approached as a sequence tagging problem by classifying each recognized input token into one of the IOB (Inside, Outside, and Beginning) categories.
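As a rough illustration of the IOB tagging formulation mentioned above, the sketch below groups per-token IOB tags back into labeled fields. It is not taken from the paper: the function name `decode_iob`, the `B-`/`I-` tag prefix convention, and the field labels `STORE` and `PRICE` are assumptions made for this example only.

```python
from typing import List, Tuple


def decode_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (label, text) spans.

    Hypothetical helper: assumes tags look like "B-LABEL", "I-LABEL", or "O".
    """
    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span being built, if any, and reset the buffers.
        nonlocal current_label, current_tokens
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Outside any field: end the current span.
            flush()
            continue
        prefix, _, label = tag.partition("-")
        # A "B" tag, or an "I" tag whose label differs from the open span,
        # starts a new span (a lenient choice for malformed sequences).
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)

    flush()
    return spans


# Example with made-up receipt tokens and labels.
tokens = ["Cafe", "Luna", "Total", "12.50"]
tags = ["B-STORE", "I-STORE", "O", "B-PRICE"]
print(decode_iob(tokens, tags))
```

The decoder treats an `I-` tag that does not continue the open span as the start of a new one; a stricter variant could discard such tokens instead.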
Parsing textual information embedded in images is important for various downstream tasks.
We present SQLova, the first Natural-language-to-SQL (NL2SQL) model to achieve human performance on the WikiSQL dataset.