1 code implementation • 2 Apr 2025 • Minhu Park, Hongseok Oh, Eunkyung Choi, Wonseok Hwang
Recently, building retrieval-augmented generation (RAG) systems to enhance the capability of large language models (LLMs) has become a common practice.
no code implementations • 5 Mar 2025 • Eunkyung Choi, Young Jin Suh, Hun Park, Wonseok Hwang
How capable are large language models (LLMs) in the domain of taxation?
1 code implementation • 27 Feb 2025 • Hongseok Oh, Wonseok Hwang
Recently, Large Vision-Language Models (LVLMs) have shown remarkable performance across various domains.
1 code implementation • 11 Oct 2024 • Yeeun Kim, Young Rok Choi, Eunkyung Choi, Jinhwan Choi, Hai Jin Park, Wonseok Hwang
Here, we introduce KBL, a benchmark for assessing the Korean legal language understanding of LLMs, consisting of (1) 7 legal knowledge tasks (510 examples), (2) 4 legal reasoning tasks (288 examples), and (3) the Korean bar exam (4 domains, 53 tasks, 2,510 examples).
1 code implementation • 31 Aug 2024 • Hongseok Oh, Wonseok Hwang
Large Language Models (LLMs) have shown remarkable progress, but their real-world application necessitates reliable calibration.
no code implementations • 11 Mar 2024 • Yeeun Kim, Hyunseo Shin, Eunkyung Choi, Hongseok Oh, Hyunjun Kim, Wonseok Hwang
Open source is a driving force behind scientific advancement. However, this openness is also a double-edged sword, with the inherent risk that innovative technologies can be misused for purposes harmful to society.
no code implementations • 20 Feb 2024 • Jinu Lee, Wonseok Hwang
To improve the performance and explainability of LLM-based natural language reasoning, structured reasoning can be applied to generate explicitly structured proofs.
1 code implementation • 8 Sep 2023 • Kyoungyeon Cho, Seungkum Han, Young Rok Choi, Wonseok Hwang
Here we provide NESTLE, a no-code tool for large-scale statistical analysis of legal corpus.
1 code implementation • 3 Nov 2022 • Wonseok Hwang, Saehee Eom, Hanuhl Lee, Hai Jin Park, Minjoon Seo
Lawyers, for instance, search for appropriate precedents favorable to their clients, while the number of legal precedents is ever-growing.
1 code implementation • 10 Jun 2022 • Wonseok Hwang, Dongjun Lee, Kyoungyeon Cho, Hanuhl Lee, Minjoon Seo
Here we present the first large-scale benchmark of Korean legal AI datasets, LBOX OPEN, that consists of one legal corpus, two classification tasks, two legal judgement prediction (LJP) tasks, and one summarization task.
no code implementations • 23 Feb 2022 • Geewook Kim, Wonseok Hwang, Minjoon Seo, Seunghyun Park
Semi-structured query systems for document-oriented databases have many real applications.
Optical Character Recognition (OCR)
5 code implementations • 30 Nov 2021 • Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park
Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
Ranked #7 on Key-value Pair Extraction on SIBR
2 code implementations • 10 Aug 2021 • Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park
On the other hand, this paper tackles the problem by going back to the basics: effective combination of text and layout.
Ranked #7 on Relation Extraction on FUNSD
no code implementations • EMNLP 2021 • Wonseok Hwang, Hyunji Lee, Jinyeong Yim, Geewook Kim, Minjoon Seo
A real-world information extraction (IE) system for semi-structured document images often involves a long pipeline of multiple modules, whose complexity dramatically increases its development and maintenance cost.
no code implementations • 1 Jan 2021 • Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park
Although the recent advance in OCR enables the accurate extraction of text segments, it is still challenging to extract key information from documents due to the diversity of layouts.
no code implementations • 27 Nov 2020 • Juno Hwang, Wonseok Hwang, Junghyo Jo
The restricted Boltzmann machine (RBM) is a representative generative model based on the concept of statistical mechanics.
1 code implementation • Findings (ACL) 2021 • Wonseok Hwang, Jinyeong Yim, Seunghyun Park, Sohee Yang, Minjoon Seo
Information Extraction (IE) for semi-structured document images is often approached as a sequence tagging problem by classifying each recognized input token into one of the IOB (Inside, Outside, and Beginning) categories.
no code implementations • AKBC 2020 • Wonseok Hwang, Jinyeong Yim, Seunghyun Park, Minjoon Seo
Deep learning approaches to semantic parsing require a large amount of labeled data, but annotating complex logical forms is costly.
1 code implementation • NeurIPS Workshop Document_Intelligen 2019 • Wonseok Hwang, Seonghyeon Kim, Minjoon Seo, Jinyeong Yim, Seunghyun Park, Sungrae Park, Junyeop Lee, Bado Lee, Hwalsuk Lee
Parsing textual information embedded in images is important for various downstream tasks.
Optical Character Recognition (OCR)
5 code implementations • 4 Feb 2019 • Wonseok Hwang, Jinyeong Yim, Seunghyun Park, Minjoon Seo
We present SQLova, the first Natural-language-to-SQL (NL2SQL) model to achieve human performance on the WikiSQL dataset.