1 code implementation • 12 Jan 2024 • Seongyun Lee, Seungone Kim, Sue Hyun Park, Geewook Kim, Minjoon Seo
Assessing long-form responses generated by Vision-Language Models (VLMs) is challenging.
no code implementations • ICCV 2023 • Daehee Kim, Yoonsik Kim, Donghyun Kim, Yumin Lim, Geewook Kim, Taeho Kil
In this paper, we investigate effective pre-training tasks across broader domains and propose SCOB, a novel pre-training method that leverages character-wise supervised contrastive learning with online text rendering to pre-train on both document and scene text domains, bridging the gap between them.
1 code implementation • 24 May 2023 • Geewook Kim, Hodong Lee, Daehee Kim, Haeji Jung, SangHee Park, Yoonsik Kim, Sangdoo Yun, Taeho Kil, Bado Lee, Seunghyun Park
Recent advances in Large Language Models (LLMs) have stimulated a surge of research aimed at extending their applications to the visual domain.
1 code implementation • 7 Nov 2022 • Donghyun Kim, Teakgyu Hong, Moonbin Yim, Yoonsik Kim, Geewook Kim
In recent years, research on visual document understanding (VDU) has grown significantly, with a particular emphasis on the development of self-supervised learning methods.
no code implementations • 23 Feb 2022 • Geewook Kim, Wonseok Hwang, Minjoon Seo, Seunghyun Park
Semi-structured query systems for document-oriented databases have many real-world applications.
4 code implementations • 30 Nov 2021 • Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park
Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
Ranked #10 on Document Image Classification on RVL-CDIP
no code implementations • 18 May 2021 • Masahiro Naito, Sho Yokoi, Geewook Kim, Hidetoshi Shimodaira
(Q2) Ordinary additive compositionality can be seen as an AND operation of word meanings, but it is not well understood how other operations, such as OR and NOT, can be computed by the embeddings.
no code implementations • EMNLP 2021 • Wonseok Hwang, Hyunji Lee, Jinyeong Yim, Geewook Kim, Minjoon Seo
A real-world information extraction (IE) system for semi-structured document images often involves a long pipeline of multiple modules, whose complexity dramatically increases its development and maintenance cost.
1 code implementation • COLING 2020 • Sungrae Park, Geewook Kim, Junyeop Lee, Junbum Cha, Ji-Hoon Kim, Hwalsuk Lee
This paper introduces a method that efficiently reduces the computational cost and parameter size of Transformer.
no code implementations • 2 May 2020 • Morihiro Mizutani, Akifumi Okuno, Geewook Kim, Hidetoshi Shimodaira
Multimodal relational data analysis has become increasingly important in recent years for exploring across different domains of data, such as images and their text tags obtained from social networking services (e.g., Flickr).
no code implementations • 25 Sep 2019 • Sungrae Park, Geewook Kim, Junyeop Lee, Junbum Cha, Ji-Hoon Kim, Hwalsuk Lee
When compared to Transformers with a comparable number of parameters and time complexity, the proposed model shows better performance.
13 code implementations • ICCV 2019 • Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee
Many new proposals for scene text recognition (STR) models have been introduced in recent years.
Ranked #7 on Scene Text Recognition on ICDAR 2003
1 code implementation • 27 Feb 2019 • Geewook Kim, Akifumi Okuno, Kazuki Fukui, Hidetoshi Shimodaira
In addition to the parameters of neural networks, we optimize the weights of the inner product by allowing positive and negative values.
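The idea above can be sketched as a weighted inner product whose per-coordinate weights are learnable and may take negative values, unlike the ordinary inner product where every weight is implicitly +1. This is a minimal illustrative sketch, not the paper's implementation; the function name `wips` and the example vectors are assumptions.

```python
import numpy as np

def wips(x, y, lam):
    """Weighted inner product: sum_k lam[k] * x[k] * y[k].

    lam holds learnable weights that, unlike the ordinary inner
    product, are allowed to be positive or negative.
    """
    return float(np.sum(lam * x * y))

x = np.array([1.0, 2.0, 0.5])
y = np.array([0.5, -1.0, 2.0])
lam = np.array([1.0, -1.0, 2.0])  # negative weights are permitted

print(wips(x, y, lam))  # 0.5 + 2.0 + 2.0 = 4.5
```

Allowing signed weights lets the similarity represent both attractive and repulsive interactions between coordinates, which a plain inner product cannot.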
1 code implementation • WS 2018 • Geewook Kim, Kazuki Fukui, Hidetoshi Shimodaira
We propose a new word embedding method called word-like character n-gram embedding, which learns distributed representations of words by embedding word-like character n-grams.
no code implementations • 4 Oct 2018 • Akifumi Okuno, Geewook Kim, Hidetoshi Shimodaira
We propose shifted inner-product similarity (SIPS), which is a novel yet very simple extension of the ordinary inner-product similarity (IPS) for neural-network based graph embedding (GE).
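As described, SIPS extends the ordinary inner-product similarity by adding learnable per-node bias (shift) terms. A minimal sketch of the scoring function, assuming the biases are stored as plain scalars (the variable names here are illustrative, not from the paper's code):

```python
import numpy as np

def sips(yi, yj, ui, uj):
    """Shifted inner-product similarity: <yi, yj> + ui + uj.

    yi, yj are node embedding vectors; ui, uj are learnable
    per-node bias terms that shift the ordinary inner product.
    """
    return float(np.dot(yi, yj) + ui + uj)

yi = np.array([1.0, 0.0])
yj = np.array([0.5, 2.0])
print(sips(yi, yj, ui=-0.2, uj=0.7))  # 0.5 - 0.2 + 0.7 = 1.0
```

The bias terms let the model capture node-specific popularity effects that a pure inner product of low-dimensional embeddings cannot express.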
2 code implementations • NAACL 2019 • Geewook Kim, Kazuki Fukui, Hidetoshi Shimodaira
We propose a new type of representation learning method that models words, phrases and sentences seamlessly.