no code implementations • 30 Sep 2024 • Masato Fujitake
Correctly recognizing scanned voucher text, such as the company name on invoices, is essential for automated processing.
Optical Character Recognition (OCR)
no code implementations • 9 Apr 2024 • Masato Fujitake
In this paper, we present a method for enhancing the accuracy of scene text recognition tasks by judging whether the image and text match each other.
no code implementations • 21 Mar 2024 • Masato Fujitake
By leveraging existing research in document image understanding and the superior language understanding capabilities of LLMs, the proposed model, fine-tuned on multimodal instruction datasets, understands document images within a single model.
no code implementations • 28 Dec 2023 • Masato Fujitake
Therefore, we propose a deep reinforcement learning localization method for logo recognition (RL-LOGO).
no code implementations • 31 Oct 2023 • Yuki Okumura, Masato Fujitake
The FA team participated in the Table Data Extraction (TDE) and Text-to-Table Relationship Extraction (TTRE) tasks of the NTCIR-17 Understanding of Non-Financial Objects in Financial Reports (UFO).
1 code implementation • 30 Aug 2023 • Masato Fujitake
Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these features.
Ranked #1 on Handwritten Text Recognition on IAM
no code implementations • 29 Jun 2023 • Masato Fujitake
This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework using diffusion models for recognizing text in the wild.
Ranked #12 on Scene Text Recognition on IIIT5k
no code implementations • 21 Feb 2023 • Masato Fujitake
Scene-text spotting is a task that predicts a text area on natural scene images and recognizes its text characters simultaneously.
Ranked #1 on Text Spotting on SCUT-CTW1500
1 code implementation • IEEE Access 2022 • Masato Fujitake, Akihiro Sugimoto
In this paper, we enhance features element-wise before object candidate region detection, proposing Video Sparse Transformer with Attention-guided Memory (VSTAM).
Ranked #1 on Object Detection on UA-DETRAC