no code implementations • 9 Jan 2024 • Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister
We propose the Chain-of-Table framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts.
Ranked #3 on Table-based Fact Verification on TabFact
1 code implementation • 25 Oct 2023 • Shangbang Long, Siyang Qin, Yasuhisa Fujii, Alessandro Bissacco, Michalis Raptis
We propose Hierarchical Text Spotter (HTS), a novel method for the joint task of word-level text spotting and geometric layout analysis.
no code implementations • 18 Aug 2023 • Peter Garst, Reeve Ingle, Yasuhisa Fujii
Language models are useful adjuncts to optical models for producing accurate optical character recognition (OCR) results.
no code implementations • 1 Aug 2023 • Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
Today, large language models (LLMs) are taught to use new tools by providing a few demonstrations of the tool's usage.
1 code implementation • 16 May 2023 • Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, Michalis Raptis
We organize a competition on hierarchical text detection and recognition.
no code implementations • 4 May 2023 • Renshen Wang, Yasuhisa Fujii, Alessandro Bissacco
Text reading order is a crucial aspect in the output of an OCR engine, with a large impact on downstream tasks.
no code implementations • 4 May 2023 • Chen-Yu Lee, Chun-Liang Li, Hao Zhang, Timothy Dozat, Vincent Perot, Guolong Su, Xiang Zhang, Kihyuk Sohn, Nikolai Glushnev, Renshen Wang, Joshua Ainslie, Shangbang Long, Siyang Qin, Yasuhisa Fujii, Nan Hua, Tomas Pfister
In FormNetV2, we introduce a centralized multimodal graph contrastive learning strategy to unify self-supervised pre-training for all modalities in one loss.
1 code implementation • 3 May 2023 • Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister
Third, we reduce both the model size and the amount of data required to outperform LLMs; our finetuned 770M T5 model outperforms the few-shot prompted 540B PaLM model using only 80% of available data on a benchmark, whereas standard finetuning the same T5 model struggles to match even by using 100% of the dataset.
2 code implementations • CVPR 2022 • Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, Michalis Raptis
In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis.
no code implementations • 17 Mar 2022 • Shuang Liu, Renshen Wang, Michalis Raptis, Yasuhisa Fujii
We formulate the task of detecting lines and paragraphs in a document into a unified two-level clustering problem.
no code implementations • ACL 2022 • Chen-Yu Lee, Chun-Liang Li, Timothy Dozat, Vincent Perot, Guolong Su, Nan Hua, Joshua Ainslie, Renshen Wang, Yasuhisa Fujii, Tomas Pfister
Sequence modeling has demonstrated state-of-the-art performance on natural language and document understanding tasks.
no code implementations • ACL 2021 • Chen-Yu Lee, Chun-Liang Li, Chu Wang, Renshen Wang, Yasuhisa Fujii, Siyang Qin, Ashok Popat, Tomas Pfister
Natural reading orders of words are crucial for information extraction from form-like documents.
1 code implementation • 15 Apr 2021 • Daniel Hernandez Diaz, Siyang Qin, Reeve Ingle, Yasuhisa Fujii, Alessandro Bissacco
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length, a requirement for universal line recognition.
Ranked #2 on Handwritten Text Recognition on IAM (using extra training data)
no code implementations • 29 Jan 2021 • Renshen Wang, Yasuhisa Fujii, Ashok C. Popat
We propose a new approach for paragraph recognition in document images by spatial graph convolutional networks (GCN) applied on OCR text boxes.
no code implementations • ICCV 2019 • Siyang Qin, Alessandro Bissacco, Michalis Raptis, Yasuhisa Fujii, Ying Xiao
We propose an end-to-end trainable network that can simultaneously detect and recognize text of arbitrary shape, making substantial progress on the open problem of reading scene text of irregular shape.
Instance Segmentation Optical Character Recognition (OCR) +3
no code implementations • 19 Apr 2019 • R. Reeve Ingle, Yasuhisa Fujii, Thomas Deselaers, Jonathan Baccash, Ashok C. Popat
These constitute a solution to bring HTR capability into a large scale OCR system.
no code implementations • 15 Aug 2017 • Yasuhisa Fujii, Karel Driesen, Jonathan Baccash, Ash Hurst, Ashok C. Popat
Therefore we reframe line script identification as a sequence-to-label problem and solve it using two components, trained end-toend: Encoder and Summarizer.