no code implementations • EAMT 2020 • Joachim Van den Bogaert, Arne Defauw, Frederic Everaert, Koen Van Winckel, Alina Kramchaninova, Anna Bardadym, Tom Vanallemeersch, Pavel Smrž, Michal Hradiš
The OCCAM project (Optical Character recognition, ClassificAtion & Machine Translation) aims to integrate the CEF (Connecting Europe Facility) Automated Translation service with image classification, Translation Memories (TMs), Optical Character Recognition (OCR), and Machine Translation (MT).
1 code implementation • 1 May 2024 • Martin Kišš, Michal Hradiš
The evaluation shows that the self-supervised pre-training on data from the target domain is very effective, but it struggles to outperform transfer learning from closely related domains.
no code implementations • 13 Feb 2023 • Jan Kohút, Michal Hradiš, Martin Kišš
We experimented with various placements and settings of WSB and contrastively pre-trained embeddings.
no code implementations • 13 Feb 2023 • Jan Kohút, Michal Hradiš
In many machine learning tasks, a large general dataset and a small specialized dataset are available.
1 code implementation • 5 Dec 2022 • Martin Kišš, Michal Hradiš, Karel Beneš, Petr Buchal, Michal Kula
This paper explores semi-supervised training for sequence tasks, such as Optical Character Recognition or Automatic Speech Recognition.
no code implementations • 24 Jan 2022 • Martin Kišš, Jan Kohút, Karel Beneš, Michal Hradiš
The line-level system significantly improves results in script and font classification and in the dating task.
1 code implementation • 27 Apr 2021 • Martin Kišš, Karel Beneš, Michal Hradiš
This paper addresses text recognition for domains with limited manual annotations using a simple self-training strategy.
no code implementations • 9 Mar 2021 • Jan Kohút, Michal Hradiš
Users of OCR systems, from different institutions and scientific disciplines, prefer and produce different transcription styles.
no code implementations • 23 Feb 2021 • Oldřich Kodym, Michal Hradiš
Extraction of text regions and individual text lines from historic documents is necessary for automatic transcription.
1 code implementation • 2 Jul 2019 • Martin Kišš, Michal Hradiš, Oldřich Kodym
We introduce the Brno Mobile OCR Dataset (B-MOD) for document Optical Character Recognition from low-quality images captured by handheld mobile devices.
no code implementations • 12 Jun 2015 • Martin Kolář, Michal Hradiš, Pavel Zemčík
This report presents our submission to the MS COCO Captioning Challenge 2015.