no code implementations • 1 Jun 2022 • Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen, Juha Rautiainen
We were able to show that improvement in optical character recognition quality of documents leads to higher mean relevance evaluation scores of query results in our historical newspaper collection.
no code implementations • 4 Mar 2022 • Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen, Juha Rautiainen
The query interface presented the same underlying document to the user in one of two versions, based on either the lower or the higher OCR quality, and the choice between them was randomized.
no code implementations • 16 Nov 2016 • Kimmo Kettunen
This paper discusses different corpus-analysis-style methods for approximating the overall lexical quality of the Finnish part of the Digi collection.
no code implementations • 9 Nov 2016 • Kimmo Kettunen, Eetu Mäkelä, Teemu Ruokolainen, Juha Kuokkala, Laura Löfberg
In this paper we report the first large-scale trials and evaluation of NER with data from Digi, a digitized Finnish historical newspaper collection.
no code implementations • LREC 2016 • Kimmo Kettunen, Tuula Pääkkönen
The National Library of Finland has digitized a large proportion of the historical newspapers published in Finland between 1771 and 1910 (Bremer-Laamanen 2001).
Information Retrieval, Optical Character Recognition (OCR)