Search Results for author: Saara Hellström

Found 1 paper, 0 papers with code

From Web Crawl to Clean Register-Annotated Corpora

LREC 2020 Veronika Laippala, Samuel Rönnqvist, Saara Hellström, Juhani Luotolahti, Liina Repo, Anna Salmela, Valtteri Skantsi, Sampo Pyysalo

However, two critical steps in the development of web corpora remain challenging: the identification of clean text from source HTML and the assignment of genre or register information to the documents.
