Allgemeine Musikalische Zeitung as a Searchable Online Corpus

LREC 2020 · Bernd Kampe, Tinghui Duan, Udo Hahn

The massive digitization efforts related to historical newspapers over the past decades have focused on mass media sources, with ordinary people as their primary readership. Much less attention has been paid to newspapers published for a more specialized audience, e.g., those aimed at scholarly or cultural exchange within much narrower intellectual communities, such as newspapers devoted to music criticism, the arts, or philosophy. Only a few of these specialized newspapers have been digitized so far, and those that have are usually poorly curated in terms of digitization quality, data formatting, completeness, redundancy (de-duplication), supply of metadata and, hence, searchability. This paper describes our approach to eliminating these drawbacks for a major German-language newspaper resource of the Romantic Age, the Allgemeine Musikalische Zeitung (General Music Gazette). Here we focus on a workflow that copes with a posteriori digitization problems and inconsistent OCR, and that builds an index for searchability. In addition, we provide a user-friendly graphical interface, built on open-source software for Web presentation, that enables content-centric access to this and other digital resources.
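The abstract names de-duplication, inconsistent OCR, and index building as the core workflow steps but does not spell out how they are implemented. The following is only a minimal illustrative sketch, not the authors' pipeline. It assumes plain-text OCR output per page, treats exact duplicates (after normalization) as redundant pages, and builds a simple in-memory inverted index. All function names (`tokenize`, `page_fingerprint`, `build_index`, `search`) and page identifiers are hypothetical.

```python
import hashlib
import re
import unicodedata
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Normalize OCR text and split it into search tokens (illustrative rules only)."""
    text = unicodedata.normalize("NFC", text)
    # Rejoin words hyphenated across OCR line breaks, e.g. "gro-\nßem" -> "großem".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Historical long s (ſ) -> s; casefold also maps "ß" -> "ss".
    text = text.replace("ſ", "s").casefold()
    return re.findall(r"\w+", text)


def page_fingerprint(text: str) -> str:
    """Hash the normalized token stream so identical scans of one page collide."""
    canonical = " ".join(tokenize(text))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of page ids containing it, skipping exact duplicates."""
    index: dict[str, set[str]] = defaultdict(set)
    seen: set[str] = set()
    for page_id, text in pages.items():
        fp = page_fingerprint(text)
        if fp in seen:
            continue  # redundant copy of a page already indexed
        seen.add(fp)
        for token in tokenize(text):
            index[token].add(page_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return page ids containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Hypothetical sample pages; the second is a duplicate scan of the first.
    pages = {
        "1798-p1": "Die Sinfonie wurde mit gro-\nßem Beyfall aufgeführt.",
        "1798-p1-copy": "Die Sinfonie wurde mit gro-\nßem Beyfall aufgeführt.",
        "1799-p4": "Ueber die Oper und den Geſang.",
    }
    index = build_index(pages)
    print(search(index, "Gesang"))
    print(search(index, "großem Beyfall"))
```

Because the same normalization is applied to documents and queries, a modern-spelling query such as "Gesang" matches the long-s form "Geſang". The line-break dehyphenation rule is deliberately naive: it would also merge genuine hyphenated compounds that happen to break at a line end, and a real workflow would likely need fuzzier near-duplicate detection than exact hashing.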
