Detecting Quality Problems in Research Data: A Model-Driven Approach

22 Jul 2020  ·  Arno Kesper, Viola Wenz, Gabriele Taentzer ·

As scientific progress highly depends on the quality of research data, there are strict requirements for data quality coming from the scientific community. A major challenge in data quality assurance is to localise quality problems that are inherent to data. Due to the dynamic digitalisation in specific scientific fields, especially the humanities, different database technologies and data formats may be used in rather short terms to gain experiences. We present a model-driven approach to analyse the quality of research data. It allows abstracting from the underlying database technology. Based on the observation that many quality problems show anti-patterns, a data engineer formulates analysis patterns that are generic concerning the database format and technology. A domain expert chooses a pattern that has been adapted to a specific database technology and concretises it for a domain-specific database format. The resulting concrete patterns are used by data analysts to locate quality problems in their databases. As proof of concept, we implemented tool support that realises this approach for XML databases. We evaluated our approach concerning expressiveness and performance in the domain of cultural heritage based on a qualitative study on quality problems occurring in cultural heritage data.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here