Foundations of Data Quality Management

Data quality is one of the most important problems in data management. A database system typically aims to support the creation, maintenance, and use of large amounts of data, focusing on the quantity of data. However, real-life data are often dirty: inconsistent, duplicated, inaccurate, incomplete, or stale. Dirty data in a database routinely generate misleading or biased analytical results and decisions, and lead to loss of revenue, credibility, and customers. With this comes the need for data quality management. In contrast to traditional data management tasks, data quality management aims to enable the detection and correction of errors in the data, syntactic or semantic, in order to improve the quality of the data and hence add value to business processes. This monograph gives an overview of fundamental issues underlying central aspects of data quality, namely, data consistency, deduplication, accuracy, currency, and information completeness. We promote a uniform logical framework...