Abstract
This paper proposes an alternative view on the measurement of data quality. Current procedures for data quality measurement report the extent to which data misrepresent reality. These procedures are descriptive in the sense that they provide numerical information about the state of the data. In many cases, this information is not sufficient to determine whether the data are fit for the task they were meant for. To bridge that gap, we propose a procedure that measures the operational characteristics of data, namely the cost of making the data fit for use. We lay out the basics of this procedure and then provide more details on two essential components: tasks and transformation functions.
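As a rough illustration of the idea, and not the procedure defined in the chapter, the sketch below treats a task as a predicate over a record and a transformation function as a costed record-to-record mapping, and reports the cheapest total cost after which the task accepts the record. The names `Transformation` and `cost_to_fit`, the example record, and the cost values are all hypothetical; the search strategy (uniform-cost search with non-negative costs) is likewise an assumption made for this example.

```python
import heapq
from dataclasses import dataclass
from itertools import count
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]  # hypothetical record type: attribute name -> value


@dataclass(frozen=True)
class Transformation:
    """A hypothetical costed transformation function on records."""
    name: str
    cost: float  # assumed non-negative
    apply: Callable[[Record], Record]


def cost_to_fit(
    record: Record,
    task: Callable[[Record], bool],
    transformations: List[Transformation],
    max_expansions: int = 10_000,
) -> Optional[float]:
    """Return the smallest total cost of a transformation sequence after
    which `task` accepts the record, or None if no such sequence is found
    within `max_expansions` expanded states (uniform-cost search)."""
    tie = count()  # tie-breaker so the heap never compares dicts
    frontier = [(0.0, next(tie), record)]
    seen = set()
    expansions = 0
    while frontier and expansions < max_expansions:
        cost, _, current = heapq.heappop(frontier)
        if task(current):
            return cost
        key = tuple(sorted(current.items()))
        if key in seen:
            continue
        seen.add(key)
        expansions += 1
        for t in transformations:
            heapq.heappush(frontier, (cost + t.cost, next(tie), t.apply(current)))
    return None


# Hypothetical example: a phone number must be in international format
# and the country attribute must be filled in.
transformations = [
    Transformation("strip_spaces", 1.0,
                   lambda r: {**r, "phone": r["phone"].replace(" ", "")}),
    Transformation("intl_prefix", 2.0,
                   lambda r: {**r, "phone": "+" + r["phone"][2:]}
                   if r["phone"].startswith("00") else r),
    Transformation("lookup_country", 5.0,
                   lambda r: {**r, "country": "BE"}
                   if r["phone"].startswith("+32") else r),
]


def task(r: Record) -> bool:
    return r["phone"].startswith("+") and " " not in r["phone"] and r["country"] != ""


record = {"phone": "0032 9 264 00 00", "country": ""}
print(cost_to_fit(record, task, transformations))
```

Under these assumptions, a record that already satisfies the task has cost zero, and a record that no sequence of the available transformations can repair yields `None`. The measurement is therefore relative to a task and a set of transformation functions rather than a property of the data alone.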
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Bronselaer, A., Nielandt, J., Boeckling, T., De Tré, G. (2018). Operational Measurement of Data Quality. In: Medina, J., Ojeda-Aciego, M., Verdegay, J., Perfilieva, I., Bouchon-Meunier, B., Yager, R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. Applications. IPMU 2018. Communications in Computer and Information Science, vol 855. Springer, Cham. https://doi.org/10.1007/978-3-319-91479-4_43
DOI: https://doi.org/10.1007/978-3-319-91479-4_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91478-7
Online ISBN: 978-3-319-91479-4
eBook Packages: Computer Science, Computer Science (R0)