Abstract

This paper proposes an alternative view on the measurement of data quality. Current procedures for data quality measurement provide information about the extent to which data misrepresent reality. These procedures are descriptive: they provide numerical information about the state of the data. In many cases, however, this information is not sufficient to decide whether the data are fit for the task they are intended for. To bridge that gap, we propose a procedure that measures the operational characteristics of data. We devise such a procedure by measuring the cost of making data fit for use. We lay out the basics of this procedure and then discuss two of its essential components in more detail: tasks and transformation functions.
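To make the idea of an operational measure more concrete, one way to read it is as a search problem. A task is modelled as a predicate that a dataset either satisfies or not, each transformation function carries a non-negative cost, and the measure is the lowest total cost of a sequence of transformations after which the task predicate holds. This is a minimal sketch under our own assumptions (a dataset as a tuple of strings, fixed per-step costs, uniform-cost search). The names `repair_cost`, `is_fit` and the example transformations are hypothetical and do not reproduce the authors' procedure.

```python
import heapq
import itertools
import math
from typing import Callable, Dict, List, Tuple

# Assumption: a dataset is an immutable (hashable) tuple of string values.
Data = Tuple[str, ...]
# Assumption: a transformation is a function on datasets plus a fixed, non-negative cost.
Transformation = Tuple[Callable[[Data], Data], float]


def repair_cost(
    data: Data,
    is_fit: Callable[[Data], bool],
    transformations: Dict[str, Transformation],
    max_cost: float = math.inf,
) -> Tuple[float, List[str]]:
    """Return the cheapest total cost, and one sequence of transformation names,
    that turns `data` into a dataset for which the task predicate `is_fit` holds.

    Uniform-cost (Dijkstra) search over datasets; correct only for non-negative
    step costs. Returns (inf, []) if no sequence within `max_cost` succeeds.
    """
    tie = itertools.count()  # tie-breaker so heapq never compares datasets or paths
    frontier = [(0.0, next(tie), data, [])]
    settled = set()  # datasets already reached at their minimal cost
    while frontier:
        cost, _, current, path = heapq.heappop(frontier)
        if current in settled:
            continue  # stale queue entry with a higher cost
        settled.add(current)
        if is_fit(current):
            return cost, path
        for name, (transform, step_cost) in transformations.items():
            successor = transform(current)
            new_cost = cost + step_cost
            if successor not in settled and new_cost <= max_cost:
                heapq.heappush(frontier, (new_cost, next(tie), successor, path + [name]))
    return math.inf, []


# Hypothetical task: every value is non-empty, trimmed and title-cased.
def is_fit(d: Data) -> bool:
    return all(v != "" and v == v.strip() and v == v.title() for v in d)


# Hypothetical transformation functions with illustrative costs.
transformations: Dict[str, Transformation] = {
    "strip": (lambda d: tuple(v.strip() for v in d), 1.0),
    "title": (lambda d: tuple(v.title() for v in d), 1.0),
    "drop_empty": (lambda d: tuple(v for v in d if v != ""), 2.0),
    "impute_empty": (lambda d: tuple(v if v != "" else "Unknown" for v in d), 5.0),
}

cost, steps = repair_cost((" Gent", "brussel", ""), is_fit, transformations)
print(cost, steps)  # cost == 4.0: strip + title + drop_empty, in some order
```

Uniform-cost search fits here because, with non-negative costs, the first fit dataset taken from the queue is reached at minimal total cost. The `max_cost` bound keeps the search finite when transformations can generate unboundedly many new datasets; a dataset that is already fit for the task has cost 0.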

Author information


Corresponding author

Correspondence to Antoon Bronselaer.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Bronselaer, A., Nielandt, J., Boeckling, T., De Tré, G. (2018). Operational Measurement of Data Quality. In: Medina, J., Ojeda-Aciego, M., Verdegay, J., Perfilieva, I., Bouchon-Meunier, B., Yager, R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. Applications. IPMU 2018. Communications in Computer and Information Science, vol 855. Springer, Cham. https://doi.org/10.1007/978-3-319-91479-4_43

  • DOI: https://doi.org/10.1007/978-3-319-91479-4_43

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91478-7

  • Online ISBN: 978-3-319-91479-4

  • eBook Packages: Computer Science, Computer Science (R0)
