

Data Quality Dimensions

The document outlines various data quality dimensions, including validity, timeliness, completeness, consistency, uniqueness, and accuracy, which are essential for ensuring data is fit for purpose. It provides examples of each dimension, such as ensuring customer birth dates are in the past and that all records are loaded by a specific time. Additionally, it emphasizes the importance of maintaining trust in data through high-quality standards.

Uploaded by Shahd Saad
Copyright: © All Rights Reserved

Data Quality Dimensions
Introduction to Data Quality

Learn more at www.DataCamp.com

> What are Data Quality Dimensions?

Data Quality is a measurement of the degree to which data is fit for purpose. Good data quality generates trust in data. Each Data Quality Dimension is a measurement of a specific attribute of data quality.

> Completeness

Completeness measures the degree to which all expected records in a dataset are present. At a data element level, completeness is the degree to which all records have data populated when expected.

Completeness Example

All records must have a value populated in the CustomerName field.

CustomerID | CustomerName   | CustomerBirthDate | CustomerAccountType | CustomerAccountBalance | LatestAccountOpenDate
100000192  | Robert Brown   | 4/12/2000         | Loan                | 40390.00               | 12/20/2026
100000198  | Maria Irving   | 12/1/2025         | Deposit             | -13280.00              | 10/21/2018
100000120  | Ava Shiffer    | 10/31/1990        | Credit Card         | 320                    | 3/1/2020
100000192  | Robert Brown   | 4/12/2000         | Deposit             | 40390.00               | 12/20/2026
100000124  | Matthew Martin | 5/9/1965          | Deposit             | 70102.00               | 5/4/2022
100000149  |                | 2/4/1988          | Loan                | 0.00                   | 9/20/1990

Missing: the CustomerName of CustomerID 100000149.

> Validity

Validity measures the degree to which the values in a data element are valid.

Validity Example

CustomerBirthDate value must be a date in the past.
CustomerAccountType value must be either Loan or Deposit.
LatestAccountOpenDate value must be a date in the past.

CustomerID | CustomerName   | CustomerBirthDate | CustomerAccountType | CustomerAccountBalance | LatestAccountOpenDate
100000192  | Robert Brown   | 4/12/2000         | Loan                | 40390.00               | 12/20/2026
100000198  | Maria Irving   | 12/1/2025         | Deposit             | -13280.00              | 10/21/2018
100000120  | Ava Shiffer    | 10/31/1990        | Credit Card         | 320                    | 3/1/2020
100000192  | Robert Brown   | 4/12/2000         | Deposit             | 40390.00               | 12/20/2026
100000124  | Matthew Martin | 5/9/1965          | Deposit             | 70102.00               | 5/4/2022
100000149  |                | 2/4/1988          | Loan                | 0.00                   | 9/20/1990

Invalid: CustomerBirthDate 12/1/2025, CustomerAccountType Credit Card, LatestAccountOpenDate 12/20/2026.

> Timeliness

Timeliness is the degree to which a dataset is available when expected and depends on service level agreements (SLAs) being set up between technical and business resources.

SLA      | Table Load Time
08:00 am | 07:59 am
10:00 am | 09:59 am
11:00 am | 11:01 am (Missed the SLA)

Timeliness Example

All records in the customer dataset must be loaded by 9:00 am.

CustomerID | CustomerName
100000192  | 01-01-2023 11:07 am
100000198  | 01-01-2023 11:07 am
100000120  | 01-01-2023 11:07 am

> Consistency

Consistency is a data quality dimension that measures the degree to which data is the same across all instances of the data. Consistency can be measured by setting a threshold for how much difference there can be between two datasets.

Consistency Example

The count of records loaded today must be within +/- 5% of the count of records loaded yesterday.

Count of records in TargetCustomerTable | Record count difference from previous day
10,000,000                              | 4,909,797 (Inconsistent)
5,090,203                               | 75
5,090,128                               | 1

AccountTableCustomerID | CustomerTableCustomerID
108394858              | 108394858
192039482              | 192039482
203475849              | NULL
2930485953             | NULL
102832748              | 102832748

> Uniqueness

Uniqueness measures the degree to which the records in a dataset are not duplicated.

Uniqueness Example

All records must have a unique CustomerID and CustomerName.

CustomerID | CustomerName   | CustomerBirthDate | CustomerAccountType | CustomerAccountBalance | LatestAccountOpenDate
100000192  | Robert Brown   | 4/12/2000         | Loan                | 40390.00               | 12/20/2026
100000198  | Maria Irving   | 12/1/2025         | Deposit             | -13280.00              | 10/21/2018
100000120  | Ava Shiffer    | 10/31/1990        | Credit Card         | 320                    | 3/1/2020
100000192  | Robert Brown   | 4/12/2000         | Deposit             | 40390.00               | 12/20/2026
100000124  | Matthew Martin | 5/9/1965          | Deposit             | 70102.00               | 5/4/2022
100000149  |                | 2/4/1988          | Loan                | 0.00                   | 9/20/1990

Not Unique: both records for CustomerID 100000192 (Robert Brown).

> Accuracy

Accuracy measures the degree to which data is correct and represents the truth. It is assessed by comparing a Downstream Table against a Verified Source Document.

Accuracy Example

All records in the Customer Table must have accurate Customer Name, Customer Birthdate, and Customer Address fields when compared to the Tax Form.

CustomerName | CustomerBirthDate | CustomerAddress | CustomerCity | CustomerState | CustomerZip
Ava Shiffer  | 10/31/1990        | 910 Quality St  | Washington   | WA            | 20008
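
The example rules on this sheet can be expressed as small programmatic checks. The following is a minimal Python sketch, not part of the original sheet. The function names, the `Customer` record type, and the `AS_OF` reference date are all hypothetical choices made for illustration. It assumes that "a date in the past" is judged against 01-01-2023 (the load date shown in the timeliness example), that the uniqueness rule applies to the combination of CustomerID and CustomerName, and that loading exactly at the SLA time still counts as meeting it.

```python
from collections import Counter, namedtuple
from datetime import date, datetime, time

# Hypothetical record type; the balance column is omitted because no rule uses it.
Customer = namedtuple("Customer", "customer_id name birth_date account_type open_date")

# Rows copied from the sheet's example customer table.
ROWS = [
    Customer("100000192", "Robert Brown", "4/12/2000", "Loan", "12/20/2026"),
    Customer("100000198", "Maria Irving", "12/1/2025", "Deposit", "10/21/2018"),
    Customer("100000120", "Ava Shiffer", "10/31/1990", "Credit Card", "3/1/2020"),
    Customer("100000192", "Robert Brown", "4/12/2000", "Deposit", "12/20/2026"),
    Customer("100000124", "Matthew Martin", "5/9/1965", "Deposit", "5/4/2022"),
    Customer("100000149", "", "2/4/1988", "Loan", "9/20/1990"),
]

# Assumption: "the past" is measured against the sheet's 01-01-2023 load date.
AS_OF = date(2023, 1, 1)


def parse_mdy(text):
    """Parse a month/day/year string such as '4/12/2000' into a date."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def invalid_fields(row, as_of=AS_OF):
    """Validity: return the names of the fields that break the three rules."""
    problems = []
    if parse_mdy(row.birth_date) >= as_of:
        problems.append("CustomerBirthDate")
    if row.account_type not in {"Loan", "Deposit"}:
        problems.append("CustomerAccountType")
    if parse_mdy(row.open_date) >= as_of:
        problems.append("LatestAccountOpenDate")
    return problems


def missing_names(rows):
    """Completeness: return the CustomerIDs whose CustomerName is not populated."""
    return [r.customer_id for r in rows if not r.name.strip()]


def duplicate_keys(rows):
    """Uniqueness: return (CustomerID, CustomerName) pairs that occur more than once."""
    counts = Counter((r.customer_id, r.name) for r in rows)
    return [key for key, n in counts.items() if n > 1]


def count_is_consistent(today, yesterday, tolerance=0.05):
    """Consistency: today's record count must be within +/- 5% of yesterday's."""
    return abs(today - yesterday) <= tolerance * yesterday


def met_sla(load_time, sla):
    """Timeliness: the table load must finish at or before the SLA time."""
    return load_time <= sla


if __name__ == "__main__":
    for row in ROWS:
        print(row.customer_id, invalid_fields(row))
    print("missing names:", missing_names(ROWS))
    print("duplicates:", duplicate_keys(ROWS))
    print("count consistent:", count_is_consistent(10_000_000, 5_090_203))
    print("met SLA:", met_sla(time(11, 1), time(11, 0)))
```

Each check maps to one dimension, so a failing record can be reported under the same label the sheet uses (Invalid, Missing, Not Unique, Inconsistent, Missed the SLA). Accuracy is left out because it requires a verified source document to compare against, which this sketch does not have.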
