[go: up one dir, main page]

0% found this document useful (0 votes)
149 views28 pages

Enforcing Data Quality

This document provides an overview of Module 10 which covers enforcing data quality using Data Quality Services (DQS) in SQL Server. It discusses introducing data quality and using DQS to cleanse and match data. It provides lessons on creating a DQS knowledge base, using DQS to cleanse data by mapping columns to domains, and using DQS to match data by defining matching policies and mapping columns. It also includes demonstrations and labs on cleansing and deduplicating data using a DQS project.

Uploaded by

Richie Poo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
149 views28 pages

Enforcing Data Quality

This document provides an overview of Module 10 which covers enforcing data quality using Data Quality Services (DQS) in SQL Server. It discusses introducing data quality and using DQS to cleanse and match data. It provides lessons on creating a DQS knowledge base, using DQS to cleanse data by mapping columns to domains, and using DQS to match data by defining matching policies and mapping columns. It also includes demonstrations and labs on cleansing and deduplicating data using a DQS project.

Uploaded by

Richie Poo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 28

Module 10

Enforcing Data Quality


Module Overview

Introduction to Data Quality


Using Data Quality Services to Cleanse Data
• Using Data Quality Services to Match Data
Lesson 1: Introduction to Data Quality

What Is Data Quality and Why Do You Need It?


Data Quality Services Overview
What Is a Knowledge Base?
What Is a Domain?
What Is a Reference Data Service?
Creating a Knowledge Base
• Demonstration: Creating a Knowledge Base
What Is Data Quality and Why Do You Need It?

• Business decisions should be made on trusted


data
• Data quality issues in sources can be propagated
into the data warehouse:
• Invalid data values
• Inconsistencies
• Duplicate business entities
Data Quality Services Overview

• DQS is a knowledge-based solution for:


• Data cleansing
• Data matching

• DQS Components:
• Server
• Client
• Data cleansing SSIS transformation
What Is a Knowledge Base?

• Repository of knowledge about data:


• Domains define values and rules for each field
• Matching policies define rules for identifying duplicate
records
• Determine data for a DQS knowledge base:
• Analyze source databases and data warehouses for
inconsistencies, inaccuracies, and incompleteness
• Audit website and software forms used for data entry
to find free-form fields prone to creating low quality
data
• Look at dependent reporting systems and find
incorrect results
What Is a Domain?

• Domains:
• Are specific to a data field
• Contain the rules for the data
• Can be individual or composite
What Is a Reference Data Service?

• The Azure Marketplace hosts specialist data


cleansing providers, where you can:
• Set up an account
• Subscribe to a reference service
• Map your domain to the reference service
Creating a Knowledge Base

• Creating a knowledge base is an iterative


process:
1. Knowledge discovery
2. Domain management
Demonstration: Creating a Knowledge Base

In this demonstration, you will see how to:


• Create a knowledge base
• Perform knowledge discovery
• Perform domain management
Lesson 2: Using Data Quality Services to Cleanse
Data

Creating a Data Cleansing Project


Viewing Cleansed Data
Demonstration: Cleansing Data
• Using the Data Cleansing Data Flow
Transformation
Creating a Data Cleansing Project

1. Select a knowledge base


2. Map columns to domains
3. Review suggestions and corrections
4. Export results
Viewing Cleansed Data

• Output: the values for all fields after data


cleansing
• Source: the original value for fields that were
mapped to domains and cleansed
• Reason: the reason the output value was
selected by the cleansing operation
• Confidence: an indication of the confidence Data
Quality Services estimates for corrected values
• Status: the status of the output column (correct
or corrected)
Demonstration: Cleansing Data

In this demonstration, you will see how to:


• Create a data cleansing project
• View cleansed data
Using the Data Cleansing Data Flow
Transformation

• Input data to be cleansed


• Select knowledge base and map columns to
domains
• Output cleansed columns
Lab A: Cleansing Data

Exercise 1: Creating a DQS Knowledge Base


Exercise 2: Using a DQS Project to Cleanse Data
• Exercise 3: Using DQS in an SSIS Package

Logon Information
Virtual machine: 20767C-MIA-SQL
User name: ADVENTUREWORKS\Student
Password: Pa55w.rd

Estimated Time: 30 minutes.


Lab Scenario

You have created an ETL solution for the


Adventure Works data warehouse, and invited
some data stewards to validate the process before
putting it into production.
The data stewards have noticed some data quality
issues in the staged customer data, and have
asked you to provide a way for them to cleanse
data, so that the data warehouse is based on
consistent and reliable data. The data stewards
have given you an Excel workbook containing
some examples of the issues found in the data.
Lab Review

Having completed this lab, you will now be able to:


• Create a DQS knowledge base
• Use DQS to cleanse data
• Incorporate data cleansing into an SSIS data flow
Lesson 3: Using Data Quality Services to Match
Data

Creating a Matching Policy


Creating a Data Matching Project
Viewing Data Matching Results
• Demonstration: Matching Data
Creating a Matching Policy

• Define matching rules for business entities


• Rules match entities based on domains:
 Similarity:similar or exact match
 Weight: percentage to apply if match succeeds
 Prerequisite: mandatory domain match for rule to
succeed
• If the combined weight of all matches meets or
exceeds the rule’s minimum matching score, the
entities are duplicates
Creating a Data Matching Project

1. Select a knowledge base


2. Map columns to domains
3. Review match clusters
4. Export matches and survivors
• Select survivorship rule:
 Pivot record
 Most complete and longest record
 Most complete record
 Longest record
Viewing Data Matching Results

• Cluster ID: identifier for a cluster of matched


records
• Record ID: identifier for a matched record
• Matching Rule: the rule that produced the match
• Score: combined weighting of match criteria
• Pivot Mark: a matched record arbitrarily chosen
by Data Quality Services as the pivot record for a
cluster
Demonstration: Matching Data

In this demonstration, you will see how to:


• Create a matching policy
• Create a data matching project
• View data matching results
Lab B: Deduplicating Data

Exercise 1: Creating a Matching Policy


• Exercise 2: Using a DQS Project to Match Data

Logon Information
Virtual machine: 20767C-MIA-SQL
User name: ADVENTUREWORKS\Student
Password: Pa55w.rd

Estimated Time: 30 minutes


Lab Scenario

You have created a DQS knowledge base and


used it to cleanse customer data. However, data
stewards are concerned that the staged customer
data might include duplicate entries. For records
to be considered a match, the following criteria
must be true:
 The Country/Region column must be an exact
match.
 A total matching score of 80 or higher must be
achieved, based on the following weightings:
o An exact match of the Gender column has a
weighting of 10.
Lab Scenario (Continued)

o An exact match of the City column has a


weighting of 20.
o An exact match of the EmailAddress column
has a weighting of 30.
o A similar FirstName column value has a
weighting of 10.
o A similar LastName column value has a
weighting of 10.
o A similar AddressLine1 column value has a
weighting of 20.
Lab Review

Having completed this lab, you will now be able to:


• Add a matching policy to a DQS knowledge base
• Use DQS to match data
Module Review and Takeaways

• Review Question(s)

You might also like