Lecture-9 Extraction Transformation Loading
Lecture-9 Extraction Transformation Loading
Lecture # 9
Instructor: Mr. Sharjeel Ahmed
ETL PROCESS
Introduction to ETL
The User want to compare and analyze sales by store, by
product, and by month?
• The sale figures are available in several sales applications in the
company.
• Product information is available in a master file.
• Each sales transaction refers to a specific store where it occurred.
• ETL –
• Extraction, Transformation & Loading
Process of Data Extraction
Source Identification → Identify source applications and source structures.
Method of Extraction → Define the extraction process for each data source
Manual/ Automatic
Immediate or Deferred
Extraction frequency → Establish how frequently the data extraction must be
done i.e. daily, weekly, quarterly.
Time Window → Time Window for extraction process.
Job sequencing → Whether the beginning of one job has to wait until the
previous job has finished successfully.
Exception handling → Determine how to handle input records that cannot be
extracted.
Source
Identification
Identify source
applications and
source structures.
Challenges of Source Systems
ETL Functions are challenging primarily because of the nature of the
source systems.
• Sour systems are very diverse and disparate.
• Schedule jobs for the time window during each day when you can
extract data without impacting the usage of the operational systems.
• Re-sequence data
Incremental Load
Applying ongoing changes as necessary in a periodic manner
Full Refresh
Completely erasing the contents of one or more tables and
reloading with fresh data
Considerations in Loading
• Initial loading can populate millions of rows in the data warehouse.
• Creating and managing load images for such large numbers are not easy
tasks.
• Even more difficult is the task of testing and applying the load images to
actually populate the physical files in the data warehouse.
• May take up to few weeks to complete the initial physical loading.
• For extracting and applying the ongoing incremental changes.
• Finding a time window to apply the changes to the data warehouse can be
tricky if your data warehouse cannot suffer long downtimes.
ETL Summary
• Determine the best method to capture the ongoing changes from each
source system.