
ETL (Extract, Transform, Load)
LECTURE 13
MS. USHNA TASLEEM
ETL (Extract, Transform, Load):
ETL (extract, transform, load) is a data integration process that combines, cleans, and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake, or other target system.
ETL pipelines are often used by organizations to (a minimal sketch follows this list):
Extract data from legacy systems
Cleanse the data to improve data quality and establish consistency
Load data into a target database
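To make the pipeline concrete, here is a minimal sketch of those three steps in Python, using pandas and SQLite; the source file sales.csv, the columns order_id and amount, and the target table sales_fact are hypothetical placeholders, not anything from the lecture.

```python
import sqlite3

import pandas as pd

# Extract: pull raw records from a legacy source (a CSV export here).
raw = pd.read_csv("sales.csv")  # hypothetical source file

# Transform: cleanse the data to improve quality and establish consistency.
clean = (
    raw.drop_duplicates()                              # remove redundant records
       .dropna(subset=["order_id"])                    # drop fragmented rows missing the key
       .assign(amount=lambda d: d["amount"].fillna(0)) # clean null amounts
)

# Load: write the consistent data set into the target database.
with sqlite3.connect("warehouse.db") as con:
    clean.to_sql("sales_fact", con, if_exists="append", index=False)
```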
Functions of ETL
Reporting & Dashboards – Share key performance indicators (KPIs) with decision makers.
Forecasting – Project future sales, demand, and maintenance requirements.
Visualization – Provide a visual way to interact with data and uncover new insights.
Architecture
The ETL function lies at the core of Business Intelligence systems. With ETL, enterprises can obtain historical, current, and predictive views of real business data. Let’s look at some ETL features that are necessary for business intelligence.
How ETL Works
ETL systems are designed to accomplish three complex database functions: extract, transform, and load.
1. Extraction
The extraction phase maps the data from different sources into a unified format before processing.

ETL systems ensure the following while extracting data (see the sketch after this list):

Removing redundant (duplicate) or fragmented data
Removing spam or unwanted data
Reconciling records with source data
Checking data types and key attributes
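A short sketch of these extraction-time checks, again in pandas; the source files, the customer_id key, and the order_date column are assumed for illustration only.

```python
import pandas as pd

# Map each source into a unified format (shared column names and types).
crm = pd.read_csv("crm_export.csv").rename(columns={"cust": "customer_id"})
legacy = pd.read_csv("legacy_dump.csv")
extracted = pd.concat([crm, legacy], ignore_index=True)

# Remove redundant (duplicate) or fragmented data.
extracted = extracted.drop_duplicates().dropna(subset=["customer_id"])

# Check data types and key attributes; unparseable dates become NaT for review.
extracted["order_date"] = pd.to_datetime(extracted["order_date"], errors="coerce")

# Reconcile the extracted records against the source row counts.
print(f"{len(crm) + len(legacy)} source rows -> {len(extracted)} extracted rows")
```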
2. Transformation
This stage involves applying algorithms and modifying data according to business-specific rules. The common operations performed in ETL’s transformation stage are computation, concatenation, filtering, and string operations such as currency, time, and date format conversions. It also validates the following (sketched after this list):

Data cleaning, such as replacing null values with ‘0’
Threshold validation, for example an age cannot be more than two digits
Data standardization according to the rules and lookup table
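A sketch of such a transformation step, under assumed column names (amount, age, country, customer_id) and a made-up lookup table; the specific rules are placeholders for whatever the business defines.

```python
import pandas as pd

# Hypothetical lookup table used for data standardization.
COUNTRY_LOOKUP = {"PK": "Pakistan", "US": "United States"}

def transform(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Data cleaning: replace null values with 0.
    out["amount"] = out["amount"].fillna(0)

    # Computation / string operations: derive a normalized label column.
    out["label"] = out["customer_id"].astype(str).str.upper()

    # Threshold validation: an age value cannot be more than two digits.
    out = out[out["age"] <= 99]

    # Standardization according to the rules and the lookup table.
    out["country"] = out["country"].map(COUNTRY_LOOKUP).fillna("Unknown")
    return out
```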
3. Loading
Loading is the process of migrating structured data into the warehouse. Usually, large volumes of data need to be loaded in a short time, so ETL applications play a crucial role in optimizing the load process, with efficient recovery mechanisms for instances of loading failure.

A typical ETL process involves three types of loading functions, sketched below:

Initial load: populates the records in the data warehouse for the first time.
Incremental load: applies changes (updates) periodically, as per the requirements.
Full refresh: reloads the warehouse with fresh records by erasing the old contents.
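The three loading modes map naturally onto the if_exists argument of pandas’ to_sql; the sketch below assumes a local SQLite file standing in for the warehouse, and the table name sales_fact is hypothetical.

```python
import sqlite3

import pandas as pd

def load(df: pd.DataFrame, mode: str, table: str = "sales_fact") -> None:
    """Migrate structured data into the warehouse (a local SQLite file here)."""
    with sqlite3.connect("warehouse.db") as con:
        if mode == "initial":
            # Initial load: populate the warehouse records for the first time.
            df.to_sql(table, con, if_exists="fail", index=False)
        elif mode == "incremental":
            # Incremental load: periodically apply only the new/changed rows.
            df.to_sql(table, con, if_exists="append", index=False)
        elif mode == "full_refresh":
            # Full refresh: erase the old contents and reload fresh records.
            df.to_sql(table, con, if_exists="replace", index=False)
        else:
            raise ValueError(f"unknown load mode: {mode}")
```

For example, load(daily_changes, mode="incremental") would append one day’s updates, while mode="full_refresh" rebuilds the table from scratch.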
Why is ETL important?
Organizations today have both structured and unstructured data from various sources. By applying the process of extract, transform, and load (ETL), individual raw datasets can be prepared in a format and structure that is more consumable for analytics purposes, resulting in more meaningful insights. For example, online retailers can analyze data from points of sale to forecast demand and manage inventory. Marketing teams can integrate CRM data with customer feedback on social media to study consumer behavior.
How does ETL benefit business intelligence?
Extract, transform, and load (ETL) improves business intelligence and analytics by making the process more reliable, accurate, detailed, and efficient.

Historical context
ETL gives deep historical context to the organization’s data. An
enterprise can combine legacy data with data from new platforms and
applications. You can view older datasets alongside more recent
information, which gives you a long-term view of data.
Consolidated data view
ETL provides a consolidated view of data for in-depth analysis and reporting.
Managing multiple datasets demands time and coordination and can result
in inefficiencies and delays. ETL combines databases and various forms of
data into a single, unified view. The data integration process improves the
data quality and saves the time required to move, categorize, or standardize
data. This makes it easier to analyze, visualize, and make sense of large
datasets.
Accurate data analysis
ETL gives more accurate data analysis to meet compliance and regulatory
standards. You can integrate ETL tools with data quality tools to profile,
audit, and clean data, ensuring that the data is trustworthy.
Task automation
ETL automates repeatable data processing tasks for efficient analysis. ETL
tools automate the data migration process, and you can set them up to
integrate data changes periodically or even at runtime. As a result, data
engineers can spend more time innovating and less time managing tedious
tasks like moving and formatting data.
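One minimal way to set up such periodic runs, sketched with only the Python standard library (production pipelines would more typically use cron or a workflow scheduler such as Airflow); run_pipeline is a hypothetical stand-in for the actual ETL job.

```python
import time

def run_pipeline() -> None:
    # Hypothetical stand-in for the full extract -> transform -> load job.
    print("ETL run complete")

# Integrate data changes periodically: here, once every 24 hours.
while True:
    run_pipeline()
    time.sleep(24 * 60 * 60)
```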
What is ELT?
Extract, load, and transform (ELT) is an extension of extract, transform, and load (ETL) that reverses the order of operations: data is loaded directly into the target system before it is processed. The intermediate staging area is not required because the target data warehouse has data mapping capabilities within it. ELT has become more popular with the adoption of cloud infrastructure, which gives target databases the processing power they need for transformations.
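A minimal ELT sketch, with SQLite standing in for the cloud warehouse and a hypothetical events_raw.csv source; note that the transformation is expressed in SQL and executed by the target database itself rather than in a staging area.

```python
import sqlite3

import pandas as pd

with sqlite3.connect("warehouse.db") as con:
    # Extract and Load: land the raw data in the target system, untransformed.
    pd.read_csv("events_raw.csv").to_sql(
        "events_raw", con, if_exists="replace", index=False
    )

    # Transform: run inside the target database, using its own processing power.
    con.execute("""
        CREATE TABLE IF NOT EXISTS events AS
        SELECT DISTINCT
               user_id,
               UPPER(country)   AS country,
               DATE(event_time) AS event_date
        FROM events_raw
        WHERE user_id IS NOT NULL
    """)
```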
ETL compared to ELT
The primary difference between ETL (Extract, Transform, Load) and ELT
(Extract, Load, Transform) is the order in which data is processed. In ELT,
raw data is loaded directly into the target data store and transformed as
needed, making it suitable for handling high-volume, unstructured
datasets and big data management without extensive upfront planning.
In contrast, ETL requires detailed planning, where specific data points
and integration keys are identified before loading data into a staging
area for transformation based on business rules. This process often
involves tracking metadata and depends on data requirements for
analysis.
While ELT has gained popularity due to cloud database adoption, it is
still evolving, and best practices are being established.
The benefits and challenges of ETL
ETL solutions improve quality by performing data cleansing before loading the data into a different repository. Because it is a time-consuming batch operation, ETL is recommended more often for creating smaller target data repositories that require less frequent updating. Other data integration methods, including ELT (extract, load, transform), change data capture (CDC), and data virtualization, are used to integrate increasingly large volumes of frequently changing data or real-time data streams.
