
Data Warehousing

Data Model Overview

Modeling for the Enterprise while Serving the Individual

Debbie Smith
Data Warehouse Consultant
Teradata Global Sales Support
Table of Contents

Executive Summary
Introduction
Revisit Dr. E. F. Codd’s 12 Rules
So – What is an EDW?
Data Infrastructure
Data Modeling Theory
From Logical to Physical Modeling
Physical Models
Impact of Information Delivery Tools
Impact of ETL Tools
Surrogate Keys
Changing Dimensions
The Teradata Direction
Impact of the Data Warehousing Evolution
Summary
Appendix A – Codd’s 12 Rules
Appendix B – A Comparison
Appendix C – Glossary
Appendix D – Endnotes
Appendix E – References

Executive Summary

The data model choice for the data warehouse is often a matter of great controversy. The desire is to offer a self-service type of environment that allows business users easy access with acceptable response times. Response time also includes the time required between the conception of a new application and the delivery of that application. What data model do you employ to provide ease of use for the business user while still being able to address current and future needs of the data warehouse in terms of updating, expansion, availability, and management? This paper will provide an overview of popular data modeling and the Teradata Corporation position regarding data modeling.

Introduction

Data model choices are often a matter of great controversy when discussing building a data warehouse. When building the data warehouse, how do you build it? Do you attempt to build it for the business entity – current and future – or do you build it in a manner that satisfies current business users’ needs? This paper will discuss the idea of building the centralized enterprise data warehouse to sustain longevity, using a view layer to form fit individual business user needs and requirements, combining the benefits of a normalized model with the benefits of a dimensional model.

Revisit Dr. E. F. Codd’s 12 Rules

In 1985, Dr. Codd published a list of 12 principles that have become the design guidelines of relational database systems (see Appendix A).1 Dr. Codd is credited as the creator of the relational model, and within these guidelines, he outlines the structuring of information into tables, that null values are properly identified, and that a high level of insert, update, and delete is maintained. Note guidelines 6, 8, and 9:

> Number 6; View Updating Rule – where logical views support full-range data manipulation
> Number 8; Physical Data Independence – where the user is isolated from how the data is physically stored
> Number 9; Logical Data Independence – where the user is not impacted should the physical data structure change

These three guidelines specifically provide a methodology to isolate users from the impact of IT activities, and they lay the foundation for building a logical, user-friendly data structure that is independent of the physical data structure. This foundation provides for form fitting individual business users’ needs while maintaining a physical model that facilitates the needs of ETL, update frequencies, and data management requirements, giving the enterprise data warehouse user friendliness.

So – What is an EDW?

The foundation of this paper is built on the concept of an enterprise data warehouse (EDW). Many people use the term data warehouse to represent similar concepts; however, there can be some variances in what is meant. It is generally accepted that the data warehouse is the environment that provides for decision support within the organization. By appending the word enterprise, that environment is now thought of or expected to become reflective of the entire enterprise. Before continuing, let’s discuss the environment that Teradata refers to when we say enterprise data warehouse. We define an EDW as an area where the data of the business (the enterprise) is centrally integrated, centrally stored, and accessed through common business-defined methods. We believe, and our customers have shown us, that the value of centralizing the data is the synergy gained by storing the data once, managing it once, and accessing it for and in many varied ways, times, methods, and reasons. (By gathering and integrating the data of the business, business users are able to have a 360-degree view of the data.) Of course, the accumulation of business data is an ongoing process and ever evolving. One of the goals is to develop a long-term strategy that provides a methodology for adding and refining business data. The higher the degree of customer success, the greater the degree to which the infrastructure is able to answer any question, from any user (internal or external), on any data, at any time, including the yet unknown queries or unknown future application needs, without the need or intervention of additional resources. This synergy allows the business to use the information of the business to quickly respond to and create changes in the market place.
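As a minimal sketch of what these guidelines enable (the table, column, and view names here are hypothetical), a pre-joined presentation can be offered to business users through a view while the underlying tables remain free to change:

    -- Users query the view; the base tables stay behind it.
    CREATE VIEW v_item_sales AS
    SELECT i.item_id
         , i.item_desc
         , s.sale_date
         , s.sales_amt
    FROM item i
    JOIN sales s
      ON s.item_id = i.item_id;

Because requests are written against v_item_sales rather than the base tables, the physical structures can later be reorganized without breaking those requests – the practical effect of guidelines 8 and 9.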


Data Infrastructure

The data model is the core of the data warehouse. The decisions made when defining the data model determine the data infrastructure. This data infrastructure can impact performance, time to market for new applications, the ability to respond to changes in the market place, business users’ ability to retrieve information, data latency, and the optimization of the data warehouse over the long term. The focal point of the data warehouse data modeling discussion typically covers normalization theory and dimensional theory.

Teradata® Database, as an RDBMS, is and always has been agnostic about which well-defined data model is chosen. There are many customers who execute a normalized model, others using a snowflake model, others using a star schema, and many others using some variation or derivation of each. With that said, it is important to note that Teradata as a company, with its vast experience in data warehousing, does have preferences to ensure the data model employed provides the highest flexibility and responsiveness to the business for not only current needs but also future needs.

As data warehouses evolve and take on greater responsibility and active roles in how an enterprise conducts business, data model requirements and expectations are changing at a faster rate to support this increased responsibility. The data warehouse is expected to reflect changes within the enterprise with increasing speed. It is also driving changes in how the enterprise conducts business by providing insight into current business practices. These increasing requirements of the data warehouse impact data modeling choices. The goal of the EDW is to address the needs of the enterprise. This requires a data model that provides an enterprise solution rather than a localized solution that only addresses specific application needs for individuals within the organization. The charge of the EDW is to enable the business at an enterprise level. To do that, hard decisions concerning data modeling must be addressed. Questions such as: Should the data model reflect the needs of the enterprise or the needs of the individual? What is the impact on the data model as the needs of the individual and/or the needs of the enterprise change? Do you have to sacrifice one for the other? What if you could minimize those sacrifices? Experience has shown us that with Teradata Database, it is possible to take full advantage of Codd’s guidelines of physical data independence, logical data independence, and view updating to create a data model that provides the flexibility to satisfy both current and future needs of the enterprise – and then, through the use of views, to create a view model that addresses the needs of the individual. This provides a methodology to employ the normalized model and augment it with views that use dimensional modeling techniques, taking the best of both worlds.

With more than 35 years of experience in database technology and data warehouse design, Bill Inmon is recognized as an authority in the data warehousing industry. His lectures and papers promote a normalized model as the model of choice for the data warehouse, reserving the use of star schema models for data marts if and when necessary. He states, “It is not possible to build an effective DSS environment without an enterprise data warehouse. The design of the enterprise data warehouse is typically normalized. The classical entity relationship structuring of data accompanied by the ensuing normalization of the data structures fits very conveniently with the requirements of the usage of the enterprise data warehouse.” 2

Data Modeling Theory

Entity-Relationship Modeling

Before designing and establishing the physical data model, there is a logical data modeling process. This process includes extended discussions with the business community. During these discussions, the business requirements are identified, the data entities and elements required to meet those business requirements are established, and the relationships between the data entities are captured. These insights are later diagrammed into a representation of the business, referred to as an Entity Relationship (ER) model. An ER model or diagram represents the data entities of the business and the relationships between those entities.


Figure 1. Example ER Diagram: the entity Customer (with attributes Customer–Name and Customer–Addr) is linked by the relationship Places to the entity Travel–Request (e.g., Boston to New York). Business rules: a Customer places zero or more Travel-Requests; a Travel-Request is placed by only one Customer.
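To make the diagram concrete, the two entities and the one-to-many relationship might be declared as follows. This is a sketch only; the table names, column types, and sizes are illustrative assumptions, not part of the original figure.

    CREATE TABLE customer (
        customer_id   INTEGER NOT NULL PRIMARY KEY,
        customer_name VARCHAR(100),   -- the Customer–Name attribute
        customer_addr VARCHAR(200)    -- the Customer–Addr attribute
    );

    CREATE TABLE travel_request (
        request_id  INTEGER NOT NULL PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        origin      VARCHAR(30),      -- e.g., 'Boston'
        destination VARCHAR(30)       -- e.g., 'New York'
    );

The foreign key on travel_request carries the business rules: a Travel-Request is placed by exactly one Customer, while a Customer may place zero or more Travel-Requests.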

At this stage, the specific functions or queries the model will be used to support are not included. This is a logical representation of the enterprise and is later used to develop a physical data model or schema. There are a number of tools available to map the ER diagram, ERwin®, now known as AllFusion™ ERwin Data Modeler, being the most recognized. ER modeling is guided by the rules of normalization. These are strict rules meant to ensure the essence of business relationships is captured. Examples of the components (entities and relationships) of an ER diagram are shown in Figure 1.

Dimensional Modeling

Dimensional modeling is another logical design method used to organize data for functional groups of users or business functions. It identifies the dimensions and levels within the business, separating out the facts or measures of the business. The dimensional model enforces additional rules that eliminate many-to-many relationships between entities, allowing only many-to-one relationships. It fuses multiple entities together into a new entity. This new entity does not directly reflect the entities and relationships that occur in the business, but is established for the convenience of storing a data point or metric that is important to the targeted group of business users. The goal of the dimensional model is to provide a presentation layer that facilitates easy navigation of the data for business users and quick access to reports.

The dimensional model is often thought of or represented as a cube with dimensions such as time, product, and geography. The business metric is at the intersection of these dimensions. Visualizing the dimensional model as a cube, sometimes referred to as a star schema (see Figure 2), makes it easy to imagine being able to slice and dice that segment of data. Creating these cubes for reporting requires understanding what questions will be asked before designing. Each cube is designed to facilitate quick and easy access to a specific business application.


Figure 2. A Cube (Star Schema): a central Sales fact at the intersection of four dimension hierarchies – Advertising (Ad Year > Ad Period > Ad Week), Fiscal Calendar (Year > Quarter > Period > Week > Day), Product (Dept > Minor Dept > Category > Sub Category > SKU > UPC), and Geography (Region > District > Store).
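A sketch of how a slice of the Figure 2 star might be declared physically (hypothetical names; the real dimensions would carry more hierarchy levels):

    CREATE TABLE product_dim (
        product_key INTEGER NOT NULL PRIMARY KEY,
        sku         INTEGER,
        category    VARCHAR(30),
        dept        VARCHAR(30)        -- hierarchy flattened into one row
    );

    CREATE TABLE store_dim (
        store_key INTEGER NOT NULL PRIMARY KEY,
        store_no  INTEGER,
        district  VARCHAR(30),
        region    VARCHAR(30)
    );

    CREATE TABLE sales_fact (
        date_key    INTEGER NOT NULL,  -- would reference a calendar dimension
        product_key INTEGER NOT NULL REFERENCES product_dim (product_key),
        store_key   INTEGER NOT NULL REFERENCES store_dim (store_key),
        sales_amt   DECIMAL(12,2),
        PRIMARY KEY (date_key, product_key, store_key)
    );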

The dimensional model flattens the dimensions via denormalization. Business rules are used to validate and enforce additional restrictions (as in Figure 2). For example, many companies have a fiscal year that differs from the calendar year; they may even have another time dimension, such as advertising, which differs from both calendar and fiscal. If/when the advertising week does not fall within the fiscal week, business rules will be used to determine and assign the ad week to a fiscal week, providing a methodology for drilling through to lower levels. In effect, the functional rules and processes are accommodated within the model. This, in turn, means that much more needs to be understood about the queries and outputs that the model is expected to support. A dimensional model can be created from an ER model; however, an ER model could not be created from a dimensional model. Once entities are fused together, separating those fused entities is difficult, if not impossible.

With a dimensional model, the central fact table is forced to a single grain, causing the initial fact table design to become brittle or inflexible. You can’t incorporate new data sources without breaking the original star schema design or creating separate fact tables or data marts. To get an enterprise view, you would then have to drill across these different stars or data marts.


From Logical to Physical Modeling

Why would you create a physical model that is different from the logical model? The physical model often differs from the logical model to yield the best overall performance for specific database technologies, and not all database technologies are equal. Technologies differ so much that in an article written for Intelligent Enterprise magazine, Neil Raden states, “As you know, the primary schema for a data warehouse is either a star schema or a ‘normalized’ schema. The latter is a term so loosely defined that it’s hard to describe, but a normalized schema typically resembles a third (or higher) normal form (3NF) schema that’s not dimensional. These 3NF designs don’t support query and analysis. Their sole purpose is to act as a staging area, an upstream data repository for a series of star schemas, online analytic processing (OLAP) cubes, and other structures that are directly queried by analysts. The only routine exception to this is Teradata implementations: Because of the unique characteristics of the massively parallel architecture and database optimizer, Teradata can process analytical SQL against a 3NF schema with acceptable performance.” 4

Figure 3. The Modeling Spectrum: from the normalized model (snowflake) on one end to the denormalized model (star, flattened) on the other – less data redundancy toward the normalized end, simpler data access toward the denormalized end.

This implies the physical model is often changed from the logical model not necessarily due to business requirements but to facilitate the technology being used and ensure query speed. This transition from the logical model to the physical model is where the greatest variation, deviation, and difference of opinion takes shape, which, of course, forms the basis for the confusion surrounding data modeling. On one end of the spectrum is the flattened denormalized model, while on the other end is the normalized model. Of course, the reality is that most customers develop a data model that falls somewhere between the opposing ends, or some combination of both. This is often the result of the data warehouse evolving to meet business demands.

In his book Data Modeler’s Workbench,5 Steve Hoberman discusses what he refers to as the normalization hike and the Denormalization Survival Guide. Hoberman’s principles state that the logical model is completely normalized, up to fifth normal form. His analogy states that once you’ve accomplished this, you can see the business functions as if looking down from a skyscraper. The Denormalization Survival Guide then provides guidelines for transforming the normalized logical model into a physical model. The basics of these principles are to minimize denormalization so as not to compromise the business’ current and future needs.


Physical Models

Normalized Physical Model

As you move along the modeling spectrum (see Figure 3), the further to the left your data model is, the greater the degree of normalization. A normalized model separates data into tables based on very strict rules that relate an attribute to the primary key. One of the fundamental rules of normalization is the elimination of data redundancy within the model, keeping each attribute within each table functionally dependent upon the primary key. By following this rule, eliminating data redundancy within the model, the number of tables, along with the complexity of the data model, increases. As new data subjects for the data warehouse are identified, the model is extended following the same normalization rules to include the new data subjects.

[While data warehouse modelers are often embroiled in controversy over how to model the data warehouse, the world of transactional databases is in agreement that a normalized model is the optimal choice. As data warehouses evolve towards transactional decision making using very current data, the need for a more normalized physical design becomes evident.] A normalized data model provides a methodology for capturing and storing the lowest level of data, eliminates multiple updates to various tables with the same transaction, and is the model of choice for the OLTP types of transactions that are targeted and specific.

Customer experience has shown that the normalized model can answer new business questions or previously unknown questions without changes to the structure of the database, because the relationships and entities have been represented physically and provide the greatest flexibility for the business. It eliminates data duplication and, therefore, the complexity of maintaining duplicated data. Even though there are typically more tables in a normalized model, the greater the degree of normalization, the greater the degree of storage space conservation, because data is not duplicated. However, denormalized structures, such as summary tables, will and often do exist in a data warehouse that has been developed under the constructs of normalization. When the business value warrants the cost to build and manage summary table structures, they are a viable augmentation of the normalized data model.
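As an example of such an augmentation, a monthly summary might be maintained alongside a normalized detail table (a sketch with hypothetical names):

    CREATE TABLE sales_monthly_sum (
        item_id    INTEGER NOT NULL,
        sale_year  INTEGER NOT NULL,
        sale_month INTEGER NOT NULL,
        sales_amt  DECIMAL(14,2),
        PRIMARY KEY (item_id, sale_year, sale_month)
    );

    -- Refreshed from the detail table on whatever schedule the
    -- business value justifies.
    INSERT INTO sales_monthly_sum
    SELECT item_id
         , EXTRACT(YEAR  FROM sale_date)
         , EXTRACT(MONTH FROM sale_date)
         , SUM(sales_amt)
    FROM sales
    GROUP BY item_id
           , EXTRACT(YEAR  FROM sale_date)
           , EXTRACT(MONTH FROM sale_date);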
Denormalized Model

Moving right along the modeling spectrum (see Figure 3), the degree of denormalization increases. A flattened model essentially resembles an Excel spreadsheet. Denormalized data models flourished out of a need to provide business users with quick and easy access to data without the requirement of understanding the underlying data model. Business users, whose job does not include knowing SQL or the logic of data models and, therefore, how to join tables together, needed easy access to information. However, those same business users did understand spreadsheet formats and, in fact, often requested information in a report format. Therefore, when data was presented in a two-dimensional form (similar to a spreadsheet), as opposed to a series of connected tables, business users were protected from data model complexity. Because the data model was generated to address specific business constituencies, new user constituencies (often called power users) were created within the organization to code and generate reports for business users. These power users became so busy that requests were often queued for weeks at a time. Business users wanted direct access to information, and they wanted it instantaneously. Denormalized models provided for simple, easy navigation of the data. The data are typically aggregated along one of the lines of dimension. The types of queries requested are well understood, and the model is established to answer those questions. These pre-defined models ensure the known query will return an answer set quickly. When new queries are identified, the model is extended to include a duplication of existing data in a form that allows the business user to navigate and quickly retrieve the answer set for the new set of queries.


The star schema model is typically only considered for use in data marts or the data warehouse and is built to address a specific business need, report, or application. Each star schema is built as an individual data mart. When the business needs a new report, another star schema, or data mart, is built. If the needed dimensions are the same as what has already been built, they are duplicated to this new data mart. Each of these data marts caters to the needs of the individual. There may be a temptation to think of a group of these data marts as constituting a data warehouse. However, they are distinct structures that have duplicated data in order to store it either in a different format or on a different platform, either a different database or just different hardware. The star schema model doesn’t have the goal of storing data once for multiple uses. Its goal is to facilitate end-user usage. When the business community has a new request, a new star schema is typically built to facilitate the request.

For example, when a business user wants to know the monthly sales of products, the report produced would have headings such as Item, January Sales, February Sales, March Sales, etc. The normalized Sales table would consist of columns of Item, Date, and Sales. This table would provide the greatest flexibility, allowing for the next questions. These questions might include what are sales per week for specific items, what were fiscal month sales, what items sell on Mondays, or what are weekend sales. However, because it is often difficult for users or even database software to make the conversion from the table structure to the report layout, in the dimensional model, DBAs simply store the table in the denormalized report format (see Figure 4).

Figure 4. Example table in normalized and denormalized format

Normalized:

Item     Date       Sales
100012   01102001   10.00
300012   02122001    3.00
200012   01152001    2.50
100012   03042001   15.00

Denormalized:

Item     Jan Sales   Feb Sales   Mar Sales   Apr Sales
100012     345.00      450.00      326.50      245.90
200012     456.60      376.50      210.00      390.00
300012     254.00      112.00      310.00      295.00
400012     510.00      610.00      590.00      545.00
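Storing the report layout is a choice rather than a necessity: the same layout can be produced from the normalized table at query time. A sketch, assuming the Date column is held as a true DATE type (it is renamed sale_date here, since DATE is a reserved word):

    SELECT item
         , SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 1 THEN sales END) AS jan_sales
         , SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 2 THEN sales END) AS feb_sales
         , SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 3 THEN sales END) AS mar_sales
         , SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 4 THEN sales END) AS apr_sales
    FROM sales
    GROUP BY item;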

In an effort to maintain conformity, the star schema is built with conformed dimensions. Ralph Kimball explains that data are placed in a staging area for transformation. These data, using what he refers to as the data warehouse bus architecture, are then propagated out to all the different data marts that require them, forcing all dimensions to conform to what is established in the staging area. This bus architecture is the roadwork that connects all the different data marts and servers being used. Kimball states, “It is acceptable to create a normalized database to support the staging processes; however, this is not the end goal. The normalized structures must be off-limits to user queries because they defeat understandability and performance. As soon as a database supports query and presentation services, it must be considered part of the data warehouse presentation area. By default, normalized databases are excluded from the presentation area, which should be strictly dimensionally structured.” 6

Contrasting Ralph Kimball’s belief that data warehouses are best suited for a star schema, Bill Inmon firmly believes the EDW has at its heart a normalized model, reserving the use of star schema models for data marts if and when necessary. Inmon states, “In short, simply doing dimensional modeling as a basis for data warehouse design leads down a dark path when multiple star joins are considered. It is never apparent that there is a problem with star joins when you are looking at just one star join. But when you look at multiple star joins, the limitations of dimensional modeling become apparent. Does this mean that dimensional modeling is invalid as a database design technique for data warehousing? The answer is not at all. Dimensional modeling and star joins fit the bill very nicely for data marts. In fact, if I had to design a data mart tomorrow, I would not consider using any other approach. But, when it comes to the foundation data, it’s another story. The foundation data – the data warehouse – requires a different treatment than dimensional modeling altogether. The data warehouse, which is the proper foundation for all DSS activity, including star joins, requires very granular, very


flexible data. The ideal structure for the data warehouse is normalized data. The normalized data can be bent and shaped any old way.” 7

“The fact table is connected to the dimension tables by means of foreign key relationships. The keys to the dimension tables sit in the fact table. There may be many relationships that find their way into the fact table. In addition, the fact table carries with it other non-key data if needed. In other words, the dimension table may share with the fact table data other than just the key. When the fact table and the dimension tables are connected, they form a star, hence the name ‘star join’. The net effect of the fact table is a structure which is highly denormalized and which is very efficient to access. Once the fact table is created, it can be efficiently analysed. But there is a tradeoff. The extreme efficiency of the star join also makes it inflexible. If there is a desire to see the data in the star join in a manner other than that for which the structure is built, then the star join is very inflexible.” 8

Impact of Information Delivery Tools

Denormalized models, specifically star schemas, gained general acceptance on the basis of the ease provided for business users to directly access data. After gathering the detailed business requirements, the developers often determined that to mitigate risk and provide the business users what they requested, a denormalized model would be the easiest to implement. Since the data warehouse was updated or refreshed on a weekly or even monthly basis, the update schedule mitigated issues surrounding the complexity of updates to maintain simple data models. Business users were expected to know, understand, and code simple SQL for retrieving information, allowing IT to turn the data warehouse over to the business users and only maintain responsibility for keeping it updated. When the business began coming up with additional questions that the data model couldn’t address, and joins between star schemas became complex, information delivery tools were developed to help business users retrieve the information they needed through an easy interface.

While many of these information delivery tools have grown up expecting this star data model, many are growing in sophistication and technologically developing to be able to recognize and use more normalized data models, making it easier for end users to navigate a more complex schema.

Impact of Extraction, Transformation, and Loading (ETL) Tools

While information delivery tools were developed to assist the business user in retrieving information from the data warehouse, ETL tools were developed to assist in extracting data from transactional systems. Since the ETL tools extract data from the transactional systems, and those are in normalized format, ETL tools work better with the normalized data model.

If the data warehouse is built with a denormalized data model, then most implementations require additional steps to land the data in a separate area for ETL. While the ETL tools work well with a normalized model, moving those data into a denormalized model requires manipulation or transformation. The suggested way of handling this manipulation or transformation is the creation of a data staging area. This staging area is used to land the extracted data, then manipulate or transform them to the format needed before loading. This staging area also provides the platform from where data, in particular dimensions, can be duplicated out to the different star schema data marts.

When loading to a normalized model, the need to transform the data for modeling purposes is significantly less and can actually happen during the load process, eliminating the need for a staging area. In a normalized model, those using a staging area are typically using it to correct data quality issues that exist in the transactional systems – for example, ensuring character fields follow the same set of rules. If an address contains the word ‘street’, then all formats of ‘street’ are transformed to be the same; for example, ST becomes Street and St. becomes Street.
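A sketch of that kind of staging-area correction (the staging table and column are hypothetical, and this assumes the street suffix has already been parsed into its own field):

    -- Standardize street-suffix variants before loading the target table.
    UPDATE stg_address
    SET street_suffix = 'Street'
    WHERE street_suffix IN ('ST', 'St.', 'Str');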


Surrogate Keys

In general, a logical key comprises the data fields whose value ensures uniqueness for each row, and whose value will not change during the life of the row. Implemented physically, it is a natural key. Often, the terms key and index are used interchangeably, but they are not the same. An index is a mechanism used to optimize performance, and its physical implementation differs depending on the database management system employed.

A surrogate key is an artificial key, or generated key, that is used as a primary key in substitution for natural keys. Using a surrogate key is a choice rather than a requirement. Typically, surrogate keys are numerical and randomly generated to uniquely identify a row. The use of a surrogate key is often promoted on the basis of adding flexibility and data independence to the data model. In business, change is an ongoing fact of being in business. Businesses are involved in mergers and acquisitions, new products and channels are added, they are reorganized, and they enter into new markets. There are a couple of arguments for surrogate keys. One is that as business changes, so do the values, and potentially the format, of natural keys. When the natural key is used as the primary key, changing either the value or the format of that key is an intrusive effort, which could result in loss of historical information and data relationships. When a surrogate key is used as the primary key, the impact of changing the value of the natural key is the same as changing the value of any column within the table – minimal. Another argument is that when the natural key is large or consists of multiple columns, a surrogate key would be smaller and require less space as it is populated throughout the data model. To facilitate access, a process is developed that will lead to the surrogate key for joins and retrieval of information.

In a data warehouse, which brings together data, often disparate data, from the transactional environment, surrogate keys may provide a viable methodology for doing so. The data warehouse is subject oriented, which contrasts with the application or function orientation of transactional environments. Integrating and reorganizing multiple systems that represent the same subject, such as customer, can require the integration of data that does not occur naturally in the operations environment. A surrogate key will uniquely identify each instance of that integrated subject, each customer, without worry of duplication in the future as the business continues to change. This provides a stable data model that will withstand the test of time. However, beware not to use surrogates as replacements for good data modeling practices.

Before committing to surrogate keys, you must consider some things. To generate the surrogate key, a process must be put in place to ensure uniqueness and consistency. Updating and inserting new rows require first the creation of the surrogate key and also a lookup process to ensure the newly generated key hasn’t been used. Since the user doesn’t know the surrogate key, a process to link the values the user does know with the surrogate value must be established to retrieve information. Typically, this requires additional secondary indexes. Using surrogate keys tends to add both a column and a unique index to each table, sometimes multiple secondary indexes, and in some cases, entirely new tables. Surrogate key values also must be propagated to all the dependent tables as foreign keys.

While surrogate keys are frequently associated with star schemas, their use is not reserved for specific data models. The decision to use or not use a surrogate key should be part of the data model design considerations. That decision should be predicated on what the data modeler believes to be best for the organization. Will there be requirements for the natural key to change? Is there a need to ensure that every row is unique? Will there be a resulting impact from adding secondary indexes?

For example, if the natural key for an order table is order number, and a business user wanted to know how many units of product ‘X’ were ordered by company ‘ABC’, chances are he wouldn’t know the order number. But would the business user know the product number and/or the company name? When a surrogate key is used as the primary key, order number, product, and ordering company would probably each be made secondary indexes. When the natural key of order number is used as the primary key, product and ordering company would probably each be made secondary indexes.
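The trade-off shows up in a small sketch (hypothetical names): the surrogate becomes the primary key, the natural key is demoted to a unique secondary lookup path, and retrieval starts from the values the user actually knows.

    CREATE TABLE customer (
        customer_sk   INTEGER     NOT NULL PRIMARY KEY,  -- generated surrogate
        customer_no   VARCHAR(20) NOT NULL UNIQUE,       -- natural key, now a lookup index
        customer_name VARCHAR(100)
    );

    -- The query arrives with the natural key, then travels to dependent
    -- tables through the propagated surrogate.
    SELECT o.order_no, o.order_total
    FROM customer c
    JOIN orders o
      ON o.customer_sk = c.customer_sk
    WHERE c.customer_no = 'ABC-1001';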


Outside of the maintenance issues of the surrogate key, the data modeler must determine when it is wise to use either a surrogate key or a natural key. For example, while order number may make sense to use as the primary key of an order table, in the case of a customer table, it may make sense to use a surrogate key for customer id. The decision to use a surrogate key should be evaluated on a case-by-case basis and not entered into as a standard.

Changing Dimensions

A changing dimension is often referred to as a Slowly Changing Dimension (SCD). The impact of time on the data warehouse can be significant. Time has a tendency to generate many changes. Customers move and their addresses change; people change their names. Area demographics change, populations shift. Products move from one category to another. These are changing dimensions. Time impacts all data models, and dimensions will change over time. However, a changing dimension is only a problem when the model is built on dimensions. Typically, there are three methods used to address SCDs (see Figure 5).

> Type one updates the dimension value in place as it changes. This provides a dimension that is always current, eliminating the capture of history, because the association of the data with the old dimension value has been overlaid.
> Type two inserts a new row into the table as the dimension changes. This method provides both current and history information for the changing dimension, and usually involves carrying an effective date to indicate which row is current.
> Type three updates the dimension in place after moving the changed dimension value to a previous-value column. This method provides for the current and most recent values.

Figure 5. Methods to address SCD: three sample product tables – Type One carries (SKU, Catg), with the category overwritten in place; Type Two carries (SKU, Catg, Updated), with a new effective-dated row inserted for each change; Type Three carries (SKU, Catg, Prev Catg), with the current and previous category on the same row.

A normalized model is typically developed as type two, building a history as changed rows are inserted into the table. If the business application requires only the current value, a view could be used to present current values only to the application. Those applications that need history, or the ability to identify values during a specific time frame, would then have access to all the history data.

While the normalized model provides for the type two SCD, the construct of the dimensional model increases the challenge of providing for changing dimensions. Included in this challenge is the need to determine how to reflect the changing dimension. Which type satisfies the business need for the dimension? Does it satisfy all the business needs? Which method satisfies the needs of all the data marts in the dimensional model?
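Hedged sketches of the three methods against the Figure 5 product table (names and values are illustrative):

    -- Type one: overwrite in place; history is lost.
    UPDATE product
    SET catg = '03'
    WHERE sku = 1001;

    -- Type two: insert a new effective-dated row; history is kept.
    INSERT INTO product_hist (sku, catg, updated)
    VALUES (1001, '03', DATE '2002-10-29');

    -- A view can then serve applications that want current values only.
    CREATE VIEW product_current AS
    SELECT sku, catg
    FROM product_hist p
    WHERE updated = (SELECT MAX(updated)
                     FROM product_hist h
                     WHERE h.sku = p.sku);

    -- Type three: move the old value aside, then overwrite. Standard SQL
    -- evaluates both assignments against the pre-update row, so prev_catg
    -- receives the old category.
    UPDATE product
    SET prev_catg = catg
      , catg = '03'
    WHERE sku = 1001;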


The Teradata Direction

Identifying the most appropriate choice of data model for a Teradata solution is a two-step process. First, you must distinguish between what Teradata Database lends itself to and what Teradata professionals advocate. Teradata Database is impartial to the data model. The database has the technical ability and the power to perform with any well-designed data model. Teradata’s ability to parallelize every aspect of query processing eliminates the technical issues and complexity of decision support processing against large volumes of data, including the multi-table joins and joins within joins that are often the catalyst for denormalized data models on some technologies. Teradata engineering is always reviewing and identifying areas within the optimizer to improve performance, whether the data model is normalized or the denormalized star schema. Join efficiency improvements, such as large table/small table joins in the early 1990s, were targeted at star schemas. Advanced indexing features – the join index introduced in Teradata Database V2R3 and the aggregate join index introduced in Teradata Database V2R4 – provide a methodology for users of Teradata Database to have a normalized model and use these features to create star schemas if and where processing merits it. Teradata Database V2R5 made generating surrogate keys easier with the introduction of the Identity column. Optimizer learning and intelligence occurs without regard to data model.

Years of field experience have shown Teradata professionals that a normalized model of atomic data provides the greatest flexibility and, therefore, often the greatest long-term business benefit. A well integrated data warehouse is a valuable mechanism in business operations. As businesses use the data warehouse to uncover patterns in their manufacturing processes, their ordering processes, or in customer purchasing habits, business processes change. The business expects the data warehouse to enable and reflect those changes. If business changes require an additional column (dimension) in one of the fact tables, or product in one category is transferred to another category, the business expects the data warehouse to reflect these changes based on a business timeline. If that timeline is negatively impacted by the time required to address data model changes, the data warehouse becomes an obstacle to the business’s success. Customer and field experience repeatedly shows that a normalized model provides the most flexibility and ease in delivering new products to market and in facilitating rapid business changes. So Teradata professionals often suggest starting with a normalized model because of its flexibility and the ease of adding new data elements. However, this does not mean that the data warehouse follows all the rules of third normal form in the strictest sense. Based on business requirements, summary tables are often found useful to augment the normalized model. The goal is to provide a model that supports the business through all of the changes that occur over time.

One of the key factors that sets the Teradata Database apart from all others is the ability to use a normalized data model that facilitates ease of management and ease of expansion, simplifies load strategies, and allows for full integration of enterprise data. This facilitates building for the enterprise. Addressing the needs of the individual becomes as easy as applying a view to the underlying data structure. This view then allows IT and the data warehouse to address the needs of the individual.

Views are a transparent layer on top of the underlying data structure and provide a methodology for creating a presentation layer that eases business user access. In Teradata Database, views don’t require space and will transition the underlying data model into nearly any presentation with such efficiency that the cost of using views is reduced to simply the management of them.
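As one concrete illustration, a Teradata join index can materialize a star-like, pre-joined summary over normalized base tables, and the optimizer can use it automatically when it covers a query. This is a sketch only: the names are hypothetical, and the available syntax options vary by release.

    CREATE JOIN INDEX sales_by_store_month AS
    SELECT s.store_id
         , c.fiscal_month
         , SUM(s.sales_amt) AS monthly_sales
    FROM sales s, fiscal_calendar c
    WHERE s.sale_date = c.calendar_date
    GROUP BY s.store_id, c.fiscal_month;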


Impact of the Data Warehousing Evolution

As data warehousing evolves to an active environment – where frequent and/or continual data updates are required, where processing becomes event driven rather than report driven, and where the need to address changes in the market place is expected with increasing speed – the data model becomes even more critical. The mixed workload of shorter tactical queries with longer strategic queries against the same data creates additional denormalization challenges. Evolving to an active data warehouse eliminates many of the known variables, such as what queries will be submitted and who will submit those queries, which were used in defining the data models of early data warehouses. To accommodate this evolution, some of Teradata’s competitors’ technologies are beginning to add functions and features that enable a more normalized model. Tools such as information delivery tools are also preparing for this evolution. As the data warehouse evolves to an active environment, the requirement for fresh data to be available and accessible erodes the time available to maintain denormalized models.

The data model for the data warehouse should be chosen with a clear understanding of the benefits of each. The early days of data warehousing tended to depend on a denormalized model. Much of that decision was based on addressing specific business needs rather than addressing the breadth of the business, or on existing technology that limited choices. During those early days, when Teradata professionals recommended a normalized model, it was considered unusual. As data warehousing evolves to being active, and as the need and requirement for businesses to change increase, the idea of a normalized data model for the data warehouse no longer seems so radical. In fact, many of the leading analysts and data warehouse professionals understand the value of a normalized model and are becoming more critical of denormalized models.

Summary

The goal of a data warehouse is to provide the business with a tool that facilitates and enables the business to make better business decisions and to take timely action on those decisions. The robustness of the data infrastructure determines the long-term success of the data warehouse and the ability of the business to harness the information for its own success. The robustness of the Teradata Database provides the ability to combine the best of all worlds. The DBA can generate a DBA-friendly normalized data model that facilitates the full integration of data representing the enterprise, following the teachings of Codd, Date, and Inmon. The use of views, a presentation layer that employs the user friendliness of dimensional modeling, enables catering to the needs of the individual, interfacing with the data warehouse for business user ease of access, capitalizing on Codd’s guidelines for relational databases, and following the teachings of Kimball. This allows data to be centrally stored, eliminates redundant data (easing data management), provides a methodology to support future business needs by capitalizing on existing data structures, and shortens time to market.

For example, one Teradata customer had their business users request a new application from IT. The applications group said that it would take six months just to find the data, and once it was found, they would still have to determine the time required to generate the application. The business users left the meeting believing they would be unable to resolve their issues in a timely manner – time that had a direct impact on the success of the business. After determining the data was available in the data warehouse, in a normalized format, the data warehouse group was able to generate a presentation layer that resolved the business needs within a couple of hours. The ability to do this was valuable to the business bottom line.

The data model for the data warehouse should reflect the requirements of the business, and it should enable the business. Taking a hard line with any specific data model type won’t satisfy the business needs. The impact of time generates change within the business, and the data warehouse requires the flexibility to address and yield to those changes. A normalized data model is the correct place to begin.

Debbie Smith is a Data Warehouse Consultant in Global Sales Support. She has 14 years of experience with Teradata systems at a retail customer. Her responsibilities while with the retail customer included application development/support, database administration, business power user, and responsibility for developing, implementing, and managing an information delivery strategy for business end users. Since joining Teradata, Debbie has worked extensively with prospects as well as new and mature data warehouse customers to implement effective tactical and strategic data warehousing initiatives.


Appendix A – Codd’s 12 Rules

Rule 1: The Information Rule
All data should be presented to the user in table form.

Rule 2: Guaranteed Access Rule
All data should be accessible without ambiguity. This can be accomplished through a combination of the table name, primary key, and column name.

Rule 3: Systematic Treatment of Null Values
A field should be allowed to remain empty. This involves the support of a null value, which is distinct from an empty string or a number with a value of zero. Of course, this can’t apply to primary keys. In addition, most database implementations support the concept of a non-null field constraint that prevents null values in a specific table column.

Rule 4: Dynamic On-Line Catalog Based on the Relational Model
A relational database must provide access to its structure through the same tools that are used to access the data. This is usually accomplished by storing the structure definition within special system tables.

Rule 5: Comprehensive Data Sublanguage Rule
The database must support at least one clearly defined language that includes functionality for data definition, data manipulation, data integrity, and database transaction control. All commercial relational databases use forms of the standard SQL (Structured Query Language) as their supported comprehensive language.

Rule 6: View Updating Rule
Data can be presented to the user in different logical combinations, called views. Each view should support the same full range of data manipulation that direct access to a table has available. In practice, providing update and delete access to logical views is difficult and is not fully supported by any current database.

Rule 7: High-level Insert, Update, and Delete
Data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.

Rule 8: Physical Data Independence
The user is isolated from the physical method of storing and retrieving information from the database. Changes can be made to the underlying architecture (hardware, disk storage methods) without affecting how the user accesses it.

Rule 9: Logical Data Independence
How a user views data should not change when the logical structure (table structure) of the database changes. This rule is particularly difficult to satisfy. Most databases rely on strong ties between the user view of the data and the actual structure of the underlying tables.

Rule 10: Integrity Independence
The database language (like SQL) should support constraints on user input that maintain database integrity. This rule is not fully implemented by most major vendors. At a minimum, all databases do preserve two constraints through SQL: no component of a primary key can have a null value (see Rule 3), and if a foreign key is defined in one table, any value in it must exist as a primary key in another table.

Rule 11: Distribution Independence
A user should be totally unaware of whether or not the database is distributed (whether parts of the database exist in multiple locations). A variety of reasons make this rule difficult to implement.

Rule 12: Nonsubversion Rule
There should be no way to modify the database structure other than through the multiple-row database language (like SQL). Most databases today support administrative tools that allow some direct manipulation of the data structure.

(From: ITWorld.com, ‘Codd’s 12 Rules – Data Management Strategies’, 05-07-2001.)


Appendix B – A Comparison

Active Data Warehousing
Normalized model: Supports active data warehousing initiatives.
Denormalized (star) model: Challenges the ability to be active.

Any Question
Normalized model: The flexibility of the normalized model provides for any question to be answered.
Denormalized (star) model: The simplicity of the denormalized model limits what questions can be answered without changes.

Complex Analysis
Normalized model: A normalized model supports complex analysis.
Denormalized (star) model: The enforced simplicity of a denormalized model is unable to support complex analysis.

Data Duplication
Normalized model: The rules of normalization require a single version of data.
Denormalized (star) model: Data is often duplicated multiple times to satisfy different groups of questions.

Data Granularity
Normalized model: The normalized model easily stores the lowest level of data.
Denormalized (star) model: To maintain the ease of navigation, the denormalized model is typically aggregated along one of the dimension lines.

Data Navigation
Normalized model: The normalized model is more difficult to navigate. Business users have a difficult time at best in understanding the model, the necessary joins, and sometimes even the relationships between data elements.
Denormalized (star) model: The denormalized model is specifically designed for easy navigation by the business user.

Flexibility
Normalized model: The greater the degree of normalization in the data model, the greater the flexibility.
Denormalized (star) model: The denormalized model is meant to be simplistic and is designed for specific requirements. As requirements change, changes to the model are needed.

Maintenance
Normalized model: Updates to the normalized model are easier because a piece of information is in one place and requires one update. The normalized model is the standard for the transactional environment; continuing that model throughout the data warehouse provides a closer link to and reflection of transactional processes.
Denormalized (star) model: Updating the denormalized model requires pre-processing, aggregation, and longer time frames.


Appendix C – Glossary

Attribute
Data that describes an entity.

Cube
A way to visualize how a dimensional model is organized. It is also a type of dimensional model design where the measures and dimensions are stored in a format that is expected by an OLAP tool (many times a proprietary format). One cube may represent one fact table with its contributing dimensions.

Data Mart (logical)
The data is stored in one place, but views are used to present to the end user a portion or derivation of the underlying data for their purposes. Data is not replicated.

Data Mart (physical)
A segment of data that is aggregated or preprocessed to enable specific end-user queries and requirements.

Data Warehouse
A reflection of integrated data that is organized by subject areas and is stored to address cross-functional and multiple end-user requirements and queries.

Dimension
The corporate ‘line’ along which facts are measured (time, product, geography).

Dimensional Model
A design method that models the data based on a predefined set of business questions to facilitate ease of access and response time speed.

Drill (Across)
Movement of an end-user request across different dimensions.

Drill (Up or Down)
Movement of an end-user request through data along a dimension.

Entity
An identifiable object that is distinguishable from all other objects.

Entity Relationship Model
Shows the enterprise entities (or objects) and the relationships between the entities in that enterprise, without duplication.

ETL
Terminology given to a group of tools that extract data, transform it, and then load it. Sometimes the sequence of processing changes to reflect ELT, which is extract, load, and then transform.

Fact
A measurement or value (quantity).

Flattened Model
Made up of rows that contain the data elements of all the column headings of a business report so that nothing has to be calculated to generate portions of that report.

Foreign Key
Columns in a table that reflect primary keys of other entities. Determines the relationship between entities.

Hierarchy
The arrangement or ranking of entities into a graded series.

Intelligent Key
A key made up of values that reflect a business requirement or use.

Natural Key
The data fields whose value ensures uniqueness for each row and whose value will not change during the life of the row.

Normalized Model
A model of entities designed to remove data redundancy; the higher the ‘form’ (2NF, 3NF, 4NF, etc.), the less redundancy.

OLTP
Online transaction processing. Consists of a set of instructions that serially, mechanically, and repetitively processes many thousands of transactions through a predefined access path that ultimately updates the underlying data store. OLTP transactions are used by a transactional system that is focused on the acquisition of data. OLTP transactions normally have strict service level agreements (SLAs) attached to them, requiring sub-second response time. OLTP concentrates on data updates and/or inserts.


Primary Key
Unique identifier for rows in a table.

Schema
Structured framework that, for purposes of this paper, provides and represents relationships between entities.

Service Level Agreement (SLA)
An agreement between IT and the business that pertains to a performance aspect of a process. The agreement could be based on availability, enforcing that data is loaded by a specific timeframe, or it could be based on retrieval processing, enforcing a processing time for specific queries or types of queries.

Slowly Changing Dimensions (SCD)
The impact of time on a dimension and that dimension’s changes (i.e., customer addresses).

Snowflake Schema
Made up of multiple star schemas without the duplication of dimension tables that can be joined together.

Star Schema
An individual fact table surrounded by dimension tables to which it will join.

Surrogate Key
Artificial key used as a substitute for natural data keys (i.e., customer id).

Appendix D – Endnotes

1. Dr. E. F. Codd, Codd’s 12 Rules – Data Management Strategies, ITWorld.com, 05-07-2001.
2. William Inmon, Enterprise Data Warehouse, http://www.billinmon.com/library/articles/introedw.asp.
3. Ralph Kimball and Margy Ross, The Data Warehouse Toolkit, Second Edition, Wiley Computer Publishing, New York, 2002, pp. 11-12.
4. Neil Raden, edited by Ralph Kimball, ‘Real Time: Get Real, Part II’, Intelligent Enterprise, Data Warehouse Designer, www.intelligententerprise.com, June 30, 2003.
5. Steve Hoberman, Data Modeler’s Workbench: Tools and Techniques for Analysis and Design, John Wiley & Sons, New York, 2002.
6. Ralph Kimball and Margy Ross, The Data Warehouse Toolkit, Second Edition, Wiley Computer Publishing, New York, 2002, page 9.
7. William Inmon, The Problem with Dimensional Modeling, http://www.billinmon.com/library/articles/artdimmd.asp.
8. William Inmon, Star Joins, http://www.billinmon.com/library/articles/starjoin.asp.

Appendix E – References

Ralph Kimball, The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, John Wiley & Sons, New York, 1996.

Daniel L. Moody (Department of Information Systems, University of Melbourne) and Mark Kortink, From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design, Simsion Bowles & Associates, 2000.

William Inmon and Peter Meers, The Dilemma of Change: Managing Changes over Time in the Data Warehouse/DSS Environment, white paper, http://www.billinmon.com/library/whiteprs/Dilemma of Change.pdf, March 2001.

AllFusion is a trademark and ERwin is a registered trademark of Computer Associates International, Inc. Teradata continually improves products as new technologies and components become available. Teradata, therefore, reserves the right to change specifications without prior notice. All features, functions, and operations described herein may not be marketed in all parts of the world. Consult your Teradata representative or Teradata.com for more information.

Copyright © 2004-2007 by Teradata Corporation. All Rights Reserved. Produced in U.S.A.
