Normalization
Normalization
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
Basically, the Rules of Normalization are enforced by
eliminating redundancy and inconsistent dependency in your
table designs. Here we will explain what that means by
examining the five progressive steps to normalization you
should be aware of in order to create a functional and efficient
database.
Let's say we want to create a table of user information, and
we want to store each users' Name, Company, Company
Address, and some personal bookmarks, or urls. You might
start by defining a table structure like this:
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
Zero Form
users
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
First Normal Form
Notice how we're breaking that first rule by repeating the url1
and url2 fields? And what about Rule Three, primary keys? Rule
Three basically means we want to put some form of unique,
auto-incrementing integer value into every one of our records.
Otherwise, what would happen if we had two users named Joe
and we wanted to tell them apart? When we apply the rules of
the First Normal Form we come up with the following table:
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
users
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
Now our table is said to be in the First Normal Form. We've solved
the problem of url field limitation, but look at the headache we've
now caused ourselves. Every time we input a new record into
the users table, we've got to duplicate all that company and user
name data. Not only will our database grow much larger than
we'd ever want it to, but we could easily begin corrupting our
data by misspelling some of that redundant information. Let's
apply the rules of Second Normal Form:
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
urls
users
2 1 xyz.com
1 Joe ABC 1 Work Lane
3 2 abc.com
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
Ok, we've created separate tables and the primary key in the users
table, userId, is now related to the foreign key in the urls table,
relUserId. We're in much better shape. But what happens when
we want to add another employee of company ABC? Or 200
employees? Now we've got company names and addresses
duplicating themselves all over the place, a situation just rife for
introducing errors into our data. So we'll want to look at
applying the Third Normal Form:
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
Third Normal Form
users
1 Joe 1
2 Jill 2
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
companies
urls
urlId relUserId url
1 1 abc.com
2 1 xyz.com
3 2 abc.com
4 2 xyz.com
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
Now we've got the primary key compId in the companies table
related to the foreign key in the users table called relCompId,
and we can add 200 users while still only inserting the name
"ABC" once. Our users and urls tables can grow as large as they
want without unnecessary duplication or corruption of data.
Most developers will say the Third Normal Form is far enough,
and our data schema could easily handle the load of an entire
enterprise, and in most cases they would be correct.
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
But look at our url fields - do you notice the duplication of data? This
is perfectly acceptable if we are not pre-defining these fields. If the
input page which our users are filling out to input this data allows
a free-form text input there's nothing we can do about this, and it's
just a coincidence that Joe and Jill both input the same
bookmarks. But what if it's a drop-down menu which we know only
allows those two urls, or maybe 20 or even more. We can take our
database schema to the next level, the Fourth Form, one which
many developers overlook because it depends on a very specific
type of relationship, the many-to-many relationship, which we
have not yet encountered in our application.
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
Data Relationships
Before we define the Fourth Normal Form, let's look at the three basic
data relationships: one-to-one, one-to-many, and many-to-many.
Look at the users table in the First Normal Form example above.
For a moment let's imagine we put the url fields in a separate table,
and every time we input one record into the users table we would
input one row into the urls table. We would then have a one-to-one
relationship: each row in the users table would have exactly one
corresponding row in the urls table. For the purposes of our
application this would neither be useful nor normalized.
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
Now look at the tables in the Second Normal Form example. Our tables
allow one user to have many urls associated with his user record.
This is a one-to-many relationship, the most common type, and
until we reached the dilemma presented in the Third Normal Form,
the only kind we needed.
The many-to-many relationship, however, is slightly more complex.
Notice in our Third Normal Form example we have one user related
to many urls. As mentioned, we want to change that structure to
allow many users to be related to many urls, and thus we want a
many-to-many relationship. Let's take a look at what that would do
to our table structure before we discuss it:
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
users
1 Joe 1
2 Jill 2
companies
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
urls
urlId url
1 abc.com
2 xyz.com
url_relations
1 1 1
2 1 2
3 2 1
4 2 2
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
Fourth Normal Form
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
SELECT name, url FROM users, urls, url_relations WHERE
url_relations.relatedUserId = 1 AND users.userId = 1 AND urls.urlId
= url_relations.relatedUrlId
Copyright @ www.bcanotes.com
N o r m a l i z a t i o n…
Fifth Normal Form
There is one more form of normalization which is sometimes
applied, but it is indeed very esoteric and is in most cases
probably not required to get the most functionality out of your data
structure or application. It's tenet suggests:
1. The original table must be reconstructed from the tables into
which it has been broken down.
The benefit of applying this rule ensures you have not created any
extraneous columns in your tables, and that all of the table
structures you have created are only as large as they need to be.
It's good practice to apply this rule, but unless you're dealing with
a very large data schema you probably won't need it.
Copyright @ www.bcanotes.com