Normalization process explained
i have a database course in this semester and now i have pretty good idea on schema of a system. now i am going to dive deep into the normalization of a schema. what is this?
So, in normalization, we start with all the entities, and then apply the normalization rules, and then find a normalized version of our total design.
what is functional dependency?
what is the difference between candidate key, superkey and primary key?
so each candidate key is part of superkeys and each candidate key can be potential primary key?
what is a repeating group?
how to draw dependency diagram?
when a dependency is not BCNF?
but for that case does not that mean that there is transitive dependency? then why not it is also violating 3nf?
what is division operation in relational algebra?
Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 1/14
i have a database course in this semester and now i have pretty good idea on schema of a
system. now i am going to dive deep into the normalization of a schema. what is this?
Normalization is basically the process of improving a database schema so that your data is well-
organized, avoids unnecessary duplication, and stays consistent when it changes.
If designing the schema is like sketching the blueprint of a building, normalization is like checking
the structure for weak points and reinforcing it so it won’t collapse under real-world usage.
Why do we normalize?
Main goals:
1. Remove redundancy – avoid storing the same data in multiple places.
2. Prevent anomalies – avoid weird or inconsistent results when inserting, updating, or deleting data.
3. Make relationships clear – ensure each table represents a single concept.
The idea in short
You start with your rough schema (sometimes called the un-normalized form), then restructure it in
stages called Normal Forms (NF).
The most common stages are:
Normal Form | Main Fix | Simple Analogy
1NF (First Normal Form) | Remove repeating groups and make sure every field holds only atomic (indivisible) values. | "One cell → one value, not a list."
2NF (Second Normal Form) | Remove partial dependencies (when a column depends on only part of a composite key). | "Every non-key fact should depend on the whole key."
3NF (Third Normal Form) | Remove transitive dependencies (when a column depends on another non-key column). | "Facts depend only on the key, not on other facts."
BCNF (Boyce-Codd NF) | A stricter version of 3NF that fixes edge cases. | "Every determinant is a key."
There are higher forms like 4NF and 5NF, but most real-world designs stop at 3NF or BCNF.
Example
Imagine you have this table:
StudentID StudentName CourseID CourseName Instructor
Problems:
CourseName is repeated for every student taking the course (redundancy).
If we update the CourseName in one row but forget others → update anomaly.
If we delete the last student in a course, we lose the course info → delete anomaly.
After normalization:
1. Students table → StudentID, StudentName
2. Courses table → CourseID, CourseName, Instructor
3. Enrollments table → StudentID, CourseID
Now, no redundant storage and no accidental data loss.
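To make the three-table design concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the sample rows ("DB101", "Dr. Smith", and so on) and the in-memory database are assumptions made up for illustration; only the table and column names come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, StudentName TEXT);
    CREATE TABLE Courses (CourseID TEXT PRIMARY KEY, CourseName TEXT, Instructor TEXT);
    CREATE TABLE Enrollments (
        StudentID INTEGER REFERENCES Students(StudentID),
        CourseID TEXT REFERENCES Courses(CourseID),
        PRIMARY KEY (StudentID, CourseID)
    );
""")

# Hypothetical sample data: two students enrolled in the same course.
conn.executemany("INSERT INTO Students VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.execute("INSERT INTO Courses VALUES ('DB101', 'Databases', 'Dr. Smith')")
conn.executemany("INSERT INTO Enrollments VALUES (?, ?)", [(1, "DB101"), (2, "DB101")])

# The course name is stored once, so renaming it touches exactly one row.
renamed = conn.execute(
    "UPDATE Courses SET CourseName = 'Database Systems' WHERE CourseID = 'DB101'"
).rowcount

# Deleting every enrollment leaves the course row in place.
conn.execute("DELETE FROM Enrollments WHERE CourseID = 'DB101'")
remaining_courses = conn.execute("SELECT CourseID, CourseName FROM Courses").fetchall()

print(renamed)            # number of rows changed by the rename
print(remaining_courses)  # the course survives with its new name
```

In the original single-table design the same rename would have to be repeated for every enrolled student, and deleting the last enrollment would also delete the course.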
If you want, I can walk you through normalization step-by-step on a sample unnormalized schema
so you’ll see exactly how each normal form changes the design. That’s the best way to learn it quickly.
So, in normalization, we start with all the entities, and then apply the normalization rules, and
then find a normalized version of our total design.
Exactly — you’ve got the right idea.
Think of it like this:
1. Start with your raw design — all the entities, their attributes, and relationships.
At this stage, your schema might have redundancy, multi-valued fields, or unclear
dependencies.
It’s often called Unnormalized Form (UNF).
2. Apply normalization rules in sequence:
1NF → get rid of repeating groups and multi-valued attributes.
2NF → fix partial dependencies (only matters if you have composite keys).
3NF → remove transitive dependencies.
BCNF → tighten the design even further if needed.
3. End up with a fully normalized schema — where each table is about one concept, every fact is
stored only once, and the schema can handle inserts, updates, and deletes without weird side
effects.
One extra note:
When you normalize, sometimes you split one table into multiple smaller tables and connect them
with foreign keys. This reduces redundancy but can increase the number of joins you’ll do in queries —
so in real-world systems, after normalization, there’s sometimes a bit of controlled denormalization for
performance reasons.
If you want, I can give you a full worked-out example where we take an unnormalized table and
normalize it through 1NF → 2NF → 3NF step-by-step, so you can see exactly what happens at each
stage. That’s how database professors usually teach it.
what is functional dependency?
A functional dependency (FD) is a way of saying:
"If I know the value of one thing, I can determine the value of another thing."
It’s a constraint between two sets of attributes in a relation (table) that tells you how the data is
connected.
Formal definition
In a table R, we say:
X→Y
means Y is functionally dependent on X if, for any two rows in R, whenever the values of X are the
same, the values of Y are also the same.
X = determinant (the thing you know)
Y = dependent attribute (the thing you can find out)
Simple example
StudentID Name Dept
101 Alice CS
102 Bob Math
103 Charlie CS
We have:
StudentID → Name (knowing StudentID tells you exactly the Name)
StudentID → Dept (knowing StudentID tells you exactly the Dept)
{StudentID} is the key in this table.
But NOT:
Dept → Name — because many students can be in the same department.
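The definition can be tested against sample rows. The sketch below is one possible way to do it; the function name fd_holds and the list-of-dicts representation are my own choices. Keep in mind that sample data can only disprove a dependency: an FD is a rule about all valid data, so passing on three rows does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs while disagreeing on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True


students = [
    {"StudentID": 101, "Name": "Alice", "Dept": "CS"},
    {"StudentID": 102, "Name": "Bob", "Dept": "Math"},
    {"StudentID": 103, "Name": "Charlie", "Dept": "CS"},
]

print(fd_holds(students, ["StudentID"], ["Name"]))  # True
print(fd_holds(students, ["StudentID"], ["Dept"]))  # True
print(fd_holds(students, ["Dept"], ["Name"]))       # False: CS maps to Alice and Charlie
```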
Why it matters for normalization
Functional dependencies help you:
1. Identify candidate keys (minimal sets of attributes that determine all others); the primary key is then chosen from them.
2. Spot redundancy (if a non-key attribute depends on something it shouldn’t).
3. Apply normal forms (2NF, 3NF, BCNF all rely on analyzing functional dependencies).
If you want, I can show you a visual trick for spotting functional dependencies in a table — it makes
normalization much easier. Would you like me to go through that?
what is the difference between candidate key, superkey and primary key?
Alright — let’s break these three down clearly, because they’re closely related but not the same thing.
1. Superkey
Definition: Any set of attributes that can uniquely identify a row in a table.
May have extra, unnecessary attributes (not minimal).
Example:
Table: Students(StudentID, Email, Name)
{StudentID} → uniquely identifies a student ✅
{StudentID, Email} → also uniquely identifies a student ✅ (but has extra attribute → still a
superkey)
2. Candidate Key
Definition: A minimal superkey — it can uniquely identify a row, but no attribute can be
removed without losing uniqueness.
All attributes in it are necessary.
There can be multiple candidate keys in a table.
Example:
{StudentID} → minimal superkey ✅
{Email} → also minimal superkey ✅
{StudentID, Email} ❌ (not minimal, because StudentID alone works)
3. Primary Key
Definition: The chosen candidate key that the database designer decides to use for uniquely
identifying rows.
Only one primary key per table.
Chosen based on:
Stability (won’t change over time)
Simplicity (few attributes)
Readability
Example:
Candidate keys: {StudentID}, {Email}
Designer picks {StudentID} as primary key.
Relationship
Superkeys ⊇ Candidate Keys ⊇ {Primary Key}
All candidate keys are superkeys.
The primary key is just one of the candidate keys.
Superkeys can be bigger than necessary; candidate keys are minimal; primary key is the selected
one.
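The hierarchy can also be computed. The sketch below assumes two functional dependencies for Students(StudentID, Email, Name) that the example implies but never states: StudentID → Email, Name and Email → StudentID, Name. The function names are illustrative, and the brute-force enumeration is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def superkeys(schema, fds):
    """Every attribute set whose closure covers the whole schema."""
    found = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            if closure(combo, fds) == set(schema):
                found.append(frozenset(combo))
    return found


def candidate_keys(schema, fds):
    """Superkeys that contain no smaller superkey."""
    sks = superkeys(schema, fds)
    return [k for k in sks if not any(other < k for other in sks)]


schema = ["StudentID", "Email", "Name"]
# Assumed FDs: both StudentID and Email identify a student.
fds = [
    ({"StudentID"}, {"Email", "Name"}),
    ({"Email"}, {"StudentID", "Name"}),
]

all_superkeys = superkeys(schema, fds)
keys = candidate_keys(schema, fds)
print(len(all_superkeys))         # 6 superkeys
print([sorted(k) for k in keys])  # [['StudentID'], ['Email']]
```

Picking the primary key is not something code can decide: it is the designer's choice among the candidate keys.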
If you want, I can make you a small diagram that shows this relationship visually — it makes the
“subset” idea much clearer.
Do you want me to draw it?
so each candidate key is part of superkeys and each candidate key can be potential primary
key?
Yes — exactly.
Here’s the relationship in plain terms:
Every candidate key is a superkey ✅ (because it can uniquely identify rows).
Not every superkey is a candidate key ❌ (because some superkeys have extra, unnecessary
attributes).
Any candidate key can be chosen as the primary key — it’s just a matter of designer’s choice.
Think of it like this:
Superkeys = all possible ID badges that can get you into the building (some have extra stuff, like
your photo + birth date + shoe size).
Candidate keys = the smallest, simplest ID badges that still get you in.
Primary key = the one ID badge you officially decide everyone must use.
If you want, I can give you a real example table and list all superkeys, candidate keys, and the
primary key so you can see them side by side. That makes the hierarchy crystal clear.
what is a repeating group?
A repeating group is when you store multiple values of the same kind in one row of a table, usually
in:
Multiple columns with similar meaning (like Phone1, Phone2, Phone3)
Or a single column that holds a list (like "Math, Physics, Chemistry" in one cell)
Why it’s bad
Breaks First Normal Form (1NF), which says each field should be atomic (one value only).
Makes queries harder (you can’t easily search or join on a single phone number in "Phone1, Phone2,
Phone3").
Causes wasted space if some of the repeating columns are empty.
Forces schema changes if you need more values (adding Phone4, Phone5, etc.).
Example of repeating group
StudentID Name Course1 Course2 Course3
101 Alice Math Physics NULL
102 Bob Math Chemistry Biology
Here:
Course1, Course2, Course3 are repeating groups — they represent the same type of fact (“course the
student takes”).
Normalized (1NF) version
StudentID Name Course
101 Alice Math
101 Alice Physics
102 Bob Math
102 Bob Chemistry
102 Bob Biology
Now:
One fact per row.
Easy to query, extend, and maintain.
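The conversion from the Course1/Course2/Course3 layout to one row per course can be sketched in a few lines. The helper name to_1nf and the list-of-dicts representation are assumptions for illustration; NULL is modelled as Python's None.

```python
unnormalized = [
    {"StudentID": 101, "Name": "Alice",
     "Course1": "Math", "Course2": "Physics", "Course3": None},
    {"StudentID": 102, "Name": "Bob",
     "Course1": "Math", "Course2": "Chemistry", "Course3": "Biology"},
]


def to_1nf(rows, group_columns, new_column):
    """Emit one row per non-empty value found in the repeating columns."""
    flat = []
    for row in rows:
        # Columns that are not part of the repeating group are copied as-is.
        base = {k: v for k, v in row.items() if k not in group_columns}
        for col in group_columns:
            if row[col] is not None:
                flat.append({**base, new_column: row[col]})
    return flat


rows_1nf = to_1nf(unnormalized, ["Course1", "Course2", "Course3"], "Course")
for r in rows_1nf:
    print(r)
```

Note that StudentID alone no longer identifies a row in the result; the key becomes the pair (StudentID, Course), which is what sets up the 2NF discussion.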
If you want, I can walk you through exactly how removing repeating groups is the first step of
normalization (UNF → 1NF), because that’s where most database normalization examples begin.
how to draw dependency diagram?
A dependency diagram is basically a picture showing how attributes depend on each other using
functional dependencies.
It’s often drawn when you’re preparing to normalize a relation.
Here’s how to do it step-by-step:
1️⃣ Start with your table
Example relation:
STUDENT(StudentID, Name, DeptID, DeptName, Advisor)
Suppose the functional dependencies are:
1. StudentID → Name, DeptID, Advisor
2. DeptID → DeptName
2️⃣ Identify the primary key
From the FDs, StudentID determines every other attribute (DeptName through transitivity, via DeptID), so StudentID is a candidate key and we use it as the
primary key.
3️⃣ Draw attributes inside a big rectangle
Draw one box (or oval) for each attribute inside a rectangle that represents the table.
4️⃣ Draw arrows for dependencies
From determinant → dependent attribute.
One arrow can go to multiple attributes.
Example (ASCII version):
+------------------------------------------+
|                 STUDENT                  |
|                                          |
|  StudentID -----> Name                   |
|      |                                   |
|      +---------> DeptID -----> DeptName  |
|      +---------> Advisor                 |
+------------------------------------------+
5️⃣ Mark keys
Underline the primary key (and note candidate keys if there are multiple).
Tips
Arrows are directional: always draw from the determinant to the dependent attribute.
Group dependencies: StudentID → Name, DeptID, Advisor means draw one arrow from StudentID
that splits into those attributes.
If you have composite keys, draw a combined box for the composite determinant, like (CourseID,
StudentID) → Grade.
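Step 2 above relies on transitivity, which is easy to check mechanically with an attribute closure. This is a small sketch under the two FDs given for STUDENT; the function name closure and the set-based representation are my own choices.

```python
def closure(attrs, fds):
    """All attributes reachable from attrs by repeatedly applying the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


schema = {"StudentID", "Name", "DeptID", "DeptName", "Advisor"}
fds = [
    ({"StudentID"}, {"Name", "DeptID", "Advisor"}),
    ({"DeptID"}, {"DeptName"}),
]

reachable = closure({"StudentID"}, fds)
# Attributes with an arrow coming straight from StudentID.
direct = set().union(*(rhs for lhs, rhs in fds if lhs == {"StudentID"}))
# Attributes reached only through another attribute (here: via DeptID).
transitive = reachable - direct - {"StudentID"}

print(reachable == schema)  # True: StudentID determines everything
print(sorted(direct))       # ['Advisor', 'DeptID', 'Name']
print(sorted(transitive))   # ['DeptName']

# One arrow per FD, which is exactly what the diagram draws.
for lhs, rhs in fds:
    print(", ".join(sorted(lhs)), "->", ", ".join(sorted(rhs)))
```

The attribute that shows up under "transitive" is the one whose arrow starts at a non-key attribute in the diagram, and that is the dependency 3NF asks you to remove.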
If you want, I can make a clean dependency diagram in proper visual form for you using your own
example schema — that way you’ll have a clear template for your database course.
Do you want me to do that?
when a dependency is not BCNF?
Great question!
A functional dependency violates BCNF (Boyce-Codd Normal Form) when:
Definition of BCNF
A relation (table) is in BCNF if for every non-trivial functional dependency:
X→Y
the determinant X is a superkey of the relation.
So when is a dependency NOT BCNF?
When the determinant X is not a superkey, but it functionally determines some attribute(s) Y .
Example to clarify:
Consider a table COURSE with attributes:
{CourseID, Instructor, Room}
Suppose these dependencies exist:
1. CourseID → Instructor
2. Instructor → Room
Assume the primary key is CourseID (uniquely identifies a course).
The first dependency is fine: CourseID is a key → BCNF compliant.
The second dependency Instructor → Room is a problem if Instructor is not a key (likely true).
Because Instructor determines Room but Instructor alone is not a key, this violates BCNF.
What happens then?
You should decompose the table into smaller tables to fix this:
1. COURSE(CourseID, Instructor)
2. INSTRUCTOR(Instructor, Room)
Now, each table satisfies BCNF.
Summary:
Dependency not BCNF if: determinant is NOT a superkey.
This means some attribute (or set of attributes) determines others without being able to uniquely
identify tuples in the relation.
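The BCNF test for the COURSE example can be sketched as a closure check on each determinant. The names closure and bcnf_violations are illustrative. One simplification to be aware of: for the decomposed tables I pass in by hand the FDs that apply to each table, rather than computing the projection of the FD set.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Non-trivial FDs whose determinant is not a superkey of schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not schema <= closure(lhs, fds)
    ]


course = {"CourseID", "Instructor", "Room"}
course_fds = [({"CourseID"}, {"Instructor"}), ({"Instructor"}, {"Room"})]
violations = bcnf_violations(course, course_fds)
print(violations)  # only Instructor -> Room is reported

# After decomposition, each table is checked with the FDs that apply to it.
course_ok = bcnf_violations({"CourseID", "Instructor"},
                            [({"CourseID"}, {"Instructor"})])
instructor_ok = bcnf_violations({"Instructor", "Room"},
                                [({"Instructor"}, {"Room"})])
print(course_ok, instructor_ok)  # [] []
```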
If you want, I can walk you through a full example of BCNF violation and decomposition step-by-step!
but for that case does not that mean that there is transitive dependency? then why not it is also
violating 3nf?
Excellent observation! You're right to connect BCNF violations with transitive dependencies and 3NF. Let
me clarify the relationship between 3NF and BCNF in this context:
Recap:
3NF says:
For every non-trivial functional dependency X → Y, either
1. X is a superkey, or
2. every attribute of Y that is not in X is a prime attribute (i.e., part of some candidate key).
BCNF says (stricter):
For every non-trivial functional dependency X → Y,
X must be a superkey.
So, why can a FD violate BCNF but not 3NF?
Let’s go back to your example:
COURSE(CourseID, Instructor, Room)
CourseID → Instructor
Instructor → Room
Assume:
Primary key = CourseID
Room is not part of any candidate key (so it’s a non-prime attribute).
Check 3NF:
The dependency Instructor → Room has determinant Instructor, which is not a superkey, so 3NF
allows it only if Room is a prime attribute.
Room is not part of any candidate key, so it is not prime.
Both conditions fail: the determinant is not a superkey and the dependent is non-prime. So yes, this FD also
violates 3NF.
When do BCNF violations not violate 3NF?
The key difference is in the exception for prime attributes in 3NF:
If the dependent attribute is part of a candidate key, 3NF allows this FD even if the determinant
is not a superkey.
BCNF does not allow this — it requires the determinant to always be a superkey.
Example of FD that violates BCNF but not 3NF
Suppose a relation:
R(A, B, C)
Functional dependencies:
AB → C
C → B
Candidate keys:
{A, B}
{A, C} (because C → B, so {A, C} determines B as well)
Here:
AB → C is fine for both forms, because {A, B} is a key.
C → B has a determinant that is not a superkey (C alone does not determine A), so it fails BCNF.
But B is part of the candidate key {A, B}, so it is a prime attribute and 3NF allows the dependency.
So C → B passes 3NF but fails BCNF.
Summary
Case for an FD X → Y | Violates 3NF? | Violates BCNF?
X is a superkey | No | No
X is not a superkey, Y is prime | No | Yes
X is not a superkey, Y is non-prime | Yes | Yes
So:
All BCNF violations violate 3NF, except for cases where the dependent is prime — those
violate BCNF but not 3NF.
Your example with Instructor → Room violates both because Room is non-prime.
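Both checks can be run side by side. The sketch below classifies each FD of the COURSE example and of a second relation R(A, B, C) with AB → C and C → B; that second relation is a standard textbook-style case I am assuming here to show the prime-attribute exception. All function names are illustrative, and the key search is brute force, so it only suits small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            combo = set(combo)
            # Supersets of a key already found are superkeys, not candidate keys.
            if any(k <= combo for k in keys):
                continue
            if closure(combo, fds) >= schema:
                keys.append(combo)
    return keys


def classify(schema, fds):
    """Map each FD to (violates_3nf, violates_bcnf); FDs are assumed non-trivial."""
    prime = set().union(*candidate_keys(schema, fds))
    report = {}
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= schema
        violates_bcnf = not is_superkey
        violates_3nf = not is_superkey and not (rhs - lhs) <= prime
        name = ",".join(sorted(lhs)) + " -> " + ",".join(sorted(rhs))
        report[name] = (violates_3nf, violates_bcnf)
    return report


course = classify({"CourseID", "Instructor", "Room"},
                  [({"CourseID"}, {"Instructor"}), ({"Instructor"}, {"Room"})])
abc = classify({"A", "B", "C"},
               [({"A", "B"}, {"C"}), ({"C"}, {"B"})])

print(course)  # Instructor -> Room: (True, True)
print(abc)     # C -> B: (False, True), i.e. fine for 3NF but not for BCNF
```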
If you want, I can prepare specific examples illustrating these cases!
what is division operation in relational algebra ?
The division operation in relational algebra is basically a "for all" query — it answers questions like:
"Find all X such that X is related to all Y in some set."
It’s not used as often as selection (σ), projection (π), or join (⋈), but it’s very handy when you need to
match every item in a given group.
Definition
If we have two relations:
R(A, B) — a larger relation with attributes A and B
S(B) — a smaller relation with attribute B (subset of R’s B-values)
Then:
R÷S
gives a relation with attribute A containing all values of A that are paired with every value of B in S .
Example
Data:
R(A, B) — which student took which course:
Student Course
Alice Math
Alice Physics
Bob Math
Bob Physics
Bob CS
Charlie Math
S(B) — required courses:
Course
Math
Physics
Question:
"Find students who have taken all required courses."
Solution using division:
R÷S
Step-by-step:
For each student in R, check if they are related to all courses in S.
Alice: {Math, Physics}✅ matches all in S.
Bob: {Math, Physics, CS} ✅ contains all in S.
Charlie: {Math} ❌ missing Physics.
Result:
Student
Alice
Bob
How it works mathematically
R ÷ S = {a ∈ π_A(R) ∣ ∀b ∈ S, (a, b) ∈ R}
Keep only the A-values that are paired with every B-value in S.
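The set definition translates almost word for word into Python. This is a minimal sketch that models R as a set of (a, b) pairs and S as a set of b-values; the function name divide is my own.

```python
def divide(r, s):
    """R ÷ S: the a-values of r that are paired with every b in s."""
    candidates = {a for a, _ in r}
    return {a for a in candidates if all((a, b) in r for b in s)}


R = {
    ("Alice", "Math"), ("Alice", "Physics"),
    ("Bob", "Math"), ("Bob", "Physics"), ("Bob", "CS"),
    ("Charlie", "Math"),
}
S = {"Math", "Physics"}

result = divide(R, S)
print(sorted(result))  # ['Alice', 'Bob']
```

If S is empty, every student appearing in R qualifies, because "all required courses" is satisfied when nothing is required.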
Why it’s useful
"Find employees who know all required skills"
"Find suppliers who supply all parts in a list"
"Find students who completed all compulsory subjects"
If you want, I can also show you a Venn diagram–style visual of how division works — it’s one of the
easiest ways to understand it.
Do you want me to draw that?