Normalization and Data Integrity
1. Introduction to Normalization
Normalization is a systematic process of organizing data in a database to:
Minimize data redundancy (repetition of data),
Ensure data integrity, and
Improve data consistency and efficient querying.
Normalization is usually performed in stages called normal forms (NF). Each stage refines the structure
of the database to make it more efficient and maintainable.
2. Objectives of Normalization
Eliminate redundant data: Avoid storing the same data in multiple places.
Ensure data dependencies make sense: Data is logically stored.
Improve data integrity: Reduce the chances of anomalies (insertion, update, and deletion
anomalies).
Facilitate easier maintenance: With fewer duplicates, data is easier to maintain and update.
3. Data Redundancy and Anomalies
Data Redundancy
Occurs when the same piece of data exists in multiple places. Redundancy can lead to:
Wasted storage space
Inconsistent data updates
Integrity issues
Types of Anomalies
Insertion anomaly: You can’t add data because other data is missing.
Update anomaly: You must change data in multiple places when a single item changes.
Deletion anomaly: Deleting a record causes unintended loss of data.
4. Normal Forms
4.1 First Normal Form (1NF)
Definition: A table is in 1NF if:
All the values in a column are atomic (indivisible),
Each column contains only one value per record (no repeating groups or arrays).
Example:
StudentID Name Courses
101 Alice Math, English
Not in 1NF because "Courses" has multiple values.
After 1NF:
StudentID Name Course
101 Alice Math
101 Alice English
4.2 Second Normal Form (2NF)
Definition: A table is in 2NF if:
It is in 1NF,
All non-key attributes are fully dependent on the entire primary key.
Applies mainly to tables with composite keys (i.e., more than one attribute in the primary key).
Example:
StudentID CourseID StudentName CourseName
Here:
StudentName depends only on StudentID
CourseName depends only on CourseID
Not in 2NF because non-key attributes do not depend on the full key.
After 2NF:
Student Table:
StudentID StudentName
Course Table:
CourseID CourseName
Enrollment Table:
StudentID CourseID
4.3 Third Normal Form (3NF)
Definition: A table is in 3NF if:
It is in 2NF,
There is no transitive dependency (i.e., non-key attributes do not depend on other non-key
attributes).
Example:
EmployeeID Name Department DeptLocation
If Department → DeptLocation, and EmployeeID → Department, then EmployeeID →
DeptLocation is a transitive dependency.
After 3NF:
Employee Table:
| EmployeeID | Name | Department |
Department Table:
| Department | DeptLocation |
4.4 Boyce-Codd Normal Form (BCNF)
Definition: A stricter version of 3NF. A table is in BCNF if:
It is in 3NF,
Every determinant is a candidate key.
A determinant is any attribute on which some other attribute is fully functionally dependent.
Example:
StudentID Course Instructor
If:
Each Course is taught by only one Instructor,
A student can enroll in many courses,
Then Course → Instructor, but StudentID, Course is the primary key.
This violates BCNF because Course (a non-candidate key) determines Instructor.
After BCNF:
Course Table:
| Course | Instructor |
Enrollment Table:
| StudentID | Course |
5. Data Integrity
Definition:
Data integrity ensures that data is accurate, consistent, and reliable throughout its lifecycle in a
database.
Types of Data Integrity
Type Description
Entity Integrity Ensures each table has a unique primary key and that it is not null.
Referential Integrity Ensures that foreign keys correctly refer to primary keys in related tables.
Domain Integrity Enforces valid entries for a given column using data type, format, or constraints.
Type Description
User-Defined Integrity Custom rules set by users/business logic (e.g., age must be ≥ 18).
Enforcing Data Integrity
Use of primary keys (uniqueness and not null)
Use of foreign keys (to enforce relationships)
Use of constraints like:
o NOT NULL
o UNIQUE
o CHECK
o DEFAULT
Use of triggers and stored procedures for complex validations
6. Relationship Between Normalization and Data Integrity
Normalization Benefit How it Ensures Integrity
Removes redundancy Prevents inconsistent data from multiple storage points
Splits data logically Ensures dependencies are respected
Uses keys and constraints Enforces entity and referential integrity
Avoids anomalies Guarantees consistent insertion, update, deletion
7. Summary Table of Normal Forms
Normal Form Requirement
1NF Atomic columns, no repeating groups
2NF 1NF + Full functional dependency on primary key
3NF 2NF + No transitive dependencies
BCNF 3NF + Every determinant must be a candidate key
SQL Basics – Data Definition Language (DDL)
1. What is DDL?
Data Definition Language (DDL) refers to SQL commands that define the structure of a database,
including:
Creating, modifying, and deleting databases, tables, and other objects like indexes, views, and
schemas.
Key DDL Commands
Command Purpose
CREATE Create databases and tables
ALTER Modify existing tables (add, remove, modify columns)
DROP Delete databases or tables completely
2. Creating Databases and Tables
2.1 CREATE DATABASE
sql
CopyEdit
CREATE DATABASE SchoolDB;
This creates a new database named SchoolDB.
2.2 CREATE TABLE
sql
CopyEdit
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Age INT,
Email VARCHAR(100)
);
Creates a table Students with specified columns and a primary key on StudentID.
3. ALTER TABLE
Used to modify a table's structure after it has been created.
Examples:
Add a new column:
sql
CopyEdit
ALTER TABLE Students ADD Gender VARCHAR(10);
Modify a column:
sql
CopyEdit
ALTER TABLE Students MODIFY Age SMALLINT;
Drop a column:
sql
CopyEdit
ALTER TABLE Students DROP COLUMN Email;
4. DROP TABLE / DROP DATABASE
Deletes a table or database permanently (all data is lost).
Examples:
Drop a table:
sql
CopyEdit
DROP TABLE Students;
Drop a database:
sql
CopyEdit
DROP DATABASE SchoolDB;
⚠️Caution: DROP is irreversible and deletes everything related to the object.
Practical Task for Week 5
Try writing and executing the following SQL statements:
Create a database called CompanyDB
Create a table called Employees with fields: EmpID, Name, Position, Salary
Add a column HireDate to Employees
Delete the column Salary
Drop the entire Employees table
Week 6: SQL – Data Manipulation Language (DML)
1. What is DML?
Data Manipulation Language (DML) is used to manipulate and retrieve data from existing tables.
Key DML Commands
Command Purpose
SELECT Retrieve data
INSERT Add new data
UPDATE Modify existing data
DELETE Remove data from a table
2. SELECT – Retrieving Data
Basic Syntax:
sql
CopyEdit
SELECT column1, column2 FROM table_name;
Example:
sql
CopyEdit
SELECT FirstName, LastName FROM Students;
Select All Columns:
sql
CopyEdit
SELECT * FROM Students;
With WHERE Clause:
sql
CopyEdit
SELECT * FROM Students WHERE Age > 18;
3. INSERT – Adding Data
Syntax:
sql
CopyEdit
INSERT INTO table_name (column1, column2) VALUES (value1, value2);
Example:
sql
CopyEdit
INSERT INTO Students (StudentID, FirstName, LastName, Age)
VALUES (101, 'John', 'Doe', 20);
4. UPDATE – Modifying Data
Syntax:
sql
CopyEdit
UPDATE table_name SET column1 = value1 WHERE condition;
Example:
sql
CopyEdit
UPDATE Students SET Age = 21 WHERE StudentID = 101;
⚠️Always use WHERE to avoid updating all rows.
5. DELETE – Removing Data
Syntax:
sql
CopyEdit
DELETE FROM table_name WHERE condition;
Example:
sql
CopyEdit
DELETE FROM Students WHERE StudentID = 101;
⚠️Without a WHERE clause, all records will be deleted.
Practical Exercises for Week 6
1. Insert 3 records into a Courses table.
2. Update one course's title or description.
3. Retrieve all courses offered.
4. Delete one course from the table.
5. Select students above a certain age.
Tips and Best Practices
Always backup before using DELETE or DROP.
Use NOT NULL, UNIQUE, DEFAULT, and CHECK constraints for integrity.
Combine SELECT with ORDER BY, GROUP BY, JOIN for complex queries.
Use comments (-- comment) to document SQL scripts.
Advanced SQL Queries
1. JOIN Operations
JOINs are used to combine rows from two or more tables based on a related column between them
(usually foreign key relationships).
1.1 INNER JOIN
Returns records that have matching values in both tables.
sql
CopyEdit
SELECT A.Name, B.CourseName
FROM Students A
INNER JOIN Courses B ON A.CourseID = B.CourseID;
Only returns students who are enrolled in a course (matched by CourseID).
1.2 LEFT JOIN (LEFT OUTER JOIN)
Returns all records from the left table and the matched records from the right. If there is no match,
NULLs appear.
sql
CopyEdit
SELECT A.Name, B.CourseName
FROM Students A
LEFT JOIN Courses B ON A.CourseID = B.CourseID;
Shows all students, even if not enrolled in any course.
1.3 RIGHT JOIN (RIGHT OUTER JOIN)
Returns all records from the right table and the matched records from the left.
sql
CopyEdit
SELECT A.Name, B.CourseName
FROM Students A
RIGHT JOIN Courses B ON A.CourseID = B.CourseID;
Shows all courses, even if no students are enrolled.
1.4 FULL JOIN (FULL OUTER JOIN)
Returns all records when there is a match in either table. Non-matching rows get NULLs.
sql
CopyEdit
SELECT A.Name, B.CourseName
FROM Students A
FULL JOIN Courses B ON A.CourseID = B.CourseID;
Combines LEFT and RIGHT JOIN. Shows all students and courses, matched or not.
✅ Note: Not all RDBMS (like MySQL) support FULL JOIN directly. You may need to simulate it using
UNION.
2. Aggregate Functions
Aggregate functions perform calculations on multiple rows and return a single result.
Function Description Example
COUNT() Counts the number of rows SELECT COUNT(*) FROM Students;
SUM() Adds numeric values SELECT SUM(Salary) FROM Employees;
AVG() Calculates average value SELECT AVG(Age) FROM Students;
MAX() Finds the highest value SELECT MAX(Salary) FROM Employees;
MIN() Finds the lowest value SELECT MIN(Salary) FROM Employees;
With GROUP BY
Used to group rows that share the same values, allowing aggregate functions to be applied to each
group.
sql
CopyEdit
SELECT CourseID, COUNT(*) AS StudentCount
FROM Students
GROUP BY CourseID;
Counts students enrolled in each course.
With HAVING
Used to filter groups, like WHERE filters rows.
sql
CopyEdit
SELECT CourseID, COUNT(*) AS StudentCount
FROM Students
GROUP BY CourseID
HAVING COUNT(*) > 5;
Shows only courses with more than 5 students.
Week 8: Database Security and User Management
1. Authentication vs. Authorization
Term Meaning
Authentication Verifying a user’s identity (login credentials)
Authorization Determining what actions or resources a user is allowed to access
2. User Management in SQL
You can create, modify, and remove users, and control their access permissions.
2.1 Creating a User
sql
CopyEdit
CREATE USER 'john'@'localhost' IDENTIFIED BY 'password123';
Creates a user named john who can connect from the local machine.
2.2 Granting Privileges
sql
CopyEdit
GRANT SELECT, INSERT, UPDATE ON SchoolDB.* TO 'john'@'localhost';
Allows user john to select, insert, and update on all tables in SchoolDB.
ALL PRIVILEGES: Grant everything.
ON dbname.tablename: Grant on a specific table.
ON .: Grant on all databases and tables.
2.3 Revoking Privileges
sql
CopyEdit
REVOKE UPDATE ON SchoolDB.Students FROM 'john'@'localhost';
Removes UPDATE permission from user john.
2.4 Deleting a User
sql
CopyEdit
DROP USER 'john'@'localhost';
Deletes the user account from the system.
3. Role Management (in supported DBMSs like PostgreSQL,
Oracle, MySQL 8+)
Creating and Using Roles
sql
CopyEdit
CREATE ROLE staff;
GRANT SELECT, INSERT ON SchoolDB.* TO staff;
GRANT staff TO 'john'@'localhost';
Makes permission management easier across multiple users.
4. Best Practices for Database Security
Use strong passwords and password policies.
Grant minimum required privileges (Principle of Least Privilege).
Regularly audit user activity and permissions.
Use encryption (TLS/SSL) for data in transit.
Keep database software updated with latest security patches.
Avoid using root/admin for daily operations.
Sample Security Report Activity (for case study)
List of users with access and their privileges.
Tables or resources accessed by each user.
Identify excessive permissions and recommend revocations.
Suggest strong authentication and encryption measures.