08 Databases
Candidates should be able to:
Show understanding of the limitations of using a file-based approach for the storage and
retrieval of data
Describe the features of a relational database that address the limitations of a file-based
approach
Show understanding of and use the terminology associated with a relational database
model
Use an entity-relationship (E-R) diagram to document a database design
Show understanding of the normalization process
Explain why a given set of database tables are, or are not, in 3NF
Produce a normalized database design for a description of a database, a given set of
data, or a given set of tables
Show understanding of the features provided by a Database Management System
(DBMS) that address the issues of a file based approach
Show understanding of how software tools found within a DBMS are used in practice
Show understanding that the DBMS carries out all creation/modification of the database
structure using its Data Definition Language (DDL)
Show understanding that the DBMS carries out all queries and maintenance of data using
its DML
Show understanding that the industry standard for both DDL and DML is Structured
Query Language (SQL)
Understand given SQL (DDL) statements and be able to write simple SQL (DDL)
statements using a sub-set of statements
Definitions
Field Collection of items of data
An organized collection of structured information, or data
Database
typically stored electronically in a computer system
A database in which the data items are linked by internal
Relational Database
pointers
Entity Anything that can have data stored about it
A group of similar data, in a database, with rows for each
Table
instance of an entity and columns for each attribute
Record A row in a table in a database
Field A column in a table in a database
Sometimes called a logical record or an occurrence of an
Tuple
entity
Attribute Individual data item stored for an entity
An attribute or smallest set of attributes in a table where each
Candidate Key
tuple is unique
A unique identifier for a table, a special case of a candidate
Primary Key
key
Secondary Key A candidate key that is an alternative to the primary key
A set of attributes in one table that refer to the primary key in
Foreign Key
another table
It is maintained when the database must not contain any
Referential Integrity values of a foreign key that are not matched to the
corresponding primary key
Indexing A pointer to a data in a table
It is formed when one table in a database has a foreign key
Relationship
that refers to a primary key in another table in the database
Entity-Relationship (E- A way to visually represent the relationship between the
R) Diagram entities in a table
A process used to construct a relational database that has
Normalization Process
integrity and in which data redundancy is reduced
Entities in a database do not contain repeated groups of
First Normal Form (1NF)
attributes
Second Normal
There are no partial dependencies (and are in 1NF)
Form(2NF)
Third Normal Form All non-key attributes are independent, and there are no non-
(3NF) key dependencies (and are in 2NF)
A situation in which a non-key attribute of a table depends on
Partial Dependency
only a part of the primary key
A situation in which there is at least one field that is dependent
Non-key dependency
on other fields that are not part of the key
Data Redundancy The same data is stored in multiple places
Updating data in one place does not update data stored in
Data Inconsistency
some other place
Changing structure of data has major effect on the
Data Dependency
applications using the database
Data Modelling A tool used to how the data structure of a database
A data model for a specific database that is independent of
Logical Schema
the DBMS used to build the database
Interface provided by DBMS that allows a developer to write
Developer Interface
queries in Structured Programming Language (SQL)
Query Processor A processor that processes queries written in SQL
Data Definition A language used to create, modify, and remove the data
Language (DDL) structures that form a relational database
Data Manipulation A language used to add, modify, delete, and retrieve the data
Language (DML) stored in a relational database
8.1 Database Concepts
8.1.1 The Limitations of a File-Based Approach
Storage space is wasted when data items are duplicated by the separate applications and
some data is redundant
Inconsistent as data can be altered by one application and not by another
Enquiries available can depend on the structure of data and software used so data is not
independent
8.1.2 The Advantages of a Relational Database over a File-
Based Approach
Database
An organized collection of structured information, or data, typically stored electronically in
a computer system
Advantages
Storage space is not wasted since data item are only stored once, hence minimal
redundant data
Data altered in one application is available in other applications, hence data is consistent
Enquiries available do not depend on the structure of data and software used so data is
independent
8.1.3 Relational Database Model Terminology
Database: An organized collection of structured information, or data typically stored
electronically in a computer system
Relational Database: A database in which the data items are linked by internal pointers
Entity: Anything that can have data stored about it
Table: A group of similar data, in a database, with rows for each instance of an entity and
columns for each attribute
Record: A row in a table in a database
Field: A column in a table in a database
Tuple: Sometimes called a logical record or an occurrence of an entity
Attribute: Individual data item stored for an entity
Types of keys:
Candidate Key: An attribute or smallest set of attributes in a table where each tuple is
unique
Primary Key: A unique identifier for a table, a special case of a candidate key
Secondary Key: A candidate key that is an alternative to the primary key
Foreign Key: A set of attributes in one table that refer to the primary key in another table
Relationships
Formed when one table in a database has a foreign key that refers to a primary key in
another table in the database
one-to-one
one-to-many
many-to-one
many-to-many
Referential Integrity: It is maintained when the database must not contain any values of a
foreign key that are not matched to the corresponding primary key
Indexing: A pointer to a data in a table
8.1.4 Entity-Relationship (E-R) Diagrams
Used to document the design of a database
Provides visual representation of how the entities in a database are related
The cardinality of relationships are:
Example of an E-R Diagram:
8.1.5 The Normalization Process
Used to construct a relational database that has integrity and in which data redundancy is
reduced
Tables that are not normalized are generally larger
For this section, assume the following table (STUDENT) to be used for reference:
This table is currently not normalized and hence can cause problems.
First Normal Form (1NF)
Entities do not contain repeated group of attributes
In this case, subjects and subject teachers has multiple entries for the same entry
To fix a separate table (STUDENTSUBJECT) could be made
1. Here, the primary key for this table is a composite key from the two attributes
StudentID and SubjectName since that tuple is always unique for each entry in this
table
2. StudentID is also a foreign key linking to the original table
The original table gets shortened to
Second Normal Form (2NF)
Entities are in 1NF and any non-key attributes depend upon the primary key. There are no
partial dependencies
In the STUDENTSUBJECT table, Subject Teacher is only dependent on the Subject Name
part of the primary key, a partial dependence that needs to be removed by introducing a
third table, SUBJECT
1. Hence, the STUDENTSUBJECT now has the Subject Teacher field removed
Third Normal Form (3NF)
Entities are in 2NF and all non-key attributes are independent. The table contains no non-
key dependencies
In STUDENT table, Location and Teacher Name depends upon the attribute Class ID,
while a few other attributes depend upon Teacher Name.
Since Teacher Name could be same for different teachers, Licence Number should be
used as the primary key for TEACHER table.
The updated tables are:
STUDENT
TEACHER
CLASS
SUBJECT
STUDENTSUBJECT
Skill Check 1
Bobby and Kim are discussing databases.
1. Bobby tells Kim that a file-based approach is usually better than a relational
database.
Explain why Bobby is incorrect. [3]
2. Bobby has a shop that sells products to customers his database will store data
about his customers, their payment details, orders and the products he sells.
Customers will have login details to access their accounts. The database will
update customers’ payment and login details without keeping any historical
records.
a. Give one example of each of the following relationships from Bobby’s
database.
i. one-to-one
ii. one-to-many
iii. many-to-many
b. From the above three, identify the relationship that cannot be directly
implemented in a normalized relational database.
Solution
‣
1. Flat-file has more data redundancy than a relational database. Secondly, it
also has more data inconsistency and it is not easy to perform complex
searches
e6ec3a10a169432d96f923cfed0e55363b6ee2c027e64e16a902f3ba4a24
a. Customer to payment details
Customer to order
Customer to product
b. many-to-many
8.2 Database Management Systems (DBMS)
8.2.1 How a DBMS Addresses the Limitations of a File-Based
Approach
Data Redundancy Issue
Duplication of data reduced by storing data in separate linked tables, as most data is
stored only once
Items of data used to link tables by the use of foreign keys are stored more than once
DBMS flags any possible errors when any attempt is made to accidentally delete this type
of item
Data Inconsistency Issue
Since data is stored only once, updating once allows the updating items to be seen by all
applications
Integrity of data improved by increasing consistency
Easier to maintain since data needs be updated only once
Data Dependency Issue
Data is independent of the applications using the database, so changes to structure of
data has minimal effect on the applications using the database
Any changes to unused fields/tables does not affect the application since it only has
access to fields/tables it requires
The DBMS Approach
Uses a more structured approach to the management, organization, and maintenance of
data in a database
An already defined data structured can be used to set up and create the database
Uses a data dictionary to store the metadata, including the definition of tables, attributes,
relationship between tables and any indexing
Can also define the validation rules used for entry of data
Contains data about the physical storage of the data
Improve the integrity of data stored, helping to ensure that it is accurate, complete
and consistent
Data Modeling (For example, E-R diagram)
To show the data structure of a database
Logical Schema
A data model for a specific database that is independent of the DBMS used to build
the database
Provides data security to prevent unwanted alteration, corruption, deletion, or sharing
data with others that has no access
Using usernames, passwords to prevent unauthorized access
Encryption of the data stored
Using access rights to allow for different level of access
Creation of an activity log to record any actions taken by users of the database
Scheduling regular back-ups
8.2.2 The Use and Purpose of DBMS Software Tools
Developer Interface
Allows a developer to write queries in SQL (Structured Query Language) rather than using
query-by-example
Queries processed and executed by query processor
This allows construction of more complex queries to interrogate the database
Query Processor
Processes a query written in SQL
Includes a DDL interpreter, a DML compiler and a query evaluation engine
DDL statements are interpreted and recorded in the database’s data dictionary
DML statements are compiled into low level instructions that are executed by the
query evaluation engine
DML compiler will optimize the query
Skill Check 2
Moheem is creating a relational database to store data about his customers.
1. Moheem has been told a relational database addresses some of the limitation of
a file-based approach by reducing data redundancy.
a. State what is meant by the term data redundancy. [1]
b. Explain how a relational database can help to reduce data redundancy. [3]
2. Moheem uses a Database Management System (DBMS) to ensure the security
and integrity of the data.
a. Explain the difference between security and integrity. [2]
b. Name and describe two security features provided by a DBMS.
3. The DBMS provides software tools for the database developer.
Fill in the names of the missing software tools in the following statements.
A _________ allows a developer to extract data from a database.
A _________ enables a developer to create user-friendly forms and reports. [2]
Solution
‣
719bc52d4f2744a583349f8fc7fd4fd204bb497d29224bfa9a719fcaf2fc87a
a. Repeated data
b. Firstly, relational database stores each record once and references it by
a primary. Secondly, it stores the data in individual tables and the tables
are linked by relationships. And lastly, by the proper use of Primary and
Foreign keys
59c354f9b3a940bd94f7b7d023a911abea9c8cd35cc9423abccfa802b778
a. Security ensures that data is safe from loss whereas integrity ensures
that the data is consistent
59fce48e69b34f378723c600777122e30af99788f16b478faa6111fd31
i. Automatic Backup: It creates regular copies of data in case of loss
ii. Encryption: Data is incomprehensible to unauthorized users
1. Query Processor, Developer Interface
8.3 Data Definition Language (DDL) and Data
Manipulation Language (DML)
8.3.1 Industry Standard Methods for Building and Modifying
a Database
DDL is a language to create, modify, and remove the data structures that form a relational
database.
Written in a script that uses syntax similar to a computer program
DML is a language to add, modify, delete, and retrieve the data stored in a relational
database.
Written in a script that is similar to a computer program
Therefore, DDL is used to work relational database structures while DML is used to work
with the data stored in the relational databases
SQL is used for these two languages
8.3.2 SQL (DDL) Commands and Scripts
Data types for attributes Description
CHARACTER Fixed length text
VARCHAR(n) Variable length text
True or False, SQL uses the integers
BOOLEAN
1 and 0
INTEGER Whole Number
REAL Number with decimal places
A date usually formatted as YYYY-
DATE
MM-DD
A time usually formatted as
TIME
HH:MM:SS
To create a database
CREATE DATABASE
To create a table
CREATE TABLE
To edit structure of table
ALTER TABLE
To add primary key to a table
PRIMARY KEY
To add foreign key to a table
FOREIGN … REFERENCES …
Here’s a complete example of a DDL script used when the school database was created
8.3.3 SQL (DML) Commands and Scripts
The following are SQL (DML) Query commands:
To fetch data from database. Queries always begin with SELECT
SELECT FROM
To include only rows in a query that match a given condition
WHERE
To sort the results from a query by a given column either alphabetically or numerically
ORDER BY
To arrange data into groups
GROUP BY
Combine rows from different different tables if the join condition is true
INNER JOIN
Returns sum of all values in a column
SUM
Counts the number of rows where the column is not NUL
COUNT
Returns the average value for a column with a numeric data type
AVG
The following are SQL (DML) maintenance commands
To add new rows to a table
INSERT INTO
To remove rows from a table
DELETE FROM
To edit rows in a table
UPDATE
Some examples of DML commands for querying and updating the above school database
are:
To show the first and second name of students in class 7A in alphabetical order based on
their second name
Revision Guides / AS / COMPUTER SCIENCE / 08 Databases
To show the teacher’s name and the subject taught
To insert a row in Student table
If values for all columns are not known, then the table columns need to be specified
before the values are inserted
To delete specified rows from Student table
Just writing the first line would delete the whole table
To find the sum of exam marks from the table STUDENTSUBJECT
Skill Check 3
Example data from the table RENTAL are given:
RentalI Customer HouseI MonthlyC DepositPai
D ID D ost d
1 22 15B5L 1000.00 Yes
2 13 3F 687.00 No
3 1 12AB 550.00 Yes
4 3 37 444.50 Yes
1. Complete the following Data Definition Language (DDL) statement to define
the table RENTAL.
CREATE ___________ ______________ (
RentalID INTEGER NOT NULL,
CustomerID INTEGER NOT NULL,
HouseID ___________ (5) NOT NULL,
MonthlyCost ____________ NOT NULL,
DepositPaid BOOLEAN NOT NULL,
______________ (RentalID)
);
2. Write a Data Manipulation Language (DML) script to return the first name
and last name of all customers who have not paid their deposit.
Solution
‣
Points to Note
A database offers improved methods for ensuring data integrity compared to a file-based
approach
A relational database comprises tables of a special type; each table has a primary key
and may contain foreign keys
Entity-relationship modelling is a top-down approach to database design
Normalization is a database design method that starts with a collection of attributes and
converts them into first normal form then into second normal form and, finally, into third
normal form
A database architecture provides, for the user, a conceptual level interface to the stored
data
Features provided by a database management system (DBMS) include: a data dictionary,
indexing capability, control of user access rights and backup procedures
Structured Query Language (SQL) includes Data Definition Language (DDL) commands
for establishing a database and Data Manipulation Language (DML) commands for
creating queries