Relational Databases: 8. Why Are Entity Integrity and Referential Integrity Important in A Database?
Relational Databases: 8. Why Are Entity Integrity and Referential Integrity Important in A Database?
Ans. Entity integrity and referential integrity are two forms of data integrity that are particularly
important in relational databases. Relational databases break the storage of data down into
elements that usually have to be joined back together again in order to produce meaningful
results. Without guarantees of these two types of integrity, data would get dropped or duplicated.
Relational Databases
Relational database management systems represent the most widely implemented database
model in the world today. The relational model relies on "Normalized" data. Normalization is a
process carried out in the database design process to ensure that the tables holding the data do
not have repeating data and cannot create repetition or loss when they are joined together
Reference Data
Most database applications use two different types of data: transaction data and reference data.
As an example, a very basic warehouse database will record stock movement transactions.
Imagine a transaction record such as: Item 1010, Quantity 5, Direction Out. A manager may wish
to see a report of this data giving the item name and the item description as well as the item
number. These details are attributes of a different entity that describes each part in the
warehouse. This is a reference table.
Entity Integrity
In the warehouse database example, the primary key of the parts table should be an attribute that
uniquely identifies each record. This is entity integrity. Generally, numeric fields are better for a
primary key than text fields. In the parts table, the best field for the primary key would be the
Item number. The organization may sell a variety of products with the same name, like "small
wrench." The Item number will always be unique per item, and so this ensures entity integrity for
the parts table. If several records had the same primary key value, referencing details from this
table by joining it to the transaction table would cause transaction records to duplicate in order to
match with each record with the same identifier.
Referential Integrity
Referential integrity ensures that the link between the transactions table and the parts table is
correctly formed. In the transaction table, the item number forms the foreign key, which is the
attribute that links to the primary key of the parts table. Only item numbers listed in the parts
table can appear in the transaction table. This is referential integrity. If the transactions table
contains an item number that is not also in the parts table, joining the tables together would cause
transactions with the missing item number to be left out of the results.
Composite key
A composite key is composed of two or more attributes, but it must be minimal.
Using the example from the candidate key section, possible composite keys are:
• First Name and Last Name – assuming there is no one else in the company with
the same name
• Last Name and Department ID – assuming two people with the same last name
don’t work in the same department
Primary key
The primary key is a candidate key that is selected by the database designer to be used
as an identifying mechanism for the whole entity set. It must uniquely identify tuples
in a table and not be null. The primary key is indicated in the ER model by
underlining the attribute.
• A candidate key is selected by the designer to uniquely identify tuples in a
table. It must not be null.
• A key is chosen by the database designer to be used as an identifying
mechanism for the whole entity set. This is referred to as the primary key. This
key is indicated by underlining the attribute in the ER model.
In the following example, EID is the primary key:
Employee(EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary,
DepartmentID)
Secondary key
A secondary key is an attribute used strictly for retrieval purposes (can be composite),
for example: Phone and Last Name.
Alternate key
Alternate keys are all candidate keys not chosen as the primary key.
Foreign key
A foreign key (FK) is an attribute in a table that references the primary key in another
table OR it can be null. Both foreign and primary keys must be of the same data type.
In the COMPANY database example below, DepartmentID is the foreign key:
Employee(EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary,
DepartmentID)
5) What are Nulls?
A null is a special symbol, independent of data type, which means either unknown or
inapplicable. It does not mean zero or blank. Features of null include:
• No data entry
• Not permitted in the primary key
• Should be avoided in other attributes
• Can represent
o An unknown attribute value
o A known, but missing, attribute value
o A “not applicable” condition
• Can create problems when functions such as COUNT, AVERAGE and SUM
are used
• Can create logical problems when relational tables are linked
NOTE: The result of a comparison operation is null when either argument is null. The
result of an arithmetic operation is null when either argument is null (except functions
that ignore nulls).
9. How would you implement a M:N relationship in a database composed of two tables? Give an
example.
Referential and entity integrity are crucial to preserving valid relationships between tables and data
within a database. SQL queries will begin to fail if the data keys that connect the dots between their
relationships do not match. If an entity or table is relying on the keys in another entity or table, then
relationships between the two can be lost if bad data is entered into one location. For instance,
referential integrity can be used to ensure foreign key values are valid. For instance, a database
table listing all the parts installed on a specific aircraft should have referential integrity connecting
the part numbers to a table listing valid part numbers for that aircraft so that in the event of a bad
part number being “fat-fingered” into the database, the RBDMS will return an error concerning the
bad data (IBM 2001).