CSC2308 Lec 04
CSC2308 Lec 04
(CSC2308)
_______________________________________________
Zauwali S. Paki
Department of Computer Science
Yusuf Maitama Sule University, Kano
zspaki@yumsuk.edu.ng
Quiz 2 from the last lecture
Q1. Briefly explain the CIA triad (maximum of 6 lines)
Q2. Use monoalphabetic substitution to encipher the
text Come
“ back home with a roasted chicken”
Q3. Use monoalphabetic substitution to decipher
the text
“ep opu dpnf opz”.
Q4. Briefly explain the difference between
symmetric and Asymmetric encryption schemes
(maximum of 6 lines)
Data Management I 2
Reference book
Data Management I 3
Introduction to Database
Management System (DBMS)
Data Management I 4
What is a database?
• Database is a collection of data, typically
describing the activities of one or more related
organizations
• Example: a university database may contain
information about the following
• Entities:students, department, courses,
classrooms
• Relationships between entities: studentsregister
courses , facultyteaches courses, classroomsuse
for teaching courses
Data Management I 5
What is DBMS?
Data Management I 6
File System vs DBMS
• Why DBMS?
• Consider the following scenario: a company has a
large collection (say, 500 GB ) of data on employees,
1
Data Management I 7
File System vs DBMS
Data Management I 8
File System vs DBMS
Data Management I 9
Advantages of DBMS
• Data independence
• DBMS provides the abstract view of the data that
insulates application code from the details of the
data
• Efficient data access
• DBMS provides efficient methods to store and
retrieve data efficiently
• Data integrity and security
• DBMS enforces integrity constraints on the data,
enforcing access control on different classes of
users, etc.
Data Management I 10
Advantages of DBMS
• Data administration
• With multiple user access, centralizing data
administration offers significant improvement
• Concurrent access and crash recovery
• Allows multiple users accessing the same data.
Each user has the illusion that he/she is the only
one accessing the data. DBMS also protects users
from the effects of system failures
• Reduced application development time
Data Management I 11
Describing and storing data in DBMS
Data Management I 12
Describing and storing data in DBMS
Data Management I 13
The relational model
• The central thing in this model is therelation ,
which can be considered as a set of records
• A description of data in terms of a model is
called a schema
• Example
• Students(sid: string, name: string, login: string, dob:
date,gpa: real)
• Department(did: string, dname: string)
• Courses(cid: string, ctitle: string, status : string,
semester : string)
• Cgpas(sid: string, session : string, cgpa : real)
Data Management I 14
The relational model
• Instance of Students relation
sid name login dob Gpa
0001 Sani sani@css 12/02/1980 3.00
0002 Garba garba@mth 01/01/1992 2.80
0003 Sadiya sadiya@itc 10/09/2000 4.00
0004 John john@chm 04/01/1995 3.56
0005 Sabrina sabrina@phy 12/11/2001 2.50
Data Management I 15
Levels of abstractions in DBMS
• There are 3 levels of abstractions
• Conceptual schema
• External schema
• Physical schema
Data Management I 16
Conceptual schema
• Also known as logical schema
• Conceptual schema describes the stored data
in terms of data model of DBMS
• In RDBMS (Relational DBMS), the conceptual
schema describes all the relations that are
stored in the database
• These relations contain information about
entities
• Collection of entities and their associated
relationships can be described using relations
Data Management I 17
Conceptual schema
• Example of conceptual schema
Data Management I 18
Conceptual schema
• Determining the needed relations and their
fields is not always obvious
• The process of determining and arriving at a
good conceptual schema is known as logical
database design
Data Management I 19
Physical schema
• The physical schema gives additional storage
information
• It concisely specifies how the relations
described by the conceptual schema are going
to be stored on physical storage media such as
disks, tapes, and so on
• We decide on the file organization to use and
create indexes to speedup data lookup
• The process of coming up with a good physical
schema is known as physical database design
Data Management I 20
External schema
• External schemas are in terms of data models of the
DBMS
• It allows a customized data access at different levels
of users (user groups)
• Every database has just one conceptual schema and
one physical schema but may have many external
schemas each tailored towards a specific user group
• External schema consists of a collection of one or
more views and relations from the conceptual
schema
• View is just like a relation the difference being that it
is not stored
Data Management I 21
Queries in Database
• A query is a question posed on the data that
have been stored in the database
• Suppose we have a university database that
keeps data about many entities in the university
• We may wish to ask questions on the data that
the database contains
Data Management I 22
Queries in Database
• Example of queries:
• What is the name of the student with student
registration number UG18CSC0001?
• What is the average salary of professors who teach
the course with course code CSC5364?
• How many students are enrolled in course
CSC2308?
• What fraction of students in course CSC1201
received a grade better than B?
• Is there any student with a GPA less than 3.0
enrolled in course CSC2364?
• A query helps answer the above questions
Data Management I 23
Queries in Database
• DBMS provides a specialized language for
submitting queries
• This language is calledquery language
• DBMS allows users to create, modify, and query
data through a Data Manipulation Language
(DML)
• Query language is one part of the DML which
allows the insertion, deletion, and modification
of data
Data Management I 24
Transaction management
• A concurrent access to a database requires
some careful management
• E.g., airline reservation and banking databases
• In airline reservation system, multiple agents
can be making reservation at the same time
• There is need to avoid conflicts
• DBMS must order some actions that occur
concurrently
• There is the need to avoid multiple reservations of
the same seat
Data Management I 25
Transaction management
• In banking database several operations are
possible concurrently
• One application might be calculating total deposits
• Another application might be transferring some
money from one account to another
• These concurrent operations on a single account
need to be done carefully and in some logical
sequence while allowing concurrent operations
• We need to avoid anomalies (not having the global
snapshot of the account seen by all the
applications)
Data Management I 26
Transaction management
• DBMS must protect users from the effect of
system failures
• Data should be restored to a consistent state when
the system restarts after the crash
• Also the state of the application
• When the DBMS says a seat reservation is done
successfully, it should not be lost due to a system
crash
• Also, if the system crashes while an application is
making changes to the data, the partial changes
should undone when the system recovers
Data Management I 27
Transaction management
• A transaction is any one execution of the user
program in a DBMS
• Executing the same program several times will
generate several transactions
• Partial transaction is not permitted
• Group transact is in effect equivalent to some
sequential execution of the transactions
Data Management I 28
Concurrent execution of transactions
• DBMS allows concurrent accesses by multiple
users
• E.g., one application is debiting account and another
is crediting
• DBMS must ensure these operations are not
interleaved
• A user will have illusion that he/she is the only one
accessing the database
• Locking protocol is a set of rules that governs how
transactions are executed
• A lock is the technique used to control access to a
database
Data Management I 29
Concurrent execution of transactions
• Operations that could potentially lead to
inconsistency need to obtain lock on the data object
• DBMS provides 2 types of lock:shared andexclusive
• Shared lock can be obtained by multiple transactions
while exclusive lock can only be held by one transaction
• A locking protocol may be as follows:any transaction
that wants to read a data object will first need to obtain a
shared lock on that object and if the transaction wants to
modify the data will first obtain an exclusive lock on the
data
• The lock is released when a transaction completes
successfully
Data Management I 30
Concurrent execution of transactions
• Consider two transactionsT 1 andT 2 such that
T 1 wants to modify a data object andT 2 wants
T 1's
to read the same object. Intuitively, if
request for an exclusive lock on the object is
T 2 cannot proceed until
granted first, T 1
T
releases this lock, because 2's request for a
shared lock will not be granted by the DBMS
T of 1's actions will be
until then. Thus, all
completed before any T of 2's actions are
initiated
Data Management I 31
Incomplete Transactions and System
Crashes
• Transactions can be interrupted before running to
completion as a result of various reasons like
system crash
• So what will happen to changes made by
incomplete transactions?
• DBMS ensures that changes made by incomplete
transactions are removed from the database
• For example, if the DBMS is in the middle of
transferring money from account A to account B, and
has debited the first account but not yet credited the
second when the crash occurs, the money debited
from account A must be restored when the system
comes back up after the crash
Data Management I 32
Incomplete Transactions and System
Crashes
• How does DBMS undo the changes of incomplete
transaction?
• It maintains a log of all writes to the database
• All writes are first recorded in the log file (on disk)
before effecting it in the database, this is called
Write Ahead Log (WAL)
• Logs make it possible for the changes made by the
successfully completed transactions not to be lost
due to system crashes
• The time to recover from crash can be reduced by
periodically writing some information to the disk
• This is called checkpointing
Data Management I 33
Structure of a DBMS
Data Management I 34
People who deal with databases
Data Management I 35
Database application programmers
Data Management I 36
Database Administrator (DBA)
Data Management I 37
Entity-Relationship Model
Data Management I 38
Overview of Database Design
Data Management I 39
ER Model Basics
Data Management I 40
ER Model Basics
Data Management I 41
ER Model Basics
Data Management I 42
ER Model Basics
Data Management I 43
Key Constraints
Data Management I 44
Key Constraints
Data Management I 45
Participation Constraints
Data Management I 46
Participation Constraints
Data Management I 47
Weak Entities
Data Management I 48
Weak Entities
Data Management I 49
The Relational Model
Data Management I 50
Why Study the Relational Model?
Data Management I 51
Relational Database: Definitions
• Relational database : a set of relations
• Relation : made up of 2 parts:
• Instance : a table, with rows and columns.
#Rows = cardinality, #fields = degree / arity.
• Schema : specifies name of relation, plus name and
type of each column.
• E.g. Students(sid: string, name: string, login: string, age:
integer, gpa: real).
• Can think of a relation as a set of rows or tuples
(i.e., all rows are distinct).
Data Management I 52
Example Instance of Students
Relation
Data Management I 53
Relational Query Languages
• A major strength of the relational model:
supports simple, powerful querying of data
• Queries can be written intuitively, and the DBMS
is responsible for efficient evaluation.
• The key: precise semantics for relational queries.
• Allows the optimizer to extensively re-order
operations, and still ensure that the answer does
not change.
Data Management I 54
The SQL Query Language
• Developed by IBM (system R) in the 1970s
• Need for a standard since it is used by many
vendors
• Standards:
• SQL-86
• SQL-89 (minor revision)
• SQL-92 (major revision)
• SQL-99 (major extensions, current standard)
Data Management I 55
The SQL Query Language
• To find all 18 year old students, we can write:
• SELECT * FROM Students S WHERE S.age =18
Data Management I 56
Querying Multiple Relations
• What does the following query compute?
• SELECT S.name, E.cid FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = “A”
Data Management I 57
Creating Relations in SQL
• Creates the Students relation. Observe that the
type (domain) of each field is specified, and
enforced by the DBMS whenever tuples are
added or modified
• CREATE TABLE Students (sid: CHAR(20), name:
CHAR(20),
login: CHAR(10), age: INTEGER, gpa: REAL)
Data Management I 58
Creating Relations in SQL
• As another example, theEnrolled table holds
information about courses that students take
• CREATE TABLE Enrolled (sid: CHAR(20), cid:
CHAR(20), grade: CHAR(2))
Data Management I 59
Destroying and Altering Relations
DROP TABLE Students
• Destroys the relation Students. The schema
informationand the tuples are deleted
ALTER TABLE Students ADD COLUMN firstYear:
integer
• The schema of Students is altered by adding a
new field; every tuple in the current instance is
extended with anull value in the new field
Data Management I 60
Adding and Deleting Tuples
• Can insert a single tuple using:
• INSERT INTO Students (sid, name, login, age, gpa)
VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2)
• Can delete all tuples satisfying some condition (e.g.,
name = Smith):
• DELETE FROM Students S
WHERE S.name = ‘Smith’
Data Management I 61
Integrity Constraints (ICs)
• IC: condition that must be true for any instance
of the database; e.g.,domain constraints
• ICs are specified when schema is defined
• ICs are checked when relations are modified
• Alegal instance of a relation is one that
satisfies all specified ICs.
• DBMS should not allow illegal instances.
• If the DBMS checks ICs, stored data is more
faithful to real-world meaning.
• Avoids data entry errors, too!
Data Management I 62
Primary Key Constraints
• A set of fields is a key for a relation if :
1. No two distinct tuples can have same
values in all key fields, and
2. This is not true for any subset of the key
• Part 2 false? A superkey.
• If there’s >1 key for a relation, one of the
keys is chosen (by DBA) to be the primary
key
• E.g., sid is a key for Students. (What about
name?) The set {sid, gpa} is a superkey
Data Management I 63
Primary and Candidate Keys in
SQL
• Possibly many candidate keys (specified using
UNIQUE), one of which is chosen as the
primary key
• “For a given student and course, there is a
single grade.” vs. “Students can take only one
course, and receive a single grade for that
course; further, no two students in a course
receive the same grade.”
• CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid) )
Data Management I 64
Primary and Candidate Keys in
SQL
• Used carelessly, an IC can prevent the storage
of database instances that arise in practice
• CREATE TABLE Enrolled (sid CHAR(20) cid
CHAR(20),grade CHAR(2), PRIMARY KEY (sid),
UNIQUE (cid, grade) )
Data Management I 65
Foreign Keys, Referential
Integrity
• Foreign key: Set of fields in one relation that is used
to `refer’ to a tuple in another relation. (Must
correspond to primary key of the second relation.)
Like a ‘logical pointer’
• E.g. sid is a foreign key referring to Students:
• Enrolled(sid: string, cid: string, grade: string)
• If all foreign key constraints are enforced, referential
integrity is achieved, i.e., no dangling references
• Can you name a data model w/o referential
integrity?
• Links in HTML!
Data Management I 66
Foreign Keys in SQL
• Only students listed in the Students relation
should be allowed to enroll for courses
• CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid),
FOREIGN KEY (sid) REFERENCES Students )
Data Management I 67
Enforcing Referential Integrity
• ConsiderStudents andEnrolled ; sid in Enrolled is a
foreign key that references Students
• What should be done if an Enrolled tuple with a non-
existent student id is inserted? (Reject it!)
• What should be done if a Students tuple is deleted?
• Also delete all Enrolled tuples that refer to it
• Disallow deletion of a Students tuple that is referred to
• Set sid in Enrolled tuples that refer to it to a default sid
• (In SQL, also: Set sid in Enrolled tuples that refer to it to a
special value null, denoting `unknown’ or ‘inapplicable’.)
• Similar if primary key of Students tuple is updated
Data Management I 68
Referential Integrity in SQL
• SQL/92 and SQL:1999 support all 4 options on
deletes and updates.
• Default is NO ACTION (delete/update is rejected)
• CASCADE (also delete all tuples that refer to the
deleted tuple)
• SET NULL / SET DEFAULT (sets foreign key value of
referencing tuple)
• CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2), PRIMARY KEY (sid,
cid), FOREIGN KEY (sid) REFERENCES
Students ON DELETE CASCADE ON UPDATE
SET DEFAULT )
Data Management I 69
Logical DB Design: ER to
Relational
• Entity sets to tables:
CREATE TABLE
Employees (ssn
CHAR(11), name
CHAR(20), lot
INTEGER, PRIMARY
KEY (ssn))
Data Management I 70
Relationship Sets to Tables
• In translating a relationship set to a relation,
attributes of the relation must include:
• Keys for each participating entity set (as foreign
keys)
• This set of attributes forms a superkey for the relation.
• All descriptive attributes
• CREATE TABLE Works_In( ssn CHAR(1), did
INTEGER, since DATE, PRIMARY KEY (ssn,
did), FOREIGN KEY (ssn) REFERENCES
Employees, FOREIGN KEY (did) REFERENCES
Departments)
Data Management I 71
Translating ER Diagrams with Key
Constraints
Data Management I 72