[go: up one dir, main page]

0% found this document useful (0 votes)
55 views72 pages

CSC2308 Lec 04

This document provides an introduction to the Data Management I course. It discusses key concepts like the CIA triad, database management systems (DBMS), file systems vs DBMS, advantages of DBMS, data modeling, relational model, levels of abstraction in DBMS including conceptual, external and physical schemas, queries, and transaction management. The document is intended to familiarize students with fundamental database topics that will be covered in the course.

Uploaded by

aabdurrahaman647
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
55 views72 pages

CSC2308 Lec 04

This document provides an introduction to the Data Management I course. It discusses key concepts like the CIA triad, database management systems (DBMS), file systems vs DBMS, advantages of DBMS, data modeling, relational model, levels of abstraction in DBMS including conceptual, external and physical schemas, queries, and transaction management. The document is intended to familiarize students with fundamental database topics that will be covered in the course.

Uploaded by

aabdurrahaman647
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 72

Data Management I

(CSC2308)

_______________________________________________
Zauwali S. Paki
Department of Computer Science
Yusuf Maitama Sule University, Kano
zspaki@yumsuk.edu.ng
Quiz 2 from the last lecture
Q1. Briefly explain the CIA triad (maximum of 6 lines)
Q2. Use monoalphabetic substitution to encipher the
text Come
“ back home with a roasted chicken”
Q3. Use monoalphabetic substitution to decipher
the text
“ep opu dpnf opz”.
Q4. Briefly explain the difference between
symmetric and Asymmetric encryption schemes
(maximum of 6 lines)

You have 20 mins

Data Management I 2
Reference book

Database Management Systems 2nd


Edition (chapters 1 through 3) by Raghu
Ramakrishnan and Johannes Gehrke
printed by McGraw-Hills Higher Education

Data Management I 3
Introduction to Database
Management System (DBMS)

Data Management I 4
What is a database?
• Database is a collection of data, typically
describing the activities of one or more related
organizations
• Example: a university database may contain
information about the following
• Entities:students, department, courses,
classrooms
• Relationships between entities: studentsregister
courses , facultyteaches courses, classroomsuse
for teaching courses

Data Management I 5
What is DBMS?

• DBMS is a software designed to assist in


maintaining and utilizing large
collections of data.
• The need for such DBMS, as well as their
use, is growing rapidly
• Examples of DBMS: MySQL, MS Access,
SQL

Data Management I 6
File System vs DBMS

• Why DBMS?
• Consider the following scenario: a company has a
large collection (say, 500 GB ) of data on employees,
1

departments, products, sales, and so on. This data


is accessed concurrently by several employees.
Questions about the data must be answered quickly,
changes made to the data by different users must
be applied consistently, and access to certain parts
of the data (e.g., salaries) must be restricted.

Data Management I 7
File System vs DBMS

• Problems with using file system to store


large dataset
• We probably do not have 500 GB of main memory
• Even if we have 500 GB of main memory, a
computer with 32-bit addressing system cannot
refer to more 4 GB of data
• We have to write special programs to answer
questions that users have about the data. These
programs are likely to be very complex due to large
volume of the data
• We need to protect inconsistent changes to the
data by the users accessing the data concurrently

Data Management I 8
File System vs DBMS

• Problems with using file system to store


large dataset
• We have to ensure that the data is returned to a
consistent state if a system crashes when changes
are being made
• Operating system only provides password security.
This is insufficient to provide the flexibility such that
different users are assigned different permissions
to access different subsets of the data

Data Management I 9
Advantages of DBMS
• Data independence
• DBMS provides the abstract view of the data that
insulates application code from the details of the
data
• Efficient data access
• DBMS provides efficient methods to store and
retrieve data efficiently
• Data integrity and security
• DBMS enforces integrity constraints on the data,
enforcing access control on different classes of
users, etc.

Data Management I 10
Advantages of DBMS
• Data administration
• With multiple user access, centralizing data
administration offers significant improvement
• Concurrent access and crash recovery
• Allows multiple users accessing the same data.
Each user has the illusion that he/she is the only
one accessing the data. DBMS also protects users
from the effects of system failures
• Reduced application development time

Data Management I 11
Describing and storing data in DBMS

• We are concerned about the real-world entities. For


examples, a university database stores data about
students, departments, and courses as well as the
relationships between these entities
• DBMS allows user to specify the high level
description of the data about entities in terms of a
data model
• Data model
• A data model is a collection of high-level data
description constructs that hides many low-level
storage details

Data Management I 12
Describing and storing data in DBMS

• Majority of the DBMS today are based on a


relational data model
• A semantic data model is a more abstract,
high-level data model that enables user to
come up with good initial description of data in
an organization
• A popular semantic data model is the Entity
Relationship (ER) model that enables us to
graphically illustrates the entities and their
relationships

Data Management I 13
The relational model
• The central thing in this model is therelation ,
which can be considered as a set of records
• A description of data in terms of a model is
called a schema
• Example
• Students(sid: string, name: string, login: string, dob:
date,gpa: real)
• Department(did: string, dname: string)
• Courses(cid: string, ctitle: string, status : string,
semester : string)
• Cgpas(sid: string, session : string, cgpa : real)

Data Management I 14
The relational model
• Instance of Students relation
sid name login dob Gpa
0001 Sani sani@css 12/02/1980 3.00
0002 Garba garba@mth 01/01/1992 2.80
0003 Sadiya sadiya@itc 10/09/2000 4.00
0004 John john@chm 04/01/1995 3.56
0005 Sabrina sabrina@phy 12/11/2001 2.50

Data Management I 15
Levels of abstractions in DBMS
• There are 3 levels of abstractions
• Conceptual schema
• External schema
• Physical schema

Data Management I 16
Conceptual schema
• Also known as logical schema
• Conceptual schema describes the stored data
in terms of data model of DBMS
• In RDBMS (Relational DBMS), the conceptual
schema describes all the relations that are
stored in the database
• These relations contain information about
entities
• Collection of entities and their associated
relationships can be described using relations

Data Management I 17
Conceptual schema
• Example of conceptual schema

Data Management I 18
Conceptual schema
• Determining the needed relations and their
fields is not always obvious
• The process of determining and arriving at a
good conceptual schema is known as logical
database design

Data Management I 19
Physical schema
• The physical schema gives additional storage
information
• It concisely specifies how the relations
described by the conceptual schema are going
to be stored on physical storage media such as
disks, tapes, and so on
• We decide on the file organization to use and
create indexes to speedup data lookup
• The process of coming up with a good physical
schema is known as physical database design

Data Management I 20
External schema
• External schemas are in terms of data models of the
DBMS
• It allows a customized data access at different levels
of users (user groups)
• Every database has just one conceptual schema and
one physical schema but may have many external
schemas each tailored towards a specific user group
• External schema consists of a collection of one or
more views and relations from the conceptual
schema
• View is just like a relation the difference being that it
is not stored

Data Management I 21
Queries in Database
• A query is a question posed on the data that
have been stored in the database
• Suppose we have a university database that
keeps data about many entities in the university
• We may wish to ask questions on the data that
the database contains

Data Management I 22
Queries in Database
• Example of queries:
• What is the name of the student with student
registration number UG18CSC0001?
• What is the average salary of professors who teach
the course with course code CSC5364?
• How many students are enrolled in course
CSC2308?
• What fraction of students in course CSC1201
received a grade better than B?
• Is there any student with a GPA less than 3.0
enrolled in course CSC2364?
• A query helps answer the above questions

Data Management I 23
Queries in Database
• DBMS provides a specialized language for
submitting queries
• This language is calledquery language
• DBMS allows users to create, modify, and query
data through a Data Manipulation Language
(DML)
• Query language is one part of the DML which
allows the insertion, deletion, and modification
of data

Data Management I 24
Transaction management
• A concurrent access to a database requires
some careful management
• E.g., airline reservation and banking databases
• In airline reservation system, multiple agents
can be making reservation at the same time
• There is need to avoid conflicts
• DBMS must order some actions that occur
concurrently
• There is the need to avoid multiple reservations of
the same seat

Data Management I 25
Transaction management
• In banking database several operations are
possible concurrently
• One application might be calculating total deposits
• Another application might be transferring some
money from one account to another
• These concurrent operations on a single account
need to be done carefully and in some logical
sequence while allowing concurrent operations
• We need to avoid anomalies (not having the global
snapshot of the account seen by all the
applications)

Data Management I 26
Transaction management
• DBMS must protect users from the effect of
system failures
• Data should be restored to a consistent state when
the system restarts after the crash
• Also the state of the application
• When the DBMS says a seat reservation is done
successfully, it should not be lost due to a system
crash
• Also, if the system crashes while an application is
making changes to the data, the partial changes
should undone when the system recovers

Data Management I 27
Transaction management
• A transaction is any one execution of the user
program in a DBMS
• Executing the same program several times will
generate several transactions
• Partial transaction is not permitted
• Group transact is in effect equivalent to some
sequential execution of the transactions

Data Management I 28
Concurrent execution of transactions
• DBMS allows concurrent accesses by multiple
users
• E.g., one application is debiting account and another
is crediting
• DBMS must ensure these operations are not
interleaved
• A user will have illusion that he/she is the only one
accessing the database
• Locking protocol is a set of rules that governs how
transactions are executed
• A lock is the technique used to control access to a
database

Data Management I 29
Concurrent execution of transactions
• Operations that could potentially lead to
inconsistency need to obtain lock on the data object
• DBMS provides 2 types of lock:shared andexclusive
• Shared lock can be obtained by multiple transactions
while exclusive lock can only be held by one transaction
• A locking protocol may be as follows:any transaction
that wants to read a data object will first need to obtain a
shared lock on that object and if the transaction wants to
modify the data will first obtain an exclusive lock on the
data
• The lock is released when a transaction completes
successfully

Data Management I 30
Concurrent execution of transactions
• Consider two transactionsT 1 andT 2 such that
T 1 wants to modify a data object andT 2 wants
T 1's
to read the same object. Intuitively, if
request for an exclusive lock on the object is
T 2 cannot proceed until
granted first, T 1
T
releases this lock, because 2's request for a
shared lock will not be granted by the DBMS
T of 1's actions will be
until then. Thus, all
completed before any T of 2's actions are
initiated

Data Management I 31
Incomplete Transactions and System
Crashes
• Transactions can be interrupted before running to
completion as a result of various reasons like
system crash
• So what will happen to changes made by
incomplete transactions?
• DBMS ensures that changes made by incomplete
transactions are removed from the database
• For example, if the DBMS is in the middle of
transferring money from account A to account B, and
has debited the first account but not yet credited the
second when the crash occurs, the money debited
from account A must be restored when the system
comes back up after the crash

Data Management I 32
Incomplete Transactions and System
Crashes
• How does DBMS undo the changes of incomplete
transaction?
• It maintains a log of all writes to the database
• All writes are first recorded in the log file (on disk)
before effecting it in the database, this is called
Write Ahead Log (WAL)
• Logs make it possible for the changes made by the
successfully completed transactions not to be lost
due to system crashes
• The time to recover from crash can be reduced by
periodically writing some information to the disk
• This is called checkpointing

Data Management I 33
Structure of a DBMS

Data Management I 34
People who deal with databases

• A variety of people are involved in the creation


and use of databases
• Database implementors who build the DBMS
software They work for vendors like IBM and Oracle
• End users who use the database
• Besides database implementors and end users,
other classes of people associated with a
DBMS are:
• Application programmers
• Database administrators (DBAs)

Data Management I 35
Database application programmers

• Develop application packages that facilitate


data access by end users
• They use the tools that DBMS vendors provide
such as report writers, spreadsheets, statistical
packages, etc.
• Application programs support data access
through the external schemas (ideally)

Data Management I 36
Database Administrator (DBA)

• Design of the conceptual and physical schemas by


interacting with the users
• Security and authorization: DBA disallows
unauthorized access to the database
• Users are granted permission to access only certain
portion of the data
• Data availability and recovery from failures:
• Users should continue to access data in the event of crash
• Periodic backups
• Database tuning:
• Modifying the database to meetup with the changes in
users requirements (modifying the conceptual and physical
schemas)

Data Management I 37
Entity-Relationship Model

Data Management I 38
Overview of Database Design

• Conceptual design: (ER Model is used at this


stage)
• What are the entities and relationships in the
enterprise?
• What information about these entities and
relationships should we store in the database?
• What are the integrity constraints or business rules
that hold?
• A database schema in the ER Model can be
represented pictorially (ER diagrams)
• Can map an ER diagram into a relational schema

Data Management I 39
ER Model Basics

• Entity : Real-world object distinguishable from


other objects. An entity is described (in DB)
using a set of attributes
• Entity Set : A collection of similar entities E.g.,
all employees, all students, all professors
• All entities in an entity set have the same set of
attributes
• Each entity set has a key
• Each attribute has a domain

Data Management I 40
ER Model Basics

Data Management I 41
ER Model Basics

• Relationship : Association among two or more


entities E.g., Abubakar works in Pharmacy
department.
• Relationship Set : Collection of similar
relationships.
• An n-ary relationship set R relates n entity sets E1 ...
En; each relationship in R involves entities e1 E1, ...
, en En
• Same entity set could participate in different relationship
sets, or in different “roles” in same set

Data Management I 42
ER Model Basics

Data Management I 43
Key Constraints

• ConsiderWorks_In : An employee can work in


many departments; a dept can have many
employees.
• In contrast, each dept has at most one manager,
according to thekey constraint on Manages

Data Management I 44
Key Constraints

Data Management I 45
Participation Constraints

• Does every department have a manager?


• If so, this is aparticipation constraint : the
participation of Departments in Manages is
said to be total (vs. partial)
• Everydid value in Departments table
must appear in a row of theManages
table (with a non-null ssn value!)

Data Management I 46
Participation Constraints

Data Management I 47
Weak Entities

• A weak entity can be identified uniquely only by


considering theprimary key of another (owner)
entity
• Owner entity set and weak entity set must
participate in a one-to-many relationship set (one
owner, many weak entities)
• Weak entity set must have total participation in this
identifying relationship set.

Data Management I 48
Weak Entities

Data Management I 49
The Relational Model

Data Management I 50
Why Study the Relational Model?

• Most widely used model


• Vendors: IBM, Informix, Microsoft, Oracle, Sybase,
etc.
• “Legacy systems” in older models
• E.g., IBM’s IMS
• Recent competitor: object-oriented model
• ObjectStore, Versant, Ontos
• A synthesis emerging: object-relational model
• Informix Universal Server, UniSQL, O2, Oracle, DB2

Data Management I 51
Relational Database: Definitions
• Relational database : a set of relations
• Relation : made up of 2 parts:
• Instance : a table, with rows and columns.
#Rows = cardinality, #fields = degree / arity.
• Schema : specifies name of relation, plus name and
type of each column.
• E.g. Students(sid: string, name: string, login: string, age:
integer, gpa: real).
• Can think of a relation as a set of rows or tuples
(i.e., all rows are distinct).

Data Management I 52
Example Instance of Students
Relation

• Cardinality = 3, degree = 5, all rows are distinct


• Do all columns in a relation instance have to be
distinct?

Data Management I 53
Relational Query Languages
• A major strength of the relational model:
supports simple, powerful querying of data
• Queries can be written intuitively, and the DBMS
is responsible for efficient evaluation.
• The key: precise semantics for relational queries.
• Allows the optimizer to extensively re-order
operations, and still ensure that the answer does
not change.

Data Management I 54
The SQL Query Language
• Developed by IBM (system R) in the 1970s
• Need for a standard since it is used by many
vendors
• Standards:
• SQL-86
• SQL-89 (minor revision)
• SQL-92 (major revision)
• SQL-99 (major extensions, current standard)

Data Management I 55
The SQL Query Language
• To find all 18 year old students, we can write:
• SELECT * FROM Students S WHERE S.age =18

• To find just names and logins, replace the first


line:
• SELECT S.name, S.login

Data Management I 56
Querying Multiple Relations
• What does the following query compute?
• SELECT S.name, E.cid FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = “A”

Data Management I 57
Creating Relations in SQL
• Creates the Students relation. Observe that the
type (domain) of each field is specified, and
enforced by the DBMS whenever tuples are
added or modified
• CREATE TABLE Students (sid: CHAR(20), name:
CHAR(20),
login: CHAR(10), age: INTEGER, gpa: REAL)

Data Management I 58
Creating Relations in SQL
• As another example, theEnrolled table holds
information about courses that students take
• CREATE TABLE Enrolled (sid: CHAR(20), cid:
CHAR(20), grade: CHAR(2))

Data Management I 59
Destroying and Altering Relations
DROP TABLE Students
• Destroys the relation Students. The schema
informationand the tuples are deleted
ALTER TABLE Students ADD COLUMN firstYear:
integer
• The schema of Students is altered by adding a
new field; every tuple in the current instance is
extended with anull value in the new field

Data Management I 60
Adding and Deleting Tuples
• Can insert a single tuple using:
• INSERT INTO Students (sid, name, login, age, gpa)
VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2)
• Can delete all tuples satisfying some condition (e.g.,
name = Smith):
• DELETE FROM Students S
WHERE S.name = ‘Smith’

Data Management I 61
Integrity Constraints (ICs)
• IC: condition that must be true for any instance
of the database; e.g.,domain constraints
• ICs are specified when schema is defined
• ICs are checked when relations are modified
• Alegal instance of a relation is one that
satisfies all specified ICs.
• DBMS should not allow illegal instances.
• If the DBMS checks ICs, stored data is more
faithful to real-world meaning.
• Avoids data entry errors, too!

Data Management I 62
Primary Key Constraints
• A set of fields is a key for a relation if :
1. No two distinct tuples can have same
values in all key fields, and
2. This is not true for any subset of the key
• Part 2 false? A superkey.
• If there’s >1 key for a relation, one of the
keys is chosen (by DBA) to be the primary
key
• E.g., sid is a key for Students. (What about
name?) The set {sid, gpa} is a superkey

Data Management I 63
Primary and Candidate Keys in
SQL
• Possibly many candidate keys (specified using
UNIQUE), one of which is chosen as the
primary key
• “For a given student and course, there is a
single grade.” vs. “Students can take only one
course, and receive a single grade for that
course; further, no two students in a course
receive the same grade.”
• CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid) )

Data Management I 64
Primary and Candidate Keys in
SQL
• Used carelessly, an IC can prevent the storage
of database instances that arise in practice
• CREATE TABLE Enrolled (sid CHAR(20) cid
CHAR(20),grade CHAR(2), PRIMARY KEY (sid),
UNIQUE (cid, grade) )

Data Management I 65
Foreign Keys, Referential
Integrity
• Foreign key: Set of fields in one relation that is used
to `refer’ to a tuple in another relation. (Must
correspond to primary key of the second relation.)
Like a ‘logical pointer’
• E.g. sid is a foreign key referring to Students:
• Enrolled(sid: string, cid: string, grade: string)
• If all foreign key constraints are enforced, referential
integrity is achieved, i.e., no dangling references
• Can you name a data model w/o referential
integrity?
• Links in HTML!

Data Management I 66
Foreign Keys in SQL
• Only students listed in the Students relation
should be allowed to enroll for courses
• CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid),
FOREIGN KEY (sid) REFERENCES Students )

Data Management I 67
Enforcing Referential Integrity
• ConsiderStudents andEnrolled ; sid in Enrolled is a
foreign key that references Students
• What should be done if an Enrolled tuple with a non-
existent student id is inserted? (Reject it!)
• What should be done if a Students tuple is deleted?
• Also delete all Enrolled tuples that refer to it
• Disallow deletion of a Students tuple that is referred to
• Set sid in Enrolled tuples that refer to it to a default sid
• (In SQL, also: Set sid in Enrolled tuples that refer to it to a
special value null, denoting `unknown’ or ‘inapplicable’.)
• Similar if primary key of Students tuple is updated

Data Management I 68
Referential Integrity in SQL
• SQL/92 and SQL:1999 support all 4 options on
deletes and updates.
• Default is NO ACTION (delete/update is rejected)
• CASCADE (also delete all tuples that refer to the
deleted tuple)
• SET NULL / SET DEFAULT (sets foreign key value of
referencing tuple)
• CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2), PRIMARY KEY (sid,
cid), FOREIGN KEY (sid) REFERENCES
Students ON DELETE CASCADE ON UPDATE
SET DEFAULT )
Data Management I 69
Logical DB Design: ER to
Relational
• Entity sets to tables:

CREATE TABLE
Employees (ssn
CHAR(11), name
CHAR(20), lot
INTEGER, PRIMARY
KEY (ssn))

Data Management I 70
Relationship Sets to Tables
• In translating a relationship set to a relation,
attributes of the relation must include:
• Keys for each participating entity set (as foreign
keys)
• This set of attributes forms a superkey for the relation.
• All descriptive attributes
• CREATE TABLE Works_In( ssn CHAR(1), did
INTEGER, since DATE, PRIMARY KEY (ssn,
did), FOREIGN KEY (ssn) REFERENCES
Employees, FOREIGN KEY (did) REFERENCES
Departments)

Data Management I 71
Translating ER Diagrams with Key
Constraints

Data Management I 72

You might also like