[go: up one dir, main page]

0% found this document useful (0 votes)
26 views60 pages

Database Chapter 1 CS307

Uploaded by

elbana795
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
26 views60 pages

Database Chapter 1 CS307

Uploaded by

elbana795
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 60

Database Systems (CIS 340)

BY: Hatem Moharram


Course Outline

Lectures are day Tuesday 12:00-2:00 PM,


Lab. day 2,4 2-4 PM

will be given by Dr. Hatem Moharram


(email: hatem@sci.cu.edu.eg).

Labs will start in the first week:


Units of Credit

• 4 Units of Credit.

• Along with lectures and the exam, the


assignments and projects make up a
significant part of the course load.
Assessment

• Type Worth Due Date


• Mid term exam 10% week 8
• Home works 15%
• Lab assignment and exams 15%
• Final exam (closed book) 60%
REQUIRED COURSE MATERIALS AND
READINGS

• "Database Management Systems" second edition,


Raghu Ramakrishnan, Johannes Gehrke.
CLASS SCHEDULE
Week Lecture topic Lab
Week 1 Introduction to database systems Tutorial: database systems
Week 2 Transaction Management and security Tutorial: SQL, Relation
creation
Week 3 THE ENTITY-RELATIONSHIP MODE Tutorial: ER model
Week 4 THE ENTITY-RELATIONSHIP MODE Tutorial: ER model
Week 5 Conceptual Database Design With the ER Tutorial: student ER projects
Model
Week 6 THE RELATIONAL MODEL Tutorial: student ER projects
Week 7 THE RELATIONAL MODEL Tutorial: SQL
Week8 Midterm exam
Week 9 ER to Relational Tutorial: SQL
Week 10 ER to Relational Tutorial: SQL
Week 11 Querying Relational Data Tutorial: SQL
Week 12 RELATIONAL ALGEBRA AND CALCULUS Practice: relational algebra
and calculus.
Week 13 RELATIONAL ALGEBRA AND CALCULUS Practice: relational algebra
and calculus.
Week 14 RELATIONAL ALGEBRA AND CALCULUS Practice: relational algebra
and calculus.
Week 15 revision revision
CHAPTER 1
OVERVIEW OF DATABASE SYSTEMS

- What is a DBMS, in particular, a relational DBMS?


- Why should we consider a DBMS to manage data?
- How is application data represented in a DBMS?
- How is data in a DBMS retrieved and manipulated?
- How does a DBMS support concurrent access and
protect data during system failures?
- What are the main components of a DBMS?
- Who is involved with databases in real life?
Database

A database is a collection of data, typically describing


the activities of one or more related organizations.

Ex.: a university database might contain information


about the following:
• Entities such as students, faculty, courses, and
classrooms.

• Relationships between entities, such as students'


enrollment in courses, faculty teaching courses, and
the use of rooms for courses.
database management system (DBMS)

• A database management system, or DBMS, is


software designed to assist in maintaining
and utilizing large collections of data.

• The alternative to using a DBMS is to store


the data in files and write application-specific
code to manage it. The use of a DBMS has
several important advantages.
FILE SYSTEMS VERSUS A DBMS
To understand the need for a DBMS:
A company has a large collection (say, 500 GB ) of data on
employees, departments, products, sales, and so on. This data is
accessed concurrently by several employees. Questions about the
data must be answered quickly, changes made to the data by
different users must be applied consistently, and access to certain
parts of the data (e.g., salaries) must be restricted.

storing it in operating system files has many drawbacks, including


the following:

• We probably do not have 500 GB of main memory to hold all the


data. We must therefore store data in a storage device such as a
disk or tape and bring relevant parts into main memory for
processing as needed.
•Even if we have 500 GB of main memory, on computer
systems with 32-bit addressing, we cannot refer directly
to more than about 4 GB of data. We have to program
some method of identifying all data items.

• We have to write special programs to answer each


question a user may want to ask about the data. These
programs are likely to be complex because of the large
volume of data to be searched.

• We must protect the data from inconsistent changes


made by different users accessing the data concurrently. If
applications must address the details of such concurrent
access, this adds greatly to their complexity.
• We must ensure that data is restored to a consistent
state if the system crashes while changes are being made.

• Operating systems provide only a password mechanism


for security. This is not sufficiently flexible to enforce
security policies in which different users have permission
to access different subsets of the data.
ADVANTAGES OF A DBMS
Using a DBMS to manage data has many advantages:

• Data Independence: Application programs should not, be


exposed to details of data representation and storage, The
DBMS provides an abstract view of the data that hides
such details.

• Efficient Data Access: A DBMS utilizes a variety of


sophisticated techniques to store and retrieve data
efficiently. This feature is especially important if the data
is stored on external storage devices.
•Data Integrity and Security: If data is always accessed
through the DBMS, the DBMS can enforce integrity
constraints.
For example, before inserting salary information for
an employee, the DBMS can check that the department
budget is not exceeded.
Also, it can enforce access controls that govern what
data is visible to different classes of users.
•Data Administration:
- offer significant improvements for data sharing.
- responsible for organizing the data representation to
minimize redundancy.
- fine-tuning the storage of the data to make retrieval
efficient.
•Concurrent Access and Crash Recovery: A DBMS
schedules concurrent accesses to the data in such a
manner that users can think of the data as being accessed
by only one user at a time. Further, the DBMS protects
users from the effects of system failures.

•Reduced Application Development Time: Clearly, the


DBMS supports important functions that are common to
many applications accessing data in the DBMS. This, in
conjunction with the high-level interface to the data,
facilitates quick application development. DBMS
applications are also likely to be more robust than similar
stand-alone applications because many important tasks
are handled by the DBMS (and do not have to be
debugged and tested in the application).
DESCRIBING AND STORING DATA IN A DBMS
The user of a DBMS is concerned with some real-world
enterprise, and the data to be stored describes various
aspects of this enterprise.

Example (university database ):


- Students - faculty - courses
the data in a university database describes these entities
and their relationships.
A data model is a collection of high-level data description
constructs that hide many low-level storage details.

A DBMS allows a user to define the data to be stored in


terms of a data model. Most database management systems
today are based on the relational data model.

sid name Login age gpa


5678 Fady Fady@Kfu 19 3.5
5679 Shady Shady@kfu 20 3.8
5680 Fatma Fatma@kfu 19 2.7
A semantic data model is a more abstract, high-level data
model that makes it easier for a user to come up with a
good initial description of the data in an enterprise.
These models contain a wide variety of constructs that
help describe a real application scenario.

A DBMS is not intended to support all these constructs


directly; it is typically built around a data model with just a
few basic constructs, such as the relational model.
A widely used semantic data model called the entity-
relationship (ER) model allows us to pictorially denote
entities and the relationships among them.

since
ssn name lot did dname budget

Employees Works_In Departments

Locations
capacity address
The Relational Model

The central data description construct in this model is a


relation, which can be thought of as a set of records. A
description of data in terms of a data model is called a
schema.

sid name Login age gpa


5678 Fady Fady@Kfu 19 3.5
5679 Shady Shady@kfu 20 3.8
5680 Fatma Fatma@kfu 19 2.7
The schema for a relation specifies:
1- its name,
2- the name of each field (or attribute or column),
3- the type of each field.

Ex.: student information in a university database may be


stored in a relation with the following schema:

Students( sid: string, name: string, login: string,


age: integer, gpa: real)
Students( sid: string, name: string, login: string, age: integer, gpa: real)

An example instance of the Students relation.

sid name Login age gpa


5678 Fady Fady@Kfu 19 3.5
5679 Shady Shady@kfu 20 3.8
5680 Fatma Fatma@kfu 19 2.7

Every row follows the schema of the Students relation. The


schema call therefore be regarded as a template for
describing a student.
we can make the description of a collection of students
more precise by specifying integrity constraints,
which are conditions that the records in a
relation must satisfy.

for example, we could specify that every student has a


unique sid value.

The expressiveness of the constructs available for


specifying integrity constraints is an important aspect of
a data model.
Other Data Models

the relational data model Includes:


IBM's DB2 Informix Oracle
Sybase Microsoft's Access FoxBase,

Another data models include the hierarchical model


IBM's IMS DBMS

the network model


IDS IDMS
the object-oriented model
Objectstore Versant

the object-relational model


DBMS products from IBM Informix
ObjectStore Oracle Versant

the object-relational model is an effort to combine the


best features of the relational and object-oriented
models, and a good grasp of the relational model is
necessary to understand object-relational concepts.
Levels of Abstraction in a DBMS
The data in a DBMS is described at three levels of
abstraction. The database description consists of a
schema at each of these three levels of abstraction:
conceptual level physical level external
level
External Schema 1 External Schema 2 External Schema 3

Conceptual Schema

Physical Schema

Disk
•A data definition language (DDL) is used to define the
external and conceptual schemas.
- the most widely used database language is SQL.

•All DBMS vendors also support SQL commands to describe


aspects of the physical schema, but these commands are not
part of the SQL language standard.

•Information about the conceptual, external, and physical


schemas is stored in the system catalogs.
Conceptual Schema
sometimes called the logical schema, describes the stored
data in terms of the data model of the DBMS.

In a relational DBMS, the conceptual schema describes all


relations that are stored in the database.

In the university database, these relations contain


information about
1- entities, such as students and faculty, and about
2- relationships, such as students‘ enrollment in
courses.
since
sid name login cid cname credits

Students Enrolled Courses

Rooms Teaches
rno Meets_In
fid
address Faculty
fname fid
All student entities can be described using records in a
Students relation, as we saw earlier.

In fact, each collection of entities and each collection of


relationships can be described as a relation, leading to the
following conceptual schema:

Students (sid: string, name: string, login: string, age:


integer, gpa: real)
Faculty(fid: string, fname: string, sal: real)
Courses( cid: string, cname: string, credits: integer)
Rooms(rno: integer, address: string, capacity: integer)
Enrolled (sid: string, cid: string, grade: string)
Teaches(fid: string, cid: string)
Meets_In( cid: string, rno: integer, time: string)
Physical Schema

- specifies additional storage details.

-summarizes how the relations described in the conceptual


schema are actually stored on secondary storage devices
such as disks and tapes.

- We must decide what file organizations to use to store the


relations and create auxiliary data structures, called indexes,
to speed up data retrieval operations.
A sample physical schema for the university database
follows:
• Store all relations as unsorted files of records. (A file in a
DBMS is either a collection of records or a collection of
pages, rather than a string of characters as in an operating
system.)
• Create indexes on the first column of the Students,
Faculty, and Courses relations, the sal column of Faculty,
and the capacity column of Rooms.
Decisions about the physical schema are based on an
understanding of how the data is typically accessed. The
process of arriving at a good physical schema is called
physical database design.
External Schema

-Allow data access to be customized and authorized at the


level of individual users or groups of users.

-Any given database has exactly one conceptual


schema and one physical schema,

-but it may have several external schemas, each


tailored to a particular group of users.
-Each external schema consists of a collection of one or
more views and relations from the conceptual schema.

-A view is conceptually a relation, but the


records in a view are not stored in the DBMS.
Rather, they are computed using a definition
for the view, in terms of relations stored in
the DBMS.
- The external schema design is guided by end user
requirements.

- example: we might want to allow students to find out


the names of faculty members teaching courses as well as
course enrollments. This can be done by defining the
following view:

Courseinfo( cid: string, fname: string, enrollment: integer)

-A user can treat a view just like a relation and ask


questions about the records in the view.

-the records in the view are not stored explicitly, they are
computed as needed.
Data Independence

-application programs are insulated from changes in the


way the data is structured and stored.

-the conceptual schema and the external schema provide


distinct benefits in this area.

-If the conceptual schema is changed, the definition of a


view relation can be modified so that the same relation is
computed as before.
Example: suppose that the Faculty relation is replaced by
the following two relations:
Faculty(fid: string, fname: string, sal: real)

Faculty_public (fid: string, fname: string, office: integer)


Faculty_private (fid: string, sal: real)

-The Courseinfo view relation:

Courseinfo( cid: string, fname: string, enrollment: integer)

can be redefined in terms of Faculty_public and


Faculty_private, so that a user who queries Courseinfo will
get the same answers as before.
- logical data independence: users are shielded from
changes in the logical structure of the data, or changes in
the choice of relations to be stored.

- physical data independence: the conceptual schema


insulates users from changes in physical storage details.

- The conceptual schema hides details such as how the


data is actually laid out on disk, the file structure, and the
choice of indexes.
QUERIES IN A DBMS
Relational database systems allow a rich class of questions
to be posed easily.

The following are some questions a user might ask:

1. What is the name of the student with student ID


1234567
2. What is the average salary of professors who teach course
CS5647
3. How many students are enrolled in CS5647
4. What fraction of students in CS564 received a grade
better than B7
5. Is any student with a CPA less than 3.0 enrolled in CS5647
- Such questions involving the data stored in a DBMS are
called queries.

-A DBMS provides a specialized language, called the query


language, in which queries can be posed.

-Relational calculus is a formal query language based on


mathematical logic, and queries in this language have an
intuitive, precise meaning.

-Relational algebra is another formal query language,


based on a collection of operators for manipulating
relations, which is equivalent in power to the calculus.
-A DBMS enables users to create, modify, and query data
through a data manipulation language (DML).

-the query language is only one part of the DML, which


also provides constructs to insert, delete, and modify data.

- The DML and DDL are collectively referred to as the data


sublanguage when embedded within a host language
(e.g., C or COBOL).
TRANSACTION MANAGEMENT

- A transaction is anyone execution of a user program in a


DBMS.

- Executing the same program several times will generate


several transactions.

- The transaction is the basic unit of change as seen by the


DBMS.

- Partial transactions are not allowed.


airline reservations database
At any given instant: travel agents are looking up
information about available seats on various flights and
making new seat reservations.
When several users access (and possibly modify) a
database concurrently, the DBMS must order their
requests carefully to avoid conflicts.
when one travel agent looks up Flight 100 and finds
an empty seat, another travel agent may simultaneously
be making a reservation for that seat, thereby making the
information seen by the first agent obsolete.
if a travel agent asks for a reservation, and the DBMS
responds saying that the reservation has been made, the
reservation should not be lost if the system crashes.

if the DBMS has not yet responded to the request, but is


making the necessary changes to the data when the crash
occurs, the partial changes should be undone when the
system comes back up.
Ex.: bank's database

While one user's application program is computing the


total deposits, another application may transfer money
from an account that the first application has just 'seen' to
an account that has not yet been seen, thereby causing
the total to appear larger than it should be.

disallowing concurrent access can degrade performance.

the DBMS must protect users from the effects of system


failures by ensuring that all data (and the status of active
applications) is restored to a consistent state when the
system is restarted after a crash.
Concurrent Execution of Transactions
An important task of a DBMS is to schedule concurrent
accesses to data so that each user can safely ignore the fact
that others are accessing the data concurrently.

A DBMS allows users to think of their programs as if they


were executing in isolation, one after the other in some
order chosen by the DBMS.

A locking protocol is a set of rules to be followed by each


transaction (and enforced by the DBMS) to ensure that,
even with interleaving the actions of several transactions,
the net effect is identical to executing all transactions in
some serial order.
A lock is a mechanism used to control access to database
objects.

Two kinds of locks are commonly supported by a DBMS:

shared lock and exclusive lock.


- shared locks on an object can be held by two different
transactions at the same time.

- exclusive lock on an object ensures that no other


transactions hold any lock on this object.
-example for a locking protocol:

1- Every transaction begins by obtaining a shared


lock on each data object that it needs to read.

2- Every transaction obtains an exclusive lock on


each data object that it needs to modify.

3- Every transaction releases all its locks after


completing all actions.
Consider two transactions T1 and T2 such that:
T1 wants to modify a data object and
T2 wants to read the same object.

- if T1's request for an exclusive lock on the object is


granted first, T2 cannot proceed until T1 releases this
lock, because T2's request for a shared lock will not be
granted by the DBMS until then. Thus, all of T1's actions
will be completed before any of T2's actions are initiated.
Incomplete Transactions and System Crashes

Transactions can be interrupted before running to


completion for a variety of reasons, e.g., a system crash.

A DBMS must ensure that the changes made by such


incomplete transactions are removed from the database.

For example, if the DBMS is in the middle of transferring


money from account A to account B and has debited the
first account but not yet credited the second when the
crash occurs, the money debited from account A must be
restored when the system comes back up after the crash.
the DBMS maintains a log of all writes to the database.

each write action must be recorded in the log (on disk)


before the corresponding change is reflected in the
database itself--otherwise, if the system crashes just after
making the change in the database but before the change
is recorded in the log, the DBMS would be unable to detect
and undo this change. This property is called Write-Ahead
Log, or WAL.
To ensure this property, the DBMS must be able to
selectively force a page in memory to disk.
The log is also used to ensure that the changes made by a
successfully completed transaction are not lost due to a
system crash.
The process of ensuring that the DB is in a consistent state
after system crash is a slow process.

After a system crash:


-The DBMS ensure that the effects of all transactions that
completed prior to the crash are restored, and

-The DBMS must ensure that the effects of incomplete


transactions are undone.

The time required to recover from a crash can be reduced


by periodically forcing some information to disk; this
periodic operation is called a checkpoint.
STRUCTURE OF A DBMS
query optimizer: uses
information about how the
data is stored to produce an
efficient execution plan for
evaluating the query.

An execution plan: is a blueprint for evaluating a query,


usually represented as a tree of relational operators
Relational operators: serve as the building blocks for
evaluating queries posed against the data.

the file and access


methods: This
layer supports the
concept of a file,
which is a
collection of pages
or a collection of
records,

and keeps track of the pages in a file. This layer


organizes the information within a page.
the buffer manager: brings pages in
from disk to main memory as
needed in response to read requests.
the disk space manager: deals with management of space
on disk, where the data is stored. Higher layers allocate,
deallocate, read, and write pages through (routines
provided by) this layer.
concurrency control : ensures that transactions request and
release locks according to a suitable locking protocol and
schedules the execution transactions.
recovery manager: is responsible for maintaining a log and
restoring the system to a consistent state after a crash.
PEOPLE WHO WORK WITH DATABASES
database implementors: who build DBMS software.

end users : simply use applications written by database


application programmers and so require little technical
knowledge about DBMS software.
Database application programmers: develop packages
that facilitate data access for end users, who are usually
not computer professionals, using the host or data
languages and software tools that DBMS vendors provide.
Application programs should ideally access data through
the external schema.
database administrator (DBA): designs and maintains the
database
The DBA is responsible for many critical tasks:
-Design of the Conceptual and Physical Schemas: the DBA
must design the conceptual schema (decide what relations
to store) and the physical schema (decide how to store
them).
- Security and Authorization: The DBA is responsible for
ensuring that unauthorized data access is not permitted.
- Data Availability and Recovery from Failures: The DBA
must take steps to ensure that if the system fails, users can
continue to access as much of the uncorrupted data as
possible. The DBA must also work to restore the data to a
consistent state.
- Database Tuning: The DBA is responsible for modifying
the database, in particular the conceptual and physical
schemas, to ensure adequate performance as
requirements change.

You might also like