Fundamentals of Database Systems

The document discusses the fundamentals of database systems, emphasizing their importance for efficient data management and decision-making in organizations. It outlines various chapters covering topics such as database development processes, types of databases, database modeling, and the role of big data. The book aims to provide a comprehensive understanding of database management to prepare readers for success in a data-driven world.


Fundamentals of Database Systems

Kaitlyn Salter

Toronto Academic Press


224 Shoreacres Road
Burlington, ON L7L 2H2
Canada
www.tap-books.com
Email: orders@arclereducation.com

© 2024
ISBN: 978-1-77956-170-1 (e-book)

This book contains information obtained from highly regarded resources. Reprinted material sources are indicated
and copyright remains with the original owners. Copyright for images and other graphics remains with the original
owners as indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable
data. Authors or Editors or Publishers are not responsible for the accuracy of the information in the published
chapters or consequences of their use. The publisher assumes no responsibility for any damage or grievance to the
persons or property arising out of the use of any materials, instructions, methods or thoughts in the book. The
authors or editors and the publisher have attempted to trace the copyright holders of all material reproduced in this
publication and apologize to copyright holders if permission has not been obtained. If any copyright holder has not
been acknowledged, please write to us so we may rectify.

Notice: Registered trademarks of products or corporate names are used only for explanation and identification
without intent of infringement.

© 2024 Toronto Academic Press


ISBN: 978-1-77469-758-0

Toronto Academic Press publishes a wide variety of books and eBooks. For more information about Toronto
Academic Press and its products, visit our website at www.tap-books.com.
ABOUT THE AUTHOR

Kaitlyn Salter is an accomplished marketing professional with over a decade of experience in the
industry. She currently serves as the Director of Marketing at a leading digital agency, where she
oversees the development and execution of marketing strategies for a diverse range of clients. Kaitlyn’s
expertise spans a wide range of marketing disciplines, including digital marketing, branding, social
media, content marketing, and advertising. Her extensive knowledge of industry trends and consumer
behavior allows her to develop effective campaigns that drive engagement, conversions, and ROI.
Prior to her current role, Kaitlyn worked as a Marketing Manager at several leading companies, where
she honed her skills in brand development, campaign management, and customer acquisition. She
has also worked as a freelance consultant, helping businesses of all sizes to create and execute
effective marketing strategies. Kaitlyn holds a Bachelor of Science in Marketing from the University
of California, Los Angeles (UCLA), where she graduated with honors. She is a member of several
professional organizations, including the American Marketing Association (AMA) and the Digital
Marketing Association (DMA).
Contents
Preface
List of Figures
List of Tables
List of Abbreviations

Chapter 1. Introduction to Database Systems
Unit Introduction
1.1. Basics of Database Systems
1.1.1. Data
1.1.2. Database
1.2. History of Database Systems
1.3. Importance of Database Systems
1.3.1. Data Integration and Data Quality
1.3.2. Decision Making
1.3.3. Efficient Data Management
1.3.4. Cost Savings
1.3.5. Scalability
1.3.6. Regulatory Compliance
1.3.7. Improved Customer Experience
1.4. Components of Database Systems
1.4.1. Data Definition Language (DDL)
1.4.2. Data Manipulation Language (DML)
1.4.3. Data Query Language (DQL)
1.4.4. Data Control Language (DCL)
1.4.5. Database Management System (DBMS)
1.5. Data Storage and Retrieval
1.5.1. Primary Storage
1.5.2. Secondary Storage
1.5.3. Indexing
1.5.4. Query Optimization
1.5.5. Query Execution
1.6. Characteristics of Database Systems
1.6.1. Data Independence
1.6.2. Concurrent Access
1.6.3. Data Integrity
1.6.4. Security
1.6.5. Scalability
1.7. Architecture of Database Systems
1.7.1. Client-Server Architecture
1.7.2. Tiered Architecture
1.7.3. Distributed Architecture
1.8. Query Processing and Optimization
1.8.1. Query Processing Phases
1.8.2. Query Optimization Techniques
1.8.3. Query Execution Plans
1.9. Importance of Database Systems in Modern World
1.9.1. Data Management
1.9.2. Decision Making
1.9.3. Improved Efficiency
1.9.4. Cost Savings
1.9.5. Data Security
1.9.6. Compliance
1.9.7. Innovation
1.10. Summary
Review Questions
Multiple Choice Questions
References

Chapter 2. Database Development Process
Unit Introduction
2.1. Development Life Cycle – Waterfall
2.2. Database Life Cycle
2.3. Requirements Gathering
2.4. Analysis
2.5. Logical Design
2.6. Implementation
2.7. Realizing the Design
2.8. Populating the Database
2.9. Guiding Principles for the Development of an ER Diagram
2.10. Summary
Review Questions
Multiple Choice Questions
References

Chapter 3. Types of Databases
Unit Introduction
3.1. Hierarchical Databases
3.1.1. History of Hierarchical Databases
3.1.2. Importance of Hierarchical Databases
3.1.3. Applications of Hierarchical Databases
3.1.4. Advantages of Hierarchical Databases
3.1.5. Disadvantages of Hierarchical Databases
3.2. Network Databases
3.2.1. History of Network Databases
3.2.2. Importance of Network Databases
3.2.3. Applications of Network Databases
3.2.4. Advantages of Network Databases
3.2.5. Disadvantages of Network Databases
3.3. Relational Databases
3.3.1. History of Relational Databases
3.3.2. Importance of Relational Databases
3.3.3. Applications of Relational Databases
3.3.4. Advantages of Relational Database
3.3.5. Disadvantages of Relational Database
3.4. Object-Oriented Databases (OODBs)
3.4.1. History of Object-Oriented Databases (OODBs)
3.4.2. Importance of Object-Oriented Databases (OODBs)
3.4.3. Applications of Object-Oriented Databases (OODBs)
3.4.4. Advantages of Object-Oriented Databases (OODBs)
3.4.5. Disadvantages of Object-Oriented Databases (OODBs)
3.5. Graph Databases
3.5.1. History of Graph Databases
3.5.2. Importance of Graph Databases
3.5.3. Applications of Graph Databases
3.5.4. Advantages of Graph Databases
3.5.5. Disadvantages of Graph Databases
3.6. NoSQL Databases
3.6.1. History of NoSQL Databases
3.6.2. Importance of NoSQL Databases
3.6.3. Applications of NoSQL Databases
3.6.4. Advantages of NoSQL Databases
3.6.5. Disadvantages of NoSQL Databases
3.7. Document Databases
3.7.1. History Document Databases
3.7.2. Importance Document Databases
3.7.3. Applications of Document Databases
3.7.4. Advantages of Document Databases
3.7.5. Disadvantages of Document Databases
3.8. Summary
Review Questions
Multiple Choice Questions
References

Chapter 4. Database Modeling
Unit Introduction
4.1. Overview of Data Modeling
4.1.1. Methodology
4.1.2. Data Modeling in the Setting of Database Design
4.1.3. Constituents of a Data Model
4.1.4. Significance of Data Modeling
4.2. The Entity-Relationship (ER) Model
4.2.1. Basic Concepts of E-R Modeling
4.2.2. Entities
4.2.3. Special Entity Types
4.3. Database Design is a Part of Data Modelling
4.3.1. Requirements Analysis
4.3.2. Phases in Building the Data Model
4.4. Classifying Data Objects and Relationships
4.4.1. Entities
4.4.2. Attributes
4.4.3. Validating Attributes
4.5. Derived Attributes and Code Values
4.5.1. Relationships
4.5.2. Naming Data Objects
4.5.3. Object Definition
4.5.4. Recording Information’s in Designing Document
4.5.5. Recording Information in Designing Document
4.6. Developing the Basic Schema
4.6.1. Binary Relationships
4.6.2. One-To-One
4.6.3. One-To-Many
4.6.4. Many-To-Many
4.6.5. Recursive Relations
4.7. Refining – The Entity-Relationships (ERs) Diagrams
4.7.1. Entities Participation in Relationships
4.7.2. Resolve Many-To-Many Relationships
4.7.3. Transform Complex Relations into Binary Relationships
4.7.4. Eliminate, Redundant, and Relationships
4.8. Primary and Foreign Keys
4.8.1. Primary Key Attributes
4.8.2. Composite Keys
4.8.3. Artificial Keys
4.8.4. Primary Key Migration
4.8.5. Define Key Attributes
4.8.6. Validate Keys and Relationships
4.8.7. Foreign Keys
4.8.8. Categorizing Foreign Keys
4.8.9. Foreign Key Ownership
4.8.10. Diagramming Foreign Keys
4.9. Adding Qualities to the Model
4.9.1. Relation of Attributes to Entities
4.9.2. Parent-Child Relationships
4.9.3. Multivalued Attributes
4.9.4. Relations Described by Attributes
4.9.5. Code Values and Derived Attributes
4.9.6. Attributes in the ER Diagram
4.10. Generalization Hierarchies
4.10.1. Description
4.10.2. Making a Generalization Hierarchy
4.10.3. Types of Hierarchies
4.10.4. Rules
4.11. Adding Data Integrity Rules
4.11.1. Entity Integrity
4.11.2. Referential Integrity
4.11.3. Inserting and Deleting Rules
4.11.4. Insert Rules
4.11.5. Delete Rules
4.11.6. Insert and Delete Guidelines
4.11.7. Domains
4.11.8. Primary Key Domains
4.11.9. Foreign Key Domains
4.12. Outline of the Relational Model
4.13. Summary
Review Questions
Multiple Choice Question
References

Chapter 5. Relational Database and SQL
Unit Introduction
5.1. Relational Database Concepts
5.2. Hierarchical Databases
5.3. Network Databases
5.4. Components of Relational Database
5.4.1. Table
5.4.2. Record/Row
5.4.3. Field/Column
5.4.4. Datatype
5.4.5. Query/View
5.4.6. Stored Procedure
5.5. Overview of SQL
5.5.1. Working of SQL
5.5.2. History of SQL
5.5.3. Standard of SQL
5.5.4. Importance of SQL Today and Tomorrow
5.6. The Practice Of SQL Commands
5.6.1. Microsoft Access
5.6.2. SQL Server
5.6.3. MySQL
5.6.4. Oracle
5.7. Summary
Review Questions
Multiple Choice Questions
References

Chapter 6. Role of Big Data in Database Systems
Unit Introduction
6.1. Understanding Big Data
6.1.1. Characteristics of Big Data
6.1.2. Importance of Big Data
6.2. Big Data Technologies
6.2.1. Hadoop
6.2.2. Spark
6.2.3. NoSQL Databases
6.2.4. Comparison of the Technologies
6.3. Type of Data: Transactional or Analytical
6.3.1. Transactional Systems
6.3.2. Analytical Systems
6.3.3. CAP Theorem
6.3.4. ACID vs. BASE
6.4. Requirements and Challenges of Big Data
6.4.1. Scalability
6.4.2. Availability and Fault Tolerance
6.4.3. Efficient Network Setup
6.4.4. Flexibility
6.4.5. Privacy and Access Control
6.4.6. Elasticity
6.4.7. Batch Processing and Interactive Processing
6.4.8. Efficient Storage
6.4.9. Multi-Tenancy
6.4.10. Efficient Processing
6.4.11. Efficient Scheduling
6.5. Summary
Review Questions
Multiple Choice Questions
References

Chapter 7. Data Warehousing and Business Intelligence
Unit Introduction
7.1. Data Warehouse (DW) Concepts
7.2. Statements Business Intelligence (BI)
7.3. Business Intelligence (BI) Architecture
7.3.1. Operational Applications vs. Business Intelligence (BI) Applications
7.3.2. Requirement for Data Warehouse (DW)
7.3.3. Improved Decision-Making via Analysis and Reporting
7.4. Data Warehouse (DW) Data Model
7.4.1. DW Modeling Techniques
7.4.2. DW Database Design Modeling
7.4.3. Developing Data Warehouse (DW)
7.5. Business Intelligence (BI) Concepts
7.5.1. Customer and Market Analysis
7.5.2. Channel Analysis
7.5.3. Forecasting and Planning
7.6. Data Warehousing Online Transactional Processing (OLTP)
7.7. Data Warehouse (DW) and Business Intelligence (BI) High-Level Architecture
7.8. Summary
Review Questions
Multiple Choice Questions
References

Chapter 8. Applications of Database Systems
Unit Introduction
8.1. E-Commerce
8.1.1. Importance of Database in E-Commerce
8.1.2. Examples of E-Commerce Databases
8.2. Healthcare
8.2.1. Role of Databases in Healthcare
8.2.2. Examples of Healthcare Databases
8.3. Banking and Finance
8.3.1. Importance of Databases in Financial Management
8.3.2. Examples of Financial Databases
8.4. Social Media
8.4.1. Role of Databases in Social Media Platforms
8.4.2. Examples of Social Media Databases
8.5. Education
8.5.1. Importance of Databases in Education
8.5.2. Examples of Educational Databases
8.6. Logistics and Supply Chain Management
8.6.1. Role of Databases in Logistics and Supply Chain Management
8.6.2. Examples of Logistics and Supply Chain Databases
8.7. Internet of Things (IoT)
8.7.1. Role of Databases in IoT Data
8.7.2. Examples of IoT Databases
8.8. Summary
Review Questions
Multiple Choice Questions
References

Index

List of Figures

Figure 1.1. Illustration of the data-driven decision making
Figure 1.2. Schematic of a database
Figure 1.3. Schematic of the data management
Figure 1.4. Illustration of the importance of customer database
Figure 1.5. Components of the database components
Figure 1.6. Schematic of the DDL commands
Figure 1.7. Illustration of the data manipulation language (DML)
Figure 1.8. Schematic of the DCL statements
Figure 1.9. Illustration of the DBMS
Figure 1.10. Data integrity and its components
Figure 1.11. Illustration of the distributed architecture
Figure 2.1. Waterfall model
Figure 2.2. Illustration of the requirement gathering and its ways
Figure 2.3. Sequence of the logical design in database
Figure 2.4. A summary of the repetitive stages involved in designing database
Figure 2.5. An example of the ER diagram
Figure 3.1. An example of the hierarchical database
Figure 3.2. Hierarchy of networks and systems in a generic telecom infrastructure
Figure 3.3. An example of the network database model
Figure 3.4. An example of the complex data relationship
Figure 3.5. Network database in financial system
Figure 3.6. Schematic of a relational database
Figure 3.7. Comparison of the structured and unstructured data
Figure 3.8. Illustration of the object-oriented database
Figure 3.9. Illustration of the object-oriented database and object-oriented programming
Figure 3.10. Illustration of a graph database with an example
Figure 3.11. Relationship modeling in graph database
Figure 3.12. Fraud detection using graph database
Figure 3.13. Illustration of the NoSQL database
Figure 3.14. Illustration of an example of the NOSQL database
Figure 3.15. Illustration of the document database
Figure 3.16. Illustration of the content management system
Figure 4.1. Illustration of working of database modelling
Figure 4.2. Flow diagram of entity-relationship (ER)
Figure 4.3. Schematic of data modelling
Figure 4.4. Flow chart of entity and set of entity
Figure 4.5. Example of linear process
Figure 4.6. Picture of King Lear
Figure 4.7. Illustration of cardinality
Figure 4.8. Illustration of types of marketing entities
Figure 4.9. Example of entity attribute matrix
Figure 4.10. ENTITY-ENTITY matrix and an ENTITY-ATTRIBUTE matrix
Figure 4.11. Schematic of binary relationship
Figure 4.12. Diagram of instance of recursive association
Figure 4.13. Illustration of many-to-many relationship
Figure 4.14. Schematic of removing redundant relationship
Figure 5.1. An example of the relational database
Figure 5.2. An example of the hierarchical database
Figure 5.3. An example of the network data model
Figure 6.1. The five V’s of big data
Figure 6.2. Illustration of Hadoop and its benefits
Figure 6.3. Illustration of the NoSQL database
Figure 6.4. Schematic of the transactional or analytical datatypes
Figure 6.5. Diagram representing CAP theorem
Figure 7.1. Illustration of the data warehousing
Figure 7.2. Schematic of the business intelligence and its components
Figure 7.3. Connections between business and operational intelligence apps
Figure 7.4. Displaying data warehouse components
Figure 7.5. Representing reporting system parts
Figure 7.6. Showing DW development lifecycle (DWLC) model
Figure 7.7. Schematic of the online transaction processing (OLTP)
Figure 8.1. Illustration of the order management system and its components
Figure 8.2. Health data analytics
Figure 8.3. Customer relationship management and its components
Figure 8.4. Content management system and its components
Figure 8.5. Various functions of logistics
Figure 8.6. Using Amazon DynamoDB document API with the AWS mobile SDK for android
List of Tables
Table 4.1. Tabular representation of models of an ENTITY-ENTITY matrix, and an ENTITY-ATTRIBUTE matrix

Table 4.2. Tabular representation of example of composite keys

Table 4.3. Tabular representation of PROJECT entity

Table 5.1. Tabular data about various datatypes

Table 5.2. Comparison of Access 97 and Access 2000+


List of Abbreviations

ACID Atomicity, Consistency, Isolation, Durability

ANSI American National Standards Institute

BI Business Intelligence

CAD Computer-Aided Design

CDC Centers for Disease Control and Prevention

CMS Content Management Systems

CRM Customer Relationship Management

CSV Comma Separated Values

DBAs Database Administrators

DBMS Database Management System

DCL Data Control Language

DDL Data Definition Language

DM Data Mining

DML Data Manipulation Language

DQL Data Query Language

DW Data Warehouse

ER Entity-Relationship

ERD Entity-Relationship Diagram

ETL Extraction, Transformation, and Loading

GFT Google Flu Trends

GIS Geographic Information Systems


HDFS Hadoop Distributed File System

IDMS Integrated Database Management System

IDS Integrated Data Store

IMS Information Management System

JSON JavaScript Object Notation

LAN Local Area Network

MTSDBMS Michigan Terminal System Database Management System

OLAP Online Analytical Processing

OLTP Online Transaction Processing

OODBs Object-Oriented Databases

OOP Object-Oriented Programming

ORDBs Object-Relational Databases

QBE Query by Example

RDBMS Relational Database Management System

SDLC Software Development Life Cycle

SEQUEL Structured English Query Language

SQL Structured Query Language

W3C World Wide Web Consortium

WAN Wide Area Network

XML Extensible Markup Language


PREFACE

The field of database systems has evolved significantly over the years, and as a result, it has
become more complex. Understanding the fundamentals of database systems is the key to
unlocking the full potential of databases. Database systems provide a centralized and structured
approach to storing and organizing data, enabling efficient data retrieval, manipulation, and
analysis. This is essential for businesses that deal with large volumes of data, as it provides a
systematic way of managing data that leads to more accurate and informed decision-making.
Furthermore, a solid foundation in database systems is essential for anyone involved in
the development of software applications that utilize databases, as it helps to ensure data
consistency and integrity. Ultimately, a good understanding of the fundamentals of database
systems is essential for anyone seeking to work with databases in any capacity.

The first chapter provides an introduction to database systems, discussing the basic
components, features, and advantages of database management. This chapter also covers
the evolution of database systems and their growing role in modern organizations. The second
chapter explores the database development process, including requirements gathering,
database design, implementation, testing, and maintenance. This chapter provides a detailed
guide to designing and implementing an efficient database system that meets the needs of
the organization.

The third chapter discusses the different types of databases, including relational, NoSQL, and
graph databases, and their respective strengths and weaknesses. This chapter also covers
the differences between these types of databases and their use cases. The fourth chapter
focuses on database modeling, covering entity-relationship (ER) diagrams, normalization,
and other techniques used to design a database schema. This chapter also covers database
design principles and best practices.

The fifth chapter delves into the fundamentals of relational databases, including SQL querying,
constraints, and transactions. This chapter provides practical tips and techniques for managing
relational data and optimizing SQL queries. The sixth chapter discusses the role of big data
in database systems, including the challenges and opportunities presented by the increasing
volume, velocity, and variety of data. This chapter also covers the use of distributed systems
and cloud technologies for managing big data.

The seventh chapter covers data warehousing and business intelligence (BI), including the
design and development of data warehouses (DWs), OLAP cubes, and the use of business
intelligence tools for data analysis and reporting. This chapter also covers data mining (DM)
and predictive analytics techniques. The eighth chapter explores the various applications
of database systems, including e-commerce, healthcare, and social media platforms. This chapter
highlights the diverse and growing role of databases in modern organizations and covers the unique
challenges and opportunities presented by each industry.

With real-world examples, case studies, and practical exercises, Fundamentals of Database Systems
provides readers with a comprehensive understanding of database management, preparing them for
success in today’s data-driven world.

—Author
CHAPTER 1

INTRODUCTION TO
DATABASE SYSTEMS

UNIT INTRODUCTION
In today’s digital age, we generate and consume vast amounts of data on a daily basis,
ranging from personal information to financial transactions, to social media interactions, and
more. Database systems play a critical role in the management and storage of this data,
making it easily accessible and organized for efficient retrieval and processing. Database
systems can be defined as software systems that manage and store large amounts of
data, allowing users to retrieve, add, update, and delete data as needed (Hellerstein
& Stonebraker, 2005). They are used in a variety of industries, including e-commerce,
healthcare, finance, education, logistics, and many more.
The concept of database systems has been around for several decades, with the first
database management system (DBMS) developed in the 1960s. The early DBMSs were based on the
hierarchical and network models, which were largely displaced in the 1980s by the relational
model that E. F. Codd had proposed in 1970. The relational model is still widely used today and is considered the foundation of
modern database systems. Database systems consist of several key components, including
data definition language (DDL), data manipulation language (DML), data query language
(DQL), data control language (DCL), and DBMS (Yeung & Hall, 2007). DDL is used to
define the structure of the database, including tables, relationships, and constraints. DML
is used to manipulate the data in the database, including adding, updating, and deleting
data. DQL is used to retrieve data from the database, and DCL is used to control access
to the data in the database. DBMS is the software system that manages the database,
providing tools for data storage, retrieval, and processing. DBMSs can be classified into
several types, including relational, NoSQL, object-oriented, and others (Jukic et al., 2014).
Data storage and retrieval are critical components of database systems. Primary storage
refers to the use of main memory, which provides fast access to data but has limited
storage capacity. Secondary storage refers to the use of hard disks or solid-state drives,
which provide larger storage capacity but slower access speeds.
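As a rough illustration of the sub-languages described above, the following sketch uses Python's built-in sqlite3 module with an in-memory database. The employee table, its columns, and the sample rows are invented for this example and do not come from the text. SQLite has no user accounts, so the DCL step is shown only as a comment.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# DDL: define the structure (table, column types, constraints).
conn.execute(
    """
    CREATE TABLE employee (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL CHECK (salary >= 0)
    )
    """
)

# DML: add, update, and delete rows.
conn.executemany(
    "INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Ada", 5200.0), (2, "Grace", 6100.0), (3, "Edgar", 4800.0)],
)
conn.execute("UPDATE employee SET salary = salary + 300 WHERE name = ?", ("Ada",))
conn.execute("DELETE FROM employee WHERE name = ?", ("Edgar",))
conn.commit()

# DQL: retrieve data without changing it.
rows = conn.execute("SELECT name, salary FROM employee ORDER BY id").fetchall()
print(rows)

# DCL: SQLite does not implement GRANT or REVOKE. On a server DBMS the
# equivalent step would look like the statement below, where "analyst"
# is a hypothetical user name:
#   GRANT SELECT ON employee TO analyst;
```

The same four categories apply to any SQL-based DBMS; only the connection code and the available DCL statements differ from product to product.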
Indexing involves creating a data structure that enables swift access to data based
on particular conditions, thereby enhancing data retrieval speed. Query optimization, in turn,
improves the efficiency of data retrieval by selecting the most efficient query plan available,
based on the complexity of the query and the size of the data. Query execution
is the process of executing the selected query plan and returning the results to the user.
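The interplay of indexing, plan selection, and execution can be observed in a small experiment. The sketch below, again using Python's sqlite3 module, asks SQLite for its query plan before and after an index is created. The orders table, the index name idx_orders_customer, and the generated rows are assumptions made for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
# 1000 synthetic orders spread evenly over 100 customers.
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT id, total FROM orders WHERE customer_id = ?"


def plan(sql, params):
    """Return the optimizer's chosen plan as a list of descriptive strings."""
    # The last column of each EXPLAIN QUERY PLAN row holds the description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Without an index, the only available plan is a full table scan.
plan_before = plan(query, (42,))

# Indexing: build a lookup structure on the filtered column.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Query optimization: the planner can now choose an index search instead.
plan_after = plan(query, (42,))

# Query execution: run the selected plan and return the rows.
result = conn.execute(query, (42,)).fetchall()

print(plan_before)
print(plan_after)
print(len(result))
```

The query text is identical in both cases; only the plan chosen by the optimizer changes once the index exists.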
Database systems have several key characteristics that make them critical for managing
large amounts of data. These include data independence, concurrent access, data integrity,
security, and scalability. Data independence refers to the ability to change the structure of
the database without affecting the applications that use the data (Widom & Ceri, 1995).
Concurrent access refers to the ability to allow multiple users to access the data at the
same time. Data integrity refers to the ability to maintain the accuracy and consistency of
the data in the database (Elmasri & Navathe, 2006). Security refers to the ability to control
access to the data in the database to prevent unauthorized access. Scalability refers to
the ability to handle increasing amounts of data without compromising performance.
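Data integrity is the easiest of these characteristics to demonstrate in a few lines. The sketch below declares integrity rules in the schema and lets the DBMS reject rows that break them. It uses Python's sqlite3 module; the department and employee tables and all values are hypothetical, and other DBMSs report violations through their own error types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    """
    CREATE TABLE employee (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department (id)
    )
    """
)
conn.execute("INSERT INTO department (id, name) VALUES (1, 'Research')")
conn.commit()

rejected = []
bad_rows = [
    (1, "Ada", 99),  # department 99 does not exist: referential integrity
    (2, None, 1),    # missing name: NOT NULL constraint
]
for row in bad_rows:
    try:
        with conn:  # commits on success, rolls back if an error is raised
            conn.execute(
                "INSERT INTO employee (id, name, dept_id) VALUES (?, ?, ?)", row
            )
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

# A row that satisfies every rule is accepted.
with conn:
    conn.execute("INSERT INTO employee (id, name, dept_id) VALUES (3, 'Grace', 1)")

count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected)
print(count)
```

Because the rules live in the schema rather than in application code, every program that writes to the database is held to the same constraints.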
The architecture of database systems can be classified into several types, including
client-server architecture, tiered architecture, and distributed architecture. Client-server
architecture involves a client that sends requests to a server, which processes the requests
and returns the results to the client (Coronel & Morris, 2016). Tiered architecture involves
multiple layers of servers, with each layer responsible for specific functions such as web
serving, application serving, and database serving. Distributed architecture involves multiple
servers distributed across different locations, with each server responsible for a subset of
the data. Query processing and optimization are critical components of database systems.
Query processing involves several phases, including parsing, translation, optimization,
and execution.

Learning Objectives
At the end of this chapter, the readers will be able to:
• Understand the definition of database systems.
• Recognize the importance of database systems in the modern world.
• Trace the brief history of database systems.
• Identify the key components of database systems, including DDL, DML, DQL,
DCL, and DBMS.
• Describe the process of data storage and retrieval, including primary storage,
secondary storage, indexing, query optimization, and query execution.
• Explain the characteristics of database systems, such as data independence,
concurrent access, data integrity, security, and scalability.
• Compare and contrast the different types of architecture in database systems,
including client-server, tiered, and distributed architectures.
• Understand the process of query processing and optimization, including query
processing phases, query optimization techniques, and query execution plans.
• Appreciate the importance of database systems in modern society and how they
contribute to the management and processing of large amounts of data.

Key Terms
• Client-server architecture
• Data
• Data definition language (DDL)
• Data independence
• Database
• Database management system (DBMS)
• Query optimization
• Scalability

1.1. BASICS OF DATABASE SYSTEMS


The basics of database systems are discussed in the following subsections.

1.1.1. Data
Data is any distinct piece of information that is recorded or stored
in a computer or other electronic device. It is a collection of
unorganized facts, figures, and statistics that are meaningless
until they are processed and turned into useful information. Data
can be in different forms, including text, numbers, images, audio,
and video. In computing, data is the foundation of any computer
program, software, or system (Sokolinsky, 2004). It is the input that
the computer uses to perform calculations, process instructions,
or produce output. Data is stored in a variety of ways, such as
databases, spreadsheets, text files, binary files, and multimedia
files. The way in which data is stored and organized can affect
its usefulness and accessibility.

KEYWORD
Structured data is data that has a standardized format
for efficient access by software and humans alike.

Data is essential for decision-making, research, analysis,
and problem-solving in many fields, including business, science,
medicine, education, and government. For example, in business,
data is used to monitor sales, track inventory, analyze customer
behavior, and make strategic decisions. In medicine, data is
used to study diseases, monitor patient health, and develop new
treatments. In science, data is used to test hypotheses, make
predictions, and create models. There are two types of data:
structured and unstructured. Structured data is organized in a
specific format, such as a spreadsheet or database, and can be
easily processed and analyzed. Unstructured data, on the other
hand, is not organized in a specific format and may include text,
images, or multimedia (Maryanski et al., 1986). It is more difficult to
process and analyze unstructured data, but it can provide valuable
insights and information (Figure 1.1).
Data can also be classified based on its quality. High-quality
data is accurate, complete, relevant, and timely (Chaudhri et al.,
2003). It is free from errors, inconsistencies, and duplications.
Low-quality data, on the other hand, may be incomplete, outdated,
or contain errors, which can lead to incorrect conclusions or
decisions. In recent years, the amount of data generated has
increased exponentially due to the widespread use of technology
and the internet. This has led to the development of big data, which
refers to large datasets that are too complex or unstructured to be
processed using traditional methods. Big data requires specialized
tools and techniques, such as data mining (DM), machine learning, and artificial
intelligence, to extract insights and patterns.

Figure 1.1. Illustration of data-driven decision making.

Source: Jason Williams, Creative Commons License.

1.1.2. Database
A database is an organized collection of structured data that is
stored and managed in a computer or other electronic device. It is
designed to store and retrieve information efficiently and accurately.
Databases can be used for a wide range of applications, from small
personal projects to large enterprise systems (Elmasri & Navathe,
2006). A database typically consists of one or more tables that
are related to each other. Each table contains a set of records,
and each record represents a specific instance of the data that is
being stored. The columns in a table represent different attributes
of the data, such as name, age, or address. Each column is given
a data type that specifies the type of data that can be stored in
it, such as text, number, or date (Figure 1.2).

Figure 1.2. Schematic of a database.

Source: Night Born, Creative Commons License.

The relationships between tables in a database are expressed through keys and
enforced by a set of rules called database constraints. These constraints
ensure that the data is accurate and consistent by enforcing rules
such as unique values, data type restrictions, and referential
integrity. Referential integrity ensures that the data in one table
is consistent with the data in another table by enforcing a set of
rules that dictate how data can be entered, updated, or deleted
(Connolly & Begg, 2005).
Databases can be categorized based on
their organization and management approach. The most common
types of databases are relational databases, which organize data
into tables with predefined relationships between them, and
non-relational databases, which organize data in other ways such
as key-value pairs or documents. Relational databases use a
standardized language called Structured Query Language (SQL) to
manipulate data (Bernstein & Goodman, 1981). SQL is a powerful
language that allows users to create, modify, and query databases.
It can be used to perform a wide range of operations, from simple
queries to complex data transformations.

KEYWORD
Relational database is a type of database that stores and
provides access to data points that are related to one another.

Non-relational databases, on the other hand, use a variety of
languages and data models to manage data. Some of the most
popular non-relational databases include MongoDB, Cassandra,
and Redis (Ulusoy, 1995). Databases are used in a wide range
of applications, from small-scale personal projects to large-scale
enterprise systems. Some common applications of databases
include e-commerce websites, inventory management systems,
banking systems, healthcare systems, and social media platforms.
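
The table structure and referential integrity described in this subsection can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the table names (customers, orders), the column names, and the helper functions are assumptions made for the example, and SQLite is used simply because it needs no server.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create two related tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Each column has a name and a data type; the primary key identifies a record.
    conn.execute(
        """CREATE TABLE customers (
               customer_id INTEGER PRIMARY KEY,
               name        TEXT NOT NULL,
               email       TEXT UNIQUE
           )"""
    )
    # The foreign key links every order to an existing customer.
    conn.execute(
        """CREATE TABLE orders (
               order_id    INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
               order_date  TEXT NOT NULL
           )"""
    )
    conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com')")
    conn.execute("INSERT INTO orders VALUES (100, 1, '2024-01-15')")
    conn.commit()
    return conn


def try_orphan_order(conn: sqlite3.Connection) -> bool:
    """Return True if the DBMS rejects an order for a customer that does not exist."""
    try:
        conn.execute("INSERT INTO orders VALUES (101, 999, '2024-01-16')")
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling try_orphan_order(build_demo_db()) is expected to return True, because customer 999 does not exist and the foreign key constraint blocks the insert.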

1.2. HISTORY OF DATABASE SYSTEMS


The history of database systems dates back to the 1960s, when
computers came into wider commercial use. During this time,
data was primarily stored on magnetic tapes and punched cards,
and the processing of data was time-consuming and error-prone.
In 1970, Edgar F. Codd, a researcher at IBM, introduced the
concept of relational databases. Codd proposed that data should
be organized into tables with a fixed number of columns, each with
a defined data type, and that relationships between tables should
be established using primary and foreign keys. This approach was
designed to simplify data management and improve data integrity
(Grad & Bergin, 2009).
One of the earliest implementations of the relational model was System R,
a research prototype developed by IBM during the 1970s. System R
was based on Codd's relational model and was used to develop the SQL language.
It was not sold commercially, but it formed the basis of IBM's later products
such as SQL/DS and DB2 (Miklau et al., 2007). Oracle, released in 1979, is generally
regarded as the first commercially available relational database management
system (RDBMS). In the 1980s and
1990s, the use of database systems became more widespread
as the cost of computing hardware decreased and the need for
efficient data management increased. During this time, several
RDBMSs became widely used, including Oracle, Sybase, and Microsoft
SQL Server. In the late 1990s and early 2000s, the emergence of
the internet and the growth of e-commerce led to the development
of new database systems designed to handle large volumes of
web-based data. These systems, known as NoSQL databases,
were based on non-relational data models and were designed to
be more flexible and scalable than traditional RDBMSs (Abadi et
al., 2013).

KEYWORD
Cloud computing is the on-demand availability of computer system
resources, especially data storage and computing power,
without direct active management by the user.

Today, database systems continue to evolve and improve, with
new technologies such as cloud computing, big data analytics, and
machine learning playing an increasingly important role. The use of
databases has become ubiquitous, with virtually every organization
relying on some form of database system to manage its data.

1.3. IMPORTANCE OF DATABASE SYSTEMS

Database systems play a critical role in modern organizations by
providing a centralized and efficient way to store, manage, and
access large amounts of data. The importance of database systems
can be understood from the perspectives discussed in the following
subsections.

1.3.1. Data Integration and Data Quality


Database systems allow organizations to integrate data from multiple
sources, ensuring data consistency and accuracy (Cai & Zhu,
2015). By eliminating data redundancy, database systems help to
maintain data integrity and enhance data quality.

1.3.2. Decision Making


Database systems provide users with the ability to access and
analyze data quickly and easily, allowing organizations to make
informed decisions based on accurate and up-to-date information
(Azeroual et al., 2018). This is particularly important for organizations
operating in dynamic environments where quick decisions can
make a significant difference.


1.3.3. Efficient Data Management

Database systems provide an efficient way to manage data,
including indexing, searching, and updating data. This helps to
reduce data entry errors, increase data consistency, and improve
data security (Figure 1.3).

Did you Know?
According to a survey conducted by Forbes, over 90% of
organizations rely on data-driven insights to make strategic
business decisions, highlighting the critical role that databases
play in decision making.

Figure 1.3. Schematic of data management.

Source: Virtual Staffing, Creative Commons License.

1.3.4. Cost Savings


By centralizing data storage and management, database systems
can reduce hardware and software costs, as well as administrative
costs associated with maintaining and updating data (Wang &
Strong, 1996).

1.3.5. Scalability
Database systems can be designed to scale up or down based
on the needs of an organization, allowing them to handle large
volumes of data without sacrificing performance.


1.3.6. Regulatory Compliance


Many organizations are subject to regulatory requirements that
mandate the secure storage and management of sensitive data
(DeFazio et al., 2001). Database systems provide the necessary
features and controls to ensure compliance with these regulations.

1.3.7. Improved Customer Experience


Database systems can be used to store and manage customer
data, allowing organizations to personalize their interactions with
customers and provide a better overall customer experience (Sawyer
& Mariani, 1995) (Figure 1.4).

Figure 1.4. Illustration of the importance of a customer database.

Source: MBA Skool, Creative Commons License.

In conclusion, the importance of database systems in modern
organizations cannot be overstated. By providing a centralized
and efficient way to store, manage, and access data, database
systems help organizations to make informed decisions, improve
data quality, and reduce costs.

1.4. COMPONENTS OF DATABASE SYSTEMS


Database systems are complex software systems that include
various components that work together to manage and store large
amounts of data. The main components of a database system are
discussed in the following subsections (Figure 1.5).

Figure 1.5. Components of a database system.

Source: Testing Docs, Creative Commons License.

1.4.1. Data Definition Language (DDL)


Data definition language (DDL) is a set of SQL commands used
to create and modify the structure of database objects, such as
tables, indexes, and views. DDL is one of the core components
of database management systems (DBMS) and is used to define
and manage the database schema (Wells, 2001).
Some common DDL statements include:
1. CREATE: This statement is used to create a new database
object, such as a table or index. The syntax of the CREATE
statement varies depending on the type of object being
created.
2. ALTER: This statement is used to modify the structure of
an existing database object, such as adding a new column
to a table or modifying an index (Anwar, 2018).
3. DROP: This statement is used to delete an existing
database object, such as a table or view. The DROP
statement is a destructive operation and should be used
with caution.
4. TRUNCATE: This statement is used to remove all the data
from a table while keeping the table structure intact. The
TRUNCATE statement is typically faster than using the DELETE
statement to remove every row (Wu, 2018).
5. RENAME: This statement is used to rename an existing
database object, such as a table or column. The
RENAME statement is often used to improve the readability
and maintainability of database objects (Figure 1.6).

Figure 1.6. Schematic of the DDL commands.

Source: Algo Daily, Creative Commons License.

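It is important to note that in several DBMSs, such as Oracle and MySQL, DDL
statements are not transactional: each one triggers an implicit commit, so
once a DDL statement is executed, it cannot be
rolled back like a transaction. Other systems, such as PostgreSQL, allow most DDL
statements to run inside a transaction. In either case, it is important to carefully
review and test DDL statements before executing them to ensure
that they do not cause any unintended consequences.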
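
The DDL statements listed above can be tried out with Python's built-in sqlite3 module. The following is a minimal sketch under stated assumptions: the table name staff, its columns, and the function run_ddl_demo are invented for illustration. SQLite has no TRUNCATE statement (a DELETE without a WHERE clause plays that role) and spells RENAME as ALTER TABLE ... RENAME TO, so other DBMSs will differ in syntax.

```python
import sqlite3


def run_ddl_demo():
    """Apply CREATE, ALTER, RENAME and DROP to a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    # CREATE: define a new table.
    conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT)")
    # ALTER: add a column to the existing table.
    conn.execute("ALTER TABLE staff ADD COLUMN hired_on TEXT")
    # RENAME: SQLite expresses this as ALTER TABLE ... RENAME TO.
    conn.execute("ALTER TABLE staff RENAME TO employees")
    # Read the column names back from the schema (index 1 of each row is the name).
    columns = [row[1] for row in conn.execute("PRAGMA table_info(employees)")]
    # DROP: remove the table together with any data it holds.
    conn.execute("DROP TABLE employees")
    remaining = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    conn.close()
    return columns, remaining
```

After the call, the first returned value lists the columns as they stood before the table was dropped, and the second shows that no tables remain.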

1.4.2. Data Manipulation Language (DML)


Data manipulation language (DML) is a set of SQL commands used
to retrieve, insert, update, and delete data in a database. DML is
an important component of DBMS and is used to manage the data
stored in the database (Ghali & Abu-Naser, 2019) (Figure 1.7).

Figure 1.7. Illustration of the data manipulation language (DML).

Source: Anushree Goswami, Creative Commons License.

Some common DML statements include:
1. INSERT: This statement is used to insert new data into
a table. The INSERT statement specifies the values to
be inserted into each column of the table (Shaw et al.,
2016).
2. UPDATE: This statement is used to modify existing data in
a table. The UPDATE statement specifies the new values
for each column that is being modified.
3. DELETE: This statement is used to delete data from a
table. The DELETE statement specifies the rows to be
deleted from the table (Litwin & Abdellatif, 1987).
KEYWORD
Data query languages (DQL) are computer languages that
are used to make various queries in information systems
and databases.

It is important to note that DML statements are transactional,
which means that they can be rolled back if an error occurs during
execution. This allows for data consistency and helps ensure
that the database remains in a valid state. In summary, DML is the
component of a DBMS that is used to manipulate the data stored in
the database. DML statements are used to retrieve, insert, update,
and delete data from tables, and are executed by applications
or users who need to manage the data in the database.
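
The INSERT, UPDATE, and DELETE statements, and the ability to roll back uncommitted changes, can be sketched as follows with Python's sqlite3 module. The products table, its rows, and the function name run_dml_demo are assumptions made for this example only.

```python
import sqlite3


def run_dml_demo():
    """Run INSERT, UPDATE and DELETE, then undo an uncommitted DELETE."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
    )
    # INSERT: add new rows.
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [(1, "pen", 1.5), (2, "notebook", 4.0), (3, "stapler", 9.0)],
    )
    # UPDATE: change a value in an existing row.
    conn.execute("UPDATE products SET price = 2.0 WHERE id = 1")
    # DELETE: remove one row.
    conn.execute("DELETE FROM products WHERE id = 3")
    conn.commit()  # make the three changes above permanent

    query = "SELECT id, name, price FROM products ORDER BY id"
    committed = conn.execute(query).fetchall()

    # This DELETE is never committed, so rollback() restores the rows.
    conn.execute("DELETE FROM products")
    conn.rollback()
    after_rollback = conn.execute(query).fetchall()
    conn.close()
    return committed, after_rollback
```

Both returned lists should contain the same two rows, which shows that the uncommitted DELETE left no trace once the transaction was rolled back.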

1.4.3. Data Query Language (DQL)


Data query language (DQL) is a subset of Structured Query
Language (SQL) used to retrieve data from a database (Brodsky
et al., 2009). DQL statements are designed to extract information
from one or more database tables based on specific criteria, and
are used extensively in DBMS to retrieve data required by users
or applications.
The main DQL statement is SELECT, which is combined with several clauses
to retrieve data from a database:
1. SELECT: This is the most commonly used DQL statement.
It allows the user to retrieve specific data from one or
more tables in the database based on certain criteria. The
SELECT statement can also be used to calculate values,
group data, and join multiple tables.
2. FROM: This clause specifies the name of the table or
tables from which data is to be retrieved.
3. WHERE: This clause specifies the conditions that must
be met for the data to be retrieved. It allows users to filter
data based on specific criteria, such as a particular date
range or value.
4. ORDER BY: This clause is used to sort the retrieved
data in a specific order, such as ascending or descending
order (Jäkel et al., 2014).
5. GROUP BY: This clause is used to group data based
on one or more columns in the table. It is often used with
aggregate functions such as SUM, AVG, and COUNT.
6. HAVING: This clause is used in conjunction with GROUP
BY to filter the grouped results based
on a specific condition (Fikes et al., 2002).
In summary, DQL is a subset of SQL used to retrieve data from
a database. DQL statements are used to extract information from
one or more tables based on specific criteria, and are executed by
the DBMS to extract the requested data. The SELECT statement is
combined with the FROM, WHERE, ORDER BY, GROUP BY, and
HAVING clauses. DQL is an essential component of DBMS and is used
by application developers and database administrators (DBAs) to
retrieve data for a variety of purposes.
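
A single query can combine all of the parts described in this subsection. The sketch below uses Python's sqlite3 module; the sales table, its sample rows, and the function top_customers are hypothetical and exist only to show how SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY fit together.

```python
import sqlite3


def top_customers(min_total: float):
    """Return (customer, total) pairs for 2024 sales at or above min_total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, sold_on TEXT)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("alice", 120.0, "2024-01-05"),
            ("bob", 40.0, "2024-01-07"),
            ("alice", 80.0, "2024-02-01"),
            ("carol", 300.0, "2024-02-03"),
            ("bob", 30.0, "2023-12-30"),
        ],
    )
    rows = conn.execute(
        """SELECT customer, SUM(amount) AS total   -- columns and aggregate to return
           FROM sales                              -- source table
           WHERE sold_on >= '2024-01-01'           -- filter rows before grouping
           GROUP BY customer                       -- one group per customer
           HAVING SUM(amount) >= ?                 -- filter the groups
           ORDER BY total DESC                     -- largest total first""",
        (min_total,),
    ).fetchall()
    conn.close()
    return rows
```

With the sample rows above, top_customers(100) keeps only the customers whose 2024 sales add up to at least 100, sorted from the largest total to the smallest.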

1.4.4. Data Control Language (DCL)


Data control language (DCL) is a subset of Structured Query
Language (SQL) used to manage the security and access control
of a database (Rao et al., 2019). It allows DBAs to grant or revoke
access to the database for users and to control the actions that
users can perform on the database (Figure 1.8).

Figure 1.8. Schematic of the DCL statements.

Source: Developers Tutorials, Creative Commons License.

There are several DCL statements used to control access to
a database, including:
1. GRANT: This statement is used to give specific privileges
to a user, such as the ability to read, write, or execute
database objects like tables, views, or procedures (Wyatt,
1994).
2. REVOKE: This statement is used to remove the previously
granted privileges from the user.
3. DENY: This statement is used to explicitly deny a user
access to a specific database object or system resource
(El-Mehalawi & Miller, 2003). DENY is not part of standard SQL; it is
provided by some systems, such as Microsoft SQL Server.
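
The effect of the three statements above can be pictured with a small in-memory model. This is a toy sketch, not how any DBMS stores privileges: the class PrivilegeTable and its methods are invented for illustration, and the rule that a DENY entry overrides a GRANT follows the behavior of systems that support DENY, which is an assumption about the target system.

```python
class PrivilegeTable:
    """Toy model of GRANT, REVOKE and DENY bookkeeping (illustrative only)."""

    def __init__(self):
        self._granted = set()  # (user, action, object) triples that were granted
        self._denied = set()   # triples that were explicitly denied

    def grant(self, user, action, obj):
        self._granted.add((user, action, obj))

    def deny(self, user, action, obj):
        self._denied.add((user, action, obj))

    def revoke(self, user, action, obj):
        # Modeled here as removing an earlier GRANT or DENY entry.
        self._granted.discard((user, action, obj))
        self._denied.discard((user, action, obj))

    def is_allowed(self, user, action, obj):
        key = (user, action, obj)
        # An explicit deny wins over a grant in this model.
        return key in self._granted and key not in self._denied


privileges = PrivilegeTable()
privileges.grant("ana", "SELECT", "orders")
can_read_after_grant = privileges.is_allowed("ana", "SELECT", "orders")
privileges.deny("ana", "SELECT", "orders")
can_read_after_deny = privileges.is_allowed("ana", "SELECT", "orders")
```

In a real system the equivalent steps would be issued as SQL statements by a DBA, and the DBMS would consult its own catalog when a user runs a query.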
DBAs use DCL statements to create security policies and
enforce them in the database system. They can specify which
users or groups of users have access to the database and which
actions they are allowed to perform. This helps to ensure that data
is only accessed and manipulated by authorized personnel. In summary, DCL is
a subset of SQL used to manage the security and access control of
a database. DCL statements such as GRANT, REVOKE, and DENY
are used to grant or revoke access to specific database objects
or system resources, which helps to enforce security
policies in a database system and protect data from unauthorized
access (Yu & Zhou, 2019).

KEYWORD
Structured query language is a domain-specific language used
in programming and designed for managing data held in a
relational database management system, or for stream processing
in a relational data stream management system.

1.4.5. Database Management System (DBMS)
A DBMS is software that manages the storage, retrieval, and
updating of data in a computer system. It is a collection of programs
that enable users to create and maintain a database, and to
access and manipulate data stored in the database. A DBMS
provides an interface for users to interact with the database, and
it acts as an intermediary between the user and the database
(Rawat & Purnama, 2021). Users can use commands or queries
to retrieve and manipulate data stored in the database. The DBMS
is responsible for managing the storage of data, ensuring data
integrity, and controlling access to the database (Figure 1.9).
There are several types of DBMS, including:
1. Relational DBMS: A type of DBMS that stores data in
tables, with each table consisting of rows and columns.
Relational DBMSs use SQL (Structured Query Language)
to access and manipulate data (Dittrich et al., 1995).
2. Object-Oriented DBMS: A type of DBMS that stores data
in objects, which can include data and the methods used
to manipulate the data.
3. Hierarchical DBMS: A type of DBMS that organizes data
in a tree-like structure, with each record other than the root having a single parent
record and zero or more child records.
4. Network DBMS: A type of DBMS that stores data in a
network structure, in which a record can have multiple parent
and child records (Olle, 2003).

Figure 1.9. Illustration of the DBMS.

Source: Learn Computer Science, Creative Commons License.

A DBMS is essential for managing large amounts of data in an
organized and efficient manner. It allows multiple users to access
and manipulate the same data simultaneously, while maintaining
data integrity and security. A DBMS also enables users to retrieve
and manipulate data easily, using a variety of tools and interfaces.
In summary, a DBMS is software that manages the storage, retrieval, and updating
of data in a computer system. It provides an interface for users to
interact with the database, and it is responsible for managing the
storage of data, ensuring data integrity, and controlling access to
the database (Anjard, 1994). There are several types of DBMS,
including relational, object-oriented, hierarchical, and network
DBMS.

Remember
A DBMS is a software system designed to manage and control access
to a database, ensuring data integrity, security, and efficient data
retrieval and manipulation.

1.5. DATA STORAGE AND RETRIEVAL


Data storage and retrieval are two essential aspects of DBMS.
A DBMS must efficiently store and retrieve data to support the
application’s requirements (Levine, 1985). We will discuss the
various techniques and methods used for data storage and retrieval
in DBMS.

1.5.1. Primary Storage


Primary storage, also known as main memory or RAM, is the
memory used to store the currently executing application and its
data. Primary storage is volatile, meaning its contents are lost
when the power is turned off. A DBMS uses primary storage to
cache frequently accessed data and improve query response time
(Chessa & Maestrini, 2003).

Ensure data reliability and accessibility by storing primary data in
multiple locations with data backups.

1.5.2. Secondary Storage
Secondary storage is the non-volatile memory used to store data
persistently. Secondary storage devices include hard disk drives,
solid-state drives, and magnetic tapes. Secondary storage is used
for long-term storage of the database, transaction logs, and backups.

1.5.3. Indexing
Indexing is a technique used to improve the performance of data
retrieval operations (Xue et al., 2019). It involves creating an
index, which is a data structure that contains the values of one
or more columns of a table, along with a pointer to the location of
the corresponding record in the table. When a query filters on the
indexed columns, the DBMS can search the index instead of scanning the
whole table, resulting in faster data retrieval.
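
The difference an index makes can be observed by asking the DBMS how it intends to run a query. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The users table, the index name idx_users_email, and the function query_plans are assumptions for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def query_plans():
    """Return SQLite's plan description for one query, before and after indexing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (email, name) VALUES (?, ?)",
        [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
    )
    query = "EXPLAIN QUERY PLAN SELECT name FROM users WHERE email = ?"
    param = ("user500@example.com",)

    # Without an index, the only option is to scan every row of the table.
    # The last column of each plan row holds the human-readable description.
    before = " ".join(row[3] for row in conn.execute(query, param))

    # The index stores email values together with pointers to the matching rows.
    conn.execute("CREATE INDEX idx_users_email ON users (email)")
    after = " ".join(row[3] for row in conn.execute(query, param))
    conn.close()
    return before, after
```

The first description should mention a scan of the users table, while the second should mention a search that uses idx_users_email.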

1.5.4. Query Optimization


Query optimization is the process of selecting the most efficient
query execution plan to retrieve data from the database (Heanue
et al., 1994). The query optimizer analyzes the query and the
database’s schema to generate an execution plan that minimizes
the amount of disk I/O and CPU processing required to retrieve
the data.

1.5.5. Query Execution


Query execution is the process of retrieving data from the database
based on the execution plan generated by the query optimizer.
The DBMS retrieves data from the storage devices, applies any
necessary sorting or aggregation operations, and returns the
results to the application (Barrett & Edgar, 2006). Query execution
performance is critical for applications that require fast and efficient
data retrieval.
In conclusion, data storage and retrieval are crucial components
of a DBMS. Efficient data storage and retrieval techniques are
essential for supporting fast and reliable application performance.
Primary and secondary storage, indexing, query optimization, and
query execution are some of the critical techniques used for efficient
data storage and retrieval in a DBMS.

KEYWORD
Query optimization is formally described as the process of
transforming a query into an equivalent form that may be
evaluated more efficiently.

1.6. CHARACTERISTICS OF DATABASE SYSTEMS
A database system must possess certain characteristics to be
considered efficient and reliable. In this section, we will discuss
some of the key characteristics of a database system.

1.6.1. Data Independence


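One of the most important characteristics of a database system
is data independence. It means that the application programs
should be independent of the physical storage structure of the data
(Naughton, 1985). It allows changes to be made to the database
structure without affecting the application programs. There are two
types of data independence: logical and physical. Physical data independence
means the storage structures can change without altering the logical schema;
logical data independence means the logical schema can change without
altering the applications that use it.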

1.6.2. Concurrent Access


Database systems must be designed to support multiple users
accessing the database at the same time. This is known as
concurrent access. The DBMS must ensure that each user sees a
consistent view of the database, even if other users are modifying
the data at the same time (Bernstein & Goodman, 1981). This
is achieved through concurrency control mechanisms that allow
multiple transactions to execute concurrently while ensuring the
consistency of the database.
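
The need for concurrency control can be illustrated outside a DBMS with a simple lock. The sketch below is an analogy only: the Account class and run_concurrent_deposits function are hypothetical, and a real DBMS relies on more elaborate mechanisms than a single lock. The point is that a read-modify-write sequence must not be interleaved with another one on the same data.

```python
import threading


class Account:
    """A shared balance protected by a lock (a stand-in for a database row)."""

    def __init__(self, balance: int = 0):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount: int) -> None:
        # The lock makes the read and the write below one indivisible step,
        # so no other thread can slip in between them and lose an update.
        with self._lock:
            current = self.balance
            self.balance = current + amount


def run_concurrent_deposits(n_threads: int = 8, deposits_per_thread: int = 1000) -> int:
    """Let several threads deposit into one account and return the final balance."""
    account = Account()

    def worker() -> None:
        for _ in range(deposits_per_thread):
            account.deposit(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance
```

With the lock in place, the final balance equals the number of threads multiplied by the number of deposits per thread, regardless of how the threads are scheduled.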

1.6.3. Data Integrity


Data integrity is the assurance of the accuracy and consistency of
the data stored in the database. The DBMS must ensure that data
is stored correctly and is consistent with other data in the database.
The database system must enforce the integrity constraints that are
specified in the database schema to ensure the data is accurate
and consistent (Yesin et al., 2021) (Figure 1.10).
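
Integrity constraints declared in the schema are checked by the DBMS on every write. The following sketch, written with Python's sqlite3 module, shows rows being rejected by NOT NULL, CHECK, and PRIMARY KEY constraints. The accounts table, the sample rows, and the function rejected_rows are assumptions made for illustration.

```python
import sqlite3


def rejected_rows():
    """Insert candidate rows and report which ones the constraints reject."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE accounts (
               account_id INTEGER PRIMARY KEY,
               owner      TEXT NOT NULL,
               balance    REAL NOT NULL CHECK (balance >= 0)
           )"""
    )
    candidates = [
        (1, "dana", 50.0),  # valid row
        (2, None, 10.0),    # violates NOT NULL on owner
        (3, "eli", -5.0),   # violates CHECK (balance >= 0)
        (1, "fay", 20.0),   # violates PRIMARY KEY uniqueness
    ]
    rejected = []
    for row in candidates:
        try:
            conn.execute("INSERT INTO accounts VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError:
            rejected.append(row)
    conn.commit()
    stored = conn.execute("SELECT account_id, owner FROM accounts").fetchall()
    conn.close()
    return stored, rejected
```

Only the first candidate row should be stored; the other three are refused by the DBMS before they can make the data inconsistent.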

Figure 1.10. Data integrity and its components.

Source: CFI Team, Creative Commons License.

1.6.4. Security
Database systems must provide security mechanisms to protect the
database from unauthorized access and ensure the confidentiality,
integrity, and availability of the data (Bertino & Sandhu, 2005). This
includes authentication, authorization, and encryption mechanisms
to protect the data from unauthorized access.

1.6.5. Scalability
Database systems must be designed to scale up or down to meet
the changing demands of the application (Kuhlenkamp et al., 2014).
The system must be able to handle increasing amounts of data and
users without compromising performance or availability. This can
be achieved through techniques such as partitioning, replication,
and clustering.
In summary, the characteristics of a database system include
data independence, concurrent access, data integrity, security, and
scalability. These characteristics ensure that the database system
is efficient, reliable, and able to handle the demands of modern
applications.


1.7. ARCHITECTURE OF DATABASE SYSTEMS


Database systems can have different architectures, depending on
the specific requirements of the application. The architecture of a
database system determines how the different components of the
system are organized, how they interact with each other, and how
the data is processed and stored. There are three main types of
database architectures, which are discussed in the following subsections.

KEYWORD
Client-server architecture refers to a system that
hosts, delivers, and manages most of the resources and
services that the client requests.

1.7.1. Client-Server Architecture

The client-server architecture is a common architecture for database
systems. In this architecture, the system is divided into two parts:
the client and the server (McLeod & Heimbigner, 1980). The client
is the user interface that interacts with the user, while the server
is responsible for storing and managing the data. The client sends
requests to the server, which processes the request and sends
back the results.

1.7.2. Tiered Architecture


The tiered architecture is a multi-layered architecture that separates
the application into three tiers: the user interface (presentation layer),
the business logic layer, and the data storage
layer (Hurson et al., 1989). The user interface communicates with
the business logic layer, which in turn communicates with the data
storage layer. Because each tier can be changed or scaled independently,
this architecture provides scalability and flexibility
to the system.

1.7.3. Distributed Architecture


In the distributed architecture, the database
is spread across multiple locations or sites.
The data is distributed across multiple servers, and each server is
responsible for a specific set of data. The servers are connected
through a network, and they work together to process user requests.
This architecture provides scalability and fault tolerance to the
system (Lam & Kuo, 2000) (Figure 1.11).
Each of these architectures has its own advantages and
disadvantages, and the choice of architecture depends on the
specific requirements of the application.
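
One simple way to make each server responsible for a specific set of data, as described for the distributed architecture, is to assign records to servers by hashing a key. The sketch below is a hedged illustration: the server names and the functions server_for and partition are hypothetical, and production systems typically use more elaborate schemes so that adding a server does not move most of the data.

```python
import hashlib

# Hypothetical site names used only for this example.
SERVERS = ["server-a", "server-b", "server-c"]


def server_for(key: str, servers=SERVERS) -> str:
    """Pick the server responsible for a key by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and map it onto the server list.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def partition(keys, servers=SERVERS) -> dict:
    """Group keys by the server that would store them."""
    placement = {server: [] for server in servers}
    for key in keys:
        placement[server_for(key, servers)].append(key)
    return placement
```

Because the hash is deterministic, every site can compute where a given record lives without consulting a central directory.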

Figure 1.11. Illustration of the distributed architecture.

Source: Katembo Kituta Ezéchiel et al. Creative Commons License.

1.8. QUERY PROCESSING AND OPTIMIZATION


Query processing and optimization are essential components of
database systems that ensure efficient and fast retrieval of data.
Query processing refers to the translation of a user’s query into
an efficient form that can be executed by the system. Query
optimization, on the other hand, involves selecting the most efficient
execution plan for a given query.

1.8.1. Query Processing Phases


The query processing phases are as follows:
1. Parsing: This is the first phase of query processing, where
the system checks the syntax of the query and creates a
parse tree (Park et al., 2013).
2. Optimization: In this phase, the system generates different
execution plans for the query and selects the most efficient
one based on the query cost model.
3. Code Generation: Once the optimized execution plan is
selected, the system generates the code to execute the
query (Antoshenkov & Ziauddin, 1996).

KEYWORD
Execution plan is generated when you execute any query
which necessarily includes the query along with the plan.

1.8.2. Query Optimization Techniques
There are several query optimization techniques used in database
systems, including:
1. Cost-based Optimization: This technique uses a cost
model to estimate the execution time and selects the plan
with the lowest estimated cost (Alom et al., 2009).
2. Rule-based Optimization: This technique uses a set of
predefined rules to generate execution plans.
3. Dynamic Programming: This technique solves the query
optimization problem by breaking it down into smaller sub-
problems (Ouzzani & Bouguettaya, 2004).
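As a minimal sketch of cost-based optimization, the following Python fragment scores two candidate plans with a toy cost model and picks the cheaper one. The `Plan` class, the weights, and the page and row estimates are all hypothetical; production optimizers derive such estimates from table statistics.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    name: str
    pages_read: int      # estimated disk pages touched
    rows_processed: int  # estimated rows examined by the CPU

# Hypothetical weights: one page read is assumed to cost as much as 100 row checks.
IO_WEIGHT = 1.0
CPU_WEIGHT = 0.01

def estimated_cost(plan):
    """Toy cost model: weighted sum of I/O and CPU work."""
    return IO_WEIGHT * plan.pages_read + CPU_WEIGHT * plan.rows_processed

def choose_plan(candidates):
    """Cost-based choice: return the candidate with the lowest estimated cost."""
    return min(candidates, key=estimated_cost)

candidates = [
    Plan("full table scan", pages_read=1000, rows_processed=100_000),
    Plan("index lookup", pages_read=12, rows_processed=40),
]
best = choose_plan(candidates)
print(best.name, estimated_cost(best))
```

A rule-based optimizer would instead apply a fixed preference (for example, "use an index when one exists") without estimating costs.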

1.8.3. Query Execution Plans


The execution plan describes how the system will execute the
query. It includes information on how the system will access the
data, the order in which the operations will be performed, and
how the results will be returned. The query execution plan is
generated during the query optimization phase and is critical to
the performance of the system. The system will execute the query
according to the plan, and any inefficiencies in the plan can lead
to slow performance and decreased efficiency.
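Many DBMSs let you inspect the execution plan they have chosen. The sketch below uses SQLite through Python's standard `sqlite3` module and its `EXPLAIN QUERY PLAN` statement; the `orders` table, the index name, and the `plan_for` helper are assumptions made for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.5), ("alice", 7.25)],
)

def plan_for(sql, params=()):
    """Return the human-readable detail (last column) of each plan step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

query = "SELECT total FROM orders WHERE customer = ?"
before = plan_for(query, ("alice",))   # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan_for(query, ("alice",))    # the optimizer can now search the index
print(before)
print(after)
```

Comparing the two plans shows how the same query can be executed through different access paths depending on what the optimizer has available.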
In conclusion, query processing and optimization are critical
components of a database system. They ensure efficient and fast
retrieval of data, and the performance of the system depends on
their proper implementation. A well-designed query processing and
optimization system can make a significant difference in the overall
performance and efficiency of a database system.

1.9. IMPORTANCE OF DATABASE SYSTEMS IN MODERN WORLD

Database systems play a crucial role in the modern world where
data has become an essential part of businesses, organizations,
and individuals. Some of the key reasons why database systems
are important in the modern world are discussed in the following subsections.

1.9.1. Data Management


Database systems help organizations manage large volumes of
data efficiently (Chaudhri et al., 2003). They provide a structured
way to store, manage, and retrieve data, ensuring data accuracy
and consistency.

KEYWORD
Database systems are software that manages collections of
electronic and digital records in order to store that information
and extract useful information from it.

1.9.2. Decision Making

Database systems provide a platform to analyze and process data,
which can help in making informed decisions (Wang et al., 2016).
For instance, businesses can analyze sales data to determine
which products are selling well and which ones need improvement.
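The sales example can be sketched as a single aggregate query. The `sales` table, its columns, and the sample rows below are hypothetical; the point is only that grouping and summing inside the database yields a ranking of products by revenue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, quantity INTEGER, unit_price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("widget", 10, 2.0), ("gadget", 1, 15.0), ("widget", 5, 2.0), ("gizmo", 2, 4.0)],
)

def revenue_by_product(connection):
    """Total units and revenue per product, best seller first."""
    return connection.execute(
        "SELECT product, SUM(quantity) AS units, "
        "SUM(quantity * unit_price) AS revenue "
        "FROM sales GROUP BY product ORDER BY revenue DESC, product"
    ).fetchall()

report = revenue_by_product(conn)
for product, units, revenue in report:
    print(product, units, revenue)
```

Products at the bottom of the report are the candidates that "need improvement" in the sense described above.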

1.9.3. Improved Efficiency


With a database system, data can be accessed and processed
faster, reducing the time required to perform routine tasks. This
improved efficiency can help organizations to be more productive
and profitable.

1.9.4. Cost Savings


Database systems can help businesses save money by reducing the
need for physical storage space, paper records, and manual data
entry. They can also reduce the likelihood of errors and inconsistencies,
which can be costly to correct.

1.9.5. Data Security


Database systems can help organizations secure sensitive data and
prevent unauthorized access. By implementing user authentication
and access control mechanisms, businesses can protect data from
internal and external threats (Malik & Patel, 2016).

1.9.6. Compliance
Many organizations are required to comply with regulatory
requirements such as GDPR, HIPAA, and PCI DSS. Database
systems can help ensure compliance by providing features such
as auditing, logging, and encryption (Wang, 2022).
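One common way to provide auditing is a trigger that records every change to sensitive data. The sketch below uses SQLite via Python's `sqlite3` module; the `patients` and `audit_log` tables and the trigger name are hypothetical, and an audit table on its own does not make a system compliant with any of the regulations named above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT NOT NULL, diagnosis TEXT);
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,
    row_id INTEGER NOT NULL,
    action TEXT NOT NULL,
    old_value TEXT,
    new_value TEXT,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Record every change to the diagnosis column.
CREATE TRIGGER patients_audit_update
AFTER UPDATE OF diagnosis ON patients
BEGIN
    INSERT INTO audit_log (table_name, row_id, action, old_value, new_value)
    VALUES ('patients', OLD.id, 'UPDATE', OLD.diagnosis, NEW.diagnosis);
END;
""")

conn.execute("INSERT INTO patients (name, diagnosis) VALUES ('A. Example', 'pending')")
conn.execute("UPDATE patients SET diagnosis = 'confirmed' WHERE id = 1")
log = conn.execute(
    "SELECT table_name, row_id, action, old_value, new_value FROM audit_log"
).fetchall()
print(log)
```

Because the trigger runs inside the database, the change is logged regardless of which application issued the update.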


1.9.7. Innovation
Database systems provide a foundation for innovation in areas
such as machine learning, artificial intelligence, and the internet
of things (Bradley et al., 2019). By leveraging data stored in a
database, businesses can gain insights into new opportunities,
products, and services.
Overall, the importance of database systems in the modern
world cannot be overstated. They provide a structured way to
manage and analyze data, enabling businesses and organizations
to make informed decisions and stay competitive in a data-driven
world.

ACTIVITY 1.1.
You are tasked with explaining the basic components of a database system to a
group of stakeholders. Describe the various components, such as the data model,
database schema, query language, and database management system, and explain
how they work together to store, manage, and retrieve data.


1.10. SUMMARY
The chapter on Introduction to Database Systems provides an overview of the fundamentals
of database systems. It starts by defining what database systems are and their importance
in the modern world. The chapter then delves into the history of database systems,
highlighting the major milestones that have led to the development of modern database
systems. The chapter also discusses the different components of database systems,
including DDL, DML, DQL, DCL, and DBMS. The discussion on each component provides
a detailed explanation of what it does and its role in the database system.
The chapter also covers data storage and retrieval, with a focus on primary storage,
secondary storage, indexing, query optimization, and query execution. Additionally, the
chapter highlights the key characteristics of database systems, such as data independence,
concurrent access, data integrity, security, and scalability. The architecture of database
systems is also discussed in detail, with an emphasis on the client-server architecture,
tiered architecture, and distributed architecture. Finally, the chapter provides an overview of
query processing and optimization, covering query processing phases, query optimization
techniques, and query execution plans.

REVIEW QUESTIONS
1. Define database systems and explain their importance.
2. Describe the components of database systems and their roles.
3. Explain the different types of data storage in database systems.
4. Discuss the characteristics of database systems and their significance.
5. Compare and contrast the different types of database architecture.
6. Describe the phases of query processing and explain their importance.
7. Explain the techniques used for query optimization.
8. Discuss the role of query execution plans in database systems.

MULTIPLE CHOICE QUESTIONS


1. Which of the following is not a component of database systems?
a. Data definition language
b. Data control language
c. Data access language
d. Database management system
2. Which of the following is a characteristic of database systems?
a. Scalability
b. Flexibility

c. Reliability
d. Speed
3. Which of the following is a type of database architecture?
a. Centralized architecture
b. Decentralized architecture
c. Distributed architecture
d. All of the above
4. Which of the following is used to define the structure of a database?
a. Data definition language
b. Data manipulation language
c. Data query language
d. Data control language
5. Which type of storage is volatile in nature?
a. Primary storage
b. Secondary storage
c. Both (a) and (b)
d. None of the above
6. Which of the following is not a phase of query processing?
a. Parsing
b. Optimization
c. Execution
d. Storage
7. Which type of architecture allows for scalability in database systems?
a. Client-server architecture
b. Tiered architecture
c. Distributed architecture
d. All of the above
8. Which of the following is not a characteristic of database systems?
a. Scalability
b. Reliability
c. Compatibility
d. Security

Answers to Multiple Choice Questions


1. (c); 2. (a); 3. (d); 4. (a); 5. (a); 6. (d); 7. (c); 8. (c)

REFERENCES
1. Abadi, D., Boncz, P., Harizopoulos, S., Idreos, S., & Madden, S., (2013). The design
and implementation of modern column-oriented database systems. Foundations and
Trends® in Databases, 5(3), 197–280.
2. Alom, B. M., Henskens, F., & Hannaford, M., (2009). Query processing and optimization
in distributed database systems. IJCSNS, 9(9), 143.
3. Anjard, R. P., (1994). The basics of database management systems (DBMS). Industrial
Management & Data Systems, 94(5), 11–15.
4. Antoshenkov, G., & Ziauddin, M., (1996). Query processing and optimization in Oracle
Rdb. The VLDB Journal, 5, 229–237.
5. Anwar, I., (2018). Penerjemahan Teks Bahasa Indonesia Menjadi Data Definition
Language (Ddl) Dengan Penanganan Kalimat Majemuk (2nd edn., pp. 4–10). Doctoral
dissertation, Universitas Komputer Indonesia.
6. Azeroual, O., Saake, G., & Schallehn, E., (2018). Analyzing data quality issues in
research information systems via data profiling. International Journal of Information
Management, 41, 50–56.
7. Barrett, T., & Edgar, R., (2006). [19] gene expression omnibus: Microarray data
storage, submission, retrieval, and analysis. Methods in Enzymology, 411, 352–369.
8. Bernstein, P. A., & Goodman, N., (1981). Concurrency control in distributed database
systems. ACM Computing Surveys (CSUR), 13(2), 185–221.
9. Bertino, E., & Sandhu, R., (2005). Database security-concepts, approaches, and
challenges. IEEE Transactions on Dependable and Secure Computing, 2(1), 2–19.
10. Bradley, D., Merrifield, M., Miller, K. M., Lomonico, S., Wilson, J. R., & Gleason,
M. G., (2019). Opportunities to improve fisheries management through innovative
technology and advanced data systems. Fish and Fisheries, 20(3), 564–583.
11. Brodsky, A., Bhot, M. M., Chandrashekar, M., Egge, N. E., & Wang, X. S., (2009). A
decisions query language (DQL) high-level abstraction for mathematical programming
over databases. In: Proceedings of the 2009 ACM SIGMOD International Conference
on Management of Data (Vol. 1, pp. 1059–1062).
12. Cai, L., & Zhu, Y., (2015). The challenges of data quality and data quality assessment
in the big data era. Data Science Journal, 14(1), 4–10.
13. Chaudhri, A. B., Rashid, A., & Zicari, R., (2003). XML Data Management: Native
XML and XML-Enabled Database Systems (Vol. 4, No. 1, pp. 2–9.). Addison-Wesley
Professional.
14. Chessa, S., & Maestrini, P., (2003). Dependable and secure data storage and retrieval
in mobile, wireless networks. In: DSN (Vol. 2003, pp. 207–216).
15. Connolly, T. M., & Begg, C. E., (2005). Database Systems: A Practical Approach to
Design, Implementation, and Management (2nd edn., pp. 1–15). Pearson Education.


16. Coronel, C., & Morris, S., (2016). Database systems: Design, implementation, &
management. Cengage Learning, 3(2), 2–8.
17. DeFazio, S., Krishnan, R., Srinivasan, J., & Zeldin, S., (2001). The importance of
extensible database systems for e-commerce. In: Proceedings 17th International
Conference on Data Engineering (Vol. 1, pp. 63–70). IEEE.
18. DeWitt, D., & Gray, J., (1992). Parallel database systems: The future of high
performance database systems. Communications of the ACM, 35(6), 85–98.
19. Dittrich, K. R., Gatziu, S., & Geppert, A., (1995). The active database management
system manifesto: A rule base of ADBMS features. In: Rules in Database Systems:
Second International Workshop, RIDS’95 Glyfada, Athens, Greece, September 25–27,
1995 Proceedings 2 (Vol. 1, pp. 1–17). Springer Berlin Heidelberg.
20. Elmasri, R., & Navathe, S. B., (2006). Database Systems: Models, Languages, Design,
and Application Programming (Vol. 4, No. 1, pp. 7–9). Pearson Education India.
21. El-Mehalawi, M., & Miller, R. A., (2003). A database system of mechanical components
based on geometric and topological similarity. Part I: Representation. Computer-
Aided Design, 35(1), 83–94.
22. Fikes, R., Hayes, P., & Horrocks, I., (2002). DQL-a Query Language for the Semantic
Web (2nd edn., pp. 4–9). Knowledge Systems Laboratory.
23. Ghali, M. J. A., & Abu-Naser, S. S., (2019). ITS for Data Manipulation Language
(DML) Commands Using SQLite (3rd edn., pp. 6–9).
24. Grad, B., & Bergin, T. J., (2009). Guest editors’ introduction: History of database
management systems. IEEE Annals of the History of Computing, 31(4), 3–5.
25. Heanue, J. F., Bashaw, M. C., & Hesselink, L., (1994). Volume holographic storage
and retrieval of digital data. Science, 265(5173), 749–752.
26. Hellerstein, J. M., & Stonebraker, M., (2005). Readings in Database Systems (Vol.
1, pp. 2–5). MIT press.
27. Hurson, A. R., Miller, L. L., Pakzad, S. H., Eich, M. H., & Shirazi, B., (1989).
Parallel architectures for database systems. In: Advances in Computers (Vol. 28,
pp. 107–151). Elsevier.
28. Jäkel, T., Kühn, T., Voigt, H., & Lehner, W., (2014). RSQL-a query language for
dynamic data types. In: Proceedings of the 18th International Database Engineering
& Applications Symposium (Vol. 1, pp. 185–194).
29. Jukic, N., Vrbsky, S., Nestorov, S., & Sharma, A., (2014). Database Systems:
Introduction to Databases and Data Warehouses (Vol. 1, p. 400). Pearson.
30. Kuhlenkamp, J., Klems, M., & Röss, O., (2014). Benchmarking scalability and
elasticity of distributed database systems. Proceedings of the VLDB Endowment,
7(12), 1219–1230.
31. Lam, K. Y., & Kuo, T. W., (2000). Real-Time Database Systems: Architecture and
Techniques (2nd edn., Vol. 593, pp. 4–10). Springer Science & Business Media.


32. Levine, H. G., (1985). Principles of data storage and retrieval for use in qualitative
evaluations. Educational Evaluation and Policy Analysis, 7(2), 169–186.
33. Litwin, W., & Abdellatif, A., (1987). An overview of the multi-database manipulation
language MDSL. Proceedings of the IEEE, 75(5), 621–632.
34. Malik, M., & Patel, T., (2016). Database security-attacks and control methods.
International Journal of Information, 6(1, 2), 175–183.
35. Maryanski, F., Bedell, J., Hoelscher, S., Hong, S., McDonald, L., Peckham, J., &
Stock, D., (1986). The data model compiler: A tool for generating object-oriented
database systems. In: Proceedings on the 1986 International Workshop on Object-
Oriented Database Systems (Vol. 1, pp. 73–84).
36. McLeod, D., & Heimbigner, D., (1980). A federated architecture for database systems.
In: Proceedings of the May 19–22, 1980, National Computer Conference (Vol. 1,
pp. 283–289).
37. Miklau, G., Levine, B. N., & Stahlberg, P., (2007). Securing history: Privacy and
accountability in database systems. In: CIDR (2nd edn., pp. 387–396).
38. Naughton, J., (1985). Data independent recursion in deductive databases. In:
Proceedings of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database
Systems (2nd edn., pp. 267–279).
39. Olle, T. W., (2003). Database management system (DBMS). In: Encyclopedia of
Computer Science (Vol. 1, pp. 517–520).
40. Ouzzani, M., & Bouguettaya, A., (2004). Query processing and optimization on the
web. Distributed and Parallel Databases, 15(1), 187–218.
41. Park, H., Pang, R., Parameswaran, A., Garcia-Molina, H., Polyzotis, N., & Widom,
J., (2013). An overview of the deco system: Data model and query language; query
processing and optimization. ACM SIGMOD Record, 41(4), 22–27.
42. Rao, T. R., Mitra, P., Bhatt, R., & Goswami, A., (2019). The big data system,
components, tools, and technologies: A survey. Knowledge and Information Systems,
60(1), 1165–1245.
43. Rawat, B., & Purnama, S., (2021). MySQL database management system (DBMS) on
FTP site LAPAN bandung. International Journal of Cyber and IT Service Management,
1(2), 173–179.
44. Sawyer, P., & Mariani, J. A., (1995). Database systems: Challenges and opportunities
for graphical HCI. Interacting with Computers, 7(3), 273–303.
45. Shaw, S., Vermeulen, A. F., Gupta, A., Kjerrumgaard, D., Shaw, S., Vermeulen, A.
F., & Kjerrumgaard, D., (2016). Data manipulation language (DML). Practical Hive:
A Guide to Hadoop’s Data Warehouse System, 1, 77–98.
46. Sheth, A. P., & Larson, J. A., (1990). Federated database systems for managing
distributed, heterogeneous, and autonomous databases. ACM Computing Surveys
(CSUR), 22(3), 183–236.


47. Sokolinsky, L. B., (2004). Survey of architectures of parallel database systems.
Programming and Computer Software, 30, 337–346.
48. Ulusoy, Ö., (1995). Research issues in real-time database systems: Survey paper.
Information Sciences, 87(1–3), 123–151.
49. Wang, H., Xu, Z., Fujita, H., & Liu, S., (2016). Towards felicitous decision making: An
overview on challenges and trends of big data. Information Sciences, 367, 747–765.
50. Wang, L., (2022). Providing compliance in critical computing systems. In: System
Dependability and Analytics: Approaching System Dependability from Data, System
and Analytics Perspectives (Vol. 1, pp. 191–206). Cham: Springer International
Publishing.
51. Wang, R. Y., & Strong, D. M., (1996). Beyond accuracy: What data quality means
to data consumers. Journal of Management Information Systems, 12(4), 5–33.
52. Wells, G., (2001). Data definition language. In: Code Centric: T-SQL Programming
with Stored Procedures and Triggers (Vol. 1, pp. 35–70). Berkeley, CA: Apress.
53. Widom, J., & Ceri, S., (1995). Active Database Systems: Triggers and Rules for
Advanced Database Processing (Vol. 2, No. 1, pp. 5–10). Morgan Kaufmann.
54. Wu, M., (2018). Property Graph Type System and Data Definition Language (2nd
edn., pp. 5–9). arXiv preprint arXiv:1810.08755.
55. Wyatt, J. C., (1994). Clinical data systems, part 2: Components and techniques.
The Lancet, 344(8937), 1609–1614.
56. Xue, J., Xu, C., & Bai, L., (2019). DStore: A distributed system for outsourced data
storage and retrieval. Future Generation Computer Systems, 99(1), 106–114.
57. Yesin, V., Karpinski, M., Yesina, M., Vilihura, V., & Warwas, K., (2021). Ensuring
data integrity in databases with the universal basis of relations. Applied Sciences,
11(18), 8781.
58. Yeung, A. K., & Hall, G. B., (2007). Spatial Database Systems: Design, Implementation
and Project Management (Vol. 87, pp. 4–8). Springer Science & Business Media.
59. Yu, J. H., & Zhou, Z. M., (2019). Components and development in big data system:
A survey. Journal of Electronic Science and Technology, 17(1), 51–72.

CHAPTER 2

DATABASE DEVELOPMENT PROCESS

UNIT INTRODUCTION
A core idea in software engineering is to subdivide the development process into
a sequence of steps or stages, each of which emphasizes one aspect of the
development. Taken together, these steps are known as the software development
life cycle (SDLC; Vela et al., 2004). A software product moves through the SDLC until
it is no longer in use. Ideally, each phase of the life cycle can be checked
for correctness before moving on to the next phase.

Learning Objectives
At the end of this chapter, readers will be able to understand:
• Development life cycle or waterfall cycle of the database;
• Database life cycle;
• Requirement gathering as the first phase of development;
• Analysis phase of the database development;
• Logical design database development;
• Implementation of the database development cycle;
• Realizing the design of the development process;
• Populating the database;
• Guiding principles for developing an ER diagram.

Key Terms
• Computer system
• Database
• Design
• Development
• Life cycle
• Logical
• Maintenance
• Requirement
• Waterfall


2.1. DEVELOPMENT LIFE CYCLE – WATERFALL


We begin with an outline of the waterfall model as it is
presented in most software engineering textbooks. Figure 2.1
shows a common waterfall model
that can be applied to nearly any computer system development
(Kramer, 2018). It depicts a process made up of an ordered sequence
of steps. The output from one phase acts as the input for the
succeeding phase, and each phase has to be completed before
moving on to the next.

Figure 2.1. Waterfall model.

Source: Ahmad Mukhtar, Creative Commons License.

The waterfall model can be used to identify the required tasks,
together with the input and output of each step. The main
activities are summarized below:
• Establishing requirements involves consultation with, and
agreement among, stakeholders regarding what they want
from the system, expressed as a statement of requirements
(Rastogi, 2015).
• Analysis begins by considering the statement of
requirements and finishes by producing a system
specification. This specification formally represents
what a system ought to do, expressed in terms that are
independent of how it might be realized.
• Design starts with a system specification and produces
design documents. It gives a detailed account of how the
system should be constructed (Bassil, 2012).
• Implementation is the construction of a computer system
according to a given design document, taking into account
the environment in which the system will operate, for
example, the specific software or hardware available for
development. It can be staged, with an initial system that
can be validated and tested before the release of a final
system.
• Testing compares the implemented system with the design
documents and requirements specification (Trivedi &
Sharma, 2013). It produces an acceptance report or a list
of bugs and errors that require a review of the analysis,
design, and implementation processes to correct. Testing
is typically the task that leads the waterfall model to
iterate through the life cycle.
• Maintenance is the process of dealing with changes in the
requirements or in the implementation environment
(Hardyanto et al., 2017). It involves fixing bugs or porting
the system to another environment (e.g., migrating a system
from a standalone PC to a UNIX workstation). Because
maintenance requires analysis of the required changes,
design of a solution, and implementation and testing of
that solution over the lifetime of a software system, the
waterfall life cycle is revisited repeatedly.

KEYWORD
Waterfall model is a breakdown of project activities into linear
sequential phases, meaning they are passed down onto each other,
where each phase depends on the deliverables of the previous one
and corresponds to a specialization of tasks.

2.2. DATABASE LIFE CYCLE


The waterfall cycle can be used as the basis for a model of
database development that incorporates three assumptions:
• Database development can be separated: the specification
and creation of a schema to define the data in a database
can be separated from the user processes that make use of
the database (Saouter & Hoof, 2002);
• The three-schema architecture can be used as a basis for
distinguishing the activities associated with a schema; and
• The constraints that enforce the semantics of the data can
be represented once within the database, instead of within
each user process that uses the data (Frischknecht et al.,
2005).
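The third assumption, that constraints are declared once in the database rather than re-implemented in every user process, can be sketched with SQL constraints. The `course` and `enrolment` tables, the 1 to 60 credit range, and the `try_execute` helper are hypothetical choices made for illustration, using SQLite through Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE course (
    code TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    credit_points INTEGER NOT NULL CHECK (credit_points BETWEEN 1 AND 60)
);
CREATE TABLE enrolment (
    student_id INTEGER NOT NULL,
    course_code TEXT NOT NULL REFERENCES course(code),
    PRIMARY KEY (student_id, course_code)
);
""")

def try_execute(sql, params=()):
    """Run one statement and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_execute("INSERT INTO course VALUES (?, ?, ?)", ("DB101", "Databases", 30))
bad_points = try_execute("INSERT INTO course VALUES (?, ?, ?)", ("DB102", "Advanced", 0))
bad_ref = try_execute("INSERT INTO enrolment VALUES (?, ?)", (1, "NOPE"))
print(ok, bad_points, bad_ref)
```

Every application that writes to these tables is subject to the same rules, so none of them needs its own copy of the validation logic.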

The resulting model describes the activities of database
development and the output obtained from each. It is not tied to
one particular approach and can be applied to any class of DBMS
(Jungbluth, 2005).
The process of database application development involves
gathering real-world requirements, analyzing those requirements,
designing the data and functions of the system, and implementing
the operations in the system.

2.3. REQUIREMENTS GATHERING


The first phase is requirements gathering. It requires the database
designers to meet the clients to understand the proposed system
and to identify and document the data and functional requirements
(Holtzblatt & Beyer, 1995). This process results in
a document containing all the detailed requirements
gathered from the users.

Remember
The database life cycle refers to the various stages involved in
designing, developing, implementing, and maintaining a database
system.

Figure 2.2. Illustration of the requirement gathering and its ways.

Source: NMG Technologies, Creative Commons License.


Establishing requirements is a process of consultation with, and
agreement among, the users regarding the data they wish to store,
together with an agreement as to the meaning and interpretation of
the data elements. The data administrator plays a major role in
this process, overseeing the business, legal, and ethical
issues inside the organization that affect the data requirements
(Bekker et al., 2003) (Figure 2.2).
A data requirements document describes the demands of the users.
So that it is easy to understand, this document should not be
highly encoded or overly formal. It should provide a summary of all the
user requirements rather than just a compilation of individual needs,
because the purpose is to develop a single shared database
(Newell et al., 2006).
The requirements should describe the data items, their attributes,
the constraints that apply to them, and the relationships between these
items, rather than how the data is to be processed.

KEYWORD
Data analysis is the process of inspecting, cleansing, transforming,
and modeling data with the goal of discovering useful information,
informing conclusions, and supporting decision-making.

2.4. ANALYSIS

Data analysis starts from the statement of data requirements and produces
a conceptual data model. The purpose of analysis is to
obtain a detailed description of the data that will meet
the user requirements, so that both the high-level and low-level
properties of the data and their use are dealt with (Batini et al., 1986).
These include properties such as the possible range
of values permitted for attributes, for example, in
a school database, the course code, the course title, and the credit
points.
The conceptual data model gives a shared, formal representation of
what is communicated between the developers and the clients throughout
database development. It focuses on the data in a database, regardless
of the eventual use of that data in user processes or the
implementation of the data in specific computer environments (Ratner, 2003). So,
a conceptual data model is concerned with the structure and
meaning of data rather than the details of implementation.
The conceptual data model, then, is a formal representation
of what data a database must contain and the constraints the
data must satisfy. It should be expressed in
terms that are independent of how the model might be
implemented. Consequently, analysis focuses on the questions, “What is
necessary?” not “In what way is it accomplished?”
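To make the "what, not how" distinction concrete, the sketch below records part of a conceptual model for the school example as plain Python data, with no reference to tables, SQL types, or any DBMS. The `Attribute` and `Entity` classes and the wording of the allowed values are hypothetical; a real project would more likely use an ER diagram.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attribute:
    name: str
    description: str
    allowed: str  # plain-language statement of the permitted values

@dataclass(frozen=True)
class Entity:
    name: str
    identifier: str   # attribute that distinguishes one occurrence from another
    attributes: tuple

course = Entity(
    name="Course",
    identifier="course_code",
    attributes=(
        Attribute("course_code", "code that identifies the course",
                  "assumed format: letters followed by digits"),
        Attribute("course_title", "human-readable title", "non-empty text"),
        Attribute("credit_points", "credit awarded on completion",
                  "assumed: positive whole number"),
    ),
)

def describe(entity):
    """Render the entity as text that clients and developers can both review."""
    lines = [f"{entity.name} (identified by {entity.identifier})"]
    lines += [f"  {a.name}: {a.description}; allowed values: {a.allowed}"
              for a in entity.attributes]
    return "\n".join(lines)

print(describe(course))
```

Nothing in this description commits the designer to a particular storage format, which is the property the analysis phase aims for.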

2.5. LOGICAL DESIGN


Database design starts with a conceptual data model and
produces a specification of a logical schema; this will usually
determine the specific type of database system (relational, network,
object-oriented) that is required. The relational representation
is still independent of any specific DBMS; it is another
conceptual data model (Teorey et al., 2011).

Invest in logical design planning to create an efficient and easily
maintainable database structure.

We can use a relational representation of the conceptual
data model as input to the logical design process. The
output of this phase is a detailed relational specification,
the logical schema, of every table and constraint needed to satisfy
the description of the data in the conceptual data model (Banek
et al., 2005). During this design activity, choices are
made as to which tables are most suitable for representing
the data in the database. These choices must take into account
various design criteria including, for example, control of
duplication, flexibility for change, and how best to represent
the constraints. It is the tables defined by the logical
schema that determine what data should be stored, and in what
manner they could be operated in the databases (Figure 2.3).
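Control of duplication, one of the design criteria mentioned above, can be illustrated with a small relational design in which each course title is stored exactly once. The table and column names below are hypothetical, and the sketch uses SQLite through Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course (course_code TEXT PRIMARY KEY, course_title TEXT NOT NULL);
CREATE TABLE enrolment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_code TEXT NOT NULL REFERENCES course(course_code),
    PRIMARY KEY (student_id, course_code)
);
INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO course VALUES ('DB101', 'Databases');
INSERT INTO enrolment VALUES (1, 'DB101'), (2, 'DB101');
""")

# The title is stored once, so a single UPDATE changes it for every enrolment.
conn.execute(
    "UPDATE course SET course_title = 'Database Systems' WHERE course_code = 'DB101'"
)
rows = conn.execute("""
    SELECT s.name, c.course_title
    FROM enrolment AS e
    JOIN student AS s ON s.student_id = e.student_id
    JOIN course AS c ON c.course_code = e.course_code
    ORDER BY s.name
""").fetchall()
print(rows)
```

Had the title been copied into every enrolment row, the same change would have required updating many rows, with the risk of leaving some inconsistent.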

Figure 2.3. Sequence of the logical design in database.

Source: Mohamed Rasheed Gomaa, Creative Commons License.

Database designers who use relational databases and SQL may
be tempted to go straight to implementation once they have produced
a conceptual data model. However, such a direct
transformation of the relational representation into SQL tables
does not necessarily result in a database that has
all the desirable properties: integrity, completeness, flexibility,
usability, and efficiency. A good conceptual data model is
an important first step towards a database with these
properties, but that does not mean that a direct
transformation to SQL tables automatically yields a good database
(Halevi et al., 1995). This first step will accurately represent the
tables and constraints needed to satisfy the conceptual data model
description, and so will satisfy the completeness and integrity
requirements; however, it might be inflexible or offer poor usability. The
first design is then flexed to improve the quality of the
database design. Flexing is a term intended to
capture the simultaneous ideas of bending something for a
different purpose and weakening aspects of it as
it is bent.

Figure 2.4. A summary of the repetitive stages involved in designing a database.

Source: Kate Eby, Creative Commons License.



Figure 2.4 summarizes the iterative (repeated) stages
involved in designing a database, based on the overview given
(Higa & Sheng, 1989). Its main purpose is to distinguish
the general issue of which tables should be used from the
detailed definition of the constituent parts of each table.
The tables are considered one at a time, even though
they are not independent of each other. Each iteration that involves
a revision of the tables leads to a new design; collectively,
these are commonly referred to as second-cut designs, even if
the process iterates for more than one loop.
First, for a given conceptual data model, it
is not necessary that every user requirement it represents be satisfied by
a single database (Kalninˆ & Fricnoviˆs, 1979). There can be
several reasons for developing more than one database, for
instance, the need for independent operation in different
locations or departmental control over “their” data. However, if
the collection of databases contains duplicated data and users
need to access data in more than one database, then there are
possible reasons for a single database to satisfy several
requirements; otherwise, the issues associated with data
replication and distribution must be examined.

KEYWORD
Database development is designing, creating a database or data
model, and analyzing requirements and their intents as raw data.

Second, one of the assumptions about database
development is that we can separate the development of a database from the
development of the user processes that make use of it. This is
based on the expectation that, once a database has been
implemented, all of the data required by currently identified user
processes have been defined and can be accessed; but we
also need the flexibility to meet future
changes in requirements (Khan, 1984). In developing a database for
some applications, it may be possible to predict the common
requests that will be presented to the database, so that we can optimize
our design for the most common requests.
Third, at a detailed level, many aspects of database
design and implementation depend on the specific DBMS being
used (Romberg, 1981). If the choice of DBMS is fixed or
made prior to the design task, that choice can be used
to determine design criteria rather than waiting until
implementation. That is, it is possible to incorporate design decisions for a
specific DBMS rather than producing a generic design and then
tailoring it to the DBMS during implementation. It is
not uncommon to find that a single design cannot simultaneously
satisfy all the properties of a good database. That is
why it is important that the designer has prioritized these properties
(typically using information from the requirements specification); for example,
to decide whether integrity is more important than efficiency and
whether usability is more important than flexibility in a given
development.
At the end of the design stage, the logical schema will be specified
by SQL data definition language (DDL) statements, which describe
the database that needs to be implemented to meet the users’
requirements.

KEYWORD
Data definition language (DDL), also called data description
language, is a syntax for creating and modifying database objects
such as tables, indices, and users.

2.6. IMPLEMENTATION

Implementation involves the construction of a database according
to the specification of a logical schema. This includes the
specification of an appropriate storage schema, security
enforcement, external schema, and so on. Implementation is heavily
influenced by the choice of available DBMSs, database tools, and
operating environments (Owens, 2007). There are additional tasks
beyond simply creating a database schema and implementing the
constraints: data must be entered into the tables, issues relating
to the users and user processes must be addressed, and the
management activities associated with wider aspects of corporate
data management must be supported (Schwaber, 1997). In keeping
with the DBMS approach, we want as many of these concerns as
possible to be addressed within the DBMS. We look at some of
these concerns briefly now.
In practice, implementation of the logical schema in a given
DBMS requires very detailed knowledge of the specific features
and facilities that the DBMS offers. In an ideal world, and in
keeping with good software engineering practice, the first stage of
implementation would involve matching the design requirements
with the best available implementation tools and then using those
tools for the implementation (Subramanian et al., 2007). In database
terms, this might involve choosing vendor products with the DBMS
and SQL variants best suited to the database we need to implement.
However, we do not live in an ideal world and, more often than not,
hardware choices and decisions regarding the DBMS will have been
made well in advance of any consideration of the database design.
Consequently, implementation can involve additional flexing of the
design to overcome any hardware or software limitations.

2.7. REALIZING THE DESIGN


After the logical design has been created, we need our database
to be created according to the definitions we have produced. For
an implementation with a relational DBMS, this will probably involve
the use of SQL to create tables and constraints that satisfy the
logical schema description, and the choice of an appropriate storage
schema (if the DBMS permits that level of control) (Cheng et al.,
2018).
One way to achieve this is to write the appropriate SQL DDL
statements into a file that can be executed by a DBMS, so that
there is an independent record, a text file, of the SQL statements
defining the database (Joannou et al., 2020). Another method is to
work interactively using a database tool, such as SQL Server
Management Studio or Microsoft Access. Whatever mechanism is
used to implement the logical schema, the result is that a database,
with tables and constraints, is defined but contains no data for the
user processes.
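
The script-based approach can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for whatever DBMS is actually chosen, and the department and employee tables are hypothetical, not taken from this chapter. The DDL is kept as plain text, as it would be in a script file, and executed in one step; afterwards the tables and constraints exist but hold no rows.

```python
import sqlite3

# DDL kept as plain text, as it would be in a script file under version
# control. Table and column names are hypothetical examples.
DDL_SCRIPT = """
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL CHECK (salary >= 0),
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
"""


def realize_design(conn, ddl):
    """Execute every DDL statement in the script against the connection."""
    conn.executescript(ddl)


conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
realize_design(conn, DDL_SCRIPT)

# The schema is now defined, but no table contains any data yet.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
row_counts = {
    t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables
}
```

Keeping the DDL in a text file gives the independent record described above, and the same script can be replayed to recreate the empty database in a test environment.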

2.8. POPULATING THE DATABASE


After a database has been created, there are two ways of
populating the tables: either from existing data or through the
use of the user applications developed for the database (Bisbal
et al., 2005).
For some tables, there may be existing data from another
database or from data files. For instance, in establishing a
database for a clinic, you would expect that there are already some
records of all the staff that have to be included in the database.
Data may also be brought in from an outside agency (address lists
are commonly brought in from external companies) or produced
during a large data entry task (converting hard-copy manual records
into computer files can be done by a data entry agency)
(Greene-Colozzi et al., 2021). In such situations, the simplest
approach to populating the database is to use the import and export
facilities found in the DBMS.

Did you Know?
Populating a database involves inserting data into tables, either
manually or through automated processes such as ETL (extract,
transform, load).

Facilities to import and export data in various standard
formats are usually available (these functions are also known in
some systems as loading and unloading data) (Westfechtel, 1999).
Importing enables a file of data to be copied directly into a table.
When data are held in a file format that is not appropriate for the
import function, it is necessary to prepare an application program
that reads in the old data, transforms them as necessary, and then
inserts them into the database using SQL code written specifically
for that purpose. The transfer of large quantities of existing data
into a database is referred to as a bulk load. Bulk loading may
involve very large quantities of data being loaded, one table at a
time, so you may find that there are DBMS facilities to postpone
constraint checking until the end of the bulk load.
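
The read, transform, and insert pattern, together with postponed constraint checking, might look like the following sketch. It makes several assumptions: SQLite (through Python's sqlite3 module) stands in for the target DBMS, the legacy CSV text and the staff and department tables are invented for illustration, and constraint checking is postponed by declaring the foreign key as DEFERRABLE INITIALLY DEFERRED, which is only one of the mechanisms a DBMS may offer.

```python
import csv
import io
import sqlite3

# Hypothetical legacy export: untidy names, department given by code.
LEGACY_STAFF_CSV = """staff_no,name,dept
101,  alice SMITH ,CARD
102,bob jones,RAD
"""
LEGACY_DEPTS = [("CARD", "Cardiology"), ("RAD", "Radiology")]

SCHEMA = """
CREATE TABLE department (
    code TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE staff (
    staff_no INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    dept     TEXT NOT NULL
             REFERENCES department(code) DEFERRABLE INITIALLY DEFERRED
);
"""


def bulk_load(conn):
    # Read the old data and transform it (trim and normalize the names).
    reader = csv.DictReader(io.StringIO(LEGACY_STAFF_CSV))
    staff_rows = [
        (int(r["staff_no"]), r["name"].strip().title(), r["dept"].strip())
        for r in reader
    ]
    # Load one table at a time. Staff rows go in before their departments
    # exist; the deferred foreign key is only checked at commit time.
    conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", staff_rows)
    conn.executemany("INSERT INTO department VALUES (?, ?)", LEGACY_DEPTS)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if enabled
conn.executescript(SCHEMA)
bulk_load(conn)
staff = conn.execute(
    "SELECT staff_no, name, dept FROM staff ORDER BY staff_no").fetchall()
```

Because the check is deferred, the tables can be loaded in any order; a row that still references a missing department would cause the commit itself to fail.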

2.9. GUIDING PRINCIPLES FOR THE DEVELOPMENT OF AN ER DIAGRAM

The following general guidelines will assist in developing a
strong basis for the actual database design (the logical model)
(Figure 2.5).

Figure 2.5. An example of an ER diagram.

Source: Ravikiran, Creative Commons License.

• Document all entities discovered during the
information-gathering stage (Chen, 1997).
• Document all attributes that belong to each entity. Select
candidate and primary keys. Ensure that all non-key
attributes for each entity are fully functionally dependent
on the primary key (Teorey et al., 1986).
• Develop an initial ER diagram and review it with
appropriate personnel (remember that this is an iterative
process).
• Create new entities (tables) for multi-valued attributes and
repeating groups. Incorporate these new entities (tables)
in the ER diagram. Review with appropriate personnel
(Masri et al., 2008).
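
The last guideline, moving a multi-valued attribute into its own entity, can be illustrated with a small sketch. The function name, the employee records, and the phones attribute below are all hypothetical; the point is only to show how one record with several values becomes one parent row plus one child row per value, linked by the parent's key.

```python
def split_multivalued(rows, key, attr, sep=";"):
    """Split a multi-valued attribute out of each row.

    Returns (parents, children): the parent rows without the attribute,
    and one child row per individual value, carrying the parent's key.
    """
    parents, children = [], []
    for row in rows:
        parents.append({k: v for k, v in row.items() if k != attr})
        for value in row[attr].split(sep):
            value = value.strip()
            if value:  # skip empty entries such as a trailing separator
                children.append({key: row[key], attr: value})
    return parents, children


# Hypothetical employee records with a multi-valued "phones" attribute.
employees = [
    {"emp_id": 1, "name": "Ann", "phones": "555-0101; 555-0102"},
    {"emp_id": 2, "name": "Ben", "phones": "555-0200"},
]

employee_table, phone_table = split_multivalued(employees, "emp_id", "phones")
```

Here employee_table holds one row per employee and phone_table holds one row per phone number, which corresponds to adding a new phone entity with a one-to-many relationship to the employee entity in the ER diagram.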

ACTIVITY 2.1.
Imagine you are part of a team tasked with developing a database system for a
startup company. Describe the stages involved in the database development process,
including requirements gathering, design, implementation, testing, and maintenance.

2.10. SUMMARY
This chapter on the database development process provides an overview of the
development life cycle of a database. It covers the distinct stages of database
development, including requirements gathering, analysis, logical design, implementation,
realizing the design, and populating the database, and it closes with guiding principles
for developing an ER diagram.

REVIEW QUESTIONS
1. What is the development life cycle of a database? Explain the different phases
involved in the development process.
2. What is requirement gathering? Why is it important in the database development
process?
3. What is the analysis phase of the database development process? Explain its
significance.
4. What is the logical design stage of the database development process? How is
it different from the physical design phase?
5. What is the implementation phase of the database development cycle? Explain
its importance.
6. What is realizing the design in the database development process? How is it
different from the logical and physical design phases?

MULTIPLE CHOICE QUESTIONS


1. The first phase of the development life cycle of a database is:
a. Analysis
b. Design
c. Requirement gathering
d. Implementation
2. Which phase of database development involves the creation of an ER diagram?
a. Analysis
b. Design
c. Requirement gathering
d. Implementation
3. Which phase of the database development cycle involves creating the physical
design of the database?
a. Logical design
b. Analysis
c. Realizing the design
d. Implementation
4. What is the last phase of the database development cycle?
a. Requirement gathering
b. Analysis
c. Design
d. Implementation
5. Which of the following guidelines should be followed while creating an ER
diagram?
a. Use singular nouns for entity names
b. Use plurals for relationship names
c. Use diamonds to represent entities
d. All of the above

Answers to Multiple Choice Questions


1. (c); 2. (b); 3. (c); 4. (d); 5. (a)

REFERENCES
1. Banek, M., Skocir, Z., & Vrdoljak, B., (2005). Logical design of data warehouses
from XML. In: ConTEL (Vol. 5, pp. 289–295).
2. Bassil, Y., (2012). A Simulation Model for the Waterfall Software Development Life
Cycle, 1, 3–6.
3. Batini, C., Lenzerini, M., & Navathe, S. B., (1986). A comparative analysis of
methodologies for database schema integration. ACM Computing Surveys (CSUR),
18(4), 323–364.
4. Bekker, M., Beusmans, J., Keyson, D., & Lloyd, P., (2003). KidReporter: A user
requirements gathering technique for designing with children. Interacting with
Computers, 15(2), 187–202.
5. Bisbal, J., Grimson, J., & Bell, D., (2005). A formal framework for database sampling.
Information and Software Technology, 47(12), 819–828.
6. Chen, P. P. S., (1997). English, Chinese and ER diagrams. Data & Knowledge
Engineering, 23(1), 5–16.
7. Cheng, B., Zhang, J., Hancke, G. P., Karnouskos, S., & Colombo, A. W., (2018).
Industrial cyberphysical systems: Realizing cloud-based big data infrastructures.
IEEE Industrial Electronics Magazine, 12(1), 25–35.

8. Frischknecht, R., Jungbluth, N., Althaus, H. J., Doka, G., Dones, R., Heck, T.,
& Spielmann, M., (2005). The ecoinvent database: Overview and methodological
framework (7 pp). The International Journal of Life Cycle Assessment, 10(1), 3–9.
9. Greene-Colozzi, E. A., Freilich, J. D., & Chermak, S. M., (2021). Developing open-
source databases from online sources to study online and offline phenomena.
Researching Cybercrimes: Methodologies, Ethics, and Critical Approaches (2nd edn.,
pp. 169–190).
10. Halevi, G., & Weill, R. D., (1995). Logical design of a process
plan. Principles of Process Planning: A Logical Approach, 1, 15–35.
11. Hardyanto, W., Purwinarko, A., Sujito, F., & Alighiri, D., (2017). Applying an MVC
framework for the system development life cycle with waterfall model extended. In:
Journal of Physics: Conference Series (Vol. 824, No. 1, p. 012007). IOP Publishing.
12. Higa, K., & Sheng, O. R. L., (1989). An object-oriented methodology for end-user
logical database design: The structured entity model approach. In: [1989] Proceedings
of the Thirteenth Annual International Computer Software & Applications Conference
(Vol. 1, pp. 365–373). IEEE.
13. Holtzblatt, K., & Beyer, H. R., (1995). Requirements gathering: The human factor.
Communications of the ACM, 38(5), 31–32.
14. Joannou, D., Kalawsky, R., Martínez-García, M., Fowler, C., & Fowler, K., (2020).
Realizing the role of permissioned blockchains in a systems engineering lifecycle.
Systems, 8(4), 41–44.
15. Jungbluth, N., (2005). Life cycle assessment of crystalline photovoltaics in the Swiss
ecoinvent database. Progress in Photovoltaics: Research and Applications, 13(5),
429–446.
16. Kalninˆ, J. J., & Fricnoviˆs, G. F., (1979). On the development of an interactive
system for the logical design of discrete devices. In: Software for Computer Control
(Vol. 1, pp. 283–286). Pergamon.
17. Khan, A. A., (1984). A formal technique for the logical design of organizational
information systems (Vol. 1, pp. 5–10). Doctoral dissertation, University of York.
18. Kramer, M., (2018). Best practices in systems development lifecycle: An analyses
based on the waterfall model. Review of Business & Finance Studies, 9(1), 77–84.
19. Masri, K., Parker, D., & Gemino, A., (2008). Using iconic graphics in entity-relationship
diagrams: The impact on understanding. Journal of Database Management (JDM),
19(3), 22–41.
20. Newell, A. F., Carmichael, A., Morgan, M., & Dickinson, A., (2006). The use of
theatre in requirements gathering and usability studies. Interacting with Computers,
18(5), 996–1011.
21. Owens, J. D., (2007). Why do some UK SMEs still find the implementation of
a new product development process problematical? An exploratory investigation.
Management Decision, 1, 2–6.

22. Rastogi, V., (2015). Software development life cycle models-comparison, consequences.
International Journal of Computer Science and Information Technologies, 6(1),
168–172.
23. Ratner, B., (2003). Statistical Modeling and Analysis for Database Marketing: Effective
Techniques for Mining Big Data (Vol. 1, pp. 6–9). CRC Press.
24. Romberg, F. A., (1981). A Logical Design Methodology for Complex Databases Such
as a Manufacturing Operations Database (Vol. 1, pp. 4–8). Southern Methodist
University.
25. Saouter, E., & Hoof, G. V., (2002). A database for the life-cycle assessment of Procter
& Gamble laundry detergents. The International Journal of Life Cycle Assessment,
7(1), 103–114.
26. Schwaber, K., (1997). Scrum development process. In: Business Object Design and
Implementation: OOPSLA’95 Workshop Proceedings 16 October 1995, Austin, Texas
(Vol. 1, pp. 117–134). Springer London.
27. Subramanian, G. H., Jiang, J. J., & Klein, G., (2007). Software quality and IS project
performance improvements from software development process maturity and IS
implementation strategies. Journal of Systems and Software, 80(4), 616–627.
28. Teorey, T. J., Lightstone, S. S., Nadeau, T., & Jagadish, H. V., (2011). Database
Modeling and Design: Logical Design (Vol. 3, No. 2, pp. 3–5). Elsevier.
29. Teorey, T. J., Yang, D., & Fry, J. P., (1986). A logical design methodology for relational
databases using the extended entity-relationship model. ACM Computing Surveys
(CSUR), 18(2), 197–222.
30. Trivedi, P., & Sharma, A., (2013). A comparative study between iterative waterfall
and incremental software development life cycle model for optimizing the resources
using computer simulation. In: 2013 2nd International Conference on Information
Management in the Knowledge Economy (Vol. 1, pp. 188–194). IEEE.
31. Vela, B., Acuña, C. J., & Marcos, E., (2004). A model driven approach for XML database
development. In: Conceptual Modeling–ER 2004: 23rd International Conference on
Conceptual Modeling, Shanghai, China, November 8–12, 2004. Proceedings 23 (Vol.
1, pp. 780–794). Springer Berlin Heidelberg.
32. Westfechtel, B., (1999). Models and Tools for Managing Development Processes
(2nd edn., Vol. 1, pp. 8, 9). Springer Science & Business Media.

CHAPTER 3

TYPES OF DATABASES

UNIT INTRODUCTION
Database systems are computer programs that allow users to store, organize, and retrieve
large amounts of data (Güting & Schneider, 1993). There are several different types
of database systems, each with its own strengths and weaknesses. Some of the most
common types of database systems include:
• Hierarchical Databases: These database systems organize data in a tree-like
structure, with each record having a single parent (except the root) and possibly
multiple children. Hierarchical databases are often used in mainframe applications.
• Network Databases: These database systems organize data in a more complex
network-like structure, with each record having multiple parents and children.
Network databases are useful for modeling complex relationships between data
(Albano et al., 1989).
• Relational Databases: These database systems are based on the relational data
model, which organizes data into tables with rows and columns. Relational
database management systems (RDBMSs) are widely used for storing structured
data and are popular in applications such as online transaction processing (OLTP)
and business intelligence (BI) (Brodie, 1980).
• Object-Oriented Databases (OODBs): These database systems are designed to
store complex data types such as objects, classes, and inheritance hierarchies.
Object-oriented database management systems (OODBMSs) are often used with
object-oriented programming (OOP) languages such as Java or C++.
• Graph Databases: These database systems are designed to store and query
data that has complex relationships, such as social networks or supply chain
systems. Graph databases use nodes and edges to represent data and can
perform complex queries quickly (Ohori, 1990).
• Document Databases: These database systems store unstructured or semi-
structured data in documents, such as JSON or XML. Document databases are
often used in web applications and content management systems (CMS).
• NoSQL Databases: These database systems are designed to handle large
volumes of unstructured or semi-structured data, such as text, images, or social
media posts. NoSQL database systems are often used in big data applications
or real-time analytics (Schneider, 1997).
Each type of database system has its own unique features and benefits, and choosing
the right system for a particular application requires careful consideration of factors such
as data structure, scalability, performance, and cost.

Learning Objectives
• Define database systems and identify the different types of database systems,
including hierarchical, network, relational, object-oriented, graph, document, and
NoSQL databases.
• Understand the strengths and weaknesses of each type of database system and
their suitability for different applications.
• Explain the relational data model and the basics of RDBMSs.
• Identify the benefits and limitations of using object-oriented databases (OODBs)
in software development.
• Describe the unique features and benefits of graph databases and their use cases.
• Understand the purpose of document databases and their applications in web
development and content management systems (CMS).
• Discuss the advantages and limitations of NoSQL databases in handling unstructured
or semi-structured data and their use in big data applications and real-time analytics.
• Analyze the factors to consider when choosing the appropriate database system
for a particular application, such as data structure, scalability, performance, and
cost.

Key Terms
• Big data
• Data
• Database
• Document
• E-commerce

• Graph
• Hierarchical
• Management
• Model
• Network
• Object-oriented
• Relational
• Social network

3.1. HIERARCHICAL DATABASES


Hierarchical databases are a type of database management system
(DBMS) that organizes data in a tree-like structure, with each record
having a single parent (except the root) and possibly multiple
children (Tsichritzis & Lochovsky, 1976). This structure resembles
the organizational hierarchy of a company, hence the name
“hierarchical.” This section discusses the history, importance,
applications, advantages, and disadvantages of hierarchical
databases (Figure 3.1).

Figure 3.1. An example of a hierarchical database.

Source: Geeks for Geeks, Creative Commons License.

3.1.1. History of Hierarchical Databases


The hierarchical database model was developed in the 1960s and
was one of the earliest forms of DBMS. At that time, computers
were large and expensive, and the data they processed was
mostly structured and hierarchical in nature. The hierarchical
database model was well-suited for this type of data and computing
environment. The first hierarchical DBMS was IBM’s Information
Management System (IMS), which was introduced in the late
1960s. IMS was widely used in large enterprises, particularly in the
banking and insurance industries, where it was used to manage
vast amounts of customer data (Deniša & Ude, 2015). IMS was
based on a hierarchical data model, where data was organized
in a tree-like structure with a single root node at the top and
multiple child nodes branching out from it. Each node represented
a data element or record, and the relationships between nodes
were defined by parent-child relationships (Bouganim et al., 1996).
This structure made it easy to represent data that had a fixed and
predictable structure.
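
The tree structure described above can be sketched in a few lines of code. This is not how IMS is implemented; it is a minimal in-memory illustration, and the Node class and the bank, branch, customer, and account names are hypothetical. Each node stores exactly one parent reference, and records are reached by following a path down from the root.

```python
class Node:
    """A record in a hierarchy: one parent (None for the root), many children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def find(self, path):
        """Follow a list of child names downward; return None if a step fails."""
        node = self
        for name in path:
            node = next((c for c in node.children if c.name == name), None)
            if node is None:
                return None
        return node

    def path_from_root(self):
        """Walk the single parent chain upward and return the names in order."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# Hypothetical hierarchy: bank -> branch -> customer -> account.
root = Node("Bank")
branch = Node("Downtown", root)
customer = Node("Alice", branch)
account = Node("A-1", customer)

found = root.find(["Downtown", "Alice", "A-1"])
```

Because every record has only one parent, there is exactly one path from the root to any record, which is what makes the structure fixed and predictable.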

The hierarchical database model was popular in the 1960s and
1970s, but it began to decline in popularity with the advent of the
relational database model in the 1970s. Relational databases offered
more flexibility and scalability than hierarchical databases, making
them better suited for complex and changing data environments
(Deng et al., 2009).

KEYWORD
Hierarchical database model is a data model in which the data are
organized into a tree-like structure.

Despite the decline in popularity, hierarchical databases are
still used in some applications today, particularly in mainframe
systems and other environments where the data is well-defined
and predictable. They continue to be used in industries such as
banking, insurance, and manufacturing, where there is a need to
store large amounts of structured data. The hierarchical database
model has a long and rich history, and it played an important role
in the early days of DBMS. While it has been largely replaced by
more flexible and scalable database models, it continues to be
used in certain applications today and remains an important part
of the evolution of database technology.

3.1.2. Importance of Hierarchical Databases


Hierarchical databases were one of the earliest forms of DBMS and
played an important role in the development of modern database
technology (Jindal & Bali, 2012). While they have been largely
replaced by more flexible and scalable database models, they
continue to be used in some applications today.
One of the key advantages of hierarchical databases is that they
are well-suited for managing large amounts of structured data that
has a predictable and fixed structure. This makes them ideal for
use in applications such as banking, insurance, and manufacturing,
where there is a need to store and manage vast amounts of data
with a consistent and hierarchical structure.
Another advantage of hierarchical databases is that they are
highly efficient and performant. Because data is organized in a
tree-like structure with a single root node at the top and multiple
child nodes branching out from it, it is easy to navigate and access
data quickly. This makes hierarchical databases well-suited for
applications where speed and performance are critical. Hierarchical
databases also offer a high degree of data security and control
(Shokoufandeh et al., 2005).
Because the data is organized in a fixed and predictable
structure, it is easier to control who has access to different parts
of the database. This makes hierarchical databases ideal for
applications where data security and access control are critical, such
as in financial institutions. Despite these advantages, hierarchical
databases also have some limitations. They are not well-suited for
managing complex or unstructured data, and they can be difficult
to modify or update when the underlying data structure changes
(Nicklin et al., 1985).
This is why they have been largely replaced by more flexible
and scalable database models, such as relational databases.
Hierarchical databases played an important role in the development
of modern database technology and continue to be used in some
applications today. While they have some limitations, their strengths
in managing structured data efficiently and securely make them
well-suited for certain applications.

KEYWORD
Unstructured data is information that is not arranged according to a
preset data model or schema, and therefore cannot be stored in a
traditional relational database or RDBMS.

3.1.3. Applications of Hierarchical Databases

Hierarchical databases have been widely used in a variety of
applications, especially in early database systems. Some of the
common applications of hierarchical databases are discussed in
the following subsections.

3.1.3.1. Banking and Financial Systems


Hierarchical databases have been widely used in banking and
financial systems to manage customer data, transactions, and
account information (Hsu & Madnick, 1983). Because hierarchical
databases are highly efficient and performant, they can easily
handle the large volume of structured data that is typically found
in financial systems.

3.1.3.2. Inventory Management Systems


Hierarchical databases have been used in inventory management
systems to manage product data, stock levels, and order information
(Vargo et al., 1992). The predictable and fixed structure of
hierarchical databases makes it easy to manage and track inventory
data.

3.1.3.3. Manufacturing and Production Systems


Hierarchical databases have also been used in manufacturing and
production systems to manage product data, bill of materials, and
production schedules (Kearns & DeFazio, 1983). The hierarchical
structure makes it easy to manage and track data related to different
components and sub-assemblies in the production process.

3.1.3.4. Telecommunications Systems


Hierarchical databases have been used in telecommunications
systems to manage customer data, billing information, and call
records. The efficient and performant nature of hierarchical databases
makes it easy to handle the large volume of structured data that is
typically found in telecommunications systems (Figure 3.2).

Figure 3.2. Hierarchy of networks and systems in a generic telecom infrastructure.

Source: Kjell Jørgen Hole, Creative Commons License.

3.1.3.5. Government Systems


Hierarchical databases have been used in various government
systems to manage citizen data, tax records, and other administrative
data (Banerjee et al., 1980). The predictable and fixed structure of
hierarchical databases makes it easy to manage and track data
related to different government departments and agencies.
Hierarchical databases have been used in a wide range of
applications where structured data is the predominant data type.
While they have been largely replaced by more flexible and scalable
database models such as relational databases, they still remain
a viable option for managing structured data in applications that
require a fixed and predictable data structure.

3.1.4. Advantages of Hierarchical Databases


Hierarchical databases have several advantages that make them
well-suited for managing large amounts of structured data with a
predictable and fixed structure. Some of the key advantages of
hierarchical databases are discussed in subsections.

3.1.4.1. Efficient and Performant


Hierarchical databases are highly efficient and performant because
data is organized in a tree-like structure with a single root node at
the top and multiple child nodes branching out from it (Tangorra
& Chiarolla, 1995). This makes it easy to navigate and access
data quickly, which is important in applications where speed and
performance are critical.

3.1.4.2. Data Security and Control

Because the data in hierarchical databases is organized in a fixed
and predictable structure, it is easier to control who has access
to different parts of the database (Black et al., 2004). This makes
hierarchical databases ideal for applications where data security
and access control are critical, such as in financial institutions.

KEYWORD
Data security means protecting digital data, such as those in a
database, from destructive forces and from the unwanted actions
of unauthorized users, such as a cyberattack or a data breach.

3.1.4.3. Simple and Easy to Use

Hierarchical databases are simple and easy to use because they
have a predictable and fixed structure. This makes it easy to
understand and navigate the database, even for non-technical
users (Ho & Akyildiz, 1997).

3.1.4.4. Low Maintenance


Because hierarchical databases have a fixed and predictable
structure, they require less maintenance than other types of
databases. This can be particularly important in applications where
downtime or maintenance windows are limited.

3.1.4.5. Well-Established Technology


Hierarchical databases have been around for a long time and are
a well-established technology. This means that there is a large
community of developers and users who are familiar with them
and can provide support and expertise (Abiteboul & Hull, 1986).
Despite these advantages, hierarchical databases also have
some limitations. They are not well-suited for managing complex
or unstructured data, and they can be difficult to modify or update
when the underlying data structure changes. This is why they have
been largely replaced by more flexible and scalable database
models, such as relational databases. However, for applications
that require a fixed and predictable data structure, hierarchical
databases can still be an effective and efficient solution.

3.1.5. Disadvantages of Hierarchical Databases

While hierarchical databases have several advantages in managing
structured data, they also have some limitations and disadvantages.
Some of the key disadvantages of hierarchical databases are
discussed in the following subsections.

KEYWORD
Relational database is a type of database that stores and provides
access to data points that are related to one another.

3.1.5.1. Limited Flexibility

Hierarchical databases have a rigid structure, which makes it difficult
to accommodate changes to the data structure or schema. Adding
new data fields or restructuring data requires modifying the entire
database structure, which can be time-consuming and expensive
(Ipeirotis et al., 2002).

3.1.5.2. Limited Query Capabilities


Hierarchical databases do not have the same level of query
capabilities as other types of databases, such as relational
databases. This can make it difficult to extract specific subsets
of data or perform complex queries.
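
A small sketch can make this limitation concrete. The nested dictionary below is a hypothetical stand-in for a hierarchy of branches, customers, and accounts, not the storage format of any real product. Access along a known path is direct, whereas an ad hoc question that cuts across the hierarchy has no path to follow and must visit every record.

```python
# Hypothetical bank hierarchy: branch -> customer -> account -> balance.
BANK = {
    "Downtown": {
        "Alice": {"A-1": 1200, "A-2": 50},
        "Bob": {"B-1": 300},
    },
    "Uptown": {
        "Carol": {"C-1": 9000},
    },
}


def balance_by_path(db, branch, customer, account):
    """Navigate root -> branch -> customer -> account along the tree."""
    return db[branch][customer][account]


def accounts_over(db, threshold):
    """Ad hoc query: no path is known, so every record has to be visited."""
    hits = []
    for branch, customers in db.items():
        for customer, accounts in customers.items():
            for account, balance in accounts.items():
                if balance > threshold:
                    hits.append((branch, customer, account))
    return hits


direct = balance_by_path(BANK, "Downtown", "Alice", "A-1")
large_accounts = accounts_over(BANK, 1000)
```

In a relational system the second question would be a single declarative query with a WHERE clause, whereas here the traversal logic has to be written by hand for each new kind of question.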

3.1.5.3. Limited Scalability


Hierarchical databases are not as scalable as other types of
databases. As the database grows in size, it becomes more difficult
to maintain and manage (Silberschatz & Kedem, 1980). This can
result in slower performance and increased maintenance costs.

3.1.5.4. Limited Data Sharing


Hierarchical databases do not support the same level of data
sharing capabilities as other types of databases. This can make
it difficult to integrate data from multiple sources or share data
with other systems.

3.1.5.5. Limited Data Modeling


Hierarchical databases do not support complex data modeling or
relationships between data entities (Domdouzis et al., 2021). This
can make it difficult to model data in a way that accurately reflects
the real-world relationships between data entities.
In conclusion, while hierarchical databases have advantages
in managing structured data with a predictable and fixed structure,
they also have limitations in terms of flexibility, scalability, and
query capabilities. These limitations make them less suitable for
managing complex or unstructured data, or for applications that
require a high level of flexibility and scalability.

3.2. NETWORK DATABASES


A network database is a type of database model that is designed
to manage complex data structures and relationships between data
entities (Robinson, 2013). It is similar to the hierarchical database
model, but it allows for more flexible relationships between data
entities. In a network database, data is organized into records and
sets, with each set containing one or more records. The relationships
between sets are defined by a network structure, which allows for
many-to-many relationships between data entities (Figure 3.3).

Figure 3.3. An example of the network database model.

Source: Sketch Bubble, Creative Commons License.

In a network database model, data is organized into a series of
interconnected records, with each record containing one or more
data fields. The records are organized into sets, with each set
representing a different type of entity or object (Dunham & Helal,
1995). For example, in a database for a university, there might
be sets for students, professors, courses, and departments. Each
set would contain records with data fields for specific attributes or
properties of that entity. (In CODASYL terminology, such groupings
are called record types, and the term "set" is reserved for a named
owner-member link between record types.)

The network structure in a network database is defined by a set of
pointers that connect related records. Each record has one or more
pointers that define its relationships to other records in the
database. This allows for many-to-many relationships between data
entities, which the hierarchical database model cannot represent
directly (Lai & Hsia, 2007). For example, in a university database,
a student record might be connected to multiple course records,
and each course record might be connected to multiple student
records.
One of the main advantages of network databases is their ability
to manage complex data relationships. They are well suited to
applications where there are many-to-many relationships between
data entities. Network databases are also efficient at data retrieval
along predefined relationships, since they use pointers to navigate
directly between related records (Figure 3.4).
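
The pointer-based structure described above can be sketched in a few lines. This is an illustrative model only, not the data structure of any real network DBMS: the `Record` class, its `connect` and `related` methods, and the sample students and courses are all hypothetical, and Python object references stand in for the stored pointers.

```python
class Record:
    """A record with named fields and pointers to related records."""

    def __init__(self, kind, **fields):
        self.kind = kind
        self.fields = fields
        self.links = []  # direct references ("pointers") to other records

    def connect(self, other):
        """Link two records in both directions."""
        self.links.append(other)
        other.links.append(self)

    def related(self, kind):
        """Follow pointers and return linked records of the given kind."""
        return [r for r in self.links if r.kind == kind]


# Hypothetical university data.
alice = Record("student", name="Alice")
bob = Record("student", name="Bob")
databases = Record("course", title="Databases")
networks = Record("course", title="Networks")

# A student links to many courses, and a course links to many students.
alice.connect(databases)
alice.connect(networks)
bob.connect(databases)

print([c.fields["title"] for c in alice.related("course")])
# ['Databases', 'Networks']
print([s.fields["name"] for s in databases.related("student")])
# ['Alice', 'Bob']
```

Unlike a tree, each record here is stored once and reached from several directions, so retrieval is a matter of following references rather than searching.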

Figure 3.4. An example of the complex data relationship.

Source: Doulkifli Boukraâ, Creative Commons License.

However, network databases also have some disadvantages. They can
be complex and difficult to design and maintain, and they are not as
widely used as other database models such as the relational database
model. Additionally, they are not as scalable as other database
models, since the network structure can become unwieldy as the
database grows in size.
Overall, network databases are a powerful tool for managing
complex data relationships, but they are best suited to applications
where their advantages outweigh their disadvantages.

3.2.1. History of Network Databases


KEYWORD
Network database is based on a network data model, which allows each
record to be related to multiple primary records and multiple
secondary records.

The network database model was first introduced in the 1960s as a
more flexible alternative to the hierarchical database model, which
had limitations in terms of flexibility and scalability. The network
model was developed by Charles Bachman, who was working at
General Electric at the time. Bachman was looking for a way to
manage complex data relationships in a more flexible and efficient
way than the hierarchical model allowed (Murty et al., 2011).
The first network DBMS was the Integrated Data Store (IDS), which
Bachman designed at General Electric in the early 1960s. The model
was later standardized by the CODASYL Data Base Task Group, whose
specifications appeared around 1970 (Alonso & Korth, 1993). A
descendant of IDS was commercialized as IDMS (integrated database
management system) by Cullinane Corporation in the 1970s.
Other companies, including Univac, also developed network
database systems in the 1970s. IBM’s System R project, which
began in the mid-1970s, was among the first to implement the
relational database model, which eventually became the dominant
database model. However, the network model continued to be used
in certain specialized applications, such as airline reservation
systems and financial transaction systems.
In the 1980s and 1990s, the popularity of network databases
declined as the relational database model became the dominant
database model. The relational model offered greater flexibility and
scalability than the network model, and it was also easier to use
and maintain (Wu & Chang, 1991). However, network databases
continued to be used in certain specialized applications where the
network structure was better suited to the data.
Today, network databases are still used in some legacy systems,
but they have largely been replaced by relational databases and
other database models that offer greater flexibility, scalability, and
ease of use. However, the network model remains an important
development in the history of database systems, as it paved the
way for later innovations in data management and storage.

3.2.2. Importance of Network Databases

KEYWORD
Data management is the practice of collecting, organizing, and
accessing data to support productivity, efficiency, and
decision-making.

The network database model has played an important role in the
development of database systems, particularly in applications that
require the management of complex data relationships. Some of
the key reasons network databases are important are discussed in
the following subsections.

3.2.2.1. Flexibility
The network database model offers greater flexibility than the
hierarchical database model because it allows for more complex data
relationships (Pham & Klamma, 2010). This makes it a good
choice for applications where data entities have many-to-many
relationships.

3.2.2.2. Efficiency
The network database model is highly efficient in terms of data
retrieval, since it uses pointers to quickly navigate between related
records. This makes it a good choice for applications that require
fast access to large amounts of data.

3.2.2.3. Specialized Applications


Network databases are still used in some specialized applications,
such as airline reservation systems and financial transaction
systems, where the network structure is better suited to the data.

3.2.2.4. Historical Significance


The network database model played an important role in the
development of database systems (Dy-Liacco, 1994). It represented
a significant improvement over the hierarchical model and paved
the way for later innovations in data management and storage,
including the development of the relational model.
However, network databases also have some limitations and
challenges. They can be complex and difficult to design and
maintain, and they are not as widely used as other database
models such as the relational database model. Additionally, they
are not as scalable as other database models, since the network
structure can become unwieldy as the database grows in size.
Overall, network databases remain an important part of the
history and development of database systems, and they continue
to be used in some specialized applications where their advantages
outweigh their limitations.

3.2.3. Applications of Network Databases


Network databases have been used in a variety of applications
over the years, particularly in those that require the management
of complex data relationships (Chung et al., 2013). Some of the key
applications of network databases are discussed in the following
subsections.

3.2.3.1. Financial Systems


Network databases have been used extensively in financial systems,
where they are used to manage complex data relationships between
financial transactions, accounts, and customers (Gavish & Pirkul,
1986). These systems require fast and efficient data retrieval,
which the network database model is well-suited to (Figure 3.5).

Figure 3.5. Network database in financial system.

Source: Jianwei Yan, Creative Commons License.

3.2.3.2. Inventory Systems

KEYWORD
Inventory management system is a combination of hardware and
software technology, which tracks and manages product inventory,
product sales and other production processes.

Inventory management systems often require the management
of complex data relationships between inventory items, suppliers,
and customers (Hesse et al., 1993). The network database model
is well-suited to these applications, since it allows for efficient
navigation between related records.

3.2.3.3. Airline Reservation Systems

The airline industry has used network databases for many years
to manage flight schedules, reservations, and passenger data.
These systems require fast and efficient data retrieval, as well as
the ability to manage complex data relationships between flights,
passengers, and destinations (Chou et al., 2008).

3.2.3.4. Telecommunications

Network databases have been used extensively in telecommunications
systems, where they are used to manage complex data relationships
between customers, phone numbers, and billing information. These
systems require fast and efficient data retrieval, which the network
database model is well-suited to.

3.2.3.5. Manufacturing Systems


Manufacturing systems often require the management of complex
data relationships between production processes, materials, and
finished products (Nakada et al., 1999). The network database
model is well-suited to these applications, since it allows for efficient
navigation between related records.
Overall, the network database model has been used in a
variety of applications where complex data relationships need to
be managed. While it has been largely replaced by the relational
database model, it remains an important part of the history and
development of database systems, and continues to be used in
some specialized applications where its advantages outweigh its
limitations.

3.2.4. Advantages of Network Databases


The network database model offers several advantages over other
database models, such as the hierarchical model, that make it
well-suited to certain types of applications (Larson, 1983). Some
of the key advantages of network databases are discussed in the
following subsections.

3.2.4.1. Flexibility

KEYWORD
Data entity is an abstraction from the physical implementation of
database tables.

One of the main advantages of the network database model is its
flexibility. Unlike the hierarchical model, which can only represent
one-to-many relationships, the network model can represent
many-to-many relationships between data entities. This makes it
a good choice for applications where data entities have complex
relationships.

3.2.4.2. Efficient Data Retrieval


The network database model uses pointers to quickly navigate
between related records (Sato et al., 1997). This makes it highly
efficient in terms of data retrieval, and makes it a good choice
for applications that require fast access to large amounts of data.

3.2.4.3. Scalability
The network database model allows new data entities to be added
without having to modify the entire database structure, which helps
in applications that require frequent updates and modifications.
This advantage is relative to the hierarchical model; as the
database grows, the network structure itself can become unwieldy.

3.2.4.4. Redundancy
The network database model allows for the redundancy of data,
which can improve system reliability (Bader et al., 2003). In the
event that one data entity is lost or damaged, there may be other
copies of the same entity elsewhere in the database. Redundant
copies have to be kept in step with one another, however, or data
integrity suffers.

3.2.4.5. Data Independence


The network database model allows for a degree of data independence,
which means that some changes to the structure of the database do
not necessarily require changes to the application programs that use
the database (Kolahdouzan & Shahabi, 2004). This can make it easier
to update and modify the database over time.
Overall, the network database model offers several advantages over
other database models, particularly in applications that require the
management of complex data relationships. While it may not be as
widely used as other database models such as the relational model,
it remains an
important part of the history and development of database systems,
and continues to be used in some specialized applications where
its advantages outweigh its limitations.

3.2.5. Disadvantages of Network Databases


While the network database model offers several advantages over
other database models, it also has some limitations and drawbacks
that can make it less suitable for certain types of applications.
Some of the key disadvantages of network databases are discussed
in the following subsections.

3.2.5.1. Complexity

KEYWORD
Database administrator is the information technician who directs
and performs all activities related to maintaining a successful
database environment.

The network database model can be more complex than other
database models, such as the hierarchical model, due to the
many-to-many relationships between data entities (Padmanabhan
et al., 2008). This complexity can make it more difficult to design
and maintain a network database, particularly for applications with
a large number of data entities.

3.2.5.2. Lack of Standards

The network database model lacks a widely adopted declarative
query language comparable to SQL. Although the CODASYL
specifications defined data definition and data manipulation
languages, data is accessed by navigating records procedurally,
which can make network databases more difficult to work with than
database models that have standardized query languages.

3.2.5.3. Performance Issues


While the network database model is efficient in terms of data
retrieval, it can suffer from performance issues when dealing with
large amounts of data. This is because navigating the complex
relationships between data entities can be computationally intensive
and time-consuming (de Almeida & Güting, 2005).

3.2.5.4. Limited Support


The network database model is not as widely supported as other
database models, such as the relational model. This can make
it more difficult to find developers and database administrators
(DBAs) with experience working with network databases.

3.2.5.5. Data Redundancy

KEYWORD
Data redundancy is the existence of data that is additional to the
actual data and permits correction of errors in stored or transmitted
data.

While data redundancy can be an advantage in some cases, it
can also be a disadvantage in terms of storage space and data
management. Network databases can require significant amounts
of storage space due to the duplication of data, and managing
and updating redundant data can be complex and time-consuming
(Papadias et al., 2003).
Overall, the network database model offers several advantages
over other database models, particularly in applications that require
the management of complex data relationships. However, it also
has some limitations and drawbacks that can make it less suitable
for certain types of applications. It is important to carefully consider
the requirements of a particular application before choosing a
database model.

3.3. RELATIONAL DATABASES


A relational database is a type of database that stores
data in tables that are related to each other through key fields
(Teorey et al., 1986). The relational database model is based on
the relational model of data proposed by Edgar F. Codd in 1970
and has become one of the most widely used database models in
the world. In this model, data is organized into tables, with each
table representing a specific type of data entity (e.g., customers,
orders, products). Each row in a table represents a single instance
of that data entity, and each column represents a specific attribute
of that entity (Outrata & Vychodil, 2012) (Figure 3.6).

Figure 3.6. Schematic of a relational database.

Source: Educba, Creative Commons License.


For example, a customer table might have columns for customer
ID, name, address, and phone number, with each row representing
a different customer. An orders table might have columns for
order ID, customer ID, order date, and order total, with each row
representing a different order.
KEYWORD
Foreign key is a column or combination of columns that is used to
establish and enforce a link between the data in two tables to
control the data that can be stored in the foreign key table.

The relationships between tables in a relational database
are defined through key fields. A key field is a field that uniquely
identifies each row in a table. For example, in the customer and
orders tables described above, the customer ID field would be a
key field in the customer table, and the customer ID field in the
orders table would be a foreign key that links each order to a
specific customer in the customer table.
Relational databases are known for their ability to handle
complex relationships between data entities, as well as their ability
to support complex queries and transactions. They are widely
used in a variety of applications, from small business websites to
large-scale enterprise systems.
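
The customer and orders tables described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration rather than a recommended schema: the column names, data types, and sample rows are assumptions, and SQLite is used only because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- key field of the customer table
    name        TEXT NOT NULL,
    address     TEXT,
    phone       TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
    order_date  TEXT NOT NULL,
    order_total REAL NOT NULL
);
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Ada Park", "1 Main St", "555-0100"),
     (2, "Ben Ortiz", "2 Oak Ave", "555-0101")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(10, 1, "2024-01-05", 25.0),
     (11, 1, "2024-02-11", 40.0),
     (12, 2, "2024-02-12", 15.5)],
)

# The foreign key lets each order be matched to its customer.
rows = conn.execute("""
    SELECT c.name, o.order_id, o.order_total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
# [('Ada Park', 10, 25.0), ('Ada Park', 11, 40.0), ('Ben Ortiz', 12, 15.5)]
```

The customer's name is stored once in `customers`; each order carries only the customer ID and is joined to the name when needed.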

3.3.1. History of Relational Databases


The history of relational databases dates back to 1970, when
Edgar F. Codd, a researcher at IBM, introduced the concept of the
relational database model in a paper titled “A Relational Model of
Data for Large Shared Data Banks” (Moszer et al., 1995). Codd’s
paper proposed a new way of organizing and storing data that
differed from the hierarchical and network database models that
were popular at the time.
The relational database model was based on the idea of using
tables to store data, with each table representing a specific type
of entity (e.g., customers, orders, products) and each row in the
table representing a single instance of that entity. The relationships
between tables were defined through key fields, which were used
to link related data together.
Codd’s paper sparked a revolution in the field of database
management, and the relational database model quickly gained
popularity among researchers and practitioners alike. In the
following years, a number of DBMSs were developed that implemented
the relational model, including IBM’s System R research prototype
and commercial products such as Oracle, IBM’s DB2, and Microsoft
SQL Server (Bordoloi & Kalita, 2013).
The widespread adoption of the relational database model was
driven by several factors, including the growing complexity of data
management tasks, the need for standardized data management
techniques, and the desire for more flexible and powerful database
systems (Meier et al., 1994).
Over the years, the relational database model has continued
to evolve, with new features and enhancements being added
to improve performance, scalability, and functionality. Today, the
relational database model remains one of the most widely used
database models in the world, and continues to be an important
tool for managing complex data relationships in a wide range of
applications.

3.3.2. Importance of Relational Databases


Relational databases are important in database systems for a
number of reasons, which are discussed in the following subsections.

3.3.2.1. Structured Data Storage


Relational databases provide a structured way of storing data that
is easy to understand and use (Roussopoulos, 1982). Tables with
rows and columns make it simple to store data in a consistent
and organized manner (Figure 3.7).

Figure 3.7. Comparison of the structured and unstructured data.

Source: Lawtomated, Creative Commons License.

3.3.2.2. Data Consistency


With the use of constraints and foreign key relationships, relational
databases help ensure that data is consistent and accurate (Blasgen &
Eswaran, 1977). This is important in ensuring that data is reliable
and can be used for decision-making.

KEYWORD
Data set is a collection of related, discrete items of related data
that may be accessed individually or in combination or managed as a
whole entity.

3.3.2.3. Querying and Reporting

Relational databases allow for complex querying and reporting of
data, making it possible to extract specific information from large
datasets quickly and efficiently.

3.3.2.4. Scalability

Relational databases are highly scalable, allowing for the addition
of new data and the ability to handle large datasets without a
significant reduction in performance (Bhat & Jadhav, 2010).

3.3.2.5. Security
Relational databases provide robust security features that ensure
data is protected from unauthorized access or modification. This
is critical for organizations that store sensitive or confidential data.

3.3.2.6. Integration
Relational databases can be integrated with other systems and
applications, making it easy to share data between different parts
of an organization or with external partners (Stonebraker, 1986).
Overall, relational databases are important in database systems
because they provide a reliable, scalable, and secure way of
storing and managing data that is essential for many organizations
to operate effectively.

3.3.3. Applications of Relational Databases


Relational databases have a wide range of applications in various
industries and domains. Some common applications of relational
databases are discussed in subsections.

3.3.3.1. E-Commerce
Online retailers use relational databases to manage customer
information, orders, and inventory (Stefanidis et al., 2009). This
allows them to easily track sales, manage stock levels, and provide
a personalized shopping experience for customers.

3.3.3.2. Banking and Finance

KEYWORD
Financial institution is a company engaged in the business of
dealing with financial and monetary transactions such as deposits,
loans, investments, and currency exchange.

Banks and financial institutions use relational databases to store
customer information, transactions, and account balances. This
enables them to track and analyze financial data, detect fraudulent
activity, and make informed decisions about lending and investment.

3.3.3.3. Healthcare

Healthcare providers use relational databases to store patient
records, medical history, and treatment plans. This allows doctors
and nurses to easily access patient information and provide
personalized care.

3.3.3.4. Human Resources


Companies use relational databases to manage employee
information, including personal details, job titles, and performance
data (Imieliński & Lipski, 1984). This helps HR departments to make
informed decisions about hiring, training, and career development.

3.3.3.5. Government
Government agencies use relational databases to manage citizen
data, track public services, and analyze trends in social and
economic data. This enables them to make evidence-based policy
decisions and provide better services to citizens.

3.3.3.6. Education
Educational institutions use relational databases to manage student
records, grades, and attendance data (Ogle & Stonebraker, 1995).
This allows them to track student progress, identify areas for
improvement, and provide support to students.
Overall, relational databases have a broad range of applications
in various domains and industries, and their versatility and flexibility
make them an important tool for managing and analyzing data.

3.3.4. Advantages of Relational Database

KEYWORD
Data integrity is the overall accuracy, completeness, and
consistency of data.

Relational databases offer several advantages over other types of
databases, which make them a popular choice for managing data
in various applications. Some of the key advantages of relational
databases are discussed in subsections.

3.3.4.1. Data Integrity and Consistency
Relational databases use various constraints like primary keys,
foreign keys, and check constraints to ensure data integrity and
consistency (Levene & Loizou, 2012). This means that data is
accurate, complete, and consistent, which is essential for making
informed decisions.
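
A short sketch shows how primary key, foreign key, and check constraints reject inconsistent rows. It uses Python's built-in `sqlite3` module; the table layout, the `try_insert` helper, and the sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_total REAL NOT NULL CHECK (order_total >= 0)
);
INSERT INTO customers VALUES (1, 'Ada Park');
""")


def try_insert(order_id, customer_id, order_total):
    """Attempt an insert and report whether the constraints allowed it."""
    try:
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)",
            (order_id, customer_id, order_total),
        )
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = {
    "valid order": try_insert(10, 1, 25.0),
    "duplicate primary key": try_insert(10, 1, 30.0),
    "unknown customer": try_insert(11, 99, 30.0),
    "negative total": try_insert(12, 1, -5.0),
}
print(results)
```

Only the first insert should be accepted. The other three violate, in turn, the primary key, the foreign key, and the check constraint, so the database refuses them instead of storing inconsistent data.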

3.3.4.2. Easy to Use


Relational databases are easy to understand and use, with a
simple structure of tables and relationships. This makes it easy
for users to query and retrieve data from the database, even if
they are not database experts.
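
As an illustration of how data is queried, the sketch below asks for the number of orders and the total spent per customer. It again uses Python's built-in `sqlite3` module, and the tables and rows are hypothetical. The query states what result is wanted and leaves the choice of access path to the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_total REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Ada Park'), (2, 'Ben Ortiz'), (3, 'Cy Lee');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# LEFT JOIN keeps customers who have no orders; COALESCE turns their
# missing total into 0.0.
rows = conn.execute("""
    SELECT c.name,
           COUNT(o.order_id)                AS n_orders,
           COALESCE(SUM(o.order_total), 0.0) AS total_spent
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY total_spent DESC
""").fetchall()

for name, n_orders, total_spent in rows:
    print(f"{name}: {n_orders} orders, {total_spent:.2f} total")
# Ada Park: 2 orders, 65.00 total
# Ben Ortiz: 1 orders, 15.50 total
# Cy Lee: 0 orders, 0.00 total
```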

3.3.4.3. Flexibility
Relational databases are highly flexible, allowing users to add new
tables, fields, and relationships as needed (Stolte et al., 2002).
This makes it easy to adapt to changing business needs or new
requirements.
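
Adding a field or a related table to an existing database can be sketched as follows, using Python's built-in `sqlite3` module. The `email` column and the `loyalty_cards` table are hypothetical additions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada Park')")

# Add a new field; existing rows get NULL in the new column.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")

# Add a new table and a relationship to the existing one.
conn.execute("""
    CREATE TABLE loyalty_cards (
        card_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    )
""")
conn.execute("INSERT INTO customers VALUES (2, 'Ben Ortiz', 'ben@example.com')")

columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
rows = conn.execute("SELECT * FROM customers ORDER BY customer_id").fetchall()
print(columns)  # ['customer_id', 'name', 'email']
print(rows)     # [(1, 'Ada Park', None), (2, 'Ben Ortiz', 'ben@example.com')]
```

The existing row is untouched apart from the empty new column, so programs that read only `customer_id` and `name` keep working.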

3.3.4.4. Scalability
Relational databases are highly scalable and can handle large
amounts of data and users without a significant loss in performance.
This makes it ideal for growing businesses that need to manage
large amounts of data.

3.3.4.5. Security
Relational databases offer strong security features to protect data
from unauthorized access, modification, and theft (Maier, 1983).
This includes access control, data encryption, and data backup
and recovery.

3.3.4.6. Data Sharing


Relational databases allow for easy data sharing between different
applications and users (Padmanabhan et al., 2001). This makes
it easy to integrate data from multiple sources and ensures that
everyone has access to the same data.
Overall, the advantages
of relational databases make them a popular choice for managing
data in various applications, from e-commerce to healthcare and
finance. Their flexibility, scalability, and strong security features
make them an ideal choice for businesses and organizations of
all sizes.

3.3.5. Disadvantages of Relational Database

KEYWORD
Data consistency is the accuracy, completeness, and correctness of
data stored in a database.

While relational databases offer many advantages, they also have
some disadvantages that should be considered before choosing
them as a solution. Some of the key disadvantages of relational
databases are discussed in subsections.

3.3.5.1. Complexity
Relational databases can be complex to design, implement, and
maintain, especially for large-scale systems (Codd, 2007). Managing
the relationships between tables and ensuring data consistency
can be challenging and time-consuming.

3.3.5.2. Performance
Relational databases can become slow and inefficient when dealing
with large amounts of data. Complex queries or joins can take a
long time to execute, leading to slower performance and decreased
productivity (Mishra & Eich, 1992).

3.3.5.3. Cost
Relational databases can be expensive, both in terms of software
licensing and hardware requirements. As the amount of data grows,
businesses may need to invest in more powerful servers or storage
solutions to maintain performance.

3.3.5.4. Limited Scalability


While relational databases are scalable, there are limitations to
how much they can scale (Atzeni & De Antonellis, 1993). As the
database grows larger and more complex, it can become more
challenging to maintain and scale.

3.3.5.5. Data Redundancy


In relational databases, data can be duplicated across multiple
tables, particularly when the schema is not well normalized, which
can lead to data redundancy and inconsistencies. This can make it
difficult to maintain data integrity and can lead to data quality
issues.

KEYWORD
Database technologies take information and store, organize, and
process it in a way that enables users to easily and intuitively go
back and find details they are searching for.

3.3.5.6. Lack of Flexibility

Relational databases have a rigid structure, with tables and
relationships predefined (Jatana et al., 2012). This can make it
difficult to adapt to changing business needs or to add new data
types or fields.
Overall, the disadvantages of relational databases should be
carefully considered before choosing them as a solution. While they
offer many advantages, businesses and organizations need to weigh
the costs and complexity of implementing and maintaining them
against their specific needs and requirements.

3.4. OBJECT-ORIENTED DATABASES (OODBS)


Object-oriented databases (OODBs) are a type of DBMS that are
based on the principles of object-oriented programming (OOP).
OODBs store data as objects, which consist of attributes (data)
and methods (procedures). Each object is associated with a class,
which defines its structure and behavior (De Caluwe, 1997). OODBs
are used in applications that require complex data structures
and relationships, such as scientific research, multimedia, and
e-commerce (Figure 3.8).
The history of OODBs can be traced back to the 1970s and
1980s, when researchers began exploring ways to combine the
benefits of OOP with database technology. The first OODBs were
developed in the 1980s, and by the 1990s, they had become a
popular alternative to traditional relational databases. One of the
key advantages of OODBs is that they provide a more natural and
intuitive way to model complex data structures and relationships.
OOP provides a powerful way to encapsulate data and behavior,
which makes it easier to develop and maintain complex software
applications. OODBs also support more flexible querying and
indexing, which can lead to better performance and scalability.
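
The idea of storing data as objects with attributes, methods, and a defining class can be sketched in plain Python. This is not the API of any real OODB: `ToyObjectStore` is a hypothetical in-memory stand-in, and the `MediaItem` and `Video` classes are invented for the example.

```python
class MediaItem:
    """Class: defines the structure (attributes) and behavior (methods)."""

    def __init__(self, title, size_mb):
        self.title = title
        self.size_mb = size_mb

    def describe(self):
        return f"{self.title} ({self.size_mb} MB)"


class Video(MediaItem):
    """Subclass: inherits attributes and overrides behavior."""

    def __init__(self, title, size_mb, duration_s):
        super().__init__(title, size_mb)
        self.duration_s = duration_s

    def describe(self):
        return f"{super().describe()}, {self.duration_s} s"


class ToyObjectStore:
    """In-memory stand-in for an OODB: objects are kept whole, by identifier."""

    def __init__(self):
        self._objects = {}
        self._next_id = 1

    def save(self, obj):
        oid = self._next_id
        self._objects[oid] = obj
        self._next_id += 1
        return oid

    def load(self, oid):
        return self._objects[oid]

    def instances_of(self, cls):
        """Return stored objects of a class, including its subclasses."""
        return [o for o in self._objects.values() if isinstance(o, cls)]


store = ToyObjectStore()
photo_id = store.save(MediaItem("Logo", 2))
clip_id = store.save(Video("Intro", 120, 95))

print(store.load(clip_id).describe())
# Intro (120 MB), 95 s
print([o.describe() for o in store.instances_of(MediaItem)])
# ['Logo (2 MB)', 'Intro (120 MB), 95 s']
```

The object comes back from the store with both its data and its behavior, so there is no separate step to map table rows onto program objects.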

Figure 3.8. Illustration of the object-oriented database.

Source: PhoenixNAP, Creative Commons License.

OODBs are particularly well-suited for applications that require
complex data modeling and relationships, such as scientific research,
multimedia, and e-commerce (DeWitt et al., 1990). For example,
an OODB could be used to store and manage multimedia content,
such as images and videos, along with associated metadata and
user information. OODBs are also used in scientific research,
where they can be used to model complex relationships between
data, such as in bioinformatics or genomics.

Remember
Object-oriented databases are designed to store and manage complex
objects and data structures, allowing for efficient and flexible data
retrieval and manipulation.

However, OODBs also have some disadvantages. One of the
main challenges with OODBs is that they can be more difficult
to implement and maintain than traditional relational databases.
This is because they require more specialized knowledge of OOP
principles and programming languages. Additionally, OODBs may
not be as well-suited for applications that require simple data
models and relationships, such as accounting or finance. Finally,
OODBs may not be as widely supported or adopted as traditional
relational databases, which could limit their usefulness in some
contexts.

3.4.1. History of Object-Oriented Databases (OODBs)

The concept of OOP emerged in the 1960s and 1970s, but it
wasn’t until the 1980s that researchers began exploring ways
to combine the benefits of OOP with database technology. The
first OODBs were developed in the mid-1980s, and they quickly
gained popularity as a powerful alternative to traditional relational
databases (Wuu & Dayal, 1992).

One of the earliest OODBs was GemStone, developed in the
mid-1980s around a language derived from Smalltalk, which was one
of the first languages to support OOP. Other early OODBs included
ONTOS and ObjectStore, the latter developed by Object Design Inc.;
both were built around C++ and appeared around the end of the
1980s.

KEYWORD
Programming language is a system of notation for writing computer
programs.

By the early 1990s, OODBs had become a popular choice for
applications that required complex data structures and relationships,
such as scientific research, multimedia, and e-commerce (Cellary
& Jomier, 1990). However, OODBs were also more complex and
difficult to implement than traditional relational databases, and they
required specialized knowledge of OOP principles and programming
languages.
In the late 1990s and the 2000s, OODBs faced increasing
competition from other types of databases, such as object-relational
databases (ORDBs) and, later, NoSQL databases. ORDBs were designed
to provide some of the benefits of OODBs while still maintaining
compatibility with traditional SQL databases. NoSQL databases,
on the other hand, were designed to provide more flexible and
scalable data storage solutions for web applications and other
large-scale systems (Beeri, 1990).
Despite this competition, OODBs remain a valuable tool for
applications that require complex data modeling and relationships.
Today, OODBs are used in a variety of applications, including
scientific research, multimedia, and e-commerce, as well as in
specialized fields such as bioinformatics and genomics.

3.4.2. Importance of Object-Oriented Databases (OODBs)
OODBs are important in database systems because they provide a
way to store and manage complex data structures that are difficult
to represent using traditional relational databases. OODBs use the
principles of OOP to create a more natural representation of the
data, which makes it easier to work with and manipulate (Joseph
et al., 1991).
One of the key advantages of OODBs is that they allow
developers to create complex data models that reflect the real-world
relationships between objects. This makes it easier to represent
complex structures such as hierarchies, networks, and graphs
(Kim et al., 1990). For example, an OODB could be used to
model a complex supply chain, with multiple levels of suppliers,
manufacturers, and distributors, each with their own relationships
and attributes.
Another advantage of OODBs is that they provide a more flexible
way to query and manipulate the data. Unlike traditional relational
databases, which require complex SQL queries to join tables
together, OODBs allow developers to query the data using OOP
principles such as inheritance, polymorphism, and encapsulation.
This makes it easier to build complex applications that require
sophisticated data access and manipulation.

KEYWORD
Polymorphism is the provision of a single interface to entities of
different types or the use of a single symbol to represent multiple
different types.

OODBs are also important because they support the development
of software systems that are more modular and reusable (Zand
et al., 1995). By encapsulating the data and behavior of objects
within the database, OODBs allow developers to create objects that
can be easily reused across different applications and systems.
This can help to reduce development time and improve the overall
quality of software systems.
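
The object-oriented way of modeling and navigating data described in this section can be sketched in plain Python. This is only an illustration of the principles (inheritance, polymorphism, encapsulation, and navigation by object reference), not the interface of any OODB product. The class names `Party`, `Supplier`, and `Manufacturer` and the sample supply chain are hypothetical, and the sketch assumes the chain contains no cycles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Party:
    """Base class for any participant in the (hypothetical) supply chain."""
    name: str
    # Direct object references play the role that foreign keys and joins
    # play in a relational schema.
    suppliers: list[Party] = field(default_factory=list)

    def describe(self) -> str:
        return f"{self.name} (party)"

    def upstream(self) -> list[Party]:
        """Collect direct and indirect suppliers by following references.

        Assumes the supply chain is acyclic; a cycle would recurse forever.
        """
        found: list[Party] = []
        for supplier in self.suppliers:
            found.append(supplier)
            found.extend(supplier.upstream())
        return found


@dataclass
class Supplier(Party):  # inheritance: reuses name, suppliers, upstream()
    material: str = "unknown"

    def describe(self) -> str:  # polymorphism: same call, type-specific result
        return f"{self.name} supplies {self.material}"


@dataclass
class Manufacturer(Party):
    product: str = "unknown"

    def describe(self) -> str:
        return f"{self.name} builds {self.product}"


ore = Supplier("OreCo", material="iron ore")
steel = Supplier("SteelWorks", suppliers=[ore], material="steel")
bikes = Manufacturer("BikeCorp", suppliers=[steel], product="bicycles")

# Navigate the object graph instead of joining tables.
chain = [party.describe() for party in bikes.upstream()]
```

Here `chain` lists every party upstream of the manufacturer, and each object answers `describe()` in its own way without the caller checking its type.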
Finally, OODBs are important because they support the
development of new applications and systems that require complex
data modeling and analysis. This includes fields such as scientific
research, multimedia, and e-commerce, as well as specialized
fields such as bioinformatics and genomics. By providing a more
natural way to represent and manipulate complex data, OODBs
enable developers to create new applications and systems that
would be difficult or impossible to implement using traditional
relational databases.

3.4.3. Applications of Object-Oriented Databases (OODBs)
OODBs are widely used in a variety of applications, particularly
those that require the management and analysis of complex data
structures. Some of the key applications of OODBs are discussed
in the following subsections.

3.4.3.1. Scientific Research


OODBs are commonly used in scientific research, where complex
data structures such as protein structures, molecular interactions,
and genetic sequences must be managed and analyzed. OODBs
provide a natural way to represent these data structures, making
it easier to develop and maintain scientific applications.

3.4.3.2. Multimedia

OODBs are also used extensively in multimedia applications, such
as image and video databases (Zdonik & Maier, 1990). These
applications often require the management of large volumes of
complex data, and OODBs provide a flexible and efficient way to
store and access this data.

3.4.3.3. E-Commerce

OODBs are increasingly used in e-commerce applications, where
they are used to manage large volumes of customer data, product
catalogs, and transactional data. OODBs provide a flexible and
scalable platform for managing these data structures, making it
easier to build and maintain e-commerce applications (Bernstein,
1998).

KEYWORD
Transactional data is information that is captured from
transactions. It records the time of the transaction, the place where
it occurred, the price points of the items bought, the payment
method employed, discounts if any, and other quantities and
qualities associated with the transaction.

3.4.3.4. Finance

OODBs are also used in the finance industry, where they are
used to manage complex financial instruments such as derivatives,
options, and futures. These instruments have complex relationships
and attributes, and OODBs provide a natural way to represent
and manage this data.

3.4.3.5. Content Management


OODBs are commonly used in content management systems
(CMS), where they are used to manage large volumes of content
such as articles, videos, and images. OODBs provide a flexible
and scalable platform for managing this content, making it easier
to build and maintain CMS (Gray et al., 1992).

3.4.3.6. Internet of Things (IoT)


OODBs are increasingly being used in IoT applications, where
they are used to manage large volumes of data generated by
connected devices. These data structures can be highly complex,
with multiple levels of relationships and attributes, and OODBs
provide a natural way to represent and manage this data (Orenstein,
1986). In general, OODBs are used in applications that require the
management and analysis of complex data structures, particularly
those that have relationships and attributes that are difficult to
represent using traditional relational databases.

3.4.4. Advantages of Object-Oriented Databases (OODBs)
OODBs offer a number of advantages over traditional relational
databases. Some of the key advantages of OODBs are discussed
in the following subsections.

KEYWORD
Complex data type is a transformation data type that represents
multiple data values in a single column position.

3.4.4.1. Support for Complex Data Structures
OODBs provide a natural way to represent and manage complex
data structures, including hierarchical, network, and graph data
(Hurson et al., 1993). This makes it easier to develop and maintain
applications that require the management of complex data.
3.4.4.2. Improved Performance
Because OODBs are designed to work with complex data structures,
they can often perform more efficiently than relational databases
when dealing with these structures (Leavitt, 2000). This can result in
faster query performance and improved overall system performance.

3.4.4.3. Flexibility
OODBs are highly flexible, allowing developers to easily add new
data types and modify existing data structures as needed. This
makes it easier to adapt to changing business requirements and
to evolve the database schema over time.

3.4.4.4. Object-Oriented Programming (OOP)


OODBs are closely aligned with OOP languages, making it easier for
developers to integrate database functionality into their application
code (Kim et al., 1987). This can improve developer productivity
and make it easier to develop and maintain complex applications
(Figure 3.9).

Figure 3.9. Illustration of the object-oriented database and
object-oriented programming.

Source: Ebrary, Creative Commons License.


3.4.4.5. Support for Inheritance

OODBs support inheritance, which allows developers to reuse
existing data structures and to create new data structures that inherit
properties from existing ones. This can help to reduce development
time and improve code quality by promoting code reuse.

KEYWORD
Referential integrity is a property of data stating that all its
references are valid.

3.4.4.6. Improved Data Integrity
OODBs provide better support for data integrity, including the
ability to enforce referential integrity constraints and to perform
complex validation checks (Wells et al., 1992). This can help to
ensure that the data in the database is accurate and consistent.
Overall, OODBs offer a number of advantages over traditional
relational databases, particularly for applications that require the
management of complex data structures. By providing support for
complex data structures, improved performance, flexibility, and
better data integrity, OODBs can help developers to build more
robust and scalable applications.

3.4.5. Disadvantages of Object-Oriented Databases (OODBs)
Despite their advantages, OODBs also have some disadvantages
that need to be considered when deciding whether to use them for
a particular application. Some of the key disadvantages of OODBs
are discussed in the following subsections.

3.4.5.1. Complexity
While OODBs are designed to work with complex data structures,
they can also be more complex to design, develop, and maintain
than traditional relational databases (Gotthard et al., 1992). This
can require more specialized skills and expertise, which can be
a challenge for some organizations.

3.4.5.2. Limited Tool Support


Compared to relational databases, there are fewer tools available
for working with OODBs, which can make it more difficult to develop
and maintain applications (Atkinson et al., 1990). This can be a
particular challenge for organizations that rely heavily on third-party
tools and applications.

3.4.5.3. Lack of Standardization


While there are standards for OOP languages, there are no widely
accepted standards for OODBs. This can make it more difficult to
move data between different OODBs or to integrate OODBs with
other systems.

3.4.5.4. Performance Issues


While OODBs can offer improved performance for complex data
structures, they can also be slower than relational databases for
some types of queries (Bertino & Martino, 1991). This is because
OODBs often have to traverse complex object graphs to retrieve
data, which can be slower than using SQL to join tables in a
relational database.

KEYWORD
Graph databases are purpose-built to store and navigate
relationships.

3.4.5.5. Scalability
While OODBs can be highly flexible and adaptable, they can also
be more difficult to scale than relational databases. This is because
the complex object graphs in an OODB can make it more difficult to
partition data across multiple servers or to distribute queries efficiently.

3.4.5.6. Cost
Finally, OODBs can be more expensive to license and maintain
than traditional relational databases (Kim, 1990). This is because
there are fewer OODBs on the market and because they require
more specialized skills and expertise to work with.
Overall, OODBs are not always the best choice for every
application. While they offer some significant advantages over
traditional relational databases, they also have some significant
drawbacks that need to be considered. Organizations should
carefully weigh the pros and cons of OODBs when making decisions
about database management.

3.5. GRAPH DATABASES


A graph database is a type of NoSQL database that uses graph
theory to store, map, and query relationships between data elements
(Zhang, 2017). In a graph database, data is represented as nodes,
edges, and properties, which can be used to model complex
relationships between entities (Figure 3.10).

Figure 3.10. Illustration of a graph database with an example.

Source: Neo4j, Inc. Creative Commons License.

Nodes represent the entities in the database, while edges
represent the relationships between those entities. For example,
in a social network graph database, a node might represent a
person, while an edge might represent a connection between two
people, such as a friendship or a follow relationship. Properties
provide additional information about the nodes and edges, such
as a person’s name or a connection’s strength.
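
The node, edge, and property model described above can be sketched with a few lines of Python. This is a toy in-memory structure for illustration only, not the storage format or API of any graph database. The class `PropertyGraph`, the labels `FRIEND` and `FOLLOWS`, and the sample people are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: nodes and directed, labeled edges with properties."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label, **properties):
        # Refuse edges whose endpoints do not exist yet.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, label, properties))

    def neighbors(self, node_id, label):
        """Names of nodes reached from node_id over outgoing edges with label."""
        return [
            self.nodes[edge.target].properties["name"]
            for edge in self.edges
            if edge.source == node_id and edge.label == label
        ]


graph = PropertyGraph()
graph.add_node("p1", name="Alice")
graph.add_node("p2", name="Bob")
graph.add_node("p3", name="Carol")
graph.add_edge("p1", "p2", "FRIEND", strength=0.9)   # property on an edge
graph.add_edge("p1", "p3", "FOLLOWS", strength=0.4)

friends_of_alice = graph.neighbors("p1", "FRIEND")
```

The person's name is stored as a node property and the connection's strength as an edge property, mirroring the example in the text.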
Graph databases are designed to handle complex, highly
connected data sets, making them well-suited for use cases
such as social networks, recommendation engines, and fraud
detection (Singh & Kaur, 2015). Because graph databases can
model relationships between data elements in a natural way, they
can provide powerful insights into how different parts of a system
are connected and how they influence one another.

Utilize graph databases for complex, interrelated data sets that
require flexibility and high performance.

3.5.1. History of Graph Databases

Graph databases have their roots in graph theory, which is a branch
of mathematics that studies networks of interconnected objects. In
the 18th and 19th centuries, mathematicians began studying graphs
as a way to model and understand complex systems, such as
transportation networks and social relationships (Schindler, 2018).
The concept of graph databases as a computer science tool
emerged in the 1960s and 1970s, when researchers began using
graph theory to model and analyze data in computer systems. In
the 1980s, graph databases became more widely used for specific
applications such as geographic information systems (GIS) and
computer-aided design (CAD) systems. The first commercial graph
database was released in the mid-2000s by a company called Neo
Technology. Neo Technology’s product, called Neo4j, was designed
to provide a high-performance, scalable graph database for modern
applications. Since then, other companies and open-source projects
have emerged, offering a range of graph databases for different
use cases and platforms (Jaiswal & Agrawal, 2013).

KEYWORD
Computer-aided design is the use of computer-based software to
aid in design processes.

Today, graph databases are increasingly used in a wide range
of applications, including social networks, recommendation engines,
fraud detection, and bioinformatics. As more organizations recognize
the value of modeling and analyzing complex relationships between
data elements, the use of graph databases is likely to continue
to grow in popularity.

3.5.2. Importance of Graph Databases


Graph databases are important for several reasons, which are
discussed in the following subsections.

3.5.2.1. Relationship Modeling


Graph databases are designed to handle complex relationships
between data elements (Ranu & Singh, 2009). This makes them well-
suited for modeling and analyzing data that has many interconnected
components. By using a graph database, organizations can better
understand the relationships between different data points, which
can lead to insights and better decision-making (Figure 3.11).

Figure 3.11. Relationship modeling in graph database.

Source: Neo4j, Inc. Creative Commons License.


3.5.2.2. Flexibility
Graph databases are highly flexible and can be used for a wide
range of applications (Lee et al., 2012). Because they can handle
unstructured data as well as structured data, they are a versatile
tool for data management.

KEYWORD
Social networks are websites and apps that allow users and
organizations to connect, communicate, share information and form
relationships.

3.5.2.3. Performance

Graph databases are optimized for handling complex relationships
and can deliver high performance for applications that require
real-time query processing. This makes them well-suited for use
cases such as fraud detection, recommendation engines, and social
networks, where fast query processing is essential.

3.5.2.4. Scalability
Graph databases are designed to scale horizontally, which means
that they can handle large amounts of data and can grow as data
needs increase (Mendelzon & Wood, 1995). This makes them a
good choice for organizations that need to store and manage large
amounts of data.
Overall, graph databases are an important tool for data
management and analysis, particularly in applications that involve
complex relationships between data elements. By using a graph
database, organizations can gain new insights into their data,
improve decision-making, and deliver better results.

3.5.3. Applications of Graph Databases


Graph databases have numerous applications across a wide range
of industries. Some of the most common applications are discussed
in the following subsections.

3.5.3.1. Social Networks


Social networks are built on complex relationships between users,
their friends, and their interactions. Graph databases are ideal for
modeling and analyzing these relationships, making them a popular
choice for social networking platforms (Pokorný et al., 2017).


3.5.3.2. Recommendation Engines


Recommendation engines rely on data about user behavior and
preferences to make personalized recommendations. Graph
databases can be used to model these relationships and deliver
fast, accurate recommendations (Jouili & Vansteenberghe, 2013).
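
One common way to turn such relationships into recommendations is a two-hop traversal: from a user to the items they like, then to other users who like the same items, and finally to those users' other items. The sketch below illustrates this with plain Python sets. The `likes` data, the function name `recommend`, and the overlap-based scoring are assumptions made for the example and are not taken from any particular product.

```python
from collections import Counter

# Hypothetical "LIKES" edges: user -> set of liked items.
likes = {
    "ann": {"dune", "alien"},
    "ben": {"dune", "alien", "solaris"},
    "cat": {"dune", "solaris", "arrival"},
    "dan": {"heat"},
}


def recommend(user, likes, top_n=3):
    """Rank items liked by users who share at least one liked item with user."""
    own = likes[user]
    scores = Counter()
    for other, items in likes.items():
        if other == user:
            continue
        overlap = len(own & items)   # hop 1: shared items connect two users
        if overlap == 0:
            continue
        for item in items - own:     # hop 2: the other user's remaining items
            scores[item] += overlap  # weight by how similar the users are
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


suggestions = recommend("ann", likes)
```

A graph database would express the same traversal as a query over nodes and edges; the dictionary of sets here merely stands in for that structure.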

3.5.3.3. Fraud Detection


Graph databases can be used to detect fraudulent activities by
analyzing relationships between different entities (Castellana et al.,
2015). For example, a bank might use a graph database to identify
fraudulent transactions by looking for patterns in the relationships
between account holders, merchants, and transactions (Figure 3.12).
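
As a rough illustration of relationship-based fraud detection, the sketch below groups accounts that are linked, directly or indirectly, through shared attributes. The choice of pattern (accounts sharing a device or phone number), the sample data, and the function name `linked_account_groups` are assumptions made for the example; real systems use many more signals. The sketch also assumes that account names and attribute names never collide.

```python
from collections import defaultdict

# Hypothetical edges: (account, attribute the account was seen using).
uses = [
    ("acct_1", "device_A"),
    ("acct_2", "device_A"),
    ("acct_2", "phone_X"),
    ("acct_3", "phone_X"),
    ("acct_4", "device_B"),
]


def linked_account_groups(uses, min_size=2):
    """Return groups of accounts connected through shared attributes."""
    # Build an undirected graph linking accounts and attributes.
    adjacency = defaultdict(set)
    for account, attribute in uses:
        adjacency[account].add(attribute)
        adjacency[attribute].add(account)

    account_set = {account for account, _ in uses}
    seen = set()
    groups = []
    for start in sorted(account_set):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        group = sorted(node for node in component if node in account_set)
        if len(group) >= min_size:
            groups.append(group)
    return groups


suspicious = linked_account_groups(uses)
```

Accounts 1 and 3 never share an attribute directly, yet they end up in the same group through account 2, which is the kind of indirect connection that is awkward to find with table joins.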

Figure 3.12. Fraud detection using graph database.

Source: Jim Webber, Creative Commons License.

3.5.3.4. Knowledge Management


Graph databases can be used to manage complex knowledge
systems, such as those used in healthcare, scientific research, and
engineering (Wood, 2012). By modeling the relationships between
different data elements, graph databases can help organizations better
understand complex systems and make more informed decisions.

3.5.3.5. Logistics and Supply Chain Management


Graph databases can be used to model complex supply chain
networks and optimize logistics operations. By analyzing relationships
between suppliers, customers, and products, organizations can
improve efficiency and reduce costs.

3.5.3.6. Recommendation Engines


Graph databases can be used to model complex relationships
between different products, such as books, movies, and music
(Debrouvier et al., 2021). By analyzing these relationships,
recommendation engines can make accurate recommendations
to users based on their preferences and behavior.
Overall, graph databases have a wide range of applications
across many industries. Their ability to handle complex relationships
and deliver fast, accurate insights makes them a valuable tool for
data management and analysis.

KEYWORD
Data management is the practice of collecting, organizing, and
accessing data to support productivity, efficiency, and
decision-making.

3.5.4. Advantages of Graph Databases
Graph databases offer several advantages over other types of
databases, which are discussed in the following subsections.

3.5.4.1. Flexibility
Graph databases are highly flexible and can be used to model
a wide variety of data structures, including complex relationships
between entities (ShefaliPatil & Bhatia, 2014). This makes them
ideal for use in applications where data is constantly changing or
evolving.

3.5.4.2. Performance
Graph databases are designed to be highly performant, even when
dealing with large and complex datasets. This is because graph
databases use indexing and caching techniques to retrieve data
quickly, and because they are optimized for querying relationships
between entities.

3.5.4.3. Scalability
Graph databases are highly scalable, meaning that they can handle
large amounts of data and users without sacrificing performance
(Angles et al., 2017). This makes them ideal for use in applications
with high growth potential.

3.5.4.4. Schema-Less Design


Graph databases do not require a fixed schema, meaning that
the database can be easily adapted to changing business needs
without requiring significant modifications to the underlying data
model (Shimpi & Chaudhari, 2012).

3.5.4.5. Real-Time Analysis


Graph databases can be used to perform real-time analysis of data,
making them ideal for use in applications where quick
decision-making is required.

3.5.4.6. Natural Language Processing
Graph databases are well-suited for natural language processing
applications, where relationships between entities and concepts
need to be identified and analyzed (Pokorný, 2015).
Overall, graph databases offer significant advantages over
other types of databases when it comes to handling complex
relationships between entities, delivering high performance, and
providing flexibility and scalability.

KEYWORD
Natural language processing is a subfield of artificial
intelligence (AI). It helps machines process and understand the
human language so that they can automatically perform repetitive
tasks.

3.5.5. Disadvantages of Graph Databases


While graph databases offer many advantages, they also have
some disadvantages that should be taken into consideration, which
are discussed in the following subsections.

3.5.5.1. Complexity
Graph databases can be more complex to implement and maintain
than other types of databases (Jin et al., 2010). This is because
they require a deeper understanding of graph theory and specialized
query languages, and may require more advanced technical
expertise to manage.

3.5.5.2. Limited Support


Graph databases are a relatively new technology, and as such,
there may be limited support available in terms of tools, frameworks,
and third-party integrations.

3.5.5.3. Data Modeling Challenges


Graph databases require careful consideration of data modeling to
properly represent relationships between entities (Kumar, 2015).
This can be a challenge, especially when working with complex
data structures or large datasets.

3.5.5.4. Cost
Some graph database solutions may be more expensive than
other types of databases, which can be a barrier to adoption for
smaller organizations or projects.

KEYWORD
Semi-structured data refers to data that is not captured or
formatted in conventional ways.

3.5.5.5. Performance Limitations
While graph databases are designed to be highly performant, their
performance may be limited in some use cases (Batra & Tyagi,
2012), for example when working with large or complex graphs,
or when performing complex queries that require traversing multiple
relationships.
Overall, while graph databases offer many advantages, they may
not be the best fit for every use case. Organizations considering
using graph databases should carefully evaluate the pros and
cons and determine whether the benefits outweigh the potential
drawbacks.

3.6. NOSQL DATABASES


NoSQL databases, as the name suggests, are databases that do
not rely on the relational model and the structured query language
(SQL) used in relational databases. They are designed to handle
large volumes of unstructured or semi-structured data, which are
increasingly common in modern applications (Liao et al., 2016).
This data can include text, images, videos, and other types of
unstructured data, as well as data from distributed systems, social
networks, and big data analytics (Figure 3.13).
Unlike traditional relational databases, NoSQL databases use
a variety of data models, including key-value stores, document-
oriented databases, graph databases, and column-family stores,
among others. Key-value stores, for example, are simple databases
that store data as key-value pairs, making them ideal for caching
and other performance optimizations. Document-oriented databases,
on the other hand, store data as JSON or XML documents, making
them ideal for handling complex data structures (Figure 3.14).
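
The difference between the key-value and document models can be illustrated with Python's built-in `dict` and the standard `json` module. This is a minimal sketch, not a database: the key `session:42`, the order document, and its field names are made up for the example.

```python
import json

# Key-value model: the store only knows the key; the value is opaque to it.
session_cache = {}
session_cache["session:42"] = "user=alice;cart=3"
cached = session_cache.get("session:42")

# Document model: the value is a structured (here JSON) document whose
# nested fields can be read individually.
order_json = """
{
  "order_id": 1001,
  "customer": {"name": "Alice", "city": "Lisbon"},
  "items": [
    {"sku": "pen", "qty": 2, "price": 1.5},
    {"sku": "book", "qty": 1, "price": 12.0}
  ]
}
"""
order = json.loads(order_json)
order_total = sum(item["qty"] * item["price"] for item in order["items"])
customer_city = order["customer"]["city"]
```

In a relational design the same order would typically be split across customer, order, and order-item tables; the document keeps it together as one nested value.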

Figure 3.13. Illustration of the NoSQL database.

Source: Educba, Creative Commons License.

Figure 3.14. Illustration of an example of the NoSQL database.

Source: Vladimir Kaplarevic, Creative Commons License.

One of the key advantages of NoSQL databases is their high
scalability and flexibility. They can handle large volumes of data
and distribute data across multiple servers, making them ideal for
applications that require high availability and horizontal scalability.
NoSQL databases are also highly performant, as they can perform
queries and updates quickly, without the overhead of traditional
relational databases.
NoSQL databases are widely used in web applications,
social networks, and big data analytics, as well as in other areas
where large volumes of unstructured data are generated and
need to be processed quickly. They are also popular in the cloud
computing space, as they can be easily scaled up or down to
meet changing demands. However, NoSQL databases also have
their disadvantages, including a lack of standardization and a
steeper learning curve for developers who are used to traditional
relational databases.

Did you Know?
NoSQL databases offer flexibility, scalability, and high
availability by utilizing non-relational data models and distributed
architectures.

3.6.1. History of NoSQL Databases

Non-relational databases have been around for decades, but NoSQL
databases only gained widespread popularity in the late 2000s. The
term “NoSQL” was first used in 1998 as the name of a lightweight,
open-source relational database that did not expose a SQL
interface. However, it wasn’t until the mid-2000s that
NoSQL databases began to gain traction among large-scale web
companies like Google, Amazon, and Facebook (Meier et al., 2019).
These companies needed a way to handle massive amounts of
data and to scale their systems horizontally, which is difficult to
do with traditional relational databases.
In 2006, Google published a paper on Bigtable, a scalable,
distributed data storage system. The following year, Amazon
published a paper on Dynamo, a highly available key-value
store designed for use in distributed systems. Both Bigtable and
Dynamo are considered to be early examples of NoSQL databases.
In 2009, a meetup on open-source, distributed, non-relational
databases was held under the name NoSQL, which helped to
popularize the term and the concept of NoSQL databases.
Since then, NoSQL databases have continued to evolve
and have become an essential part of many modern software
applications. Today, there are many different types of NoSQL
databases, each with their own strengths and weaknesses. Some
of the most popular types include document databases, graph
databases, key-value stores, and column-family stores. NoSQL
databases are now used by a wide range of organizations, from
small startups to large enterprises, and are often used for big data
and real-time applications.

3.6.2. Importance of NoSQL Databases


NoSQL databases have become increasingly important in modern
computing due to the rise of big data, cloud computing, and web
applications. The importance of NoSQL databases can be explained
in several ways, which are discussed in the following subsections.

3.6.2.1. Scalability
NoSQL databases are designed to handle large amounts of
unstructured and semi-structured data, making them highly scalable
(Zaki, 2014). They can scale horizontally by adding more nodes
to a cluster, allowing them to handle high-traffic websites and
applications with ease.

3.6.2.2. Flexibility
Unlike traditional relational databases, NoSQL databases are
schema-less, which means that they do not require a predefined
schema to store data. This makes it easier to store and retrieve
complex data types, such as graphs, documents, and key-value
pairs.

3.6.2.3. Performance
NoSQL databases are optimized for performance, making them
ideal for high-traffic websites and applications (Moniruzzaman &
Hossain, 2013). They use distributed architectures and caching
techniques to improve query response times and minimize downtime.

KEYWORD
Web application is application software that is accessed using a
web browser.

3.6.2.4. Cost-Effective
NoSQL databases are often open-source and can run on commodity
hardware, so they do not require expensive licenses or specialized
hardware. This makes them a cost-effective
option for businesses and organizations that need to store large
amounts of data.

3.6.2.5. Availability
NoSQL databases are designed to provide high availability and
fault tolerance (Pokorny, 2011). They use techniques such as
replication and sharding to ensure that data is always available,
even in the event of hardware failure or network outages.
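
How sharding and replication work together can be sketched with a toy in-memory cluster. The class `ShardedStore`, its method names, and the placement rule (a hash picks the primary node and the copies go to the following nodes) are assumptions chosen to keep the example short. Production systems add failure detection, re-replication, and consistency controls that are omitted here.

```python
import hashlib


class ShardedStore:
    """Toy cluster: each key is stored on `replicas` consecutive nodes."""

    def __init__(self, node_count=4, replicas=2):
        self.nodes = [{} for _ in range(node_count)]
        self.alive = [True] * node_count
        self.replicas = replicas

    def owners(self, key):
        # Sharding: a stable hash of the key picks the primary node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        primary = int.from_bytes(digest[:8], "big") % len(self.nodes)
        # Replication: copies go to the next nodes in ring order.
        return [(primary + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        for index in self.owners(key):
            if self.alive[index]:
                self.nodes[index][key] = value

    def get(self, key):
        # Read from the first live owner that holds the key.
        for index in self.owners(key):
            if self.alive[index] and key in self.nodes[index]:
                return self.nodes[index][key]
        raise KeyError(key)

    def fail(self, index):
        """Simulate a hardware failure of one node."""
        self.alive[index] = False


store = ShardedStore()
store.put("user:1", {"name": "Alice"})
primary = store.owners("user:1")[0]
store.fail(primary)               # the primary copy becomes unreachable
survived = store.get("user:1")    # the replica still answers the read
```

Because the key was written to two nodes, losing one of them does not make the data unavailable; adding nodes to the list is the toy equivalent of scaling out.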
Overall, NoSQL databases have become increasingly important
due to their ability to handle big data and scale to meet the
needs of modern computing. They offer flexibility, performance, and
availability, making them a popular choice for web applications,
e-commerce sites, social networks, and other high-traffic websites.

3.6.3. Applications of NoSQL Databases


NoSQL databases are used in a wide range of applications and
industries due to their flexibility, scalability, and performance
advantages over traditional relational databases. Some of the
common applications of NoSQL databases are discussed in the
following subsections.

3.6.3.1. Big Data


NoSQL databases are commonly used in big data applications,
which involve processing and analyzing vast amounts of data
(Haseeb & Pattun, 2017). These databases can handle large
amounts of unstructured data, such as text, images, and videos.

KEYWORD
NoSQL database provides a mechanism for storage and retrieval
of data that is modeled in means other than the tabular relations
used in relational databases.

3.6.3.2. E-Commerce

NoSQL databases are used in e-commerce applications, where
high availability, scalability, and performance are essential. These
databases can handle large volumes of transactions and provide
real-time inventory and order management.

3.6.3.3. Social Networking

NoSQL databases are commonly used in social networking
applications, which involve managing large volumes of user-
generated content (Abramova et al., 2013). These databases can
handle complex relationships between users and provide real-time
updates.

3.6.3.4. Internet of Things (IoT)


NoSQL databases are used in IoT applications, where large amounts
of data are generated from sensors and devices. These databases
can handle high volumes of data and provide real-time analysis
and insights.

3.6.3.5. Content Management


NoSQL databases are used in content management applications,
where flexible schema and high performance are essential. These
databases can handle large volumes of unstructured data and
provide real-time search and retrieval (Chandra, 2015).

3.6.3.6. Gaming
NoSQL databases are commonly used in gaming applications, which
involve managing large amounts of user data, such as profiles,
preferences, and game progress. These databases can provide
real-time updates and ensure high availability and performance.

3.6.3.7. Mobile Applications


NoSQL databases are used in mobile applications, where high
performance and scalability are essential. These databases can
handle large volumes of data and provide real-time synchronization
and offline support (Membrey et al., 2010).

KEYWORD
Real-time synchronization is the capability to quickly update the
latest changes by a central system, often from multiple channels.

3.6.4. Advantages of NoSQL Databases

NoSQL databases offer several advantages over traditional relational
databases, which are discussed in the following subsections.

3.6.4.1. Scalability
NoSQL databases are designed to scale horizontally, which means
that they can easily handle large amounts of data by adding more
nodes to a cluster. This makes them ideal for applications that
require high levels of scalability, such as social media platforms,
e-commerce websites, and online gaming platforms.

3.6.4.2. Flexibility
NoSQL databases are highly flexible and can easily accommodate
changes in data structures and data types (Han et al., 2011). This
makes them ideal for applications that require frequent updates
and modifications, such as content management systems (CMS), mobile apps, and IoT devices.

3.6.4.3. High Performance


NoSQL databases are optimized for read and write performance,
making them ideal for applications that require fast and efficient
data access (Leavitt, 2010). This includes real-time analytics
and financial systems such as high-speed trading platforms.


3.6.4.4. Cost-Effective
NoSQL databases are often more cost-effective than traditional
relational databases, as many are open source and run on commodity
hardware rather than requiring expensive servers or software
licenses (Okman et al., 2011). Because they scale by adding
inexpensive nodes, capacity can grow incrementally with the data
instead of through costly upgrades to a single large server.

3.6.4.5. Availability
NoSQL databases are designed for high availability, which means
that they can continue to operate even if some nodes in the cluster
fail (Nayak et al., 2013). This makes them ideal for applications
that require high levels of uptime and reliability, such as online
banking and e-commerce platforms.

KEYWORD
Software license is a document that provides legally binding guidelines
for the use and distribution of software.

3.6.4.6. Schema-Less Design
NoSQL databases do not enforce a fixed schema, allowing for a
more flexible data model. This makes it easier to store and access
unstructured or semi-structured data, such as JSON documents.
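
To make the idea of a schema-less design concrete, the following sketch keeps two differently shaped records in the same collection. The in-memory list and the `find_with_field` helper are stand-ins invented for this example and do not correspond to any particular database's API.

```python
import json

# Hypothetical in-memory "collection": a plain list of dicts stands in for a document store.
collection = []

# The two documents share some fields but not others; no schema has to be declared first.
collection.append({"_id": 1, "name": "Ada", "email": "ada@example.com"})
collection.append({"_id": 2, "name": "Lin", "tags": ["admin"], "address": {"city": "Oslo"}})

def find_with_field(docs, field):
    """Return the documents that contain the given top-level field."""
    return [doc for doc in docs if field in doc]

# Each document serializes to JSON independently of the others.
serialized = [json.dumps(doc) for doc in collection]

print(find_with_field(collection, "email"))
print(serialized[1])
```

The trade-off is that the application, not the database, must cope with fields that are missing from some documents.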
Overall, the advantages of NoSQL databases make them an
attractive option for many different types of applications, particularly
those that require high levels of scalability, performance, and
availability.

3.6.5. Disadvantages of NoSQL Databases


NoSQL databases also have certain disadvantages, which are
discussed in subsections.

3.6.5.1. Limited Query Capability


NoSQL databases generally offer weaker support than SQL databases
for complex queries, such as joins across several data sets. They
are designed to provide quick access to large amounts of data, but
they may not be the best fit for data analysis (Sicari et al., 2022).

3.6.5.2. Limited Support for ACID Transactions


ACID stands for Atomicity, Consistency, Isolation, and Durability.
While NoSQL databases provide high availability and scalability,
many of them relax these guarantees and do not offer the same level
of transactional consistency as SQL databases (Tauro et al., 2012).
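
The sketch below illustrates atomicity, the "A" in ACID, using Python's built-in `sqlite3` module as a small SQL database. The `accounts` table and the `transfer` function are assumptions made for this example. A transfer that would overdraw an account violates a constraint, and the whole transaction is rolled back, including the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit to dst was undone as well.
        return False

ok = transfer(conn, 1, 2, 500)  # account 1 holds only 100
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, balances)
```

A store without multi-record transactions would leave it to the application to detect and repair the half-finished transfer.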

3.6.5.3. Lack of Standardization


Unlike SQL databases, NoSQL databases lack a standard language
for querying and accessing data. Each database has its own set
of APIs and query languages, which can be difficult for developers
to learn (Meier & Kaufmann, 2019).

3.6.5.4. Lack of Maturity

NoSQL databases are relatively new compared to SQL databases,
which have been around for decades. This means that NoSQL
databases may not have the same level of maturity or stability
as SQL databases.

KEYWORD
Data breach is a security violation, in which sensitive, protected or
confidential data is copied, transmitted, viewed, stolen, altered or used
by an individual unauthorized to do so.

3.6.5.5. Limited Community Support
As NoSQL databases are relatively new, the community support
around them may be limited (Han et al., 2011). This can make it
difficult for developers to find resources and solutions when facing
problems.

3.6.5.6. Security Concerns


As with any database, NoSQL databases can be vulnerable to
security threats such as hacking and data breaches. However,
NoSQL databases may be more vulnerable than SQL databases
due to their lack of standardization and the fact that they often
store large amounts of unstructured data.

3.7. DOCUMENT DATABASES


Document databases are a type of NoSQL database that are
specifically designed to store and manage semi-structured and
unstructured data (Indermühle et al., 2010). Unlike traditional
relational databases, which rely on a rigid schema to organize data,
document databases allow for flexible and dynamic data modeling.
In document databases, data is typically stored as JSON (JavaScript
Object Notation) or JSON-like documents. These documents can be nested and
can include arrays and other complex data structures. This allows
for more natural and intuitive data modeling, since the structure of
the data can be tailored to the specific needs of the application
(Figure 3.15).
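
As a small illustration of such a document, the sketch below parses a nested JSON order that contains an embedded customer object and an array of line items. The field names and values are invented for this example.

```python
import json

# A hypothetical order document with a nested object and an array of items.
order_json = """
{
  "order_id": "A-1001",
  "customer": {"name": "Ada", "address": {"city": "Oslo", "country": "NO"}},
  "items": [
    {"sku": "pen", "qty": 3, "unit_price": 2.5},
    {"sku": "ink", "qty": 1, "unit_price": 7.0}
  ]
}
"""

order = json.loads(order_json)

# Nested fields are reached by following the structure of the document.
city = order["customer"]["address"]["city"]

# Arrays can be iterated directly, with no join against a separate table.
total = sum(item["qty"] * item["unit_price"] for item in order["items"])

print(city, total)
```

Everything needed to display the order travels in one document, which is the kind of application-shaped modeling described above.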


Figure 3.15. Illustration of the document database.

Source: Alex Williams, Creative Commons License.

Document databases have gained popularity in recent years
due to the rise of web and mobile applications that generate
and consume large amounts of unstructured data, such as social
media posts, sensor data, and user-generated content (O’Gorman
& Kasturi, 1995). Document databases provide a scalable and
efficient way to store and manage this data, while also allowing
for flexible querying and analysis.
Some popular examples of document databases include
MongoDB, Couchbase, and Amazon DocumentDB. These databases
are often used in e-commerce, content management, and real-
time analytics applications. Overall, document databases offer a
powerful and versatile tool for managing large and complex data
sets in modern applications.

3.7.1. History of Document Databases


Document databases, also known as document-oriented databases,
are a type of NoSQL database that stores and manages unstructured
data in the form of documents. As a NoSQL category, they have a
relatively recent history, gaining prominence in the 2000s as an
alternative to traditional relational databases for handling large
volumes of unstructured data (Navarro & Baeza-Yates, 1997). The
underlying idea, however, is older.
An early precursor, Lotus Notes, was released by
Lotus Development Corporation in 1989. It was initially designed
as an email and messaging system, but it evolved to become a
platform for building collaborative applications. Lotus Notes stored
data in documents, which could contain rich text, images, and other
types of data. Documents could be organized in a hierarchical
structure, and users could search and retrieve documents using
full-text search.
In the mid-2000s, document databases began to emerge as
a new type of NoSQL database. Apache CouchDB, one of the first
popular document databases, was released in 2005. CouchDB
was designed to be scalable, fault-tolerant, and easy to use, and
it was particularly well-suited for web applications that needed
to store and retrieve large volumes of JSON data (AbuSafiya &
Mazumdar, 2004).
Since then, a number of other document databases have
emerged, including MongoDB, RavenDB, and Couchbase. These
databases have become popular for a variety of use cases, including
content management, e-commerce, social media, and online gaming.

KEYWORD
Document database is a database that stores information in documents.

3.7.2. Importance of Document Databases


Document databases are becoming increasingly popular due to
their ability to store unstructured data in a scalable and efficient
manner. In traditional databases, data is typically stored in tables
with predefined schemas, which can make it difficult to store
and manage data that does not fit neatly into these structures.
Document databases, on the other hand, are designed to store
and manage data as flexible, schema-less documents, which can
be easily updated and expanded as needed.
This flexibility makes document databases particularly well-suited
for applications that handle large amounts of unstructured data,
such as CMS, social media platforms, and e-commerce websites
(Zantout & Marir, 1999). With document databases, developers
can easily store and retrieve data in a way that is intuitive and
natural, without having to worry about the limitations imposed by
traditional database schemas.
Additionally, document databases offer excellent performance
and scalability, thanks to their ability to distribute data across multiple
nodes and clusters. This makes them an ideal choice for applications
that need to handle large volumes of data and scale quickly as
demand grows. Overall, the importance of document databases
lies in their ability to provide a flexible, scalable, and efficient
way to store and manage unstructured data, which is increasingly
becoming a critical part of modern application development.


3.7.3. Applications of Document Databases


Document databases are becoming increasingly popular due to
their flexibility in handling complex and unstructured data (Kawell
et al., 1988). They are suitable for various applications that require
fast and efficient data processing. Some of the applications of
document databases are discussed in the following subsections.

3.7.3.1. Content Management Systems (CMS)


Content management systems (CMSs) rely heavily on document
databases to manage, store, and retrieve vast amounts of
unstructured data such as images, videos, and text (Figure 3.16).

Figure 3.16. Illustration of the content management system.

Source: Pieter Arntz, Creative Commons License.

3.7.3.2. E-Commerce Applications


Document databases are used to store and retrieve product data,
customer orders, and transactional data, enabling e-commerce
sites to provide a fast and responsive user experience (Moffat et
al., 1997).

3.7.3.3. IoT Applications


Internet of Things (IoT) devices generate large volumes of data,
which is typically unstructured. Document databases can be used
to store and process this data, allowing IoT applications to provide
real-time insights and responses.

3.7.3.4. Mobile Applications


Mobile apps often require fast and reliable access to data stored
in a remote database. Document databases provide a flexible data
model and fast read/write operations that are well-suited to the
requirements of mobile apps (Joshi et al., 2002).
3.7.3.5. Real-Time Analytics
Document databases enable businesses to quickly process and
analyze large amounts of unstructured data in real-time (Han et
al., 2011). This capability is useful in applications such as fraud
detection, social media analytics, and sentiment analysis.

KEYWORD
Social media analytics is the ability to gather and find meaning in data
gathered from social channels to support business decisions, and to
measure the performance of actions based on those decisions through
social media.

3.7.3.6. Scientific Research
Document databases are used in scientific research to store and
analyze complex data such as gene sequences, climate data, and
astronomical data (Nayak et al., 2013).
Overall, document databases are suitable for applications that
require flexibility in handling complex and unstructured data, fast
read/write operations, and real-time data processing and analysis.

3.7.4. Advantages of Document Databases


Document databases offer several advantages, which are discussed
in subsections.

3.7.4.1. Flexible Data Modeling


Document databases allow flexible data modeling, enabling
businesses to store complex and unstructured data in a structured
format. This makes it easier to manage and process data and to
scale the database system as the business grows.

3.7.4.2. Scalability
Document databases are highly scalable, both horizontally and
vertically (Clifton & Garcie-Molina, 2000). Horizontal scalability
is achieved by distributing data across multiple servers, while
vertical scalability involves increasing the processing power of a
single server.

3.7.4.3. High Performance

Document databases can handle large amounts of data and
complex queries quickly and efficiently. They use advanced indexing
techniques to improve query performance and support fast read
and write operations (Walker, 1988).

KEYWORD
Traditional database is based on a fixed schema that is static in
nature. It could only work with structured data that fit effortlessly
into relational databases or tables.

3.7.4.4. Easy to Use
Document databases are often easier to use than traditional
relational databases. They do not require a predefined schema,
so developers can start building applications immediately without
having to design a schema.

3.7.4.5. Developer Productivity


Document databases allow developers to work with data in the
same format as their programming language, reducing the need
for data mapping and conversion (Ha & Shichkina, 2022). This
increases developer productivity and reduces development time.

3.7.4.6. Cost-Effective
Document databases can be less expensive to run than traditional
databases. They require less hardware, and the open-source
versions are often free to use.

3.7.4.7. Availability
Document databases are highly available, with many offering
automatic failover and replication capabilities (Liansheng, 2000).
This ensures that data is always accessible, even in the event of
a server failure.
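
The sketch below gives a simplified picture of replication and failover. The `Replica` and `ReplicaSet` classes are invented for this example and ignore details that real systems must handle, such as electing a primary, bringing recovered replicas up to date, and resolving conflicting writes.

```python
class Replica:
    """One copy of the data, held on a hypothetical server."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class ReplicaSet:
    """Writes go to every reachable replica; reads use the first one that answers."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = value

    def read(self, key):
        for replica in self.replicas:
            if replica.up and key in replica.data:
                return replica.data[key]
        raise KeyError(key)


rs = ReplicaSet([Replica("a"), Replica("b"), Replica("c")])
rs.write("doc:1", {"title": "hello"})

# Simulate a server failure: the read is served by a surviving replica.
rs.replicas[0].up = False
value = rs.read("doc:1")
print(value)
```

Data stays readable as long as at least one replica holding it is reachable.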

3.7.4.8. Schema Evolution


Document databases allow for schema evolution, which means that
the database schema can evolve over time as the data changes.
This reduces the need for downtime or complex schema migrations
(Bouaziz et al., 2019).
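
One common way to handle schema evolution, sketched here under assumed field names, is to tag each document with a version and upgrade older documents when they are read. The `schema_version` field and the `upgrade` function are hypothetical and not taken from any specific product.

```python
def upgrade(doc: dict) -> dict:
    """Return a copy of the document in the newest shape (version 2)."""
    doc = dict(doc)  # leave the stored document untouched
    if doc.get("schema_version", 1) == 1:
        # Version 1 stored a single "name"; version 2 splits it into two fields.
        first, _, last = doc.pop("name").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
        doc["schema_version"] = 2
    return doc


# Old and new documents coexist in the same collection.
stored = [
    {"_id": 1, "name": "Ada Lovelace"},
    {"_id": 2, "schema_version": 2, "first_name": "Lin", "last_name": "Chen"},
]

current = [upgrade(doc) for doc in stored]
print(current)
```

Because old documents are converted lazily, the collection never has to be taken offline for a bulk migration, although the upgrade code must be kept for as long as old documents remain.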

Overall, document databases are a powerful and flexible
solution for managing complex and unstructured data. They offer
high performance, scalability, and availability, making them an
excellent choice for modern applications that require a flexible
data model.

KEYWORD
Traditional analytics is the process of sorting through enormous
data sets to find, understand, and communicate new information.

3.7.5. Disadvantages of Document Databases

Document databases, like any other database system, have their own
set of advantages and disadvantages. Some of the disadvantages
of document databases are discussed in the following subsections.

3.7.5.1. Lack of Standardized Query Language
Unlike relational databases, document databases lack a standardized
query language. This can make it difficult for users to interact with
the database, as they must learn a new query language or use
a specialized interface to interact with the data.
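
To show what the absence of a shared query language means in practice, the sketch below expresses the same question twice: once in SQL, using Python's built-in `sqlite3` module, and once as a filter object whose `$gt` operator is loosely modeled on MongoDB-style queries. The `matches` function is a toy written for this example, not the API of any product.

```python
import sqlite3

rows = [("Ada", 36), ("Lin", 28), ("Sam", 41)]

# 1) Relational version: standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
sql_result = [row[0] for row in conn.execute(
    "SELECT name FROM users WHERE age > 30 ORDER BY name")]

# 2) Document version: the query is itself a document describing the filter.
docs = [{"name": name, "age": age} for name, age in rows]

def matches(doc, query):
    """Toy matcher supporting equality and a single '$gt' (greater-than) operator."""
    for field, condition in query.items():
        if isinstance(condition, dict):
            threshold = condition.get("$gt")
            if threshold is not None and not (field in doc and doc[field] > threshold):
                return False
        elif doc.get(field) != condition:
            return False
    return True

doc_result = sorted(doc["name"] for doc in docs if matches(doc, {"age": {"$gt": 30}}))

print(sql_result, doc_result)
```

Both versions return the same names, but a developer moving between systems has to learn a different notation for each.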

3.7.5.2. Limited Scalability


Scaling a document database can be harder in practice than its
design suggests, which can limit its use in large-scale applications
(Popescul et al., 2000). Deployments that begin on a single machine
must later partition their data across several nodes, and a poorly
chosen partitioning scheme makes it difficult to scale horizontally.

3.7.5.3. Limited Support for Transactions


Many document databases offer only limited support for transactions,
particularly those that span multiple documents, making
it difficult to maintain data integrity and consistency. This can
be particularly problematic for applications that require ACID
compliance.

3.7.5.4. Limited Support for Analytics


Document databases are often less suitable for data analysis and
reporting than relational databases. This is because documents
often lack the uniform, tabular structure that traditional analytics
tools and reporting systems expect.


3.7.5.5. Data Duplication


Document databases often duplicate data across multiple documents,
which can lead to redundancy and wasted storage space (Chen
& Lynch, 1992). This can also make it more difficult to maintain
data consistency and integrity.
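
The following sketch shows how duplication arises when customer details are embedded in every order document. The documents and the `update_customer_email` helper are invented for this example.

```python
# Customer 7's email is copied into each of their orders.
orders = [
    {"order_id": 1, "customer": {"id": 7, "email": "ada@old.example"}, "total": 20},
    {"order_id": 2, "customer": {"id": 7, "email": "ada@old.example"}, "total": 35},
    {"order_id": 3, "customer": {"id": 9, "email": "lin@example.com"}, "total": 12},
]

def update_customer_email(docs, customer_id, new_email):
    """Rewrite every embedded copy of the customer's email; return how many changed."""
    changed = 0
    for doc in docs:
        if doc["customer"]["id"] == customer_id:
            doc["customer"]["email"] = new_email
            changed += 1
    return changed

changed = update_customer_email(orders, 7, "ada@new.example")
print(changed, orders)
```

A single change of address touches two documents here. If one of those updates were missed, the copies would disagree, which is the consistency risk described above. A normalized relational design would store the email once and reference it by customer ID.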

ACTIVITY 3.1.
You work for a transportation company that manages a large fleet of vehicles and
needs to track the routes, schedules, and maintenance history of each vehicle.
Describe how you would use a graph database to store and manage this data, and
how it would improve performance and efficiency compared to a relational database.


3.8. SUMMARY
The chapter provides an introduction to different types of database systems, including
hierarchical, network, relational, object-oriented, graph, document, and NoSQL databases.
Each database system has its strengths and weaknesses and is suitable for specific
applications based on factors like data structure, scalability, performance, and cost.
Hierarchical databases are often used in mainframe applications, network databases for
modeling complex data relationships, relational databases for structured data in OLTP
and BI, OODBs for complex data types, graph databases for complex relationship data,
document databases for unstructured or semi-structured data in web applications, and
NoSQL databases for handling large volumes of unstructured or semi-structured data in
big data applications or real-time analytics. Selecting the appropriate database system
requires careful consideration of these factors.

REVIEW QUESTIONS
1. What is a database system, and what is its purpose?
2. What are some common types of database systems, and how do they differ from
one another?
3. What are some advantages of using a hierarchical database system, and in what
types of applications is it commonly used?
4. What are some advantages of using a network database system, and in what
types of applications is it commonly used?
5. What is a relational database system, and how does it differ from other types of
database systems?
6. What are some advantages of using a relational database system, and in what
types of applications is it commonly used?
7. What is an object-oriented database system, and in what types of applications
is it commonly used?
8. What is a graph database system, and in what types of applications is it commonly
used?
9. What is a document database system, and in what types of applications is it
commonly used?
10. What is a NoSQL database system, and in what types of applications is it
commonly used?

MULTIPLE CHOICE QUESTIONS


1. Which type of database system is often used in mainframe applications?
a. Hierarchical databases
b. Network databases
c. Relational databases
d. Object-oriented databases
2. Which type of database system is useful for modeling complex relationships
between data?
a. Hierarchical databases
b. Network databases
c. Relational databases
d. Graph databases
3. Which type of database system organizes data into tables with rows and
columns?
a. Hierarchical databases
b. Network databases
c. Relational databases
d. Object-oriented databases
4. Which type of database system is designed to store complex data types
such as objects, classes, and inheritance hierarchies?
a. Hierarchical databases
b. Network databases
c. Relational databases
d. Object-oriented databases
5. Which type of database system is designed to store and query data that has
complex relationships, such as social networks or supply chain systems?
a. Graph databases
b. Document databases
c. NoSQL databases
d. Relational databases
6. Which type of database system stores unstructured or semi-structured data
in documents?
a. Graph databases
b. Document databases
c. NoSQL databases
d. Hierarchical databases
7. Which type of database system is often used in web applications and content
management systems?
a. Graph databases
b. Document databases
c. NoSQL databases
d. Object-oriented databases
8. Which type of database system is designed to handle large volumes of
unstructured or semi-structured data?
a. Hierarchical databases
b. Network databases
c. Relational databases
d. NoSQL databases
9. Which type of database system is widely used for storing structured data
and is popular in applications such as online transaction processing (OLTP)
and business intelligence (BI)?
a. Hierarchical databases
b. Network databases
c. Relational databases
d. Object-oriented databases
10. Which database system uses nodes and edges to represent data and can
perform complex queries quickly?
a. Graph databases
b. Document databases
c. NoSQL databases
d. Hierarchical databases
11. Which database system organizes data in a more complex network-like
structure, with each record having multiple parents and children?
a. Hierarchical databases
b. Network databases
c. Relational databases
d. Object-oriented databases
12. Which type of database system is often used in object-oriented programming
languages such as Java or C++?
a. Hierarchical databases
b. Network databases
c. Relational databases
d. Object-oriented databases


Answers to Multiple Choice Questions


1. (a); 2. (b); 3. (c); 4. (d); 5. (a); 6. (b); 7. (b); 8. (d); 9. (c); 10. (a); 11. (b); 12. (d)

REFERENCES
1. Abiteboul, S., & Hull, R., (1986). Restructuring hierarchical database objects.
Theoretical Computer Science, 62(1, 2), 3–38.
2. Abramova, V., & Bernardino, J., (2013). NoSQL databases: MongoDB vs Cassandra.
In: Proceedings of the International C* Conference on Computer Science and Software
Engineering (Vol. 1, pp. 14–22).
3. AbuSafiya, M., & Mazumdar, S., (2004). Accommodating paper in document databases.
In: Proceedings of the 2004 ACM Symposium on Document Engineering (Vol. 1,
pp. 155–162).
4. Albano, A., Ghelli, G., & Orsini, R., (1989). Types for databases: The Galileo
experience. In: Proceedings of the Second International Workshop on Database
Programming Languages (Vol. 1, pp. 196–206).
5. Alonso, R., & Korth, H. F., (1993). Database system issues in nomadic computing.
In: Proceedings of the 1993 ACM SIGMOD International Conference on Management
of Data (Vol. 1, pp. 388–392).
6. Angles, R., Arenas, M., Barceló, P., Hogan, A., Reutter, J., & Vrgoč, D., (2017).
Foundations of modern query languages for graph databases. ACM Computing
Surveys (CSUR), 50(5), 1–40.
7. Atkinson, M., Dewitt, D., Maier, D., Bancilhon, F., Dittrich, K., & Zdonik, S., (1990).
The object-oriented database system manifesto. In: Deductive and Object-Oriented
Databases (Vol. 1, pp. 223–240). North-Holland.
8. Atzeni, P., & De Antonellis, V., (1993). Relational Database Theory (Vol. 1, pp. 2–5).
Benjamin-Cummings Publishing Co., Inc.
9. Bader, G. D., Betel, D., & Hogue, C. W., (2003). BIND: The biomolecular interaction
network database. Nucleic Acids Research, 31(1), 248–250.
10. Banerjee, J., Hsiao, D. K., & Ng, F. K., (1980). Database transformation, query
translation, and performance analysis of a new database computer in supporting
hierarchical database management. IEEE Transactions on Software Engineering,
(1), 91–109.
11. Batra, S., & Tyagi, C., (2012). Comparative analysis of relational and graph databases.
International Journal of Soft Computing and Engineering (IJSCE), 2(2), 509–512.
12. Beeri, C., (1990). Formal models for object-oriented databases. In: Deductive and
Object-Oriented Databases (Vol. 1, pp. 405–430). North-Holland.
13. Bernstein, P. A., (1998). Repositories and object-oriented databases. ACM SIGMOD
Record, 27(1), 88–96.

14. Bertino, E., & Martino, L., (1991). Object-oriented database management systems:
Concepts and issues. Computer, 24(4), 33–47.
15. Bhat, U., & Jadhav, S., (2010). Moving towards non-relational databases. International
Journal of Computer Applications, 1(13), 40–47.
16. Black, J., Ellis, T., & Makris, D., (2004). A hierarchical database for visual surveillance
applications. In: 2004 IEEE International Conference on Multimedia and Expo (ICME)
(IEEE Cat. No. 04TH8763) (Vol. 3, pp. 1571–1574). IEEE.
17. Blasgen, M. W., & Eswaran, K. P., (1977). Storage and access in relational data
bases. IBM Systems Journal, 16(4), 363–377.
18. Bordoloi, S., & Kalita, B., (2013). Designing graph database models from existing
relational databases. International Journal of Computer Applications, 74(1), 2–7.
19. Bouaziz, S., Nabli, A., & Gargouri, F., (2019). Design a data warehouse schema
from document-oriented database. Procedia Computer Science, 159, 221–230.
20. Bouganim, L., Florescu, D., & Valduriez, P., (1996). Dynamic Load Balancing in
Hierarchical Parallel Database Systems (Doctoral Dissertation, INRIA), 3(1), 5–10.
21. Brodie, M. L., (1980). The application of data types to database semantic integrity.
Information Systems, 5(4), 287–296.
22. Castellana, V. G., Morari, A., Weaver, J., Tumeo, A., Haglin, D., Villa, O., & Feo,
J., (2015). In-memory graph databases for web-scale data. Computer, 48(3), 24–35.
23. Cellary, W., & Jomier, G., (1990). Consistency of versions in object-oriented databases.
In: VLDB (Vol. 90, pp. 432–41).
24. Chandra, D. G., (2015). BASE analysis of NoSQL database. Future Generation
Computer Systems, 52(1), 13–21.
25. Chen, H., & Lynch, K. J., (1992). Automatic construction of networks of concepts
characterizing document databases. IEEE Transactions on Systems, Man, and
Cybernetics, 22(5), 885–902.
26. Chou, T. S., Yen, K. K., & Luo, J., (2008). Network intrusion detection design using
feature selection of soft computing paradigms. International Journal of Computer
and Information Engineering, 2(11), 3722–3734.
27. Chung, W. Y., Yu, P. S., & Huang, C. J., (2013). Cloud computing system based
on wireless sensor network. In: 2013 Federated Conference on Computer Science
and Information Systems (Vol. 1, pp. 877–880). IEEE.
28. Clifton, C., & Garcie-Molina, H., (2000). The design of a document database. In:
Proceedings of the ACM Conference on Document Processing Systems (Vol. 1, pp.
125–134).
29. Codd, E. F., (2007). Relational database: A practical foundation for productivity. In:
ACM Turing Award Lectures (Vol. 1, p. 1981).
30. De Almeida, V. T., & Güting, R. H., (2005). Supporting uncertainty in moving objects
in network databases. In: Proceedings of the 13th Annual ACM International Workshop
on Geographic Information Systems (Vol. 1, pp. 31–40).

31. De Caluwe, R., (1997). Fuzzy and Uncertain Object-Oriented Databases: Concepts
and Models (Vol. 13, pp. 2–9). World Scientific.
32. Debrouvier, A., Parodi, E., Perazzo, M., Soliani, V., & Vaisman, A., (2021). A model
and query language for temporal graph databases. The VLDB Journal, 30(5), 825–858.
33. Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L., (2009). ImageNet: A
large-scale hierarchical image database. In: 2009 IEEE Conference on Computer
Vision and Pattern Recognition (Vol. 1, pp. 248–255). IEEE.
34. Deniša, M., & Ude, A., (2015). Synthesis of new dynamic movement primitives through
search in a hierarchical database of example movements. International Journal of
Advanced Robotic Systems, 12(10), 137.
35. DeWitt, D. J., Futtersack, P., Maier, D., & Velez, F., (1990). A Study of Three
Alternative Workstation-Server Architectures for Object Oriented Database Systems
(Vol. 1, pp. 2–9). University of Wisconsin-Madison Department of Computer Sciences.
36. Domdouzis, K., Lake, P., & Crowther, P., (2021). Hierarchical databases. In: Concise
Guide to Databases: A Practical Introduction (Vol. 1, pp. 205–212). Cham: Springer
International Publishing.
37. Dunham, M. H., & Helal, A., (1995). Mobile computing and databases: Anything
new? ACM SIGMOD Record, 24(4), 5–9.
38. Dy-Liacco, T. E., (1994). Modern control centers and computer networking. IEEE
Computer Applications in Power, 7(4), 17–22.
39. Gavish, B., & Pirkul, H., (1986). Computer and database location in distributed
computer systems. IEEE Transactions on Computers, 35(07), 583–590.
40. Gotthard, W., Lockemann, P. C., & Neufeld, A., (1992). System-guided view integration
for object-oriented databases. IEEE Transactions on Knowledge and Data Engineering,
4(1), 1–22.
41. Gray, P. M., Kulkarni, K. G., & Paton, N. W., (1992). Object-Oriented Databases: A
Semantic Data Model Approach (Vol. 1, pp. 2–8). Prentice-Hall, Inc.
42. Güting, R. H., & Schneider, M., (1993). Realms: A foundation for spatial data types in
database systems. In: Advances in Spatial Databases: Third International Symposium,
SSD’93 Singapore, June 23–25, 1993 Proceedings 3 (Vol. 1, pp. 14–35). Springer
Berlin Heidelberg.
43. Ha, M., & Shichkina, Y., (2022). Translating a distributed relational database to a
document database. Data Science and Engineering, 7(2), 136–155.
44. Han, J., Haihong, E., Le, G., & Du, J., (2011). Survey on NoSQL database. In: 2011
6th International Conference on Pervasive Computing and Applications (Vol. 1, pp.
363–366). IEEE.
45. Han, J., Song, M., & Song, J., (2011). A novel solution of distributed memory NoSQL
database for cloud computing. In: 2011 10th IEEE/ACIS International Conference on
Computer and Information Science (Vol. 1, pp. 351–355). IEEE.


46. Haseeb, A., & Pattun, G., (2017). A review on NoSQL: Applications and challenges.
International Journal of Advanced Research in Computer Science, 8(1), 2–6.
47. Hesse, B. W., Sproull, L. S., Kiesler, S. B., & Walsh, J. P., (1993). Returns to science:
Computer networks in oceanography. Communications of the ACM, 36(8), 90–101.
48. Ho, J. S., & Akyildiz, I. F., (1997). Dynamic hierarchical database architecture for
location management in PCS networks. IEEE/ACM Transactions on Networking,
5(5), 646–660.
49. Hsu, M., & Madnick, S. E., (1983). Hierarchical database decomposition: A technique
for database concurrency control. In: Proceedings of the 2nd ACM SIGACT-SIGMOD
Symposium on Principles of Database Systems (Vol. 1, pp. 182–191).
50. Hurson, A. R., Pakzad, S. H., & Cheng, J. B., (1993). Object-oriented database
management systems: Evolution and performance issues. Computer, 26(2), 48–58.
51. Imieliński, T., & Lipski, Jr. W., (1984). Incomplete information in relational databases.
Journal of the ACM (JACM), 31(4), 761–791.
52. Indermühle, E., Liwicki, M., & Bunke, H., (2010). IAMonDo-database: An online
handwritten document database with non-uniform contents. In: Proceedings of the 9th
IAPR International Workshop on Document Analysis Systems (Vol. 1, pp. 97–104).
53. Ipeirotis, P. G., & Gravano, L., (2002). Distributed search over the hidden web:
Hierarchical database sampling and selection. In: VLDB’02: Proceedings of the 28th
International Conference on Very Large Databases (Vol. 1, pp. 394–405). Morgan
Kaufmann.
54. Jaiswal, G., & Agrawal, A. P., (2013). Comparative analysis of relational and graph
databases. IOSR Journal of Engineering (IOSRJEN), 3(8), 25–27.
55. Jatana, N., Puri, S., Ahuja, M., Kathuria, I., & Gosain, D., (2012). A survey and
comparison of relational and non-relational database. International Journal of
Engineering Research & Technology, 1(6), 1–5.
56. Jin, R., Hong, H., Wang, H., Ruan, N., & Xiang, Y., (2010). Computing label-
constraint reachability in graph databases. In: Proceedings of the 2010 ACM SIGMOD
International Conference on Management of Data (Vol. 1, pp. 123–134).
57. Jindal, G., & Bali, S., (2012). Hierarchical model leads to the evolution of relational
model. International Journal of Engineering and Management Research (IJEMR),
2(4), 11–14.
58. Joseph, J. V., Thatte, S. M., Thompson, C. W., & Wells, D. L., (1991). Object-oriented
databases: Design and implementation. Proceedings of the IEEE, 79(1), 42–64.
59. Joshi, J. B., Li, Z. K., Fahmi, H., Shafiq, B., & Ghafoor, A., (2002). A model for
secure multimedia document database system in a distributed environment. IEEE
Transactions on Multimedia, 4(2), 215–234.
60. Jouili, S., & Vansteenberghe, V., (2013). An empirical comparison of graph databases.
In: 2013 International Conference on Social Computing (Vol. 1, pp. 708–715). IEEE.
61. Kawell, Jr. L., Beckhardt, S., Halvorsen, T., Ozzie, R., & Greif, I., (1988). Replicated
document management in a group communication system. In: Proceedings of the
1988 ACM Conference on Computer-Supported Cooperative Work (Vol. 1, p. 395).
62. Kearns, J. P., & DeFazio, S., (1983). Locality of reference in hierarchical database
systems. IEEE Transactions on Software Engineering, (2), 128–134.
63. Kim, W., (1990). Object-oriented databases: Definition and research directions. IEEE
Transactions on Knowledge and Data Engineering, 2(3), 327–341.
64. Kim, W., Banerjee, J., Chou, H. T., & Garza, J. F., (1990). Object-oriented database
support for CAD. Computer-Aided Design, 22(8), 469–479.
65. Kim, W., Banerjee, J., Chou, H. T., Garza, J. F., & Woelk, D., (1987). Composite
object support in an object-oriented database system. In: Conference Proceedings
on Object-Oriented Programming Systems, Languages and Applications (Vol. 1, pp.
118–125).
66. Kolahdouzan, M., & Shahabi, C., (2004). Voronoi-based k nearest neighbor search for
spatial network databases. In: Proceedings of the Thirtieth International Conference
on Very Large Data Bases (Vol. 1, 30, pp. 840–851).
67. Kumar, K. R., (2015). Graph databases: A survey. In: International Conference on
Computing, Communication & Automation (Vol. 1, pp. 785–790). IEEE.
68. Lai, Y. P., & Hsia, P. L., (2007). Using the vulnerability information of computer systems
to improve the network security. Computer Communications, 30(9), 2032–2047.
69. Larson, J. A., (1983). Bridging the gap between network and relational database
management systems. Computer, 16(09), 82–92.
70. Leavitt, N., (2000). Whatever happened to object-oriented databases? Computer,
33(08), 16–19.
71. Leavitt, N., (2010). Will NoSQL databases live up to their promise? Computer, 43(2),
12–14.
72. Lee, J., Han, W. S., Kasperovics, R., & Lee, J. H., (2012). An in-depth comparison
of subgraph isomorphism algorithms in graph databases. Proceedings of the VLDB
Endowment, 6(2), 133–144.
73. Levene, M., & Loizou, G., (2012). A Guided Tour of Relational Databases and
Beyond. Springer Science & Business Media.
74. Liansheng, M., (2000). Document database construction in China in the 1990s: A
review of developments. The Electronic Library, 18(3), 210–215.
75. Liao, Y. T., Zhou, J., Lu, C. H., Chen, S. C., Hsu, C. H., Chen, W., & Chung, Y.
C., (2016). Data adapter for querying and transformation between SQL and NoSQL
database. Future Generation Computer Systems, 65, 111–121.
76. Maier, D., (1983). The Theory of Relational Databases (Vol. 1, 11, pp. 2–4). Rockville:
Computer science press.
77. Meier, A., & Kaufmann, M., (2019). SQL & NoSQL Databases (Vol. 1, pp. 6–9).
Berlin/Heidelberg, Germany: Springer Fachmedien Wiesbaden.

78. Meier, A., Dippold, R., Mercerat, J., Muriset, A., Untersinger, J. C., Eckerlin, R., &
Ferrara, F., (1994). Hierarchical to relational database migration. IEEE Software,
11(3), 21–27.
79. Meier, A., Kaufmann, M., Meier, A., & Kaufmann, M., (2019). NoSQL databases. SQL
& NoSQL Databases: Models, Languages, Consistency Options and Architectures
for Big Data Management, 1, 201–218.
80. Membrey, P., Plugge, E., Hawkins, T., & Hawkins, D., (2010). The Definitive Guide
to MongoDB: The NoSQL Database for Cloud and Desktop Computing (Vol. 1, pp.
3–8). Springer.
81. Mendelzon, A. O., & Wood, P. T., (1995). Finding regular simple paths in graph
databases. SIAM Journal on Computing, 24(6), 1235–1258.
82. Mishra, P., & Eich, M. H., (1992). Join processing in relational databases. ACM
Computing Surveys (CSUR), 24(1), 63–113.
83. Moffat, A., Zobel, J., & Sharman, N., (1997). Text compression for dynamic document
databases. IEEE Transactions on Knowledge and Data Engineering, 9(2), 302–313.
84. Moniruzzaman, A. B. M., & Hossain, S. A., (2013). NoSQL Database: New Era of
Databases for Big Data Analytics-Classification, Characteristics and Comparison
(Vol. 1, pp. 4–9).
85. Moszer, I., Glaser, P., & Danchin, A., (1995). SubtiList: A relational database for the
Bacillus subtilis genome. Microbiology, 141(2), 261–268.
86. Murty, R., Chandra, R., Moscibroda, T., & Bahl, P., (2011). Senseless: A database-
driven white spaces network. IEEE Transactions on Mobile Computing, 11(2), 189–203.
87. Nakada, H., Sato, M., & Sekiguchi, S., (1999). Design and implementations of Ninf:
Towards a global computing infrastructure. Future Generation Computer Systems,
15(5, 6), 649–658.
88. Navarro, G., & Baeza-Yates, R., (1997). Proximal nodes: A model to query document
databases by content and structure. ACM Transactions on Information Systems
(TOIS), 15(4), 400–435.
89. Nayak, A., Poriya, A., & Poojary, D., (2013). Type of NOSQL databases and its
comparison with relational databases. International Journal of Applied Information
Systems, 5(4), 16–19.
90. Nicklin, P. J., Powell, G. H., & Hollings, J. P., (1985). Hierarchical data management
for structural analysis. Engineering with Computers, 1, 45–54.
91. O’Gorman, L., & Kasturi, R., (1995). Document Image Analysis (Vol. 39, pp. 3–9).
Los Alamitos: IEEE Computer Society Press.
92. Ogle, V. E., & Stonebraker, M., (1995). Chabot: Retrieval from a relational database
of images. Computer, 28(9), 40–48.
93. Ohori, A., (1990). Semantics of types for database objects. Theoretical Computer
Science, 76(1), 53–91.
94. Okman, L., Gal-Oz, N., Gonen, Y., Gudes, E., & Abramov, J., (2011). Security issues
in NoSQL databases. In: 2011 IEEE 10th International Conference on Trust, Security
and Privacy in Computing and Communications (Vol. 1, pp. 541–547). IEEE.
95. Orenstein, J. A., (1986). Spatial query processing in an object-oriented database
system. In: Proceedings of the 1986 ACM SIGMOD International Conference on
Management of Data (Vol. 1, pp. 326–336).
96. Outrata, J., & Vychodil, V., (2012). Fast algorithm for computing fixpoints of Galois
connections induced by object-attribute relational data. Information Sciences, 185(1),
114–127.
97. Padmanabhan, P., Gruenwald, L., Vallur, A., & Atiquzzaman, M., (2008). A survey
of data replication techniques for mobile ad hoc network databases. The VLDB
Journal, 17(1), 1143–1164.
98. Padmanabhan, S., Malkemus, T., Jhingran, A., & Agarwal, R., (2001). Block oriented
processing of relational database operations in modern computer architectures. In:
Proceedings 17th International Conference on Data Engineering (Vol. 1, pp. 567–574).
IEEE.
99. Papadias, D., Zhang, J., Mamoulis, N., & Tao, Y., (2003). Query processing in spatial
network databases. In: Proceedings 2003 VLDB Conference (Vol. 1, pp. 802–813).
Morgan Kaufmann.
100. Pham, M. C., & Klamma, R., (2010). The structure of the computer science knowledge
network. In: 2010 International Conference on Advances in Social Networks Analysis
and Mining (Vol. 1, pp. 17–24). IEEE.
101. Pokorny, J., (2011). NoSQL databases: A step to database scalability in web
environment. In: Proceedings of the 13th International Conference on Information
Integration and Web-based Applications and Services (Vol. 1, pp. 278–283).
102. Pokorný, J., (2015). Graph databases: Their power and limitations. In: Computer
Information Systems and Industrial Management: 14th IFIP TC 8 International
Conference, CISIM 2015, Warsaw, Poland, September 24–26, 2015, Proceedings
14 (Vol. 1, pp. 58–69). Springer International Publishing.
103. Pokorný, J., Valenta, M., & Kovačič, J., (2017). Integrity constraints in graph databases.
Procedia Computer Science, 109, 975–981.
104. Popescul, A., Flake, G. W., Lawrence, S., Ungar, L. H., & Giles, C. L., (2000).
Clustering and identifying temporal trends in document databases. In: Proceedings
IEEE Advances in Digital Libraries 2000 (Vol. 1, pp. 173–182). IEEE.
105. Ranu, S., & Singh, A. K., (2009). GraphSig: A scalable approach to mining significant
subgraphs in large graph databases. In: 2009 IEEE 25th International Conference
on Data Engineering (Vol. 1, pp. 844–855). IEEE.
106. Robinson, J. M., (2013). Computer-assisted peer review. In: Computer-Assisted
Assessment in Higher Education (Vol. 1, pp. 95–102). Routledge.
107. Roussopoulos, N., (1982). View indexing in relational databases. ACM Transactions
on Database Systems (TODS), 7(2), 258–290.

108. Sato, M., Nakada, H., Sekiguchi, S., Matsuoka, S., Nagashima, U., & Takagi, H.,
(1997). Ninf: A network-based information library for global world-wide computing
infrastructure. In: HPCN Europe (Vol. 1225, pp. 491–502).
109. Schindler, T., (2018). Anomaly Detection in Log Data Using Graph Databases and
Machine Learning to Defend Advanced Persistent Threats (Vol. 1, pp. 4–9).
110. Schneider, M., (1997). Spatial Data Types for Database Systems: Finite Resolution
Geometry for Geographic Information Systems (Vol. 1, pp. 5–9). Berlin, Heidelberg:
Springer Berlin Heidelberg.
111. ShefaliPatil, G., & Bhatia, A., (2014). Graph Databases: An Overview (Vol. 2, pp.
657–660). 1Student, ME Computers, Terna College of Engg., Navi Mumbai.
112. Shimpi, D., & Chaudhari, S., (2012). An overview of graph databases. In: IJCA
Proceedings on International Conference on Recent Trends in Information Technology
and Computer Science (Vol. 1, pp. 16–22).
113. Shokoufandeh, A., Macrini, D., Dickinson, S., Siddiqi, K., & Zucker, S. W., (2005).
Indexing hierarchical structures using graph spectra. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 27(7), 1125–1140.
114. Sicari, S., Rizzardi, A., & Coen-Porisini, A., (2022). Security&privacy issues and
challenges in NoSQL databases. Computer Networks, 1, 108–828.
115. Silberschatz, A., & Kedem, Z., (1980). Consistency in hierarchical database systems.
Journal of the ACM (JACM), 27(1), 72–80.
116. Singh, M., & Kaur, K., (2015). SQL2Neo: Moving health-care data from relational
to graph databases. In: 2015 IEEE International Advance Computing Conference
(IACC) (Vol. 1, pp. 721–725). IEEE.
117. Stefanidis, K., Drosou, M., & Pitoura, E., (2009). You may also like results in relational
databases. In: Proceedings International Workshop on Personalized Access, Profile
Management and Context Awareness: Databases (Vol. 1, pp. 5–7). Lyon, France.
118. Stolte, C., Tang, D., & Hanrahan, P., (2002). Polaris: A system for query, analysis,
and visualization of multidimensional relational databases. IEEE Transactions on
Visualization and Computer Graphics, 8(1), 52–65.
119. Stonebraker, M., (1986). The INGRES Papers: Anatomy of a Relational Database
System (Vol. 1, pp. 8–10). Addison-Wesley Longman Publishing Co., Inc.
120. Tangorra, F., & Chiarolla, D., (1995). A methodology for reverse engineering hierarchical
databases. Information and Software Technology, 37(4), 225–231.
121. Tauro, C. J., Aravindh, S., & Shreeharsha, A. B., (2012). Comparative study of the
new generation, agile, scalable, high performance NOSQL databases. International
Journal of Computer Applications, 48(20), 1–4.
122. Teorey, T. J., Yang, D., & Fry, J. P., (1986). A logical design methodology for relational
databases using the extended entity-relationship model. ACM Computing Surveys
(CSUR), 18(2), 197–222.

123. Tsichritzis, D. C., & Lochovsky, F. H., (1976). Hierarchical data-base management:
A survey. ACM Computing Surveys (CSUR), 8(1), 105–123.
124. Vargo, C. G., Brown, C. E., & Swierenga, S. J., (1992). An evaluation of computer-
supported backtracking in a hierarchical database. In: Proceedings of the Human
Factors Society Annual Meeting (Vol. 36, No. 4, pp. 356–360). Sage CA: Los Angeles,
CA: SAGE Publications.
125. Walker, J. H., (1988). Supporting document development with Concordia. Computer,
21(1), 48–59.
126. Wells, D. L., Blakeley, J. A., & Thompson, C. W., (1992). Architecture of an open
object-oriented database management system. Computer, 25(10), 74–82.
127. Wood, P. T., (2012). Query languages for graph databases. ACM SIGMOD Record,
41(1), 50–60.
128. Wu, C. H., & Chang, T. C., (1991). Protein classification using a neural network
database system. In: Proceedings of the Conference on Analysis of Neural Network
Applications (Vol. 1, pp. 29–41).
129. Wuu, G. T., & Dayal, U., (1992). A uniform model for temporal object-oriented
databases. In: 1992 Eighth International Conference on Data Engineering (Vol. 1,
pp. 584–585). IEEE Computer Society.
130. Zaki, A. K., (2014). NoSQL databases: New millennium database for big data, big
users, cloud computing and its security challenges. International Journal of Research
in Engineering and Technology (IJRET), 3(15), 403–409.
131. Zand, M., Collins, V., & Caviness, D., (1995). A survey of current object-oriented
databases. ACM SIGMIS Database: The Database for Advances in Information
Systems, 26(1), 14–29.
132. Zantout, H., & Marir, F., (1999). Document management systems from current
capabilities towards intelligent information retrieval: An overview. International Journal
of Information Management, 19(6), 471–484.
133. Zdonik, S. B., & Maier, D., (1990). Readings in Object-Oriented Database Systems
(Vol. 1, pp. 2–7). Morgan Kaufmann.
134. Zhang, Z. J., (2017). Graph databases for knowledge management. IT Professional,
19(6), 26–32.

CHAPTER 4

DATABASE MODELING

UNIT INTRODUCTION
Keeping detailed records is essential for every company. Due to the growing importance
of data management in today’s economy, a sizable portion of the global computing
infrastructure is devoted to this task (Hull & King, 1987).
Almost every industry nowadays uses some form of database. There are databases for
storing everything from emails and contacts to sales figures and bank details. The search
continues for effective ways to archive less-structured data, such as domain expertise.
Relational databases and the SQL language will be covered in depth later. This
introduction covers some of the fundamentals of the technology, with an emphasis on
database normalization.
Edgar Frank Codd (nearly always referred to as E. F. Codd in the technical literature)
first introduced the idea of relational databases in the IBM research report RJ599,
dated August 19, 1969. However, “A Relational Model of Data for Large Shared Data
Banks,” published in Communications of the ACM, is the work typically regarded as
the foundation of this technology (Angles & Gutierrez, 2008). Only the first section of
the article is freely accessible online.
Additional writings by E. F. Codd from the 1970s and 1980s are still regarded as
canonical for relational database implementations. His well-known “Twelve Rules for
Relational Databases” were presented in two Computerworld articles, “Is Your DBMS
Really Relational?” and “Does Your DBMS Run by the Rules?”, published on October 14,
1985, and October 21, 1985, respectively. He later expanded the 12 rules in his book
“The Relational Model for Database Management, Version 2,” which lists 333 of them
(Ma, 2007).
Codd’s rules call for a language, expressed as strings of characters, that can be used
to define, manipulate, and query the data in the database. That language is SQL. It
was first developed at IBM’s research division (originally in Yorktown Heights, New
York, and then in San Jose, California), and all of the major relational database
vendors have since adopted it (Teorey et al., 2011).
SQL originally stood for Structured Query Language. The original IBM version of the
language was called SEQUEL, short for Structured English Query Language. The name
was later changed for legal reasons, which is why many veteran database developers
still use the pronunciation “see-quell” (Peckham & Maryanski, 1988).
SQL has been adopted as an ANSI/ISO standard. Although the standard was revised in
1999 (a revision sometimes referred to as SQL99 or SQL3), the majority of vendors still
do not implement even the 1992 version of the standard in full. Because the 1992
standard is shorter and easier to consult, and because only a few of the features
specific to the 1999 revision are widely implemented at this time, it may be a better
place to start when learning the language (Figure 4.1).

Figure 4.1. Illustration of working of database modelling.

Source: Spice works, Creative Commons License.

Learning Objectives
At the end of this chapter, readers will be able to:
• Define data modeling and its importance in database design.
• Explain the entity relationship model and how it represents the relationships
between data objects.
• Describe the role of data modeling in the larger context of database design.
• Identify data objects and their relationships using the entity relationship model.
• Develop a basic schema for a database using the entity relationship model.
• Refine the entity relationship diagram by adding detail and clarity.
• Understand the purpose and function of primary and foreign keys in a database
schema.
• Add attributes to the entity relationship model to more accurately represent the
data objects.
• Define generalization hierarchies and how they can be used to simplify complex
data relationships.
• Implement data integrity rules to ensure data consistency and accuracy.
• Explain the relational model and how it relates to the entity relationship model.

Key Terms
• Adding attributes
• Data modeling
• Data objects
• Database design
• Diagram
• Entity relationship model
• Foreign keys
• Generalization hierarchies
• Primary keys
• Refining
• Schema

4.1. OVERVIEW OF DATA MODELING


A data model describes the important data structures for a database. These
structures include the data objects, the relationships between them, and
the rules that govern how the objects can be operated on (Petkovic &
Jonker, 2000). As the name suggests, the data model is concerned with
what data is needed and how it should be organized rather than with
the operations that will be carried out on the data. To use a standard
analogy, the data model is comparable to the blueprints for a
building.
A data model is independent of hardware and software constraints.
Rather than trying to represent the data as a database would see it,
the data model focuses on representing the data as the user perceives
it in the “real world” (Aracic et al., 2006). It serves as a bridge
between the concepts that make up real-world processes and events and
the physical representation of those concepts in a database.

4.1.1. Methodology
The entity-relationship (ER) approach and the object model are the two
main approaches used to design a data model. This text uses the ER
approach (Figure 4.2).

Figure 4.2. Flow diagram of entity-relationship (ER).

Source: Smartdraw, Creative Commons License.


4.1.2. Data Modeling in the Context of Database Design
Database design is defined as designing the logical and physical
structure of one or more databases to meet the information needs of
the users in an organization for a defined set of applications. The
design process has five steps:
• Planning and analysis;
• Conceptual design;
• Logical design;
• Physical design; and
• Implementation (Kerre & Chen, 1995).
The data model is one part of the conceptual design process. The
other is usually the functional model. The data model concentrates
on what data should be stored in the database, while the functional
model addresses how that data is processed. To put this in the
context of relational databases, the data model is used to design the
relational tables, and the functional model is used to design the
queries that access and operate on those tables (Figure 4.3).

Figure 4.3. Schematic of data modelling.

Source: J. H. Bekke, Creative Commons License.
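
The split between the data model and the functional model can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under stated assumptions: the employee table, its columns, the sample rows, and the headcount_by_department function are hypothetical and do not come from the text.

```python
import sqlite3

# Data model: decides WHAT is stored (a hypothetical employee table).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " department TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee (emp_id, name, department) VALUES (?, ?, ?)",
    [(1, "Ada", "Research"), (2, "Grace", "Research"), (3, "Edgar", "Sales")],
)


# Functional model: decides HOW the stored data is processed (a query).
def headcount_by_department(connection):
    """Return a dict mapping each department to its number of employees."""
    rows = connection.execute(
        "SELECT department, COUNT(*) FROM employee GROUP BY department"
    )
    return dict(rows.fetchall())


print(headcount_by_department(conn))
```

The table definition would change only if the information requirements changed, whereas new queries can be added without touching it.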


4.1.3. Constituents of a Data Model


The data model takes its input from the planning and analysis stage
(Ma & Yan, 2010). Here, the modeler and analysts examine the existing
documentation and speak with end users to gather information about
the database’s requirements.
The data model produces two outputs. The first is an ER diagram,
which shows the data structures in pictorial form (Modolo et al.,
2018). Because the diagram is simple to understand, it is an effective
tool for explaining the model to the end user. The second is a data
document. This document describes in detail the data objects,
relationships, and rules required by the database. It provides the
information the database designer needs to build the physical
database.
KEYWORD
Primary key is a specific choice of a minimal set of attributes that
uniquely specify a tuple in a relation.

4.1.4. Significance of Data Modeling
Data modelling is probably the most demanding and labor-intensive
stage of the development process. Why bother, especially if time is
of the essence? Practitioners who write about the topic frequently
respond that you should no more build a database without a model
than you should build a house without blueprints.
The aim of the data model is to make sure that all the data objects
the database needs are completely and accurately represented (Deville
et al., 2003). Because the data model uses plain language and
notations that are simple to understand, end users can review it and
verify that it is correct.
Database designers can use the data model as a detailed “blueprint”
for building the physical database. The information in the data model
is used to define the relational tables, primary keys, foreign keys,
stored procedures, and triggers. A poorly designed database will
require more work in the long run. Without careful planning, the
database you build may lack the data you need to generate vital
reports, may return inaccurate or inconsistent results, and may be
unable to adapt as your requirements evolve.
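
As a rough sketch of how a data model turns into keys and constraints, the following Python example uses the standard sqlite3 module. The customer and invoice tables, their columns, and the helper functions are assumptions made for illustration. Note that SQLite leaves foreign key checks off unless the connection enables them.

```python
import sqlite3


def build_schema():
    """Create two hypothetical tables linked by a foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this pragma is set.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE invoice (
            invoice_id  INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            total       REAL NOT NULL
        );
        """
    )
    return conn


def try_insert_invoice(conn, invoice_id, customer_id, total):
    """Return True if the row is accepted, False if a key rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO invoice VALUES (?, ?, ?)",
                (invoice_id, customer_id, total),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_schema()
with conn:
    conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd')")

print(try_insert_invoice(conn, 100, 1, 250.0))   # customer 1 exists
print(try_insert_invoice(conn, 101, 99, 80.0))   # customer 99 does not exist
```

In this sketch the database itself rejects an invoice that points at a missing customer or reuses an existing invoice number, so the rules recorded in the data model do not have to be re-implemented in every application.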

4.2. THE ENTITY-RELATIONSHIP (ER) MODEL


To unify the network and relational database views, Peter Chen
first proposed the ER model in 1976 (Chen, 1976). Simply put, the ER
model is a conceptual data model that views the real world as
consisting of entities and relationships. ER diagrams, which represent
data objects visually, are a basic component of the model. Since
Chen’s paper, the model has been extended and is now widely used for
database design. The ER model is useful for the database designer
because:
• It maps well to the relational model. The constructs of the ER
model can be readily transformed into relational tables
(Moody et al., 2005).
• It requires little training and is straightforward to grasp. The
database designer can therefore use the model to explain the
design to the end user.
• The database designer can also use the model as a design plan
for implementing a data model in a particular database
management system (Gregersen & Jensen, 1999).

Remember
The entity-relationship model is a conceptual model used to design
and represent relationships between data entities in a database.

4.2.1. Basic Concepts of E-R Modeling


The ER model views the real world as a set of entities and the
relationships between them.

4.2.2. Entities
Entities are the principal data objects about which information is to
be collected. Entities are usually recognizable concepts, either
concrete or abstract, such as persons, places, things, or events that
are relevant to the database (Chen, 1977). Employees, projects, and
invoices are some specific examples of entities. An entity is
analogous to a table in the relational model.
Entities are classified as independent or dependent. An independent
entity is one that does not rely on another entity for identification.
A dependent entity is one whose identification relies on another.
An individual occurrence of an entity is called an entity occurrence
or instance. An occurrence is analogous to a row in a relational
table.
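
The distinction between independent and dependent entities, and between an entity and its instances, can be illustrated with a small Python sketch. Invoice is one of the examples named above; the InvoiceLine entity, all attribute names, and the sample values are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    """Independent entity: identified by its own attribute, invoice_no."""
    invoice_no: int
    customer: str


@dataclass(frozen=True)
class InvoiceLine:
    """Dependent entity: identified only together with its parent invoice."""
    invoice_no: int   # borrowed from the parent Invoice
    line_no: int      # unique only within a single invoice
    description: str

    @property
    def identifier(self):
        """Full identifier: the parent's key plus the local line number."""
        return (self.invoice_no, self.line_no)


# Each object below is one entity occurrence (instance), like one table row.
invoice = Invoice(invoice_no=1001, customer="Acme Ltd")
lines = [
    InvoiceLine(1001, 1, "Database design review"),
    InvoiceLine(1001, 2, "Schema migration"),
]
other = InvoiceLine(1002, 1, "Training")

# Same line_no, different parent: only the combined identifier tells them apart.
print(lines[0].identifier, other.identifier)
```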

4.2.3. Special Entity Types


Associative entities, also referred to as intersection entities, are
used to connect two or more entities in order to resolve a
many-to-many relationship (Batini & Lenzerini, 1984). In generalization
hierarchies, subtypes are used to represent a subset of instances of
their parent entity, known as the supertype, that have attributes or
relationships which apply only to that subset. Associative entities and
generalization hierarchies are illustrated in Figure 4.4.

Figure 4.4. Flow chart of an entity and an entity set.

Source: After academy, Creative Commons License.
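
One way an associative entity resolves a many-to-many relationship is sketched below with Python's standard sqlite3 module. The employee, project, and assignment tables, their columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Associative entity: one row per (employee, project) pairing.
    CREATE TABLE assignment (
        emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
        proj_id INTEGER NOT NULL REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Inventory');
    INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 10);
    """
)


def projects_of(connection, emp_name):
    """List the project titles an employee works on, in alphabetical order."""
    rows = connection.execute(
        "SELECT p.title FROM project p"
        " JOIN assignment a ON a.proj_id = p.proj_id"
        " JOIN employee e ON e.emp_id = a.emp_id"
        " WHERE e.name = ? ORDER BY p.title",
        (emp_name,),
    )
    return [title for (title,) in rows]


def staff_of(connection, title):
    """List the employees assigned to a project, in alphabetical order."""
    rows = connection.execute(
        "SELECT e.name FROM employee e"
        " JOIN assignment a ON a.emp_id = e.emp_id"
        " JOIN project p ON p.proj_id = a.proj_id"
        " WHERE p.title = ? ORDER BY e.name",
        (title,),
    )
    return [name for (name,) in rows]


print(projects_of(conn, "Ada"))
print(staff_of(conn, "Payroll"))
```

Neither the employee table nor the project table stores a list of the other; the assignment table carries the relationship in both directions.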

4.3. DATA MODELLING AS A PART OF DATABASE DESIGN
The data model is one component of the conceptual design process.
The functional model is the other. The data model focuses on what
data must be kept in the database, whereas the functional model deals
with how that data is processed. To put this in the context of a
relational database, the data model is used to design the relational
tables, and the functional model is used to design the queries that
will access those tables and operate on them.
Planning and analysis come before data modelling (Câmara et al.,
1996). The effort put into this stage depends on the scope of the
database. A database planned to serve the needs of an enterprise will
need more planning and analysis than one designed to serve a small
workgroup.
The information required to construct a data model is gathered
during the requirements analysis. Although some methodologies do not
formally include it in the data modelling stage, in practice the
requirements analysis and the ER diagramming part of the data model
are carried out at the same time.

4.3.1. Requirements Analysis


The objectives of the requirements analysis are:
• To determine the database’s data requirements in terms of
primitive objects (Bellatreche et al., 2006);
• To classify and describe the information about these objects;
• To identify and classify the relationships between the objects;
• To determine the kinds of transactions that will be carried
out on the database and how the data and transactions
will interact (Zhao & Roberts, 1988);
• To identify the rules governing the integrity of the data.
To determine the data requirements for the database, the modeler
(or modelers) works with the organization’s end users. Several
methods can be used to gather the information needed for the
requirements analysis:
• Review of existing documentation, such as memoranda, job
descriptions, written policies and procedures, forms and
reports, and personal narratives. Paper documentation is a
good way to become familiar with the organization or
activity you want to model.
• Interviews with end users, which may be individual
meetings, group meetings, or a combination of both. Try to
limit group meetings to no more than five or six people. If
possible, hold a meeting at which everyone who performs the
same function is present. Record the information from the
interviews using a blackboard, flip charts, or overhead
transparencies.
• Review of existing automated systems. If the organization
already has an automated system, review the system design
specifications and documentation (Yu et al., 2000).

KEYWORD
Entity relationship (ER) diagram is a type of flowchart that
illustrates how “entities” such as people, objects or concepts relate
to each other within a system.
The requirements analysis is typically carried out alongside data
modelling. As information is gathered, data objects are identified,
classified as entities, attributes, or relationships, named, and
defined using terms that are familiar to end users. The objects are
then modelled and analyzed using an ER diagram. Both the modeler and
the end users can review the diagram for accuracy and completeness.
If the model is incorrect, it is revised, which occasionally requires
gathering more information. The cycle of review and revision is
repeated until the model is confirmed to be accurate.

During the requirements analysis, there are three things to keep in
mind:
• Discuss data with end users in “real-world” terms. Users
think about the actual people, things, and activities they
deal with on a regular basis rather than abstract concepts
like entities, attributes, and relationships (Sibley &
Kerschberg, 1977).
• Spend some time getting to know the organization and
the operations you wish to model. It will be simpler to
construct the model if you understand the processes.
• Depending on their role within an organization, end users
often approach and perceive data differently (Peckham &
Maryanski, 1988). As a result, it is crucial to interview as
many people as time will allow.

KEYWORD
Integrity rules are needed to inform the DBMS about certain
constraints in the real world.

4.3.2. Phases in Building the Data Model


The steps for creating a data model are not standardized, even
though the ER model lists and defines the necessary constructs. Some
methodologies, like IDEF1X, call for a bottom-up development approach
in which the model is constructed in stages. Typically, the entities
and relationships are modelled first, then the key attributes, and
finally the non-key attributes to complete the model (Worboys et al.,
1990). Other experts argue that a phased approach is impractical
because it requires too many meetings with end users. A typical
sequence of phases is:
• Identification of data objects and relationships;
• Drafting of the initial ER diagram with entities and
relationships;
• Refining the ER diagram (Kusiak et al., 1997);
• Adding key attributes to the diagram;
• Adding non-key attributes;
• Diagramming generalization hierarchies;
• Validating the model through normalization;
• Adding business and integrity rules to the model (Batra
& Davis, 1992).
In reality, creating a model is not a strictly linear process. As
noted above, the requirements analysis and the initial draft of the ER
diagram frequently happen at the same time. Refining and validating
the diagram may uncover problems or information gaps that call for
additional data gathering and analysis (Figure 4.5).

Figure 4.5. Example of linear process.

Source: Amcharts, Creative Commons License.

4.4. CLASSIFYING DATA OBJECTS AND RELATIONSHIPS
Before starting to build the basic model, the modeler must analyze
the information gathered during the requirements analysis in order to:
• Classify data objects as either entities or attributes;
• Identify and define the relationships between entities;
• Name and define the identified entities, attributes, and
relationships;
• Record this information in the data document (Fang et al.,
2020).
To achieve these aims, the modeler must examine user narratives,
meeting notes, policy and procedure documents, and, with luck, design
documents from the existing information system.
The ER model’s fundamental concepts are simple to define, but it
can be challenging to decide which concept applies to a given piece
of data (Bruns et al., 2008). What qualifies an item as an entity or
an attribute? Take the statement “employees work on projects” as an
example. Should employees be considered an attribute or an entity?
The right answer frequently depends on the requirements of the
database. Employee may be an attribute in some circumstances and an
entity in others.
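
To make the choice concrete, here is a hedged Python sketch of the two alternatives for the statement “employees work on projects”. The class names, attributes, and sample values are hypothetical, and neither design is claimed to be the correct one in general.

```python
from dataclasses import dataclass, field


# Design A: only the employee's name is needed, so it is an attribute.
@dataclass
class ProjectA:
    title: str
    employee_name: str


# Design B: the database must describe employees, so Employee is an entity.
@dataclass
class Employee:
    emp_id: int
    name: str
    job_title: str


@dataclass
class ProjectB:
    title: str
    staff: list = field(default_factory=list)  # holds Employee instances


payroll_a = ProjectA(title="Payroll", employee_name="Ada")

ada = Employee(emp_id=1, name="Ada", job_title="Analyst")
payroll_b = ProjectB(title="Payroll", staff=[ada])

# Design B can answer questions about the employee that Design A cannot.
print(payroll_a.employee_name)
print(payroll_b.staff[0].job_title)
```

If the requirements never ask for anything about an employee beyond a name, Design A may be enough; once employees carry their own descriptive information, Design B becomes the better fit.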
The ER model has straightforward definitions for its constructs,
but it does not address the crucial problem of how to identify them.
Some guidelines that are frequently given are:
• Entities contain descriptive information;
• Attributes either identify or describe entities;
• Relationships are associations between entities.
Below, we go into greater depth about these guidelines:
• Entities;
• Derived attributes;
• Attributes;
• Code values;
• Object definition (Hashem et al., 2015);
• Naming data objects;
• Relationships;
• Recording data in project document.

KEYWORD
Derived attribute is an attribute or property in a table that has
been calculated or derived using other attributes in the database.

4.4.1. Entities
An entity can be defined in many different ways, such as:
• “Any distinguishable person, place, thing, concept, or event
about which information is kept”;
• “Any distinguishable object that is to be represented in a
database”;
• Anything about which information is stored (e.g., supplier,
employee, machine tool, airline seat, utility pole, etc.) (Sezer et
al., 2017). Certain attributes are stored for each entity type.
These definitions share several common ideas about entities:
• An entity is a thing, concept, or object. However, an entity
can occasionally represent the relationship between two or more
objects. This kind of entity is called an associative entity
(Bhavsar & Ganatra, 2012).
• Entities are objects that contain descriptive information.

CHAPTER
4
Database Modeling 127

A data object you have identified is an entity if it is described
by other objects. If there is no descriptive information
associated with the item, it is not an entity. Whether a data
object is an entity may depend on the business or activity that
is being modeled.
• An entity represents many things that share properties. Entities
are not single things. For instance, the plays King Lear and
Hamlet share common attributes such as name, author, and cast
of characters. The entity describing these things is PLAY,
and King Lear and Hamlet are instances of that entity
(Figure 4.6).

Figure 4.6. Picture of King Lear.

Source: Shakespeare, Creative Commons License.

• Entities that share common properties are candidates for
generalization hierarchies (see below).
• Entities should not be used to distinguish between different
time periods. For instance, the entities 1st Quarter Profits,
2nd Quarter Profits, etc., should be collapsed into a single
entity called Profits (Nguyen & Rosen, 2017). An attribute
specifying the time period would be used to categorize by time.
• Not everything the users want to collect information about will
be an entity. A complex concept may require more than one entity
to represent it. Some “things” that users consider important may
not actually be entities.

4.4.2. Attributes
Attributes are data objects that either identify or describe
entities. Attributes that identify entities are called key
attributes. Attributes that describe an entity are called non-key
attributes (Clarkson et al., 1998). Key attributes are covered in
more detail in a later section. Attributes can be identified with a
process similar to the one used for entities, except that this time
you search for and extract names that appear to be descriptive noun
phrases.

KEYWORD
Data modeling is the process of creating a visual representation of
either a whole information system or parts of it to communicate
connections between data points and structures.

4.4.3. Validating Attributes

Attribute values should be atomic, that is, they should express a
single fact. Disaggregated data makes programming simpler, increases
data reuse, and makes changes easier to implement. The “single fact”
requirement must be met for normalization to be possible. Typical
violations include:
i. Simple concatenation, such as Person Name, which combines the
first name, middle initial, and last name. Another example is
Address, which combines the street address, city, and zip code
(Zhou et al., 2004). When dealing with such attributes, you
must determine whether there are valid reasons for decomposing
them. For instance, do end users wish to use the person’s first
name in a form letter? Do you want to sort by zip code?
ii. Complex codes, which are attributes whose values are codes made
up of concatenated pieces of information. The code found on cars
and trucks is one example (Chen et al., 2013). The code
represents over 10 different facts about the vehicle. Unless
they are part of an industry standard, these codes have no
meaning to the end user. They are extremely difficult to process
and update.
iii. Text blocks, which are free-form text fields. Although they
have a place, relying too heavily on them may indicate that some
data requirements are not met by the model.
iv. Mixed domains, where a value of an attribute can have a
different meaning under different conditions (Ma et al., 2018).
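The concatenation problem in item i can be sketched in code. The example below is only an illustration: the class `Person`, its field names, and the sample rows are hypothetical and do not come from the text. It shows that when each fact is stored in its own attribute, sorting by zip code or addressing a form letter needs no string parsing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Each field holds a single fact (hypothetical names, for illustration only).
    first_name: str
    middle_initial: str
    last_name: str
    street: str
    city: str
    zip_code: str

    def full_name(self) -> str:
        # The aggregate is rebuilt on demand instead of being stored.
        return f"{self.first_name} {self.middle_initial}. {self.last_name}"


people = [
    Person("Jane", "Q", "Hathaway", "12 Elm St", "Austin", "78701"),
    Person("Ravi", "K", "Patel", "9 Oak Ave", "Boston", "02108"),
]

# Sorting by zip code is a plain key lookup because the fact is atomic.
by_zip = sorted(people, key=lambda p: p.zip_code)

# A form letter can use the first name directly.
greeting = f"Dear {people[0].first_name},"
```

The design choice here is to store the parts and derive the whole; going the other way (splitting a stored full name) is error-prone because names and addresses do not follow one fixed pattern.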

4.5. DERIVED ATTRIBUTES AND CODE VALUES


Data modeling experts disagree on two issues: whether derived
attributes and attributes with coded values should be allowed in
the data model.
Derived attributes are attributes that result from a formula or a
summary operation on other attributes. The argument against
including derived data is that it should not be stored in a
database and, hence, should not be part of the data model (Sever
et al., 2020). The arguments in favor are:
• Derived data is frequently important to users and managers,
so it should be included in the data model (Kononenko,
1995).
• Documenting derived attributes is just as important as
documenting other attributes, if not more so.
• Including derived attributes in the data model does not
imply how they will be implemented (Chang et al., 2011).

KEYWORD
Cardinality is the numerical relationship between rows of one table
and rows in another.

A coded value represents a fact with one or more letters or
numbers. For instance, the attribute Gender can use “M” and “F”
instead of “Male” and “Female” as values. Opponents of this
practice point out that codes complicate data processing and have
no intuitive meaning for end users. Proponents note that many
organizations have long used coded attributes, and they claim that
codes save space and improve flexibility because values can easily
be added or changed using look-up tables.
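The look-up table argument can be illustrated with a small sketch. It uses Python's standard `sqlite3` module; the table names `gender_code` and `person` and the extra code `'U'` are assumptions made for the example, not part of the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Look-up table: one row per permitted code value.
    CREATE TABLE gender_code (code TEXT PRIMARY KEY, label TEXT NOT NULL);
    INSERT INTO gender_code VALUES ('M', 'Male'), ('F', 'Female');

    CREATE TABLE person (
        name   TEXT NOT NULL,
        gender TEXT NOT NULL REFERENCES gender_code(code)
    );
    INSERT INTO person VALUES ('Jane Hathaway', 'F');
""")

# Adding a new code value (hypothetical) is a row insert, not a schema change.
conn.execute("INSERT INTO gender_code VALUES ('U', 'Undisclosed')")

# End users see the label, not the code, by joining to the look-up table.
rows = conn.execute("""
    SELECT p.name, g.label
    FROM person p JOIN gender_code g ON g.code = p.gender
""").fetchall()
```

The join answers the opponents' objection in part: the code is stored, but the meaningful label is what gets displayed.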

4.5.1. Relationships
Relationships are associations between entities. A verb connecting
two or more entities typically denotes a relationship. For
instance: employees are assigned to projects.
Once relationships are identified, they should be classified
according to cardinality, direction, dependence, and optionality.
Defining the relationships may lead to some being dropped and
others being added. Cardinality quantifies the relationship between
entities by measuring how many instances of one entity are related
to a single instance of another (Van Winden et al., 2016). To
determine the cardinality, assume the existence of a single
instance of one of the entities. Then determine how many distinct
instances of the second entity could be related to the first.
Repeat this analysis with the entities reversed (Figure 4.7).
Consider this example: each project has at least two employees
assigned to it, and no employee may be assigned to more than three
projects at one time.
In this instance, the cardinality of the relationship from
employees to projects is three, and the cardinality from projects
to employees is two. As a result, this relationship is classified
as a many-to-many relationship.
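The two business rules in this example can be checked mechanically. The following is a minimal sketch, assuming the assignments are available as hypothetical (employee, project) pairs; the function name `violations` and the sample data are illustrative only.

```python
from collections import Counter

# Hypothetical (employee, project) assignment pairs.
assignments = [
    ("E1", "P1"), ("E2", "P1"),
    ("E1", "P2"), ("E3", "P2"),
    ("E1", "P3"), ("E2", "P3"),
]

MAX_PROJECTS_PER_EMPLOYEE = 3  # cardinality from employee to project
MIN_EMPLOYEES_PER_PROJECT = 2  # cardinality from project to employee


def violations(pairs):
    """Return a list of readable rule violations for the given pairs."""
    per_employee = Counter(emp for emp, _ in pairs)
    per_project = Counter(proj for _, proj in pairs)
    problems = []
    for emp, n in per_employee.items():
        if n > MAX_PROJECTS_PER_EMPLOYEE:
            problems.append(f"{emp} is on {n} projects")
    for proj, n in per_project.items():
        if n < MIN_EMPLOYEES_PER_PROJECT:
            problems.append(f"{proj} has only {n} employee(s)")
    return problems
```

Calling `violations(assignments)` on the sample data returns an empty list. Adding the pair `("E1", "P4")` breaks both rules at once: E1 would be on four projects and P4 would have a single employee.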

Figure 4.7. Illustration of cardinality.

Source: Leigh, Creative Commons License.

A relationship that can have a cardinality of zero is an optional
relationship. If the relationship must have a cardinality of at
least one, it is mandatory. Optional relationships are usually
indicated by the conditional tense. For instance: an employee may
be assigned to a project. Mandatory relationships, on the other
hand, are indicated by words such as must have.
For instance: a student must register for at least three classes
per semester. In the specific relationship forms (1:1 and 1:M),
there is always a parent entity and a child entity. In one-to-many
relationships, the parent is always the entity with the cardinality
of one. In one-to-one relationships, the choice of the parent
entity must be made in the context of the business being modeled
(Demšar, 2010). If a decision cannot be made, the choice is
arbitrary.

4.5.2. Naming Data Objects


The names should have the following properties:
• Unique;
• Meaningful to the end user;
• Contain the minimum number of words needed to uniquely
and accurately describe the object (Klenosky & Perkins, 1992).
For entities and attributes, names are singular nouns, while
relationship names are normally verbs. Some authors advise against
using abbreviations or acronyms because they may cause confusion
about what they actually mean. Others consider abbreviations or
acronyms acceptable provided that they are universally used and
understood within the organization. You should also take care to
identify and resolve synonyms for entities and attributes. These
can arise in large projects where different departments use
different terms for the same thing.


4.5.3. Object Definition


Complete and accurate definitions are important to ensure that
all parties involved in modeling the data know exactly what
concepts the objects represent (Friedman & Goldszmidt, 1996).
Definitions should use terms familiar to the user and should
precisely explain what the object represents and the role it plays
in the enterprise (Micci-Barreca, 2001). Some authors recommend
having the end users provide the definitions. If acronyms, or terms
that are not universally understood, are used in the definition,
then these should be defined as well.
While defining objects, the modeler should be careful to resolve
any instances where a single entity actually represents two
different concepts (homonyms) or where two different entities
actually represent the same “thing” (synonyms). This situation
typically arises because individuals or organizations may think
about an event or a process in terms of their own function.

Figure 4.8. Illustration of types of marketing entities.

Source: Shridaddy, Creative Commons License.

An example of a homonym would be a case where the Marketing
Department defines the entity MARKETING in terms of geographical
regions while the Sales Department thinks of this entity in terms
of demographics. Unless resolved, the result would be an entity
with two different meanings and sets of properties (Figure 4.8).
Conversely, an example of a synonym would be the Service
Department having identified an entity called CUSTOMER while the
Help Desk has identified the entity CONTACT. In reality, they may
mean the same thing: a person who contacts or calls the
organization for help with a problem. Resolving synonyms is
important in order to avoid redundancy and to avoid possible
consistency or integrity problems (Feng et al., 1988).

4.5.4. Recording Information in the Design Document

The design document records detailed information about each object
used in the model (Kawakami & Kaneda, 2009). As you name, define,
and describe the objects, this information should be placed in this
document. If you are not using an automated design tool, the
document can be prepared on paper or with a word processor. There
is no standard for the organization of this document, but it should
include information about names, definitions, and, for attributes,
domains.

KEYWORD
Word processor is a device or software program capable of creating,
storing, and printing text documents.

Two documents used in the IDEF1X method of modeling are useful for
keeping track of objects. They are the ENTITY-ENTITY matrix and the
ENTITY-ATTRIBUTE matrix. The ENTITY-ENTITY matrix is a
two-dimensional array for indicating relationships between
entities.
The names of all identified entities are listed along both axes
(Menzies et al., 2006). As relationships are first identified, an
“X” is placed at the intersection where two entities meet to
indicate a possible relationship between the entities involved. As
the relationship is further classified, the “X” is replaced with
the notation indicating cardinality.
The ENTITY-ATTRIBUTE matrix is used to indicate the assignment of
attributes to entities (Sellitto et al., 2007). It is similar in
form to the ENTITY-ENTITY matrix except that attribute names are
listed on the rows (Figure 4.9).
Table 4.1 displays examples of an ENTITY-ENTITY matrix and
an ENTITY-ATTRIBUTE matrix.
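For readers who keep the design document in electronic form, the ENTITY-ENTITY matrix described above can be sketched as a nested dictionary. This is an illustrative sketch only: the helper names `mark` and `render` and the cardinality notations "M:N" and "1:M" are assumptions, not part of IDEF1X.

```python
entities = ["EMPLOYEE", "PROJECT", "DEPARTMENT"]

# Start with an empty grid: one row and one column per entity.
matrix = {row: {col: "" for col in entities} for row in entities}


def mark(a, b, notation="X"):
    """Record a relationship between a and b at both intersections.

    Note: storing the same notation in both cells does not record
    which side is the parent; that detail belongs in the definitions.
    """
    matrix[a][b] = notation
    matrix[b][a] = notation


# First pass: an "X" flags a possible relationship.
mark("EMPLOYEE", "PROJECT")
mark("DEPARTMENT", "PROJECT")

# Later pass: the "X" is replaced by a cardinality notation.
mark("EMPLOYEE", "PROJECT", "M:N")
mark("DEPARTMENT", "PROJECT", "1:M")


def render():
    """Return the matrix as aligned plain text."""
    width = max(len(e) for e in entities) + 2
    lines = ["".ljust(width) + "".join(e.ljust(width) for e in entities)]
    for row in entities:
        cells = "".join(matrix[row][c].ljust(width) for c in entities)
        lines.append(row.ljust(width) + cells)
    return "\n".join(lines)
```

An ENTITY-ATTRIBUTE matrix could be kept the same way, with attribute names as the row keys instead of entity names.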

Figure 4.9. Example of entity attribute matrix.

Source: Aftab Hussian, Creative Commons License.

Table 4.1. Tabular Representation of Models of an ENTITY-ENTITY
Matrix, and an ENTITY-ATTRIBUTE Matrix

Employee    A person who works for and is paid by the organization.
Est_Time    The number of hours a project manager estimates that a
            task will need to be completed. Estimated time is
            critical for scheduling a project and for tracking
            project time variances.
Assigned    Employees in the organization may be assigned to work
            on no more than three projects at a time. Each project
            will have at least two employees assigned to it at any
            given time.

Source: Wilson & Martinez, Creative Commons License.

4.5.5. Recording Information in the Design Document

The design document records detailed information about each object
used in the model. As you name, define, and describe the objects,
this information should be placed in this document. If you are not
using an automated design tool, the document can be prepared on
paper or with a word processor. There is no standard for the
organization of this document. The document should include
information about names, definitions, and, for attributes, domains.
Two documents used in the IDEF1X method of modeling are useful for
keeping track of objects (Moudrý et al., 2019). These are the
ENTITY-ENTITY matrix and the ENTITY-ATTRIBUTE matrix.
The ENTITY-ENTITY matrix is a two-dimensional array for indicating
relationships between entities. The names of all identified
entities are listed along both axes (Tan et al., 2010). As
relationships are first identified, an “X” is placed at the
intersection where two entities meet to indicate a possible
relationship between the entities involved.
As the relationship is further classified, the “X” is replaced with
the notation indicating cardinality. The ENTITY-ATTRIBUTE matrix is
used to indicate the assignment of attributes to entities. It is
similar in form to the ENTITY-ENTITY matrix except that attribute
names are listed on the rows. Figure 4.10 displays examples of an
ENTITY-ENTITY matrix and an ENTITY-ATTRIBUTE matrix (Moser et al.,
2008).

Figure 4.10. ENTITY-ENTITY matrix and an ENTITY-ATTRIBUTE matrix.

Source: Science Direct, Creative Commons License.

4.6. DEVELOPING THE BASIC SCHEMA


Once entities and relationships have been identified and defined,
the first draft of the entity relationship diagram can be created.
This section introduces the ER diagram by demonstrating how to
diagram binary relationships (Nicolaos & Katerina, 2015). Recursive
relationships are also shown.

4.6.1. Binary Relationships


Figure 4.11 shows examples of how to diagram one-to-one,
one-to-many, and many-to-many relationships (Sheth &
Larson, 1990).

Figure 4.11. Schematic of binary relationship.

Source: Data base design, Creative Commons License.

4.6.2. One-To-One

Figure 4.11(A) shows an example of a one-to-one diagram (Baghdadi,
2006). Reading the diagram from left to right represents the
relationship: every employee is assigned a workstation.
Because every employee must have a workstation, the symbol for
mandatory existence, in this case the crossbar, is placed next to
the WORKSTATION entity. Reading from right to left, the diagram
shows that not all workstations are assigned to employees.
This condition may reflect that some workstations are kept as
spares or for loans. Consequently, we use the symbol for optional
existence, the circle, next to EMPLOYEE. The cardinality and
existence of a relationship must be derived from the “business
rules” of the organization. For instance, if all workstations owned
by an organization were assigned to employees, then the circle
would be replaced by a crossbar to indicate mandatory existence.
One-to-one relationships are rarely seen in “real-world” data
models (Han et al., 2011). Some practitioners advise that most
one-to-one relationships should be collapsed into a single entity
or converted to a generalization hierarchy.

KEYWORD
Business rules describe the operations, definitions and constraints
that apply to an organization.

4.6.3. One-To-Many
Figure 4.11(B) shows an example of a one-to-many relationship
between DEPARTMENT and PROJECT (Burgin & Mikkilineni, 2021). In
this diagram, DEPARTMENT is considered the parent entity while
PROJECT is the child. Reading from left to right, the diagram
represents: departments may be responsible for many projects.
The optionality of the relationship reflects the “business rule”
that not every department in the organization will be responsible
for managing projects. Reading from right to left, the diagram
tells us that every project must be the responsibility of exactly
one department.

4.6.4. Many-To-Many

Figure 4.11(C) shows a many-to-many relationship between EMPLOYEE
and PROJECT (Brunette et al., 2013). An employee may be assigned to
many projects; each project must have many employees.
Note that the relationship between EMPLOYEE and PROJECT is
optional because, at a given time, an employee may not be assigned
to a project. However, the relationship between PROJECT and
EMPLOYEE is mandatory because a project must have at least two
employees assigned.
Many-to-many relationships can be used in the initial drafting of
the model but eventually must be transformed into two one-to-many
relationships (Soylu & De Causmaecker, 2009). The transformation is
required because many-to-many relationships cannot be represented
by the relational model. The process for resolving many-to-many
relationships is discussed in the next section.

Manage many-to-many data relations by implementing junction tables
that efficiently link multiple data entities.

4.6.5. Recursive Relations


A recursive relationship is one in which an entity is related to
itself. Figure 4.12 shows an example of a recursive relationship
(Wang & Ariguzo, 2004). An employee may manage many employees, and
each employee is managed by one employee.
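In relational terms, a recursive relationship of this kind is commonly carried by a foreign key that points back at the same table. The sketch below uses Python's standard `sqlite3` module; the table and column names (`employee`, `manager_id`) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        -- Points back at the same table; NULL for the top-level manager.
        manager_id INTEGER REFERENCES employee(emp_id)
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ana", None), (2, "Ben", 1), (3, "Cy", 1)],
)

# Self-join: find everyone managed by Ana.
reports = [
    row[0]
    for row in conn.execute(
        "SELECT e.name FROM employee e "
        "JOIN employee m ON e.manager_id = m.emp_id "
        "WHERE m.name = 'Ana' ORDER BY e.name"
    )
]
```

Allowing `manager_id` to be NULL is one way to model the person at the top of the hierarchy; whether that is acceptable depends on the business rules.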

Figure 4.12. Diagram of instance of recursive association.

Source: Direct Science, Creative Commons License.

4.7. REFINING THE ENTITY-RELATIONSHIP (ER) DIAGRAM

This section discusses four basic rules for modeling relationships.
Each rule is covered in its own subsection.

4.7.1. Entities Participation in Relationships


Every entity must participate in at least one relationship; an
entity cannot be modeled unrelated to any other entity (Jen-Yen
& Yu-Shiang, 1994). Otherwise, when the model is transformed to the
relational model, there would be no way to navigate to that table.
The exception to this rule is a database with a single table.

4.7.2. Resolve Many-To-Many Relationships


Many-to-many relationships cannot be used in the data model because
they cannot be represented by the relational model. Consequently,
many-to-many relationships must be resolved early in the modeling
process. The strategy for resolving a many-to-many relationship is
to replace the relationship with an association entity and then
relate the two original entities to the association entity. This
strategy is demonstrated in Figure 4.13, which illustrates the
many-to-many relationship (Campbell & Islam, 1992):
Employees may be assigned to many projects.
Each project must have more than one employee assigned to it.

Figure 4.13. Illustration of many-to-many relationship.

Source: Filemaker, Creative Commons License.

In addition to the implementation problem, this relationship
presents other difficulties. Suppose we wanted to record
information about employee assignments, such as who assigned them,
the start date of the assignment, and the finish date of the
assignment. Given the current relationship, these attributes could
not be represented in either EMPLOYEE or PROJECT without repeating
information. The first step is to convert the relationship
“assigned to” into a new entity that we will call ASSIGNMENT. Then
the original entities, EMPLOYEE and PROJECT, are related to this
new entity, preserving the cardinality and optionality of the
original relationships (Colomb & Dampney, 2005).
Notice that the new schema changes the semantics of the original
relationship to: employees may be given assignments to projects,
and projects must be carried out through more than one employee
assignment. A many-to-many recursive relationship is resolved in a
similar fashion.
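The resolution described above can be sketched as three tables, with ASSIGNMENT carrying the attributes that belong to the relationship itself. The example uses Python's standard `sqlite3` module; all column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (emp_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- ASSIGNMENT replaces the many-to-many relationship. Each original
    -- entity now has a one-to-many relationship with it.
    CREATE TABLE assignment (
        emp_id      INTEGER NOT NULL REFERENCES employee(emp_id),
        proj_id     INTEGER NOT NULL REFERENCES project(proj_id),
        assigned_by TEXT,
        start_date  TEXT,
        end_date    TEXT,
        PRIMARY KEY (emp_id, proj_id)
    );

    INSERT INTO employee VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Intranet');
    INSERT INTO assignment VALUES
        (1, 10, 'Cy', '2024-01-08', NULL),
        (2, 10, 'Cy', '2024-01-08', NULL),
        (1, 20, 'Cy', '2024-02-01', NULL);
""")

# Number of employees assigned to each project.
staff = conn.execute(
    "SELECT p.title, COUNT(*) FROM assignment a "
    "JOIN project p USING (proj_id) "
    "GROUP BY p.title ORDER BY p.title"
).fetchall()
```

Note that the schema alone does not enforce the rule that each project has at least two employees; a minimum cardinality above one usually needs a separate check in application code or a trigger.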

4.7.3. Transform Complex Relations into Binary Relationships

Complex relationships are classified as ternary, a relationship
among three entities, or n-ary, a relationship among more than
three, where n is the number of entities involved (Dragos et al.,
2016). For instance, Figure 4.14(A) shows the relationship:
Employees can use different skills on any one or more
projects.
Each project uses several employees with a number of skills.
Complex relationships cannot be directly implemented in the
relational model, so they must be resolved early in the modeling
process. The strategy for resolving complex relationships is
similar to that for resolving many-to-many relationships. The
complex relationship is replaced by an association entity, and the
original entities are related to this new entity. The association
entity is related through binary relationships to each of the
original entities (Sommestad et al., 2009).

4.7.4. Eliminate Redundant Relationships

A redundant relationship is a relationship between two entities
that is equivalent in meaning to another relationship between those
same two entities that may pass through an intermediate entity
(Huang et al., 2017). For instance, Figure 4.14(B) shows the
solution, which is to remove the redundant relationship DEPARTMENT
assigned WORKSTATIONS.

Figure 4.14. Schematic of removing redundant relationship.

Source: DW, Creative Commons License.

4.8. PRIMARY AND FOREIGN KEYS


Primary and foreign keys are the two most basic components of
relational theory. Primary keys enforce entity integrity by
uniquely identifying individual instances of an entity. Foreign
keys maintain referential integrity by completing an association
between two entities. The next step in building the data model
is to:
• Identify and define the primary key attributes of each
entity;
• Validate primary keys and relationships;
• Migrate the primary keys to establish foreign keys
(Memari et al., 2015).

4.8.1. Primary Key Attributes

Attributes are data items that describe an entity. An attribute
instance is a single value of an attribute for a given instance of
an entity. For example, name and hire date are both attributes of
the entity EMPLOYEE. “Jane Hathaway” and “3 March 1989” are
instances of the attributes name and hire date, respectively.
The primary key of an entity is the attribute (or group of
attributes) that uniquely identifies a single instance of that
entity (Qian & Lunt, 1996). Every entity in the data model must
have a primary key whose values uniquely identify each instance of
that entity.

Did you Know?
Primary keys uniquely identify a record in a table, while foreign
keys establish relationships between records in different tables.
To qualify as a primary key for an entity, an attribute must meet
the following criteria:
• It must have a non-null value for every instance of the
entity;
• The value must be unique for each instance of the entity;
• The values must not change or become null during the life of
each entity instance (Markowitz & Makowsky, 1990).
Sometimes an entity will have more than one attribute that can
serve as the primary key. Any key or minimal set of keys that could
serve as the primary key is called a candidate key. Once candidate
keys have been identified, choose one, and only one, primary key
for each entity. Choose the identifier that is most commonly used
by the user, provided it has the properties listed above. Candidate
keys that are not chosen as the primary key are known as alternate
keys.
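The first two criteria (non-null and unique) can be tested against sample data; the third (stability over time) cannot be verified from a snapshot. The function below is a minimal sketch under that limitation. The name `is_candidate_key` and the sample rows are hypothetical.

```python
def is_candidate_key(rows, attrs):
    """Return True if attrs are non-null and unique across rows.

    rows is a list of dicts; attrs is a list of attribute names.
    Passing this check on sample data is necessary but not sufficient:
    it says nothing about future rows or about values changing later.
    """
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if any(v is None for v in value) or value in seen:
            return False
        seen.add(value)
    return True


employees = [
    {"emp_id": 1, "ssn": "111-11-1111", "name": "Pat Lee"},
    {"emp_id": 2, "ssn": "222-22-2222", "name": "Pat Lee"},
    {"emp_id": 3, "ssn": "333-33-3333", "name": "Sam Roy"},
]
```

With this sample, `emp_id` and `ssn` both pass, while `name` fails because two employees share it.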
Employee is an example of an entity that could have several
possible primary keys (Zhuge & Xing, 2005). Assume that each
employee in an organization has three candidate keys: Employee ID,
Name, and Social Security Number.
Name is the least desirable candidate. It might work in a small
team where no two people share a name, but it would not be
practical for a company with hundreds or thousands of employees. In
addition, an employee’s name may change after getting married.
Employee ID could work as long as each employee is assigned an
identifier that is both unique and consistent. Since all employers
require a Social Security number, that is the most practical
option.

4.8.2. Composite Keys


Sometimes more than one attribute is needed to uniquely identify an
entity. A primary key that is made up of more than one attribute is
known as a composite key. Table 4.2 shows an example of a composite
key. Each instance of the entity Work can be uniquely identified
only by the combination of Employee ID and Project ID.

Table 4.2. Tabular Representation of Example of Composite Keys

Project ID    Employee ID    Hours Worked
01            01             200
02            01             120
01            02             50
03            02             120
03            03             100
04            03             200

Source: Haas, Creative Commons License.

KEYWORD
Artificial key is an extra attribute added to the table that is
seen by the user.
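The composite key of Table 4.2 can be sketched with Python's standard `sqlite3` module. The table name `work` and the column names are assumptions; the rows are those of Table 4.2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE work (
        project_id   TEXT NOT NULL,
        employee_id  TEXT NOT NULL,
        hours_worked INTEGER NOT NULL,
        -- Neither column is unique on its own; the pair is.
        PRIMARY KEY (project_id, employee_id)
    )
""")
rows = [
    ("01", "01", 200), ("02", "01", 120), ("01", "02", 50),
    ("03", "02", 120), ("03", "03", 100), ("04", "03", 200),
]
conn.executemany("INSERT INTO work VALUES (?, ?, ?)", rows)

# A second row for the same (project, employee) pair violates the key.
try:
    conn.execute("INSERT INTO work VALUES ('01', '01', 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Project 01 and employee 01 each appear in several rows, which is why neither attribute could serve as the primary key alone.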

4.8.3. Artificial Keys


An artificial key is one that has no meaning to the business or
organization. Artificial keys are permitted when either:
• No attribute has all of the primary key properties; or
• The primary key is large and complex (Getoor & Grant, 2006).

4.8.4. Primary Key Migration


Dependent entities, that is, entities that depend on the existence
of another entity for their identification, inherit the entire
primary key from the parent entity (Premerlani & Blaha, 1993).
Every entity within a generalization hierarchy inherits the primary
key of the root generic entity.

4.8.5. Define Key Attributes

Once the keys have been identified for the model, it is time to
name and define the attributes that have been used as keys.
There is no standard method for representing primary keys in ER
diagrams. In this document, the name of the primary key is written
inside the entity box, followed by the notation (PK) (Hunt, 2010).

4.8.6. Validate Keys and Relationships

The following basic rules govern the identification and migration
of primary keys:
• Every entity in the data model must have a primary key whose
values uniquely identify instances of the entity.
• The primary key attribute cannot be optional (i.e., have null
values).
• The primary key cannot have repeating values (Levene & Loizou,
2012). That is, an entity instance may not have more than one
value for the attribute at any given time. This is known as the
No Repetition Rule.
• Entities with composite primary keys cannot be split into
several entities with simpler primary keys. This is known as the
Minimum Key Rule.
• Two entities may not have identical primary keys, with the
exception of entities within generalization hierarchies.
• The entire primary key must migrate from parent entities to
child entities and from supertype (generic) entities to subtype
(category) entities (Hamouda & Zainol, 2017).

KEYWORD
Generalization hierarchy connects a superclass and one or more
subclasses, representing a specialization of the superclass.

4.8.7. Foreign Keys


A foreign key is an attribute that completes a relationship by
identifying the parent entity. Foreign keys provide a way to
maintain the integrity of the data (known as referential integrity)
and to navigate between different instances of an entity. Every
relationship in the model must be supported by a foreign key.
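Both roles of a foreign key, maintaining referential integrity and navigating between entities, can be seen in a short sketch. It uses Python's standard `sqlite3` module with the DEPARTMENT and PROJECT entities from the one-to-many example; the column names, the helper `try_insert`, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE project (
        proj_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        -- Migrated primary key of the parent: the foreign key (FK).
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Research');
    INSERT INTO project VALUES (10, 'Payroll', 1);
""")


def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Referential integrity: a project cannot point at a missing department.
orphan_ok = try_insert("INSERT INTO project VALUES (11, 'Ghost', 99)")

# Navigation: follow the foreign key from child to parent.
owner = conn.execute(
    "SELECT d.name FROM project p JOIN department d USING (dept_id) "
    "WHERE p.proj_id = 10"
).fetchone()[0]
```

Declaring `dept_id` as NOT NULL reflects the rule that every project must belong to exactly one department; an optional relationship would allow NULL instead.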

4.8.8. Categorizing Foreign Keys


Every dependent and category (subtype) entity in the model must
have a foreign key for each relationship in which it participates
(Waguespack & Waguespack, 2010). Foreign keys are formed in
dependent and subtype entities by migrating the entire primary key
from the parent or generic entity. If the primary key is composite,
it may not be split.

KEYWORD
Referential integrity is the logical dependency of a foreign key on
a primary key.

4.8.9. Foreign Key Ownership


Foreign key attributes are not considered to be owned by the
entities to which they migrate, because they are reflections of
attributes in the parent entities. As a result, each attribute in
an entity is either owned by that entity or belongs to a foreign
key in that entity. When all the attributes in a foreign key are
part of the primary key of a child entity, the child entity is said
to be “identifier dependent” on the parent entity, and the
relationship is referred to as an “identifying relationship.” When
any of the attributes in a foreign key are not part of the child’s
primary key, the relationship is referred to as “non-identifying”
and the child is not identifier dependent on the parent.

4.8.10. Diagramming Foreign Keys


Foreign key attributes are indicated by the notation (FK) next to
them (Mäenpää & Wanne, 2015).

4.9. ADDING ATTRIBUTES TO THE MODEL


Non-key attributes describe the entities to which they belong. In
this section, we go over the rules for assigning non-key attributes
to entities and how to deal with multivalued attributes.

4.9.1. Relation of Attributes to Entities


A non-key attribute can belong to only one entity. Unlike key
attributes, non-key attributes never migrate from parent to child
entities; they exist in exactly one entity (Solomon & Brisini,
2019). The process of relating attributes to entities begins with
the modeler, with the help of the end users, placing attributes
with the entities that they appear to describe. Your decisions
should be recorded in the entity attribute matrix covered in the
preceding section. The assignments are then validated with the
formal method of normalization.
Before formal normalization begins, the rule is to place non-key
attributes in entities where the value of the primary key
determines the values of the attributes. Generally speaking,
entities that share a primary key should be combined into a single
entity. Some further guidelines for relating attributes to entities
follow.

KEYWORD
Normalization is the process of organizing data in a database.

4.9.2. Parent-Child Relationships


• When there is a parent-child relationship, place attributes
in the parent entity whenever it makes sense to do so.
• If a parent entity has no non-key attributes, combine the
parent and child entities (Xu et al., 2014).

4.9.3. Multivalued Attributes


If an attribute depends on the primary key but is multivalued (has
more than one value for a particular value of the key), reclassify the
attribute as a new child entity (Ravald & Grönroos, 1996). If the
multivalued attribute is unique within the new entity, it becomes the
primary key. If not, migrate the primary key of the original, now
parent, entity to the child (Table 4.3).
For example, assume an entity called PROJECT with the attributes
Proj_ID (the key), Proj_Name, Task_ID, and Task_Name, as follows:
Table 4.3. Tabular Representation of PROJECT Entity

Proj_ID   Task_ID   Proj_Name   Task_Name
01        01        A           Analysis
01        02        A           Designing
01        03        A           Programming
01        04        A           Tuning
02        01        B           Analysis

Source: Data Base Design, Creative Commons License.


Task_ID and Task_Name have several values for each value of the key
attribute Proj_ID. The solution is to create a new entity, call it
TASK, and make it a child of PROJECT. Move the attributes Task_ID and
Task_Name from PROJECT to TASK. Because neither attribute can
uniquely identify a task on its own, the last step is to migrate
Proj_ID to TASK.
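The split of PROJECT into a parent and a child entity can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the lowercase table and column names and the in-memory database are choices made here, not part of the original example. The rows are those of Table 4.3.

```python
import sqlite3

# Rows of the original PROJECT entity from Table 4.3:
# (Proj_ID, Task_ID, Proj_Name, Task_Name)
rows = [
    ("01", "01", "A", "Analysis"),
    ("01", "02", "A", "Designing"),
    ("01", "03", "A", "Programming"),
    ("01", "04", "A", "Tuning"),
    ("02", "01", "B", "Analysis"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (
        proj_id   TEXT NOT NULL PRIMARY KEY,
        proj_name TEXT NOT NULL
    );
    -- Task_ID alone is not unique (task 01 occurs in both projects),
    -- so Proj_ID migrates into TASK and becomes part of its key.
    CREATE TABLE task (
        proj_id   TEXT NOT NULL REFERENCES project(proj_id),
        task_id   TEXT NOT NULL,
        task_name TEXT NOT NULL,
        PRIMARY KEY (proj_id, task_id)
    );
""")

for proj_id, task_id, proj_name, task_name in rows:
    # Each project is stored once, however many tasks it has.
    conn.execute("INSERT OR IGNORE INTO project VALUES (?, ?)",
                 (proj_id, proj_name))
    conn.execute("INSERT INTO task VALUES (?, ?, ?)",
                 (proj_id, task_id, task_name))

project_count = conn.execute("SELECT COUNT(*) FROM project").fetchone()[0]
task_count = conn.execute("SELECT COUNT(*) FROM task").fetchone()[0]
print(project_count, task_count)
```

After the split, the project name is recorded once per project instead of once per task, and each task row is identified by the combination of the migrated project key and its own task identifier.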

4.9.4. Relations Described by Attributes


In some cases, an attribute appears to describe a relationship rather
than an entity (Mattingly et al., 2014). A MEMBER might borrow books,
for instance. Possible attributes are the date the books were checked
out and the date they are due. Such a situation is typically a
symptom of a many-to-many relationship, and the remedy is the same:
reclassify the relationship as a new entity that is a child of both
original entities. In some methodologies, the newly created entity is
called an associative entity (Hung, 2007).
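As a hedged illustration of this remedy, the sketch below models the borrowing relationship as its own table. The entity BOOK, the table name loan, and all column names and sample rows are assumptions introduced for the example; only MEMBER and the two dates come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE member (member_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE book   (book_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- LOAN is the child of both MEMBER and BOOK; the dates describe
    -- the relationship, so they live here and not in either parent.
    CREATE TABLE loan (
        member_id     INTEGER NOT NULL REFERENCES member(member_id),
        book_id       INTEGER NOT NULL REFERENCES book(book_id),
        checkout_date TEXT    NOT NULL,
        due_date      TEXT    NOT NULL,
        PRIMARY KEY (member_id, book_id, checkout_date)
    );
    INSERT INTO member VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO book   VALUES (10, 'SQL Basics'), (11, 'Data Modeling');
    INSERT INTO loan   VALUES (1, 10, '2024-03-01', '2024-03-15'),
                              (1, 11, '2024-03-01', '2024-03-15'),
                              (2, 10, '2024-04-02', '2024-04-16');
""")

# One member can borrow many books ...
books_for_ana = conn.execute(
    "SELECT COUNT(*) FROM loan WHERE member_id = 1").fetchone()[0]
# ... and one book can be borrowed by many members.
borrowers_of_book_10 = conn.execute(
    "SELECT COUNT(DISTINCT member_id) FROM loan WHERE book_id = 10").fetchone()[0]
print(books_for_ana, borrowers_of_book_10)
```

Including the checkout date in the key is a design choice made here so that the same member can borrow the same book again later.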

4.9.5. Code Values and Derived Attributes


Two topics on which data modeling experts disagree are whether
derived attributes and attributes with coded values should be
included in the data model.
Derived attributes are those created by a formula or by a summary
operation on other attributes. Arguments against including derived
data rest on the premise that derived data should not be stored in a
database and therefore should not be included in the data model. The
arguments in favor are as follows:
• Derived data is often important to both managers and users
and therefore should be included in the data model.
• It is just as important, perhaps more so, to document
derived attributes as it is to document other attributes.
• Including derived attributes in the data model does not
imply how they will be implemented (Davis, 1985).
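One way to reconcile the two positions is to document a derived attribute in the model but compute it on demand. The sketch below does this with a view; the order_line table, the order_total view, and the sample values are hypothetical and are not taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_line (
        order_id   INTEGER NOT NULL,
        line_no    INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        unit_price REAL    NOT NULL,
        PRIMARY KEY (order_id, line_no)
    );
    -- The derived attribute "total" is defined by a summary operation
    -- on stored attributes; it is never stored itself.
    CREATE VIEW order_total AS
        SELECT order_id, SUM(quantity * unit_price) AS total
        FROM order_line
        GROUP BY order_id;
""")
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 10.0), (1, 2, 1, 5.5), (2, 1, 3, 4.0)],
)

totals = dict(conn.execute("SELECT order_id, total FROM order_total"))
print(totals)
```

Storing the total in a column instead would be an equally valid implementation of the same model, which is the point of the third argument above.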
A coded value uses one or more letters or numbers to represent a
fact. For instance, the attribute Gender might use the values “M” and
“F” in place of “Male” and “Female.” Those who object to this
practice argue that codes have no intuitive meaning to end users and
that they add complexity to data processing. Proponents point out that
many organizations have a long history of using coded attributes, that
codes save space, and that they increase flexibility because values
can easily be added or changed by means of look-up tables.

KEYWORD
Data processing is the collection and manipulation of digital data to
produce meaningful information.
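The look-up table argument can be sketched as follows. The table names gender_code and person, the sample rows, and the extra code "U" are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Look-up table: one row per code and its meaning.
    CREATE TABLE gender_code (
        code        TEXT NOT NULL PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        gender    TEXT NOT NULL REFERENCES gender_code(code)
    );
    INSERT INTO gender_code VALUES ('M', 'Male'), ('F', 'Female');
    INSERT INTO person VALUES (1, 'Ana', 'F'), (2, 'Ben', 'M');
""")

# Codes are decoded for end users with a join.
decoded = conn.execute("""
    SELECT p.name, g.description
    FROM person p JOIN gender_code g ON g.code = p.gender
    ORDER BY p.person_id
""").fetchall()

# Adding a value is a single insert into the look-up table.
conn.execute("INSERT INTO gender_code VALUES ('U', 'Unspecified')")

# A code that is not in the look-up table is rejected.
try:
    conn.execute("INSERT INTO person VALUES (3, 'Cy', 'X')")
    unknown_code_rejected = False
except sqlite3.IntegrityError:
    unknown_code_rejected = True

print(decoded, unknown_code_rejected)
```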
4.9.6. Attributes in the ER Diagram
There is disagreement about whether attributes belong in the ER
diagram. The IDEF1X standard specifies that attributes should be
added (Gambrel et al., 2016). However, many experienced practitioners
point out that including attributes, particularly if there are many
of them, clutters the diagram and reduces its effectiveness in giving
the end user an overview of how the data is organized.

4.10. GENERALIZATION HIERARCHIES


Up until now, we have talked about describing an object, the entity,
by its shared characteristics, the attributes. For example, we might
describe an employee by their name, employee ID, job title, and skill
set.
Another strategy is to describe entities by both their similarities
and their differences. Assume, for instance, that a company divides
the projects it works on into internal and external ones. Internal
projects are carried out on behalf of a unit within the organization.
External projects are carried out for parties outside the
organization. Both kinds of project are similar in that each involves
work done by employees of the organization according to a set
schedule. We also recognize that there are differences between them.
A customer identifier and the fee charged to the customer are two
attributes unique to external projects (Reiter, 1989). Generalization
is the process of classifying entities according to their
similarities and differences.


4.10.1. Description
A generalization hierarchy is a structured grouping of entities that
share common attributes. It is a powerful and widely used technique for
representing common characteristics among entities while preserving
their differences. It is the relationship between an entity and one or
more refined versions of it. The entity being refined is called the
supertype, and each refined version is called a subtype.
Generalization hierarchies should be used when:
• A large number of entities appear to be of the same type;
• Attributes are repeated across several entities; or
• The model is continually evolving (McCarthy, 1982).
Generalization hierarchies simplify the model by reducing the number of
entities in it, and they improve its stability by allowing changes to be
made only to the entities relevant to the change.

KEYWORD
Generalization hierarchy connects a superclass and one or more
subclasses, representing a specialization of the superclass.

4.10.2. Making a Generalization Hierarchy


To build a generalization hierarchy, all common attributes are
assigned to the supertype. The supertype is also assigned a
discriminator attribute, whose values identify the subtype
categories. Attributes that are unique to a category are assigned to
the corresponding subtype. Each subtype also inherits the primary key
of the supertype. Subtypes that have only a primary key and no
attributes of their own should be eliminated. Subtypes are related to
the supertype through a one-to-one relationship.
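These construction steps can be sketched for the internal and external projects of Section 4.10. The table names, column names, the discriminator codes 'I' and 'E', and the sample rows below are illustrative assumptions, not part of the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Supertype: common attributes plus the discriminator proj_type.
    CREATE TABLE project (
        proj_id   INTEGER PRIMARY KEY,
        proj_name TEXT NOT NULL,
        proj_type TEXT NOT NULL CHECK (proj_type IN ('I', 'E'))
    );
    -- Each subtype inherits the supertype's primary key, which gives
    -- the one-to-one relationship, and adds only its own attributes.
    CREATE TABLE internal_project (
        proj_id  INTEGER PRIMARY KEY REFERENCES project(proj_id),
        org_unit TEXT NOT NULL
    );
    CREATE TABLE external_project (
        proj_id     INTEGER PRIMARY KEY REFERENCES project(proj_id),
        customer_id INTEGER NOT NULL,
        fee         REAL    NOT NULL
    );
    INSERT INTO project VALUES (1, 'Payroll upgrade', 'I'),
                               (2, 'Client portal',   'E');
    INSERT INTO internal_project VALUES (1, 'Finance');
    INSERT INTO external_project VALUES (2, 501, 12000.0);
""")

# Common attributes come from the supertype, specific ones from the subtype.
external = conn.execute("""
    SELECT p.proj_name, e.customer_id, e.fee
    FROM project p JOIN external_project e ON e.proj_id = p.proj_id
""").fetchall()
print(external)
```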

4.10.3. Types of Hierarchies


Generalization hierarchies can be either overlapping or disjoint. In
an overlapping hierarchy, an entity instance may belong to more than
one subtype. For example, to represent people at a university, you
might use the supertype entity PERSON with the three subtypes STAFF,
STUDENT, and FACULTY. An individual could well fall under more than
one category, such as a staff employee who is also enrolled as a
student. In a disjoint hierarchy, an entity instance can belong to
only one subtype. An example is the entity EMPLOYEE with the subtypes
CLASSIFIED and WAGES. An employee may fall under either category, but
not both (Ehrlich, 2001).
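The distinction can be stated as a simple check on subtype membership. In the sketch below, the function is_disjoint and the instance identifiers are hypothetical; the subtype names are those used in the text.

```python
def is_disjoint(memberships):
    """Return True if no instance appears in more than one subtype."""
    seen = set()
    for members in memberships.values():
        if seen & members:       # some instance was already in another subtype
            return False
        seen |= members
    return True

# PERSON: p2 is both a staff employee and a student, so the hierarchy overlaps.
person_subtypes = {
    "STAFF": {"p1", "p2"},
    "STUDENT": {"p2", "p3"},
    "FACULTY": {"p4"},
}
# EMPLOYEE: every employee is in exactly one subtype, so it is disjoint.
employee_subtypes = {
    "CLASSIFIED": {"e1", "e2"},
    "WAGES": {"e3"},
}

print(is_disjoint(person_subtypes), is_disjoint(employee_subtypes))
```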


4.10.4. Rules
The primary rule of generalization hierarchies is that each instance
of the supertype entity must appear in at least one subtype, and,
likewise, each instance of a subtype must appear in the supertype.
Subtypes can be part of only one generalization hierarchy. In other
words, a subtype cannot be related to more than one supertype.
Generalization hierarchies may nevertheless be nested, with the
subtype of one hierarchy serving as the supertype of another.
Subtypes may be the parent entity in a relationship but not the child
(Michalski, 1983). If a subtype were allowed to be a child, it would
inherit two primary keys.

4.11. ADDING DATA INTEGRITY RULES

Data integrity is one of the cornerstones of the relational model.
Simply stated, data integrity refers to the consistency and correctness
of the data values in the database. In the relational model, data
integrity is enforced by the entity integrity and referential integrity
rules (Zhang et al., 2019). Although it is not part of the relational
model, most database software also uses domain information to enforce
attribute integrity.

KEYWORD
Entity integrity ensures that there are no duplicate records within the
table and that the field that identifies each record within the table is
unique and never null.

4.11.1. Entity Integrity


The entity integrity rule states that, for every instance of an
entity, the value of the primary key must exist, be unique, and not
be null (Bertossi & Milani, 2018). Without entity integrity, the
primary key could not fulfill its purpose of uniquely identifying
every instance of an entity.
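A minimal sketch of the rule in practice, using SQLite through Python's sqlite3 module; the employee table, the helper try_insert, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, for historical reasons,
# otherwise accepts NULL in a non-integer primary key column.
conn.execute("""
    CREATE TABLE employee (
        emp_id TEXT NOT NULL PRIMARY KEY,
        name   TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO employee VALUES ('E1', 'Ana')")

def try_insert(emp_id, name):
    """Return True if the row is accepted, False if the key is rejected."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert("E1", "Ben")   # key already exists
null_ok = try_insert(None, "Cy")         # key is null
unique_ok = try_insert("E2", "Dee")      # key exists, is unique, is not null
print(duplicate_ok, null_ok, unique_ok)
```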

4.11.2. Referential Integrity


The referential integrity rule states that every foreign key value
must match a primary key value in an associated table. Referential
integrity ensures that we can navigate correctly between related
entities.
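Whether existing data obeys the rule can be checked with a query that looks for foreign key values that have no matching primary key. The department and employee tables and their rows below are hypothetical, and the foreign key is deliberately left unenforced so that a violation can be shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- No REFERENCES clause here, so nothing stops a dangling dept_id.
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER
    );
    INSERT INTO department VALUES (10, 'Sales'), (20, 'IT');
    INSERT INTO employee VALUES (1, 'Ana', 10), (2, 'Ben', 20), (3, 'Cy', 30);
""")

# Child rows whose foreign key matches no parent primary key.
orphans = conn.execute("""
    SELECT e.emp_id, e.dept_id
    FROM employee e
    LEFT JOIN department d ON d.dept_id = e.dept_id
    WHERE e.dept_id IS NOT NULL AND d.dept_id IS NULL
""").fetchall()
print(orphans)
```

An empty result would mean that every non-null foreign key value matches a primary key value, which is what the rule requires.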

4.11.3. Inserting and Deleting Rules


A foreign key creates a hierarchical relationship between two
associated entities (Becker et al., 2008). The entity containing the
foreign key is the child, or dependent, and the table containing the
primary key from which the foreign key values are obtained is the
parent.
To maintain referential integrity between the parent and child as
data is inserted into or deleted from the database, certain insert
and delete rules must be considered.

4.11.4. Insert Rules


The following insert rules are frequently used:
1. Dependent: The dependent insert rule permits the insertion
of a child entity instance only if the matching parent
entity instance already exists (Wang et al., 2006).
2. Automatic: The automatic insert rule always permits the
insertion of a child entity instance. If no matching
parent entity instance exists, one is created.
3. Nullify: The nullify insert rule always permits the
insertion of a child entity instance. If no matching
parent entity instance exists, the foreign key in the
child is set to null.
4. Default: The default insert rule always permits the
insertion of a child entity instance. If no matching
parent entity instance exists, the foreign key in the
child is set to a previously defined default value.
5. Customized: The customized insert rule permits the
insertion of a child entity instance only if certain
customized validity conditions are met (Thion &
Coulondre, 2012).
6. No Impact: This rule always permits the insertion of a
child entity instance. No matching parent entity instance
need exist, and so no validity checking is done.
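The insert rules can be compared side by side with a small in-memory sketch. Everything here is hypothetical: the function insert_child, the dictionaries standing in for the parent and child tables, and the sample keys. The customized rule is omitted because its conditions depend on the application.

```python
def insert_child(parents, children, child_key, parent_key, rule,
                 default_parent=None):
    """Insert a child under the given insert rule; return True if inserted.

    parents maps parent keys to parent rows; children maps child keys
    to the foreign key value stored in the child.
    """
    parent_exists = parent_key in parents
    if rule == "dependent":
        if not parent_exists:
            return False                  # rejected: no matching parent
    elif rule == "automatic":
        if not parent_exists:
            parents[parent_key] = {}      # create the missing parent
    elif rule == "nullify":
        if not parent_exists:
            parent_key = None             # foreign key set to null
    elif rule == "default":
        if not parent_exists:
            parent_key = default_parent   # foreign key set to the default
    elif rule != "no impact":
        raise ValueError(f"unknown rule: {rule}")
    children[child_key] = parent_key      # "no impact" stores the key unchecked
    return True

parents = {"D1": {}}
children = {}
results = {
    "dependent": insert_child(parents, children, "E1", "D9", "dependent"),
    "nullify": insert_child(parents, children, "E2", "D9", "nullify"),
    "automatic": insert_child(parents, children, "E3", "D7", "automatic"),
    "default": insert_child(parents, children, "E4", "D8", "default",
                            default_parent="D1"),
    "no impact": insert_child(parents, children, "E5", "D5", "no impact"),
}
print(results)
print(children)
```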

4.11.5. Delete Rules


The following delete rules are frequently used:
1. Restrict: The restrict delete rule permits the deletion of
a parent entity instance only if there are no matching
child entity instances.
2. Cascade: The cascade delete rule always permits the
deletion of a parent entity instance and also deletes all
matching instances in the child entity.
3. Nullify: The nullify delete rule always permits the
deletion of a parent entity instance (Berti-Equille, 2007).
If any matching child entity instances exist, the values
of their foreign keys are set to null.
4. Default: The default delete rule always permits the
deletion of a parent entity instance. If any matching
child entity instances exist, the values of their foreign
keys are set to a predefined default value.
5. Customized: The customized delete rule permits the
deletion of a parent entity instance only if certain
validity conditions are met (Hernandez, 2013).
6. No Impact: The no impact delete rule always permits the
deletion of a parent entity instance. No validity checking
is done.
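Four of these rules correspond to standard SQL foreign key actions: RESTRICT, CASCADE, SET NULL, and SET DEFAULT. The sketch below tries each one in SQLite; the parent and child tables, the helper delete_parent, and the default key 0 are assumptions made for illustration.

```python
import sqlite3

def delete_parent(action):
    """Delete parent 1 under the given ON DELETE action.

    Returns 'blocked' if the delete is refused, otherwise the list of
    foreign key values left in the child table.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
        CREATE TABLE parent (id INTEGER PRIMARY KEY);
        CREATE TABLE child (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER DEFAULT 0
                      REFERENCES parent(id) ON DELETE {action}
        );
        -- Parent 0 exists so that the default value is itself valid.
        INSERT INTO parent VALUES (0), (1);
        INSERT INTO child  VALUES (100, 1);
    """)
    try:
        conn.execute("DELETE FROM parent WHERE id = 1")
    except sqlite3.IntegrityError:
        return "blocked"
    return [row[0] for row in conn.execute("SELECT parent_id FROM child")]

outcomes = {action: delete_parent(action)
            for action in ("RESTRICT", "CASCADE", "SET NULL", "SET DEFAULT")}
print(outcomes)
```

The no impact rule would correspond to declaring no foreign key constraint at all, and a customized rule would typically need application code or a trigger.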

4.11.6. Insert and Delete Guidelines


Some basic guidelines for choosing insert and delete rules follow:
• Avoid the nullify insert and delete rules. In a
parent-child relationship, the parent entity generally has
mandatory existence. Using the nullify insert or delete
rule would violate this requirement.
• For generalization hierarchies, use either the automatic or
the dependent insert rule. Only these rules preserve the
requirement that all instances in the subtypes must also
be in the supertype.
• For generalization hierarchies, use the cascade delete
rule. This rule enforces the requirement that only
instances present in the supertype may appear in the
subtypes (Caroprese et al., 2008).

4.11.7. Domains
A domain is a valid set of values for an attribute. It ensures that
values supplied by an insert or update make sense. The following
domain information should be assigned to each attribute in the model:
1. Data Type: The basic data types are integer, decimal, and
character. Most databases support variants of these in
addition to special data types for date and time
(Reiter, 1989).
2. Length: The number of digits or characters in the value;
a 5-digit or 40-character value, for instance.
3. Date Format: The format for date values, such as
dd/mm/yy or yy/mm/dd (Christiansen & Martinenghi, 2006).
4. Range: The lower and upper limits of the values that an
attribute may legally have.
5. Constraints: These are special restrictions on acceptable
values. For instance, a new employee's Beginning Pay
Date must always be the first workday of the month in
which they are hired.
6. Null Support: Indicates whether the attribute can have
null values.
7. Default Value (If Any): The value that an attribute
instance will have if no value is entered (Kufoniyi, 1995).

KEYWORD
Data type is a particular kind of data item, as defined by the values it
can take, the programming language used, or the operations that can be
performed on it.

4.11.8. Primary Key Domains

Primary key values must be unique and cannot be null.

4.11.9. Foreign Key Domains


Foreign keys must have the same data type, length, and format as the
matching primary key (Bertossi & Rizzolo, 2016). The uniqueness
property must be consistent with the relationship type. A one-to-one
relationship implies a unique foreign key, whereas a one-to-many
relationship implies a non-unique foreign key.
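Domain information of the kind listed in Section 4.11.7 is commonly declared as column constraints. The sketch below is one possible rendering in SQLite; the employee table, its columns, the salary limits, and the helper accepts are assumptions made for the example. SQLite applies its declared column types loosely, so the data type entry is shown here but not strictly enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        -- length: at most 40 characters; null support: not allowed
        name      TEXT NOT NULL CHECK (length(name) <= 40),
        -- range: lower and upper limits
        salary    REAL NOT NULL CHECK (salary BETWEEN 0 AND 500000),
        -- date format: dd/mm/yy
        hire_date TEXT NOT NULL
                  CHECK (hire_date GLOB '[0-9][0-9]/[0-9][0-9]/[0-9][0-9]'),
        -- default value used when none is entered
        status    TEXT NOT NULL DEFAULT 'active'
    )
""")

def accepts(name, salary, hire_date):
    """Return True if the values fall inside the declared domains."""
    try:
        conn.execute(
            "INSERT INTO employee (name, salary, hire_date) VALUES (?, ?, ?)",
            (name, salary, hire_date),
        )
        return True
    except sqlite3.IntegrityError:
        return False

valid = accepts("Ana", 50000, "01/03/24")
too_long = accepts("x" * 41, 50000, "01/03/24")
out_of_range = accepts("Ben", -1, "01/03/24")
bad_date = accepts("Cy", 50000, "2024-03-01")
null_name = accepts(None, 50000, "01/03/24")
default_status = conn.execute("SELECT status FROM employee").fetchone()[0]
print(valid, too_long, out_of_range, bad_date, null_name, default_status)
```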

4.12. OUTLINE OF THE RELATIONAL MODEL


The relational model was formally introduced by Dr. E. F. Codd in
1970 and has since evolved through a series of publications. The
model provides a simple, yet rigorously defined, concept of how users
perceive data. The relational model represents data in the form of
two-dimensional tables. Each table represents a real-world person,
place, thing, or event about which data is collected. A relational
database is a collection of two-dimensional tables. The organization
of data into relational tables is known as the logical view of the
database (Van Zomeren, 2015); that is, the form in which a relational
database presents data to the user and the programmer. The way the
database software physically stores the data on a computer disk is
called the internal view. The internal view differs from product to
product and is not relevant to this discussion.
A basic understanding of the relational model is necessary to make
effective use of relational database software such as Oracle or
Microsoft SQL Server, or even personal database systems such as
Access or FoxPro.
This section is an informal introduction to relational concepts,
particularly as they relate to relational database design issues. It
is not a complete description of relational theory.

KEYWORD
Relational model represents how data is stored in relational
databases. A relational database consists of a collection of tables,
each of which is assigned a unique name.
The basic concepts that form the foundation of the relational model
(data structures, relationships, and data integrity) are covered
under the following topics:
• Terminology and data structure;
• Relational table properties;
• Notation;
• Relationships and keys;
• Data integrity;
• Relational data manipulation;
• Normalization;
• Advanced normalization (Sibley, 2007).

ACTIVITY 4.1.
You are tasked with designing a database system for a large hospital network that
will store patient information, medical records, and billing data. Describe how you
would use database modeling techniques, such as entity-relationship diagrams, to
design an efficient and scalable database structure.


4.13. SUMMARY
Database modelling, or the act of developing a conceptual model of a database, is
explored at length in this chapter. It starts with a brief introduction to data modelling and
its significance in developing a database structure. The entity relationship model, which
depicts connections between data objects, is next presented in this chapter. Data object
and relationship discovery, schema construction, and ERD refinement are also discussed
in this chapter. Later, the chapter dives into the value of primary and foreign keys in a
database. The definition of generalization hierarchies and the addition of characteristics
to the entity relationship model are also covered. Finally, a brief introduction to relational
models and data integrity rules rounds off this chapter.

REVIEW QUESTIONS
1. What is the entity relationship model and how is it used in database modeling?
2. How do you identify data objects and relationships in the entity relationship model?
3. What is the role of primary and foreign keys in a database schema?
4. How do you refine the entity relationship diagram?
5. What are generalization hierarchies and how are they used in database modeling?
6. What are data integrity rules and why are they important in a database schema?
7. What is the relational model and how does it relate to the entity relationship
model?

MULTIPLE CHOICE QUESTIONS


1. Which of the following best defines data modeling?
a. The process of creating a visual representation of data objects and their relationships
b. The method of creating a physical representation of a database
c. The process of creating a logical representation of a database schema
d. The procedure of creating a report based on the database
2. What is the purpose of the entity-relationship model?
a. To provide a physical representation of a database
b. To provide a logical representation of a database schema
c. To define data integrity rules
d. To provide a report based on database data
3. What is the role of data modeling in database design?
a. To create a physical representation of a database
b. To create a logical representation of a database schema
c. To define data integrity rules
d. To create a report based on database data
4. What is the primary key in a database schema?
a. A field that uniquely identifies each record in a table
b. A field that links records from one table to records in another table
c. A field that defines data integrity rules
d. A field that is used to create a report based on database data
5. What is the purpose of foreign keys in a database schema?
a. To uniquely identify each record in a table
b. To link records from one table to records in another table
c. To define data integrity rules
d. To create a report based on database data
6. What is the process of refining the entity relationship diagram?
a. Adding detail and clarity
b. Removing unnecessary attributes
c. Creating a physical representation of the database
d. Defining data integrity rules
7. What are generalization hierarchies?
a. A way to simplify complex data relationships
b. A way to define data integrity rules
c. A way to create a physical representation of the database
d. A way to create a report based on database data
8. What is the relational model?
a. A model that describes the relationships between data objects in a database
b. A physical representation of a database
c. A report based on database data
d. A set of data integrity rules

Answer to Multiple Choice Questions


1. (a); 2. (b); 3. (b); 4. (a); 5. (b); 6. (a); 7. (a); 8. (a)


REFERENCES
1. Angles, R., & Gutierrez, C., (2008). Survey of graph database models. ACM Computing
Surveys (CSUR), 40(1), 1–39.
2. Aracic, I., Gasiunas, V., Mezini, M., & Ostermann, K., (2006). An overview of caesarJ.
Transactions on Aspect-Oriented Software Development I, 1, 135–173.
3. Baghdadi, Y., (2006). Reverse engineering relational databases to identify and specify
basic web services with respect to service-oriented computing. Information Systems
Frontiers, 8(1), 395–410.
4. Batini, C., & Lenzerini, M., (1984). A methodology for data schema integration in the
entity relationship model. IEEE Transactions on Software Engineering, 4(6), 650–664.
5. Batra, D., & Davis, J. G., (1992). Conceptual data modeling in database design:
Similarities and differences between expert and novice designers. International
Journal of Man-Machine Studies, 37(1), 83–101.
6. Becker, J., Matzner, M., Müller, O., & Winkelmann, A., (2008). Towards a semantic
data quality management-using ontologies to assess master data quality in retailing.
AMCIS 2008 Proceedings, 129, 1, 4–9.
7. Bellatreche, L., Dung, N. X., Pierra, G., & Hondjack, D., (2006). Contribution of
ontology-based data modeling to automatic integration of electronic catalogues within
engineering databases. Computers in Industry, 57(8, 9), 711–724.
8. Berti-Equille, L., (2007). Measuring and modeling data quality for quality-awareness
in data mining. Quality Measures in Data Mining, 2(1), 101–126.
9. Bertossi, L., & Milani, M., (2018). Ontological multidimensional data models and
contextual data quality. Journal of Data and Information Quality (JDIQ), 9(3), 1–36.
10. Bertossi, L., & Rizzolo, F., (2016). Contexts and Data Quality Assessment, 1, 3–6.
11. Bhavsar, H., & Ganatra, A., (2012). A comparative study of training algorithms for
supervised machine learning. International Journal of Soft Computing and Engineering
(IJSCE), 2(4), 2231–2307.
12. Brunette, W., Sundt, M., Dell, N., Chaudhri, R., Breit, N., & Borriello, G., (2013). Open
data kit 2.0: Expanding and refining information services for developing regions. In:
Proceedings of the 14th Workshop on Mobile Computing Systems and Applications
(Vol. 1, pp. 1–6).
13. Bruns, E., Brombach, B., & Bimber, O., (2008). Mobile phone-enabled museum
guidance with adaptive classification. IEEE Computer Graphics and Applications,
28(4), 98–102.
14. Burgin, M., & Mikkilineni, R., (2021). From data processing to knowledge processing:
Working with operational schemas by autopoietic machines. Big Data and Cognitive
Computing, 5(1), 13.
15. Câmara, G., Souza, R. C. M., Freitas, U. M., & Garrido, J., (1996). SPRING:
Integrating remote sensing and GIS by object-oriented data modeling. Computers
& Graphics, 20(3), 395–403.

16. Campbell, R. H., & Islam, N., (1992). A technique for documenting the framework
of an object-oriented system. In: [1992] Proceedings of the Second International
Workshop on Object Orientation in Operating Systems (Vol. 1, pp. 288–300). IEEE.
17. Caroprese, L., Greco, S., & Zumpano, E., (2008). Active integrity constraints for
database consistency maintenance. IEEE Transactions on Knowledge and Data
Engineering, 21(7), 1042–1058.
18. Chang, J. R., Jheng, Y. H., Lo, C. H., & Chang, B., (2011). Attribute coding for the
rough set theory based rule simplications by using the particle swarm optimization
algorithm. In: Intelligent Decision Technologies: Proceedings of the 3rd International
Conference on Intelligent Decision Technologies (IDT’2011) (Vol. 1, pp. 399–407).
Springer Berlin Heidelberg.
19. Chen, P. P. S., (1976). The entity-relationship model—Toward a unified view of data.
ACM Transactions on Database Systems (TODS), 1(1), 9–36.
20. Chen, P. P. S., (1977). The entity-relationship model: A basis for the enterprise view
of data. In: Proceedings of the June 13–16, 1977, National Computer Conference
(Vol. 1, pp. 77–84).
21. Chen, X., Shrivastava, A., & Gupta, A., (2013). Neil: Extracting visual knowledge
from web data. In: Proceedings of the IEEE International Conference on Computer
Vision (Vol. 1, pp. 1409–1416).
22. Christiansen, H., & Martinenghi, D., (2006). On simplification of database integrity
constraints. Fundamenta Informaticae, 71(4), 371–417.
23. Clarkson, B., Sawhney, N., & Pentland, A., (1998). Auditory context awareness via
wearable computing. Energy, 400(600), 20.
24. Colomb, R. M., & Dampney, C. N., (2005). An approach to ontology for institutional
facts in the semantic web. Information and Software Technology, 47(12), 775–783.
25. Davis, F. D., (1985). A Technology Acceptance Model for Empirically Testing New
End-User Information Systems: Theory and Results (Vol. 1, pp. 2–4). Doctoral
dissertation, Massachusetts Institute of Technology.
26. Demšar, J., (2010). Algorithms for subsetting attribute values with relief. Machine
Learning, 78(1), 421–428.
27. Deville, Y., Gilbert, D., Van, H. J., & Wodak, S. J., (2003). An overview of data models
for the analysis of biochemical pathways. Briefings in Bioinformatics, 4(3), 246–259.
28. Dragos, V., Gatepaille, S., & Lerouvreur, X., (2016). Refining relation identification by
combining soft and sensor data. In: 2016 19th International Conference on Information
Fusion (FUSION) (Vol. 1, pp. 2139–2146). IEEE.
29. Ehrlich, P., (2001). Number systems with simplicity hierarchies: A generalization of
Conway’s theory of surreal numbers. The Journal of Symbolic Logic, 66(3), 1231–1258.
30. Fang, W., Ma, L., Love, P. E., Luo, H., Ding, L., & Zhou, A. O., (2020). Knowledge
graph for identifying hazards on construction sites: Integrating computer vision with
ontology. Automation in Construction, 119(1), 103310.

31. Feng, A., Sugiyama, Y., Fujii, M., & Torii, K., (1988). An attribute grammar with
common attributes and its evaluator in prolog. Systems and Computers in Japan,
19(6), 97–107.
32. Friedman, N., & Goldszmidt, M., (1996). Discretizing continuous attributes while
learning Bayesian networks. In: ICML (Vol. 1, pp. 157–165).
33. Gambrel, L. E., Faas, C., Kaestle, C. E., & Savla, J., (2016). Interpersonal neurobiology
and couple relationship quality: A longitudinal model. Contemporary Family Therapy,
38(1), 272–283.
34. Getoor, L., & Grant, J., (2006). PRL: A probabilistic relational language. Machine
Learning, 62(1), 7–31.
35. Gregersen, H., & Jensen, C. S., (1999). Temporal entity-relationship models-a survey.
IEEE Transactions on Knowledge and Data Engineering, 11(3), 464–497.
36. Haas, L., (2006). Beauty and the beast: The theory and practice of information
integration. In: Database Theory–ICDT 2007: 11th International Conference, Barcelona,
Spain, January 10–12, 2007, Proceedings 11 (Vol. 1, pp. 28–43). Springer Berlin
Heidelberg.
37. Hamouda, S., & Zainol, Z., (2017). Document-oriented data schema for relational
database migration to NoSQL. In: 2017 International Conference on Big Data
Innovations and Applications (Innovate-Data) (Vol. 1, pp. 43–50). IEEE.
38. Han, J., Haihong, E., Le, G., & Du, J., (2011). Survey on NoSQL database. In: 2011
6th International Conference on Pervasive Computing and Applications (Vol. 1, pp.
363–366). IEEE.
39. Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U.,
(2015). The rise of “big data” on cloud computing: Review and open research issues.
Information Systems, 47, 98–115.
40. Hernandez, M. J., (2013). Database Design for Mere Mortals: A Hands-On Guide
to Relational Database Design (Vol. 1, pp. 2–5). Pearson Education.
41. Huang, J., Zhang, W., Zhao, S., Ding, S., & Wang, H., (2017). Learning to explain
entity relationships by pairwise ranking with convolutional neural networks. In: IJCAI
(Vol. 1, pp. 4018–4025).
42. Hull, R., & King, R., (1987). Semantic database modeling: Survey, applications, and
research issues. ACM Computing Surveys (CSUR), 19(3), 201–260.
43. Hung, C. J. F., (2007). Toward the theory of relationship management in public
relations: How to cultivate quality relationships. The Future of Excellence in Public
Relations and Communication Management: Challenges for the Next Generation,
1, 443–476.
44. Hunt, T. D., (2010). Natural or artificial primary key? Using the Mifrenz childrens
email application as a case study. New Zealand Journal of Applied Computing &
Information Technology, 14(1), 4–9.
45. Jen-Yen, C., & Yu-Shiang, H., (1994). An integrated object-oriented analysis and
design method emphasizing entity/class relationship and operation finding. Journal
of Systems and Software, 24(1), 31–47.
46. Kawakami, T., & Kaneda, S., (2009). A Real-Time Computing Approach for Derived
Attribute Values in Object-Oriented Application Design (Vol. 109, No. 196, pp. 25–28).
IEICE Technical Report; IEICE Tech. Rep.
47. Kerre, E. E., & Chen, G., (1995). An overview of fuzzy data models. Fuzziness in
Database Management Systems, 1, 23–41.
48. Klenosky, D. B., & Perkins, W. S., (1992). Deriving Attribute Utilities from Consideration
Sets: An Alternative to Self-Explicated Utilities (Vol. 1, pp. 3–7). ACR North American
Advances.
49. Kononenko, I., (1995). On biases in estimating multi-valued attributes. In: IJCAI (Vol.
95, pp. 1034–1040).
50. Kufoniyi, O., (1995). Spatial Coincidence Modeling, Automated Database Updating
and Data Consistency in Vector GIS (Vol. 1, pp. 1–5). Wageningen University and
Research.
51. Kusiak, A., Letsche, T., & Zakarian, A., (1997). Data modeling with IDEF1x. International
Journal of Computer Integrated Manufacturing, 10(6), 470–486.
52. Levene, M., & Loizou, G., (2012). A Guided Tour of Relational Databases and Beyond
(Vol. 2, No. 1, pp. 6–10). Springer Science & Business Media.
53. Ma, L., Sacks, R., Kattel, U., & Bloch, T., (2018). 3D object classification using
geometric features and pairwise relationships. Computer-Aided Civil and Infrastructure
Engineering, 33(2), 152–164.
54. Ma, Z. M., & Yan, L., (2010). A literature overview of fuzzy conceptual data modeling.
J. Inf. Sci. Eng., 26(2), 427–441.
55. Ma, Z. M., (2007). A literature overview of fuzzy database modeling. Intelligent
Databases: Technologies and Applications, 1, 167–196.
56. Mäenpää, T., & Wanne, M., (2015). Review of similarities between adjacency model
and relational model. In: Modeling, Computation and Optimization in Information
Systems and Management Sciences: Proceedings of the 3rd International Conference
on Modeling, Computation and Optimization in Information Systems and Management
Sciences-MCO 2015-Part II (Vol. 1, pp. 69–79). Springer International Publishing.
57. Markowitz, V. M., & Makowsky, J. A., (1990). Identifying extended entity-relationship
object structures in relational schemas. IEEE Transactions on Software Engineering,
16(8), 777–790.
58. Mattingly, B. A., Lewandowski, Jr. G. W., & McIntyre, K. P., (2014). “You make me a
better/worse person”: A two-dimensional model of relationship self-change. Personal
Relationships, 21(1), 176–190.
59. McCarthy, W. E., (1982). The REA accounting model: A generalized framework for
accounting systems in a shared data environment. Accounting Review, 1, 554–578.
60. Memari, M., Link, S., & Dobbie, G., (2015). SQL data profiling of foreign keys. In:
Conceptual Modeling: 34th International Conference, ER 2015, Stockholm, Sweden,
October 19–22, 2015, Proceedings 34 (Vol. 1, pp. 229–243). Springer International
Publishing.
61. Menzies, T., Greenwald, J., & Frank, A., (2006). Data mining static code attributes
to learn defect predictors. IEEE Transactions on Software Engineering, 33(1), 2–13.
62. Micci-Barreca, D., (2001). A preprocessing scheme for high-cardinality categorical
attributes in classification and prediction problems. ACM SIGKDD Explorations
Newsletter, 3(1), 27–32.
63. Michalski, R. S., (1983). A theory and methodology of inductive learning. In: Machine
Learning (Vol. 1, pp. 83–134). Morgan Kaufmann.
64. Modolo, R., Hess, S., Génot, V., Leclercq, L., Leblanc, F., Chaufray, J. Y., &
Holmström, M., (2018). The LatHyS database for planetary plasma environment
investigations: Overview and a case study of data/model comparisons. Planetary
and Space Science, 150(1), 13–21.
65. Moody, D. L., & Shanks, G. G., (2005). What makes a good data model? Evaluating the
quality of entity relationship models. In: Entity-Relationship Approach—ER’94 Business
Modeling and Re-Engineering: 13th International Conference on the Entity-Relationship
Approach Manchester, United Kingdom, December 13–16, 1994 Proceedings (Vol.
1, pp. 94–111). Berlin, Heidelberg: Springer Berlin Heidelberg.
66. Moser, R., Pedrycz, W., & Succi, G., (2008). A comparative analysis of the efficiency
of change metrics and static code attributes for defect prediction. In: Proceedings
of the 30th International Conference on Software Engineering (Vol. 1, pp. 181–190).
67. Moudrý, V., Lecours, V., Malavasi, M., Misiuk, B., Gábor, L., Gdulová, K., & Wild,
J., (2019). Potential pitfalls in rescaling digital terrain model-derived attributes for
ecological studies. Ecological Informatics, 54(1), 100987.
68. Nguyen, H., & Rosen, P., (2017). DSPCP: A data scalable approach for identifying
relationships in parallel coordinates. IEEE Transactions on Visualization and Computer
Graphics, 24(3), 1301–1315.
69. Nicolaos, P., & Katerina, T., (2015). Simple-talking database development: Let the
end-user design a relational schema by using simple words. Computers in Human
Behavior, 48(1), 273–289.
70. Peckham, J., & Maryanski, F., (1988). Semantic data models. ACM Computing
Surveys (CSUR), 20(3), 153–189.
71. Petkovic, M., & Jonker, W., (2000). An overview of data models and query languages
for content-based video retrieval. In: International Conference on Advances in
Infrastructure for E-Business, Science, and Education on the Internet, (Vol. 1, pp.
2–4).
72. Premerlani, W. J., & Blaha, M. R., (1993). An approach for reverse engineering
of relational databases. In: [1993] Proceedings Working Conference on Reverse
Engineering (Vol. 1, pp. 151–160). IEEE.

CHAPTER
4
160 Fundamentals of Database Systems

73. Qian, X., & Lunt, T. F., (1996). A MAC policy framework for multilevel relational
databases. IEEE Transactions on Knowledge and Data Engineering, 8(1), 3–15.
74. Ravald, A., & Grönroos, C., (1996). The value concept and relationship marketing.
European Journal of Marketing, 30(2), 19–30.
75. Reiter, R., (1989). Towards a logical reconstruction of relational database theory.
In: Readings in Artificial Intelligence and Databases (Vol. 1, pp. 301–327). Morgan
Kaufmann.
76. Sellitto, C., Burgess, S., & Hawking, P., (2007). Information quality attributes associated
with RFID-derived benefits in the retail supply chain. International Journal of Retail
& Distribution Management, 35(1), 69–87.
77. Sever, I., Verbič, M., & Klaric, S. E., (2020). Estimating attribute-specific willingness-
to-pay values from a health care contingent valuation study: A best–worst choice
approach. Applied Health Economics and Health Policy, 18(1), 97–107.
78. Sezer, O. B., Dogdu, E., & Ozbayoglu, A. M., (2017). Context-aware computing,
learning, and big data in internet of things: A survey. IEEE Internet of Things Journal,
5(1), 1–27.
79. Sheth, A. P., & Larson, J. A., (1990). Federated database systems for managing
distributed, heterogeneous, and autonomous databases. ACM Computing Surveys
(CSUR), 22(3), 183–236.
80. Sibley, C. G., (2007). The association between working models of attachment and
personality: Toward an integrative framework operationalizing global relational models.
Journal of Research in Personality, 41(1), 90–109.
81. Sibley, E. H., & Kerschberg, L., (1977). Data architecture and data model considerations.
In: Proceedings of the June 13–16, 1977, National Computer Conference (Vol. 1,
pp. 85–96).
82. Solomon, D. H., & Brisini, K. S. C., (2019). Relational uncertainty and interdependence
processes in marriage: A test of relational turbulence theory. Journal of Social and
Personal Relationships, 36(8), 2416–2436.
83. Sommestad, T., Ekstedt, M., & Johnson, P., (2009). Cyber security risks assessment
with Bayesian defense graphs and architectural models. In: 2009 42nd Hawaii
International Conference on System Sciences (Vol. 1, pp. 1–10). IEEE.
84. Soylu, A., & De Causmaecker, P., (2009). Merging model driven and ontology
driven system development approaches pervasive computing perspective. In: 2009
24th International Symposium on Computer and Information Sciences (Vol. 1, pp.
730–735). IEEE.
85. Tan, X., Hammad, A., & Fazio, P., (2010). Automated code compliance checking for
building envelope design. Journal of Computing in Civil Engineering, 24(2), 203–211.
86. Teorey, T. J., Lightstone, S. S., Nadeau, T., & Jagadish, H. V., (2011). Database
Modeling and Design: Logical Design (Vol. 3, No. 1, pp. 4–9). Elsevier.
87. Thion, R., & Coulondre, S., (2012). A relational database integrity framework for
CHAPTER
4
Database Modeling 161

access control policies. Journal of Intelligent Information Systems, 38, 131–159.


88. Van, W. K., Biljecki, F., & Van, D. S. S., (2016). Automatic update of road attributes
by mining GPS tracks. Transactions in GIS, 20(5), 664–683.
89. Van, Z. M., (2015). Collective action as relational interaction: A new relational
hypothesis on how non-activists become activists. New Ideas in Psychology, 39(1),
1–11.
90. Waguespack, L. J., & Waguespack, L. J., (2010). Promoting life using the relational
paradigm. Thriving Systems Theory and Metaphor-Driven Modeling, 1, 139–151.
91. Wang, R. Y., Ziad, M., & Lee, Y. W., (2006). Data Quality (Vol. 23, pp. 6–19).
Springer Science & Business Media.
92. Wang, S., & Ariguzo, G., (2004). Knowledge management through the development
of information schema. Information & Management, 41(4), 445–456.
93. Wilson, D. R., & Martinez, T. R., (1996). Instance-based learning with genetically
derived attribute weights. In: Proceedings of the International Conference on Artificial
Intelligence, Expert Systems, and Neural Networks (Vol. 1, pp. 11–14).
94. Worboys, M. F., Hearnshaw, H. M., & Maguire, D. J., (1990). Object-oriented data
modeling for spatial databases. International Journal of Geographical Information
System, 4(4), 369–383.
95. Xu, L., Fu, P., Xi, Y., Zhang, L., Zhao, X., Cao, C., & Ge, J., (2014). Adding
dynamics to a static theory: How leader traits evolve and how they are expressed.
The Leadership Quarterly, 25(6), 1095–1119.
96. Yu, K., Froese, T., & Grobler, F., (2000). A development framework for data models
for computer-integrated facilities management. Automation in Construction, 9(2),
145–167.
97. Zhang, R., Indulska, M., & Sadiq, S., (2019). Discovering data quality problems: The
case of repurposed data. Business & Information Systems Engineering, 61, 575–593.
98. Zhao, L., & Roberts, S. A., (1988). An object-oriented data model for database
modeling, implementation and access. The Computer Journal, 31(2), 116–124.
99. Zhou, L., Burgoon, J. K., Twitchell, D. P., Qin, T., & Nunamaker, Jr. J. F., (2004). A
comparison of classification methods for predicting deception in computer-mediated
communication. Journal of Management Information Systems, 20(4), 139–166.
100. Zhuge, H., & Xing, Y., (2005). Integrity theory for resource space model and its
application. In: Advances in Web-Age Information Management: 6th International
Conference, WAIM 2005, Hangzhou, China, October 11–13, 2005, Proceedings 6
(Vol. 1, pp. 8–24). Springer Berlin Heidelberg.

CHAPTER
4
CHAPTER 5

RELATIONAL DATABASE
AND SQL

UNIT INTRODUCTION
In today's increasingly data-driven world, the effective and efficient
management of data is essential to the success of any organization. This is
where relational databases come into play: they serve as the foundation for
many enterprise systems and applications, storing and processing massive
amounts of data (Deari et al., 2018). This chapter covers the principles of
relational databases, including their structure, design, and implementation,
and how they are used to manage and manipulate data.
In addition, we will go over SQL, which stands for Structured Query Language,
the standard language for interacting with relational databases. Since SQL is
a powerful tool that enables users to create, change, and query databases,
knowing how to use it is a must for everyone who works with data. This chapter
provides an introduction to SQL, focusing on the language's syntax, functions,
and capabilities.
At the end of this chapter, you will have a solid understanding of the operation of
relational databases, as well as how to construct and deploy such databases, and how to
use SQL to manage and query data. This material will be essential for everyone who works
with data in any capacity, be they software engineers, data scientists, or data analysts.

Learning Objectives
At the end of this chapter, readers will understand:
• Concepts of relational databases;
• Basics of hierarchical and network databases;
• Various components of a relational database;
• An overview of SQL and the practice of SQL commands.

Key Terms
• Command
• Datatype
• Field
• Microsoft Access
• Oracle
• Query
• Record
• Row
• Server
• Table


5.1. RELATIONAL DATABASE CONCEPTS


Relational databases have existed for more than 30 years, although they
are neither the oldest nor the most recent type of database (Kroenke
et al., 2010). The past few years have seen the development of XML
as well as object-oriented data structures. Despite this, relational
databases continue to be the most widely used type of database, and
this trend is expected to continue for a while (Figure 5.1).

Figure 5.1. An example of the relational database.

Source: Pragimtech, Creative Commons License.

Before we move to the specifics of SQL (Structured Query Language), it is
necessary to have a solid grasp of the nomenclature and history of
relational databases. Early computerized database management systems were
unlike the relational database systems we see in products such as Oracle,
MySQL, and SQL Server. In the 1970s, only two data models were widely
used: network and hierarchical.

Did you Know?
Relational databases use tables, rows, and columns to store and organize
data, and enforce data integrity and consistency through relationships
and constraints.

5.2. HIERARCHICAL DATABASES

The hierarchical model is one of the oldest database models, dating
back to the early 1960s. IBM's IMS, which is still in use, was the most
popular hierarchical database management system (DBMS). Data in the
hierarchical model is grouped in parent-child links (Silberschatz
& Kedem, 1980). Think of this relationship as an upside-down tree;
Figure 5.2 shows an example.
The major point to observe in this model is that a child element has
exactly one parent, whereas a parent element can have numerous
children. This type of relationship is referred to as a one-to-many
relationship in the lingo of database design (Tsichritzis &
Lochovsky, 1976). One department, for example, may have a large
number of staff. If you have trouble picturing this structure, use
Windows Explorer to examine your PC's file structure: it is organized
hierarchically. To reach any area of this structure, you must travel
down the tree until you arrive at the desired area. The hierarchical
paradigm has flaws; one important disadvantage is data redundancy. This
difficulty arises because the hierarchical model is excellent at handling
one-to-many relationships but fails miserably with many-to-many
relationships, resulting in data duplication (Domdouzis et al., 2021).
For instance, if a hierarchical chart were to portray an employee
who reported to more than one supervisor, the information about that
employee would need to be duplicated.
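
The duplication problem described above can be sketched in a few lines of
Python. This is only an illustration: the supervisor and employee names are
hypothetical, and the nested dictionary merely stands in for a hierarchical
store. It is not how IMS or any real hierarchical DBMS represents data.

```python
# A hierarchical store modeled as a tree: each child record hangs under
# exactly one parent. All names and numbers are hypothetical.
org_tree = {
    "Supervisor A": [{"name": "Lee", "phone": "555-0100"}],
    "Supervisor B": [{"name": "Lee", "phone": "555-0100"}],  # same person, stored again
}


def copies_of(tree, employee_name):
    """Count how many separate copies of an employee record the tree holds."""
    return sum(
        1
        for children in tree.values()
        for child in children
        if child["name"] == employee_name
    )


# Lee reports to two supervisors, so the record exists twice.
lee_copies = copies_of(org_tree, "Lee")

# Updating only one copy leaves the other stale: the redundancy hazard.
org_tree["Supervisor A"][0]["phone"] = "555-0199"
phones = {child["phone"] for kids in org_tree.values() for child in kids}
print(lee_copies, sorted(phones))
```

After the update, the two copies of the same employee disagree, which is
exactly the kind of inconsistency that data redundancy invites.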

Figure 5.2. An example of the hierarchical database.

Source: Codelack, Creative Commons License.

5.3. NETWORK DATABASES


Because the hierarchical paradigm presented several challenges,
the network model was developed as a solution. Using set theory
instead of a hierarchy to represent data solves the major redundancy
issue. The network model closely resembles the hierarchical model,
except that a child element can have more than one parent
(Papadias et al., 2003). Hence, the essential notion behind the
network model is that a child can have many parents and a parent can
have many children. This makes it possible to depict many-to-many
relationships. For example, one student may register for multiple
classes at the same time, and a large number of students may enroll
in a single class. A further difference between the two models is
that the network model lets you navigate the structure without having
to begin at the root element (Figure 5.3).
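
The student and class example can be sketched as follows. This is a minimal
illustration in Python with hypothetical names; the two dictionaries of sets
only imitate the idea of linked sets and are not the storage format of any
actual network DBMS.

```python
from collections import defaultdict

# Hypothetical enrollments: (student, class) pairs.
enrollments = [
    ("Ana", "Databases"),
    ("Ana", "Networks"),
    ("Ben", "Databases"),
]

# Build links in both directions so that either side can be the starting point.
classes_of = defaultdict(set)
students_in = defaultdict(set)
for student, course in enrollments:
    classes_of[student].add(course)
    students_in[course].add(student)

# Start at a student (no root element is required) ...
ana_classes = sorted(classes_of["Ana"])
# ... or start at a class and walk back to its students.
db_students = sorted(students_in["Databases"])
print(ana_classes, db_students)
```

Each student appears once and each class appears once; only the links between
them multiply, so no record has to be duplicated.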


Figure 5.3. An example of the network data model.

Source: John Mariani, Creative Commons License.

However, hierarchical and network models both come with limitations.
A major drawback is that their indexing mechanism is tied to the
sector scheme of the hard disk on which they are stored, which can
cause chaos if the hard drive is ever reformatted. Additionally,
querying data from numerous tables is cumbersome. Querying a database
to retrieve data entails writing software that navigates the database
structure to reach the desired data (de Almeida & Güting, 2005). These
programs use proprietary, procedural languages that demand a high level
of skill and expertise. Any further description of these two models is
outside the scope of this work.

Remember
Network databases are a type of database management system that allow
for complex relationships to be modeled between data entities,
facilitating the management of large and complex data sets.

5.4. COMPONENTS OF RELATIONAL DATABASE

Dr. E. F. Codd, an IBM researcher, proposed a better method
in 1970: the relational data model. In this model, the DBMS keeps track
of all table relationships, independent of hardware or external
programming languages. The user has to understand only the conceptual
structure of the data in the relational model, not how it is stored
physically. Data is represented in this model as simple two-dimensional
tables (relations) with columns and rows. A collection of tables is
called a relational database (Zanzig & Tsay, 2004). We will use part of
the Lyric Music database to demonstrate these relational concepts.

5.4.1. Table
The table is the fundamental unit of a relational database. A table
can be thought of as columns and rows of information, similar to a
spreadsheet. A relational database comprises at least one (and
usually more) tables. Each table contains information about a
particular topic, such as staff, products for sale, or orders. The
Lyric Music database has tables for titles and artists, among other
things.

KEYWORD
Spreadsheet is a computer application for computation, organization,
analysis and storage of data in tabular form.

The “relational” aspect of relational databases refers to the
relationships between the various tables, which enable users to
access data from several tables at once (Kanellakis, 1990). For
example, in the Lyric Music database there is a relationship between
the Artists table and the Titles table (the CD titles). Using the
ArtistID column in Titles, one can look up the artist's name and other
details for every CD title.
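
A minimal sketch of this lookup, using Python's built-in sqlite3 module
rather than any of the DBMSs discussed in this chapter. Only the table names
Artists and Titles and the ArtistID column come from the text; the other
column names (ArtistName, TitleID, Title) and all sample rows are assumptions
made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Artists (ArtistID INTEGER PRIMARY KEY, ArtistName TEXT);
    CREATE TABLE Titles  (TitleID INTEGER PRIMARY KEY, ArtistID INTEGER, Title TEXT);
    INSERT INTO Artists VALUES (1, 'Artist One'), (2, 'Artist Two');
    INSERT INTO Titles  VALUES (10, 1, 'First CD'), (11, 2, 'Second CD'), (12, 1, 'Third CD');
""")

# Follow the ArtistID column in Titles to find the artist for every CD title.
rows = conn.execute("""
    SELECT Titles.Title, Artists.ArtistName
    FROM Titles
    JOIN Artists ON Artists.ArtistID = Titles.ArtistID
    ORDER BY Titles.TitleID
""").fetchall()
print(rows)
```

Each artist is stored once in Artists, yet the join can attach that artist to
any number of titles.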

5.4.2. Record/Row
A record contains all the details about one thing or subject. By
analogy, if you had business cards from 50 people, the stack of cards
would represent a table, and the data on each card would stand in
for a single record. Every row in a database table represents a record.
For example, the seven rows in the Titles table correspond to the seven
CD titles Lyric Music offers (Atzeni & De Antonellis, 1993). Some
database specialists prefer the term row, because record sometimes has
additional meanings outside relational databases.

5.4.3. Field/Column
A field stores one piece of information about a subject or thing. Some
database experts prefer the word “column” alone, because a field
in a database table is a column. The Artists table, for example, has
fields such as City, WebAddress, and LeadSource. Matching fields in
two tables help maintain relationships in relational databases
(Colombera et al., 2012). For instance, the ArtistID field is shared
by the Artists table and the Titles table, enabling data to be fetched
from both tables simultaneously.

5.4.4. Datatype
A datatype is assigned to every field in a database table, describing
the kind of data that can be placed there. Datatypes are important
to understand since they greatly influence how you handle data
in SQL (Chang & Fu, 2005) (Table 5.1).


Table 5.1. Tabular Data about Various Datatypes

Generic datatype: Text or char
    Access: Text, Memo
    SQL Server: Char, VarChar, NChar, NVarChar
    Oracle: Char, NChar, VarChar2, Long, CLob
    MySQL: Char, VarChar, TinyText, Text, MediumText, LongText, Set
    Description: Contains alphanumeric data (letters and numbers). Used
    for any field that contains letters and, likewise, for numeric values
    that do not require calculations, such as phone numbers and postal
    codes. Some of these datatypes can carry 8-bit or 16-bit values, be
    padded with spaces or not, and have other properties.

Generic datatype: Numeric
    Access: Byte, Integer, Long Integer, Single, Double, Decimal
    SQL Server: TinyInt, SmallInt, Integer, Decimal, Float, Real
    Oracle: Byte, SmallInt, Integer, Number, Float, Real
    MySQL: TinyInt, SmallInt, MediumInt, Int, BigInt
    Description: Keeps numerical data. Some of these datatypes can handle
    larger or smaller numbers than others, and whole numbers or fractions.

Generic datatype: Currency
    Access: Currency
    SQL Server: SmallMoney, Money
    Oracle: Money
    MySQL: Decimal
    Description: Used for money fields.

Generic datatype: Date
    Access: DateTime
    SQL Server: SmallDateTime, DateTime
    Oracle: Date
    MySQL: Date, Time, DateTime
    Description: Holds date and time data.

Generic datatype: Boolean or true/false
    Access: Yes/No
    SQL Server: Bit
    Oracle: Bit
    MySQL: TinyInt, Enum
    Description: Contains just two values: Yes/No, On/Off, True/False,
    and similar phrases. In many cases, this datatype is displayed as a
    checkbox. When it comes to saving data in Yes/No fields, the industry
    practice is to use zero for No/False and one for Yes/True (-1 in the
    case of Microsoft Access). For some samples, take a look at the MP3
    and RealAudid fields in the Tracks table.

Generic datatype: Special
    Access: OLE Object, Hyperlink
    SQL Server: Image, Binary
    Oracle: Blob, Raw, Long Raw
    MySQL: TinyBlob, Blob, MediumBlob, LongBlob
    Description: Used to store unique data types such as pictures, music,
    hyperlinks, etc.

Source: Carson, Creative Commons License.

A variety of datatypes is available, depending on the DBMS that you
are using. Table 5.1 lists the generic datatypes featured in the
majority of database systems, together with the equivalent specific
datatypes in each of four well-known database systems. The list isn't
comprehensive, because datatypes differ from one database system
version to another.
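
To see why the choice of datatype influences how data is handled, consider
the following sketch. It uses Python's built-in sqlite3 module, which is not
one of the four systems in Table 5.1; the table name Samples and its columns
are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Samples (AsText TEXT, AsNumber INTEGER);
    INSERT INTO Samples VALUES ('9', 9), ('10', 10), ('100', 100);
""")

# Text values sort character by character, so '10' and '100' come before '9'.
text_order = [r[0] for r in conn.execute("SELECT AsText FROM Samples ORDER BY AsText")]

# Numeric values sort by magnitude.
number_order = [r[0] for r in conn.execute("SELECT AsNumber FROM Samples ORDER BY AsNumber")]

# Arithmetic is natural on the numeric column.
total = conn.execute("SELECT SUM(AsNumber) FROM Samples").fetchone()[0]
print(text_order, number_order, total)
```

The same three values behave differently depending on the declared datatype,
which is why values that need calculations belong in numeric fields, while
phone numbers and postal codes can stay in text fields.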

5.4.5. Query/View
To display data from a database, you run a query. Queries can be created
with either query by example (QBE), a graphical interface for building
queries, or Structured Query Language (SQL), a query language. SQL is
the more versatile and powerful of the two.
Queries can report any set of rows from a table, any combination of
columns from a table, and even data from many tables by way of table
relationships. Queries can calculate and summarize information.
Besides choosing information to display, queries can insert, edit, and
delete data.

KEYWORD
Programming code is the instructions given to a machine to create a
computer program.

In some databases, you can even save queries for later use. A saved
query is known as a view in advanced databases. In Internet
programming, views and queries are incredibly potent tools. A view can
reduce a complex multi-table query to something extremely basic,
simplifying the programming code.
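
A saved query of this kind might look like the sketch below, which uses
Python's built-in sqlite3 module. The view name TitleArtists, the columns
ArtistName, TitleID, and Title, and the sample rows are all assumptions made
for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Artists (ArtistID INTEGER PRIMARY KEY, ArtistName TEXT);
    CREATE TABLE Titles  (TitleID INTEGER PRIMARY KEY, ArtistID INTEGER, Title TEXT);
    INSERT INTO Artists VALUES (1, 'Artist One'), (2, 'Artist Two');
    INSERT INTO Titles  VALUES (10, 1, 'First CD'), (11, 2, 'Second CD');

    -- Save the multi-table query under a name so it can be reused.
    CREATE VIEW TitleArtists AS
        SELECT Titles.Title, Artists.ArtistName
        FROM Titles JOIN Artists ON Artists.ArtistID = Titles.ArtistID;
""")

# Callers now query the view as if it were a single simple table.
view_rows = conn.execute(
    "SELECT Title, ArtistName FROM TitleArtists ORDER BY Title"
).fetchall()
print(view_rows)
```

The application code no longer needs to know how the two tables are joined;
it only needs the name of the view.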

5.4.6. Stored Procedure


A stored procedure is a high-end database technique that embeds
programming capabilities directly in the database. Database
administrators (DBAs) frequently build stored procedures to handle
record inserts, revisions, and updates (Date, 2006). To perform
these tasks, the front-end programmer then merely needs to invoke
the stored procedure. This simplifies the programming code and helps
protect the database from problems caused by programming errors.

5.5. OVERVIEW OF SQL


SQL, also known as Structured Query Language, is the language used to
build relational databases and to manipulate the data in them.
Depending on who you speak to, it is pronounced as a word
(Seequel) or as a string of letters (Ess-Que-Ell).
SQL is an open standard database language supported by ANSI
(American National Standards Institute). SQL is now the preferred
language for designing, querying, and updating relational databases.
Numerous databases, including IBM DB2, Oracle, MySQL, Microsoft
Access, Microsoft SQL Server, Sybase, Lotus Approach, and more,
support SQL. SQL is supported by over 100 database administration
tools running on everything from mainframes to PCs (Kießling et al.,
2011). Most systems also include a QBE visual user interface. However,
whereas QBE varies considerably among database systems, SQL is an
ANSI standard and differs only slightly. Moreover, on many database
systems, some queries cannot be executed using QBE. SQL expertise is
therefore necessary and transferable for programmers and DBAs alike
(Vrhovnik et al., 2008). Web programming, client-server programming,
and many other settings frequently use SQL.

5.5.1. Working of SQL


Learn and apply SQL efficiently to manage relational databases and
perform various operations such as querying, updating, and deleting
data.

First and foremost, SQL is a powerful tool for retrieving and displaying
data from a relational database. It does not simply provide a data dump.
SQL provides powerful capabilities for summarizing, consolidating, and
calculating data (Tasevski & Jakimoski, 2020). Data from different
tables can be merged in various ways using table relationships. With a
well-constructed database, SQL can answer almost any question about
the data.
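
As a small illustration of summarizing and calculating, the sketch below
counts the tracks and totals the playing time for each title in one
statement. It uses Python's built-in sqlite3 module; the columns TitleID,
TrackNum, and LengthSeconds and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tracks (TitleID INTEGER, TrackNum INTEGER, LengthSeconds INTEGER);
    INSERT INTO Tracks VALUES (10, 1, 200), (10, 2, 250), (11, 1, 180);
""")

# One statement counts the tracks and totals their length for each title.
summary = conn.execute("""
    SELECT TitleID, COUNT(*) AS TrackCount, SUM(LengthSeconds) AS TotalSeconds
    FROM Tracks
    GROUP BY TitleID
    ORDER BY TitleID
""").fetchall()
print(summary)
```

The result has one row per title rather than one row per track, which is the
difference between a summary and a data dump.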
Secondly, SQL commands are used to manipulate the data stored in a
relational database. Records in a table can be added, changed, or
deleted. SQL shines as a database language here. With procedural
programming languages like BASIC, updating a single record in a
database table may take several lines of code. In addition, to carry
out the operation on every record, a procedural language needs a
looping structure. SQL can operate on a huge number of records all at
once. SQL is analogous to haiku for programmers, in that removing or
updating thousands of records can frequently be accomplished with a
dozen or fewer words (Mattos et al., 1990).
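
The contrast with a record-by-record loop can be sketched as follows, using
Python's built-in sqlite3 module. The Members table, its columns, and the
1,000 generated rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Members (MemberID INTEGER PRIMARY KEY, Region TEXT, Active INTEGER)"
)
# Load 1,000 hypothetical records, half in each region.
conn.executemany(
    "INSERT INTO Members (Region, Active) VALUES (?, ?)",
    [("East" if i % 2 else "West", 1) for i in range(1000)],
)

# One statement, no loop: every matching record is changed at once.
cursor = conn.execute("UPDATE Members SET Active = 0 WHERE Region = 'East'")
updated = cursor.rowcount

still_active = conn.execute(
    "SELECT COUNT(*) FROM Members WHERE Active = 1"
).fetchone()[0]
print(updated, still_active)
```

The UPDATE statement is ten words long, and it would stay ten words long if
the table held a million records instead of a thousand.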
Lastly, SQL is a complete data definition language (DDL). The database
itself, as well as all its tables, fields, primary keys, and
relationships, can be created with SQL. When you include the commands
for inserting records, you have a complete database, with all of its
data, expressed in a programming language. This considerably improves
the database programmer's ability to work remotely and to transfer
database upgrades between installations.
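
The idea of a whole database expressed as SQL text can be sketched with
Python's built-in sqlite3 module, whose connections offer an `iterdump()`
method that yields the statements needed to rebuild a database. The Artists
table and its rows are hypothetical, and other DBMSs provide their own export
tools for the same purpose.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE Artists (ArtistID INTEGER PRIMARY KEY, ArtistName TEXT);
    INSERT INTO Artists VALUES (1, 'Artist One'), (2, 'Artist Two');
""")

# Express the whole database (structure and records) as SQL text.
script = "\n".join(source.iterdump())

# Replay the script in another installation to recreate the same database.
copy = sqlite3.connect(":memory:")
copy.executescript(script)
copied_rows = copy.execute(
    "SELECT ArtistID, ArtistName FROM Artists ORDER BY ArtistID"
).fetchall()
print(copied_rows)
```

Because the script is plain text, it can be reviewed, stored, and sent to a
remote installation like any other source file.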

5.5.2. History of SQL


History isn't everyone's cup of tea, so we will keep this brief.
To comprehend the current influence of SQL, however, it is helpful
to understand how it came into being.
The aim behind the relational model was that a DBMS would keep track of
every table relationship independently of specific programming
languages or hardware (Emerson et al., 1989). Thus, a language for
querying data from several tables could be considerably higher level
and easier to use. Building on the work of Dr. E. F. Codd, researchers
at IBM named such a language Structured English Query Language (or
SEQUEL). Subsequently, the name was shortened to Structured Query
Language (SQL).
Codd's ideas immediately received attention. Numerous businesses began
developing DBMSs to apply the relational notion. In the late 1970s, the
first relational systems to appear were Oracle, IBM's System/R, and
Relational Technology's Ingres. To manipulate the data and the database
architecture, they all used a variation of SQL. SQL swiftly gained
traction due to IBM's market dominance, and IBM's version of SQL became
the de facto standard (Fang et al., 2008).

KEYWORD
Database architecture involves the application of programming
languages to design software. It mainly involves the design,
implementation, development, and maintenance of the computer programs
that store and manage data for businesses.

The computer industry felt compelled to unify all SQL versions
into a universal standard. This project was led by ANSI, one of the
major standards groups in the United States, representing thousands
of organizations, corporations, and government agencies. ANSI published
the first SQL standard in 1986 and revised it as SQL89 in 1989. Since
then, the SQL standard has undergone further major revisions, including
SQL92 (in 1992) and SQL99 (in 1999).
Because SQL is an open standard, SQL programmers can
transfer their skills to any DBMS. It also enables individual databases
to be converted to other DBMSs with modest SQL code changes
(Larson et al., 2013).

5.5.3. Standard of SQL


Most DBMSs adhere to SQL89, and several adhere to the SQL92
requirements. Hence, SQL written for one DBMS will typically
run on another with minimal or no modifications. However, each DBMS
vendor wants to distinguish its products by incorporating
more robust capabilities (Vats & Saha, 2019). In addition, the SQL
standard leaves functional gaps, which each vendor fills by defining
its own way of performing tasks that programmers typically want to do
with data. As a result, DBMS manufacturers routinely diverge
from the SQL standard in a variety of ways, both minor and major,
and for a variety of reasons.
Every SQL implementation is a little bit different. In this book,
we'll stick close to ANSI-SQL while noting variances across DBMS
products. Many of these extensions are quite handy. Furthermore,
because they are tuned for a specific DBMS, many SQL extensions perform
faster than ANSI-SQL. Nonetheless, programmers should proceed
with caution when using DBMS-specific SQL extensions.
Assume you're developing a database-driven web application. You
do some research and find a web hosting company that provides
SQL Server as the back-end database. After you have used its services
for a year, the hosting company goes out of business, and you switch
to a new company that doesn't offer SQL Server but does offer MySQL
databases. The more SQL Server-specific functions your code contains,
the more work it takes to get your application up and running on the
new host.

KEYWORD
Back-end database is a database that is accessed by users indirectly
through an external application rather than by application programming
stored within the database itself or by low level manipulation of the
data.

On the other hand, assume that your organization purchases Oracle
to serve as the back-end database for a company intranet. The
organization will almost certainly continue to use Oracle as its
database, so it may be worthwhile to squeeze more performance out of
your code by employing Oracle-specific functions.
This book will adhere to ANSI standard SQL as closely as
feasible, using database-specific code only when necessary to
complete typical tasks. As you read the book, it will become
clear that it is probably impossible to write SQL that is completely
portable across various DBMS products. However, remember that
the more you deviate from ANSI-SQL, the more portability you lose.

5.5.4. Importance of SQL Today and Tomorrow


Hierarchical and network database systems came first, and novel,
non-relational data concepts (including extensible markup language
(XML) and object-oriented programming (OOP)) have since been
introduced, yet the relational database remains king, and SQL is the
language of choice for relational data (Foster et al., 2016). In
actuality, XML is being incorporated into SQL. To construct XML views
of relational data, Microsoft's SQL Server 2000 supports extensions to
SQL called SQLXML. The W3C (World Wide Web Consortium) XML data model
has been fully integrated into the DBMS of Oracle9i Release 2, which
offers access mechanisms for traversing and querying XML (Melton
& Mattos, 1995). Hence, all signs indicate that relational databases
and SQL will continue to be crucial tools in the future.
SQL is a stand-alone language that administrators and back-end
developers use to build and maintain databases (Mitrovic, 1998).
Front-end application programmers also use SQL as embedded
instructions for connecting to databases. This is common practice
in client-server programming as well as in many web programming
languages and frameworks, such as ASP, PHP, JSP, and Java. Even for
people who are not programmers, SQL is a valuable skill (Dumler, 2005).
Many managers must query a company database to obtain crucial
information. Although many different user interfaces have been created
for that purpose, almost all use SQL syntax or principles.

KEYWORD
User interface is the point of human-computer interaction and
communication in a device.

5.6. THE PRACTICE OF SQL COMMANDS

Practice is a must for learning SQL. Which software applications can
you use to practice the SQL commands in this book? You have
multiple choices.

5.6.1. Microsoft Access


Microsoft Access is part of Microsoft Office. It is a capable database
system for individuals and small teams of up to 25 users (Kemalis
& Tzouramanis, 2008). Moreover, Access performs admirably as a
database back end for database-driven websites with modest traffic.
Direct SQL command entry is possible in Microsoft Access
if you have version 97 or higher of the program (Antunes et al.,
2009). From the CD-ROM, copy lyric2k.mdb (Access 2000 or
higher) or lyric97.mdb (Access 97). You might need to uncheck the
read-only property of the file. Open the file in Access and modify it
as needed. Then take the actions listed in Table 5.2 (Ali et al., 2011).
Table 5.2. Comparison of Access 97 and Access 2000+

Access 97:
1. Select Queries.
2. Click on New.
3. Click on Design View and OK.
4. Hit the Close button located in the Show Tables dialogue before
   clicking the SQL button located in the toolbar.

Access 2000+:
1. Select Queries.
2. Double-click the option that says, “Create query in Design view.”
3. Click the Close button in the Show Tables dialogue box.
4. Click on the SQL button in the toolbar.

Source: Oracle, Creative Commons License.



Data description commands:
SQL commands are not used in Access to view tables and table
data. To access tables, click the Tables tab in the database window.
Choose the table whose information you want to see, then click
the Design button.

5.6.2. SQL Server

KEYWORD
SQL server is a proprietary relational database management system
developed by Microsoft.

Microsoft's premium client-server database is called SQL Server.
Do the following to add the Lyric Music database to SQL Server:
1. Launch SQL Server Query Analyzer, then log in using the master
   SA (system administrator) login or the login supplied by your
   database administrator (Su & Wassermann, 2006).
2. Choose File | Open from the menu. With the book's CD-ROM in your
   CD-ROM drive, go to that drive and pick LyricSQLServer.sql
   (Morgan, 2006).
3. If you don't have the necessary permissions to create a
   database, remove the first four lines of the loaded script file.
4. To execute the script and build the Lyric Music database,
   choose Query | Execute from the menu (Rankins et al.,
   2014).
To practice your SQL, do the following:
• Open SQL Server Query Analyzer and log in with the master
SA login or the login your database administrator provides
(Atchariyachanvanich et al., 2019).
• In the top pane, type the SQL statement.
• Run the SQL by clicking the green triangle symbol in the toolbar
or choosing Query | Execute from the menu. The bottom
pane will display the results (Julavanich et al., 2019).
Data description commands:
• To display a complete list of every user table in your
SQL Server database, enter: select name from
sysobjects where type = 'U' (Boyd & Keromytis, 2004).
• To list the fields, data types, and associated
information for a specific table, type: exec sp_help
tablename (Beaulieu, 2009).


5.6.3. MySQL

KEYWORD
Linux server consists of Linux, a family of free, open source
software operating systems built around the Linux kernel.

MySQL is a free client-server database available at
http://www.mysql.com. MySQL is used by many websites, particularly
those hosted on Linux servers (Codd, 2007), but it is also compatible
with Windows. Do the following to load the Lyric Music database:
• Launch a command prompt (Start | All Programs |
Accessories | Command Prompt, depending on your
Windows version) (Henderson, 2000).
• Once you have access to a command line, navigate to
the directory in which the MySQL programs you need to
use have been installed (Steinberg, 2009). In Windows,
this is typically done with: cd \mysql\bin.
• Insert the CD-ROM from the book into your CD-ROM drive,
then type: mysql mysql < d:LyricMySQL.sql. Change the letter
d: to the letter of your CD-ROM drive (Litwin et al., 2006).
To practice your SQL, do the following:
• Start MySQLManager. (You will find this in the /mysql/bin
folder. Creating a desktop or Start menu shortcut is a
smart move. This is the software where you can enter
SQL.)
• Choose Tools | SQL Query from the corresponding drop-down
menu (Rankins et al., 2014).
• To display a list of your databases, click the yellow cylinder
database icon, then choose Lyric.
• Click on the Query tab and enter Use, followed by your
Lyric Music database name (i.e., Use Lyric).
• Click the green triangle run icon. The display will
switch to the Results tab, but no results will be
presented at this time (Melton, 1996).
• Click the Query tab again, then enter your SQL statement.
To examine your results, click the green triangle run icon
(Halfond & Orso, 2007).
Data description commands:
• Type show tables into your MySQL database to see a
complete listing of all user tables (Halder & Cortesi, 2010).
• To list all of the fields and data types contained within a
specific table, type: explain tablename.


5.6.4. Oracle

KEYWORD
Oracle is an object-oriented relational database distributed by the
Oracle Corporation.

Oracle is a widely used client-server database system in the current
market. If you can access an Oracle database, the steps below load
the Lyric Music database.
• Open Oracle SQL Plus and log in with the master system user
name or the login provided to you by the database
administrator (Yeole & Meshram, 2011).
• Insert the CD-ROM for the book into your CD-ROM drive.
• At the SQL prompt, type @ followed by your CD-ROM drive's
drive letter, a colon (:), and LyricOracle.sql (e.g.,
@d:LyricOracle.sql).
To practice your SQL, do the following:
• Open Oracle SQL Plus and log in with the master system user
name or the login provided to you by the database
administrator (Seyed-Abbassi, 1993).
• Type your SQL command at the SQL prompt. If you'd like,
you can split the command across several lines. Add a
semicolon (;) to the last line to carry out the command.
Data description commands:
• Type "select table_name from user_tables" into your Oracle
database to see a list of all the user tables there.
• To list all of the fields and data types contained in a specific
table, type: describe tablename (Shi, 2010).
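
The data description commands above differ by product, but the idea is the same in each: ask the database catalog which tables exist and which columns a table has. As a rough, runnable analogue, the sketch below does this with SQLite through Python's built-in sqlite3 module. SQLite is not one of the products covered in this section, and the Artists and Titles tables are hypothetical stand-ins rather than the schema on the book's CD-ROM.

```python
import sqlite3

# Hypothetical stand-in for the Lyric Music database; these table and
# column names are assumptions, not the schema shipped with the book.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Artists (ArtistID INTEGER PRIMARY KEY, ArtistName TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE Titles (TitleID INTEGER PRIMARY KEY, ArtistID INTEGER, Title TEXT)"
)


def list_user_tables(connection):
    """Return the names of all user tables (the role of 'show tables' in MySQL)."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def describe_table(connection, table_name):
    """Return (column name, declared type) pairs for one table.

    PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    The table name is interpolated directly, so pass only trusted names.
    """
    rows = connection.execute(f"PRAGMA table_info({table_name})").fetchall()
    return [(row[1], row[2]) for row in rows]


print(list_user_tables(conn))
print(describe_table(conn, "Artists"))
```

The two helper functions play the roles of the "list user tables" and "describe a table" commands shown for SQL Server, MySQL, and Oracle.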

ACTIVITY 5.1.
You work for a financial institution that has a relational database containing customer
account data. Describe how you would use SQL to query the database and retrieve
specific information, such as account balances, transaction histories, and customer
demographics.


5.7. SUMMARY
Even though there are other kinds of data formats (such as object-oriented databases (OODBs)
and XML), relational databases remain the foundation for the vast majority of applications
in use today. Relational databases store their data in separate tables that are linked
to one another through primary key-foreign key relationships.
A primary key is a set of fields that uniquely identifies each row (or record) in a
table. It can be a single field or a group of fields. Every field, or column,
in a database table has an assigned datatype, which determines the types of data that
can be stored in that column. Datatypes are classified as numeric, text,
currency, Boolean, date, and special datatypes. Although they vary from database to
database, they can generally be broken down into these categories.
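
The primary key-foreign key linkage described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the customers and accounts tables, their columns, and the sample rows are hypothetical, and SQLite's datatype names differ somewhat from those of the products covered in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Each column has an assigned datatype; customer_id is the primary key.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )""")
# accounts.customer_id is a foreign key that links each account to a customer.
conn.execute("""
    CREATE TABLE accounts (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        balance     NUMERIC NOT NULL,
        opened_on   DATE
    )""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO accounts VALUES (10, 1, 250, '2024-01-15')")


def try_insert(connection, sql):
    """Run one INSERT and report whether the database accepted it."""
    try:
        connection.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second customer with the same primary key is rejected.
duplicate_ok = try_insert(conn, "INSERT INTO customers VALUES (1, 'Bob')")
# An account that points at a non-existent customer is rejected.
orphan_ok = try_insert(conn, "INSERT INTO accounts VALUES (11, 99, 0, '2024-02-01')")

# The key relationship lets a query combine rows from both tables.
joined = conn.execute("""
    SELECT c.name, a.account_id
    FROM customers c JOIN accounts a ON a.customer_id = c.customer_id
""").fetchall()
print(duplicate_ok, orphan_ok, joined)
```

Both rejected inserts show the keys doing their job: the primary key keeps rows unique, and the foreign key keeps the link between tables valid.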
Structured Query Language (SQL) is the language that should be used whenever possible
for obtaining information from a relational database. SQL is a sophisticated query
language that also includes commands for adding new data, updating existing data, and
removing old data. It can even be used to construct a relational database
from the ground up. Almost all relational database products have some form of SQL
implementation in their backend operations. In addition, SQL is remarkably consistent
across the many database systems that use it because it is an ANSI standard.
Nonetheless, each vendor's implementation of SQL is distinct. Often,
the database-specific extensions to SQL deliver both essential functionality and
much-appreciated efficiency improvements. Programmers who want their code to be as
portable as possible should, despite this, adhere as closely as they can to
ANSI-standard SQL.
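
As a brief illustration of the commands summarized above, the following sketch creates a table and then adds, updates, removes, and queries rows. It assumes Python's built-in sqlite3 module as the database, and the members table and its rows are hypothetical. The SQL statements stick to widely supported forms; the ? placeholders belong to Python's database interface rather than to the SQL standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE builds the table from the ground up.
conn.execute(
    "CREATE TABLE members ("
    "member_id INTEGER PRIMARY KEY, last_name VARCHAR(30), region CHAR(2))"
)

# INSERT adds new rows.
conn.executemany(
    "INSERT INTO members (member_id, last_name, region) VALUES (?, ?, ?)",
    [(1, "Ng", "CA"), (2, "Ruiz", "TX"), (3, "Okafor", "CA")],
)

# UPDATE changes existing rows; DELETE removes them.
conn.execute("UPDATE members SET region = 'NY' WHERE member_id = 2")
conn.execute("DELETE FROM members WHERE member_id = 3")


def members_in(connection, region):
    """SELECT retrieves the last names of members in one region."""
    rows = connection.execute(
        "SELECT last_name FROM members WHERE region = ? ORDER BY last_name",
        (region,),
    ).fetchall()
    return [last_name for (last_name,) in rows]


print(members_in(conn, "CA"), members_in(conn, "NY"))
```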

REVIEW QUESTIONS
1. What is a relational database? What are its key components?
2. Compare and contrast hierarchical and network databases.
3. What is normalization? Why is it important in database design?
4. What is SQL? What are its key features?
5. Explain in detail the working of SQL.

MULTIPLE CHOICE QUESTIONS


1. Which of the following databases stores data in a tree-like structure?
a. Relational database
b. Hierarchical database
c. Network database
d. Object-oriented database


2. Which of the following is a component of a relational database?


a. Fields
b. Nodes
c. Records
d. All of the above
3. What is the purpose of normalization in database design?
a. To reduce data redundancy
b. To improve data integrity
c. To facilitate data querying
d. All of the above
4. Which SQL command is used to create a new table?
a. SELECT
b. INSERT
c. UPDATE
d. CREATE
5. Which SQL command is used to add data to an existing table?
a. CREATE
b. INSERT
c. UPDATE
d. SELECT

Answers to Multiple Choice Questions


1. (b); 2. (a); 3. (b); 4. (d); 5. (b)

REFERENCES
1. Ali, A. B. M., Abdullah, M. S., & Alostad, J., (2011). SQL-injection vulnerability
scanning tool for automatic creation of SQL-injection attacks. Procedia Computer
Science, 3(1), 453–458.
2. Antunes, N., Laranjeiro, N., Vieira, M., & Madeira, H., (2009). Effective detection
of SQL/XPath injection vulnerabilities in web services. In: 2009 IEEE International
Conference on Services Computing (Vol. 1, pp. 260–267). IEEE.
3. Atchariyachanvanich, K., Nalintippayawong, S., & Julavanich, T., (2019). Reverse
SQL question generation algorithm in the DBLearn adaptive E-learning system. IEEE
Access, 7(1), 54993–55004.


4. Atzeni, P., & De Antonellis, V., (1993). Relational Database Theory (Vol. 1, pp. 5–10).
Benjamin-Cummings Publishing Co., Inc.
5. Beaulieu, A., (2009). Learning SQL: Master SQL Fundamentals (Vol. 1, pp. 2–5).
O'Reilly Media, Inc.
6. Boyd, S. W., & Keromytis, A. D., (2004). SQLrand: Preventing SQL injection attacks.
In: Applied Cryptography and Network Security: Second International Conference,
ACNS 2004, Yellow Mountain, China, June 8–11, 2004, Proceedings 2 (Vol. 1, pp.
292–302). Springer Berlin Heidelberg.
7. Chang, N. S., & Fu, K. S., (2005). A relational database system for images. Pictorial
Information Systems, 1, 288–321.
8. Codd, E. F., (2007). Relational database: A practical foundation for productivity. In:
ACM Turing Award Lectures (Vol. 1, p. 1981).
9. Colombera, L., Mountney, N. P., & McCaffrey, W. D., (2012). A Relational Database
for the Digitization of Fluvial Architecture: Concepts and Example Applications, 1,
5–10.
10. Date, C. J., (2006). The Relational Database Dictionary: A Comprehensive Glossary
of Relational Terms and Concepts, with Illustrative Examples (Vol. 1, pp. 7–9).
O'Reilly Media, Inc.
11. De Almeida, V. T., & Güting, R. H., (2005). Supporting uncertainty in moving objects
in network databases. In: Proceedings of the 13th Annual ACM International Workshop
on Geographic Information Systems (Vol. 1, pp. 31–40).
12. Deari, R., Zenuni, X., Ajdari, J., Ismaili, F., & Raufi, B., (2018). Analysis and comparison
of document-based databases with SQL relational databases: MongoDB vs MySQL.
In: Proceedings of the International Conference on Information Technologies (Vol.
1, pp. 1–10).
13. Domdouzis, K., Lake, P., & Crowther, P., (2021). Hierarchical databases. In: Concise
Guide to Databases: A Practical Introduction (Vol. 1, pp. 205–212). Cham: Springer
International Publishing.
14. Dumler, M., (2005). Microsoft SQL server 2008 product overview. Microsoft Corporation,
3(2), 5–10.
15. Emerson, S. L., Darnovsky, M., & Bowman, J., (1989). The Practical SQL Handbook:
Using Structured Query Language (Vol. 1, pp. 2–8). Addison-Wesley Longman
Publishing Co., Inc.
16. Fang, Y., Friedman, M., Nair, G., Rys, M., & Schmid, A. E., (2008). Spatial indexing in
Microsoft SQL server 2008. In: Proceedings of the 2008 ACM SIGMOD International
Conference on Management of Data (Vol. 1, pp. 1207–1216).
17. Foster, E. C., & Godbole, S., (2016). Overview of Microsoft
SQL server. Database Systems: A Pragmatic Approach, 1, 461–467.
18. Halder, R., & Cortesi, A., (2010). Obfuscation-based analysis of SQL injection
attacks. In: The IEEE Symposium on Computers and Communications (Vol. 1, pp.
931–938). IEEE.

19. Halfond, W. G., & Orso, A., (2007). Detection and prevention of SQL injection attacks.
In: Malware Detection (Vol. 1, pp. 85–109). Springer US.
20. Henderson, K., (2000). The Guru’s Guide to Transact-SQL (Vol. 1, pp. 5–8). Addison-
Wesley Professional.
21. Julavanich, T., Nalintippayawong, S., & Atchariyachanvanich, K., (2019). RSQLG: The
reverse SQL question generation algorithm. In: 2019 IEEE 6th International Conference
on Industrial Engineering and Applications (ICIEA) (Vol. 1, pp. 908–912). IEEE.
22. Kanellakis, P. C., (1990). Elements of relational database theory. In: Formal Models
and Semantics (Vol. 1, pp. 1073–1156). Elsevier.
23. Kemalis, K., & Tzouramanis, T., (2008). SQL-IDS: A specification-based approach for
SQL-injection detection. In: Proceedings of the 2008 ACM Symposium on Applied
Computing (Vol. 1, pp. 2153–2158).
24. Kießling, W., Endres, M., & Wenzel, F., (2011). The preference SQL system-an
overview. IEEE Data Engineering Bulletin, 34(3), 12–19.
25. Kroenke, D. M., Auer, D. J., Vandenberg, S. L., & Yoder, R. C., (2010). Database
Concepts (Vol. 1, pp. 2–8, 1480–1486). Upper Saddle River, NJ: Prentice Hall.
26. Larson, P. A., Clinciu, C., Fraser, C., Hanson, E. N., Mokhtar, M., Nowakiewicz, M., &
Saubhasik, M., (2013). Enhancements to SQL server column stores. In: Proceedings
of the 2013 ACM SIGMOD International Conference on Management of Data (Vol.
1, pp. 1159–1168).
27. Litwin, W., Sahri, S., & Schwarz, T., (2006). An overview of a scalable distributed
database system SD-SQL server. In: Flexible and Efficient Information Handling: 23rd
British National Conference on Databases, BNCOD 23, Belfast, Northern Ireland, UK,
July 18–20, 2006, Proceedings 23 (Vol. 1, pp. 16–35). Springer Berlin Heidelberg.
28. Mattos, N. M., Darwen, H., Cotton, P., Pistor, P., Kulkarni, K., Dessloch, S., &
Zeidenstein, K., (1999). SQL: 1999, SQL/MM and SQLJ: An overview of the SQL
standards. Tutorial, IBM Database Common Technology, 1, pp. 2–9.
29. Melton, J., & Mattos, N. M., (1995). An overview of the emerging third-generation
SQL standard. ACM SIGMOD Record, 24(2), 468.
30. Melton, J., (1996). SQL language summary. ACM Computing Surveys (CSUR),
28(1), 141–143.
31. Mitrovic, A., (1998). A knowledge-based teaching system for SQL. In: Proceedings
of ED-MEDIA (Vol. 98, pp. 1027–1032).
32. Morgan, D., (2006). Web application security–SQL injection attacks. Network Security,
2006(4), 4–5.
33. Papadias, D., Zhang, J., Mamoulis, N., & Tao, Y., (2003). Query processing in spatial
network databases. In: Proceedings 2003 VLDB Conference (Vol. 1, pp. 802–813).
Morgan Kaufmann.
34. Rankins, R., Bertucci, P. T., Gallelli, C., Silverstein, A. T., & Cotter, H., (2014).
Microsoft SQL Server 2012 Unleashed (Vol. 1, pp. 2–8). Pearson Education.

35. Seyed-Abbassi, B., (1993). A SQL project as a learning method in a database
course. In: Proceedings of the 1993 Conference on Computer Personnel Research
(Vol. 1, pp. 291–297).
36. Shi, J., (2010). Research and practice of SQL optimization in ORACLE. In: 2010
Third International Symposium on Information Processing (Vol. 1, pp. 490–494). IEEE.
37. Silberschatz, A., & Kedem, Z., (1980). Consistency in hierarchical database systems.
Journal of the ACM (JACM), 27(1), 72–80.
38. Steinberg, G., (2009). Teaching relational database concepts to computer literacy
students: The spreadsheet metaphor. Information Systems Education Journal, 7(53),
1–13.
39. Su, Z., & Wassermann, G., (2006). The essence of command injection attacks in
web applications. ACM SIGPLAN Notices, 41(1), 372–382.
40. Tasevski, I., & Jakimoski, K., (2020). Overview of SQL injection defense mechanisms.
In: 2020 28th Telecommunications Forum (TELFOR) (Vol. 1, pp. 1–4). IEEE.
41. Tsichritzis, D. C., & Lochovsky, F. H., (1976). Hierarchical data-base management:
A survey. ACM Computing Surveys (CSUR), 8(1), 105–123.
42. Vats, P., & Saha, A., (2019). An Overview of SQL Injection Attacks, 1, 2–8.
43. Vrhovnik, M., Schwarz, H., Radeschutz, S., & Mitschang, B., (2008). An overview
of SQL support in workflow products. In: 2008 IEEE 24th International Conference
on Data Engineering (Vol. 1, pp. 1287–1296). IEEE.
44. Yeole, A. S., & Meshram, B. B., (2011). Analysis of different technique for detection
of SQL injection. In: Proceedings of the International Conference & Workshop on
Emerging Trends in Technology (Vol. 1, pp. 963–966).
45. Zanzig, J. S., & Tsay, B. Y., (2004). Hands-on training in relational database concepts.
Journal of Accounting Education, 22(2), 131–152.

CHAPTER 6

ROLE OF BIG DATA IN DATABASE SYSTEMS

UNIT INTRODUCTION
In 2006, LinkedIn, the social networking giant, began analyzing its members' profiles
and recommending people they might know. This feature was intended to encourage
users to extend their network according to their interests and to provide
them with relevant recommendations. LinkedIn found that most of the invitation
recommendations made through this feature were useful. Similarly, in 2012, President Obama's
campaign received a substantial boost in the United States presidential election from
predictive modeling on a large dataset containing voters'
identities, preferences, and trends (Villars et al., 2011).
These two examples illustrate the great potential of analyzing, linking, and
extracting actionable information from big data sets. Prediction requires vast amounts of
data and the methodical linking of many attributes across enormous data sets.
While big data has enormous potential for extracting information, it requires an in-depth
understanding of the underlying systems (hardware and software) used to store,
manipulate, link, and interpret information (Jeble et al., 2017).

Learning Objectives
After this chapter, readers will understand the following:
• The basics, including the significance and properties of big data;
• Technology utilized for big data;
• Varieties of data utilized in big data, including both analytical and transactional data;
• Requirements and difficulties of big data.

Key Terms
• Analytical data
• Big data
• Hadoop
• Privacy
• Scalability
• Spark
• Technologies
• Transactional data


6.1. UNDERSTANDING BIG DATA


Big data has grown in prominence. The term refers to data so
voluminous that managing, interpreting, and understanding it
must happen at quantities and speeds that push technological
boundaries.

6.1.1. Characteristics of Big Data


Experts have identified some crucial aspects of big data. They are
commonly known as the five V’s of big data (Jeble et al., 2017)
(Figure 6.1).

Figure 6.1. The five V’s of big data.

Source: Surya Gutta, Creative Commons License.

6.1.1.1. Volume
Big data involves an enormous quantity of data, which poses
a challenge for storage and analysis. Although there is
no explicit threshold for the amount of information,
the volume can often range from terabytes (10^12 bytes) to exabytes
(10^18 bytes) and beyond.


6.1.1.2. Velocity
Data is being produced at a rapid rate, and that rate is itself a
sign of the data's significance. One measure of the velocity of
data is that a significant fraction of the data in use was
generated in the recent past.

6.1.1.3. Variety

KEYWORD
Comma-separated values (CSV) file is a delimited text file that uses
a comma to separate values. Each line of the file is a data record.

Such data may be gathered from various sources, including online
archives, Internet of Things (IoT) equipment, URLs, users' tweets,
and search trends. Likewise, data may be presented in several
ways, including comma-separated values (CSV), text documents,
tables, and infographics. Moreover, it may be structured,
semi-structured, or unstructured.

6.1.1.4. Veracity
The authenticity of data may vary; i.e., data under evaluation
could be inconsistent or highly consistent across all copies, and it
may be worthless or of tremendous value. Veracity relates to the
dependability, precision, or legitimacy of data.

6.1.1.5. Value
Data must be of excellent quality; outdated data is useless.

6.1.2. Importance of Big Data


One of the critical concerns in big data is how much information
is sufficient for a particular big data problem. In other words,
how much data must be examined to compute the outcome? The
answer to this question is not simple.
In data processing, data samples are frequently used to analyze
results. For example, public surveys rely on data samples. Similarly,
gender-based evaluations and demographics are determined by data
sampling. Sampling introduces the possibility of error, a
situation in which the sampled data may not represent the actual
results (Alsghaier et al., 2017).
Google Flu Trends (GFT) can serve as an illustration of data
sampling inaccuracy. In 2009, Google tracked the spread
of flu in the U.S. The projection was based on the search
patterns available to Google. The projection was so accurate
that it outperformed the estimate of the Centers for Disease Control (CDC).
Nevertheless, in February 2013, a similar forecast proved to be
incorrect: the GFT forecast overstated the figures
by more than a factor of two. The issue was that Google's
algorithm considered only search phrases entered on the search engine
operated by Google. It assumed that all flu-related Google searches
reflected the spread of influenza, and the Google team failed to
establish the actual link between search phrases and influenza
(Ansari et al., 2015).

KEYWORD
Search engine is a software system that finds web pages that match a
web search.
Data sampling errors can also arise during an electoral
campaign. For example, election-related Twitter data may favor
a particular politician. However, this may simply mean that the
candidate's supporters are more active on social networks than
those of other contenders. Likewise, the sample size in any situation
involving massive amounts of data might carry its own bias.
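
The election example above can be made concrete with a small calculation. The sketch below assumes a hypothetical electorate of 1,000 voters split evenly between candidates A and B, in which A's supporters are more likely to post on social media. All numbers are invented for illustration.

```python
# Hypothetical electorate: each voter is (preferred candidate, active online?).
population = (
    [("A", True)] * 300 + [("A", False)] * 200    # 500 supporters of A
    + [("B", True)] * 100 + [("B", False)] * 400  # 500 supporters of B
)


def support_share(voters, candidate):
    """Fraction of the given voters who support the candidate."""
    supporters = sum(1 for choice, _ in voters if choice == candidate)
    return supporters / len(voters)


# A sample drawn only from social media sees just the voters active online.
online_sample = [voter for voter in population if voter[1]]

true_share = support_share(population, "A")        # 500 of 1,000 voters
sampled_share = support_share(online_sample, "A")  # 300 of 400 online voters

print(f"true support for A: {true_share:.0%}")
print(f"support for A in the online sample: {sampled_share:.0%}")
```

Although the population is evenly split, the online sample suggests a clear lead for A, because the sample over-represents the more active group.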
Finding the optimal quantity of information for a particular
big data problem is not straightforward. In addition, collecting
complete data is itself a problem. Several experts agree
that in the context of massive data, N=ALL is a reasonable point
of reference in data analysis (Najafabadi et al., 2015). That is,
all of the data has to be examined. Collecting or identifying the
components of N=ALL is not simple. Hence, in many circumstances,
a big data problem is analyzed on found data, a term for the
data that happens to have been acquired for analysis.
While gathering more data is frequently more valuable for
analysis, it is not always the case that more data will result in
better outcomes. The relevance of the data acquired is also crucial
in this regard (Jiang et al., 2016).

6.2. BIG DATA TECHNOLOGIES


Big Data refers to the enormous volume of data created daily by
corporations, institutions, and people. This data's volume, velocity,
and variety make it difficult to maintain, store, and interpret using
conventional database technology. Many new technologies have
emerged in recent years to handle the unique demands of Big Data.
This section discusses three prominent Big Data technologies:
Hadoop, Spark, and NoSQL databases.


6.2.1. Hadoop
Hadoop is a free, open-source platform for storing and processing massive
datasets in a distributed way. It is founded on the MapReduce
paradigm and the Hadoop distributed file system (HDFS). Hadoop
enables businesses to manage, store, and examine enormous
amounts of information across hundreds or thousands of commodity
computers. Hadoop is intended to be scalable, fault-tolerant, and
economical (Borthakur, 2007) (Figure 6.2).

Figure 6.2. Illustration of Hadoop and its benefits.

Source: Prwatech, Creative Commons License.

Hadoop is composed of two core parts:

1. Hadoop Distributed File System (HDFS): It is a
distributed file system that offers high-performance
data access across the nodes of a Hadoop cluster. HDFS
is intended to hold huge files across several drives and
nodes for fault tolerance and high availability.
2. MapReduce: It is a computing framework that enables
the parallel processing of big datasets throughout
a Hadoop cluster. MapReduce works
by breaking a massive dataset into manageable pieces and
distributing them across numerous nodes in a cluster.
Each node processes its data in parallel, and the output
is produced by combining the results from all nodes.
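
The MapReduce flow described above (split the data, process each piece, combine the results) can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model, not Hadoop's API: the function names are hypothetical, and the chunks here are processed one after another rather than on separate nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group the emitted values by key across all chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # In a real cluster, each chunk would be mapped on a different node.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


chunks = ["big data big clusters", "data nodes", "big nodes"]
print(word_count(chunks))
```

Because each call to map_phase depends only on its own chunk, the map step can run on many machines at once, which is the property the framework exploits.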


6.2.2. Spark

KEYWORD
Hadoop distributed file system (HDFS) is the primary data storage
system used by Hadoop applications.

Apache Spark is an open-source platform for cluster computing
that enables rapid in-memory data analysis. Spark is designed
to be faster than Hadoop and adds features for real-time data
handling. It is interoperable with Hadoop data sources and
can run on top of HDFS. Spark offers a variety of
data processing APIs, such as SQL, graph processing, machine
learning, and streaming. It also provides a uniform API that
enables developers to write applications in many programming languages,
such as Java, Python, Scala, and R (Zaharia et al., 2016).
Spark is composed of three primary elements:
i. Spark Core: It provides Spark's fundamental functionality,
including task scheduling, memory management, and fault
recovery;
ii. Spark SQL: It offers SQL-based interfaces for working with
structured data. It enables programmers to run SQL
queries against Hadoop and other data sources; and
iii. Spark Streaming: It provides capabilities for real-time data
processing. It enables developers to handle data in
near real time, which makes it well suited to use cases such as
fraud detection, log processing, and IoT data processing.

6.2.3. NoSQL Databases


NoSQL databases are non-relational databases intended
for storing unstructured and semi-structured information. Unlike
conventional relational databases, NoSQL databases do not require
a fixed schema and are designed to scale horizontally (Figure 6.3).

Figure 6.3. Illustration of the NoSQL database.

Source: Geeks for Geeks, Creative Commons License.



Several types of NoSQL databases exist, such as document-oriented,
graph, key-value, and column-family database systems. MongoDB,
Redis, Cassandra, and Neo4j are well-known NoSQL databases
(Strauch et al., 2011).
NoSQL databases are intended to manage Big Data by offering
horizontal scalability with flexible data formats. They are
also intended to be highly available and fault-tolerant, making
them well suited to use cases involving data distribution across
numerous locations.
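
The two properties described above, flexible schemas and horizontal scaling, can be illustrated with a toy document store. The sketch below is not the API of MongoDB, Redis, Cassandra, or any other product; the class name, keys, and documents are hypothetical, and the "nodes" are simply in-memory dictionaries.

```python
import json
import zlib


class ShardedDocumentStore:
    """Toy document store that spreads JSON documents over several nodes."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash of the key decides which node owns it (hash partitioning).
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, document):
        # Documents are stored as JSON text; no schema is declared in advance.
        self._node_for(key)[key] = json.dumps(document)

    def get(self, key):
        raw = self._node_for(key).get(key)
        return None if raw is None else json.loads(raw)


store = ShardedDocumentStore(node_count=3)
# The two documents have different fields, which a fixed schema would not allow.
store.put("user:1", {"name": "Ada", "email": "ada@example.com"})
store.put("user:2", {"name": "Lin", "tags": ["admin", "beta"], "age": 41})

print(store.get("user:1"))
print(store.get("user:2"))
```

In this simplified picture, capacity grows by adding nodes rather than by enlarging one machine. Real systems also replicate data and rebalance keys when nodes join or leave, which this sketch leaves out.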

6.2.4. Comparison of the Technologies

KEYWORD
Data distribution is a function that specifies all possible values
for a variable and also quantifies the relative frequency.

Spark, Hadoop, and NoSQL databases are all meant to manage
Big Data, but their methods vary. Hadoop is an excellent
batch-oriented solution for handling enormous quantities of information
that can be broken down into smaller parts. Spark is meant
to deliver rapid in-memory data processing and is ideally suited
for real-time data handling. NoSQL databases are meant to be
highly scalable and adaptable, making them excellent for managing
unstructured or semi-structured data.
Spark is typically more efficient than Hadoop since it processes
data in memory instead of writing it to disk. NoSQL databases
offer excellent efficiency by storing and accessing data across a
cluster of numerous machines (Ahmed et al., 2020).
Hadoop relies on HDFS for data storage, but Spark and NoSQL
databases handle a broad range of data sources, including
HDFS. NoSQL databases also provide flexible schemas, enabling
data storage in various formats, like JSON, binary, and XML.
Hadoop and NoSQL databases often require programmers to
write programs in Java or other compatible programming languages.
Spark provides a uniform API that supports several programming
languages, notably Java, Python, Scala, and R.
Ultimately, the choice of Big Data technology will hinge on the
project's needs. Spark is best suited to real-time processing,
whereas Hadoop is suitable for processing massive volumes of data
in batches. NoSQL databases offer flexible data formats and excellent
scalability, which makes them well suited to managing unstructured
and semi-structured data (Phwan et al., 2018).


6.3. TYPE OF DATA: TRANSACTIONAL OR ANALYTICAL
The question of what types of data constitute "big data" is crucial.
Two sorts of systems have been identified in the literature
(Figure 6.4).

Figure 6.4. Schematic of the transactional or analytical datatypes.

Source: Michael Kaminsky, Creative Commons License.

6.3.1. Transactional Systems

These are systems that support transaction
processing. Hence, they exhibit the ACID properties
(atomicity, consistency, isolation, durability). They have
a well-defined schema, and each transaction's data is uniquely
identified.

Remember
Transactional data systems are databases designed to process and
store high volumes of data generated by business transactions in
real-time.

6.3.2. Analytical Systems

These systems may or may not possess ACID properties. Their
data frequently does not conform to a strict schema and could
contain duplicates, incomplete data, etc. These systems are much
more suitable for data analysis (Verma et al., 2018).

Historically, "big data" has been associated with analytical
systems, notably since these systems do not require strong
consistency and contain schema-less data with duplicates,
many formats, and missing values. However, as seen in Chapter
8, big data systems have increasingly evolved to incorporate
transactional systems with ACID features.
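
To make the ACID properties of transactional systems more concrete, the sketch below shows atomicity, meaning that a transaction either completes in full or leaves no trace. It assumes Python's built-in sqlite3 module as the database, and the accounts table, names, and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint expresses a consistency rule: no negative balances.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(connection, source, target, amount):
    """Move money between two accounts as one atomic transaction."""
    try:
        # The with-block commits on success and rolls back on an exception.
        with connection:
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(connection):
    return dict(connection.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 30), balances(conn))
# This transfer would overdraw alice, so the credit to bob is undone as well.
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

In the failed transfer, the credit to the target account has already been applied when the debit violates the constraint. The rollback removes it, so the balances are unchanged.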


6.3.3. CAP Theorem

Did you Know?
In a distributed system, it is impossible to simultaneously achieve
consistency, availability, and partition tolerance, and trade-offs
must be made between these three.

Previously, database experts regarded the absence of
consistency and standardization in big data platforms as
a significant limitation. Subsequently, it was determined that big
data platforms are not obligated to maintain strict consistency
and can, in line with the CAP theorem, accept reduced
consistency in exchange for enhanced performance (Siddiqa et al., 2017).
Eric Brewer presented the CAP theorem, which describes essential
aspects of distributed systems. According to its central tenet,
Consistency, Availability, and Partition tolerance are the three
essential features of a distributed system. In the event of a
partition (a network breakdown), consistency and availability
cannot be provided simultaneously.
So, when connectivity issues and network partitioning occur, a
distributed system can provide either consistency or availability, not
both. In the absence of network disruption, a distributed database
system can provide both availability and consistency
(Shim, 2012) (Figure 6.5).
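
The trade-off stated by the CAP theorem can be illustrated with a toy store that keeps two copies of its data. The sketch below is a simplified model, not a real replication protocol: the class, its "CP" and "AP" modes, and the partitioned flag are hypothetical names chosen for this example.

```python
class TwoReplicaStore:
    """Toy store with two replicas that favors consistency (CP) or availability (AP)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False  # True while the replicas cannot communicate

    def write(self, replica_index, key, value):
        """Write through one replica; return True if the write was accepted."""
        if not self.partitioned:
            for replica in self.replicas:  # healthy network: update both copies
                replica[key] = value
            return True
        if self.mode == "CP":
            return False  # refuse the write: copies stay equal, availability is lost
        self.replicas[replica_index][key] = value  # accept locally: copies diverge
        return True

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)


for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write(0, "x", 1)          # no partition yet: both replicas hold x = 1
    store.partitioned = True        # the network between the replicas fails
    accepted = store.write(0, "x", 2)
    print(mode, accepted, store.read(0, "x"), store.read(1, "x"))
```

During the partition, the CP store stays consistent by rejecting the write, while the AP store stays available but lets its two replicas return different values.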

Figure 6.5. Diagram representing CAP theorem.

Source: Hamzeh Khazaei, Creative Commons License.

Numerous big data systems rely on the CAP theorem
to favor availability at the expense of consistency. However,
consistency is not traded only for availability; it may also
be traded for lower latency (Casado & Younas, 2015). Systems
with less stringent consistency requirements are typically more suitable for
data analytics because they do not always preserve ACID
properties. Due to the enormous volume and distributed nature of big
data, it is challenging to preserve ACID properties. Hence, big data
platforms have typically been associated with analytical systems.

6.3.4. ACID vs. BASE

For distributed systems, it is challenging to achieve ACID
guarantees. Hence, numerous big data systems adopt BASE
properties instead. BASE stands for Basically Available, Soft
state, Eventual consistency. BASE means that big
data systems trade consistency to preserve availability during a
network breakdown. Such systems prioritize availability while
implementing an eventual consistency model (Soma et al., 2022).
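
Eventual consistency can be sketched as two replicas that accept writes independently and later reconcile. The example below assumes a simple "newest timestamp wins" rule for reconciliation. This is one possible strategy chosen for illustration, not the method of any particular system, and the class and function names are hypothetical.

```python
class Replica:
    """Replica that accepts writes locally and stores (timestamp, value) per key."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, timestamp):
        current = self.store.get(key)
        # Keep the incoming write only if it is newer than what is stored.
        # Writes with equal timestamps are not resolved by this simple rule.
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def synchronize(first, second):
    """Exchange entries both ways so each replica keeps the newest write per key."""
    for source, target in ((first, second), (second, first)):
        for key, (timestamp, value) in list(source.store.items()):
            target.write(key, value, timestamp)


r1, r2 = Replica(), Replica()

# During a network breakdown, each replica stays available and accepts writes.
r1.write("cart", "book", timestamp=1)
r2.write("cart", "book, pen", timestamp=2)
before = (r1.read("cart"), r2.read("cart"))  # the replicas disagree (soft state)

# Once the network recovers, the replicas reconcile and converge.
synchronize(r1, r2)
after = (r1.read("cart"), r2.read("cart"))
print(before, after)
```

Between the two writes and the synchronization, a reader may see either value depending on which replica answers. That window of disagreement is what BASE accepts in exchange for availability.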

6.4. REQUIREMENTS AND CHALLENGES OF BIG DATA
Big data platforms pose several distinct research
challenges, and the requirements below must be satisfied.

6.4.1. Scalability

The most crucial requirement for big data is immense storage and
processing capacity for enormous quantities of information.
Scaling must be achieved without noticeable loss of speed.

Focus on horizontal scaling by utilizing distributed systems and
cloud technologies to handle the growing volume, velocity, and
variety of big data.

6.4.2. Availability and Fault Tolerance


An effective big data system must be tolerant of faults. Transient
faults include congestion issues, CPU unavailability, and packet
drops, whereas permanent faults include disk failures, power
problems, and network disruptions.
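Transient faults are often handled by retrying the failed operation
after a short, growing delay. The following sketch assumes a
hypothetical TransientError exception and a make_flaky helper that
simulates dropped packets; neither belongs to a real library. Any
other exception type is treated as permanent and is not retried.

```python
import time


class TransientError(Exception):
    """Raised for faults that may disappear on a later attempt."""


def call_with_retry(operation, max_attempts=4, base_delay=0.01,
                    sleep=time.sleep):
    """Run operation, retrying transient faults with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: the fault did not clear in time
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky(failures):
    """Build an operation that fails a fixed number of times, then works."""
    state = {"left": failures}

    def operation():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("packet dropped")
        return "ok"

    return operation


delays = []  # record the waits instead of actually sleeping
result = call_with_retry(make_flaky(2), sleep=delays.append)
print(result, delays)
```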

6.4.3. Efficient Network Setup


A timely and efficient configuration is essential as a big data
system comprises a large number of computers and monitors. The
network must provide access to massive amounts of data with
little delay. Local area network and wide area network (LAN &
WAN) configuration should assist in developing big data systems
(Al-Qarni et al., 2019).


6.4.4. Flexibility
Big data systems might include text information, photos, videos, and
charts, among other data types. Likewise, data can be evaluated and
examined using various methods, such as infographics, unprocessed
data, queries, and aggregated data. Such systems must provide
flexible access to and storage of big data.

KEYWORD
Infographics are graphic visual representations of information, data,
or knowledge intended to present information quickly and clearly.

6.4.5. Privacy and Access Control
As big data systems collect information from various places,
confidentiality and access control are among the main priorities.
Major issues such as what data can be publicly disclosed, what
material should be evaluated, and who owns the data must be addressed.

6.4.6. Elasticity
The number of users of a big data system fluctuates over
time. An effective system must have the capacity to satisfy
user requirements as they change. Elasticity refers to the system's
capacity to accommodate these changing demands (Barnawi et al., 2020).
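A simple scaling rule shows what elasticity means in practice. This is
a sketch under assumed numbers: the function desired_nodes, the
capacity of 100 users per node, and the limits of 1 and 10 nodes are
illustrative choices, not values from the text.

```python
import math


def desired_nodes(active_users, users_per_node=100, min_nodes=1,
                  max_nodes=10):
    """Return how many nodes to run for the current number of users."""
    needed = math.ceil(active_users / users_per_node)
    # Never scale below the minimum or above the budgeted maximum.
    return max(min_nodes, min(max_nodes, needed))


load = [40, 250, 980, 1500, 120]          # users observed over time
plan = [desired_nodes(u) for u in load]
print(plan)  # [1, 3, 10, 10, 2]
```

The node count grows as demand rises and shrinks again when demand
falls, so capacity is not paid for when it is not needed.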

6.4.7. Batch Processing and Interactive Processing
Big data systems have also transitioned from batch processing to
interactive processing. Competent big data systems must be
capable of analyzing and handling huge volumes of data in both batch
and streaming modes.
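The difference between the two modes can be seen by computing the same
statistic both ways. In this sketch the names batch_mean and
RunningMean and the sample readings are assumptions for illustration.
The batch version needs the whole data set before it can answer, while
the streaming version updates its answer as each value arrives.

```python
def batch_mean(values):
    """Batch mode: process the complete data set in one pass."""
    return sum(values) / len(values)


class RunningMean:
    """Streaming mode: keep a small state and update it per record."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, x):
        self.count += 1
        self.total += x
        return self.total / self.count


readings = [4.0, 8.0, 6.0, 2.0]

stream = RunningMean()
running = [stream.update(x) for x in readings]

print(running)               # [4.0, 6.0, 6.0, 5.0]
print(batch_mean(readings))  # 5.0
```

Both modes reach the same final value; the streaming mode also makes
intermediate results available while data is still arriving.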

6.4.8. Efficient Storage


Since data is replicated in big data systems, effective replication
and storage technologies are crucial for reducing costs.

6.4.9. Multi-Tenancy
Many users can access big data systems at once. Multi-tenancy
refers to the system's capacity to provide fair, consistent,
and isolated services to big data users.


6.4.10. Efficient Processing


Big data processing requires fast algorithms, methodologies, and
equipment due to the volume of data. Effective parallel processing
techniques are equally crucial. Likewise, iterative processing is
vital for data analysis and machine learning (Al-Dhuraibi et al.,
2017).

6.4.11. Efficient Scheduling


Many simultaneous jobs and users necessitate strategies and
approaches for proper scheduling.

The requirements listed above are essential for big data systems.
Many solutions have been created to meet them. Throughout the
remainder of this work, various responses to these difficulties will
be analyzed (Abourezq & Idrissi, 2016).

ACTIVITY 6.1.
You are part of a team tasked with developing a database system that can handle
the large volumes of data generated by a social media platform. Describe how you
would design the database system to handle the volume, velocity, and variety of
data, while ensuring scalability, availability, and security.


6.5. SUMMARY
Big data systems analyze huge amounts of data in depth to produce forecasts and insights.
The system design, fault tolerance, computation and processing, replication, consistency,
scalability, and storage requirements of these systems are unique. This book is intended to
demonstrate available solutions to these problems.

REVIEW QUESTIONS
1. Explain the five V’s of big data.
2. Explain CAP Theorem. How is it useful for big data?
3. Explain the difference between found data and all data.
4. What are the major challenges for big data systems?
5. What are ACID guarantees? Are they needed for big data systems?
6. Highlight the major differences between transactional systems and analytical systems.

MULTIPLE CHOICE QUESTIONS


1. What is big data?
a. A small amount of data
b. A medium amount of data
c. A large amount of data
d. A variable amount of data
2. What are the characteristics of big data?
a. Volume, velocity, variety, and veracity
b. Volume, velocity, versatility, and variance
c. Volume, velocity, variety, and validity
d. Volume, value, variety, and validity
3. Which of the following technologies are commonly used in big data?
a. Hadoop, Spark, and NoSQL databases
b. MySQL, Oracle, and SQL Server
c. Excel, Access, and Word
d. C++, Java, and Python
4. What is the difference between transactional and analytical data in big data?
a. Transactional data is structured, while analytical data is unstructured
b. Transactional data is used for reporting, while analytical data is used for real-time processing
c. Transactional data is used for real-time processing, while analytical data is used for reporting
d. Transactional data is unstructured, while analytical data is structured
5. What are some of the requirements and challenges of big data?
a. Scalability, security, and data integration
b. Scalability, simplicity, and data consistency
c. Scalability, compatibility, and data completeness
d. Scalability, accessibility, and data accuracy

Answers to Multiple Choice Questions


1. (c); 2. (a); 3. (a); 4. (c); 5. (a)

REFERENCES
1. Abourezq, M., & Idrissi, A., (2016). Database-as-a-service for big data: An overview.
International Journal of Advanced Computer Science and Applications, 7(1), 2–19.
2. Agrawal, D., Das, S., & El Abbadi, A., (2011). Big data and cloud computing: Current
state and future opportunities. In: Proceedings of the 14th International Conference
on Extending Database Technology, 1(2), 530–533.
3. Ahmed, O. M., Haji, L. M., Shukur, H. M., Zebari, R. R., Abas, S. M., & Sadeeq, M.
A., (2020). Comparison among cloud technologies and cloud performance. Journal
of Applied Science and Technology Trends, 1, 40–47.
4. Al-Dhuraibi, Y., Paraiso, F., Djarallah, N., & Merle, P., (2017). Elasticity in cloud
computing: State of the art and research challenges. IEEE Transactions on Services
Computing, 11(2), 430–447.
5. Al-Qarni, B. H., Almogren, A., & Hassan, M. M., (2019). An efficient networking
protocol for internet of things to handle multimedia big data. Multimedia Tools and
Applications, 78, 30039–30056.
6. Alsghaier, H., Akour, M., Shehabat, I., & Aldiabat, S., (2017). The importance of big
data analytics in business: A case study. American Journal of Software Engineering
and Applications, 6(4), 111–115.
7. Ansari, S., Mohanlal, R., Poncela, J., Ansari, A., & Mohanlal, K., (2015). Importance
of big data. In: Handbook of Research on Trends and Future Directions in Big Data
and Web Intelligence, 1(1), 1–19.
8. Bagui, S., & Nguyen, L. T., (2015). Database sharding: To provide fault tolerance
and scalability of big data on the cloud. International Journal of Cloud Applications
and Computing (IJCAC), 5(2), 36–52.
9. Barnawi, A., Sakr, S., Xiao, W., & Al-Barakati, A., (2020). The views, measurements
and challenges of elasticity in the cloud: A review. Computer Communications, 154,
111–117.
10. Bologa, A. R., Bologa, R., & Florea, A., (2013). Big data and specific analysis
methods for insurance fraud detection. Database Systems Journal, 4(4), 30–39.
11. Borthakur, D., (2007). The Hadoop distributed file system: Architecture and design.
Hadoop Project Website, 11(2007), 21.
12. Casado, R., & Younas, M., (2015). Emerging trends and technologies in big data
processing. Concurrency and Computation: Practice and Experience, 27(8), 2078–
2091.
13. Chaudhuri, S., (2012). What next? A half-dozen data management research goals
for big data and the cloud. In: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI
Symposium on Principles of Database Systems, 1(1), 1–4.
14. Jeble, S., Kumari, S., & Patil, Y., (2017). Role of big data in decision making.
Operations and Supply Chain Management: An International Journal, 11(1), 36–44.
15. Jiang, X., Abdel-Aty, M., Hu, J., & Lee, J., (2016). Investigating macro-level hotzone
identification and variable importance using big data: A random forest models
approach. Neurocomputing, 181, 53–63.
16. Najafabadi, M. M., Villanustre, F., Khoshgoftaar, T. M., Seliya, N., Wald, R., &
Muharemagic, E., (2015). Deep learning applications and challenges in big data
analytics. Journal of Big Data, 2(1), 1–21.
17. Nedelcu, B., (2013). About big data and its challenges and benefits in manufacturing.
Database Systems Journal, 4(3), 10–19.
18. Petković, D., (2017). JSON integration in relational database systems. Int. J. Comput.
Appl., 168(5), 14–19.
19. Phwan, C. K., Ong, H. C., Chen, W. H., Ling, T. C., Ng, E. P., & Show, P. L., (2018).
Overview: Comparison of pretreatment technologies and fermentation processes
of bioethanol from microalgae. Energy Conversion and Management, 173, 81–94.
20. Shim, S. S., (2012). Guest editor’s introduction: The CAP theorem’s growing impact.
Computer, 45(02), 21–22.
21. Siddiqa, A., Karim, A., & Gani, A., (2017). Big data storage technologies: A survey.
Frontiers of Information Technology & Electronic Engineering, 18, 1040–1070.
22. Soma, L. R., Stefonovski, D., Robinson, M. A., Tsang, D. S., Haughan, J., & Boston,
R. C., (2022). Prerace venous blood gases and acid-base values in standardbred
horses: Effects of geography, season, prerace furosemide, gender, age, and trainer
using big data analytics. American Journal of Veterinary Research, 83(11), 1–12.
23. Strauch, C., Sites, U. L. S., & Kriha, W., (2011). NoSQL Databases (Vol. 20, No.
24, p. 79). Lecture Notes, Stuttgart Media University.
24. Verma, S., Bhattacharyya, S. S., & Kumar, S., (2018). An extension of the technology
acceptance model in the big data analytics system implementation environment.
Information Processing & Management, 54(5), 791–806.

25. Villars, R. L., Olofson, C. W., & Eastwood, M., (2011). Big Data: What it is and Why
You Should Care? (Vol. 14, pp. 1–14). White Paper, IDC.
26. Zaharia, M., Xin, R. S., Wendell, P., Das, T., Armbrust, M., Dave, A., & Stoica, I.,
(2016). Apache spark: A unified engine for big data processing. Communications of
the ACM, 59(11), 56–65.
27. Zhang, H., Chen, G., Ooi, B. C., Tan, K. L., & Zhang, M., (2015). In-memory big
data management and processing: A survey. IEEE Transactions on Knowledge and
Data Engineering, 27(7), 1920–1948.

CHAPTER 7

DATA WAREHOUSING AND BUSINESS INTELLIGENCE

UNIT INTRODUCTION
According to the definition provided by Ballard (1998), a data warehouse (DW) is a type of
computer system that regularly collects and consolidates data from its various source
systems into a dimensional data storage format. Typically, a DW stores history that spans
years and can be queried for purposes including business intelligence and analytical
endeavors. It is often updated in batches, as opposed to each time a transaction takes
place in the system from which it is derived (Burton, 2010).
The DW includes the data mart as one of its subsets, and a data mart is defined as a body
of historical data stored in an electronic repository. The data mart isn’t involved in the
day-to-day activities of the firm. Instead, its data is put to use in the process of developing
business intelligence (BI). Data stored in a data mart often relates to a particular division or
department of the business (Demarest, 2008). The dimensional model’s fact table contains
the company’s numerical performance measurements. We try to store each measurement
produced as a byproduct of a business activity in a single data mart.
A fact table is never complete without its indispensable sidekick, the dimension table
(Eckerson, 2003). The business’s textual descriptions can be found in the dimension tables.
In a dimensional model that has been thoughtfully designed, the dimension tables contain
a large number of columns or attributes. These attributes are used to describe
each row in the dimension table. Dimension tables are typically not very deep in terms of rows
(they typically have fewer than 1 million rows), but they are very wide and contain multiple large
columns. Fact tables are accessed through their corresponding dimension tables. The
dimensions are responsible for implementing the user interface to the DW. An online analytical
processing database, also known as an OLAP database, is a type of database technology
that stores, manages, and queries data with the aim of supporting BI applications.
The ETL system is a set of steps that clean, transform, combine, de-duplicate,
archive, standardize, and structure data for use in the DW. ETL stands for extract,
transform, and load.

Learning Objectives
At the end of this chapter, readers will be able to learn:
• Concepts related to the data warehouse;
• Data statements in business intelligence;
• Architecture of the business intelligence;
• Data models of the data warehouse;
• Different concepts in business intelligence;
• Data warehousing online transactional processing (OLTP);
• Data warehouse and business intelligence high level architecture.

Key Terms
• Business
• Business intelligence
• Channel analysis
• Customer
• Data
• Data model
• Data warehouse
• Market
• Online


7.1. DATA WAREHOUSE (DW) CONCEPTS


Data warehousing collects data for storage in a managed database
in which the data are subject-oriented, integrated, time-variant,
and non-volatile. This is done to help people make
decisions (Inmon, 1993). Data from each of a company’s divisions
are reconciled and organized before being saved in a centralized
location (known as a data warehouse (DW)). From this location,
analysts may then gather information that will help them
make more informed decisions (Cho & Ngai, 2003). The data can
then easily be gathered, processed, sliced, and diced according
to requirements to present relevant details (Eldabi et al.,
2002). Two authors are well known in DW architecture, William
Inmon and Ralph Kimball; nevertheless, their strategies for dealing
with some aspects of data warehousing are distinct from one another.
Inmon uses a top-down design strategy, whereas Kimball uses a
bottom-up one. Most people who work with a DW use one of the two
strategies (Figure 7.1).

Figure 7.1. Illustration of data warehousing.

Source: CFI Team, Creative Commons License.

According to Inmon (1993), a DW is defined as a collection of
integrated, time-variant, subject-oriented, and non-volatile data that
can be utilized to support decision-making processes. The
term “integrated” means the data are stored in uniform formats,
with consistent naming conventions, units of measurement,
encoding structures, and domain constraints. For the ethnicity
category, for instance, a DW will have only one coding system,
whereas the organization’s source systems may have four to five
distinct coding methods. Integrated data can also be restricted to
specific domains. A DW is “subject-oriented” in that it emphasizes
the high-level entities of a company. Time-variant data
are those that are linked to a certain moment or period in time,
like a month, quarter, or year (Giovinazzo, 2003), and that remain
accessible through the warehouse over a period of time.
Warehouse data are non-volatile because data are rarely or never
altered after they have been entered into the warehouse. Data in
the warehouse can only be updated or refreshed periodically, either
incrementally or completely. In other words, “non-volatile” denotes
that the data are static (Benbasat et al., 1987).
Hackathorn (1999) asserts that the enterprise’s data marts are
consolidated into the DW. Information is always stored in the
dimensional model. In Kimball’s view, the DW is composed of
data marts. The data marts focus on the business goals of the
organization’s departments, and the DW is the set of conformed
dimensions shared by the data marts.
A data mart is a subset of a DW, according to Kimball. The DW
comprises all the data marts, each of which uses a family of star
schemas with varying levels of granularity to describe a business
process. Kimball uses denormalized conformed dimensions, while
Han & Kamber (2001) use a central database model that is
fully normalized. This is the major distinction between the two
approaches.

KEYWORD
Data warehouse is a type of data management system that is designed
to enable and support business intelligence (BI) activities,
especially analytics.

According to Basaran (2005), the following are the characteristics
of a DW:
• It’s subject-focused;
• It’s non-volatile, that is, stable and not easily changed;
• It integrates different application systems and facilitates
information processing by consolidating historical data
(Singh, 1997);
• Data is kept in a structured manner that allows for analysis
and querying;
• Data is summarized. DWs frequently maintain less detailed
information than transaction-oriented systems.

7.2. STATEMENTS BUSINESS INTELLIGENCE (BI)
Many definitions of business intelligence (BI) can be found
in the literature. Various parties, including news organizations, IT
providers, and business consultants, have different points of view
on this issue. Several examples are described below. Collectively,
they ought to convey the core idea of BI. According to the Gartner
Group, BI is the process of turning data into information and
then, after a period of discovery, turning that information into
knowledge. Kimball & Ross (2010) defined it as the process of
collecting and processing data to support the strategy of an
organization (Figure 7.2).

Use business intelligence tools to analyze and gain insights from
large datasets and convert them into meaningful reports and
visualizations.

Figure 7.2. Schematic of business intelligence and its components.

Source: Ostaraward, Creative Commons License.

Poe et al. (1997) define BI as all programs that assist the
analysis and reporting of corporate data to enhance decision-making,
resulting in better company direction. The raw data a firm
has accumulated in the past must be sorted through and filtered
to provide the information that the company’s decision-makers
require. The primary goal here is to gain knowledge that is both
insightful and practical through the processing of these raw facts
(Karmańska, 2019). Typical transactional software automates routine
tasks such as invoice production and records them in the system.
In contrast, BI takes a step back to provide a more comprehensive
view of these transactions. A forecast of future actions is made by
aggregating, analyzing, and connecting data from the past rather
than reporting it in a narrowly specific manner. According to
Dobbs et al. (2002), BI systems are divided into a few major
types, including reporting, data mining (DM), and OLAP. Simon &
Shaffer (2001) likewise categorize BI tools as reporting, OLAP, and DM.
I will also categorize BI tools into these three groups in
this chapter. It appears that most definitions concur that BI should
assist in determining the basic course of a company through data
analysis and reporting.

7.3. BUSINESS INTELLIGENCE (BI) ARCHITECTURE
7.3.1. Operational Applications vs. Business Intelligence (BI) Applications
The relationship between BI applications and operational
applications is shown in Figure 7.3 (Shariat & Hightower,
2007). According to Ong et al. (2011), reporting and DM are the
primary BI components. I believe that OLAP falls in between
reporting and DM.

Figure 7.3. Connections between business intelligence and operational apps.

Source: Kroenke, Creative Commons License.

On the one hand, an operational database management system
(DBMS) makes it possible for operational business applications
like order entry, manufacturing, and purchasing to read data
from and write data to the operational database.
For instance, entering orders into a corporate system is primarily
done at the operational level of a business and doesn’t typically
involve making high-level decisions. Management, by contrast,
should employ BI applications to improve decision-making at the
tactical and strategic levels (Berthold et al., 2010).
Be aware that this distinction touches on the central question of
this work: determining which organizational levels BI genuinely
impacts.
On the other hand, BI applications may read data directly from the
operational database through the operational DBMS only if simple
reporting and small databases are involved. Otherwise, the BI DBMS
reads data extracted from the operational database together with
data acquired from external data suppliers. This data allows BI
software to generate reports and perform complex analytics. A more
extensive breakdown of these components is presented in the
following subsections (Nedelcu, 2013).

KEYWORD
Operational database management system is software that is designed
to allow users to easily define, modify, retrieve, and manage data
in real-time.

7.3.2. Requirement for Data Warehouse (DW)
Complications may develop when reading straight from the operational
database in complex BI systems with enormous datasets. In addition
to slowing down the DBMS as well as its applications, missing or
improperly formatted information might cause errors.
As a result, a new database derived from the operational
database must be created and prepared for BI use. This data
warehousing process is divided into three major steps:
extraction, transformation, and loading (ETL) (Watson, 2009).
The use of a model enables extraction algorithms to obtain
information from multiple operational databases. This model and
the source data specification are described in metadata. For example,
metadata describing sales figures recorded in integer format
by salespeople in a particular region is used to build
a model that describes regional sales performance. It’s important to
note that using indexes accelerates the extraction process. Data
transformation is often needed to ensure uniformity in the DW.
It may be necessary to convert the data into the proper format or to
fill in missing values. Some components of operational data, such
as low-level transaction information, are also eliminated because
they slow down queries. Finally, a DW is created by the DBMS
to house the processed data. The ETL process is critical in BI since
it connects BI to the source data. Users can start deriving information
or intelligence once ETL is finished (Chan & Lau, 2018).
Those who create DWs are experts in data management, and they view
the creation of DWs as their finished product. From a corporate
perspective, though, business analysts’ responsibilities don’t end
with DWs. Individuals in the marketing or finance departments
may want to use data marts. These are small subsets of DWs that
include information on particular business areas. For instance,
the marketing analyst may analyze a data mart containing sales
information for specific market sectors. Figure 7.4 graphically
demonstrates how the DW DBMS connects operational databases
and BI tools (Elena, 2011). Be aware that the DW DBMS also
stores metadata, that is, information about the data’s creator,
format, creation date, etc.
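The three ETL steps described above can be sketched with Python’s
built-in sqlite3 module. Everything specific in this example is an
assumption made for illustration: the source_rows sample data, the
regional_sales table, and the choice to fill a missing amount with 0.
The transform step converts text amounts to integers, normalizes region
names, and drops low-level transaction detail by aggregating per region
and month.

```python
import sqlite3

# Hypothetical operational rows: (order_id, region, amount, order_date)
source_rows = [
    (1, "north", "100", "2023-01-05"),
    (2, "North ", "250", "2023-01-09"),   # inconsistent formatting
    (3, "south", None, "2023-02-01"),     # missing amount
    (4, "south", "75", "2023-02-14"),
]


def extract(rows):
    """Extract: read the rows from the operational source."""
    return list(rows)


def transform(rows):
    """Transform: fix formats, fill gaps, and aggregate away detail."""
    totals = {}
    for _order_id, region, amount, order_date in rows:
        region = region.strip().lower()
        # Assumption for this sketch: a missing amount counts as 0.
        amount = int(amount) if amount is not None else 0
        month = order_date[:7]
        totals[(region, month)] = totals.get((region, month), 0) + amount
    return [(r, m, t) for (r, m), t in sorted(totals.items())]


def load(conn, rows):
    """Load: store the processed rows in the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS regional_sales "
        "(region TEXT, month TEXT, total INTEGER)"
    )
    conn.executemany("INSERT INTO regional_sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source_rows)))
result = warehouse.execute(
    "SELECT region, month, total FROM regional_sales ORDER BY region, month"
).fetchall()
print(result)  # [('north', '2023-01', 350), ('south', '2023-02', 75)]
```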

Figure 7.4. Displaying data warehouse components.

Source: Kroenke, Creative Commons License.

7.3.3. Improved Decision-Making via Analysis and Reporting
The functionality of reporting technology in BI extends much beyond
the simple dissemination of data (McKnight, 2004). In business
operations, reporting is used to generate reports for applications
like financial management and logistics. BI distinguishes three
basic categories of reporting tools based on user competencies:
production reporting tools, desktop report writers, and managed
query tools.
Operational reports are produced using production reporting
tools for large batch operations like printing and calculating pay.
The IT department must assist with report generation. Queries are
handled in batch mode because these reports encompass vast amounts
of data. Desktop report writers, however, let users create queries and
reports on their own computers quickly and easily without the help of
the IT department. Report writers can access numerous databases
through a graphical interface, select from them, and present
and distribute results using a wide range of report formats. Figure
7.5 depicts examples of several report types (Dayal et al., 2009).
Users can create simple reports with desktop report writers
based on small data sets. When accessing complex source
data, managed query tools should be used. Managed query tools
give users a relatively simple way to retrieve complex source data.
It is necessary to create an interface between the data sources
and the user that describes the relationship between the physical
data in the databases and the user’s language. This interface
features a graphical SQL environment that responds to graphical
commands by producing SQL code. For data access and manipulation,
relational DBMSs use the standard database language known as
structured query language (SQL) (Nadipalli, 2017).
Moreover, users can use SQL to conduct basic calculations
on data, including producing summaries of past (trends), present,
and anticipated future (forecasting) business activities (Lopes et
al., 2020).

KEYWORD
Business activities refer to the activities performed by businesses
to make a profit and ensure business continuity.

Due to the user-friendly design, the user may concentrate entirely on
crafting questions without thinking about issues like data placement,
consistency, etc. For example, a summary of the total number of units
sold per year, by type of customer, and by region may be shown.
It must be emphasized that this interface enables non-technical
users to create customized reports. Because comprehensive
overviews, in my opinion, are particularly beneficial at the highest
levels of management, this reporting tool has the potential to
contribute significantly to a company’s strategic management.
OLAP functionality is sometimes provided through managed query
tools (Zimmer et al., 2012). OLAP allows users to drill down into
the overviews produced by managed query tools.
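The kind of SQL that a managed query tool might generate for such a
summary can be sketched with Python’s built-in sqlite3 module. The
sales table, its columns, and the sample figures are assumptions made
for this illustration. The first query gives a yearly overview; the
second drills down into one year by customer type and region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales "
    "(year INTEGER, customer_type TEXT, region TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2022, "retail", "east", 10),
        (2022, "retail", "west", 5),
        (2022, "wholesale", "east", 40),
        (2023, "retail", "east", 12),
        (2023, "wholesale", "east", 30),
        (2023, "wholesale", "west", 20),
    ],
)

# Overview: total units sold per year.
overview = conn.execute(
    "SELECT year, SUM(units) FROM sales GROUP BY year ORDER BY year"
).fetchall()

# Drill-down: break one year of the overview into finer groups.
drill_down = conn.execute(
    "SELECT customer_type, region, SUM(units) FROM sales "
    "WHERE year = ? "
    "GROUP BY customer_type, region "
    "ORDER BY customer_type, region",
    (2023,),
).fetchall()

print(overview)    # [(2022, 55), (2023, 62)]
print(drill_down)
```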

Figure 7.5. Representing reporting system parts.

Source: Grant Harrington, Creative Commons License.


The distribution and delivery of reports are essential steps in
the decision-making process. Reports must be given to appropriate
and authorized users in the appropriate format and at the appropriate
time. A report’s output might be presented on paper, through a
browser, by phone, or by using other media. A reporting system’s
input and output components are depicted in Figure 7.5. A digital
dashboard is a customized electronic representation of a report
(Kalelkar et al., 2014). A financial analyst might want to see the
company’s stock price on their dashboard in addition to
the values of American and European stocks. Alerts are reports
generated automatically when an event occurs, such as when the
company’s stock price surpasses a predetermined limit.

KEYWORD
Financial analysts are responsible for a variety of research tasks to
inform investment strategy and make investment decisions for their
company or clients.

An illustration of this kind of report is the RFM analysis. This
report categorizes every customer based on how recently
(R) they purchased anything, how frequently (F) they purchased
something, and how much money (M) they spent. For example,
this RFM approach enables users to identify customers who may be
about to switch to a competitor (Chaudhuri et al., 2011). Although
this is not a difficult technique, it already reveals incredibly
valuable intelligence for corporations, in my opinion.
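The RFM measures described above can be computed directly from a list
of purchases. This is a minimal sketch: the purchases sample data, the
rfm function, and the 90-day threshold used to flag customers at risk
are assumptions for illustration, not part of any standard.

```python
from datetime import date

# Hypothetical purchase records: (customer, purchase date, amount spent)
purchases = [
    ("alice", date(2024, 3, 1), 120.0),
    ("alice", date(2024, 3, 20), 80.0),
    ("bob", date(2023, 11, 2), 40.0),
    ("carol", date(2024, 1, 15), 300.0),
    ("carol", date(2024, 2, 10), 150.0),
    ("carol", date(2024, 3, 25), 50.0),
]


def rfm(purchases, today):
    """Return {customer: (recency in days, frequency, monetary total)}."""
    summary = {}
    for customer, day, amount in purchases:
        rec = summary.setdefault(
            customer, {"last": day, "frequency": 0, "monetary": 0.0}
        )
        rec["last"] = max(rec["last"], day)   # most recent purchase
        rec["frequency"] += 1                 # number of purchases
        rec["monetary"] += amount             # total money spent
    return {
        c: ((today - r["last"]).days, r["frequency"], r["monetary"])
        for c, r in summary.items()
    }


table = rfm(purchases, today=date(2024, 3, 31))

# Assumed rule: no purchase for more than 90 days marks a customer at risk.
at_risk = sorted(c for c, (recency, _f, _m) in table.items() if recency > 90)

print(table["alice"])  # (11, 2, 200.0)
print(at_risk)         # ['bob']
```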

7.4. DATA WAREHOUSE (DW) DATA MODEL


According to Franconi & Kamblet (2004), three layers make up the data
modeling process: the high-level entity-relationship diagram (ERD),
which contains entities, attributes, and relationships; the mid-level
data item set, which is a department-specific data set; and the
low-level physical model, which is optimized for performance.
The mid-level model is developed after the high-level data model
has been created. A mid-level model is created for every major
subject area or entity established in the high-level data model,
and each of these is developed into its own separate mid-level
model. The mid-level data model is then extended with keys
and physical properties to create the physical data model.
The physical data model looks like a collection of tables,
also called relational tables.
In the blog “Data Warehouse Data Model Design,” Gosain (2015)
outlines how to distinguish the DW from a conventional archival
database, which can quickly become a dumping site. Data is conformed
(data elements are standardized, so “client” and “revenue” mean the
same thing regardless of origin), data is historical (a snapshot of
the company as it exists at a particular point in time), data is
shared (it has little value unless it can be queried or accessed in
various ways), and data is in detail (capable of being taken and
consolidated from a number of different systems).

7.4.1. DW Modeling Techniques
Damiani & Spaccapietra (2008) explored the evolution of the notion of
data warehousing as it relates to DW modeling. Data warehouse
modeling is the process of establishing a data model for the data to
be stored in the DW. Two data modeling techniques that work well in a
DW scenario are entity-relationship (ER) modeling and dimensional
modeling.

Remember
Data warehouse modeling techniques involve the design and development
of data warehouse structures that facilitate efficient data querying,
analysis, and reporting.

For the purpose of developing a data model of a specific topic of
interest, ER modeling relies on two fundamental ideas: entities and
the relationships between them. Detailed ER models also contain
attributes, which may describe properties of entities or
relationships (Vaisman & Zimányi, 2014).
The three main notions used in dimensional modeling are
facts, measures, and dimensions. Dimensional modeling helps
articulate business user demands in the context of database tables.
Measures are numeric values that can be added and calculated
(Franconi & Sattler, 1999).
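A small star schema makes facts, measures, and dimensions concrete.
The sketch below uses Python’s built-in sqlite3 module; the table
names (fact_sales, dim_product, dim_date), their columns, and the
sample rows are assumptions chosen for illustration. The fact table
holds the numeric measures, and the dimension tables hold the
descriptive attributes used to group them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
-- Fact table: one row per product per date, with additive measures.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    units_sold INTEGER,
    revenue REAL);

INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Hardware'),
    (2, 'Mouse', 'Hardware'),
    (3, 'Antivirus', 'Software');
INSERT INTO dim_date VALUES
    (20230101, 2023, 'Q1'),
    (20230401, 2023, 'Q2');
INSERT INTO fact_sales VALUES
    (1, 20230101, 2, 2000.0),
    (2, 20230101, 10, 200.0),
    (3, 20230101, 5, 250.0),
    (1, 20230401, 1, 1000.0),
    (3, 20230401, 4, 200.0);
""")

# The measures are summed after joining the facts to their dimensions.
rows = conn.execute("""
SELECT p.category, d.quarter, SUM(f.units_sold), SUM(f.revenue)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_key = f.product_key
JOIN dim_date AS d ON d.date_key = f.date_key
GROUP BY p.category, d.quarter
ORDER BY p.category, d.quarter
""").fetchall()

for row in rows:
    print(row)
```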

7.4.2. DW Database Design Modeling


Data modeling is divided into three distinct types: conceptual,
logical, and physical. We will only discuss the first two here.
Conceptual design is concerned
with how users perceive data; logical design is concerned with
concepts relating to a particular type of DBMS; and physical design
depends on the specific DBMS and specifies how data is stored.
Conceptual design modeling aims to generate a formal, complete,
abstract design based on the user’s requirements. A component of DW
logical design is the definition of structures that permit quick
access to information. The designer builds multidimensional structures
while considering source databases, non-functional (primarily
performance) requirements, and the conceptual schema reflecting the
information requirements (Rifaie et al., 2008). The needs for data
extraction tools, data loading procedures, and warehouse access
mechanisms are also covered in this phase. At the conclusion of
the logical design phase, a functional prototype for the end user
should be created (Patel & Patel, 2012).

7.4.3. Developing Data Warehouse (DW)


According to Moody & Kortink (2000), a conventional DW should
be planned, developed, and deployed like an IT project. As a
result, the factors that lead to IT project failure also apply to
developing DWs, which makes project planning and adherence to
the system development life cycle necessary. Requirements
specification, careful planning, design, prototyping, and
implementation are all required. The life cycle model has five
stages, shown in Figure 7.6 (Rizzi et al., 2006).

Figure 7.6. Showing DW development lifecycle (DWLC) model.

Source: Marc Demarest, Creative Commons License.

The design step converts information from available data
inventories, analyst demands, and analytical requirements into
data marts and intelligent information. At the prototype deployment
stage, a working DW prototype or data mart design prototype is
presented to a set of decision-makers and selected end-user clients.
The aim of prototyping changes as the design team switches
between design and prototype. At the deployment step, the
user-approved prototype is formalized for actual production use.
The operation stage includes administration of the ongoing
extraction, transformation, and loading procedures that update the
warehouse from the traditional transactional source systems, as
well as the data delivery services and client tools that give
analysts access to the warehouse (Glancy & Yadav, 2011). The
enhancement stage occurs when businesses experience dramatic
changes or when external business conditions alter suddenly.
Enhancement quickly returns to the core design step if the
initial design and implementation do not satisfy the needs.

7.5. BUSINESS INTELLIGENCE (BI) CONCEPTS


BI was first used as a general term for tools used in data analysis.
Since then, the concept of BI has been widened to encompass
all elements of a comprehensive decision-support architecture.
To “present complex and competitive information to the
decision-makers and planners in BI systems,” OLTP data is combined
with analytical front ends. A crucial component of BI systems is the
DW, which combines OLTP data for analytical purposes. BI is a
management process that integrates internal and external data to
enable quick and effective decision-making (Ranjan, 2009).

KEYWORD
Competitive information means any information that would give
Lender a competitive advantage over other providers to Borrower
of insurance or reinsurance products.

The goal of BI in this context is to build an informational
environment and process for analyzing operational data gathered
from transactional systems and other sources, and for revealing
“strategic” features of the company. From this perspective, concepts
like the “intelligent corporation” (a business that employs BI to
make quicker and more informed decisions than its rivals) start
to take shape (Shmueli et al., 2011). Turning large amounts of data
into knowledge by screening, analyzing, and reporting information
is called “intelligence.”
From the technological viewpoint, BI is a group of technologies
that help with data storage and analysis. Rather than concentrating
on the process itself, the emphasis is placed on the technology
that enables data recording, retrieval, manipulation, and analysis.
Rouhani et al. (2012) classify DM as a BI technique; Pirttimaki
(2007) includes all resources (DW, DM, hypertext analysis, and web
information) in BI system creation; and Foley & Guillemette (2010)
propose integrating DW and customer relationship management
(CRM) applications to link the internet and BI.
Whether managerial or technological, these studies have one
thing in common: information gathering, analysis, and utilization
to assist decision-making and the company’s plan are at the core
of BI (Aws et al., 2021). Given the scarcity of literature, we
searched for supplementary topics that could provide us with a
more in-depth understanding of BI.


Information planning, balanced scorecards, and competitive
intelligence are three areas where contributions can be found.
Listed below are benefits of BI and how they can help the
entertainment industry develop and distribute original content
while remaining competitive.
Product profitability: How much profit does one item make? How
does an item’s profit vary across media, business sectors, and
distribution methods? What are the precise expenditures and
expenses related to the item’s production? How much of the sales
or profit do they represent?

7.5.1. Customer and Market Analysis


What are the most important demographic traits of clients based on
product? What things do they frequently purchase? Is there evidence
that an underserved market group has higher revenue potential?

7.5.2. Channel Analysis

Which channels reach which consumers? What is the profitability
of each channel? How will changing technology and the creation
of new channels affect existing channels?

KEYWORD
Sales performance refers to how effectively your sales team
performs within a specific period of time.

7.5.3. Forecasting and Planning


What is a new product’s market potential, and how much money
should be invested? How will a newly launched product perform,
and how much profit will it generate? How much supply will be
enough to meet demand?
With BI, employees can access global sales data and run
sophisticated self-service reports, both of which provide
granularity and a near real-time view of sales performance. This
helps employees make informed decisions that deliver outcomes for
the organization. In addition to sales data, media companies can
assess the effectiveness of marketing and promotion, corporate
performance, and results. BI turns raw data into intelligent
information, so business users can access relevant information at
the right time and use it to make sound decisions.
By incorporating intelligent information into their company
operations, media companies can counter their rivals’ efforts,
develop a long-lasting competitive advantage, attract new clientele,
keep their current clientele, increase operational effectiveness,
and be better prepared for the future.

7.6. DATA WAREHOUSING ONLINE TRANSACTIONAL PROCESSING (OLTP)
DWs are also called online analytical processing (OLAP) systems
since they support managers and knowledge workers in data
analysis (Warnars, 2014) (Figure 7.7).

Figure 7.7. Schematic of the online transaction processing (OLTP).

Source: Abhishek Ghosh, Creative Commons License.

Did you Know?
Online transaction processing (OLTP) systems are designed to
handle high transaction volumes in real-time, while data
warehousing systems are optimized for large-scale data analysis.

Information systems that support an organization’s daily
operations are called online transaction processing (OLTP)
systems. An OLTP system’s main objective is gathering data on
an organization’s economic activity. A DW aims to get data or
information out of computers, whereas an OLTP system aims to put
data into computers. Kimball (1996) states that a DW is
market-oriented, whereas an OLTP system is customer-oriented.
Integrating OLTP and OLAP capabilities into a single solution can
be difficult.
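The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. This is only an illustration under simplifying assumptions: a single in-memory table stands in for both systems, and the `orders` table and the function names are hypothetical. A real deployment would normally keep the transactional store and the warehouse separate.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " customer TEXT NOT NULL,"
        " product TEXT NOT NULL,"
        " amount REAL NOT NULL)"
    )
    return conn


def record_order(conn, customer, product, amount):
    """OLTP-style work: a short transaction that writes one row."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO orders (customer, product, amount) VALUES (?, ?, ?)",
            (customer, product, amount),
        )
    return cur.lastrowid


def revenue_by_product(conn):
    """OLAP-style work: read many rows and summarize them."""
    rows = conn.execute(
        "SELECT product, SUM(amount) FROM orders"
        " GROUP BY product ORDER BY product"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    db = build_demo_db()
    record_order(db, "ann", "book", 12.5)
    record_order(db, "bob", "book", 7.5)
    print(revenue_by_product(db))
```

`record_order` touches one row per call and must be fast and safe, while `revenue_by_product` scans the whole table; the two access patterns pull schema and index design in different directions.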

7.7. DATA WAREHOUSE (DW) AND BUSINESS INTELLIGENCE (BI) HIGH-LEVEL ARCHITECTURE
According to Bara et al. (2009), the DW Institute conducted studies
on the factors that contribute to adopting BI, the use of BI systems
in businesses, and the role of the DW in these endeavors. According
to Kalelkar et al. (2014), the entire BI process can be considered
a “data refinery.” Information is produced by combining data from
numerous OLTP systems. The DW staging procedure manages the
transformation. Programs like specialized reporting tools, OLAP
tools, and DM tools enable users to turn data into knowledge.
Kimball (1996) builds this view into his account of the DW.
According to Kimball, the DW’s objective is to give end users
(primarily managers) open access to the information within the
company. To accomplish this goal, daily operational data must be
collected from the company’s operational systems, which are OLTP
systems. Before being sent to the presentation servers, the source
systems’ data is staged (Devlin, 2010). Four steps are taken with
the data during the staging phase: extraction, transformation,
loading, and presentation. The data marts, which stand in for
corporate business units, are constructed at the presentation stage.
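The four staging steps can be sketched as four small functions applied in order. This is a minimal sketch, not a description of any particular ETL tool: the record layout (`product`, `amount`), the cleaning rules, and all function names are assumptions made for illustration.

```python
def extract(source_rows):
    """Extraction: copy raw records out of a source system."""
    return [dict(row) for row in source_rows]


def transform(rows):
    """Transformation: clean values and drop unusable records."""
    cleaned = []
    for row in rows:
        if row.get("amount") is None:
            continue  # skip records with no amount
        cleaned.append(
            {
                "product": row["product"].strip().lower(),
                "amount": round(float(row["amount"]), 2),
            }
        )
    return cleaned


def load(warehouse, rows):
    """Loading: append the cleaned records to the warehouse store."""
    warehouse.extend(rows)
    return len(rows)


def present(warehouse):
    """Presentation: summarize loaded data for a data mart."""
    summary = {}
    for row in warehouse:
        summary[row["product"]] = summary.get(row["product"], 0.0) + row["amount"]
    return summary


def run_staging(source_rows, warehouse):
    """Run the four staging steps in order and return the summary."""
    load(warehouse, transform(extract(source_rows)))
    return present(warehouse)


if __name__ == "__main__":
    raw = [{"product": " Laptop ", "amount": "1000.50"},
           {"product": "phone", "amount": None}]
    print(run_staging(raw, []))
```

Keeping each step as a separate function mirrors the staging phase described above and makes it easy to test the cleaning rules on their own.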
Star schemas, made up of DIMENSION and FACT tables, are
used to store the data in the data mart or DW. This differs from
the entity-relationship diagram (ERD) approach used in earlier
systems.
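A small star schema can be sketched with Python's built-in `sqlite3` module: one fact table in the middle, with foreign keys pointing to the dimension tables around it. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the sample rows are hypothetical and chosen only to show the shape of the design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    category   TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year    INTEGER NOT NULL,
    month   INTEGER NOT NULL
);
CREATE TABLE fact_sales (
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
    units      INTEGER NOT NULL,
    revenue    REAL NOT NULL
);
"""


def build_star_schema():
    """Create the schema in memory and load hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "laptop", "hardware"), (2, "mouse", "hardware"),
         (3, "editor", "software")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, 2023, 1), (2, 2023, 2)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 2, 2000.0), (2, 1, 5, 100.0),
         (3, 2, 4, 200.0), (1, 2, 1, 1000.0)],
    )
    conn.commit()
    return conn


def revenue_by_category_and_month(conn):
    """Join the fact table to its dimensions and aggregate a measure."""
    return conn.execute(
        "SELECT p.category, d.year, d.month, SUM(f.revenue)"
        " FROM fact_sales AS f"
        " JOIN dim_product AS p ON p.product_id = f.product_id"
        " JOIN dim_date AS d ON d.date_id = f.date_id"
        " GROUP BY p.category, d.year, d.month"
        " ORDER BY p.category, d.year, d.month"
    ).fetchall()


if __name__ == "__main__":
    for row in revenue_by_category_and_month(build_star_schema()):
        print(row)
```

Every analytical query follows the same pattern: start at the fact table, join outward to the dimensions needed, and group by dimension attributes.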

ACTIVITY 7.1.
You work for a manufacturing company that has data stored in multiple systems,
such as production data in one system, sales data in another, and inventory data
in yet another. Describe how you would integrate these data sources into a data
warehouse to facilitate business intelligence analysis and improve decision-making.


7.8. SUMMARY
This chapter discusses the various ideas associated with data warehousing and BI. This
chapter offers an introduction to the DW, discussing its capabilities for online transactional
processing (OLTP), as well as its architecture and data models. Moreover, data statements
in BI, as well as a variety of BI ideas and the high-level architecture of both DWs and
BI, are discussed in this chapter.

REVIEW QUESTIONS
1. What is a data warehouse, and why is it important in business intelligence?
2. Explain the architecture of a data warehouse and its components.
3. What are the different data models used in a data warehouse, and how do they
differ from each other?
4. What is OLTP, and how is it used in a data warehouse?
5. What are the different concepts in business intelligence, and how are they used
in a data warehouse?
6. Explain the high-level architecture of a data warehouse and business intelligence
system.

MULTIPLE CHOICE QUESTIONS


1. What is the primary function of a data warehouse?
a. To store and manage data for OLTP systems
b. To store and manage data for business intelligence systems
c. To store and manage data for social media platforms
d. To store and manage data for e-commerce websites
2. What is the architecture of a data warehouse?
a. Three-tier architecture
b. Two-tier architecture
c. Single-tier architecture
d. Four-tier architecture
3. What are the data models used in a data warehouse?
a. Relational, object-oriented, and hierarchical
b. Relational, dimensional, and object-oriented
c. Hierarchical, dimensional, and network
d. Dimensional, network, and object-oriented


4. What are the different concepts in business intelligence?


a. Data mining, data warehousing, and online transactional processing
b. Data modeling, data analysis, and data visualization
c. Data warehousing, online transactional processing, and data visualization
d. Data mining, data modeling, and data analysis
5. What is OLTP?
a. Online Long-Term Processing
b. Offline Transactional Processing
c. Online Transactional Processing
d. Offline Long-Term Processing
6. What is the high-level architecture of a data warehouse and business
intelligence system?
a. Three-tier architecture
b. Two-tier architecture
c. Single-tier architecture
d. Four-tier architecture
7. What is the primary purpose of data statements in business intelligence?
a. To retrieve data from a data warehouse
b. To store data in a data warehouse
c. To analyze data in a data warehouse
d. To delete data from a data warehouse

Answers to Multiple Choice Questions


1. (b); 2. (a); 3. (b); 4. (c); 5. (c); 6. (a); 7. (a)

REFERENCES
1. Aws, A. L., Ping, T. A., & Al-Okaily, M., (2021). Towards business intelligence
success measurement in an organization: A conceptual study. J. Syst. Manag. Sci.,
11, 155–170.
2. Ballard, C., Herreman, D., Schau, D., Bell, R., Kim, E., & Valencic, A., (1998). Data
Modeling Techniques for Data Warehousing (Vol. 1, pp. 3–6, 25). San Jose: IBM
Corporation International Technical Support Organization.
3. Bara, A., Botha, I., Diaconita, V., Lungu, I., Velicanu, A., & Velicanu, M., (2009).
A model for business intelligence systems’ development. Informatica Economica,
13(4), 99.


4. Başaran, B. P., (2005). A Comparison of Data Warehouse Design Models (Vol. 1,
pp. 7–9). Atılım Üniversitesi.
5. Benbasat, I., Goldstein, D. K., & Mead, M., (1987). The case research strategy in
studies of information systems. MIS Quarterly, 2(1), 369–386.
6. Berthold, H., Rösch, P., Zöller, S., Wortmann, F., Carenini, A., Campbell, S., &
Strohmaier, F., (2010). An architecture for ad-hoc and collaborative business
intelligence. In: Proceedings of the 2010 EDBT/ICDT Workshops (Vol. 1, pp. 1–6).
7. Burton, P., (2010). Meta Data: The Key to Data Warehouse Design (A Systems
Engineering Approach) (Vol. 1, pp. 5–10). ENSE623 Project Institute of System Research.
8. Chan, L. K., & Lau, P. Y., (2018). Investigating the Impact of System Quality on
Service-Oriented Business Intelligence Architecture (Vol. 8, No. 4, pp. 3–9). Sage Open.
9. Chaudhuri, S., Dayal, U., & Narasayya, V., (2011). An overview of business intelligence
technology. Communications of the ACM, 54(8), 88–98.
10. Cho, V., & Ngai, E. W., (2003). Data mining for selection of insurance sales agents.
Expert Systems, 20(3), 123–132.
11. Damiani, M. L., & Spaccapietra, S., (2008). Spatial data warehouse modeling. In:
Data Warehousing and Mining: Concepts, Methodologies, Tools, and Applications
(Vol. 1, pp. 659–678). IGI Global.
12. Theodoratos, D., & Sellis, T., (1997). Data warehouse configuration.
In: VLDB (Vol. 97, pp. 126–135).
13. Dayal, U., Castellanos, M., Simitsis, A., & Wilkinson, K., (2009). Data integration
flows for business intelligence. In: Proceedings of the 12th International Conference on
Extending Database Technology: Advances in Database Technology (Vol. 1, pp. 1–11).
14. Demarest, M., (2008). Data Warehouse Prototyping: Reducing Risk Securing
Commitment and Improving Project Governance (Vol. 1, pp. 5–10).
15. Devlin, B., (2010). Beyond business intelligence. Business Intelligence Journal,
15(2), 7–16.
16. Dobbs, T., Stone, M., & Abbott, J., (2002). UK data warehousing and business
intelligence implementation. Qualitative Market Research: An International Journal,
1, 5–10.
17. Eckerson, W., (2003). Smart Companies in the 21st Century: The Secrets of Creating
Successful Business Intelligence Solutions (Vol. 7, No. 1, pp. 1–38). TDWI Report
Series.
18. Eldabi, T., Irani, Z., Paul, R. J., & Love, P. E., (2002). Quantitative and qualitative
decision-making methods in simulation modeling. Management Decision, 40(1), 64–73.
19. Elena, C., (2011). Business intelligence. Journal of Knowledge Management, Economics
and Information Technology, 1(2), 1–12.
20. Foley, É., & Guillemette, M. G., (2010). What is business intelligence? International
Journal of Business Intelligence Research (IJBIR), 1(4), 1–28.


21. Franconi, E., & Kamblet, A., (2004). A data warehouse conceptual data model. In:
Proceedings. 16th International Conference on Scientific and Statistical Database
Management, 2004 (Vol. 1, pp. 435, 436). IEEE.
22. Franconi, E., & Sattler, U., (1999). A data warehouse conceptual data model for
multidimensional aggregation. In: DMDW (Vol. 19, p. 13).
23. Giovinazzo, W. A., (2003). Internet-Enabled Business Intelligence (Vol. 2, No. 1, pp.
6–10). Prentice Hall Professional.
24. Glancy, F. H., & Yadav, S. B., (2011). Business intelligence conceptual model.
International Journal of Business Intelligence Research (IJBIR), 2(2), 48–66.
25. Gosain, A., (2015). Literature review of data model quality metrics of data warehouse.
Procedia Computer Science, 48, 236–243.
26. Hackathorn, R., (1999). Farming the web for systematic business intelligence (Invited
talk. abstract only). In: Proceedings of the Fifth ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining (Vol. 1, p. 3).
27. Han, J., & Kamber, M., (2001). Data Mining: Concepts and Techniques (Vol. 1, pp.
5–10). San Francisco: Morgan Kaufmann.
28. Inmon, W. H., (1993). Building the Data Warehouse (Vol. 1, pp. 2–7). A Wiley QED
publication.
29. Kalelkar, M., Churi, P., & Kalelkar, D., (2014). Implementation of model-view-controller
architecture pattern for business intelligence architecture. International Journal of
Computer Applications, 102(12), 4–7.
30. Karmańska, A., (2019). Business intelligence in consolidation of financial statements.
Business informatics. Economic Informatics, 4(54), 19–28.
31. Kimball, R., & Ross, M., (2010). The Kimball Group Reader: Relentlessly Practical
Tools for Data Warehousing and Business Intelligence (Vol. 1, pp. 5–10). John
Wiley & Sons.
32. Kimball, R., (1996). The Data Warehouse Toolkit: Practical Techniques for Building
Dimensional Data Warehouses (Vol. 1, pp. 3–7). John Wiley & Sons, Inc.
33. Lopes, J., Guimarães, T., & Santos, M. F., (2020). Adaptive business intelligence: A
new architectural approach. Procedia Computer Science, 177, 540–545.
34. McKnight, W., (2004). The new business intelligence architecture discussion.
Information Management, 14(9), 96.
35. Moody, D. L., & Kortink, M. A., (2000). From enterprise models to dimensional models:
A methodology for data warehouse and data mart design. In: DMDW (Vol. 1, p. 5).
36. Nadipalli, R., (2017). Effective Business Intelligence with QuickSight (Vol. 1, pp.
4–18). Packt Publishing Ltd.
37. Nedelcu, B., (2013). Business intelligence systems. Database Systems Journal,
4(4), 12–20.
38. Negash, S., (2004). Business intelligence. Communications of the Association for
Information Systems, 13(1), 15.

39. Ong, I. L., Siew, P. H., & Wong, S. F., (2011). A five-layered business intelligence
architecture. Communications of the IBIMA, 2011, 1–11.
40. Patel, A., & Patel, J., (2012). Data modeling techniques for data warehouse.
International Journal of Multidisciplinary Research, 2(2), 240–246.
41. Pirttimaki, V. H., (2007). Conceptual analysis of business intelligence. South African
Journal of Information Management, 9(2), 3–7.
42. Poe, V., Brobst, S., & Klauer, P., (1997). Building a Data Warehouse for Decision
Support (Vol. 1, pp. 4–5). Prentice-Hall, Inc.
43. Rainardi, V., (2008). Building a Data Warehouse: With Examples in SQL Server
(Vol. 2, No. 1, pp. 6–10). John Wiley & Sons.
44. Ranjan, J., (2009). Business intelligence: Concepts, components, techniques and
benefits. Journal of Theoretical and Applied Information Technology, 9(1), 60–70.
45. Rifaie, M., Kianmehr, K., Alhajj, R., & Ridley, M. J., (2008). Data warehouse architecture
and design. In: 2008 IEEE International Conference on Information Reuse and
Integration (Vol. 1, pp. 58–63). IEEE.
46. Rizzi, S., Abelló, A., Lechtenbörger, J., & Trujillo, J., (2006). Research in data
warehouse modeling and design: Dead or alive? In: Proceedings of the 9th ACM
International Workshop on Data Warehousing and OLAP (Vol. 1, pp. 3–10).
47. Rouhani, S., Asgari, S., & Mirhosseini, S. V., (2012). Review study: Business
intelligence concepts and approaches. American Journal of Scientific Research,
50(1), 62–75.
48. Shariat, M., & Hightower, R., (2007). Conceptualizing business intelligence architecture.
Marketing Management Journal, 17(2), 40–46.
49. Shmueli, G., Patel, N. R., & Bruce, P. C., (2011). Data Mining for Business Intelligence:
Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner (Vol.
2, No. 1, pp. 5–9). John Wiley and Sons.
50. Simon, A. R., & Shaffer, S. L., (2001). Data Warehousing and Business Intelligence
for e-Commerce (Vol. 2, No. 3, pp. 1–11). Elsevier.
51. Singh, H. S., (1997). Data Warehousing: Concepts, Technologies, Implementations,
and Management (Vol. 1, pp. 4–10). Prentice-Hall, Inc.
52. Vaisman, A., & Zimányi, E., (2014). Data warehouse systems. Data-Centric Systems
and Applications, 3(1), 5–8.
53. Warnars, H. L. H. S., (2014). Perbandingan penggunaan database OLTP (online
transactional processing) dan data warehouse. Creative Communication and Innovative
Technology (CCIT) Journal, 8(1), 83–100.
54. Watson, H. J., (2009). Tutorial: Business intelligence–past, present, and future.
Communications of the Association for Information Systems, 25(1), 39.
55. Zimmer, M., Baars, H., & Kemper, H. G., (2012). The impact of agility requirements
on business intelligence architectures. In: 2012 45th Hawaii International Conference
on System Sciences (Vol. 1, pp. 4189–4198). IEEE.
CHAPTER 8

APPLICATIONS OF
DATABASE SYSTEMS

UNIT INTRODUCTION
A database system is a collection of computer programs that enables users to store, access,
manipulate, and analyze data (Singh, 2009). It is a software system designed to manage
a large amount of structured data efficiently, and it provides tools for data organization,
storage, retrieval, and analysis. Database systems are essential for the efficient storage
and retrieval of data, and they are used in a wide variety of applications.
In modern computing, database systems are crucial for managing data in various
industries and domains. They provide the backbone for data-driven decision making, business
intelligence (BI), and analytics. Database systems help organizations to store and manage
large volumes of data, and they enable efficient access to this data when needed (Özsu
& Valduriez, 1996). They also provide a means of integrating data from different sources,
allowing organizations to gain insights and make informed decisions. With the growth of
big data and the rise of new technologies like cloud computing, database systems have
become even more important. Modern database systems are designed to handle large
volumes of data, and they provide features such as scalability, fault tolerance, and high
availability (Kifer, 2007). They are also capable of processing data in real-time, enabling
organizations to respond quickly to changing conditions and make timely decisions.
Overall, database systems are essential tools in modern computing, providing the
foundation for efficient data management and analysis. They play a critical role in various
applications, including e-commerce, healthcare, finance, social media, education, logistics,
and supply chain management. As data continues to grow in volume and complexity, the
importance of database systems is only set to increase.

Learning Objectives
At the end of this chapter, readers will be able to:
• Define database systems and their role in managing and storing large amounts
of data.
• Explain the importance of database systems in various industries such as
e-commerce, healthcare, finance, social media, education, and logistics.
• Identify the critical role of databases in managing product catalogs, customer
orders, patient records, financial transactions, social media data, student records,
inventory, and delivery routes.
• Discuss the future applications of database systems in managing and analyzing
data from the Internet of Things (IoT).
• Describe the role of databases in storing, processing, and analyzing IoT data to
extract valuable insights.
• List examples of IoT databases and their features.

Key Terms
• Database systems
• Data analysis
• Data management
• E-commerce
• Healthcare
• Finance
• Social media
• Education
• Logistics
• Internet of things (IoT)


8.1. E-COMMERCE
E-commerce has become increasingly popular in recent years, and
database systems play a critical role in enabling efficient online
shopping and e-commerce applications.

8.1.1. Importance of Database in E-Commerce


Some ways in which databases are important in e-commerce are
discussed in subsections.

KEYWORD
E-commerce is the activity of electronically buying or selling of
products on online services or over the Internet.

8.1.1.1. Data Storage
E-commerce websites need to store a large amount of product
information, customer information, and transaction data. Databases
are used to store this data efficiently and securely, ensuring that
it is easily accessible and can be queried for analysis (Rayport
& Jaworski, 2004).

8.1.1.2. Personalization
E-commerce websites use customer data to personalize the shopping
experience. Databases are used to store customer preferences,
purchase history, and other data that can be used to tailor product
recommendations and marketing messages (Efendioglu & Yip,
2004).
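One simple way stored purchase history can drive recommendations is to suggest items bought by other customers with overlapping purchases. The sketch below is a toy heuristic under that assumption, not the method of any platform named in this chapter; `recommend`, `PURCHASES`, and the sample data are hypothetical.

```python
from collections import Counter


def recommend(purchases, customer, limit=3):
    """Suggest products bought by customers with overlapping purchases.

    `purchases` maps each customer to the set of products they bought.
    """
    own = purchases.get(customer, set())
    scores = Counter()
    for other, items in purchases.items():
        if other == customer or not (own & items):
            continue  # skip self and customers with nothing in common
        for item in items - own:
            scores[item] += 1
    # Sort by score (highest first), then by name for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:limit]]


# Hypothetical purchase history, for illustration only.
PURCHASES = {
    "ann": {"tent", "stove"},
    "bob": {"tent", "lamp"},
    "cy": {"stove", "lamp", "map"},
    "dee": {"pen"},
}

if __name__ == "__main__":
    print(recommend(PURCHASES, "ann"))
```

In a production system the purchase sets would come from database queries rather than an in-memory dictionary, but the scoring idea is the same.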

8.1.1.3. Inventory Management


Databases are used to manage product inventory, ensuring that the
website only displays products that are in stock and available for
purchase (Taher, 2021). This helps to avoid customer dissatisfaction
due to out-of-stock products.
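A minimal sketch of this idea with Python's built-in `sqlite3` module follows: the storefront lists only rows with positive stock, and a reservation succeeds only if enough units remain. The `products` table, its columns, and the function names are assumptions made for illustration.

```python
import sqlite3


def build_catalog():
    """Create an in-memory catalog with hypothetical sample products."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        " sku TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " stock INTEGER NOT NULL CHECK (stock >= 0))"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [("A1", "kettle", 4), ("B2", "toaster", 0), ("C3", "blender", 1)],
    )
    conn.commit()
    return conn


def available_products(conn):
    """Return the names of products that are currently in stock."""
    rows = conn.execute(
        "SELECT name FROM products WHERE stock > 0 ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def reserve(conn, sku, quantity=1):
    """Decrease stock only if enough units remain; report success."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE products SET stock = stock - ?"
            " WHERE sku = ? AND stock >= ?",
            (quantity, sku, quantity),
        )
    return cur.rowcount == 1


if __name__ == "__main__":
    db = build_catalog()
    print(available_products(db))
    print(reserve(db, "C3"), available_products(db))
```

Putting the stock check inside the `UPDATE` statement itself, rather than reading the stock first and writing afterwards, avoids selling the last unit twice when two requests arrive close together.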

8.1.1.4. Order Management


Databases are used to manage orders and transactions, ensuring
that orders are processed efficiently and that customers receive
timely updates on the status of their orders (Figure 8.1).


Figure 8.1. Illustration of the order management system and its components.

Source: Vera Agiang, Creative Commons License.

8.1.2. Examples of E-Commerce Databases


Oracle Commerce: Oracle Commerce is a database system designed
specifically for e-commerce applications. It provides features such as
product catalog management, order management, personalization,
and analytics (Barnes & Vidgen, 2002).

8.1.2.1. Magento
Magento is an open-source e-commerce platform that uses a MySQL
database for data storage. It provides features such as inventory
management, order management, and customer management
(Eastin, 2002).

8.1.2.2. Shopify
Shopify is a cloud-based e-commerce platform that uses a MySQL
database for data storage. It provides features such as product
management, order management, and customer management (Jain
et al., 2021).


8.1.2.3. WooCommerce
WooCommerce is a plugin for WordPress that enables e-commerce
functionality. It uses a MySQL database for data storage and
provides features such as product management, order management,
and payment processing (Gupta, 2014).
Overall, databases play a critical role in enabling efficient and
personalized e-commerce applications. They provide a means of
managing data efficiently, ensuring that customers receive a
seamless shopping experience and businesses can make informed
decisions based on data analysis.

8.2. HEALTHCARE

The healthcare industry is one that generates vast amounts of
data, including patient records, medical research, clinical trials,
and billing information. Database systems play a critical role in
managing this data and ensuring that it is stored securely, accurately,
and efficiently.

Remember
Databases are essential in healthcare as they allow for the storage
and retrieval of patient data, which can inform clinical
decision-making and improve patient outcomes.

8.2.1. Role of Databases in Healthcare

Some ways in which databases are important in healthcare data
management are discussed in subsections.

8.2.1.1. Patient Record Management


Databases are used to store patient records, including medical
histories, treatment plans, and test results. This information is
critical for providing effective patient care and can be used to
track patient outcomes and monitor the effectiveness of treatments
(Park & Lee, 2021).

8.2.1.2. Medical Research


Databases are used to store data from medical research studies,
including clinical trials and epidemiological studies. This data is
used to inform medical practice, develop new treatments, and
improve patient outcomes.

8.2.1.3. Billing and Reimbursement


Databases are used to manage billing and reimbursement data,
including insurance claims and payment records. This data is critical
for ensuring that healthcare providers are reimbursed accurately
and in a timely manner.

8.2.1.4. Health Analytics


Databases are used to store and analyze healthcare data, enabling
insights into population health trends, disease outbreaks, and
healthcare utilization patterns. This information can be used to inform
public health policies and improve healthcare delivery (Figure 8.2).

Figure 8.2. Health data analytics.

Source: Teniola Fatunmbi, Creative Commons License.

8.2.2. Examples of Healthcare Databases


8.2.2.1. Epic
Epic is a healthcare database system used by hospitals and
healthcare providers. It provides features such as patient record
management, billing and reimbursement management, and health
analytics.

8.2.2.2. Cerner
Cerner is a healthcare database system used for patient record
management, clinical decision support, and population health
management (Tomar et al., 2019). It provides features such as
electronic health records, clinical documentation, and revenue
cycle management.

8.2.2.3. Meditech
Meditech is a healthcare database system used for patient
record management, clinical decision support, and billing and
reimbursement management. It provides features such as electronic
health records, pharmacy management, and laboratory information
management.
Overall, databases play a critical role in managing healthcare
data and enabling effective healthcare delivery. They provide a
means of storing, analyzing, and accessing healthcare information,
enabling healthcare providers to make informed decisions and
improve patient outcomes.

8.3. BANKING AND FINANCE
The banking and finance industry generates and manages a vast
amount of data, including transaction data, customer information,
and financial market data (Ergungor, 2004). Database systems
play a critical role in managing this data and ensuring that it is
stored securely, accurately, and efficiently.

KEYWORD
Financial institution is a company engaged in the business of
dealing with financial and monetary transactions such as deposits,
loans, investments, and currency exchange.

8.3.1. Importance of Databases in Financial Management
Some ways in which databases are important in financial
management are discussed in subsections.

8.3.1.1. Transaction Management


Databases are used to store and manage transaction data, including
deposits, withdrawals, and transfers. This data is critical for financial
institutions to manage customer accounts and prevent fraudulent
transactions.
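The core guarantee behind transaction management is atomicity: a transfer either updates both accounts or neither. The sketch below shows this with Python's built-in `sqlite3` module. The `accounts` table, the function names, and the choice to store balances as integer cents are assumptions for illustration; the sketch does not validate that the amount is positive.

```python
import sqlite3


def open_bank(balances):
    """Create an in-memory accounts table from a name -> cents mapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", list(balances.items()))
    conn.commit()
    return conn


def transfer(conn, source, target, cents):
    """Move money atomically; return False and change nothing on failure."""
    try:
        with conn:  # one transaction: commit both updates or neither
            for delta, name in ((cents, target), (-cents, source)):
                cur = conn.execute(
                    "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                    (delta, name),
                )
                if cur.rowcount != 1:
                    raise sqlite3.IntegrityError(f"unknown account: {name}")
    except sqlite3.IntegrityError:
        return False  # overdraft (CHECK constraint) or unknown account
    return True


def balance(conn, name):
    """Return the current balance of one account, in cents."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    bank = open_bank({"alice": 5000, "bob": 1000})
    print(transfer(bank, "alice", "bob", 2000), balance(bank, "alice"))
```

In the failing cases the credit to the target account is applied first and then undone by the rollback, so no money appears or disappears even though the error is detected halfway through.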

8.3.1.2. Customer Relationship Management (CRM)


Databases are used to store customer information, including
contact information, transaction history, and credit scores (Uhde
& Heimeshoff, 2009). This information is used to develop customer
profiles and target marketing efforts (Figure 8.3).


Figure 8.3. Customer relationship management and its components.

Source: Perfect View CRM, Creative Commons License.

8.3.1.3. Risk Management


Databases are used to store and analyze financial market data,
enabling financial institutions to manage risks and make informed
investment decisions.

8.3.1.4. Compliance
Databases are used to store and manage regulatory data, including
reporting requirements and compliance with anti-money laundering
regulations (Biancone et al., 2020).

8.3.2. Examples of Financial Databases


Bloomberg Terminal: The Bloomberg Terminal is a financial database
used by financial institutions to access market data, news, and
analytics (Delis, 2012). It provides features such as real-time market
data, portfolio management, and financial news.

8.3.2.1. Oracle Financials


Oracle Financials is a financial database used by large corporations
to manage financial data (Kroszner et al., 2007). It provides
features such as general ledger management, accounts receivable
management, and financial reporting.

8.3.2.2. QuickBooks
QuickBooks is a financial database used by small businesses to
manage financial data (Allen & Rai, 1996). It provides features
such as invoicing, expense tracking, and financial reporting.

8.3.2.3. FinFolio
FinFolio is a portfolio management database used by financial
advisors to manage client portfolios (Barrell et al., 2010). It provides
features such as performance reporting, rebalancing, and billing.
Overall, databases play a critical role in financial management,
enabling financial institutions to manage data efficiently and make
informed decisions based on data analysis. They provide a means
of storing, analyzing, and accessing financial information, ensuring
that financial institutions can operate effectively and provide
high-quality services to their customers.

8.4. SOCIAL MEDIA
Social media platforms generate enormous amounts of data on
user behavior, preferences, and interactions. These platforms rely
heavily on database systems to store, process, and manage this
data.

Databases play a crucial role in managing and analyzing large
amounts of data generated by social media platforms, allowing for
effective user targeting and personalized advertising.

8.4.1. Role of Databases in Social Media Platforms
Some ways in which databases are important in social media
platforms are discussed in the following subsections.

8.4.1.1. User Profile Management


Databases are used to store user profile data, including personal
information, social connections, and preferences (Benitez et al.,
2020). This data is used to personalize the user experience and
target advertising.


8.4.1.2. Content Management System


Databases are used to store user-generated content, including
posts, photos, and videos. This content is what keeps users
engaged with the platform (Figure 8.4).

Figure 8.4. Content management system and its components.

Source: Pieter Arntz, Creative Commons License.

8.4.1.3. Analytics
Databases are used to store and analyze user data, enabling
insights into user behavior and preferences (Fair & Wesslen,
2019). This information is used to improve the user experience
and target advertising.

8.4.1.4. Advertising
Databases are used to store and manage advertising data,
including targeting criteria and ad performance. This data is used
to target advertising to the most relevant audience and optimize
ad performance.
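
As a rough illustration of how targeting criteria and ad performance might sit in a database, the sketch below uses Python's built-in sqlite3 module. The ads and impressions tables, their columns, and the sample rows are hypothetical. Real advertising systems use far richer criteria.

```python
import sqlite3

# Hypothetical schema: each ad targets one interest and an age range;
# each impression records whether the ad was clicked (1) or not (0).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ads (
    id INTEGER PRIMARY KEY,
    interest TEXT NOT NULL,
    min_age INTEGER NOT NULL,
    max_age INTEGER NOT NULL
);
CREATE TABLE impressions (
    ad_id INTEGER NOT NULL REFERENCES ads(id),
    clicked INTEGER NOT NULL
);
INSERT INTO ads VALUES (1, 'cycling', 18, 35), (2, 'cooking', 25, 60);
INSERT INTO impressions VALUES
    (1, 1), (1, 0), (1, 0), (1, 1),
    (2, 0), (2, 1), (2, 0), (2, 0);
""")


def matching_ads(conn, age, interests):
    """Return ids of ads whose targeting criteria fit this user."""
    if not interests:
        return []
    # Only the number of placeholders is interpolated; values stay parameterized.
    placeholders = ",".join("?" for _ in interests)
    rows = conn.execute(
        "SELECT id FROM ads"
        " WHERE ? BETWEEN min_age AND max_age"
        f" AND interest IN ({placeholders})"
        " ORDER BY id",
        (age, *interests),
    )
    return [row[0] for row in rows]


def click_through_rates(conn):
    """Return {ad_id: fraction of impressions that were clicked}."""
    rows = conn.execute(
        "SELECT ad_id, AVG(clicked) FROM impressions GROUP BY ad_id ORDER BY ad_id"
    )
    return dict(rows)
```

The first function covers the targeting side (which ads fit a given user), and the second covers the performance side (click-through rate per ad), which is the figure an optimizer would compare across ads.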

8.4.2. Examples of Social Media Databases


8.4.2.1. Facebook Graph API
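The Facebook Graph API is the HTTP-based interface that Facebook
exposes for reading and writing data in its social graph (Ghezzi et
al., 2016). It is an access layer in front of the platform's databases
rather than a database system itself, and it gives applications
access to user profiles, content, and analytics.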


8.4.2.2. Twitter API

The Twitter API is the programmatic interface that Twitter exposes
over its user data. It is an access layer in front of the platform's
databases rather than a database system itself, and it gives
applications access to user profiles, content, and analytics.

KEYWORD
Profile management is the process of managing user
personalization settings on a device.

8.4.2.3. Instagram API
The Instagram API is the programmatic interface that Instagram
exposes over its user data (Carah, 2014). It is an access layer in
front of the platform's databases rather than a database system
itself, and it gives applications access to user profiles, content,
and analytics.

8.4.2.4. LinkedIn API


The LinkedIn API is the programmatic interface that LinkedIn
exposes over its user data (Elena, 2016). It is an access layer in
front of the platform's databases rather than a database system
itself, and it gives applications access to user profiles, content,
and analytics.
Overall, databases play a critical role in social media platforms,
enabling these platforms to manage user data efficiently and provide
a personalized user experience. They provide a means of storing,
analyzing, and accessing social media information, ensuring that
these platforms can operate effectively and provide high-quality
services to their users.

8.5. EDUCATION
Database systems play an important role in the management of
educational data, including student records, course schedules, and
academic performance.

8.5.1. Importance of Databases in Education


Some ways in which databases are important in educational data
management are discussed in the following subsections.

8.5.1.1. Student Record Management


Databases are used to store and manage student records, including
personal information, course history, and academic performance
(Wang et al., 2016). This data is used to track student progress
and ensure compliance with regulations.


8.5.1.2. Course Scheduling


Databases are used to manage course schedules, including course
offerings, class sizes, and instructor assignments. This data is
used to optimize course schedules and ensure that courses are
offered efficiently.

8.5.1.3. Academic Performance Tracking


Databases are used to track student academic performance,
including grades, attendance, and participation. This data is used
to identify struggling students and provide targeted interventions.

KEYWORD
Academic performance is the measurement of student
achievement across various academic subjects.

8.5.1.4. Research and Analytics
Databases are used to store and analyze educational data, enabling
insights into student performance, teaching effectiveness, and
educational outcomes (Rupp-Serrano & Robbins, 2013).
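
The performance tracking described in Section 8.5.1.3 can be sketched as a query over grade and attendance tables. The example below uses Python's built-in sqlite3 module. The schema, the sample students, and the two thresholds (an average score of 60 and an attendance rate of 75%) are assumptions made for illustration, not values taken from any real student information system.

```python
import sqlite3

# Hypothetical schema and sample data for three students.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE grades (
    student_id INTEGER REFERENCES students(id),
    course TEXT NOT NULL,
    score REAL NOT NULL
);
CREATE TABLE attendance (
    student_id INTEGER REFERENCES students(id),
    sessions_held INTEGER NOT NULL,
    sessions_attended INTEGER NOT NULL
);
INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Caro');
INSERT INTO grades VALUES
    (1, 'Math', 82), (1, 'History', 90),
    (2, 'Math', 55), (2, 'History', 61),
    (3, 'Math', 78), (3, 'History', 74);
INSERT INTO attendance VALUES (1, 40, 38), (2, 40, 36), (3, 40, 24);
""")


def struggling_students(conn, min_avg_score=60.0, min_attendance=0.75):
    """Return (name, avg_score, attendance_rate) for students below either threshold."""
    return conn.execute(
        """
        SELECT s.name,
               g.avg_score,
               1.0 * a.sessions_attended / a.sessions_held AS attendance_rate
        FROM students AS s
        JOIN (SELECT student_id, AVG(score) AS avg_score
              FROM grades GROUP BY student_id) AS g ON g.student_id = s.id
        JOIN attendance AS a ON a.student_id = s.id
        WHERE g.avg_score < ?
           OR 1.0 * a.sessions_attended / a.sessions_held < ?
        ORDER BY s.name
        """,
        (min_avg_score, min_attendance),
    ).fetchall()
```

Grades are averaged in a subquery before joining so that the attendance row is not counted once per course. With the sample data, Ben is flagged for a low average score and Caro for low attendance.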

8.5.2. Examples of Educational Databases


8.5.2.1. PowerSchool
PowerSchool is an educational database used by K-12 schools to
manage student records and academic performance (Torani et al.,
2019). It provides features such as student record management,
course scheduling, and academic performance tracking.

8.5.2.2. Blackboard
Blackboard is an educational database used by colleges and
universities to manage course content, student engagement,
and academic performance. It provides features such as course
management, student engagement tracking, and academic
performance tracking.

8.5.2.3. Moodle
Moodle is an open-source educational database used by schools,
colleges, and universities to manage course content, student
engagement, and academic performance (Morreale et al., 2017). It
provides features such as course management, student engagement
tracking, and academic performance tracking.


8.5.2.4. Edmodo
Edmodo is an educational database used by K-12 schools to
manage student engagement, course content, and academic
performance (Cook et al., 2010). It provides features such as
student engagement tracking, course management, and academic
performance tracking.
Overall, databases play a critical role in educational data
management, enabling educational institutions to manage data
efficiently and make informed decisions based on data analysis.
They provide a means of storing, analyzing, and accessing
educational information, ensuring that educational institutions can
operate effectively and provide high-quality educational services
to their students.

8.6. LOGISTICS AND SUPPLY CHAIN MANAGEMENT
Database systems play an important role in the management of
logistics and supply chain data, including inventory, shipping, and
transportation information (Figure 8.5).

Did you Know?
According to a study by Deloitte, effective supply chain
management can lead to a 20% reduction in supply chain costs
and a 15% increase in customer satisfaction.

Figure 8.5. Various functions of logistics.

Source: Yulia Miashkova, Creative Commons License.

8.6.1. Role of Databases in Logistics and Supply Chain Management
Some ways in which databases are important in logistics and
supply chain management are discussed in the following subsections.

8.6.1.1. Inventory Management


Databases are used to track and monitor stock levels and to
forecast demand (Lummus et al., 2001). This data is used
to optimize inventory management and ensure that products are
available when needed.

KEYWORD
Inventory management refers to the process of ordering, storing,
using, and selling a company's inventory.

8.6.1.2. Shipping and Tracking
Databases are used to manage shipping and tracking information,
including carrier information, shipping status, and delivery dates.
This data is used to optimize shipping processes and ensure that
products are delivered on time.

8.6.1.3. Transportation Management


Databases are used to manage transportation information, including
vehicle assignments, driver schedules, and delivery routes (Mangan
& Lalwani, 2016). This data is used to optimize transportation
processes and ensure that products are delivered efficiently.

8.6.1.4. Analytics
Databases are used to store and analyze logistics and supply
chain data, enabling insights into performance, efficiency, and
cost-effectiveness.
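
The inventory tracking and demand forecasting described in Section 8.6.1.1 can be sketched with two small queries. The example uses Python's built-in sqlite3 module. The products and daily_sales tables, the SKUs, and the reorder points are hypothetical. The forecast is deliberately naive (average daily sales multiplied by the horizon) and stands in for whatever model a real system would use.

```python
import sqlite3

# Hypothetical schema: current stock per product plus a short sales history.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    sku TEXT PRIMARY KEY,
    on_hand INTEGER NOT NULL,
    reorder_point INTEGER NOT NULL
);
CREATE TABLE daily_sales (
    sku TEXT REFERENCES products(sku),
    day TEXT NOT NULL,
    units INTEGER NOT NULL
);
INSERT INTO products VALUES
    ('BOLT-10', 120, 50), ('NUT-10', 30, 40), ('WASHER-5', 200, 60);
INSERT INTO daily_sales VALUES
    ('BOLT-10', '2024-01-01', 10), ('BOLT-10', '2024-01-02', 14),
    ('BOLT-10', '2024-01-03', 12),
    ('NUT-10', '2024-01-01', 5), ('NUT-10', '2024-01-02', 7),
    ('NUT-10', '2024-01-03', 6);
""")


def items_to_reorder(conn):
    """Return SKUs whose stock has fallen to or below the reorder point."""
    rows = conn.execute(
        "SELECT sku FROM products WHERE on_hand <= reorder_point ORDER BY sku"
    )
    return [row[0] for row in rows]


def forecast_demand(conn, sku, horizon_days):
    """Naive forecast: average daily sales times the horizon, or None without history."""
    (avg_units,) = conn.execute(
        "SELECT AVG(units) FROM daily_sales WHERE sku = ?", (sku,)
    ).fetchone()
    return None if avg_units is None else avg_units * horizon_days
```

With the sample rows, only NUT-10 is at or below its reorder point, and a product with no sales history returns None rather than a misleading zero.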

8.6.2. Examples of Logistics and Supply Chain Databases

8.6.2.1. SAP Supply Chain Management
SAP Supply Chain Management is a database system used by
businesses to manage inventory, logistics, and transportation
information (Christopher, 2016). It provides features such as
inventory management, shipping and tracking, transportation
management, and analytics.

8.6.2.2. Oracle Supply Chain Management


Oracle Supply Chain Management is a database system used by
businesses to manage logistics and supply chain information. It
provides features such as inventory management, shipping and
tracking, transportation management, and analytics.


8.6.2.3. JDA Supply Chain Management


JDA Supply Chain Management is a database system used by
businesses to manage logistics and supply chain information
(Frazelle, 2002). It provides features such as inventory management,
shipping and tracking, transportation management, and analytics.

8.6.2.4. Manhattan Associates Supply Chain Management

Manhattan Associates Supply Chain Management is a database
system used by businesses to manage logistics and supply chain
information (Larson & Halldorsson, 2004). It provides features such
as inventory management, shipping and tracking, transportation
management, and analytics.

KEYWORD
Supply chain management is the management of the flow of
goods and services and includes all processes that transform raw
materials into final products.

Overall, databases play a critical role in logistics and supply
chain management, enabling businesses to manage data efficiently
and make informed decisions based on data analysis. They provide
a means of storing, analyzing, and accessing logistics and supply
chain information, ensuring that businesses can operate effectively
and provide high-quality products and services to their customers.

8.7. INTERNET OF THINGS (IOT)


The Internet of Things (IoT) is a rapidly growing network of
interconnected devices that generate and transmit vast amounts
of data (Madakam et al., 2015). Database systems play a crucial
role in managing and analyzing this data, as it is often structured
and requires complex queries and analysis.

8.7.1. Role of Databases in IoT Data


Some ways in which databases are used in managing and analyzing
IoT data are discussed in the following subsections.

8.7.1.1. Data Storage


IoT generates massive amounts of data that must be stored and
managed. Database systems provide a reliable and efficient way
to store this data, ensuring that it can be easily accessed and
queried when needed.
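
One way to picture this kind of storage is a time-stamped readings table whose key matches the most common lookup (one device, one metric, a time range). The sketch below uses Python's built-in sqlite3 module purely for illustration. The readings table, the device names, and the helper functions are hypothetical, and a production IoT deployment would typically use a distributed store such as those listed in Section 8.7.2.

```python
import sqlite3

# Hypothetical schema: the composite primary key doubles as an index for
# "one device, one metric, a time range" lookups.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (
    device_id TEXT NOT NULL,
    recorded_at INTEGER NOT NULL,  -- Unix timestamp in seconds
    metric TEXT NOT NULL,
    value REAL NOT NULL,
    PRIMARY KEY (device_id, metric, recorded_at)
);
""")


def store_readings(conn, rows):
    """Insert (device_id, recorded_at, metric, value) tuples in one transaction."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO readings"
            " (device_id, recorded_at, metric, value) VALUES (?, ?, ?, ?)",
            rows,
        )


def readings_between(conn, device_id, metric, start, end):
    """Return (recorded_at, value) pairs for one device and metric, oldest first."""
    return conn.execute(
        "SELECT recorded_at, value FROM readings"
        " WHERE device_id = ? AND metric = ? AND recorded_at BETWEEN ? AND ?"
        " ORDER BY recorded_at",
        (device_id, metric, start, end),
    ).fetchall()


store_readings(conn, [
    ("sensor-a", 1_700_000_000, "temp_c", 21.5),
    ("sensor-a", 1_700_000_060, "temp_c", 21.7),
    ("sensor-a", 1_700_000_120, "temp_c", 22.0),
    ("sensor-b", 1_700_000_060, "temp_c", 19.2),
])
```

Batching inserts inside one transaction keeps write overhead low, and INSERT OR REPLACE makes a re-sent reading overwrite the earlier copy instead of failing on the key.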


8.7.1.2. Data Analysis


IoT data is often complex and requires sophisticated analysis
techniques (Farooq et al., 2015). Database systems provide powerful
tools for analyzing this data, including complex queries, data mining
(DM), and machine learning algorithms.
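
A very small instance of such analysis is an aggregate query that summarizes each device and then flags readings that sit far from that device's own average. The sketch below uses Python's built-in sqlite3 module. The table, the device names, and the fixed deviation threshold are assumptions for illustration. It is a simple rule and does not stand in for the data mining or machine learning techniques mentioned above.

```python
import sqlite3

# Hypothetical sample data: two pumps reporting one numeric value per tick.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (device_id TEXT, recorded_at INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("pump-1", 1, 40.0), ("pump-1", 2, 41.0),
        ("pump-1", 3, 39.0), ("pump-1", 4, 60.0),
        ("pump-2", 1, 55.0), ("pump-2", 2, 56.0), ("pump-2", 3, 54.0),
    ],
)
conn.commit()


def device_summary(conn):
    """Return (device_id, count, min, max, mean) for every device."""
    return conn.execute(
        "SELECT device_id, COUNT(*), MIN(value), MAX(value), AVG(value)"
        " FROM readings GROUP BY device_id ORDER BY device_id"
    ).fetchall()


def unusual_readings(conn, max_deviation):
    """Return readings further than max_deviation from their device's mean."""
    return conn.execute(
        """
        SELECT r.device_id, r.recorded_at, r.value
        FROM readings AS r
        JOIN (SELECT device_id, AVG(value) AS mean_value
              FROM readings GROUP BY device_id) AS m
          ON m.device_id = r.device_id
        WHERE ABS(r.value - m.mean_value) > ?
        ORDER BY r.device_id, r.recorded_at
        """,
        (max_deviation,),
    ).fetchall()
```

With the sample data, pump-1 averages 45.0, so a threshold of 10 flags only its fourth reading (60.0).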

8.7.1.3. Data Integration


IoT data is often generated by a wide range of devices and sensors,
which may use different data formats and protocols. Database
systems provide a way to integrate this data into a common format,
allowing for easy analysis and interpretation.
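
The integration step can be sketched as a pair of converters that map device-specific payloads onto one common record type before the data is stored. The two payload formats below (a JSON message in Celsius and CSV rows in Fahrenheit) and every field name are hypothetical, invented only to show the idea.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Common format that every payload is converted into."""
    device_id: str
    recorded_at: int      # Unix timestamp in seconds
    temperature_c: float  # always Celsius after conversion


def from_json_payload(payload: str) -> Reading:
    """Hypothetical vendor A: one JSON object per message, already in Celsius."""
    data = json.loads(payload)
    return Reading(data["id"], int(data["ts"]), float(data["temp_c"]))


def from_csv_payload(payload: str) -> list[Reading]:
    """Hypothetical vendor B: CSV rows 'device,timestamp,temp_f' in Fahrenheit."""
    rows = csv.reader(io.StringIO(payload))
    return [
        Reading(device, int(ts), round((float(temp_f) - 32) * 5 / 9, 2))
        for device, ts, temp_f in rows
    ]


# Both sources end up in one list with identical fields and units.
unified = [from_json_payload('{"id": "a1", "ts": 100, "temp_c": 21.5}')]
unified += from_csv_payload("b7,100,68.0\nb7,160,71.6\n")
```

Once every source has been reduced to the same fields and units, the records can be loaded into a single table and queried together.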

8.7.1.4. Real-Time Data Processing


IoT data is often generated in real-time, requiring database systems
to be able to process and analyze data as it is generated (Gubbi
et al., 2013).

Remember
Imagine you are tasked with improving the inventory management
system for a large retail company. Describe how you would use a
database system to track inventory levels, optimize product
placement, and manage supply chain logistics.

8.7.2. Examples of IoT Databases

8.7.2.1. Apache Cassandra
Apache Cassandra is a distributed NoSQL database that is designed
for scalability and high availability. It is well-suited for managing
large volumes of structured and unstructured data, including IoT
data.

8.7.2.2. MongoDB
MongoDB is a document-oriented NoSQL database that is
designed for flexibility and scalability. It provides powerful tools
for managing and analyzing complex data, making it well-suited
for IoT applications.

8.7.2.3. Amazon DynamoDB


Amazon DynamoDB is a fully managed NoSQL database that is
designed for high performance and scalability (Hossein Motlagh
et al., 2020). It provides a flexible data model and powerful tools
for managing and analyzing IoT data (Figure 8.6).


Figure 8.6. Using Amazon DynamoDB document API with the AWS
mobile SDK for Android.

Source: Adrian Hall, Creative Commons License.

8.7.2.4. Microsoft Azure Cosmos DB


Microsoft Azure Cosmos DB is a globally distributed NoSQL database
that is designed for high availability and low latency (Lee & Lee,
2015). It provides powerful tools for managing and analyzing IoT
data, including support for multiple data models and APIs.
Overall, database systems are essential for managing and
analyzing the vast amounts of data generated by IoT devices.
As the number of IoT devices continues to grow, the demand for
powerful and flexible database systems will only increase.


8.8. SUMMARY
This chapter provides a brief overview of database systems, defining them as software
systems that manage and store large amounts of data. The importance of database
systems in various industries, such as e-commerce, healthcare, finance, social media,
education, and logistics, is highlighted. Database systems play a critical role in managing
product catalogs, customer orders, patient records, financial transactions, social media
data, student records, inventory, and delivery routes. The chapter also discusses the future
applications of database systems, particularly in managing and analyzing data from the
Internet of Things (IoT). The role of databases in storing, processing, and analyzing IoT
data to extract valuable insights is emphasized, along with examples of IoT databases
and their features.

REVIEW QUESTIONS
1. What is a database system and why is it important?
2. How do database systems play a critical role in managing data in various
industries?
3. What are some examples of data that are managed by database systems?
4. What is the future application of database systems and how do they relate to
IoT?
5. How do databases store, process, and analyze data from IoT devices?

MULTIPLE CHOICE QUESTIONS


1. What is a database system?
a. A device used to connect to the internet.
b. A software system that manages and stores large amounts of data.
c. A type of computer processor.
d. A tool used to create web applications.
2. What is the role of databases in healthcare?
a. To manage financial transactions.
b. To manage social media data.
c. To manage patient records.
d. To manage inventory.
3. Which industries use database systems?
a. E-commerce, healthcare, finance, social media, education, and logistics.
b. Construction, automotive, and retail.
c. Music, fashion, and art.
d. Food, travel, and entertainment.


4. What is the future application of database systems?
a. Managing and analyzing data from the Internet of Things (IoT).
b. Creating virtual reality environments.
c. Building chatbots for customer service.
d. Designing websites.
5. What is the role of databases in IoT?
a. Sending and receiving messages from social media.
b. Managing employee schedules.
c. Managing financial transactions.
d. Storing, processing, and analyzing IoT data.
6. What are some examples of IoT databases?
a. Fitbit, Nest, and Philips Hue.
b. MongoDB, MySQL, and Oracle.
c. Facebook, Twitter, and Instagram.
d. Amazon, Google, and Microsoft.

Answers to Multiple Choice Questions


1. (b); 2. (c); 3. (a); 4. (a); 5. (d); 6. (b)

REFERENCES
1. Allen, L., & Rai, A., (1996). Operational efficiency in banking: An international
comparison. Journal of Banking & Finance, 20(4), 655–672.
2. Barnes, S. J., & Vidgen, R. T., (2002). An integrative approach to the assessment
of e-commerce quality. J. Electron. Commer. Res., 3(3), 114–127.
3. Barrell, R., Davis, E. P., Karim, D., & Liadze, I., (2010). Bank regulation, property
prices and early warning systems for banking crises in OECD countries. Journal of
Banking & Finance, 34(9), 2255–2264.
4. Benitez, J., Ruiz, L., Castillo, A., & Llorens, J., (2020). How corporate social
responsibility activities influence employer reputation: The role of social media
capability. Decision Support Systems, 129(1), 113223.
5. Biancone, P. P., Saiti, B., Petricean, D., & Chmet, F., (2020). The bibliometric
analysis of Islamic banking and finance. Journal of Islamic Accounting and Business
Research, 11(9), 2069–2086.
6. Carah, N., (2014). Curators of databases: Circulating images, managing attention
and making value on social media. Media International Australia, 150(1), 137–142.

7. Christopher, M., (2016). Logistics & Supply Chain Management (Vol. 2, No. 1, pp.
4–7). Pearson UK.
8. Cook, D. A., Andriole, D. A., Durning, S. J., Roberts, N. K., & Triola, M. M., (2010).
Longitudinal research databases in medical education: Facilitating the study of
educational outcomes over time and across institutions. Academic Medicine, 85(8),
1340–1346.
9. Delis, M. D., (2012). Bank competition, financial reform, and institutions: The importance
of being developed. Journal of Development Economics, 97(2), 450–465.
10. Eastin, M. S., (2002). Diffusion of e-commerce: An analysis of the adoption of four
e-commerce activities. Telematics and Informatics, 19(3), 251–267.
11. Efendioglu, A. M., & Yip, V. F., (2004). Chinese culture and e-commerce: An exploratory
study. Interacting with Computers, 16(1), 45–62.
12. Elena, C. A., (2016). Social media–a strategy in developing customer relationship
management. Procedia Economics and Finance, 39, 785–790.
13. Ergungor, O. E., (2004). Market-vs. bank-based financial systems: Do rights and
regulations really matter? Journal of Banking & Finance, 28(12), 2869–2887.
14. Fair, G., & Wesslen, R., (2019). Shouting into the void: A database of the alternative
social media platform gab. In: Proceedings of the International AAAI Conference on
Web and Social Media (Vol. 13, pp. 608–610).
15. Farooq, M. U., Waseem, M., Mazhar, S., Khairi, A., & Kamal, T., (2015). A review on
internet of things (IoT). International Journal of Computer Applications, 113(1), 1–7.
16. Frazelle, E., (2002). Supply Chain Strategy: The Logistics of Supply Chain Management
(Vol. 3, No. 2, pp. 7–9). McGraw-Hill Education.
17. Ghezzi, A., Gastaldi, L., Lettieri, E., Martini, A., & Corso, M., (2016). A role for
startups in unleashing the disruptive power of social media. International Journal of
Information Management, 36(6), 1152–1159.
18. Gubbi, J., Buyya, R., Marusic, S., & Palaniswami, M., (2013). Internet of things (IoT):
A vision, architectural elements, and future directions. Future Generation Computer
Systems, 29(7), 1645–1660.
19. Gupta, A., (2014). E-Commerce: Role of e-commerce in today’s business. International
Journal of Computing and Corporate Research, 4(1), 1–8.
20. Hossein Motlagh, N., Mohammadrezaei, M., Hunt, J., & Zakeri, B., (2020). Internet of
things (IoT) and the energy sector. Energies, 13(2), 494.
21. Jain, V. I., Malviya, B. I., & Arya, S. A., (2021). An overview of electronic commerce
(e-Commerce). Journal of Contemporary Issues in Business and Government, 27(3),
666.
22. Kifer, M., (2007). Database Systems: An Application-Oriented Approach, Introductory
Version, 2/E (Vol. 4, No. 1, pp. 2–6). Pearson Education India.
23. Kroszner, R. S., Laeven, L., & Klingebiel, D., (2007). Banking crises, financial
dependence, and growth. Journal of Financial Economics, 84(1), 187–228.

24. Larson, P. D., & Halldorsson, A., (2004). Logistics versus supply chain management:
An international survey. International Journal of Logistics: Research and Applications,
7(1), 17–31.
25. Lee, I., & Lee, K., (2015). The internet of things (IoT): Applications, investments,
and challenges for enterprises. Business Horizons, 58(4), 431–440.
26. Lummus, R. R., Krumwiede, D. W., & Vokurka, R. J., (2001). The relationship of
logistics to supply chain management: Developing a common industry definition.
Industrial Management & Data Systems, 101(8), 426–432.
27. Madakam, S., Ramaswamy, R., & Tripathi, S., (2015). Internet of things (IoT): A
literature review. Journal of Computer and Communications, 3(5), 164.
28. Mangan, J., & Lalwani, C., (2016). Global Logistics and Supply Chain Management
(Vol. 4, No. 1, pp. 7–10). John Wiley & Sons.
29. Morreale, S. P., Valenzano, J. M., & Bauer, J. A., (2017). Why communication
education is important: A third study on the centrality of the discipline’s content and
pedagogy. Communication Education, 66(4), 402–422.
30. Özsu, M. T., & Valduriez, P., (1996). Distributed and parallel database systems. ACM
Computing Surveys (CSUR), 28(1), 125–128.
31. Park, J. S., & Lee, C. H., (2021). Clinical study using healthcare claims database.
Journal of Rheumatic Diseases, 28(3), 119–125.
32. Rayport, J. F., & Jaworski, B. J., (2004). Introduction to e-commerce. McGraw-Hill
Irwin Marketspace, 5(1), 1–5.
33. Rupp-Serrano, K., & Robbins, S., (2013). Information-seeking habits of education
faculty. College & Research Libraries, 74(2), 131–142.
34. Singh, S. K., (2009). Database Systems: Concepts, Design and Applications (Vol.
2, No. 1, pp. 6–9). Pearson Education India.
35. Taher, G., (2021). E-commerce: Advantages and limitations. International Journal
of Academic Research in Accounting Finance and Management Sciences, 11(1),
153–165.
36. Tomar, D., Bhati, J. P., Tomar, P., & Kaur, G., (2019). Migration of healthcare relational
database to NoSQL cloud database for healthcare analytics and management. In:
Healthcare Data Analytics and Management (Vol. 1, pp. 59–87). Academic Press.
37. Torani, S., Majd, P. M., Maroufi, S. S., Dowlati, M., & Sheikhi, R. A., (2019). The
importance of education on disasters and emergencies: A review article. Journal of
Education and Health Promotion, 8(1), 5–8.
38. Uhde, A., & Heimeshoff, U., (2009). Consolidation in banking and financial stability
in Europe: Empirical evidence. Journal of Banking & Finance, 33(7), 1299–1311.
39. Wang, G., Li, X., & Wang, Z., (2016). APD3: The antimicrobial peptide database as
a tool for research and education. Nucleic Acids Research, 44(D1), D1087–D1093.

CHAPTER
8
INDEX

A audio 4
academic performance 235, 236, 237 auditing 22
academic performance tracking 236, 237 Availability 194, 195
accessibility 4, 16 B
account information 54
accounts receivable management 233 bank details 115
ACID (atomicity, consistency, isolation, durabil- banking 52, 53, 54, 93
ity) 193 banking systems 6
Adding attributes 117 Big data 5
administrative data 55 big data analytics 7
aggregated data 196 big data applications 50, 91, 102
agreement 33, 36 big data challenge 188, 189
airline reservation systems 60, 61 big data sets 185
Analytical data 186 Big Data technologies 189, 192
analytical endeavors 203 billing 229, 230, 231, 233
analytics 225, 228, 230, 232, 234, 235, 238, billing information 55, 63
239, 245 binary files 4
analyze customer behavior 4 Blackboard 236
Apache Spark 191 Bloomberg Terminal 232
application 34, 35, 39 blueprint 120
application serving 2 Boolean 170, 179
architecture 2, 3, 19, 24, 25, 28 broadcasting 191
artificial intelligence 5, 23 bulk filling 42
Artists table 168 bulk load 42
ASP 174 Business 203, 204, 208, 216, 221, 222, 223
Association entities 121 business data administration 40
Attributes 126, 127, 128, 140, 142, 143, 144, business intelligence (BI) 203, 206
145, 146 business logic layer 19
248 Fundamentals of Database Systems

C content management systems (CMS) 50, 77


corporate intelligence 203
C++. 49
cost 50, 90, 93, 102
call records 55
course management 236, 237
careful planning 214
course schedules 235, 236
carrier information 238
credit scores 231
Cassandra 6
currency 179
catalog 41, 42
current automated systems 123
CD-ROM drive 176, 177, 178
customer data 9, 52, 54, 55, 77
Cerner 230
customer dissatisfaction 227
Channel analysis 204
customer experience 9
child entity 130, 143, 144, 149, 150
customer information 227, 231
citizen data 55, 70
customer relationship management (CRM) 215
classes 49, 103
customers 62, 63, 66, 67, 70, 84
client-server 174, 176, 177, 178
Cycle model 214
client-server architecture 2, 19, 24
Client-server programming 172 D
clinical documentation 230
Data 1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 15, 17, 18,
cloud-based e-commerce platform 228
22, 24, 25, 26, 27, 28, 29
cloud computing 7
data analysis 36
Code values 126
database 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
Command 164, 177
14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
comma separated values (CSV) 188
26, 27, 28, 29
communication 36
Database administrators (DBAs) 171
completeness 37
database constraints 5
Complex relations 138
Database design 117
complex software systems 9
Database development 34
compliance 9, 22, 29
database management system (DBMS) 1, 28
computer 4, 5, 14, 15
database serving 2
computer-aided design (CAD) systems 82
Database systems 1, 2, 7, 8, 9, 17, 18, 19, 21,
computerized database-management systems
22, 23, 27, 28
165
database technology 53, 54, 73, 74
computer program 4
Database warehouse modeling 213
computer system development 33
data basics 36
computing environment 52
data consistency 7, 8, 12
conceptual data model 36, 37, 39
data control language (DCL) 1
Conceptual planning 119
Data definition language (DDL) 3, 10
concluding system 34
data-driven decision making 225
Concurrent access 2
data entry errors 8
Consistency 194
data extraction tools 213
constraints 1, 18
Data independence 2, 3
construction 34
data integrity 2, 6, 7, 14, 15, 18, 24, 29
contact information 231
data loading procedures 213
content management 234, 235
Index 249

data management 6, 7, 8 e-commerce applications 227, 228, 229


Data manipulation language (DML) 11, 28 e-commerce websites 6
Data Mart 203 Edmodo 237
data mining (DM) 208 education 1, 4, 225, 226, 242, 244, 245
Data modeling 117, 158 educational data 235, 236, 237
Data objects 117, 127 educational outcomes 236, 244
data processing 188, 191, 192, 197, 200, 201 efficient online shopping 227
Data query language (DQL) 12 Elasticity 196, 199
data redundancy 66, 73, 166, 180 electoral campaign 189
Data requirements 36 electronic device 4, 5
data requisite document 36 electronic health records 230, 231
data samples 188 Employment 40
data sampling 188, 189 encryption 18, 22
data security 8 enterprise systems 163
data storage 1, 2, 8, 15, 17, 19, 24, 26, 28, 29 Entities 121, 126, 127, 137, 140, 143
data storage layer 19 Entity-Attribute matrix 132, 133, 134
data structure 50, 54, 55, 57, 59, 102 Entity-Entity matrix 132, 133, 134
Datatype 164, 168, 169 Entity-relationship (ER) method 118
data type restrictions 6 entity-relationship level (ERD) 212
Data warehouse (DW) 203, 205 Entity relationship model 117
date 169, 179 environments 53
decision-making 4 Epic 230
delivery dates 238 ER diagram 31, 42, 43, 44, 45
demographics 188 error resolution 191
derived attributes 128, 129, 145, 146, 159 Establishing requirements 36
descriptive information 126 execution 2, 3, 12, 16, 17, 20, 21, 24
design 205, 211, 213, 214, 222, 223 extensible markup language (XML) 174
Diagram 117, 137, 146 external programming languages 167
Digital dashboard 212 extraction, transformation, and loading (ETL)
dimensional modeling 213 209
Dimension tables 203
F
distributed architecture 2, 19, 20, 24
distributed database system 194 Facebook Graph API 234
distributed system 194 Fact table 203
diversity 189 Fault tolerance 225
Document databases 50, 94, 95, 96, 97, 98, Finance 225, 226, 231, 242, 243
99, 100, 101, 103, 104 Financial institutions 54, 56, 70
documents 6 Financial news 232
domain expertise 115 Financial reporting 233
Financial transactions 1
E
Financial transaction systems 60, 61
e-commerce 72, 73, 74, 75, 76, 77, 90, 91, 92, FinFolio 233
93, 95, 96, 97 Flexibility 37, 40
250 Fundamentals of Database Systems

flight schedules 63 indexing 2, 8, 17, 24


forecast demand 238 indexing mechanism 167
Foreign keys 117, 142 influenza 189
Fraud detection 84 infographics 188, 196
functional model 119, 122 information collection phase 42
information management system (IMS) 52
G
informed decisions 225, 229, 231, 233, 237,
gender-based evaluation 188 239
generality hierarchy 136, 147 inheritance hierarchies 49, 103
Generalization hierarchies 117, 148 initial phase 35, 38
general ledger management 233 innovation 23
geographic information systems (GIS) 81 Instagram API 235
global computing infrastructure 115 insurance 52, 53
Google 188 insurance claims 230
Google flu trends (GFT) 188 insurance industries 52
Google’s algorithm 189 integrated data store (IDS) 60
government 4 integrity 37
government systems 55 intelligent corporation 215
graph database 80, 81, 82, 83, 84, 87, 101, intelligent information 214, 216
102, 106 Internet of Things (IoT) 77, 91, 97, 226, 239,
graphic 120 242, 243
graph processing 191 Internet of Things (IoT) equipment 188
graph theory 80, 81, 86 Internet programming 171
GUI interface 170 interpersonal model 136, 138, 148
intersectional entities 121
H
Interviews 123
Hadoop 186, 189, 190, 191, 192, 198, 200 inventory items 63
Hadoop distributed file system (HDFS) 190 inventory management systems 6, 54
hard disk drives 16 invoice production 207
hard disks 2
J
healthcare 225, 226, 229, 230, 231, 242, 245
healthcare systems 6 Java 49, 104
hierarchical chart 166 JDA Supply Chain Management 239
hierarchical database model 52, 53, 58, 59, 60, job descriptions 123
61 JSON 50, 87, 93, 94, 96
Hierarchical databases 49, 52, 53, 54, 55, 56, JSP 174
57, 58, 102, 103, 104, 107
K
high-level data model 212
High-quality data 4 key-value 192
hypertext analysis 215 key-value pairs 6
I L
images 4, 50, 74, 77, 87, 91, 95, 97, 110 laboratory information management 231
implementation 214, 215, 221
Index 251

language 115, 116, 120, 157
LeadSource 168
learning 116, 155, 157, 159, 160, 161
legacy systems 60
Life cycle 32, 46
life cycle whether 31
LinkedIn 185
LinkedIn API 235
load system 204
Local area network 195
logging 22
logical design 213
Logical layout 119
logistics 225, 226, 237, 238, 239, 240, 242, 245
Lotus Approach 171
low-level physical model 212
Low-quality data 4
Lyric Music database 167, 168, 176, 177, 178

M
machine learning 5, 7, 23
Magento 228
magnetic tapes 6, 16
mainframe systems 53
Maintenance 32, 34
maintenance costs 57
Manhattan Associates Supply Chain Management 239
manufacturing 53, 54, 208, 218
Manufacturing systems 63
many-to-many association 130, 136, 137
many-to-many relationships 58, 59, 61, 64, 65
MapReduce algorithm 190
Market 204, 216, 221
marketing messages 227
medical research 229
medicine 4
Meditech 231
memoranda 123
memory 2, 16
memory allocation 191
Michigan terminal system database management system (MTSDBMS) 60
Microsoft access 164
Microsoft SQL Server 7, 171
midlevel data model 212
midlevel model 212
modern computing 225
Modern database systems 225
MongoDB 6
monitor stock levels 238
Moodle 236
multi-layered architecture 19
multimedia 4
multimedia files 4

N
network database 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 102, 105, 113
network models 1
network partitioning 194
news organizations 206
nonentity 130
non-key attributes 124, 127, 143, 144
Non-relational databases 6
NoSQL 189, 191, 192, 198, 200
NoSQL databases 50, 75, 87, 88, 89, 90, 91, 92, 93, 94, 102, 103, 104, 105, 109, 110, 111, 112, 113
NoSQL database systems 50
notations 120
numbers 4

O
object model 118
Object-oriented databases (OODBs) 73
object-oriented data structures 165
object-oriented programming (OOP) 49, 73
object-oriented programming (OOP) languages 49
object-relational databases (ORDBs) 75
objects 49, 73, 74, 75, 76, 81, 103, 105, 106, 110
Online 204, 217, 220
online analytical processing (OLAP) systems 217
online archives 188
Online retailers 69
online transaction processing (OLTP) 49, 104
open-source database language 171
operational business applications 208
Operational reports 210
optimization 2, 3, 20, 21, 24, 26, 28
Oracle 7, 164, 165, 169, 171, 173, 174, 175, 177, 178
Oracle Commerce 228
Oracle Financials 232
Oracle Supply Chain Management 238
order information 54
order management 228, 229
organizational hierarchy 52

P
parental object 130
parent-child relationships 52
parent organization 130
parsing 2
Partition Tolerance 194
passenger data 63
patient records 226, 229, 242
payment processing 229
payment records 230
performance 50, 53, 56, 57, 65, 68, 69, 70, 71, 72, 73, 78, 79, 80, 81, 82, 83, 85, 86, 87, 90, 91, 92, 93, 96, 99, 100, 101, 102, 105, 108, 112
performance reporting 233
peripheral schema 40
personal information 233, 235
personalization 228
pharmacy management 231
phone numbers 63
PHP 174
physical data model 212
Physical layout 119
portfolio management 232, 233
preliminary scheme 34
Primary keys 117, 140, 151
Primary storage 1, 16, 25
Privacy 186, 196
problem-solving 4
procedural programming languages 172
product catalog management 228
product data 54, 97
product information 227
product inventory 227
programming code 171
programming errors 171
project planning 214
prototype 213, 214
purchasing 208
Python 191, 192, 198

Q
Query 163, 164, 170, 171, 173, 176, 177, 179, 181, 182
query by example (QBE) 170
query data 163
Query execution 16
Query execution performance 17
Query optimization 16, 20
Query processing 2, 20, 26, 28
QuickBooks 233

R
R 191, 192, 199, 200, 201
rational schema 37, 40, 41
real-time analytics 50, 92, 95, 102
real-time market data 232
rebalancing 233
Recommendation engines 84
Record 164, 168, 182
Recursive associations 134
recursive link 136
Redis 6
Referential integrity 6
Refining 117, 156
reimbursement 229, 230, 231
relational algebra 66
relational database management system (RDBMS) 6
relational demonstration 37
relational model 1, 6
relational paradigm 121, 151, 161
Relational tables 119, 121
relationships 1, 5, 6
reporting 207, 208, 209, 210, 211, 212, 213, 215, 217
requirements analysis 122, 123, 124, 125
requirements investigation 123
Requirements specification 214
research 4, 26
reservations 63
revenue cycle management 230
Row 164, 168

S
safety implementation 40
sales figures 115
Saved query 171
Scala 191, 192
scalability 2, 18, 19, 24, 25, 27, 50, 53, 58, 60, 68, 72, 73, 86, 88, 89, 91, 92, 93, 96, 98, 100, 102, 111
Scalability 186, 195, 199
scalable database models 53, 54, 55, 57
Schema 117
science 4
scientific research 73, 74, 75, 76, 84, 98
scratch 120
searching 8
search phrases 189
search trends 188
Secondary storage 2, 16, 25
security 2, 13, 14, 15, 18, 24, 26, 28
semi-structured data 50, 87, 90, 93, 102, 103, 104
Server 164, 169, 174, 176, 182
Server Management Studio 41
sets 58, 81, 95
shared scheduling scheme 191
shipping status 238
Shopify 228
social connections 233
social media 225, 226, 233, 235, 242, 243, 244
social media interactions 1
social media network 185
Social media platforms 233
social media posts 50, 95
social networks 49, 81, 82, 83, 87, 88, 90, 103, 189
software development life cycle 31, 47
software engineering 31, 33, 40
software invention 31
software system 1, 15, 34
solid-state drives 2, 16
solitary entity 131, 136
source systems 203, 214, 218
Spark 186, 189, 191, 192, 198
Spark databases 189
Spark Streaming 191
specialized applications 60, 61, 62, 63, 65
spreadsheet 4
SQL commands 164, 172, 175, 176
SQL Microsoft Access 41
Stabilization 152
stakeholders 33
statistics 118
stock 227, 238
stock levels 54, 70
stored events 120
stowage schema 40
strategic decisions 4
Structured data 4
structured English query language (or SEQUEL) 173
Structured Query Language (SQL) 6, 12, 13
student engagement tracking 236, 237
student records 226, 235, 236, 242
Subject Oriented 205
succeeding phase 33
suppliers 63, 76, 84
supply chain data 237, 238
supply chain information 238, 239
supply chain management 225, 237, 239, 245
supply chain systems 50, 103
Sybase 7, 171
system development life cycle 214

T
Table 164, 167, 168, 169, 175
Target advertising 233, 234
tax records 55
teaching effectiveness 236
telecommunications systems 55, 63
terminated connection 139
text 4, 5, 50, 87, 91, 95, 97, 179
text documents 188
text files 4
theoretical data model 36
tiered architecture 2, 19, 24
Titles table 168
track inventory 4
track inventory data 54
track inventory levels 238, 240
Transactional data 186, 193, 198, 199
transaction history 231
transaction logs 16
transactions 54, 62, 67, 70, 84, 91, 100
transformation 204, 209, 214, 217
translation 2, 20
transportation information 237, 238
transportation management 238, 239
tree-like structure 49, 52, 53, 56
triggers 120
Twitter 189
Twitter API 235

U
unique values 6
Unstructured data 4
updating data 8
URL 188
usability 37, 46
user interface 19
user profile management 234, 235
users tweeting 188

V
video 4
views 171, 174
vocabulary 120
volume 187, 189, 195, 196, 197

W
warehouse access mechanisms 213
waterfall 31, 33, 34, 46, 47
waterfall cycle 31, 34
waterfall model 33, 34, 46
waterfall process 33
WebAddress 168
web applications 50, 75, 88, 89, 90, 96, 102, 103
web-based data 7
web information 215
Web programming 172
web serving 2
wide area network 195
WooCommerce 229
WordPress 229

X
XML 50, 87
Fundamentals of Database Systems
Kaitlyn Salter

The field of database systems has evolved significantly over the years, and as a result, it has become more complex. Understanding the fundamentals of database systems is the key to unlocking the full potential of databases. Database systems provide a centralized and structured approach to storing and organizing data, enabling efficient data retrieval, manipulation, and analysis. This is essential for businesses that deal with large volumes of data, as it provides a systematic way of managing data that leads to more accurate and informed decision-making. Furthermore, a solid foundation in database systems is essential for anyone involved in the development of software applications that utilize databases, as it helps to ensure data consistency and integrity. Ultimately, a good understanding of the fundamentals of database systems is essential for anyone seeking to work with databases in any capacity.

The first chapter provides an introduction to database systems, discussing the basic components, features, and advantages of database management. This chapter also covers the evolution of database systems and their growing role in modern organizations. The second chapter explores the database development process, including requirements gathering, database design, implementation, testing, and maintenance. This chapter provides a detailed guide to designing and implementing an efficient database system that meets the needs of the organization. The third chapter discusses the different types of databases, including relational, NoSQL, and graph databases, and their respective strengths and weaknesses. This chapter also covers the differences between these types of databases and their use cases. The fourth chapter focuses on database modeling, covering entity-relationship (ER) diagrams, normalization, and other techniques used to design a database schema. This chapter also covers database design principles and best practices. The fifth chapter delves into the fundamentals of relational databases, including SQL querying, constraints, and transactions. This chapter provides practical tips and techniques for managing relational data and optimizing SQL queries. The sixth chapter discusses the role of big data in database systems, including the challenges and opportunities presented by the increasing volume, velocity, and variety of data. This chapter also covers the use of distributed systems and cloud technologies for managing big data. The seventh chapter covers data warehousing and business intelligence (BI), including the design and development of data warehouses (DWs), OLAP cubes, and the use of business intelligence tools for data analysis and reporting. This chapter also covers data mining (DM) and predictive analytics techniques. The eighth chapter explores the various applications of database systems, including e-commerce, healthcare, and social media platforms. This chapter highlights the diverse and growing role of databases in modern organizations and covers the unique challenges and opportunities presented by each industry.

With real-world examples, case studies, and practical exercises, Fundamentals of Database Systems provides readers with a comprehensive understanding of database management, preparing them for success in today’s data-driven world.

About the Author

Kaitlyn Salter is an accomplished marketing professional with over a decade of experience in the industry. She currently serves as the Director of Marketing at a leading digital agency, where she oversees the development and execution of marketing strategies for a diverse range of clients. Kaitlyn's expertise spans a wide range of marketing disciplines, including digital marketing, branding, social media, content marketing, and advertising. Her extensive knowledge of industry trends and consumer behavior allows her to develop effective campaigns that drive engagement, conversions, and ROI. Prior to her current role, Kaitlyn worked as a Marketing Manager at several leading companies, where she honed her skills in brand development, campaign management, and customer acquisition. She has also worked as a freelance consultant, helping businesses of all sizes to create and execute effective marketing strategies. Kaitlyn holds a Bachelor of Science in Marketing from the University of California, Los Angeles (UCLA), where she graduated with honors. She is a member of several professional organizations, including the American Marketing Association (AMA) and the Digital Marketing Association (DMA).

ISBN 978-1-77956-170-1
Toronto Academic Press