
CS3492 – DATABASE MANAGEMENT SYSTEMS

ACADEMIC YEAR 2024-2025 (EVEN)


Department of Computer Science and Engineering

UNIT-V - ADVANCED TOPICS

Distributed Databases: Architecture, Data Storage, Transaction Processing, Query processing and
optimization – NOSQL Databases: Introduction – CAP Theorem – Document Based systems – Key
value Stores – Column Based Systems – Graph Databases. Database Security: Security issues – Access
control based on privileges – Role Based access control – SQL Injection – Statistical Database security
– Flow control – Encryption and Public Key infrastructures – Challenges

5.1 Distributed Databases: Architecture, Data Storage, Transaction Processing, Query processing
and optimization
Distributed database system
A distributed database is a database that is not limited to one system; it is spread over different sites, i.e., on multiple computers or over a network of computers. A distributed database system is located on various sites that do not share physical components. This may be required when a particular database needs to be accessed by various users globally. It must be managed such that, to the users, it looks like one single database.

Types of Distributed database


 Homogeneous Database: In a homogeneous database, all sites store the database identically. The operating system, database management system, and the data structures used are all the same at every site. Hence, such databases are easy to manage.

 Heterogeneous Database: In a heterogeneous distributed database, different sites can use different schemas and software, which can lead to problems in query processing and transactions. Also, a particular site might be completely unaware of the other sites. Different computers may use different operating systems and different database applications. They may even use different data models for the database. Hence, translations are required for different sites to communicate.


Architectures for distributed database


(i) Client-server architecture: In this architecture, clients connect to a central server, which
manages the distributed database system. The server is responsible for coordinating transactions,
managing data storage, and providing access control.

Fig. 5.1 Client-server architecture


(ii) Peer-to-peer architecture: In this architecture, each site in the distributed database system
is connected to all other sites. Each site is responsible for managing its own data and coordinating
transactions with other sites.

Fig. 5.2 Peer-to-peer architecture

(iii) Federated architecture: In this architecture, each site in the distributed database system
maintains its own independent database, but the databases are integrated through a middleware layer
that provides a common interface for accessing and querying the data.

Fig. 5.3 Federated architecture

(iv) Shared-Nothing Architecture: Data is divided up and spread among several nodes in a shared-nothing architecture, with each node in charge of a particular portion of the data. Resources are not shared across nodes, and each node runs independently. Because the system can add additional nodes as needed without affecting the current nodes, this design offers great scalability and fault tolerance.


Fig. 5.4 Shared-Nothing Architecture

Query Processing:
Query processing refers to the activities involved in executing a query: translating high-level language (e.g., SQL) queries into operations at the physical file level, applying query-optimization transformations, and actually evaluating the query.

Steps in Query Processing:


The steps involved in query processing are:
(i) Parsing and Translation (Query Compilation)
(ii) Query Optimization
(iii) Evaluation (code generation)

Fig. 5.5 Steps of Query Processing

Parsing and Translation (Query Compilation)


Query processing includes certain activities for data retrieval. Initially, the user query is expressed in a high-level database language such as SQL. It must be translated into expressions that can be used at the physical level of the file system. After this, the actual evaluation of the query and a variety of query-optimizing transformations take place. Thus, before processing a query, the system needs to translate it from its human-readable form into an internal representation. SQL is the most suitable choice for humans, but it is not perfectly suitable for the internal representation of the query within the system. Relational algebra is well suited for the internal representation of a query.


The translation process in query processing is similar to parsing a query. When a user executes a query, the parser in the system generates the internal form of the query: it checks the syntax of the query, verifies the names of the relations in the database, the tuples, and finally the required attribute values. The parser creates a tree of the query, known as a 'parse tree', and then translates it into relational algebra, replacing all uses of views appearing in the query. Suppose a user executes a query; there are various methods of extracting the data from the database. For example, in SQL, a user wants to fetch the records of the employees whose salary is greater than 10000. The query is
Select emp_name from Employee where salary > 10000;
It is translated into the relational algebra expression:
πemp_name (σsalary>10000 (Employee))
Evaluation
In addition to translating the query into relational algebra, it is necessary to annotate the translated relational algebra expression with instructions specifying how to evaluate each operation. Thus, after translating the user query, the system builds a query evaluation plan. The query execution engine is responsible for generating the output of the given query: it takes the query execution plan (query evaluation plan), executes it, and finally produces the output for the user query.

Optimization
Query optimization is the process of selecting an efficient execution plan for evaluating the query. After the query is parsed, it is passed to the query optimizer, which generates different execution plans for evaluating the parsed query and selects the plan with the least estimated cost. The catalog manager helps the optimizer choose the best plan by supplying the statistics needed to estimate the cost of each plan.

Fig. 5.6 Optimization Query processing


Query optimization is the process of transforming a query into an equivalent form that can be evaluated more efficiently. The essence of query optimization is to find an execution plan that minimizes the time needed to evaluate the query. To achieve this goal, we need to accomplish two main tasks: find the best plan and reduce the time spent executing the query plan. The optimization step transforms the initial algebraic query, using relational algebra transformations, into other equivalent algebraic queries until the best one is found.
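A minimal sketch of this plan-selection step, assuming each candidate plan already carries a cost estimate (the plan names and cost numbers below are invented; a real optimizer derives costs from catalog statistics):

# Hypothetical candidate plans with estimated costs (illustrative values).
candidate_plans = [
    {"plan": "full table scan, then filter", "estimated_cost": 980.0},
    {"plan": "index scan on salary",         "estimated_cost": 120.0},
    {"plan": "index scan, then projection",  "estimated_cost": 135.0},
]

# The optimizer picks the plan with the least estimated cost.
best = min(candidate_plans, key=lambda p: p["estimated_cost"])
print(best["plan"])   # index scan on salary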

5.2 NOSQL Databases: Introduction


NoSQL
NoSQL, originally meaning 'non-SQL' or 'non-relational', is a type of database management system (DBMS) designed to handle and store large volumes of unstructured and semi-structured data. Unlike traditional relational databases, which use tables with predefined schemas to store data, NoSQL databases use flexible data models that can adapt to changes in data structures and are capable of scaling horizontally to handle growing amounts of data.

Types of NoSQL database:

Types of NoSQL databases and the name of the database system that falls in that category are:

1) Graph databases: Examples – Amazon Neptune, Neo4j
2) Key-value stores: Examples – Memcached, Redis, Coherence
3) Column-based: Examples – HBase, Bigtable, Accumulo
4) Document-based: Examples – MongoDB, CouchDB, Cloudant

Features of NoSQL databases


a) Multiple data model compatibility
Relational databases usually require data to be placed in table form before they can access and analyze it; they are typically rigid in data handling, requiring information to be well-structured before it can be ingested. NoSQL databases, in contrast, support multiple flexible data models (document, key-value, column, graph).
b) Enhanced scalability and availability
 Relational databases are undoubtedly scalable. However, their traditional client-server architecture makes both horizontal and vertical scaling a resource-intensive process.

 In contrast, NoSQL solutions can offer a serverless, peer-to-peer architecture with consistent properties among all nodes. This allows for simplified scalability, making NoSQL highly adaptable for cloud applications. Performance is also improved, enabling higher read and write speeds and continuous availability.


 NoSQL databases also use sharding to scale horizontally. Data is partitioned and placed on numerous machines in a way that preserves its ordering. This high flexibility allows NoSQL to handle massive data volumes efficiently.

 Finally, the auto-replication capability of NoSQL also helps ensure high availability: in case of any failure, the system recovers by replicating the last known consistent state.

c) Global data distribution


Cutting-edge NoSQL databases can ensure data distribution at a global scale. This is achieved
through multiple cloud regions and data centers for read-and-write operations across several
locations.

d) Minimal downtime
NoSQL databases are robust and feature minimal downtime. Business continuity is bolstered
through serverless architecture and the creation of multiple copies of data to be retained across
nodes. In case of a node malfunction, a different node will grant access to its copy of the data.

5.3 CAP Theorem


The CAP theorem is a fundamental concept in distributed systems theory, first proposed by Eric Brewer in 2000 and subsequently proven by Seth Gilbert and Nancy Lynch in 2002. It asserts that no distributed data system can concurrently guarantee all three of the following qualities:

1. Consistency
Consistency means that all the nodes (databases) inside a network have the same copies of a replicated data item visible to the various transactions. It guarantees that every node in a distributed cluster returns the same, most recent, successful write; every client has the same view of the data. There are various consistency models; consistency in CAP refers to linearizable (atomic) consistency, a very strong form of consistency.

For example, a user checks his account balance and sees that he has 500 rupees. He spends 200 rupees on some products, so 200 rupees must be deducted, changing his account balance to 300 rupees. This change must be committed and communicated to all other databases that hold this user's details; otherwise there will be inconsistency, and another database might still show his account balance as 500 rupees, which is no longer true.


Fig. 5.7 Consistency problem


2. Availability
Availability means that each read or write request for a data item will either be processed
successfully or will receive a message that the operation cannot be completed. Every non-failing node
returns a response for all the read and write requests in a reasonable amount of time. The key word here
is “every”. In simple terms, every node (on either side of a network partition) must be able to respond in
a reasonable amount of time.

For example, user A is a content creator with 1000 users subscribed to his channel. Another user B, who is far away from user A, tries to subscribe to user A's channel. Since the distance between the two users is large, they are connected to different database nodes of the social media network. If the distributed system follows the principle of availability, user B must be able to subscribe to user A's channel.

Fig. 5.8 Availability problem


3. Partition Tolerance
Partition tolerance means that the system can continue operating even if the network connecting
the nodes has a fault that results in two or more partitions, where the nodes in each partition can only
communicate among each other. That means, the system continues to function and upholds its
consistency guarantees in spite of network partitions. Network partitions are a fact of life. Distributed
systems guaranteeing partition tolerance can gracefully recover from partitions once the partition heals.


For example, take the same social media network, where two users are trying to find the subscriber count of a particular channel. Due to a technical fault, there is a network outage and the second database, used by user B, loses its connection with the first database. The subscriber count is still shown to user B with the help of a replica of the data that was stored in database 1 and backed up prior to the network outage. Hence the distributed system is partition tolerant.

Fig. 5.9 Partition Tolerance problem


The CAP theorem states that distributed databases can have at most two of the three properties:
consistency, availability, and partition tolerance. As a result, database systems prioritize only two
properties at a time.

Fig. 5.10 Characteristics of CAP Theorem

The CAP theorem implies that a distributed system can provide only two out of the three properties (a toy sketch follows this list):
a) CA (Consistency and Availability) - These systems always accept requests to view or modify the data and always respond with data that is consistent among all the database nodes of a large, distributed network.
b) AP (Availability and Partition Tolerance) - These systems are distributed in nature and ensure that requests to view or modify the data are not dropped and are processed even in the presence of a network partition.
c) CP (Consistency and Partition Tolerance) - These systems are distributed in nature and ensure that, in the presence of a network partition, requests to view or modify the data are dropped rather than answered with inconsistent data.
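The following toy Python sketch illustrates the trade-off (the class and its behavior are simplified assumptions, not a real database): during a partition, a replica must either refuse requests (CP) or answer from possibly stale local data (AP).

class Replica:
    # Toy replica showing the CP-versus-AP choice during a network partition.
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.local_value = 500    # last value this node saw (may be stale)
        self.partitioned = False  # True when peers are unreachable

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency over availability: refuse rather than risk staleness.
            raise RuntimeError("unavailable: cannot confirm the latest value")
        # Availability over consistency: answer, possibly with stale data.
        return self.local_value

node = Replica(mode="AP")
node.partitioned = True
print(node.read())   # 500: possibly stale, but the node stays available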


5.4 Document Based systems


The document data model differs from other data models because it stores data in JSON, BSON, or XML documents. In this data model, documents can be nested inside other documents, and particular elements can be indexed to run queries faster. Documents are often stored and retrieved in a form close to the data objects used in applications, which means very few translations are required to use the data in an application. JSON is also commonly used to store and query the data.

In the document data model, each document holds key-value pairs; below is an example.
{
"Name" : "Yashodhra",
"Address" : "Near Patel Nagar",
"Email" : "yahoo123@yahoo.com",
"Contact" : "12345"
}
Features:
 Document type model: Because data is stored in documents rather than tables or graphs, it is easy to map the data to objects in many programming languages.
 Flexible schema: The schema is very flexible; not all documents in a collection need to have the same fields.
 Distributed and resilient: Document data models are highly distributed, which enables horizontal scaling and distribution of data.
 Manageable query language: The query language allows developers to perform CRUD (Create, Read, Update, Delete) operations on the data model (a minimal sketch follows this list).
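A minimal in-memory Python sketch of these CRUD operations (the collection, IDs, and field values are illustrative; real document stores such as MongoDB expose a similar but far richer API):

# Toy document collection: a dict mapping document IDs to JSON-like documents.
students = {}

# Create
students["s1"] = {"Name": "Yashodhra", "Contact": "12345"}

# Read: find documents matching a field value.
matches = [d for d in students.values() if d["Name"] == "Yashodhra"]

# Update: the schema is flexible, so new fields can be added freely.
students["s1"]["Email"] = "yahoo123@yahoo.com"

# Delete
del students["s1"]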
Examples of Document Data Models:
- Amazon DocumentDB
- MongoDB
- Cosmos DB
- ArangoDB
- Couchbase Server
- CouchDB
Advantages:
 Schema-less: Document databases are very good at retaining existing data at massive volumes, because there are no restrictions on the format and structure of data storage.
 Faster creation and maintenance of documents: It is very simple to create a document, and maintenance requires almost nothing.
 Open formats: The build process is simple and uses open formats such as XML and JSON.
 Built-in versioning: As documents grow in size, they can also grow in complexity; built-in versioning decreases conflicts.

Disadvantages:
 Weak atomicity: Document databases lack support for multi-document ACID transactions. A change involving two collections requires two separate queries, one per collection; this is where atomicity requirements break.
 Consistency-check limitations: One can search collections and documents that are not connected to an author collection, but doing so may degrade database performance.
 Security: Many web applications lack security, which in turn results in leakage of sensitive data. This is a point of concern, and one must pay attention to web-application vulnerabilities.

Applications of Document Data Model:


 Content management: These data models are widely used in building video streaming platforms, blogs, and similar services, because each piece of content is stored as a single document and the database is easier to maintain as the service evolves over time.
 Book databases: These are useful for book databases because the document model lets us nest related data.
 Catalogs: These data models are widely used for storing and reading catalog files, because they read quickly even when catalogs have thousands of attributes stored.
 Analytics platforms: These data models are also widely used in analytics platforms.

5.5 Key value Stores


Key value stores
A key-value data model, or key-value database, is also referred to as a key-value store. It is a non-relational type of database in which an associative array is the basic structure: each individual key is linked with just one value in a collection. Keys are unique identifiers for their values, and a value can be any kind of entity. The collection of key-value pairs, stored as separate records with no predefined structure, constitutes the key-value database.


Fig. 5.11. Key-value
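A minimal Python sketch of the key-value idea (keys and values are illustrative; networked stores such as Redis or Memcached expose essentially these same operations):

# Toy key-value store: an associative array linking each key to one value.
store = {}

store["user:42:name"] = "Vignes"           # SET: values can be of any kind
store["user:42:cart"] = ["pen", "book"]

print(store.get("user:42:name"))           # GET by key -> Vignes
del store["user:42:cart"]                  # DELETE by key

# Note: there is no way to ask which keys hold the value "Vignes"
# without scanning everything; lookups are by key only.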


Advantages:
 It is very easy to use. Due to the simplicity of the database, values can be of any kind, or even of different kinds when required.
 Its response time is fast, owing to its simplicity, provided that the surrounding environment is well built and optimized.
 Key-value store databases are scalable vertically as well as horizontally.
 Built-in redundancy makes this database more reliable.
Disadvantages:
 Because no standard query language is present in key-value databases, queries cannot be carried over from one database to a different database.
 The key-value store supports only simple lookups: you cannot query the database without a key.

5.6 Column Based Systems


In a column-oriented data store, data is organized and stored by columns rather than by rows. This approach is optimized for retrieving specific columns of data and is typically used in data warehousing and analytics systems. As an example, consider the following relation:
ID   Name      Age   Department
1    John      35    IT
2    Balu      28    HR
3    Rahuman   42    Finance

When a query is executed in a column-oriented data store, it retrieves only the specified columns
of data, which can result in faster query performance. Additionally, column-oriented data stores can use
compression techniques to reduce storage space and improve performance.
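A minimal Python sketch contrasting the two layouts, using the table above: when the data is stored column-wise, an aggregate touches only the one column it needs.

# Row-oriented layout: one record per row.
rows = [
    {"ID": 1, "Name": "John",    "Age": 35, "Department": "IT"},
    {"ID": 2, "Name": "Balu",    "Age": 28, "Department": "HR"},
    {"ID": 3, "Name": "Rahuman", "Age": 42, "Department": "Finance"},
]

# Column-oriented layout: one array per column.
columns = {
    "ID":         [1, 2, 3],
    "Name":       ["John", "Balu", "Rahuman"],
    "Age":        [35, 28, 42],
    "Department": ["IT", "HR", "Finance"],
}

# AVG(Age): the column store reads only the Age array, never whole rows.
print(sum(columns["Age"]) / len(columns["Age"]))   # 35.0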


One potential downside of column-oriented data stores is that they may require more complex
queries to retrieve entire rows of data, as the data is spread across multiple columns. However, this can
often be mitigated by using specialized query languages and optimization techniques that are designed
for columnar data stores.

Advantages of Column-Oriented Databases


 Designed with Online Analytical Processing (OLAP) in mind: perfect for analytical queries requiring operations and aggregates over certain columns.
 Storage efficiency: storing each column contiguously enables greater compression and improves query performance.
 Faster query performance: because only the relevant columns are read, queries that call for a subset of columns run more quickly.

Disadvantages of Column-Oriented Databases


 Complicated Row Retrieval: When data is dispersed across many columns, retrieving full rows
may be more difficult.
 Less Appropriate for OLTP: Transactional workloads involving frequent insert, update, or
delete operations are less efficient using this approach.
 Query Complexity: To optimize performance, certain query languages and optimization
strategies are needed.

5.7 Graph Databases


A graph database (GDB) is a database that uses graph structures for storing data. It uses nodes, edges, and properties instead of tables or documents to represent and store data; the edges represent relationships between the nodes. This helps in retrieving data more easily and, in many cases, with one operation. Graph databases are a family of NoSQL systems. Examples: Neo4j, Amazon Neptune, ArangoDB.

Representation:
The graph database is based on graph theory. The data is stored in the nodes of the graph, and the relationships between the data are represented by the edges between the nodes.

Fig. 5.12 Graph based Representation


When do we need Graph Database?


1. It solves Many-To-Many relationship problems
2. When relationships between data elements are more important
3. Low latency with large scale data

Example
We have a social network in which five friends are all connected. These friends are Anay,
Bhagya, Chaitanya, Dilip, and Erica. A graph database that will store their personal information may
look something like this:

Fig. 5.13 Example for Graph database
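A minimal adjacency-list sketch of this friends graph in Python (the node properties and the particular edges shown are invented for illustration):

# Nodes carry properties; edges represent "friend" relationships.
people = {
    "Anay": {"city": "Pune"}, "Bhagya": {"city": "Delhi"},
    "Chaitanya": {"city": "Chennai"}, "Dilip": {"city": "Mumbai"},
    "Erica": {"city": "Goa"},
}
friends = {
    "Anay":      ["Bhagya", "Chaitanya"],
    "Bhagya":    ["Anay", "Dilip"],
    "Chaitanya": ["Anay", "Erica"],
    "Dilip":     ["Bhagya"],
    "Erica":     ["Chaitanya"],
}

# A one-hop relationship query is a single lookup, not a table join.
print(friends["Anay"])   # ['Bhagya', 'Chaitanya']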

Advantages: The graph model copes well with frequent schema changes, large data volumes, real-time query response-time requirements, and more intelligent data-activation requirements.

Disadvantages: Graph databases are not always the best solution for an application; the needs of the application must be assessed before deciding on the architecture.

Limitations of Graph Databases:


 Graph databases may not offer a better choice than other NoSQL variants for every workload.
 If the application needs to scale horizontally, a graph database may perform poorly.
 They are not very efficient when all nodes must be updated with a given parameter.

5.8 Database Security: Security issues


1. Data quality
The database community needs techniques and organizational solutions to assess and attest to the quality of data. These may include simple mechanisms such as quality stamps posted on websites. We also need techniques that provide more effective integrity-semantics verification tools for assessing data quality, based on techniques such as record linkage.

2. Intellectual property rights


As the use of the Internet and intranets increases day by day, legal and informational aspects of data are becoming major concerns for many organizations. To address these concerns, watermarking techniques are used, which help protect content from unauthorized duplication and distribution by giving provable ownership of the content. Traditionally, such techniques depend on the availability of a large domain within which the objects can be altered while retaining their essential properties. However, research is needed to assess the robustness of such techniques and to investigate different approaches aimed at preventing intellectual property rights violations.

3. Database survivability
Database systems need to continue operating, even with reduced capabilities, despite disruptive events such as information warfare attacks. In addition to making every effort to prevent an attack and to detect one in the event of occurrence, a DBMS should be able to do the following:
 Confinement: Take immediate action to eliminate the attacker's access to the system and to isolate or contain the problem to prevent further spread.
 Damage assessment: Determine the extent of the problem, including failed functions and corrupted data.
 Recovery: Recover corrupted or lost data and repair or reinstall failed functions to reestablish a normal level of operation.
 Reconfiguration: Reconfigure to allow operation to continue in a degraded mode while recovery proceeds.
 Fault treatment: To the extent possible, identify the weaknesses exploited in the attack and take steps to prevent a recurrence.

Challenges of database security


The challenges of database security in DBMS include:
 Authentication and Authorization: One of the biggest challenges of database security is
ensuring that only authorized users can access the database. The DBMS must authenticate
users and grant them appropriate access rights based on their roles and responsibilities.


 Encryption: Data encryption is an effective way to protect sensitive data in transit and at rest.
However, it can also be a challenge to implement and manage encryption keys and ensure that
encrypted data is not compromised.

 Access Control: Access control involves regulating the access to data within the database. It
can be challenging to implement access control mechanisms that allow authorized users to
access the data they need while preventing unauthorized users from accessing it.

 Auditing and Logging: DBMS must maintain an audit trail of all activities in the database.
This includes monitoring who accesses the database, what data is accessed, and when it is
accessed. This can be a challenge to implement and manage, especially in large databases.

 Database Design: The design of the database can also impact security. A poorly designed
database can lead to security vulnerabilities, such as SQL injection attacks, which can
compromise the confidentiality, integrity, and availability of data.

 Malicious attacks: Cyberattacks such as hacking, malware, and phishing pose a significant
threat to the security of databases. DBMS must have robust security measures in place to
prevent and detect such attacks.

 Physical Security: Physical security of the database is also important, as unauthorized physical
access to the server can lead to data breaches.

5.9 Access control based on privileges


Confidentiality, integrity, and availability are the hallmarks of database security. Authorization is the permission given to a user or process to access a set of objects. The type of access granted can be, for example, read-only or read-and-write. Privilege refers to the Data Manipulation Language (DML) operations that the user can perform on data, such as INSERT, UPDATE, SELECT, and DELETE.
Access control is performed using the following two mechanisms:
i. Privileges
ii. Roles
Privileges:
A privilege is the authority or permission to access a named object in a prescribed manner, for example, permission to access a table. Privileges can also permit a particular user to connect to the database. In other words, privileges are permissions granted on database objects.


 Database privileges – A privilege is permission to execute one particular type of SQL statement or to access another user's object. Database privileges control the use of computing resources. Database privileges do not apply to the database administrator.

 System privileges – A system privilege is the right to perform an activity on a specific type of object. For example, the privilege to delete rows of any table in a database is a system privilege. There are a total of 60 different system privileges. System privileges allow users to CREATE, ALTER, or DROP database objects.

 Object privileges – An object privilege is a privilege to perform a specific action on a particular table, function, or package. For example, the right to delete rows from a table is an object privilege: if a row of the table GEEKSFORGEEKS contains the name of an employee who is no longer part of the organization, deleting that row is an exercise of an object privilege. Object privileges allow the user to INSERT, DELETE, UPDATE, or SELECT data in a database object.
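In SQL systems, object privileges are granted and revoked with statements such as GRANT SELECT, DELETE ON employees TO clerk (standard SQL syntax; the table and user names here are illustrative). Below is a minimal Python sketch of the check a DBMS performs before executing a DML operation:

# Privilege table: (user, object) -> set of granted DML operations.
# The users, table name, and grants below are illustrative.
privileges = {
    ("clerk", "employees"):   {"SELECT"},
    ("manager", "employees"): {"SELECT", "INSERT", "UPDATE", "DELETE"},
}

def check_privilege(user, obj, operation):
    # Allow the operation only if it was granted on this object.
    return operation in privileges.get((user, obj), set())

print(check_privilege("clerk", "employees", "DELETE"))    # False
print(check_privilege("manager", "employees", "DELETE"))  # True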

5.10 Role Based access control


Role based access control
Role-based access control (RBAC) refers to the idea of assigning permissions to users based on
their role within an organization. In simple terms, RBAC assigns roles based on factors such as a
person’s job responsibilities, authorization, and skill level.

For example, a security analyst can configure a firewall but can’t view customer data, while a
sales rep can see customer accounts but can’t touch firewall settings.

Benefits of RBAC
 Create systematic, repeatable assignment of permissions
 Easily audit user privileges and correct identified issues
 Quickly add and change roles, as well as implement them across APIs
 More effectively comply with regulatory and statutory requirements for confidentiality and
privacy

Primary rules of RBAC


There are three basic rules used in all RBAC systems (a minimal sketch follows the list). They are:
(i) Role assignment: A user must be assigned one or more active roles to exercise
permissions or privileges.


(ii) Role authorization: The user must be authorized to take on the role or roles they have
been assigned.
(iii) Permission authorization: Permissions or privileges are granted only to users who have
been authorized through their role assignments.
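A minimal Python sketch of these three rules (the roles and permissions are illustrative, echoing the analyst/sales example above): a request succeeds only if one of the user's assigned, authorized roles grants the permission.

# Role -> permissions, and user -> assigned (authorized) roles.
role_permissions = {
    "security_analyst": {"configure_firewall"},
    "sales_rep":        {"view_customer_accounts"},
}
user_roles = {"asha": {"sales_rep"}}   # illustrative assignment

def is_permitted(user, permission):
    # Permission flows only through the user's assigned roles (rules i-iii).
    return any(permission in role_permissions.get(role, set())
               for role in user_roles.get(user, set()))

print(is_permitted("asha", "view_customer_accounts"))  # True
print(is_permitted("asha", "configure_firewall"))      # False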

Models of RBAC
There are four separate models for implementing RBAC, but each model begins with the same
core structure. Each successive model builds new functionality and features upon the previous model.
1) Core RBAC - Sometimes called Flat RBAC, this model is the required foundation for any
RBAC system. It follows the three basic rules of RBAC.
2) Hierarchical RBAC - This model adds role hierarchies that replicate the reporting structure of
an organization. In a role hierarchy, each role inherits the permissions of the role beneath it
and gains new permissions.
3) Constrained RBAC - This model adds capabilities for enforcing separation of duties (SOD).
Separation of duties helps prevent conflicts of interest by requiring two people to complete
certain tasks.
4) Symmetric RBAC - This model is the most advanced, flexible and comprehensive version of
RBAC. In addition to the capabilities of the previous models, it adds deeper visibility into
permissions across an enterprise.

Advantages of Role-based Access Control


There are several advantages to using RBAC, including:
i. Improving operational efficiency: With RBAC, organizations can reduce the need for paperwork and password changes when they hire new employees or switch the roles of existing employees. RBAC lets organizations quickly add and change roles, as well as implement them across platforms, operating systems (OSes) and applications. It also reduces the potential for error when assigning user permissions. In addition, with RBAC, organizations can more easily integrate third-party users into their networks by giving them predefined roles.

ii. Improving compliance: Every organization must comply with local, state and federal regulations. Organizations generally prefer to implement RBAC systems to meet the regulatory and statutory requirements for confidentiality and privacy, because executives and IT departments can more effectively manage how the data is accessed and used. This is especially significant for financial institutions and medical organizations that manage sensitive data.

iii. Giving administrators increased visibility: RBAC gives network administrators and managers more visibility and oversight into the business, while also ensuring that authorized users and guests on the system are given access only to what they need to do their jobs.

iv. Reducing the risk of breaches and data leakage: Implementing RBAC means restricting access to sensitive information, thus reducing the potential for data breaches or data leakage.

Disadvantages of Role-based Access Control:


i. Lack of Flexibility: RBAC can be inflexible, as roles and permissions are typically
assigned based on a predetermined set of rules. This can make it difficult to accommodate exceptions or
unique situations where an individual needs access to resources that are outside of their assigned role.

ii. Complexity: Implementing RBAC can be complex, requiring significant planning and
resources. Setting up roles and permissions for each user can be time-consuming and may require
ongoing maintenance to ensure that access levels remain appropriate.

iii. Lack of Granularity: RBAC can lack granularity in terms of the level of access granted to
users. This can result in situations where users have access to more resources than they need to perform
their job, which can increase the risk of data breaches and other security incidents.

iv. Overreliance on Roles: RBAC can result in overreliance on roles, which can limit the
ability to make granular access decisions based on specific user attributes, such as their level of trust or
the sensitivity of the data they are accessing.

v. Difficulty in Delegation: Delegating roles and permissions can be difficult, particularly in


larger organizations where there are many different roles and levels of access. This can result in delays
in granting access to resources, which can impact productivity and business operations.

5.11 SQL Injection


SQL Injection is a security flaw in web applications where attackers insert harmful SQL code
through user inputs. This can allow them to access sensitive data, change database contents or even take
control of the system. It’s important to know about SQL Injection to keep web applications secure.

SQL Injection (SQLi) is a security vulnerability that occurs when an attacker is able to
manipulate a web application’s database queries by inserting malicious SQL code into user input fields.
These injected queries can manipulate the underlying database to retrieve, modify, or delete sensitive data. In some cases, attackers can even escalate privileges, gaining full control over the database or server.

Fig. 5.14 Process of SQL Injection

SQL Injection typically works when a web application improperly validates user input, allowing
an attacker to inject malicious SQL code. For example, if a web application takes user input (e.g., a
username or password) and directly inserts it into an SQL query without proper sanitization, an attacker
can manipulate the query to perform unintended actions.
Example
Suppose we have an application based on student records. Any student can view only his or her
records by entering a unique and private student ID.
SELECT * FROM STUDENT WHERE STUDENT_ID = 12222345 OR 1 = 1;
The injected condition 1 = 1 is always true. As we can see in the above example, the query returns all records for which this condition holds, so basically all the student data is compromised. The malicious user can use other SQL queries in a similar way.

Query 1:
SELECT * FROM USER WHERE
USERNAME = '' AND PASSWORD = ''
A malicious attacker can manipulate this comparison to retrieve private and secure user information. The following query, when executed, retrieves protected data not intended to be shown to users.

Query 2:
SELECT * FROM User WHERE
(Username = '' OR 1=1) AND
(Password = '' OR 1=1);
Since 1=1 is always true, the attacker gains unauthorized access to the application.
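The standard defense is never to concatenate user input into the SQL text. Below is a minimal sketch using Python's built-in sqlite3 module (the table and values are illustrative): with a parameterized query, the injected string is treated as a literal value, never as SQL.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (Username TEXT, Password TEXT)")
conn.execute("INSERT INTO User VALUES ('admin', 'secret')")

user_input = "'' OR 1=1 --"   # a typical injection attempt

# VULNERABLE (do not do this): input concatenated into the query text.
# query = "SELECT * FROM User WHERE Username = " + user_input

# SAFE: placeholders make the driver treat the input as data, not SQL.
rows = conn.execute(
    "SELECT * FROM User WHERE Username = ? AND Password = ?",
    (user_input, "anything"),
).fetchall()
print(rows)   # [] -- the injection attempt matches nothing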


Types of SQL Injection


There are several types of SQL Injection attacks, each with different methods of exploiting the
vulnerability. These include:
1. In-band SQL Injection
In-band SQL Injection is the most common type, where the attacker sends malicious SQL
queries directly through the application interface. This method allows attackers to extract sensitive
information or manipulate the database.
Example:
SELECT * FROM users WHERE id = 1 OR 1=1; --
This query would retrieve all users in the database because 1=1 is always true.

2. Error-based SQL Injection


This type of SQL injection exploits error messages generated by the database. Attackers can use
the information provided in error messages to learn about the database structure and craft more
sophisticated attacks.
Example:
SELECT * FROM users WHERE id = 1' -- ;
An error message could reveal details about the database schema, allowing the attacker to refine
their attack.

3. Blind SQL Injection


In blind SQL injection, the attacker does not receive error messages but can infer information
about the database by observing the behavior of the application. The attacker uses boolean conditions to
test various aspects of the database.
Example:
SELECT * FROM users WHERE id = 1 AND 1=1;
If the response is different when 1=1 is changed to 1=0, the attacker can infer information about
the database.

4. Out-of-band SQL Injection


Out-of-band SQL injection relies on the attacker using a different communication channel to
exfiltrate data from the database. This type of attack is less common but can be very effective.
Example:
SELECT * FROM users WHERE id = 1; -- ;
The attacker might direct the database to send a DNS request or HTTP request with the extracted
data.


5. Time-based Blind SQL Injection


In this form of blind SQL injection, the attacker sends a query that causes a time delay (e.g.,
using SLEEP), allowing them to infer whether the query was true or false based on the response time.
Example:
SELECT * FROM users WHERE id = 1 AND IF(1=1, SLEEP(5), 0); -- (MySQL)
If the query takes 5 seconds to execute, the attacker knows that the condition is true.

Impact of SQL Injection Attacks


 Unauthorized access to sensitive data: Attackers can retrieve personal, financial, or
confidential information stored in the database.
 Data integrity issues: Attackers can modify, delete, or corrupt critical data, impacting the
application’s functionality.
 Privilege escalation: Attackers can bypass authentication mechanisms and gain administrative
privileges.
 Service downtime: SQL injection can overload the server, causing performance degradation or
system crashes.
 Reputation damage: A successful attack can severely harm the reputation of an organization,
leading to a loss of customer trust.

Detecting SQL Injection Vulnerabilities


To detect SQL injection vulnerabilities, consider the following:
 Input validation testing: Test inputs by inserting special characters like --, ;, ', or " to see if
they cause errors or unintended behavior.
 Automated tools: Use tools like SQLMap, Burp Suite, or OWASP ZAP to scan for
vulnerabilities.
 Review source code: Inspect source code for insecure coding practices such as concatenating
user inputs directly into SQL queries.
 Monitor error messages: Unexpected or detailed error messages can indicate that the
application is vulnerable.
 Penetration testing: Regularly perform penetration testing to identify security gaps.

5.12 Statistical Database security


Statistical databases are used mainly to produce statistics about various populations. The database may contain confidential data about individuals, which should be protected from user access. However, users are permitted to retrieve statistical information about the populations, such as averages, sums, counts, maximums, minimums, and standard deviations. The techniques that have been developed to protect the privacy of individual information are beyond the scope of these notes. We will illustrate the problem with a very simple example, which refers to the PERSON relation shown in Fig. 5.15, with the attributes Name, Ssn, Income, Address, City, State, Zip, Sex, and Last_degree.

A population is a set of tuples of a relation (table) that satisfy some selection condition. Hence,
each selection condition on the PERSON relation will specify a particular population of PERSON
tuples. For example, the condition Sex = ‘M’ specifies the male population; the condition ((Sex = ‘F’)
AND (Last_degree = ‘M.S.’ OR Last_degree = ‘Ph.D.’)) specifies the female population that has an
M.S. or Ph.D. degree as their highest degree; and the condition City = ‘Houston’ specifies the
population that lives in Houston.

Statistical queries involve applying statistical functions to a population of tuples. For example,
we may want to retrieve the number of individuals in a population or the average income in the
population. However, statistical users are not allowed to retrieve individual data, such as the income of a
specific person. Statistical database security techniques must prohibit the retrieval of individual data.
This can be achieved by prohibiting queries that retrieve attribute values and by allowing only queries
that involve statistical aggregate functions such as COUNT, SUM, MIN, MAX, AVERAGE, and
STANDARD DEVIATION. Such queries are sometimes called statistical queries.

It is the responsibility of a database management system to ensure the confidentiality of


information about individuals, while still providing useful statistical summaries of data about those
individuals to users. Provision of privacy protection of users in a statistical database is paramount; its
violation is illustrated in the following example.

Fig. 5.15 Person relation schema

In some cases it is possible to infer the values of individual tuples from a sequence of statistical
queries. This is particularly true when the conditions result in a population consisting of a small number
of tuples. As an illustration, consider the following statistical queries:
Q1: SELECT COUNT (*) FROM PERSON WHERE <condition>;
Q2: SELECT AVG (Income) FROM PERSON WHERE <condition>;

Now suppose that we are interested in finding the Income of Jane Smith, and we know that she has a Ph.D. degree and that she lives in the city of Bellaire, Texas. We issue the statistical query Q1 with the following condition:
(Last_degree='Ph.D.' AND Sex='F' AND City='Bellaire' AND State='Texas')


If we get a result of 1 for this query, we can issue Q2 with the same condition and find the Income of Jane Smith. Even if the result of Q1 on the preceding condition is not 1 but is a small number, say 2 or 3, we can issue statistical queries using the functions MAX, MIN, and AVERAGE to identify the possible range of values for the Income of Jane Smith.

The possibility of inferring individual information from statistical queries is reduced if no


statistical queries are permitted whenever the number of tuples in the population specified by the
selection condition falls below some threshold.

Another technique for prohibiting retrieval of individual information is to prohibit sequences of


queries that refer repeatedly to the same population of tuples. It is also possible to introduce slight
inaccuracies or noise into the results of statistical queries deliberately, to make it difficult to deduce
individual information from the results. Another technique is partitioning of the database. Partitioning
implies that records are stored in groups of some minimum size; queries can refer to any complete group
or set of groups, but never to subsets of records within a group.

The possibility of accessing individual information through statistical queries is reduced by using the following measures (a threshold-check sketch follows the list):
 Partitioning of the database – The records of the database must not be stored in one undivided pool; they are divided into groups of some minimum size, according to the confidentiality of the records. Queries can then refer to any complete group or set of groups, but never to subsets of records within a group, so an attacker can access at most one or two groups of the least private data.
 No statistical queries are permitted whenever the number of tuples in the population specified by the selection condition falls below some threshold.
 Sequences of queries that refer repeatedly to the same population of tuples are prohibited.
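A minimal Python sketch of the threshold measure (the population data and the threshold value are invented for illustration): a statistical query is answered only when the selected population is large enough.

# Illustrative PERSON tuples and a minimum-population threshold.
persons = [
    {"Name": "A", "Income": 40000, "City": "Houston"},
    {"Name": "B", "Income": 52000, "City": "Houston"},
    {"Name": "C", "Income": 61000, "City": "Bellaire"},
]
THRESHOLD = 2   # refuse statistics over populations smaller than this

def avg_income(condition):
    population = [p for p in persons if condition(p)]
    if len(population) < THRESHOLD:
        raise PermissionError("population too small; query refused")
    return sum(p["Income"] for p in population) / len(population)

print(avg_income(lambda p: p["City"] == "Houston"))   # 46000.0
# avg_income(lambda p: p["City"] == "Bellaire")       # refused: 1 tuple only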

5.13 Flow control


• Flow control is a mechanism that regulates the flow of information among accessible objects.
• A flow between two objects obj1 and obj2 occurs when a program reads values from obj1 and writes values to obj2.
• Flow control checks that the information contained in one object does not get transferred to a less protected object.
• The flow policy specifies the channels along which information is allowed to move.


• A simple flow policy specifies two classes of information, confidential (C) and nonconfidential (N), and allows all flows except those from the confidential class to the nonconfidential class (a minimal sketch follows).
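A minimal Python sketch of such a policy (the object names and labels are illustrative): each object carries a class label, and the only forbidden flow is from C to N.

# Object -> information class (C = confidential, N = nonconfidential).
labels = {"salary_table": "C", "public_report": "N", "audit_log": "C"}

def flow_allowed(src, dst):
    # Forbid exactly the confidential -> nonconfidential flows.
    return not (labels[src] == "C" and labels[dst] == "N")

print(flow_allowed("salary_table", "public_report"))  # False: C -> N blocked
print(flow_allowed("salary_table", "audit_log"))      # True:  C -> C allowed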

Covert Channel
• A covert channel is a type of attack that creates a capability to transfer information between processes that are not supposed to be allowed to communicate.
• A covert channel violates the security policy.
• A covert channel allows information to pass from a higher classification level to a lower classification level through improper means.
• Security experts believe that one way to avoid covert channels is for programmers not to gain access to sensitive data.

5.14 Encryption and Public Key infrastructures


Cryptology is the technique of encoding and decoding messages so that they cannot be understood by anybody except the sender and the intended recipient. The various encoding and decoding schemes are called encryption schemes. The sender and recipient of a message decide on an encoding and decoding scheme and use it for communication.

The process of encoding a message is known as encryption. The original text is called plaintext, and the encrypted form of the plaintext is called ciphertext. The ciphertext travels through the network; when it reaches the receiving computer, the recipient decodes the message to extract the correct meaning. This process is called decryption.

Fig. 5.16 Encryption and Decryption process

The sender applies the encryption algorithm, and the recipient applies the decryption algorithm. Both the sender and the receiver must agree on these algorithms for any meaningful communication. An algorithm basically takes one text as input and produces another as output; the algorithm therefore contains the intelligence for transforming the message.

For example, suppose we want to send a message through e-mail and wish that nobody except the recipient be able to understand it. The message can then be encoded using some scheme, for instance by encoding the alphabet A to Z as follows:


That is, the last three letters are placed in reverse order, and then the first three letters follow in straight order; continuing this logic, the letters A to Z are encoded. Now, if we write the message
"SEND SOME MONEY"
it becomes
QBSA QRTB TRSBN
This coded message is called the ciphertext.

Types of Cryptography
There are two types of encryption schemes, based on the keys used for encryption and decryption.

1. Symmetric key encryption: It is also known as secret key encryption. In this method, only one key is
used. The same key is shared by sender and receiver for encryption and decryption of messages. Hence
both parties must agree upon the key before any transmission begins and nobody else should know
about it. At the sender's end, the key is used to change the original message into an encoded form. At the
receiver's end using the same key the encoded message is decrypted and original message is obtained.
Data Encryption Standard (DES) uses this approach. The problem with this approach is that of key
agreement and distribution.
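A minimal sketch of symmetric encryption using the third-party Python cryptography package (an illustrative choice; any shared-key cipher would do, and DES itself is obsolete). Note that both parties must hold the same key:

from cryptography.fernet import Fernet   # pip install cryptography

key = Fernet.generate_key()   # the single shared secret key
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"SEND SOME MONEY")   # sender encrypts
plaintext = cipher.decrypt(ciphertext)            # receiver uses the same key
print(plaintext)   # b'SEND SOME MONEY'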

2. Asymmetric key encryption: It is also known as public key encryption. In this method, different keys
are used. One key is used for encryption and other key must be used for decryption. No other key can
decrypt the message-not even the original key used for encryption. One of the two keys is known as
public key and the other is the private key. Suppose there are two users X and Y.
• X wants to send a message to Y. X will convey its public key to Y, but the private key of X is known to X only.
• Similarly, Y knows its own private key, and X knows Y's public key.

When X and Y want to communicate:


(i) If X wants to send a message to Y, then first of all X encrypts the message using Y's public key. For that purpose, X must know Y's public key.
(ii) X then sends this encrypted message to Y.
(iii) Now, using Y's private key, Y decrypts X's message. Note that only Y knows its private key, and the message cannot be decrypted using X's public key.
(iv) When Y wants to send a message to X, Y encrypts it using X's public key and sends the encrypted message to X. X, in turn, uses its own private key to decrypt the message. Here again, Y does not know the private key of X.


Digital Signature
A digital signature is a mathematical scheme for demonstrating the authenticity of a digital
message or document. If the recipient gets a message with digital signature then he believes that the
message was created by a known sender. Digital signatures are commonly used for software
distribution, financial transactions, and in other cases where it is important to detect forgery or
tampering.

When X and Y want to communicate with each other:


(i) X encrypts the original plaintext message into ciphertext by using Y's public key.

(ii) Then X executes an algorithm on the original plaintext to calculate a message digest, also known as a hash. This algorithm takes the original plaintext in binary format and applies a hashing algorithm, producing a small string of binary digits as output. The hashing algorithm is public and anyone can use it; the most popular message digest algorithms are MD5 and SHA-1. X then encrypts the message digest using its own private key.

(iii) X now combines the ciphertext and its digital signature (i.e., the encrypted message digest) and sends them over the network to Y.

(iv) Y receives the ciphertext and X's digital signature. Y has to decrypt both of these. Y first
decrypts ciphertext back to plaintext. For this, it uses its own private key. Thus, Y gets the
message itself in a secure manner.

(v) Now, to ensure that the message has come from the intended sender, Y takes X's digital signature and decrypts it. This gives Y the message digest as generated by X. X had encrypted the message digest using its own private key to form the digital signature; therefore, Y uses X's public key to decrypt it.

(vi) The hash algorithm used to generate the message digest is public, so Y can also use it.

(vii) Now there are two message digests, one created by X and the other by Y. Y simply compares the two message digests; if they match, Y can be sure that the message came indeed from X and not from someone else.

Thus, with digital signatures, confidentiality, authenticity, and message integrity are all assured. The other important feature supported by digital signatures is non-repudiation: a sender cannot deny having sent a message. Since the digital signature requires the private key of the sender, once a message is digitally signed, it can be legally proven that the sender indeed sent the message.
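A minimal sign-and-verify sketch using the third-party Python cryptography package (the key size and padding are common defaults, chosen here as assumptions): the sender signs with its private key, the receiver verifies with the sender's public key, and verification raises an exception if the message was tampered with.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

message = b"SEND SOME MONEY"
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

# X signs: hash the message, then sign the digest with X's private key.
signature = private_key.sign(message, pss, hashes.SHA256())

# Y verifies with X's public key; raises InvalidSignature on tampering.
public_key.verify(signature, message, pss, hashes.SHA256())
print("signature valid")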


5.15 Challenges
Following are the challenges faced by a database security system:
(1) Data Quality
• The database community needs solutions to assess the quality of data. The quality of data can be attested by simple mechanisms such as quality stamps posted on web sites.
• The database community may need more effective integrity-semantics verification techniques for assessing the quality of data.
• Application-level recovery techniques are also used to repair incorrect data.

(2) Intellectual Property Rights


• The use of the Internet and intranets is increasing everywhere, creating opportunities for unauthorized duplication and distribution of content. Digital watermarking techniques are therefore used to protect content from unauthorized access and to establish ownership.
• However, research is needed to develop techniques for preventing intellectual property rights violations.

(3) Database Survivability


• Database systems must continue to work even after information warfare attacks.
• The goal of an information warfare attacker is to damage the organization's operation.
• The following corrective actions handle this situation:
- Confinement: Take immediate action to eliminate the attacker's access to the system. Isolate the affected components to avoid further spread.
- Damage assessment: Determine the extent of the problem.
- Reconfiguration: Reconfiguration allows the system to operate in a degraded mode while recovery is going on.
- Repair: Recover the corrupted or lost data by repairing or reinstalling the system.
- Fault treatment: Identify the weaknesses exploited in the attack and take steps to prevent a recurrence.
