Dbms Model QP
Dbms Model QP
PART – A
1. A) List any two differences between a database and a file system. ( 10 X 2 = 20 M)
B) Mention two types of attributes in ER diagrams.
C) Define integrity constraints with examples.
D) What is the purpose of the ALTER TABLE command?
E) Write a basic SQL query using the SELECT and WHERE clauses
F) List two numeric functions available in SQL.
G) Define the Normalization?
H) List three advantages of normal forms.
I) Write the need of serializability in transaction management.
J) Differentiate primary index from secondary index.
PART - B
2. A) Explain the architecture of DBMS with a neat sketch. [10M]
Or
B) Compare and contrast various Data Models. [5M]
C) Explain the Generalization and Specialization with suitable ER Modeling. [5M]
OR
B) Explain the relational algebra opertions with suitable examples. [10M]
4.A) Explain the different Date,Numeric and String function with an example. [10M]
OR
B) Define the Join in SQL? And explain the types of joins with syntax and example. [5M]
C)Define the View in SQL? And explain the types of joins with syntax and example. [5M]
5.A) Define the Normalization and Explain the different normal forms? [10M]
OR
B) Elaborate the importance of computing closure of functional dependencies.
Explain the procedure with an example. [10M]
Answers
1. A) List any two differences between a database and a file system.
Ans)Data Management and Querying:
File System: Stores data in files and folders; retrieval requires manual
parsing and handling by custom programs.
Database: Manages data using structured query languages (like SQL),
supporting efficient searching, filtering, and relationships between data.
Data Integrity and Security:
File System: Limited mechanisms for ensuring data integrity and access
control.
Database: Enforces constraints (like primary keys, foreign keys) and
supports fine-grained access control and transaction management to ensure
data consistency and security.
3
I) Write the need of serializability in transaction management.
Ans) Serializability is essential in transaction management to ensure the
correctness and consistency of the database when multiple
transactions are executed concurrently. It guarantees that the outcome
of executing multiple transactions concurrently is the same as if the
transactions were executed one after another in some serial order.
Here's why it is needed:
1. Maintains Database Consistency: Ensures that concurrent transactions do
not violate the consistency constraints of the database.
2. Avoids Anomalies: Prevents concurrency-related issues such as lost
updates, temporary inconsistency, and uncommitted data being read (dirty
reads).
3. Preserves Isolation: Serializability upholds the isolation property of
ACID (Atomicity, Consistency, Isolation, Durability), ensuring each
transaction appears isolated from others.
4. Predictable Results: Enables predictable and reproducible outcomes,
critical for applications where data integrity is crucial.
The interactive query processor helps the database system to simplify and facilitate
access to data. It consists of DDL(Data Definition Language) interpreter, DML(Data
Manipulation Language) compiler and query evaluation engine.
The following are various functionalities and components of query processor
DDL interpreter: This is basically a translator which interprets the DDL
statements in data dictionaries.
5
DML compiler: It translates DML statements query language into an
evaluation plan. This plan consists of the instructions which query evaluation
engine understands.
Query evaluation engine: It executes the low-level instructions generated by
the DML compiler.
When a user issues a query, the parsed query is presented to a query optimizer, which
uses information about how the data is stored to produce an efficient execution plan
for evaluating the query. An execution plan is a blueprint for evaluating a query. It is
evaluated by query evaluation engine.
In database architecture, the Query Processor is responsible for interpreting and
executing database queries. It acts as a bridge between the user’s query and the
database engine, ensuring that the query is understood, optimized, and executed
efficiently. The main components of a Query Processor are:
Process:
o Evaluates different query execution plans.
o Considers factors like indexing, join methods, and data distribution.
o Chooses the optimal plan based on cost estimation (e.g., CPU usage,
memory, and I/O operations).
4. Query Plan Generator
Function: Generates a sequence of operations to execute the query.
6
o Process: Performs operations like scanning, joining, sorting, and
filtering.
o Communicates with the storage engine to access or update the data.
6. Runtime Database Manager : Coordinates between the query executor and the
storage manager.
Process: Ensures data integrity and consistency during the query execution
process.
In database architecture, the Storage Manager is responsible for handling the
storage, retrieval, and update of data in the database. It acts as an interface between
the low-level data stored on physical storage devices and the higher-level query
processor components. Its primary role is to efficiently manage data storage, access,
and integrity.
In the above figure, Electronics is the root node which has two children i.e. Televisions
and Portable Electronics. These two has further children for which they act as parent.
For example: Television has children as Tube, LCD and Plasma, for these three
7
Television act as parent. It follows one to many relationship.
Advantages of the Hierarchical Data Model
Because of its tree form, it is easy to grasp.
Retrieving data in a one-to-many connection is efficient.
Disadvantages of the Hierarchical Data Model
Inflexibility in reorganizing data.
accessing complicated data structures may be challenging.
redundant data storage, which might cause anomalies and inconsistencies.
In the above figure, Project is the root node which has two children i.e. Project 1 and
Project 2. Project 1 has 3 children and Project 2 has 2 children. Total there are 5 children
i.e Department A, Department B and Department C, they are network related children as
we said that this model can have more than one parent. So, for the Department B and
Department C have two parents i.e. Project 1 and Project 2.
Advantages of the Network Data Model
Because of its numerous parent ties, it is more adaptable than the hierarchical
approach.
Ideal for managing intricate, many-to-many connections.
Disadvantages of the Network Data Model
Increased complexity in database design and management.
requires complex programming in order to manage and work with data.
What is a Relational Data Model?
The relational data model was developed by E.F. Codd in 1970. There are no physical
links as they are in the hierarchical data model. Following are the properties of the
relational data model :
Data is represented in the form of table only.
It deals only with the data not with the physical structure.
It provides information regarding metadata.
At the intersection of row and column there will be only one value for the tuple.
It provides a way to handle the queries with ease.
8
Advantages of the Relational Data Model
High data independence and flexibility.
offers robust and user-friendly querying features.
removes duplication by use of normalization.
Disadvantages of the Relational Data Model
For certain kinds of straightforward data retrieval tasks, they may not perform as
well as hierarchical models.
demands a deeper comprehension of SQL and normalization principles.
Generalization
Specialization
In specialization, an entity is divided into sub-entities based on its characteristics. It is a top-
down approach where the higher-level entity is specialized into two or more lower-
level entities. For Example, an EMPLOYEE entity in an Employee management system can
be specialized into DEVELOPER, TESTER, etc. as shown in Figure 2. In this case, common
9
attributes like E_NAME, E_SAL, etc. become part of a higher entity (EMPLOYEE), and
specialized attributes like TES_TYPE become part of a specialized entity
(TESTER).Specialization is also called as ” Top-Down approch”.
Specialization
Example of Relation
10
Aggregation
An ER diagram is not capable of representing the relationship between an entity and a
relationship which may be required in some scenarios. In those cases, a relationship
with its corresponding entities is aggregated into a higher-level entity. Aggregation is
an abstraction through which we can represent relationships as higher-level entity sets.
For Example, an Employee working on a project may require some machinery. So,
REQUIRE relationship is needed between the relationship WORKS_FOR and entity
MACHINERY. Using aggregation, WORKS_FOR relationship with its entities
EMPLOYEE and PROJECT is aggregated into a single entity and relationship
REQUIRE is created between the aggregated entity and MACHINERY.
Notation: Relation1⟕Relation2
d) Right Outer Join (⟖) – Keeps All Right Table Rows
Returns all rows from the right table and matching rows from the left table.
Notation: Relation1⟖Relation2
e) Full Outer Join (⟗) – Keeps All Rows from Both Tables
Returns all rows when there is a match, and NULLs where there is no match.
Notation: Relation1⟗Relation2
4.A) Explain the different Date,Numeric and String function with an example.
Dual Table :- Dual table is a dummy table, It having only one row and one
column, Value of the dual table is ‘x’ . By using dual table we can perform
arithmetic Operations and can use date retrieval functions.
A) Date Functions:- SQL supports the following date functions.
1) sysdate :- sysdate returns the today’s sys date value.
Syntax : SQL> select sysdate from dual;
SYSDATE
---------
28-FEB-25
2)sysdate + Number :- it returns the date value after adding some number of
days.
SQL> select sysdate,sysdate+90 from dual;
SYSDATE SYSDATE+9
--------- ---------
28-FEB-25 29-MAY-25
3)last_day(date) :- It returns the last day of the month.
SQL> select sysdate,last_day(sysdate) from dual;
SYSDATE LAST_DAY
--------- ---------
28-FEB-25 28-FEB-25
13
4)next_day(date):- It returns next date value by passing next day value.
SYSDATE NEXT_DAY
--------- ---------
28-FEB-25 07-MAR-25
5)months_between(Date1,Date2):- It returns number of months between the
two dates.
SQL> select months_between(sysdate,'28-Feb-24') from dual;
MONTHS_BETWEEN(SYSDATE,'28-FEB-24')
-------------------------------------------------------------
12
B)Numeric Functions :-
1)abs(value):- It returns the absolute value.
SQL> SQL> select abs(-11), abs(12) from dual;
ABS(-11) ABS(12)
---------- ----------
11 12
14
----------
5
8) SQL> select least(11,22),greatest(11,22) from dual;
LEAST(11,22) GREATEST(11,22)
------------ ---------------
11 22
C)String Functions :-
1)InitCap(‘String’):- The InitCap() function in SQL is used to capitalize the
first letter of each word in a string while converting the rest of the letters to
lowercase.
SQL> select InitCap('my name is naresh') from dual;
INITCAP('MYNAMEIS
-----------------
My Name Is Naresh
2)Length(‘String’):- The LENGTH() function in SQL is used to return the
number of characters in a string.
SQL> select length('Srinivas') from dual;
LENGTH('SRINIVAS')
------------------
8
2)EQUI JOIN:- Returns records that have matching values in both tables
Syntax :
SQL> Select table1.column1,table1.column2, table2.column1,
table2.column2 from table1,table2
where table1.column= table1.column;
3.A)LEFT (OUTER) JOIN: Returns all records from the left table, and the
matched records from the right table
Syntax :
SQL> Select table1.column1,table1.column2, table2.column1,
table2.column2 from table1,table2
where table1.column= table1.column(+);
3.B)RIGHT (OUTER) JOIN: Returns all records from the right table, and the
matched records from the left table.
Syntax :
SQL> Select table1.column1,table1.column2, table2.column1,
table2.column2 from table1,table2
where table1.column(+)= table1.column;
3.C)FULL (OUTER) JOIN:- Returns all records when there is a match in
either left or right table
Syntax :
SQL> Select table1.column1,table1.column2, table2.column1,
table2.column2 from table1,table2
where table1.column= table1.column(+)
Union
Select table1.column1,table1.column2, table2.column1,
table2.column2 from table1,table2
where table1.column(+)= table1.column;
4)Self Join :-A self join is a type of join operation in a relational database
where a table is joined with itself. It allows you to combine rows from the same
table based on a related condition.
Syntax :
SQL>SELECT column_name(s) FROM table1 T1, table1 T2
17
WHERE condition;
5.A) Define the Normalization and Explain the different normal forms?
Ans) Types of Normal Forms
First Normal Form (1NF): This is the most basic level of normalization. In 1NF,
each table cell should contain only a single value, and each column should have a
unique name.
1)Data Must be in a Tabular form
2) Intersection of row and column should have atomic value.
Student Number SName Address Courses
1 Naresh NRT C, C++, JAVA
2 Suresh Guntur Oracle, MySQL
Fig: Unnormalized Relation
Student Number SName Address Courses
1 Naresh NRT C
1 Naresh NRT C++
1 Naresh NRT JAVA
2 Suresh Guntur Oracle
2 Suresh Guntur MySQL
Fig : First Normal Form
Second Normal Form (2NF): 2NF eliminates redundant data by requiring that each
non-key attribute be dependent on the primary key. This means that each column
should be directly related to the primary key, and not to other columns.
1)Data Must be in a Tabular form
2)It Should be in First Normal Form
3)It Removes Partial Function Dependencies
EmpNo EName DOJ Deptno DName Location
1 Naresh 10-Jan-25 10 CSE Bangalore
2 Suresh 20-Jan-25 10 CSE Bangalore
3 Ramesh 25-Jan-25 20 ECE HYD
4 Mahesh 30-Jan-25 20 ECE HYD
Fig : First Normal Form
Third Normal Form (3NF): 3NF builds on 2NF by requiring that all non-key
attributes are independent of each other. This means that each column should be
directly related to the primary key, and not to any other columns in the same table.
1)Data Must be in a Tabular form
2)It Should be in Second Normal Form
3)It Removes Transitive Function Dependencies
Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures
that each determinant in a table is a candidate key. In other words, BCNF ensures that
each non-key attribute is dependent only on the candidate key.
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following
two conditions:
1. It should be in the Third Normal Form.
2. for any dependency A → B, A should be a super key.
The second point sounds a bit tricky, right? In simple words, it means, that for a
dependency A → B, A cannot be a non-prime attribute, if B is a prime attribute.
Below we have a college enrolment table with columns student_id,subject and
Professor.
student_id subject professor
101 Java Naresh
101 C++ Suresh
102 Java Mahesh
103 C# Rajesh
104 Java Ramesh
In the table above:
One student can enrol for multiple subjects. For example, student
with student_id 101, has opted for subjects - Java & C++
Each subject, a professor is assigned to the student.
And, there can be multiple professors teaching one subject like we have for
Java.
What do you think should be the Primary Key?
Well, in the table above student_id, subject together form the primary key, because
using student_id and subject, we can find all the columns of the table.
One more important point to note here is, one professor teaches only one subject, but
one subject may have two different professors.
Hence, there is a dependency between subject and professor here,
19
where subject depends on the professor name.
Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures that
a table does not contain any multi-valued dependencies.
A database table is in Fourth Normal Form (4NF) if it's in Boyce-Codd Normal Form
(BCNF) and has no multivalued dependencies (MVDs) that are not trivial or
dependent on a candidate key.
Consider a table StudentCoursesHobbies with columns StudentID, Course,
and Hobby.
S1 Math Painting
S1 Physics Painting
S1 Math Playing
S1 Physics Playing
S2 Chemistry Reading
In this table:
StudentID determines Course and Hobby: A student can take multiple
courses and have multiple hobbies.
There is a multivalued dependency: For a single student (StudentID), there
are multiple courses and hobbies.
This table is not in 4NF: The multivalued dependencies (StudentID ->->
Course, StudentID ->-> Hobby) are not trivial and not dependent on a
candidate key.
To achieve 4NF, decompose the table into two tables:
StudentCourses: StudentID, Course
20
2. Also, a table should have at-least 3 columns for it to have a multi-valued
dependency.
3. And, for a relation R(A,B,C), if there is a multi-valued dependency between, A
and B, then B and C should be independent of each other.
If all these conditions are true for any relation(table), it is said to have multi-valued
dependency.
Fifth Normal Form (5NF):
Fifth Normal Form (5NF), also known as Projection-Join Normal Form (PJ/NF), is the
highest level of database normalization, addressing join dependencies to minimize
redundancy and anomalies, especially when dealing with multi-valued facts and
relationships.
5NF eliminate redundancy and ensure data integrity by breaking down tables into
smaller, more manageable tables based on join dependencies.
Benefits:
Minimizes redundancy.
Enhances data integrity.
Simplifies data retrieval.
Drawbacks:
Can lead to a larger number of tables and relationships, potentially
making the database design more complex.
May require more complex queries.
Q) Types of Functional Dependencies?
1. Partial Dependency: A partial dependency occurs when a non-key attribute
(an attribute that's not part of any candidate key) depends on only a part of a
composite primary key (a primary key made up of multiple attributes).
Example:
StudentID CourseID StudentName
Let X be a set of attributes, and F be a set of functional dependencies. The closure of X with
respect to F, denoted as X⁺, is the set of all attributes that are functionally determined by X
using F.
Steps:
1. Start with X⁺ = X.
2. Repeat until no new attributes can be added:
o For each FD A→BA \rightarrow BA→B in F:
If A⊆X+A \subseteq X^+A⊆X+, then add B to X⁺.
Example : Given:
Relation R(A, B, C, D, E)
Set of FDs, F:
1. A→B
2. B→C
3. A→D
4. D→E
A → B ⇒ Add B ⇒ A⁺ = {A, B}
B → C (since B ∈ A⁺) ⇒ Add C ⇒ A⁺ = {A, B, C}
A → D ⇒ Add D ⇒ A⁺ = {A, B, C, D}
Result: A⁺ = {A, B, C, D, E}
This means A is a candidate key for R, as it determines all attributes.
1. Atomicity:
o Ensures that all operations within a transaction are completed
successfully. If any part of the transaction fails, the entire transaction is
rolled back.
o Think of it as "all or nothing".
2. Consistency:
o Ensures that a transaction transforms the database from one valid state
to another valid state.
However, concurrency can lead to inconsistencies. To avoid this, the DBMS uses
serializability criteria to ensure that the interleaved schedule behaves like a serial one in
terms of database correctness.
View Serializability:
1. Schedule: A sequence of read and write operations from one or more transactions.
2. Serial Schedule: Transactions are executed one after the other without overlapping.
3. View Equivalent Schedule: A non-serial schedule is said to be view equivalent to a
serial schedule if:
o Initial reads are the same.
o Reads in both schedules read the same values.
o Final writes are done by the same transactions.
24
4. View Serializable Schedule: A schedule that is view equivalent to a serial schedule.
Characteristics of a B+ Tree:
1. Balanced Tree:
o All leaves are at the same level, ensuring that the time to search, insert,
or delete is consistent and logarithmic in nature.
o This balance guarantees O(log n) time complexity for operations.
2. Internal Nodes:
o Internal nodes store only keys, not actual data (values). These keys act
as pointers to direct the search to the appropriate leaf node.
3. Leaf Nodes:
o Leaf nodes store actual data or references to the data.
o They are linked in a linked list, which enables efficient range queries
(traversing consecutive leaf nodes).
4. Ordered Keys:
o All keys in a node are stored in a sorted manner, ensuring that search
operations follow an ordered sequence.
5. M-way Tree:
o A B+ Tree is an M-way tree, meaning each node can have M children,
where M is the order of the tree. The order M determines the maximum
number of children each node can have.
6. Search Efficiency:
o The search operation is similar to a binary search, but because all data
is in the leaf nodes, it is efficient for both single lookups and range
queries.
Operations Supported by B+ Tree:
1. Search:
o Search operations start at the root node and traverse down the tree to the
appropriate leaf node by following pointers. The search complexity is
O(log n), where n is the number of keys in the tree.
o Example: To find a key, the algorithm follows the pointers from the
root through the internal nodes to the leaf node, where the key resides.
2. Insertion:
25
o Insertion starts by finding the correct leaf node to insert the key. If the
leaf node is full, it splits into two nodes, and the middle key is
propagated upwards into the parent node.
o Example: If inserting key 15 into a node, it finds the correct leaf and
inserts it. If the leaf node is full, it splits, and the middle value is moved
up.
3. Deletion:
o Deletion is done by removing the key from the appropriate leaf node. If
the deletion causes the leaf node to underflow (i.e., have fewer than the
minimum required keys), it may borrow a key from a sibling or merge
with a sibling.
Example of a B+ Tree:
Let's consider a B+ Tree of order 3 (i.e., each node can have a maximum of 3
children):
Initial Tree: Inserting values 10, 20, 5, 6, 8, 15, 25, 30.
Step 1: Insert 10, 20, and 5. The tree looks like this:
[5, 10, 20]
Step 2: Insert 6, 8. The tree will split, and the middle key (6) will propagate to the
parent node:
[10]
/ \
[5, 6] [8, 20]
Step 3: Insert 15, 25, 30. The tree expands, and keys propagate upwards:
[10, 20]
/ | \
[5, 6] [8] [15] [25, 30]
Final Tree Structure:
The leaf nodes store the actual data values, and the internal nodes store the
keys used for searching.
Advantages of B+ Tree:
1. Efficient Range Queries:
o The linked list of leaf nodes makes it very efficient to perform range queries.
You can easily traverse all leaf nodes without additional searching.
2. Balanced Structure:
o The B+ Tree remains balanced, providing predictable and consistent
performance for insert, delete, and search operations.
3. Disk Efficiency:
26
o Since B+ Trees store keys in internal nodes and actual data in leaves, they are
more disk-efficient, minimizing disk accesses and maximizing the utilization
of disk block space.
Disadvantages of B+ Tree:
1. Complexity:
o B+ Trees require complex algorithms for insertion and deletion, especially
when handling node splits and merges.
2. Memory Overhead:
o Due to the additional linked list in the leaves, the B+ Tree may require more memory
compared to other data structures like B-Trees.
27