BTech Database Detailed Reference Guide
1. Normalization: First, Second, and Third Normal Forms
Normalization helps in organizing data in a database efficiently. Each normal form reduces
redundancy:
- First Normal Form (1NF): Eliminates repeating groups by ensuring each column has atomic values.
Example: A student table with multiple subjects in the same column can be broken into multiple
rows.
- Second Normal Form (2NF): Ensures partial dependencies are removed (non-primary key
attributes should be fully dependent on the primary key).
Example: Decomposing a table with a composite primary key where non-key attributes depend
only on part of the key.
- Third Normal Form (3NF): Eliminates transitive dependencies (attributes should depend only on
the primary key).
Example: A table where "course" depends on "department" and "department" depends on
"university". Decompose this into separate tables.
**Sample Problem**: Normalize the given unnormalized table step by step through 1NF, 2NF, and
3NF.
2. Boyce-Codd Normal Form (BCNF), Multivalued Dependency, and Fourth Normal Form
- BCNF: Deals with certain types of redundancy not handled by 3NF. In BCNF, every determinant
must be a candidate key.
- Fourth Normal Form (4NF): Handles multivalued dependencies. A relation should not have more
than one multivalued dependency.
Example: A table where a student is linked to multiple courses and hobbies needs to be
decomposed to eliminate multivalued dependency.
**Sample Problem**: Convert a table into BCNF, and handle multivalued dependencies to achieve
4NF.
3. Join Dependency and Fifth Normal Form (5NF)
- 5NF: Ensures that no lossless decomposition happens with join dependencies.
Example: If a table can be broken down into smaller tables and rejoined without loss of information,
it's in 5NF.
**Sample Problem**: Decompose a given relation into 5NF while preserving all data.
4. Physical Database Design and Query Processing
This involves designing the storage structure to optimize query performance.
- Indexing, partitioning, and query optimization techniques play a crucial role.
Example: Partitioning a large student database based on course enrollment or department for
faster query results.
**Sample Problem**: Design a physical schema for a large dataset and describe how query
processing works on it.
5. File Organization and Indexing
- Single-level Indexing: Create a single index on one attribute for quick lookups.
- Multi-level Indexing: Use a hierarchical structure to speed up access times in large datasets.
- B+ Tree Indexing: Balanced tree structure used for efficient data retrieval. B+ Trees are widely
used in databases for indexing.
- Hashing: Static and dynamic hashing help distribute data across buckets. Dynamic hashing can
expand as data grows.
**Sample Problem**: Implement a B+ tree and show how it handles insertions and deletions.
6. Relational Algebra and SQL Queries
Relational algebra is a procedural query language used to query relational databases.
- Operations include SELECT, PROJECT, JOIN, UNION, and more. These can be mapped directly
to SQL.
Example: Translate the SQL query `SELECT * FROM students WHERE age > 18` into relational
algebra.
**Sample Problem**: Write complex SQL queries and translate them into relational algebra
expressions.
7. Query Optimization Techniques
- Algebraic Optimization: Rewriting queries using relational algebra to reduce the cost of execution.
- Heuristic Optimization: Apply heuristic rules like pushing down selections early in the query plan.
- Join Optimization: Using indexes or hashing to speed up join operations.
**Sample Problem**: Optimize a given SQL query using algebraic transformations and indexing
techniques.
8. Tuple Relational Calculus
TRC is a declarative language that specifies what data to retrieve without specifying how.
- Example: A query to find all students with more than 30 credits can be written in TRC as "find all
students where credits > 30".
**Sample Problem**: Write a set of complex queries using tuple relational calculus.
9. Transaction Processing and Recovery
- ACID Properties: Atomicity, Consistency, Isolation, and Durability ensure that database
transactions are reliable.
- Serializability: Ensures that the result of executing transactions concurrently is the same as if they
were executed serially.
- Conflict Serializability: Detecting conflicts between transactions and ensuring serializability.
**Sample Problem**: Given a set of transaction schedules, determine if they are serializable and
recoverable.