DBMS Course Project
Dr. Ranjeetsingh Suryawanshi
Assistant Professor
Vishwakarma Institute of Technology, Pune
Data Migration 1
• Data Migration Projects in DBMS involve transferring data between
different storage systems, formats, or computer systems.
• Data migration is the process of transferring data from one system to
another while ensuring data integrity, quality, security, and minimal
downtime.
• Types of Data Migration Projects
• Database Migration
• Moving data from one database to another (e.g., Oracle → PostgreSQL).
• Storage Migration
• Moving data from one physical storage to another (e.g., HDD to SSD or cloud
storage).
• Application Migration
• Moving data between different application platforms (e.g., CRM A to CRM B).
• Cloud Migration
• Moving on-premises databases to cloud services (e.g., AWS RDS, Azure SQL).
Data Migration
• Title: Migrating Library Management Data from MS Access to MySQL
• Steps:
• Analyze the current Access schema.
• Design the target schema in MySQL.
• Export data to CSV using Access.
• Use LOAD DATA INFILE in MySQL to import data.
• Validate foreign key constraints, relationships.
• Run sample queries to verify integrity.
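The validation steps above can be sketched in a short script. This is a minimal illustration, not the project's actual code: sqlite3 stands in for both the exported Access data and the MySQL target (a real run would use a MySQL driver such as mysql-connector-python), and the `books`/`loans` tables are hypothetical.

```python
import sqlite3

# Hypothetical sketch: after importing the CSV exports, validate the
# migrated data. sqlite3 stands in for both source and target here.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")

for db in (src, tgt):
    db.executescript("""
        CREATE TABLE books(book_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE loans(loan_id INTEGER PRIMARY KEY, book_id INTEGER);
        INSERT INTO books VALUES (1, 'DBMS Concepts'), (2, 'SQL Basics');
        INSERT INTO loans VALUES (10, 1), (11, 2);
    """)

def row_count(db, table):
    return db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# Row counts must match between source and target
for table in ("books", "loans"):
    assert row_count(src, table) == row_count(tgt, table), table

# Orphaned foreign keys in the target indicate a broken migration
orphans = tgt.execute("""
    SELECT COUNT(*) FROM loans l
    LEFT JOIN books b ON l.book_id = b.book_id
    WHERE b.book_id IS NULL
""").fetchone()[0]
print("orphaned loans:", orphans)
```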
Data Migration
• Cross-Platform Old Database Migration in DBMS refers to migrating legacy data from older
database systems or platforms (e.g., FoxPro, MS Access, dBase, IBM DB2) to modern
RDBMS or cloud-based platforms (e.g., MySQL, PostgreSQL, SQL Server, Oracle, MongoDB,
or cloud DBMS like AWS RDS).
• To migrate and modernize an existing legacy database system by:
• Preserving data integrity, Upgrading to a scalable, secure platform, Ensuring compatibility
with modern applications.
• Typical Use Cases:
Legacy DB      →  Modern Target DB
FoxPro/dBase   →  MySQL/PostgreSQL
MS Access      →  SQL Server
Oracle 9i      →  Oracle 19c / PostgreSQL
IBM DB2        →  Azure SQL / Snowflake
File-based DB  →  Cloud-native RDBMS
Data Migration
• Steps in a Cross-Platform Migration Project
• 1. Legacy System Audit
• Identify the old database engine (e.g., .dbf FoxPro files).
• Understand table structure, relationships, constraints.
• Review any stored procedures, views, or triggers.
• 2. Schema Mapping
• Map legacy data types to target DB data types.
• Example: Memo in Access → TEXT in MySQL
• Identify unsupported features and plan replacements.
• Resolve differences in indexing and constraints.
• 3. Tool Selection & Data Extraction
• Tools for old DBs: DBF Viewer/Converter, Microsoft Access Export, ODBC drivers (for connectivity).
• Export to CSV, SQL dumps, or via ETL tools.
• 4. Data Transformation
• Cleanse and normalize data.
• Resolve encoding issues (e.g., ASCII to UTF-8).
• Apply transformations: date formats, NULL handling, etc.
• 5. Data Loading into New DB
• Use: LOAD DATA INFILE (MySQL), COPY (PostgreSQL), ETL tools (Talend, Pentaho), or Python scripts using Pandas + SQLAlchemy.
• 6. Data Validation
• Row count match
• Data type check
• Referential integrity verification
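The data-transformation step can be sketched with stdlib Python. The legacy conventions assumed here (dd/mm/yyyy dates, empty strings for missing values) and the sample rows are illustrative, not from any real system:

```python
from datetime import datetime

# Hypothetical transformation sketch: normalize legacy rows before loading.
legacy_rows = [
    {"name": "Ravi", "joined": "05/08/1999", "email": ""},
    {"name": "Meera", "joined": "17/01/2004", "email": "meera@example.com"},
]

def transform(row):
    out = dict(row)
    # Re-format dates to ISO 8601, which most target DBs accept directly
    out["joined"] = datetime.strptime(row["joined"], "%d/%m/%Y").strftime("%Y-%m-%d")
    # Map empty strings to None so they load as SQL NULL, not ''
    out["email"] = row["email"] or None
    return out

clean = [transform(r) for r in legacy_rows]
print(clean[0]["joined"])  # 1999-08-05
```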
Data Migration
• Challenges
• Lack of documentation on old systems.
• Missing primary keys or normalization issues.
• Binary/encoded formats or special character sets.
• Incompatible procedural logic (triggers/functions).
• Performance tuning required after migration.
• Data loss or corruption.
• Downtime or performance issues.
• Schema incompatibility.
• Data security and compliance.
• Large data volumes or complex relationships.
• Referential integrity verification.
Query Optimization 2
• A Query Optimization Project in DBMS focuses on analyzing and
improving SQL queries to enhance performance, reduce execution
time, and lower resource usage, which is especially important for
large datasets or high-traffic applications.
• Objectives:
• Improve response time of complex SQL queries.
• Reduce CPU, memory, and disk I/O consumption.
• Compare performance before and after optimization using metrics like
execution time and cost.
• Project Scope:
• Implement unoptimized and optimized queries on a large sample
database (e.g., employees, e-commerce, university).
• Use tools like EXPLAIN PLAN, SQL Profiler, ANALYZE, or Query
Execution Plans.
• Optimize queries using indexing, rewriting, joins, subqueries, and
partitioning.
Query Optimization
• Performance Analysis (Baseline)
• Measure:
• Execution time
• Cost in EXPLAIN PLAN
• Number of rows scanned
• CPU and I/O stats (optional)
Query ID | Unoptimized Time | Optimized Time | Improvement | Index Used
Q1       | 12.5 sec         | 1.4 sec        | 88.8%       | Yes
Q2       | 8.3 sec          | 2.6 sec        | 68.6%       | No
Technique             | Description
Indexes               | Create indexes on filter/search columns
Join optimization     | Use appropriate join types and ordering
Avoid SELECT *        | Use only required columns
Query restructuring   | Convert subqueries to joins
Partitioning          | Use horizontal partitioning for large tables
Materialized Views    | Pre-compute expensive joins or aggregates
Use of ANALYZE/STATS  | Ensure the query planner has up-to-date statistics
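The indexing technique can be demonstrated end to end with SQLite's EXPLAIN QUERY PLAN (MySQL's EXPLAIN and Oracle's EXPLAIN PLAN play the same role). The `employees` table and index name below are illustrative:

```python
import sqlite3

# Sketch: show the plan change when an index is added on a filter column.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employees(emp_id INTEGER, dept TEXT, salary REAL)")
db.executemany("INSERT INTO employees VALUES (?, ?, ?)",
               [(i, f"dept{i % 50}", 30000 + i) for i in range(10000)])

query = "SELECT emp_id FROM employees WHERE dept = 'dept7'"

before = db.execute("EXPLAIN QUERY PLAN " + query).fetchall()
db.execute("CREATE INDEX idx_emp_dept ON employees(dept)")  # index on the filter column
after = db.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print("before:", before[0][-1])  # a full table scan
print("after: ", after[0][-1])   # an index search
```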
Relational Synthesis 3
• What is Relational Synthesis?
• Process of generating normalized relational schemas from:
• ER diagrams
• Functional dependencies (FDs)
• Views or queries
• Legacy flat files or data sources
• Higher-level abstractions (e.g., user inputs, form data)
• Types of Relational Synthesis Projects
• 1. ER-to-Relational Schema Conversion
• Convert ER diagrams into relational tables using mapping rules.
• 2. View/Query-Based Synthesis
• Generate base relations from a set of user-defined views or SQL queries (used in data integration
and query rewriting).
• 3. Dependency-Preserving Decomposition
• Synthesize a normalized schema (e.g., 3NF or BCNF) from functional dependencies.
• 4. Data Integration & Source Mapping
• Create a unified relational schema from multiple heterogeneous data sources.
Relational Synthesis
• Example: "Automatic Relational Schema Generator from ER Diagrams"
• Input: Entity-Relationship diagrams (text-based or graphical)
• Output: SQL DDL statements (CREATE TABLE ...)
• Features:
• Mapping rules for entity, relationship, multivalued attributes
• Normalization check (up to 3NF)
• "Functional Dependency-Based Schema Normalization Tool"
• Input: Set of attributes and functional dependencies
• Output: Decomposed schema in 3NF or BCNF
• Features:
• Attribute closure calculator
• Candidate key detection
• Dependency-preserving decomposition
Relational Synthesis
• FD-Based Synthesis
• Input: R(A, B, C, D, E)
• F = { A → B, B → C, A → D, D → E }
• Find a decomposition into 3NF relations that preserves FDs and allows
lossless join.
• Output:
• R1(A, B, D)
• R2(B, C)
• R3(D, E)
• "Reverse Engineering Legacy Flat Files to Relational Schemas"
• Input: CSV or Excel files
• Output: Normalized relational schema
• Feature: Column data profiling, inference of relationships
Data Integration 4
• It involves combining data from multiple sources into a unified and
consistent format, typically stored in a relational schema.
• Objectives
• Combine multiple data sources into a centralized RDBMS.
• Maintain data consistency and integrity.
• Enable cross-source querying and analytics.
• Clean and transform data to remove duplication or conflicts.
• 1. “University Centralized Database Integration”
• Goal: Integrate data from various departmental systems (admissions,
academics, library) into a unified student information system.
• Sources:
•MS Access for admissions
•Excel sheets for library records
•MySQL database for academics
• Target: Centralized MySQL/PostgreSQL relational schema
Data Integration
• 2. “E-Commerce Multi-Store Data Integration”
• Goal: Integrate sales and product data from different platforms (e.g.,
Shopify, Amazon, Local POS) into a relational warehouse.
• Sources:
•API JSON exports (Shopify)
•CSV downloads (Amazon)
•MySQL POS database
• Target: Product, Sales, Customer schema in RDBMS
• Challenges
• Data quality issues (missing/duplicate records)
• Conflicting keys or naming conventions
• Real-time vs batch integration
• Schema mapping inconsistencies
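A minimal integration pipeline can be sketched with the stdlib: a CSV export (standing in for the Excel library records) and a source database are merged into one target schema on a shared key. All table and column names below are hypothetical:

```python
import csv, io, sqlite3

# Hypothetical sketch: merge a CSV export with a source DB into one schema.
csv_source = io.StringIO("student_id,books_issued\nS1,3\nS2,1\n")
academics = sqlite3.connect(":memory:")
academics.executescript("""
    CREATE TABLE students(student_id TEXT, name TEXT);
    INSERT INTO students VALUES ('S1', 'Asha'), ('S2', 'Kiran');
""")

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE student_info("
               "student_id TEXT PRIMARY KEY, name TEXT, books_issued INTEGER)")

# Load the academic records first, then enrich them with the library data
for sid, name in academics.execute("SELECT student_id, name FROM students"):
    target.execute("INSERT INTO student_info(student_id, name) VALUES (?, ?)",
                   (sid, name))
for row in csv.DictReader(csv_source):
    target.execute("UPDATE student_info SET books_issued = ? WHERE student_id = ?",
                   (int(row["books_issued"]), row["student_id"]))

print(target.execute(
    "SELECT name, books_issued FROM student_info ORDER BY student_id").fetchall())
```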
Query Synthesis from Natural Language 5
• It is the process of automatically converting natural language (NL)
input (e.g., “Show all students in Computer Science”) into a formal
SQL query that retrieves data from a relational database.
• Objectives
• Takes natural language input from the user,
• Understands the intent and structure of the question,
• Maps it to the underlying relational database schema,
• Generates and executes an accurate SQL query.
• Use Cases
• Querying databases without knowing SQL
• AI-powered BI dashboards
• Voice-based database querying
• Chatbots with database backend
• Accessibility support for non-technical users
Query Synthesis from Natural Language 5
• 1. NL2SQL Engine for Student Database
• Goal: Convert questions like “List all students in Computer department” into
SQL queries.
• Schema Example:
• students(student_id, name, dept, email)
• courses(course_id, name, credits)
• enrollments(student_id, course_id, grade)
• Input (NL):
• "What are the names of students who have enrolled in a database course?"
• Output (SQL):
•SELECT s.name
•FROM students s
•JOIN enrollments e ON s.student_id = e.student_id
•JOIN courses c ON e.course_id = c.course_id
•WHERE c.name LIKE '%database%';
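A toy rule-based translator for this schema can be sketched as below. It handles only this one question shape via a regular expression (safe here because the captured word is restricted to \w+); a real NL2SQL system would use a proper parser or an ML model:

```python
import re, sqlite3

# Toy rule-based NL2SQL sketch for the student schema above.
def nl_to_sql(question):
    m = re.search(r"students .*enrolled in a (\w+) course", question.lower())
    if m:
        return ("SELECT s.name FROM students s "
                "JOIN enrollments e ON s.student_id = e.student_id "
                "JOIN courses c ON e.course_id = c.course_id "
                f"WHERE c.name LIKE '%{m.group(1)}%'")
    raise ValueError("unsupported question")

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE students(student_id INTEGER, name TEXT, dept TEXT, email TEXT);
    CREATE TABLE courses(course_id INTEGER, name TEXT, credits INTEGER);
    CREATE TABLE enrollments(student_id INTEGER, course_id INTEGER, grade TEXT);
    INSERT INTO students VALUES (1, 'Priya', 'CS', 'p@x.com'), (2, 'Arjun', 'IT', 'a@x.com');
    INSERT INTO courses VALUES (10, 'database systems', 4), (11, 'networks', 3);
    INSERT INTO enrollments VALUES (1, 10, 'A'), (2, 11, 'B');
""")

sql = nl_to_sql("What are the names of students who have enrolled in a database course?")
print(db.execute(sql).fetchall())  # [('Priya',)]
```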
Query Synthesis from Natural Language 5
• 2. Voice Query Interface for Sales Database
• Use speech-to-text + NL to SQL
• Useful for sales managers who want to “ask the data”
• E.g., “Total sales last month by region”
• 3. Intelligent Chatbot for Employee Records
• Chatbot backend connects to an HR database
• Questions like:
• “How many employees joined after 2020?”
• “Who is the manager of Rahul Sharma?”
Query Synthesis from Natural Language 5
• Challenges
• Ambiguity in user input
• Synonyms and plural handling (“students” vs “student”)
• Complex joins across multiple tables
• Handling GROUP BY, ORDER BY, and aggregation (SUM, AVG)
• Schema generalization (dynamic DB schemas)
Query Simulation 6
• Query simulation refers to creating an environment where users can
simulate the execution of SQL queries without affecting actual
production data. It’s used for:
•Learning and training
•Query performance prediction
•Understanding how SQL logic works
•Visualizing query execution
•Practicing complex operations (joins, subqueries, groupings)
• Objective
• To build a Query Simulation System that:
• Accepts SQL queries from users
• Executes them on simulated/sample data
• Shows step-by-step results or execution plans
Query Simulation 6
• 1. Interactive SQL Query Simulator for Education
• Goal: Help students learn how SQL works by showing query
breakdown.
• Features:
• Enter SELECT, JOIN, GROUP BY queries
• Simulate step-by-step:
• Table scan
• Filter application
• Join results
• Grouping and aggregation
• Highlight intermediate outputs
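The step-by-step simulation can be sketched by applying each relational step to plain Python rows, so every intermediate output can be shown to the learner. The `students` rows are illustrative sample data:

```python
# Simulating: SELECT name FROM students WHERE dept = 'CS'
students = [
    {"name": "Asha", "dept": "CS", "marks": 82},
    {"name": "Ravi", "dept": "IT", "marks": 61},
    {"name": "Meera", "dept": "CS", "marks": 74},
]

scanned = list(students)                              # 1. table scan
filtered = [r for r in scanned if r["dept"] == "CS"]  # 2. filter (WHERE)
projected = [r["name"] for r in filtered]             # 3. projection (SELECT list)

for step, rows in [("scan", scanned), ("filter", filtered), ("project", projected)]:
    print(step, "->", rows)
```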
AI in RDBMS 7
• Smart Index Recommender
• Goal: Analyze a set of queries and recommend indexes
• Input:
• Query logs
• Schema
• Output:
• Suggested indexes with reasoning
• Performance comparison
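The core idea can be sketched without any AI at all: mine the query log for equality predicates and rank columns by how often they are filtered on. The log entries and the threshold of 2 are illustrative assumptions:

```python
import re
from collections import Counter

# Hypothetical sketch: columns filtered on frequently are index candidates.
query_log = [
    "SELECT * FROM orders WHERE customer_id = 42",
    "SELECT total FROM orders WHERE customer_id = 7",
    "SELECT * FROM orders WHERE status = 'open'",
]

counts = Counter()
for q in query_log:
    counts.update(re.findall(r"WHERE\s+(\w+)\s*=", q))

# Suggest an index on any column filtered at least twice in the log
suggestions = [col for col, n in counts.most_common() if n >= 2]
print("suggest index on:", suggestions)  # ['customer_id']
```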
• Optimizer-Aware Query Rewriter (with AI Integration)
• Goal: Use a simple rule-based or AI approach (like GPT or BERT) to rewrite
slow queries based on schema knowledge
• Example:
• Input Query: SELECT * FROM orders WHERE customer_id IN (SELECT
customer_id FROM customers WHERE region = 'East')
• Optimized: Use JOIN with index instead of subquery
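A rewriter must preserve results, so the rule above can be checked on sample data. Table contents here are illustrative; sqlite3 stands in for the real DBMS:

```python
import sqlite3

# Sketch: verify that the rewrite "IN-subquery -> JOIN" returns the same rows.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers(customer_id INTEGER, region TEXT);
    CREATE TABLE orders(order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West');
    INSERT INTO orders VALUES (100, 1), (101, 2), (102, 1);
""")

slow = """SELECT o.order_id FROM orders o
          WHERE o.customer_id IN
            (SELECT customer_id FROM customers WHERE region = 'East')"""
fast = """SELECT o.order_id FROM orders o
          JOIN customers c ON o.customer_id = c.customer_id
          WHERE c.region = 'East'"""

# Both forms must return identical result sets
assert sorted(db.execute(slow).fetchall()) == sorted(db.execute(fast).fetchall())
print(sorted(db.execute(fast).fetchall()))  # [(100,), (102,)]
```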
Transaction Management 8
• Atomicity: All operations in a transaction succeed or none do.
• Consistency: The database moves from one valid state to another.
• Isolation: Concurrent transactions do not interfere.
• Durability: Once committed, the transaction is permanent.
• Objectives
• Demonstrates transaction execution
• Implements rollback, commit, concurrent access
• Simulates real-life scenarios (e.g., banking, inventory)
• Ensures data integrity during failures or conflicts
• Examples:
• 1. Banking System with Transaction Control
• 2. Hotel Booking System with Rollback Support (concurrent room
bookings)
• 3. Transaction Deadlock Detection and Recovery
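Atomicity and rollback in the banking scenario can be sketched as below. The account table, balances, and the CHECK constraint that forces the failure are all illustrative:

```python
import sqlite3

# Atomicity sketch: a failed transfer rolls back, so no partial update survives.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts("
           "id INTEGER PRIMARY KEY, balance REAL CHECK (balance BETWEEN 0 AND 1000))")
db.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500.0), (2, 900.0)])
db.commit()

def transfer(db, src, dst, amount):
    try:
        db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        db.commit()
    except sqlite3.IntegrityError:
        db.rollback()  # undo the debit; neither account changes

# The credit would push account 2 past the limit, so the whole transfer aborts
transfer(db, 1, 2, 200.0)
print(db.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
```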
Relation Normalization 9
• Normalization is the process of organizing data in a database to
reduce redundancy and improve data integrity by dividing large
tables into smaller, well-structured tables and defining relationships
between them.
• The main normal forms are:
• 1NF: Eliminate repeating groups and ensure atomic values
• 2NF: Remove partial dependencies (only for composite keys)
• 3NF: Remove transitive dependencies
• BCNF, 4NF, 5NF: Advanced refinements
• Examples:
• 1. College Database Normalization
• 2. E-Commerce Order Database Normalization
• 3. Hospital Patient Record Normalization
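A 3NF decomposition can be illustrated on a small flat table. The student/department data below is hypothetical; the point is removing the transitive dependency student_id → dept → hod:

```python
# Flat table with transitive dependency: student_id -> dept -> hod
flat = [
    ("S1", "Asha", "CS", "Dr. Rao"),
    ("S2", "Ravi", "CS", "Dr. Rao"),
    ("S3", "Meera", "IT", "Dr. Iyer"),
]

# 3NF: students(student_id, name, dept) and departments(dept, hod)
students = [(sid, name, dept) for sid, name, dept, _ in flat]
departments = sorted({(dept, hod) for _, _, dept, hod in flat})

print(students)
print(departments)  # each dept/HOD pair stored once, removing redundancy
```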
Mapping of Relational Algebra to SQL Project 10
• This project explores how Relational Algebra (RA) operations are
translated into equivalent SQL queries. It helps students and developers:
• Understand the theoretical foundation of SQL
• Map RA expressions (π, σ, ×, ⋈, ∪, −) to real-world SQL
• Build an interactive or demonstrative system for learning/teaching
• Objectives
• Demonstrate how each Relational Algebra operation maps to SQL.
• Build a system that:
• Accepts relational algebra queries
• Converts them to SQL
• Executes them on sample databases
• Relational Algebra to SQL Translator Tool
• Input RA:
• π name (σ age > 18 (Students))
• Output SQL:
• SELECT name FROM Students WHERE age > 18;
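A toy translator for exactly this π/σ pattern can be sketched with one regular expression; a full tool would need a real parser for nested RA expressions:

```python
import re

# Toy RA-to-SQL translator for the single shape: π cols (σ condition (Table))
def ra_to_sql(expr):
    m = re.fullmatch(r"π\s*([\w, ]+?)\s*\(σ\s*(.+?)\s*\((\w+)\)\)", expr.strip())
    if not m:
        raise ValueError("unsupported RA expression")
    cols, cond, table = m.groups()
    return f"SELECT {cols.strip()} FROM {table} WHERE {cond};"

print(ra_to_sql("π name (σ age > 18 (Students))"))
# SELECT name FROM Students WHERE age > 18;
```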
Embedded SQL 11
• Embedded SQL allows you to integrate SQL statements directly into a host programming language
such as C or C++. It enables you to:
• Execute SQL inside programs
• Combine database operations with procedural logic
• Perform advanced query-based data manipulation within applications
• Example : Student Record Management System
• Language: C/C++ with Embedded SQL Database: PostgreSQL / Oracle / MySQL
• Features:
• Add / Update / Delete student records
• Search by student ID or name
• Store attendance and marks
• Show GPA, average scores, etc.
• Learning Outcome:
• Embedded SELECT, INSERT, UPDATE
• Cursor-based loops in embedded SQL
• Dynamic SQL execution
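Embedded SQL proper requires a precompiler (e.g., Oracle Pro*C or PostgreSQL ECPG), so it cannot be shown runnably here; this Python/sqlite3 sketch mirrors the same DECLARE/OPEN/FETCH/CLOSE cursor-loop pattern on an illustrative `students` table:

```python
import sqlite3

# Cursor-based loop, as generated by an embedded-SQL precompiler.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE students(student_id INTEGER, name TEXT, marks REAL)")
db.executemany("INSERT INTO students VALUES (?, ?, ?)",
               [(1, "Asha", 82.0), (2, "Ravi", 61.0)])

# DECLARE c CURSOR FOR SELECT ...; OPEN c; FETCH in a loop; CLOSE c
cur = db.execute("SELECT student_id, name, marks FROM students ORDER BY student_id")
total = 0.0
for student_id, name, marks in cur:   # each FETCH binds into host variables
    total += marks
print("average marks:", total / 2)  # 71.5
```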