Student Overview Report
Introduction
This project aims to prove that business intelligence can be extracted from student performance data by designing and
implementing a data warehouse. The project entails building a dimensional model, utilizing SQL Server to create the data
warehouse, and using SSIS to load data into it. In addition, Tableau dashboards and SSRS reports are made to show and examine
the data. The chosen topic area, stakeholder identification, business vision, learned lessons, and technical implementation
phases are all documented in this report.
Subject Area Selection
Student performance analysis has been selected as the data warehouse's topic area. This field was chosen because it has the
ability to offer insightful information about what influences academic performance, which can help educational institutions
improve student outcomes. The data includes student demographics, parental information, academic performance, and
extracurricular activities.
Stakeholder Identification
Educational institutions: administrators and teachers who may apply the findings to enhance instructional tactics and
interventions are among the important stakeholders.
Parents and students: Recipients of resources and support specifically designed for education.
Policy Makers: Organizations with the ability to create policies based on the insights gained to assist advancements in
education.
Business Scope
The goal is to build an extensive data warehouse that combines several facets of student data so that in-depth reporting and
analysis are possible. This will make it easier to:
Spot trends and patterns in the academic performance of students.
Evaluating how parental guidance and assistance affect students' academic performance.
Assessing how extracurricular activities affect students' grades.
Tracking the relationship between student absences and performance.
Insights
By analyzing the data, the following insights can be obtained:
Performance Trends: Knowing how various populations fare academically over time.
Parental Influence: Assessing how parental guidance and assistance contribute to students' academic achievement.
Extracurricular Impact: Evaluating how extracurricular activities improve students' academic performance.
Attendance Correlation: Determining the connection between academic achievement and attendance.
Dimensional Model/Star Schema
Student - star schema
The star schema developed for the data warehouse includes the following tables:
Fact Table: StudentPerformance_Fact
o StudentID (Primary Key)
o GPA
o GradeClass
Dimension Tables:
o StudentDetails_Dim
StudentID (Primary Key)
Age
Gender
Ethnicity
o ParentalInfo_Dim
StudentID (Primary Key)
ParentEducation
ParentSupport
o Academics_Dim
StudentID (Primary Key)
StudyTimeWeekly
Absences
Tutoring
o Extracurricular_Dim
StudentID (Primary Key)
Extracurricular
Sports
Music
Volunteering
Implementation
Data Warehouse Creation
SSIS: ETL Process
SQL Server Management Studio (SSMS) and SQL Server Integration Services (SSIS) were used throughout the full ETL (Extract,
Transform, Load) process.
Student Data Warehouse - SSMS
Extraction:
As whole, data was extracted from multiple source systems that including student records, parental information,
academic performance logs, and extracurricular activity records. The source data was primarily stored in Excel files,
which were imported into SSIS for further processing.
Extract - SSIS
Transformation:
Data normalization and cleansing were done at the transition stage. For example, 'Unknown' was used to fill up the
missing values in the Tutoring column of the Academics_Dim database. This guaranteed the consistency and
completeness of the dataset for analysis. Additional transformations included standardizing data formats, handling null
values, and ensuring data consistency across different tables.
Transform - set up - SSIS
Transform – SSIS
Loading:
The SQL Server database, which functions as the data warehouse, was then filled with the converted data. SQL Server
Management Studio was used to create the tables and data warehouse (SSMS). Data loading into the
StudentDataWarehouse was made easier and more smooth by the interface between SSIS and SSMS. The cleaned and
modified data were added to the tables StudentDetails_Dim, ParentalInfo_Dim, Academics_Dim, Extracurricular_Dim,
and StudentPerformance_Fact.
Loading - SSMS
SSRS Reports
Four SSRS reports were developed to demonstrate business intelligence capabilities:
SSRS set up
SSRS Report
- Average Age by Gender: Male students average 16.5 years old, while female students average 16.4 years old.
- Parental Education Levels: A pie chart displays the distribution of educational attainment among parents. Graduates (1662)
make up the majority, followed by uninstructed people (487), and postgraduates (243).
- Participation of Students in Extracurricular Activities: Of all the students, 947 engage in extracurricular activities, while 1445
do not.
- Average Weekly Study Time with Tutoring Support: Students who receive tutoring tend to study for an hour longer each week
(10.0 hours) on average than students who do not (9.7 hours).
- GPA Trends Across Grade Classes: Four grade classes' worth of GPA fluctuations are depicted in a line graph, with
corresponding GPAs of 3.5, 2.1, 1.1, and 3.3.
Tableau Dashboard
A Tableau dashboard was created to provide multi-dimensional visualizations, including:
Tableau Visuals - Dashboard
- GPA Distribution: The distribution of GPAs is displayed as a histogram, with the majority of students having GPAs in the range
of 1.5 to 2.5.
- Top 5 Students by Study Time and GPA: A bar chart displays the average weekly study time and GPA of the top 5 students,
denoted by their student IDs. With 16.209 hours, the student bearing ID 3445 has the most study time.
- The Effects of Extracurricular Activities and Tutoring on Academic Performance: Depending on whether students take part in
extracurricular activities and receive tutoring, a chart shows the average GPA and weekly study time. The students with the
highest average GPA of 2.197 are those who participate in extracurricular activities and tutoring.
- Parental Education vs. GPA: A bar chart compares average GPAs based on parental education levels, with students whose
parents have postgraduate education showing a slightly lower average GPA (1.8195) compared to those whose parents are
graduates (1.9366) or have no education (1.9334).
Conclusion
Using SQL Server, SSIS, SSRS, and Tableau, the project effectively illustrates how to construct a data warehouse for student
performance analysis. SSIS was used to manage the ETL process efficiently and guarantee the accuracy and completeness of the
data. Tableau visualizations offer comprehensive insights into a range of factors influencing student performance, empowering
stakeholders to make informed decisions based on data to improve educational results. As a future scope, the data warehouse
will be enlarged, the analyses will be improved, and new student data dimensions will be investigated.
Bibliography
1. Microsoft Documentation:
SQL Server Integration Services (SSIS): Link to SSIS Documentation
o Official documentation for SSIS, detailing how to create, deploy, and manage ETL processes.
SQL Server Management Studio (SSMS): Link to SSMS Documentation
o Official documentation for SSMS, covering the setup, configuration, and management of SQL Server databases.
SQL Server Reporting Services (SSRS): Link to SSRS Documentation
o Official documentation for SSRS, explaining how to design, deploy, and manage reports.
Appendices
Appendix A: SQL code for creating the dimension and fact tables.
-- Create StudentDetails_Dim table
CREATE TABLE StudentDetails_Dim (
StudentID INT PRIMARY KEY,
Age INT,
Gender VARCHAR(10),
Ethnicity VARCHAR(50)
);
-- Create ParentalInfo_Dim table
CREATE TABLE ParentalInfo_Dim (
StudentID INT PRIMARY KEY,
ParentEducation VARCHAR(50),
ParentSupport VARCHAR(50)
);
-- Create Academic_Dim table
CREATE TABLE AcademicPerformance_Dim (
StudentID INT PRIMARY KEY,
StudyTimeWeekly INT,
Absences INT,
Tutoring VARCHAR(50)
);
-- Create Extracurricular_Dim table
CREATE TABLE Extracurricular_Dim (
StudentID INT PRIMARY KEY,
Extracurricular VARCHAR(50),
Sports VARCHAR(50),
Music VARCHAR(50),
Volunteering VARCHAR(50)
);
-- Create StudentPerformance_Fact table
CREATE TABLE StudentPerformance_Fact (
StudentID INT PRIMARY KEY,
GPA DECIMAL(3, 2),
GradeClass VARCHAR(10)
);
Appendix B: SSIS Package and Transform SQL query.
-- SELECT StudentID, Tutoring
FROM Academics_dim
WHERE Tutoring = 'Unknown';