Jamboree Case Study
#1. Load and Explore the Data
!gdown https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/001/839/original/Jamboree_Admission.csv
Downloading...
From: https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/001/839/original/Jamboree_Admission.csv
To: /content/Jamboree_Admission.csv
100% 16.2k/16.2k [00:00<00:00, 44.4MB/s]
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from statsmodels.api import OLS, add_constant
# Load the dataset
df = pd.read_csv('Jamboree_Admission.csv')
# Check dataset information
print("Dataset Information:")
print(df.info())
# Check for missing values and duplicates
print("\nMissing Values per Column:\n", df.isna().sum())
print("\nDuplicate Rows:", df.duplicated().sum())
Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -------
 0   Serial No.         500 non-null    int64
 1   GRE Score          500 non-null    int64
 2   TOEFL Score        500 non-null    int64
 3   University Rating  500 non-null    int64
 4   SOP                500 non-null    float64
 5   LOR                500 non-null    float64
 6   CGPA               500 non-null    float64
 7   Research           500 non-null    int64
 8   Chance of Admit    500 non-null    float64
dtypes: float64(4), int64(5)
memory usage: 35.3 KB
None
Missing Values per Column:
Serial No. 0
GRE Score 0
TOEFL Score 0
University Rating 0
SOP 0
LOR 0
CGPA 0
Research 0
Chance of Admit 0
dtype: int64
Duplicate Rows: 0
#2. Data Cleaning and Optimization
# Drop "Serial No." column
df = df.drop(columns=["Serial No."])
# Rename columns for consistency (the raw headers carry trailing spaces)
df.rename(columns={'LOR ': 'LOR', 'Chance of Admit ': 'Chance of Admit'}, inplace=True)
# Optimize Data Types
df['GRE Score'] = df['GRE Score'].astype('int16')
df['TOEFL Score'] = df['TOEFL Score'].astype('int8')
df['University Rating'] = df['University Rating'].astype('int8')
df['SOP'] = df['SOP'].astype('float32')
df['LOR'] = df['LOR'].astype('float32')
df['CGPA'] = df['CGPA'].astype('float32')
df['Research'] = df['Research'].astype('bool')
df['Chance of Admit'] = df['Chance of Admit'].astype('float32')
print("Optimized Dataset Information:")
print(df.info())
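A quick footprint comparison makes the savings from downcasting concrete. A minimal sketch, assuming the raw CSV is re-read into a throwaway frame (raw is a name introduced here for illustration):
# Compare memory before and after downcasting (deep=True also counts object data).
raw = pd.read_csv('Jamboree_Admission.csv').drop(columns=["Serial No."])
print("Raw frame:      ", raw.memory_usage(deep=True).sum(), "bytes")
print("Optimized frame:", df.memory_usage(deep=True).sum(), "bytes")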
#3. Exploratory Data Analysis (EDA)
Summary Statistics
print("\nSummary Statistics:")
print(df.describe())
Summary Statistics:
       Serial No.   GRE Score  TOEFL Score  University Rating         SOP        LOR        CGPA    Research  Chance of Admit
count  500.000000  500.000000   500.000000         500.000000  500.000000  500.00000  500.000000  500.000000        500.00000
mean   250.500000  316.472000   107.192000           3.114000    3.374000    3.48400    8.576440    0.560000          0.72174
std    144.481833   11.295148     6.081868           1.143512    0.991004    0.92545    0.604813    0.496884          0.14114
min      1.000000  290.000000    92.000000           1.000000    1.000000    1.00000    6.800000    0.000000          0.34000
25%    125.750000  308.000000   103.000000           2.000000    2.500000    3.00000    8.127500    0.000000          0.63000
50%    250.500000  317.000000   107.000000           3.000000    3.500000    3.50000    8.560000    1.000000          0.72000
75%    375.250000  325.000000   112.000000           4.000000    4.000000    4.00000    9.040000    1.000000          0.82000
max    500.000000  340.000000   120.000000           5.000000    5.000000    5.00000    9.920000    1.000000          0.97000
Check Distributions of Numerical Variables
# Rename columns to remove any trailing spaces
df.rename(columns=lambda x: x.strip(), inplace=True)
# Visualize numerical distributions
numerical_columns = ['GRE Score', 'TOEFL Score', 'CGPA', 'Chance of Admit']
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
for col, ax in zip(numerical_columns, axes.flatten()):
sns.histplot(df[col], kde=True, ax=ax)
ax.set_title(f"Distribution of {col}")
plt.tight_layout()
plt.show()
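A numeric skewness check complements the histograms; a minimal sketch reusing numerical_columns from above:
# Values near 0 indicate roughly symmetric distributions.
print(df[numerical_columns].skew())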
Categorical Variables
# Pie chart for Research; count plots for the ordinal variables
categorical_columns = ['University Rating', 'SOP', 'LOR', 'Research']
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for col, ax in zip(categorical_columns, axes.flatten()):
    if col == 'Research':
        # value_counts() sorts descending, so the first slice is the
        # majority class (Research == True, 56%); label accordingly.
        data = df[col].value_counts()
        ax.pie(data, labels=['Research', 'No Research'], autopct='%.1f%%', startangle=90)
        ax.set_title("Research Experience")
    else:
        # Assign x to hue with legend=False to avoid seaborn's
        # FutureWarning about passing `palette` without `hue`.
        sns.countplot(x=df[col], hue=df[col], palette='coolwarm', legend=False, ax=ax)
        ax.set_title(col)
plt.tight_layout()
plt.show()
#4. Insights from Correlation Analysis
Heatmap
# Correlation Heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title("Feature Correlation Heatmap")
plt.show()
Key Insights: GRE Score, TOEFL Score, and CGPA show strong positive correlations with Chance of Admit.
Research is positively correlated, but more weakly than the numerical scores.
GRE Score, TOEFL Score, and CGPA are also strongly correlated with one another, so multicollinearity among the predictors should be checked before interpreting individual coefficients.
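Since the score variables track each other closely, a variance inflation factor (VIF) check is a natural follow-up. A minimal sketch; variance_inflation_factor is an extra statsmodels import beyond those at the top:
from statsmodels.stats.outliers_influence import variance_inflation_factor
# VIF above roughly 5-10 is a common rule-of-thumb flag for multicollinearity.
X_vif = add_constant(df.drop(columns=['Chance of Admit']).astype('float64'))
vif = pd.Series([variance_inflation_factor(X_vif.values, i) for i in range(X_vif.shape[1])],
                index=X_vif.columns)
print(vif)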
#5. Feature Engineering
# Separate dependent and independent variables
X = df.drop(columns=['Chance of Admit'])
y = df['Chance of Admit']
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Scale features; fit the scaler on the training split only
scaler = MinMaxScaler()
X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns)
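Fitting the scaler on the training split only keeps test-set information out of the preprocessing; a quick sanity check (a minimal sketch) shows the learned per-column ranges come from X_train alone:
# These min/max values were learned from X_train; X_test reuses them.
print(pd.DataFrame({'data_min': scaler.data_min_, 'data_max': scaler.data_max_},
                   index=X_train.columns))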
#6. Modeling
Train a Linear Regression Model
# Linear Regression
linear_model = LinearRegression()
linear_model.fit(X_train_scaled, y_train)
y_pred = linear_model.predict(X_test_scaled)
# Evaluate the model; np.sqrt(MSE) gives RMSE without the deprecated squared=False flag
print("Linear Regression Performance:")
print("MAE:", mean_absolute_error(y_test, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
print("R2 Score:", r2_score(y_test, y_pred))
Linear Regression Performance:
MAE: 0.043258852595452944
RMSE: 0.05959178252996559
R2 Score: 0.826348139603975
Visualize Results
plt.scatter(y_test, y_pred, alpha=0.7, color='blue')
plt.plot([0, 1], [0, 1], '--', color='red')
plt.title("Actual vs Predicted - Linear Regression")
plt.xlabel("Actual Chance of Admit")
plt.ylabel("Predicted Chance of Admit")
plt.show()
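As a further diagnostic not run above, a residual plot can reveal curvature or non-constant variance that the actual-vs-predicted scatter hides; a minimal sketch:
residuals = y_test - y_pred
plt.scatter(y_pred, residuals, alpha=0.7)
plt.axhline(0, color='red', linestyle='--')  # residuals should hover around zero
plt.title("Residuals vs Predicted - Linear Regression")
plt.xlabel("Predicted Chance of Admit")
plt.ylabel("Residual")
plt.show()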
Compare with Ridge and Lasso Regression
ridge_model = Ridge(alpha=0.1)
ridge_model.fit(X_train_scaled, y_train)
y_pred_ridge = ridge_model.predict(X_test_scaled)
lasso_model = Lasso(alpha=0.01)
lasso_model.fit(X_train_scaled, y_train)
y_pred_lasso = lasso_model.predict(X_test_scaled)
# Compare performances; np.sqrt(MSE) gives RMSE without the deprecated squared=False flag
def evaluate_model(model_name, y_pred):
    print(f"{model_name} Performance:")
    print("MAE:", mean_absolute_error(y_test, y_pred))
    print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
    print("R2 Score:", r2_score(y_test, y_pred))
    print("-" * 30)
evaluate_model("Ridge Regression", y_pred_ridge)
evaluate_model("Lasso Regression", y_pred_lasso)
Ridge Regression Performance:
MAE: 0.04333556816620076
RMSE: 0.0596480092429131
R2 Score: 0.8260202930737093
------------------------------
Lasso Regression Performance:
MAE: 0.06179644627856106
RMSE: 0.0797955620364342
R2 Score: 0.6886390356620822
------------------------------
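The alphas above (0.1 for Ridge, 0.01 for Lasso) are fixed guesses; cross-validation can choose them from the training data instead. A minimal sketch using scikit-learn's RidgeCV and LassoCV:
from sklearn.linear_model import RidgeCV, LassoCV
# Search a log-spaced grid of alphas; .alpha_ holds the selected value.
ridge_cv = RidgeCV(alphas=np.logspace(-3, 2, 50)).fit(X_train_scaled, y_train)
lasso_cv = LassoCV(alphas=np.logspace(-4, 0, 50), cv=5).fit(X_train_scaled, y_train)
print("Best Ridge alpha:", ridge_cv.alpha_)
print("Best Lasso alpha:", lasso_cv.alpha_)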
#7. Key Insights
CGPA, GRE Score, and TOEFL Score are the most significant predictors of admission chances.
Research experience provides a slight boost but is less impactful than the test scores.
SOP and LOR contribute only marginally to the prediction.
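statsmodels (OLS, add_constant) is imported at the top but never used; an OLS fit on the scaled training data supplies the p-values behind these significance claims. A minimal sketch:
# The summary's coefficient table reports t-statistics and p-values per feature.
ols_model = OLS(y_train.reset_index(drop=True), add_constant(X_train_scaled)).fit()
print(ols_model.summary())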
Feature Importance (Linear Model Coefficients)
coefficients = pd.Series(linear_model.coef_, index=X_train_scaled.columns).sort_values(ascending=False)
coefficients.plot(kind='barh', title='Feature Importance')
plt.show()
#8. Recommendations
Emphasize Academic Excellence: Students should focus on improving CGPA, GRE Score, and TOEFL Score to maximize admission chances.
Encourage Research Participation: Research experience, while less significant, can be a differentiator in competitive scenarios.
Refine the Prediction Model: Consider dropping or de-emphasizing SOP in assessments, as its contribution is minimal (see the sketch below).
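Before actually dropping SOP, a quick refit without it (a minimal sketch reusing the split above; cols_no_sop is a name introduced here) shows how little the test R2 changes:
# Refit the linear model without SOP and compare test R2 to the full model.
cols_no_sop = [c for c in X_train_scaled.columns if c != 'SOP']
model_no_sop = LinearRegression().fit(X_train_scaled[cols_no_sop], y_train)
print("R2 without SOP:", r2_score(y_test, model_no_sop.predict(X_test_scaled[cols_no_sop])))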