Chapter-wise Research Report - Project 1
Chapter 1: Understanding the Dataset and Problem Statement
Learned to identify the structure and objective of the dataset. Understood the relationship between marketing
spend (TV, Radio, Newspaper) and sales. Recognized the importance of clearly defining the dependent variable
(Sales) and the independent variables (the three advertising channels).
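A minimal sketch of how the dependent and independent variables might be separated in pandas, assuming the data sits in a DataFrame with columns named TV, Radio, Newspaper and Sales. The numbers are synthetic (randomly generated with arbitrary coefficients), not the project's dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the advertising data; values and coefficients are arbitrary.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1.5, n)

# Independent variables (features) and the dependent variable (target).
feature_cols = ["TV", "Radio", "Newspaper"]
target_col = "Sales"
X = df[feature_cols]  # 2-D: one row per observation, one column per channel
y = df[target_col]    # 1-D: the quantity the model should predict

print(X.shape, y.shape)  # (200, 3) (200,)
```

Keeping the target out of X from the start prevents it from leaking into the features later.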
Chapter 2: Data Cleaning and Preprocessing
Checked the dataset for missing values, proper column names, and appropriate data types. Learned how early checks
prevent issues during modeling. No missing values were found in the dataset.
Chapter 3: Exploratory Data Analysis (EDA)
Used scatterplots and pairplots to examine relationships between variables. Found strong correlations of TV
and Radio with Sales. Used these visualizations to form hypotheses about how each variable behaves.
Chapter 4: Correlation and Statistical Understanding
Calculated correlation coefficients. Learned to interpret the strength and direction of relationships. Understood
that correlation does not imply causation.
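The causation point can be illustrated with a small simulation. In the sketch below, a hypothetical confounder (labelled demand) drives two synthetic variables that have no direct effect on each other; they still correlate strongly, and the correlation largely disappears once the confounder's influence is regressed out of both. All variable names and coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000

# Hypothetical confounder; neither variable below affects the other.
demand = rng.normal(size=n)
ad_spend = 2.0 * demand + rng.normal(size=n)
sales = 3.0 * demand + rng.normal(size=n)

# Raw Pearson correlation is strong despite no causal link between the two.
r_raw = np.corrcoef(ad_spend, sales)[0, 1]

# Regress each variable on the confounder and correlate the residuals.
resid_ad = ad_spend - np.polyval(np.polyfit(demand, ad_spend, 1), demand)
resid_sales = sales - np.polyval(np.polyfit(demand, sales, 1), demand)
r_partial = np.corrcoef(resid_ad, resid_sales)[0, 1]

print(f"raw r = {r_raw:.2f}, r after removing demand = {r_partial:.2f}")
```

With these coefficients the population correlation is 6/√50, about 0.85, even though changing ad_spend would do nothing to sales in this simulation.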
Chapter 5: Linear Regression Modeling
Built simple and multiple linear regression models. Learned about coefficients, intercept, R² value, and
adjusted R². Found that TV and Radio are statistically significant predictors.
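The step from simple to multiple regression might look like the following scikit-learn sketch. It runs on synthetic data in which Newspaper has, by construction, no effect on Sales; the printed coefficients and R² values describe that toy data only, not the project's results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data: Sales depends on TV and Radio only (an assumption of this demo).
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1.5, n)
y = df["Sales"]

# Simple linear regression: one predictor at a time.
for col in ["TV", "Radio", "Newspaper"]:
    simple = LinearRegression().fit(df[[col]], y)
    r2 = simple.score(df[[col]], y)  # score() returns R² for regressors
    print(f"{col:<9} slope={simple.coef_[0]:.3f} intercept={simple.intercept_:.2f} R2={r2:.3f}")

# Multiple linear regression: all three channels together.
X = df[["TV", "Radio", "Newspaper"]]
multi = LinearRegression().fit(X, y)
coefs = {c: round(float(b), 3) for c, b in zip(X.columns, multi.coef_)}
print("coefficients:", coefs)
print("intercept:", round(float(multi.intercept_), 2))
print("R2:", round(multi.score(X, y), 3))
```

Each multiple-regression coefficient is the expected change in Sales per unit of that channel with the other channels held fixed, which is why it can differ from the slope of the corresponding simple regression.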
Chapter 6: Model Evaluation and Interpretation
Evaluated regression models using R² and p-values. Understood what a high R² implies and the risk of
including statistically insignificant predictors such as Newspaper.
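The risk of carrying an insignificant predictor can be seen by comparing R² with adjusted R². The sketch below, again on synthetic data where Newspaper has no true effect, fits the model with and without Newspaper: R² on the training data can never fall when a column is added, while adjusted R² charges a penalty for each extra predictor. The helper name adjusted_r2 is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R² for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Synthetic data; Newspaper is pure noise with respect to Sales by construction.
rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1.5, n)
y = df["Sales"]

results = {}
for cols in (["TV", "Radio"], ["TV", "Radio", "Newspaper"]):
    X = df[cols]
    r2 = LinearRegression().fit(X, y).score(X, y)
    results[tuple(cols)] = (r2, adjusted_r2(r2, n, len(cols)))
    print(f"{' + '.join(cols):<24} R2={r2:.4f}  adjusted R2={results[tuple(cols)][1]:.4f}")
```

On any particular sample adjusted R² is not guaranteed to drop: adding a predictor raises it exactly when that predictor's t-statistic exceeds 1 in absolute value, so the p-value check remains the more direct test of significance.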
Chapter 7: Polynomial Features and Interaction Effects
Learned to include polynomial terms and interaction features to model non-linear and combined effects. Recognized the
trade-off between improved accuracy and the risk of overfitting.
Additional Content for Chapter 1
In this chapter, we explored the business context of the dataset, focusing on the role of advertising in driving sales.
The dataset comprises data from a marketing campaign, with spending figures across TV, Radio, and Newspaper and the corresponding Sales.
We emphasized the significance of clearly defining the dependent and independent variables before building a model.
Initial exploration helped us hypothesize that the media channels might affect sales differently.
This understanding guided our expectations and analysis in the subsequent chapters.
Additional Content for Chapter 2
Data cleaning is crucial for reliable results in any data science project.
We checked for missing values using functions like isnull().sum() and ensured that the column names were correct.
Data types were examined and found appropriate.
The absence of missing data simplified the preprocessing.
We also considered renaming columns for clarity but retained the original names for consistency.
This chapter underlines the importance of validating data before proceeding to modeling.
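The checks described above can be bundled into one helper. This is a minimal sketch with a hypothetical function name (validate_frame) and a tiny hand-made frame containing one deliberate gap and one numeric column stored as text; it uses only standard pandas calls such as isnull().sum() and pd.api.types.is_numeric_dtype.

```python
import numpy as np
import pandas as pd

def validate_frame(df: pd.DataFrame, expected_cols: list) -> dict:
    """Hypothetical pre-modeling check: expected columns, missing values, dtypes."""
    null_counts = df.isnull().sum()  # missing values per column
    return {
        "missing_cols": [c for c in expected_cols if c not in df.columns],
        "total_nulls": int(null_counts.sum()),
        "nulls_per_col": {c: int(v) for c, v in null_counts.items()},
        # A regression needs numeric inputs; text columns must be converted first.
        "non_numeric": [c for c in df.columns
                        if not pd.api.types.is_numeric_dtype(df[c])],
    }

# Made-up values, not from the project's dataset.
df = pd.DataFrame({
    "TV": [100.0, 50.0, np.nan],
    "Radio": [20.0, 10.0, 30.0],
    "Newspaper": ["15.0", "40.0", "25.0"],  # numbers accidentally stored as strings
    "Sales": [12.0, 8.0, 11.0],
})
report = validate_frame(df, ["TV", "Radio", "Newspaper", "Sales"])
print(report["total_nulls"], report["non_numeric"])  # 1 ['Newspaper']
```

On a clean dataset such as the one described in this chapter, total_nulls would be 0 and non_numeric would be empty.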
Additional Content for Chapter 3
Using seaborn and matplotlib, we conducted an in-depth exploratory data analysis.
Pairplots showed that TV and Radio spending each had a strong positive linear relationship with Sales.
Boxplots helped summarize the distribution of each variable and identify potential outliers.
Correlation heatmaps visually supported our hypothesis that each channel has a different impact on Sales.
These visual tools provided intuition and direction for model building.
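Boxplot whiskers in matplotlib and seaborn reach by default the most extreme points lying within 1.5 times the interquartile range (IQR) of the quartiles, and points beyond that are drawn individually as potential outliers. The same rule can be applied numerically, as in this sketch; the helper name iqr_outliers and the sample values are hypothetical.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (the default boxplot rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]

# Hypothetical Newspaper spends with one unusually large value.
newspaper = pd.Series([10, 12, 15, 18, 20, 22, 25, 28, 30, 114], name="Newspaper")
print(iqr_outliers(newspaper).tolist())  # [114]
```

Flagged points are candidates for inspection, not automatic deletions; a large spend can be a genuine observation.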
Additional Content for Chapter 4
We computed Pearson correlation coefficients to quantify the strength of the relationship between each advertising channel and Sales.
The strongest correlation was observed between TV and Sales, followed by Radio.
Newspaper had a relatively weak correlation, suggesting it might not be a strong predictor.
We discussed the difference between correlation and causation and how this distinction affects business decisions.
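The Pearson coefficient is the covariance of two variables divided by the product of their standard deviations: r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²). The sketch below computes it by hand and checks it against pandas' DataFrame.corr; the data are synthetic and the helper name pearson_r is hypothetical, so the ranking it prints reflects the simulation's arbitrary coefficients rather than the project's findings.

```python
import numpy as np
import pandas as pd

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation from its definition (hypothetical helper)."""
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Synthetic data: TV explains the most variance, Newspaper none (by construction).
rng = np.random.default_rng(4)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1.5, n)

channels = ["TV", "Radio", "Newspaper"]
by_pandas = df.corr(method="pearson")["Sales"].drop("Sales")
by_hand = {c: pearson_r(df[c].to_numpy(), df["Sales"].to_numpy()) for c in channels}

# Rank channels by correlation with Sales (sign kept, so direction is visible).
print(by_pandas.sort_values(ascending=False).round(3))
```

The sign of r gives the direction of the relationship and its magnitude (from 0 to 1) gives the strength of the linear association.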
Additional Content for Chapter 5
Linear regression was implemented using sklearn.
We began with simple linear regression for individual predictors, followed by multiple regression including several predictors at once.
Model summaries provided insight into coefficients, intercepts, and R² values.
TV and Radio had statistically significant coefficients, reinforcing their importance as predictors.
Newspaper's coefficient was not significant, which raised the question of simplifying the model by dropping it.
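scikit-learn's LinearRegression exposes coef_ and intercept_ but no standard errors or p-values, so significance tests need a separate calculation; the OLS summary in statsmodels is a common choice. As a dependency-light alternative, the sketch below computes classical OLS p-values directly with NumPy and SciPy on synthetic data. The helper name ols_table is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

def ols_table(X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: OLS coefficients, standard errors, t-stats, two-sided p-values."""
    A = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])  # prepend intercept
    yv = y.to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(A, yv, rcond=None)
    resid = yv - A @ beta
    n, k = A.shape                      # k counts the intercept too
    sigma2 = resid @ resid / (n - k)    # unbiased estimate of the error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    t = beta / se
    p = 2 * stats.t.sf(np.abs(t), df=n - k)  # two-sided test of coefficient == 0
    return pd.DataFrame({"coef": beta, "std_err": se, "t": t, "p_value": p},
                        index=["Intercept", *X.columns])

# Synthetic data; Newspaper has no true effect on Sales here.
rng = np.random.default_rng(5)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1.5, n)

table = ols_table(df[["TV", "Radio", "Newspaper"]], df["Sales"])
print(table.round(4))
```

With real data, a Newspaper p-value above the chosen threshold (commonly 0.05) would support dropping it; in this simulation its p-value is essentially random because the true coefficient is zero.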
Additional Content for Chapter 6
Model evaluation was performed using R² and adjusted R² to assess fit quality.
We also reviewed p-values for each predictor to determine their statistical significance.
High R² values from models including TV and Radio indicated a good fit, whereas including Newspaper added a predictor that was not statistically significant.
This step taught the importance of balancing complexity with interpretability.
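For reference, the two fit measures used in this chapter are conventionally defined as follows, with n observations, p predictors (excluding the intercept), fitted values ŷᵢ and mean ȳ. The worked number uses hypothetical values (R² = 0.90, n = 200, p = 3), not the project's output.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}

% Hypothetical example: R^2 = 0.90, n = 200, p = 3
\bar{R}^2 = 1 - 0.10 \cdot \frac{199}{196} \approx 0.8985
```

Because (n − 1)/(n − p − 1) grows with p, each added predictor must reduce the residual sum of squares enough to offset the penalty.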
Additional Content for Chapter 7
Polynomial regression and interaction terms were introduced to capture non-linear and combined effects.
Polynomial terms like TV² and interaction terms like TV*Radio were added to improve prediction.
While model accuracy improved slightly, the added terms also increased the risk of overfitting.
This experiment highlighted the trade-off between model complexity and generalization.
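One way to watch for overfitting is to compare training and held-out R² as the polynomial degree grows. This is a sketch on a deliberately small synthetic sample (60 rows, so overfitting is easier to provoke); the degrees tried, the 70/30 split and the scaling step are assumptions, not the project's settings.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Small synthetic sample whose true relationship is linear.
rng = np.random.default_rng(6)
n = 60
df = pd.DataFrame({"TV": rng.uniform(0, 300, n), "Radio": rng.uniform(0, 50, n)})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1.5, n)

X_train, X_test, y_train, y_test = train_test_split(
    df[["TV", "Radio"]], df["Sales"], test_size=0.3, random_state=0)

scores = {}
for degree in (1, 2, 5):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),  # keeps high-degree columns on comparable scales
        LinearRegression(),
    ).fit(X_train, y_train)
    scores[degree] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"degree={degree}: train R2={scores[degree][0]:.3f}  test R2={scores[degree][1]:.3f}")
```

A widening gap between training and test R² as the degree grows is the usual sign of overfitting. With only 18 test rows the test scores are noisy, so cross-validation (for example sklearn's cross_val_score) would give a steadier comparison.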