Regression Analysis Lab: Questions and Answers


Problem 01

Given Data:

• X (Number of Units Repaired): 1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10


• Y (Length of Service Calls in minutes): 23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149,
145, 154, 166

Let's go through each part of the problem:

(a) Scatter Plot

We will plot Length of Service Calls (Y) against Number of Units Repaired (X) and assess
whether a linear model might be appropriate.

R Code:

# Data
X <- c(1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10)
Y <- c(23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166)

# Scatter plot
plot(X, Y, main = "Scatter Plot of Service Calls vs Units Repaired",
     xlab = "Number of Units Repaired",
     ylab = "Length of Service Calls (minutes)",
     pch = 19, col = "blue")

# Is a linear model appropriate?
# The scatter plot should reveal whether the points roughly form a straight line.

Based on the scatter plot, we can visually inspect whether the relationship between X and
Y appears linear. If the points fall roughly along a straight line, a linear model is likely appropriate.
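As a quick numeric complement to the visual check (a sketch, not part of the original lab), the sample correlation coefficient quantifies the strength of the linear association:

# Sample correlation between X and Y; a value near 1 or -1
# indicates a strong linear association
cor(X, Y)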

(b) Line of Best Fit

We will fit a linear regression model and plot the regression line on the scatter plot.
R Code:

# Fit the linear model
model <- lm(Y ~ X)

# Scatter plot with the fitted regression line
plot(X, Y, main = "Scatter Plot with Line of Best Fit",
     xlab = "Number of Units Repaired",
     ylab = "Length of Service Calls (minutes)",
     pch = 19, col = "blue")
abline(model, col = "red")

# Output the model summary to get the regression coefficients
summary(model)

The summary of the model will provide us with the slope (β₁) and intercept (β₀).

(c) Interpret the Intercept and Slope

• Intercept (β₀): The predicted length of a service call when no units are repaired (X = 0). This is the base time for a service call with no repairs.
• Slope (β₁): The increase in the length of the service call for every additional unit repaired.

You can interpret these values directly from the output of summary(model).
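For instance, the estimated coefficients can be extracted directly (a minimal sketch using base R):

# Named vector of estimated coefficients: (Intercept) and X
coef(model)
# Slope only
coef(model)["X"]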

(d) Practical Sense of the Intercept

Since the intercept represents the length of a service call when 0 units are repaired, its
practical sense depends on whether service calls with no repairs occur. If a service call
where no repairs are made doesn't make sense in practice, the intercept has no direct
practical interpretation; it simply anchors the regression line.

(e) Goodness-of-Fit (R²)

We can check the R² value from the model summary to assess how well the model
fits the data.

R Code:

# R-squared value from the model
summary(model)$r.squared

A higher R² value (closer to 1) indicates a better fit of the model.
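To make the definition concrete (a sketch beyond the original steps), R² can also be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares:

# R-squared from first principles: R^2 = 1 - SSE/SST
sse <- sum(residuals(model)^2)   # residual sum of squares
sst <- sum((Y - mean(Y))^2)      # total sum of squares
1 - sse / sst                    # matches summary(model)$r.squared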

(f) ANOVA Table

The ANOVA table decomposes the total variation in Y into regression and residual components. We can use the anova() function in R to generate the ANOVA table.

R Code:

# ANOVA table
anova(model)

This will provide the sum of squares, mean squares, F-statistic, and p-value, allowing us to
comment on the overall fit of the model.

(g) Significance of the Variable

The p-value associated with the slope (β₁) in the summary output tells us if the
Number of Units Repaired is a statistically significant predictor of the Length of Service
Calls.
R Code:

# Check the p-value for the slope (X variable)
summary(model)$coefficients

If the p-value is less than 0.05, the variable is statistically significant.
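Equivalently (a sketch beyond the original steps), a 95% confidence interval for the slope that excludes 0 leads to the same conclusion:

# 95% confidence intervals for the intercept and slope;
# if the slope interval excludes 0, X is significant at the 5% level
confint(model, level = 0.95)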

(h) 95% Confidence and Prediction Intervals

To find the 95% confidence and prediction intervals for the length of service calls when 12
units are repaired:

R Code:

# New data for prediction (when X = 12)
new_data <- data.frame(X = 12)

# 95% Confidence Interval
conf_interval <- predict(model, new_data, interval = "confidence", level = 0.95)

# 95% Prediction Interval
pred_interval <- predict(model, new_data, interval = "prediction", level = 0.95)

# Output both intervals
conf_interval
pred_interval
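One caution worth adding (not part of the original solution): the observed X values only reach 10, so predicting at X = 12 extrapolates beyond the data, and both intervals should be interpreted with care:

# X = 12 lies outside the observed range of X
range(X)   # 1 10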

Conclusion:

By running these commands, you'll have:

1. A scatter plot with the line of best fit.
2. Regression coefficients to interpret the model.
3. Goodness-of-fit (R²) and the ANOVA table.
4. A significance test for the variable (Number of Units Repaired).
5. 95% confidence and prediction intervals for the length of service calls when 12 units are repaired.

This R-based approach completes the analysis of the given data and provides insights into
the relationship between the number of units repaired and the length of service calls.


Problem 02

To solve this problem using R, we need to:

1. Fit a multiple linear regression model.
2. Check if lot size (X₂) and age of the house (X₃) individually have significant impacts.
3. Calculate R² and adjusted R².
4. Construct the ANOVA table.
5. Find the 90% confidence and prediction intervals.

Here is how we can approach each step in R:

Given Data:

• Y (Sale Price): 25.9, 29.5, 27.9, 25.9, 29.9, 29.9, 30.9, 28.9, 45.8, 36.9, 38.9, 37.9,
44.5, 37.9, 37.5, 43.9
• X₂ (Lot Size in thousands of square feet): 3.472, 3.531, 2.275, 4.050, 4.455, 4.455,
5.850, 9.520, 7.326, 8.000, 9.150, 6.727, 9.890, 5.000, 5.520, 7.800
• X₃ (Age of House in years): 42, 62, 40, 54, 42, 56, 51, 32, 31, 3, 48, 44, 50, 22, 40, 23

(a) Fit a Multiple Linear Regression Model

We will fit a multiple linear regression model where Y (Sale Price) is predicted by X₂ (Lot
Size) and X₃ (Age of House).

R Code:

# Data
Y <- c(25.9, 29.5, 27.9, 25.9, 29.9, 29.9, 30.9, 28.9, 45.8, 36.9,
       38.9, 37.9, 44.5, 37.9, 37.5, 43.9)
X2 <- c(3.472, 3.531, 2.275, 4.050, 4.455, 4.455, 5.850, 9.520, 7.326,
        8.000, 9.150, 6.727, 9.890, 5.000, 5.520, 7.800)
X3 <- c(42, 62, 40, 54, 42, 56, 51, 32, 31, 3, 48, 44, 50, 22, 40, 23)

# Fit the multiple linear regression model
model <- lm(Y ~ X2 + X3)

# Summary of the model
summary(model)

• Intercept: The expected sale price of a house when lot size and age of the house
are both 0.
• X₂ Coefficient: How much the sale price increases for each additional 1000 square
feet in lot size, holding age constant.
• X₃ Coefficient: How much the sale price decreases (or increases) for each
additional year in the age of the house, holding lot size constant.
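To address step 2 from the list above, whether X₂ and X₃ individually have significant impacts, inspect the t-tests in the coefficient table (a minimal sketch):

# Each row gives the estimate, standard error, t-value, and p-value;
# a p-value below 0.05 marks an individually significant predictor
summary(model)$coefficients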

(b) Calculate R² and Adjusted R²

We can get the values of R² and adjusted R² from the model summary. These
values tell us how well the model explains the variability in the sale price.

R Code:

# R-squared and Adjusted R-squared
summary(model)$r.squared
summary(model)$adj.r.squared

• R²: Proportion of variance in sale price explained by the model.
• Adjusted R²: R² adjusted for the number of predictors in the model.
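For reference (a sketch under the standard definition, not part of the original solution), adjusted R² can be reproduced by hand from R², the sample size n, and the number of predictors k:

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
r2 <- summary(model)$r.squared
n <- length(Y)   # 16 observations
k <- 2           # two predictors: X2 and X3
1 - (1 - r2) * (n - 1) / (n - k - 1)   # matches summary(model)$adj.r.squared
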
(c) ANOVA Table

The ANOVA table provides insights into the overall significance of the model. We can use
the anova() function to generate the ANOVA table.

R Code:

# ANOVA table
anova(model)

The ANOVA table will show the sum of squares, degrees of freedom, mean squares, F-statistics, and p-values for each term. This helps in determining if the overall model is statistically significant.
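Note that anova() reports sequential (Type I) sums of squares for each term; the single overall F-test for the model appears in summary(model) and can be extracted as follows (a sketch):

# Overall model F-test: summary() stores the F statistic and its
# degrees of freedom, and pf() gives the corresponding p-value
fstat <- summary(model)$fstatistic
pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)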

(d) 90% Confidence and Prediction Intervals

We need to find the 90% confidence and prediction intervals for the sale price of a house
with a lot size of 10,000 square feet and age of 50 years.

R Code:

# New data for prediction (Lot size = 10, Age = 50)
new_data <- data.frame(X2 = 10, X3 = 50)

# 90% Confidence Interval
conf_interval <- predict(model, new_data, interval = "confidence", level = 0.90)

# 90% Prediction Interval
pred_interval <- predict(model, new_data, interval = "prediction", level = 0.90)

# Output both intervals
conf_interval
pred_interval

• Confidence Interval: Gives a range in which the mean sale price is likely to fall.
• Prediction Interval: Gives a range in which the actual sale price of an individual
house is likely to fall.
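As a quick check on the relationship between the two intervals (an illustrative sketch), the prediction interval is always wider than the confidence interval, because it adds the variability of an individual observation around the mean:

# Compare interval widths: the prediction interval is wider
conf_interval[, "upr"] - conf_interval[, "lwr"]
pred_interval[, "upr"] - pred_interval[, "lwr"]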

Conclusion:

By running the above code, you'll be able to:

1. Fit a multiple linear regression model.
2. Interpret the impact of lot size and age on the sale price.
3. Compute and interpret R² and adjusted R².
4. Obtain the ANOVA table and comment on the overall model significance.
5. Calculate the 90% confidence and prediction intervals for a house with a given lot size and age.

Problem 03

To solve this problem using R, we will need to address each part step by step, utilizing
various diagnostics tools in R for residual analysis. Here's how you can approach the tasks:

Steps in the R Solution:

1. Fit a linear model based on the given water flow data.
2. Generate normal probability plots for the ordinary, studentized, and deleted studentized residuals to comment on normality.
3. Identify outliers using studentized and deleted studentized residuals.
4. Identify high leverage points using the twice-the-mean and thrice-the-mean rules.
5. Use Cook's distance and DFFITS to find influential observations.
6. Evaluate the improvement after omission of the outliers.

Let's assume we are fitting a simple linear regression model with the Libby measurements (libby_x) as the independent variable. Since the problem does not supply a dependent variable, placeholder values for y are used below so the diagnostics can be demonstrated.
(a) Normal Probability Plot of Residuals

R Code for Ordinary, Studentized, and Deleted Studentized Residuals:

# Data (y is a placeholder dependent variable, as noted above)
libby_x <- c(27.1, 19.7, 20.9, 18.0, 33.4, 26.1, 77.6, 15.7, 44.9,
             37.0, 26.1, 21.6, 19.9, 17.6, 15.7, 35.1, 27.6, 32.6, 24.9, 26.0,
             23.4, 27.6, 23.1, 38.7, 31.3, 27.8)
y <- c(23.8, 22.3, 25.6, 20.1, 35.7, 24.4, 81.2, 17.9, 49.0, 40.6,
       28.0, 21.2, 20.5, 17.0, 34.0, 29.7, 36.5, 26.8, 24.0, 22.9, 27.0,
       26.7, 39.1, 30.4, 31.7, 28.1)

# Fit the linear model
model <- lm(y ~ libby_x)

# Ordinary residuals
ordinary_res <- residuals(model)

# Studentized (internally studentized) residuals
student_res <- rstandard(model)

# Deleted studentized (externally studentized) residuals;
# rstudent() in base R computes these, so no extra package is needed
deleted_student_res <- rstudent(model)

# Normal probability plots
par(mfrow = c(1, 3))
qqnorm(ordinary_res, main = "Ordinary Residuals")
qqline(ordinary_res)

qqnorm(student_res, main = "Studentized Residuals")
qqline(student_res)

qqnorm(deleted_student_res, main = "Deleted Studentized Residuals")
qqline(deleted_student_res)
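
As a complementary numeric check on normality (a sketch, not part of the original steps), the Shapiro-Wilk test can be applied to the residuals:

# Shapiro-Wilk test: a small p-value (e.g., below 0.05) suggests
# a departure from normality
shapiro.test(ordinary_res)
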
(b) Identify Outliers Using Studentized and Deleted Studentized Residuals

We can identify potential outliers using studentized residuals. A common rule is that
residuals with absolute values greater than 3 are considered outliers.

R Code:

# Identify outliers using studentized residuals
outliers <- which(abs(student_res) > 3)
outliers

# Identify outliers using deleted studentized residuals
outliers_deleted <- which(abs(deleted_student_res) > 3)
outliers_deleted

(c) Identify High Leverage Points

Leverage measures how far the independent variable values of a data point are from the
mean of the independent variables. Since the average leverage equals p/n (where p is the number of model parameters and n is the sample size), the twice-the-mean and thrice-the-mean rules flag points whose hat values exceed 2p/n and 3p/n, respectively.

R Code:

# Leverage (hat) values
hat_values <- hatvalues(model)

# Twice-the-mean rule
high_leverage_2mean <- which(hat_values > 2 * mean(hat_values))
high_leverage_2mean

# Thrice-the-mean rule
high_leverage_3mean <- which(hat_values > 3 * mean(hat_values))
high_leverage_3mean

(d) Identify Influential Observations Using Cook's Distance and DFFITS

Cook's distance and DFFITS can be used to identify influential points. Observations with
Cook's distance greater than 1, or with |DFFITS| greater than 2√(p/n) (where
p is the number of parameters and n is the sample size), are considered influential.

R Code:

# Cook's distance
cooks_distance <- cooks.distance(model)
influential_cooks <- which(cooks_distance > 1)

# DFFITS
dffits_values <- dffits(model)
p <- length(coef(model)) # Number of parameters
n <- length(y) # Number of observations
dffits_threshold <- 2 * sqrt(p / n)
influential_dffits <- which(abs(dffits_values) > dffits_threshold)

# Display results
influential_cooks
influential_dffits

(e) Evaluate Improvement After Omitting Outliers

If outliers are found, we can rerun the regression after removing them and check if the
residuals improve.

R Code:

# Remove the outliers identified earlier
# (assumes at least one outlier was found; if `outliers` is empty, skip this step)
y_no_outliers <- y[-outliers]
libby_x_no_outliers <- libby_x[-outliers]

# Refit the model without outliers
model_no_outliers <- lm(y_no_outliers ~ libby_x_no_outliers)

# Residual analysis after removing outliers
ordinary_res_no_outliers <- residuals(model_no_outliers)
student_res_no_outliers <- rstandard(model_no_outliers)
deleted_student_res_no_outliers <- rstudent(model_no_outliers)

# Normal probability plots after removing outliers
par(mfrow = c(1, 3))
qqnorm(ordinary_res_no_outliers, main = "Ordinary Residuals (No Outliers)")
qqline(ordinary_res_no_outliers)

qqnorm(student_res_no_outliers, main = "Studentized Residuals (No Outliers)")
qqline(student_res_no_outliers)

qqnorm(deleted_student_res_no_outliers, main = "Deleted Studentized Residuals (No Outliers)")
qqline(deleted_student_res_no_outliers)
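To quantify any improvement beyond the visual comparison (an illustrative sketch), the fit of the two models can be compared directly:

# Compare model fit before and after removing the outliers
summary(model)$r.squared
summary(model_no_outliers)$r.squared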

Interpretation:

• (a) Normality: You will examine the normal probability plots to assess whether the
residuals appear normally distributed.
• (b) Outliers: By identifying observations with studentized and deleted studentized
residuals greater than 3 in absolute value, we detect outliers.
• (c) Leverage: Leverage points are identified if their leverage values are twice or
thrice the average leverage.
• (d) Influence: Influential observations are those with high Cook's distance or
DFFITS values.
• (e) Improvement: After removing outliers, we check if the normality of residuals
improves.

This approach will help you analyze the water flow data using various residual diagnostics
in R.