Simple Linear Regression Guide
Regression Analysis
• Regression Analysis
  • A statistical procedure used to develop an equation showing how two or more variables are related.
  • Allows us to build a model/equation to help estimate and predict.
  • The entire process will take us from:
    • Taking an initial look at data to see if there is a relationship.
    • Creating an equation to help us estimate/predict.
    • Assessing whether the equation fits the sample data.
    • Using Statistical Inference to see if there is a significant relationship.
    • Predicting with the equation.
  • Regression Analysis does not prove a cause and effect relationship, but rather it helps us to create a model (equation) that can help us to estimate or make predictions.
• Simple Regression
  • Regression analysis involving one independent variable (x) and one dependent variable (y).
• Linear Regression
  • Regression analysis in which relationships between the independent variables and the dependent variable are approximated by a straight line.
• Simple Linear Regression
  • Relationship between one independent variable and one dependent variable that is approximated by a straight line, with slope and intercept.
• Multiple Linear Regression (not covered in this class)
  • Regression analysis involving two or more independent variables to create a straight-line model/equation.
• Curvilinear Relationships (not covered in this class)
  • Relationships that are not linear.
Scatter Chart to “See” If There Is a Relationship
• Graphical method to investigate if there
is a relationship between 2 quantitative
variables
• Excel Charting:
• Independent Variable = x
• Horizontal Axis
• Left most column in data set
• Dependent Variable = y = f(x)
• Vertical Axis
• Column to right of x data column
• Always label the x and y axes.
• Use an informative chart title.
• Goal of chart: Visually, we are “looking” to see if there is a relationship pattern.
• For our Ad Expense (x) / Sales (y) data, we “see” a direct relationship.
• To get the estimated line, equation, and r^2, right-click the markers in the chart and click “Add Trendline”. Then click the dialog button for “Linear” and the checkboxes for “Display equation on chart” & “Display R^2 on chart”. We learn about the equation & r^2 later…
Types of Relationships
Investigate if there is a relationship: With the Scatter Chart, you look to
see if there is a relationship.
• Looks like “As x increases, y increases”: Direct or Positive Relationship.
• Looks like “As x increases, y decreases”: Inverse, Indirect, or Negative Relationship.
• Looks like No Relationship.
Baseball Data Scatter Charts
Covariance and Correlation: Numerical Measures to
Investigate if There is a Relationship
• These numerical measures will be more precise than the “Positive”, “Negative”, and “No Relationship” (also “Little Relationship”) categories that the Scatter Chart gave us.
• Numerical measures to investigate if there is a relationship
between two quantitative variables.
Scatter Chart and Ybar and Xbar Lines
• Scatter Charts are graphical means to find a relationship between 2
quantitative variables.
• We need a numerical measure that is more precise than our Scatter Chart.
• To understand how the numerical measure can do this, we plot a Ybar line
and Xbar line on our chart.
Covariance
• Measure of the linear relationship
between two quantitative variables.
1. Positive values indicate a positive
relationship; negative, a negative
relationship.
2. Close to zero means there is not
much of a relationship.
3. The magnitude of covariance is
difficult to interpret.
4. Covariance has problems with units
(like feet compared to inches).
5. We can standardize covariance by
dividing it by sx*sy to get Coefficient
of Correlation.
• In Excel use COVARIANCE.S function
for sample data:
• Y data first, x data second
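To see what COVARIANCE.S is doing, here is a minimal Python sketch of the sample covariance formula. The data values are made up for illustration, not the workbook's actual figures.

```python
import numpy as np

# Made-up ad-expense (x) and sales (y) sample data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

xbar, ybar = x.mean(), y.mean()
n = len(x)

# Sample covariance, the same quantity Excel's COVARIANCE.S returns:
# sum of (xi - xbar)(yi - ybar), divided by n - 1.
cov_xy = np.sum((x - xbar) * (y - ybar)) / (n - 1)

print(cov_xy)                       # manual formula
print(np.cov(x, y, ddof=1)[0, 1])   # numpy cross-check
```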
Coefficient of Correlation (rxy)
• Measures the strength and direction of the linear relationship between two quantitative variables.
• A relative measure of strength of association (relationship) between 2 variables, or a measure of strength per unit of standard deviation, sx * sy.
• Solves Covariance’s “units”/magnitude problem.
• In Excel use the CORREL or PEARSON functions.
• Investigate if there is a relationship: We will have a number answer that indicates the strength and direction:
  1. Always a number between -1 and 1.
  2. 0 = No correlation.
  3. Near 0.5 or -0.5 = moderate correlation.
  4. Near -1 or 1 = strong correlation.
  5. Does not have problems with units like Covariance does.
  6. Can only be used for one independent variable to measure a linear relationship.
     • As opposed to the Coefficient of Determination (“r squared” or “Goodness of Fit Test”), which can be used for 1 or more independent variables and for linear or non-linear relationships.
• Note: The Correlation Coefficient measures the strength and direction of a LINEAR relationship, not nonlinear relationships. If you get a correlation measure near zero, it may be true that there is a very weak linear relationship, but that does not say that there is not some other sort of non-linear relationship.
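To connect the two measures, a minimal Python sketch on the same style of made-up data: dividing the sample covariance by sx * sy standardizes it into rxy.

```python
import numpy as np

# Standardize covariance by sx * sy to get the Coefficient of Correlation
# (Excel: CORREL or PEARSON). Made-up sample data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

cov_xy = np.cov(x, y)[0, 1]             # sample covariance
sx, sy = x.std(ddof=1), y.std(ddof=1)   # sample standard deviations

r = cov_xy / (sx * sy)                  # unit-free, always between -1 and 1
print(r)
print(np.corrcoef(x, y)[0, 1])          # numpy cross-check
```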
Ad Expenditures / Sales Example:
Covariance and Correlation in Excel
Overview: Simple Linear Regression
• Algebra:
• f(x) = y = m*x + b
• Statistics:
• Yhat = ŷ = b1*x + b0 (sample statistics)
• y = β1 x + β0 (population parameters)
y = β1x + β0 + ε
• y = Predicted value
• β1 = Slope β1 = “Beta sub 1”
• x = Value you put into the equation to try and predict the y value.
• β0 = Y-Intercept β0 = “Beta sub 0”
• ε = Error Value = random variable that accounts for the variability in y that cannot be explained by the linear relationship between x and y. ε = “Epsilon”
Because Not All Sample Points Are On The Estimated Line
We Will Get Some Error (e)
Assumptions About The Error Value (e) Necessary for the “Least
Squares Method” of calculating b1 and b0.
1. The assumption of bell shape for errors indicates that, right on the line, the mean of the error value at any particular x is zero: E(e) = 0.
  • This means that we can use the slope and intercept (β1 & β0) as constants.
2. The total population can be thought of as having sub-populations.
  • For each x value there is a range of possible y values (sub-population).
  • The Bell Shaped distribution is an assumption about the possibility of getting a y value above or below the line for a given x value.
3. The error (e) variation will be constant.
• E(y|x) = β1x + β0
  • Describes the line down the middle, where ε = 0.
  • Is the mean of all the y values and sits exactly on the line.
Simple Linear Regression Equation with Population Parameters
E(y|x) = β1x + β0
• E(y|x) = Expected Value or Mean of all the y values at a particular x value.
• E(y|x) = β1x + β0 describes a straight line down the middle, where ε = 0.
Sample Slope and Y-Intercept
• Because population parameters for slope & intercept are not usually known, we
estimate them using sample data in order to calculate sample statistics for slope
and y-intercept.
Estimated Simple Linear Regression Equation
with Sample Statistics
1. ŷ = Point estimator of E(y|x) = estimates the mean of all y values for a given x in the population, or
2. ŷ = Can predict an individual y value for a particular business situation.
• Graph of the estimated simple linear regression equation is called “estimated regression line”.
Estimation Process for Simple Linear Regression
Overview:
Least Squares Method to Derive Formula for b1 & b0
**For proof of formulas, see
downloadable pdf file.
Formulas for the estimated Slope (b1) and Y-intercept (b0):
• b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
• b0 = ȳ − b1·x̄
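As a cross-check outside Excel, here is a minimal Python sketch of these two formulas on made-up sample data; Excel's SLOPE and INTERCEPT functions return the same values.

```python
import numpy as np

# Least Squares estimates from the formulas above (made-up sample data):
# b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2),  b0 = ybar - b1 * xbar
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)  # slope
b0 = ybar - b1 * xbar                                           # y-intercept

print(b1, b0)
# Cross-check with numpy's least-squares polynomial fit:
print(np.polyfit(x, y, 1))  # returns [slope, intercept]
```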
Ad Expenditures / Sales Example:
Calculate Slope and Y-intercept
Experimental Region
• We are not sure if the relationship is
linear outside our x sample data range.
• It is best to make predictions over the
range of the min and max of the
sample x data.
• This range is called the
“Experimental Region”
• When you make predictions outside
the Experimental Region, it is called
“Extrapolation”
• The y-intercept is often estimated
using Extrapolation
• We do not have empirical evidence
that the relationship holds outside the
experimental region; an estimate
created outside the experimental
region is an unreliable estimate.
Bike Weight/ Bike Price Example:
Calculate Slope and Y-intercept from Sample Data, Make a Prediction
• ŷ = $391,243.63
• Vertical lines on the chart represent residuals, or “Errors in Equation”.
• Sometimes the equation overpredicts: the Predicted Value is above the actual Y Value.
Residuals = (yi − ŷi)
• Predicted Values = ŷi = b1·xi + b0
  • Calculate predicted values using the Estimated Equation at each xi.
  • The FORECAST function in Excel can be used for predicted values. All it needs is an x value and the known y and x values, and it will calculate the predicted value using the Estimated Regression Equation’s Slope and Intercept.
• Residuals = (Yi − ŷi)
  • Particular value − Predicted value.
  • Distance that the “Original Y Sample Value” is above or below the Estimated Line.
  • Vertical lines on the chart represent residuals.
• Note 1: Sum of Yi values = Sum of predicted values (all ŷi).
• Note 2: Sum of residuals = 0.
• Note 3: Residuals Squared are minimized because we calculated predicted values with b1 & b0.
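A minimal Python sketch that verifies the three notes above on made-up data:

```python
import numpy as np

# Residuals check using the estimated equation (made-up data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, 1)

y_hat = b1 * x + b0          # predicted values (Excel: FORECAST per x)
residuals = y - y_hat        # particular value minus predicted value

# Note 1: sums of actual and predicted y values match.
print(y.sum(), y_hat.sum())
# Note 2: residuals sum to (essentially) zero.
print(residuals.sum())       # ~0 up to floating-point rounding
# Note 3: SSE (Excel: SUMSQ on the residuals) is the minimized quantity.
print(np.sum(residuals ** 2))
```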
Ybar Line vs. Estimated Simple Linear Equation Line
for Making Predictions
• Ybar Model: these errors are called “Deviations”.
• Yhat Model: these errors are called “Residuals”.
If All Sample Points Fell on Estimated Line, There Would
Be No Errors
Error in Predicting Y Using Equation
Total Error if we used just Ybar
Two Parts in Total Error
Total Error = Explained + Unexplained
Total Error = Regression + Error
Comparing “Regression” or “Error” to “Total” to measure “Goodness of Fit”…
• If we would like to compare the total “Regression” or “Error” to the “Total Error”, the
problem is:
• If we want to add up all “Total Error” or all “Unexplained” or all “Regression”:
• We would get zero!!!!
Squaring & then Summing “Total”, “Regression” & “Error”
SST = Σ(Yi − Ȳ)²   SSR = Σ(Ŷi − Ȳ)²   SSE = Σ(Yi − Ŷi)²   (sums over i = 1 to n)

• SST = Sum of Squares Total.
  • Measure of the error involved in using Ybar to make a prediction.
  • How well the sample points cluster around the Ybar line.
• SSR = Sum of Squares due to Regression.
  • Measure of how far the Predicted Value is away from Ybar.
  • Amount of SST that is explained by the Estimated Regression Equation/Line.
  • Explained part of SST.
  • If all Sample Points fall on the Estimated Line, SSR = SST.
• SSE = Sum of Squares due to Error.
  • Measure of how far away the Particular Value is from the Predicted Value.
  • Amount of SST that is unexplained by the Estimated Regression Equation/Line.
  • Unexplained part of SST.
  • How well the sample points cluster around the Ŷ line.
  • If all Sample Points fall on the Estimated Line, SSE = 0.
  • If you have the residuals already calculated, you can use the Excel function SUMSQ.
How to Think About SST and SSE
SST vs. SSE:
• A measure of how much better using Yhat is for making predictions than using Ybar.
Relationship Between SST, SSR and SSE
SST = Σ(Yi − Ȳ)² = SSR + SSE = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)²
• Coefficient of Determination: r^2 = SSR/SST = the proportion of SST explained by the regression.
• In Excel the RSQ function can be used to calculate r^2 using just the sample x & y data.
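A minimal Python sketch on made-up data, verifying SST = SSR + SSE and that r^2 = SSR/SST matches what RSQ would return:

```python
import numpy as np

# SST = SSR + SSE and r^2 = SSR / SST, on made-up sample data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, 1)
y_hat = b1 * x + b0

sst = np.sum((y - y.mean()) ** 2)       # total error around Ybar
ssr = np.sum((y_hat - y.mean()) ** 2)   # explained part
sse = np.sum((y - y_hat) ** 2)          # unexplained part

print(sst, ssr + sse)                   # SST = SSR + SSE
r2 = ssr / sst                          # Coefficient of Determination (Excel: RSQ)
print(r2, np.corrcoef(x, y)[0, 1] ** 2) # equals rxy^2 in simple regression
```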
Ad Expenditures / Sales Example: Calculate Coefficient Of Determination
The Closer to 1, the Better the Fit.
Coefficient of Determination
• From page 137 in our Essentials of Business Analytics textbook (ISBN10: 1-285-18726-1):
• For typical data in the social and behavioral sciences, values of r^2 as low as 0.25
are often considered useful.
• For data in the physical and life sciences, r^2 values of 0.60 or greater are often found, and in some cases, r^2 values greater than 0.90 can be found.
• In business applications, r^2 values vary greatly, depending on the unique characteristics of each application.
Compare Coefficient Of Determination &
Coefficient Of Correlation
C. of Correlation = rxy
• rxy = (Sign of b1)*SQRT(r^2).
• Number between -1 and 1.
• Measures strength and direction of the linear relationship between one independent variable and one dependent variable.
• Only for linear relationships.
• Only for one independent variable.

C. of Determination = r^2
• r^2 = (rxy)^2.
• Number between 0 and 1.
• Measures strength and goodness of fit of the relationship.
• Can be used on linear or non-linear relationships.
• Can be used for one or more independent variables.
• Referred to as R^2 in Multiple Regression.
Estimates for Variance & Standard Deviation of the Estimated
Regression Equation
SSE = Σ(Yi − Ŷi)² = Total Residual Error

MSE = SSE / (n − 2) = Σ(yi − ŷi)² / (n − 2) = Estimate of Variance for the Regression Equation

s = √( Σ(yi − ŷi)² / (n − 2) ) = Estimate of Standard Deviation for the Regression Equation, called the “Standard Error of Estimate” or “Standard Error of y”. (Measures the spread of the Residuals.)
• You can use the Excel function STEYX to calculate the Standard Error of Estimate directly; all it needs are the x and y sample point values.
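A minimal Python sketch of MSE and the Standard Error of Estimate on made-up data; STEYX would return the same s.

```python
import numpy as np

# Standard Error of the Estimate: s = sqrt(SSE / (n - 2)). Made-up data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b1 * x + b0)

n = len(x)
sse = np.sum(residuals ** 2)
mse = sse / (n - 2)   # estimate of variance; n - 2 because b1 and b0 were estimated
s = np.sqrt(mse)      # standard error of estimate (spread of the residuals)
print(mse, s)
```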
Standard Error of the Estimate
Bike Weight/ Bike Price Example:
Calculate Coefficient Of Determination & Standard Error
“How Fairly A Statistic Represents Its Data Points”.
• Measuring how fairly the “Mean” represents its data points, with “Standard Deviation”:
√( Σ(yi − ȳ)² / (n − 1) )
• Measuring how fairly the “Estimated Regression Equation” represents its data points, with the “Standard Deviation of the Estimated Line” or “Standard Error of the Estimate”:
√( Σ(Yi − Ŷi)² / (n − 2) )
Data Analysis, Regression feature Step 1: Dialog Box
Degrees of Freedom
• Degrees of freedom represents the number of independent units of information in a calculation.
• In general, Degrees of Freedom = df = n - # of estimated parameters.
• Both of these would become:
• s(mean) = √( Σ(yi − ȳ)² / df )
• s(regression line) = √( Σ(Yi − Ŷi)² / df )
Data Analysis, Regression feature Step 2: Output
What Regression Output Means
LINEST Array Function to deliver 10 statistics
for Linear Regression
• Highlight one more column than there are independent variables, and 5 rows.
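Outside Excel, scipy's linregress is a rough analogue of part of LINEST's output; a minimal sketch on made-up data:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up sample data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

result = stats.linregress(x, y)
print(result.slope, result.intercept)   # b1 and b0
print(result.rvalue ** 2)               # r^2
print(result.pvalue)                    # p-value for H0: slope = 0
print(result.stderr)                    # standard error of the slope
```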
• However:
• Before we can test the reasonableness of the slope and y-intercept, we must check
to see if the assumptions necessary to use the Least Squares Regression Model are
valid.
Conditions Necessary for Valid Inference with the Least Squares Model
• For any given combination of
values of the independent
variables x1, x2,…, xq, the
population of potential error
terms e must be:
1. Normally Distributed (Bell
Shaped)
2. Has a mean of zero
3. Has a constant variance
4. The values of e are statistically
independent
• In general, this is a concern only when we collect data from a single entity over time, like in a time series.
We can visually test assumptions by examining Residual Plots
Implication of Assumptions
• e Normally Distributed (Bell Shaped)
• Implication: Because y is a linear function of e, y is a normally distributed random variable for all values of x.
• e has a mean of zero, E(e) = 0
• Implication: The slopes and intercepts are
constants and we can use the line to
estimate or predict
• e has a constant variance
• Implication: the variance is the same for all x
• The values of e are statistically
independent
• Linear model can be used without an
adjustment for seasonality or other cyclical 65
patterns
Visually Testing Assumptions: Plot Residuals Against X Values
• Markers above Zero Line indicate a
Positive Residual, which is a Sample
Point that is above the Predicted Value.
• The equation underestimated as compared to the actual Sample Point.
If the assumptions are met, we should see consistent markers above and below the Zero Line, with a higher frequency near the Zero Line. This plot does not supply evidence that would support rejecting the assumptions.
You can manually plot Residuals Against x or use Data
Analysis, Regression feature
Visually Testing Assumptions: Plot Residuals Against X Values
1. e term (and thus dependent variable y) is Normally
Distributed.
• If this assumption is met we should see:
• The frequency of values near Zero Line should be
greater than frequency of values away from Zero Line.
2. E(e) = 0, has a mean of zero.
• If this assumption is met we should see:
• For a given x value the center of the spread of the
errors should be near the Zero Line.
• About same number of values above and below the
Zero Line.
3. Has a constant variance.
• If this assumption is met we should see:
• Errors are symmetrically distributed above and below, and along, the Zero Line across the x values.
• Spread of errors looks similar for each x.
This plot does NOT provide evidence of a violation of the conditions necessary for valid inference in regression.
The implication of 1) bell shaped errors, 2) mean = 0 & 3) constant variance is that the point estimates are unbiased (do not tend to underpredict or overpredict), for any given combination of values of the independent variables x1, x2,…, xq.
If assumptions are met, point estimates tend to not
underpredict or overpredict (unbiased estimate)
Inference in Regression is Generally Valid Unless You See Marked Violations Such as These:
These plots provide strong evidence of a violation of the conditions necessary for valid inference in regression.
If Residual Plots Show Assumptions Are Met:
• We can run Hypothesis test to check reasonableness of regression parameters β0,
β1, β2, . . . , βq .
• We can create Confidence Intervals for our predicted values of our dependent
variable.
• Hypothesis Testing (Busn 210)
• A statistical procedure that uses sample evidence & probability theory to determine whether a statement about the value of a parameter is reasonable (reliable) or not.
• Confidence Intervals (Busn 210)
• From sample data we calculate a lower & upper limit to make probability
statement about how sure we are that the population parameter will lie
between the lower and upper limit.
• 95% Confidence Intervals mean that if we constructed 100 similar intervals, about 95 would contain the population parameter and 5 would not.
The Logical Element to Test in Linear Regression: Slope
• If the slope is zero, there is probably NOT a relationship.
• If the slope is NOT zero, there is probably a relationship.
Hypothesis Test to Check if Slope/s Are Equal to Zero
• If slope/s are ALL equal to zero, then:
• E(y|x1,x2,…xq) = β0 + β1x1 + β2x2 + · · · + βqxq
becomes:
• E(y|x1,x2,…xq) = β0 + 0*x1 + 0*x2 + · · · + 0*xq
becomes:
• E(y|x1,x2,…xq) = β0
• Not a linear function of x1, x2,…, xq
• For our Hypothesis Test, our goal is to “Reject” the hypothesis “ALL Slope/s = 0”.
• If ALL Slope/s = 0, then the model would be no better than the Ybar line for making predictions.
Steps For Hypothesis Testing
1. State The Null and Alternative Hypotheses.
• Null Hypothesis = H0 = All Slope/s = 0
• Alternative Hypothesis = Ha = All Slope/s <> 0
• “At Least One Slope is NOT Equal to Zero”
2. Set Level of Significance = “alpha”.
• Alpha = risk of rejecting H0 when it is TRUE.
• Alpha determines the hurdle for whether or not the test statistic just represents sample error or
there is a true “statistically significant” difference (past the hurdle).
• Alpha is used to compare against the p-value: if p-value <= Alpha, we Reject H0 and accept Ha.
• Alpha is often 0.05 or 0.01.
• When testing the slope, when we get a statistically significant difference, it will mean:
• It is reasonable to assume that at least one of the slopes is not zero.
• It is reasonable to assume that there is a statistically significant relationship.
3. Rejection Rule:
• “If the p-value is less than our alpha, we reject H0 and accept Ha; otherwise, we fail to reject H0.”
Steps For Hypothesis Testing
4. From Sample Data calculate the Test Statistic and then calculate the p-value of the Test Statistic.
• Use F Test Statistic (and F Distribution) for Testing Overall Significance:
• The F Test Statistic is:
• F = (SSR/q) / (SSE/(n − q − 1)) = (SSR/df Regression) / (SSE/df Error) = MSR/MSE
• p-value:
• = F.DIST.RT(F Test Statistic, q, n − q − 1)
• Use the t Test Statistic (and t Distribution) for testing the individual Slope and Y-Intercept:
• t Test Statistic for Slope:
• t = b1 / s(b1), where s(b1) = s / √( Σ(xi − x̄)² )
• Data Analysis Regression Output provides F & t Test Statistics, & p-values.
• The key is: If p-value is less than Alpha, Reject H0 and Accept Ha
Steps For Hypothesis Testing
5. From the sample evidence, make reasonable statements about the population parameter.
• If we reject H0 and accept Ha we will say:
• “The sample evidence suggests that at least one slope is not equal to zero. It is reasonable to
assume that there is a significant relationship at the given level of significance.”
• “xi and y are related and a linear relationship explains a statistically significant portion of the
variability in y over the Experimental Region.”
• If we fail to reject H0 we will say:
• “The sample evidence suggests that all slope/s are equal to zero. It is reasonable to assume
that there is NOT a significant relationship at the given level of significance.”
F Distribution for Hypothesis Test
F Test Statistic for Hypothesis Test
• Use the F Distribution.
• The F Test Statistic is:
• F = (SSR/q) / (SSE/(n − q − 1)) = (SSR/df Regression) / (SSE/df Error) = MSR/MSE
• SSR = Sum of squares due to regression (explained variation).
• SSE = Sum of squares due to error (unexplained variation).
• q = the number of independent variables in the regression model.
• n = the number of observations in the sample.
• SSR/q = MSR = Mean Square Regression = test statistic that measures variability in the dependent variable y that is explained by the independent variables (x1, x2… xq).
• SSE/(n − q − 1) = MSE = Mean Square Error = measure of variability that is not explained by x1, x2… xq.
• df = degrees of freedom = term used in Excel ANOVA output.
• The larger the F value, the stronger the evidence that there is an overall regression relationship.
• p-value = Probability of getting the F Test Statistic or greater in the F Distribution.
• The smaller the p-value, the stronger the evidence that there is an overall regression relationship (the stronger the evidence against the Null that all slopes are zero).
• If the p-value is smaller than alpha, we reject H0.
• Data Analysis Regression Output provides the F statistic and its p-value.
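A minimal Python sketch of the overall F test on made-up simple-regression data (q = 1); scipy's f.sf plays the role of F.DIST.RT.

```python
import numpy as np
from scipy import stats

# F test for overall significance in simple regression (made-up data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, 1)
y_hat = b1 * x + b0

n, q = len(x), 1
ssr = np.sum((y_hat - y.mean()) ** 2)   # explained variation
sse = np.sum((y - y_hat) ** 2)          # unexplained variation

msr = ssr / q              # Mean Square Regression
mse = sse / (n - q - 1)    # Mean Square Error
F = msr / mse              # F test statistic

# Upper-tail p-value, the analogue of Excel's F.DIST.RT(F, q, n-q-1):
p_value = stats.f.sf(F, q, n - q - 1)
print(F, p_value)          # reject H0 (all slopes = 0) if p_value < alpha
```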
Formulas for testing individual estimates of parameters:
• Sum of Squares of Error (Residuals) = SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1·xi)²
• Confidence interval for b0 is b0 ± t(α/2)·s(b0), where t(α/2) is the t value providing an area of α/2 in the upper tail of a t distribution with n − 2 degrees of freedom.
Testing Individual Regression Parameters
• If F Test Statistic indicates that at least one of the slope/s are not zero, then we can test if there is a
statistically significant relationship between the dependent variable y and each of the independent
variables by testing each slope.
• We use the t Distribution (Bell Shaped Probability Distribution from Busn 210)
• We use the t Test Statistic to test whether the slope is zero:
• t = b1 / s(b1)
• b1 = slope
• s(b1) = estimated standard error of the slope = s / √( Σ(xi − x̄)² )
• t = # of standard deviations.
• If t is past our hurdle, we reject H0 and accept Ha.
• H0: Slope = 0
• Ha: Slope <> 0
• Alpha of 0.05 or 0.01 is often used. Alpha determines the hurdle, or is used to compare against the p-value.
• This is a two-tail test.
• If t is past hurdle in either direction, reject H0 and accept Ha. It seems reasonable that the slope is not zero.
• If the p-value is less than alpha, it seems reasonable that the slope is not zero. The smaller the p-value, the stronger the evidence that the slope is not zero, and the more evidence we have that a relationship exists between y and x.
• In Simple Linear Regression, t test and F test will yield same p-value.
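A minimal Python sketch of the t test for the slope on made-up data, using the formulas above:

```python
import numpy as np
from scipy import stats

# t test for H0: slope = 0 in simple regression (made-up data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, 1)

n = len(x)
residuals = y - (b1 * x + b0)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))     # standard error of estimate
s_b1 = s / np.sqrt(np.sum((x - x.mean()) ** 2))   # standard error of the slope

t = b1 / s_b1                                     # number of std errors from 0
p_value = 2 * stats.t.sf(abs(t), df=n - 2)        # two-tail p-value
print(t, p_value)
# In simple regression t^2 equals the F statistic, so the p-values match.
```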
t Distribution for Hypothesis Test
Hypothesis Test For Weekly Ad Expense and Sales Example:
• First we look at the residual plot to see if the assumptions of the Least Squares Method are met. It
appears that the assumptions are met. The plot does NOT provide evidence of a violation of the
conditions necessary for valid inference in regression.
• Because the p-value for the F Test Statistic is less than 0.01, we reject H0 and accept Ha. It is reasonable to assume that the slope is not zero and that there is a significant relationship between x and y. A linear relationship explains a statistically significant portion of the variability in y over the Experimental Region.
• Similarly, the p-value for the Y-Intercept is less than 0.01, and so we conclude it is not zero. However, the Y-Intercept value is not in our Experimental Region.
What the F Statistic Hypothesis Test Looks Like
Confidence Intervals to Test if Slope β1 & Y-Intercept β0 Are Equal to 0
• Excel Data Analysis, Regression tool calculates upper and lower limit for a Confidence Interval
• Interval does not contain 0: conclude Y-Intercept (β0) is not zero (when all x are set to zero).
• Interval does not contain 0: conclude Slope (β1) is not zero (there is a linear relationship)
• Found an overall regression relationship at both alpha = 0.05 & alpha = 0.01.
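A minimal Python sketch of the slope's confidence interval on made-up data; the Data Analysis Regression tool reports the same style of lower and upper limits.

```python
import numpy as np
from scipy import stats

# 95% confidence interval for the slope: b1 +/- t(alpha/2, n-2) * s(b1).
# If the interval does not contain 0, conclude the slope is not zero.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up sample data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, 1)

n = len(x)
residuals = y - (b1 * x + b0)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))
s_b1 = s / np.sqrt(np.sum((x - x.mean()) ** 2))

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print(b1 - t_crit * s_b1, b1 + t_crit * s_b1)   # lower and upper limits
```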
Nonsignificant Variables: Reassess Whole Model/Equation
Slope:
• If the Slope is not significant (Do not reject H0: Slope = 0):
  • If practical experience suggests that the nonsignificant x (independent variable) has a relationship with the y variable, consider leaving the x in the model/equation.
    • Business example: # of deliveries for a truck route had an insignificant slope, but was clearly related to total time.
  • If the model/equation adequately explains the y variable without the nonsignificant x independent variable, try rerunning the regression process without the nonsignificant x variable, but be aware that the calculations for the remaining variables may change.

Intercept:
• If the Y-intercept is not significant:
  • The decision to include or not include the calculated y-intercept may require special consideration, because setting “Constant is Zero” in the Data Analysis Regression tool will set the equation intercept equal to zero and may dramatically change the slope values.
    • Business example when you might want the equation to go through the origin (x=0, y=0): labor hours = x and Output = y.

• The key is that you may have to run the regression tool in Excel a number of times over various variables to try and get the best slopes and y-intercept for the equation.
Multicollinearity
• Multicollinearity
• Correlation among the independent variables when performing multiple regression.
• In Multiple Regression, when you have more than one x, each x should be related to the y value, but in general, no two x values should be related to each other.
  • For example, if we have y = time for truck deliveries in a day, x1 = number of miles, x2 = amount of gas: because number of miles is related to gas, the resulting multiple regression process may have problems.
• Use PEARSON or CORREL to analyze any 2 x variables.
  • Rule of thumb: if the absolute value is greater than 0.7, there is a potential problem.
• The problem with correlation among the independent variables is that it increases the variances & standard errors of the estimated parameters (β0, β1, β2, . . . , βq) and predicted values of y, and so inference based on these estimates is not as precise as it should be.
  • For example, if t tests or confidence intervals lead us to reject a variable as nonsignificant, it may be because there is too much variation and thus the interval is too wide (or the t stat is not past the hurdle).
  • We may incorrectly conclude that the variable is not significantly different from zero when the independent variable actually has a strong relationship with the dependent variable.
• If inference is a primary goal, we should avoid variables that are highly correlated.
  • If two variables are highly correlated, consider removing one.
• If predicting is the primary goal, multicollinearity is not necessarily a concern.
• Note: If any statistic (b0, b1, b2, . . . , bq) or p-value changes significantly when a new x variable is added or removed, we must suspect that multicollinearity is at play.
• Checking correlation between pairs of variables does not always uncover multicollinearity.
  • A variable might be correlated with multiple other variables combined. To check: treat x1 as the dependent variable and the rest of the x variables as independent, run the regression (ANOVA table), and see if R^2 is big, indicating a strong relationship. R^2 > 0.5 is a rule of thumb that there might be multicollinearity. (See the sketch below.)
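A minimal Python sketch of both checks on deliberately correlated, made-up x data: pairwise correlations, then the R^2 from regressing one x on the others.

```python
import numpy as np

# Two rough multicollinearity checks on made-up data (3 independent variables).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=50)   # deliberately correlated with x1
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

# Check 1: pairwise correlations (rule of thumb: |r| > 0.7 is a potential problem).
print(np.corrcoef(X, rowvar=False))

# Check 2: regress x1 on the other x's and look at R^2 (rule of thumb: R^2 > 0.5).
others = np.column_stack([np.ones(len(x1)), x2, x3])
coef, *_ = np.linalg.lstsq(others, x1, rcond=None)
fitted = others @ coef
r2 = 1 - np.sum((x1 - fitted) ** 2) / np.sum((x1 - x1.mean()) ** 2)
print(r2)   # a large R^2 suggests x1 is predictable from the other x's
```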
Inference and Very Large Samples
• When sample size is large:
1. Estimates of Variance and Standard Error (# Standard Deviations) are calculated
with sample size in the denominator. As sample size increases, Estimates of
Variance and Standard Error decrease.
2. Law of Large Numbers (Large Sample Size) says that as sample size gets bigger,
statistic approaches parameter. As statistic approaches parameter, variation
between the two decreases. As the variation between the two decreases,
Estimates of Variance and Standard Error decrease.
3. As Estimates of Variance and Standard Error decrease, the intervals used in inference (Hypothesis Testing and Confidence Intervals) decrease, p-values get smaller, and almost all relationships will seem significant (both meaningful and specious ones).
• You can’t really tell from the small p-value alone whether the relationship is meaningful or specious (deceptively attractive).
4. Multicollinearity can still be an issue.
Small Sample Size
• It may be hard to test the assumptions for inference in regression, like with a Residual Plot (because there are not enough sample points).
• Assessing multicollinearity is difficult.