FUNDAMENTALS OF BUSINESS ANALYTICS
Module 8: REGRESSION ANALYSIS
LEARNING OBJECTIVES:
After studying this chapter, you will be able to:
Explain the purpose of regression analysis and provide examples in business.
Use a scatter chart to identify the type of relationship between two variables.
FOUNDATIONAL CONCEPTS FOR REGRESSION ANALYSIS
Independent and Dependent Variables
The independent variable is a factor that could affect the dependent variable.
The dependent variable is the main factor that you are trying to understand or predict.
Correlation and Causation
Correlation describes an association between variables: when one variable changes, so does
the other. A correlation is a statistical indicator of the relationship between variables. These variables
change together: they covary. But this covariation isn’t necessarily due to a direct or indirect causal
link.
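As a rough illustration of how covariation is measured, the sketch below computes the sample correlation coefficient for two variables. The function name `pearson_r` and the advertising and sales figures are made up for this example; a value near 1 or -1 indicates strong covariation but says nothing about cause.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: advertising spend and sales for five periods
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(ad_spend, sales)
print(round(r, 3))
```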
Causation means that a change in one variable brings about a change in the other. Establishing a
causal relationship between variables generally requires a controlled experiment with a control
group (which does not receive the treatment, that is, the manipulated independent variable) and an
experimental group (which does receive it).
Regression Analysis
Regression analysis is a tool for building mathematical and statistical models that
characterize relationships between a dependent variable (which must be a ratio
variable and not categorical) and one or more independent, or explanatory,
variables, all of which are numerical (but may be either ratio or categorical).
Some potential uses of regression analysis in business include the following:
• How do wages of employees depend on years of experience, years of education, and gender?
• How does the current price of a stock depend on its own past values, as well as the current and
past values of a market index?
• How does a company’s current sales level depend on its current and past advertising levels, the
advertising levels of its competitors, the company’s own past sales levels, and the general level of
the market?
• How does the total cost of producing a batch of items depend on the total quantity of items
that have been produced?
• How does the selling price of a house depend on such factors as the appraised value of the
house, the square footage of the house, the number of bedrooms in the house, and perhaps
others?
Regression analysis can be categorized in several ways:
• By the overall purpose of the analysis: to understand how the world operates or to make
predictions.
• By the type of data being analyzed: cross-sectional data or time series data.
• By the number of explanatory variables in the analysis: simple or multiple regression.
• By the form of the model: linear or nonlinear.
Regression can be used to understand how the world operates, and it can be used
for prediction.
Regression can be used to analyze cross-sectional data or time series data.
Cross-sectional data are usually gathered from a population at approximately the same point
in time. Time series data involve one or more variables that are observed at several, usually
equally spaced, points in time.
A third categorization of regression analysis involves the number of explanatory variables
in the analysis. The dependent (or response or target) variable is the single variable being
explained by the regression. The explanatory (or independent or predictor) variables are
used to explain the dependent variable. A simple regression analysis includes a single explanatory
variable, whereas multiple regression can include any number of explanatory variables.
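“Linear” regression can estimate linear relationships as well as some nonlinear relationships,
because the model only has to be linear in its coefficients. The variables themselves can be
transformed first, for example with logarithms or squared terms.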
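One way to see how a linear method can capture a nonlinear relationship is a log transformation. The sketch below assumes an exponential relationship y = c * exp(k * x), generates hypothetical data from it, and fits a straight line to ln(y) against x. The helper `fit_line` and all data values are illustrative, not part of the module.

```python
from math import exp, log

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data generated exactly from y = 2 * exp(0.5 * x)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * exp(0.5 * x) for x in xs]

# Taking logs gives ln(y) = ln(c) + k * x, which is linear in x
ln_c, k = fit_line(xs, [log(y) for y in ys])
c = exp(ln_c)
print(round(c, 6), round(k, 6))
```

With real data the points would not lie exactly on the curve, so the recovered values of c and k would only be estimates.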
Scatterplot
A scatterplot is a graphical plot of two variables, an X and a Y.
If there is any relationship between the two variables, it is usually apparent from the
scatterplot. Scatterplots provide graphical indications of relationships, whether they are
linear, nonlinear, or essentially nonexistent.
Scatterplots are especially useful for identifying outliers.
An outlier is an observation that falls outside the general pattern of the rest of the
observations.
Simple Linear Regression
Simple linear regression involves finding a linear relationship between one independent
variable, X, and one dependent variable, Y. The relationship between two variables
can assume many forms. The relationship may be linear or nonlinear, or there may be
no relationship at all.
Finding the Best-Fitting Regression Line
The idea behind simple linear regression is to express the relationship between the dependent
and independent variables by a simple linear equation, such as
market value = a + b * square feet
where a is the y-intercept and b is the slope of the line. If we draw a straight line through the
data, some of the points will fall above the line, some will fall below it, and a few might fall on
the line itself.
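A residual is the difference between an observed value of the dependent variable and the value
predicted by the regression line. The sum of the squared residuals, called the error sum of
squares, measures the total error of the fitted line.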
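A small sketch of how residuals relate to a candidate line may help here. It assumes a hypothetical line, market value = 10000 + 50 * square feet, and three made-up houses; a positive residual means the point lies above the line, a negative one below, and zero means it lies on the line.

```python
def residuals(xs, ys, a, b):
    """Observed minus predicted values for the line y = a + b * x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data and a hypothetical candidate line (not fitted values)
square_feet = [1500, 1800, 2100]
market_value = [90000, 100000, 110000]
a, b = 10000, 50

res = residuals(square_feet, market_value, a, b)
sse = sum(e ** 2 for e in res)  # error sum of squares for this line

print(res)  # one point above the line, one on it, one below
print(sse)
```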
Least-Squares Regression
The mathematical basis for the best-fitting regression line is called least-squares
regression. It chooses the intercept and slope that minimize the sum of the squared
differences between the observed and predicted values of Y. In regression analysis, we
assume that the values of the dependent variable, Y, in the sample data are drawn from
some unknown population for each value of the independent variable, X.
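For reference, the least-squares criterion and its standard closed-form solution for one independent variable can be written as follows. The notation is an assumption chosen to match the line Y = a + bX used earlier, with n observations (X_i, Y_i) and sample means written with bars.

```latex
% Criterion: choose a and b to minimize the sum of squared errors
\min_{a,\,b} \; \sum_{i=1}^{n} \bigl( Y_i - (a + b X_i) \bigr)^2

% Solution: slope and intercept of the least-squares line
b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
```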
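Using Excel Functions to Find Least-Squares Coefficients:
• =INTERCEPT(known_y's, known_x's) returns the intercept of the least-squares line.
• =SLOPE(known_y's, known_x's) returns the slope of the least-squares line.
• =TREND(known_y's, known_x's, new_x's) returns the predicted Y-values for the given new
X-values.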
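For readers who want to see what the three Excel functions listed above compute, here is a minimal Python sketch. The function names `slope`, `intercept`, and `trend` are hypothetical helpers that mirror Excel's argument order (Y-values first); they are not Excel itself, and the data are made up.

```python
def slope(known_ys, known_xs):
    """Slope of the least-squares line through the points."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    return sxy / sxx

def intercept(known_ys, known_xs):
    """Intercept of the least-squares line: mean(y) - slope * mean(x)."""
    n = len(known_xs)
    return sum(known_ys) / n - slope(known_ys, known_xs) * sum(known_xs) / n

def trend(known_ys, known_xs, new_xs):
    """Predicted Y-values on the least-squares line for each new X-value."""
    a = intercept(known_ys, known_xs)
    b = slope(known_ys, known_xs)
    return [a + b * x for x in new_xs]

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

b = slope(ys, xs)
a = intercept(ys, xs)
forecast = trend(ys, xs, [5])
print(a, b, forecast)
```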
Simple Linear Regression with Excel
Multiple R is another name for the sample correlation coefficient, r. Values of r range from
-1 to 1, where the sign is determined by the sign of the slope of the regression line. A
Multiple R value greater than 0 indicates positive correlation; that is, as the independent
variable increases, the dependent variable does also; a value less than 0 indicates
negative correlation—as X increases, Y decreases. A value of 0 indicates no correlation.
R-squared is called the coefficient of determination. $R^2$ measures how well the regression
line fits the data: it is the proportion of the variation in Y that is explained by the
regression, and it ranges from 0 to 1.
Adjusted R Square is a statistic that modifies the value of $R^2$ by incorporating the sample
size and the number of explanatory variables in the model.
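The two statistics can be sketched in a few lines, using the standard formulas R-squared = 1 - SSE/SST and adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), where n is the sample size and k the number of explanatory variables. The function names and the data are assumptions for illustration; the predicted values come from the least-squares line y = 2.2 + 0.6x fitted to x = 1, ..., 5.

```python
def r_squared(observed, predicted):
    """Proportion of variation in the observed values explained by the fit."""
    mean_y = sum(observed) / len(observed)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    sst = sum((y - mean_y) ** 2 for y in observed)
    return 1 - sse / sst

def adjusted_r_squared(r2, n, k):
    """Adjust R-squared for sample size n and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical observed values and predictions from y = 2.2 + 0.6x
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(observed, predicted)
adj_r2 = adjusted_r_squared(r2, n=len(observed), k=1)
print(round(r2, 4), round(adj_r2, 4))
```

Adjusted R-squared is never larger than R-squared, and the gap widens as more explanatory variables are added relative to the sample size.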
Standard Error in the Excel output is the variability of the observed Y-values from the
predicted values ($\hat{Y}$). This is formally called the standard error of the estimate, $S_{YX}$.
Residual, in the ANOVA section of the Excel output, reports the error sum of squares: the
part of the total variation in Y that the regression does not explain.
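As a rough sketch of how the error sum of squares and the standard error of the estimate fit together, the code below uses the common formula: the square root of SSE divided by (n - k - 1), which is n - 2 for simple regression. The function names and data are hypothetical; the predictions again come from the line y = 2.2 + 0.6x.

```python
from math import sqrt

def error_sum_of_squares(observed, predicted):
    """Sum of squared residuals (observed minus predicted)."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))

def standard_error_of_estimate(observed, predicted, k=1):
    """sqrt(SSE / (n - k - 1)), with k explanatory variables."""
    n = len(observed)
    return sqrt(error_sum_of_squares(observed, predicted) / (n - k - 1))

# Hypothetical observed values and predictions from y = 2.2 + 0.6x
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

sse = error_sum_of_squares(observed, predicted)
s_yx = standard_error_of_estimate(observed, predicted)
print(round(sse, 4), round(s_yx, 4))
```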
Multiple Linear Regression Model
A linear regression model with more than one independent variable is called a multiple
linear regression model.
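A multiple linear regression model has the form:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$
where
$Y$ is the dependent variable,
$X_1, \ldots, X_k$ are the independent (explanatory) variables,
$\beta_0$ is the intercept term,
$\beta_1, \ldots, \beta_k$ are the regression coefficients for the independent variables,
$\varepsilon$ is the error term.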
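To make the model concrete, the sketch below estimates an intercept b0 and coefficients b1, ..., bk by least squares, solving the normal equations (X'X)b = X'y with plain Gaussian elimination. This is one possible approach under simplifying assumptions, not the method Excel uses; the helper names and the data, generated exactly from y = 1 + 2*x1 + 3*x2 with no error term, are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot into place
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

def fit_multiple(rows, ys):
    """Return [b0, b1, ..., bk] for rows of [x1, ..., xk] and responses ys."""
    X = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no error term)
rows = [[1, 1], [2, 1], [3, 2], [4, 3], [5, 5]]
ys = [6, 8, 13, 18, 26]

coefficients = fit_multiple(rows, ys)
print([round(c, 6) for c in coefficients])
```

In practice a statistics package or Excel's regression tool would be used instead, since solving the normal equations directly can be numerically fragile when explanatory variables are highly correlated.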
References:
Evans, J. (2016). Business Analytics (2nd ed.). Pearson.
Cheusheva, S. (2023, February 7). Linear regression analysis in Excel. Ablebits.com.
Retrieved February 8, 2023, from
https://www.ablebits.com/office-addins-blog/linear-regression-analysis-excel/