Introduction to Simple Linear Regression
Introduction to Linear Regression
• The Pearson correlation measures the degree
to which a set of data points forms a
straight-line relationship.
• Regression is a statistical procedure that
determines the equation for the straight line
that best fits a specific set of data.
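To make the first bullet concrete, the following is a minimal sketch of how the Pearson correlation could be computed by hand. The function name pearson_r and both data sets are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: points lying exactly on the line y = 2x + 1
xs = [1, 2, 3, 4, 5]
r_perfect = pearson_r(xs, [3, 5, 7, 9, 11])

# Hypothetical data: the same x values with scatter around a rising line
r_noisy = pearson_r(xs, [2, 6, 5, 9, 12])

print(r_perfect, r_noisy)
```

Points that fall exactly on a rising straight line give a correlation of 1. Scatter around the line pulls the value below 1.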
Introduction to Linear Regression (cont.)
• Any straight line can be represented by an
equation of the form Y = bX + a, where b and a
are constants.
• The value of b is called the slope constant and
determines the direction and degree to which
the line is tilted.
• The value of a is called the Y-intercept and
determines the point where the line crosses
the Y-axis.
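The constants b and a can be estimated from data. The sketch below assumes that "best fits" means the least-squares criterion, which is the usual choice but is not stated on the slides. The function name fit_line and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line Y = bX + a."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    if ss_x == 0:
        raise ValueError("slope is undefined when all X values are equal")
    b = sp / ss_x            # slope: direction and steepness of the line
    a = mean_y - b * mean_x  # Y-intercept: value of Y where X = 0
    return b, a

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 6, 5, 9, 12]
b, a = fit_line(xs, ys)
print(f"Y = {b:.2f}X + {a:.2f}")
```

A positive b means the line rises from left to right, and a negative b means it falls. The fitted line always passes through the point (mean of X, mean of Y).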
Regression model
• Relation between variables where changes in some
variables may “explain” or possibly “cause” changes
in other variables.
• Explanatory variables are termed the independent
variables and the variables to be explained are
termed the dependent variables.
• A regression model estimates the nature of the
relationship between the independent and
dependent variables.
– Change in the dependent variable that results from changes
in the independent variables, i.e., the size of the relationship.
– Strength of the relationship.
– Statistical significance of the relationship.
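The three quantities listed above can be illustrated for a bivariate model. This is a sketch only: it assumes a least-squares fit, uses the slope for size, R squared for strength, and the t statistic of the slope for significance. The function name describe_relationship and the education and income figures are hypothetical.

```python
import math

def describe_relationship(xs, ys):
    """Return slope, R squared, and the slope's t statistic for Y = bX + a."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    b = sp / ss_x                            # size of the relationship
    a = mean_y - b * mean_x
    r_squared = sp ** 2 / (ss_x * ss_y)      # strength of the relationship
    ss_resid = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(ss_resid / (n - 2) / ss_x)  # standard error of the slope
    t_stat = b / se_b                        # basis for a significance test
    return {"slope": b, "r_squared": r_squared, "t_stat": t_stat, "df": n - 2}

# Hypothetical data: years of education and income in thousands
education = [10, 12, 14, 16, 18]
income = [30, 38, 41, 52, 58]
result = describe_relationship(education, income)
print(result)
```

To judge significance, the t statistic would be compared with a critical value from a t table with n - 2 degrees of freedom. The data must not fall perfectly on a line, since the standard error would then be zero.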
Examples
• Dependent variable is retail price of gasoline in Regina –
independent variable is the price of crude oil.
• Dependent variable is employment income – independent
variables might be hours of work, education, occupation, sex,
age, region, years of experience, unionization status, etc.
• Price of a product and quantity produced or sold:
– Quantity sold affected by price. Dependent variable is
quantity of product sold – independent variable is price.
– Price affected by quantity offered for sale. Dependent
variable is price – independent variable is quantity sold.
Bivariate and multivariate models
Bivariate or simple regression model
(Education) x → y (Income)
Multivariate or multiple regression model
(Education) x1, (Gender) x2, (Experience) x3, (Age) x4 → y (Income)
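A multiple regression model can be fitted in the same spirit as the bivariate one. The sketch below assumes a least-squares fit obtained by solving the normal equations, and it uses only two of the four predictors (education and experience) to stay short. The function name fit_multiple and the data are hypothetical. The incomes were generated from income = 5 + 3(education) + 1(experience) so that the result is easy to check.

```python
def fit_multiple(rows, ys):
    """Least-squares coefficients [a, b1, ..., bk] for y = a + b1*x1 + ... + bk*xk."""
    # Design matrix with a leading 1 in each row for the intercept
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(X, ys))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; coefficients are not unique")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (A[i][k] - s) / A[i][i]
    return beta

# Hypothetical data: (years of education, years of experience) and income
rows = [(10, 2), (12, 5), (14, 1), (16, 8), (18, 4), (12, 10)]
ys = [37, 46, 48, 61, 63, 51]
intercept, b_education, b_experience = fit_multiple(rows, ys)
print(intercept, b_education, b_experience)
```

Each coefficient estimates the change in income for a one-unit change in that predictor while the other predictor is held fixed. A categorical predictor such as gender would first need to be coded numerically, for example as 0 and 1.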
Model with simultaneous relationship
Price of wheat ↔ Quantity of wheat produced
Uses of regression
• Amount of change in a dependent variable that
results from changes in the independent variable(s) –
returns on investment in human capital, etc.
• Attempt to determine causes of phenomena.
• Prediction and forecasting of sales, economic
growth, etc.
Outliers
• Rare, extreme values may distort the
outcome.
– Could be an error.
– Could be a very important observation.
• Outlier (a common rule of thumb): a value more than
3 standard deviations from the mean.
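The 3 standard deviation rule can be sketched as a simple filter. This assumes the sample standard deviation and a single variable. The function name flag_outliers and the data are hypothetical.

```python
import statistics

def flag_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) > threshold * sd]

# Hypothetical data: 19 values near 50 and one extreme value
values = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49,
          51, 50, 52, 48, 50, 49, 51, 50, 50, 120]
outliers = flag_outliers(values)
print(outliers)
```

The extreme value itself inflates both the mean and the standard deviation, so this rule can miss outliers in small samples. A flagged value still has to be inspected to decide whether it is an error or an important observation.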