Two-Variable
Regression Analysis
A Hypothetical Example
● The data cover the entire population of 60 families in a hypothetical
community: each family's weekly income (X) and weekly consumption
expenditure (Y), both in dollars.
● X takes 10 fixed values, and each X value has its own set of
corresponding Y values.
● Thus, there are 10 Y subpopulations, one for each value of X.
Example
Example
● There are 10 mean values for the 10 subpopulations of Y.
● These are called conditional expected values, as they depend on the
given values of the (conditioning) variable X.
● A conditional expected value is written as E(Y | X), i.e., the expected
value of Y given the value of X.
Example
● The unconditional expected value of weekly consumption expenditure is
denoted as E(Y).
● If we add the weekly consumption expenditures for all the 60 families in the
population and divide this number by 60, we get the number $121.20
($7272/60), which is the unconditional mean, or expected, value of weekly
consumption expenditure, E(Y).
● It is unconditional in the sense that in arriving at this number we have
disregarded the income levels of the various families.
Example
● The graph
Example
● “What is the expected value of weekly consumption expenditure of a
family?”
● The answer is $121.20 (the unconditional mean).
● “What is the expected value of weekly consumption expenditure of a family
whose weekly income is, say, $140?”
● The answer is $101 (the conditional mean).
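The two answers above can be reproduced with a short Python sketch. The family-level figures below are an assumption: the slide's data table did not survive, so they are recalled from the textbook version of this example and should be checked against the original table. The only values stated on these slides are the total ($7272), the number of families (60), the unconditional mean ($121.20), and the conditional mean at X = $140 ($101). The names `population`, `unconditional_mean`, and `conditional_means` are illustrative.

```python
# Weekly consumption expenditure (Y) grouped by weekly income (X), in dollars.
# ASSUMPTION: these figures are recalled from the textbook version of the
# example; the slides themselves only state the totals and the two means.
population = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

# Unconditional mean E(Y): pool all families and ignore their income level.
all_y = [y for ys in population.values() for y in ys]
unconditional_mean = sum(all_y) / len(all_y)

# Conditional means E(Y | X): average Y within each income level separately.
conditional_means = {x: sum(ys) / len(ys) for x, ys in population.items()}

print(f"E(Y) = {unconditional_mean:.2f}")
print(f"E(Y | X = 140) = {conditional_means[140]:.2f}")
```

The pooled average discards the income information, while each conditional mean uses only the families at one income level, which is why the two questions have different answers.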
The Meaning of the Term Linear
● Linearity in the Variables
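○ The first “natural” meaning of linearity is that the conditional expectation of Y is a linear
function of Xi, for example E(Y | Xi) = 𝛽1 + 𝛽2 Xi.
○ By contrast, a regression function such as E(Y | Xi) = 𝛽1 + 𝛽2 Xi² is not linear in the
variable, because X appears with a power of 2.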
● Linearity in the Parameters
○ The second interpretation of linearity is that the conditional expectation of Y, E(Y | Xi), is a
linear function of the parameters, the 𝛽s; it may or may not be linear in the variable X.
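The distinction between the two meanings can be illustrated with a small Python sketch. It uses a hypothetical function E(Y | X) = 𝛽1 + 𝛽2 X² with made-up parameter values; the variable names and numbers are assumptions for illustration only. The function is not linear in X (its increments are not constant), yet it is linear in the parameters, because it can be written as a design matrix multiplied by the parameter vector.

```python
import numpy as np

# Hypothetical parameter values and X values, chosen only for illustration.
beta = np.array([2.0, 0.5])          # [beta1, beta2]
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Conditional mean that is quadratic in X.
y = beta[0] + beta[1] * x**2

# Not linear in the variable: equal steps in X give unequal steps in Y.
increments = np.diff(y)
linear_in_x = np.allclose(increments, increments[0])

# Linear in the parameters: Y equals a fixed matrix times the parameter vector.
design = np.column_stack([np.ones_like(x), x**2])   # columns: 1, X^2
linear_in_parameters = np.allclose(design @ beta, y)

print("linear in X:", linear_in_x)
print("linear in the parameters:", linear_in_parameters)
```

A function such as 𝛽1 + 𝛽2² X would fail the second check, since no fixed matrix times (𝛽1, 𝛽2) reproduces it.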
PRL
● If we join the conditional mean values, we get the population regression
line (PRL), or more generally, the population regression curve.
● It is also known as the regression of Y on X.
Example
The Concept of Population Regression
Function (PRF)
● E(Y | Xi) = f (Xi)
where f (Xi) denotes some function of the explanatory variable X.
● This is known as the conditional expectation function (CEF), the population
regression function (PRF), or the population regression (PR).
● What functional form the PRF takes is an empirical question, although the
underlying theory may also suggest a form.
PRF
● The simplest model is
E(Y | Xi) = 𝛽1 + 𝛽2 Xi
● where 𝛽1 and 𝛽2 are unknown but fixed parameters known as the
regression coefficients.
● 𝛽1 and 𝛽2 are also known as the intercept and slope coefficients,
respectively.
● This equation is known as the linear population regression function.
Stochastic Specification of PRF
● The deviation of an individual Yi from its conditional mean,
𝜇𝑖 = Yi − E(Y | Xi), is an unobservable random variable taking positive or
negative values.
● 𝜇𝑖 is known as the stochastic disturbance or stochastic error term.
● Yi can therefore be expressed as the sum of two components,
Yi = E(Y | Xi) + 𝜇𝑖:
○ (1) E(Y | Xi), the systematic, or deterministic, component, and
○ (2) 𝜇𝑖, the random, or nonsystematic, component.
PRF
PRF
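● Taking the expected value of Yi = E(Y | Xi) + 𝜇𝑖 conditional on Xi gives
E(Yi | Xi) = E(Y | Xi) + E(𝜇𝑖 | Xi)
● Since E(Yi | Xi) is the same thing as E(Y | Xi), this implies that
E(𝜇𝑖 | Xi) = 0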
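The zero conditional mean of the disturbance can be checked numerically. The sketch below uses a small made-up population (three X values with a handful of Y values each, not the 60-family data) and hypothetical names. It measures each disturbance as the deviation of Y from the conditional mean of its own X group and then averages the disturbances within each group.

```python
# Hypothetical toy population: X value -> list of Y values observed at that X.
population = {
    10: [4, 6, 8],
    20: [9, 11, 13, 15],
    30: [16, 20],
}

mean_disturbance = {}
for x, ys in population.items():
    # Systematic component: the conditional mean E(Y | X = x).
    cond_mean = sum(ys) / len(ys)
    # Random component: deviation of each Y from its conditional mean.
    disturbances = [y - cond_mean for y in ys]
    # Average disturbance within this X group.
    mean_disturbance[x] = sum(disturbances) / len(disturbances)

print(mean_disturbance)
```

Within every group the deviations cancel, because they are measured from that group's own mean. This is the same argument as the expectation step above, applied to a finite population.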
The Significance of the Stochastic
Disturbance Term
● The disturbance term 𝜇𝑖 is a proxy for all those variables that are omitted
from the model but that collectively affect Y.
● This raises the question: why not introduce these variables into the
model explicitly?
Disturbance Term
The reasons are:
1. Vagueness of theory
2. Unavailability of data
3. Core variables versus peripheral variables
4. Intrinsic randomness in human behavior
5. Poor proxy variables
6. Principle of parsimony
7. Wrong functional form
The Sample Regression Function (SRF)
● Can we estimate the PRF from the sample data?
● We may not be able to estimate the PRF “accurately” because of sampling
fluctuations.
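● The sample regression function can be written as
Ŷi = 𝛽̂1 + 𝛽̂2 Xi
where Ŷi is the estimator of E(Y | Xi), 𝛽̂1 is the estimator of 𝛽1, and
𝛽̂2 is the estimator of 𝛽2.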
SRF
● The sample regression function in its stochastic form is
Yi = 𝛽̂1 + 𝛽̂2 Xi + 𝜇̂𝑖
where 𝜇̂𝑖 denotes the sample residual, the estimate of 𝜇𝑖.
SRF
● The objective is to estimate the PRF,
Yi = 𝛽1 + 𝛽2 Xi + 𝜇𝑖
● on the basis of the SRF,
Yi = 𝛽̂1 + 𝛽̂2 Xi + 𝜇̂𝑖
SRF
● How should the SRF be constructed so that 𝛽̂1 is as “close” as possible to
the true 𝛽1 and 𝛽̂2 is as “close” as possible to the true 𝛽2, even though we
will never know the true 𝛽1 and 𝛽2?
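Sampling fluctuation, the reason an SRF only approximates the PRF, can be seen in a small simulation. The sketch below makes several assumptions: a hypothetical PRF with made-up parameters, ten made-up X values, normally distributed disturbances, and a least-squares line fit via `numpy.polyfit`. Least squares is one possible way to construct an SRF and is not introduced in these slides. The function name `draw_sample_and_fit` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" PRF parameters, unknown to the analyst in practice.
beta1, beta2 = 17.0, 0.6

# Ten fixed X values (made up for illustration).
x = np.arange(80, 261, 20, dtype=float)


def draw_sample_and_fit():
    """Draw one Y per X from the PRF plus noise, then fit a straight line."""
    y = beta1 + beta2 * x + rng.normal(0.0, 10.0, size=x.size)
    # polyfit returns coefficients from highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope


# Two independent samples give two different SRFs for the same PRF.
srf_a = draw_sample_and_fit()
srf_b = draw_sample_and_fit()

print("sample A: intercept, slope =", srf_a)
print("sample B: intercept, slope =", srf_b)
print("true PRF: intercept, slope =", (beta1, beta2))
```

Each run of `draw_sample_and_fit` yields a different pair of estimates, none exactly equal to the true parameters. The question on this slide is how to choose the estimates so that they are close to the true values as a rule.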
Illustrative Examples