
ECO242 Basic Econometrics

Chapter One Lecture Notes

__________________________________________________________________________________

The nature of regression: slides 4-6

In 1886, Francis Galton published an article titled ‘Regression towards mediocrity in hereditary stature’ in the Journal of the Anthropological Institute of Great Britain and Ireland. What Galton then called ‘regression towards mediocrity’ (and what we would today call ‘regression to the mean’) is best encapsulated by this quote from the article.

It is some years since I made an extensive series of experiments on the produce of seeds of different size but of the same species. They yielded results that seemed very noteworthy, and I used them as the basis of a lecture before the Royal Institution on February 9th, 1877. It appeared from these experiments that the offspring did not tend to resemble their parent seeds in size, but to be always more mediocre than they – to be smaller than the parents, if the parents were large; to be larger than the parents, if the parents were very small. F. Galton, 1886

The experiment he describes above was repeated observationally with humans. Galton measured the heights of close to 1000 children and the heights of their parents. He noticed a pattern similar to the one described in the quote above: male children with exceptionally tall parents tended, on average, to be shorter than their fathers, while male children with exceptionally short parents tended, on average, to be taller than their fathers.

Today, when we talk about regression we mean something different to ‘regression to the mean’.

Specifically, regression is an approach to modelling the dependence of one variable on one or more
other variables. The interpretation of this modelling approach is that it gives us an estimate of the
mean or average response of the dependent variable following a change in the independent variable
or variables.

An example is provided on slide 5. Notice that a regression model is nothing but a linear equation, much like 𝑦 = 𝑚𝑥 + 𝑐. By convention we substitute the Greek letter 𝛽 (beta) for the slope and intercept parameters in the linear equation. In the example on this slide, 𝑌 is the final mark for a course, and 𝑋 is the number of hours spent studying per week by students enrolled for the course. This model gives us parameter estimates of 25 for the constant term and 7 for the slope term. These estimates imply that students who spend no time studying can expect, on average, to achieve a grade of 25, and that students can, on average, increase their final grade by 7 marks for each additional hour studied per week.

The three bullet points at the bottom of slide 5 show different ways of expressing the same idea.
Bullet 3 is read as “the expected value of 𝑌 given that 𝑋 takes on some specific value 𝑥”. The fourth
bullet point just adds a value for 𝑋, and is read as follows: “the expected grade for a student who
studies for 5 hours per week is 60”. The fifth bullet just shows us the calculation used to get to 60.

The graph on slide 6 illustrates the equation presented on slide 5. The table on the left shows the data used to estimate the model: a sample of 25 students, where the 𝑋 column is their hours studied per week and the 𝑌 column is their final grade for the course. The graph on the right shows a plot of the data points and the estimated regression line. Notice that the same 𝑋 value
can be associated with different 𝑌 values. There are, for instance, two students with only 5 weekly
study hours who achieved a higher grade than one student with 6 study hours. So even though the
model suggests that, on average, more study time implies a higher grade, there are individual data
points that diverge from what the model predicts. Hence the term on average!
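
If you want to see the ‘on average’ interpretation for yourself, a minimal sketch in Python is given below. The data here are invented for illustration (they are not the 25 observations from slide 6), so the estimates will not be exactly 25 and 7; the point is only that the fitted line tracks the average grade at each value of 𝑋, while individual observations scatter around it.

    import numpy as np

    # Invented (hours studied, final grade) pairs -- illustrative only,
    # not the 25 observations shown on slide 6.
    hours  = np.array([2, 3, 5, 5, 6, 8, 10, 12])
    grades = np.array([35, 50, 62, 58, 55, 80, 90, 88])

    # Fit a straight line grade = b0 + b1*hours by least squares.
    # np.polyfit returns the slope first, then the intercept.
    b1, b0 = np.polyfit(hours, grades, deg=1)
    print(f"intercept = {b0:.1f}, slope = {b1:.1f}")

    # Estimated average grade for a student who studies 5 hours per week,
    # i.e. an estimate of E(Y | X = 5).
    print("predicted grade at X = 5:", round(b0 + b1 * 5, 1))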

Statistical vs deterministic relationships: slide 7

Econometrics deals with statistical relationships and not deterministic relationships. Deterministic
relations are ones with no variation (or randomness) among the variables. Many physical
relationships are of this nature. Newton’s law of gravity, which is shown on this slide, is one such
example. The relationship between objects in space doesn’t randomly change – we don’t spontaneously start floating sometimes while walking down the street.
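
As a reminder of what such a law looks like (the formula is standard, not copied from the slide), Newton’s law states that the force of attraction between two bodies is 𝐹 = 𝑘(𝑚1𝑚2/𝑟²), where 𝑚1 and 𝑚2 are the two masses, 𝑟 is the distance between them, and 𝑘 is a constant of proportionality. Given the masses and the distance, the force is determined exactly; there is no error term.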

In economics, and social sciences more generally, we deal entirely with non-deterministic
relationships. We haven’t uncovered (and perhaps never will uncover) strict laws about our social world
that hold universally and provide precise predictions. We can predict with great accuracy what
happens when a projectile is launched into the air. We don’t know with great precision what the
absolute best policy is to bring down unemployment to 5%.

Consider the example on slides 5 and 6. Although it makes sense that more study hours improve grades, we still observe examples where this is not the case. Other things seem to matter, and we
can think of many other explanatory variables that could influence grades (intelligence,
commitment, health, wealth, quality of high school, quality of instruction by lecturer, home
environment, and on and on). Even if all the variables we can think of are included in the regression
model (of course, some of these variables cannot be observed or are hard to measure), we may still
observe randomness in the grades achieved by students. There will always be some variation that
we simply cannot explain.

This is why we include an ‘error term’ in the regression model.
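
Written out with the error term included (using 𝑢 as a conventional symbol for it; the slide may use different notation), the grades model becomes 𝑌 = 𝛽1 + 𝛽2𝑋 + 𝑢, or in the example above, final grade = 25 + 7 × (study hours) + 𝑢. The systematic part 𝛽1 + 𝛽2𝑋 is what the regression estimates; 𝑢 collects intelligence, health, luck on the day of the exam, and everything else that affects the grade but is not in the model.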

Regression and causation: slide 8

Ultimately, we develop regression models because we want to make causal claims. In other words,
we want to be able to say that “these factors (𝑋) explain these outcomes (𝑌)”. But it is rare that
regression models provide us with such causal interpretations. In the grades-study hours regression
on slide 5, is it reasonable to interpret the estimated parameter values as follows: students who
achieved higher grades did so purely because they spent more time studying? Well, not exactly: you
might immediately object to such an interpretation by raising all the other variables mentioned
above that are also important for achieving good grades (intelligence, commitment, motivation, etc).

The above paragraph is not meant to be interpreted as “all regression models are merely speculative
and can’t tell us anything meaningful about the relationships between variables or phenomena
under study”. Regression models can be very useful, and many regression models apply techniques
that lend themselves to a causal interpretation (you’ll learn more about these types of models in
honours econometrics). It is important, however, to give very careful consideration to the design of an econometric analysis using regression before making causal claims. And always remember that correlation does not imply causation: just because two variables ‘move in the same direction’ does not mean that one causes the other. Careful work needs to be done to determine whether causation is at play, and advanced regression analysis can be a useful tool in this regard.

Slide 9 has a nice light-hearted take on the confusion around correlation and causation. Notice that
the student, perhaps, takes the idea of correlation vs causation too far. Let’s return to the grades-
study hours model. Earlier I suggested that it is not accurate to state with certainty that “our model shows that more study hours improve grades for ECO242 at a rate of 7 marks per hour studied”. However, is the more general claim that “students who put in more study time performed better as a result” unreasonable?

Regression vs correlation: slides 10-11

The distinction given on this slide is quite useful. Regression and correlation are two distinct
concepts and must not be confused. Often researchers might notice a correlation and may then
wonder whether there is causation going on. Certain, more advanced, regression techniques could
aid in answering this question.

Milton Friedman famously documented a correlation between changes in the money supply and fluctuations in economic activity, in particular that monetary contractions tended to precede recessions. He then used this to argue that changes in the money supply cause fluctuations in economic activity. Econometric analysis supports this observation. Again, it is hard to use the word
‘cause’ because econometric models cannot necessarily be interpreted causally – they can provide
further evidence but are not necessarily conclusive.

The image on slide 11 shows five reasons why two variables may be correlated. It could be pure chance (sometimes this is called ‘spurious correlation’). There could also be causation from one to the other. There could be reverse causation (in the money supply example, could it be that changes in the economic environment influence the money supply?). There could also be confounding variables (when ice-cream sales increase, shark attacks increase – the confounding variable is higher temperatures). Then there’s selection (people who graduated from certain universities earn higher incomes than people who graduated from other universities; is it because the ‘high-income’ universities provide better instruction, or is it because they admit – select – high-achieving students?).

Terminology: slide 12

The dependent and independent variables are referred to in several different ways. It is useful to be
aware of the different terminology that is sometimes used as different texts may prefer different
terminology.

Nature and sources of data: slides 13-15

Data used in econometrics are often split into four ‘types’: time-series, cross-sectional, pooled, and
panel/longitudinal. Each type has its own implications for econometric analysis. Also, different types
of data will be more suited to particular research questions. In macroeconomics, for instance, time-series data are generally of interest. In development economics, on the other hand, panel or cross-sectional data are usually of interest.

It is important to understand what distinguishes the four types and this is best explained by
considering the unit of observation in each case.

Time-series data are data series that are measured over time at some regular periodicity. The unit of observation in this instance is time. E.g. if we are looking at a series of annual GDP data for the years 2000-2020, any particular data point (say, GDP of R4.8 trillion in 2015) is specific to that year. Although GDP is the variable, we observe different values of it at different times. If values of a variable are collected over time, the data are usually time series in nature.

Unlike time-series data, cross-sectional data are collected at a particular point in time and not over time. They are therefore often described as ‘snapshot’ data: they provide a ‘snapshot’ of the units of observation at the point in time when the data were collected. In the case of cross-sectional data, the unit of observation is usually individuals (these could be more specific, such as ‘students’ or ‘female students’ or ‘female students in residence’, etc.), but could also be households, municipalities, provinces, firms, or countries.

Pooled data are similar to cross-sectional data except that they combine multiple cross-sections or ‘snapshots’. The unit of observation remains the same as in cross-sectional data. The only difference is that data are collected more than once (though time is NOT the unit of observation here). There’s an important characteristic of pooled data that doesn’t allow time to be the unit of observation: each cross-section included in the pooled data set is drawn independently, as a fresh random sample. E.g., let’s say that government
wants to estimate the unemployment rate. It draws a random sample of individuals and asks them
questions related to their work status. Let’s say government wants regular updates on the
unemployment rate. It again randomly draws a sample from the population and asks individuals in
the new sample about their work status. StatsSA in fact does this every three months – it is called
the Quarterly Labour Force Survey (QLFS). Now, a key feature of this activity is that every three
months a random sample is drawn – there is no guarantee that individuals forming part of the
sample this quarter will be in the samples taken in subsequent quarters. Using pooled data can be
useful in instances where sample sizes of individual cross-sections are small. It is also useful for
considering policy effects over time. E.g. if government implements a jobs programme in Q3 2018 (quarter 3 of 2018), an economist could pool QLFS data from Q1 2018 to Q4 2019 to determine whether the programme has had an effect.

Panel data are similar to pooled cross-section data with one important distinction: the same sample is repeatedly surveyed over time. So, instead of taking the approach of the QLFS, which surveys a new random sample every three months, researchers could follow the same households and ask them about their employment status. This allows researchers to consider factors that are associated with individuals’ labour outcomes. Here, the unit of observation is the pair {individual, time}. Observations are made on each household, firm, etc., but also on different values of a
given variable over time. If an economist wants to analyse investment in research and development
at the firm level (meaning how different firms behave with respect to such investment), they are
observing individual firms as well as the value of investment at various points in time. Thus, panel
data can be thought of as a combination of cross-section data and time-series data. It is nonetheless
distinct from both of these types of data.
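
To make the structural difference concrete, here is a minimal sketch in Python using pandas. All of the identifiers and values are invented for illustration; they are not taken from the QLFS or from any real firm data.

    import pandas as pd

    # Pooled cross-sections: two independently drawn samples stacked together.
    # The person_id values in 2018 and 2019 refer to different people.
    pooled = pd.concat([
        pd.DataFrame({"year": 2018, "person_id": [1, 2, 3], "employed": [1, 0, 1]}),
        pd.DataFrame({"year": 2019, "person_id": [1, 2, 3], "employed": [0, 1, 1]}),
    ], ignore_index=True)

    # Panel data: the SAME firms are observed in every year, so the unit of
    # observation is the pair {firm, year}.
    panel = pd.DataFrame({
        "firm": ["A", "A", "B", "B"],
        "year": [2018, 2019, 2018, 2019],
        "rd_spend": [1.2, 1.5, 0.8, 0.9],  # hypothetical R&D spending
    }).set_index(["firm", "year"])

    print(pooled)
    print(panel)

In the pooled data frame the person identifiers are not comparable across years, because each year is a fresh random sample; in the panel data frame the same firms appear in every year, so each firm’s R&D spending can be followed over time.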

Slide 15 provides some sources for data. Clicking on the data source will take you to the relevant
webpage.

Measurements of scale: slide 16

It is useful to know the difference between these measurement scales as variables with different
measurement scales result in different interpretations in a regression model – you’ll encounter this
mainly in third-year econometrics.

Ratio scale variables have three properties: the ratio of two values is meaningful, the difference between two values is meaningful, and the values have a natural order.

Interval scale variables have only the last two properties: meaningful differences and meaningful order. Temperature is an example of an interval scale variable. Consider 𝑋1 = −10 and 𝑋2 = 10, where 𝑋 is temperature in degrees Celsius. The ratio of these values is meaningless.
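
To see why, note that 𝑋2/𝑋1 = 10/(−10) = −1 in degrees Celsius, but the same two temperatures expressed in Fahrenheit are 50 and 14, giving a ratio of roughly 3.6: the ratio depends entirely on where the scale puts its zero. The difference, by contrast, behaves sensibly: the 20-degree gap in Celsius is always a 36-degree gap in Fahrenheit, wherever on the scale it occurs.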

Ordinal scale variables retain only the third property of ratio scale variables: order. The most popular example of this is the Likert-scale survey that one is subjected to after a service call with some company or institution: “On a scale of 1 to 5, where 1 means strongly disagree and 5 means strongly agree, please indicate your agreement with this statement: you are likely to recommend us to a friend.” It is not quite meaningful to say a score of 2 is half the value of a score of 4.

Nominal scales have no order. There’s no meaningful order in which we can place different race
groups or gender groups.
