Topic 6 Data Science Handout

This document outlines how to conduct fixed effects regression analysis on panel data to test hypotheses about the effects of various economic factors on GDP growth. It discusses setting up the regression model in Excel, collecting data from multiple countries over several years, interpreting the regression results including R-squared and p-values, and drawing conclusions about whether the hypotheses are supported or not supported. It also notes that correlation does not necessarily imply causation and other variables could be confounding the relationships found.

Uploaded by

meibrahim10
Copyright
© All Rights Reserved

Topic 6

Application of Data Science in Global Economic Analysis

Outline
• Panel Data
  ◦ Economic relevance
• Fixed Effects Regression
  ◦ Concept
  ◦ Application
  ◦ Example
    ▪ Setting Hypotheses
    ▪ Data Collection & Fixed Effect Adjustment
• Data Analysis - Excel
  ◦ Excel Setup
  ◦ Excel Implementation
  ◦ Regression Results
• Hypotheses Testing
  ◦ Results Interpretation
  ◦ Decision Rule
  ◦ Conclusions (Example)
• Data Analysis: Important Caveat
Panel Data
• Panel data (or longitudinal data): data containing observations on several cross-sectional units (e.g. countries) over time, i.e. a combination of time-series and cross-sectional data.
  ◦ This is the most common form of economic data for multiple countries.
• Example

Year  Country        Population  Money Supply Growth  Current A/C Balance (% of GDP)
2018  Singapore         5638676  3.90187887           15.40885493
2018  Thailand         69428454  4.667547639           5.610325153
2018  United States   326838199  4.030420024          -2.181753513
2019  Singapore         5703569  4.951350605          14.26299854
2019  Thailand         69625581  3.636896985           7.019674757
2019  United States   328329953  8.39189902           -2.240577453

2 years x 3 countries panel = 6 country-year observations
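
The counting above can be sketched in a few lines of Python. This is only an illustration using the standard library: the names `panel`, `countries`, `years` and `is_balanced` are our own, and only the population column of the example table is kept for brevity.

```python
# Each dict is one country-year observation taken from the example table.
# Only the population column is kept; all names here are illustrative.
panel = [
    {"year": 2018, "country": "Singapore", "population": 5638676},
    {"year": 2018, "country": "Thailand", "population": 69428454},
    {"year": 2018, "country": "United States", "population": 326838199},
    {"year": 2019, "country": "Singapore", "population": 5703569},
    {"year": 2019, "country": "Thailand", "population": 69625581},
    {"year": 2019, "country": "United States", "population": 328329953},
]

# The cross-sectional dimension (countries) and the time dimension (years).
countries = sorted({row["country"] for row in panel})
years = sorted({row["year"] for row in panel})
n_obs = len(panel)

# A panel is balanced when every country is observed in every year.
is_balanced = n_obs == len(countries) * len(years)

print(len(years), "years x", len(countries), "countries =", n_obs, "observations")
```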


Fixed Effects Regression
• A fixed effects regression is an estimation technique that controls for UNOBSERVED individual characteristics that do not vary over time (i.e. are fixed) but that might affect the independent variables, the dependent variable, or both.
• It is usually employed in analyses involving panel data.
  ◦ Example: in a multi-country panel data analysis, each country may have its own unobserved characteristics that do not vary over time but that may nevertheless affect the observed variables in the analysis.
Fixed Effects Regression: Example
[Figure: GDP growth (vertical axis) against value of trade/GDP (horizontal axis) for Country A and Country B, with the fixed effect marked "+a".]
• The fixed effect “a” captures the unobservable differences
between Countries A and B not captured by value of trade, but
having a potential impact on either or both variables.
• An example of "a" in this context is the political power of
labor unions in a country.
Fixed Effects Regression: Example
• Fixed effects regression model:

$$
\begin{aligned}
\mathrm{gdp\_growth}(i,t) = \alpha + a(i)
&+ \beta_1\,\mathrm{unemp\_rate}(i,t) \\
&+ \beta_2\,\mathrm{tariff\_rate}(i,t) \\
&+ \beta_3\,\mathrm{inflat\_cpi}(i,t) \\
&+ \beta_4\,\mathrm{trade\_gdp}(i,t) + \varepsilon(i,t)
\end{aligned}
$$

  where α is the intercept, a(i) is the fixed effect of country i, and ε(i,t) is the error term.
• Dependent variable:
  ◦ gdp_growth(i,t) is the GDP growth rate of country i in year t.
• Independent (explanatory) variables:
  ◦ unemp_rate(i,t) is the unemployment rate of country i in year t.
  ◦ tariff_rate(i,t) is the weighted mean tariff rate of country i in year t.
  ◦ inflat_cpi(i,t) is the inflation rate of country i in year t.
  ◦ trade_gdp(i,t) is the overall value of trade as a % of GDP of country i in year t.
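
A short derivation may help show why subtracting country means removes the unobserved effect a(i). It is written for a single regressor to keep the notation short; y, x, β and the overbar for country means are our shorthand, not the handout's.

```latex
\begin{aligned}
y(i,t) &= \alpha + a(i) + \beta\,x(i,t) + \varepsilon(i,t) \\
\bar{y}(i) &= \alpha + a(i) + \beta\,\bar{x}(i) + \bar{\varepsilon}(i)
  && \text{(average over the years of country } i\text{)} \\
y(i,t) - \bar{y}(i) &= \beta\,\bigl(x(i,t) - \bar{x}(i)\bigr)
  + \bigl(\varepsilon(i,t) - \bar{\varepsilon}(i)\bigr)
  && \text{(subtract the second line from the first)}
\end{aligned}
```

Because a(i) is the same in every year, it cancels in the last line, and so does α. A constant included in a regression on mean-adjusted data should therefore be estimated very close to zero.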
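Fixed Effects Regression: Example
• Fixed effects regression model (mean adjustment for Excel):

$$
\begin{aligned}
\mathrm{gdp\_growth}(i,t) - \overline{\mathrm{gdp\_growth}}(i) = \alpha
&+ \beta_1 \bigl(\mathrm{unemp\_rate}(i,t) - \overline{\mathrm{unemp\_rate}}(i)\bigr) \\
&+ \beta_2 \bigl(\mathrm{tariff\_rate}(i,t) - \overline{\mathrm{tariff\_rate}}(i)\bigr) \\
&+ \beta_3 \bigl(\mathrm{inflat\_cpi}(i,t) - \overline{\mathrm{inflat\_cpi}}(i)\bigr) \\
&+ \beta_4 \bigl(\mathrm{trade\_gdp}(i,t) - \overline{\mathrm{trade\_gdp}}(i)\bigr) + \varepsilon(i,t)
\end{aligned}
$$

where $\overline{\mathrm{gdp\_growth}}(i)$, $\overline{\mathrm{unemp\_rate}}(i)$, $\overline{\mathrm{tariff\_rate}}(i)$, $\overline{\mathrm{inflat\_cpi}}(i)$ and $\overline{\mathrm{trade\_gdp}}(i)$ are the mean values of these variables, calculated separately for each country across all the years. Subtracting the country mean removes the time-invariant fixed effect a(i), so the model can be estimated with an ordinary regression in Excel.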
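
The mean adjustment can be sketched in Python as follows. This is a minimal illustration with one regressor and invented toy data (y = a(i) + 2·x, with a(A) = 10 and a(B) = 0); the function names `demean_by_country` and `slope` are our own and are not part of the handout's Excel workflow.

```python
from collections import defaultdict


def demean_by_country(rows, key):
    """Subtract each country's own mean of `key` from its observations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["country"]] += row[key]
        counts[row["country"]] += 1
    return [row[key] - totals[row["country"]] / counts[row["country"]] for row in rows]


def slope(x, y):
    """OLS slope of y on a single regressor x (regression with intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical toy panel: y = a(i) + 2 * x, with a(A) = 10 and a(B) = 0.
rows = [
    {"country": "A", "x": 1.0, "y": 12.0},
    {"country": "A", "x": 2.0, "y": 14.0},
    {"country": "A", "x": 3.0, "y": 16.0},
    {"country": "B", "x": 4.0, "y": 8.0},
    {"country": "B", "x": 5.0, "y": 10.0},
    {"country": "B", "x": 6.0, "y": 12.0},
]

x_adj = demean_by_country(rows, "x")
y_adj = demean_by_country(rows, "y")

# Pooled regression ignores the country effect; the within regression removes it.
pooled = slope([r["x"] for r in rows], [r["y"] for r in rows])
within = slope(x_adj, y_adj)

print("pooled slope:", round(pooled, 3))
print("within (mean-adjusted) slope:", round(within, 3))
```

In this toy case the pooled regression gets the sign wrong, because country A combines a higher fixed effect with lower x values, while the mean-adjusted regression recovers the slope of 2 used to construct the data.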
Setting Hypotheses
• Hypotheses:
  ◦ H1: unemp_rate has a negative effect on gdp_growth (β1 < 0)
    ▪ Explanation?
  ◦ H2: tariff_rate has a negative effect on gdp_growth (β2 < 0)
    ▪ Explanation?
  ◦ H3: inflat_cpi has a negative effect on gdp_growth (β3 < 0)
    ▪ Explanation?
  ◦ H4: trade_gdp has a positive effect on gdp_growth (β4 > 0)
    ▪ Explanation?

Note: All the above variables are mean adjusted.
Data Collection: Illustration
• At the World Bank data website (https://databank.worldbank.org/reports.aspx?source=world-development-indicators):
  ◦ Database: choose World Development Indicators (WDI)
  ◦ Country: 10 (for our example dataset)
  ◦ Series: GDP growth, Unemployment, Tariff, Inflation, Trade (% GDP) (for our example)
  ◦ Year: 2010 – 2019 (in our dataset)
• Example dataset: data_wdi
  ◦ 10 countries × 10 years panel dataset (after creating the fixed effect adjusted data)
Data Analysis: Excel
• Step 1: Install the "Analysis ToolPak" add-in in Excel.
• Step 2: Select the Data Analysis tool under the Data tab.
• Step 3: Input the ranges for the dependent variable (Y) and the independent variables (X).
• Step 4: Collect the relevant results.
Data Analysis: Excel Results
• Collect the results in a table such as the following:

Dep. Variable:     gdp_growth
No. Observations:  100
R-squared:         0.26561

Parameter Estimates
==========================================================================
Parameter           Estimate        Std. Error  T-stat          P-value
--------------------------------------------------------------------------
Const               -3.1 × 10^-10   0.2383      -1.3 × 10^-10   0.999631
(adj) unemp_rate    -0.05075        0.133439    -0.3803         0.704574
(adj) tariff_rate   -0.189382       0.339531    0.557776        0.57831
(adj) inflat_cpi    -0.24105        0.051779    -4.65545        1.05 × 10^-5
(adj) trade_gdp     0.10366         0.025378    4.08465         9.21 × 10^-5
Hypothesis Testing: Results Interpretation
• The overall fit of the model is measured by R-squared (R²), where 0 ≤ R² ≤ 1.
  ◦ It is the proportion of the variance of the dependent variable that is explained by the model.
  ◦ The larger R² is, the better the model fits the data.
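
As a rough illustration of this definition, R² can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The numbers below are invented for the example, and `r_squared` is our own helper name, not an Excel function.

```python
def r_squared(actual, fitted):
    """Share of the variance of `actual` explained by `fitted`: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed values and model predictions.
actual = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]

print("R-squared:", r_squared(actual, fitted))
```

A model that predicts every observation exactly gives R² = 1, and a model that always predicts the sample mean gives R² = 0.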

Hypothesis Testing: Results Interpretation
• t-statistic: the estimated coefficient divided by its standard error.
  ◦ For a given coefficient estimate, the smaller the standard error, the larger the absolute value of the t-statistic and the more statistically significant the estimated coefficient.
• p-value of the t-statistic: the probability, assuming the null hypothesis (β = 0) is true, of obtaining a t-statistic at least as extreme as the one observed. Rejecting the null hypothesis only when the p-value is at or below a chosen significance level keeps the probability of wrongly rejecting it (i.e., a Type-I error) at that level.
  ◦ The smaller the p-value, the stronger the evidence against the null hypothesis (β = 0).
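
The two definitions above can be sketched in Python using three rows of the reported results. One assumption to note: the p-value here uses the standard normal distribution as an approximation to the t distribution, so it will not match Excel's output exactly and tends to be too small for large t-statistics. The function names are our own.

```python
import math


def t_statistic(estimate, std_error):
    """t-statistic: the estimated coefficient divided by its standard error."""
    return estimate / std_error


def two_sided_p_normal(t):
    """Two-sided p-value under a standard normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# (estimate, standard error) pairs as reported in the results table.
reported = {
    "unemp_rate": (-0.05075, 0.133439),
    "inflat_cpi": (-0.24105, 0.051779),
    "trade_gdp": (0.10366, 0.025378),
}

for name, (estimate, std_error) in reported.items():
    t = t_statistic(estimate, std_error)
    print(name, "t =", round(t, 4), "approx. p =", two_sided_p_normal(t))
```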

Hypothesis Testing: Decision Rule
• Significance of the estimated coefficient:
  ◦ When p-value ≤ 0.01, one can reject the null hypothesis of β = 0 at the 1% level of significance.
  ◦ When p-value ≤ 0.05, one can reject the null hypothesis of β = 0 at the 5% level of significance.
  ◦ When p-value > 0.05, one cannot reject the null hypothesis of β = 0 (at least at the 5% level of significance), i.e. there is a lack of evidence that the parameter differs from zero.
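
The decision rule can be written as a small Python function and applied to the p-values from the results table. The function name `significance` and the wording of the returned labels are our own.

```python
def significance(p_value):
    """Apply the 1% / 5% decision rule to a p-value for the null hypothesis beta = 0."""
    if p_value <= 0.01:
        return "reject at the 1% level"
    if p_value <= 0.05:
        return "reject at the 5% level"
    return "cannot reject at the 5% level"


# p-values as reported in the results table.
p_values = {
    "unemp_rate": 0.704574,
    "tariff_rate": 0.57831,
    "inflat_cpi": 1.05e-5,
    "trade_gdp": 9.21e-5,
}

decisions = {name: significance(p) for name, p in p_values.items()}
for name, decision in decisions.items():
    print(name, "->", decision)
```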
Hypothesis Testing
• Conclusions:
  ◦ H1 is not supported: the effect of unemp_rate on gdp_growth is not significantly different from zero (p-value = 0.70, so β1 = 0 cannot be rejected). (Why?)
  ◦ H2 is not supported: the effect of tariff_rate on gdp_growth is not significantly different from zero (p-value = 0.58, so β2 = 0 cannot be rejected). (Why?)
  ◦ H3 is supported: inflat_cpi has a significantly negative effect on gdp_growth (β3 < 0, p-value < 0.01).
  ◦ H4 is supported: trade_gdp has a significantly positive effect on gdp_growth (β4 > 0, p-value < 0.01).
Data Analysis: Important Caveat
• Correlation DOES NOT IMPLY causation
  ◦ "Storks Deliver Babies (p = 0.008)" studies the relationship between the number of storks in a country and the number of human births, using data from 17 European countries.
    ▪ It finds a statistically significant correlation (p-value = 0.008) between stork populations and human birth rates.
  ◦ Does this really imply that storks actually deliver babies?
    ▪ No: unmindful and reckless use of correlations and p-values can deliver unreliable conclusions!
    ▪ Such spurious correlations usually arise from the presence of a confounding variable (in this particular case, country size).
• To reduce the possibility of such false and misleading results, it is important to have:
  i. a good statistical design that includes all relevant variables, including possible confounders
  ii. a compelling conceptual analysis that demonstrates a potential causal mechanism behind the statistical results
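
The role of a confounder can be illustrated with a small simulation. Everything below is invented for the sketch and is not the data of the stork study: country size drives both the stork count and the number of births, there is no direct link between the two, and a large sample (n = 2000 rather than 17) is used so that the result is stable. Controlling for the confounder is done here by correlating the residuals left after regressing each variable on size.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals from a simple OLS regression of y on x (with intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


random.seed(0)
n = 2000  # simulated "countries"; all parameters here are assumptions

# Country size is the confounder: it raises both storks and births.
size = [random.uniform(1.0, 10.0) for _ in range(n)]
storks = [s + random.gauss(0.0, 1.0) for s in size]
births = [s + random.gauss(0.0, 1.0) for s in size]

raw = pearson(storks, births)
controlled = pearson(residuals(storks, size), residuals(births, size))

print("correlation ignoring country size:", round(raw, 3))
print("correlation after controlling for country size:", round(controlled, 3))
```

The raw correlation is strong even though storks have no effect on births in the simulation; once country size is accounted for, the correlation is close to zero.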