Topic 6
Application of Data Science in Global Economic Analysis
Outline
Panel Data
  Economic relevance
Fixed Effects Regression
  Concept
  Application
Example
  Setting Hypotheses
  Data Collection & Fixed Effect Adjustment
Data Analysis - Excel
  Excel Setup
  Excel Implementation
  Regression Results
Hypotheses Testing
  Results Interpretation
  Decision Rule
  Conclusions (Example)
Data Analysis: Important Caveat
Panel Data
Panel Data (or Longitudinal Data): data that contain
observations on several cross-sectional units (e.g. countries) over time,
i.e. a combination of time-series and cross-sectional data.
Most common form of economic data for multiple countries.
Example
Year  Country        Population  Money Supply Growth  Current A/C Balance (% of GDP)
2018  Singapore         5638676  3.90187887           15.40885493
2018  Thailand         69428454  4.667547639           5.610325153
2018  United States   326838199  4.030420024          -2.181753513
2019  Singapore         5703569  4.951350605          14.26299854
2019  Thailand         69625581  3.636896985           7.019674757
2019  United States   328329953  8.39189902           -2.240577453
2 years x 3 countries panel = 6 country-year observations
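As a minimal sketch, the example panel can be held in Python as one record per country-year. The variable and key names (panel, countries, years, money_supply_growth, current_account_pct_gdp) are illustrative and not part of the original slides; the values are copied from the table.

```python
# One record per country-year observation, values copied from the example table.
panel = [
    {"year": 2018, "country": "Singapore", "population": 5638676,
     "money_supply_growth": 3.90187887, "current_account_pct_gdp": 15.40885493},
    {"year": 2018, "country": "Thailand", "population": 69428454,
     "money_supply_growth": 4.667547639, "current_account_pct_gdp": 5.610325153},
    {"year": 2018, "country": "United States", "population": 326838199,
     "money_supply_growth": 4.030420024, "current_account_pct_gdp": -2.181753513},
    {"year": 2019, "country": "Singapore", "population": 5703569,
     "money_supply_growth": 4.951350605, "current_account_pct_gdp": 14.26299854},
    {"year": 2019, "country": "Thailand", "population": 69625581,
     "money_supply_growth": 3.636896985, "current_account_pct_gdp": 7.019674757},
    {"year": 2019, "country": "United States", "population": 328329953,
     "money_supply_growth": 8.39189902, "current_account_pct_gdp": -2.240577453},
]

# The two dimensions of the panel: cross-section (country) and time (year).
countries = sorted({row["country"] for row in panel})
years = sorted({row["year"] for row in panel})

# In a balanced panel, observations = number of years x number of countries.
n_obs = len(panel)
print(len(years), "years x", len(countries), "countries =", n_obs, "observations")
```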
Fixed Effects Regression
A fixed effects regression is an estimation
technique that controls for UNOBSERVED
individual characteristics that do not vary with
time (i.e. are fixed) but that might affect the
independent variables, the dependent variable,
or both in the regression analysis.
It is usually employed in an analysis involving panel
data.
Example: a multi-country panel data analysis where
each country might have its own unobserved
characteristics that do not vary with time but
might nevertheless affect the observed
variables in the analysis.
Fixed Effects Regression: Example
[Figure: GDP growth (vertical axis) against value of trade/GDP (horizontal axis), with series for Country A and Country B and an offset labeled "+a".]
• The fixed effect “a” captures the unobservable differences
between Countries A and B not captured by value of trade, but
having a potential impact on either or both variables.
• An example of "a" in this context is the political power of
labor unions in a country.
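The role of the fixed effect "a" can be illustrated with a small, noise-free toy example. Everything in this sketch is hypothetical (the true slope of 0.5, the fixed effects 0 and 5, and the trade values); it only shows that ignoring "a" can distort the estimated slope, while subtracting each country's own mean recovers it.

```python
def ols_slope(x, y):
    """Slope of a simple least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def demean(values):
    """Subtract the mean of the list from each element."""
    m = sum(values) / len(values)
    return [v - m for v in values]


TRUE_SLOPE = 0.5                          # hypothetical effect of trade on growth
fixed_effect = {"A": 0.0, "B": 5.0}       # hypothetical unobserved "a" per country
trade = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}
growth = {c: [fixed_effect[c] + TRUE_SLOPE * x for x in xs]
          for c, xs in trade.items()}

# Pooled regression: ignores the country-specific fixed effect.
pooled_slope = ols_slope(trade["A"] + trade["B"], growth["A"] + growth["B"])

# Within regression: each country's own mean is removed first.
within_slope = ols_slope(demean(trade["A"]) + demean(trade["B"]),
                         demean(growth["A"]) + demean(growth["B"]))

print("pooled slope:", round(pooled_slope, 4))
print("within slope:", round(within_slope, 4))
```

In this toy setup the country with the larger fixed effect also has the larger trade values, which is why the pooled slope overstates the true one.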
Fixed Effect Regression: Example
Fixed Effect Regression Model:
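$$
\begin{aligned}
\text{gdp\_growth}(i,t) = {} & \alpha + a(i) \\
& + \beta_1\,\text{unemp\_rate}(i,t) \\
& + \beta_2\,\text{tariff\_rate}(i,t) \\
& + \beta_3\,\text{inflat\_cpi}(i,t) \\
& + \beta_4\,\text{trade\_gdp}(i,t) + \varepsilon(i,t)
\end{aligned}
$$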
Dependent variable:
gdp_growth (i,t) is the GDP growth rate of country i in year t.
Independent (explanatory) variables:
unemp_rate (i,t) is the unemployment rate of country i in year t.
tariff_rate (i,t) is the mean weighted tariff rate of country i in year t.
inflat_cpi(i,t) is the inflation rate of country i in year t.
trade_gdp(i,t) is the overall value of trade as a % of GDP of country i
in year t.
Fixed Effect Regression: Example
Fixed Effect Regression Model (mean adjustment
for Excel):
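$$
\begin{aligned}
\text{gdp\_growth}(i,t) - \overline{\text{gdp\_growth}}(i) = {} & \alpha \\
& + \beta_1 \left[ \text{unemp\_rate}(i,t) - \overline{\text{unemp\_rate}}(i) \right] \\
& + \beta_2 \left[ \text{tariff\_rate}(i,t) - \overline{\text{tariff\_rate}}(i) \right] \\
& + \beta_3 \left[ \text{inflat\_cpi}(i,t) - \overline{\text{inflat\_cpi}}(i) \right] \\
& + \beta_4 \left[ \text{trade\_gdp}(i,t) - \overline{\text{trade\_gdp}}(i) \right] + \varepsilon(i,t)
\end{aligned}
$$
where $\overline{\text{gdp\_growth}}(i)$, $\overline{\text{unemp\_rate}}(i)$, $\overline{\text{tariff\_rate}}(i)$,
$\overline{\text{inflat\_cpi}}(i)$ and $\overline{\text{trade\_gdp}}(i)$ are the mean values of each of
these variables, calculated separately for each country
across all the years.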
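The mean adjustment can be sketched in Python as follows. The function name demean_by_country and the record layout are assumptions made for illustration. The numbers are the Singapore and Thailand money supply growth figures from the Panel Data example table, used here only because they are available in the slides; they are not one of the model variables.

```python
from collections import defaultdict


def demean_by_country(rows, variable):
    """Return copies of rows with `variable` replaced by its deviation
    from that country's mean across all years."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["country"]] += row[variable]
        counts[row["country"]] += 1
    means = {c: totals[c] / counts[c] for c in totals}
    return [{**row, variable: row[variable] - means[row["country"]]}
            for row in rows]


rows = [
    {"country": "Singapore", "year": 2018, "money_supply_growth": 3.90187887},
    {"country": "Singapore", "year": 2019, "money_supply_growth": 4.951350605},
    {"country": "Thailand", "year": 2018, "money_supply_growth": 4.667547639},
    {"country": "Thailand", "year": 2019, "money_supply_growth": 3.636896985},
]

adjusted = demean_by_country(rows, "money_supply_growth")
for row in adjusted:
    print(row["country"], row["year"], round(row["money_supply_growth"], 6))
```

After the adjustment, the values for each country sum to zero, so any characteristic that is constant within a country has been removed.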
Setting Hypotheses
Hypotheses:
H1: unemp_rate has a negative effect on gdp_growth ($\beta_1 < 0$)
Explanation?
H2: tariff_rate has a negative effect on gdp_growth ($\beta_2 < 0$)
Explanation?
H3: inflat_cpi has a negative effect on gdp_growth ($\beta_3 < 0$)
Explanation?
H4: trade_gdp has a positive effect on gdp_growth ($\beta_4 > 0$)
Explanation?
Note: All the above variables are mean adjusted.
Data Collection: Illustration
At the World Bank data website:
(https://databank.worldbank.org/reports.aspx?source=world-development-indicators)
Database: Choose World Development
Indicators (WDI)
Country: 10 (for our example dataset)
Series: GDP growth, Unemployment, Tariff,
Inflation, Trade (% GDP) (for our example)
Year: 2010 – 2019 (in our dataset)
Example dataset: data_wdi
10 countries x 10 years panel dataset (after
creating the fixed effect adjusted data)
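Before running the regression it can help to confirm that the panel is balanced, i.e. that every country has exactly one observation in every year. The sketch below uses placeholder keys only: the country labels are hypothetical and no WDI values are included. The function name is_balanced is an assumption, not part of the original material.

```python
from itertools import product


def is_balanced(observations):
    """True if every (country, year) pair appears exactly once."""
    keys = [(o["country"], o["year"]) for o in observations]
    countries = {c for c, _ in keys}
    years = {y for _, y in keys}
    return len(keys) == len(set(keys)) == len(countries) * len(years)


# Placeholder keys: 10 hypothetical country labels and the years 2010 to 2019.
countries = [f"country_{k:02d}" for k in range(1, 11)]
years = range(2010, 2020)
observations = [{"country": c, "year": y} for c, y in product(countries, years)]

print(len(observations), "country-year observations; balanced:",
      is_balanced(observations))
```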
Data Analysis: Excel
Step 1: Install the “Analysis ToolPak” add-in in Excel
Data Analysis: Excel
Step 2: Select the Data Analysis tool under the Data tab, then choose Regression
Data Analysis: Excel
Step 3: Input the ranges for the dependent variable (Y) and the
independent variables (X).
Data Analysis: Excel
Step 4: Collect the relevant results.
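For readers who want to see what a least-squares regression computes, here is a minimal pure-Python sketch of ordinary least squares with an intercept, returning coefficients, standard errors and t-statistics. It is an illustration under stated assumptions, not a reproduction of Excel's internals: the helper names (solve_linear_system, ols) and the toy data are made up, and the toy data are constructed so that the true coefficients are 1, 2 and -3.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, regressors):
    """OLS of y on a constant plus the given regressors (lists of equal length)."""
    n = len(y)
    cols = [[1.0] * n] + regressors              # first column is the constant
    k = len(cols)
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    beta = solve_linear_system(xtx, xty)
    resid = [y[t] - sum(beta[j] * cols[j][t] for j in range(k)) for t in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    std_err = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        inv_col = solve_linear_system(xtx, unit)  # j-th column of (X'X)^-1
        std_err.append(math.sqrt(sigma2 * inv_col[j]))
    t_stats = [b / s for b, s in zip(beta, std_err)]
    return beta, std_err, t_stats


# Toy data (hypothetical): y = 1 + 2*x1 - 3*x2 + noise, where the noise is
# chosen to be uncorrelated with the constant, x1 and x2.
x1 = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
x2 = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
noise = [0.1, -0.2, 0.1, -0.1, 0.2, -0.1]
y = [1.0 + 2.0 * a - 3.0 * b + e for a, b, e in zip(x1, x2, noise)]

beta, std_err, t_stats = ols(y, [x1, x2])
for name, b, s, t in zip(["Const", "x1", "x2"], beta, std_err, t_stats):
    print(name, round(b, 4), round(s, 4), round(t, 4))
```

With the mean-adjusted data, the same function would be called with the adjusted gdp_growth as y and the four adjusted explanatory variables as regressors.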
Data Analysis: Excel Results
Collect the results in a table such as the following:
Dep. Variable: gdp_growth
No. Observations: 100
R-squared: 0.26561
Parameter Estimates
========================================================
                    Parameter       Std. Error   T-stat          P-value
Const               -3.1 x 10^-10   0.2383       -1.3 x 10^-10   0.999631
(adj) unemp_rate    -0.05075        0.133439     -0.3803         0.704574
(adj) tariff_rate   -0.189382       0.339531     0.557776        0.57831
(adj) inflat_cpi    -0.24105        0.051779     -4.65545        1.05 x 10^-5
(adj) trade_gdp     0.10366         0.025378     4.08465         9.21 x 10^-5
Hypothesis Testing: Results Interpretation
The overall model accuracy is measured by R-squared ($R^2$):
$0 \le R^2 \le 1$
It is the proportion of the variance of the dependent variable
that can be explained by the model.
The larger $R^2$ is, the better the model
fits the data.
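As a small worked illustration, R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The function name r_squared and the numbers below are hypothetical; the fitted values are made up for the arithmetic and do not come from an actual regression.

```python
def r_squared(actual, fitted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]    # hypothetical fitted values

r2 = r_squared(actual, fitted)
print("R-squared:", round(r2, 4))
```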
Hypothesis Testing: Results Interpretation
t-statistic: the estimated coefficient divided by its
standard error.
For a given coefficient estimate, the smaller the standard error,
the larger the absolute value of the t-statistic and the more
statistically significant the estimated coefficient.
p-value of the t-statistic: the probability, if the null
hypothesis ($\beta = 0$) were true, of observing a t-statistic at least
as extreme as the one estimated. Rejecting only when the p-value is below
a chosen significance level caps the probability of wrongly rejecting
a true null hypothesis (i.e., Type-I error) at that level.
The smaller the p-value, the stronger the
evidence against the null hypothesis ($\beta = 0$).
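The relationship between coefficient, standard error, t-statistic and p-value can be sketched with three rows of the results table. Note the assumption: the p-value here uses the standard normal approximation (via math.erfc), which is close to, but not the same as, the t-distribution p-values reported by Excel. Function names are illustrative.

```python
import math


def t_statistic(coefficient, std_error):
    """Estimated coefficient divided by its standard error."""
    return coefficient / std_error


def two_sided_p_normal(t):
    """Two-sided p-value under a standard normal approximation (assumption)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# (coefficient, standard error) copied from the results table.
rows = {
    "unemp_rate": (-0.05075, 0.133439),
    "inflat_cpi": (-0.24105, 0.051779),
    "trade_gdp": (0.10366, 0.025378),
}

results = {}
for name, (coef, se) in rows.items():
    t = t_statistic(coef, se)
    results[name] = (t, two_sided_p_normal(t))
    print(name, "t =", round(t, 4), "approx. p =", results[name][1])
```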
Hypothesis Testing: Decision Rule
Significance of Estimated Coefficient:
When p-value ≤ 0.01, one can reject the null
hypothesis of $\beta = 0$ at the 1% level of significance.
When p-value ≤ 0.05, one can reject the null
hypothesis of $\beta = 0$ at the 5% level of significance.
When p-value > 0.05, one cannot reject the null
hypothesis of $\beta = 0$ (at least at the 5% level of
significance) => there is insufficient evidence
that the parameter differs from
zero.
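The decision rule translates directly into a small function. The function name and the returned labels are illustrative; the thresholds are the 1% and 5% levels stated in the rule.

```python
def significance(p_value):
    """Apply the 1% / 5% decision rule to a p-value."""
    if p_value <= 0.01:
        return "reject at 1% level"
    if p_value <= 0.05:
        return "reject at 5% level"
    return "cannot reject at 5% level"


# p-values copied from the results table.
for name, p in [("unemp_rate", 0.704574), ("tariff_rate", 0.57831),
                ("inflat_cpi", 1.05e-5), ("trade_gdp", 9.21e-5)]:
    print(name, "->", significance(p))
```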
Hypothesis Testing
Conclusions:
H1 is not supported: The effect of unemp_rate on
gdp_growth is not significantly different from zero
(p-value is high => cannot reject $\beta_1 = 0$). (Why?)
H2 is not supported: The effect of tariff_rate on
gdp_growth is not significantly different from zero
(p-value is high => cannot reject $\beta_2 = 0$). (Why?)
H3 is supported: inflat_cpi has a significantly
negative effect on gdp_growth ($\beta_3 < 0$).
H4 is supported: trade_gdp has a significantly
positive effect on gdp_growth ($\beta_4 > 0$).
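These conclusions can be reproduced by combining the sign of each estimate with the decision rule. In this sketch a hypothesis counts as supported when the coefficient has the hypothesized sign and the reported p-value is at most 0.05. The function name supported and the data layout are assumptions; coefficients and p-values are copied from the results table.

```python
def supported(coefficient, p_value, expected_sign, level=0.05):
    """True if the estimate has the hypothesized sign and is significant."""
    if expected_sign == "negative":
        sign_ok = coefficient < 0
    else:
        sign_ok = coefficient > 0
    return sign_ok and p_value <= level


# hypothesis: (variable, coefficient, p-value, hypothesized sign)
estimates = {
    "H1": ("unemp_rate", -0.05075, 0.704574, "negative"),
    "H2": ("tariff_rate", -0.189382, 0.57831, "negative"),
    "H3": ("inflat_cpi", -0.24105, 1.05e-5, "negative"),
    "H4": ("trade_gdp", 0.10366, 9.21e-5, "positive"),
}

verdicts = {h: supported(coef, p, sign)
            for h, (_, coef, p, sign) in estimates.items()}
for h, ok in verdicts.items():
    print(h, estimates[h][0], "supported" if ok else "not supported")
```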
Data Analysis: Important Caveat
Correlation DOES NOT IMPLY Causation
Storks Deliver Babies (p= 0.008) – studies the relationship
between the number of storks in a country and the number of
human births using data from 17 European countries.
Finds the existence of a statistically significant correlation (p-value =
0.008) between stork populations and human birth rates.
Does this really imply that storks actually deliver
babies?
– careless use of correlations and p-values
can deliver unreliable conclusions!
Usually such spurious correlations occur due to the presence of a
confounding variable (in this particular case country size).
To reduce the possibility of such false and misleading
results it is important to have:
i. a good statistical design that includes all relevant variables,
including possible confounders
ii. a compelling conceptual analysis that demonstrates a
potential causal mechanism in the statistical results
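The confounding mechanism can be illustrated with synthetic numbers. The data below are entirely made up (they are not the data from the stork study): a "size" variable drives both series, so the raw correlation is very high, but it disappears once size is controlled for. Function and variable names are illustrative.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient between two lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals from a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Synthetic data: both series depend on size plus unrelated variation.
size = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
stork_noise = [1, -1, -1, 1, 1, -1, -1, 1]
birth_noise = [1, -1, 1, -1, -1, 1, -1, 1]
storks = [10 * s + u for s, u in zip(size, stork_noise)]
births = [100 * s + 10 * v for s, v in zip(size, birth_noise)]

raw_corr = pearson(storks, births)
# Correlation that remains after removing the part explained by size.
partial_corr = pearson(residuals(storks, size), residuals(births, size))

print("raw correlation:", round(raw_corr, 4))
print("correlation after controlling for size:", round(partial_corr, 4))
```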