The Nature of Qualitative Information
Main Idea
Until now, econometric analysis has focused on variables that are quantitative—meaning they
can be measured with numbers (e.g., income, prices, GDP). But in the real world, qualitative
information also plays a major role in influencing outcomes, and this chapter explains how to
include it in econometric models using dummy variables.
Problem: Some Important Variables Are Not Numbers
Some things that affect outcomes aren’t easily measured in numbers—but they still matter a
lot. These are qualitative variables.
📊 Examples in Cross-Sectional Data (Data from many people at one time):
1. Gender may influence salary.
2. Ethnicity may affect spending and saving habits.
3. Education level can change income from jobs.
4. Union membership can impact workplace treatment or policies.
These are examples where differences between people (not numbers) matter.
📈 Examples in Time Series Data (Data over time for one subject):
1. Political changes might affect how companies operate or how jobs are managed.
2. Wars can change the whole economy.
3. Days of the week or months of the year might influence stock prices.
4. Seasons affect product demand—like more ice cream in summer or fur coats in winter.
So, even over time, non-numeric factors can affect outcomes.
Solution: Use Dummy (Dichotomous) Variables
The chapter introduces dummy variables—a way to turn qualitative information into a form we
can include in regression models.
Dummy variable = a variable that takes only 2 values, usually 0 or 1.
o Example: If gender is the variable:
Male = 1, Female = 0 (or vice versa)
o If it’s a season:
Summer = 1, Other seasons = 0
These variables help capture the influence of categories or events that can't be measured
numerically.
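As a minimal sketch of how this coding works in practice (hypothetical data, plain Python rather than an econometrics package):

```python
# Turn a qualitative variable (gender) into a 0/1 dummy.
# The category coded 0 becomes the reference group.
people = ["male", "female", "female", "male"]  # hypothetical sample
male_dummy = [1 if g == "male" else 0 for g in people]
print(male_dummy)
```

The resulting column of 0s and 1s can then enter a regression like any numeric variable.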
Main Idea
Sometimes, the intercept (constant term) in a regression model isn’t the same for all
observations—especially when qualitative differences exist (like regions, genders, or time
periods). To account for these differences, we use dummy variables.
Key Concepts Explained Simply
📌 1. The Usual Regression Equation
Let’s say you have this simple regression:
Yᵢ = β₁ + β₂X₂ᵢ + uᵢ
Here:
Yᵢ could be GDP growth.
X₂ᵢ could be the investment rate.
β₁ is the intercept: the average value of Y when X₂ = 0.
But this equation assumes that all countries or individuals have the same starting point (same
intercept)—which isn’t realistic.
📌 2. Why the Intercept May Differ
Let’s say you’re analyzing EU countries’ GDP growth. There may be real differences between:
Core countries (like Germany or France)
Peripheral countries (like Greece or Portugal)
These differences might shift the whole regression line up or down. So we want to capture this
qualitative difference in the regression.
✅ 3. Enter the Dummy Variable
We create a new variable, D, to represent the region:
D=1 for core countries
D=0 for peripheral countries
This variable turns qualitative information into a numeric form that we can use.
📌 4. New Regression with Dummy
The updated model is:
Yᵢ = β₁ + β₂X₂ᵢ + β₃Dᵢ + uᵢ
Let’s break this down:
When D = 0 (peripheral country), the equation becomes:
Yᵢ = β₁ + β₂X₂ᵢ + uᵢ
When D = 1 (core country), the equation becomes:
Yᵢ = (β₁ + β₃) + β₂X₂ᵢ + uᵢ
So, only the intercept changes depending on whether the country is core or peripheral.
📊 5. What does β₃ mean?
If β₃ > 0, the intercept is higher for core countries → they have higher average GDP
growth at any level of investment.
If β₃ < 0, then peripheral countries perform better.
If β₃ = 0, then there’s no difference.
🔍 6. How to Test It
We check if β₃ is significantly different from zero using a t-test.
If it’s significant → the difference between the two groups matters statistically.
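To see that an intercept dummy really does pick up a level shift, here is a small sketch with made-up, noise-free data (so OLS recovers the coefficients exactly; numbers are illustrative, not from the text):

```python
import numpy as np

# Hypothetical data: peripheral countries (D=0) have intercept 1.0,
# core countries (D=1) have intercept 3.0; both share slope 0.5.
X = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
D = np.array([0, 0, 0, 0, 1, 1, 1, 1])
Y = 1.0 + 0.5 * X + 2.0 * D  # noise-free, so OLS is exact

# Design matrix for Y = b1 + b2*X + b3*D
A = np.column_stack([np.ones_like(X), X, D])
b1, b2, b3 = np.linalg.lstsq(A, Y, rcond=None)[0]
# b3 estimates the intercept shift for the D=1 group
```

In real data b₃ would be estimated with error, and its t-statistic is what the significance test above examines.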
📦 Other Examples of Dummy Variables
1. Salary model:
o Y= salary, X= years of experience, dummy for gender:
D=1 for male, D=0 for female
2. Time series model:
o Dummy = 1 during war years, 0 otherwise
3. Event study:
o Dummy = 1 during oil shock period, 0 otherwise
✅ In Short
Dummy variables let us change the intercept in a regression model based on qualitative
group differences.
They help include important non-numeric information (like region, gender, or special
time periods).
The coefficient on the dummy (β₃) tells us how much the baseline level shifts for that
group.
We test whether the dummy really matters using standard statistical tests.
Slope Dummy Variables
Main Idea
Previously, we saw how dummy variables could change the intercept of a regression line—this
is called a constant dummy. Now we’re learning how dummy variables can also affect the slope
of the regression line, meaning the relationship between X and Y can change across groups or
time periods. These are called slope dummy variables.
Key Concepts (Explained Simply)
📌 1. What Is the Slope in a Regression?
In a simple regression:
Yₜ = β₁ + β₂X₂ₜ + uₜ
β₂ is the slope: it tells us how much Y changes when X changes.
For example, in a Keynesian consumption model, Y is consumer spending and X is
income.
Then β₂ is the marginal propensity to consume (MPC): how much more people
spend when they earn a little more.
📌 2. What If the Slope Changes Over Time?
Let’s say:
You have UK data from 1970–1999.
You believe the MPC changed after 1982 because of the oil price shock.
To test this, create a dummy variable:
D = 0 for years 1970–1981
D = 1 for years 1982–1999
✅ 3. How to Model a Changing Slope
To let the slope change, we multiply the dummy with X:
Yₜ = β₁ + β₂X₂ₜ + β₃(Dₜ·X₂ₜ) + uₜ
This is called a slope interaction term.
📊 4. What Happens in Each Period?
Before 1982 (D = 0):
Yₜ = β₁ + β₂X₂ₜ + β₃(0)X₂ₜ + uₜ = β₁ + β₂X₂ₜ + uₜ
After 1982 (D = 1):
Yₜ = β₁ + β₂X₂ₜ + β₃(1)X₂ₜ + uₜ = β₁ + (β₂ + β₃)X₂ₜ + uₜ
So:
The slope is β₂ before 1982.
The slope becomes β₂ + β₃ after 1982.
🔍 5. What Does β₃ Tell Us?
If β₃ > 0: the slope increased → MPC rose after 1982.
If β₃ < 0: the slope decreased → MPC fell after 1982.
If β₃ = 0: no change → no difference in slope across the periods.
📈 6. Why Use Slope Dummies?
Because sometimes, the effect of X on Y isn’t constant. Different:
Time periods
Groups
Events (like policy changes or crises)
…can change how strongly X affects Y.
✅ In Short:
Constant dummy changes the intercept (baseline value of Y).
Slope dummy changes the relationship between X and Y.
Use slope dummies when you believe the effect of X on Y changes across groups or over
time.
You do this by creating an interaction term: dummy × X.
You then check if the slope change (β₃) is significant.
This approach allows for more flexible and realistic models that can reflect how relationships
change due to economic events, policies, or group differences.
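The interaction term D·X can be sketched with made-up, noise-free numbers (illustrative only; the break, dates, and MPC values are hypothetical):

```python
import numpy as np

# Hypothetical consumption data: MPC is 0.8 before the break (D=0)
# and 0.6 after (D=1); the intercept is 10 throughout, no noise.
X = np.array([100., 120., 140., 160., 100., 120., 140., 160.])
D = np.array([0, 0, 0, 0, 1, 1, 1, 1])
Y = 10 + 0.8 * X + (-0.2) * D * X

# Y = b1 + b2*X + b3*(D*X): the interaction column D*X lets the slope shift.
A = np.column_stack([np.ones_like(X), X, D * X])
b1, b2, b3 = np.linalg.lstsq(A, Y, rcond=None)[0]
slope_before = b2        # slope in the D=0 period
slope_after = b2 + b3    # slope in the D=1 period
```

A negative b₃ here corresponds to the "MPC fell after the break" case discussed above.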
Combined Effect of Intercept and Slope Dummies
Main Idea
We’ve seen how dummy variables can be used in a regression to change:
Only the intercept (constant dummy)
Only the slope (slope dummy)
Now we’re looking at what happens when a dummy variable affects both — intercept and
slope. This gives us a more flexible model that lets both the starting point and the rate of change
vary across groups or time periods.
Step-by-Step Breakdown
📌 1. The Base Model
Start with a basic regression:
Yₜ = β₁ + β₂X₂ₜ + uₜ
Here:
Yₜ is the dependent variable (e.g., consumption).
X₂ₜ is the independent variable (e.g., income).
β₁ is the intercept.
β₂ is the slope (how much Y changes as X changes).
📌 2. Add a Dummy Variable
Let’s say there was a structural change after time period s, and you believe both the intercept and
slope changed.
Define a dummy variable:
Dₜ = 0 for t = 1, 2, ..., s
Dₜ = 1 for t = s+1, ..., T
✅ 3. Build the Full Model
To allow both intercept and slope to change, include two terms:
Dₜ for the intercept change
Dₜ·X₂ₜ for the slope change
So the model becomes:
Yₜ = β₁ + β₂X₂ₜ + β₃Dₜ + β₄(Dₜ·X₂ₜ) + uₜ
📊 4. Two Scenarios
Before the change (D = 0):
Yₜ = β₁ + β₂X₂ₜ + uₜ
→ Normal regression line with the original intercept and slope.
After the change (D = 1):
Yₜ = (β₁ + β₃) + (β₂ + β₄)X₂ₜ + uₜ
→ Both the intercept and slope are different.
📈 5. What Each Coefficient Means
β₃: change in intercept after the dummy switch
β₄: change in slope after the dummy switch
So:
If β₃ > 0, the whole line shifts up after the change.
If β₄ > 0, the line gets steeper: the effect of X on Y increases.
If both are significant, the entire regression relationship is different in the two time
periods (or two groups).
📐 6. Graphical Interpretation
Before change: regression line with slope β₂ and intercept β₁
After change: new regression line with slope β₂ + β₄ and intercept β₁ + β₃
Depending on the signs and values of β₃ and β₄, the second line could:
Shift up or down
Rotate to be steeper or flatter
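Both effects together can be sketched with hypothetical, noise-free data (all numbers invented for illustration):

```python
import numpy as np

# Hypothetical structural break: intercept 5 -> 8 and slope 0.5 -> 0.9
# after period s (first five observations have D=0, the rest D=1).
X = np.array([1., 2., 3., 4., 5., 1., 2., 3., 4., 5.])
D = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
Y = np.where(D == 0, 5 + 0.5 * X, 8 + 0.9 * X)

# Y = b1 + b2*X + b3*D + b4*(D*X): dummy for the intercept shift,
# interaction for the slope change.
A = np.column_stack([np.ones_like(X), X, D, D * X])
b1, b2, b3, b4 = np.linalg.lstsq(A, Y, rcond=None)[0]
# b3 recovers the intercept shift (3), b4 the slope change (0.4)
```

Setting either β₃ or β₄ to zero collapses this back to the slope-only or intercept-only dummy models of the previous sections.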
✅ Computer Example Summary: The Use of Dummy Variables in Wage
Determination
This example uses real data from 935 individuals on wages and IQ, along with a gender
dummy (male = 1 for men, 0 for women), to explore the effects of gender and IQ on wages
using EViews regression analysis.
🔹 Step 1: Basic Regression (No Dummy)
Command: ls wage c iq
Model:
WAGEᵢ = β₁ + β₂·IQᵢ + uᵢ
Intercept (C): 116.99 → base wage when IQ = 0
IQ: 8.30 → each 1-point increase in IQ raises the wage by 8.3 units
R² = 0.0955 → low explanatory power (only IQ included)
✅ Conclusion: IQ significantly affects wages. But model is incomplete.
🔹 Step 2: Add Gender Dummy (Intercept Only)
Command: ls wage c iq male
Model:
WAGEᵢ = β₁ + β₂·IQᵢ + β₃·MALEᵢ + uᵢ
Intercept (C): 224.84 → wage for a female with IQ = 0
IQ: 5.08 → each IQ point raises the wage by 5.08 units (same for both sexes)
MALE: 498.05 → being male adds 498 units to the base wage
✅ Conclusion: Males earn significantly more than females, even after controlling for IQ.
📈 R² jumps to 0.455 → gender is an important wage determinant.
🔹 Step 3: Use a Slope Dummy (Interaction: IQ × MALE)
Command: ls wage c iq male*iq
Model:
WAGEᵢ = β₁ + β₂·IQᵢ + β₄·(MALEᵢ·IQᵢ) + uᵢ
Intercept (C): 412.86 → base wage for females with IQ = 0
IQ: 3.18 → each IQ point raises the female wage by 3.18 units
MALE × IQ: 4.84 → additional effect for men → male slope = 3.18 + 4.84 = 8.02
✅ Conclusion:
IQ has a greater impact on male wages.
Marginal effect of IQ is much stronger for males.
R² improves slightly to 0.458, suggesting better fit than the previous model.
🔍 Interpretation Summary
Model 1: gender ignored; IQ effect = 8.3 (same for all)
Model 2 (intercept dummy): males earn +498 units; IQ effect = 5.08 (same for all)
Model 3 (slope dummy): no shift in base wage; IQ effect = 3.18 (F), 8.02 (M)
✅ Key Takeaways
1. Dummy variables let us test qualitative effects (e.g., gender).
2. A constant dummy changes the intercept → shows group-level wage differences.
3. A slope dummy allows differences in how one variable (IQ) affects the dependent
variable (wage) across groups.
4. Combining both gives a flexible model to capture complex real-world effects.
✅ Analysis: Using Both Intercept and Slope Dummies in Wage Regression
In this final step, both a constant dummy (MALE) and a slope dummy (MALE × IQ) are
used in the regression to fully examine gender differences in wages.
🔹 Model Specification
WAGEᵢ = β₁ + β₂·IQᵢ + β₃·MALEᵢ + β₄·(MALEᵢ·IQᵢ) + uᵢ
EViews Command:
ls wage c iq male male*iq
🔹 Regression Output (Table 9.4)
Intercept (C): 357.86 → base wage for females (IQ = 0)
IQ: 3.73 → effect of IQ on female wages
MALE: 149.10 → additional base wage for males (not statistically significant)
MALE × IQ: 3.41 → additional IQ effect for males (statistically significant)
R² = 0.459 → model explains ~45.9% of wage variation.
Adjusted R² = 0.457 → only a marginal improvement over the slope-only dummy
model.
🔍 Interpretation
For females:
WAGE_f = 357.86 + 3.73·IQ
For males:
WAGE_m = (357.86 + 149.10) + (3.73 + 3.41)·IQ = 506.96 + 7.14·IQ
However:
The intercept shift (149.10) is not statistically significant (p = 0.286).
The slope difference (3.41) is statistically significant (p = 0.012).
✅ Conclusion:
Once the difference in IQ returns is accounted for, the baseline wage gap (intercept)
between males and females disappears statistically.
What really drives the gender wage difference is the higher return to IQ for males,
not a fixed wage premium.
📌 Final Takeaways
Combined dummies allow both the starting point (intercept) and rate of change (slope)
to differ across groups.
In this example, only slope differences matter: males benefit more from higher IQs,
while the base wage gap is not significant after accounting for that.
This highlights the importance of interaction terms when analyzing differential effects
across groups.
✅ Special Case: Using Dummy Variables with Multiple Categories
This section explains how to handle categorical variables with more than two levels (e.g.,
educational attainment) in regression analysis using dummy variables — and how to avoid the
dummy variable trap.
🎓 Example: Education and Wages
You're modeling:
Yᵢ = WAGEᵢ ;  X₂ᵢ = years of experience
And individuals belong to one of four education levels:
D1: Primary only
D2: Secondary only
D3: BSc
D4: MSc
Each is coded as a dummy:
Dₖ = 1 if the individual belongs to category k, and 0 otherwise.
But:
D1 + D2 + D3 + D4 = 1 (for all individuals)
This leads to perfect multicollinearity if all four are included, known as the dummy variable
trap.
🧠 Avoiding the Dummy Variable Trap
To avoid this, omit one dummy variable. The omitted group becomes the reference group.
Suppose we omit D1 (Primary Education).
Then the model becomes:
Yᵢ = β₁ + β₂X₂ᵢ + a₂D2ᵢ + a₃D3ᵢ + a₄D4ᵢ + uᵢ
Now:
β₁ = baseline wage (for the primary education group)
a₂ = difference in wage between secondary and primary
a₃ = difference in wage between BSc and primary
a₄ = difference in wage between MSc and primary
🧾 Interpretation for Each Group
1. Primary (D2 = D3 = D4 = 0):
Yᵢ = β₁ + β₂X₂ᵢ
2. Secondary (D2 = 1):
Yᵢ = (β₁ + a₂) + β₂X₂ᵢ
3. BSc (D3 = 1):
Yᵢ = (β₁ + a₃) + β₂X₂ᵢ
4. MSc (D4 = 1):
Yᵢ = (β₁ + a₄) + β₂X₂ᵢ
Each dummy shifts the intercept, meaning the regression lines are parallel (same slope), but
start at different wage levels depending on education.
📈 Visual Representation (Based on Figure 9.6)
In the graph:
X-axis: Experience (X₂)
Y-axis: Wage (Y)
Lines for each education group are parallel, but:
o Secondary: starts at β₁ + a₂
o BSc: starts at β₁ + a₃
o MSc: starts at β₁ + a₄
o Primary (reference): starts at β₁
This setup quantifies the wage premium for higher education levels compared to primary
education.
📝 Key Rules and Notes
Only (k − 1) dummies for a categorical variable with k categories
Omitting a dummy sets the baseline/reference group
The choice of omitted category doesn’t affect fit, only interpretation
Including all categories causes perfect multicollinearity → regression cannot run
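The (k − 1)-dummy rule can be sketched in plain Python (hypothetical individuals; real software would build these columns automatically):

```python
# Encode a 4-level education variable with k-1 = 3 dummies,
# omitting "primary" as the reference category.
levels = ["primary", "secondary", "bsc", "msc"]
sample = ["bsc", "primary", "msc", "secondary", "primary"]  # hypothetical data

non_reference = levels[1:]  # drop the reference level
dummies = {lvl: [1 if s == lvl else 0 for s in sample] for lvl in non_reference}

# A "primary" individual has all three dummies equal to 0.
# Adding a fourth "primary" dummy would make the four columns sum to 1
# in every row, duplicating the constant term (the dummy variable trap).
```

With this coding, each dummy coefficient is directly the shift relative to the primary-education baseline.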
✅ Conclusion
Using dummy variables with multiple categories allows you to:
Compare subgroups (e.g., education levels)
Measure how each group deviates from a reference
Avoid the dummy variable trap by omitting one category
This section shows how to include multiple dummy variables—some with more than two
categories—in a single regression model. Let's break down the structure and interpretation.
🧾 Full Regression Model with Multiple Dummies
The estimated wage equation is:
Yᵢ = β₁ + β₂X₂ᵢ + β₃EDUC2ᵢ + β₄EDUC3ᵢ + β₅EDUC4ᵢ + β₆SEXMᵢ + β₇AGE2ᵢ + β₈AGE3ᵢ + β₉OCUP2ᵢ + β₁₀OCUP3ᵢ + β₁₁OCUP4ᵢ + uᵢ
Where:
X₂ᵢ: years of experience (continuous)
EDUC: education level (4 categories)
SEX: gender (binary)
AGE: age group (3 categories)
OCUP: occupation type (4 categories)
📌 Dummy Variables & Reference Categories
Education: EDUC2, EDUC3, EDUC4 (reference: EDUC1, primary)
Gender: SEXM (reference: SEXF, female)
Age: AGE2, AGE3 (reference: AGE1, <30 years)
Occupation: OCUP2, OCUP3, OCUP4 (reference: OCUP1, unskilled)
You always omit one dummy per category to avoid the dummy variable trap (perfect
multicollinearity).
✅ Interpreting the Coefficients
Each dummy coefficient tells us how that category differs from the reference group:
β₃ (EDUC2): Wage difference between secondary and primary education
β₄ (EDUC3): Difference between BSc and primary
β₅ (EDUC4): Difference between MSc and primary
β₆ (SEXM): Wage difference between male and female
β₇ (AGE2): Difference between 30–40 years and <30
β₈ (AGE3): Difference between >40 years and <30
β₉ (OCUP2): Difference between skilled and unskilled
β₁₀ (OCUP3): Difference between clerical and unskilled
β₁₁ (OCUP4): Difference between self-employed and unskilled
📘 Example Interpretation
If you obtain the following estimates:
β₁ (Constant): 250
β₂ (Experience): 10
β₄ (EDUC3): 100
β₆ (SEXM): 50
β₈ (AGE3): 40
β₁₁ (OCUP4): −30
Then for a 40-year-old male with BSc, 10 years of experience, self-employed, the wage would
be:
Yᵢ = 250 + 10(10) + 100 + 50 + 40 − 30 = 510
If the same individual was female (SEXM = 0), her wage would be:
Yᵢ = 250 + 10(10) + 100 + 40 − 30 = 460
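The arithmetic above can be checked by plugging the hypothetical coefficient estimates into a small function (names are illustrative):

```python
# Hypothetical coefficient estimates from the example table.
b = {"const": 250, "exper": 10, "educ3": 100, "sexm": 50, "age3": 40, "ocup4": -30}

def wage(exper, educ3, sexm, age3, ocup4):
    # Predicted wage: constant + experience effect + dummy shifts.
    return (b["const"] + b["exper"] * exper + b["educ3"] * educ3
            + b["sexm"] * sexm + b["age3"] * age3 + b["ocup4"] * ocup4)

male = wage(exper=10, educ3=1, sexm=1, age3=1, ocup4=1)    # 510
female = wage(exper=10, educ3=1, sexm=0, age3=1, ocup4=1)  # 460
```

Switching a single dummy from 1 to 0 changes the prediction by exactly that dummy's coefficient (here, the 50-unit SEXM effect).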
⚠️Important Notes
If you include all dummies from each category, OLS fails due to exact
multicollinearity.
The choice of reference group affects only interpretation, not the model's overall fit.
Be careful when interpreting results: each coefficient shows the effect relative to the
reference category, not an absolute effect.
✅ Summary
Using multiple dummies in a regression allows you to:
Account for multiple qualitative (categorical) factors.
Estimate wage (or other dependent variable) differences between specific subgroups.
Ensure comparability by defining reference groups and interpreting coefficients
accordingly.
This section explains how to use seasonal dummy variables in time series analysis to capture
seasonal effects—systematic variations in the dependent variable that recur at regular intervals
(e.g., each quarter or month).
📅 Seasonal Dummy Variables: The Setup
Let's say you have quarterly data and want to account for seasonal differences across the year.
You define dummy variables like this:
D₁ = 1 if Q1, else 0
D₂ = 1 if Q2, else 0
D₃ = 1 if Q3, else 0
D₄ = 1 if Q4, else 0
However, in the regression model you include only three dummies (e.g., D₂, D₃, D₄) to avoid perfect multicollinearity (the dummy variable trap).
📉 Regression Model with Seasonal Dummies
Yₜ = β₁ + β₂X₂ₜ + α₂D₂ₜ + α₃D₃ₜ + α₄D₄ₜ + uₜ
Where:
Yₜ: dependent variable (e.g., output, returns) at time t
X₂ₜ: continuous explanatory variable (e.g., GDP, temperature)
D₂ₜ, D₃ₜ, D₄ₜ: seasonal dummy variables
D₁: reference group (Q1)
α₂, α₃, α₄: measure the seasonal effect relative to Q1
✅ How to Interpret Seasonal Dummies
α₂: the difference in the mean of Yₜ in Q2 vs. Q1
α₃: difference in Q3 vs. Q1
α₄: difference in Q4 vs. Q1
If:
α₃ = 20, it means that, holding other factors constant, Q3 values are on
average 20 units higher than Q1 values.
If any seasonal coefficient is statistically significant, it implies the presence of a
seasonal effect for that quarter.
📆 For Monthly Data
You would define 12 monthly dummies: M₁ (January), M₂ (February), ..., M₁₂ (December).
To avoid the dummy variable trap:
Include only 11 of them in the regression.
The omitted month (e.g., January) becomes the reference category.
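The quarterly setup above can be sketched in plain Python (a hypothetical 2-year quarterly sample; statistical software would generate these columns for you):

```python
# Build quarterly seasonal dummies from a quarter index, omitting Q1
# as the reference category.
quarters = [1, 2, 3, 4, 1, 2, 3, 4]  # two hypothetical years of data
D2 = [1 if q == 2 else 0 for q in quarters]
D3 = [1 if q == 3 else 0 for q in quarters]
D4 = [1 if q == 4 else 0 for q in quarters]
# In Q1 observations all three included dummies are 0, so Q1 is
# absorbed into the constant term (the baseline).
```

The monthly case works identically with 11 included dummies out of 12.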
📊 Illustration: January Effect Hypothesis
Suppose you are analyzing monthly stock returns and want to test if January returns differ
significantly from other months.
Model:
Returnₜ = β₁ + β₂M₂ₜ + ⋯ + β₁₂M₁₂ₜ + uₜ
Where:
M₂: February, ..., M₁₂: December
January is the omitted month
Then:
Each βⱼ tells you how average returns in month j differ from January.
If many βⱼ are significantly negative, you might find evidence supporting a
January effect.
🛠️Why Use Seasonal Dummies?
Control for systematic fluctuations in the data that recur with time.
Improve the accuracy of forecasts.
Identify seasonal trends or anomalies (like holiday spikes, Q4 sales, etc.).
Summary
Seasonal dummy: captures the effect of time-specific fluctuations (e.g., quarters or months)
Reference category: one dummy is omitted to avoid multicollinearity
Interpretation: coefficients show differences from the reference period
Use case: time series data with regular periodic variation
Tests for Structural Stability
1. Dummy Variable Approach to Structural Stability
Purpose:
To determine if the relationships in a regression model change across different conditions or time
periods—i.e., whether the estimated parameters (intercept and slopes) remain stable.
Method:
Introduce dummy variables into the regression.
Use:
o A dummy variable to shift the intercept, and
o Interaction (multiplicative) dummy variables to test if slopes change.
Model Example:
For two time periods (before and after an event):
Yₜ = β₁ + β₂Xₜ + δ₀Dₜ + δ₁(Dₜ × Xₜ) + uₜ
Where:
o Dₜ = 1 for one period (e.g., after an oil shock), 0 otherwise
o δ₀ tests for the intercept change
o δ₁ tests for the slope change
Key Points:
Only one equation is estimated.
t-tests evaluate the significance of each parameter change.
Wald test can assess the joint significance of all dummy-related terms.
Advantages:
One equation estimates multiple structures.
Minimal loss of degrees of freedom.
Uses full sample (increases precision).
Identifies which coefficients are unstable.
2. Chow Test for Structural Stability
Purpose:
Formally test whether the entire model structure changes between two subsamples.
Steps:
1. Estimate the model on:
o Whole sample: get SSRₙ
o Subsample 1: get SSRₙ₁
o Subsample 2: get SSRₙ₂
2. Calculate the Chow F-statistic:
F = {[SSRₙ − (SSRₙ₁ + SSRₙ₂)] / k} / {(SSRₙ₁ + SSRₙ₂) / (n₁ + n₂ − 2k)}
k: number of parameters (e.g., intercept + 1 slope = 2)
n₁, n₂: number of observations in each subsample
3. Compare this F-statistic to the critical value from an F-distribution.
Interpretation:
If F > F-critical, reject the null hypothesis H₀ that the parameters are stable.
Suggests structural break occurred between subsamples.
Limitations:
Doesn’t specify which parameters changed.
Less efficient than dummy variable approach due to splitting data into subsamples.
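The F-statistic above is simple arithmetic once the three residual sums of squares are in hand; a sketch with invented numbers (not from the text):

```python
# Chow F-statistic from the pooled and subsample residual sums of squares.
def chow_f(ssr_pooled, ssr1, ssr2, k, n1, n2):
    num = (ssr_pooled - (ssr1 + ssr2)) / k          # restricted vs. unrestricted fit
    den = (ssr1 + ssr2) / (n1 + n2 - 2 * k)         # unrestricted variance estimate
    return num / den

# Illustrative values: k = 2 parameters, 30 observations per subsample.
F = chow_f(ssr_pooled=100.0, ssr1=40.0, ssr2=40.0, k=2, n1=30, n2=30)
```

The result is then compared with the F(k, n₁ + n₂ − 2k) critical value.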
Comparison: Dummy Variables vs. Chow Test
Data usage: dummy variable approach uses the full sample; Chow test uses subsamples
Degrees of freedom: better preserved with dummies; reduced by splitting into subsamples
Identifies specific coefficient changes: dummies yes; Chow test no
Simple to implement for small models: both
Joint significance testing: Wald test (dummies) vs. F-test (Chow)
Here's a step-by-step guide for creating daily dummies in Stata using the BARC_DOW.dat
dataset for Barclays stock:
✅ Step 1: Load and Set Up the Time Series
First, load the dataset and convert the date string into a proper Stata date:
gen datevar = date(time, "DMY") // Convert string to date
format datevar %td // Format as Stata date
sort datevar // Sort by date
tsset datevar // Declare time series
✅ Output should show:
time variable: datevar, 04jan2016 to 23jan2020, but with gaps
delta: 1 day
✅ Step 2: Generate Week Numbers (Optional for Plotting/Grouping)
generate weeknum = floor((datevar - datevar[1])/7) + 1
This gives each week in your sample a unique ID.
✅ Step 3: Create Day-of-the-Week Dummies
Now generate the day-of-week dummy variables using the dow() function, which returns:
0 = Sunday
1 = Monday
2 = Tuesday
…
6 = Saturday
Since your dataset likely only includes weekdays, you'll focus on values 1 to 5.
generate Monday = dow(datevar) == 1
generate Tuesday = dow(datevar) == 2
generate Wednesday = dow(datevar) == 3
generate Thursday = dow(datevar) == 4
generate Friday = dow(datevar) == 5
🔍 Use the Data Editor (browse) after each command to check correctness:
browse datevar Monday Tuesday Wednesday Thursday Friday
✅ Step 4: Calculate Daily Returns (if needed)
If your goal is to test for the day-of-the-week effect on returns, calculate the daily log returns:
gen return = 100 * (ln(barc) - ln(L.barc))
This generates percentage log returns of the Barclays stock.
✅ Step 5: Run the Regression
Now you can estimate the model, excluding one day (e.g., Friday) as the base:
regress return Monday Tuesday Wednesday Thursday
📌 The constant represents Friday's average return, and each coefficient shows how that day’s
return differs from Friday’s.
✅ Interpretation Example
coef(Monday) = 0.15 and statistically significant → Monday returns are 0.15% higher
than Friday, on average.
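For readers working outside Stata, the day-of-week step can be mirrored in plain Python (illustrative dates, not the BARC_DOW.dat data; note that Python's weekday() uses 0 = Monday ... 6 = Sunday, unlike Stata's dow() where 0 = Sunday):

```python
from datetime import date, timedelta

# Hypothetical trading week starting Monday 4 Jan 2016.
start = date(2016, 1, 4)
days = [start + timedelta(days=i) for i in range(5)]  # Mon-Fri

monday = [1 if d.weekday() == 0 else 0 for d in days]
friday = [1 if d.weekday() == 4 else 0 for d in days]
```

As in the Stata regression, one day's dummy would be dropped so that day becomes the baseline captured by the constant.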
Below are detailed answers to four review questions about dummy variables,
seasonality, and structural stability in regression, with examples, formulas, and
interpretation.
✅ 1. Using Dummy Variables to Quantify Qualitative Information
Qualitative variables (also called categorical variables) represent characteristics or groupings,
like gender, education level, country, or employment status. These can’t be directly included in a
regression model because they aren’t numerical.
To include them, we create dummy variables: variables coded as 1 if the observation belongs to
a specific category and 0 otherwise.
📌 Example from Economic Theory: Gender and Wages
Suppose we are studying how wages (WAGE) are affected by IQ and gender.
We set up a dummy:
MALE = 1 if the person is male
MALE = 0 if the person is female
Then the regression model is:
WAGEᵢ = β₁ + β₂·IQᵢ + β₃·MALEᵢ + uᵢ
β₁: wage for females with IQ = 0
β₂: effect of IQ on wage (same for both genders)
β₃: difference in wage between males and females
👉 Dummy variable captures the effect of gender, allowing us to quantify the wage gap.
✅ 2. Graphical and Mathematical Effect of a Dichotomous Dummy
Let’s consider a simple regression with one independent variable X (e.g., education) and a
dummy variable D (e.g., urban = 1, rural = 0).
📌 Model 1: No Dummy
Yᵢ = β₁ + β₂Xᵢ + uᵢ
This is a single regression line for all observations, regardless of whether they are rural or
urban.
📌 Model 2: Dummy Affects Intercept
Yᵢ = β₁ + β₂Xᵢ + β₃Dᵢ + uᵢ
β₁: intercept for the rural group (D = 0)
β₃: vertical shift in the intercept for the urban group
So:
Rural: Yᵢ = β₁ + β₂Xᵢ
Urban: Yᵢ = (β₁ + β₃) + β₂Xᵢ
✅ Same slope, but different starting points.
📌 Model 3: Dummy Affects Slope (Interaction)
Yᵢ = β₁ + β₂Xᵢ + β₃Dᵢ + β₄(Xᵢ·Dᵢ) + uᵢ
Now:
Rural: Yᵢ = β₁ + β₂Xᵢ
Urban: Yᵢ = (β₁ + β₃) + (β₂ + β₄)Xᵢ
✅ Both slope and intercept differ between groups.
📊 Graphically:
Two lines: one for rural, one for urban
Different intercepts if β₃ ≠ 0
Different slopes if β₄ ≠ 0
✅ 3. Seasonal Dummy Variables in Economic Theory
📌 Example: Quarterly GDP Analysis
Suppose we want to analyze the effect of interest rates and seasonal patterns on GDP, measured
quarterly. Seasonality matters—retail spikes in Q4 (holidays), agriculture may boom in Q3, etc.
Define dummy variables:
D₁ = 1 if Q1, 0 otherwise
D₂ = 1 if Q2, 0 otherwise
D₃ = 1 if Q3, 0 otherwise
D₄ = 1 if Q4, 0 otherwise
Problem: D₁ + D₂ + D₃ + D₄ = 1 for every observation
→ This creates perfect multicollinearity if we include all four dummies with a constant term.
⚠️Dummy Variable Trap
To avoid this, we omit one dummy (e.g., omit Q1). It becomes the reference category.
Now the model is:
GDPᵢ = β₁ + β₂·InterestRateᵢ + γ₂D₂ + γ₃D₃ + γ₄D₄ + uᵢ
Interpretation:
β₁: GDP in Q1 (baseline)
γ₂: difference between Q2 and Q1
γ₃: difference between Q3 and Q1
γ₄: difference between Q4 and Q1
📌 What’s a Reference Dummy?
The reference dummy is the category we leave out. All other categories are interpreted relative
to it. It’s not “lost”—it’s the baseline that comparisons are made against.
✅ 4. Chow Test for Structural Stability
The Chow Test checks whether the relationship between variables changes across two groups
or over two time periods.
✅ When to Use:
Did the COVID-19 pandemic change the effect of interest rates on inflation?
Do rich and poor countries respond differently to investment?
🧪 Steps of the Chow Test
Suppose we have data split into two groups (e.g., before and after a policy):
1. Estimate a regression using all data (pooled model)
o Get the total sum of squared residuals: SSR_P
2. Estimate the model separately for each group
o Group 1: SSR₁
o Group 2: SSR₂
3. Compute the F-statistic:
F = {[SSR_P − (SSR₁ + SSR₂)] / k} / {(SSR₁ + SSR₂) / (n₁ + n₂ − 2k)}
Where:
k: number of parameters in the model
n₁, n₂: number of observations in each group
4. Compare the F-statistic to the critical value from F-distribution.
✅ Chow Test vs. Dummy Variable Approach
Output: the Chow test gives one test result (F-stat); the dummy approach gives a full model with group differences
Flexibility: the Chow test only tests for a break; dummies allow modeling the nature of the break
Interpretation: pass/fail result vs. showing how the intercept and slope change
When to use: formal hypothesis testing vs. modeling the differences
✅ Conclusion: The dummy variable approach is more flexible and informative, while the
Chow test is better for formal testing of structural changes.