Description
When I fit an OLS model with clustered standard errors, I get "ValueError: only two groups are supported". Clustered standard errors should work with many clusters. I have about 300 clusters with about 30 observations each.
Reproducing code example:
print(sm.OLS.from_formula('price_change ~ log_F_permits + log_A_permits + C(year) + C(metro)',data=data_frame).fit(cov_type="cluster",cov_kwds={"groups":metro}).summary())
data_frame contains the variables listed above. The regression works fine when .fit() is called with no arguments. With the cov_type and cov_kwds arguments (a panel of metro areas with about 30 observations per cluster), I get the following:
Traceback (most recent call last):
  File "Documents/williams/scripts/annual.py", line 102, in <module>
    print(sm.OLS.from_formula('price_change ~ log_F_permits + log_A_permits + C(year) + C(metro)',data=data_frame).fit(cov_type="cluster",cov_kwds={"groups":metro}).summary())
  File "/usr/lib/python3.8/site-packages/statsmodels/regression/linear_model.py", line 342, in fit
    lfit = OLSResults(
  File "/usr/lib/python3.8/site-packages/statsmodels/regression/linear_model.py", line 1556, in __init__
    self.get_robustcov_results(cov_type=cov_type, use_self=True,
  File "/usr/lib/python3.8/site-packages/statsmodels/regression/linear_model.py", line 2464, in get_robustcov_results
    raise ValueError('only two groups are supported')
ValueError: only two groups are supported