Description
Describe the issue:
While trying to reproduce the source code of the het_white test from the statsmodels package, I got an error saying that two matrix ranks are not equal. Digging into the problem, I found that it comes from two different ways of calculating the degrees of freedom in statsmodels: one uses the rank of the exog matrix, and the other uses the rank of the diagonal matrix of singular values obtained from np.linalg.svd of the same exog matrix. As far as I understand, the two ranks must be equal, but they are not. Below is an example using a Colab sample dataset and the source code where the computed ranks differ. Please help me solve the problem.
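A possible explanation, offered as a sketch to check rather than a confirmed diagnosis: with its default tolerance, np.linalg.matrix_rank treats singular values below S.max() * max(M, N) * eps as zero, where (M, N) is the shape of the matrix passed in. For a tall exog matrix with nobs rows the cutoff is therefore about nobs / nvars times larger than for the square np.diag(s), so a singular value lying between the two cutoffs is counted for one matrix and dropped for the other. The 1000 x 3 matrix below is a hypothetical example chosen only for illustration (it is not the California housing data); it has exactly one such singular value. Passing the same explicit tol to both calls makes the ranks agree.

```python
import numpy as np

# Hypothetical tall matrix with known singular values 1.0, 0.5 and 1e-14
# (illustration only, not the California housing data).
nobs, k = 1000, 3
X = np.zeros((nobs, k))
X[:k, :k] = np.diag([1.0, 0.5, 1e-14])

u, s, vt = np.linalg.svd(X, full_matrices=False)
D = np.diag(s)  # k x k diagonal matrix of singular values

# Default cutoff used by matrix_rank: S.max() * max(M, N) * eps.
eps = np.finfo(s.dtype).eps
tol_exog = s.max() * max(X.shape) * eps  # about 2.2e-13 for the 1000 x 3 matrix
tol_diag = s.max() * max(D.shape) * eps  # about 6.7e-16 for the 3 x 3 diag(s)

rank_diag = np.linalg.matrix_rank(D)  # 1e-14 > tol_diag, so it is counted
rank_exog = np.linalg.matrix_rank(X)  # 1e-14 < tol_exog, so it is dropped
print(rank_diag, rank_exog)  # prints: 3 2

# Using one explicit tolerance for both calls removes the discrepancy.
rank_diag_tol = np.linalg.matrix_rank(D, tol=tol_exog)
rank_exog_tol = np.linalg.matrix_rank(X, tol=tol_exog)
print(rank_diag_tol, rank_exog_tol)  # prints: 2 2
```

If this is what happens in the reproduction, printing s together with both cutoffs for the real exog should show one or more singular values sitting between them.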
Reproduce the code example:
import statsmodels.api as sm
import pandas as pd
import numpy as np
df = pd.read_csv('/content/sample_data/california_housing_train.csv')
X = sm.add_constant(df.drop(columns='median_house_value'))
y = df['median_house_value']
# Fit the linear regression model
model_res = sm.OLS(y, X).fit()
x = model_res.model.exog
y = model_res.resid  # het_white works with the residuals of the fitted model
# The test assumes that a constant is present
x = sm.add_constant(x)
nvars0 = x.shape[1]
i0, i1 = np.triu_indices(nvars0)
exog = x[:, i0] * x[:, i1]
nobs, nvars = exog.shape
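# Singular values of the interaction matrix (reduced SVD)
u, s, vt = np.linalg.svd(exog, full_matrices=False)
# Compare the rank of diag(s) with the rank of exog itself
print(np.linalg.matrix_rank(np.diag(s)) == np.linalg.matrix_rank(exog))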
Error message:
The comparison evaluates to False.
Python and NumPy Versions:
1.26.4
3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0]
Runtime Environment:
[{'numpy_version': '1.26.4',
'python': '3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0]',
'uname': uname_result(system='Linux', node='a0806de48948', release='6.1.85+', version='#1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_KNL',
'AVX512_KNM',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Haswell',
'filepath': '/usr/local/lib/python3.10/dist-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so',
'internal_api': 'openblas',
'num_threads': 2,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23.dev'}]
Context for the issue:
This issue is related to a possible problem in calculating White's test for heteroscedasticity (het_white).