BUG: Fix assert_frame_equal with check_dtype=False for pd.NA dtype differences (GH#61473) by gamzeozgul · Pull Request #61748 · pandas-dev/pandas · GitHub

BUG: Fix assert_frame_equal with check_dtype=False for pd.NA dtype differences (GH#61473) #61748


Status: Open. gamzeozgul wants to merge 2 commits into main.

gamzeozgul

Problem

When two DataFrames containing pd.NA values are compared with check_dtype=False, assert_frame_equal fails even if the DataFrames differ only in dtype (object vs Int32). This happens because pd.NA and np.nan are treated as different values, although both mark the same missing entry.
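The scenario above can be reproduced with a small script. This is a minimal sketch, not the reproducer from the linked issue: the column name `x` and the values are illustrative assumptions, and whether the comparison raises depends on the installed pandas version, so the script reports the outcome instead of assuming it.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Illustrative data (assumed, not taken from the issue):
# the same logical content stored as nullable Int32 and as plain object.
df_int = pd.DataFrame({"x": pd.Series([1, pd.NA], dtype="Int32")})
df_obj = pd.DataFrame({"x": pd.Series([1, pd.NA], dtype="object")})

# The two frames differ in dtype ...
print(df_int.dtypes["x"], df_obj.dtypes["x"])

# ... but flag exactly the same positions as missing.
same_missing_mask = df_int.isna().equals(df_obj.isna())
print("same missing-value mask:", same_missing_mask)

# The result is version dependent, so it is recorded rather than asserted.
try:
    assert_frame_equal(df_int, df_obj, check_dtype=False)
    outcome = "passed"
except Exception as exc:  # the exact exception type is not assumed here
    outcome = "raised"
    print(type(exc).__name__)

print("assert_frame_equal(check_dtype=False):", outcome)
```

On a pandas build affected by GH#61473 the last line is expected to report `raised`; on a build with a fix it should report `passed`.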

Solution

Modified assert_frame_equal in pandas/_testing/asserters.py to normalize pd.NA and np.nan values when check_dtype=False is specified. This ensures that DataFrames with equivalent missing values but different dtypes can be compared successfully.
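The idea of normalizing missing values before comparing can be sketched outside pandas internals. The helper `normalize_missing` below is hypothetical and is not the code added to pandas/_testing/asserters.py; it only illustrates one way to map every missing marker to np.nan in object-dtype columns so that two frames become directly comparable.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def normalize_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return an object-dtype copy of ``df`` in which
    every missing value (pd.NA, np.nan, None) is replaced by np.nan."""
    columns = {}
    for name in df.columns:
        col = df[name]
        # pd.isna is True for pd.NA, np.nan and None alike.
        values = [np.nan if pd.isna(v) else v for v in col.tolist()]
        columns[name] = pd.Series(values, index=col.index, dtype=object)
    return pd.DataFrame(columns)


# Illustrative frames (assumed data): same content, different dtypes.
df_int = pd.DataFrame({"x": pd.Series([1, pd.NA], dtype="Int32")})
df_obj = pd.DataFrame({"x": pd.Series([1, pd.NA], dtype="object")})

left = normalize_missing(df_int)
right = normalize_missing(df_obj)

# After normalization both columns are object dtype and hold np.nan
# at the missing position, so the comparison no longer sees pd.NA.
assert_frame_equal(left, right)
```

Converting to object dtype discards the original dtype information, which is acceptable only because the caller has already opted out of dtype checks with check_dtype=False.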

Changes Made

  • Added normalization logic in assert_frame_equal function to handle pd.NA and np.nan equivalence when check_dtype=False
  • Added comprehensive unit test in pandas/tests/util/test_assert_frame_equal.py to verify the fix

Testing

  • Added unit test test_assert_frame_equal_pd_na_dtype_difference that reproduces the original issue and verifies the fix
  • Test passes successfully with the implemented solution

@gamzeozgul
Author

Hello,

It seems that the "Downstream Compat" CI job is failing with an ImportError related to statsmodels and scipy. This appears to be an issue with the CI environment's dependencies (e.g., version mismatch between statsmodels and scipy), and not directly related to the changes introduced in this PR.

The error message is:

_______________________________ test_statsmodels _______________________________
E       pytest.PytestDeprecationWarning: 
E       Module 'statsmodels.formula.api' was found, but when imported by pytest it raised:
E           ImportError("cannot import name '_lazywhere' from 'scipy._lib._util' (/home/runner/micromamba/envs/test/lib/python3.11/site-packages/scipy/_lib/_util.py)")

I believe this is an infrastructure/dependency issue on the CI side, rather than a bug in my proposed changes for GH#61473.
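One way to check this locally is to ask whether the private name from the traceback still exists in the installed scipy. The helper `has_name` below is a hypothetical diagnostic written for this comment, not part of pandas, scipy, or statsmodels; it uses only the standard library and returns None when the module itself cannot be imported.

```python
import importlib
from typing import Optional


def has_name(module_name: str, attr: str) -> Optional[bool]:
    """Hypothetical diagnostic: report whether ``attr`` exists in a module.

    Returns None if the module cannot be imported at all.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


# Names taken from the traceback above.
result = has_name("scipy._lib._util", "_lazywhere")
if result is None:
    print("scipy is not importable in this environment")
elif result:
    print("_lazywhere is present; the statsmodels import should not fail on it")
else:
    print("_lazywhere is missing; this matches the ImportError in the CI log")
```

If the name is missing, the failure comes from the installed statsmodels and scipy versions being incompatible with each other, independent of any change to assert_frame_equal.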

Could you please confirm this and let me know if there's anything I can do to help resolve this CI issue, or if this is something the core pandas team needs to address?

Development

Successfully merging this pull request may close these issues.

BUG: assert_frame_equal(check_dtype=False) fails when comparing two DFs containing pd.NA that only differ in dtype (object vs Int32)