[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX Target encoder const y #28233

Merged
merged 10 commits into from
Feb 2, 2024

Conversation

s-banach
Copy link
Contributor
@s-banach s-banach commented Jan 23, 2024

Reference Issues/PRs

Regarding #27879, test failure when pandas copy-on-write is enabled.

What does this implement/fix? Explain your changes.

The Cython code for TargetEncoder does not modify the target column y, so we mark it const.

Any other comments?

Simply adding const is now an option, since cython/cython#5230 is closed.

I assume someone else may want to improve the test here:

  • I didn't know how to modify the existing tests to fail, so I just made a new one.
  • I don't know what happens if I stick pd.options.mode.copy_on_write = True in one test.
    Does this configuration carry forward to later tests in the test suite?

Copy link
github-actions bot commented Jan 23, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: abca02b. Link to the linter CI: here

@adrinjalali
Copy link
Member

CI fails to compile though. Do we need a newwer cython for this to compile?

As for setting the option on pandas, it should be in a context manager kind of thing, and adding a new test for this is okay.

@s-banach
Copy link
Contributor Author

Evidently the version of cython on conda-forge is a couple weeks out of date. I know you guys need conda because your build system is so complicated, so I’ll just wait on this.

@s-banach s-banach marked this pull request as draft January 25, 2024 12:19
@adrinjalali
Copy link
Member

@lesteve would know better where we need to change to upgrade the cython version on builds.

@lesteve
Copy link
Member
lesteve commented Jan 25, 2024

If I understand correctly, it looks like you need Cython 3.0.8 to be our minimum Cython supported version, I think we need to discuss this outside of this PR. There was at least one other issue where Cython >= 3 was needed #27682.

I am fine with it personnally since numpy and scipy require Cython >=3 IIRC but feed-back needs to be gathered and then someone needs to update the lock-files. Also currently the lock-file update failed because pandas started to warn about PyArrow becoming a required dependency, so this needs to be handled too ...

I don't know what happens if I stick pd.options.mode.copy_on_write = True in one test. Does this configuration carry forward to later tests in the test suite?

Yes it does, so to avoid side-effects maybe pandas has a context manager or you need to reset it with a try/finally.

On It would be great to do a common test although it won't catch everything since it uses default parameter, see #27879 (comment)

@adrinjalali
Copy link
Member

Opened #28258 to discuss pandas issue. I think we can simply update Cython, since the main issue was on cython==3.0, which is resolved now.

@lesteve
Copy link
Member
lesteve commented Jan 25, 2024

I opened #28259 to get feed-back on Cython 3.0.8 being our minimum supported version. I am +1 for this.

@s-banach s-banach marked this pull request as ready for review January 31, 2024 18:50
@adrinjalali
Copy link
Member

CI failing @s-banach

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @s-banach !

@@ -701,3 +701,10 @@ def test_target_encoding_for_linear_regression(smooth, global_random_seed):
# cardinality yet non-informative feature instead of the lower
# cardinality yet informative feature:
assert abs(coef[0]) < abs(coef[2])


def test_27879():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a better function name?

Suggested change
def test_27879():
def test_pandas_copy_on_write_true():
"""Check target encoder works with copy on write.
Non-regression test for gh-27879.
"""



def test_27879():
pd = pytest.importorskip("pandas")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the failing CI, it is using a really old version of pandas that does not have mode.copy_on_write. Here we can set the min version here:

Suggested change
pd = pytest.importorskip("pandas")
pd = pytest.importorskip("pandas", minversion="2.0")

@thomasjpfan thomasjpfan changed the title Target encoder const y FIX Target encoder const y Feb 2, 2024
@thomasjpfan thomasjpfan added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label Feb 2, 2024
@thomasjpfan thomasjpfan added this to the 1.4.1 milestone Feb 2, 2024
@thomasjpfan
Copy link
Member

Please add an entry to the change log at doc/whats_new/v1.4.rst with tag |Fix|. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

Copy link
Member
@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

doc/whats_new/v1.4.rst Outdated Show resolved Hide resolved
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
@thomasjpfan thomasjpfan merged commit ce8cb6e into scikit-learn:main Feb 2, 2024
30 checks passed
Copy link
@Ingkoun Ingkoun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hhhjk

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024
Co-authored-by: s-banach <john@hopfensperger.family>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 13, 2024
Co-authored-by: s-banach <john@hopfensperger.family>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
glemaitre pushed a commit that referenced this pull request Feb 13, 2024
Co-authored-by: s-banach <john@hopfensperger.family>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cython module:preprocessing To backport PR merged in master that need a backport to a release branch defined based on the milestone.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants