Add check that envvar configs are boolean by Raymo111 · Pull Request #145454 · pytorch/pytorch · GitHub
Add check that envvar configs are boolean #145454

Closed
wants to merge 1 commit into from

Conversation

Raymo111
Member

So we don't get unexpected behavior when higher typed values are passed in

@Raymo111 Raymo111 added the topic: not user facing topic category label Jan 23, 2025
@Raymo111 Raymo111 requested a review from c00w January 23, 2025 03:28
pytorch-bot bot commented Jan 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145454

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 2 New Failures, 1 Unrelated Failure

As of commit ee54917 with merge base 7f65a20 (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@c00w c00w requested a review from oulgen January 23, 2025 20:58
@Raymo111 Raymo111 force-pushed the gh/raymo/add-envvars-bool-check branch from 94f0067 to 9752ee2 Compare January 23, 2025 21:42
@Raymo111 Raymo111 requested a review from c00w January 23, 2025 21:59
@Raymo111 Raymo111 force-pushed the gh/raymo/add-envvars-bool-check branch from 9752ee2 to d64c6e8 Compare January 24, 2025 01:04
Contributor
@c00w c00w left a comment

We need to write at least one test checking that making an invalid ConfigObject throws (and resolve where value_type is created by default).

@@ -102,6 +102,12 @@ def __init__(
assert isinstance(
self.default, bool
), f"justknobs only support booleans, {self.default} is not a boolean"
if self.value_type is not None and (
Contributor

Can we add some tests here? In particular, value_type is created, if not set, further down inside the _ConfigEntry constructor:

self.value_type = (
    config.value_type if config.value_type is not None else type(self.default)
)

So we probably want to either move the default creation up, or push the assertion down.

I think we probably want to move the value creation up.
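
The ordering concern above can be illustrated with a minimal sketch. It uses hypothetical stand-in classes (`SketchConfig`, `SketchConfigEntry`, and the helper `entry_is_valid`), not the real classes in `torch/utils/_config_module.py`, and it simplifies the rule to "booleans only". The point is only that the effective `value_type` is resolved before the assertion runs, so the check never sees `None`.

```python
from typing import Any, Optional


class SketchConfig:
    """Hypothetical stand-in for a config declaration (not the PyTorch class)."""

    def __init__(
        self,
        default: Any,
        value_type: Optional[type] = None,
        env_name_default: Optional[str] = None,
    ) -> None:
        self.default = default
        self.value_type = value_type
        self.env_name_default = env_name_default


class SketchConfigEntry:
    """Hypothetical stand-in for the entry object that wraps a declaration."""

    def __init__(self, config: SketchConfig) -> None:
        self.default = config.default
        # Resolve the effective type first, falling back to the default's type.
        self.value_type = (
            config.value_type if config.value_type is not None else type(self.default)
        )
        # Validate afterwards: value_type is guaranteed to be set at this point.
        if config.env_name_default is not None:
            assert self.value_type is bool, (
                f"envvar configs only support booleans, {self.value_type} is not"
            )


def entry_is_valid(config: SketchConfig) -> bool:
    """Return True if building an entry from `config` passes validation."""
    try:
        SketchConfigEntry(config)
    except AssertionError:
        return False
    return True
```

With this ordering, a declaration such as `SketchConfig(default=2, env_name_default="FAKE_DISABLE")` is rejected even though no explicit `value_type` was given, because the type is inferred from the default before the check.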

Member Author

What do you mean by "some tests here"? What kind of tests: assert checks or unit tests? Moving it up results in line 199 never being True, which looks like it may cause issues.

Contributor

re: Tests - We should add some tests to https://github.com/pytorch/pytorch/blob/main/test/test_utils_config_module.py

Since this code is being run within the config constructor, I expect we can do

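def testInvalidConfig(self):
    # assertRaises must be used as a context manager, and the built-in
    # exception raised by a failed assert is AssertionError.
    with self.assertRaises(AssertionError):
        Config(default=2, env_name_default="FAKE_DISABLE")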

and similar for the other issues.

The other option is to make a full config module (similar to https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/fake_config_module.py).
And ensure that importing it fails.

The third option is maybe to do something interesting with writing a python file to disk, then importing it. (Not sure how feasible this is, but it would reduce some busywork in having to make a second config module file).
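
The third option could look roughly like the sketch below. It is not the test that landed in the PR: the module source, the helper `import_from_source`, and the test class name are all hypothetical, and a plain module-level `assert` stands in for an invalid config declaration that is rejected at import time.

```python
import importlib.util
import tempfile
import unittest
from pathlib import Path

# Hypothetical module source: the module-level assert stands in for an
# invalid config declaration that fails while the module is being imported.
BAD_MODULE_SOURCE = '''
default = 2
assert isinstance(default, bool), f"{default} is not a boolean"
'''


def import_from_source(source: str, name: str = "sketch_bad_config"):
    """Write `source` to a temporary .py file and import it as module `name`."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{name}.py"
        path.write_text(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes the module body
        return module


class InvalidConfigImportTest(unittest.TestCase):
    def test_invalid_module_fails_to_import(self):
        # Importing the bad module should surface the AssertionError.
        with self.assertRaises(AssertionError):
            import_from_source(BAD_MODULE_SOURCE)
```

This avoids checking a second fake config module into the repository, at the cost of keeping the module source as a string inside the test.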

Contributor

re: location - Ah excellent, I missed that annotated types were supported. Yeah, I think this assertion (and honestly the JK one as well) probably needs to happen in _ConfigEntry's init function instead of Config's, since we can't determine the type within Config's init function.

Member Author

Are we confident that _Config isn't used anywhere else but through _ConfigEntry? Is there any chance that _Config is used standalone and removing the assert in its init might break something?

Contributor

I am confident :) - We wrote it recently just for this class. Grep confirms.

(pytorch-3.10) [clr@devvm26913.atn0 ~/local/pytorch]$ grep _Config torch -r
torch/_inductor/fuzzer.py:from torch.utils._config_module import _ConfigEntry, ConfigModule
torch/_inductor/fuzzer.py:        self.fields: dict[str, _ConfigEntry] = self.config_module._config
grep: torch/utils/__pycache__/_config_module.cpython-310.pyc: binary file matches
torch/utils/_config_module.py:class _Config(Generic[T]):
torch/utils/_config_module.py:        self.env_name_default = _Config.string_or_list_of_string_to_list(
torch/utils/_config_module.py:        self.env_name_force = _Config.string_or_list_of_string_to_list(env_name_force)
torch/utils/_config_module.py:    ) -> _Config[T]:
torch/utils/_config_module.py:        return _Config(
torch/utils/_config_module.py:                or (isinstance(value, type) and issubclass(value, _Config))
torch/utils/_config_module.py:                config[name] = _ConfigEntry(
torch/utils/_config_module.py:                    _Config(default=value, value_type=annotated_type)
torch/utils/_config_module.py:            elif isinstance(value, _Config):
torch/utils/_config_module.py:                config[name] = _ConfigEntry(value)
torch/utils/_config_module.py:    config: dict[str, _ConfigEntry] = {}
torch/utils/_config_module.py:class _ConfigEntry:
torch/utils/_config_module.py:    def __init__(self, config: _Config):
torch/utils/_config_module.py:    _config: dict[str, _ConfigEntry]
torch/utils/_config_module.py:        self, entry: _ConfigEntry
torch/utils/_config_module.py:    def _get_alias_val(self, entry: _ConfigEntry) -> Any:
torch/utils/_config_module.py:    def _set_alias_val(self, entry: _ConfigEntry, val: Any) -> None:
torch/distributed/fsdp/wrap.py:    with _ConfigAutoWrap(**kwargs):
torch/distributed/fsdp/wrap.py:    if _ConfigAutoWrap.in_autowrap_context:
torch/distributed/fsdp/wrap.py:        assert _ConfigAutoWrap.wrapper_cls is not None
torch/distributed/fsdp/wrap.py:        wrap_overrides = {**_ConfigAutoWrap.kwargs, **wrap_overrides}
torch/distributed/fsdp/wrap.py:            _ConfigAutoWrap.wrapper_cls,
torch/distributed/fsdp/wrap.py:class _ConfigAutoWrap:
torch/distributed/fsdp/wrap.py:        if _ConfigAutoWrap.in_autowrap_context:
torch/distributed/fsdp/wrap.py:        _ConfigAutoWrap.in_autowrap_context = True
torch/distributed/fsdp/wrap.py:        ), "Expected to pass in wrapper_cls arg into _ConfigAutoWrap."
torch/distributed/fsdp/wrap.py:        _ConfigAutoWrap.wrapper_cls = cast(Callable, kwargs["wrapper_cls"])
torch/distributed/fsdp/wrap.py:        _ConfigAutoWrap.kwargs = kwargs
torch/distributed/fsdp/wrap.py:        _ConfigAutoWrap.in_autowrap_context = False
torch/distributed/fsdp/wrap.py:        _ConfigAutoWrap.wrapper_cls = None
torch/distributed/fsdp/wrap.py:        _ConfigAutoWrap.kwargs = {}

@Raymo111 Raymo111 force-pushed the gh/raymo/add-envvars-bool-check branch from d64c6e8 to 8b3e7e8 Compare January 31, 2025 22:33
@Raymo111 Raymo111 requested a review from c00w January 31, 2025 23:00
@Raymo111 Raymo111 force-pushed the gh/raymo/add-envvars-bool-check branch from 8b3e7e8 to e269802 Compare January 31, 2025 23:39
Contributor
@c00w c00w left a comment

Big blockers are the tests failing, and the logic in ConfigEntry being wrapped in the justknob check.

Also note that I expect you'll have to figure out how to test the _ConfigEntry constructor (either directly via calling it, or indirectly by passing a synthetic module to install_config_module), since the code is in _ConfigEntry, so just calling Config shouldn't manage to test your assert anymore.

assert isinstance(
self.default, bool
), f"justknobs only support booleans, {self.default} is not a boolean"
if self.value_type is not None and (
Contributor

Indentation typo - I expect you want this one level out :). Probably also why your testcase is failing.

Contributor

FYI - We allow both env variables and JKs on configs, so you want to check environment variables whether or not we have a JK config.
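
A minimal sketch of what decoupling the two checks might look like. The helper `validate_sketch_entry` and its parameters are hypothetical and are not part of `torch/utils/_config_module.py`; the assertion messages are modeled on the ones quoted in this review.

```python
from typing import Any, List, Optional


def validate_sketch_entry(
    default: Any,
    value_type: type,
    justknob: Optional[str],
    env_names: List[str],
) -> None:
    """Hypothetical validation helper; names and messages are illustrative."""
    # JustKnob check: applies only when a JK name is configured.
    if justknob is not None:
        assert isinstance(default, bool), (
            f"justknobs only support booleans, {default} is not a boolean"
        )
    # Env var check: a separate `if`, not an `else` or a nested branch, so it
    # runs whether or not a JK is configured.
    if env_names:
        assert value_type in (bool, str), (
            f"envvar configs only support booleans or strings, {value_type} is neither"
        )
```

Because the two conditions are independent `if` statements, a config with both a JK name and environment variables is validated against both rules, and a config with only environment variables is still checked.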

Member Author

Fixed in latest version

@@ -326,6 +321,21 @@ def __init__(self, config: _Config):
self.env_value_force = env_value
break

# Ensure envvars are boolean
if self.justknob is not None:
Contributor

Bad copy? Or did you intentionally want to also clean up this check here?

@@ -395,6 +395,14 @@ def test_reference_is_default(self):
t["a"] = "b"
self.assertFalse(config._is_default("e_dict"))

def testInvalidConfig(self):
Contributor

FYI - tests are not passing, let me know if you need help reproing the failure locally on your machine.

Member Author

Yeah I figured out how to repro it earlier, I was testing on _Config instead of _ConfigModule and the logic was also not right.

@Raymo111 Raymo111 force-pushed the gh/raymo/add-envvars-bool-check branch from e269802 to f71439c Compare February 1, 2025 01:11
@Raymo111
Member Author
Raymo111 commented Feb 1, 2025

@c00w sorry, please ignore the previous version; once the tests pass on the latest one you can review.

@Raymo111 Raymo111 force-pushed the gh/raymo/add-envvars-bool-check branch from f71439c to fc17aad Compare February 1, 2025 01:43
@Raymo111 Raymo111 requested a review from c00w February 3, 2025 18:06
Contributor
@c00w c00w left a comment

Please make sure to decouple the JK and env variable type checking :).

), f"envvar configs only support (optional) booleans or strings, {self.value_type} is neither"
else:
assert isinstance(
self.default, bool
Contributor

nit - We should also allow Optional[bool] here (not blocking this PR).
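
One way such a check could accept both `bool` and `Optional[bool]` is sketched below. The helper name `is_bool_or_optional_bool` is hypothetical and this is not necessarily how the merged change implements it; it relies only on `typing.get_origin` and `typing.get_args` from the standard library.

```python
import types
from typing import Any, Union, get_args, get_origin

# typing.Union covers Optional[bool]; types.UnionType (Python 3.10+) covers
# the `bool | None` spelling. Fall back to Union where UnionType is missing.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def is_bool_or_optional_bool(value_type: Any) -> bool:
    """Return True for `bool` and `Optional[bool]`, False otherwise."""
    if value_type is bool:
        return True
    # Optional[bool] is Union[bool, None]; compare its arguments as a set so
    # the order of the union members does not matter.
    if get_origin(value_type) in _UNION_ORIGINS:
        return set(get_args(value_type)) == {bool, type(None)}
    return False
```

An assertion could then test `is_bool_or_optional_bool(self.value_type)` instead of comparing against `bool` alone.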

Member Author

Added.

@c00w c00w requested a review from jamesjwu February 4, 2025 18:11
@Raymo111 Raymo111 force-pushed the gh/raymo/add-envvars-bool-check branch from fc17aad to ee54917 Compare February 4, 2025 20:39
@Raymo111
Member Author
Raymo111 commented Feb 4, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 4, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 jobs have failed, first few of them are: trunk / linux-focal-rocm6.3-py3.10 / test (default, 2, 2, linux.rocm.gpu.2), trunk / linux-focal-rocm6.3-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4)

Details for Dev Infra team Raised by workflow job

@Raymo111
Member Author
Raymo111 commented Feb 5, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

@Raymo111
Member Author
Raymo111 commented Feb 5, 2025

@pytorchbot merge -f The ROCM errors are AMD things and don't have anything to do with my changes, and Windows errors are also unrelated.

pytorch-bot bot commented Feb 5, 2025

❌ 🤖 pytorchbot command failed:

Got EOF while in a quoted string
Try `@pytorchbot --help` for more info.

@Raymo111
Member Author
Raymo111 commented Feb 5, 2025

@pytorchbot merge -f "The ROCM errors are AMD things and don't have anything to do with my changes, and Windows errors are also unrelated."

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

@Raymo111 Raymo111 deleted the gh/raymo/add-envvars-bool-check branch February 5, 2025 19:41
mori360 pushed a commit to mori360/pytorch that referenced this pull request Feb 6, 2025
So we don't get unexpected behavior when higher typed values are passed in
Pull Request resolved: pytorch#145454
Approved by: https://github.com/c00w, https://github.com/jamesjwu
Tytskiy pushed a commit to Tytskiy/pytorch that referenced this pull request Feb 18, 2025
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged topic: not user facing topic category
4 participants