Add check that envvar configs are boolean #145454
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145454
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 2 New Failures, 1 Unrelated Failure as of commit ee54917 with merge base 7f65a20.
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 94f0067 to 9752ee2
Force-pushed 9752ee2 to d64c6e8
We need to write at least one test that making an invalid ConfigObject throws (and resolve where value_type is created by default).
torch/utils/_config_module.py
Outdated
@@ -102,6 +102,12 @@ def __init__(
            assert isinstance(
                self.default, bool
            ), f"justknobs only support booleans, {self.default} is not a boolean"
            if self.value_type is not None and (
Can we add some tests here? In particular, value_type is created if not set below, inside the _ConfigEntry constructor:
self.value_type = (
config.value_type if config.value_type is not None else type(self.default)
)
So we probably want to either move the default creation up, or push the assertion down.
I think we probably want to move the value creation up.
Wdym by "some tests here"? What kind of tests: assert checks or unit tests? Moving it up results in line 199 never being True, which looks like it may cause issues.
re: Tests - We should add some tests to https://github.com/pytorch/pytorch/blob/main/test/test_utils_config_module.py
Since this code is being run within the config constructor, I expect we can do
def testInvalidConfig(self):
    with self.assertRaises(AssertionError):
        Config(default=2, env_name_default="FAKE_DISABLE")
and similar for the other issues.
The other option is to make a full config module (similar to https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/fake_config_module.py).
And ensure that importing it fails.
The third option is maybe to do something interesting with writing a python file to disk, then importing it. (Not sure how feasible this is, but it would reduce some busywork in having to make a second config module file).
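The write-a-file-to-disk idea could be sketched roughly like this. This is a hedged sketch only: `import_module_from_source` and the `fake_bad_config` contents are hypothetical stand-ins, and the import-time `assert` stands in for the real check in torch/utils/_config_module.py.

```python
import importlib.util
import sys
import tempfile
import textwrap
from pathlib import Path

def import_module_from_source(name, source):
    """Write `source` to a temp .py file and import it, propagating any
    import-time error (a config module's checks run at import)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{name}.py"
        path.write_text(textwrap.dedent(source))
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module
        try:
            spec.loader.exec_module(module)
        finally:
            sys.modules.pop(name, None)
        return module

# Module body that asserts at import time, standing in for a config module
# that trips the new envvar type check.
bad_source = """
default = 2
assert isinstance(default, bool), "envvar configs only support booleans"
"""

try:
    import_module_from_source("fake_bad_config", bad_source)
    raised = False
except AssertionError:
    raised = True
print(raised)  # True
```

This avoids keeping a second permanent config module file around, at the cost of some temp-file bookkeeping in the test.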
re: location - Ah excellent, I missed that annotated types were supported. Yeah, I think this assertion (and honestly the JK one as well) probably needs to happen in _ConfigEntry's init function instead of Config's, since we can't determine the type within Config's init function.
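A rough sketch of what running the check in _ConfigEntry.__init__, after value_type is resolved from the default, might look like. The classes below are minimal stand-ins, not the real torch/utils/_config_module.py definitions.

```python
from typing import Any, Optional

class _Config:
    # Stand-in: only the fields the check needs.
    def __init__(self, default: Any = None, env_name_default: Optional[str] = None,
                 value_type: Optional[type] = None):
        self.default = default
        self.env_name_default = env_name_default
        self.value_type = value_type

class _ConfigEntry:
    def __init__(self, config: _Config):
        self.default = config.default
        # value_type falls back to type(default) here, so checks that need
        # the resolved type belong after this line.
        self.value_type = (
            config.value_type if config.value_type is not None else type(config.default)
        )
        if config.env_name_default is not None:
            assert self.value_type in (bool, str), (
                f"envvar configs only support booleans or strings, "
                f"{self.value_type} is neither"
            )

# Resolved type comes from the default when value_type is unset.
entry = _ConfigEntry(_Config(default=True, env_name_default="FAKE_ENABLE"))
print(entry.value_type is bool)  # True

# A non-bool, non-str env config trips the assertion.
try:
    _ConfigEntry(_Config(default=2, env_name_default="FAKE_DISABLE"))
    invalid_raised = False
except AssertionError:
    invalid_raised = True
print(invalid_raised)  # True
```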
Are we confident that _Config isn't used anywhere else but through _ConfigEntry? Is there any chance that _Config is used standalone and removing the assert in its init might break something?
I am confident :) - We wrote it recently just for this class. Grep confirms.
(pytorch-3.10) [clr@devvm26913.atn0 ~/local/pytorch]$ grep _Config torch -r
torch/_inductor/fuzzer.py:from torch.utils._config_module import _ConfigEntry, ConfigModule
torch/_inductor/fuzzer.py: self.fields: dict[str, _ConfigEntry] = self.config_module._config
grep: torch/utils/__pycache__/_config_module.cpython-310.pyc: binary file matches
torch/utils/_config_module.py:class _Config(Generic[T]):
torch/utils/_config_module.py: self.env_name_default = _Config.string_or_list_of_string_to_list(
torch/utils/_config_module.py: self.env_name_force = _Config.string_or_list_of_string_to_list(env_name_force)
torch/utils/_config_module.py: ) -> _Config[T]:
torch/utils/_config_module.py: return _Config(
torch/utils/_config_module.py: or (isinstance(value, type) and issubclass(value, _Config))
torch/utils/_config_module.py: config[name] = _ConfigEntry(
torch/utils/_config_module.py: _Config(default=value, value_type=annotated_type)
torch/utils/_config_module.py: elif isinstance(value, _Config):
torch/utils/_config_module.py: config[name] = _ConfigEntry(value)
torch/utils/_config_module.py: config: dict[str, _ConfigEntry] = {}
torch/utils/_config_module.py:class _ConfigEntry:
torch/utils/_config_module.py: def __init__(self, config: _Config):
torch/utils/_config_module.py: _config: dict[str, _ConfigEntry]
torch/utils/_config_module.py: self, entry: _ConfigEntry
torch/utils/_config_module.py: def _get_alias_val(self, entry: _ConfigEntry) -> Any:
torch/utils/_config_module.py: def _set_alias_val(self, entry: _ConfigEntry, val: Any) -> None:
torch/distributed/fsdp/wrap.py: with _ConfigAutoWrap(**kwargs):
torch/distributed/fsdp/wrap.py: if _ConfigAutoWrap.in_autowrap_context:
torch/distributed/fsdp/wrap.py: assert _ConfigAutoWrap.wrapper_cls is not None
torch/distributed/fsdp/wrap.py: wrap_overrides = {**_ConfigAutoWrap.kwargs, **wrap_overrides}
torch/distributed/fsdp/wrap.py: _ConfigAutoWrap.wrapper_cls,
torch/distributed/fsdp/wrap.py:class _ConfigAutoWrap:
torch/distributed/fsdp/wrap.py: if _ConfigAutoWrap.in_autowrap_context:
torch/distributed/fsdp/wrap.py: _ConfigAutoWrap.in_autowrap_context = True
torch/distributed/fsdp/wrap.py: ), "Expected to pass in wrapper_cls arg into _ConfigAutoWrap."
torch/distributed/fsdp/wrap.py: _ConfigAutoWrap.wrapper_cls = cast(Callable, kwargs["wrapper_cls"])
torch/distributed/fsdp/wrap.py: _ConfigAutoWrap.kwargs = kwargs
torch/distributed/fsdp/wrap.py: _ConfigAutoWrap.in_autowrap_context = False
torch/distributed/fsdp/wrap.py: _ConfigAutoWrap.wrapper_cls = None
torch/distributed/fsdp/wrap.py: _ConfigAutoWrap.kwargs = {}
Force-pushed d64c6e8 to 8b3e7e8
Force-pushed 8b3e7e8 to e269802
Big blockers are the tests failing and the logic in _ConfigEntry being wrapped in the justknob check.
Also note that I expect you'll have to figure out how to test the _ConfigEntry constructor (either directly via calling it, or indirectly by passing a synthetic module to install_config_module), since the code is in _ConfigEntry, so just calling Config shouldn't manage to test your assert anymore.
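Testing the _ConfigEntry constructor directly might look like the sketch below. The _Config/_ConfigEntry classes are minimal stand-ins for the real ones in torch/utils/_config_module.py, kept just detailed enough to exercise the envvar type assertion.

```python
import unittest

class _Config:
    # Stand-in: only the fields the assertion needs.
    def __init__(self, default=None, env_name_default=None):
        self.default = default
        self.env_name_default = env_name_default

class _ConfigEntry:
    def __init__(self, config):
        self.default = config.default
        self.value_type = type(config.default)
        if config.env_name_default is not None:
            assert self.value_type in (bool, str), (
                f"envvar configs only support booleans or strings, "
                f"{self.value_type} is neither"
            )

class TestInvalidConfig(unittest.TestCase):
    def test_env_config_must_be_bool_or_str(self):
        # Call the constructor directly rather than going through
        # install_config_module with a synthetic module.
        with self.assertRaises(AssertionError):
            _ConfigEntry(_Config(default=2, env_name_default="FAKE_DISABLE"))

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestInvalidConfig)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```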
torch/utils/_config_module.py
Outdated
            assert isinstance(
                self.default, bool
            ), f"justknobs only support booleans, {self.default} is not a boolean"
            if self.value_type is not None and (
Indentation typo - I expect you want this one level out :). Probably also why your testcase is failing.
FYI - We allow both env variables and JKs on configs, so you want to check environment variables whether or not we have a JK config.
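The decoupled checks could look roughly like this. The attribute names mirror the PR's, but this `validate` helper is hypothetical, not the actual code under review.

```python
def validate(default, value_type, env_names=(), justknob=None):
    # justknob check: independent of env variables.
    if justknob is not None:
        assert isinstance(default, bool), (
            f"justknobs only support booleans, {default} is not a boolean"
        )
    # envvar check: runs whether or not a justknob is configured.
    if env_names:
        assert value_type in (bool, str), (
            f"envvar configs only support (optional) booleans or strings, "
            f"{value_type} is neither"
        )

# Both a JK and an env variable on one config: fine.
validate(default=True, value_type=bool, env_names=("FAKE_ENABLE",), justknob="jk")

# An env variable with a non-bool, non-str type trips the env check even
# with no JK set.
try:
    validate(default=2, value_type=int, env_names=("FAKE_DISABLE",))
    env_only_raised = False
except AssertionError:
    env_only_raised = True
print(env_only_raised)  # True
```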
Fixed in latest version
torch/utils/_config_module.py
Outdated
@@ -326,6 +321,21 @@ def __init__(self, config: _Config):
                self.env_value_force = env_value
                break

        # Ensure envvars are boolean
        if self.justknob is not None:
Bad copy? Or did you intentionally want to also clean up this check here?
test/test_utils_config_module.py
Outdated
@@ -395,6 +395,14 @@ def test_reference_is_default(self):
        t["a"] = "b"
        self.assertFalse(config._is_default("e_dict"))

    def testInvalidConfig(self):
FYI - tests are not passing, let me know if you need help reproing the failure locally on your machine.
Yeah, I figured out how to repro it earlier; I was testing on _Config instead of _ConfigModule, and the logic was also not right.
Force-pushed e269802 to f71439c
@c00w sorry, please ignore the previous version; once the tests pass on the latest one you can review.
Force-pushed f71439c to fc17aad
Please make sure to decouple the JK and env variable type checking :).
            ), f"envvar configs only support (optional) booleans or strings, {self.value_type} is neither"
        else:
            assert isinstance(
                self.default, bool
nit - We should also allow Optional[bool] here (not blocking this PR).
Added.
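Widening the check to accept Optional[bool] as well as plain bool could look like this sketch (the `is_bool_like` helper is hypothetical, not the PR's actual code). Optional[bool] is Union[bool, None], so the helper unwraps the Union and tests what remains.

```python
from typing import Optional, Union, get_args, get_origin

def is_bool_like(value_type) -> bool:
    """True for bool and Optional[bool], False otherwise."""
    if value_type is bool:
        return True
    if get_origin(value_type) is Union:
        # Drop NoneType and require exactly bool to remain.
        args = [a for a in get_args(value_type) if a is not type(None)]
        return args == [bool]
    return False

print(is_bool_like(bool))            # True
print(is_bool_like(Optional[bool]))  # True
print(is_bool_like(int))             # False
```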
Force-pushed fc17aad to ee54917
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 jobs have failed, first few of them are: trunk / linux-focal-rocm6.3-py3.10 / test (default, 2, 2, linux.rocm.gpu.2), trunk / linux-focal-rocm6.3-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4). Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 3 jobs have failed, first few of them are: trunk / linux-focal-rocm6.3-py3.10 / test (default, 2, 2, linux.rocm.gpu.2), trunk / linux-focal-rocm6.3-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4), trunk / win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral). Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -f The ROCM errors are AMD things and don't have anything to do with my changes, and Windows errors are also unrelated.
❌ 🤖 pytorchbot command failed:
@pytorchbot merge -f "The ROCM errors are AMD things and don't have anything to do with my changes, and Windows errors are also unrelated."
Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
So we don't get unexpected behavior when higher typed values are passed in. Pull Request resolved: pytorch#145454. Approved by: https://github.com/c00w, https://github.com/jamesjwu