8000 Allow BUILD/NEWOBJ instruction for items added via torch.serialization.add_safe_globals by mikaylagawarecki · Pull Request #129251 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

Allow BUILD/NEWOBJ instruction for items added via torch.serialization.add_safe_globals #129251

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

mikaylagawarecki
Copy link
Contributor
@mikaylagawarecki mikaylagawarecki commented Jun 21, 2024

Previously, allowlisting functions/classes via torch.serialization.add_safe_globals(obj) for the weights_only Unpickler had the following effect:

  • For a GLOBAL instruction, GLOBAL obj.__module__ obj.__name__ would be allowed and translated back to obj to be pushed back to the stack.
  • For a REDUCE instruction where we expect the stack to contain func and args, func is allowed if it was added via add_safe_globals

However, it did not have an effect on BUILD and NEWOBJ instructions

Some classes may be rebuilt via NEWOBJ instruction, which indicates that their constructor should be used to rebuild the class.

Further, a BUILD instruction might be used if an object's __reduce__/__reduce_ex__ returns a non-None value for state. Which indicates a __setstate__ or __dict__.update.

This PR makes sure that adding objects to the allowlist will also allow NEWOBJ and BUILD instructions for them.

In particular, the update for NEWOBJ should unblock allowlisting of ScaledMMConfig in float8_experimental @drisspg

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @albanD

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
Copy link
pytorch-bot bot commented Jun 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129251

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5c9cdb3 with merge base 0acd09a (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]
[ghstack-poisoned]
@mikaylagawarecki mikaylagawarecki changed the title Allow NEWOBJ instruction for items added via torch.serialization.add_safe_globals Allow BUILD/NEWOBJ instruction for items added via torch.serialization.add_safe_globals Jun 21, 2024
@mikaylagawarecki mikaylagawarecki requested a review from albanD June 21, 2024 21:28
@mikaylagawarecki mikaylagawarecki marked this pull request as ready for review June 21, 2024 21:28
[ghstack-poisoned]
[ghstack-poisoned]
@pytorch-bot pytorch-bot bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Jun 21, 2024
@mikaylagawarecki mikaylagawarecki removed the oncall: distributed Add this issue/PR to distributed oncall triage queue label Jun 21, 2024
@pytorch-bot pytorch-bot bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Jun 21, 2024
[ghstack-poisoned]
Copy link
Collaborator
@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.
Might want to clarify add_safe_globals doc to mention that the given object might be called but also state is going to get populated.

[ghstack-poisoned]
@mikaylagawarecki mikaylagawarecki added module: python frontend For issues relating to PyTorch's Python frontend topic: improvements topic category labels Jun 24, 2024
pytorchmergebot pushed a commit that referenced this pull request Jun 25, 2024
Also changes default for `weights_only` to `None` per comment below (hence the `suppress-bc-linter` tag)

Pull Request resolved: #129239
Approved by: https://github.com
8000
/albanD
ghstack dependencies: #129244, #129251
pytorchmergebot pushed a commit that referenced this pull request Jun 25, 2024
mikaylagawarecki added a commit to mikaylagawarecki/pytorch that referenced this pull request Jun 25, 2024
…safe_globals

ghstack-source-id: 34a8fc3
Pull Request resolved: pytorch#129251

(cherry picked from commit 50b888d)
atalman pushed a commit that referenced this pull request Jun 26, 2024
* Fix allowlisting of builtins for weights_only unpickler

ghstack-source-id: de329c7
Pull Request resolved: #129244

(cherry picked from commit cc99c01)

* Allow NEWOBJ instruction for items added via torch.serialization.add_safe_globals

ghstack-source-id: 34a8fc3
Pull Request resolved: #129251

(cherry picked
8000
 from commit 50b888d)

* Add warning for weights_only

ghstack-source-id: ffa772c
Pull Request resolved: #129239

(cherry picked from commit b3f9aa3f8f4c03b40fed53423d4a0a9340e3bd09)

* Add example for torch.serialization.add_safe_globals

ghstack-source-id: 6dc3275
Pull Request resolved: #129396

(cherry picked from commit ed8c36eda0f4dcf7b1d9c5eb2fb1cdccdf3fee6e)
atalman added a commit that referenced this pull request Jun 26, 2024
atalman added a commit that referenced this pull request Jun 26, 2024
…4" (#129571)

Revert "Cherry pick #129244, #129251, #129239, 129396 into release/2.4 (#129478)"

This reverts commit 22a4d46.
mikaylagawarecki added a commit to mikaylagawarecki/pytorch that referenced this pull request Jun 26, 2024
…n.add_safe_globals (pytorch#129251)

Previously, allowlisting functions/classes via `torch.serialization.add_safe_globals(obj)` for the `weights_only` Unpickler had the following effect:

- For a [`GLOBAL`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1926-L1939) instruction, `GLOBAL obj.__module__ obj.__name__` would be allowed and translated back to obj to be pushed back to the stack.
- For a [`REDUCE`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1926-L1982) instruction where we expect the stack to contain `func` and `args`, `func` is allowed if it was added via `add_safe_globals`

However, it did not have an effect on `BUILD` and `NEWOBJ` instructions

Some classes may be rebuilt via [`NEWOBJ`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L2091-L2104) instruction, which indicates that their constructor should be used to rebuild the class.

Further, a [`BUILD`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1984-L2007) instruction might be used if an object's `__reduce__`/`__reduce_ex__` returns a non-None value for `state`. Which indicates a `__setstate__` or `__dict__.update`.

**This PR makes sure that adding objects to the allowlist will also allow `NEWOBJ` and `BUILD` instructions for them.**

In particular, the update for `NEWOBJ` should unblock allowlisting of [`ScaledMMConfig`](https://github.com/pytorch-labs/float8_experimental/blob/d4ade877dff327ea7f51e91f7cc218ae956e8cfd/float8_experimental/float8_tensor.py#L26-L30) in float8_experimental @drisspg

Pull Request resolved: pytorch#129251
Approved by: https://github.com/albanD
ghstack dependencies: pytorch#129244
atalman pushed a commit that referenced this pull request Jun 27, 2024
* Fix allowlisting of builtins for weights_only unpickler (#129244)

Since we use [`DEFAULT_PROTOCOL=2`](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L62), some functions/classes that were renamed from python 2-->3 will be pickled with their python2 name. This PR ensures that when a mod `GLOBAL <python2_mod>.<python2_name> ` is encountered, [following the strategy used by pickle](https://github.com/python/cpython/blob/main/Lib/pickle.py#L1590C13-L1593C63) it is properly mapped to `<python3_mod>.<python3_name>`.

This fix ensures that `add_safe_globals` works properly for such functions/classes (i.e. users will allowlist the python3 func and the weights_only unpickler will do the appropriate translation when checking whether a class was allowlisted).

An example is as follows:
`__builtin__` was named to `builtins`, see the [release notes for Python 3.0](https://docs.python.org/3/whatsnew/3.0.html)

> Renamed module `__builtin__` to [`builtins`](https://docs.python.org/3/library/builtins.html#module-builtins) (removing the underscores, adding an ‘s’). The __builtins__ variable found in most global namespaces is unchanged. To modify a builtin, you should use [builtins](https://docs.python.org/3/library/builtins.html#module-builtins), not `__builtins__`!

However, since we use [`DEFAULT_PROTOCOL=2`](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L62), builtins will be pickled with their module string as `__builtin__`.

```python
>>> import pickle
>>> import pickletools
>>> print.__module__
'builtins'
>>> with open('print.pkl', 'wb') as f:
>>>      pickle.dump(print, f, protocol=2) # 2 because this is the default protocol used by pytorch
>>> with open('print.pkl', 'rb') as f:
>>>     pickletools.dis(f)
0: \x80 PROTO      2
2: c    GLOBAL     '__builtin__ print' # pickle saves the module string as __builtin__ !!! :(
21: q    BINPUT     0
23: .    STOP
```

Pull Request resolved: #129244
Approved by: https://github.com/albanD

* Allow BUILD/NEWOBJ instruction for items added via torch.serialization.add_safe_globals (#129251)

Previously, allowlisting functions/classes via `torch.serialization.add_safe_globals(obj)` for the `weights_only` Unpickler had the following effect:

- For a [`GLOBAL`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1926-L1939) instruction, `GLOBAL obj.__module__ obj.__name__` would be allowed and translated back to obj to be pushed back to the stack.
- For a [`REDUCE`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1926-L1982) instruction where we expect the stack to contain `func` and `args`, `func` is allowed if it was added via `add_safe_globals`

However, it did not have an effect on `BUILD` and `NEWOBJ` instructions

Some classes may be rebuilt via [`NEWOBJ`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L2091-L2104) instruction, which indicates that their constructor should be used to rebuild the class.

Further, a [`BUILD`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1984-L2007) instruction might be used if an object's `__reduce__`/`__reduce_ex__` returns a non-None value for `state`. Which indicates a `__setstate__` or `__dict__.update`.

**This PR makes sure that adding objects to the allowlist will also allow `NEWOBJ` and `BUILD` instructions for them.**

In particular, the update for `NEWOBJ` should unblock allowlisting of [`ScaledMMConfig`](https://github.com/pytorch-labs/float8_experimental/blob/d4ade877dff327ea7f51e91f7cc218ae956e8cfd/float8_experimental/float8_tensor.py#L26-L30) in float8_experimental @drisspg

Pull Request resolved: #129251
Approved by: https://github.com/albanD
ghstack dependencies: #129244

* Remove dependency on private _compat_pickle in CPython

ghstack-source-id: 7d6ee40
Pull Request resolved: #129509
@github-actions github-actions bot deleted the gh/mikaylagawarecki/226/head branch July 26, 2024 01:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Merged module: python frontend For issues relating to PyTorch's Python frontend oncall: distributed Add this issue/PR to distributed oncall triage queue topic: improvements topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0