DistributedModelParallel resharding Interface #2945
This pull request was exported from Phabricator. Differential Revision: D73049934
Summary:

Finally! This adds a DistributedModelParallel (DMP) interface for resharding. Most of the changes here are to enable proper testing of DMP.

## Main changes

### 1. DMP reshard API
* Calls the underlying sharder of each sharded module to reshard it.

### 2. Proper testing
* A multi-rank test that generates a full model and uses the DMP interface. Currently it only tests table-wise (TW) sharding.
* The test is called from `test_dynamic_sharding.py` -> `test_model_parallel.py` -> `test_sharding.py`, which follows the same structure as the existing DMP unit tests.
* The test checks correctness as follows:
```
1. Generate a global model and inputs.
2. Create 2 identical local models based on the global model.
3. Use the planner to generate a sharding plan for the local model.
4. Based on the planner output, generate a second, different sharding plan.
5. Shard local models 1 and 2 through DMP with plans 1 and 2, respectively.
6. Reshard (dynamic sharding API) model 1 with plan 2.
7. Generate predictions for the local models and compare them to the global model's prediction. They are expected to be the same.
```
* The test checks that the optimizer state is correctly preserved during resharding.
* The test is set up with other variables to be set once more functionality is enabled for dynamic sharding, e.g. `variable_batch_size`.

### 3. Helper functions for testing
* `get_sharding_constructor_from_type`: enables setting the sharding type for each unit test.
* `compare_model_pred_one_step`: used only for debugging, to get more information on whether models are identical after resharding or after running the initial step.
* `compare_model_weights`: also used only for debugging.

### 4. Small refactoring in the `update_shards` call

Differential Revision: D73049934
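The seven-step correctness check can be illustrated with a toy, pure-Python sketch. It assumes table-wise sharding over 2 simulated ranks and a model that only sum-pools embedding rows. `ToySharded`, its `reshard` method, `make_global_model` and the hand-written plans are hypothetical stand-ins, not TorchRec's DistributedModelParallel API or its planner. The sketch also moves each table's optimizer state together with its weights, which is the property the test checks for the optimizer.

```python
import random

# Hypothetical toy model: table name -> rows of floats. None of these names
# (ToySharded, reshard, the plans) are TorchRec's actual DMP API.
Tables = dict[str, list[list[float]]]
Batch = dict[str, list[int]]


def make_global_model(num_tables: int, rows: int, dim: int, seed: int = 0) -> Tables:
    rng = random.Random(seed)
    return {
        f"t{i}": [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]
        for i in range(num_tables)
    }


def predict_global(tables: Tables, batch: Batch) -> float:
    # Sum-pool the looked-up rows of each table and add everything into one score.
    total = 0.0
    for name, ids in batch.items():
        total += sum(sum(tables[name][i]) for i in ids)
    return total


class ToySharded:
    """Table-wise (TW) sharding: each table lives entirely on one rank."""

    def __init__(self, tables: Tables, plan: dict[str, int], world_size: int):
        self.plan = dict(plan)
        # shards[rank][table] = (weights, per-row optimizer state such as momentum)
        self.shards: list[dict] = [{} for _ in range(world_size)]
        for name, weights in tables.items():
            state = ([list(r) for r in weights], [0.0] * len(weights))
            self.shards[plan[name]][name] = state

    def reshard(self, new_plan: dict[str, int]) -> None:
        # Move weights together with optimizer state for every table whose rank changed.
        for name, new_rank in new_plan.items():
            old_rank = self.plan[name]
            if new_rank != old_rank:
                self.shards[new_rank][name] = self.shards[old_rank].pop(name)
        self.plan = dict(new_plan)

    def predict(self, batch: Batch) -> float:
        total = 0.0
        for name, ids in batch.items():
            weights, _ = self.shards[self.plan[name]][name]  # lookup on the owning rank
            total += sum(sum(weights[i]) for i in ids)
        return total


world_size = 2
global_model = make_global_model(num_tables=4, rows=8, dim=3)              # step 1
batch = {"t0": [1, 2], "t1": [0], "t2": [7, 7], "t3": [3]}
plan_1 = {"t0": 0, "t1": 0, "t2": 1, "t3": 1}                              # step 3 (planner stand-in)
plan_2 = {name: (rank + 1) % world_size for name, rank in plan_1.items()}  # step 4
model_1 = ToySharded(global_model, plan_1, world_size)                    # steps 2 and 5
model_2 = ToySharded(global_model, plan_2, world_size)
model_1.shards[0]["t0"][1][0] = 0.5  # pretend a training step updated t0's optimizer state
model_1.reshard(plan_2)                                                     # step 6
expected = predict_global(global_model, batch)                              # step 7
print(abs(model_1.predict(batch) - expected) < 1e-9,
      abs(model_2.predict(batch) - expected) < 1e-9)  # prints: True True
```

Building plan 2 by shifting every table to the next rank guarantees that the two plans differ for every table. This mirrors the requirement that plan 2 be "different" from the planner's output.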
Summary:

Finally! This adds a DistributedModelParallel (DMP) interface for resharding. Most of the changes here are to enable proper testing of DMP.

## Main changes

### 1. DMP reshard API
* Calls the underlying sharder of each sharded module to reshard it.

### 2. Proper testing
* A multi-rank test that generates a full model and uses the DMP interface. Currently it only tests table-wise (TW) sharding.
* The test is called from `test_dynamic_sharding.py` -> `test_model_parallel.py` -> `test_sharding.py`, which follows the same structure as the existing DMP unit tests.
* The test checks correctness as follows:
```
1. Generate a global model and inputs.
2. Create 2 identical local models based on the global model.
3. Use the planner to generate a sharding plan for the local model.
4. Based on the planner output, generate a second, different sharding plan.
5. Shard local models 1 and 2 through DMP with plans 1 and 2, respectively.
6. Reshard (dynamic sharding API) model 1 with plan 2.
7. Generate predictions for the local models and compare them to the global model's prediction. They are expected to be the same.
```
* The test checks that the optimizer state is correctly preserved during resharding.
* The test is set up with other variables to be set once more functionality is enabled for dynamic sharding, e.g. `variable_batch_size`.

### 3. Helper functions for testing
* `get_sharding_constructor_from_type`: enables setting the sharding type for each unit test.
* `compare_model_pred_one_step`: used only for debugging, to get more information on whether models are identical after resharding or after running the initial step.
* `compare_model_weights`: also used only for debugging.

### 4. Bug fixes in the `update_shards` call
* Namely, the input dist was not properly updated. This caused errors when testing the reshard function in the *middle of training*, because the input dist depends on the shard placements.

Reviewed By: aliafzal

Differential Revision: D73049934
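The following toy sketch shows why a stale input dist breaks resharding in the middle of training. It assumes, hypothetically, that the input dist is nothing more than a feature-to-rank routing table derived from the shard placements. `ToyModule`, `build_input_dist`, `distribute`, `runs_after_reshard` and this `update_shards` are illustrative stand-ins, not TorchRec's implementation.

```python
# Hypothetical sketch: an "input dist" maps each feature to the rank that owns
# its table and buckets the batch per rank. These names are illustrative only.
Batch = dict[str, list[int]]


def build_input_dist(plan: dict[str, int]) -> dict[str, int]:
    # Routing depends only on shard placement, so it must be rebuilt after a reshard.
    return dict(plan)


def distribute(routing: dict[str, int], batch: Batch, world_size: int) -> list[Batch]:
    per_rank: list[Batch] = [{} for _ in range(world_size)]
    for feature, ids in batch.items():
        per_rank[routing[feature]][feature] = ids
    return per_rank


class ToyModule:
    def __init__(self, plan: dict[str, int], world_size: int, refresh_input_dist: bool):
        self.world_size = world_size
        self.refresh_input_dist = refresh_input_dist
        self.placement = dict(plan)            # table -> owning rank
        self.routing = build_input_dist(plan)  # built once at sharding time

    def update_shards(self, new_plan: dict[str, int]) -> None:
        self.placement = dict(new_plan)        # tables now live on their new ranks
        if self.refresh_input_dist:            # the fix: keep routing in sync
            self.routing = build_input_dist(new_plan)

    def forward(self, batch: Batch) -> None:
        for rank, features in enumerate(distribute(self.routing, batch, self.world_size)):
            for feature in features:
                owner = self.placement[feature]
                if owner != rank:
                    raise RuntimeError(f"{feature} routed to rank {rank}, lives on rank {owner}")


def runs_after_reshard(refresh: bool) -> bool:
    module = ToyModule({"t0": 0, "t1": 1}, world_size=2, refresh_input_dist=refresh)
    batch = {"t0": [3], "t1": [5, 6]}
    module.forward(batch)                     # a training step under plan 1
    module.update_shards({"t0": 1, "t1": 0})  # reshard in the middle of training
    try:
        module.forward(batch)
        return True
    except RuntimeError:
        return False


print(runs_after_reshard(False), runs_after_reshard(True))  # prints: False True
```

Without the refresh, the first forward pass succeeds because routing and placement agree. The second pass sends `t0` ids to rank 0 even though the table now lives on rank 1.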