Fix get_source_partitions when weights are tied #142446
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142446
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit 5134ff9 with merge base 45411d1, the following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D66998592
Summary:

Fix pytorch#142035

When Linear module params are tied to another parameter, like this:

```
class SimpleLinearModel(nn.Module):
    def __init__(self, input_size, output_size):
        super(SimpleLinearModel, self).__init__()
        # Define a linear layer
        self.linear = nn.Linear(input_size, output_size)
        self.tied_weight = self.linear.weight

    def forward(self, x):
        # Forward pass through the linear layer
        b = self.tied_weight + 1
        return self.linear(x), b
```

we get a graph like the one below:

```
graph():
    %p_tied_weight : [num_users=0] = placeholder[target=p_tied_weight]
    %p_linear_weight : [num_users=2] = placeholder[target=p_linear_weight]
    %p_linear_bias : [num_users=1] = placeholder[target=p_linear_bias]
    %x : [num_users=1] = placeholder[target=x]
    %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%p_linear_weight, 1), kwargs = {})
    %linear : [num_users=1] = call_function[target=torch.ops.aten.linear.default](args = (%x, %p_linear_weight, %p_linear_bias), kwargs = {})
    return (linear, add)
```

Notice the `%p_linear_weight : [num_users=2]` entry: because of the tie, the weight placeholder has two users. When we get source partitions, we should exclude attribute nodes like `p_linear_weight` from the outputs. A real-world example where people do something like this is in pytorch#142035.

Test Plan:

```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r test_module_partitioner_weight_tied
```

Differential Revision: D66998592
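For context, here is a minimal sketch (not part of the PR) of how the behavior can be observed with `get_source_partitions`. It assumes the returned dictionary is keyed by `nn.Linear` and uses the tied-weight module from the summary above; all names are illustrative.

```python
import torch
import torch.nn as nn
from torch.fx.passes.utils.source_matcher_utils import get_source_partitions


class SimpleLinearModel(nn.Module):
    # Same tied-weight module as in the summary above.
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)
        self.tied_weight = self.linear.weight  # tied to the Linear weight

    def forward(self, x):
        return self.linear(x), self.tied_weight + 1


model = SimpleLinearModel(4, 2)
ep = torch.export.export(model, (torch.randn(1, 4),))

# Look up the Linear partitions in the exported graph.
partitions = get_source_partitions(ep.graph_module.graph, [nn.Linear])

for partition in partitions.get(nn.Linear, []):
    # Before this fix, the tied weight placeholder (p_linear_weight) could show
    # up in output_nodes because it has a user outside the partition; with the
    # fix, only the actual linear result should be reported as an output.
    print(partition.output_nodes)
```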
a18a945 to fa70cd0 (Compare)
fa70cd0 to 7098027 (Compare)
7098027 to 4799d59 (Compare)
4799d59 to 9a2e624 (Compare)
cc @andrewor14 @tarun292 I'm not sure if there's some partitioner relying on this behavior, but it looks like CI is green so far
Summary: unchanged, now with `#build_targets_regex[fbcode//bolt/nn/.*]`.

Test Plan: unchanged.

Reviewed By: angelayi

Differential Revision: D66998592
9a2e624 to 5134ff9 (Compare)
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)
2 similar comments
Merge started. Your change will be merged while ignoring the following 1 check: Lint / Test. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)
18 similar comments
Merge started. Your change will be merged while ignoring the following 1 check: Lint / Test. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Summary:
Fix #142035 and #143621
When Linear module params are tied to another parameter, like this:
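```
class SimpleLinearModel(nn.Module):
    def __init__(self, input_size, output_size):
        super(SimpleLinearModel, self).__init__()
        # Define a linear layer
        self.linear = nn.Linear(input_size, output_size)
        self.tied_weight = self.linear.weight

    def forward(self, x):
        # Forward pass through the linear layer
        b = self.tied_weight + 1
        return self.linear(x), b
```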
We get a graph like the one below:
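```
graph():
    %p_tied_weight : [num_users=0] = placeholder[target=p_tied_weight]
    %p_linear_weight : [num_users=2] = placeholder[target=p_linear_weight]
    %p_linear_bias : [num_users=1] = placeholder[target=p_linear_bias]
    %x : [num_users=1] = placeholder[target=x]
    %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%p_linear_weight, 1), kwargs = {})
    %linear : [num_users=1] = call_function[target=torch.ops.aten.linear.default](args = (%x, %p_linear_weight, %p_linear_bias), kwargs = {})
    return (linear, add)
```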
Notice that `%p_linear_weight : [num_users=2]`. When we get source partitions, we should exclude attribute nodes like `p_linear_weight` from the outputs. A real-world example where people do something like this is in #142035.
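This is not the literal change in this PR, but a minimal sketch of the kind of filtering implied, assuming a hypothetical helper that decides which nodes of a partition count as outputs:

```python
from typing import List

from torch.fx import Node


def non_attribute_outputs(partition_nodes: List[Node]) -> List[Node]:
    """Hypothetical helper: report only computation nodes as partition outputs.

    A node counts as an output if some user lies outside the partition, but
    placeholder/get_attr nodes (parameters such as p_linear_weight) are
    skipped even when a tied weight gives them extra users in the graph.
    """
    partition = set(partition_nodes)
    outputs = []
    for node in partition_nodes:
        if node.op in ("placeholder", "get_attr"):
            continue  # tied parameters are inputs to the partition, not outputs
        if any(user not in partition for user in node.users):
            outputs.append(node)
    return outputs
```

The actual fix lives in `get_source_partitions`; the sketch only illustrates the intended output-node semantics.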
Test Plan:
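```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r test_module_partitioner_weight_tied
```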
Differential Revision: D66998592
cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv