Fix get_source_partitions when weights are tied by yushangdi · Pull Request #142446 · pytorch/pytorch · GitHub

Fix get_source_partitions when weights are tied #142446


Closed
wants to merge 1 commit

Conversation

yushangdi
Contributor
@yushangdi commented Dec 10, 2024

Summary:
Fix #142035 and #143621

When a Linear module's parameters are tied to another parameter, like this:

```
import torch.nn as nn

class SimpleLinearModel(nn.Module):
    def __init__(self, input_size, output_size):
        super(SimpleLinearModel, self).__init__()
        # Define a linear layer
        self.linear = nn.Linear(input_size, output_size)
        # Tie a second attribute to the linear layer's weight
        self.tied_weight = self.linear.weight

    def forward(self, x):
        # Forward pass through the linear layer, plus a direct use of the tied weight
        b = self.tied_weight + 1
        return self.linear(x), b
```

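For context, a minimal sketch of how such a graph can be produced; the export call and the example sizes below are illustrative, not taken from this PR:

```
import torch

# Hypothetical sizes, chosen only for illustration.
model = SimpleLinearModel(input_size=4, output_size=2)
example_inputs = (torch.randn(1, 4),)

# torch.export lifts parameters, including the tied weight, into
# placeholder nodes (p_tied_weight, p_linear_weight, p_linear_bias).
exported = torch.export.export(model, example_inputs)
print(exported.graph_module.graph)
```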
We get a graph like the one below:

```
graph():
    %p_tied_weight : [num_users=0] = placeholder[target=p_tied_weight]
    %p_linear_weight : [num_users=2] = placeholder[target=p_linear_weight]
    %p_linear_bias : [num_users=1] = placeholder[target=p_linear_bias]
    %x : [num_users=1] = placeholder[target=x]
    %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%p_linear_weight, 1), kwargs = {})
    %linear : [num_users=1] = call_function[target=torch.ops.aten.linear.default](args = (%x, %p_linear_weight, %p_linear_bias), kwargs = {})
    return (linear, add)
```

Notice that `%p_linear_weight : [num_users=2]`.

When we get source partitions, we should exclude attribute nodes like `p_linear_weight` from the partition outputs.
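A minimal sketch of partitioning the exported graph; the expected behavior described in the comments reflects this fix, and the exact partition contents may differ:

```
from torch.fx.passes.utils.source_matcher_utils import get_source_partitions

# Group nodes by the source nn.Module class that produced them.
partitions = get_source_partitions(exported.graph_module.graph, [nn.Linear])

for partition in partitions[nn.Linear]:
    # p_linear_weight has two users (the linear and the add), but it is a
    # parameter placeholder, so it should not be reported as a partition
    # output; only the linear's result should appear here.
    print(partition.output_nodes)
```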

A real-world example where people do something like this is in #142035.

Test Plan:

```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r test_module_partitioner_weight_tied
```
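For OSS checkouts, the same test can likely be run directly through the FX test suite; the test file path below is an assumption rather than something stated in this PR:

```
python -m pytest test/fx/test_source_matcher_utils.py -k test_module_partitioner_weight_tied
```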

Differential Revision: D66998592

cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv

pytorch-bot bot commented Dec 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142446

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 5134ff9 with merge base 45411d1:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D66998592

yushangdi added a commit to yushangdi/pytorch that referenced this pull request Dec 10, 2024

yushangdi added a commit to yushangdi/pytorch that referenced this pull request Dec 18, 2024

yushangdi added a commit to yushangdi/pytorch that referenced this pull request Dec 18, 2024

yushangdi added a commit to yushangdi/pytorch that referenced this pull request Dec 18, 2024

Contributor
@angelayi left a comment


cc @andrewor14 @tarun292 I'm not sure if there's some partitioner relying on this behavior, but it looks like CI is green so far

@pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Jan 2, 2025
@facebook-github-bot
Contributor

@pytorchbot merge -i

(Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)

2 similar comments

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following check: Lint / Test run_test.py is usable without boto3

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@facebook-github-bot
Contributor

@pytorchbot merge -i

(Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

@facebook-github-bot
Contributor

@pytorchbot merge -i

(Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)

18 similar comments

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following check: Lint / Test run_test.py is usable without boto3

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.


Successfully merging this pull request may close these issues.

[PT2E][Quant] prepare_pt2e model with tied weights will raise ValueError when using export_for_training