8000 [JIT] Optimize DCE by storing a MemoryLocations for an entire set<Value*> by davidberard98 · Pull Request #153645 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

[JIT] Optimize DCE by storing a MemoryLocations for an entire set<Value*> #153645

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

davidberard98
Copy link
Contributor
@davidberard98 davidberard98 commented May 15, 2025

Summary:
TL;DR: make DCE faster by replacing a Set<Value*> with a MemoryLocations sparse bitset (representing all the memory locations stored by the collection of all values in the set).

Details
The goal of this PR is to optimize this function from AliasDb:

bool AliasDb::writesToAlias(Node* n, const ValueSet& vs) const {
  const auto writtenTo = getWrites(n);
  if (writtenTo.empty()) {
    return false;
  }

  MemoryLocations locs;
  for (const auto v : vs) {
    auto it = elementMap_.find(v);
    if (it != elementMap_.end()) {
      const auto& vlocs = memoryDAG_->ge
8000
tMemoryLocations(it->second);
      if (writtenTo.intersects(vlocs)) {
        return true;
      }
    }
  }

  return false;
}

In the DCE use case, we have a ValueSet of live values, into which we insert Value*s; and sometimes need to check whether a node mutates any of the live values using writesToAlias.

Looping through all the values in the ValueSet and indexing into the elementMap_ is slow; so if we can pre-compute the MemoryLocations set, this speeds up the function. In some large model examples, I see ~15-25x speedups from this change.

Implementation: To avoid exposing too many details of AliasDb, I introduce a friend class ValueAndMemoryLocationSet, which is an insert-only set of Values, which also maintains the corresponding MemoryLocations.

Then in AliasDb, I use ValueAndMemoryLocationSet if we're using AliasDb for analysis, and otherwise use a Set<Value*> if we don't have AliasDb.

Test Plan: Rely on unit tests.

Differential Revision: D74827086

cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel

Copy link
pytorch-bot bot commented May 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153645

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2b6ee35 with merge base ea17cd0 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: jit release notes category label May 15, 2025
@facebook-github-bot facebook-github-bot added oncall: jit Add this issue/PR to JIT oncall triage queue fb-exported labels May 15, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D74827086

davidberard98 added a commit to davidberard98/pytorch that referenced this pull request May 15, 2025
…ue*> (pytorch#153645)

Summary:

**TL;DR**: make DCE faster by replacing a Set<Value*> with a MemoryLocations sparse bitset (representing all the memory locations stored by the collection of all values in the set).

**Details**
The goal of this PR is to optimize this function from AliasDb:

```
bool AliasDb::writesToAlias(Node* n, const ValueSet& vs) const {
  const auto writtenTo = getWrites(n);
  if (writtenTo.empty()) {
    return false;
  }

  MemoryLocations locs;
  for (const auto v : vs) {
    auto it = elementMap_.find(v);
    if (it != elementMap_.end()) {
      const auto& vlocs = memoryDAG_->getMemoryLocations(it->second);
      if (writtenTo.intersects(vlocs)) {
        return true;
      }
    }
  }

  return false;
}
```

In the DCE use case, we have a ValueSet of live values, into which we insert `Value*`s; and sometimes need to check whether a node mutates any of the live values using `writesToAlias`.

Looping through all the values in the ValueSet and indexing into the elementMap_ is slow; so if we can pre-compute the MemoryLocations set, this speeds up the function. In some large model examples, I see ~15-25x speedups from this change.

**Implementation**: To avoid exposing too many details of AliasDb, I introduce a friend class `ValueAndMemoryLocationSet`, which is an insert-only set of Values, which also maintains the corresponding MemoryLocations.

Then in AliasDb, I use `ValueAndMemoryLocationSet` if we're dealing with

Test Plan: Rely on unit tests.

Differential Revision: D74827086
davidberard98 added a commit to davidberard98/pytorch that referenced this pull request May 15, 2025
…ue*> (pytorch#153645)

Summary:

**TL;DR**: make DCE faster by replacing a Set<Value*> with a MemoryLocations sparse bitset (representing all the memory locations stored by the collection of all values in the set).

**Details**
The goal of this PR is to optimize this function from AliasDb:

```
bool AliasDb::writesToAlias(Node* n, const ValueSet& vs) const {
  const auto writtenTo = getWrites(n);
  if (writtenTo.empty()) {
    return false;
  }

  MemoryLocations locs;
  for (const auto v : vs) {
    auto it = elementMap_.find(v);
    if (it != elementMap_.end()) {
      const auto& vlocs = memoryDAG_->getMemoryLocations(it->second);
      if (writtenTo.intersects(vlocs)) {
        return true;
      }
    }
  }

  return false;
}
```

In the DCE use case, we have a ValueSet of live values, into which we insert `Value*`s; and sometimes need to check whether a node mutates any of the live values using `writesToAlias`.

Looping through all the values in the ValueSet and indexing into the elementMap_ is slow; so if we can pre-compute the MemoryLocations set, this speeds up the function. In some large model examples, I see ~15-25x speedups from this change.

**Implementation**: To avoid exposing too many details of AliasDb, I introduce a friend class `ValueAndMemoryLocationSet`, which is an insert-only set of Values, which also maintains the corresponding MemoryLocations.

Then in AliasDb, I use `ValueAndMemoryLocationSet` if we're dealing with

Test Plan: Rely on unit tests.

Differential Revision: D74827086
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D74827086

davidberard98 added a commit to davidberard98/pytorch that referenced this pull request May 15, 2025
…ue*> (pytorch#153645)

Summary:
Pull Request resolved: pytorch#153645

**TL;DR**: make DCE faster by replacing a Set<Value*> with a MemoryLocations sparse bitset (representing all the memory locations stored by the collection of all values in the set).

**Details**
The goal of this PR is to optimize this function from AliasDb:

```
bool AliasDb::writesToAlias(Node* n, const ValueSet& vs) const {
  const auto writtenTo = getWrites(n);
  if (writtenTo.empty()) {
    return false;
  }

  MemoryLocations locs;
  for (const auto v : vs) {
    auto it = elementMap_.find(v);
    if (it != elementMap_.end()) {
      const auto& vlocs = memoryDAG_->getMemoryLocations(it->second);
      if (writtenTo.intersects(vlocs)) {
        return true;
      }
    }
  }

  return false;
}
```

In the DCE use case, we have a ValueSet of live values, into which we insert `Value*`s; and sometimes need to check whether a node mutates any of the live values using `writesToAlias`.

Looping through all the values in the ValueSet and indexing into the elementMap_ is slow; so if we can pre-compute the MemoryLocations set, this speeds up the function. In some large model examples, I see ~15-25x speedups from this change.

**Implementation**: To avoid exposing too many details of AliasDb, I introduce a friend class `ValueAndMemoryLocationSet`, which is an insert-only set of Values, which also maintains the corresponding MemoryLocations.

Then in AliasDb, I use `ValueAndMemoryLocationSet` if we're dealing with

Test Plan: Rely on unit tests.

Differential Revision: D74827086
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D74827086

davidberard98 added a commit to davidberard98/pytorch that referenced this pull request May 15, 2025
…ue*> (pytorch#153645)

Summary:
Pull Request resolved: pytorch#153645

**TL;DR**: make DCE faster by replacing a Set<Value*> with a MemoryLocations sparse bitset (representing all the memory locations stored by the collection of all values in the set).

**Details**
The goal of this PR is to optimize this function from AliasDb:

```
bool AliasDb::writesToAlias(Node* n, const ValueSet& vs) const {
  const auto writtenTo = getWrites(n);
  if (writtenTo.empty()) {
    return false;
  }

  MemoryLocations locs;
  for (const auto v : vs) {
    auto it = elementMap_.find(v);
    if (it != elementMap_.end()) {
      const auto& vlocs = memoryDAG_->getMemoryLocations(it->second);
      if (writtenTo.intersects(vlocs)) {
        return true;
      }
    }
  }

  return false;
}
```

In the DCE use case, we have a ValueSet of live values, into which we insert `Value*`s; and sometimes need to check whether a node mutates any of the live values using `writesToAlias`.

Looping through all the values in the ValueSet and indexing into the elementMap_ is slow; so if we can pre-compute the MemoryLocations set, this speeds up the function. In some large model examples, I see ~15-25x speedups from this change.

**Implementation**: To avoid exposing too many details of AliasDb, I introduce a friend class `ValueAndMemoryLocationSet`, which is an insert-only set of Values, which also maintains the corresponding MemoryLocations.

Then in AliasDb, I use `ValueAndMemoryLocationSet` if we're dealing with

Test Plan: Rely on unit tests.

Differential Revision: D74827086
@davidberard98 davidberard98 added the ciflow/trunk Trigger trunk jobs on your pull request label May 15, 2025
davidberard98 added a commit to davidberard98/pytorch that referenced this pull request May 15, 2025
…ue*> (pytorch#153645)

Summary:

**TL;DR**: make DCE faster by replacing a Set<Value*> with a MemoryLocations sparse bitset (representing all the memory locations stored by the collection of all values in the set).

**Details**
The goal of this PR is to optimize this function from AliasDb:

```
bool AliasDb::writesToAlias(Node* n, const ValueSet& vs) const {
  const auto writtenTo = getWrites(n);
  if (writtenTo.empty()) {
    return false;
  }

  MemoryLocations locs;
  for (const auto v : vs) {
    auto it = elementMap_.find(v);
    if (it != elementMap_.end()) {
      const auto& vlocs = memoryDAG_->getMemoryLocations(it->second);
      if (writtenTo.intersects(vlocs)) {
        return true;
      }
    }
  }

  return false;
}
```

In the DCE use case, we have a ValueSet of live values, into which we insert `Value*`s; and sometimes need to check whether a node mutates any of the live values using `writesToAlias`.

Looping through all the values in the ValueSet and indexing into the elementMap_ is slow; so if we can pre-compute the MemoryLocations set, this speeds up the function. In some large model examples, I see ~15-25x speedups from this change.

**Implementation**: To avoid exposing too many details of AliasDb, I introduce a friend class `ValueAndMemoryLocationSet`, which is an insert-only set of Values, which also maintains the corresponding MemoryLocations.

Then in AliasDb, I use `ValueAndMemoryLocationSet` if we're dealing with

Test Plan: Rely on unit tests.

Differential Revision: D74827086
davidberard98 added a commit to davidberard98/pytorch that referenced this pull request May 15, 2025
…ue*> (pytorch#153645)

Summary:

**TL;DR**: make DCE faster by replacing a Set<Value*> with a MemoryLocations sparse bitset (representing all the memory locations stored by the collection of all values in the set).

**Details**
The goal of this PR is to optimize this function from AliasDb:

```
bool AliasDb::writesToAlias(Node* n, const ValueSet& vs) const {
  const auto writtenTo = getWrites(n);
  if (writtenTo.empty()) {
    return false;
  }

  MemoryLocations locs;
  for (const auto v : vs) {
    auto it = elementMap_.find(v);
    if (it != elementMap_.end()) {
      const auto& vlocs = memoryDAG_->getMemoryLocations(it->second);
      if (writtenTo.intersects(vlocs)) {
        return true;
      }
    }
  }

  return false;
}
```

In the DCE use case, we have a ValueSet of live values, into which we insert `Value*`s; and sometimes need to check whether a node mutates any of the live values using `writesToAlias`.

Looping through all the values in the ValueSet and indexing into the elementMap_ is slow; so if we can pre-compute the MemoryLocations set, this speeds up the function. In some large model examples, I see ~15-25x speedups from this change.

**Implementation**: To avoid exposing too many details of AliasDb, I introduce a friend class `ValueAndMemoryLocationSet`, which is an insert-only set of Values, which also maintains the corresponding MemoryLocations.

Then in AliasDb, I use `ValueAndMemoryLocationSet` if we're dealing with

Test Plan: Rely on unit tests.

Differential Revision: D74827086
…ue*> (pytorch#153645)

Summary:
Pull Request resolved: pytorch#153645

**TL;DR**: make DCE faster by replacing a Set<Value*> with a MemoryLocations sparse bitset (representing all the memory locations stored by the collection of all values in the set).

**Details**
The goal of this PR is to optimize this function from AliasDb:

```
bool AliasDb::writesToAlias(Node* n, const ValueSet& vs) const {
  const auto writtenTo = getWrites(n);
  if (writtenTo.empty()) {
    return false;
  }

  MemoryLocations locs;
  for (const auto v : vs) {
    auto it = elementMap_.find(v);
    if (it != elementMap_.end()) {
      const auto& vlocs = memoryDAG_->getMemoryLocations(it->second);
      if (writtenTo.intersects(vlocs)) {
        return true;
      }
    }
  }

  return false;
}
```

In the DCE use case, we have a ValueSet of live values, into which we insert `Value*`s; and sometimes need to check whether a node mutates any of the live values using `writesToAlias`.

Looping through all the values in the Value
BCE6
Set and indexing into the elementMap_ is slow; so if we can pre-compute the MemoryLocations set, this speeds up the function. In some large model examples, I see ~15-25x speedups from this change.

**Implementation**: To avoid exposing too many details of AliasDb, I introduce a friend class `ValueAndMemoryLocationSet`, which is an insert-only set of Values, which also maintains the corresponding MemoryLocations.

Then in AliasDb, I use `ValueAndMemoryLocationSet` if we're dealing with

Test Plan: Rely on unit tests.

Differential Revision: D74827086
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D74827086

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request fb-exported oncall: jit Add this issue/PR to JIT oncall triage queue release notes: jit release notes category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0