8000 String#gsub! Elide MatchData allocation when we know it can't escape by byroot · Pull Request #12801 · ruby/ruby · GitHub
[go: up one dir, main page]

Skip to content

String#gsub! Elide MatchData allocation when we know it can't escape #12801

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account rel 8000 ated emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 24, 2025

Conversation

byroot
Copy link
Member
@byroot byroot commented Feb 24, 2025

In gsub is used with a string replacement or a map that doesn't have a default proc, we know for sure no code can cause the MatchData to escape the gsub call.

In such case, we still have to allocate a new MatchData because we don't know what is the lifetime of the backref, but for any subsequent match we can re-use the MatchData we allocated ourselves, reducing allocations significantly.

This partially fixes [Misc #20652], except when a block is used, and partially reduce the performance impact of
abc0304cb28cb9dcc3476993bc487884c139fd11 / [Bug #17507]

compare-ruby: ruby 3.5.0dev (2025-02-24T09:44:57Z master 5cf146399f) +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-02-24T10:58:27Z gsub-elude-match da966636e9) +PRISM [arm64-darwin24]
warming up....

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|escape           |      3.577k|    3.697k|
|                 |           -|     1.03x|
|escape_bin       |      5.869k|    6.743k|
|                 |           -|     1.15x|
|escape_utf8      |      3.448k|    3.738k|
|                 |           -|     1.08x|
|escape_utf8_bin  |      6.361k|    7.267k|
|                 |           -|     1.14x|

@byroot byroot requested a review from jeremyevans February 24, 2025 11:09
In gsub is used with a string replacement or a map that doesn't
have a default proc, we know for sure no code can cause the MatchData
to escape the `gsub` call.

In such case, we still have to allocate a new MatchData because we
don't know what is the lifetime of the backref, but for any subsequent
match we can re-use the MatchData we allocated ourselves, reducing
allocations significantly.

This partially fixes [Misc #20652], except when a block is used,
and partially reduce the performance impact of
abc0304 / [Bug #17507]

```
compare-ruby: ruby 3.5.0dev (2025-02-24T09:44:57Z master 5cf1463) +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-02-24T10:58:27Z gsub-elude-match da966636e9) +PRISM [arm64-darwin24]
warming up....

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|escape           |      3.577k|    3.697k|
|                 |           -|     1.03x|
|escape_bin       |      5.869k|    6.743k|
|                 |           -|     1.15x|
|escape_utf8      |      3.448k|    3.738k|
|                 |           -|     1.08x|
|escape_utf8_bin  |      6.361k|    7.267k|
|                 |           -|     1.14x|
```

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
@byroot
Copy link
Member Author
byroot commented Feb 24, 2025

In such case, we still have to allocate a new MatchData because we don't know what is the lifetime of the backref,

I just figured out that the MATCH_BUSY flag tells us whether $~ was accessed or not, so I think we can do 0 alloc in most cases.

Copy link
Contributor
@jeremyevans jeremyevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I believe this optimization is safe for the reasons you mention, and the use of MATCH_BUSY should hopefully allow this to apply to most gsub/gsub! calls, as well as other methods like scan.

@byroot byroot merged commit 97e6ad4 into ruby:master Feb 24, 2025
75 checks passed
@byroot byroot deleted the gsub-elude-match branch February 24, 2025 17:32
peterzhu2118 added a commit to peterzhu2118/ruby that referenced this pull request Mar 11, 2025
ruby#12801 changed regexp matches to reuse
the backref, which causes memory to leak if the original registers of the
match is not freed.

For example, the following script leaks memory:

    10.times do
      1_000_000.times do
        "aaaaaaaaaaa".gsub(/a/, "")
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    774256
    1535152
    2297360
    3059280
    3821296
    4583552
    5160304
    5091456
    5114256
    4980192

After:

    12480
    11440
    11696
    11632
    11632
    11760
    11824
    11824
    11824
    11888
peterzhu2118 added a commit that referenced this pull request Mar 12, 2025
#12801 changed regexp matches to reuse
the backref, which causes memory to leak if the original registers of the
match is not freed.

For example, the following script leaks memory:

    10.times do
      1_000_000.times do
        "aaaaaaaaaaa".gsub(/a/, "")
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    774256
    1535152
    2297360
    3059280
    3821296
    4583552
    5160304
    5091456
    5114256
    4980192

After:

    12480
    11440
    11696
    11632
    11632
    11760
    11824
    11824
    11824
    11888
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0