Reuse Regexp ptr when recompiling · luke-gruber/ruby@d42b9ff · GitHub

Commit d42b9ff

Reuse Regexp ptr when recompiling
When matching an incompatible encoding, the Regexp needs to recompile. If `usecnt == 0`, then we can reuse the `ptr` because nothing else is using it. This avoids allocating another `regex_t`.

This speeds up matches that switch to incompatible encodings by 15%.

Branch:

```
Regex#match? with different encoding
                          1.431M (± 1.3%) i/s -      7.264M in   5.076153s
Regex#match? with same encoding
                         16.858M (± 1.1%) i/s -     85.347M in   5.063279s
```

Base:

```
Regex#match? with different encoding
                          1.248M (± 2.0%) i/s -      6.342M in   5.083151s
Regex#match? with same encoding
                         16.377M (± 1.1%) i/s -     82.519M in   5.039504s
```

Script:

```
regex = /foo/
str1 = "日本語"
str2 = "English".force_encoding("ASCII-8BIT")

Benchmark.ips do |x|
  x.report("Regex#match? with different encoding") do |times|
    i = 0
    while i < times
      regex.match?(str1)
      regex.match?(str2)
      i += 1
    end
  end

  x.report("Regex#match? with same encoding") do |times|
    i = 0
    while i < times
      regex.match?(str1)
      i += 1
    end
  end
end
```
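The reuse described in the message can be pictured as "compile into a stack temporary, then overwrite the existing struct only on success". The following is a minimal sketch of that pattern and not the code from this commit: `pattern_t`, `pattern_init`, `pattern_free_body`, and `pattern_recompile_in_place` are hypothetical stand-ins that only mimic the roles of `regex_t`, `onig_new_without_alloc`, and `onig_free_body`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for regex_t: a header struct that owns a heap "body". */
typedef struct {
    char *body;     /* owned compiled data (here just a copy of the source) */
    int encoding;   /* hypothetical encoding id */
} pattern_t;

/* Compile into caller-provided storage; the header itself is not allocated.
 * Returns 0 on success, nonzero on error. An empty source simulates a
 * compile error. */
static int pattern_init(pattern_t *p, const char *src, int encoding)
{
    assert(p != NULL);
    p->body = NULL;
    p->encoding = encoding;
    if (src == NULL || *src == '\0') return -1;
    size_t n = strlen(src) + 1;
    p->body = malloc(n);
    if (p->body == NULL) return -2;
    memcpy(p->body, src, n);
    return 0;
}

/* Release what the header owns, but not the header itself. */
static void pattern_free_body(pattern_t *p)
{
    assert(p != NULL);
    free(p->body);
    p->body = NULL;
}

/* Recompile *p for a new encoding without allocating a new header.
 * On failure, *p is left exactly as it was. */
static int pattern_recompile_in_place(pattern_t *p, const char *src, int encoding)
{
    pattern_t tmp;
    int r = pattern_init(&tmp, src, encoding);
    if (r != 0) {
        pattern_free_body(&tmp);  /* clean up the partial temporary */
        return r;
    }
    pattern_free_body(p);         /* drop the old compiled data */
    *p = tmp;                     /* struct copy: p now owns tmp's body */
    return 0;
}
```

Building into the temporary first means a failed recompile never leaves the shared struct half-initialised, and the address of the struct stays stable for anything that holds a pointer to it.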
1 parent a542512 commit d42b9ff

1 file changed: +25 -10 lines changed

re.c

Lines changed: 25 additions & 10 deletions
@@ -1606,9 +1606,30 @@ rb_reg_prepare_re(VALUE re, VALUE str)
     const char *ptr;
     long len;
     RSTRING_GETMEM(unescaped, ptr, len);
-    r = onig_new(&reg, (UChar *)ptr, (UChar *)(ptr + len),
-                 reg->options, enc,
-                 OnigDefaultSyntax, &einfo);
+
+    /* If there are no other users of this regex, then we can directly overwrite it. */
+    if (RREGEXP(re)->usecnt == 0) {
+        regex_t tmp_reg;
+        r = onig_new_without_alloc(&tmp_reg, (UChar *)ptr, (UChar *)(ptr + len),
+                                   reg->options, enc,
+                                   OnigDefaultSyntax, &einfo);
+
+        if (r) {
+            /* There was an error so perform cleanups. */
+            onig_free_body(&tmp_reg);
+        }
+        else {
+            onig_free_body(reg);
+            /* There are no errors so set reg to tmp_reg. */
+            *reg = tmp_reg;
+        }
+    }
+    else {
+        r = onig_new(&reg, (UChar *)ptr, (UChar *)(ptr + len),
+                     reg->options, enc,
+                     OnigDefaultSyntax, &einfo);
+    }
+
     if (r) {
         onig_error_code_to_str((UChar*)err, r, &einfo);
         rb_reg_raise(pattern, RREGEXP_SRC_LEN(re), err, re);
@@ -1634,13 +1655,7 @@ rb_reg_onig_match(VALUE re, VALUE str,
 
     if (!tmpreg) RREGEXP(re)->usecnt--;
     if (tmpreg) {
-        if (RREGEXP(re)->usecnt) {
-            onig_free(reg);
-        }
-        else {
-            onig_free(RREGEXP_PTR(re));
-            RREGEXP_PTR(re) = reg;
-        }
+        onig_free(reg);
     }
 
     if (result < 0) {
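
The second hunk drops the post-match pointer swap, since the in-place reuse now happens before the match whenever the use count is zero. A rough sketch of that lifecycle follows, under stated assumptions: `compiled_t`, `regexp_obj`, `match_begin`, and `match_end` are hypothetical names, recompilation is reduced to setting an encoding id, and the increment of the use count at the start of a match is assumed (only the decrement is visible in the diff).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a compiled regex_t. */
typedef struct {
    int encoding;
} compiled_t;

/* Hypothetical stand-in for the Regexp object. */
typedef struct {
    compiled_t *ptr;  /* cached compiled form, in the role of RREGEXP_PTR(re) */
    int usecnt;       /* matches currently running on ptr */
} regexp_obj;

/* Pick the compiled form for a match in `encoding`.
 * Sets *is_tmp to 1 when the result is a private copy the caller must free. */
static compiled_t *match_begin(regexp_obj *re, int encoding, int *is_tmp)
{
    assert(re != NULL && re->ptr != NULL && is_tmp != NULL);
    *is_tmp = 0;
    if (re->ptr->encoding != encoding) {
        if (re->usecnt == 0) {
            /* Nobody else is using ptr: overwrite it in place. */
            re->ptr->encoding = encoding;
        }
        else {
            /* ptr is busy: compile a private temporary instead. */
            compiled_t *tmp = malloc(sizeof *tmp);
            if (tmp == NULL) return NULL;
            tmp->encoding = encoding;
            *is_tmp = 1;
            return tmp;
        }
    }
    re->usecnt++;  /* assumed: the cached form is marked as in use */
    return re->ptr;
}

/* Finish a match: temporaries are simply freed, never swapped in. */
static void match_end(regexp_obj *re, compiled_t *reg, int is_tmp)
{
    assert(re != NULL && reg != NULL);
    if (is_tmp) free(reg);
    else re->usecnt--;
}
```

With this shape, a temporary only exists while another match holds the cached form, so there is nothing worth promoting afterwards and the cleanup path reduces to a single free.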
