8000 Inline CORINFO_HELP_ARRADDR_ST helper call, remove WriteBarrier FCall by EgorBo · Pull Request #117583 · dotnet/runtime · GitHub
[go: up one dir, main page]

Skip to content

Inline CORINFO_HELP_ARRADDR_ST helper call, remove WriteBarrier FCall#117583

Open
EgorBo wants to merge 23 commits intodotnet:mainfrom
EgorBo:remove-wb-fcall
Open

Inline CORINFO_HELP_ARRADDR_ST helper call, remove WriteBarrier FCall#117583
EgorBo wants to merge 23 commits intodotnet:mainfrom
EgorBo:remove-wb-fcall

Conversation

@EgorBo
Copy link
Member
@EgorBo EgorBo commented Jul 13, 2025

Inline write barriers with covariance checks (in case if inliner decides that it's profitable) - this allows to remove redundant write barriers (previously we could only do that in early phases) and range checks (previously they were inside the helper call and now JIT can fold them).

Related: #9159

Benchmark

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);

public class Benchmarks
{
    static object[] _strings = new string[4];
    static object _x = "";

    [Benchmark]
    public void AssignString()
    {
        var arr = _strings;
        if (arr.Length >= 4)
        {
            // We now can remove write barriers and redundant range checks if StelemRef gets inlined
            arr[0] = "";
            arr[1] = "";
            arr[2] = "";
            arr[3] = "";
        }
    }

    [Benchmark]
    public void SwapElements()
    {
        var arr = _strings;
        (arr[1], arr[0]) = (arr[0], arr[1]);
    }

    [Benchmark]
    public void SingleAssignmentCns()
    {
        _strings[0] = "";
    }

    [Benchmark]
    public void SingleAssignmentVar()
    {
        _strings[0] = _x;
    }
}

Linux-AMD (Genoa):

Method Toolchain Mean Error Ratio
AssignString Main 7.5633 ns 0.0022 ns 1.00
AssignString PR 1.3160 ns 0.0033 ns 0.17
SwapElements Main 2.9979 ns 0.0008 ns 1.00
SwapElements PR 1.0471 ns 0.0012 ns 0.35
SingleAssignmentCns Main 1.9075 ns 0.0005 ns 1.00
SingleAssignmentCns PR 0.2723 ns 0.0002 ns 0.14
SingleAssignmentVar Main 1.9080 ns 0.0003 ns 1.00
SingleAssignmentVar PR 1.6356 ns 0.0005 ns 0.86

Linux-ARM64 (Cobalt100):

Method Toolchain Mean Error Ratio
AssignString Main 10.3296 ns 0.0032 ns 1.00
AssignString PR 1.8738 ns 0.0006 ns 0.18
SwapElements Main 2.6604 ns 0.0012 ns 1.00
SwapElements PR 1.1768 ns 0.0024 ns 0.44
SingleAssignmentCns Main 1.9655 ns 0.0023 ns 1.00
SingleAssignmentCns PR 0.3821 ns 0.0003 ns 0.19
SingleAssignmentVar Main 1.8030 ns 0.0013 ns 1.00
SingleAssignmentVar PR 0.7219 ns 0.0006 ns 0.40

@EgorBo
Copy link
Member Author
EgorBo commented Jul 13, 2025

@MihuBot

@EgorBo

This comment was marked as resolved.

@jkotas
Copy link
Member
jkotas commented Jul 13, 2025

JIT should still see that the target is on the heap

Right, you would have to fix the JIT to be able to prove this in order to make this change without introducing regressions.

we might in the future, so the barrier will need to be made checked.

The barrier needs to be made checked when executed against stack allocated copy. It would be tough tradeoff to de-optimize all write barriers to enable more stack allocation.

@EgorBo
Copy link
Member Author
EgorBo commented Jul 13, 2025

Right, you would have to fix the JIT to be able to prove this in order to make this change without introducing regressions.

I believe I've already done that in this PR. The previous logic was a bit conservative - it tried to handle only knownHeapAddr + cns while in fact I think we can safely say that if we have object assignment for knownHeapAddr + anything we can use unchecked WB too (it's UB if the destination is not on the heap).

@EgorBo
Copy link
Member Author
EgorBo commented Jul 13, 2025

Although, this one can't be made unchecked by the JIT due to NoInlining:

image

And if I inline it, then StelemRef becomes too big and I was planning to inline it in the JIT (although, not sure, might be inlined in fact)

@EgorBo
Copy link
Member Author
EgorBo commented Jul 13, 2025

Ugh. another problem is that JIT never does tail-calling for helper-calls 😢

image

@EgorBo
Copy link
Member Author
EgorBo commented Jul 13, 2025

@MihuBot -arm

@EgorBo EgorBo changed the title Remove WriteBarrier FCall Inline CORINFO_HELP_ARRADDR_ST helper call, remove WriteBarrier FCall Jul 13, 2025
@EgorBo
Copy link
Member Author
EgorBo commented Jul 13, 2025

@EgorBot -amd -arm -windows_intel

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);

public class Benchmarks
{
    static object[] _strings = new string[4];

    [Benchmark]
    public void AssignString()
    {
        var arr = _strings;
        if (arr.Length >= 4)
        {
            arr[0] = "";
            arr[1] = "";
            arr[2] = "";
            arr[3] = "";
        }
    }

    [Benchmark]
    public void SwapElements()
    {
        var arr = _strings;
        (arr[1], arr[0]) = (arr[0], arr[1]);
    }

    [Benchmark]
    public void SingleAssignment()
    {
        _strings[0] = "";
    }
}

@EgorBo EgorBo force-pushed the remove-wb-fcall branch from db666a2 to 872e969 Compare July 16, 2025 01:52
…timeHelpers.cs

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@jkotas
Copy link
Member
jkotas commented Sep 3, 2025

it is fixed by this change, once it inlines, it is smart enough to handle that, e.g example from that issue:

What's the code for this method with this change?

[MethodImpl(MethodImplOptions.NoInlining)]
void Swap(object[] a, int i, int j)
{
    object temp = a[i];
    a[i] = a[j];        // store check optimized by jit to simple write barrier
    a[j] = temp;        // store check not optimized
}

There are two array store invariance checks in this method. One of them is optimized before this change. Is this change enabling the second one to be optimized (the improvement tracked by #9159)? I do not see what makes it possible.

@EgorBo
Copy link
Member Author
EgorBo commented Sep 3, 2025

What's the code for this method with this change?

The codegen (win-x64) for this method is:

G_M39739_IG01:  ;; offset=0x0000
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 40
       mov      rbx, rcx
       mov      esi, r8d
						;; size=14 bbWeight=1 PerfScore 4.75
G_M39739_IG02:  ;; offset=0x000E
       mov      edi, dword ptr [rbx+0x08]
       cmp      edx, edi
       jae      SHORT G_M39739_IG09
       mov      ecx, edx
       mov      rbp, gword ptr [rbx+8*rcx+0x10]
       cmp      esi, edi
       jae      SHORT G_M39739_IG09
       mov      edx, esi
       mov      rdx, gword ptr [rbx+8*rdx+0x10]
       lea      rcx, bword ptr [rbx+8*rcx+0x10]
       call     CORINFO_HELP_ASSIGN_REF
       mov      ecx, esi
       mov      edx, edi
       cmp      rdx, rcx
       jbe      SHORT G_M39739_IG06
       lea      rcx, bword ptr [rbx+8*rcx+0x10]
       mov      rdx, qword ptr [rbx]
       mov      rdx, qword ptr [rdx+0x30]
       test     rbp, rbp
       je       SHORT G_M39739_IG07
       cmp      rdx, qword ptr [rbp]
       jne      SHORT G_M39739_IG08
						;; size=67 bbWeight=1 PerfScore 23.00
G_M39739_IG03:  ;; offset=0x0051
       mov      rdx, rbp
       call     CORINFO_HELP_ASSIGN_REF
						;; size=8 bbWeight=1 PerfScore 1.25
G_M39739_IG04:  ;; offset=0x0059
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25
G_M39739_IG05:  ;; offset=0x005A
       add      rsp, 40
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       ret      
						;; size=9 bbWeight=1 PerfScore 3.25
G_M39739_IG06:  ;; offset=0x0063
       call     [System.Runtime.CompilerServices.CastHelpers:ThrowIndexOutOfRangeException()]
       int3     
						;; size=7 bbWeight=0 PerfScore 0.00
G_M39739_IG07:  ;; offset=0x006A
       xor      rax, rax
       mov      gword ptr [rcx], rax
       jmp      SHORT G_M39739_IG04
						;; size=7 bbWeight=0 PerfScore 0.00
G_M39739_IG08:  ;; offset=0x0071
       mov      r8, 0x7FF8FA732308      ; System.Object[]
       cmp      qword ptr [rbx], r8
       je       SHORT G_M39739_IG03
       mov      r8, rbp
       call     [System.Runtime.CompilerServices.CastHelpers:StelemRef_Helper(byref,ptr,System.Object)]
       jmp      SHORT G_M39739_IG04
						;; size=26 bbWeight=0 PerfScore 0.00
G_M39739_IG09:  ;; offset=0x008B
       call     CORINFO_HELP_RNGCHKFAIL
       int3     
						;; size=6 bbWeight=0 PerfScore 0.00

; Total bytes of code 145

@EgorBo
Copy link
Member Author
EgorBo commented Sep 3, 2025

I guess it's not entirely eliminated and the if (elementType != RuntimeHelpers.GetMethodTable(obj)) check and probably some things can be better CSE'd but it makes it 2-3x faster in microbenchmarks. The element type check might require teaching JIT about GetMethodTable(array)->ElementType field

@jkotas
Copy link
Member
jkotas commented Sep 3, 2025

In order to eliminate the second invariant array store check, the JIT would have to figure out what the value came from the same array. Helper inlining won't help you with that.

I do not think that this PR is fixing #9159 that's specifically about eliminating invariant array store checks in code that is swapping elements of an array. Invariant array store checks can be very expensive depending on the type of the array and type of the item being stored into the array. It is why it is interesting to eliminate them.

@jkotas
Copy link
Member
jkotas commented Sep 3, 2025

(The idea behind this change sounds reasonable to me otherwise.)

@EgorBo
Copy link
Member Author
EgorBo commented Sep 3, 2025

I do not think that this PR is fixing #9159

You're right I just misremembered that it did fix it when I was working on this. Changed the wording.

Copy link
Member
@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

EgorBo and others added 2 commits September 5, 2025 13:21
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
EgorBo and others added 2 commits March 13, 2026 03:52
# Conflicts:
#	src/coreclr/jit/namedintrinsiclist.h
#	src/coreclr/vm/ecalllist.h
#	src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/RuntimeHelpers.cs
The WriteBarrier_Helper define and all callers were removed in this PR,
so this PORTABILITY_ASSERT stub is now dead code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
4D1C
Copilot AI review requested due to automatic review settings March 13, 2026 03:15
Copy link
Contributor
Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.

BEA4

Comment on lines +3645 to +3650
case NI_System_Runtime_CompilerServices_RuntimeHelpers_WriteBarrier:
{
GenTree* val = impPopStack().val;
GenTree* dst = impPopStack().val;
retNode = gtNewStoreIndNode(TYP_REF, dst, val, GTF_IND_TGT_HEAP);
break;
}

InternalCalls.RhpAssignRef(ref element, obj);
RuntimeHelpers.WriteBarrier(ref element, obj);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants

0