8000 [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 by jayfoad · Pull Request #105550 · llvm/llvm-project · GitHub
[go: up one dir, main page]

Skip to content

Conversation

jayfoad
Copy link
Contributor
@jayfoad jayfoad commented Aug 21, 2024

When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.

Copy link
Contributor Author
jayfoad commented Aug 21, 2024

@jayfoad jayfoad marked this pull request as ready for review August 21, 2024 16:43
@llvmbot
Copy link
Member
llvmbot commented Aug 21, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.


Full diff: https://github.com/llvm/llvm-project/pull/105550.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir (+5-5)
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 4262e7b5d9c25..eafe20be17d5b 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -2390,7 +2390,7 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
   }
   if (!ST->hasVscnt() && HasVMemStore && !HasVMemLoad && UsesVgprLoadedOutside)
     return true;
-  return HasVMemLoad && UsesVgprLoadedOutside;
+  return HasVMemLoad && UsesVgprLoadedOutside && ST->hasVmemWriteVgprInOrder();
 }
 
 bool SIInsertWaitcnts::runOnMachineFunction(MachineFunction &MF) {
diff --git a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
index bdef55ab956a0..0ddd2aa285b26 100644
--- a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
@@ -295,7 +295,7 @@ body:             |
 # GFX12-LABEL: waitcnt_vm_loop2
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
@@ -342,7 +342,7 @@ body:             |
 # GFX12-LABEL: waitcnt_vm_loop2_store
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
@@ -499,9 +499,9 @@ body:             |
 # GFX12-LABEL: waitcnt_vm_loop2_reginterval
 # GFX12-LABEL: bb.0:
 # GFX12: GLOBAL_LOAD_DWORDX4
-# GFX12: S_WAIT_LOADCNT 0
-# GFX12-LABEL: bb.1:
 # GFX12-NOT: S_WAIT_LOADCNT 0
+# GFX12-LABEL: bb.1:
+# GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
 name:            waitcnt_vm_loop2_reginterval
 body:             |
@@ -600,7 +600,7 @@ body:             |
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:

@jayfoad jayfoad force-pushed the users/foad/vmem-write-vgpr-in-order_split branch from 9a2103d to c3cbf18 Compare August 22, 2024 10:43
Base automatically changed from users/foad/vmem-write-vgpr-in-order_split to main August 22, 2024 10:46
@jayfoad jayfoad force-pushed the users/foad/vmem-write-vgpr-in-order_split_split branch from e53f758 to 283d345 Compare August 22, 2024 10:48
When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.
@jayfoad jayfoad force-pushed the users/foad/vmem-write-vgpr-in-order_split_split branch from 283d345 to ba06857 Compare August 23, 2024 08:30
@jayfoad jayfoad merged commit fa2dccb into main Aug 23, 2024
@jayfoad jayfoad deleted the users/foad/vmem-write-vgpr-in-order_split_split branch August 23, 2024 09:31
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Aug 23, 2024
…m#105550)

When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.

(cherry picked from commit fa2dccb)
cjdb pushed a commit to cjdb/llvm-project that referenced this pull request Aug 23, 2024
…m#105550)

When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.
tru pushed a commit to llvmbot/llvm-project that referenced this pull request Sep 1, 2024
…m#105550)

When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.

(cherry picked from commit fa2dccb)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

0