[CIR][ThroughMLIR] Improving eraseIfSafe and canonical ForOp lowering #1660

felixdaas · 2025-06-02T15:57:11Z

Improving eraseIfSafe:

as far as I understand it eraseIfSafe should intuativly check if all memref load/store ops are created, which obtain offsets from the memref.reinterpret_cast in the eraseList. If so the operations in the eraseList are erased, otherwise they are kept until all cir.load/store ops relying on them are lowered.

One challenge here is that we can't actually do this by checking the uses of memref.reinterpret_cast operations, as their results aren't actually used in the created memref load/store ops (the base alloca result found via findBaseAndIndices is used). Because of this, this base alloca result is passed as the newAddr Value to eraseIfSafe in the cir.load/cir.store lowerings.

Currently the eraseIfSafe function counts all memref.load/store values that use this base address:

clangir/clang/lib/CIR/Lowering/ThroughMLIR/LowerCIRToMLIR.cpp

Lines 215 to 218 in 6e5fa09

    
           for (auto *user : newAddr.getUsers()) { 
        
             if (isa<mlir::memref::LoadOp>(*user) || isa<mlir::memref::StoreOp>(*user)) 
        
               ++newUsedNum; 
        
           }

The problem here is that this also counts all the other memref.load/store ops, which store/load to/from the base address, but don't use the memref.reinterpret_cast ops to obtain the offsets. Because of this the lowering fails if multiple store/loads to/from the same array are performed in the original C code as in the example of issue #1405. Because we are erroneously counting all the other memref.load/store ops, the newUsedNum is (for the later stores) larger than oldUsedNum (only the uses of the cir.ptr_stride op) and the memref.reinterpret_cast ops are not removed.

This PR contains a first attempt to fix this (i.e only count the memref.load/store ops, which obtain offsets from the memref.reinterpret_cast in the eraseList). I only count memref.load/store ops, if the first offset value, corresponds to the offset value in the last memref.reinterpret_cast.

Limitations of this PR:
This fixes the indirect lowering of the example in issue #1405 and also works for other test I made where multiple store/loads to/from the same array, but assumes two thing to be the case:

The cir.const used as the stride in the cir.ptr_str is not reused in other cir.ptr_stride ops
Only the last cir.ptr_stride can have multiple uses (for multidim arrays)

Both of these assumptions seem to be true for the C-Code I testet (for the translation of accesses to C/C++ Arrays to cir ops). But the eraseIfSafe function might need to be changed/further improved in the future to support cases, where those assumptions fail.

For example if an optimization is run on cir where the cir.const ops with the same value are reused for the different cir.ptr_stride ops, the indirect lowering would still fail. Or if in a multidimensional array a subarray is accessed, e.g.

int arr[3][4];
int *row = arr[1];

(Note: I pretty sure for this it isn't suffiicient to just extend the function to check if all offset value, corre 8000 sponds to the offset value in the all memref.reinterpret_cast, but we would probably need to seperatly check for each memref.reinterpret_cast if it can be removed (instead of removing all or non in the eraseList))

Improving the lowering of canonical ForOps:

While debugging issue #1405 I noticed a few thing that I think could be improved in the canonical ForOp lowering:

There is one edge case, where the forOp should not be marked as canonical in my opinion:

  int i;
  for (i = 0; i < 100; i++);
  i += 10;

(with the current lowering this for is marked canonical but since i is replaced by the induction var of the scf.for op and the actual memory representing i is not updated i has a wrong value after the for. This is avoided when we lower this for as a non canonical for.)
2. I think we can directly replace the loads to the CIR.IV with the scf.IV and not create the dummy arith.add IV, 0 op (I think this might be a relic from previous MLIR version where the replaceOp only worked with operations (not values). This make the IR more readable and easier to understand. If I'm missing somethink here and the arith.add IV, 0 has a purpose I'm not seeing let me know.
3. When implementing the change in 1, we know that in a canonical for the induction variable is definied inside the for and is only valid here. Because of this and since we replace the loads of the cir IV with the scf.IV we can remove the unneccessary alloca and store op created for the cir.IV

(These changes only show up in an non-optimized binary, but aren't relevant when running clang with optimations, I still think they improve the readability + understandability of the core ir)

Running SCFPreparePass cir pass always when lowering throughMLIR:

I also noticed, that we are currently only running the SCFPreparePass when we are printing the result of the cir to core dialect translation.

clangir/clang/lib/CIR/CodeGen/CIRPasses.cpp

Lines 84 to 85 in 6e5fa09

    
           if (emitCore) 
        
             pm.addPass(mlir::createSCFPreparePass());

Because of this compiling to an object file (or llvm IR) with the indirect lowering path fails, if the code contains a canonical for. I suggest always running this pass, when were going throughMLIR.

Passing through is_nontemporal in loads/store lowerings:

Since the corresponding memref ops also have this attribute it's basically just passing through a boolean (and doesn't need any special handling, I think). Even tho there is probably no practical application now I think this might avoid bugs/confusion in the future. If there is any reason not to do this let me know.

I also added a new test case for arrays, adjusted the canonical forOp test to reflect the made changes and combined the non canonical forOp tests into one file and added a test case for the edge case describe before.

(Note: if I find the time I will try to run the SingleSource test suite with the throughMLIR lowering in the next week to get a better idea, where we are with this pipeline. In general I agree with everything discussed in issue #1219, but I think we probably can already add more support in regard to arrays (and maybe pointers) with the existing mlir core constructs)

github-actions · 2025-06-02T16:00:21Z

✅ With the latest revision this PR passed the C/C++ code formatter.

bcardosolopes · 2025-06-04T21:53:38Z

For some reasons we got one canceled tests, can you rebase so we can run those again?

…l forOp lowering (TODO: cleanup + write tests)

…r tests

felixdaas · 2025-06-06T17:13:23Z

So I did run the SingleSource test suite with the throughMLIR lowering (I just followed the guide at the clang ir website and also added a CMake flag to enable the indirect lowering).

I got the following results:
before this PR:

Testing Time: 3.61s

Total Discovered Tests: 1833
Passed : 374 (20.40%)
Failed : 11 (0.60%)
Executable Missing: 1448 (79.00%)

with this PR:

Testing Time: 4.00s

Total Discovered Tests: 1833
Passed : 420 (22.91%)
Failed : 10 (0.55%)
Executable Missing: 1403 (76.54%)

I did check a few of the tests that failed before the pr and are now working manually (just because i wasn't to sure if I'm testing it correctly) and for those it checked out (i.e they compiled with the code of this PR and not with the code before)

One other thing I noticed which (to some extend) explains why so many executables can't be compiled with the indirect lowering is (besides that is it still at an early stage), that every executable which contains a printf call fails to compile since the func.func ops currenctly can't work with variadic functions.

So I'm pretty sure these test results are realistic and I'm testing it correctly.

bcardosolopes · 2025-06-09T14:02:38Z

Thanks for the test numbers, way to go! it'd probably be nice to put this on clangir.org too

felixdaas · 2025-06-14T17:32:01Z

it'd probably be nice to put this on clangir.org too

Sounds good. So I would do this by creating a PR to the gh-pages repo, right? I might take a look at that sometime the next weeks.

felixdaas force-pushed the arraySupport branch from 846858d to d1ef2e2 Compare June 3, 2025 13:25

felixdaas marked this pull request as ready for review June 3, 2025 13:52

felixdaas requested review from lanza and bcardosolopes as code owners June 3, 2025 13:52

bcardosolopes approved these changes Jun 4, 2025

View reviewed changes

felixdaas added 5 commits June 5, 2025 16:42

pass nonTemporal Attribute in cir.store to memref.store lowering

83b9de8

pass nonTemporal Attribute in cir.load to memref.load lowering

975893e

first version of fix for eraseifsafe + small improvements to canonica…

a81b1e9

…l forOp lowering (TODO: cleanup + write tests)

improvement to canonical forOp lowering + added/improved new array/fo…

78a81f8

…r tests

run SCFPreparePass cir pass always when lowering throughMLIR

3963265

felixdaas force-pushed the arraySupport branch from d1ef2e2 to 3963265 Compare June 5, 2025 14:59

bcardosolopes merged commit 2afbab5 into llvm:main Jun 9, 2025
9 checks passed

felixdaas deleted the arraySupport branch July 22, 2025 19:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CIR][ThroughMLIR] Improving eraseIfSafe and canonical ForOp lowering #1660

[CIR][ThroughMLIR] Improving eraseIfSafe and canonical ForOp lowering #1660

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

	for (auto *user : newAddr.getUsers()) {
	if (isa<mlir::memref::LoadOp>(user) \|\| isa<mlir::memref::StoreOp>(user))
	++newUsedNum;
	}

[CIR][ThroughMLIR] Improving eraseIfSafe and canonical ForOp lowering #1660

[CIR][ThroughMLIR] Improving eraseIfSafe and canonical ForOp lowering #1660

Uh oh!

Conversation

Uh oh!

Improving eraseIfSafe:

Improving the lowering of canonical ForOps:

Running SCFPreparePass cir pass always when lowering throughMLIR:

Passing through is_nontemporal in loads/store lowerings:

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!