Fix73878 by zahiraam · Pull Request #21499 · intel/llvm

zahiraam · 2026-03-12T14:27:31Z

No description provided.

No behavior change.

The user can now manually toggle the light or dark theme instead of waiting for the system theme to change. Also fixes a typo that caused some overflow issues even when there was no content to cause an overflow.

…#182896) Current tooling for the WebAssembly component model uses import modules and names such as `$root` and `[thread-index]`. Importing these from assembly files requires support for non-valid identifiers in `.import_name` and `.import_module` directives. This PR adds support for specifying those as strings, e.g.: ```asm .import_module __wasm_component_model_builtin_thread_index, "$root" .import_name __wasm_component_model_builtin_thread_index, "[thread-index]" ```

The tests for mlir-reduce are currently scattered. To centralize the tests for mlir-reduce, I added the split-input-file feature to mlir-reduce.It is part of llvm/llvm-project#184974.

…ass` switch (#185072) This removes the `wasm-disable-fix-irreducible-control-flow-pass` switch. It was originally added in #67715 as a way to avoid the potentially absurd compile times the pass used to bring. However with the successful merge of #184441, the pass itself has been fixed to avoid this issue. Given that, it is no longer necessary nor desirable to keep this switch.

This is a prerequisite for full ARM64 Windows ASan support. The runtime interception changes needed to make ASan functional end-to-end on ARM64 Windows will be opened separately. Motivated by microsoft/STL#6095 (more specifically [this reference to clang-cl](microsoft/STL#6095.)) The latest MSVC toolset includes ARM64 AddressSanitizer support. This change adds AArch64 to the Windows 64-bit shadow mapping condition when compiling with `-fsanitize=address` with `clang-cl`. Without this, consumers on Windows who target ARM64 with `clang-cl -fsanitize=address` and then link with `link.exe` will see this at runtime: ```text ERROR: AddressSanitizer: access-violation on unknown address ... ``` since the shadow memory offset is not properly assigned. Windows ARM64 uses the same dynamic shadow allocation strategy as x64 via `__asan_shadow_memory_dynamic_address`.

We have to materialize `fir.box` before adding a `fir.convert` to a memref type. Otherwise we get: `'fir.convert' op invalid type conversion'!fir.box<!fir.array<?xi32>>' / 'memref<?xi32, strided<[?], offset: ?>>'`

Reverts llvm/llvm-project#182532 to unblock CI. The original patch causes some test failures related to undef bits, as it incorrectly assumes `std::uniform_int_distribution` returns the same result with different C++ stdlib vendors.

When an op's assembly format prints an attribute via `printStrippedAttrOrType`, two independent space-emission mechanisms would fire: the op format generator emits a space before each argument, and the attribute's generated `print` method also emits a leading space (`shouldEmitSpace` initialized to true). This caused double spaces like `gpu.shuffle xor`. The usual workaround for this was to add double backticks to consume the leading space. Fixed by removing the leading space from generated attr/type `print()` methods and compensating in the print dispatcher by conditionally adding a space between the mnemonic and `print` call when the format starts with a name or keyword rather than punctuation. Also remove some workarounds for the double-spacing in op formats and fix tests that now don't have leading spaces. Assisted-by: claude

…lvm.matrix.multiply` (#184882) Fixes #99138 - Defines a `__builtin_hlsl_mul` clang builtin in `Builtins.td`. - Links the `__builtin_hlsl_mul` clang builtin with `hlsl_alias_intrinsics.h` under the name `mul` for matrix cases - Implement scalar and vector elementwise multiplication cases of the `mul` function in `hlsl_intrinsics.h` and `hlsl_intrinsic_helpers.h` - Adds sema for `__builtin_hlsl_mul` to `CheckBuiltinFunctionCall` in `SemaHLSL.cpp` - Adds codegen for `__builtin_hlsl_mul` to `EmitHLSLBuiltinExpr` in `CGHLSLBuiltins.cpp` - Vector-vector cases lower to `dot` (except double vectors, which expands to scalar multiply-adds). - Matrix-matrix, matrix-vector, and vector-matrix multiplication lower to the `llvm.matrix.multiply` intrinsic - Adds codegen tests to `clang/test/CodeGenHLSL/builtins/mul.hlsl` - Adds sema tests to `clang/test/SemaHLSL/BuiltIns/mul-errors.hlsl` - Implements lowering of the `llvm.matrix.multiply` intrinsic to DXIL in `DXILIntrinsicExpansion.cpp` Note: Currently the SPIRV backend does not support row-major matrix memory layouts when lowering matrix multiply, and just assumes column-major layout. Therefore this PR also makes the DirectX backend only assume column-major layout. Implementing support for row-major order shall be done in a separate PR. (Tracked by llvm/llvm-project#184906) This PR locally passes the `mul` offload tests in both DirectX 12 and Vulkan: llvm/offload-test-suite#941 Assisted-by: claude-opus-4.6

…nsfer op (#185106) Add an attribute to signal the presence of managed or unified symbols in the data transfer. In some case, the presence of such symbols require to insert synchronization. Adding the attribute in the op during lowering facilitate the recognition of such data transfer.

…s (#185078) They are not allowed by the HW.

This commit simplifies the cumbersome process of swapping the respective layout members for `__split_buffer` and `vector`.

…existingWaitcnts (#184925) The loop is collecting the first instruction of each waitcnt kind and is erasing the rest, with the exception of DEPCTR which needs more checks. The existing code was factoring out the instruction deletion and the setting of the collected instruction variables. But the special handling for DEPCTR and the in-loop deletion of `S_WAITCNT_lds_direct` was just complicating the logic.

…956)

Implement init support for reference type in the init catch param

…ayout (#184280) Fixes #183127 and #184371 This PR makes the matrix truncation cast implementation use the new matrix flattened index helper functions introduced by #182904 so that it reads elements from the source matrix using the default matrix memory layout instead of always assuming column-major order. This PR also fixes a bug where matrix truncation truncated the wrong elements. Assisted-by: claude-opus-4.6

Fix more typos in the AArch64 codebase using the https://github.com/crate-ci/typos Rust package. commit-id:9f4d826d Reviewers: davemgreen Pull Request: llvm/llvm-project#183086

Fix more typos in the AArch64 codebase using the https://github.com/crate-ci/typos Rust package. commit-id:33a1bb8d Reviewers: davemgreen Reviewed By: davemgreen Pull Request: llvm/llvm-project#183087

Reverts llvm/llvm-project#180102

This lets us find functions where we pessimize codegen by removing lifetimes. Reviewers: vitalybuka Reviewed By: vitalybuka Pull Request: llvm/llvm-project#183858

This PR replaces the Get*CallbackAtIndex pattern in the PluginManager with returning a snapshot of callbacks that the caller can iterate over using a range-based for loop. This is a continuation of #184452 which added thread safety by using snapshots. However, that introduced a bunch of unnecessary copies which are largely eliminated again by getting the snapshot once when gather all the callbacks, rather than doing that on each iteration when querying a plugin for a given index. It also eliminates the possibility of the snapshot changing underneath you when iterating over the plugins. This change was largely mechanical and I used Claude to do the menial work of updating the signatures and call sites.

…e Scanning (#183396) This PR fixes two issues of the in-memory buffer we use for the input file when a dependency scanner performs by-name queries. First, it renames the buffer. The temporary file was named `ScanningByName-%%%%%%%%.input`, which leads to weird diagnostics such as ``` ScanningByName-2d42a1e9.input:1:1: fatal error: could not build module 'X' ``` This PR changes the name of the file buffer, so we get diagnostics such as ``` module-include.input:1:1: fatal error: could not build module 'X' ``` which is more indicative. Additionally, this PR fixes a bug where the source location may overflow the temporary buffer by creating a 64k empty string which the temporary buffer occupies. When the source location overflows, the diagnostics could point to some random file that comes after the fake file and is incorrect. Currently, the maximum number of unique names from Apple's SDKs is around 3000. A 64k buffer per dependency scanning worker gives us around 20x capacity per worker (which scans fewer names than 3000 when the scanning is done in parallel). A fatal error is added to catch overflows.

…-bit targets (#181288) This PR optimizes 32-bit unsigned division by constants when the magic constant is 33 bits (IsAdd=true case in UnsignedDivisionByConstantInfo) on 64-bit targets. ## Overview Compiler optimization for constant division of `uint32_t` variables (such as `x / 7`) is based on the method proposed by Granlund and Montgomery in 1994 (hereafter referred to as the GM method). However, the GM method for the IsAdd=true case was optimized for 32-bit CPUs, not 64-bit CPUs. This patch provides optimizations specifically for 64-bit CPUs (such as x86_64 and Apple M-series). A simple benchmark demonstrates over 60% speedup on both Intel Xeon and Apple M4 processors. ## The GM Method The GM method for `x / 7` can be expressed in C code as follows, where the constants `c` and `a` are magic numbers determined by the divisor: ```cpp uint32_t udiv_original(uint32_t x) { uint64_t v = x * c; v >>= 32; uint32_t t = uint32_t(x) - uint32_t(v); t >>= 1; t += uint32_t(v); t >>= a - 33; return t; } ``` For example, division by 7 on x86_64 generates 7 instructions: ```asm movl %edi, %eax imulq $613566757, %rax, %rax shrq $32, %rax subl %eax, %edi shrl %edi addl %edi, %eax shrl $2, %eax ``` ## Proposed Solution This patch generates the following optimized code: ```cpp uint32_t udiv_optimized(uint32_t x) { uint128_t v = uint128_t(x) * ((c + 0x100000000) << (64 - a)); return uint32_t(v >> 64); } ``` Since a 64-bit right shift of a 128-bit variable extracts the upper 64 bits, this code eliminates the need for shifts after multiplication. The implementation pre-shifts the 33-bit magic constant `c = 2^32 + Magic` left by `(64-a)` bits and uses the high 64 bits of a 64 x 64 -> 128 bit multiplication directly. This eliminates the add/sub/shift sequence. After optimization, division by 7 becomes 4 instructions (or 3 with BMI2): ```asm # Standard (4 instructions) movl %edi, %eax movabsq $2635249153617166336, %rcx mulq %rcx movq %rdx, %rax # With BMI2 (3 instructions) movl %edi, %edx movabsq $2635249153617166336, %rax mulxq %rax, %rax, %rax ```

When input is zero or sign extended.

Most of the plugins have only a small number of instances. Use `llvm::SmallVector` instead of `std::vector`. Depends on llvm/llvm-project#184837

The default move constructor wasn't nulling out the callbacks. Combined with the fact that llvm::sys::DynamicLibrary has no explicit move constructor and hence library.isValid() still returned true after having moved-from, we would end up calling plugin_term_callback() when destroying the moved-from PluginInfo, calling it prematurely.

This header has a case sensitivity syntax error, delete it since it's unused

This patch relands llvm/llvm-project#182532. The original version causes test failures related undef bits since it incorrectly assumes `std::uniform_int_distribution` yields the same results across different stdlib vendors. This patch simply uses low bits to avoid the issue. I am not sure whether it still generates uniformly distributed random numbers. But abseil also uses this trick: https://github.com/abseil/abseil-cpp/blob/e72b94a2f257ba069ec0b99e557e9f1f6b9c1a3e/absl/random/uniform_int_distribution.h#L203-L206 I have confirmed all tests passed with libstdc++ and libc++. Original PR description: Bytes are adjusted to respect the incoming byte type proposed in https://discourse.llvm.org/t/rfc-add-a-new-byte-type-to-llvm-ir/89522. Note that the current implementation of constant folding doesn't handle bitcasts with weird types like `<8 x i3> to <3 x i8>`: https://github.com/llvm/llvm-project/blob/49de34459c99b33c27aae4596ed5e67ce3d89eae/llvm/lib/Analysis/ConstantFolding.cpp#L223-L232 So I take the result from Alive2 as reference: https://alive2.llvm.org/ce/z/ZGA_xP

…ses, NFC

… (#185209) I missed this one when figuring out all the field offsets.

If the scalars are reused and the ReuseShuffleIndices is set, we may miss matching for the buildvector/gather nodes and add an extra cost

One of the headers has a circular dependency issue that makes it not isolated ``` .../googletest/include/gtest/internal/custom/gtest-printers.h:53:12: error: no member named 'testing' in the global namespace 53 | *OS << ::testing::PrintToString(S.str()); | ~~^ ```

Only minor fixes, just staying up to date.

When building with the GNU driver, we would pass in `/DELAYLOAD:...` without indicating that this is a linker flag. `clang` does not implictly forward non-consumed options to the linker like `cl` does, and this would cause the build to fail.

In order to support the Microsoft ABI alongside the Itanium one in the same process from different DLLs, this moves the Itanium ABI runtime plugin to the C++ language runtime (see llvm/llvm-project#168941 (comment)). Before this PR, the C++ language runtime wasn't a plugin. Instead, its functionality was provided by the Itanium ABI plugin. All Itanium specific methods are moved to a new class `ItaniumABIRuntime`. This includes resolving the dynamic type, setting exception filters, and getting the exception object. The other methods were added to `CPPLanguageRuntime`. `language cplusplus demangle` moved to `CommandObjectCPlusPlus`. The Clang REPL depended on the C++ runtime. Now that it's a plugin, this failed the layering check. Since the REPL doesn't use the C++ runtime, I removed the dependency.

Fixes 58efc42.

…185165) Use LangOptions::AllowLiteralDigitSeparator added in #184235 for the IntegerLiteralSeparator option.

It's rare but possible for the codebase not to build. When that happens, we should carry on and still submit an empty LNT report for that order, otherwise we'll get stuck thinking that order hasn't been benchmarked yet.

- Rename `__optional_iterator` into `__optional_iterator_base` and make it part of the `__optional_{meow}_base` inheritance chain. This allows us to get rid of a sketchy-looking downcast and shorten some code.

At the one call-site where the result of hoistDeps is used, self-dependencies will already have been eliminated. That means we can use the "deps changed" property that we're already computing rather than tracking "deps graph changed" separately.

Summary: The current handling of `libclc` is to use custom clang jobs to compile source files. This is the way we used to compile GPU libraries, but we have largely moved away from this. The eventual goal is to simple be able to use `clang` like a normal project. One barrier to this is the lack of language support for OpenCL in CMake. This PR uses CMake's custom language support to define a minimal OpenCL language, largely just a wrapper around `clang -x cl`. This does nothing, just enables the support. Future PRs will need to untangle the mess of dependencies. The 'link+opt' steps that we now do should be able to simply be done as a custom `llvm-link` and `opt` job on the library, which isn't ideal but it still better than the current state.

sarnex · 2026-03-12T14:29:12Z

:)

zahiraam · 2026-03-12T14:31:46Z

:)

Can't say that I wanted to land here ... I was trying to make a fix in sycl-web.

Fznamznon · 2026-03-12T18:25:36Z

No, switching base branch didn't help make it easier to review.

zahiraam · 2026-03-12T18:28:28Z

No, switching base branch didn't help make it easier to review.

I am creating another PR with sycl-web as a base branch (I have used sycl-web-to-resolve) for this one. It will make it easier to review.

nico and others added 30 commits March 6, 2026 15:33

builtins: Make cmake formatting self-consistent aftr #183871

4d53c42

No behavior change.

[clang-doc] Add button toggle for light/dark theme (#181587)

eada0f5

The user can now manually toggle the light or dark theme instead of waiting for the system theme to change. Also fixes a typo that caused some overflow issues even when there was no content to cause an overflow.

[mlir][reducer] Add split-input-file to mlir-reduce (#184970)

a99d4a6

The tests for mlir-reduce are currently scattered. To centralize the tests for mlir-reduce, I added the split-input-file feature to mlir-reduce.It is part of llvm/llvm-project#184974.

[flang] materialize fir.box when it is from a block argument (#184898)

97cf8bf

We have to materialize `fir.box` before adding a `fir.convert` to a memref type. Otherwise we get: `'fir.convert' op invalid type conversion'!fir.box<!fir.array<?xi32>>' / 'memref<?xi32, strided<[?], offset: ?>>'`

[gn] port 3da28bf (DiagnosticStableIDs)

bc55e5e

[AMDGPU] Disable negative imm offset for async load/store instruction…

ab5844d

…s (#185078) They are not allowed by the HW.

[libcxx] Add __split_buffer::__swap_layouts (#180102)

65f39a1

This commit simplifies the cumbersome process of swapping the respective layout members for `__split_buffer` and `vector`.

[AArch64][GlobalISel] Add more gisel test coverage. NFC

89d6936

AMDGPU/GlobalISel: RegBankLegalize rules for amdgcn_exp/exp_row (#181…

d34f179

…956)

[CIR] Implement reference type in init catch param (#184442)

6dea289

Implement init support for reference type in the init catch param

[AArch64] Fix more typos (NFC)

43f7838

Fix more typos in the AArch64 codebase using the https://github.com/crate-ci/typos Rust package. commit-id:9f4d826d Reviewers: davemgreen Pull Request: llvm/llvm-project#183086

[ARM] Fix more typos (NFC)

cf21ea9

Fix more typos in the AArch64 codebase using the https://github.com/crate-ci/typos Rust package. commit-id:33a1bb8d Reviewers: davemgreen Reviewed By: davemgreen Pull Request: llvm/llvm-project#183087

Revert "[libcxx] adds __split_buffer::__swap_layouts" (#185120)

01a9705

Reverts llvm/llvm-project#180102

[HWASan] add optimization remark for supported lifetimes

610ed83

This lets us find functions where we pessimize codegen by removing lifetimes. Reviewers: vitalybuka Reviewed By: vitalybuka Pull Request: llvm/llvm-project#183858

[RISCV][P-ext] Custom legalize i64 SHL to WSLL(I)/WSLA(I) (#185079)

3164d54

When input is zero or sign extended.

[lldb] Use llvm::SmallVector in the PluginManager (NFC) (#184912)

541d546

Most of the plugins have only a small number of instances. Use `llvm::SmallVector` instead of `std::vector`. Depends on llvm/llvm-project#184837

[bolt][NFC] Remove unused ReorderUtils.h (#184642)

f540ad6

This header has a case sensitivity syntax error, delete it since it's unused

dtcxzyw and others added 18 commits March 8, 2026 00:00

[SLP][NFC]Add a test with the missing matching of same nodes with reu…

ac5081a

…ses, NFC

clang/AMDGPU: Add missing field to implicit kernarg struct definition…

810af8a

… (#185209) I missed this one when figuring out all the field offsets.

[X86] narrow-shl-load.ll - regenerate test checks (#185211)

c6805ab

[SLP]Fix the matching of the nodes with the same scalars, but reused

a96a0de

If the scalars are reused and the ReuseShuffleIndices is set, we may miss matching for the buildvector/gather nodes and add an extra cost

[bazel] Bump to latest point release (#185139)

7dd069d

Only minor fixes, just staying up to date.

[Bazel] Fixes 58efc42 (#185227)

b5f3ea1

Fixes 58efc42.

[clang-format][NFC] Use the newly added AllowLiteralDigitSeparator (#…

ff11354

…185165) Use LangOptions::AllowLiteralDigitSeparator added in #184235 for the IntegerLiteralSeparator option.

libclc: Add amdgpu wait_group_events (#185177)

baac39c

10BC0

[libc++][NFC] Simplify optional::iterator (#183230)

a223164

- Rename `__optional_iterator` into `__optional_iterator_base` and make it part of the `__optional_{meow}_base` inheritance chain. This allows us to get rid of a sketchy-looking downcast and shorten some code.

Merge from 'main' to 'sycl-web' (87 commits)

b0d0646

[sycl-web] Reland 8ce2b9c

ed16979

zahiraam requested review from a team and bader as code owners March 12, 2026 14:27

zahiraam marked this pull request as draft March 12, 2026 14:27

jsji changed the base branch from sycl to sycl-web March 12, 2026 14:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix73878#21499

Fix73878#21499
zahiraam wants to merge 89 commits intosycl-webfrom
Fix73878

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants