forked from scala-js/scala-js
[pull] main from scala-js:main #55
Merged
Previously, we used several typed arrays pointing to the same `ArrayBuffer`. Now, we use a single `DataView`. Since we can (must) force a specific endianness with `DataView`, we get rid of the `highOffset` and `lowOffset` fields, which further streamlines the code. The assembly produced by V8 is now better. Since it knows that it manipulates a single buffer, it needs fewer loads and type tests internally. I expect the same would be true for other optimizing engines.
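A minimal JavaScript sketch of the technique described above (the names are illustrative, not the actual Scala.js internals): one scratch `DataView` with an explicitly forced endianness replaces several typed-array views, so no `highOffset`/`lowOffset` bookkeeping is needed.

```javascript
// One scratch DataView over a single ArrayBuffer, instead of several
// typed arrays aliasing the same buffer. Names here are hypothetical.
const scratch = new DataView(new ArrayBuffer(8));

// Split a Double into (hi, lo) 32-bit halves. The `true` flag forces
// little-endian byte order, so the offsets 0 and 4 are fixed.
function doubleToBits(d) {
  scratch.setFloat64(0, d, true);
  return {
    lo: scratch.getInt32(0, true),
    hi: scratch.getInt32(4, true),
  };
}

function bitsToDouble(hi, lo) {
  scratch.setInt32(0, lo, true);
  scratch.setInt32(4, hi, true);
  return scratch.getFloat64(0, true);
}
```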
The compilation scheme for `finally` is a beast on Wasm. It's good to have direct feedback on it without having to wait for the full partest.
As well as for the locals of `Labeled` blocks whose `Return`s cross a `try..finally` boundary. If their original type was not defaultable, we cast away nullability when reading them back.
Fix #5165: Use defaultable types for the locals of TryFinallys.
That allows the IR cleaner to see the JS type definitions from the library when it appears as a jar on the classpath. This is the case for the linker private library. Adding that support will allow the linker private library to use JS type definitions from the Scala.js library, as long as the references can be erased away.
Previously, our floating point bit manipulation methods were implemented in user space. This commit instead introduces dedicated IR `UnaryOp`s for them.

On Wasm, the implementations are straightforward opcodes. In JavaScript, we use the same strategies as before, but moved to the linker world. In most cases, we use a unique scratch `DataView` to perform the manipulations. It is now allocated as a `globalVar` in the emitter, rather than in user space. The functions for `Float`/`Int` conversions are straightforward, as are those for `Double`/`Long` when we use `bigint`s for `Long`s. When we use `RuntimeLong`s, the conversions are implemented in `RuntimeLong`, but require the scratch `DataView` as an argument, which the linker injects.

This new strategy brings several benefits:

* For JavaScript, the generated code is essentially optimal now. It produces tight sequences of assembly instructions that do not allocate anything.
* For Wasm, the implementation does not rely on JS interop anymore, even when the optimizer does not run. This property will be required when we target Wasm outside of a JS host.
* We open a path to adding support for the "raw" variants `floatToRawIntBits`/`doubleToRawLongBits` in Wasm in the future, should we need it.
Introduce IR UnaryOps for floating point bit manipulation.
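A hedged sketch of what the JS lowering of these UnaryOps can look like, using a single shared scratch `DataView` (the actual emitter allocates it as a `globalVar`; the function names here are hypothetical):

```javascript
// Single scratch DataView, shared by all bit-manipulation lowerings.
const scratchView = new DataView(new ArrayBuffer(8));

// Float/Int conversions: write as one type, read back as the other,
// with a forced little-endian byte order.
function floatToIntBits(f) {
  scratchView.setFloat32(0, f, true);
  return scratchView.getInt32(0, true);
}

function intBitsToFloat(bits) {
  scratchView.setInt32(0, bits, true);
  return scratchView.getFloat32(0, true);
}

// Double/Long conversion, straightforward when Longs are bigints:
function doubleToLongBits(d) {
  scratchView.setFloat64(0, d, true);
  return scratchView.getBigInt64(0, true);
}
```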
This allows us to share more of their implementation with the signed divisions. Moreover, our three implementation strategies (JS with `RuntimeLong`, JS with `bigint`, and Wasm) each have a different efficient implementation of those operations. Using IR BinaryOps for them lets each backend pick the most appropriate implementation, while letting the optimizer generically reason about their mathematical properties.
Introduce IR BinaryOps for unsigned division and remainder.
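For the JS backend, one plausible lowering of 32-bit unsigned division and remainder (a sketch, not necessarily the exact emitted code) reinterprets both operands as unsigned with `>>> 0`, applies the ordinary operators, and truncates back to Int32:

```javascript
// Unsigned 32-bit division: `x >>> 0` reads the Int32 bit pattern as
// an unsigned value in [0, 2^32), and `| 0` folds the result back
// into Int32 range.
function divUnsigned(a, b) {
  return ((a >>> 0) / (b >>> 0)) | 0;
}

function remUnsigned(a, b) {
  return ((a >>> 0) % (b >>> 0)) | 0;
}
```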
This way, we only have `require-jdk*` directories for JDK versions that we actually test in our CI. They also correspond to JDK LTS versions, so they are not arbitrary.
Implement jl.Math.{multiplyFull, multiplyHigh, unsignedMultiplyHigh}.
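As a reference for the semantics of these three methods, here is a sketch using JavaScript `BigInt` (an actual backend may instead decompose the operands into 32-bit halves; `Int` arguments are plain numbers, `Long` arguments are bigints):

```javascript
// multiplyFull: the full 64-bit product of two 32-bit Ints.
function multiplyFull(a, b) {
  return BigInt(a) * BigInt(b);
}

// multiplyHigh: the high 64 bits of the signed 128-bit product of two Longs.
function multiplyHigh(a, b) {
  return BigInt.asIntN(64, (a * b) >> 64n);
}

// unsignedMultiplyHigh: same, treating both operands as unsigned.
function unsignedMultiplyHigh(a, b) {
  const ua = BigInt.asUintN(64, a);
  const ub = BigInt.asUintN(64, b);
  return BigInt.asIntN(64, (ua * ub) >> 64n);
}
```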
We only need the load spec; this allows memory to be freed more eagerly.
Improvements to the IRCleaner
Extract branches so that they mostly depend on `isInclusive` and `step >= 0`. These are often constant, and even when they are not statically constant, they are probably predictable anyway.
Previously, `Range` used a number of intermediate operations on `Long`s to avoid overflow. The optimizer was already good enough to remove them in many cases, in particular when the step is `1` and/or the start is `0`, which are common cases. Now that the optimizer can reason about unsigned division, we can do a lot better. We can entirely avoid `Long` computations, *and* streamline a lot of code, by using unsigned `Int` arithmetic. This produces much better code for the cases where the optimizer cannot entirely get rid of the computations.
Opt: Use unsigned arithmetic in Range, instead of Longs.
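A simplified sketch of the idea, assuming `step > 0` and `start <= end` (the real code handles the remaining cases): the wrapped Int difference `end - start`, reinterpreted as unsigned, is exact, so the length fits entirely in unsigned 32-bit arithmetic with no widening to `Long`.

```javascript
// Hypothetical helper, not the actual Range implementation.
function rangeLength(start, end, step, isInclusive) {
  // `| 0` mimics Int wrap-around on subtraction; `>>> 0` then reads
  // the wrapped difference as the exact unsigned gap in [0, 2^32).
  const gap = ((end - start) | 0) >>> 0;
  const q = (gap / step) >>> 0;  // unsigned quotient (truncated)
  const r = gap - q * step;      // unsigned remainder
  // exclusive range: ceil(gap / step); inclusive range: floor + 1
  return isInclusive ? q + 1 : q + (r !== 0 ? 1 : 0);
}
```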
Previously, we used instance methods on `RuntimeLong` to implement the operators whose lhs is a `RuntimeLong`. Other operators, such as `fromDouble`, were implemented as methods of the module. Now, we use static methods for all the operators. Once inlined, it does not make any difference. However, it leaves behind a few unused static methods, instead of some instance methods. The former can be removed by off-the-shelf JS minifiers, unlike the latter. Paradoxically, the `Minify` output, as we measure it, gets bigger. That's because static method names are not renamed by the name minifier.
The original implementation of `ju.ArrayDeque` used `js.Array` for its internal data structure. However, when compiling to WebAssembly, accessing the underlying `js.Array` requires JavaScript interop calls, leading to a significant performance overhead. This commit switches the internal data structure of `ju.ArrayDeque` to `scala.Array` instead, for both the WebAssembly and JavaScript backends. Using `scala.Array` in both environments avoids complicating the logic by conditionally choosing between `js.Array` and `scala.Array` depending on whether we target Wasm or JS. This change significantly improves `ArrayDeque`'s performance on WebAssembly and causes only a minor performance degradation on JavaScript. See the discussion at #5164
Implement ArrayDeque using scala.Array.
Use static methods as entry points for RuntimeLong operator methods.
This particular operation is used by a number of our core numerical algorithms, so it makes sense for it to receive a dedicated opcode, which also lets the optimizer reason about its purity.
Introduce IR UnaryOps for `numberOfLeadingZeros` (`clz`).
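On the JS side, `Int_clz` maps directly onto the `Math.clz32` primitive, and a 64-bit `clz` over `(hi, lo)` halves can be built from it. A sketch (the helper name is hypothetical):

```javascript
// 64-bit count-leading-zeros from two 32-bit halves: if the high word
// has any set bit, its clz is the answer; otherwise add 32 to the
// clz of the low word. Math.clz32(0) === 32, so clz64(0, 0) === 64.
function clz64(hi, lo) {
  return hi !== 0 ? Math.clz32(hi) : 32 + Math.clz32(lo);
}
```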
This was forgotten in f414dae.
Through clever analysis of approximation errors, it turns out we can rely on a `Double` division by 10^9 rather than a `Long` division. Since there was already a `Double` division (and remainder) at the end of the `Long` division algorithm, the new implementation should be strictly better. We do have a new call to `js.Math.floor`, but that should be efficient, as `floor` is typically a hardware instruction. We also remove the hackish way of reusing `unsignedDivModHelper` to compute `toString()`. According to the `longMicro` benchmark, this change gives a 3x speed improvement to this code path.
Discovered while working on #5077. This makes it possible to remove unnecessary unboxes and unlocks more inlining.
Opt: Avoid the `Long` division in `RuntimeLong.toString`.
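A heavily simplified sketch of the core idea: replace the `Long` division by 10^9 with a `Double` division and `Math.floor`, then derive the remainder. It is shown here only for values that are exactly representable as a `Double`; the real algorithm bounds the approximation error for the general case, and the helper name is hypothetical.

```javascript
// Split a non-negative, exactly-representable value (< 2^53) into
// quotient and remainder by 10^9 using only Double arithmetic.
function toStringAvoidingLongDiv(approx) {
  const quot = Math.floor(approx / 1e9); // Double division, no Long
  const rem = approx - quot * 1e9;
  // The low part carries exactly 9 decimal digits, zero-padded.
  return quot === 0 ? String(rem) : String(quot) + String(rem).padStart(9, "0");
}
```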
Add missing cases in `Printers` for `{Int,Long}_clz`.
Opt: Recognize non-nullable RuntimeLong as a subtype of LongType
If we cannot inline RuntimeLong (the class or its static methods), something is broken and we should fail.
Optimizer: Fail if we cannot inline RuntimeLong
Hacker's Delight offers branchless formulas for double-word addition and subtraction, without access to the machine's carry bit. Although the new formula contains more elementary instructions, the removal of the branch is significant. When one operand is constant, folding reduces the carry computation to 3 bitwise operations, which is very fast. When both operands are variable, the carry is often 50/50 unpredictable, which means the branch is unpredictable, and removing it is worth the full 5 bitwise operations anyway.

Negation does not need a special code path anymore. Regular folding of the `0L - b` formula yields optimal, branchless code anyway. We remove `RuntimeLong.neg` and its code paths in the optimizer and emitter. While we are at it, we also remove `RuntimeLong.not`, since the regular code paths for `-1L ^ b` fold in the same way.

This change reduces the execution time of the `sha512` benchmark by a whopping 25-30%.
Opt: Branchless addition, subtraction and negation for `RuntimeLong`.
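The branchless carry referred to above is the classic Hacker's Delight formula: the carry out of an unsigned 32-bit addition `lo = a + b` is the sign bit of the majority function of `a`, `b`, and `~lo`. A sketch of a branchless 64-bit addition over `(hi, lo)` pairs (names illustrative, not the actual `RuntimeLong` code):

```javascript
function longAdd(ahi, alo, bhi, blo) {
  const lo = (alo + blo) | 0;
  // 5 bitwise operations plus a shift, no branch: the carry is 1 iff
  // at least two of { alo's MSB, blo's MSB, ~lo's MSB } are set.
  const carry = ((alo & blo) | ((alo | blo) & ~lo)) >>> 31;
  const hi = (ahi + bhi + carry) | 0;
  return { hi: hi, lo: lo };
}
```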
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.1)