8000 [pull] main from scala-js:main by pull[bot] · Pull Request #55 · Mu-L/scala-js · GitHub
[go: up one dir, main page]

Skip to content

[pull] main from scala-js:main #55

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 36 commits into from
Jun 7, 2025
Merged

[pull] main from scala-js:main #55

merged 36 commits into from
Jun 7, 2025

Conversation

pull[bot]
Copy link
@pull pull bot commented May 20, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

sjrd added 4 commits May 19, 2025 23:24
Previously, we used several typed arrays pointing to the same
`ArrayBuffer`. Now, we use a single `DataView`. Since we can (must)
force a specific endianness with `DataView`, we get rid of the
`highOffset` and `lowOffset` fields, which further streamlines the
code.

The assembly produced by V8 is now better. Since it knows that it
manipulates a single buffer, it needs fewer loads and type tests
internally. I expect the same would be true for other optimizing
engines.
The compilation scheme for `finally` is a beast on Wasm. It's good
to have direct feedback on it without having to wait for the full
partest.
As well as for the locals of `Labeled` blocks whose `Return`s cross
a `try..finally` boundary.

If their original type was not defaultable, we cast away
nullability when reading them back.
Fix #5165: Use defaultable types for the locals of TryFinallys.
@pull pull bot added the ⤵️ pull label May 20, 2025
sjrd and others added 25 commits May 21, 2025 15:33
That allows the IR cleaner to see the JS type definitions from
the library when it appears as a jar on the classpath. This is
the case for the linker private library. Adding that support will
allow the linker private library to use JS type definitions from
the Scala.js library, as long as the references can be erased
away.
Previously, our floating point bit manipulation methods were
implemented in user space. This commit instead introduces dedicated
IR `UnaryOp`s for them.

On Wasm, the implementations are straightforward opcodes. In
JavaScript, we use the same strategies as before, but moved to the
linker world.

In most cases, we use a unique scratch `DataView` to perform the
manipulations. It is now allocated as a `globalVar` in the emitter,
rather than in user space. The functions for `Float`/`Int`
conversions are straightforward, as well as those for
`Double`/`Long` when we use `bigint`s for `Long`s. When we use
`RuntimeLong`s, the conversions are implemented in `RuntimeLong`,
but require the scratch `DataView` as an argument, which the
linker injects.

This new strategy brings several benefits:

* For JavaScript, the generated code is basically optimal now.
  It produces tight sequences of Assembly instructions that do not
  allocate anything.
* For Wasm, the implementation does not rely on JS interop anymore,
  even when the optimizer does not run. This property will be
  required when we target Wasm outside of a JS host.
* We open a path to adding support for the "raw" variants
  `floatToRawIntBits`/`doubleToRawLongBits` in Wasm in the future,
  should we need it.
Introduce IR UnaryOps for floating point bit manipulation.
This allows to better mutualize their implementation with the
signed divisions.

Moreover, our 3 implementation strategies (JS with `RuntimeLong`,
JS with `bigint` and Wasm) have different efficient implementations
of those operations. Using IR BinaryOps for them allows each backend
to use the most appropriate implementation, while letting the
optimizer generically manipulate their mathematical properties.
Introduce IR BinaryOps for unsigned division and remainder.
This way, we only have `require-jdk*` directories for JDK versions
that we actually test in our CI. They also correspond to JDK LTS
versions, so they are not arbitrary.
Implement jl.Math.{multiplyFull, multiplyHigh, unsignedMultiplyHigh}.
We only need the load spec, this allows memory to be freed up more
eagerly.
Improvements to the IRCleaner
Extract branches to be mostly dependent on `isInclusive` and
`step >= 0`. These are often constant, and when they're not
statically constant, they are probably predictable anyway.
Previously, `Range` used a number of intermediate operations on
`Long`s to avoid overflow. The optimizer was already good enough
to remove them in many cases, in particular when the step is `1`
and/or the start is `0`, which are common cases.

Now that the optimizer can reason about unsigned division, we can
do a lot better. We can entirely avoid `Long` computations, *and*
streamline a lot of code, by using unsigned `Int` arithmetics.
This produces much better code for the cases where the optimizer
cannot entirely get rid of the computations.
Opt: Use unsigned arithmetics in Range, instead of Longs.
Previously, we used instance methods on `RuntimeLong` to implement
the operators whose lhs is a `RuntimeLong`. Other operators, such
as `fromDouble`, were implemented as methods of the module.

Now, we use static methods for all the operators. Once inline, it
does not make any difference. However, it leaves behind a few
unused static methods, instead of some instance methods. The former
can be removed by off-the-shelf JS minifies, unlike the latter.

Paradoxically, the `Minify` output, as we measure it, gets bigger.
That's because static method names are not renamed by the name
minifier.
The original implementation of `ju.ArrayDeque` used `js.Array` for its internal data structure.
However, when compiling to WebAssembly, it requires JavaScript interop calls are required to access the underlying js.Array, leading to a significant performance overhead.

This commit switches the internal data structure of ju.ArrayDeque to use scala.Array instead, for both the WebAssembly and JavaScript backends.
Using `scala.Array` in both environments avoids complicating the logic by conditionally using `js.Array` or `scala.Array` based on whether it's Wasm or JS.

This change significantly improves ArrayDeque's performance on WebAssembly and results in only minor performance degradation on JavaScript.

See the discussion at #5164
Implement ArrayDeque using scala.Array.
Use static methods as entry points for RuntimeLong operator methods.
This particular operation is used by a number of our core numerical
algorithms. It makes sense for it to receive a dedicated opcode.

The dedicated opcode lets the optimizer reason about its purity.
Introduce IR UnaryOps for `numberOfLeadingZeros` (`clz`).
Through clever analysis of approximation errors, it turns out we
can rely on a `Double` division by 10^9 rather than a `Long`
division.

Since there was already a `Double` division (and remainder) at the
end of the `Long` division algorithm, the new implementation should
be strictly better. We do have a new call to `js.Math.floor`, but
that should be efficient, as `floor` is typically a hardware
instruction.

We also remove the hackish way of reusing `unsignedDivModHelper`
to compute `toString()`.

According to the `longMicro` benchmark, this change gives a 3x
speed improvement to this code path.
Discovered while working on #5077. This allows to remove unnecessary
unboxes and unlocks more inlining.
sjrd and others added 7 commits June 1, 2025 14:48
Opt: Avoid the `Long` division in `RuntimeLong.toString`.
Add missing cases in `Printers` for `{Int,Long}_clz`.
Opt: Recognize non-nullabe RT long as subtype of LongType
If we cannot inline RuntimeLong (the class or its static methods),
something is broken and we should fail.
Optimizer: Fail if we cannot inline RuntimeLong
Hacker's Delight offers branchless formulas for double-word
additions and subtraction, without access to the machine's carry
bit.

Although the new formula contains more elementary instructions,
removal of the branch is significant. When one operand is
constant, folding reduces to 3 bitwise operations to compute the
carry, which is very fast. When both operands are variable, then
the carry is often 50/50 unpredictable, which means the branch
is unpredictable, and removing it is worth the full 5 bitwise
operations anyway.

Negation does not need a special code path anymore. Regular
folding of the `0L - b` formula yields optimal, branchless code
anyway. We remove `RuntimeLong.neg` and its code paths in the
optimizer and emitter. While we're there, we also remove
`RuntimeLong.not`, since the regular code paths for `-1L ^ b`
fold in the same way.

This change reduces execution time of the `sha512` benchmark by a
whopping 25-30%.
Opt: Branchless addition, subtraction and negation for `RuntimeLong`.
@pull pull bot merged commit 052d861 into Mu-L:main Jun 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0