8000 Wasm: Implement PriorityQueue without js.Array in Wasm backend by tanishiking · Pull Request #5168 · scala-js/scala-js · GitHub
[go: up one dir, main page]

Skip to content

Wasm: Implement PriorityQueue without js.Array in Wasm backend #5168

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

tanishiking
Copy link
Contributor
@tanishiking tanishiking commented May 15, 2025

Simliar to ju.ArrayList, this commit adds a Wasm version of internal storage implementation to remove JS interop for better performance.


Branched from #5164

This implementation uses the same technique as parseFloat to simplify having the two implementation for different platforms. see the conversation #5164 (comment)
Also, this is a draft PR for now, because without linkTimeIf it seems that the optimizer fails to de-virtualize the function calls of methods on InnerArrayImpl. Waiting for it
5e842d8


// The index 0 is not used; the root is at index 1.
// This is standard practice in binary heaps, to simplify arithmetics.
private var _size = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something went wrong during the refactoring, because this state variable was moved to the global impl object. 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thanks, moved back to the PriorityQueue

@tanishiking tanishiking force-pushed the priorityqueue-wasm branch 4 times, most recently from 6f8093e to ea2171f Compare May 17, 2025 04:36
@tanishiking tanishiking marked this pull request as ready for review May 17, 2025 05:09
@tanishiking tanishiking force-pushed the priorityqueue-wasm branch from ea2171f to bca2c21 Compare May 23, 2025 13:31
@tanishiking tanishiking force-pushed the priorityqueue-wasm branch from bca2c21 to da5c453 Compare June 6, 2025 05:49
@tanishiking tanishiking requested a review from sjrd June 6, 2025 05:50
Copy link
Member
@sjrd sjrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review. We need to address the _size issue before going further.

< 10000 div data-view-component="true" class="comment-reactions js-reactions-container js-reaction-buttons-container social-reactions reactions-container d-flex">

private sealed abstract class InnerArrayImpl {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in the companion object of PriorityQueue, so that instances do not depend on the enclosing PriorityQueue object.

Suggested change
private sealed abstract class InnerArrayImpl {
}
object PriorityQueue {
private sealed abstract class InnerArrayImpl {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! I now understand that they actually refer to the enclosing object for _size. 😱 That's super confusing: the dependencies on inner are made explicit (params), but those on _size are implicit (scope).

We should also pass _size explicitly. But then that poses an issue for methods that need to return a new Repr and a new size. 😖

Here is a crazy idea: we have an unused slot at index 0 of the array. What if we use it to store the size on Wasm? It would be boxed, but unless it exceeds 2^30, it will fit in a (ref i31), so no JS interop should occur. (and if it does exceed 2^30, I don't think that's going to be the bottleneck :p).

This way we still have a single Repr that embodies both the size and the capacity, on Wasm as well as on JS.

Copy link
Contributor Author
@tanishiking tanishiking Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think it's reasonable technique to re-use the free space to store size information 👍
Here I make a prototype implementation for moving InnerArrayImpl to companion object, and store size to array(0) 0832ba7

* the function calls.
*/

private val innerImpl: InnerArrayImpl =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unfortunate to store this field inside instances of PriorityQueue. I think this can be avoided by putting this private val in the companion object of PriorityQueue.

If my prediction is correct, that companion will entirely disappear after optimizations.

import java.lang.Utils.roundUpToPowerOfTwo

import scala.annotation.tailrec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imports of scala.annotation.* always come first (after scala.language.* imports, if any):

Suggested change
10000
import java.lang.Utils.roundUpToPowerOfTwo
import scala.annotation.tailrec
import scala.annotation.tailrec
import java.lang.Utils.roundUpToPowerOfTwo

Comment on lines 102 to 103
// The index 0 is not used; the root is at index 1.
// This is standard practice in binary heaps, to simplify arithmetics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep these two lines as comments on private var inner. They apply to JS as well.

Also, that means that the notion of "capacity" is a skewed in this class. The array should have 1 more elements than the capacity, for "capacity" being the external notion of capacity.

Put differently, it is probably easier to think of "capacity" as the internal notion of capacity, which is 1 more than the "public capacity". The public capacity only shows up in def this(initialCapacity) and def this(c: Collection).

{
if (initialCapacity < 1)
throw new IllegalArgumentException
initialCapacity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
initialCapacity
initialCapacity + 1 // index 0 is unused

@@ -47,47 +62,74 @@ class PriorityQueue[E] private (
NaturalComparator.select(c.comparator().asInstanceOf[Comparator[_ >: E]])
case _ =>
NaturalComparator
}, internal = true)
}, internal = true, roundUpToPowerOfTwo(c.size()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}, internal = true, roundUpToPowerOfTwo(c.size()))
}, internal = true, roundUpToPowerOfTwo(c.size() + 1)) // index 0 is unused

addAll(c)
}

def this(c: PriorityQueue[_ <: E]) = {
this(c.comp.asInstanceOf[Comparator[_ >: E]], internal = true)
this(c.comp.asInstanceOf[Comparator[_ >: E]], internal = true, roundUpToPowerOfTwo(c.size()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this(c.comp.asInstanceOf[Comparator[_ >: E]], internal = true, roundUpToPowerOfTwo(c.size()))
this(c.comp.asInstanceOf[Comparator[_ >: E]], internal = true,
roundUpToPowerOfTwo(c.size() + 1)) // index 0 is unused

addAll(c)
}

def this(sortedSet: SortedSet[_ <: E]) = {
this(NaturalComparator.select(
sortedSet.comparator().asInstanceOf[Comparator[_ >: E]]),
internal = true)
internal = true,
roundUpToPowerOfTwo(sortedSet.size()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
roundUpToPowerOfTwo(sortedSet.size()))
roundUpToPowerOfTwo(sortedSet.size() + 1)) // index 0 is unused

Comment on lines +110 to +94
if (LinkingInfo.isWebAssembly) {
val minCapacity = innerImpl.length(inner) + 1
if (innerImpl.capacity(inner) < minCapacity)
inner = innerImpl.resized(inner, minCapacity)
}
< 8000 div class="js-line-comments js-suggested-changes-container js-suggested-changes-contents js-quote-selection-container" data-quote-markdown=".js-comment-body">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be moved to JArrayImpl.push.

@tanishiking
Copy link
Contributor Author
tanishiking commented Jun 12, 2025

I measured performance of PriorityQueue using a simple priority-queue micro benchmark with fastLinkJS. It does add/peek/poll 10000 numbers (in order from 0-10000, maybe inappropriate for priorityqueue.)

download

The performance change on wasm (red and purple bars) was okay: add/peek gets a bit slower (probably because of array expansion), but the overall performance with poll is much faster. 🎉

The problem is the execution performance on JS.
As you can see from the blue (main) and orange (this change) graphs, the performance on add/peek is about 10 times slower, and poll is also degraded by 2 times.

null checks on innnerImpl

One problem is that, with linktimeIf, the null check on innerImpl has inserted everywhere $n($m_ju_PriorityQueue$().ju_PriorityQueue$__f_java$util$PriorityQueue$$innerImpl); The diff between main vs this branch, generated from the Scala code below: https://gist.github.com/tanishiking/9d0e24de18e994cae5e8742aa9749d98#file-foo-diff

Interestingly, without linkeTimeIf on innerImpl private val innerImpl: InnerArrayImpl = InnerArrayImpl.JSArrayImpl (green bar), innerImpl gets inlined and there's no null check on it. diff: https://gist.github.com/tanishiking/9d0e24de18e994cae5e8742aa9749d98#file-foo2-diff

linktimeIf should generate the same code as the innerImpl = InnerArrayImpl.JSArrayImpl.

package helloworld

import java.util.PriorityQueue

object HelloWorld {
  def main(args: Array[String]): Unit = {
    val pq = new PriorityQueue[Int]()
    for (element <- List(1,2,3,4,5)) {
      pq.add(element)
    }
  }
}

It's still slower

With private val innerImpl: InnerArrayImpl = InnerArrayImpl.JSArrayImpl, the performance gets better (green bar), but it's still much slower than main.

I'm not sure why it's much slower with this diff 🤔 https://gist.github.com/tanishiking/9d0e24de18e994cae5e8742aa9749d98#file-foo2-diff

@tanishiking tanishiking force-pushed the priorityqueue-wasm branch 4 times, most recently from 0832ba7 to 144bca0 Compare June 16, 2025 12:33
@sjrd
Copy link
Member
sjrd commented Jun 16, 2025

linktimeIf should generate the same code as the innerImpl = InnerArrayImpl.JSArrayImpl.

I managed to get this to happen by changing genApplyTypeApply in GenJSCode.scala to

    private def genApplyTypeApply(tree: Apply, isStat: Boolean): js.Tree = {
      implicit val pos = tree.pos
      val Apply(TypeApply(fun @ Select(obj, _), targs), args) = tree
      val sym = fun.symbol

      sym match {
        case Object_isInstanceOf =>
          genIsAsInstanceOf(obj, targs, cast = false)
        case Object_asInstanceOf =>
          val targetTpe = targs.head.tpe
          obj match {
            case Apply(fun, List(cond, thenp, elsep))
                if fun.symbol == jsDefinitions.LinkingInfo_linkTimeIf &&
                thenp.tpe <:< targetTpe && elsep.tpe <:< targetTpe =>
              val genObj = genExpr(obj).asInstanceOf[js.LinkTimeIf]
              js.LinkTimeIf(genObj.cond, genObj.thenp, genObj.elsep)(toIRType(targetTpe))(genObj.pos)
            case _ =>
              genIsAsInstanceOf(obj, targs, cast = true)
          }
        case Object_synchronized =>
          genSynchronized(obj, args.head, isStat)
        case _ =>
          abort("Unexpected type application " + fun +
              "[sym: " + sym.fullName + "]" + " in: " + tree)
      }
    }

That removes a spurious js.AsInstanceOf around the js.LinkTimeIf, which prevents the module aliasing analysis done in IncOptimizer.scala.

(It's a rough draft; it should be cleaned up and tested in order to accept that change into main.)

The use-case of this function is for ensuring the size of internal buffers (e.g. internal Array of `ju.ArrayList`).
@tanishiking
Copy link
Contributor Author
tanishiking commented Jul 15, 2025

I'm getting back on this. Sorry for the delayed response, and thank you very much for addressing the issue about spurious js.AsInstanceOfs!

I found that, another significant performance degradation in add is caused by fixUp implementation. It always traversed all the way up to the root node even after finding a suitable location. fixed in dbfd7a7
(Interestingly, it wasn't a problem without innerImpl, because JS will be optimized to stop traversing when fixUp found the location).

Fixing the fixUp and applying your change, we could optimize Wasm without degrade the JS performance 🎉
Now there's a small diff between main and HEAD in JS https://gist.github.com/tanishiking/5f7b24a1e4e3e724b85d5e60415bf1e4

Benchmark(ms) JS (HEAD) JS (main) Wasm (HEAD) Wasm (main)
add_N 65.82 70.81 232.06 705.57
poll_N_FromFull 1557.52 1537.54 4025.59 8748.01
peek_Repeatedly 83.25 81.29 289.83 1077.45
build_FromCollection_N 77.42 82.55 285.58 721.59
pipelined_AddPoll 1724.26 1529.60 3539.56 8522.60

Your change in #5168 (comment) looks good to me, maybe we can make another PR with tests?

}
loop(parent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the loop(parent) was moved to the wrong place. So the "fix fixUp" commit should be squashed with this one. It was correct in the original code. ;)

Copy link
Contributor Author
@tanishiking tanishiking Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa..., I didn't realize I had made this change myself 🤣

tanishiking added a commit to tanishiking/scala-js that referenced this pull request Jul 16, 2025
When B<:A and C<:A, `linkTimeIf[A](cond){ B }{ C }` would
generate an `.asInstanceOf[A]` that casts the resulting
value to a supertype.
However, this cast is redundant,
as the linker will prune one branch at linktime,
and the remaining expression is already known to be a compatible type.

This commit eliminates the surrounding `.asInstanceOf` around the
`linkTimeIf[T]` when both `thenp` and `elsep` are surely subtypes of `T`.

context: scala-js#5168

Co-authored-by: Sébastien Doeraene <sjrdoeraene@gmail.com>
tanishiking and others added 3 commits July 16, 2025 15:45
Current js.Array-based PriorityQueue implementation on Wasm requires JS-interop for every operation, and JS-interop is very slow.
We use a `linkTimeIf` to select a `scala.Array`-based implementation of internal data structure for PriorityQueue, based on wether it is on JS or WebAssembly.
Also, store the size of array in the first element of Array
Co-authored-by: Sébastien Doeraene <sjrdoeraene@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0