cross-platforming compression #2771
Conversation
One question would be: can we think of something to organize all this stuff better? Right now we have (probably) too many "*platforms" and co.
I couldn't find a way to simplify things more. And I don't think even using more The
Some of those are simpler (but complicated by the need to update the crc32 and/or count the bytes), others are tricky on their own. I'd be happy to receive suggestions, though :)
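As a hedged aside for readers (this is not the fs2 code, and the names are made up for illustration): on the JVM, the crc32-updating-and-byte-counting wrapping discussed above has a stdlib analogue in `java.util.zip.CheckedInputStream`, which maintains a `CRC32` as bytes flow through, while the caller still counts bytes by hand:

```scala
import java.io.ByteArrayInputStream
import java.util.zip.{CRC32, CheckedInputStream}

object CrcCountingSketch {
  def main(args: Array[String]): Unit = {
    val data = "hello, fs2".getBytes("UTF-8")

    // CheckedInputStream updates the CRC32 as bytes are read through it.
    val crc = new CRC32
    val in = new CheckedInputStream(new ByteArrayInputStream(data), crc)

    var bytesRead = 0L
    val buf = new Array[Byte](4) // small buffer to force several reads
    var n = in.read(buf)
    while (n != -1) {
      bytesRead += n // byte counting still has to be done by hand
      n = in.read(buf)
    }

    // Reference CRC over the whole input in one shot.
    val expected = { val c = new CRC32; c.update(data); c.getValue }

    println(s"bytesRead=$bytesRead")
    println(s"crcMatches=${crc.getValue == expected}")
  }
}
```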
The results with a combined
Before:
For a reference (copy from above): after doing weird things and removing the crc wrapper, the best I got was this:
And
combining the
@yurique thanks for all the updates. Actually, that last pair of numbers you posted in #2771 (comment) looks quite good! Basically the same within the margin of error. At this point I'm not sure what constitutes weird anymore :) for example, I wonder if these
I'll try and take a deeper look sometime this week. Thanks again for the massive effort this turned out to be...
That would be... tricky :) The
The latest numbers are not too bad, I agree :) (but combining
I'm still concerned about all the "weird" things we have to have with
Should we instead think about introducing a new set of APIs:
I'm less worried about that. In this case it would be okay to implement that class twice, once on JS and once on the JVM, and avoid the weirdness. If you still have an appetite to try a different approach, here's an idea: you could try interacting directly with the Node.js APIs, rather than converting them to
Interesting 🤔 Also, there are sync alternatives to the async functions in Node. I can't remember if I considered this initially (and if so, what it was that made me not do it this way) – I'll take a peek :)
I'd advise against the sync methods, since they effectively block the main (and only) thread in the Node runtime.
Oh, now I also remember – the sync functions operate on whole inputs (as do the async ones). What we use is the "inflater" and "deflater", which are
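For readers unfamiliar with the streaming objects being discussed: the JVM counterpart, `java.util.zip.Inflater`, accepts input incrementally via `setInput`. A minimal, illustrative sketch (not the PR code) of feeding compressed bytes in small chunks, the way a chunked stream would deliver them:

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.{Deflater, Inflater}

object StreamingInflateSketch {
  def main(args: Array[String]): Unit = {
    val input = ("abc" * 100).getBytes("UTF-8")

    // Compress once, draining the deflater fully.
    val deflater = new Deflater()
    deflater.setInput(input)
    deflater.finish()
    val comp = new Array[Byte](1024)
    var compLen = 0
    while (!deflater.finished())
      compLen += deflater.deflate(comp, compLen, comp.length - compLen)
    deflater.end()

    // Decompress by feeding the compressed bytes 16 at a time.
    val inflater = new Inflater()
    val out = new ByteArrayOutputStream()
    val outBuf = new Array[Byte](64)
    comp.take(compLen).grouped(16).foreach { chunk =>
      inflater.setInput(chunk)
      var n = inflater.inflate(outBuf)
      while (n > 0) {
        out.write(outBuf, 0, n)
        n = if (inflater.finished()) 0 else inflater.inflate(outBuf)
      }
    }
    inflater.end()

    println(s"roundTripOk=${java.util.Arrays.equals(out.toByteArray, input)}")
  }
}
```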
The
Yeah, it can probably be more or less straightforward :) Those pulls that do (approx)
The
Otherwise, having only this lower-level part be platform-specific, and everything else shared, would be nice indeed!
I don't think it has to return
Indeed! :) Well, I'll give it a try as soon as I can. This should work. Curious to see how it affects the benchmark.
# Conflicts:
#	build.sbt

Force-pushed from 1745d7a to 022229e.
I tried implementing inflate in terms of this:

```scala
trait ChunkInflater[F[_]] {

  /** @param bytesChunk bytes to inflate
    * @return (inflatedChunk, remainingBytes, finished)
    */
  def inflateChunk(
      bytesChunk: Chunk[Byte]
  ): Pull[F, INothing, (Chunk[Byte], Int, Boolean)]
}
```

jvm implementation:

```scala
new ChunkInflater[F] {
  def inflateChunk(
      bytesChunk: Chunk.ArraySlice[Byte],
      offset: Int
  ): Pull[F, INothing, (Chunk[Byte], Int, Boolean)] = {
    inflater.setInput(
      bytesChunk.values,
      bytesChunk.offset + offset,
      bytesChunk.length - offset
    )
    val inflatedBytes = inflater.inflate(inflatedBuffer)
    Pull.pure(
      (
        copyAsChunkBytes(inflatedBuffer, inflatedBytes),
        inflater.getRemaining,
        inflater.finished()
      )
    )
  }
}
```

(changes: https://github.com/yurique/fs2/pull/1/files)

Have the tests passing for jvm. Inflate implementation is the same, except using
Benchmarks look weird, though (ran it a few times, it's consistent):
before:
I can't explain the difference in the
Kind of halfway there with the JS implementation (though it turns out even trickier than I expected), but unless we can fix this performance issue, I'm not sure if that's the way to go.
As a side note: in the inflate implementation, I have this:
This is used to send the whole chunk into the inflater (the inflater doesn't always accept the whole input, because the output might overflow the output buffer), so this slice is being re-used, and the
So I thought I should maybe just use the
(changes: https://github.com/yurique/fs2/pull/2/files)
Got a rather surprising result:
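To make the `(inflatedChunk, remainingBytes, finished)` contract concrete, here is a self-contained sketch (hypothetical names, plain `java.util.zip` rather than the PR's `ChunkInflater`) where the output buffer is deliberately smaller than the inflated data, so the caller must loop until `finished()` – the same situation the `remainingBytes`/`finished` results describe:

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.{Deflater, Inflater}

object InflateContractSketch {
  def main(args: Array[String]): Unit = {
    val input = ("0123456789" * 50).getBytes("UTF-8") // 500 bytes

    // Compress once, draining the deflater fully.
    val deflater = new Deflater()
    deflater.setInput(input)
    deflater.finish()
    val comp = new Array[Byte](1024)
    var compLen = 0
    while (!deflater.finished())
      compLen += deflater.deflate(comp, compLen, comp.length - compLen)
    deflater.end()

    // A 64-byte output buffer cannot hold the 500 inflated bytes,
    // so several inflate calls are needed for one input chunk.
    val inflater = new Inflater()
    inflater.setInput(comp, 0, compLen)
    val outBuf = new Array[Byte](64)
    val out = new ByteArrayOutputStream()
    var calls = 0
    while (!inflater.finished()) {
      val n = inflater.inflate(outBuf)
      out.write(outBuf, 0, n)
      calls += 1
      // inflater.getRemaining reports how much of the input chunk
      // is still unconsumed after this call.
    }
    inflater.end()

    println(s"inflatedAll=${out.size == input.length}")
    println(s"multipleCalls=${calls > 1}")
  }
}
```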
@yurique so if I'm reading those benchmarks correctly,
@armanbilge yes, that's right – I was thinking "those parts haven't changed at all either", but your comment reminded me of something, and I looked through it again. And there is a subtle difference: `if (track) crcBuilder.update(inflatedChunk)`
So now I changed it so that the
And the benchmark is almost back to where it was:
But I'm really surprised: the `toArraySlice`

```scala
def toArraySlice[O2 >: O](implicit ct: ClassTag[O2]): Chunk.ArraySlice[O2] =
  this match {
    case as: Chunk.ArraySlice[_] if ct.wrap.runtimeClass eq as.getClass =>
      as.asInstanceOf[Chunk.ArraySlice[O2]]
    case _ => Chunk.ArraySlice(toArray, 0, size)
  }
```

But this is good! I'll try finishing the JS implementation.
Nice! 🎉 And very interesting about
@armanbilge so I managed to make a JS implementation – it was hell, but it works =) The benchmark results are the same as before the JS implementation (as they should be – I added a couple of things to the shared inflate implementation, but those do not kick in on the JVM).
As a result, the inflate and gzip implementation is shared between JS and the JVM, apart from the platform-specific interface to the inflater.
Changes are here: https://github.com/yurique/fs2/pull/1/files
jvm chunk inflater: https://github.com/yurique/fs2/pull/1/files#diff-8c2b0bb4111ef230a00138813c001efa1fa187ee4c6f43cf514e0074b570e084R147-R181
js chunk inflater: https://github.com/yurique/fs2/pull/1/files#diff-1b982c5a5c6bd8cdbb47ca7c2114327cb202075ee95749b8024d7078b6fa7490R73-R207
shared code: https://github.com/yurique/fs2/pull/1/files#diff-e7f835b1438d510176b8ad23b915f2c27e152bc2768847265da54e9ce98acb50R33 –
Haven't had a chance to look yet but just wanted to 🎉 that's amazing!
I'm really sorry, has this PR just been waiting for my review? 😬 😅 I'll try and resolve the merge conflicts.
Oh I see, there are more changes in a PR against this PR :)
I'm trying to recollect what was going on here. I think the PR against this PR was an "experiment" which turned out to be a successful one – more shared code between JS and JVM, and without a performance hit. But I can't remember all the details off the top of my head now :'(
I remember :) you were able to implement a common
I think that's the hardest part of this PR 😅 so thanks for your big effort on it! IIRC some remaining things are dealing with bin-compatibly changing the API to use
The other thing that I remember was frustrating was the poor performance of the CRC calculation on JS, particularly when using the scodec-bits implementation. This PR inlines a mutable implementation which works better (I can't remember how much better). I wish we didn't have to take that on, but I'm not sure if there's another way, since Node.js doesn't expose native methods for calculating the CRC.
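For context on the inlined mutable CRC mentioned above: the usual technique is a table-driven CRC-32 over the zlib/gzip polynomial (`0xEDB88320`). The following is a generic sketch of that technique (not the PR's actual code), checked here against `java.util.zip.CRC32`:

```scala
object Crc32TableSketch {
  // Precomputed lookup table for the reflected CRC-32 polynomial 0xEDB88320.
  private val table: Array[Int] = Array.tabulate(256) { i =>
    var c = i
    var k = 0
    while (k < 8) {
      c = if ((c & 1) != 0) 0xEDB88320 ^ (c >>> 1) else c >>> 1
      k += 1
    }
    c
  }

  // Table-driven CRC-32: one table lookup and one shift per input byte.
  def crc32(bytes: Array[Byte]): Long = {
    var crc = 0xFFFFFFFF
    var i = 0
    while (i < bytes.length) {
      crc = table((crc ^ bytes(i)) & 0xFF) ^ (crc >>> 8)
      i += 1
    }
    (~crc).toLong & 0xFFFFFFFFL
  }

  def main(args: Array[String]): Unit = {
    val data = "hello".getBytes("UTF-8")
    val ref = { val c = new java.util.zip.CRC32; c.update(data); c.getValue }
    println(s"matchesJavaUtilZip=${crc32(data) == ref}")
  }
}
```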
So I took another look at how http4s is using the
It seems none of the http4s gzip middlewares are using the additional headers such as file name or modification time; they are just directly interfacing with the underlying byte stream:
IIUC it should be possible to support these http4s use-cases just by directly shelling out to the Node.js APIs like in the example there. Not to discount your huge efforts on this PR :( but maybe the easy win here for http4s is to add new cross-platform APIs to
Meanwhile, I see there is an (unfortunately inactive) feature request for Node.js itself to add native support for these optional headers. If that ever materializes, we would be able to implement these APIs much more easily and with much better performance. What do you think?