8000 Script: Implement `TextDecoderStream` by minghuaw · Pull Request #38112 · servo/servo · GitHub
[go: up one dir, main page]

Skip to content

Conversation

minghuaw
Copy link
Contributor

This PR implements the TextDecoderStream. Other than introducing the necessary mod and webidl files corresponding to TextDecoderStream, this PR also involves some changes in TextDecoder and TrasnformStream:

  • The common part that can be shared between TextDecoder and TextDecoderStream are extracted into a separate type script::dom::textdecodercommon::TextDecoderCommon. This type could probably use a different name because there is an interface called TextDecoderCommon in the spec (https://encoding.spec.whatwg.org/#textdecodercommon) which just gets included in TextDecoder and TextDecoderStream.
  • The three algorithms in TransformStream (cancel, flush, and transform) all have become enum that has a Js variant for a JS function object and a Native variant for a rust trait object. Whether the cancel algorithm needs this enum type is debatable as I did not find any interface in the spec that explicitly sets the cancel algorithm.

Testing: Existing WPT tests tests/wpt/tests/encoding/stream should be sufficient
Fixes: #37723

@minghuaw minghuaw requested a review from gterzian as a code owner July 16, 2025 06:35
@minghuaw
Copy link
Contributor Author

Would it be better if this PR is split into three smaller PRs because of the changes mentioned above?

@minghuaw minghuaw force-pushed the feat/textdecoderstream-transformstream-enum branch from 116bbf6 to 852a99e Compare July 16, 2025 06:36
@minghuaw
Copy link
Contributor Author

Since this is adding a new interface, should "TextDecoderStream" be added to tests/wpt/mozilla/tests/mozilla/interfaces.https.html?

@minghuaw minghuaw requested a review from Taym95 July 16, 2025 07:49
@minghuaw minghuaw force-pushed the feat/textdecoderstream-transformstream-enum branch from 57c6fae to eb350cc Compare July 16, 2025 09:41
Copy link
Member
@gterzian gterzian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start, thank you.

Couple of general comments on the overall structure. I think there may have been a misunderstanding in zulip re the use of generics. I'm open to counter arguments off course.

@minghuaw minghuaw requested a review from gterzian July 17, 2025 05:18
@minghuaw minghuaw force-pushed the feat/textdecoderstream-transformstream-enum branch from 4f70be6 to a1f0c91 Compare July 17, 2025 08:20
Error::Type("Unable to convert chunk into ArrayBuffer or ArrayBufferView".to_string())
})?
};
let buffer_source = conversion_result.get_success_value().ok_or(Error::Type(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gterzian I thini this needs to use BufferSource from buffer_source.rs right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did effort to hide all unsafe code inside buffer_source.rs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes looks like it should.

Copy link
Contributor Author
@minghuaw minghuaw Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I fully understand this. The buffer_source is passed to the TextDecoderCommon::decode immediately after, which is a function shared between TextDecoderStream and TextDecoder. The reason why TextDecoderCommon::decode takes Option<&ArrayBufferViewOrArrayBuffer> as the argument type is because that is the type in the generated trait TextDecoderMethods.

Is the codegen supposed to generate traits with BufferSource as the arg type? Even though TextDecoder.webidl says BufferSource, the generated code in TextDecoderMethods is still Option<ArrayBufferViewOrArrayBuffer>.

The decode_and_enqueue_a_chunk function takes a SafeHandleValue because that's the type TransformStreamDefaultController::perform_transform uses. Plus, I was not able to find a way to convert a SafeHandleValue into a BufferSource without unsafe code. Neither was I able to find other places in the servo repo that performs this kind of conversion.

Would you mind providing some more guidance on this?

Copy link
Member
@Taym95 Taym95 Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, we’re using ArrayBufferViewOrArrayBuffer and calling unsafe { a.as_slice() } directly, which exposes raw JSObject* at every call.

Our HeapBufferSource solves all of this in one place:

  • It roots the JSObject in a Heap<*mut JSObject> → GC-safe
  • It has is_initialized() & is_detached_buffer() → prevents UB
  • It provides typed_array_to_option() and acquire_data() → safe slice copying
  • It aligns directly with WebIDL’s BufferSource union semantics, rather than ad hoc conversions

so I would do something like:

  let buffer_source_union = conversion_result.get_success_value().ok_or(Error::Type(
        "Unable to convert chunk into ArrayBuffer or ArrayBufferView".to_string(),
    ))?;

    let heap_buffer_source = match buffer_source_union {
        ArrayBufferViewOrArrayBuffer::ArrayBufferView(view) => {
            let view_heap = HeapBufferSource::<ArrayBufferViewU8>::new(
                BufferSource::ArrayBufferView(Heap::boxed(unsafe { (view.underlying_object()).get() })),
            );

            // normalize to its backing ArrayBuffer
            view_heap.get_array_buffer_view_buffer(cx)
        },
        ArrayBufferViewOrArrayBuffer::ArrayBuffer(buf) => {
            HeapBufferSource::<ArrayBufferU8>::new(BufferSource::ArrayBuffer(Heap::boxed(unsafe { (buf.underlying_object()).get() })))
        },
    };

and I would go further and have unsafe { (view.underlying_object()).get() } inside a function in HeapBufferSource

Copy link
Contributor Author
@minghuaw minghuaw Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added HeapBufferSource::from_array_buffer_view_or_array_buffer in 0f3c8c7. Is this a reasonable place to hide the unsafe code?
Reverted because found problem with other WPT subtests after switching to HeapBufferSource in TextDecoder

Copy link
Contributor Author
@minghuaw minghuaw Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in zulip, converting ArrayBufferViewOrArrayBuffer into HeapBufferSource<ArrayBufferU8> sometimes lead to inconsistent byte length number. One particular example is the WPT subtest https://wpt.live/referrer-policy/generic/unsupported-csp-referrer-directive.html, which gives a byte length of 489 before conversion and a byte length of 6084 after converting into HeapBufferSource. The extra spaces seem to be filled with nulls.

The byte length is given by two FFI functions in mozjs (JS_GetArrayBufferViewByteLength and GetArrayBufferByteLength), so troubleshooting and/or fixing the bug in mozjs is probably out of the scope of this PR? Does it make sense if we separate the inclusion of HeapBufferSource in another issue/PR (unless we make some ad-hoc workaround with the current HeapBufferSource)?

[additional writes should wait for backpressure to be relieved for class TextDecoderStream]
expected: FAIL

[write() should not complete until read relieves backpressure for TextEncoderStream]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any idea why this is still failing?

Copy link
Member
@gterzian gterzian Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important to fix since it goes to the core of the streams stuff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No actually this is nothing, since the PR implements TextDecoderStream, and this test is for the encoder.

TransformerType::Decoder(_) => {
// `TextDecoderStream` does NOT specify a cancel algorithm.
//
// Step 4. Let cancelAlgorithm be an algorithm which returns a promise resolved with undefined.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come TextDecoderStream does not specify a cancel algorithm.` but we still have:

Step 4. Let cancelAlgorithm be an algorithm which returns a promise resolved with undefined.

Copy link
Member
@gterzian gterzian Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here,

// <https://streams.spec.whatwg.org/#transformstream-set-up>
// Let result be the result of running cancelAlgorithm given reason, if cancelAlgorithm was given, or null otherwise. 
// Note: `TextDecoderStream` does not specify a cancel algorithm.

// If result is a Promise, then return result.
// Note: not applicable. 

// Return a promise resolved with undefined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more spec steps in the documentation (b857c0c)

// chunk argument and runs the decode and enqueue a chunk
// algorithm with this and chunk.
decode_and_enqueue_a_chunk(cx, global, chunk, decoder, self, can_gc)
.map(|_| Promise::new_resolved(global, cx, (), can_gc))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually spec explicitly says return Promise, but I am missing that part here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this needs to implement, and document the steps of, the transformAlgorithmWrapper from https://streams.spec.whatwg.org/#transformstream-set-up like:

// <https://streams.spec.whatwg.org/#transformstream-set-up >
// Let transformAlgorithmWrapper be an algorithm that runs these steps given a value chunk:
// Let result be the result of running transformAlgorithm given chunk.
let result = decode_and_enqueue_a_chunk

For the exception, please use the same as

// If result is an abrupt completion,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more spec steps in the documentation (b857c0c)

@Taym95 Taym95 added the T-linux-wpt Do a try run of the WPT label Jul 17, 2025
@github-actions github-actions bot removed the T-linux-wpt Do a try run of the WPT label Jul 17, 2025
Copy link

🔨 Triggering try run (#16349517941) for Linux (WPT)

Copy link

Test results for linux-wpt from try job (#16349517941):

Flaky unexpected result (19)
  • FAIL [expected PASS] /_mozilla/css/dirty_viewport.html (#13731)
  • OK /css/css-cascade/layer-cssom-order-reverse.html (#36094)
    • FAIL [expected PASS] subtest: Delete layer invalidates @font-face

      assert_equals: expected "220px" but got "133px"
      

  • OK [expected ERROR] /fetch/fetch-later/quota/same-origin-iframe/multiple-iframes.tentative.https.window.html (#35176)
  • OK /html/browsers/browsing-the-web/navigating-across-documents/006.html (#21382)
    • PASS [expected FAIL] subtest: Link with onclick form submit and href navigation
  • OK /html/browsers/browsing-the-web/navigating-across-documents/initial-empty-document/iframe-src-aboutblank-navigate-immediately.html (#29048)
    • PASS [expected FAIL] subtest: Navigating to a different document with form submission
  • OK /html/browsers/browsing-the-web/navigating-across-documents/initial-empty-document/load-pageshow-events-iframe-contentWindow.html (#28681)
    • PASS [expected FAIL] subtest: load &amp; pageshow events do not fire on contentWindow of &lt;iframe&gt; element created with src='about:blank'
  • OK /html/browsers/browsing-the-web/navigating-across-documents/javascript-url-return-value-handling-dynamic.html (#28066)
    • FAIL [expected PASS] subtest: 0041 set in href="" targeting a frame and clicked

      assert_equals: expected "A" but got ""
      

    • FAIL [expected PASS] subtest: 0080 00FF set in href="" targeting a frame and clicked

      assert_equals: expected "�ÿ" but got ""
      

  • OK /html/browsers/browsing-the-web/navigating-across-documents/navigation-unload-cross-origin.sub.window.html (#29056)
    • PASS [expected FAIL] subtest: Cross-origin navigation started from unload handler must be ignored
  • OK /html/browsers/browsing-the-web/navigating-across-documents/navigation-unload-same-origin-fragment.html (#20768)
    • FAIL [expected PASS] subtest: Tests that a fragment navigation in the unload handler will not block the initial navigation

      assert_equals: expected "" but got "#fragment"
      

  • OK /html/browsers/history/the-history-interface/traverse_the_history_2.html (#21383)
    • PASS [expected FAIL] subtest: Multiple history traversals, last would be aborted
  • OK /html/browsers/history/the-history-interface/traverse_the_history_4.html (#21383)
    • PASS [expected FAIL] subtest: Multiple history traversals, last would be aborted
  • OK /html/browsers/history/the-history-interface/traverse_the_history_5.html (#21383)
    • PASS [expected FAIL] subtest: Multiple history traversals, last would be aborted
  • OK /html/semantics/embedded-content/the-iframe-element/iframe-loading-lazy-nav-location-assign.html (#32863)
    • FAIL [expected PASS] subtest: Navigating iframe loading='lazy' before it is loaded: location.assign

      uncaught exception: Error: assert_equals: expected "http://web-platform.test:8000/html/semantics/embedded-content/the-iframe-element/support/blank.htm?nav" but got "http://web-platform.test:8000/html/semantics/embedded-content/the-iframe-element/support/blank.htm?src"
      

  • CRASH [expected OK] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_escaping-1.html (#22647)
  • OK /html/semantics/embedded-content/the-video-element/intrinsic_sizes.htm (#37173)
    • FAIL [expected PASS] subtest: default object size after src is removed

      assert_equals: expected "300px" but got "320px"
      

  • OK /navigation-timing/test-navigation-type-reload.html (#33334)
    • PASS [expected FAIL] subtest: Reload domComplete &gt; Original domComplete
    • PASS [expected FAIL] subtest: Reload domContentLoadedEventEnd &gt; Original domContentLoadedEventEnd
    • PASS [expected FAIL] subtest: Reload domContentLoadedEventStart &gt; Original domContentLoadedEventStart
    • PASS [expected FAIL] subtest: Reload domInteractive &gt; Original domInteractive
    • PASS [expected FAIL] subtest: Reload fetchStart &gt; Original fetchStart
    • PASS [expected FAIL] subtest: Reload loadEventEnd &gt; Original loadEventEnd
    • PASS [expected FAIL] subtest: Reload loadEventStart &gt; Original loadEventStart
  • TIMEOUT [expected CRASH] /trusted-types/trusted-types-navigation.html?31-35 (#38034)
    • FAIL [expected PASS] subtest: Navigate a frame via form-submission with javascript:-urls in report-only mode.

      promise_test: Unhandled rejection with value: "Unexpected message received: \"No securitypolicyviolation reported!\""
      

    • TIMEOUT [expected PASS] subtest: Navigate a frame via form-submission with javascript:-urls w/ default policy in report-only mode.

      Test timed out
      

    • NOTRUN [expected PASS] subtest: Navigate a window via form-submission with javascript:-urls w/ a default policy throwing an exception in enforcing mode.
    • NOTRUN [expected PASS] subtest: Navigate a window via form-submission with javascript:-urls w/ a default policy throwing an exception in report-only mode.
    • NOTRUN [expected PASS] subtest: Navigate a window via form-submission with javascript:-urls w/ a default policy making the URL invalid in enforcing mode.
  • TIMEOUT [expected OK] /webmessaging/with-ports/018.html (#24485)
    • TIMEOUT [expected PASS] subtest: origin of the script that invoked the method, javascript:

      Test timed out
      

  • OK [expected TIMEOUT] /webmessaging/without-ports/018.html (#24485)
    • PASS [expected TIMEOUT] subtest: origin of the script that invoked the method, javascript:
Stable unexpected results that are known to be intermittent (19)
  • FAIL [expected PASS] /_mozilla/css/stacked_layers.html (#15988)
  • FAIL [expected PASS] /_mozilla/mozilla/sslfail.html (#10760)
  • TIMEOUT [expected OK] /_mozilla/mozilla/window_resize_event.html (#36741)
    • TIMEOUT [expected PASS] subtest: Popup onresize event fires after resizeTo

      Test timed out
      

  • FAIL [expected PASS] /css/css-transitions/render-blocking/no-transition-from-ua-to-blocking-stylesheet.html (#29187)
  • TIMEOUT [expected FAIL] /dom/xslt/large-cdata.html (#38029)
  • OK /fetch/metadata/generated/css-font-face.sub.tentative.html (#34624)
    • PASS [expected FAIL] subtest: sec-fetch-storage-access - Not sent to non-trustworthy cross-site destination
  • OK /html/browsers/browsing-the-web/navigating-across-documents/navigation-unload-same-origin.window.html (#29049)
    • FAIL [expected PASS] subtest: Same-origin navigation started from unload handler must be ignored

      assert_equals: expected "?pass" but got "?fail"
      

  • OK [expected TIMEOUT] /html/interaction/focus/the-autofocus-attribute/supported-elements.html (#24145)
    • FAIL [expected NOTRUN] subtest: Host element with delegatesFocus including no focusable descendants should be skipped

      assert_equals: expected Element node &lt;input autofocus=""&gt;&lt;/input&gt; but got Element node &lt;body&gt;&lt;div autofocus=""&gt;&lt;/div&gt;&lt;input autofocus=""&gt;&lt;/body&gt;
      

    • FAIL [expected NOTRUN] subtest: Area element should support autofocus

      promise_test: Unhandled rejection with value: object "TypeError: w.document.querySelector(...) is null"
      

  • TIMEOUT [expected OK] /html/interaction/focus/the-autofocus-attribute/update-the-rendering.html (#24145)
    • TIMEOUT [expected FAIL] subtest: "Flush autofocus candidates" should be happen before a scroll event and animation frame callbacks

      Test timed out
      

  • OK [expected CRASH] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_escaping-2.html (#22667)
  • CRASH [expected OK] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_nonescaping-1.html (#24066)
  • CRASH [expected OK] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_nonescaping-2.html (#22154)
  • TIMEOUT [expected CRASH] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_nonescaping-3.html (#24066)
  • OK [expected TIMEOUT] /html/semantics/forms/form-submission-0/reparent-form-during-planned-navigation-task.html (#29724)
    • PASS [expected TIMEOUT] subtest: reparent-form-during-planned-navigation-task
  • OK /html/semantics/forms/historical.html (#28568)
    • FAIL [expected PASS] subtest: &lt;input name=isindex&gt; should not be supported

      assert_regexp_match: expected object "/\?isindex=x$/" but got "about:blank"
      

  • OK /preload/preload-error.sub.html (#37177)
    • FAIL [expected PASS] subtest: CORS (style): main

      assert_greater_than: http://not-web-platform.test:8000/preload/resources/dummy.css?pipe=header%28Access-Control-Allow-Origin%2C*%29&amp;label=style should be loaded expected a number greater than 0 but got 0
      

    • PASS [expected FAIL] subtest: success (script): main
    • FAIL [expected PASS] subtest: CORS (script): main

      assert_greater_than: http://not-web-platform.test:8000/preload/resources/dummy.js?pipe=header%28Access-Control-Allow-Origin%2C*%29&amp;label=script should be loaded expected a number greater than 0 but got 0
      

    • PASS [expected FAIL] subtest: success (xhr): main
    • PASS [expected FAIL] subtest: 404 (xhr): main
    • FAIL [expected PASS] subtest: Decode-error (style): main

      assert_greater_than: http://web-platform.test:8000/preload/resources/dummy.xml?pipe=header%28Content-Type%2Ctext%2Fcss%29&amp;label=style should be loaded expected a number greater than 0 but got 0
      

    • FAIL [expected PASS] subtest: MIME-error (script): main

      assert_greater_than: http://web-platform.test:8000/preload/resources/dummy.css?pipe=header%28Content-Type%2Ctext%2Fnotjavascript%29&amp;label=script should be loaded expected a number greater than 0 but got 0
      

  • TIMEOUT /preload/preload-resource-match.https.html (#38088)
    • FAIL [expected TIMEOUT] subtest: Loading script (anonymous) with link (no-cors) should discard the preloaded response

      assert_equals: https://www1.web-platform.test:8443/preload/resources/echo-with-cors.py?type=application%2Fjavascript&amp;content=function%20dummy()%20%7B%20%7D&amp;uid=36738ade-eb4e-44eb-aa85-1bd72f290067&amp;d5630c95-1f09-41c8-be8c-3c1668c1c9f7 expected 2 but got 1
      

    • PASS [expected NOTRUN] subtest: Loading script (anonymous) with link (anonymous) should reuse the preloaded response
    • FAIL [expected NOTRUN] subtest: Loading script (anonymous) with link (use-credentials) should discard the preloaded response

      assert_equals: https://www1.web-platform.test:8443/preload/resources/echo-with-cors.py?type=application%2Fjavascript&amp;content=function%20dummy()%20%7B%20%7D&amp;uid=36738ade-eb4e-44eb-aa85-1bd72f290067&amp;31576c70-c75c-4a13-ab25-8872efe9ebaa expected 2 but got 1
      

    • FAIL [expected NOTRUN] subtest: Loading script (use-credentials) with link (no-cors) should discard the preloaded response

      assert_equals: https://www1.web-platform.test:8443/preload/resources/echo-with-cors.py?type=application%2Fjavascript&amp;content=function%20dummy()%20%7B%20%7D&amp;uid=36738ade-eb4e-44eb-aa85-1bd72f290067&amp;1a316f28-153e-45b6-973c-a0065da3ed0f expected 2 but got 1
      

    • TIMEOUT [expected NOTRUN] subtest: Loading script (use-credentials) with link (anonymous) should discard the preloaded response

      Test timed out
      

  • OK /resize-observer/eventloop.html (#33599)
    • FAIL [expected PASS] subtest: test0: multiple notifications inside same event loop

      assert_equals: new loop expected 1 but got 0
      

  • TIMEOUT [expected OK] /resource-timing/tentative/document-initiated.html (#37785)

Copy link

✨ Try run (#16349517941) succeeded.

Copy link
Member
@gterzian gterzian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of more comments on the algos.

self.transform_obj.set(*this_object);
if let TransformerType::Js { transform_obj, .. } = &self.transformer_type {
transform_obj.set(*this_object)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should assert that the type is Js.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It now panics in the else branch in b857c0c

Copy link
Member
@gterzian gterzian Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has already been changed to unreachable!() in the else branch in commit 4d04d78. GH somehow didn't mark this outdated

// chunk argument and runs the decode and enqueue a chunk
// algorithm with this and chunk.
decode_and_enqueue_a_chunk(cx, global, chunk, decoder, self, can_gc)
.map(|_| Promise::new_resolved(global, cx, (), can_gc))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this needs to implement, and document the steps of, the transformAlgorithmWrapper from https://streams.spec.whatwg.org/#transformstream-set-up like:

// <https://streams.spec.whatwg.org/#transformstream-set-up >
// Let transformAlgorithmWrapper be an algorithm that runs these steps given a value chunk:
// Let result be the result of running transformAlgorithm given chunk.
let result = decode_and_enqueue_a_chunk

For the exception, please use the same as

// If result is an abrupt completion,

TransformerType::Decoder(_) => {
// `TextDecoderStream` does NOT specify a cancel algorithm.
//
// Step 4. Let cancelAlgorithm be an algorithm which returns a promise resolved with undefined.
Copy link
Member
@gterzian gterzian Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here,

// <https://streams.spec.whatwg.org/#transformstream-set-up>
// Let result be the result of running cancelAlgorithm given reason, if cancelAlgorithm was given, or null otherwise. 
// Note: `TextDecoderStream` does not specify a cancel algorithm.

// If result is a Promise, then return result.
// Note: not applicable. 

// Return a promise resolved with undefined.

Copy link
Member
@gterzian gterzian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok aside from those few comments, this looks good overall. Haven't had time to review all the steps in details; @Taym95 do you want to do it since you wrote transform stream?

@servo-wpt-sync
Copy link
Collaborator

🛠 These changes could not be applied onto the latest upstream WPT. Servo's copy of the Web Platform Tests may be out of sync.

@minghuaw minghuaw requested review from Taym95 and gterzian July 21, 2025 05:46
@Taym95 Taym95 added the T-linux-wpt Do a try run of the WPT label Jul 21, 2025
@github-actions github-actions bot removed the T-linux-wpt Do a try run of the WPT label Jul 21, 2025
.extend_from_slice(unsafe { a.as_slice() });
},
None => {},
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to avoid the copy(see the note in the spec). Not sure how, could be more of a follow-up, but we should file an issue and add a TODO note.

Copy link
Contributor Author
@minghuaw minghuaw Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new implementation, this copying only happens if there are leftover bytes. Otherwise, it would just use the input slice directly. Any bytes that are not consumed will then be stored in the decoder's I/O queue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with the new implementation? In the current code, the extend_from_slice call clones the data rigtht? I think to avoid the copy you'd have to store the buffer in the I/O queue, and pass slices of it to the decoder?

Good for a follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the changes introduced in a5bc680. GitHub somehow didn't mark this as outdated

minghuaw added 2 commits July 23, 2025 16:35
Signed-off-by: minghuaw <wuminghua7@huawei.com>
Signed-off-by: minghuaw <wuminghua7@huawei.com>
@minghuaw minghuaw requested a review from gterzian July 23, 2025 08:52
@minghuaw
Copy link
Contributor Author
minghuaw commented Jul 23, 2025

I think the confusion comes from the imperfect mapping between the encoder/decoder implementation we are using, ie. encoding_rs, and the spec's detailed instruction on how each byte should be handled.

  1. We should have something implementing https://encoding.spec.whatwg.org/#concept-encoding-process (calling into the decoder function would then correspond to "4: Let result be the result of running encoderDecoder’s handler on input and item."

This I believe is already taken care of by encoding_rs::Decoder::decode_to_string(_without_replacement). It describes the process of taking a byte ("item") and try to decode the given byte. The "result" it defines are intermediate states that encoding_rs::Decoder takes care of during the process of decoding an entire input buffer, which drains the input buffer (https://docs.rs/encoding_rs/latest/encoding_rs/#buffer-reading-and-writing-behavior). So every time it completes decoding, it has already run into an end-of-queue internally.

  1. We also need something to implement https://encoding.spec.whatwg.org/#concept-td-serialize

I believe this is also already handled within encoding_rs::Decoder. All this "serialize I/O" is doing is just to convert a Vec<char> into a String, but we are decoding into a String type buffer with encoding_rs::Decoder from the very beginning.

So currently everything goes through TextDecoderCommon::decode, but that needs to be broken-up as per the above.

Yes it should be broken up but not as per the above. I have moved do_not_flush back to TextDecoder, keeping TextDecoderCommon::decode as minimal as possible.

And we also need to store the output in the case of the "streaming" use of the i/o queue(not to be confused with the transform stream)

I think this is an misunderstanding. Isn't the "streaming" part referring to the Readable and Writable? The output is just referring to a buffer that stores all the decoded char before returning them altogether as a String, which is what encoding_rs::Decoder::decode_to_string already takes care of. This part of the spec looks confusing because it covers details that are already covered in encoding_rs

Signed-off-by: minghuaw <wuminghua7@huawei.com>
@gterzian
Copy link
Member
gterzian commented Jul 23, 2025

I think the confusion comes from the imperfect mapping between the encoder/decoder implementation we are using, ie. encoding_rs, and the spec's detailed instruction on how each byte should be handled.

  1. We should have something implementing https://encoding.spec.whatwg.org/#concept-encoding-process (calling into the decoder function would then correspond to "4: Let result be the result of running encoderDecoder’s handler on input and item."

This I believe is already taken care of by encoding_rs::Decoder::decode_to_string(_without_replacement). It describes the process of taking a byte ("item") and try to decode the given byte. The "result" it defines are intermediate states that encoding_rs::Decoder takes care of during the process of decoding an entire input buffer, which drains the input buffer (https://docs.rs/encoding_rs/latest/encoding_rs/#buffer-reading-and-writing-behavior). So every time it completes decoding, it has already run into an end-of-queue internally.

  1. We also need something to implement https://encoding.spec.whatwg.org/#concept-td-serialize

I believe this is also already handled within encoding_rs::Decoder. All this "serialize I/O" is doing is just to convert a Vec<char> into a String, but we are decoding into a String type buffer with encoding_rs::Decoder from the very beginning.

So currently everything goes through TextDecoderCommon::decode, but that needs to be broken-up as per the above.

Yes it should be broken up but not as per the above. I have moved do_not_flush back to TextDecoder, keeping TextDecoderCommon::decode as minimal as possible.

And we also need to store the output in the case of the "streaming" use of the i/o queue(not to be confused with the transform stream)

I think this is an misunderstanding. Isn't the "streaming" part referring to the Readable and Writable? The output is just referring to a buffer that stores all the decoded char before returning them altogether as a String, which is what encoding_rs::Decoder::decode_to_string already takes care of. This part of the spec looks confusing because it covers details that are already covered in encoding_rs

So I've had another look at your implementation and the spec.

What you implemented--passing "do not flush" along as "last" for the text decoder, and passing "last" as true in the flush algo for the stream decoder--makes conceptual sense to me, and it also seems to match what the JS examples do. And it also matches how encoding rs describes incremental decoding.

But what I find confusing is that it doesn't seem to be how the spec is written.

For example, the "decode and enqueue algo" only enqueues a chunk on the transform when "end of queue" is received, not each time an item is processed.

Also, for the text decoder, a result is only returned when "end of queue" is encountered, not each time an item is processed.

So the spec, to me at least, seems to imply that one should keep appending processed items to an output, but only return or enqueue the result of serialising the output when reaching end of queue.

Also note that "end of queue" is a special item that means there will be no more items, so it's not the same thing as "the i/o queue is currently empty", which is indeed reached at the end of processing each item.

In conclusion, I'd say we stick to your implementation, but you could consider opening an issue at the spec to point out the potential weirdness.

@gterzian
Copy link
Member
gterzian commented Jul 23, 2025

So last update in the above discussion: my reading of "end of queue" was wrong; it isn't a special value enqueued to signal that no more items will be enqueued, rather it just means "queue is currently empty".

So for each call to 'decode' the "end of queue" will be hit.

The way "streaming" works is that the end of queue will only be processed when the stream option is false, or, in the streaming decoder, when the stream is closed and the flush algorithm is called. The point of that seems to be to allow more data to be received before "end of queue" is finally processed.

See whatwg/encoding#184

So your implementation is right, and the spec is unambiguous.

Copy link
Member
@gterzian gterzian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so from my side we just need to throw the error.

@Taym95 how is it looking on the buffer use?

// <https://encoding.spec.whatwg.org/#dom-textdecoder-decode>
// Step 5.3.3 Otherwise, if result is error, throw a TypeError.
DecoderResult::Malformed(_, _) => {
return Err(Error::Type("Decoding failed".to_owned()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here I think you need to throw the type error, as done here, but using Error::Type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize that throwing is different from just returning an Err. Would you mind providing some more contexts regarding why both are used?

Copy link
Contributor Author
@minghuaw minghuaw Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The realm problem found in wpt/tests/encoding/streams/realms.window.js are actually solved by manually specifying the realm when creating the rejection Promise b66a365#diff-3b8b0e94fec7d463a8c649b8679235a8b77cbf2663a934f3b1ba16a7244ffff5

There is still one TextDecoderStream test that fails in that subtest, which is the case of invalid chunk, and it seems like the root cause is a bug in encoding_rs. encoding_rs::Decoder doesn't return an error with invalid input [0xFF]. See my comment below for an example


[TypeError for chunk with the wrong type should come from constructor realm of TextDecoderStream]
expected: FAIL
D306 Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those failing test should be fixed by throwing the type error.

expected: ERROR

[decode-utf8.any.worker.html]
expected: ERROR
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: error is from

const createBuffer = (() => {

"sabConstructor is not a constructor".

minghuaw added 2 commits July 24, 2025 13:17
Signed-off-by: minghuaw <wuminghua7@huawei.com>
Signed-off-by: minghuaw <wuminghua7@huawei.com>
[TypeError for chunk with the wrong type should come from constructor realm of TextDecoderStream]
expected: FAIL

[TypeError for invalid chunk should come from constructor realm of TextDecoderStream]
Copy link
Contributor Author
@minghuaw minghuaw Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still failing because encoding_rs::Decoder::decode_to_string_without_replacement fails to return DecoderResult::Malformed with invalid input [0xFF]. This can be reproduced with this: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=8225345ee8ffedee5a4b6ecd0161beb2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue in encoding_rs repo: hsivonen/encoding_rs#122

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be just me using it wrong though because changing last to true would indeed give Malformed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the problem is that when last is set to false (not flushing), the number of bytes read would actually include the malformed byte, so when it gets to flushing, ie. last is set to true, it sees an empty slice because the malformed bytes are already removed.

Copy link
Contributor Author
@minghuaw minghuaw Jul 24, 2025 C462

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like that encoding_rs::Decoder only shows this problem when decoding an invalid input of length 1 with last set to false, which is exactly how the WPT test case is written

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good for a follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want an issue in Servo tracking this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved in 3039820

@minghuaw minghuaw requested a review from gterzian July 24, 2025 10:13
Copy link
Member
@gterzian gterzian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize that throwing is different from just returning an Err. Would you mind providing some more contexts regarding why both are used?

If you return an error from a method implementing webidl(like Decode), then the bindings will throw an exception.

But in the case where the transform algorithm is called, the error is not returned to Webidl; we do propagate the error by rejecting the promise with it, but that is not the same thing as throwing an exception. I don't see any tests asserting the one is thrown however(you are right to point out the realms.window.js is unrelated), so I would say we leave it like it is.

FYI, if you wanted to manually set the exception, you'd have to return Error::JSFailed, to prevent the bindings from setting the exception that would have already been set, and then to propagate to the promise you'd have to manually get the pending exception, as is done at

unsafe { assert!(JS_GetPendingException(*cx, rval.handle_mut())) };

Copy link
Member
@gterzian gterzian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks and good to see another use of streams in the codebase.

Couple of comments to address and issues to file, and then it's ready for the merge queue.

self.transform_obj.set(*this_object);
if let TransformerType::Js { transform_obj, .. } = &self.transformer_type {
transform_obj.set(*this_object)
}
Copy link
Member
@gterzian gterzian Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still the case?

// <https://encoding.spec.whatwg.org/#decode-and-enqueue-a-chunk>
// Step 2. Push a copy of bufferSource to decoder’s I/O queue.
//
// NOTE: try to avoid this copy unless there are bytes left
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a note or a todo? If it's a note, it would be good to explain how the copy is avoided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a note. Here we only call extend_from_slice on the io_queue if we have leftover bytes from incomplete input. Otherwise, the input will be pointing to the input chunk

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see, the code is clear actually.

.extend_from_slice(unsafe { a.as_slice() });
},
None => {},
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with the new implementation? In the current code, the extend_from_slice call clones the data rigtht? I think to avoid the copy you'd have to store the buffer in the I/O queue, and pass slices of it to the decoder?

Good for a follow-up.

[TypeError for chunk with the wrong type should come from constructor realm of TextDecoderStream]
expected: FAIL

[TypeError for invalid chunk should come from constructor realm of TextDecoderStream]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good for a follow-up.

Signed-off-by: minghuaw <wuminghua7@huawei.com>
let decoder = if ignoreBOM {
encoding.new_decoder_without_bom_handling()
} else {
encoding.new_decoder_with_bom_removal()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that, because of my ignorance, I had been creating new decoder instances with new_decoder() which enables BOM sniffing, thus making the decoder not return immediately when it sees 0xFF. We now pass all the decoder related test cases in realms.window.js WPT subtest

@minghuaw minghuaw requested a review from gterzian July 28, 2025 01:07
Copy link
Member
@gterzian gterzian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok awesome, thanks. Time to add this to the merge queue.

@gterzian gterzian added this pull request to the merge queue Jul 28, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 28, 2025
@gterzian
Copy link
Member

image

Looks like there are some more test passes that need updated expectations

minghuaw added 2 commits July 29, 2025 09:34
Signed-off-by: minghuaw <wuminghua7@huawei.com>
Signed-off-by: minghuaw <wuminghua7@huawei.com>
@minghuaw
Copy link
Contributor Author

Looks like there are some more test passes that need updated expectations

Updated the WPT expectations. Should be good now (see the try run: https://github.com/minghuaw/servo/actions/runs/16584399397)

@gterzian gterzian added this pull request to the merge queue Jul 29, 2025
Merged via the queue into servo:main with commit 554b2da Jul 29, 2025
21 checks passed
minghuaw pushed a commit to minghuaw/servo that referenced this pull request Aug 1, 2025
This PR implements the `TextDecoderStream`. Other than introducing the
necessary mod and webidl files corresponding to `TextDecoderStream`,
this PR also involves some changes in `TextDecoder` and
`TrasnformStream`:

- The common part that can be shared between `TextDecoder` and
`TextDecoderStream` are extracted into a separate type
`script::dom::textdecodercommon::TextDecoderCommon`. This type could
probably use a different name because there is an interface called
`TextDecoderCommon` in the spec
(https://encoding.spec.whatwg.org/#textdecodercommon) which just gets
included in `TextDecoder` and `TextDecoderStream`.
- The three algorithms in `TransformStream` (`cancel`, `flush`, and
`transform`) all have become `enum` that has a `Js` variant for a JS
function object and a `Native` variant for a rust trait object. Whether
the cancel algorithm needs this enum type is debatable as I did not find
any interface in the spec that explicitly sets the cancel algorithm.

Testing: Existing WPT tests `tests/wpt/tests/encoding/stream` should be
sufficient
Fixes: servo#37723

---------

Signed-off-by: minghuaw <michael.wu1107@gmail.com>
Signed-off-by: minghuaw <wuminghua7@huawei.com>
Signed-off-by: Minghua Wu <michael.wu1107@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement TextDecoderStream
5 participants
0