Script: Implement `TextDecoderStream` #38112

minghuaw · 2025-07-16T06:35:44Z

This PR implements the TextDecoderStream. Other than introducing the necessary mod and webidl files corresponding to TextDecoderStream, this PR also involves some changes in TextDecoder and TrasnformStream:

The common part that can be shared between TextDecoder and TextDecoderStream are extracted into a separate type script::dom::textdecodercommon::TextDecoderCommon. This type could probably use a different name because there is an interface called TextDecoderCommon in the spec (https://encoding.spec.whatwg.org/#textdecodercommon) which just gets included in TextDecoder and TextDecoderStream.
The three algorithms in TransformStream (cancel, flush, and transform) all have become enum that has a Js variant for a JS function object and a Native variant for a rust trait object. Whether the cancel algorithm needs this enum type is debatable as I did not find any interface in the spec that explicitly sets the cancel algorithm.

Testing: Existing WPT tests tests/wpt/tests/encoding/stream should be sufficient
Fixes: #37723

minghuaw · 2025-07-16T06:35:57Z

Would it be better if this PR is split into three smaller PRs because of the changes mentioned above?

components/script/dom/transformstreamdefaultcontroller.rs

components/script/dom/textdecodercommon.rs

components/script/dom/textdecoderstream.rs

minghuaw · 2025-07-16T07:48:15Z

Since this is adding a new interface, should "TextDecoderStream" be added to tests/wpt/mozilla/tests/mozilla/interfaces.https.html?

gterzian

Great start, thank you.

Couple of general comments on the overall structure. I think there may have been a misunderstanding in zulip re the use of generics. I'm open to counter arguments off course.

components/script/dom/transformstreamdefaultcontroller.rs

components/script/dom/textdecoderstream.rs

components/script/dom/transformstreamdefaultcontroller.rs

components/script/dom/transformstream.rs

Taym95 · 2025-07-17T15:19:21Z

components/script/dom/textdecoderstream.rs

+            Error::Type("Unable to convert chunk into ArrayBuffer or ArrayBufferView".to_string())
+        })?
+    };
+    let buffer_source = conversion_result.get_success_value().ok_or(Error::Type(


@gterzian I thini this needs to use BufferSource from buffer_source.rs right?

We did effort to hide all unsafe code inside buffer_source.rs

Yes looks like it should.

I'm not sure if I fully understand this. The buffer_source is passed to the TextDecoderCommon::decode immediately after, which is a function shared between TextDecoderStream and TextDecoder. The reason why TextDecoderCommon::decode takes Option<&ArrayBufferViewOrArrayBuffer> as the argument type is because that is the type in the generated trait TextDecoderMethods.

Is the codegen supposed to generate traits with BufferSource as the arg type? Even though TextDecoder.webidl says BufferSource, the generated code in TextDecoderMethods is still Option<ArrayBufferViewOrArrayBuffer>.

The decode_and_enqueue_a_chunk function takes a SafeHandleValue because that's the type TransformStreamDefaultController::perform_transform uses. Plus, I was not able to find a way to convert a SafeHandleValue into a BufferSource without unsafe code. Neither was I able to find other places in the servo repo that performs this kind of conversion.

Would you mind providing some more guidance on this?

Right now, we’re using ArrayBufferViewOrArrayBuffer and calling unsafe { a.as_slice() } directly, which exposes raw JSObject* at every call.

Our HeapBufferSource solves all of this in one place:

It roots the JSObject in a Heap<*mut JSObject> → GC-safe

It has is_initialized() & is_detached_buffer() → prevents UB

It provides typed_array_to_option() and acquire_data() → safe slice copying

It aligns directly with WebIDL’s BufferSource union semantics, rather than ad hoc conversions

so I would do something like:

let buffer_source_union = conversion_result.get_success_value().ok_or(Error::Type( "Unable to convert chunk into ArrayBuffer or ArrayBufferView".to_string(), ))?; let heap_buffer_source = match buffer_source_union { ArrayBufferViewOrArrayBuffer::ArrayBufferView(view) => { let view_heap = HeapBufferSource::<ArrayBufferViewU8>::new( BufferSource::ArrayBufferView(Heap::boxed(unsafe { (view.underlying_object()).get() })), ); // normalize to its backing ArrayBuffer view_heap.get_array_buffer_view_buffer(cx) }, ArrayBufferViewOrArrayBuffer::ArrayBuffer(buf) => { HeapBufferSource::<ArrayBufferU8>::new(BufferSource::ArrayBuffer(Heap::boxed(unsafe { (buf.underlying_object()).get() }))) }, };

and I would go further and have unsafe { (view.underlying_object()).get() } inside a function in HeapBufferSource

~~Added HeapBufferSource::from_array_buffer_view_or_array_buffer in 0f3c8c7. Is this a reasonable place to hide the unsafe code?~~
Reverted because found problem with other WPT subtests after switching to HeapBufferSource in TextDecoder

As mentioned in zulip, converting ArrayBufferViewOrArrayBuffer into HeapBufferSource<ArrayBufferU8> sometimes lead to inconsistent byte length number. One particular example is the WPT subtest https://wpt.live/referrer-policy/generic/unsupported-csp-referrer-directive.html, which gives a byte length of 489 before conversion and a byte length of 6084 after converting into HeapBufferSource. The extra spaces seem to be filled with nulls.

The byte length is given by two FFI functions in mozjs (JS_GetArrayBufferViewByteLength and GetArrayBufferByteLength), so troubleshooting and/or fixing the bug in mozjs is probably out of the scope of this PR? Does it make sense if we separate the inclusion of HeapBufferSource in another issue/PR (unless we make some ad-hoc workaround with the current HeapBufferSource)?

components/script/dom/transformstream.rs

Taym95 · 2025-07-17T15:25:18Z

tests/wpt/meta/encoding/streams/backpressure.any.js.ini

-  [additional writes should wait for backpressure to be relieved for class TextDecoderStream]
-    expected: FAIL
-
  [write() should not complete until read relieves backpressure for TextEncoderStream]


any idea why this is still failing?

Important to fix since it goes to the core of the streams stuff.

No actually this is nothing, since the PR implements TextDecoderStream, and this test is for the encoder.

Taym95 · 2025-07-17T15:27:11Z

components/script/dom/transformstreamdefaultcontroller.rs

+            TransformerType::Decoder(_) => {
+                // `TextDecoderStream` does NOT specify a cancel algorithm.
+                //
+                // Step 4. Let cancelAlgorithm be an algorithm which returns a promise resolved with undefined.


how come TextDecoderStream does not specify a cancel algorithm.` but we still have:

Step 4. Let cancelAlgorithm be an algorithm which returns a promise resolved with undefined.

Same here,

// <https://streams.spec.whatwg.org/#transformstream-set-up> // Let result be the result of running cancelAlgorithm given reason, if cancelAlgorithm was given, or null otherwise. // Note: `TextDecoderStream` does not specify a cancel algorithm. // If result is a Promise, then return result. // Note: not applicable. // Return a promise resolved with undefined.

Added more spec steps in the documentation (b857c0c)

components/script/dom/transformstreamdefaultcontroller.rs

Taym95 · 2025-07-17T15:36:40Z

components/script/dom/transformstreamdefaultcontroller.rs

+                // chunk argument and runs the decode and enqueue a chunk
+                // algorithm with this and chunk.
+                decode_and_enqueue_a_chunk(cx, global, chunk, decoder, self, can_gc)
+                    .map(|_| Promise::new_resolved(global, cx, (), can_gc))


usually spec explicitly says return Promise, but I am missing that part here

Yes this needs to implement, and document the steps of, the transformAlgorithmWrapper from https://streams.spec.whatwg.org/#transformstream-set-up like:

// <https://streams.spec.whatwg.org/#transformstream-set-up > // Let transformAlgorithmWrapper be an algorithm that runs these steps given a value chunk: // Let result be the result of running transformAlgorithm given chunk. let result = decode_and_enqueue_a_chunk

For the exception, please use the same as

servo/components/script/dom/readablestreamdefaultcontroller.rs

Line 684 in fe2c13c

// If result is an abrupt completion,

Added more spec steps in the documentation (b857c0c)

github-actions · 2025-07-17T15:37:14Z

🔨 Triggering try run (#16349517941) for Linux (WPT)

github-actions · 2025-07-17T16:09:07Z

Test results for linux-wpt from try job (#16349517941):

Flaky unexpected result (19)

FAIL [expected PASS] /_mozilla/css/dirty_viewport.html (#13731)
OK /css/css-cascade/layer-cssom-order-reverse.html (#36094)
- FAIL [expected PASS] subtest: Delete layer invalidates @font-face
```
assert_equals: expected "220px" but got "133px"
```
OK [expected ERROR] /fetch/fetch-later/quota/same-origin-iframe/multiple-iframes.tentative.https.window.html (#35176)
OK /html/browsers/browsing-the-web/navigating-across-documents/006.html (#21382)
- PASS [expected FAIL] subtest: Link with onclick form submit and href navigation
OK /html/browsers/browsing-the-web/navigating-across-documents/initial-empty-document/iframe-src-aboutblank-navigate-immediately.html (#29048)
- PASS [expected FAIL] subtest: Navigating to a different document with form submission
OK /html/browsers/browsing-the-web/navigating-across-documents/initial-empty-document/load-pageshow-events-iframe-contentWindow.html (#28681)
- PASS [expected FAIL] subtest: load & pageshow events do not fire on contentWindow of <iframe> element created with src='about:blank'
OK /html/browsers/browsing-the-web/navigating-across-documents/javascript-url-return-value-handling-dynamic.html (#28066)
- FAIL [expected PASS] subtest: 0041 set in href="" targeting a frame and clicked
```
assert_equals: expected "A" but got ""
```
- FAIL [expected PASS] subtest: 0080 00FF set in href="" targeting a frame and clicked
```
assert_equals: expected "�ÿ" but got ""
```
OK /html/browsers/browsing-the-web/navigating-across-documents/navigation-unload-cross-origin.sub.window.html (#29056)
- PASS [expected FAIL] subtest: Cross-origin navigation started from unload handler must be ignored
OK /html/browsers/browsing-the-web/navigating-across-documents/navigation-unload-same-origin-fragment.html (#20768)
- FAIL [expected PASS] subtest: Tests that a fragment navigation in the unload handler will not block the initial navigation
```
assert_equals: expected "" but got "#fragment"
```
OK /html/browsers/history/the-history-interface/traverse_the_history_2.html (#21383)
- PASS [expected FAIL] subtest: Multiple history traversals, last would be aborted
OK /html/browsers/history/the-history-interface/traverse_the_history_4.html (#21383)
- PASS [expected FAIL] subtest: Multiple history traversals, last would be aborted
OK /html/browsers/history/the-history-interface/traverse_the_history_5.html (#21383)
- PASS [expected FAIL] subtest: Multiple history traversals, last would be aborted

OK /html/semantics/embedded-content/the-iframe-element/iframe-loading-lazy-nav-location-assign.html (#32863)

FAIL [expected PASS] subtest: Navigating iframe loading='lazy' before it is loaded: location.assign

uncaught exception: Error: assert_equals: expected "http://web-platform.test:8000/html/semantics/embedded-content/the-iframe-element/support/blank.htm?nav" but got "http://web-platform.test:8000/html/semantics/embedded-content/the-iframe-element/support/blank.htm?src"

CRASH [expected OK] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_escaping-1.html (#22647)
OK /html/semantics/embedded-content/the-video-element/intrinsic_sizes.htm (#37173)
- FAIL [expected PASS] subtest: default object size after src is removed
```
assert_equals: expected "300px" but got "320px"
```
OK /navigation-timing/test-navigation-type-reload.html (#33334)
- PASS [expected FAIL] subtest: Reload domComplete > Original domComplete
- PASS [expected FAIL] subtest: Reload domContentLoadedEventEnd > Original domContentLoadedEventEnd
- PASS [expected FAIL] subtest: Reload domContentLoadedEventStart > Original domContentLoadedEventStart
- PASS [expected FAIL] subtest: Reload domInteractive > Original domInteractive
- PASS [expected FAIL] subtest: Reload fetchStart > Original fetchStart
- PASS [expected FAIL] subtest: Reload loadEventEnd > Original loadEventEnd
- PASS [expected FAIL] subtest: Reload loadEventStart > Original loadEventStart
TIMEOUT [expected CRASH] /trusted-types/trusted-types-navigation.html?31-35 (#38034)
- FAIL [expected PASS] subtest: Navigate a frame via form-submission with javascript:-urls in report-only mode.
```
promise_test: Unhandled rejection with value: "Unexpected message received: \"No securitypolicyviolation reported!\""
```
- TIMEOUT [expected PASS] subtest: Navigate a frame via form-submission with javascript:-urls w/ default policy in report-only mode.
```
Test timed out
```
- NOTRUN [expected PASS] subtest: Navigate a window via form-submission with javascript:-urls w/ a default policy throwing an exception in enforcing mode.
- NOTRUN [expected PASS] subtest: Navigate a window via form-submission with javascript:-urls w/ a default policy throwing an exception in report-only mode.
- NOTRUN [expected PASS] subtest: Navigate a window via form-submission with javascript:-urls w/ a default policy making the URL invalid in enforcing mode.
TIMEOUT [expected OK] /webmessaging/with-ports/018.html (#24485)
- TIMEOUT [expected PASS] subtest: origin of the script that invoked the method, javascript:
```
Test timed out
```
OK [expected TIMEOUT] /webmessaging/without-ports/018.html (#24485)
- PASS [expected TIMEOUT] subtest: origin of the script that invoked the method, javascript:

Stable unexpected results that are known to be intermittent (19)

FAIL [expected PASS] /_mozilla/css/stacked_layers.html (#15988)
FAIL [expected PASS] /_mozilla/mozilla/sslfail.html (#10760)
TIMEOUT [expected OK] /_mozilla/mozilla/window_resize_event.html (#36741)
- TIMEOUT [expected PASS] subtest: Popup onresize event fires after resizeTo
```
Test timed out
```
FAIL [expected PASS] /css/css-transitions/render-blocking/no-transition-from-ua-to-blocking-stylesheet.html (#29187)
TIMEOUT [expected FAIL] /dom/xslt/large-cdata.html (#38029)
OK /fetch/metadata/generated/css-font-face.sub.tentative.html (#34624)
- PASS [expected FAIL] subtest: sec-fetch-storage-access - Not sent to non-trustworthy cross-site destination
OK /html/browsers/browsing-the-web/navigating-across-documents/navigation-unload-same-origin.window.html (#29049)
- FAIL [expected PASS] subtest: Same-origin navigation started from unload handler must be ignored
```
assert_equals: expected "?pass" but got "?fail"
```

OK [expected TIMEOUT] /html/interaction/focus/the-autofocus-attribute/supported-elements.html (#24145)

FAIL [expected NOTRUN] subtest: Host element with delegatesFocus including no focusable descendants should be skipped

assert_equals: expected Element node &lt;input autofocus=""&gt;&lt;/input&gt; but got Element node &lt;body&gt;&lt;div autofocus=""&gt;&lt;/div&gt;&lt;input autofocus=""&gt;&lt;/body&gt;

FAIL [expected NOTRUN] subtest: Area element should support autofocus

promise_test: Unhandled rejection with value: object "TypeError: w.document.querySelector(...) is null"

TIMEOUT [expected OK] /html/interaction/focus/the-autofocus-attribute/update-the-rendering.html (#24145)
- TIMEOUT [expected FAIL] subtest: "Flush autofocus candidates" should be happen before a scroll event and animation frame callbacks
```
Test timed out
```
OK [expected CRASH] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_escaping-2.html (#22667)
CRASH [expected OK] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_nonescaping-1.html (#24066)
CRASH [expected OK] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_nonescaping-2.html (#22154)
TIMEOUT [expected CRASH] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_nonescaping-3.html (#24066)
OK [expected TIMEOUT] /html/semantics/forms/form-submission-0/reparent-form-during-planned-navigation-task.html (#29724)
- PASS [expected TIMEOUT] subtest: reparent-form-during-planned-navigation-task
OK /html/semantics/forms/historical.html (#28568)
- FAIL [expected PASS] subtest: <input name=isindex> should not be supported
```
assert_regexp_match: expected object "/\?isindex=x$/" but got "about:blank"
```

OK /preload/preload-error.sub.html (#37177)

FAIL [expected PASS] subtest: CORS (style): main

assert_greater_than: http://not-web-platform.test:8000/preload/resources/dummy.css?pipe=header%28Access-Control-Allow-Origin%2C*%29&amp;label=style should be loaded expected a number greater than 0 but got 0

PASS [expected FAIL] subtest: success (script): main

FAIL [expected PASS] subtest: CORS (script): main

assert_greater_than: http://not-web-platform.test:8000/preload/resources/dummy.js?pipe=header%28Access-Control-Allow-Origin%2C*%29&amp;label=script should be loaded expected a number greater than 0 but got 0

PASS [expected FAIL] subtest: success (xhr): main
PASS [expected FAIL] subtest: 404 (xhr): main

FAIL [expected PASS] subtest: Decode-error (style): main

assert_greater_than: http://web-platform.test:8000/preload/resources/dummy.xml?pipe=header%28Content-Type%2Ctext%2Fcss%29&amp;label=style should be loaded expected a number greater than 0 but got 0

FAIL [expected PASS] subtest: MIME-error (script): main

assert_greater_than: http://web-platform.test:8000/preload/resources/dummy.css?pipe=header%28Content-Type%2Ctext%2Fnotjavascript%29&amp;label=script should be loaded expected a number greater than 0 but got 0

TIMEOUT /preload/preload-resource-match.https.html (#38088)

FAIL [expected TIMEOUT] subtest: Loading script (anonymous) with link (no-cors) should discard the preloaded response

assert_equals: https://www1.web-platform.test:8443/preload/resources/echo-with-cors.py?type=application%2Fjavascript&amp;content=function%20dummy()%20%7B%20%7D&amp;uid=36738ade-eb4e-44eb-aa85-1bd72f290067&amp;d5630c95-1f09-41c8-be8c-3c1668c1c9f7 expected 2 but got 1

PASS [expected NOTRUN] subtest: Loading script (anonymous) with link (anonymous) should reuse the preloaded response

FAIL [expected NOTRUN] subtest: Loading script (anonymous) with link (use-credentials) should discard the preloaded response

assert_equals: https://www1.web-platform.test:8443/preload/resources/echo-with-cors.py?type=application%2Fjavascript&amp;content=function%20dummy()%20%7B%20%7D&amp;uid=36738ade-eb4e-44eb-aa85-1bd72f290067&amp;31576c70-c75c-4a13-ab25-8872efe9ebaa expected 2 but got 1

FAIL [expected NOTRUN] subtest: Loading script (use-credentials) with link (no-cors) should discard the preloaded response

assert_equals: https://www1.web-platform.test:8443/preload/resources/echo-with-cors.py?type=application%2Fjavascript&amp;content=function%20dummy()%20%7B%20%7D&amp;uid=36738ade-eb4e-44eb-aa85-1bd72f290067&amp;1a316f28-153e-45b6-973c-a0065da3ed0f expected 2 but got 1

TIMEOUT [expected NOTRUN] subtest: Loading script (use-credentials) with link (anonymous) should discard the preloaded response
```
Test timed out
```

OK /resize-observer/eventloop.html (#33599)
- FAIL [expected PASS] subtest: test0: multiple notifications inside same event loop
```
assert_equals: new loop expected 1 but got 0
```
TIMEOUT [expected OK] /resource-timing/tentative/document-initiated.html (#37785)

github-actions · 2025-07-17T16:09:17Z

✨ Try run (#16349517941) succeeded.

gterzian

Couple of more comments on the algos.

gterzian · 2025-07-17T16:25:18Z

components/script/dom/transformstreamdefaultcontroller.rs

-        self.transform_obj.set(*this_object);
+        if let TransformerType::Js { transform_obj, .. } = &self.transformer_type {
+            transform_obj.set(*this_object)
+        }


Here we should assert that the type is Js.

It now panics in the else branch in b857c0c

Is this still the case?

This has already been changed to unreachable!() in the else branch in commit 4d04d78. GH somehow didn't mark this outdated

components/script/dom/transformstreamdefaultcontroller.rs

gterzian · 2025-07-17T16:34:03Z

components/script/dom/transformstreamdefaultcontroller.rs

+                // chunk argument and runs the decode and enqueue a chunk
+                // algorithm with this and chunk.
+                decode_and_enqueue_a_chunk(cx, global, chunk, decoder, self, can_gc)
+                    .map(|_| Promise::new_resolved(global, cx, (), can_gc))


Yes this needs to implement, and document the steps of, the transformAlgorithmWrapper from https://streams.spec.whatwg.org/#transformstream-set-up like:

// <https://streams.spec.whatwg.org/#transformstream-set-up > // Let transformAlgorithmWrapper be an algorithm that runs these steps given a value chunk: // Let result be the result of running transformAlgorithm given chunk. let result = decode_and_enqueue_a_chunk

For the exception, please use the same as

servo/components/script/dom/readablestreamdefaultcontroller.rs

Line 684 in fe2c13c

// If result is an abrupt completion,

gterzian · 2025-07-17T16:36:56Z

components/script/dom/transformstreamdefaultcontroller.rs

+            TransformerType::Decoder(_) => {
+                // `TextDecoderStream` does NOT specify a cancel algorithm.
+                //
+                // Step 4. Let cancelAlgorithm be an algorithm which returns a promise resolved with undefined.


Same here,

// <https://streams.spec.whatwg.org/#transformstream-set-up> // Let result be the result of running cancelAlgorithm given reason, if cancelAlgorithm was given, or null otherwise. // Note: `TextDecoderStream` does not specify a cancel algorithm. // If result is a Promise, then return result. // Note: not applicable. // Return a promise resolved with undefined.

components/script/dom/transformstreamdefaultcontroller.rs

gterzian

Ok aside from those few comments, this looks good overall. Haven't had time to review all the steps in details; @Taym95 do you want to do it since you wrote transform stream?

servo-wpt-sync · 2025-07-21T05:01:21Z

🛠 These changes could not be applied onto the latest upstream WPT. Servo's copy of the Web Platform Tests may be out of sync.

gterzian · 2025-07-22T21:06:58Z

components/script/dom/textdecodercommon.rs

+                    .extend_from_slice(unsafe { a.as_slice() });
+            },
+            None => {},
+        };


Would be good to avoid the copy(see the note in the spec). Not sure how, could be more of a follow-up, but we should file an issue and add a TODO note.

In the new implementation, this copying only happens if there are leftover bytes. Otherwise, it would just use the input slice directly. Any bytes that are not consumed will then be stored in the decoder's I/O queue

What do you mean with the new implementation? In the current code, the extend_from_slice call clones the data rigtht? I think to avoid the copy you'd have to store the buffer in the I/O queue, and pass slices of it to the decoder?

Good for a follow-up.

I meant the changes introduced in a5bc680. GitHub somehow didn't mark this as outdated

Signed-off-by: minghuaw <wuminghua7@huawei.com>

minghuaw · 2025-07-23T09:15:40Z

I think the confusion comes from the imperfect mapping between the encoder/decoder implementation we are using, ie. encoding_rs, and the spec's detailed instruction on how each byte should be handled.

We should have something implementing https://encoding.spec.whatwg.org/#concept-encoding-process (calling into the decoder function would then correspond to "4: Let result be the result of running encoderDecoder’s handler on input and item."

This I believe is already taken care of by encoding_rs::Decoder::decode_to_string(_without_replacement). It describes the process of taking a byte ("item") and try to decode the given byte. The "result" it defines are intermediate states that encoding_rs::Decoder takes care of during the process of decoding an entire input buffer, which drains the input buffer (https://docs.rs/encoding_rs/latest/encoding_rs/#buffer-reading-and-writing-behavior). So every time it completes decoding, it has already run into an end-of-queue internally.

We also need something to implement https://encoding.spec.whatwg.org/#concept-td-serialize

I believe this is also already handled within encoding_rs::Decoder. All this "serialize I/O" is doing is just to convert a Vec<char> into a String, but we are decoding into a String type buffer with encoding_rs::Decoder from the very beginning.

So currently everything goes through TextDecoderCommon::decode, but that needs to be broken-up as per the above.

Yes it should be broken up but not as per the above. I have moved do_not_flush back to TextDecoder, keeping TextDecoderCommon::decode as minimal as possible.

And we also need to store the output in the case of the "streaming" use of the i/o queue(not to be confused with the transform stream)

I think this is an misunderstanding. Isn't the "streaming" part referring to the Readable and Writable? The output is just referring to a buffer that stores all the decoded char before returning them altogether as a String, which is what encoding_rs::Decoder::decode_to_string already takes care of. This part of the spec looks confusing because it covers details that are already covered in encoding_rs

Signed-off-by: minghuaw <wuminghua7@huawei.com>

gterzian · 2025-07-23T11:26:10Z

I think the confusion comes from the imperfect mapping between the encoder/decoder implementation we are using, ie. encoding_rs, and the spec's detailed instruction on how each byte should be handled.

We should have something implementing https://encoding.spec.whatwg.org/#concept-encoding-process (calling into the decoder function would then correspond to "4: Let result be the result of running encoderDecoder’s handler on input and item."

This I believe is already taken care of by encoding_rs::Decoder::decode_to_string(_without_replacement). It describes the process of taking a byte ("item") and try to decode the given byte. The "result" it defines are intermediate states that encoding_rs::Decoder takes care of during the process of decoding an entire input buffer, which drains the input buffer (https://docs.rs/encoding_rs/latest/encoding_rs/#buffer-reading-and-writing-behavior). So every time it completes decoding, it has already run into an end-of-queue internally.

We also need something to implement https://encoding.spec.whatwg.org/#concept-td-serialize

I believe this is also already handled within encoding_rs::Decoder. All this "serialize I/O" is doing is just to convert a Vec<char> into a String, but we are decoding into a String type buffer with encoding_rs::Decoder from the very beginning.

So currently everything goes through TextDecoderCommon::decode, but that needs to be broken-up as per the above.

Yes it should be broken up but not as per the above. I have moved do_not_flush back to TextDecoder, keeping TextDecoderCommon::decode as minimal as possible.

And we also need to store the output in the case of the "streaming" use of the i/o queue(not to be confused with the transform stream)

I think this is an misunderstanding. Isn't the "streaming" part referring to the Readable and Writable? The output is just referring to a buffer that stores all the decoded char before returning them altogether as a String, which is what encoding_rs::Decoder::decode_to_string already takes care of. This part of the spec looks confusing because it covers details that are already covered in encoding_rs

So I've had another look at your implementation and the spec.

What you implemented--passing "do not flush" along as "last" for the text decoder, and passing "last" as true in the flush algo for the stream decoder--makes conceptual sense to me, and it also seems to match what the JS examples do. And it also matches how encoding rs describes incremental decoding.

But what I find confusing is that it doesn't seem to be how the spec is written.

For example, the "decode and enqueue algo" only enqueues a chunk on the transform when "end of queue" is received, not each time an item is processed.

Also, for the text decoder, a result is only returned when "end of queue" is encountered, not each time an item is processed.

So the spec, to me at least, seems to imply that one should keep appending processed items to an output, but only return or enqueue the result of serialising the output when reaching end of queue.

Also note that "end of queue" is a special item that means there will be no more items, so it's not the same thing as "the i/o queue is currently empty", which is indeed reached at the end of processing each item.

In conclusion, I'd say we stick to your implementation, but you could consider opening an issue at the spec to point out the potential weirdness.

gterzian · 2025-07-23T13:10:36Z

So last update in the above discussion: my reading of "end of queue" was wrong; it isn't a special value enqueued to signal that no more items will be enqueued, rather it just means "queue is currently empty".

So for each call to 'decode' the "end of queue" will be hit.

The way "streaming" works is that the end of queue will only be processed when the stream option is false, or, in the streaming decoder, when the stream is closed and the flush algorithm is called. The point of that seems to be to allow more data to be received before "end of queue" is finally processed.

See whatwg/encoding#184

So your implementation is right, and the spec is unambiguous.

gterzian

Ok so from my side we just need to throw the error.

@Taym95 how is it looking on the buffer use?

components/script/dom/textdecodercommon.rs

gterzian · 2025-07-23T15:47:04Z

components/script/dom/textdecodercommon.rs

+                // <https://encoding.spec.whatwg.org/#dom-textdecoder-decode>
+                // Step 5.3.3 Otherwise, if result is error, throw a TypeError.
+                DecoderResult::Malformed(_, _) => {
+                    return Err(Error::Type("Decoding failed".to_owned()));


So here I think you need to throw the type error, as done here, but using Error::Type.

I didn't realize that throwing is different from just returning an Err. Would you mind providing some more contexts regarding why both are used?

The realm problem found in wpt/tests/encoding/streams/realms.window.js are actually solved by manually specifying the realm when creating the rejection Promise b66a365#diff-3b8b0e94fec7d463a8c649b8679235a8b77cbf2663a934f3b1ba16a7244ffff5

There is still one TextDecoderStream test that fails in that subtest, which is the case of invalid chunk, and it seems like the root cause is a bug in encoding_rs. encoding_rs::Decoder doesn't return an error with invalid input [0xFF]. See my comment below for an example

gterzian · 2025-07-23T15:47:29Z

tests/wpt/meta/encoding/streams/realms.window.js.ini

-
  [TypeError for chunk with the wrong type should come from constructor realm of TextDecoderStream]
    expected: FAIL



Those failing test should be fixed by throwing the type error.

gterzian · 2025-07-23T15:51:05Z

tests/wpt/meta/encoding/streams/decode-utf8.any.js.ini

  expected: ERROR

 [decode-utf8.any.worker.html]
  expected: ERROR


Note to self: error is from

servo/tests/wpt/tests/common/sab.js

Line 1 in 0e4b2bf

const createBuffer = (() => {

"sabConstructor is not a constructor".

Signed-off-by: minghuaw <wuminghua7@huawei.com>

minghuaw · 2025-07-24T05:43:37Z

tests/wpt/meta/encoding/streams/realms.window.js.ini

-  [TypeError for chunk with the wrong type should come from constructor realm of TextDecoderStream]
-    expected: FAIL
-
  [TypeError for invalid chunk should come from constructor realm of TextDecoderStream]


This is still failing because encoding_rs::Decoder::decode_to_string_without_replacement fails to return DecoderResult::Malformed with invalid input [0xFF]. This can be reproduced with this: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=8225345ee8ffedee5a4b6ecd0161beb2

Opened an issue in encoding_rs repo: hsivonen/encoding_rs#122

This might be just me using it wrong though because changing last to true would indeed give Malformed

Actually, the problem is that when last is set to false (not flushing), the number of bytes read would actually include the malformed byte, so when it gets to flushing, ie. last is set to true, it sees an empty slice because the malformed bytes are already removed.

It seems like that encoding_rs::Decoder only shows this problem when decoding an invalid input of length 1 with last set to false, which is exactly how the WPT test case is written

Good for a follow-up.

Do we want an issue in Servo tracking this?

Solved in 3039820

gterzian

I didn't realize that throwing is different from just returning an Err. Would you mind providing some more contexts regarding why both are used?

If you return an error from a method implementing webidl(like Decode), then the bindings will throw an exception.

But in the case where the transform algorithm is called, the error is not returned to Webidl; we do propagate the error by rejecting the promise with it, but that is not the same thing as throwing an exception. I don't see any tests asserting the one is thrown however(you are right to point out the realms.window.js is unrelated), so I would say we leave it like it is.

FYI, if you wanted to manually set the exception, you'd have to return Error::JSFailed, to prevent the bindings from setting the exception that would have already been set, and then to propagate to the promise you'd have to manually get the pending exception, as is done at

servo/components/script/dom/readablestreamdefaultcontroller.rs

Line 686 in 5592061

unsafe { assert!(JS_GetPendingException(*cx, rval.handle_mut())) };

gterzian

LGTM, thanks and good to see another use of streams in the codebase.

Couple of comments to address and issues to file, and then it's ready for the merge queue.

gterzian · 2025-07-24T17:35:31Z

components/script/dom/transformstreamdefaultcontroller.rs

-        self.transform_obj.set(*this_object);
+        if let TransformerType::Js { transform_obj, .. } = &self.transformer_type {
+            transform_obj.set(*this_object)
+        }


Is this still the case?

gterzian · 2025-07-24T17:40:53Z

components/script/dom/textdecodercommon.rs

+        // <https://encoding.spec.whatwg.org/#decode-and-enqueue-a-chunk>
+        // Step 2. Push a copy of bufferSource to decoder’s I/O queue.
+        //
+        // NOTE: try to avoid this copy unless there are bytes left


Is this a note or a todo? If it's a note, it would be good to explain how the copy is avoided.

This is a note. Here we only call extend_from_slice on the io_queue if we have leftover bytes from incomplete input. Otherwise, the input will be pointing to the input chunk

Ok I see, the code is clear actually.

gterzian · 2025-07-24T17:42:47Z

components/script/dom/textdecodercommon.rs

+                    .extend_from_slice(unsafe { a.as_slice() });
+            },
+            None => {},
+        };


What do you mean with the new implementation? In the current code, the extend_from_slice call clones the data rigtht? I think to avoid the copy you'd have to store the buffer in the I/O queue, and pass slices of it to the decoder?

Good for a follow-up.

gterzian · 2025-07-24T17:43:03Z

tests/wpt/meta/encoding/streams/realms.window.js.ini

-  [TypeError for chunk with the wrong type should come from constructor realm of TextDecoderStream]
-    expected: FAIL
-
  [TypeError for invalid chunk should come from constructor realm of TextDecoderStream]


Good for a follow-up.

Signed-off-by: minghuaw <wuminghua7@huawei.com>

minghuaw · 2025-07-25T05:09:47Z

components/script/dom/textdecodercommon.rs

+        let decoder = if ignoreBOM {
+            encoding.new_decoder_without_bom_handling()
+        } else {
+            encoding.new_decoder_with_bom_removal()


It turns out that, because of my ignorance, I had been creating new decoder instances with new_decoder() which enables BOM sniffing, thus making the decoder not return immediately when it sees 0xFF. We now pass all the decoder related test cases in realms.window.js WPT subtest

gterzian

Ok awesome, thanks. Time to add this to the merge queue.

gterzian · 2025-07-28T16:00:19Z

Looks like there are some more test passes that need updated expectations

Signed-off-by: minghuaw <wuminghua7@huawei.com>

minghuaw · 2025-07-29T02:49:09Z

Looks like there are some more test passes that need updated expectations

Updated the WPT expectations. Should be good now (see the try run: https://github.com/minghuaw/servo/actions/runs/16584399397)

This PR implements the `TextDecoderStream`. Other than introducing the necessary mod and webidl files corresponding to `TextDecoderStream`, this PR also involves some changes in `TextDecoder` and `TrasnformStream`: - The common part that can be shared between `TextDecoder` and `TextDecoderStream` are extracted into a separate type `script::dom::textdecodercommon::TextDecoderCommon`. This type could probably use a different name because there is an interface called `TextDecoderCommon` in the spec (https://encoding.spec.whatwg.org/#textdecodercommon) which just gets included in `TextDecoder` and `TextDecoderStream`. - The three algorithms in `TransformStream` (`cancel`, `flush`, and `transform`) all have become `enum` that has a `Js` variant for a JS function object and a `Native` variant for a rust trait object. Whether the cancel algorithm needs this enum type is debatable as I did not find any interface in the spec that explicitly sets the cancel algorithm. Testing: Existing WPT tests `tests/wpt/tests/encoding/stream` should be sufficient Fixes: servo#37723 --------- Signed-off-by: minghuaw <michael.wu1107@gmail.com> Signed-off-by: minghuaw <wuminghua7@huawei.com> Signed-off-by: Minghua Wu <michael.wu1107@gmail.com>

minghuaw requested a review from gterzian as a code owner July 16, 2025 06:35

minghuaw force-pushed the feat/textdecoderstream-transformstream-enum branch from 116bbf6 to 852a99e Compare July 16, 2025 06:36

minghuaw commented Jul 16, 2025

View reviewed changes

components/script/dom/transformstreamdefaultcontroller.rs Outdated Show resolved Hide resolved

Taym95 8000 reviewed Jul 16, 2025

View reviewed changes

components/script/dom/textdecodercommon.rs Outdated Show resolved Hide resolved

components/script/dom/textdecoderstream.rs Show resolved Hide resolved

minghuaw requested a review from Taym95 July 16, 2025 07:49

minghuaw force-pushed the feat/textdecoderstream-transformstream-enum branch from 57c6fae to eb350cc Compare July 16, 2025 09:41

gterzian reviewed Jul 16, 2025

View reviewed changes

minghuaw requested a review from gterzian July 17, 2025 05:18

minghuaw force-pushed the feat/textdecoderstream-transformstream-enum branch from 4f70be6 to a1f0c91 Compare July 17, 2025 08:20

Taym95 reviewed Jul 17, 2025

View reviewed changes

components/script/dom/transformstream.rs Show resolved Hide resolved

Taym95 reviewed Jul 17, 2025

View reviewed changes

components/script/dom/transformstreamdefaultcontroller.rs Show resolved Hide resolved

Taym95 reviewed Jul 17, 2025

View reviewed changes

Taym95 added the T-linux-wpt Do a try run of the WPT label Jul 17, 2025

github-actions bot removed the T-linux-wpt Do a try run of the WPT label Jul 17, 2025

gterzian reviewed Jul 17, 2025

View reviewed changes

components/script/dom/transformstreamdefaultcontroller.rs Outdated Show resolved Hide resolved

gterzian reviewed Jul 17, 2025

View reviewed changes

minghuaw requested review from Taym95 and gterzian July 21, 2025 05:46

Taym95 added the T-linux-wpt Do a try run of the WPT label Jul 21, 2025

github-actions bot removed the T-linux-wpt Do a try run of the WPT label Jul 21, 2025

gterzian reviewed Jul 22, 2025

View reviewed changes

minghuaw added 2 commits July 23, 2025 16:35

rescope TextDecoderCommon

a5bc680

Signed-off-by: minghuaw <wuminghua7@huawei.com>

added more spec text

ee60807

Signed-off-by: minghuaw <wuminghua7@huawei.com>

minghuaw requested a review from gterzian July 23, 2025 08:52

remove unused #[allow(unsafe_code)]

5c26924

Signed-off-by: minghuaw <wuminghua7@huawei.com>

gterzian requested changes Jul 23, 2025

View reviewed changes

minghuaw added 2 commits July 24, 2025 13:17

fix type error realm

b66a365

Signed-off-by: minghuaw <wuminghua7@huawei.com>

update expected wpt results

12bcb51

Signed-off-by: minghuaw <wuminghua7@huawei.com>

minghuaw commented Jul 24, 2025

View reviewed changes

minghuaw requested a review from gterzian July 24, 2025 10:13

gterzian reviewed Jul 24, 2025

View reviewed changes

gterzian approved these changes Jul 24, 2025

View reviewed changes

disable BOM sniffing when creating new decoder

3039820

Signed-off-by: minghuaw <wuminghua7@huawei.com>

minghuaw commented Jul 25, 2025

View reviewed changes

minghuaw requested a review from gterzian July 28, 2025 01:07

gterzian approved these changes Jul 28, 2025

View reviewed changes

gterzian added this pull request to the merge queue Jul 28, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 28, 2025

minghuaw added 2 commits July 29, 2025 09:34

update WPT expectation

d65a5d9

Signed-off-by: minghuaw <wuminghua7@huawei.com>

fix typo

f68e45f

Signed-off-by: minghuaw <wuminghua7@huawei.com>

gterzian added this pull request to the merge queue Jul 29, 2025

Merged via the queue into servo:main with commit 554b2da Jul 29, 2025
21 checks passed

kkoyung mentioned this pull request Oct 5, 2025

script: Implement Compression API #39658

Open


		[TypeError for chunk with the wrong type should come from constructor realm of TextDecoderStream]
		expected: FAIL

Uh oh!

Script: Implement TextDecoderStream #38112

Script: Implement TextDecoderStream #38112

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Script: Implement `TextDecoderStream` #38112

Script: Implement `TextDecoderStream` #38112