TW202420830A

TW202420830A - Identifying and marking video data units for network transport of video data

Info

Publication number: TW202420830A
Application number: TW112121408A
Authority: TW
Inventors: 勇何; 莫哈美德塞伊德克班; 普拉享哈瑞達斯漢德; 依梅德堡爾吉吉; 尼古拉康拉德梁
Original assignee: 美商高通公司
Priority date: 2022-11-01
Filing date: 2023-06-08
Publication date: 2024-05-16
Also published as: CN120092455A; EP4612913A1; WO2024096934A1; KR20250103621A

Abstract

An example device for retrieving media data includes a memory; and a processing system comprising one or more processors implemented in circuitry, the processing system being configured to: receive a packet including a packet header and a payload including at least a portion of a frame of video data, the packet header being separate from the payload; extract, from the packet header, a video frame identifier for the frame of video data; and process the payload according to the video frame identifier.

Description

Identifying and marking video data units for network transport of video data

This patent application claims the benefit of U.S. Provisional Application No. 63/381,902, filed November 1, 2022, the entire contents of which are incorporated herein by reference.

This disclosure relates to the transport of encoded video data.

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video teleconferencing devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10, Advanced Video Coding (AVC), ITU-T H.265 (also referred to as High Efficiency Video Coding (HEVC)), and extensions of such standards, to transmit and receive digital video information more efficiently.

Video compression techniques perform spatial prediction and/or temporal prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video frame or slice may be partitioned into macroblocks. Each macroblock may be further partitioned. Macroblocks in an intra-coded (I) frame or slice are encoded using spatial prediction with respect to neighboring macroblocks. Macroblocks in an inter-coded (P or B) frame or slice may use spatial prediction with respect to neighboring macroblocks in the same frame or slice, or temporal prediction with respect to other reference frames.

After video data has been encoded, the video data may be packetized for transmission or storage. The video data may be assembled into a video file conforming to any of a variety of standards, such as the International Organization for Standardization (ISO) base media file format and extensions thereof, such as AVC.

In general, this disclosure describes techniques related to signaling, in a packet header, characteristics of media data contained in a network packet. Such characteristics may include, for example, an identifier of the media data (e.g., an identifier of a slice of a picture and/or of the picture). The identifier may indicate or relate to whether the media data can be discarded in certain circumstances, such as according to a random access technique used to begin receiving a bitstream that includes the media data. For example, a packet may be discarded if its media data depends on earlier media data of the bitstream that was not received. This may occur if the media data is included in a gradual decoder refresh (GDR) picture used for random access, if the media data is included in a random access skipped leading (RASL) picture, or in other such instances.

In one example, a method of receiving video data includes: receiving a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; extracting, from the packet header, a video frame identifier for the frame of video data; and processing the payload according to the video frame identifier.
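The receive, extract, and process steps of this example method can be illustrated with a minimal Python sketch. The disclosure does not define a wire format at this point, so the 4-byte header layout (a header length followed by a 16-bit video frame identifier) and the names HEADER, Packet, parse_packet, and process are assumptions made purely for illustration.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout (an assumption, not defined by the source):
#   2 bytes header length, 2 bytes video frame identifier, big-endian.
HEADER = struct.Struct("!HH")


@dataclass
class Packet:
    frame_id: int    # video frame identifier carried in the header
    payload: bytes   # opaque payload, possibly encrypted


def parse_packet(data: bytes) -> Packet:
    """Split a packet into header fields and an untouched payload."""
    if len(data) < HEADER.size:
        raise ValueError("packet shorter than header")
    header_len, frame_id = HEADER.unpack_from(data, 0)
    if header_len < HEADER.size or header_len > len(data):
        raise ValueError("invalid header length")
    return Packet(frame_id=frame_id, payload=data[header_len:])


def process(packet: Packet, frames: dict) -> None:
    """Group payloads by frame identifier without reading payload bytes."""
    frames.setdefault(packet.frame_id, []).append(packet.payload)


frames = {}
raw = HEADER.pack(HEADER.size, 7) + b"slice-bytes"
process(parse_packet(raw), frames)
```

In this sketch the payload is never inspected, which mirrors the point that the identifier is available from the header alone.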

In another example, a device for receiving video data includes: a memory configured to store video data; and one or more processors implemented in circuitry and configured to: receive a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; extract, from the packet header, a video frame identifier for the frame of video data; and process the payload according to the video frame identifier.

In another example, a device for receiving video data includes: means for receiving a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; means for extracting, from the packet header, a video frame identifier for the frame of video data; and means for processing the payload according to the video frame identifier.

In another example, a computer-readable storage medium has stored thereon instructions that, when executed, cause a processor to: receive a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; extract, from the packet header, a video frame identifier for the frame of video data; and process the payload according to the video frame identifier.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

In general, this disclosure describes techniques related to sending and receiving extended reality (XR) media data, such as augmented reality (AR), mixed reality (MR), and/or virtual reality (VR) data. For example, a media communication session between two network devices (such as a server device and a client device, or two user equipment (UE) devices) may include audio data, video data, and/or XR data. Thus, a user may participate in an XR communication session while communicating with one or more other users. The XR communication session may correspond to an XR-based teleconference session, a game, or the like.

Various issues related to XR and media (XRM) services are currently being studied. Two of these issues are integrated packet handling of protocol data unit (PDU) sets and the use of differentiated PDU set handling, which generally involve enhancing PDU set handling at the fifth generation (5G) user plane function (UPF) to optimize the XRM consumption experience. A PDU set may include a plurality of PDUs, and each PDU may include data for a common presentation time. For example, a PDU set may include PDUs that carry data for a frame of video data and/or graphical XR data for computer-generated graphics. Thus, a PDU set may correspond to a video frame, and each PDU may be a slice or a network abstraction layer (NAL) unit of the video frame.
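The relationship between a PDU set and its PDUs can be pictured with a small Python data structure. This is a sketch under stated assumptions: the field names sequence_in_set and expected_count are hypothetical and are not taken from the disclosure or from any 3GPP specification.

```python
from dataclasses import dataclass, field


@dataclass
class Pdu:
    sequence_in_set: int   # hypothetical position of this PDU within its set
    payload: bytes         # e.g. one slice or one NAL unit of the frame


@dataclass
class PduSet:
    presentation_time: int   # presentation time shared by every PDU in the set
    expected_count: int      # hypothetical: how many PDUs make up the set
    pdus: dict = field(default_factory=dict)

    def add(self, pdu: Pdu) -> None:
        # Keyed by position, so a duplicate delivery is not counted twice.
        self.pdus[pdu.sequence_in_set] = pdu

    def is_complete(self) -> bool:
        return len(self.pdus) == self.expected_count


frame_set = PduSet(presentation_time=9000, expected_count=2)
frame_set.add(Pdu(0, b"slice-0"))
frame_set.add(Pdu(0, b"slice-0"))   # duplicate, ignored by the count
first_check = frame_set.is_complete()
frame_set.add(Pdu(1, b"slice-1"))
second_check = frame_set.is_complete()
```

Keying the PDUs by position keeps the completeness check simple when packets arrive out of order or more than once.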

In the 5G system (5GS), the interface between the application domain and the 5GS may be based on quality of service (QoS) flows. A QoS flow is the finest granularity of QoS differentiation in a PDU session. A QoS flow identifier (QFI) may be used to identify a QoS flow in the 5GS. User plane traffic with the same QFI within a PDU session may receive the same traffic forwarding treatment.

Each PDU may correspond to a packet transmitted, for example, via a computer-based network. Real-time Transport Protocol (RTP) or another protocol may be used to transport the PDUs. RTP is typically carried over the User Datagram Protocol (UDP). Consequently, packets may be delivered out of order, and UDP provides no guarantee of packet delivery. Moreover, because payload data may be encrypted or otherwise inaccessible to network elements, network-level packet processing typically does not have access to the packet payload data (which may include video coding layer (VCL) data).

Thus, according to the techniques of this disclosure, certain video information may be included in a packet header outside of the payload, so that the packet can be processed by network devices that cannot access the payload data. Such data may include, for example, identification information for a frame, or a portion of a frame, of video data. The identification may specifically identify the frame, for example using a picture order count (POC) value that indicates the display order of the frame. As another example, a frame number may indicate a coding order value for the frame, which may differ from the display order. The identification information may also (additionally or alternatively) include data representing a coding layer, such as a temporal layer identifier, which generally corresponds to the number of possible reference frames the frame can use for prediction and to whether subsequent frames can use the frame as a reference frame.

In this manner, a network device or a network element of a user device (e.g., user equipment (UE)) can determine, for each received packet, identifier information for the media data included in the payload of the packet. Thus, the network device or network element can determine, for example, whether all packets of a frame have been received, whether the reference frames of a frame have been received, whether subsequent frames that use the frame for reference can be decoded (depending on whether the frame has been received), and so on. The network device or network element can then determine, for example, whether to provide the video data in the payload of a packet to a video decoder, whether to retrieve a missing reference frame, whether to discard one or more sets of video data without sending the video data to the video decoder, and so on.
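The kind of per-frame decision described above can be sketched in Python as follows. This is an illustrative sketch only: the function decide, the Action values, and the assumption that packet counts and reference frame identifiers are available from header information are hypothetical, and the policy shown (discard when a reference is missing) is just one of the options the text lists.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward to decoder"
    WAIT = "wait for remaining packets"
    DISCARD = "discard"


def decide(frame_id, received, expected, references, complete_frames):
    """Choose an action for one frame using header information only.

    received / expected: packet counts for this frame (hypothetical fields).
    references: identifiers of the frames this frame predicts from.
    complete_frames: set of identifiers of frames already fully received.
    """
    if any(ref not in complete_frames for ref in references):
        return Action.DISCARD   # a reference frame is missing
    if received < expected:
        return Action.WAIT      # more packets of this frame are outstanding
    complete_frames.add(frame_id)
    return Action.FORWARD


done = {0}                                   # frame 0 was fully received
first = decide(1, 3, 3, [0], done)           # complete, reference present
second = decide(2, 1, 3, [1], done)          # still missing two packets
third = decide(4, 3, 3, [3], done)           # reference frame 3 never arrived
```

A real element might instead request retransmission of the missing reference rather than discard; the sketch picks the simplest branch.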

The techniques of this disclosure may be applied to video files conforming to video data encapsulated according to any of the ISO base media file format, the Scalable Video Coding (SVC) file format, the Advanced Video Coding (AVC) file format, the Third Generation Partnership Project (3GPP) file format, and/or the Multiview Video Coding (MVC) file format, or other similar video file formats.

FIG. 1 is a block diagram illustrating an example system 10 that implements techniques for streaming media data over a network. In this example, system 10 includes content preparation device 20, server device 60, and client device 40. Client device 40 and server device 60 are communicatively coupled by network 74, which may include the Internet. In some examples, content preparation device 20 and server device 60 may also be coupled by network 74 or another network, or may be directly communicatively coupled. In some examples, content preparation device 20 and server device 60 may be the same device.

In the example of FIG. 1, content preparation device 20 includes audio source 22 and video source 24. Audio source 22 may include, for example, a microphone that produces electrical signals representative of captured audio data to be encoded by audio encoder 26. Alternatively, audio source 22 may include a storage medium storing previously recorded audio data, an audio data generator (such as a computerized synthesizer), or any other source of audio data. Video source 24 may include a video camera that produces video data to be encoded by video encoder 28, a storage medium encoded with previously recorded video data, a video data generation unit (such as a computer graphics source), or any other source of video data. Content preparation device 20 is not necessarily communicatively coupled to server device 60 in all examples, but may store multimedia content to a separate medium that is read by server device 60.

Raw audio and video data may include analog or digital data. Analog data may be digitized before being encoded by audio encoder 26 and/or video encoder 28. Audio source 22 may obtain audio data from a speaking participant while the participant is speaking, and video source 24 may simultaneously obtain video data of the speaking participant. In other examples, audio source 22 may include a computer-readable storage medium containing stored audio data, and video source 24 may include a computer-readable storage medium containing stored video data. In this manner, the techniques described in this disclosure may be applied to live, streaming, real-time audio and video data or to archived, pre-recorded audio and video data.

An audio frame that corresponds to a video frame is generally an audio frame containing audio data that was captured (or generated) by audio source 22 at the same time as the video data, captured (or generated) by video source 24, that is contained within the video frame. For example, while a speaking participant produces audio data by speaking, audio source 22 captures the audio data, and video source 24 captures video data of the speaking participant at the same time (that is, while audio source 22 is capturing the audio data). Hence, an audio frame may temporally correspond to one or more particular video frames. Accordingly, an audio frame corresponding to a video frame generally corresponds to a situation in which audio data and video data were captured at the same time, and for which the audio frame and the video frame contain, respectively, the audio data and the video data captured at that time.

In some examples, audio encoder 26 may encode, in each encoded audio frame, a timestamp that represents the time at which the audio data for the encoded audio frame was recorded, and similarly, video encoder 28 may encode, in each encoded video frame, a timestamp that represents the time at which the video data for the encoded video frame was recorded. In such examples, an audio frame corresponding to a video frame may be an audio frame containing a timestamp and a video frame containing the same timestamp. Content preparation device 20 may include an internal clock from which audio encoder 26 and/or video encoder 28 may generate the timestamps, or which audio source 22 and video source 24 may use to associate audio and video data, respectively, with a timestamp.
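Matching audio frames to video frames by shared timestamp can be shown with a short Python sketch. The representation of a frame as a (timestamp, data) tuple and the function name pair_by_timestamp are assumptions for illustration; the disclosure does not prescribe a data model.

```python
def pair_by_timestamp(audio_frames, video_frames):
    """Return (timestamp, audio, video) for frames that share a timestamp.

    Each input is an iterable of (timestamp, data) tuples; this tuple layout
    is a hypothetical model, not one defined by the source.
    """
    video_by_ts = {ts: data for ts, data in video_frames}
    return [
        (ts, audio, video_by_ts[ts])
        for ts, audio in audio_frames
        if ts in video_by_ts
    ]


audio = [(0, "a0"), (40, "a1"), (80, "a2")]
video = [(0, "v0"), (80, "v2")]
pairs = pair_by_timestamp(audio, video)
```

Here the audio frame at timestamp 40 has no video frame with the same timestamp, so it is left unpaired.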

In some examples, audio source 22 may send data to audio encoder 26 corresponding to the time at which audio data was recorded, and video source 24 may send data to video encoder 28 corresponding to the time at which video data was recorded. In some examples, audio encoder 26 may encode a sequence identifier in the encoded audio data to indicate a relative temporal ordering of the encoded audio data, without necessarily indicating the absolute time at which the audio data was recorded, and similarly, video encoder 28 may also use sequence identifiers to indicate a relative temporal ordering of the encoded video data. Similarly, in some examples, a sequence identifier may be mapped to or otherwise correlated with a timestamp.

Audio encoder 26 generally produces a stream of encoded audio data, while video encoder 28 produces a stream of encoded video data. Each individual stream of data (whether audio or video) may be referred to as an elementary stream. An elementary stream is a single, digitally coded (possibly compressed) component of a media presentation. For example, the coded video or audio part of a media presentation can be an elementary stream. An elementary stream may be converted into a packetized elementary stream (PES) before being encapsulated within a video file. Within the same media presentation, a stream ID may be used to distinguish the PES packets belonging to one elementary stream from those of another. The basic unit of data of an elementary stream is a PES packet. Thus, coded video data generally corresponds to elementary video streams. Similarly, audio data corresponds to one or more respective elementary streams.

In the example of FIG. 1, encapsulation unit 30 of content preparation device 20 receives elementary streams including coded video data from video encoder 28 and elementary streams including coded audio data from audio encoder 26. In some examples, video encoder 28 and audio encoder 26 may each include packetizers for forming PES packets from encoded data. In other examples, video encoder 28 and audio encoder 26 may each interface with respective packetizers for forming PES packets from encoded data. In still other examples, encapsulation unit 30 may include packetizers for forming PES packets from encoded audio and video data.

Video encoder 28 may encode video data of multimedia content in a variety of ways, to produce different representations of the multimedia content at various bitrates and with various characteristics, such as pixel resolution, frame rate, conformance to various coding standards, conformance to various profiles and/or levels of profiles for various coding standards, representations having one or multiple views (e.g., for two-dimensional or three-dimensional playback), or other such characteristics. A representation, as used in this disclosure, may include one of audio data, video data, text data (e.g., for closed captions), or other such data. The representation may include an elementary stream, such as an audio elementary stream or a video elementary stream. Each PES packet may include a stream_id that identifies the elementary stream to which the PES packet belongs. Encapsulation unit 30 is responsible for assembling elementary streams into streamable media data.
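Reading the stream_id of a PES packet and using it to tell audio from video can be sketched in Python as below. The sketch assumes MPEG-2 Systems PES framing (a 0x000001 start code prefix followed by a one-byte stream_id, with 0xC0 to 0xDF used for audio and 0xE0 to 0xEF for video); the function names are hypothetical.

```python
PES_START_CODE_PREFIX = b"\x00\x00\x01"


def pes_stream_id(pes: bytes) -> int:
    """Return the stream_id byte that follows the PES start code prefix."""
    if len(pes) < 6 or pes[:3] != PES_START_CODE_PREFIX:
        raise ValueError("not a PES packet")
    return pes[3]


def stream_kind(stream_id: int) -> str:
    """Classify a stream_id using the MPEG-2 Systems value ranges."""
    if 0xC0 <= stream_id <= 0xDF:
        return "audio"
    if 0xE0 <= stream_id <= 0xEF:
        return "video"
    return "other"


# Minimal 6-byte PES prefix: start code, stream_id, 16-bit packet length.
video_pes = PES_START_CODE_PREFIX + b"\xe0" + b"\x00\x00"
audio_pes = PES_START_CODE_PREFIX + b"\xc0" + b"\x00\x00"
kinds = [stream_kind(pes_stream_id(p)) for p in (video_pes, audio_pes)]
```

A receiver could use the same classification to route depacketized data to an audio decoder or a video decoder.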

Encapsulation unit 30 receives PES packets for the elementary streams of a media presentation from audio encoder 26 and video encoder 28 and forms corresponding network abstraction layer (NAL) units from the PES packets. Coded video segments may be organized into NAL units, which provide a "network-friendly" video representation addressing applications such as video telephony, storage, broadcast, or streaming. NAL units can be categorized as video coding layer (VCL) NAL units and non-VCL NAL units. VCL NAL units carry the output of the core compression engine and may include block, macroblock, and/or slice-level data. All other NAL units are non-VCL NAL units. In some examples, a coded picture in one time instance, normally presented as a primary coded picture, may be contained in an access unit, which may include one or more NAL units.

Non-VCL NAL units may include parameter set NAL units and SEI NAL units, among others. Parameter sets may contain sequence-level header information (in sequence parameter sets (SPS)) and infrequently changing picture-level header information (in picture parameter sets (PPS)). With parameter sets (e.g., PPS and SPS), infrequently changing information need not be repeated for each sequence or picture; hence, coding efficiency may be improved. Furthermore, the use of parameter sets may enable out-of-band transmission of important header information, avoiding the need for redundant transmissions for error resilience. In out-of-band transmission examples, parameter set NAL units may be transmitted on a different channel than other NAL units, such as SEI NAL units.
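The VCL versus non-VCL distinction can be illustrated for one concrete codec with a short Python sketch. It assumes the H.264/AVC one-byte NAL unit header, in which the low five bits carry nal_unit_type, types 1 through 5 are VCL, and types 6, 7, and 8 are SEI, SPS, and PPS respectively; HEVC uses a different two-byte header and is not covered. The function name classify_nal is hypothetical.

```python
# H.264/AVC non-VCL types named in the surrounding text.
NON_VCL_NAMES = {6: "SEI", 7: "SPS", 8: "PPS"}


def classify_nal(first_byte: int):
    """Return (nal_unit_type, is_vcl) for an H.264 NAL unit header byte."""
    nal_unit_type = first_byte & 0x1F      # low 5 bits of the header byte
    is_vcl = 1 <= nal_unit_type <= 5       # coded slice data
    return nal_unit_type, is_vcl


def describe_nal(first_byte: int) -> str:
    nal_unit_type, is_vcl = classify_nal(first_byte)
    if is_vcl:
        return "VCL"
    return NON_VCL_NAMES.get(nal_unit_type, "non-VCL")


idr_slice = classify_nal(0x65)   # IDR slice header byte
sps = classify_nal(0x67)         # sequence parameter set header byte
labels = [describe_nal(b) for b in (0x65, 0x67, 0x68, 0x06)]
```

Only the header byte is examined, so the classification does not require decoding the NAL unit payload.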

Supplemental enhancement information (SEI) may contain information that is not necessary for decoding the coded picture samples from VCL NAL units, but that may assist in processes related to decoding, display, error resilience, and other purposes. SEI messages may be contained in non-VCL NAL units. SEI messages are a normative part of some standard specifications, but are not always mandatory for standard-compliant decoder implementations. SEI messages may be sequence-level SEI messages or picture-level SEI messages. Some sequence-level information may be contained in SEI messages, such as scalability information SEI messages in the example of SVC and view scalability information SEI messages in MVC. These example SEI messages may convey information on, for example, extraction of operation points and characteristics of the operation points.

Server device 60 includes Real-time Transport Protocol (RTP) sending unit 70 and network interface 72. In some examples, server device 60 may include a plurality of network interfaces. Furthermore, any or all of the features of server device 60 may be implemented on other devices of a content delivery network, such as routers, bridges, proxy devices, switches, or other devices. In some examples, intermediate devices of a content delivery network may cache data of multimedia content 64 and include components conforming substantially to those of server device 60. In general, network interface 72 is configured to send and receive data via network 74.

RTP sending unit 70 is configured to deliver media data to client device 40 via network 74 according to RTP, which is standardized in Request for Comments (RFC) 3550 of the Internet Engineering Task Force (IETF). RTP sending unit 70 may also implement protocols related to RTP, such as the RTP Control Protocol (RTCP), the Real-Time Streaming Protocol (RTSP), the Session Initiation Protocol (SIP), and/or the Session Description Protocol (SDP). RTP sending unit 70 may send media data via network interface 72, which may implement the User Datagram Protocol (UDP) and/or the Internet Protocol (IP). Thus, in some examples, server device 60 may send media data using RTP and RTSP over UDP via network 74.
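For reference, the 12-byte fixed RTP header defined in RFC 3550 can be parsed with a few lines of Python. This is a minimal sketch: it reads only the fixed fields and does not handle CSRC entries, header extensions, or padding, and the names RtpHeader and parse_rtp_header are assumptions rather than part of any library.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    version = b0 >> 6                      # top two bits of the first byte
    if version != 2:
        raise ValueError("unsupported RTP version")
    return RtpHeader(version, bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc)


# Example packet: version 2, marker set, dynamic payload type 96.
example = struct.pack("!BBHII", 0x80, 0x80 | 96, 1000, 90000, 0x1234) + b"payload"
header = parse_rtp_header(example)
```

The sequence number is what lets a receiver detect loss and reordering when RTP runs over UDP.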

RTP sending unit 70 may receive an RTSP describe request from, for example, client device 40. The RTSP describe request may include data indicating what types of data are supported by client device 40. RTP sending unit 70 may respond to client device 40 with data indicating media streams, such as media content 64, that can be sent to client device 40, along with a corresponding network location identifier, such as a uniform resource locator (URL) or uniform resource name (URN).

RTP sending unit 70 may then receive an RTSP setup request from client device 40. The RTSP setup request may generally indicate how a media stream is to be transported. The RTSP setup request may contain the network location identifier for the requested media data (e.g., media content 64) and a transport specifier, such as local ports for receiving RTP data and control data (e.g., RTCP data) on client device 40. RTP sending unit 70 may reply to the RTSP setup request with a confirmation and data representing the ports of server device 60 through which the RTP data and control data will be sent. RTP sending unit 70 may then receive an RTSP play request, which causes the media stream to be "played," that is, sent to client device 40 via network 74. RTP sending unit 70 may also receive an RTSP teardown request to end the streaming session, in response to which RTP sending unit 70 may stop sending media data to client device 40 for the corresponding session.
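The describe, setup, play, and teardown exchange can be made concrete by formatting the request messages as RTSP/1.0 text. The Python sketch below only builds strings and performs no network I/O; the URL, port numbers, and session identifier are hypothetical placeholders, and rtsp_request is an illustrative helper rather than a library function.

```python
from typing import Optional


def rtsp_request(method: str, url: str, cseq: int,
                 headers: Optional[dict] = None) -> str:
    """Format one RTSP/1.0 request as text (no network I/O)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


URL = "rtsp://example.com/media"   # hypothetical media location
SESSION = "12345678"               # hypothetical; supplied by the SETUP reply

describe = rtsp_request("DESCRIBE", URL, 1, {"Accept": "application/sdp"})
# The Transport header names the client ports for RTP and RTCP data.
setup = rtsp_request("SETUP", URL, 2,
                     {"Transport": "RTP/AVP;unicast;client_port=5004-5005"})
play = rtsp_request("PLAY", URL, 3, {"Session": SESSION})
teardown = rtsp_request("TEARDOWN", URL, 4, {"Session": SESSION})
```

In a real exchange the session identifier and the server ports come from the server's reply to the setup request, so the play and teardown requests cannot be built until that reply arrives.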

Likewise, RTP receiving unit 52 may initiate a media stream by initially sending an RTSP describe request to server device 60. The RTSP describe request may indicate the types of data supported by client device 40. RTP receiving unit 52 may then receive a reply from server device 60 specifying available media streams, such as media content 64, that can be sent to client device 40, along with a corresponding network location identifier, such as a uniform resource locator (URL) or uniform resource name (URN).

RTP receiving unit 52 may then generate an RTSP setup request and send the RTSP setup request to server device 60. As noted above, the RTSP setup request may contain the network location identifier for the requested media data (e.g., media content 64) and a transport specifier, such as local ports for receiving RTP data and control data (e.g., RTCP data) on client device 40. In response, RTP receiving unit 52 may receive a confirmation from server device 60, including the ports that server device 60 will use to send the media data and control data.

After a media streaming session has been established between server device 60 and client device 40, RTP sending unit 70 of server device 60 may send media data (e.g., packets of media data) to client device 40 according to the media streaming session. Server device 60 and client device 40 may exchange control data (e.g., RTCP data) indicating, for example, reception statistics of client device 40, so that server device 60 can perform congestion control or otherwise diagnose and address transmission faults.

Network interface 54 may receive media of a selected media presentation and provide that media to RTP receiving unit 52, which in turn may provide the media data to decapsulation unit 50. Decapsulation unit 50 may decapsulate elements of a video file into constituent PES streams, depacketize the PES streams to retrieve encoded data, and send the encoded data to either audio decoder 46 or video decoder 48, depending on whether the encoded data is part of an audio stream or a video stream (e.g., as indicated by the PES packet headers of the stream). Audio decoder 46 decodes the encoded audio data and sends the decoded audio data to audio output 42, while video decoder 48 decodes the encoded video data and sends the decoded video data, which may include a plurality of views of a stream, to video output 44.

Video encoder 28, video decoder 48, audio encoder 26, audio decoder 46, encapsulation unit 30, RTP receiving unit 52, and decapsulation unit 50 may each be implemented, as applicable, as any of a variety of suitable processing circuitry, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware, or any combination thereof. Each of video encoder 28 and video decoder 48 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (CODEC). Likewise, each of audio encoder 26 and audio decoder 46 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined CODEC. An apparatus including video encoder 28, video decoder 48, audio encoder 26, audio decoder 46, encapsulation unit 30, RTP receiving unit 52, and/or decapsulation unit 50 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.

Client device 40, server device 60, and/or content preparation device 20 may be configured to operate according to the techniques of this disclosure. For purposes of example, this disclosure describes these techniques with respect to client device 40 and server device 60. However, it should be understood that content preparation device 20 may be configured to perform these techniques instead of (or in addition to) server device 60.

Encapsulation unit 30 may form NAL units that include a header identifying the program to which the NAL unit belongs and a payload, e.g., audio data, video data, or data describing the transport or program stream to which the NAL unit corresponds. For example, in H.264/AVC, a NAL unit includes a 1-byte header and a payload of variable size. A NAL unit that includes video data in its payload may include video data at various levels of granularity. For example, a NAL unit may include a block of video data, a plurality of blocks, a slice of video data, or an entire picture of video data. Encapsulation unit 30 may receive encoded video data from video encoder 28 in the form of PES packets of elementary streams. Encapsulation unit 30 may associate each elementary stream with a corresponding program.

Encapsulation unit 30 may also assemble access units from a plurality of NAL units. In general, an access unit may include one or more NAL units representing a frame of video data, as well as audio data corresponding to the frame when such audio data is available. An access unit generally includes all NAL units for one output time instance, e.g., all audio and video data for one time instance. For example, if each view has a frame rate of 20 frames per second (fps), each time instance corresponds to a time interval of 1/20 = 0.05 seconds. During this time interval, the specific frames of all views of the same access unit (the same time instance) may be rendered simultaneously. In one example, an access unit may include a coded picture in one time instance, which may be presented as a primary coded picture.

Accordingly, an access unit may include all audio and video frames of a common time instance, e.g., all views corresponding to time X. This disclosure also refers to a coded picture of a particular view as a "view component." That is, a view component may include a coded picture (or frame) of a particular view at a particular time. Accordingly, an access unit may be defined as including all view components of a common time instance. The decoding order of access units need not be the same as the output or display order.

After encapsulation unit 30 has assembled NAL units and/or access units into a video file based on the received data, encapsulation unit 30 passes the video file to output interface 32 for output. In some examples, encapsulation unit 30 may store the video file locally or send the video file to a remote server via output interface 32, rather than sending the video file directly to client device 40. Output interface 32 may include, for example, a transmitter, a transceiver, a device for writing data to a computer-readable medium such as an optical drive or a magnetic media drive (e.g., a floppy drive), a universal serial bus (USB) port, a network interface, or another output interface. Output interface 32 outputs the video file to a computer-readable medium, such as a transmission signal, a magnetic medium, an optical medium, a memory, a flash drive, or another computer-readable medium.

Network interface 54 may receive NAL units or access units via network 74 and provide the NAL units or access units to decapsulation unit 50 via RTP receiving unit 52. Decapsulation unit 50 may decapsulate elements of a video file into constituent PES streams, depacketize the PES streams to retrieve encoded data, and send the encoded data to either audio decoder 46 or video decoder 48, depending on whether the encoded data is part of an audio stream or a video stream (e.g., as indicated by the PES packet headers of the stream). Audio decoder 46 decodes the encoded audio data and sends the decoded audio data to audio output 42, while video decoder 48 decodes the encoded video data and sends the decoded video data, which may include a plurality of views of a stream, to video output 44.

FIG. 2 is a conceptual diagram illustrating an example architecture for extended reality (XR) traffic delivery. FIG. 2 illustrates application/service layer 100, user plane function (UPF) 102, access network (AN) 110, and client device 120. Client device 120 may correspond to client device 40 of FIG. 1 and generally includes components similar to those of client device 40. UPF 102 and AN 110 may correspond to network devices within network 74 of FIG. 1.

In this example, UPF 102 includes packet detection unit 104 and packet detection rules 106; AN 110 includes quality of service (QoS) to AN resource mapping unit 112 and radio interface 114; and client device 120 includes QoS rules 122, QoS to AN resource mapping unit 124, and radio interface 126.

Client device 120 may send and receive media data, e.g., in the form of audio, video, and/or extended reality (XR) data. Data sent from client device 120 to another device (such as another client device or server device 60 of FIG. 1) is sent via an uplink flow, and data received by client device 120 is received via a downlink flow. In a downlink (DL) extended reality and media (XRM) service flow, UPF 102 may classify incoming data packets based on the packet filter sets of packet detection rules 106. UPF 102 may convey the classification of user plane traffic belonging to a QoS flow through a QoS flow identifier (QFI) marking. QoS to AN resource mapping unit 112 of AN 110 binds QoS flows to AN resources (i.e., data radio bearers). For an uplink (UL) XRM service flow, client device 120 (which may be a user equipment (UE)) performs a PDU set identification procedure similar to that of UPF 102. Client device 120 may mark the identified PDU sets sent toward the AN.

3GPP TR 23.700-60 reports candidate solutions for identifying PDUs and PDU set boundaries by matching RTP/SRTP headers, header extensions, and payloads. New parameters, such as a PDU set sequence number, a PDU set identifier, and a PDU set type, are proposed to identify PDUs and PDU sets. The candidate solutions also propose marking PDU set priority, dependency, or importance by matching relevant parameters in the RTP header extension or payload, such as the video network abstraction layer (NAL) unit type, temporal ID (TID), and layer ID (LID). The most important PDUs or PDU sets may be assigned to a QoS flow with higher QoS requirements, while less important PDUs or PDU sets may be dropped during network congestion or when the PDUs or PDU sets they depend on were not completely delivered. It has also been proposed that the AN may drop a PDU set when some of the PDUs in the PDU set have not been received.

PDUs may be mapped to video data packets. An encoder or application function (AF) may be configured to identify, for video packets, video data packet characteristics such as grouping type and dependency, as well as an identifier of the data packet such as a picture order count (POC) value. The identifier is usually embedded within the media data packet and is not exposed to the RTP header or header extension. During data encapsulation, filtering, and mapping to QoS flows, the link between a particular video data packet and its corresponding characteristics may be lost, especially for out-of-order delivery, because the packet order may be shuffled. Because many RTP packets may share the same characteristics, adding fields for the attributes or characteristics of each packet to the RTP header or header extension may increase overhead. It may be beneficial instead to add a video data packet identifier to the RTP header or header extension as an index into a list of data packet characteristics.

PDU sets may also have marked priority values. In some cases, the picture type, temporal identifier (TID), or layer identifier (LID) may be used to identify priority among PDU sets and to mark PDU set priority or importance. However, multiple priority implications may increase the complexity for intermediate devices to derive a final overall priority. Furthermore, an encoder may assign the same TID to all frames, and the importance of frames having the same TID may still vary. It generally makes sense to mark intra-coded pictures as the highest priority and pictures not used for reference as the lowest priority. However, there are cases where a picture is intra-coded but not used for reference (e.g., all-intra mode). A single indication of codec-level picture priority that can be inspected may therefore be needed.

PDU sets may also have marked dependency information. In most video coding schemes, picture dependency is managed through reference picture management. A reference picture is stored in the decoded picture buffer (DPB) for inter-prediction until the reference picture is marked as "unused for reference" and removed from the DPB.

ITU-T H.266/Versatile Video Coding (VVC) specifies two reference picture lists (RPLs), referred to as list0 and list1. Predefined candidate RPLs are signaled in the sequence parameter set (SPS). An index referring to a predefined candidate RPL is signaled in the picture header (PH), if all slices of the picture have the same RPLs, or in the slice header (SH). A new RPL may also be signaled directly in the PH or SH. A reference picture may be a short-term reference picture, a long-term reference picture, or an inter-layer reference picture, and if the reference picture is used for inter-layer prediction, it is marked using a picture order count (POC) and a layer ID. The reference pictures used for inter-prediction of the current picture are the active reference pictures of the current picture.

ITU-T H.265/High Efficiency Video Coding (HEVC) specifies reference pictures in a reference picture set (RPS). The RPS may be signaled in the sequence parameter set (SPS) or in the slice header (SH).

ITU-T H.264/Advanced Video Coding (AVC) specifies reference pictures based on two marking mechanisms: an implicit sliding window process and an explicit memory management control operation process. In general, up to 16 unique reference pictures are allowed, subject to the given decoded picture buffer size.

AOMedia Video 1 (AV1) allows up to 7 reference frames and specifies the frame marking function in the frame header open bitstream unit (OBU). A reference picture indication may be present in the frame header OBU or in an additional header for each tile.

Because reference picture management is designed differently in different codecs, it is complex for UPF 102 to derive the dependency of each PDU set in a codec-agnostic manner. This disclosure recognizes that it would be beneficial to indicate picture dependency explicitly, in a specific NAL unit or SEI message, in order to support existing codecs.

In addition, reference picture marking is based on POC values, which represent picture output order, whereas the frames or PDU sets in the bitstream are in coding order, so the POC values of adjacent frames in the bitstream may not be consecutive. Mapping POC values to PDU set sequence numbers (SNs) with existing schemes is therefore not straightforward.
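The mismatch between output order and coding order can be made concrete with a small sketch. It assumes, purely for illustration, that PDU set sequence numbers are assigned consecutively in decoding order and that all pictures belong to one coded video sequence, so that POC values are unique; the function name and the sample POC sequence are hypothetical.

```python
def map_poc_to_sn(pocs_in_decoding_order, first_sn=0):
    """Assign consecutive PDU set sequence numbers in decoding order.

    Assumption: one coded video sequence, so each POC value appears once.
    """
    return {poc: first_sn + i for i, poc in enumerate(pocs_in_decoding_order)}


# A hierarchical-B style ordering: output order is 0..8, coding order is not.
decoding_order_pocs = [0, 8, 4, 2, 1, 3, 6, 5, 7]
poc_to_sn = map_poc_to_sn(decoding_order_pocs)

# A dependency expressed in POC terms (picture 4 references pictures 0 and 8)
# has to be translated before a network node can act on sequence numbers.
reference_sns = [poc_to_sn[poc] for poc in (0, 8)]
```

In this example the picture with POC 4 carries sequence number 2, so a node that only sees sequence numbers cannot recover the dependency without such a table.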

H.266 supports subpicture partitioning, in which each subpicture can be decoded independently of the other subpictures in the same frame. A frame may contain a mix of intra-coded and inter-coded subpictures, even though all subpictures may share the same reference pictures. Subpicture identification and marking may require additional attributes to facilitate proper PDU handling. For example, when a subpicture is independently coded and some PDUs of a PDU set are lost, the remaining PDUs may continue to be delivered. When a subpicture is not independently coded and some PDUs are lost, the remaining PDUs may be dropped.
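The handling rule in the preceding paragraph can be sketched as a small decision function. The function name, the representation of PDUs as strings, and the `independently_coded` attribute are illustrative assumptions; the disclosure does not define such an interface.

```python
def pdus_to_forward(received, lost, independently_coded):
    """Decide which received PDUs of one PDU set are still worth delivering.

    received: PDUs of the set that arrived.
    lost: PDUs of the set known to be missing.
    independently_coded: True if the PDUs carry independently decodable
        subpictures (hypothetical attribute marked on the PDU set).
    """
    if not lost:
        return list(received)      # nothing missing, deliver everything
    if independently_coded:
        return list(received)      # remaining subpictures still decode
    return []                      # remaining PDUs cannot be decoded alone


kept_independent = pdus_to_forward(["p0", "p2"], ["p1"], independently_coded=True)
kept_dependent = pdus_to_forward(["p0", "p2"], ["p1"], independently_coded=False)
```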

AV1 supports tile lists, which contain the tile data associated with a frame, and each tile can be decoded independently. A tile list allows a decoder to process a subset of tiles and display the corresponding portion of the frame without fully decoding all tiles in the frame.

FIG. 3 is a conceptual diagram illustrating a series of gradual decoder refresh (GDR) frames of video data. In general, when random access is performed (i.e., streaming starts at a point in the video other than the beginning), the stream is accessed starting at a stream access point. A stream access point may be entirely intra-prediction coded, so that the whole frame can be decoded and playback can begin from the stream access point. Such a frame may be referred to as an instantaneous decoder refresh (IDR) picture. However, entirely intra-prediction coded frames have a relatively high bitrate.

Therefore, a gradual decoder refresh (GDR) stream access point may be used instead of a single IDR stream access point. In general, a GDR stream access point includes a series of frames in which some portions are intra-prediction coded and other portions are inter-prediction coded. When random access is performed from a GDR stream access point, an inter-prediction coded portion may or may not be decodable, depending on which side of the intra-predicted portion it appears.

In contrast to intra-coding an entire picture, GDR enables an encoder to smooth the bitrate of the bitstream by distributing intra-coded slices or blocks across multiple pictures, thereby significantly reducing end-to-end delay, which is especially important for ultra-low-delay applications. When the decoding process starts with a GDR picture, some regions of the picture cannot be decoded correctly. After a number of additional pictures, referred to as the recovery period, have been decoded, the entire picture at the recovery point and all subsequent pictures in output order will be decoded correctly. FIG. 3 illustrates an example of such a GDR recovery period, in which the clean regions and intra-predicted regions can be decoded correctly, while the dirty regions cannot be decoded correctly upon random access. Therefore, when random access occurs at the associated GDR picture, the PDUs of dirty regions that cannot be decoded correctly may be dropped. The identification and marking of PDUs and PDU sets for GDR pictures has not been addressed in the candidate solutions.

More specifically, the example of FIG. 3 illustrates GDR frames 140A to 140D. GDR frame 140A includes intra-prediction coded region 142A and non-decodable inter-prediction coded region 144A. GDR frame 140B includes decodable inter-predicted region 146B, intra-prediction coded region 142B, and non-decodable inter-prediction coded region 144B. GDR frame 140C includes decodable inter-predicted region 146C, intra-prediction coded region 142C, and non-decodable inter-prediction coded region 144C. GDR frame 140D includes decodable inter-predicted region 146D and intra-prediction coded region 142D.

Because non-decodable inter-prediction coded regions 144A to 144C may refer to reference frames that precede GDR frame 140A in coding order, these regions are not decodable when random access starts at GDR frame 140A. Decodable inter-predicted regions 146B to 146D are decodable because they can be predicted only from reference frames starting at GDR frame 140A, and only from decodable inter-predicted regions or intra-predicted regions.
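The region pattern of frames 140A to 140D can be modeled with a short sketch. It assumes, for illustration only, that each frame is split into as many equal columns as there are frames in the recovery period and that the intra-coded column sweeps from left to right; the actual geometry in FIG. 3 may differ, and the region labels are hypothetical names for regions 146 ("clean"), 142 ("intra"), and 144 ("dirty").

```python
def gdr_regions(frame_index, num_frames):
    """Classify the columns of one frame in a GDR recovery period.

    Assumption: column k is intra-coded in frame k; columns already
    refreshed are clean, and columns not yet refreshed are dirty.
    """
    regions = []
    for column in range(num_frames):
        if column < frame_index:
            regions.append("clean")   # decodable inter-predicted region
        elif column == frame_index:
            regions.append("intra")   # intra-prediction coded region
        else:
            regions.append("dirty")   # not decodable after random access
    return regions


def droppable_columns(frame_index, num_frames):
    """Columns whose PDUs could be dropped when random access starts at frame 0."""
    return [c for c, kind in enumerate(gdr_regions(frame_index, num_frames))
            if kind == "dirty"]


first_frame = gdr_regions(0, 4)   # models frame 140A
last_frame = gdr_regions(3, 4)    # models frame 140D
```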

In H.266/VVC, a picture is not output when PictureOutputFlag is equal to 0. For example, a picture is not output when it is a random access skipped leading (RASL) picture and NoOutputBeforeRecoveryFlag of the associated intra random access point (IRAP) picture is equal to 1; when it is a gradual decoding refresh (GDR) picture with NoOutputBeforeRecoveryFlag equal to 1, or a recovering picture of a GDR picture with NoOutputBeforeRecoveryFlag equal to 1; or when ph_pic_output_flag of the picture is equal to 0. The value of NoOutputBeforeRecoveryFlag may be set by external means or by the position of the associated picture in the bitstream. If a PDU set is not output and is not used as a reference for prediction of subsequent pictures, the PDU set may be dropped.
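The three conditions listed above, and the resulting drop rule, can be sketched as follows. This covers only the cases named in this paragraph and is not a complete restatement of the H.266 derivation; the parameter names are hypothetical inputs that a real implementation would obtain from the bitstream or from external means.

```python
def picture_output_flag(is_rasl, irap_no_output_before_recovery,
                        is_gdr_or_recovering, gdr_no_output_before_recovery,
                        ph_pic_output_flag):
    """Return 0 if the picture is not output under the listed conditions, else 1."""
    if is_rasl and irap_no_output_before_recovery:
        return 0
    if is_gdr_or_recovering and gdr_no_output_before_recovery:
        return 0
    if ph_pic_output_flag == 0:
        return 0
    return 1


def may_drop_pdu_set(output_flag, used_for_reference):
    """A PDU set is droppable if it is neither output nor used for reference."""
    return output_flag == 0 and not used_for_reference


rasl_flag = picture_output_flag(True, True, False, False, 1)
normal_flag = picture_output_flag(False, False, False, False, 1)
```

Note that a picture that is not output may still be needed as a reference, which is why the drop rule checks both properties.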

FIG. 4 is a conceptual diagram illustrating an example set of identifying information for protocol data units (PDUs). This disclosure describes techniques that may use high-level syntax design in video codecs to facilitate PDU and PDU set identification and marking. An identifier of a video frame (e.g., the POC value in AVC/HEVC/VVC, or display_frame_id or current_frame_id in AV1) may be added to the RTP header extension and further marked in the PDU set or PDU. An application function (AF) may use the video frame identifier to indicate video frame characteristics, such as frame type, priority, and dependency, to UPF 102 (FIG. 2) via a policy control function (PCF) and a session management function (SMF). UPF 102 may classify video packets for QoS marking by inspecting the frame identifier of a data packet and linking it to the corresponding characteristics. FIG. 4 illustrates an example of using a video frame ID as an index to link a PDU set to a table or list containing the associated video data characteristics.
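The indexing idea can be sketched as a simple table lookup. The record layout, the table contents, and the fallback priority are illustrative assumptions; the disclosure does not prescribe a data structure for the characteristics list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameCharacteristics:
    frame_type: str      # e.g. "I", "P", "B"
    priority: int        # 1 = highest, 7 = lowest
    depends_on: tuple    # frame IDs this frame references


# Table assumed to have been provided by the AF; the values are made up.
characteristics = {
    0: FrameCharacteristics("I", 1, ()),
    4: FrameCharacteristics("P", 2, (0,)),
    2: FrameCharacteristics("B", 5, (0, 4)),
}


def classify(frame_id, table, default_priority=7):
    """Look up the priority for a packet by the frame ID in its header extension."""
    entry = table.get(frame_id)
    return entry.priority if entry is not None else default_priority


priority_of_b_frame = classify(2, characteristics)
priority_of_unknown = classify(99, characteristics)
```

Because only the frame ID travels in each packet, the per-frame characteristics are stored once rather than repeated in every RTP header extension.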

In some examples, a slice or tile identifier that indicates a particular slice or tile within a frame (e.g., the slice segment address in HEVC, slice_address in VVC, or the tile count in AV1) may be signaled in the RTP header extension together with the frame identifier. The slice/tile identifier may be mapped to a PDU attribute field. The AF may use the frame identifier and the slice/tile identifier to indicate video slice/tile characteristics to UPF 102. UPF 102 may classify video packets for QoS marking by inspecting the frame identifier and slice/tile identifier of a data packet and linking these values to the corresponding characteristics.

In some examples, the frame marking RTP header extension may include priority marking information. For example, the frame marking RTP header extension may include data indicating whether the corresponding frame is intra-prediction coded (an I frame), whether the frame is discardable (D), the temporal ID (TID) of the frame, and/or the layer ID (LID) of the frame. Any or all of this data may represent a PDU set priority marking. Such priority attribute data may be signaled in a network abstraction layer (NAL) unit header, a specific NAL unit, or a supplemental enhancement information (SEI) message, where it can be inspected for PDU set priority marking.

In some examples, a NAL unit priority indication, nuh_priority, may be signaled in the NAL unit header or a header extension. The priority indication may be a 3-bit code specifying the priority of the associated NAL unit, taking a value between 1 and 7, where 1 represents the highest priority and 7 represents the lowest priority. Table 1 below is an example of indicating NAL unit priority in a NAL unit header extension. In Table 1, "[Added: ""]" denotes an addition relative to the existing NAL unit syntax (e.g., of ITU-T H.266), and "[Removed: ""]" denotes a deletion relative to the existing NAL unit syntax.

Table 1
nal_unit_header( ) {                                  Descriptor
  forbidden_zero_bit                                  f(1)
  [Removed: "nuh_reserved_zero_bit"]                  u(1)
  nuh_layer_id                                        u(6)
  nal_unit_type                                       u(5)
  nuh_temporal_id_plus1                               u(3)
  [Added: "nuh_extension_flag"]                       u(1)
  [Added: "if( nuh_extension_flag ) {"]
    [Added: "nuh_priority"]                           u(3)
    [Added: "nuh_reserved_zero_5bits"]                u(5)
  }
}

The semantics of the added syntax elements in the example of Table 1 may be as follows:

nuh_extension_flag equal to 0 specifies that the nuh_priority and nuh_reserved_zero_5bits syntax elements are not present in the NAL unit header syntax structure. nuh_extension_flag equal to 1 specifies that the nuh_priority and nuh_reserved_zero_5bits syntax elements may be present in the NAL unit header syntax structure.

nuh_priority specifies the NAL unit priority. nuh_priority equal to 1 indicates the highest priority, and nuh_priority equal to 7 indicates the lowest priority.
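To illustrate how a network node might read the proposed header, the following sketch parses the fields of Table 1. It assumes that the fields occur in the bit order in which Table 1 lists them (with nuh_reserved_zero_bit removed and nuh_extension_flag following nuh_temporal_id_plus1); that ordering is an interpretation of the table, and the `BitReader` helper is hypothetical.

```python
class BitReader:
    """Minimal MSB-first bit reader (illustrative helper, not from the disclosure)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n: int) -> int:
        """Read n bits as an unsigned integer."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_nal_unit_header(data: bytes) -> dict:
    """Parse the header of Table 1, assuming fields appear in the listed order."""
    r = BitReader(data)
    hdr = {
        "forbidden_zero_bit": r.u(1),
        "nuh_layer_id": r.u(6),
        "nal_unit_type": r.u(5),
        "nuh_temporal_id_plus1": r.u(3),
        "nuh_extension_flag": r.u(1),
    }
    if hdr["nuh_extension_flag"]:
        hdr["nuh_priority"] = r.u(3)             # 1 = highest, 7 = lowest
        hdr["nuh_reserved_zero_5bits"] = r.u(5)
    return hdr


# Layer 0, NAL type 1, temporal ID 0, extension present, priority 2.
with_extension = parse_nal_unit_header(bytes([0x00, 0x13, 0x40]))
# Same header with nuh_extension_flag equal to 0: no priority field follows.
without_extension = parse_nal_unit_header(bytes([0x00, 0x12]))
```

With the extension present the header occupies three bytes instead of two, which is the cost of exposing the priority at a fixed, codec-level position.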

As another example, a picture priority indication, aud_priority, may be signaled in the AU delimiter (AUD), as shown in Table 2:

Table 2
access_unit_delimiter_rbsp( ) {                       Descriptor
  aud_irap_or_gdr_flag                                u(1)
  aud_pic_type                                        u(3)
  [Added: "aud_priority"]                             u(3)
  rbsp_trailing_bits( )
}

The semantics of the added syntax element in the example of Table 2 may be as follows:

aud_priority specifies the priority of the access unit (AU) containing the AU delimiter. aud_priority equal to 1 indicates the highest priority, and aud_priority equal to 7 indicates the lowest priority.

As yet another example, a picture priority indication may be signaled in the picture header (PH), as shown in Table 3:

Table 3
picture_header_structure( ) {                         Descriptor
  ph_gdr_or_irap_pic_flag                             u(1)
  [Removed: "ph_non_ref_pic_flag"]                    u(1)
  [Added: "ph_priority"]                              u(3)
  if( ph_gdr_or_irap_pic_flag )
    ph_gdr_pic_flag                                   u(1)
  ph_inter_slice_allowed_flag                         u(1)

The semantics of the syntax element added in the example of Table 3 may be as follows:

ph_priority specifies the priority of the current picture. ph_priority equal to 1 indicates the highest priority, and ph_priority equal to 7 indicates the lowest priority. The lowest priority means that the current picture is never used as a reference picture.

In yet another example, the picture priority indication may specify the priority among pictures that share the same TID and LID as the current picture. In this case, the PDU set priority marking may be derived from ph_priority, TID, and LID.
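The text does not give a formula for deriving the PDU set priority marking from ph_priority, TID, and LID. The Python sketch below shows one possible derivation, purely as an assumption: PDU sets are ranked lexicographically by (LID, TID, ph_priority), so that ph_priority only breaks ties among pictures with the same TID and LID. The function name `pdu_set_priority_key` and the sample pictures are hypothetical.

```python
def pdu_set_priority_key(ph_priority: int, tid: int, lid: int) -> tuple:
    """Hypothetical ordering key: a smaller tuple means a more important PDU set."""
    if not 1 <= ph_priority <= 7:
        raise ValueError("ph_priority is expected in the range 1..7")
    # Assumption: lower layers and lower temporal sub-layers matter more,
    # and ph_priority orders pictures that share the same TID and LID.
    return (lid, tid, ph_priority)


pictures = [
    {"name": "b", "ph_priority": 7, "tid": 2, "lid": 0},
    {"name": "key", "ph_priority": 1, "tid": 0, "lid": 0},
    {"name": "p", "ph_priority": 3, "tid": 1, "lid": 0},
]
ordered = sorted(
    pictures,
    key=lambda p: pdu_set_priority_key(p["ph_priority"], p["tid"], p["lid"]),
)
names = [p["name"] for p in ordered]
```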

In some examples, a slice priority indication may be signaled in the slice header (SH) to indicate the priority among slices within the same picture, or among slices of pictures that share the same TID and LID.

In some examples, picture or slice priority may be signaled in an SEI message to indicate picture priority or slice priority using a slice address.

In some examples, a frame priority indication may be signaled in a specific OBU of the AV1 codec, such as a frame header OBU or a metadata OBU. A tile priority indication may be signaled in a tile group OBU or a tile list OBU. Table 4 is an example of frame header OBU syntax with a frame priority indication:

Table 4
uncompressed_header( ) {                                 Type
    if ( frame_id_numbers_present_flag ) {
        idLen = ( additional_frame_id_length_minus_1 +
                  delta_frame_id_length_minus_2 + 3 )
    }
    allFrames = (1 << NUM_REF_FRAMES) - 1
    if ( reduced_still_picture_header ) {
        …
    } else {
        show_existing_frame                              f(1)
        …
        frame_type                                       f(2)
        [Added: "frame_priority"]                        f(3)
        …

The frame_priority syntax element may be signaled in, or mapped to, other protocols, such as an RTP header extension or a GPRS Tunneling Protocol User Plane (GTP-U) extension header.

Deriving PDU set dependencies from reference picture lists is typically very complex. There is also an additional cost in mapping POC values to PDU set sequence numbers (SNs). Therefore, according to the techniques of this disclosure, the distance (e.g., the POC distance) between the current picture and an active reference picture in the bitstream may be marked in a specific NAL unit or SEI message. In the bitstream, all active reference pictures precede the current picture. Depending on the PDU set boundary, the distance may be expressed in units of access units (AUs) or picture units (PUs).

Table 5 below is an example syntax structure indicating the list of active reference pictures in the bitstream that are associated with the current picture. The list includes short-term active reference pictures, long-term active reference pictures, and inter-layer active reference pictures.

Table 5
act_ref_pic_list_struct( ) {                             Descriptor
    num_st_act_ref_pos                                   u(8)
    for( i = 0; i < num_st_act_ref_pos; i++ ) {
        st_act_ref_delta_pos[ i ]                        u(16)
    }
    num_lt_act_ref_pos                                   u(8)
    for( i = 0; i < num_lt_act_ref_pos; i++ ) {
        lt_act_ref_delta_pos[ i ]                        u(32)
    }
    num_il_act_ref_pos                                   u(8)
    for( i = 0; i < num_il_act_ref_pos; i++ ) {
        il_act_ref_delta_pos[ i ]                        u(8)
    }
}

The semantics of the syntax elements in the example of Table 5 may be as follows:

num_st_act_ref_pos specifies the number of short-term active reference picture positions in the syntax structure.

st_act_ref_delta_pos specifies the distance, in units of access units (AUs), between the current picture and an active short-term reference picture.

For a single-layer bitstream, an AU contains a picture. Assuming the sequence number (SN) of the current PDU set is N, the SN of the PDU set containing the i-th short-term active reference picture is (N - st_act_ref_delta_pos[ i ]).

In another example, st_act_ref_delta_pos may specify the distance, in units of pictures, between the current PU and an active short-term reference picture.

num_lt_act_ref_pos specifies the number of long-term active reference picture positions in the syntax structure.

lt_act_ref_delta_pos specifies the distance, in units of AUs, between the current picture and an active long-term reference picture.

For a single-layer bitstream, an AU contains a picture. Assuming the SN of the current picture or PDU set is N, the SN of the PDU set containing the i-th long-term active reference picture is (N - lt_act_ref_delta_pos[ i ]).

In some examples, lt_act_ref_delta_pos may specify the distance, in units of pictures, between the current picture and an active long-term reference picture.

num_il_act_ref_pos specifies the number of inter-layer active reference picture positions in the syntax structure.

il_act_ref_delta_pos specifies the layer difference between the current picture and an active inter-layer reference picture.

Assuming the SN of the current picture or PDU set is N, and each PDU set contains a picture, the SN of the PDU set containing the i-th inter-layer active reference picture is (N - il_act_ref_delta_pos[ i ]).

In some examples, when each PDU set contains a picture rather than an AU, the data lengths of st_act_ref_delta_pos, lt_act_ref_delta_pos, and il_act_ref_delta_pos may be extended to accommodate the number of layers.
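The Table 5 structure and the N - delta rule can be sketched in Python as follows. This is a minimal illustration, assuming the structure starts on a byte boundary and that the u(8), u(16), and u(32) fields are stored most significant byte first; the function names `parse_act_ref_pic_list` and `reference_sns` and the sample payload are hypothetical.

```python
import struct


def parse_act_ref_pic_list(buf: bytes) -> dict:
    """Parse the Table 5 layout, assuming byte-aligned big-endian fields."""
    pos = 0
    out = {}
    # (list name, struct format of each delta, size of each delta in bytes)
    for name, fmt, size in (("st", ">H", 2), ("lt", ">I", 4), ("il", ">B", 1)):
        count = buf[pos]  # num_st/lt/il_act_ref_pos, u(8)
        pos += 1
        deltas = []
        for _ in range(count):
            (delta,) = struct.unpack_from(fmt, buf, pos)
            deltas.append(delta)
            pos += size
        out[name] = deltas
    return out


def reference_sns(current_sn: int, deltas: list) -> list:
    """SN of each referenced PDU set: N - delta_pos[i]."""
    return [current_sn - d for d in deltas]


payload = bytes([2, 0, 1, 0, 3,   # two short-term deltas: 1 and 3
                 1, 0, 0, 0, 40,  # one long-term delta: 40
                 0])              # no inter-layer references
lists = parse_act_ref_pic_list(payload)
st_refs = reference_sns(100, lists["st"])
lt_refs = reference_sns(100, lists["lt"])
```

With the current PDU set at SN 100, the short-term references resolve to SNs 99 and 97 and the long-term reference to SN 60, without any reference picture list construction or POC mapping.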

Table 6 below illustrates an example of a simplified active reference picture list structure:

Table 6
act_ref_pic_list_struct( ) {                             Descriptor
    num_act_ref_pos                                      u(8)
    for( i = 0; i < num_act_ref_pos; i++ ) {
        act_ref_delta_pos[ i ]                           u(32)
    }
    num_il_act_ref_pos                                   u(8)
    for( i = 0; i < num_il_act_ref_pos; i++ ) {
        il_act_ref_delta_pos[ i ]                        u(8)
    }
}

The semantics of the syntax elements in the example of Table 6 may be as follows:

num_act_ref_pos specifies the number of active reference picture positions in the syntax structure.

act_ref_delta_pos specifies the distance, in units of access units (AUs), between the current picture and an active reference picture. Assuming the SN of the PDU set is N, the SN of the PDU set containing the i-th active reference picture is (N - act_ref_delta_pos[ i ]).

il_act_ref_delta_pos specifies the distance, in units of picture units (PUs), between the current picture and an active inter-layer reference picture. Assuming the SN of the PDU set is N, the SN of the PDU set containing the i-th inter-layer active reference picture is (N - il_act_ref_delta_pos[ i ]).

The proposed active reference picture position syntax structure may be carried in the AU delimiter (AUD) or in an SEI message.

In some examples, the reference picture distances in the above syntax structures may be replaced by sequence numbers indicating AU or PU positions in the bitstream. Such a sequence number increases by one for each new AU or PU added to the bitstream. The sequence number may be mapped to the PDU set sequence number to facilitate deriving frame dependencies.
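To show why sequence-number-based references make dependency derivation simple, the Python sketch below converts signaled distances to absolute PDU set SNs and then finds the PDU sets that become undecodable after a loss. It is an illustration only: it assumes one picture per PDU set and no SN wraparound, and the function name `undecodable_pdu_sets` and the sample data are hypothetical.

```python
def undecodable_pdu_sets(dependencies: dict, lost: set) -> set:
    """Return SNs that cannot be decoded because a referenced PDU set was
    lost, either directly or through a chain of references."""
    broken = set(lost)
    # References always precede the current picture in the bitstream,
    # so a single pass in increasing SN order propagates every loss.
    for sn in sorted(dependencies):
        if any(ref in broken for ref in dependencies[sn]):
            broken.add(sn)
    return broken - set(lost)


# PDU set SN -> signaled distances to its active reference pictures
distances = {1: [], 2: [1], 3: [1, 2], 4: [3], 5: [4]}
# Convert each distance to an absolute SN: N - delta
deps = {sn: [sn - d for d in ds] for sn, ds in distances.items()}
affected = undecodable_pdu_sets(deps, lost={2})
```

Here PDU set 3 references PDU sets 2 and 1, so losing PDU set 2 affects only PDU set 3; PDU sets 4 and 5 reference PDU set 1 and remain decodable.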

In some examples, the positions of reference pictures in decoding order may be signaled in a specific metadata type OBU in AV1 (e.g., metadata_itut_t35) or in a new metadata type OBU, to indicate the dependencies of the current picture in the bitstream.

In some examples, the proposed active reference picture list syntax elements may be signaled in, or mapped to, other protocols, such as RTP or GTP-U extension headers.

In some examples, the active reference picture list structure may be carried in an active reference picture position list SEI message. A persistence flag is proposed in the SEI message to indicate whether the active reference picture list applies only to the current picture or to the current picture and all subsequent pictures of the current layer. Persistence may be canceled when the proposed cancel flag is set or when a new coded layer video sequence (CLVS) of the current layer begins.
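The persistence behavior can be sketched as a small per-layer state tracker in Python. The class name, the dictionary keys (`persistence_flag`, `cancel_flag`, `deltas`), and the rule that a newer SEI message replaces an earlier persistent one are assumptions made for illustration; the text names the flags but does not define their exact syntax.

```python
class ActiveRefListState:
    """Tracks which active reference picture list applies to each picture
    of one layer (all field names are hypothetical)."""

    def __init__(self):
        self.persistent = None  # list that persists across pictures, if any

    def on_picture(self, sei=None, new_clvs=False):
        """Return the list that applies to the current picture, or None."""
        if new_clvs:
            self.persistent = None  # a new CLVS cancels persistence
        if sei is None:
            return self.persistent
        if sei.get("cancel_flag"):
            self.persistent = None  # explicit cancel
            return None
        # Assumption: a new SEI message replaces any earlier persistent list.
        self.persistent = sei["deltas"] if sei.get("persistence_flag") else None
        return sei["deltas"]  # the list always applies to the current picture


state = ActiveRefListState()
a = state.on_picture({"persistence_flag": 1, "deltas": [1]})
b = state.on_picture()                      # inherits the persistent list
c = state.on_picture({"cancel_flag": 1})    # cancels persistence
d = state.on_picture({"persistence_flag": 0, "deltas": [2]})
e = state.on_picture()                      # nothing persists
```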

An independent PDU marking may be signaled to indicate whether the current PDU can be decoded without the other PDUs of the same PDU set. An independent PDU may be transmitted even if other PDUs in the same PDU set are lost. An independent PDU may be a slice that includes an independent subpicture, or a slice composed of a motion-constrained tile set (MCTS) in an H.265 bitstream.

In some examples, a syntax element may be signaled in a specific video codec NAL unit to indicate that all slice boundaries in the CLVS (when signaled in the SPS) or in the associated picture (when signaled in the PPS, PH, or AUD) are treated as picture boundaries and that there is no in-loop filtering across slice boundaries. Each slice can then be decoded independently, without reference to samples of other slices in the same picture.

In some examples, a syntax element may be signaled in the slice header (SH) to indicate whether the current slice can be decoded independently, without prediction from other samples in the same frame.

Because each tile in AV1 can be decoded independently, the independent PDU marking may be set for each PDU when the PDU is an AV1 tile and a tile list OBU is present.

For an HEVC bitstream, the PDU set of a random access skipped leading (RASL) picture may be marked as a discardable picture in the frame marking or PDU set marking when NoRaslOutputFlag of the associated intra random access point (IRAP) picture is equal to 1.

For a VVC bitstream, the PDU set of a RASL picture with pps_mixed_nalu_types_in_pic_flag equal to 0 may be marked as a discardable picture in the frame marking or PDU set marking when NoOutputBeforeRecoveryFlag of the associated IRAP picture is equal to 1.

Because detecting discardable pictures is not straightforward, a syntax element may be signaled in a specific NAL unit (such as the AUD, PH, or SH) or in an SEI message to indicate whether the associated picture may be discarded for transmission or decoding.

For a VVC bitstream, a RASL picture with pps_mixed_nalu_types_in_pic_flag equal to 1 may contain one or more RASL subpictures and one or more random access decodable leading (RADL) subpictures. A RADL subpicture may be used as an active reference subpicture and should not be discarded. The RASL subpictures or slices of the RASL picture, however, may be discarded, and the associated PDUs may be marked as discardable PDUs.

For a VVC bitstream, when NoOutputBeforeRecoveryFlag of a GDR picture is equal to 1, or NoOutputBeforeRecoveryFlag of a recovery picture of the GDR picture is equal to 1, slices of the GDR picture that cannot be correctly decoded may be discarded. The corresponding slices that cannot be correctly decoded may be marked in the RTP header extension so that UPF 102 can discard the associated PDUs during periods of congestion.

For VVC, the picture header syntax element ph_non_ref_pic_flag indicates that the current picture is never used as a reference picture. This syntax element may be used for discardable marking. For the AV1 codec, a discardable or non-reference indication may be signaled in the uncompressed header of the frame header OBU or in a tile group OBU, to indicate whether the associated frame is not used as a reference picture for subsequent pictures and can be discarded without affecting the decoding process. Table 7 below is an example of such frame header OBU uncompressed header syntax with a discardable frame indication. Additions are indicated by the tag [Added: ""].

Table 7
uncompressed_header( ) {                                 Type
    if ( frame_id_numbers_present_flag ) {
        idLen = ( additional_frame_id_length_minus_1 +
                  delta_frame_id_length_minus_2 + 3 )
    }
    allFrames = (1 << NUM_REF_FRAMES) - 1
    if ( reduced_still_picture_header ) {
        …
    } else {
        show_existing_frame                              f(1)
        …
        frame_type                                       f(2)
        FrameIsIntra = (frame_type == INTRA_ONLY_FRAME ||
                        frame_type == KEY_FRAME)
        …
        [Added: "if ( !FrameIsIntra )"]
            [Added: "frame_discardable"]                 f(1)
        …

Because the decision to discard a slice or frame may be made on the fly, an indicator may be signaled in an SEI message, in a rewritable field of a specific NAL unit, or in a metadata type OBU to indicate whether the associated NAL unit may be discarded.

In some examples, a slice may be marked as a discardable slice when any of the following conditions is true: the current picture is a RASL picture and the NAL unit type of the current slice is RASL; the current picture is a GDR picture and the current slice is a P or B slice (uni-directional or bi-directional inter prediction); or the current picture is a recovery picture of a GDR picture and the current slice is a P or B slice that follows a previous I slice in the same picture.
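The three conditions above can be written as a single predicate. The Python sketch below is illustrative only: the `SliceInfo` record, its string-valued fields, and the `is_discardable` name are hypothetical, and a real implementation would take these values from the parsed NAL unit header, slice header, and picture state.

```python
from dataclasses import dataclass


@dataclass
class SliceInfo:
    picture_kind: str              # "RASL", "GDR", "GDR_RECOVERY", or "OTHER"
    nal_unit_type: str             # e.g. "RASL_NUT", "GDR_NUT", "TRAIL_NUT"
    slice_type: str                # "I", "P", or "B"
    follows_i_slice: bool = False  # an I slice precedes it in the same picture


def is_discardable(s: SliceInfo) -> bool:
    """Apply the three example conditions for marking a slice as discardable."""
    inter = s.slice_type in ("P", "B")
    if s.picture_kind == "RASL" and s.nal_unit_type == "RASL_NUT":
        return True
    if s.picture_kind == "GDR" and inter:
        return True
    if s.picture_kind == "GDR_RECOVERY" and inter and s.follows_i_slice:
        return True
    return False


rasl = is_discardable(SliceInfo("RASL", "RASL_NUT", "B"))
gdr_intra = is_discardable(SliceInfo("GDR", "GDR_NUT", "I"))
gdr_inter = is_discardable(SliceInfo("GDR", "GDR_NUT", "P"))
```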

FIG. 5 is a block diagram illustrating elements of an example video file 150. As described above, video files in accordance with the ISO base media file format and its extensions store data in a series of objects referred to as "boxes." In the example of FIG. 5, video file 150 includes file type (FTYP) box 152, movie (MOOV) box 154, segment index (sidx) box 162, movie fragment (MOOF) boxes 164, and movie fragment random access (MFRA) box 166. Although FIG. 5 represents an example of a video file, it should be understood that other media files may include other types of media data (e.g., audio data, timed text data, or the like) structured similarly to the data of video file 150, in accordance with the ISO base media file format and its extensions.
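As a minimal sketch of how such a series of boxes can be traversed, the Python code below walks box headers using the size and four-character type fields of the ISO base media file format (a 32-bit size, a size of 1 followed by a 64-bit largesize, or a size of 0 meaning the box runs to the end of the data). The `iter_boxes` and `make_box` helpers are hypothetical, and the sample file uses placeholder payloads rather than valid box contents.

```python
import struct


def iter_boxes(data: bytes, offset: int = 0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:    # a 64-bit largesize follows the type field
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # the box extends to the end of the data
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed box")
        yield box_type.decode("ascii"), offset + header, offset + size
        offset += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a toy box; real boxes carry structured payloads."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


moov = make_box("moov", make_box("mvhd") + make_box("trak") + make_box("mvex"))
video_file = (make_box("ftyp", b"isom") + moov + make_box("sidx")
              + make_box("moof") + make_box("mfra"))

top_level = [t for t, _, _ in iter_boxes(video_file)]
moov_start, moov_end = next(
    (s, e) for t, s, e in iter_boxes(video_file) if t == "moov")
children = [t for t, _, _ in iter_boxes(video_file, moov_start, moov_end)]
```

Container boxes such as MOOV are traversed by calling the same function on the payload range of the parent box.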

File type (FTYP) box 152 generally describes a file type for video file 150. File type box 152 may include data that identifies a specification that describes a best use for video file 150. File type box 152 may alternatively be placed before MOOV box 154, movie fragment boxes 164, and/or MFRA box 166.

In the example of FIG. 5, MOOV box 154 includes movie header (MVHD) box 156, track (TRAK) box 158, and one or more movie extends (MVEX) boxes 160. In general, MVHD box 156 may describe general characteristics of video file 150. For example, MVHD box 156 may include data that describes when video file 150 was originally created, when video file 150 was last modified, a timescale for video file 150, a duration of playback for video file 150, or other data that generally describes video file 150.

TRAK box 158 may include data for a track of video file 150. TRAK box 158 may include a track header (TKHD) box that describes characteristics of the track corresponding to TRAK box 158. In some examples, TRAK box 158 may include coded video pictures, while in other examples, the coded video pictures of the track may be included in movie fragments 164, which may be referenced by data of TRAK box 158 and/or sidx box 162.

In some examples, video file 150 may include more than one track. Accordingly, MOOV box 154 may include a number of TRAK boxes equal to the number of tracks in video file 150. TRAK box 158 may describe characteristics of a corresponding track of video file 150. For example, TRAK box 158 may describe temporal and/or spatial information for the corresponding track. When encapsulation unit 30 (FIG. 1) includes a parameter set track in a video file, such as video file 150, a TRAK box similar to TRAK box 158 of MOOV box 154 may describe characteristics of the parameter set track. Encapsulation unit 30 may signal the presence of sequence level SEI messages in the parameter set track within the TRAK box describing the parameter set track.

MVEX boxes 160 may describe characteristics of corresponding movie fragments 164, e.g., to signal that video file 150 includes movie fragments 164 in addition to video data included within MOOV box 154, if any. In the context of streaming video data, coded video pictures may be included in movie fragments 164 rather than in MOOV box 154. Accordingly, all coded video samples may be included in movie fragments 164, rather than in MOOV box 154.

MOOV box 154 may include a number of MVEX boxes 160 equal to the number of movie fragments 164 in video file 150. Each of MVEX boxes 160 may describe characteristics of a corresponding one of movie fragments 164. For example, each MVEX box may include a movie extends header (MEHD) box that describes a temporal duration for the corresponding one of movie fragments 164.

As noted above, encapsulation unit 30 may store a sequence data set in a video sample that does not include actual coded video data. A video sample may generally correspond to an access unit, which is a representation of a coded picture at a specific time instance. In the context of AVC, the coded picture includes one or more VCL NAL units, which contain the information to construct all the pixels of the access unit, and other associated non-VCL NAL units, such as SEI messages. Accordingly, encapsulation unit 30 may include a sequence data set, which may include sequence level SEI messages, in one of movie fragments 164. Encapsulation unit 30 may further signal the presence of a sequence data set and/or sequence level SEI messages as being present in one of movie fragments 164 within the one of MVEX boxes 160 corresponding to the one of movie fragments 164.

SIDX box 162 is an optional element of video file 150. That is, video files conforming to the 3GPP file format, or other such file formats, do not necessarily include SIDX box 162. In accordance with the example of the 3GPP file format, a SIDX box may be used to identify a sub-segment of a segment (e.g., a segment contained within video file 150). The 3GPP file format defines a sub-segment as "a self-contained set of one or more consecutive movie fragment boxes with corresponding Media Data box(es) and a Media Data Box containing data referenced by a Movie Fragment Box must follow that Movie Fragment box and precede the next Movie Fragment box containing information about the same track." The 3GPP file format also indicates that a SIDX box "contains a sequence of references to subsegments of the (sub)segment documented by the box. The referenced subsegments are contiguous in presentation time. Similarly, the bytes referred to by a Segment Index box are always contiguous within the segment. The referenced size gives the count of the number of bytes in the material referenced."

SIDX box 162 generally provides information representative of one or more sub-segments of a segment included in video file 150. For instance, such information may include playback times at which sub-segments begin and/or end, byte offsets for the sub-segments, whether the sub-segments include (e.g., start with) a stream access point (SAP), a type for the SAP (e.g., whether the SAP is an instantaneous decoder refresh (IDR) picture, a clean random access (CRA) picture, a broken link access (BLA) picture, or the like), a position of the SAP (in terms of playback time and/or byte offset) in the sub-segment, and the like.

Movie fragments 164 may include one or more coded video pictures. In some examples, movie fragments 164 may include one or more groups of pictures (GOPs), each of which may include a number of coded video pictures, e.g., frames or pictures. In addition, as described above, movie fragments 164 may include sequence data sets in some examples. Each of movie fragments 164 may include a movie fragment header box (MFHD, not shown in FIG. 5). The MFHD box may describe characteristics of the corresponding movie fragment, such as a sequence number for the movie fragment. Movie fragments 164 may be included in video file 150 in order of sequence number.

MFRA box 166 may describe random access points within movie fragments 164 of video file 150. This may assist with performing trick modes, such as seeking to particular temporal locations (i.e., playback times) within a segment encapsulated by video file 150. MFRA box 166 is generally optional and, in some examples, need not be included in video files. Likewise, a client device, such as client device 40, does not necessarily need to reference MFRA box 166 to correctly decode and display video data of video file 150. MFRA box 166 may include a number of track fragment random access (TFRA) boxes (not shown) equal to the number of tracks of video file 150 or, in some examples, equal to the number of media tracks (e.g., non-hint tracks) of video file 150.

In some examples, movie fragments 164 may include one or more stream access points (SAPs), such as IDR pictures. Likewise, MFRA box 166 may provide indications of the locations of the SAPs within video file 150. Accordingly, a temporal sub-sequence of video file 150 may be formed from the SAPs of video file 150. The temporal sub-sequence may also include other pictures, such as P-frames and/or B-frames that depend on the SAPs. Frames and/or slices of the temporal sub-sequence may be arranged within the segments such that frames/slices of the temporal sub-sequence that depend on other frames/slices of the sub-sequence can be properly decoded. For example, in a hierarchical arrangement of data, data used for prediction of other data may also be included in the temporal sub-sequence.

FIG. 6 is a flowchart illustrating an example method including sending a packet that includes media data, according to the techniques of this disclosure. Initially, a server device, such as server device 60 of FIG. 1, may receive media data (200), e.g., from content preparation device 20 (FIG. 1). The media data may be encapsulated media data or encoded media data. The media data may include at least a portion of a frame/picture of video data, e.g., a slice or a tile.

Server device 60 may also receive data indicating whether the media data is discardable (202) and an identifier for the media data (204). The identifier may be, for example, a frame number, a picture order count (POC) value, and/or another such identifier. The identifier may further indicate, for example, a temporal identifier (TID), a layer identifier (LID), or the like. The identifier may also indicate a particular slice, tile, or other portion of the frame or picture that the media data represents.

Server device 60 may then encapsulate the media data in a packet (206). Alternatively, in some examples, the received media data may already be encapsulated in a network packet. In any case, according to the method of FIG. 6, server device 60 may add data representing the identifier to an RTP header extension of the packet (208). The identifier itself may indicate whether the media data of the packet is discardable, or server device 60 may also add data to the RTP header extension indicating whether the media data of the packet is discardable. For example, the media data may be discardable if it is predicted from reference media data that is not sent to the client device, e.g., because the reference media data precedes the random access point accessed by the client device. Server device 60 may then send the packet to the client device.
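The sending side of FIG. 6 can be sketched in Python as follows. The text does not define the layout of the RTP header extension, so this sketch assumes a one-byte-header extension block in the style of RFC 8285 carrying a hypothetical element: a 16-bit frame identifier followed by a flags byte whose top bit means "discardable". The extension ID, function names, and field sizes are all assumptions for illustration.

```python
import struct

EXT_ID = 1  # hypothetical extension ID; real IDs are negotiated out of band


def build_header_extension(frame_id: int, discardable: bool) -> bytes:
    """Build a one-byte-header extension block (0xBEDE profile) carrying a
    16-bit frame identifier and a discardable flag (hypothetical layout)."""
    data = struct.pack(">HB", frame_id & 0xFFFF, 0x80 if discardable else 0x00)
    element = bytes([(EXT_ID << 4) | (len(data) - 1)]) + data
    element += b"\x00" * ((-len(element)) % 4)  # pad to a 32-bit boundary
    return struct.pack(">HH", 0xBEDE, len(element) // 4) + element


def build_rtp_packet(seq: int, timestamp: int, ssrc: int, payload_type: int,
                     extension: bytes, payload: bytes) -> bytes:
    """Prepend a 12-byte RTP fixed header with the X bit set."""
    first = (2 << 6) | (1 << 4)  # version 2, no padding, X = 1, CC = 0
    header = struct.pack(">BBHII", first, payload_type & 0x7F,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + extension + payload


ext = build_header_extension(0x1234, discardable=True)
packet = build_rtp_packet(seq=1, timestamp=90000, ssrc=0xABCD,
                          payload_type=96, extension=ext, payload=b"slice")
```

Keeping the identifier in the header extension lets a network node read it without parsing the video payload.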

圖7是圖示了根據本案的技術的包括接收封包的實例方法的流程圖，該封包包括媒體資料。在該實例中，例如，圖1的客戶端設備40最初可以接收包括媒體資料的封包（250）。亦即，封包可以包括與封包分離的封包標頭和有效載荷。有效載荷可以與封包的應用層資料（例如根據ISO基礎媒體檔案格式而格式化的媒體資料）對應，如上文關於圖5所解釋的。例如，有效載荷可以包括PDU集合的XR資料、音訊資料及/或視訊資料。FIG. 7 is a flow chart illustrating an example method including receiving a packet including media data according to the present invention. In this example, for example, the client device 40 of FIG. 1 may initially receive a packet including media data (250). That is, the packet may include a packet header and a payload separated from the packet. The payload may correspond to application layer data of the packet (e.g., media data formatted according to the ISO base media file format), as explained above with respect to FIG. 5. For example, the payload may include XR data, audio data, and/or video data of a PDU set.

根據本案的技術，封包標頭可以包括RTP標頭擴展。因此，除了其他資料之外，RTP標頭擴展可以包括封包的媒體資料的辨識符。客戶端設備40可以從RTP標頭擴展中提取媒體資料的辨識符（252）。使用者客戶端設備40接著可以使用辨識符來決定媒體資料是否是可丟棄的（254）。例如，客戶端設備40可以決定辨識符包括媒體資料的圖片的POC值、圖片的切片或圖塊辨識符、圖片的TID及/或LID。客戶端設備40亦可以決定包括媒體資料的位元串流是否是從除位元串流的開頭之外的流存取點被隨機存取的。若位元串流被隨機存取，則客戶端設備40亦可以決定流存取點是否遵循封包的媒體資料的一或多個參考圖片，使得參考圖片不會被接收到。若尚未接收到參考圖片，則客戶端設備40可以決定媒體資料是可丟棄的。According to the technology of the present case, the packet header may include an RTP header extension. Therefore, in addition to other data, the RTP header extension may include an identifier of the media data of the packet. The client device 40 can extract the identifier of the media data from the RTP header extension (252). The user client device 40 can then use the identifier to determine whether the media data is discardable (254). For example, the client device 40 can determine that the identifier includes a POC value of a picture of the media data, a slice or tile identifier of the picture, a TID and/or LID of the picture. The client device 40 can also determine whether the bit stream including the media data is randomly accessed from a stream access point other than the beginning of the bit stream. If the bit stream is accessed randomly, the client device 40 can also determine whether the stream access point follows one or more reference pictures of the packetized media data, so that the reference pictures will not be received. If the reference pictures have not been received, the client device 40 can determine that the media data is discardable.

In response to determining that the media data is discardable ("YES" branch of 256), client device 40 may discard the media data without sending the media data to, e.g., video decoder 48 (FIG. 1) (258). In response to determining that the media data is not discardable ("NO" branch of 256), client device 40 may forward the media data to video decoder 48 (260).
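The decision of steps 254 through 260 may be sketched as follows, again in Python and only as one possible illustration. The `MediaId` fields and the `ref_pocs` tuple (the POC values of the pictures that a picture references) are hypothetical names introduced for this example; how a receiver learns the reference pictures is an assumption here, not something this passage defines.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class MediaId:
    """Identifier fields assumed to have been read from the packet header."""
    poc: int                          # picture order count of the picture
    ref_pocs: Tuple[int, ...] = ()    # hypothetical: POCs of its reference pictures


class Receiver:
    """Tracks forwarded pictures and drops media whose references are missing."""

    def __init__(self) -> None:
        self.forwarded_pocs: Set[int] = set()
        self.decoder_queue: List[bytes] = []  # stands in for the video decoder

    def is_discardable(self, media_id: MediaId) -> bool:
        # Discardable if any reference picture was never forwarded, as can
        # happen after random access at a point that follows the references.
        return any(poc not in self.forwarded_pocs for poc in media_id.ref_pocs)

    def handle(self, media_id: MediaId, payload: bytes) -> bool:
        """Return True if the payload was forwarded, False if discarded."""
        if self.is_discardable(media_id):
            return False                       # step 258: discard
        self.forwarded_pocs.add(media_id.poc)
        self.decoder_queue.append(payload)     # step 260: forward to decoder
        return True


def run_example() -> List[bool]:
    rx = Receiver()
    return [
        rx.handle(MediaId(poc=8, ref_pocs=(4,)), b"p8"),      # reference 4 missing
        rx.handle(MediaId(poc=16), b"i16"),                   # intra, no references
        rx.handle(MediaId(poc=20, ref_pocs=(16,)), b"p20"),   # reference 16 present
        rx.handle(MediaId(poc=18, ref_pocs=(16, 8)), b"b18"), # reference 8 missing
    ]
```

In this sketch a discarded picture is not recorded as forwarded, so later pictures that depend on it are discarded as well, which keeps undecodable data away from the decoder until an independently decodable picture arrives.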

Although explained with respect to client device 40 of FIG. 1, the method of FIG. 7 may also be performed by, e.g., client device 120 of FIG. 2. Likewise, a similar method may be performed by UPF 102 of FIG. 2. In examples where UPF 102 or another device performs the method of FIG. 7, the video decoder may be assumed to form part of client device 120, such that sending the media data to the video decoder includes sending the packet to client device 120.

In this manner, the method of FIG. 7 represents an example of a method including: receiving a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; extracting, from the packet header, a video frame identifier for the frame of video data; and processing the payload according to the video frame identifier.

Various examples of the techniques of this disclosure are summarized in the following clauses.

Clause 1: A method of receiving video data, the method comprising: receiving a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; extracting, from the packet header, a video frame identifier for the frame of video data; and processing the payload according to the video frame identifier.

Clause 2: The method of clause 1, wherein the video frame identifier comprises a picture order count (POC) value.

Clause 3: The method of clause 1, wherein the video frame identifier comprises a display frame identifier or a current frame identifier.

Clause 4: The method of any of clauses 1 to 3, wherein the at least portion of the frame comprises a slice of the frame, and wherein the video frame identifier comprises a slice identifier for the slice.

Clause 5: The method of any of clauses 1 to 4, wherein the at least portion of the frame comprises a tile of the frame, and wherein the video frame identifier comprises a tile identifier for the tile.

Clause 6: The method of any of clauses 1 to 5, further comprising using the video frame identifier to determine one or more of a frame type for the frame, a priority for the frame, or dependency information for the frame.

Clause 7: The method of any of clauses 1 to 6, wherein the at least portion of the frame of video data comprises a protocol data unit (PDU) of a PDU set.

Clause 8: The method of any of clauses 1 to 7, further comprising extracting, from the packet header, one or more of: a network abstraction layer (NAL) unit type for the at least portion of the frame, a temporal identifier (TID) for the at least portion of the frame, a layer identifier (LID) for the at least portion of the frame, data indicating whether the at least portion of the frame is intra-prediction coded, or data indicating whether the at least portion is discardable.

Clause 9: The method of any of clauses 1 to 8, further comprising processing a network abstraction layer (NAL) unit header for the frame, the NAL unit header including data indicating a priority value for the frame.

Clause 10: The method of any of clauses 1 to 9, further comprising processing an access unit delimiter (AUD) for an access unit corresponding to the frame, the AUD including data representing a priority value for the access unit.

Clause 11: The method of any of clauses 1 to 10, further comprising processing a picture header for the frame, the picture header including data representing a priority value for the frame.

Clause 12: The method of any of clauses 1 to 8, further comprising processing an open bitstream unit (OBU) for the at least portion of the frame, the OBU including data representing a priority value for the at least portion of the frame.

Clause 13: The method of clause 12, wherein the OBU comprises one of a frame header OBU or a metadata OBU.

Clause 14: The method of any of clauses 1 to 13, further comprising receiving data indicating a picture distance between the frame and a reference frame for the frame in coding order.

Clause 15: The method of clause 14, wherein receiving the data indicating the picture distance comprises receiving a network abstraction layer (NAL) unit including the data or a supplemental enhancement information (SEI) message including the data.

Clause 16: The method of any of clauses 14 and 15, wherein receiving the data indicating the picture distance between the frame and the reference frame for the frame comprises receiving data indicating picture distances between the frame and each active reference frame in coding order.

Clause 17: The method of any of clauses 1 to 16, further comprising receiving information indicating whether a portion of the frame of video data can be decoded without other portions of the frame.

Clause 18: The method of clause 17, wherein the portion of the frame comprises a protocol data unit (PDU), wherein the frame corresponds to a PDU set including the PDU, and wherein the information indicates whether the PDU can be decoded without other PDUs of the PDU set.

Clause 19: The method of any of clauses 1 to 18, further comprising receiving information indicating whether loop filtering is to be performed across one or more boundaries between a portion of the frame and one or more other portions of the frame.

Clause 20: The method of any of clauses 1 to 19, further comprising receiving information indicating whether a portion of the frame is independently coded without prediction from other portions of the frame.

Clause 21: The method of any of clauses 1 to 20, further comprising receiving data indicating that the frame is a discardable frame.

Clause 22: The method of clause 21, wherein receiving the data indicating that the frame is a discardable frame comprises receiving the frame in a network abstraction layer (NAL) unit, an access unit delimiter, a picture header, a slice header, a supplemental enhancement information (SEI) message, or a frame header open bitstream unit (OBU).

Clause 23: The method of any of clauses 1 to 22, wherein processing the payload comprises: determining that the frame is a gradual decoder refresh (GDR) frame and that the at least portion of the frame is independently coded; and in response to determining that the at least portion of the frame is independently coded, providing the at least portion of the frame to a video decoder.

Clause 24: The method of any of clauses 1 to 22, wherein processing the payload comprises: determining that the frame is a gradual decoder refresh (GDR) frame and that the at least portion of the frame is coded relative to a reference frame; determining that the GDR frame is an ordinal first frame retrieved for a bitstream including the video data, such that the reference frame has not been retrieved; and in response to the reference frame not having been retrieved, discarding the at least portion of the frame.

Clause 25: A device for retrieving media data, the device comprising one or more means for performing the method of any of clauses 1 to 24.

Clause 26: The device of clause 25, wherein the one or more means comprise one or more processors implemented in circuitry.

Clause 27: The device of clause 25, further comprising a memory configured to store video data.

Clause 28: The device of clause 25, wherein the device comprises at least one of: an integrated circuit; a microprocessor; or a wireless communication device.

Clause 29: A device for receiving media data, the device comprising: means for receiving a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; means for extracting, from the packet header, a video frame identifier for the frame of video data; and means for processing the payload according to the video frame identifier.

Clause 30: A method of receiving video data, the method comprising: receiving a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; extracting, from the packet header, a video frame identifier for the frame of video data; and processing the payload according to the video frame identifier.

Clause 31: The method of clause 30, wherein the video frame identifier comprises a picture order count (POC) value.

Clause 32: The method of clause 30, wherein the video frame identifier comprises a display frame identifier or a current frame identifier.

Clause 33: The method of clause 30, wherein the at least portion of the frame comprises a slice of the frame, and wherein the video frame identifier comprises a slice identifier for the slice.

Clause 34: The method of clause 30, wherein the at least portion of the frame comprises a tile of the frame, and wherein the video frame identifier comprises a tile identifier for the tile.

Clause 35: The method of clause 30, further comprising using the video frame identifier to determine one or more of a frame type for the frame, a priority for the frame, or dependency information for the frame.

Clause 36: The method of clause 30, wherein the at least portion of the frame of video data comprises a protocol data unit (PDU) of a PDU set.

Clause 37: The method of clause 30, further comprising extracting, from the packet header, one or more of: a network abstraction layer (NAL) unit type for the at least portion of the frame, a temporal identifier (TID) for the at least portion of the frame, a layer identifier (LID) for the at least portion of the frame, data indicating whether the at least portion of the frame is intra-prediction coded, or data indicating whether the at least portion is discardable.

Clause 38: The method of clause 30, further comprising processing a network abstraction layer (NAL) unit header for the frame, the NAL unit header including data indicating a priority value for the frame.

Clause 39: The method of clause 30, further comprising processing an access unit delimiter (AUD) for an access unit corresponding to the frame, the AUD including data representing a priority value for the access unit.

Clause 40: The method of clause 30, further comprising processing a picture header for the frame, the picture header including data representing a priority value for the frame.

Clause 41: The method of clause 30, further comprising processing an open bitstream unit (OBU) for the at least portion of the frame, the OBU including data representing a priority value for the at least portion of the frame.

Clause 42: The method of clause 41, wherein the OBU comprises one of a frame header OBU or a metadata OBU.

Clause 43: The method of clause 30, further comprising receiving data indicating a picture distance between the frame and a reference frame for the frame.

Clause 44: The method of clause 43, wherein receiving the data indicating the picture distance comprises receiving a network abstraction layer (NAL) unit including the data or a supplemental enhancement information (SEI) message including the data.

Clause 45: The method of clause 43, wherein receiving the data indicating the picture distance between the frame and the reference frame for the frame comprises receiving data indicating picture distances between the frame and each active reference frame.

Clause 46: The method of clause 30, further comprising receiving information indicating whether a portion of the frame of video data can be decoded without other portions of the frame.

Clause 47: The method of clause 46, wherein the portion of the frame comprises a protocol data unit (PDU), wherein the frame corresponds to a PDU set including the PDU, and wherein the information indicates whether the PDU can be decoded without other PDUs of the PDU set.

Clause 48: The method of clause 30, further comprising receiving information indicating whether loop filtering is to be performed across one or more boundaries between a portion of the frame and one or more other portions of the frame.

Clause 49: The method of clause 30, further comprising receiving information indicating whether a portion of the frame is independently coded without prediction from other portions of the frame.

Clause 50: The method of clause 30, further comprising receiving data indicating that the frame is a discardable frame.

Clause 51: The method of clause 50, wherein receiving the data indicating that the frame is a discardable frame comprises receiving the frame in a network abstraction layer (NAL) unit, an access unit delimiter, a picture header, a slice header, a supplemental enhancement information (SEI) message, or a frame header open bitstream unit (OBU).

Clause 52: The method of clause 30, wherein processing the payload comprises: determining that the frame is a gradual decoder refresh (GDR) frame and that the at least portion of the frame is independently coded; and in response to determining that the at least portion of the frame is independently coded, providing the at least portion of the frame to a video decoder.

Clause 53: The method of clause 30, wherein processing the payload comprises: determining that the frame is a gradual decoder refresh (GDR) frame and that the at least portion of the frame is coded relative to a reference frame; determining that the GDR frame is an ordinal first frame retrieved for a bitstream including the video data, such that the reference frame has not been retrieved; and in response to the reference frame not having been retrieved, discarding the at least portion of the frame.

Clause 54: A method of receiving video data, the method comprising: receiving a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; extracting, from the packet header, a video frame identifier for the frame of video data; and processing the payload according to the video frame identifier.

Clause 55: The method of clause 54, wherein the video frame identifier comprises a picture order count (POC) value.

Clause 56: The method of clause 54, wherein the video frame identifier comprises a display frame identifier or a current frame identifier.

Clause 57: The method of clause 54, wherein the at least portion of the frame comprises a slice of the frame, and wherein the video frame identifier comprises a slice identifier for the slice.

Clause 58: The method of clause 54, wherein the at least portion of the frame comprises a tile of the frame, and wherein the video frame identifier comprises a tile identifier for the tile.

Clause 59: The method of clause 54, further comprising using the video frame identifier to determine one or more of a frame type for the frame, a priority for the frame, or dependency information for the frame.

Clause 60: The method of clause 54, wherein the at least portion of the frame of video data comprises a protocol data unit (PDU) of a PDU set.

Clause 61: The method of clause 54, further comprising extracting, from the packet header, one or more of: a network abstraction layer (NAL) unit type for the at least portion of the frame, a temporal identifier (TID) for the at least portion of the frame, a layer identifier (LID) for the at least portion of the frame, data indicating whether the at least portion of the frame is intra-prediction coded, or data indicating whether the at least portion is discardable.

Clause 62: The method of clause 54, further comprising processing a network abstraction layer (NAL) unit header for the frame, the NAL unit header including data indicating a priority value for the frame.

Clause 63: The method of clause 54, further comprising processing an access unit delimiter (AUD) for an access unit corresponding to the frame, the AUD including data representing a priority value for the access unit.

Clause 64: The method of clause 54, further comprising processing a picture header for the frame, the picture header including data representing a priority value for the frame.

Clause 65: The method of clause 54, further comprising processing an open bitstream unit (OBU) for the at least portion of the frame, the OBU including data representing a priority value for the at least portion of the frame.

Clause 66: The method of clause 65, wherein the OBU comprises one of a frame header OBU or a metadata OBU.

Clause 67: The method of clause 54, further comprising receiving data indicating a picture distance between the frame and a reference frame for the frame.

Clause 68: The method of clause 67, wherein receiving the data indicating the picture distance comprises receiving a network abstraction layer (NAL) unit including the data or a supplemental enhancement information (SEI) message including the data.

Clause 69: The method of clause 67, wherein receiving the data indicating the picture distance between the frame and the reference frame for the frame comprises receiving data indicating picture distances between the frame and each active reference frame.

Clause 70: The method of clause 54, further comprising receiving information indicating whether a portion of the frame of video data can be decoded without other portions of the frame.

Clause 71: The method of clause 70, wherein the portion of the frame comprises a protocol data unit (PDU), wherein the frame corresponds to a PDU set including the PDU, and wherein the information indicates whether the PDU can be decoded without other PDUs of the PDU set.

Clause 72: The method of clause 54, further comprising receiving information indicating whether loop filtering is to be performed across one or more boundaries between a portion of the frame and one or more other portions of the frame.

Clause 73: The method of clause 54, further comprising receiving information indicating whether a portion of the frame is independently coded without prediction from other portions of the frame.

Clause 74: The method of clause 54, further comprising receiving data indicating that the frame is a discardable frame.

Clause 75: The method of clause 74, wherein receiving the data indicating that the frame is a discardable frame comprises receiving the frame in a network abstraction layer (NAL) unit, an access unit delimiter, a picture header, a slice header, a supplemental enhancement information (SEI) message, or a frame header open bitstream unit (OBU).

Clause 76: The method of clause 54, wherein processing the payload comprises: determining that the frame is a gradual decoder refresh (GDR) frame and that the at least portion of the frame is independently coded; and in response to determining that the at least portion of the frame is independently coded, providing the at least portion of the frame to a video decoder.

Clause 77: The method of clause 54, wherein processing the payload comprises: determining that the frame is a gradual decoder refresh (GDR) frame and that the at least portion of the frame is coded relative to a reference frame; determining that the GDR frame is an ordinal first frame retrieved for a bitstream including the video data, such that the reference frame has not been retrieved; and in response to the reference frame not having been retrieved, discarding the at least portion of the frame.

Clause 78: A device for retrieving media data, the device comprising: a memory; and a processing system comprising one or more processors implemented in circuitry, the processing system being configured to: receive a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; extract, from the packet header, a video frame identifier for the frame of video data; and process the payload according to the video frame identifier.

Clause 79: The device of clause 78, wherein the video frame identifier comprises at least one of a picture order count (POC) value, a display frame identifier, or a current frame identifier.

Clause 80: The device of clause 78, wherein the at least portion of the frame comprises a slice of the frame or a tile of the frame, and wherein the video frame identifier comprises an identifier for the slice of the frame or the tile of the frame.

Clause 81: The device of clause 78, wherein the processing system is configured to use the video frame identifier to determine one or more of a frame type for the frame, a priority for the frame, or dependency information for the frame.

Clause 82: The device of clause 78, wherein the at least portion of the frame of video data comprises a protocol data unit (PDU) of a PDU set.

Clause 83: The device of clause 78, wherein the processing system is further configured to extract, from the packet header, one or more of: a network abstraction layer (NAL) unit type for the at least portion of the frame, a temporal identifier (TID) for the at least portion of the frame, a layer identifier (LID) for the at least portion of the frame, data indicating whether the at least portion of the frame is intra-prediction coded, or data indicating whether the at least portion is discardable.

Clause 84: The device of clause 78, wherein the processing system is further configured to process a network abstraction layer (NAL) unit header for the frame, the NAL unit header including data indicating a priority value for the frame.

Clause 85: The device of clause 78, wherein the processing system is further configured to process an access unit delimiter (AUD) for an access unit corresponding to the frame, the AUD including data representing a priority value for the access unit.

Clause 86: The device of clause 78, wherein the processing system is further configured to process a picture header for the frame, the picture header including data representing a priority value for the frame.

Clause 87: The device of clause 78, wherein the processing system is further configured to process an open bitstream unit (OBU) for the at least portion of the frame, the OBU including data representing a priority value for the at least portion of the frame.

Clause 88: The device of clause 78, wherein the processing system is further configured to receive data indicating a picture distance between the frame and a reference frame for the frame, the data being included in a network abstraction layer (NAL) unit or a supplemental enhancement information (SEI) message.

Clause 89: The device of clause 78, wherein the processing system is further configured to receive information indicating whether a portion of the frame of video data can be decoded without other portions of the frame.

Clause 90: The device of clause 78, wherein the processing system is further configured to receive information indicating whether loop filtering is to be performed across one or more boundaries between a portion of the frame and one or more other portions of the frame.

Clause 91: The device of clause 78, wherein to process the payload, the processing system is configured to: determine that the frame is a gradual decoder refresh (GDR) frame and that the at least portion of the frame is independently coded; and in response to determining that the at least portion of the frame is independently coded, provide the at least portion of the frame to a video decoder.

Clause 92: The device of clause 78, wherein to process the payload, the processing system is configured to: determine that the frame is a gradual decoder refresh (GDR) frame and that the at least portion of the frame is coded relative to a reference frame; determine that the GDR frame is an ordinal first frame retrieved for a bitstream including the video data, such that the reference frame has not been retrieved; and in response to the reference frame not having been retrieved, discard the at least portion of the frame.

Clause 93: A device for receiving video data, the device comprising: means for receiving a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; means for extracting a video frame identifier for the frame of video data from the packet header; and means for processing the payload according to the video frame identifier.

Clause 94: A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to: receive a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; extract a video frame identifier for the frame of video data from the packet header; and process the payload according to the video frame identifier.
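As an illustration only, the receive, extract, and process steps recited in clauses 93 and 94 might be sketched in Python as follows. The byte layout of the header (a 32-bit frame identifier followed by a 16-bit payload length), the `HEADER` format, and the helper names `make_packet`, `split_packet`, and `group_by_frame` are assumptions made for this sketch and are not defined by this disclosure. The sketch shows that the frame identifier can be read from the header alone, without parsing the payload.

```python
import struct
from collections import defaultdict

# Hypothetical header layout (an assumption of this sketch): a 32-bit video
# frame identifier followed by a 16-bit payload length, both big-endian.
HEADER = struct.Struct("!IH")


def make_packet(frame_id: int, payload: bytes) -> bytes:
    """Build a packet whose header is separate from the payload."""
    return HEADER.pack(frame_id, len(payload)) + payload


def split_packet(packet: bytes) -> tuple[int, bytes]:
    """Extract the frame identifier from the header; the payload is not parsed."""
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than header")
    frame_id, length = HEADER.unpack_from(packet, 0)
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return frame_id, payload


def group_by_frame(packets) -> dict[int, list[bytes]]:
    """Process payloads according to the frame identifier: collect the
    portions that belong to the same frame, in arrival order."""
    frames = defaultdict(list)
    for packet in packets:
        frame_id, payload = split_packet(packet)
        frames[frame_id].append(payload)
    return dict(frames)
```

Grouping by frame identifier is only one possible form of processing; a network node could equally use the identifier to look up a priority or to drop all portions of a frame together.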

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, which include any medium that facilitates transfer of a computer program from one place to another, for example according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory or (2) communication media such as signals or carrier waves. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperating hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

10: System
20: Content preparation device
22: Audio source
24: Video source
26: Audio encoder
28: Video encoder
30: Encapsulation unit
32: Output interface
40: Client device
42: Audio output
44: Video output
46: Audio decoder
48: Video decoder
50: Decapsulation unit
52: RTP receiving unit
54: Network interface
60: Server device
64: Media content
70: RTP sending unit
72: Network interface
74: Network
100: Application/service layer
102: User plane function (UPF)
104: Detection unit
106: Packet detection rules
110: Access network (AN)
112: Quality of service (QoS)
114: Radio interface
120: Client device
122: QoS rules
124: Quality of service (QoS)
126: Radio interface
140A: GDR frame
140B: GDR frame
140C: GDR frame
140D: GDR frame
142A: Intra-prediction coded region
142B: Intra-prediction coded region
142C: Intra-prediction coded region
142D: Intra-prediction coded region
144A: Inter-prediction coded region
144B: Inter-prediction coded region
144C: Inter-prediction coded region
146B: Inter-predicted region
146C: Inter-predicted region
146D: Inter-predicted region
150: Video file
152: File type (FTYP) box
154: Movie (MOOV) box
156: Movie header (MVHD) box
158: Track (TRAK) box
160: Movie extends (MVEX) box
162: Segment index (sidx) box
164: Movie fragment (MOOF) box
166: Movie fragment random access (MFRA) box
200: Block
202: Block
204: Block
206: Block
208: Block
210: Block
250: Block
252: Block
254: Block
256: Block
258: Block
260: Block

FIG. 1 is a block diagram illustrating an example system that implements techniques for streaming media data over a network.

FIG. 2 is a conceptual diagram illustrating an example architecture for extended reality (XR) traffic delivery.

FIG. 3 is a conceptual diagram illustrating a series of gradual decoder refresh (GDR) frames of video data.

FIG. 4 is a conceptual diagram illustrating an example set of identification information for a protocol data unit (PDU).

FIG. 5 is a block diagram illustrating elements of an example video file.

FIG. 6 is a flowchart illustrating an example method that includes sending a packet including media data, according to the techniques of this disclosure.

FIG. 7 is a flowchart illustrating an example method that includes receiving a packet including media data, according to the techniques of this disclosure.

Domestic deposit information (please note in the order of depository institution, date, and number): None. Foreign deposit information (please note in the order of depository country, institution, date, and number): None.

Claims

A method for receiving video data, the method comprising the following steps: receiving a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; extracting a video frame identifier of the frame of video data from the packet header; and processing the payload according to the video frame identifier.

The method of claim 1, wherein the video frame identifier comprises a picture order count (POC) value.

The method of claim 1, wherein the video frame identifier comprises a display frame identifier or a current frame identifier.

The method of claim 1, wherein the at least a portion of the frame comprises a slice of the frame, and wherein the video frame identifier comprises a slice identifier of the slice.

The method of claim 1, wherein the at least a portion of the frame comprises a tile of the frame, and wherein the video frame identifier comprises a tile identifier of the tile.

The method of claim 1, further comprising using the video frame identifier to determine one or more of a frame type of the frame, a priority of the frame, or relevance information of the frame.

The method of claim 1, wherein the at least a portion of the frame of video data comprises a protocol data unit (PDU) from a set of PDUs.

The method of claim 1, further comprising extracting one or more of the following from the packet header: a network abstraction layer (NAL) unit type of the at least a portion of the frame, a temporal identifier (TID) of the at least a portion of the frame, a layer identifier (LID) of the at least a portion of the frame, data indicating whether the at least a portion of the frame is intra-prediction coded, or data indicating whether the at least a portion of the frame is discardable.

The method of claim 1, further comprising processing a network abstraction layer (NAL) unit header of the frame, the NAL unit header including data representing a priority value of the frame.

The method of claim 1, further comprising processing an access unit delimiter (AUD) of an access unit corresponding to the frame, the AUD including data representing a priority value of the access unit.

The method of claim 1, further comprising processing a picture header of the frame, the picture header including data representing a priority value of the frame.

The method of claim 1, further comprising processing an open bitstream unit (OBU) of the at least a portion of the frame, the OBU including data representing a priority value of the at least a portion of the frame.

The method of claim 12, wherein the OBU comprises one of a frame header OBU or a metadata OBU.

The method of claim 1, further comprising receiving data indicating a picture distance between the frame and a reference frame of the frame.

The method of claim 14, wherein receiving the data indicating the picture distance comprises receiving a network abstraction layer (NAL) unit including the data or a supplemental enhancement information (SEI) message including the data.

The method of claim 14, wherein receiving the data indicating the picture distance between the frame and the reference frame of the frame comprises receiving data indicating the picture distance between the frame and each active reference frame.

The method of claim 1, further comprising receiving information indicating whether the portion of the frame of video data can be decoded without other portions of the frame.

The method of claim 17, wherein the portion of the frame includes a protocol data unit (PDU), wherein the frame corresponds to a PDU set that includes the PDU, and wherein the information indicates whether the PDU can be decoded without other PDUs in the PDU set.

The method of claim 1, further comprising receiving information indicating whether loop filtering is to be performed across one or more boundaries between the portion of the frame and one or more other portions of the frame.

The method of claim 1, further comprising receiving information indicating whether the portion of the frame is independently coded without prediction from other portions of the frame.

The method of claim 1, further comprising the step of receiving data indicating that the frame is a discardable frame.

The method of claim 21, wherein receiving the data indicating that the frame is the discardable frame comprises receiving the frame with a network abstraction layer (NAL) unit, an access unit delimiter, a picture header, a slice header, a supplemental enhancement information (SEI) message, or a frame header open bitstream unit (OBU).

The method of claim 1, wherein processing the payload comprises: determining that the frame is a gradual decoder refresh (GDR) frame, and determining that the at least a portion of the frame is independently coded; and in response to determining that the at least a portion of the frame is independently coded, providing the at least a portion of the frame to a video decoder.

The method of claim 1, wherein processing the payload comprises: determining that the frame is a gradual decoder refresh (GDR) frame, and determining that the at least a portion of the frame is coded relative to a reference frame; determining that the GDR frame is the ordinal first frame retrieved for a bitstream including the video data, such that the reference frame has not been retrieved; and in response to the reference frame not having been retrieved, discarding the at least a portion of the frame of video data.

A device for retrieving media data, the device comprising: a memory; and a processing system comprising one or more processors implemented in circuitry, the processing system being configured to: receive a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; extract a video frame identifier for the frame of video data from the packet header; and process the payload according to the video frame identifier.

The device of claim 25, wherein the video frame identifier comprises at least one of a picture order count (POC) value, a display frame identifier, or a current frame identifier.

The device of claim 25, wherein the at least a portion of the frame comprises a slice of the frame or a tile of the frame, and wherein the video frame identifier comprises an identifier of the slice of the frame or the tile of the frame.

The device of claim 25, wherein the processing system is configured to use the video frame identifier to determine one or more of a frame type of the frame, a priority of the frame, or relevance information of the frame.

The device of claim 25, wherein the at least a portion of the frame of video data comprises a protocol data unit (PDU) from a set of PDUs.

The device of claim 25, wherein the processing system is further configured to extract one or more of the following from the packet header: a network abstraction layer (NAL) unit type of the at least a portion of the frame, a temporal identifier (TID) of the at least a portion of the frame, a layer identifier (LID) of the at least a portion of the frame, data indicating whether the at least a portion of the frame is intra-prediction coded, or data indicating whether the at least a portion of the frame is discardable.

The device of claim 25, wherein the processing system is further configured to process a network abstraction layer (NAL) unit header of the frame, the NAL unit header comprising data representing a priority value of the frame.

The device of claim 25, wherein the processing system is further configured to process an access unit delimiter (AUD) of an access unit corresponding to the frame, the AUD comprising data representing a priority value of the access unit.

The device of claim 25, wherein the processing system is further configured to process a picture header of the frame, the picture header comprising data representing a priority value of the frame.

The device of claim 25, wherein the processing system is further configured to process an open bitstream unit (OBU) of the at least a portion of the frame, the OBU comprising data representing a priority value of the at least a portion of the frame.

The device of claim 25, wherein the processing system is further configured to receive data indicating a picture distance between the frame and a reference frame of the frame, the data being included in a network abstraction layer (NAL) unit or a supplemental enhancement information (SEI) message.

The device of claim 25, wherein the processing system is further configured to receive information indicating whether the portion of the frame of video data can be decoded without other portions of the frame.

The device of claim 25, wherein the processing system is further configured to receive information indicating whether loop filtering is to be performed across one or more boundaries between the portion of the frame and one or more other portions of the frame.

The device of claim 25, wherein, to process the payload, the processing system is configured to: determine that the frame is a gradual decoder refresh (GDR) frame and determine that the at least a portion of the frame is independently coded; and in response to determining that the at least a portion of the frame is independently coded, provide the at least a portion of the frame to a video decoder.

The device of claim 25, wherein, to process the payload, the processing system is configured to: determine that the frame is a gradual decoder refresh (GDR) frame and determine that the at least a portion of the frame is coded relative to a reference frame; determine that the GDR frame is the ordinal first frame retrieved for a bitstream including the video data, such that the reference frame has not been retrieved; and in response to the reference frame not having been retrieved, discard the at least a portion of the frame of video data.

A device for receiving video data, the device comprising: means for receiving a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; means for extracting a video frame identifier of the frame of video data from the packet header; and means for processing the payload based on the video frame identifier.

A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to: receive a packet including a packet header and a payload, the payload including at least a portion of a frame of video data, the packet header being separate from the payload; extract a video frame identifier for the frame of video data from the packet header; and process the payload according to the video frame identifier.
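By way of illustration only, the handling of a portion of a GDR frame recited in the claims above (provide an independently coded portion to the decoder; discard a portion whose reference frame has not been retrieved) might be sketched in Python as follows. The names `PortionInfo`, `Action`, and `decide` are hypothetical, and the fallback rule for cases the claims do not address is an assumption of this sketch rather than part of the claimed subject matter.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DECODE = "provide to video decoder"
    DISCARD = "discard"


@dataclass(frozen=True)
class PortionInfo:
    """Per-portion signals, assumed to be available from packet metadata."""
    is_gdr_frame: bool          # the frame is a gradual decoder refresh frame
    independently_coded: bool   # the portion is coded without a reference frame
    reference_available: bool   # the reference frame has already been retrieved


def decide(info: PortionInfo) -> Action:
    # Independently coded portion of a GDR frame: hand it to the decoder.
    if info.is_gdr_frame and info.independently_coded:
        return Action.DECODE
    # Portion of a GDR frame coded relative to a reference frame that was
    # never retrieved (the GDR frame is the first frame retrieved): discard.
    if info.is_gdr_frame and not info.reference_available:
        return Action.DISCARD
    # Assumption of this sketch for remaining cases: decode only when the
    # reference frame is available.
    return Action.DECODE if info.reference_available else Action.DISCARD
```

For example, `decide(PortionInfo(True, False, False))` yields `Action.DISCARD`, which corresponds to joining a stream at a GDR frame whose inter-predicted region cannot yet be reconstructed.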