MXPA06009109A - Resizing of buffer in encoder and decoder - Google Patents
Resizing of buffer in encoder and decoder
- Publication number
- MXPA06009109A MXPA/A/2006/009109A
- Authority
- MX
- Mexico
- Prior art keywords
- buffer
- images
- size
- decoder
- transmission units
- Prior art date
Abstract
The invention relates to a method for buffering encoded pictures. The method includes an encoding step for forming encoded pictures in an encoder. The method also includes a transmission step for transmitting said encoded pictures to a decoder as transmission units, a buffering step for buffering transmission units transmitted to the decoder in a buffer, and a decoding step for decoding the encoded pictures to form decoded pictures. The buffer size is defined so that the total size of at least two transmission units is determined and the maximum buffer size is defined on the basis of that total size.
Description
RESIZING OF BUFFER IN ENCODER AND DECODER
Field of the Invention The present invention relates to a method for buffering encoded images. The method includes an encoding step for forming encoded images in an encoder, a transmission step for sending the encoded images to a decoder, a decoding step for decoding the encoded images to form decoded images, and a rearrangement step for placing the encoded images in decoding order. The invention also relates to a system, a transmission device, a receiving device, an encoder, a decoder, an electronic device, a software program, and a storage medium.
BACKGROUND OF THE INVENTION Published video coding standards include ITU-T H.261, ITU-T H.263, ISO/IEC MPEG-1, ISO/IEC MPEG-2, and Part 2 of the ISO/IEC MPEG-4 standard. These standards are referred to herein as conventional video coding standards.
REF. 174878 Video Communication Systems Video communication systems can be divided into conversational and non-conversational systems. Conversational systems include video conferencing and video telephony. Examples of these systems include Recommendations ITU-T H.320, H.323 and H.324, which specify video conferencing/telephony systems operating on ISDN, IP and PSTN networks, respectively. Conversational systems are characterized by the attempt to minimize end-to-end delay (from audio-video capture to far-end audio-video presentation) in order to improve the user experience. Non-conversational systems include the playback of stored content, such as Digital Versatile Discs (DVD) or video files stored in a large-capacity memory of a playback device, digital TV, and streaming. A short review of the most important standards in these areas of technology is given below. A dominant standard in consumer digital video electronics today is MPEG-2, which includes specifications for video compression, audio compression, storage and transport. The storage and transport of encoded video is based on the concept of an elementary stream. An elementary stream consists of coded data from a single source (for example, video) plus the auxiliary data necessary for the synchronization, identification and characterization of the source information. An elementary stream is packetized (that is, formed into packets) of either constant or variable length to form a Packetized Elementary Stream (PES). Each PES packet consists of a header followed by stream data called the payload. PES packets from several elementary streams are combined to form either a Program Stream (PS) or a Transport Stream (TS). The PS is aimed at applications that have negligible transmission errors, such as store-and-play applications. The TS is aimed at applications that are susceptible to transmission errors.
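The packetization of an elementary stream into PES packets described above can be sketched as follows. This is a simplified, illustrative model: the PESPacket fields shown are stand-ins for this sketch, not the actual MPEG-2 Systems header syntax.

```python
from dataclasses import dataclass

@dataclass
class PESPacket:
    stream_id: int   # identifies the source elementary stream (illustrative field)
    payload: bytes   # stream data following the header

def packetize(stream_id: int, data: bytes, max_payload: int) -> list:
    """Split elementary-stream data into PES packets of bounded payload length."""
    return [PESPacket(stream_id, data[i:i + max_payload])
            for i in range(0, len(data), max_payload)]
```

Packets produced this way from several elementary streams would then be multiplexed into either a Program Stream or a Transport Stream.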
However, the TS assumes that the network throughput is guaranteed to be constant. There is a standardization effort in progress in the Joint Video Team (JVT) of ITU-T and ISO/IEC. The work of the JVT is based on an earlier ITU-T standardization project called H.26L. The goal of the JVT standardization is to release the same standard text as Recommendation ITU-T H.264 and International Standard ISO/IEC 14496-10 (Part 10 of MPEG-4). The standard is referred to as the JVT coding standard in this document, and the codec (encoder-decoder) according to the standard is referred to as the JVT codec. The codec specification itself is conceptually divided between a video coding layer (VCL) and a network abstraction layer (NAL). The VCL contains the signal processing functionality of the codec, such as the transform, quantization, motion search/compensation and the loop filter. It follows the general design of most current video codecs: a macroblock-based encoder that uses inter-image prediction with motion compensation and transform coding of the residual signal. The output of the VCL consists of slices: a bit string that contains the macroblock data of an integer number of macroblocks, plus the slice header information (which contains the spatial address of the first macroblock in the slice, the initial quantization parameter, and the like). Macroblocks within a slice are arranged in scan order unless a different macroblock allocation is specified using the so-called Flexible Macroblock Ordering syntax. In-picture prediction is used only within a slice. The NAL encapsulates the slice output of the VCL into Network Abstraction Layer Units (NALUs), which are suitable for transmission over packet networks or for use in packet-oriented multiplex environments. Annex B of the JVT standard defines an encapsulation process for transmitting these NALUs over byte-stream oriented networks. The optional reference picture selection mode of H.263 and the NEWPRED coding tool of MPEG-4 Part 2 allow the selection of the reference frame for motion compensation per picture segment, for example, per slice in H.263. The optional Enhanced Reference Picture Selection mode of H.263 and the JVT coding standard allow the selection of the reference frame for each macroblock separately. The selection of the reference image enables many types of temporal scalability schemes. Figure 1 shows an example of a temporal scalability scheme, which is referred to herein as recursive temporal scalability. The example scheme can be decoded at three constant frame rates.
Figure 2 presents a scheme referred to as Video Redundancy Coding, in which a sequence of images is divided into two or more independently coded threads in an interleaved manner. The arrows in these and all subsequent figures indicate the direction of motion compensation, and the values under the frames correspond to the relative capture and display times of the frames. Parameter Set Concept A very fundamental design concept of the JVT codec is the generation of self-contained packets, making mechanisms such as header duplication unnecessary. This was achieved by separating, from the media stream, the information that is relevant to more than one slice. This higher-layer meta information should be sent reliably, asynchronously, and in advance of the RTP packet stream that contains the slice packets. This information can also be sent in-band in applications that do not have an adequate out-of-band transport channel for this purpose. The combination of the higher-level parameters is called a Parameter Set. The Parameter Set contains information such as the image size, the display window, the optional coding modes employed, the macroblock allocation map, and others. In order to be able to change image parameters (such as the image size) without having to transmit Parameter Set updates synchronously with the slice packet stream, the encoder and decoder can maintain a list of more than one Parameter Set. Each slice header contains a code word that indicates the Parameter Set to be used. This mechanism allows the transmission of Parameter Sets to be decoupled from the packet stream, and allows them to be transmitted by external means, for example, as a side effect of capability exchange, or through a (reliable or unreliable) control protocol. It may even be possible that they are never transmitted at all but instead are fixed by an application design specification.
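The Parameter Set mechanism described above can be sketched as follows. The identifiers and fields used here (ps_id, width, height) are hypothetical illustrations for this sketch, not the JVT syntax.

```python
class ParameterSetStore:
    """Both encoder and decoder keep a list of Parameter Sets sent out of band;
    each slice header carries only a code word selecting one of them."""

    def __init__(self):
        self._sets = {}

    def store(self, ps_id, params):
        # Parameter Sets arrive by external means, e.g. as a side effect
        # of capability exchange or via a control protocol.
        self._sets[ps_id] = params

    def activate(self, slice_ps_id):
        # The code word carried in the slice header selects the set to use.
        return self._sets[slice_ps_id]
```

Because the slice carries only the code word, image parameters such as the image size can change without synchronous in-band updates.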
Transmission Order In conventional video coding standards, the decoding order of the images is the same as the display order, except for B images. A block in a conventional B image can be predicted bi-directionally in time from two reference images, where one reference image precedes the B image and the other succeeds it in display order. Only the latest reference image in decoding order can succeed the B image in display order (exception: interlaced coding in H.263, where both field images of a temporally subsequent reference frame can precede a B image in decoding order). A conventional B image cannot be used as a reference image for temporal prediction, and therefore a conventional B image can be discarded without affecting the decoding of any other image. The JVT coding standard includes the following new technical features compared to previous standards: - The decoding order of the images is decoupled from the display order. The frame number indicates the decoding order and the image order count indicates the display order. - The reference images for a block in a B image can be either before or after the B image in display order. Consequently, B stands for bi-predictive image instead of bi-directional image. - Images that are not used as reference images are marked explicitly. An image of any type (intra, inter, B, etc.) can be either a reference image or a non-reference image. (Therefore, a B image can be used as a reference image for the temporal prediction of other images.) - An image can contain slices encoded with different coding types. In other words, a coded image could consist, for example, of an intra-coded slice and a B-coded slice.
The decoupling of the display order from the decoding order can be beneficial from the points of view of compression efficiency and error resilience. An example of a prediction structure that potentially improves compression efficiency is presented in Figure 3. The boxes indicate the images, the uppercase letters inside the boxes indicate the coding types, the numbers inside the boxes are the frame numbers according to the JVT coding standard, and the arrows indicate the prediction dependencies. Note that the image B17 is a reference image for the images B18. Compression efficiency is potentially improved compared to conventional coding, because the reference images for the images B18 are temporally closer than in conventional coding with the PBBP or PBBBP coded image patterns. Compression efficiency is also potentially improved compared to the conventional PBP coded image pattern, because part of the reference images are bi-directionally predicted.
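With decoding order decoupled from display order, the decoder conceptually receives images in frame-number (decoding) order and re-sorts them by image order count for output. A minimal sketch, with each image modeled simply as a (frame_num, poc) pair:

```python
def display_order(decoded_pictures):
    """Sort images received in decoding order into display (output) order.

    decoded_pictures: list of (frame_num, poc) tuples, in decoding order;
    the image order count (poc) gives the display order.
    """
    return sorted(decoded_pictures, key=lambda pic: pic[1])
```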
Figure 4 presents an example of the intra-image postponement method, which can be used to improve error resilience. Conventionally, an intra-image is encoded immediately after a scene cut, or as a response, for example, to the expiry of an intra-image refresh period. In the intra-image postponement method, an intra-image is not encoded immediately after the need to encode an intra-image arises; rather, a temporally subsequent image is selected as the intra-image. Each image between the encoded intra-image and the conventional location of an intra-image is predicted from the next temporally subsequent image. As Figure 4 shows, the intra-image postponement method generates two independent inter-image prediction chains, whereas conventional coding algorithms produce a single inter-image chain. It is intuitively clear that the two-chain approach is more robust against erasure errors than the conventional one-chain approach. If one chain suffers a packet loss, the other chain may still be received correctly. In conventional coding, the loss of a packet always causes error propagation to the rest of the inter-image prediction chain.
Two types of ordering and timing information have conventionally been associated with digital video: decoding order and presentation order. A closer look at the related technology is taken below. A decoding time stamp (DTS) indicates the time, relative to a reference clock, at which a coded data unit is supposed to be decoded. If the DTS is coded and transmitted, it serves two purposes: first, if the decoding order of the images differs from their output order, the DTS indicates the decoding order explicitly. Second, the DTS guarantees a certain pre-decoder buffering behavior, on the condition that the reception rate is close to the transmission rate at all times. In networks where the end-to-end latency varies, this second use of the DTS plays little or no role. Instead, the received data is decoded as soon as possible, provided there is room in the post-decoder buffer for uncompressed images. The transport of the DTS depends on the communication system and the video coding standard in use. In MPEG-2 Systems, the DTS can optionally be transmitted as an item in the header of a PES packet. In the JVT coding standard, the DTS can optionally be carried as part of the Supplemental Enhancement Information (SEI), and it is used in the operation of the optional Hypothetical Reference Decoder. In the ISO Base Media File Format, the DTS has its own dedicated box type, the Decoding Time to Sample Box.
In many systems, such as RTP-based streaming systems, the DTS is not carried at all, because the decoding order is assumed to be the same as the transmission order and the exact decoding time does not play an important role. H.263 optional Annex U and Annex W.6.12 specify an image number that is incremented by 1, relative to the previous reference image, in decoding order. In the JVT coding standard, the frame number coding element is specified similarly to the image number of H.263. The JVT coding standard specifies a particular type of intra-image, called an instantaneous decoding refresh (IDR) image. No subsequent image may refer to images that are earlier than the IDR image in decoding order. An IDR image is often coded as a response to a scene change.
In the JVT coding standard, the frame number is reset to 0 at an IDR image in order to improve error resilience in the case of a loss of the IDR image, as presented in Figures 5a and 5b. However, it should be noted that the scene information SEI message of the JVT coding standard can also be used for the detection of scene changes. The H.263 image number can be used to recover the decoding order of the reference images. Similarly, the JVT frame number can be used to recover the decoding order of frames between an IDR image (inclusive) and the next IDR image (exclusive) in decoding order. However, because complementary reference field pairs (consecutive images coded as fields that are of different parity) share the same frame number, their decoding order cannot be reconstructed from the frame numbers. The H.263 image number or the JVT frame number of a non-reference image is specified to be equal to the image or frame number of the previous reference image in decoding order plus 1. If several non-reference images are consecutive in decoding order, they share the same image or frame number. The image or frame number of a non-reference image is also the same as the image or frame number of the next reference image in decoding order. The decoding order of consecutive non-reference images can be recovered using the Temporal Reference (TR) coding element of H.263 or the Picture Order Count (POC) concept of the JVT coding standard. A presentation time stamp (PTS) indicates the time, relative to a reference clock, at which an image is supposed to be displayed. A presentation time stamp is also called a display time stamp, an output time stamp, or a composition time stamp. The transport of the PTS depends on the communication system and the video coding standard in use.
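Because the frame number resets to 0 at each IDR image, the decoding order of reference frames can be recovered per IDR period. A sketch, under the simplifying assumption that reference frame numbers increase by 1 within a period and that no frames are missing:

```python
def split_idr_periods(ref_frame_nums):
    """Group reference-frame numbers (in reception order) into IDR periods.

    A frame number not larger than its predecessor signals the reset that
    starts a new IDR period.
    """
    periods, current = [], []
    for fn in ref_frame_nums:
        if current and fn <= current[-1]:
            periods.append(current)  # reset detected: close the period
            current = []
        current.append(fn)
    if current:
        periods.append(current)
    return periods
```

As the text notes, this recovery does not work for complementary reference field pairs, which share a frame number, nor for consecutive non-reference images, whose order requires TR or POC.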
In MPEG-2 Systems, the PTS can optionally be transmitted as an item in the header of a PES packet. In the JVT coding standard, the PTS can optionally be carried as part of the Supplemental Enhancement Information (SEI), and it is used in the operation of the Hypothetical Reference Decoder. In the ISO Base Media File Format, the PTS has its own dedicated box type, the Composition Time to Sample Box, where the presentation time stamp is coded relative to the corresponding decoding time stamp. In RTP, the RTP time stamp in the RTP packet header corresponds to the PTS. Conventional video coding standards feature the Temporal Reference (TR) coding element, which is similar to the PTS in many aspects. In some of the conventional coding standards, such as MPEG-2 video, the TR is reset to zero at the beginning of a Group of Pictures (GOP). In the JVT coding standard, there is no concept of time in the video coding layer. The Picture Order Count (POC) is specified for each frame and field, and it is used similarly to the TR, for example, in direct temporal prediction of B slices. The POC is reset to 0 at an IDR image. Transmission of Multimedia Streams A multimedia streaming system consists of a streaming server and a number of players, which access the server through a network. The network is typically packet-oriented and provides little or no means of guaranteed quality of service. The players fetch either pre-stored content or live multimedia content from the server and play it back in real time while the content is being downloaded. The type of communication can be either point-to-point or multicast. In point-to-point streaming, the server provides a separate connection for each player. In multicast streaming, the server transmits a single data stream to a number of players, and network elements duplicate the stream only if necessary. When a player has established a connection with a server and requested a multimedia stream, the server begins to transmit the desired stream. The player does not start playing back the stream immediately, but typically buffers the incoming data for a few seconds. Here, this buffering is referred to as initial buffering. Initial buffering helps to maintain pauseless playback, because, in case of occasional increased transmission delays or drops in network throughput, the player can decode and play the buffered data. In order to avoid unlimited transmission delay, reliable transport protocols are not usually favored in streaming systems. Instead, the systems prefer unreliable transport protocols, such as UDP, which, on the one hand, exhibit a more stable transmission delay, but, on the other hand, also suffer data corruption or loss. The RTP and RTCP protocols can be used on top of UDP to control real-time communications.
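The initial buffering just described can be sketched as a simple start-up rule: playback begins only once enough media is buffered to absorb occasional transmission delays. The 5-second target below is an assumed example value, not taken from the text.

```python
def can_start_playback(buffered_media_seconds, target_seconds=5.0):
    """Return True once the initial buffering target has been reached.

    buffered_media_seconds: duration of media currently held in the
    receiver buffer; target_seconds: assumed initial buffering target.
    """
    return buffered_media_seconds >= target_seconds
```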
RTP provides the means to detect transmission packet losses, to reassemble the correct order of packets at the receiving end, and to associate a sampling time stamp with each packet. RTCP conveys information about how large a portion of the packets was received correctly and can therefore be used for flow control purposes. Transmission Errors There are two main types of transmission errors, namely bit errors and packet errors. Bit errors are typically associated with a circuit-switched channel, such as a radio access network connection in mobile communications, and are caused by imperfections of the physical channels, such as radio interference. Such imperfections may result in bit inversions, bit insertions and bit deletions in the transmitted data. Packet errors are typically caused by elements in packet-switched networks. For example, a packet router may become congested; that is, it may receive too many packets as input and be unable to output them at the same rate. In this situation, its buffers overflow and some packets are lost. Packet duplication and packet delivery in an order different from the one transmitted are also possible, but these are typically considered less common conditions than packet losses. Packet errors can also be caused by the implementation of the transport protocol stack in use. For example, some protocols use checksums that are computed in the transmitter and encapsulated with the source-coded data. If there is a bit inversion error in the data, the receiver does not end up with the same checksum and has to discard the received packet. Second generation (2G) and third generation (3G) mobile networks, including GPRS, UMTS, and CDMA-2000, provide two basic types of radio link connections, acknowledged and non-acknowledged.
In an acknowledged connection, the integrity of a radio link frame is verified by the recipient (either the Mobile Station, MS, or the Base Station Subsystem, BSS), and, in case of a transmission error, a retransmission request is sent to the other end of the radio link. Because of this link-layer retransmission, the originator has to buffer a radio link frame until a positive acknowledgment for the frame is received. In difficult radio conditions, this buffer may overflow and cause data loss. Nevertheless, it has been shown that the use of the acknowledged radio link protocol mode is beneficial for streaming services. In a non-acknowledged connection, erroneous radio link frames are typically discarded. Packet losses can be either corrected or concealed. Loss correction refers to the ability to restore the lost data perfectly, as if no losses had been introduced. Loss concealment refers to the ability to mask or disguise the effects of transmission losses so that they are not visible in the reconstructed video sequence. When a player detects a packet loss, it may request a packet retransmission. Because of the initial buffering, the retransmitted packet may be received before its scheduled playback time. Some commercial Internet streaming systems implement retransmission requests using proprietary protocols. Work is ongoing in the IETF to standardize a selective retransmission request mechanism as part of RTCP.
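Receiver-side loss detection from RTP sequence numbers, which can drive the retransmission requests discussed above, might be sketched as follows. For brevity, this sketch ignores the wrap-around of the 16-bit RTP sequence number.

```python
def detect_losses(received_seq_nums):
    """Return the missing sequence numbers between the smallest and
    largest sequence numbers received so far.

    received_seq_nums: RTP sequence numbers in arrival order (duplicates
    and reordering are tolerated).
    """
    seen = sorted(set(received_seq_nums))
    missing = []
    for a, b in zip(seen, seen[1:]):
        missing.extend(range(a + 1, b))
    return missing
```

A player could then issue a selective retransmission request for each reported gap, provided the retransmitted packet can still arrive before its scheduled playback time.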
A common characteristic of all these retransmission request protocols is that they are not suitable for multicasting to a large number of players, since network traffic could increase drastically. Consequently, multicast streaming applications have to rely on non-interactive packet loss control. Point-to-point streaming systems can also benefit from non-interactive error control techniques. First, some systems may not contain any interactive error control mechanism, or they may prefer not to have any feedback from the players in order to simplify the system. Second, retransmission of lost packets and other forms of interactive error control typically take a larger portion of the transmitted data rate than non-interactive error control methods. Streaming servers have to ensure that interactive error control methods do not reserve a major portion of the available network throughput. In practice, servers may have to limit the number of interactive error control operations. Third, the transmission delay may limit the number of interactions between the server and the player, since all interactive error control operations for a specific data sample should preferably be performed before the data sample is played back. Non-interactive packet loss control mechanisms can be classified into forward error control and loss concealment by post-processing. Forward error control refers to techniques in which the transmitter adds such redundancy to the transmitted data that the receivers can recover at least part of the transmitted data even if there are transmission losses. Error concealment by post-processing is entirely receiver-oriented. These methods try to estimate the correct representation of erroneously received data. Most video compression algorithms generate temporally predicted INTER or P images.
As a result, data loss in one image causes visible degradation in subsequent images that are temporally predicted from the corrupted image. Video communication systems can either conceal the loss in the displayed images or freeze the latest correct image on the screen until a frame that is independent of the corrupted frame is received. In conventional video coding standards, the decoding order is coupled with the output order. In other words, the decoding order of I and P images is the same as their output order, and the decoding of a B image immediately follows the decoding of the later of its reference images in output order. Consequently, it is possible to recover the decoding order based on knowledge of the output order. The output order is typically conveyed in the elementary video bitstream in the Temporal Reference (TR) field and also in the system multiplexing layer, such as in the RTP header. Therefore, in conventional video coding standards, no problem arises. One solution evident to an expert in the field is the use of a frame counter similar to the H.263 image number, but without a reset to 0 at an IDR image (as is done in the JVT coding standard). However, some problems can arise when this type of solution is used. Figure 5a presents a situation in which a continuous numbering scheme is used. If, for example, the IDR image I37 is lost (cannot be received/decoded), the decoder continues decoding the successive images, even though it uses a wrong reference image. This causes error propagation to the successive frames until the next frame that is independent of the corrupted frame is received and decoded correctly. In the example of Figure 5b, the frame number is reset to 0 at an IDR image.
Now, in a situation in which the IDR image I0 is lost, the decoder notices that there is a large gap in the image numbering after the last correctly decoded image P36. The decoder can then assume that an error has occurred and can freeze the display at image P36 until the next frame that is independent of the corrupted image is received and decoded. Sub-sequences The JVT coding standard also includes the concept of a sub-sequence, which can improve temporal scalability, compared with the use of non-reference images, in that chains of inter-predicted images can be discarded as a whole without affecting the decodability of the rest of the coded stream. A sub-sequence is a set of coded images within a sub-sequence layer. An image resides in one sub-sequence layer and in one sub-sequence only. A sub-sequence does not depend on any other sub-sequence in the same or in a higher sub-sequence layer. A sub-sequence in layer 0 can be decoded independently of any other sub-sequences and of previous long-term reference images. Figure 6a describes an example of an image stream containing sub-sequences in layer 1. A sub-sequence layer contains a subset of the coded images of a sequence. Sub-sequence layers are numbered with non-negative integers. A layer with a larger layer number is a higher layer than a layer with a smaller layer number. The layers are ordered hierarchically based on their dependence on each other, so that a layer does not depend on any higher layer and may depend on lower layers. In other words, layer 0 can be decoded independently, images in layer 1 can be predicted from layer 0, images in layer 2 can be predicted from layers 0 and 1, and so on. The subjective quality is expected to increase with the number of decoded layers.
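The layering rule above (layer 0 independently decodable, each higher layer depending only on lower layers) implies that dropping every layer above a chosen one leaves a decodable stream. A minimal sketch:

```python
def thin_to_layer(pictures, max_layer):
    """Keep only images whose sub-sequence layer number is <= max_layer.

    pictures: list of (layer, picture_id) tuples in decoding order. Because
    no layer depends on a higher layer, the result remains decodable.
    """
    return [(layer, pic) for layer, pic in pictures if layer <= max_layer]
```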
The concept of a sub-sequence is included in the JVT coding standard as follows: the parameter required_frame_num_update_behaviour_flag equal to 1 in the sequence parameter set indicates that the coded sequence may not contain all sub-sequences. The use of the required_frame_num_update_behaviour_flag parameter releases the requirement that the frame number increment by 1 for each reference frame. Instead, gaps in frame numbers are marked specifically in the decoded picture buffer. If a "missing" frame number were referred to in inter prediction, a loss of a picture would be inferred. Otherwise, frames corresponding to "missing" frame numbers are handled as if they were normal frames inserted into the decoded picture buffer with the sliding-window buffering mode. All the pictures in a disposed sub-sequence are consequently assigned a "missing" frame number in the decoded picture buffer, but they are never used in inter prediction for other sub-sequences. The JVT coding standard also includes optional sub-sequence related SEI messages. The sub-sequence information SEI message is associated with the next slice in decoding order. It signals the sub-sequence layer and the sub-sequence identifier (sub_seq_id) of the sub-sequence to which the slice belongs. Each IDR picture contains an identifier (idr_pic_id). If two IDR pictures are consecutive in decoding order, without any interleaved picture, the value of idr_pic_id shall change from the first IDR picture to the other one. If the current picture resides in a sub-sequence whose first picture in decoding order is an IDR picture, the value of sub_seq_id shall be the same as the value of idr_pic_id of the IDR picture. The solution in JVT-D093 works correctly only if no data resides in sub-sequence layers 1 or above.
If the transmission order is different from the decoding order and coded pictures reside in sub-sequence layer 1, their decoding order relative to the pictures in sub-sequence layer 0 cannot be concluded based on the sub-sequence identifiers and frame numbers. For example, consider the coding scheme presented in Figure 6b, where the output order runs from left to right, the boxes indicate pictures, the capital letters within the boxes indicate coding types, the numbers within the boxes are frame numbers according to the JVT coding standard, the underlined characters indicate non-reference pictures, and the arrows indicate prediction dependencies. If the pictures were transmitted in the order I0, P1, P3, I0, P1, B2, B4, P5, it could not be concluded to which independent GOP picture B2 belongs. It could be argued that in the previous example the correct independent GOP for picture B2 could be concluded based on its output timestamp. However, the decoding order of pictures cannot be recovered based on output timestamps and picture numbers, because the decoding order and the output order are decoupled. Consider the next example (Figure 6c), where the output order runs from left to right, the boxes indicate pictures, the capital letters within the boxes indicate coding types, the numbers within the boxes are frame numbers according to the JVT coding standard, and the arrows indicate prediction dependencies. If the pictures were transmitted out of decoding order, it could not be reliably detected whether picture P4 has to be decoded after P3 of the first or of the second independent GOP in output order. Buffering Streaming clients typically have a receiver buffer that is capable of storing a relatively large amount of data.
Initially, when a streaming session is established, the client does not start playing the stream back immediately; rather, it typically buffers the incoming data for a few seconds. This buffering helps to maintain continuous playback, because, in case of occasional increased transmission delays or network throughput drops, the client can decode and play back the buffered data. Otherwise, without initial buffering, the client would have to freeze the display, stop decoding, and wait for incoming data. The buffering is also necessary for automatic or selective retransmission at any protocol level. If any part of a picture is lost, a retransmission mechanism may be used to resend the lost data. If the retransmitted data is received before its scheduled decoding or playback time, the loss is perfectly recovered. Coded pictures can be ranked according to their importance for the subjective quality of the decoded sequence. For example, non-reference pictures, such as conventional B pictures, are subjectively least important, because their absence does not affect the decoding of any other pictures. Subjective ranking can also be made on a data partition or slice group basis. Coded slices and data partitions that are subjectively the most important can be sent earlier than their decoding order indicates, whereas coded slices and data partitions that are subjectively the least important can be sent later than their natural coding order indicates. Consequently, any retransmitted parts of the most important slices and data partitions are more likely to be received before their scheduled decoding or playback time compared to the least important slices and data partitions. Pre-decoder Buffering Pre-decoder buffering refers to the buffering of coded data before it is decoded.
Initial buffering refers to pre-decoder buffering at the beginning of a streaming session. Initial buffering is conventionally performed for two reasons explained below. In conversational packet-switched multimedia systems, for example IP-based video conferencing systems, different types of media are normally carried in separate packets. Moreover, packets are typically carried on top of a best-effort network that cannot guarantee a constant transmission delay; rather, the delay may vary from packet to packet. Consequently, packets having the same presentation (playback) timestamp may not be received at the same time, and the reception interval of two packets may not be the same as their presentation interval (in terms of time). Thus, in order to maintain playback synchronization between different media types and to maintain the correct playback rate, a multimedia terminal typically buffers received data for a short period (for example, less than half a second) in order to smooth out the delay variation. Here, this type of buffer component is referred to as a delay jitter buffer. Buffering can take place before and/or after media data decoding. Delay jitter buffering is also applied in streaming systems. Due to the fact that streaming is a non-conversational application, the required delay jitter buffer may be considerably larger than in conversational applications. When a streaming player has established a connection to a server and requested a multimedia stream to be downloaded, the server begins to transmit the desired stream. The player does not start playing the stream back immediately; rather, it typically buffers the incoming data for a certain period, commonly a few seconds. Here, this buffering is referred to as initial buffering.
Initial buffering provides the ability to smooth out transmission delay variations in a manner similar to that provided by delay jitter buffering in conversational applications. In addition, it may enable the use of link, transport, and/or application layer retransmissions of lost protocol data units (PDUs). The player can decode and play back the buffered data while retransmitted PDUs may be received in time to be decoded and played back at the scheduled time. Initial buffering in streaming clients provides yet another advantage that cannot be achieved in conversational systems: it allows the data rate of the media transmitted from the server to vary. In other words, media packets can be temporarily transmitted faster or slower than their playback rate, as long as the receiver buffer does not overflow or underflow. The fluctuation in the data rate may originate from two sources. First, the compression efficiency achievable in some media types, such as video, depends on the contents of the source data. Consequently, if a stable quality is desired, the bit rate of the resulting compressed bitstream varies. Typically, a stable audiovisual quality is subjectively more pleasing than a varying quality. Thus, initial buffering enables a more pleasing audiovisual quality to be achieved compared to a system without initial buffering, such as a video conferencing system. Second, it is commonly known that packet losses in fixed IP networks occur in bursts. In order to avoid bursty errors and high peak bit and packet rates, well-designed streaming servers schedule the transmission of packets carefully. Packets may not be sent precisely at the rate at which they are played back at the receiving end; rather, the servers may try to achieve a steady interval between transmitted packets.
The server may also adjust the rate of packet transmission in accordance with prevailing network conditions, reducing the packet transmission rate when the network becomes congested and increasing it, for example, if network conditions allow. Hypothetical Reference Decoder (HRD) / Video Buffering Verifier (VBV) Many video coding standards include an HRD/VBV specification as an integral part of the standard. The HRD/VBV specification is a hypothetical decoder model that contains an input (pre-decoder) buffer. The coded data flows into the input buffer typically at a constant bit rate. Coded pictures are removed from the input buffer at their decoding timestamps, which may be the same as their output timestamps. The input buffer is of a certain size depending on the profile and level in use. The HRD/VBV model is used to specify interoperability points from the point of view of processing and memory requirements. Encoders must ensure that a generated bitstream conforms to the HRD/VBV specification according to the HRD/VBV parameter values of a certain profile and level. Decoders claiming support for a certain profile and level must be able to decode a bitstream that conforms to the HRD/VBV model. The HRD comprises a coded picture buffer for storing the coded data stream and a decoded picture buffer for storing decoded reference pictures and for reordering decoded pictures into display order. The HRD moves data between the buffers in a similar way as the decoder of a decoding device does. However, the HRD need not decode the coded pictures entirely or output the decoded pictures; the HRD only verifies that the decoding of the picture stream can be performed in accordance with the constraints given in the coding standard. When the HRD is operating, it receives a coded data stream and stores it in the coded picture buffer.
Furthermore, the HRD removes coded pictures from the coded picture buffer and stores at least some of the corresponding hypothetically decoded pictures in the decoded picture buffer. The HRD is aware of the input rate at which coded data flows into the coded picture buffer, of the removal rate of pictures from the coded picture buffer, and of the output rate of pictures from the decoded picture buffer. The HRD verifies whether the coded or the decoded picture buffer overflows, and indicates if decoding is not possible with the current settings. The HRD then informs the encoder about the buffering violation, whereupon the encoder can change the coding parameters, for example by reducing the number of reference frames, to avoid the buffering violation. Alternatively or additionally, the encoder starts coding pictures with the new parameters and sends the coded pictures to the HRD, which once again performs the decoding of the pictures and the necessary verifications. As yet another alternative, the encoder may discard the latest coded frame and encode the subsequent frames such that no buffering violation occurs. Two types of decoder conformance have been specified in the JVT coding standard: output order conformance (VCL conformance) and output time conformance (VCL-NAL conformance). These conformance types have been specified using the HRD specification. Output order conformance refers to the capability of the decoder to recover the output order of pictures correctly. The HRD specification includes a "bumping decoder" model that outputs the earliest uncompressed picture in output order when new storage space is needed for a picture. Output time conformance refers to the capability of the decoder to output pictures at the same pace as the HRD model does.
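The coded-picture-buffer verification performed by the HRD can be illustrated with a minimal model. This is a simplified, hypothetical sketch of the idea only: it assumes a constant input bit rate, removes each coded picture in full at its decoding timestamp, and ignores the decoded picture buffer entirely.

```python
def check_cpb(pictures, bit_rate, cpb_size):
    """Minimal pre-decoder (coded picture) buffer check in the spirit
    of an HRD/VBV model.

    pictures: list of (removal_time_seconds, size_bits) in decoding
              order; bits arrive continuously at bit_rate (bits/s).
    Returns 'underflow' if a picture has not fully arrived by its
    removal time, 'overflow' if buffer fullness exceeds cpb_size
    (in bits) just before a removal, and 'ok' otherwise.
    """
    removed = 0  # total bits removed from the buffer so far
    for t, size in pictures:
        fullness = bit_rate * t - removed  # bits buffered just before removal
        if fullness > cpb_size:
            return 'overflow'
        if fullness < size:
            return 'underflow'
        removed += size
    return 'ok'
```

An encoder operating against such a model would, as described above, react to a reported violation by changing coding parameters or discarding the latest coded frame.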
The output timestamp of a picture must always be equal to or smaller than the time at which it would be removed by the "bumping decoder". Interleaving Interleaving is a technique commonly used in audio streaming. In the frame interleaving technique, an RTP packet contains audio frames that are not consecutive in decoding or output order. If a packet in the audio packet stream is lost, the correctly received packets contain surrounding audio frames that can be used to conceal the lost audio packet (by some kind of interpolation). Many audio coding RTP payload and MIME type specifications contain the possibility of signaling the maximum amount of interleaving in a packet in terms of audio frames. In some prior art encoding/decoding methods the size of the required buffer is signaled as a count of transmission units.
SUMMARY OF THE INVENTION The maximum size of the pre-decoding buffer of a decoder can be signaled to the decoder in bytes. If a byte-based scheme were used and the reordering process were not defined for the decoder, the buffering model would have to be defined explicitly, because the encoder and the decoder could otherwise use different buffering schemes. If a certain byte size were defined for the buffer, and the decoder used a buffering scheme in which transmission units are stored in the buffer until it is full, and only then the earliest data is removed from the buffer and decoded, this kind of buffering could last longer than necessary before decoding is started. Another possibility for signaling the maximum size of the pre-decoding buffer is to use transmission units; here, the size of the buffer is signaled as the maximum number of transmission units to be buffered. However, the maximum size of a transmission unit is not defined, and the size of transmission units may vary. If a maximum size were defined and it were too small for a certain data unit, the data unit would have to be split into more than one transmission unit, which increases coding and transmission overhead, i.e. decreases compression efficiency and/or increases the complexity of the system. If, on the other hand, the maximum size of a transmission unit has to be large enough, the total size of the buffer may become unnecessarily large. In the present invention the size of the buffer is defined such that the total size of at least two transmission units is determined and the maximum size of the buffer is defined based on that total size. In addition to the total size, it may be necessary to take the transmission jitter of the network into account.
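The sizing idea summarized above can be illustrated with a short sketch: instead of counting transmission units, the total byte size of a window of consecutive transmission units is determined, and the buffer size is derived from the largest such total plus a jitter margin. The function name, the windowing choice, and the 20% default margin are illustrative assumptions, not values taken from the invention.

```python
def buffer_size_bytes(unit_sizes, n, jitter_margin=0.2):
    """Hypothetical sketch of byte-based buffer sizing.

    unit_sizes: sizes in bytes of the transmission units in buffering
                order (assumes len(unit_sizes) >= n).
    n:          number of consecutive transmission units whose total
                size determines the buffer size (n >= 2).
    Returns the largest total size of n consecutive units, enlarged
    by a margin for transmission delay jitter.
    """
    worst = max(sum(unit_sizes[i:i + n])
                for i in range(len(unit_sizes) - n + 1))
    return int(worst * (1 + jitter_margin))
```

For example, with unit sizes 100, 300, 200, 50 bytes and n = 2, the worst window is 300 + 200 = 500 bytes; the jitter margin then enlarges the allocation accordingly.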
According to another aspect of the present invention, the number of transmission units used in the calculation of the total size is a fraction of the required size of the buffer in terms of the number of transmission units. According to yet another aspect of the present invention, the number of transmission units used in the calculation of the total size is a fraction of the required size of the buffer in terms of the number of transmission units, wherein the fraction is of the form 1/N, where N is an integer. According to still another aspect of the present invention, the number of transmission units used in the calculation of the total size is the same as the required size of the buffer in terms of the number of transmission units. In one embodiment of the present invention, the number of transmission units used in the calculation of the total size is expressed in the buffering order of the transmission units. The buffering order refers to the order in which the transmission units are buffered in the decoder for decoding, that is, the order of storage in the pre-decoder buffer. The invention makes it possible to define the size of the reception buffer for the decoder. In the following, an independent GOP consists of pictures from an IDR picture (inclusive) to the next IDR picture (exclusive) in decoding order. In the present invention, a parameter is proposed that signals the maximum amount of required buffering. Several units for this parameter were considered: duration, bytes, coded pictures, frames, VCL NAL units, all types of NAL units, and RTP packets or payloads. Specifying the amount of disorder as a duration would cause a dependency between the bit rate and the specified duration in order to conclude the required amount of buffering in bytes. As the bit rate is generally not known, the duration-based approach is not used.
Specifying the amount of disorder as a number of bytes would require the transmitter to verify the transmitted stream carefully, so that the signaled limit would not be exceeded. This approach would require a considerable amount of processing power in all servers. It would also require the specification of a buffering verifier for the servers. Specifying the amount of disorder in coded pictures or frames would be too coarse a unit, because the simple slice interleaving method for decoders that do not support arbitrary slice order would require sub-picture resolution to achieve the minimum buffering wait time for decoding order recovery. Specifying the amount of disorder as a number of RTP packets was not considered adequate, because different types of aggregation packets may be used depending on the prevailing network conditions. Thus, an RTP packet may contain a varying amount of data. Moreover, different SEI messages may be transmitted depending on the prevailing network conditions. For example, under relatively poor conditions, it would be beneficial to transmit SEI messages that are targeted at error resilience, such as the scene information SEI message. Consequently, the amount of disorder as a number of all types of NAL units is a function of the prevailing network conditions, that is, of the number of SEI and parameter set NAL units being transmitted out of order. Therefore, "all types of NAL units" was not regarded as a good unit. Consequently, specifying the amount of disorder as a number of VCL NAL units was considered the best alternative. VCL NAL units are defined in the JVT coding standard to be coded slices, coded data partitions, or end-of-sequence markers. The proposed parameter is as follows: num-reorder-VCL-NAL-units.
This parameter specifies the maximum number of VCL NAL units that precede any VCL NAL unit in the packet stream in NAL unit delivery order and follow the VCL NAL unit in RTP sequence number order or in the composition order of the aggregation packet containing the VCL NAL unit. The proposed parameter can be conveyed as an optional parameter in a MIME type announcement or in the optional SDP fields. The proposed parameter may indicate the capability of the decoder or the characteristics of the stream, or both, depending on the protocol and on the phase of the session setup procedure. The buffer size of a buffer constructed in accordance with the num-reorder-VCL-NAL-units parameter cannot be specified precisely in bytes. In order to allow the design of receivers whose memory requirements are exactly known, the specification of decoding time conformance is proposed. Decoding time conformance is specified using a hypothetical buffering model that does not assume a constant input bit rate, but rather requires streaming servers to include the model in order to ensure that a transmitted data stream conforms to the model. The specified hypothetical buffer model smooths out the possibly bursty packet rate and rearranges the NAL units from transmission order to decoding order, so that the resulting bitstream can be input to a hypothetical decoder at a constant bit rate. In the following, the invention is explained using an encoder-decoder based system, although it is obvious that the invention can also be implemented in systems in which video signals are stored. The stored video signals may be uncoded signals stored before encoding, encoded signals stored after encoding, or decoded signals stored after the encoding and decoding process. For example, an encoder produces bitstreams in decoding order.
A file system receives the audio and/or video bitstreams, which are encapsulated, for example, in decoding order and stored as a file. In addition, the encoder and the file system can produce metadata that indicates the subjective importance of the pictures and NAL units and contains information on sub-sequences, inter alia. The file can be stored in a database from which a streaming server can read the NAL units and encapsulate them into RTP packets. Depending on the optional metadata and the data connection in use, the streaming server can modify the transmission order of the packets to differ from the decoding order, remove sub-sequences, decide which SEI messages will be transmitted, if any, and so on. At the receiving end, the RTP packets are received and buffered. Normally, the NAL units are first reordered into the correct order, and after that the NAL units are delivered to the decoder. Furthermore, in the following description, the invention is explained using an encoder-decoder based system, although it is obvious that the invention can also be implemented in systems where the encoder outputs and transmits the encoded data, in decoding order, to another component, such as a streaming server, wherein this other component reorders the encoded data from the decoding order into another order and sends the encoded data in its reordered form to the decoder. The method according to the present invention is primarily characterized in that the size of the buffer is defined such that the total size of at least two transmission units is determined and the maximum size of the buffer is defined based on the total size. The system according to the present invention is primarily characterized in that the system further comprises a definer that defines the size of the buffer such that the total size of at least two transmission units is determined and the maximum size of the buffer is defined based on the total size.
The encoder according to the present invention is primarily characterized in that the encoder further comprises a definer that defines the size of the buffer such that the total size of at least two transmission units is determined and the maximum size of the buffer is defined based on the total size. The decoder according to the present invention is primarily characterized in that the decoder further comprises a processor that allocates memory for the pre-decoding buffer according to a received parameter indicative of the size of the buffer, the size of the buffer being defined such that the total size of at least two transmission units is determined and the maximum size of the buffer is defined based on the total size. The transmitting device according to the present invention is primarily characterized in that the transmitting device further comprises a definer that defines the size of the buffer such that the total size of at least two transmission units is determined and the maximum size of the buffer is defined based on the total size. The receiving device according to the present invention is primarily characterized in that the receiving device further comprises a processor that allocates memory for the pre-decoding buffer according to a received parameter indicative of the size of the buffer, the size of the buffer being defined such that the total size of at least two transmission units is determined and the maximum size of the buffer is defined based on the total size.
The software program according to the present invention is primarily characterized in that the size of the buffer is defined such that the total size of at least two transmission units is determined and the maximum size of the buffer is defined based on the total size. The storage medium according to the present invention is primarily characterized in that the size of the buffer is defined such that the total size of at least two transmission units is determined and the maximum size of the buffer is defined based on the total size. The electronic device according to the present invention is primarily characterized in that the electronic device further comprises a definer that defines the size of the buffer such that the total size of at least two transmission units is determined and the maximum size of the buffer is defined based on the total size. Substitute signaling for the decoding order information in the video bitstream is presented below in accordance with an advantageous embodiment of the present invention.
The decoding order number (DON) indicates the decoding order of the NAL units, in other words, the delivery order of the NAL units to the decoder. In the following, DON is assumed to be a 16-bit unsigned integer, without loss of generality. Let the DON of one NAL unit be D1 and the DON of another NAL unit be D2. If D1 < D2 and D2 - D1 < 32768, or if D1 > D2 and D1 - D2 >= 32768, then the NAL unit having DON equal to D1 precedes the NAL unit having DON equal to D2 in NAL unit delivery order. If D1 < D2 and D2 - D1 >= 32768, or if D1 > D2 and D1 - D2 < 32768, then the NAL unit having DON equal to D2 precedes the NAL unit having DON equal to D1 in NAL unit delivery order. NAL units associated with different primary coded pictures do not have the same DON value. NAL units associated with the same primary coded picture may have the same DON value. If all the NAL units of a primary coded picture have the same DON value, the NAL units of a redundant coded picture associated with the primary coded picture shall have the same DON value as the NAL units of the primary coded picture. The NAL unit delivery order of NAL units having the same DON value is preferably as follows:
1. Picture delimiter NAL unit, if any
2. Sequence parameter set NAL units, if any
3. Picture parameter set NAL units, if any
4. SEI NAL units, if any
5. Coded slice and slice data partition NAL units of the primary coded picture, if any
6. Coded slice and slice data partition NAL units of redundant coded pictures, if any
7. Filler data NAL units, if any
8. End of sequence NAL unit, if any
9. End of stream NAL unit, if any.
The present invention improves the buffering efficiency of coding systems. By using the present invention it is possible to inform the decoding device how much pre-decoding buffering is required. Therefore, there is no need to allocate more memory for the pre-decoding buffer than is necessary in the decoding device; likewise, overflow of the pre-decoding buffer can be avoided.
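The 16-bit wraparound comparison of DON values described above can be transcribed compactly. This sketch is an illustration of the stated rule only; it assumes DON values in the range 0..65535.

```python
def don_precedes(d1, d2):
    """True if the NAL unit with DON d1 precedes the one with DON d2
    in NAL unit delivery order, per the wraparound rule: d1 precedes
    d2 when d1 < d2 and d2 - d1 < 32768, or when d1 > d2 and
    d1 - d2 >= 32768."""
    if d1 == d2:
        return False  # same DON: delivery order between them not defined here
    if d1 < d2:
        return d2 - d1 < 32768
    return d1 - d2 >= 32768
```

Note how wraparound is handled: DON 65535 precedes DON 0, because 65535 - 0 >= 32768, even though 65535 is numerically larger.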
Brief Description of the Figures Figure 1 shows an example of a recursive temporal scalability scheme, Figure 2 depicts a scheme referred to as Video Redundancy Coding, in which a sequence of pictures is divided into two or more independently coded threads in an interleaved manner, Figure 3 presents an example of a prediction structure potentially improving compression efficiency, Figure 4 presents an example of the intra picture postponement method that can be used to improve error resilience, Figures 5a and 5b depict different prior art numbering schemes for pictures of a coded video stream, Figure 6a depicts an example of a picture stream containing a sub-sequence in layer 1, Figure 6b depicts an example of a picture stream of two groups of pictures having sub-sequences in layer 1, Figure 6c depicts an example of a picture stream of different groups of pictures, Figure 7 depicts another example of a picture stream containing sub-sequences in layer 1, Figure 8 depicts an advantageous embodiment of the system according to the present invention, Figure 9 depicts an advantageous embodiment of the encoder according to the present invention, Figure 10 depicts an advantageous embodiment of the decoder according to the present invention, Figure 11a depicts an example of a NAL packet format that can be used with the present invention, Figure 11b depicts another example of a NAL packet format that can be used with the present invention, and Figure 12 shows an example of the buffering of transmission units in a pre-decoder buffer.
Detailed Description of the Invention The general concept behind the deinterleaving rules is to reorder transmission units, such as NAL units, from transmission order to NAL unit decoding order. The receiver includes a receiver buffer (or pre-decoder buffer), which is used to reorder packets from transmission order to NAL unit decoding order. In an example embodiment of the present invention, the size of the receiver buffer is set, in terms of the number of bytes, equal to or greater than the value of the deint-buf-size parameter, for example to a value 1.2 times the value of the deint-buf-size MIME parameter. The receiver may also take buffering for transmission delay jitter into account and either reserve a separate buffer for transmission delay jitter buffering or combine the buffer for transmission delay jitter with the receiver buffer (in which case some additional space is reserved in the receiver buffer for delay jitter buffering). The receiver stores incoming NAL units, in reception order, in the receiver buffer as follows. NAL units of aggregation packets are stored in the receiver buffer individually. The DON value is calculated and stored for all NAL units. Hereinafter, let N be the value of the optional num-reorder-VCL-NAL-units parameter (the interleaving-depth parameter), which specifies the maximum number of VCL NAL units that precede any VCL NAL unit in the packet stream in NAL unit transmission order and follow the VCL NAL unit in decoding order. If the parameter is not present, a value of 0 is implied. When the video streaming session is initialized, the receiver 8 allocates memory for the reception buffer 9.1 for storing at least N pieces of VCL NAL units.
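For illustration, the smallest interleaving depth N that a given transmission order requires can be computed directly from its definition: for each VCL NAL unit, count the units that precede it in transmission order but follow it in decoding order, and take the maximum. This hypothetical sketch ignores DON wraparound for simplicity.

```python
def reorder_depth(dons):
    """Smallest num-reorder-VCL-NAL-units value a stream satisfies.

    dons: decoding order numbers of the VCL NAL units listed in
    transmission order (wraparound not modeled in this sketch).
    For each unit, count the earlier-transmitted units with a larger
    DON (i.e. decoded later); return the maximum such count.
    """
    depth = 0
    for i, d in enumerate(dons):
        ahead = sum(1 for e in dons[:i] if e > d)
        depth = max(depth, ahead)
    return depth
```

A stream transmitted in decoding order yields a depth of 0, consistent with the implied value 0 when the parameter is absent; a receiver dimensioned for N units can then recover the decoding order of any stream whose depth does not exceed N.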
The receiver then begins receiving the video stream and stores the received VCL NAL units in the reception buffer. The initial buffering lasts:
- until at least N pieces of VCL NAL units are stored in the reception buffer 9.1, or
- if the max-don-diff MIME parameter is present, until the value of a function don_diff(m, n) is larger than the value of max-don-diff, where n corresponds to the NAL unit having the largest AbsDON value among the received NAL units and m corresponds to the NAL unit having the smallest AbsDON value among the received NAL units, or
- until the initial buffering has lasted for a duration equal to or greater than the value of the optional init-buf-time MIME parameter.
The function don_diff(m, n) is specified as follows:
If DON(m) == DON(n), don_diff(m, n) = 0
If (DON(m) < DON(n) and DON(n) - DON(m) < 32768), don_diff(m, n) = DON(n) - DON(m)
If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768), don_diff(m, n) = 65536 - DON(m) + DON(n)
If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768), don_diff(m, n) = -(DON(m) + 65536 - DON(n))
If (DON(m) > DON(n) and DON(m) - DON(n) < 32768), don_diff(m, n) = -(DON(m) - DON(n))
where DON(i) is the decoding order number of the NAL unit having index i in transmission order. A positive value of don_diff(m, n) indicates that the NAL unit having transmission order index n follows, in decoding order, the NAL unit having transmission order index m.
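As a non-normative illustration, the case analysis of don_diff can be sketched in Python (a minimal sketch assuming 16-bit DON values; the function name mirrors the specification):

```python
def don_diff(don_m: int, don_n: int) -> int:
    """don_diff(m, n) for 16-bit decoding order numbers (0..65535).

    A positive result means the NAL unit with transmission order
    index n follows the one with index m in decoding order.
    """
    if don_m == don_n:
        return 0
    if don_m < don_n and don_n - don_m < 32768:
        return don_n - don_m
    if don_m > don_n and don_m - don_n >= 32768:
        return 65536 - don_m + don_n
    if don_m < don_n and don_n - don_m >= 32768:
        return -(don_m + 65536 - don_n)
    # remaining case: don_m > don_n and don_m - don_n < 32768
    return -(don_m - don_n)
```

For example, don_diff(65530, 2) evaluates to 8, treating the wrap-around of the 16-bit DON as a small forward step in decoding order rather than a large backward one.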
AbsDON denotes a decoding order number of the NAL unit that does not wrap around to 0 after 65535. In other words, AbsDON is calculated as follows: Let m and n be consecutive NAL units in transmission order. For the first NAL unit in transmission order (whose index is 0), AbsDON(0) = DON(0). For the other NAL units, AbsDON is calculated as follows:
If DON(m) == DON(n), AbsDON(n) = AbsDON(m)
If (DON(m) < DON(n) and DON(n) - DON(m) < 32768), AbsDON(n) = AbsDON(m) + DON(n) - DON(m)
If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768), AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)
If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768), AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n))
If (DON(m) > DON(n) and DON(m) - DON(n) < 32768), AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))
where DON(i) is the decoding order number of the NAL unit having index i in transmission order. Normally there are two buffering states in the receiver: the initial buffering and the buffering while playing. The initial buffering occurs when the RTP session is initialized. After the initial buffering, decoding and playback are started and the buffering-while-playing mode is used. When the receiver buffer 9.1 contains at least N VCL NAL units, the NAL units are removed from the receiver buffer 9.1 one by one and passed to the decoder 2. The NAL units are not necessarily removed from the receiver buffer 9.1 in the same order in which they were stored, but according to the DON of the NAL units, as described below. The delivery of the packets to the decoder 2 is continued until the buffer contains less than N VCL NAL units, that is, N-1 VCL NAL units. The NAL units to be removed from the receiver buffer are determined as follows:
- If the receiver buffer contains at least N VCL NAL units, NAL units are removed from the receiver buffer and passed to the decoder in the order specified below until the buffer contains N-1 VCL NAL units.
- If max-don-diff is present, all NAL units m for which don_diff(m, n) is larger than max-don-diff are removed from the receiver buffer and passed to the decoder in the order specified below. Here, n corresponds to the NAL unit having the largest AbsDON value among the received NAL units.
- A variable ts is set to the value of a system timer that was initialized to 0 when the first packet of the NAL unit stream was received. If the receiver buffer contains a NAL unit whose reception time tr fulfills the condition ts - tr > init-buf-time, NAL units are passed to the decoder (and removed from the receiver buffer) in the order specified below until the receiver buffer contains no NAL unit whose reception time tr fulfills the specified condition.
The order in which the NAL units are passed to the decoder is specified as follows. Let PDON be a variable that is initialized to 0 at the beginning of an RTP session. For each NAL unit associated with a DON value, a DON distance is calculated as follows. If the DON value of the NAL unit is larger than the value of PDON, the DON distance is equal to DON - PDON. Otherwise, the DON distance is equal to 65535 - PDON + DON + 1. The NAL units are delivered to the decoder in ascending order of DON distance. If several NAL units share the same value of DON distance, they can be passed to the decoder in any order.
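A minimal Python sketch of this removal ordering (illustrative only; PDON and the 16-bit wrap follow the rules above):

```python
def don_distance(don: int, pdon: int) -> int:
    # DON distance of a buffered NAL unit relative to PDON (16-bit DONs)
    if don > pdon:
        return don - pdon
    return 65535 - pdon + don + 1

def decoder_delivery_order(dons: list[int], pdon: int = 0) -> list[int]:
    """Return the buffered DON values in the order in which they would
    be passed to the decoder: ascending DON distance from PDON.
    Units sharing a distance may go in any order; sorted() simply
    keeps their buffer order because it is stable."""
    return sorted(dons, key=lambda don: don_distance(don, pdon))
```

With PDON = 65534, for instance, a buffered DON of 65535 (distance 1) is delivered before a buffered DON of 1 (distance 3), which matches the intended decoding order across the wrap-around.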
When the desired number of NAL units has been passed to the decoder, the value of PDON is set to the DON value of the last NAL unit passed to the decoder. Additional De-packetization Principles The following additional de-packetization rules could be used to implement an operational H.264 de-packetizer: Intelligent RTP receivers (for example, in gateways) could identify lost data partitions A (DPA). If a lost DPA is found, a gateway could decide not to send the corresponding coded slice data partitions B and C, since their information is meaningless for H.264 decoders. In this way, a network element can reduce the network load by discarding useless packets without parsing a complex bit stream. Intelligent RTP receivers (for example, in gateways) could identify lost Fragmentation Units (FU). If a lost FU is found, a gateway could decide not to send the following FUs of the same NAL unit, since their information is meaningless for H.264 decoders. In this way, a network element can reduce the network load by discarding useless packets without parsing a complex bit stream.
Intelligent receivers that have to discard packets or NALUs could first discard all packets/NALUs in which the value of the NRI field of the NAL unit type octet is equal to 0. This could minimize the impact on the user experience. Next, a parameter that is used to indicate the maximum size of the buffer in the decoder will be described. The deint-buf-size parameter is normally not present when the packetization-mode parameter, indicative of the packetization mode, is not present or when the value of the packetization-mode parameter is equal to 0 or 1. This parameter must be present when the value of the packetization-mode parameter is equal to 2. The value of the deint-buf-size parameter is specified in association with the following hypothetical de-interleaving buffer model. In the beginning, the hypothetical de-interleaving buffer is empty and the maximum buffer occupancy m is set to 0. The following process is used in the model:
i) The next VCL NAL unit in transmission order is inserted into the hypothetical de-interleaving buffer.
ii) Let s be the total size of the VCL NAL units in the buffer in terms of bytes.
iii) If the value of s is larger than m, the value of m is set equal to s.
iv) If the number of VCL NAL units in the hypothetical de-interleaving buffer is less than or equal to the value of the interleaving depth, the process continues from step vii.
v) The earliest VCL NAL unit in decoding order among the VCL NAL units in the hypothetical de-interleaving buffer is determined from the DON values of the VCL NAL units according to section 5.5 of RFC XXXX.
vi) The earliest VCL NAL unit is removed from the hypothetical de-interleaving buffer.
vii) If there are no more VCL NAL units in transmission order, the process is terminated.
viii) The process continues from step i.
This parameter indicates the properties of a NAL unit stream or the capabilities of a receiver implementation.
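The steps above can be sketched as follows (a non-normative Python sketch; using the ascending DON distance from the previously removed DON to find the earliest unit in decoding order is an assumption based on the rules of section 5.5):

```python
def hypothetical_max_occupancy(units, interleaving_depth):
    """Run the hypothetical de-interleaving buffer model of steps
    i)-viii) over `units`, a list of (don, size_in_bytes) pairs in
    transmission order, and return m, the maximum buffer occupancy
    in bytes. A deint-buf-size of at least m then suffices for the
    stream."""
    buffered = []   # (don, size) pairs currently in the buffer
    m = 0
    pdon = 0        # previously removed DON, used to order removals
    for don, size in units:                      # step i)
        buffered.append((don, size))
        s = sum(sz for _, sz in buffered)        # step ii)
        m = max(m, s)                            # step iii)
        if len(buffered) > interleaving_depth:   # step iv)
            # steps v)-vi): remove the earliest unit in decoding
            # order, i.e. the one with the smallest DON distance
            def distance(entry):
                d = entry[0]
                return d - pdon if d > pdon else 65535 - pdon + d + 1
            earliest = min(buffered, key=distance)
            buffered.remove(earliest)
            pdon = earliest[0]
    return m                                     # steps vii)-viii)
```

For a stream transmitted as DONs 1, 3, 2 with sizes 100, 150 and 50 bytes and an interleaving depth of 1, the peak occupancy is 250 bytes, reached when units 1 and 3 are buffered together.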
When the parameter is used to indicate the properties of a NAL unit stream, the value of the parameter, referred to as v, is such that: a) the value of m resulting when the NAL unit stream is processed in its entirety by the hypothetical de-interleaving buffer model is less than or equal to v, or b) the order of the VCL NAL units determined by removing the earliest VCL NAL unit in decoding order from a de-interleaving buffer of size v whenever the buffer would otherwise overflow is the same as the order of removal of the VCL NAL units from the hypothetical de-interleaving buffer. Accordingly, it is ensured that receivers can reconstruct the VCL NAL unit decoding order when the size of the buffer for the recovery of the VCL NAL unit decoding order is at least the value of deint-buf-size in terms of bytes. When the parameter is used to signal the capabilities of a receiver implementation, the receiver is able to correctly reconstruct the VCL NAL unit decoding order of any NAL unit stream that is characterized by the same value of deint-buf-size. In other words, when the receiver buffers a number of bytes that is equal to or larger than the value of deint-buf-size, it is able to reconstruct the VCL NAL unit decoding order from the transmission order. Non-VCL NAL units should also be taken into account when determining the size of the de-interleaving buffer. When this parameter is present, a sufficient size of the de-interleaving buffer for all NAL units is less than or equal to 20% larger than the value of the parameter. If the parameter is not present, a value of 0 is used for the deint-buf-size parameter. The value of the deint-buf-size parameter is an integer in the range, for example, from 0 to 4294967295, inclusive. Next, the invention will be described in greater detail with reference to the system of Figure 8, the encoder 1 and the hypothetical reference decoder (HRD) 5 of Figure 9, and the decoder 2 of Figure 10.
The images to be encoded can be, for example, images of a video stream coming from a video source 3, for example, a camera, a video recorder, etc. The images (frames) of the video stream can be divided into smaller portions such as slices. The slices can further be divided into blocks. In the encoder 1 the video stream is encoded to reduce the information to be sent over a transmission channel 4, or to storage media (not shown). The images of the video stream are input to the encoder 1. The encoder has an encoding buffer 1.1 (Figure 9) for temporarily storing some of the images to be encoded. The encoder 1 also includes a memory 1.3 and a processor 1.2 in which the encoding tasks according to the invention can be applied. The memory 1.3 and the processor 1.2 may be common with the transmission device 6, or the transmission device 6 may have another processor and/or memory (not shown) for other functions of the transmission device 6. The encoder 1 performs motion estimation and/or some other tasks to compress the video stream. In motion estimation, similarities are sought between the image to be encoded (the current image) and a subsequent and/or previous image. If similarities are found, the compared image or a part of it can be used as a reference image for the image to be encoded. In JVT, the display order and the decoding order of the images are not necessarily the same, whereupon the reference image has to be stored in a buffer (for example, in the encoding buffer 1.1) as long as it is used as a reference image. The encoder 1 also inserts information on the display order of the images into the transmission stream. From the encoding process the encoded images are moved to an encoded image buffer 5.2, if necessary. The encoded images are sent from the encoder 1 to the decoder 2 via the transmission channel 4.
In the decoder 2 the encoded images are decoded to form uncompressed images corresponding as closely as possible to the encoded images. Each decoded image is stored temporarily in the DPB 2.1 of the decoder 2, unless it is displayed substantially immediately after decoding and is not used as a reference image. In the system according to the present invention both the reference image buffering and the display image buffering are combined, and they can use the same decoded image buffer 2.1. This eliminates the need to store the same images in two different places, and therefore the memory requirements of the decoder 2 are reduced. The decoder 2 also includes a memory 2.3 and a processor 2.2 in which the decoding tasks according to the invention can be applied. The memory 2.3 and the processor 2.2 may be common with the receiving device 8, or the receiving device 8 may have another processor and/or memory (not shown) for other functions of the receiving device 8.
Coding Now, let us consider the encoding-decoding process in greater detail. The images coming from the video source 3 are input to the encoder 1 and advantageously stored in the encoding buffer 1.1. The encoding process is not necessarily started immediately after the first image is input to the encoder, but after a certain number of images are available in the encoding buffer 1.1. Then the encoder 1 tries to find suitable candidates among the images to be used as reference frames. The encoder 1 then performs the encoding to form encoded images. The encoded images can be, for example, predicted images (P), bi-predictive images (B), and/or intra-coded images (I). Intra-coded images can be decoded without using any other images, whereas the other image types need at least one reference image before they can be decoded. Images of any of the mentioned image types can be used as reference images. The encoder advantageously attaches two timestamps to the images: the decoding timestamp (DTS) and the output timestamp (OTS). The decoder can use the timestamps to determine the correct decoding time and the time to output (display) the images. However, these timestamps are not necessarily transmitted to the decoder, or the decoder does not necessarily use them. The encoder also forms sub-sequences in one or more layers above the lowest layer 0. The images in layer 0 can be decoded independently, whereas the images in the higher layers may depend on images in one or more of the lower layers. In the example of Figure 6a there are two layers: layer 0 and layer 1. Images I0, P6 and P12 belong to layer 0, while the other images P1-P5, P7-P11 shown in Figure 6a belong to layer 1. Advantageously, the encoder forms groups of images (GOP) so that each image of a GOP can be reconstructed using only the images in the same GOP.
In other words, a GOP contains at least one image that can be decoded independently, and all the other images for which the independently decodable image is a reference image. In the example of Figure 7 there are two groups of images. The first group of images includes images I0(0), P1(0), P3(0) in layer 0, and images B2(0), 2xB3(0), B4(0), 2xB5(0), B6(0), P5(0), P6(0) in layer 1. The second group of images includes images I0(1) and P1(1) in layer 0, and images 2xB3(1) and B2(1) in layer 1. The images in layer 1 of each group of images are further arranged as sub-sequences. The first sub-sequence of the first group of images contains the images B3(0), B2(0), B3(0), the second sub-sequence contains the images B5(0), B4(0), B5(0), and the third sub-sequence contains the images B6(0), P5(0), P6(0). The sub-sequence of the second group of images contains the images B3(1), B2(1), B3(1). The numbers in parentheses indicate the video sequence ID defined for the group of images to which the image belongs. The video sequence ID is transferred for each image. It can be transported within the video bit stream, such as in the Supplemental Enhancement Information data. The video sequence ID can also be transmitted in the header fields of the transport protocol, such as within the RTP payload header of the JVT coding standard. The video sequence ID according to the presented partitioning into independent GOPs can be stored in the metadata of the video file format, such as in the MPEG-4 AVC file format. Figures 11a and 11b describe examples of the NAL packet formats that can be used with the present invention. The packet contains a header 11 and a payload part 12. The header 11 advantageously contains an error indicator field 11.1 (F, Forbidden), a priority field 11.2, and a type field 11.3. The error indicator field 11.1 indicates a NAL unit free of bit errors. Advantageously, when the error indicator field is set, the decoder is warned that bit errors may be present in the payload or in the NALU type octet. Decoders that are unable to handle bit errors can then discard such packets. The priority field 11.2 is used to indicate the importance of the image encapsulated in the payload part 12 of the packet. In an example implementation, the priority field can have four different values as follows. A value of 00 indicates that the content of the NALU is not used to reconstruct stored images (which can be used as future references). Such NALUs can be discarded without risking the integrity of the reference images. Values above 00 indicate that the decoding of the NALU is required to maintain the integrity of the reference images. In addition, values above 00 indicate the relative transport priority, which is determined by the encoder. Intelligent network elements can use this information to protect more important NALUs better than less important NALUs. The value 11 is the highest transport priority, followed by 10, then 01, and finally 00 is the lowest. The payload part 12 of the NALU contains at least a video sequence ID field 12.1, a field indicator 12.2, a size field 12.3, timing information 12.4 and encoded image information 12.5. The video sequence ID field 12.1 is used to store the number of the video sequence to which the image belongs. The field indicator 12.2 is used to signal whether the image is a first or a second field when a two-field image format is used. Both fields can be coded as separate images. The first field indicator equal to 1 advantageously indicates that the NALU belongs to a coded frame or to a coded field that precedes the second coded field of the same frame in decoding order.
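As an illustration, the header fields described above can be unpacked from a single octet; the bit widths assumed in this sketch (1-bit F, 2-bit priority, 5-bit type) are an assumption for this packet format:

```python
def parse_nalu_header(octet: int) -> dict:
    """Split a NALU header octet into the fields described above:
    the error indicator F (1 bit), the priority field (2 bits,
    00 = discardable, 11 = highest transport priority) and the
    type field (5 bits)."""
    return {
        "error_indicator": (octet >> 7) & 0x1,
        "priority": (octet >> 5) & 0x3,
        "type": octet & 0x1F,
    }
```

A receiver applying the discard rule above would drop, under overload, those NALUs whose parsed priority is 0 first.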
The first field indicator equal to 0 indicates that the NALU belongs to a coded field that succeeds the first coded field of the same frame in decoding order. The timing information field 12.4 is used for transporting time-related information, if necessary. NAL units can be delivered in different kinds of packets. In this advantageous embodiment, the different packet formats include simple packets and aggregation packets. The aggregation packets can further be divided into single-time aggregation packets and multi-time aggregation packets. A simple packet consists of one NALU. A NAL unit stream composed by de-encapsulating the simple packets in RTP sequence number order should conform to the NAL unit delivery order. This payload specification is introduced to reflect the dramatically different MTU sizes of two different types of networks: wireline IP networks (with an MTU size that is often limited by the Ethernet MTU size, approximately 1500 bytes), and IP-based or non-IP wireless networks (e.g. H.324/M) with preferred transmission unit sizes of 254 bytes or less. To prevent media transcoding between the two worlds, and to avoid undesirable packetization overhead, a packet aggregation scheme is introduced. The single-time aggregation packet (STAP) aggregates NALUs with identical NALU times. Respectively, the multi-time aggregation packet (MTAP) aggregates NALUs with potentially different NALU times. Two different MTAPs are defined, which differ in the length of the NALU timestamp offset. The term NALU time is defined as the value the RTP timestamp would have if that NALU were transported in its own RTP packet. The MTAP and STAP share the following non-limiting packetization rules according to an advantageous embodiment of the present invention. The RTP timestamp must be set to the minimum of the NALU times of all the NALUs to be aggregated.
The type field of the NALU type octet must be set to the appropriate value as indicated in Table 1. The error indicator field 11.1 must be cleared if all the error indicator fields of the aggregated NALUs are zero; otherwise it must be set. Table 1
The NALU payload of an aggregation packet contains one or more aggregation units. An aggregation packet can carry as many aggregation units as necessary; however, the total amount of data in an aggregation packet obviously has to fit into an IP packet, and the size should be chosen so that the resulting IP packet is smaller than the MTU size. The single-time aggregation packet (STAP) should be used whenever the aggregated NALUs share the same NALU time. The NALU payload of an STAP consists of the video sequence ID field 12.1 (for example, 7 bits) and the field indicator 12.2, followed by single-picture aggregation units (SPAU). In another alternative embodiment, the NALU payload of a single-time aggregation packet (STAP) consists of a 16-bit unsigned decoding order number (DON) followed by single-picture aggregation units (SPAU). A video sequence according to this specification can be any part of the NALU stream that can be decoded independently of the other parts of the NALU stream. A frame consists of two fields that can be coded as separate images. The first field indicator equal to 1 indicates that the NALU belongs to a coded frame or to a coded field that precedes the second coded field of the same frame in decoding order. The first field indicator equal to 0 indicates that the NALU belongs to a coded field that succeeds the first coded field of the same frame in decoding order. A single-picture aggregation unit consists, for example, of unsigned 16-bit size information indicating the size of the following NALU in bytes (excluding these two octets but including the NALU type octet), followed by the NALU itself, including its NALU byte. A multi-time aggregation packet (MTAP) has an architecture similar to that of an STAP. It consists of a NALU header byte and one or more multi-picture aggregation units.
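A minimal sketch of the alternative STAP layout described above (16-bit DON followed by size-prefixed single-picture aggregation units); the NALU type octet value passed in for the packet is purely illustrative and is an assumption of this sketch:

```python
import struct

def build_stap(stap_type_octet: int, don: int, nalus: list[bytes]) -> bytes:
    """Build an STAP: the packet's NALU type octet, an unsigned
    16-bit decoding order number (DON), then each aggregated NALU
    prefixed with its unsigned 16-bit size. The size excludes the
    two size octets but includes the NALU's own header byte."""
    packet = bytearray([stap_type_octet])
    packet += struct.pack(">H", don & 0xFFFF)     # big-endian 16-bit DON
    for nalu in nalus:
        packet += struct.pack(">H", len(nalu)) + nalu
    return bytes(packet)
```

The total length of the result must still be checked against the MTU before the packet is sent, per the rule above.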
The choice between the different MTAP types depends on the application: the larger the timestamp offset, the greater the flexibility of the MTAP, but also the greater the overhead. Two different multi-time aggregation units are defined in this specification. Both of them consist, for example, of unsigned 16-bit size information of the following NALU (the same as the size information in the STAP). In addition to these 16 bits there are also the video sequence ID field 12.1 (for example, 7 bits), the field indicator 12.2, and n bits of timing information for this NALU, where n can be, for example, 16 or 24. The timing information field has to be set so that the RTP timestamp of an RTP packet of each NALU in the MTAP (the NALU time) can be generated by adding the timing information to the RTP timestamp of the MTAP. In another alternative embodiment, the multi-time aggregation packet (MTAP) consists of the NALU header byte, a decoding order number base (DONB) field 12.1 (for example, 16 bits), and one or more multi-picture aggregation units. The two different multi-picture aggregation units are defined in this case as follows. Both of them consist, for example, of unsigned 16-bit size information of the following NALU (the same as the size information of the STAP). In addition to these 16 bits there is also the decoding order number delta (DOND) field 12.5 (for example, 7 bits), and n bits of timing information for this NALU, where n can be, for example, 16 or 24. The DON of the following NALU is equal to DONB + DOND. The timing information field has to be set so that the RTP timestamp of an RTP packet of each NALU in the MTAP (the NALU time) can be generated by adding the timing information to the RTP timestamp of the MTAP. The DONB must contain the smallest DON value among the NAL units of the MTAP.
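The per-unit reconstruction described above can be sketched as follows (non-normative; modular wrap of the 16-bit DON and of the 32-bit RTP timestamp is assumed):

```python
def mtap_don(donb: int, dond: int) -> int:
    # DON of an aggregated NALU: the packet-level base (DONB) plus the
    # per-unit delta (DOND), wrapped to the 16-bit DON range
    return (donb + dond) % 65536

def mtap_nalu_time(mtap_rtp_timestamp: int, timing_offset: int) -> int:
    # NALU time: the MTAP's RTP timestamp plus the per-unit timing
    # information, in 32-bit RTP timestamp arithmetic
    return (mtap_rtp_timestamp + timing_offset) % (1 << 32)
```

Because DONB carries the smallest DON in the packet and DOND is non-negative, every aggregated unit's DON is recovered by a forward offset from the base.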
The behavior of the buffering model according to the present invention is advantageously controlled with the following parameters: the initial input period (for example, in clock ticks of a 90-kHz clock) and the size of the hypothetical packet input buffer (for example, in bytes). Preferably, the default initial input period and the default size of the hypothetical packet input buffer are 0. PSS clients could signal their capability of providing a larger buffer during the capability exchange process. The maximum video bit rate can be signaled, for example, in the media-level bandwidth attribute of SDP, or in a dedicated SDP parameter. If the video-level bandwidth attribute is not present in the presentation description, the maximum video bit rate is defined according to the video coding profile and level in use. The initial parameter values for each stream can be signaled within the SDP description of the stream, for example, using MIME type parameters or similar non-standard SDP parameters. The signaled parameter values override the corresponding default parameter values. The values signaled in the SDP description guarantee pauseless playback from the beginning of the stream until its end (assuming a reliable transmission channel of constant delay). The PSS servers could update the parameter values in the response to an RTSP PLAY request. If an updated parameter value is present, it has to replace the value signaled in the SDP description or the default parameter value in the operation of the PSS buffering model. An updated parameter value is valid only in the indicated playback range, and it has no effect thereafter. Assuming a reliable constant-delay transmission channel, the updated parameter values guarantee pauseless playback of the actual range indicated in the response to the PLAY request.
The signaled size of the hypothetical packet input buffer and the initial input period must be smaller than or equal to the corresponding values in the SDP description or the corresponding default values, whichever are valid. The buffering verifier of the server is specified according to a specific buffering model. The model is based on a hypothetical packet input buffer. The buffering model is presented below. The buffer is initially empty. A PSS server adds each transmitted RTP packet having video payload to the hypothetical packet input buffer 1.1 immediately when it is transmitted. All protocol headers at RTP or any lower layer are removed. Data is not removed from the hypothetical packet input buffer during a period called the initial input period. The initial input period starts when the first RTP packet is added to the hypothetical packet input buffer. When the initial input period has expired, removal of data from the hypothetical packet input buffer is started. The removal of data happens, advantageously, at the maximum video bit rate, unless the hypothetical packet input buffer 1.1 is empty. The data removed from the hypothetical packet input buffer 1.1 is input to the hypothetical reference decoder 5. The hypothetical reference decoder 5 performs the hypothetical decoding process to ensure that the encoded video stream can be decoded according to the established parameters; if the hypothetical reference decoder 5 observes that, for example, the image buffer 5.2 of the hypothetical reference decoder 5 overflows, the buffering parameters can be modified. In that case, the new parameters are also transmitted to the receiving device 8, in which the buffers are reinitialized accordingly.
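A coarse Python sketch of this buffering model (illustrative only; times in seconds and the maximum video bit rate in bytes per second are assumed units, and draining is approximated between packet arrival events):

```python
def peak_input_buffer_occupancy(packets, initial_input_period, max_rate):
    """Simulate the hypothetical packet input buffer: `packets` is a
    list of (transmission_time, size_bytes) pairs in transmission
    order. No data is removed during the initial input period; after
    it expires the buffer drains at `max_rate` bytes per second
    whenever it is non-empty. Returns the peak occupancy in bytes."""
    occupancy = 0.0
    peak = 0.0
    drain_from = packets[0][0] + initial_input_period
    last_event = drain_from
    for t, size in packets:
        if t > last_event:
            # drain since the previous event, but never below empty
            occupancy = max(0.0, occupancy - (t - last_event) * max_rate)
            last_event = t
        occupancy += size
        peak = max(peak, occupancy)
    return peak
```

A server-side verifier in the spirit of the requirements below would check that this peak never exceeds the default or signaled buffer size.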
The encoding and transmitting device 1, such as a PSS server, has to verify that the transmitted RTP packet stream complies with the following requirements:
- The buffering model has to be used with the default or signaled buffering parameter values. The signaled parameter values override the corresponding default parameter values.
- The occupancy of the hypothetical packet input buffer must not exceed the default or signaled size of the buffer.
- The output bit stream of the hypothetical packet input buffer must conform to the definitions of the hypothetical reference decoder.
When the buffering model is in use, the PSS client must be able to receive an RTP packet stream that complies with the buffering verifier of the PSS server, when the RTP packet stream is carried over a reliable constant-delay transmission channel. In addition, the decoder of the PSS client must output frames at the correct rate defined by the RTP timestamps of the received packet stream. Transmission The transmission and/or storage of the encoded images (and the optional virtual decoding) can be started immediately after the first encoded image is ready. This image is not necessarily the first one in decoder output order, because the decoding order and the output order may not be the same. When the first image of the video stream has been encoded, the transmission can be started. The encoded images are optionally stored in the encoded image buffer 5.2. The transmission can also start at a later stage, for example, after a certain part of the video stream has been encoded. The decoder 2 must also output the decoded images in the correct order, for example, by using the ordering of the picture order counts, and hence the reordering process has to be defined clearly and normatively.
De-packetization The de-packetization process is implementation dependent. Hence, the following description is a non-restrictive example of a suitable implementation. Other schemes can be used as well. Optimizations relative to the described algorithms are likely possible. The general concept behind these de-packetization rules is to reorder the NAL units from transmission order into the NAL unit delivery order. Decoding Next, the operation of the receiver 8 will be described. The receiver 8 collects all packets belonging to an image, bringing them into a reasonable order. The strictness of the order depends on the profile used. The received packets are stored in the reception buffer 9.1 (the pre-decoding buffer). The receiver 8 discards anything that is unusable and passes the rest to the decoder 2. Aggregation packets are handled by unloading their payload into the individual NALUs they carry. Those NALUs are processed as if they had been received in separate RTP packets, in the order in which they were arranged in the aggregation packet. From here on, let N be the value of the optional MIME type parameter num-reorder-VCL-NAL-units, which specifies the maximum number of VCL NAL units that precede any VCL NAL unit in the packet stream in NAL unit delivery order and follow that VCL NAL unit in RTP sequence number order or in the composition order of the aggregation packet containing the VCL NAL unit. If the parameter is not present, a value of 0 is implied. When the video stream transfer session is initialized, the receiver 8 allocates memory for the reception buffer 9.1 for storing at least N pieces of VCL NAL units. The receiver then starts receiving the video stream and stores the received VCL NAL units in the reception buffer, until at least N pieces of VCL NAL units are stored in the reception buffer 9.1.
When the receiver buffer 9.1 contains at least N VCL NAL units, NAL units are removed from the receiver buffer 9.1 one by one and passed to the decoder 2. The NAL units are not necessarily removed from the receiver buffer 9.1 in the same order in which they were stored, but according to the video sequence ID of the NAL units, as described below. The delivery of packets to the decoder 2 continues until the buffer contains fewer than N VCL NAL units, that is, N-1 VCL NAL units. Figure 12 shows an example of buffering of the transmission units in the pre-decoding buffer of the decoder. The numbers refer to the decoding order, while the order of the transmission units refers to the transmission order (and also to the reception order). From here on, let PVSID be the video sequence ID (VSID) of the last NAL unit passed to the decoder. All NAL units in an STAP share the same VSID. The order in which NAL units are passed to the decoder is specified as follows: if the oldest RTP sequence number in the buffer corresponds to a single NAL unit packet, the NAL unit in that packet is the next NAL unit in NAL unit delivery order. If the oldest RTP sequence number in the buffer corresponds to an aggregation packet, the NAL unit delivery order is recovered among the NAL units carried in aggregation packets, in RTP sequence-number order, up to the next single NAL unit packet (exclusive). This set of NAL units is referred to hereinafter as the candidate NAL units. If no NAL units carried in single NAL unit packets reside in the buffer, all NAL units belong to the candidate NAL units. For each NAL unit among the candidate NAL units, the VSID distance is calculated as follows: if the VSID of the NAL unit is larger than PVSID, the VSID distance is equal to VSID - PVSID; otherwise, the VSID distance is equal to 2^(number of bits used to signal VSID) - PVSID + VSID.
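The VSID distance is a modular (wrap-around) difference. A minimal sketch, assuming an 8-bit VSID field (the field width is an assumption made only for this illustration):

```python
def vsid_distance(vsid, pvsid, bits=8):
    """VSID - PVSID if VSID > PVSID, else 2**bits - PVSID + VSID."""
    if vsid > pvsid:
        return vsid - pvsid
    return (1 << bits) - pvsid + vsid

# Candidate NAL units are delivered in ascending VSID distance; the
# wrap-around keeps the ordering correct when the counter overflows.
pvsid = 250
candidates = [250, 3, 255, 1]   # VSIDs of candidate NAL units
ordered = sorted(candidates, key=lambda v: vsid_distance(v, pvsid))
print(ordered)  # [255, 1, 3, 250]
```

The distance of a unit whose VSID equals PVSID is 2^bits (the maximum), which pushes units of the sequence already being decoded to the end of the candidate order.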
The NAL units are delivered to the decoder in ascending order of the VSID distance. If several NAL units share the same VSID distance, the order in which they are passed to the decoder must conform to the NAL unit delivery order defined in this specification, which can be recovered as described below. First, the slices and data partitions are associated with images according to their frame numbers, RTP timestamps and first-field flags: all NAL units that share the same values of frame number, RTP timestamp and first-field flag belong to the same image. SEI NAL units, sequence parameter set NAL units, picture parameter set NAL units, picture delimiter NAL units, end-of-sequence NAL units, end-of-stream NAL units, and filler data NAL units belong to the image of the next VCL NAL unit in transmission order.
Secondly, the delivery order of the images is concluded based on nal_ref_idc, the frame number, the first-field flag and the RTP timestamp of each image. The delivery order of the images is in ascending order of frame numbers (in modulo arithmetic). If several images share the same frame number value, the image(s) having nal_ref_idc equal to 0 are delivered first. If several images share the same frame number value and all have nal_ref_idc equal to 0, the images are delivered in ascending order of RTP timestamp.
If two images share the same RTP timestamp, the image having the first-field flag equal to 1 is delivered first. It is noted that a primary coded image and the corresponding redundant coded images are considered here as one coded image. Third, if the video decoder in use does not support Arbitrary Slice Ordering, the delivery order of slices and data partitions A is the ascending order of the syntax element first_mb_in_slice in the slice header. In addition, data partitions B and C immediately follow the corresponding data partition A in delivery order.
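The image delivery rules above can be condensed into a sort key. This is a simplified sketch: the field names and the 4-bit frame-number width are assumptions, the modular frame-number comparison is expressed as a distance from the previously delivered frame number, and the timestamp tie-break is applied slightly more broadly than the text's exact conditions:

```python
def frame_num_dist(frame_num, prev, bits=4):
    # wrap-around distance in modulo arithmetic (field width assumed)
    return (frame_num - prev) % (1 << bits)

def delivery_key(pic, prev_frame_num):
    return (
        frame_num_dist(pic["frame_num"], prev_frame_num),
        pic["nal_ref_idc"] != 0,   # nal_ref_idc == 0 images first
        pic["rtp_ts"],             # then ascending RTP timestamp
        not pic["first_field"],    # first-field image first
    )

pics = [
    {"frame_num": 2, "nal_ref_idc": 1, "rtp_ts": 900, "first_field": False},
    {"frame_num": 2, "nal_ref_idc": 0, "rtp_ts": 900, "first_field": False},
    {"frame_num": 1, "nal_ref_idc": 1, "rtp_ts": 0,   "first_field": True},
]
ordered = sorted(pics, key=lambda p: delivery_key(p, prev_frame_num=1))
print([p["frame_num"] for p in ordered])  # [1, 2, 2]
```

Among the two images with frame number 2, the non-reference image (nal_ref_idc equal to 0) sorts first, as the text requires.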
In the above, the terms PVSID and VSID were used. The terms PDON (the decoding order number, in NAL unit delivery order, of the previous NAL unit of an aggregation packet) and DON (the decoding order number) can be used instead, as follows. Let the PDON of the first NAL unit passed to the decoder be 0. The order in which NAL units are passed to the decoder is specified as follows: if the oldest RTP sequence number in the buffer corresponds to a single NAL unit packet, the NAL unit in that packet is the next NAL unit in NAL unit delivery order. If the oldest RTP sequence number in the buffer corresponds to an aggregation packet, the NAL unit delivery order is recovered among the NAL units carried in aggregation packets, in RTP sequence-number order, up to the next single NAL unit packet (exclusive). This set of NAL units is referred to hereinafter as the candidate NAL units. If no NAL units carried in single NAL unit packets reside in the buffer, all NAL units belong to the candidate NAL units. For each NAL unit among the candidate NAL units, the DON distance is calculated as follows: if the DON of the NAL unit is larger than PDON, the DON distance is equal to DON - PDON; otherwise, the DON distance is equal to 2^(number of bits used to represent DON and PDON as an unsigned integer) - PDON + DON. The NAL units are delivered to the decoder in ascending order of the DON distance. If several NAL units share the same DON distance, the order in which they are passed to the decoder is: 1. picture delimiter NAL unit, if any; 2. sequence parameter set NAL units, if any; 3. picture parameter set NAL units, if any; 4. SEI NAL units, if any; 5. coded slice and slice data partition NAL units of the primary coded image, if any; 6. coded slice and slice data partition NAL units of the redundant coded images, if any; 7. filler data NAL units, if any; 8. end-of-sequence NAL unit, if any; 9. end-of-stream NAL unit, if any. If the video decoder in use does not support Arbitrary Slice Ordering, the delivery order of slices and data partitions A is the ascending order of the syntax element first_mb_in_slice in the slice header. In addition, data partitions B and C immediately follow the corresponding data partition A in delivery order. The following additional depacketization rules could be used to implement an operational JVT de-packetizer: NAL units are presented to the JVT decoder in RTP sequence-number order. NAL units carried in an aggregation packet are presented in their order within the aggregation packet, and all NAL units in an aggregation packet are processed before the next RTP packet is processed. Intelligent RTP receivers (for example, in gateways) could identify lost data partitions A (DPAs). If a lost DPA is found, the gateway may decide not to send the corresponding DPB and DPC partitions, since their information is meaningless to the JVT decoder. In this way, a network element can reduce the network load by discarding useless packets without analyzing a complex bitstream. Intelligent receivers could also discard all packets that have a NAL reference Idc of 0. However, they should process those packets if possible, because the user experience could suffer if the packets were discarded. The DPB 2.1 contains memory locations for storing a number of images; these locations are also referred to as frame stores in this description. The decoder 2 decodes the received images in the correct order. To do so, the decoder examines the video sequence ID information of the received images. If the encoder selected the video sequence ID for each group of images freely, the decoder decodes the images of a group of images in the order in which they are received. If the encoder defined the video sequence ID for each group of images using the incrementing (or decrementing) numbering scheme, the decoder decodes the groups of images in the order of their video sequence IDs.
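The DON-distance ordering with the nine-way tie-break can be sketched as follows. The 16-bit field width and the type names are illustrative assumptions, not values fixed by this document:

```python
# Tie-break rank for NAL units sharing the same DON distance, following
# the numbered list above (a smaller rank is delivered first).
TYPE_RANK = {
    "picture_delimiter": 1, "seq_param_set": 2, "pic_param_set": 3,
    "sei": 4, "primary_slice": 5, "redundant_slice": 6,
    "filler": 7, "end_of_seq": 8, "end_of_stream": 9,
}

def don_distance(don, pdon, bits=16):
    """DON - PDON if DON > PDON, else 2**bits - PDON + DON."""
    if don > pdon:
        return don - pdon
    return (1 << bits) - pdon + don

def delivery_order(nalus, pdon):
    """nalus: list of (don, type_name); returns them in decoder order."""
    return sorted(nalus, key=lambda n: (don_distance(n[0], pdon),
                                        TYPE_RANK[n[1]]))

nalus = [(5, "sei"), (5, "seq_param_set"), (4, "primary_slice")]
print(delivery_order(nalus, pdon=3))
# [(4, 'primary_slice'), (5, 'seq_param_set'), (5, 'sei')]
```

The two units with DON 5 tie on distance, so the sequence parameter set precedes the SEI unit per the tie-break list.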
In other words, the group of images having the smallest (or largest) video sequence ID is decoded first. The present invention can be applied in many types of systems and devices. The transmission device 6, including the encoder 1 and optionally the HRD 5, advantageously comprises a transmitter 7 that sends the encoded images to the transmission channel 4. The receiving device 8 includes the receiver 9 that receives the encoded images, the decoder 2, and a display 10 on which the decoded images can be presented. The transmission channel can be, for example, a fixed-line communication channel and/or a wireless communication channel. The transmission device and the receiving device also include one or more processors 1.2, 2.2 that can carry out the steps necessary to control the encoding/decoding process of the video stream according to the invention. Therefore, the method according to the present invention can mainly be implemented as machine-executable steps of the processors. The buffering of the images can be implemented in the memory 1.3, 2.3 of the devices. The program code 1.4 of the encoder can be stored in the memory 1.3; correspondingly, the program code 2.4 of the decoder can be stored in the memory 2.3. It is obvious that the hypothetical reference decoder 5 can be located after the encoder 1, so that the hypothetical reference decoder 5 rearranges the encoded images, if necessary, and can ensure that the pre-decoding buffer of the receiver 8 does not overflow. The present invention can be implemented in the buffering verifier, which can be part of the hypothetical reference decoder 5 or separate from it. It is noted that, as of this date, the best method known to the applicant for carrying out said invention is that which is clear from the present description of the invention.
Claims (15)
CLAIMS Having described the invention as above, the content of the following claims is claimed as property: 1. A method for buffering encoded images, the method comprising an encoding step that forms the encoded images in an encoder, a transmission step that sends the encoded images to a decoder as transmission units, a buffering step that temporarily stores the transmission units sent to the decoder in a buffer, and a decoding step that decodes the encoded images to form decoded images, characterized in that the size of the buffer is defined such that the total size of at least two transmission units is defined and the maximum size of the buffer is defined on the basis of that total size. 2. The method according to claim 1, characterized in that the number of transmission units used in the calculation of the total size is a fractional number of the required size of the buffer in terms of the number of transmission units. 3. The method according to claim 1, characterized in that the number of transmission units used in the calculation of the total size is a fractional number of the required size of the buffer in terms of the number of transmission units, wherein the fractional number is of the form 1/N, where N is an integer. 4. The method according to claim 1, characterized in that the number of transmission units used in the calculation of the total size is the same as the required size of the buffer in terms of the number of transmission units. 5. The method according to claim 1, characterized in that the number of transmission units used in the calculation of the total size is expressed in the buffering order of the transmission units. 6. 
A system comprising an encoder that encodes images, a transmitter that sends the encoded images to a decoder as VCL NAL units, and a decoder that decodes the encoded images to form decoded images, the decoder including a buffer for storing the transmission units sent to the decoder, further characterized in that it comprises a definer that specifies the size of the buffer, such that the total size of at least two transmission units is defined and the maximum size of the buffer is defined on the basis of that total size. 7. An encoder that encodes images, comprising a transmitter that sends the encoded images to a decoder as transmission units for buffering in a buffer and for decoding, further characterized in that it comprises a definer that specifies the size of the buffer, such that the total size of at least two transmission units is defined and the maximum size of the buffer is defined on the basis of that total size. 8. The encoder according to claim 7, characterized in that it comprises a buffer that temporarily stores the encoded images, and a hypothetical reference decoder that determines the buffering requirements for the decoding of the encoded images. 9. A decoder that decodes encoded images to form decoded images, including a pre-decoding buffer for buffering the encoded images received for decoding, further characterized in that it comprises a processor that allocates memory for the pre-decoding buffer according to a received parameter indicative of the size of the buffer, the size of the buffer being defined such that the total size of at least two transmission units is defined and the maximum size of the buffer is defined on the basis of that total size. 10. 
A software program comprising machine-executable steps for performing a method for buffering encoded images, the method comprising an encoding step that forms the encoded images in an encoder, a transmission step that sends the encoded images to a decoder as transmission units, a buffering step that temporarily stores the transmission units sent to the decoder in a buffer, and a decoding step that decodes the encoded images to form decoded images, characterized in that the size of the buffer is defined such that the total size of at least two transmission units is defined and the maximum size of the buffer is defined on the basis of that total size. 11. A storage medium storing a software program comprising machine-executable steps for performing a method for buffering encoded images, the method comprising an encoding step that forms the encoded images in an encoder, a transmission step that sends the encoded images to a decoder as transmission units, a buffering step that temporarily stores the transmission units sent to the decoder in a buffer, and a decoding step that decodes the encoded images to form decoded images, characterized in that the size of the buffer is defined such that the total size of at least two transmission units is defined and the maximum size of the buffer is defined on the basis of that total size. 12. An electronic device comprising an encoder that encodes images and a transmitter that sends the encoded images to a decoder as transmission units for buffering in a buffer and for decoding, further characterized in that it comprises a definer that specifies the size of the buffer, such that the total size of at least two transmission units is defined and the maximum size of the buffer is defined on the basis of that total size. 13. 
A signal comprising images encoded as transmission units, for which the buffering requirements for the decoding of the encoded images are determined, characterized in that the signal carries a parameter indicative of the size of the buffer, such that the total size of at least two transmission units is defined and the maximum size of the buffer is defined on the basis of that total size, the parameter being linked with the signal. 14. A transmission device comprising an encoder that encodes images and a transmitter that sends the encoded images to a decoder as transmission units for buffering in a buffer and for decoding, further characterized in that it comprises a definer that specifies the size of the buffer, such that the total size of at least two transmission units is defined and the maximum size of the buffer is defined on the basis of that total size. 15. A reception device comprising a decoder that decodes encoded images to form decoded images and a pre-decoding buffer for buffering the encoded and received images for decoding, further characterized in that it comprises a processor that allocates memory for the pre-decoding buffer according to a received parameter indicative of the size of the buffer, the size of the buffer being defined such that the total size of at least two transmission units is defined and the maximum size of the buffer is defined on the basis of that total size.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US60/544,598 | 2004-02-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
MXPA06009109A true MXPA06009109A (en) | 2007-04-10 |
Family
ID=
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2556120C (en) | Resizing of buffer in encoder and decoder | |
EP1595404B1 (en) | Picture decoding method | |
US7403660B2 (en) | Encoding picture arrangement parameter in picture bitstream | |
KR100711635B1 (en) | Image coding method | |
CN1801944B (en) | Method and device for coding and decoding video | |
US20040218669A1 (en) | Picture coding method | |
MXPA06009109A (en) | Resizing of buffer in encoder and decoder |