CN114095733A - Metadata processing method in video transcoding, video transcoding equipment and electronic equipment - Google Patents

Info

Publication number
CN114095733A
Authority
CN
China
Prior art keywords
metadata
video stream
transcoding
dolby
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110968111.1A
Other languages
Chinese (zh)
Other versions
CN114095733B (en)
Inventor
尼尔·古恩
王丛中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rongming Microelectronics Jinan Co ltd
Original Assignee
Rongming Microelectronics Jinan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rongming Microelectronics Jinan Co ltd
Priority to CN202110968111.1A
Publication of CN114095733A
Application granted
Publication of CN114095733B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/40 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2628 Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention provides a method for processing metadata in video transcoding, a video transcoding device, and an electronic device. The processing method includes: S110, decoding a source video stream to obtain a decoded video stream and corresponding metadata; S120, adjusting the corresponding metadata based on the processing operations applied to the decoded video stream; S130, encoding the decoded video stream and embedding the metadata into the encoded video stream. The invention automates the transcoding of HDR video, retains the metadata that HDR media must contain, and automatically adjusts that metadata to match any modification of the image during transcoding. Streaming services and other media users can thus transcode HDR media in a simple way without accessing the master file multiple times, which simplifies the transcoding workflow and improves efficiency.

Figure 202110968111

Description

Metadata processing method in video transcoding, video transcoding equipment and electronic equipment
Technical Field
The invention relates to the technical field of video transcoding, in particular to a metadata processing method in video transcoding, video transcoding equipment and electronic equipment.
Background
HDR (high dynamic range) video carries metadata in the encoded stream to define more precisely the information a display needs. The standards in use include HDR10, HLG, HDR10+, and Dolby Vision (DoVi). This metadata may be static, such as the color system used by the entire video, the maximum and maximum average light levels, and the characteristics of the mastering display, or dynamic, in which case it can optimize the video display frame by frame.
When video is transcoded with commonly used tools (e.g., FFmpeg performing H.265 encoding with x265), the HDR metadata is typically lost during transcoding, and the user must manually add it back to the transcoded video; otherwise the resulting bitstream cannot be played back correctly. At present H.265 is the most common HDR codec and x265 is the most popular H.265 encoder. Other codecs, such as VP9, AV1, and H.264, may also be used for HDR encoding and may suffer from the same problem.
Disclosure of Invention
The invention provides a method for processing metadata in video transcoding, video transcoding equipment and electronic equipment, and aims to solve the technical problem of how to automatically process metadata in video transcoding.
The method for processing the metadata in the video transcoding, provided by the embodiment of the invention, comprises the following steps:
decoding the source video stream to obtain a decoded video stream and corresponding metadata;
adjusting the corresponding metadata based on processing operations on the decoded video stream;
encoding the decoded video stream, and embedding the metadata into the encoded video stream.
According to some embodiments of the invention, the processing operation on the decoded video stream comprises: changing the size or position of pixels in the picture, for example by scaling, cropping, rotating, or mirroring the image; the corresponding metadata is adjusted to match the operation applied to the video stream.
In some embodiments of the present invention, the metadata is directly embedded in the encoded video stream if no processing operation is performed on the decoded video stream.
According to some embodiments of the invention, the metadata comprises static metadata and dynamic metadata.
In some embodiments of the present invention, said embedding said metadata in the encoded video stream comprises:
for Dolby Vision, setting and modifying the Dolby Vision descriptor according to the picture size and the bit rate of the encoded video stream, writing the encoded video stream into a container, inserting a Dolby configuration record into the container, and selecting the Dolby codec type that corresponds to the Dolby configuration in the container;
for static metadata, first caching all static metadata in the encoder, and inserting it into each video frame of the encoded video stream that is encoded as an I-frame;
for dynamic metadata, adding the dynamic metadata to the corresponding encoded frame.
According to some embodiments of the invention, the static metadata comprises: color description metadata, CLL metadata, MDCV metadata, and the Dolby Vision configuration;
the dynamic metadata includes: HDR10+ dynamic metadata and Dolby Vision dynamic metadata.
The video transcoding device according to the embodiment of the invention comprises:
the decoding module is used for decoding the source video stream to obtain a decoded video stream and corresponding metadata;
a metadata processing module for adjusting the corresponding metadata based on processing operations on the decoded video stream;
and the encoding module is used for encoding the decoded video stream and embedding the metadata into the encoded video stream.
According to some embodiments of the invention, the processing operation on the decoded video stream comprises: changing the size or position of pixels in the picture, for example by scaling, cropping, rotating, or mirroring the image; the metadata processing module adjusts the corresponding metadata to match the operation applied to the video stream.
According to an embodiment of the present invention, an electronic apparatus includes: memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method of:
receiving a decoded video stream and corresponding metadata acquired by a decoder decoding a source video stream;
adjusting the corresponding metadata based on processing operations on the decoded video stream;
sending the metadata to an encoder to encode the decoded video stream, the metadata being embedded in the encoded video stream.
According to some embodiments of the invention, the processing operation on the decoded video stream comprises: changing the size or position of pixels in the picture, for example by scaling, cropping, rotating, or mirroring the image; the corresponding metadata is adjusted to match the operation applied to the video stream.
The metadata processing method in video transcoding, the video transcoding device, and the electronic device provided by the invention have the following beneficial effects:
the invention automatically carries any HDR-related metadata from the decoded input to the encoded output during transcoding. If the video stream is transformed during transcoding, the metadata is transformed in the same way, so that any region the metadata acts on remains the same in the transcoded image. Transcoding of HDR video is therefore automated, the metadata that HDR media must contain is retained, and the metadata is automatically adjusted to match modifications of the image during transcoding. Streaming services and other media users can thus transcode HDR media in a simple way without accessing the master file multiple times, which simplifies the process and improves efficiency.
Drawings
Fig. 1 is a flowchart of a method for processing metadata in video transcoding according to an embodiment of the present invention;
Fig. 2 is a diagram of an example of decoded HDR10 color information in the H.265 VUI according to an embodiment of the present invention;
Fig. 3 is a diagram of an example of decoded CLL and MDCV SEI messages of H.265 according to an embodiment of the present invention;
Fig. 4 is an exemplary diagram of an mp4 file containing Dolby Vision profile 5 according to an embodiment of the present invention;
Fig. 5 is an exemplary diagram of HDR10+ dynamic metadata according to an embodiment of the present invention;
Fig. 6 is a diagram illustrating the ST 2094-40 processing window parameters according to an embodiment of the present invention;
Fig. 7 is a schematic illustration of the active area described in the ST 2094-10 standard according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of an example FFmpeg command in the prior art;
Fig. 9 is a schematic diagram of metadata processing when scaling is required in an HDR transcoding process according to an embodiment of the present invention;
Fig. 10 is a diagram illustrating scaling, mirroring, rotating, and cropping of an image according to an embodiment of the present invention.
Detailed Description
To further explain the technical means and effects of the present invention adopted to achieve the intended purpose, the present invention will be described in detail with reference to the accompanying drawings and preferred embodiments.
The description of the method flow in the present specification and the steps of the flow chart in the drawings of the present specification are not necessarily strictly performed by the step numbers, and the execution order of the method steps may be changed. Moreover, certain steps may be omitted, multiple steps may be combined into one step execution, and/or a step may be broken down into multiple step executions.
The invention discloses a method for preserving metadata in an HDR video transcoding process. The core problem it solves is to automate the transfer of HDR metadata from an input bitstream to a transcoded bitstream during transcoding, and to automatically modify the metadata as required by any video transformation (e.g. scaling, cropping, mirroring, or rotation) applied in the process.
For example, HDR10+ dynamic metadata specified in the SMPTE ST2094-40 standard and Dolby Vision dynamic metadata specified in the SMPTE ST2094-10 standard may specify a processing window or active region. If these metadata properties are used in the input bitstream, these HDR10+ process windows or Dolby Vision active regions also need to be transformed similarly when transcoding in order to map to the correct region in the output window.
HDR metadata contains tone mapping information for mapping a set of colors in a higher dynamic range to a lower dynamic range and ensuring that their visual perception is similar. This tone mapping may be static, meaning that the mapping is optimized for the brightest scene for the entire video. The mapping may also be dynamic and may be updated on a frame-by-frame basis. Both mappings may thus provide better visual effects for certain special scenes (e.g. snow, low light).
For the convenience of understanding the present invention, the HDR metadata related to the present invention is explained as follows:
the HDR metadata includes: static metadata and dynamic metadata.
Static metadata typically stays constant for the entire video stream. In some special cases it may also be updated when a scene changes. To make random access to the video file possible, and to allow a terminal to join a video stream that is already playing, static metadata is typically sent with each I-frame, regardless of whether it has changed.
The static metadata contains the following: HDR color description metadata, the content light level (CLL) information, and the mastering display color volume (MDCV). Dolby Vision defines its own complete set of static metadata, called dvcC or dvvC, i.e., the Dolby Vision configuration record.
Several types of static metadata are explained below:
1. color description metadata.
For H.264 and H.265, which follow the ITU codec standards, HDR color metadata is stored in the VUI (video usability information). For AV1 and VP9, in accordance with the specifications of these codecs, HDR color metadata is stored in color_config of the sequence header OBU (open bitstream unit). The VUI and color_config therefore carry the same information regardless of the codec. Fig. 2 shows an example of decoded HDR10 color information in the H.265 VUI. Note that different HDR standards use different color settings.
2. CLL and MDCV metadata.
CLL (content light level) contains the maximum light level and the maximum average light level of the content; MDCV (mastering display color volume) describes the calibration of the mastering display. By interpreting both, a player can reproduce the intended viewing experience as closely as possible. CLL is defined in CTA-861-G; MDCV is defined in SMPTE ST 2086.
For H.264 and H.265, CLL and MDCV metadata are stored in SEI (supplemental enhancement information) messages. For AV1, they are stored in metadata_obu. VP9 does not support storing such metadata in the bitstream and instead relies on a suitable container format, such as MKV or WebM. Fig. 3 shows an example of decoded CLL and MDCV SEI messages for H.265. The formats used by the other codecs are similar.
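As a rough illustration of what the two CLL values summarize, the following Python sketch derives a maximum light level and a maximum frame-average light level from per-pixel light levels. The function name and the input layout (one flat list of per-pixel light levels in cd/m² per frame) are assumptions made for this example; they are not taken from CTA-861-G or from the invention.

```python
from typing import Iterable, Sequence, Tuple


def content_light_levels(frames: Iterable[Sequence[float]]) -> Tuple[float, float]:
    """Return (max_cll, max_fall) for a sequence of frames.

    Each frame is a flat sequence of per-pixel light levels in cd/m^2
    (a hypothetical input layout chosen for this sketch).
    """
    max_cll = 0.0   # brightest single pixel seen in any frame
    max_fall = 0.0  # highest per-frame average seen so far
    for pixels in frames:
        if not pixels:
            continue  # skip empty frames rather than divide by zero
        max_cll = max(max_cll, max(pixels))
        max_fall = max(max_fall, sum(pixels) / len(pixels))
    return max_cll, max_fall


example_frames = [
    [100.0, 200.0, 300.0, 400.0],  # frame average 250
    [50.0, 50.0, 1000.0, 100.0],   # frame average 300, peak 1000
]
print(content_light_levels(example_frames))
```

Because both values are maxima over the whole content, they do not change when frames are merely re-encoded, which is consistent with treating them as static metadata.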
3. Dolby Vision configuration records.
Dolby defines a configuration record, named dvcC or dvvC, which stores several pieces of information: the Dolby Vision profile and level, whether dynamic metadata (which Dolby calls the RPU, reference processing unit) is present, and the presence of up to two coding layers, a base layer (BL) and an enhancement layer (EL). The dvcC is stored in a container, typically an ISO-based format such as mp4 or QuickTime, or a transport stream. These records must be passed from the input container to the output container; the Dolby level must be updated according to the encoder settings, while all other settings should remain unchanged.
The role of the Dolby level is similar to that of the H.265 or H.264 level: it increases as the picture size and bit rate increase. Depending on the profile, Dolby also uses its own codec identifiers; for example, dvh1 or dvhe is used with profile 5. Profile 8 contains several backward-compatible modes, such as HDR10, HLG, and SDR, so it uses the hev1 or hvc1 names of the standard ISO-format H.265 codec.
Fig. 4 shows an mp4 file containing Dolby Vision profile 5. The dvcC indicates profile 5 and level 9; the bitstream contains dynamic metadata (RPU) and only a base layer. Some Dolby Vision profiles, such as profile 4 and profile 7, support an enhancement layer, which can further increase the dynamic range.
Complete information about Dolby Vision profiles and levels, about Dolby Vision signaling in ISO-based file formats with the Dolby configuration record, and about the related transport streams can be found in prior art documents and is not repeated here.
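The rule that only the level changes while the rest of the record is carried over can be sketched as follows. The bit layout used here (two version bytes, then 7-bit profile, 6-bit level, three presence flags, a 4-bit base-layer compatibility id, and reserved bits padding the payload to 24 bytes) reflects the author's understanding of Dolby's published configuration record and should be checked against Dolby's specification before use. The class name, field names, and the replacement level value 7 are hypothetical; the sketch does not compute a level from picture size or bit rate.

```python
import struct
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DolbyVisionConfig:
    version_major: int
    version_minor: int
    profile: int       # 7 bits
    level: int         # 6 bits
    rpu_present: bool  # dynamic metadata present
    el_present: bool   # enhancement layer present
    bl_present: bool   # base layer present
    bl_compat_id: int  # 4 bits

    def pack(self) -> bytes:
        # 16-bit word: profile(7) | level(6) | rpu(1) | el(1) | bl(1)
        word = (
            (self.profile & 0x7F) << 9
            | (self.level & 0x3F) << 3
            | int(self.rpu_present) << 2
            | int(self.el_present) << 1
            | int(self.bl_present)
        )
        # 32-bit word: compatibility id(4) followed by 28 reserved zero bits
        tail = (self.bl_compat_id & 0x0F) << 28
        # 16 further reserved bytes bring the payload to 24 bytes
        return struct.pack(">BBHI16x", self.version_major, self.version_minor, word, tail)

    @classmethod
    def unpack(cls, data: bytes) -> "DolbyVisionConfig":
        major, minor, word, tail = struct.unpack(">BBHI16x", data[:24])
        return cls(
            version_major=major,
            version_minor=minor,
            profile=(word >> 9) & 0x7F,
            level=(word >> 3) & 0x3F,
            rpu_present=bool(word & 0x4),
            el_present=bool(word & 0x2),
            bl_present=bool(word & 0x1),
            bl_compat_id=(tail >> 28) & 0x0F,
        )


# Record matching the Fig. 4 description: profile 5, level 9, RPU present, base layer only.
source = DolbyVisionConfig(1, 0, 5, 9, True, False, True, 0)
# Pass the record through, changing only the level (7 is an arbitrary example value).
output = replace(DolbyVisionConfig.unpack(source.pack()), level=7)
print(output)
```

Using an immutable record with `dataclasses.replace` makes it hard to alter any field other than the level by accident.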
Dynamic HDR metadata is typically updated on a frame-by-frame basis. As noted above, the two HDR standards that use dynamic metadata are HDR10+ and Dolby Vision, but SMPTE has standardized four schemes:
SMPTE ST 2094-10: Dolby Vision;
SMPTE ST 2094-20: Philips SL-HDR1;
SMPTE ST 2094-30: Technicolor SL-HDR1;
SMPTE ST 2094-40: HDR10+.
The two types of standard dynamic metadata are introduced below:
1. HDR10+ dynamic metadata.
Fig. 5 is an example of HDR10+ dynamic metadata. For H.264 and H.265, it is stored in a T.35 SEI message. AV1 stores it in a T.35 metadata_obu, and VP9 again relies on the container to store this metadata.
ST 2094-40 contains processing window parameters that specify pixel coordinates. These must be updated if any image transformation that changes pixel positions is performed (e.g., scaling, rotation, mirroring, or cropping). As shown in Fig. 6, the current version of the standard limits the value of num_windows to 1, so these parameters are not yet enabled for HDR10+.
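The storage locations named so far can be collected into a small lookup table, which is one way a transcoder might decide whether a given kind of metadata travels in the bitstream or has to be written to the container. The table below only restates what the text says; the dictionary keys and function names are hypothetical.

```python
from typing import Dict, List

# Where each codec carries each kind of HDR metadata, as described in the text.
METADATA_LOCATION: Dict[str, Dict[str, str]] = {
    "h264": {"color": "VUI", "cll_mdcv": "SEI", "hdr10plus": "T.35 SEI"},
    "h265": {"color": "VUI", "cll_mdcv": "SEI", "hdr10plus": "T.35 SEI"},
    "av1": {"color": "color_config", "cll_mdcv": "metadata_obu", "hdr10plus": "T.35 metadata_obu"},
    "vp9": {"color": "color_config", "cll_mdcv": "container", "hdr10plus": "container"},
}


def metadata_location(codec: str, kind: str) -> str:
    """Look up the storage location for one kind of metadata in one codec."""
    return METADATA_LOCATION[codec.lower()][kind]


def kinds_needing_container(codec: str) -> List[str]:
    """List the metadata kinds that the codec cannot carry in its own bitstream."""
    entry = METADATA_LOCATION[codec.lower()]
    return sorted(kind for kind, where in entry.items() if where == "container")


print(metadata_location("H265", "hdr10plus"))
print(kinds_needing_container("vp9"))
```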
2. Dolby Vision dynamic metadata (RPU).
Dolby Vision dynamic metadata is defined in SMPTE ST 2094-10. For H.265, this metadata is stored in NAL (network abstraction layer) units of the reserved type 62. Currently Dolby supports only H.265.
The ST 2094-10 level 5 metadata specifies the pixel coordinates of the active area of the picture. When a picture transformation (e.g. scaling, cropping, rotating, or mirroring) affects the active area, the corresponding metadata must be modified to match. Since the active area is defined as a rectangle, only transformations that keep the active area rectangular can be supported. For example, only rotations by multiples of 90 degrees are supported. Fig. 7 shows the active area described in the standard.
The prior art typically uses FFmpeg or similar tools for transcoding. Taking x265 as the example encoder, x265 requires all HDR metadata to be supplied as parameters on the FFmpeg command line. The user must extract this information with their own software or with common tools (e.g., MediaInfo, ffprobe, or bitstream analyzers).
Fig. 8 shows an example FFmpeg command that transcodes an HDR10 bitstream (HDR-input.mp4) to HDR-output.mp4 using the x265 encoder. The HDR metadata is highlighted in light gray. The first three parameters are the HDR color information. The fourth and fifth parameters are CLL and MDCV.
The above example specifies only static HDR metadata. x265 has two further parameters for dynamic HDR metadata, shown below. The first specifies the name of a file containing HDR10+ dynamic metadata, and the second specifies the name of a file containing Dolby Vision dynamic metadata. Each file must contain a metadata entry for every frame in the input bitstream.
HDR10+ dynamic metadata: dhdr10-info=<filename>;
Dolby Vision dynamic metadata: dolby-vision-rpu=<filename>.
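To make the manual prior-art step concrete, the sketch below assembles an x265 parameter string from already-extracted metadata. It is not the command from Fig. 8, which is not reproduced in the text. The option names (colorprim, transfer, colormatrix, max-cll, master-display, dhdr10-info, dolby-vision-rpu) are x265 options, and the master-display units (0.00002 for chromaticity, 0.0001 cd/m² for luminance) follow the x265 documentation; the class name, the function names, the BT.2020 example values, and the file name are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

XY = Tuple[float, float]  # CIE 1931 chromaticity coordinates


@dataclass
class StaticHdrMetadata:
    colorprim: str
    transfer: str
    colormatrix: str
    max_cll: int      # cd/m^2
    max_fall: int     # cd/m^2
    green: XY
    blue: XY
    red: XY
    white_point: XY
    max_luminance: float  # cd/m^2
    min_luminance: float  # cd/m^2


def master_display(m: StaticHdrMetadata) -> str:
    """Format MDCV as x265 expects: chromaticity in 0.00002 units, luminance in 0.0001 cd/m^2."""
    def xy(p: XY) -> str:
        return f"{round(p[0] * 50000)},{round(p[1] * 50000)}"

    lum = f"{round(m.max_luminance * 10000)},{round(m.min_luminance * 10000)}"
    return f"G({xy(m.green)})B({xy(m.blue)})R({xy(m.red)})WP({xy(m.white_point)})L({lum})"


def x265_params(m: StaticHdrMetadata,
                hdr10plus_file: Optional[str] = None,
                dolby_rpu_file: Optional[str] = None) -> str:
    """Join the options with ':' as used for FFmpeg's -x265-params argument."""
    parts = [
        f"colorprim={m.colorprim}",
        f"transfer={m.transfer}",
        f"colormatrix={m.colormatrix}",
        f"max-cll={m.max_cll},{m.max_fall}",
        f"master-display={master_display(m)}",
    ]
    if hdr10plus_file:
        parts.append(f"dhdr10-info={hdr10plus_file}")
    if dolby_rpu_file:
        parts.append(f"dolby-vision-rpu={dolby_rpu_file}")
    return ":".join(parts)


# Example values: BT.2020 primaries, D65 white point, 1000 cd/m^2 mastering peak.
example = StaticHdrMetadata(
    "bt2020", "smpte2084", "bt2020nc", 1000, 400,
    green=(0.170, 0.797), blue=(0.131, 0.046), red=(0.708, 0.292),
    white_point=(0.3127, 0.3290), max_luminance=1000.0, min_luminance=0.0001,
)
print(x265_params(example, hdr10plus_file="metadata.json"))
```

Every value in this string has to be obtained and kept consistent by hand, which is the burden the invention aims to remove.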
In the prior art, HDR transcoding is implemented with FFmpeg and x265. This implementation requires the user to extract the HDR metadata manually and specify it on the command line at transcoding time. Each user therefore has to develop their own method of extracting HDR metadata, and if the video needs to be scaled or rotated during transcoding, the metadata must also be transformed by hand. In the case of Dolby Vision, manually adjusting the Dolby level to the new picture size and bit rate is likewise cumbersome and inconvenient.
As shown in fig. 1, a method for processing metadata in video transcoding according to an embodiment of the present invention includes:
S110, decoding the source video stream to obtain a decoded video stream and corresponding metadata.
It should be noted that the metadata includes static metadata and dynamic metadata. The static metadata includes: color description metadata, CLL metadata, MDCV metadata, and the Dolby Vision configuration; the dynamic metadata includes: HDR10+ dynamic metadata and Dolby Vision dynamic metadata.
S120, adjusting the corresponding metadata based on the processing operation applied to the decoded video stream.
For example, the processing operations on the decoded video stream may include: changing the size or position of pixels in the picture, for example by scaling, cropping, rotating, or mirroring the image; the corresponding metadata is adjusted to match the operation applied to the video stream. It will be appreciated that the metadata may be embedded directly into the encoded video stream if no processing operation is performed on the decoded video stream.
S130, encoding the decoded video stream and embedding the metadata in the encoded video stream.
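The three steps can be pictured as a small pipeline in which the picture and its metadata always travel together. The sketch below is a toy model under stated assumptions: pictures and metadata are plain dictionaries, and `halve` and `toy_encode` are hypothetical stand-ins for a real scaler and encoder.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Picture = dict   # stand-in for a decoded picture, e.g. {"width": 1920, "height": 1080}
Metadata = dict  # stand-in for the HDR metadata attached to that picture
Operation = Callable[[Picture, Metadata], Tuple[Picture, Metadata]]


def transcode(decoded: Iterable[Tuple[Picture, Metadata]],
              operation: Optional[Operation],
              encode: Callable[[Picture, Metadata], dict]) -> Iterator[dict]:
    """Run S120 and S130 over the (picture, metadata) pairs produced by S110."""
    for picture, metadata in decoded:
        if operation is not None:
            # S120: the picture and its metadata are adjusted by the same operation.
            picture, metadata = operation(picture, metadata)
        # S130: with no operation, the metadata is embedded unchanged.
        yield encode(picture, metadata)


def halve(picture: Picture, metadata: Metadata) -> Tuple[Picture, Metadata]:
    """Scale the picture to half size and scale the metadata window to match."""
    new_picture = {"width": picture["width"] // 2, "height": picture["height"] // 2}
    new_metadata = dict(metadata)
    new_metadata["window"] = tuple(v // 2 for v in metadata["window"])
    return new_picture, new_metadata


def toy_encode(picture: Picture, metadata: Metadata) -> dict:
    """Pretend encoder: records the coded size and the metadata it embedded."""
    return {"coded_size": (picture["width"], picture["height"]), "embedded": metadata}


source = [({"width": 1920, "height": 1080},
           {"max_cll": 1000, "window": (0, 0, 1920, 1080)})]
scaled = list(transcode(source, halve, toy_encode))
untouched = list(transcode(source, None, toy_encode))
print(scaled[0])
print(untouched[0])
```

Pairing the picture operation with its metadata counterpart in one callable is a design choice of this sketch; it makes it impossible to transform one without the other.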
In some embodiments of the present invention, embedding metadata in an encoded video stream includes:
for Dolby Vision, setting and modifying a Dolby Vision descriptor according to the picture size and the bit rate of an encoded video stream, writing the encoded video stream into a container, inserting a Dolby configuration record into the container, and selecting a corresponding Dolby codec type for the Dolby configuration in the container;
for static metadata, first caching all static metadata in the encoder, and inserting it into each video frame of the encoded video stream that is encoded as an I-frame;
for dynamic metadata, the dynamic metadata is added to the corresponding encoded frame.
With the method for processing metadata in video transcoding, any HDR-related metadata is automatically carried from the decoded input to the encoded output during transcoding. If the video stream is transformed during transcoding, the metadata is transformed in the same way, so that any region the metadata acts on remains the same in the transcoded image. Transcoding of HDR video is therefore automated, the metadata that HDR media must contain is retained, and the metadata is automatically adjusted to match modifications of the image during transcoding. Streaming services and other media users can thus transcode HDR media in a simple way without accessing the master file multiple times, which simplifies the process and improves efficiency.
The video transcoding device according to the embodiment of the invention comprises: the device comprises a decoding module, a metadata processing module and an encoding module.
The decoding module is used for decoding the source video stream to obtain a decoded video stream and corresponding metadata;
the metadata processing module is used for adjusting corresponding metadata based on the processing operation of the decoded video stream;
for example, the processing operations on the decoded video stream include: changing the size or position of pixels in the picture, for example by scaling, cropping, rotating, or mirroring the image; the metadata processing module adjusts the corresponding metadata to match the operation applied to the video stream.
The encoding module is used for encoding the decoded video stream and embedding the metadata into the encoded video stream.
The video transcoding device integrates the decoding, metadata processing, and encoding functions. It automates the transcoding of HDR video, retains the metadata that HDR media must contain, and automatically adjusts the metadata to match modifications of the image during transcoding. Streaming services and other media users can thus transcode HDR media in a simple way without accessing the master file multiple times, which simplifies the process and improves efficiency.
According to an embodiment of the present invention, an electronic apparatus includes: a memory, a processor, and a computer program stored on the memory and executable on the processor. The electronic device may be, for example, a computer, and the above method steps are implemented by connecting the computer to a net transcoder.
The computer program realizes the following method steps when executed by a processor:
S110, receiving a decoded video stream and corresponding metadata obtained by a decoder decoding a source video stream;
S120, adjusting the corresponding metadata based on the processing operation applied to the decoded video stream;
for example, the processing operations on the decoded video stream include: changing the size or position of pixels in the picture, for example by scaling, cropping, rotating, or mirroring the image; the corresponding metadata is adjusted to match the operation applied to the video stream.
S130, sending the metadata to the encoder, which encodes the decoded video stream and embeds the metadata in the encoded video stream.
The following describes a metadata processing method in video transcoding according to the present invention with reference to the accompanying drawings. It is to be understood that the following description is only exemplary in nature and should not be taken as a specific limitation on the invention.
When the present invention is used with a net transcoder, there is no need to extract or modify the HDR metadata manually. All HDR parameters are processed automatically, and only the encoding parameters need to be specified. The same command applies to all HDR formats. For example, if the input is profile 5 Dolby Vision, the output is profile 5 Dolby Vision. The same is true of HDR10, HDR10+, HLG, etc.
Fig. 9 shows an example in which scaling is required during HDR transcoding. The input bitstream enters from the left and is decoded. The decoded image is output, scaled, and then passed to the video encoder as usual. The HDR metadata is extracted from the decoded bitstream and modified in step with the image scaling; the modified metadata is then fed to the video encoder, which inserts it into the newly encoded bitstream.
The process is not limited to picture scaling; it may be used for other operations that change the size or position of pixels in a picture, including scaling, cropping, rotating (by 90, 180, or 270 degrees), and mirroring. If no operation that changes the image is performed, the metadata can be embedded directly in the new bitstream without modification.
Fig. 10 shows how the active region in Dolby Vision HDR metadata can be modified to support scaling, mirroring, rotation by n × 90 degrees, or picture cropping. Any picture operation that changes the X, Y coordinates of the active area requires a matching modification of the active area in the metadata. Note that, because the active region in Dolby Vision must be a rectangle with sides parallel to the image edges, only picture operations that keep it so are supported; a 45 degree rotation, for example, cannot be supported without loss (in practice such operations are rare).
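The coordinate changes described above can be sketched for an axis-aligned rectangle as follows. This is a minimal illustration, assuming the active area is held as a half-open pixel rectangle [x0, x1) × [y0, y1); the ST 2094-10 syntax for the active area is not reproduced here, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """Half-open pixel rectangle [x0, x1) x [y0, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int


def scale(r: Rect, old_size: Tuple[int, int], new_size: Tuple[int, int]) -> Rect:
    """Scale the rectangle by the same factors as the picture."""
    (ow, oh), (nw, nh) = old_size, new_size
    return Rect(round(r.x0 * nw / ow), round(r.y0 * nh / oh),
                round(r.x1 * nw / ow), round(r.y1 * nh / oh))


def mirror_horizontal(r: Rect, width: int) -> Rect:
    """Flip left-right: the left and right edges swap roles."""
    return Rect(width - r.x1, r.y0, width - r.x0, r.y1)


def rotate_90_clockwise(r: Rect, height: int) -> Rect:
    """Pixel (x, y) moves to (height - 1 - y, x); the new picture is `height` wide."""
    return Rect(height - r.y1, r.x0, height - r.y0, r.x1)


def crop(r: Rect, cx: int, cy: int, cw: int, ch: int) -> Rect:
    """Shift into the crop window's coordinates, then clamp to the window."""
    def clamp(v: int, hi: int) -> int:
        return min(max(v, 0), hi)

    return Rect(clamp(r.x0 - cx, cw), clamp(r.y0 - cy, ch),
                clamp(r.x1 - cx, cw), clamp(r.y1 - cy, ch))


area = Rect(120, 60, 1500, 1020)  # example active area in a 1920x1080 picture
print(scale(area, (1920, 1080), (1280, 720)))
print(mirror_horizontal(area, 1920))
print(rotate_90_clockwise(area, 1080))
print(crop(area, 240, 0, 1440, 1080))
```

Each function returns another axis-aligned rectangle, which is why these four operations can be supported while an arbitrary rotation cannot.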
The HDR transcoding method comprises the following steps:
A100, decode the image and find all static (color description, CLL and MDCV, Dolby Vision configuration, etc.) and dynamic (HDR10+ and Dolby Vision) HDR metadata that can be attached to the decoded image (a search of the corresponding I-frame may be required).
A200, apply to the dynamic metadata a transformation equivalent to any supported transformation applied to the whole image or to a selected region. If the dynamic metadata contains a selected region and an unsupported image transformation must be performed, the dynamic metadata is deleted, which reduces the image distortion that would result from applying the dynamic metadata to the wrong region, and a warning is output. The warning lets the user know that the image display quality changed during HDR transcoding.
A300, encode.
A310, for Dolby Vision, the Dolby Vision descriptor is modified according to the encoder's picture size and bit rate settings. If the encoded bitstream is written to a container, a Dolby configuration record is inserted in the container and the Dolby codec type appropriate to the Dolby configuration is used in the container.
A320, for static metadata, all static metadata is first buffered in the encoder and inserted into every video frame encoded as an I-frame. If an update to the static metadata is found on any incoming frame, the updated metadata is used from the next I-frame onward.
A330, for dynamic metadata, any dynamic metadata is added to the encoded frame to which it belongs. Note that the encoder may reorder frames during encoding, but the dynamic metadata must stay with the frame to which it belongs.
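Steps A200, A320, and A330 can be sketched as below. This is an illustrative model only: the function and class names, the use of a "region" key to mark region-based metadata, and the dictionary representation are assumptions, and a real encoder would write SEI messages or OBUs rather than return dictionaries.

```python
import warnings
from typing import Dict, Optional

RECTANGLE_SAFE_ROTATIONS = {0, 90, 180, 270}


def keep_or_drop(dynamic: Optional[dict], rotation_degrees: int) -> Optional[dict]:
    """A200: drop region-based dynamic metadata when the rotation is unsupported.

    For a supported rotation the metadata is returned as is; transforming the
    region itself is outside the scope of this sketch.
    """
    supported = rotation_degrees % 360 in RECTANGLE_SAFE_ROTATIONS
    if dynamic is None or supported or "region" not in dynamic:
        return dynamic
    warnings.warn("dynamic HDR metadata dropped: unsupported image transformation")
    return None


class MetadataEmbedder:
    """Buffers metadata in input order and hands it out in coding order."""

    def __init__(self) -> None:
        self._static: Optional[dict] = None
        self._dynamic: Dict[int, dict] = {}

    def push(self, index: int, static: Optional[dict] = None,
             dynamic: Optional[dict] = None) -> None:
        """Called once per incoming frame, in display order."""
        if static is not None:
            self._static = static  # A320: takes effect at the next I-frame
        if dynamic is not None:
            self._dynamic[index] = dynamic  # A330: keyed by the frame it belongs to

    def emit(self, index: int, is_i_frame: bool) -> dict:
        """Called once per encoded frame, in coding order."""
        out = {}
        if is_i_frame and self._static is not None:
            out["static"] = self._static
        dynamic = self._dynamic.pop(index, None)
        if dynamic is not None:
            out["dynamic"] = dynamic
        return out


embedder = MetadataEmbedder()
embedder.push(0, static={"max_cll": 1000}, dynamic={"frame": 0})
embedder.push(1, dynamic={"frame": 1})
embedder.push(2, dynamic={"frame": 2})
# The encoder emits frames in coding order 0 (I), 2 (P), 1 (B).
first = embedder.emit(0, is_i_frame=True)
second = embedder.emit(2, is_i_frame=False)
third = embedder.emit(1, is_i_frame=False)
print(first, second, third)
```

Keying the dynamic metadata by display index is what keeps it attached to the right picture even though the encoder emits frames in a different order.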
In summary, the present invention automatically transfers any HDR related metadata from decoding input to transcoding of encoded output. If the video stream is converted during transcoding, the metadata is identically converted so that any regions acted upon by the metadata remain the same in the transcoded image.
In the prior art, the original video file is accessed and metadata is manually extracted and modified using multiple tools, and a new video stream is inserted. This method requires access rights, is very complex and cannot be applied on a large scale. Moreover, in many cases, the transcoding process has no access rights to the original video file (only the video stream).
The invention automates the transcoding of HDR video: it preserves the metadata that HDR media must contain and automatically adjusts that metadata to match the modifications made to the image during transcoding. This allows streaming services and other media users to transcode HDR media in a simple way without having to access the original file multiple times, which simplifies the process and increases efficiency.
While the invention has been described in connection with specific embodiments thereof, it is to be understood that the accompanying drawings and description are illustrative, and that the invention may be embodied in other specific forms without departing from its spirit or scope.

Claims (10)

1. A method for processing metadata in video transcoding, characterized by comprising the following steps:
decoding the source video stream to obtain a decoded video stream and corresponding metadata;
adjusting the corresponding metadata based on processing operations on the decoded video stream;
encoding the decoded video stream, and embedding the metadata into the encoded video stream.
2. The method for processing metadata in video transcoding of claim 1, wherein the processing operation on the decoded video stream comprises: changing the size or position of pixels in the picture, including scaling, cropping, rotating and mirroring the image; and the corresponding metadata is adjusted accordingly based on the operation performed on the video stream.
3. The method of claim 1, wherein the metadata is directly embedded in the encoded video stream if the decoded video stream is not processed.
4. The method of claim 1, wherein the metadata comprises static metadata and dynamic metadata.
5. The method for processing metadata in video transcoding of claim 4, wherein the embedding the metadata into the encoded video stream comprises:
for Dolby Vision, modifying the Dolby Vision descriptor according to the picture size and bit rate settings of the encoded video stream, writing the encoded video stream into a container, inserting a Dolby configuration record into the container, and selecting the corresponding Dolby codec type for the Dolby configuration in the container;
for static metadata, first caching all static metadata in an encoder, and inserting the static metadata into each video frame of the encoded video stream that is encoded as an I-frame;
for dynamic metadata, the dynamic metadata is added to the corresponding encoded frame.
6. The method for processing metadata in video transcoding of claim 4, wherein the static metadata comprises: color description metadata, CLL metadata, MDCV metadata, and Dolby Vision configuration;
the dynamic metadata includes: HDR10+ dynamic metadata and Dolby Vision dynamic metadata.
7. A video transcoding device, comprising:
the decoding module is used for decoding the source video stream to obtain a decoded video stream and corresponding metadata;
a metadata processing module for adjusting the corresponding metadata based on processing operations on the decoded video stream;
and the encoding module is used for encoding the decoded video stream and embedding the metadata into the encoded video stream.
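8. The video transcoding device of claim 7, wherein the processing operation on the decoded video stream comprises: changing the size or position of pixels in the picture, including scaling, cropping, rotating and mirroring the image; and the metadata processing module adjusts the corresponding metadata accordingly based on the operation performed on the video stream.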
9. An electronic device, characterized in that the electronic device comprises: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the following steps:
receiving a decoded video stream and corresponding metadata acquired by a decoder decoding a source video stream;
adjusting the corresponding metadata based on processing operations on the decoded video stream;
sending the metadata to an encoder to encode the decoded video stream, the metadata being embedded in the encoded video stream.
10. The electronic device of claim 9, wherein the processing operation on the decoded video stream comprises: changing the size or position of pixels in the picture, including scaling, cropping, rotating and mirroring the image; and the corresponding metadata is adjusted accordingly based on the operation performed on the video stream.
CN202110968111.1A 2021-08-23 2021-08-23 Method for processing metadata in video transcoding, video transcoding device and electronic device Active CN114095733B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110968111.1A CN114095733B (en) 2021-08-23 2021-08-23 Method for processing metadata in video transcoding, video transcoding device and electronic device

Publications (2)

Publication Number Publication Date
CN114095733A (en) 2022-02-25
CN114095733B (en) 2024-11-05

Family

ID=80296102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110968111.1A Active CN114095733B (en) 2021-08-23 2021-08-23 Method for processing metadata in video transcoding, video transcoding device and electronic device

Country Status (1)

Country Link
CN (1) CN114095733B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant