CN115297327B - Semantic priori coding and decoding method and system based on semantic structured coding - Google Patents

Semantic priori coding and decoding method and system based on semantic structured coding

Info

Publication number
CN115297327B
Authority
CN
China
Prior art keywords
semantic
structured
coding
decoder
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210925551.3A
Other languages
Chinese (zh)
Other versions
CN115297327A (en
Inventor
陈志波
孙思萌
金鑫
冯若愚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN202210925551.3A
Publication of CN115297327A
Application granted
Publication of CN115297327B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a semantic a priori encoding and decoding method and system based on semantic structured coding. Building on existing semantic structured coding methods, it improves coding performance by making full use of the semantic prior information in the structured bitstream. The functionality of semantic structured coding is preserved while coding performance improves: better reconstruction quality at the same bit rate, or a lower bit rate at the same reconstruction quality.

Description

Semantic a priori encoding and decoding method and system based on semantic structured coding

Technical Field

The present invention relates to the technical field of image compression coding, and in particular to a semantic a priori encoding and decoding method and system based on semantic structured coding.

Background Art

To cope with the explosion of visual data in the 5G era, to support machine-intelligence and hybrid human-machine intelligence application scenarios more efficiently, and even to support flexible editing of image content at the bitstream level, existing methods have proposed semantically structured coding of images. Examples include the Chinese invention patent with announcement number CN110225341 B, "Semantic structured image encoding and decoding method based on deep learning", and MPEG-4 Visual, an image and video coding standard based on the Visual Object Plane.

The patent "Semantic structured image encoding and decoding method based on deep learning" introduces a region decision network and an alignment module from object detection to extract bounding boxes of the regions in the compressed features where semantic objects may exist, and splits the features spatially to obtain a semantically structured bitstream, in which each bitstream segment represents one semantic object. This method supports decoding and reconstructing only some of the semantic objects, and supports running machine-intelligence analysis tasks directly on partial semantic information. MPEG-4 Visual divides the input image or video into Visual Object Planes according to semantic objects, and applies block partitioning, transform, prediction, quantization and entropy coding to each Visual Object Plane separately to form a structured bitstream. This method supports separate decoding and reconstruction of individual visual objects, editing of visual objects (such as scaling, shifting or rotation), and recombination of visual objects from different images.

Although existing structured coding methods are highly functional, their coding performance still leaves considerable room for improvement compared with general-purpose coding methods such as H.264, H.265 and H.266.

Summary of the Invention

The purpose of the present invention is to provide a semantic a priori encoding and decoding method and system based on semantic structured coding that makes full use of semantic prior information to improve existing semantic structured coding methods and raise their coding performance.

The purpose of the present invention is achieved through the following technical solutions:

A semantic a priori encoding and decoding method based on semantic structured coding, comprising:

At the encoding end, the input image passes through a semantic analysis module to obtain the position information and semantic category labels of the semantic objects in the compressed features; based on the position information of the semantic objects, the input image, or the compressed features corresponding to the input image, is split spatially into several parts that each contain only a semantic object; the split images or compressed features are input separately to the subsequent encoding module to obtain a structured bitstream; the encoding end maintains a semantic category label pool, determines the index value of each semantic category label obtained by the semantic analysis module, and writes that index value to a designated position in the structured bitstream;

At the decoding end, a decoder pool is maintained whose decoders correspond one-to-one to the labels in the semantic category label pool; the corresponding decoder is selected according to the index value of the semantic category label in the structured bitstream and is used to decode the associated portion of the structured bitstream.

A semantic a priori encoding and decoding system based on semantic structured coding, comprising:

An encoding network, applied at the encoding end; at the encoding end, the input image passes through a semantic analysis module to obtain the position information and semantic category labels of the semantic objects in the compressed features; based on the position information of the semantic objects, the input image, or the compressed features corresponding to the input image, is split spatially into several parts that each contain only a semantic object; the split images or compressed features are input separately to the subsequent encoding module to obtain a structured bitstream; the encoding end maintains a semantic category label pool, determines the index value of each semantic category label obtained by the semantic analysis module, and writes that index value to a designated position in the structured bitstream;

A decoding network, applied at the decoding end; at the decoding end, a decoder pool is maintained whose decoders correspond one-to-one to the labels in the semantic category label pool; the corresponding decoder is selected according to the index value of the semantic category label in the structured bitstream and is used to decode the associated portion of the structured bitstream.

A processing device, comprising: one or more processors; and a memory for storing one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the aforementioned method.

A readable storage medium storing a computer program which, when executed by a processor, implements the aforementioned method.

As the technical solution above shows, the present invention builds on existing semantic structured coding methods and proposes an improvement to coding performance: it makes full use of the semantic prior information in the structured bitstream, preserving the functionality of semantic structured coding while achieving better coding performance, that is, better reconstruction quality at the same bit rate or a lower bit rate at the same reconstruction quality.

Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of a semantic a priori encoding and decoding method based on semantic structured coding provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of the semantic a priori encoding method based on semantic structured coding, with semantic segmentation performed at the compressed-feature level, provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of the semantic a priori encoding method based on semantic structured coding, with semantic segmentation performed at the image level, provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of a semantic a priori encoding and decoding system based on semantic structured coding provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of a processing device provided by an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The terms that may be used herein are first explained as follows:

The terms "include", "comprise", "contain", "have" and other expressions of similar meaning should be interpreted as non-exclusive inclusion. For example, a statement that something includes a certain technical feature (such as raw materials, components, ingredients, carriers, dosage forms, materials, dimensions, parts, assemblies, mechanisms, devices, steps, procedures, methods, reaction conditions, processing conditions, parameters, algorithms, signals, data, products or articles) should be interpreted as covering not only the features explicitly listed but also other features known in the art that are not explicitly listed.

The semantic a priori encoding and decoding method and system based on semantic structured coding provided by the present invention are described in detail below. Content not described in detail in the embodiments of the present invention belongs to the prior art known to those skilled in the art. Where no specific conditions are stated in the embodiments, the conventional conditions in the art or the conditions recommended by the manufacturer apply.

Embodiment 1

An embodiment of the present invention provides a semantic a priori encoding and decoding method based on semantic structured coding. As shown in FIG. 1, it mainly includes:

At the encoding end, the input image passes through a semantic analysis module to obtain the position information and semantic category labels of the semantic objects in the compressed features; based on the position information of the semantic objects, the input image, or the compressed features corresponding to the input image, is split spatially into several parts that each contain only a semantic object; the split images or compressed features are input separately to the subsequent encoding module to obtain a structured bitstream; the encoding end maintains a semantic category label pool, determines the index value of each semantic category label obtained by the semantic analysis module, and writes that index value to a designated position in the structured bitstream;

At the decoding end, a decoder pool is maintained whose decoders correspond one-to-one to the labels in the semantic category label pool; the corresponding decoder is selected according to the index value of the semantic category label in the structured bitstream and is used to decode the associated portion of the structured bitstream.

In the embodiment of the present invention, the input image passes through the semantic analysis module, which yields the position information and semantic category labels of the semantic objects in the input image. Because the compressed features produced by a general encoder can be regarded as a downsampled transform of the input image, the position information and semantic category labels of the semantic objects in the input image can be mapped onto the compressed features by the same downsampling operation. This gives the position information and semantic category labels of the semantic objects in the compressed features.
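The mapping from image coordinates to compressed-feature coordinates can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `map_box_to_features`, the box convention (x0, y0, x1, y1) with an exclusive bottom-right corner, and the downsampling factor of 16 are all hypothetical choices made for the example.

```python
import math

def map_box_to_features(box, stride):
    """Map a pixel-space box (x0, y0, x1, y1) onto a feature map
    that is `stride` times smaller than the image in each dimension."""
    x0, y0, x1, y1 = box
    # Floor the top-left corner and ceil the bottom-right corner so the
    # feature-space box covers every feature cell the object overlaps.
    return (x0 // stride, y0 // stride,
            math.ceil(x1 / stride), math.ceil(y1 / stride))

# Hypothetical object box in a full-resolution image, stride assumed to be 16.
feature_box = map_box_to_features((35, 20, 130, 97), stride=16)
print(feature_box)
```

Rounding outward keeps the whole object inside the mapped region at the cost of including a small border of neighbouring content.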

In the embodiment of the present invention, the structured bitstream reserves bytes dedicated to storing index values. Each index value is first converted to its binary representation and then written into the corresponding reserved bytes.
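Converting an index value to binary and writing it into a reserved byte could look like the sketch below. The pool contents reuse the fish/car/person example that the description gives for FIG. 2 and FIG. 3; the function names, the one-byte field width and the big-endian byte order are assumptions made for illustration.

```python
# Example label pool; the index values follow the fish/car/person example.
LABEL_POOL = {"fish": 1, "car": 2, "person": 3}

def index_to_bytes(label, pool=LABEL_POOL, num_bytes=1):
    """Look up the label's index value and serialize it into the reserved bytes."""
    index = pool[label]
    # int.to_bytes raises OverflowError if the index does not fit the field.
    return index.to_bytes(num_bytes, byteorder="big")

def bytes_to_label(data, pool=LABEL_POOL):
    """Inverse operation: read the index value back and recover the label."""
    index = int.from_bytes(data, byteorder="big")
    inverse = {value: key for key, value in pool.items()}
    return inverse[index]

payload = index_to_bytes("car")
print(payload, format(payload[0], "08b"), bytes_to_label(payload))
```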

The scheme provided by the embodiment of the present invention builds on existing semantic structured coding methods and makes full use of semantic prior information to improve their coding performance, that is, better reconstruction quality at the same bit rate or a lower bit rate at the same reconstruction quality.

To present the technical solution of the present invention and its technical effects more clearly, the semantic a priori encoding and decoding method based on semantic structured coding provided by the embodiment is described in detail below using specific examples.

FIG. 2 and FIG. 3 are schematic diagrams of implementing the above method with semantic segmentation performed at the compressed-feature level and at the image level, respectively.

In both FIG. 2 and FIG. 3, the semantic analysis module performs semantic analysis on the input image to obtain the position information and semantic category labels of the semantic objects in the compressed features. The position information of a semantic object is either its bounding box or its segmentation mask.

In the embodiment of the present invention, the semantic analysis module can be implemented with commonly used neural-network-based methods, for example object detection or instance segmentation methods such as Mask RCNN and CenterNet.

To form the semantically structured bitstream, the input image, or the compressed features obtained from the general encoder, is split spatially, based on the position information of the semantic objects, into several parts that each contain only the corresponding semantic information. All split images or compressed features are input separately to the subsequent encoding module to form the structured bitstream. As shown in FIG. 2 and FIG. 3, for split compressed features the subsequent encoding module consists of a quantizer and an entropy encoder, and the split compressed features pass through the quantizer and then the entropy encoder to produce the structured bitstream. For split images the subsequent encoding module consists of an encoder, a quantizer and an entropy encoder, and the split images pass through these three stages in order to produce the structured bitstream.
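The spatial split by bounding box can be sketched as below, using a nested list as a stand-in for a single-channel feature map. The helper names `crop` and `split_by_objects`, the box convention (x0, y0, x1, y1) with exclusive end coordinates, and the toy data are assumptions; quantization and entropy coding of each part are left out.

```python
def crop(feature, box):
    """Return the rows and columns of `feature` that fall inside the box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in feature[y0:y1]]

def split_by_objects(feature, objects):
    """Produce one (label, cropped part) pair per detected semantic object.
    Each part would then be quantized and entropy coded separately."""
    return [(label, crop(feature, box)) for label, box in objects]

# Toy 4x6 feature map whose values encode their own (row, column) position.
feature = [[10 * r + c for c in range(6)] for r in range(4)]
objects = [("fish", (0, 0, 2, 2)), ("car", (3, 1, 6, 4))]
parts = split_by_objects(feature, objects)
for label, part in parts:
    print(label, part)
```

Splitting by a segmentation mask instead of a bounding box would keep only the masked positions of each crop, which is more precise but needs the mask to be available at the decoder as well.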

The semantic structured coding process can follow the prior art, for example the technical solution of the patent cited in the Background Art. The difference is that, in the embodiment of the present invention, the semantic partitioning step can be carried out either at the image level or at the compressed-feature level. In addition, the partitioning can be based either on coarse-grained bounding boxes or on more precise segmentation masks.

To make full use of the prior information already obtained, such as the position information and semantic category labels of the semantic objects, and thereby improve coding performance, the semantic category label pool maintained at the encoding end stores all semantic category labels together with the index value of each label. FIG. 2 and FIG. 3 give a few examples of labels and their index values: fish, index value 1; car, index value 2; person, index value 3. The categories in the semantic category label pool are set according to the specific application scenario. In the structured bitstream, a multi-bit storage space can be reserved for index values. For example, an 8-bit field supports up to 256 semantic categories, and supporting 1024 categories would require a 10-bit field. If multiple semantic objects of the same category appear in one image, the category's index value can be encoded once together with the number of such objects, which avoids transmitting the same index value repeatedly.
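Encoding a count together with each index value, rather than repeating the index value per object, can be sketched as follows. The function names and the choice to sort pairs by index value are assumptions made for the example; the sketch also assumes that the object bitstreams are ordered by category so the decoder can pair them with the expanded index list.

```python
from collections import Counter

def encode_class_counts(labels, pool):
    """Collapse per-object labels into (index value, count) pairs."""
    counts = Counter(pool[label] for label in labels)
    return sorted(counts.items())

def expand_class_counts(pairs):
    """Recover one index value per object from the (index value, count) pairs."""
    return [index for index, count in pairs for _ in range(count)]

pool = {"fish": 1, "car": 2, "person": 3}
labels = ["fish", "fish", "car", "fish", "person"]
pairs = encode_class_counts(labels, pool)
print(pairs)
print(expand_class_counts(pairs))
print(2 ** 8, 2 ** 10)  # values representable by an 8-bit and a 10-bit field
```

With five objects and three categories, three pairs are written instead of five index values; the saving grows with the number of repeated objects per category.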

Correspondingly, the decoding end maintains a decoder pool whose decoders correspond one-to-one to the labels in the semantic category label pool. As shown in FIG. 2 and FIG. 3, the decoder pool stores an optimal decoder for each semantic category, matched one-to-one to the index values in the semantic category label pool, together with one general decoder. The optimal decoder of a semantic category gives the best decoding performance when its input is the compressed features of that category, but it does not generalize to other semantic categories and performs poorly on them. The general decoder generalizes better than any category-specific decoder and achieves generally good performance on arbitrary images, but on images of a specific semantic category it performs worse than that category's optimal decoder. In FIG. 3, the encoder applied to the image part of each split semantic object is normally a general encoder. Where the computing power and storage of the encoding end allow, however, encoders can also be designed for specific semantic categories to obtain category-optimal encoders. A category-specific encoder must be paired with the corresponding decoder in the decoder pool for decoding to be correct. In the embodiment of the present invention, the optimal encoder/decoder can be obtained by end-to-end training of a deep neural network (CNN), with a loss function that trades off bit rate against reconstruction distortion as the optimization objective. An encoder/decoder trained on the dataset of a specific semantic category can, in theory, adapt best to the data of that category and therefore shows the best coding performance on data of that category.
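Decoder selection by index value, with the general decoder as fallback, can be sketched as follows. The decoders here are placeholder functions that only tag their input, standing in for trained networks; all names are hypothetical. The loss is written in the common form R + λ·D, which is an assumption: the text only states that the objective trades off bit rate against reconstruction distortion.

```python
def general_decoder(payload):
    """Placeholder for the general decoder (a trained network in practice)."""
    return ("general", payload)

def make_class_decoder(name):
    """Build a placeholder category-specific decoder that tags its output."""
    def decode(payload):
        return (name, payload)
    return decode

# Decoder pool keyed by the same index values as the label pool.
DECODER_POOL = {
    1: make_class_decoder("fish"),
    2: make_class_decoder("car"),
    3: make_class_decoder("person"),
}

def select_decoder(index, pool=DECODER_POOL, fallback=general_decoder):
    """Pick the category-optimal decoder, or the general one if none exists."""
    return pool.get(index, fallback)

def rate_distortion_loss(rate, distortion, lam):
    """Assumed form of the training objective: rate plus weighted distortion."""
    return rate + lam * distortion

print(select_decoder(2)(b"bits"))
print(select_decoder(99)(b"bits"))
```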

At the encoding end, if the structured coding method is the technical solution of the patent cited in the Background Art, the structured bitstream already contains the semantic position information and semantic category information of each split part; the present invention maps the semantic category information one-to-one to index values, which are then transmitted. If the structured coding method does not itself transmit semantic category information, the method of the present invention adds the transmission of the index values. The syntax structure for the index values of the semantic category labels includes: a flag indicating whether structured coding is supported, the total number of semantic categories contained in the input image, the number of occurrences of a given semantic category in the input image, and the index value of each semantic category. After obtaining the index values from the semantically structured bitstream, the decoding end can select the optimal decoder for each image part or compressed feature to be decoded and complete the decoding. Table 1 and Table 2 respectively give the description of the key syntax elements and the definition of the syntax structure related to the semantic category index values. Because the present invention supports structured coding, the flag indicating whether structured coding is supported is set to 1.

Table 1: Description of key syntax elements in the syntax structure related to semantic category index values

Table 2: Definition of the syntax structure related to semantic category index values

The bitstream structure defined in Table 2 also determines how the decoding end reads the bitstream and extracts its information. First, the detection_enabled_flag flag bit indicates whether the bitstream supports semantic structuring. If it does, the decoder reads the 8-bit field object_class_max_num, which gives the number of semantic categories contained in the image. It then reads, once per semantic category, the object_class_index category label used to select the decoder.
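The read order described above can be sketched with a small bit reader. The field names detection_enabled_flag, object_class_max_num and object_class_index and the 8-bit width of object_class_max_num come from the description; the 1-bit width of the flag, the 8-bit width of each object_class_index, the most-significant-bit-first order and the helper names are assumptions, since the contents of Table 2 are not reproduced here.

```python
class BitReader:
    """Read unsigned integers from a bytes object, most significant bit first."""
    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

def parse_semantic_header(data, index_bits=8):
    """Return the list of class index values, or None if structuring is off."""
    reader = BitReader(data)
    if reader.read(1) == 0:          # detection_enabled_flag (assumed 1 bit)
        return None
    num_classes = reader.read(8)     # object_class_max_num (8 bits)
    # One object_class_index per semantic category (width is an assumption).
    return [reader.read(index_bits) for _ in range(num_classes)]

def pack_bits(bits):
    """Test helper: pack a string of '0'/'1' into bytes, zero-padded at the end."""
    padded = bits + "0" * (-len(bits) % 8)
    return int(padded, 2).to_bytes(len(padded) // 8, "big")

header = pack_bits("1" + format(3, "08b") + "".join(format(i, "08b") for i in (1, 2, 3)))
print(parse_semantic_header(header))
```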

Note that the bit counts and numbers of semantic categories shown in Table 1 and Table 2 are examples and do not constitute a limitation. In practice, users can set the bit counts and the number of semantic categories according to actual conditions or experience.

The above scheme can also update the semantic category label pool and the decoder pool as needed. When a new semantic category that is not in the semantic category label pool appears in the application scenario, the preferred approach is to collect data for the new category, design an optimal decoder on that dataset, transmit the new category's index value and decoder to the decoding end to update the decoder pool, and update the semantic categories and index values in the semantic category label pool at the same time. When it is difficult for the encoding end to obtain data for the new category, all images or compressed features of the new category are decoded with the general decoder.
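Keeping the two pools in sync when a new category is added can be sketched as below. Assigning the next free integer as the new index value is an assumption, as are the function names; strings stand in for decoder models so the example stays self-contained.

```python
def register_category(label, decoder, label_pool, decoder_pool):
    """Add a new category to both pools under the next free index value."""
    if label in label_pool:
        raise ValueError(f"category {label!r} is already registered")
    index = max(label_pool.values(), default=0) + 1
    label_pool[label] = index        # encoder-side label pool
    decoder_pool[index] = decoder    # decoder-side pool, updated in step
    return index

def decoder_for(label, label_pool, decoder_pool, general_decoder):
    """Use the category decoder if one is registered, else the general one."""
    index = label_pool.get(label)
    return decoder_pool.get(index, general_decoder)

label_pool = {"fish": 1, "car": 2, "person": 3}
decoder_pool = {1: "dec_fish", 2: "dec_car", 3: "dec_person"}

before = decoder_for("bird", label_pool, decoder_pool, "dec_general")
new_index = register_category("bird", "dec_bird", label_pool, decoder_pool)
after = decoder_for("bird", label_pool, decoder_pool, "dec_general")
print(before, new_index, after)
```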

In the embodiment of the present invention, the decoder pool contains several category-optimal decoders and a general decoder. In deep-learning-based methods each decoder takes the form of model parameters (the weights and biases of the kernels). To reduce the storage needed for the decoders, the pool stores the model parameters of the general decoder together with, for each category, the residual obtained as the difference between the general decoder's parameters and those of that category's optimal decoder.
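Residual storage of decoder parameters can be sketched as follows. Parameters are modelled as a dictionary of flat lists; the parameter names, the sign convention (residual = category-specific minus general) and the toy values are assumptions. Any storage saving depends on the residuals being small or sparse enough to compress well, which the sketch does not model.

```python
def to_residual(general, specific):
    """Per-parameter difference between a category decoder and the general one."""
    return {name: [s - g for s, g in zip(specific[name], general[name])]
            for name in general}

def from_residual(general, residual):
    """Rebuild the category decoder's parameters from the general ones."""
    return {name: [g + r for g, r in zip(general[name], residual[name])]
            for name in general}

# Toy parameter sets (values chosen to be exactly representable as floats).
general = {"conv1.weight": [0.5, -0.25, 1.0], "conv1.bias": [0.0]}
fish = {"conv1.weight": [0.5, -0.5, 1.25], "conv1.bias": [0.125]}

residual = to_residual(general, fish)
restored = from_residual(general, residual)
print(residual)
print(restored == fish)
```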

The above scheme of the embodiment of the present invention mainly has the following advantages:

(1) Building on the semantic structured coding method, an improvement that raises coding and decoding performance is proposed: the semantic prior information in the structured bitstream is fully exploited to achieve better coding and decoding performance while preserving the functionality of semantic structured coding.

(2) An improvement to the syntax structure of the semantically structured bitstream is proposed so that index values can be transmitted, which allows the decoding end to select the optimal decoder for decoding each object's bitstream.

(3) An efficient storage scheme for the decoders in the decoder pool is proposed, which effectively reduces the storage space the decoding end needs for the optimal decoder of each category and the general decoder.

In summary, the scheme provided by the present invention can effectively improve the coding and decoding performance of methods based on semantic structured coding.

Embodiment 2

The present invention also provides a semantic prior coding and decoding system based on semantic structured coding, implemented mainly on the basis of the method provided in the foregoing embodiment. As shown in FIG. 4, the system mainly includes:

An encoding network, applied at the encoding end. At the encoding end, the input image passes through a semantic analysis module to obtain the position information and semantic category labels corresponding to the semantic objects in the compressed features; based on the position information of the semantic objects, the input image, or the compressed features corresponding to the input image, is segmented at the spatial level into several parts that each contain only a semantic object; the segmented images or compressed features are input separately to the subsequent encoding module to obtain a structured bitstream; the encoding end maintains a semantic category label pool, determines the index value of each semantic category label obtained by the semantic analysis module, and fills that index value into a designated position in the structured bitstream;

A decoding network, applied at the decoding end. At the decoding end, a decoder pool is maintained whose decoders correspond one-to-one to the labels in the semantic label pool; the corresponding decoder is selected according to the index value of the semantic category label in the structured bitstream and used to decode the associated part of the structured bitstream.
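The interplay of the two networks can be illustrated with a small sketch in which the encoding end writes a category index in front of each object's payload and the decoding end uses that index to pick a decoder. The byte layout here (a one-byte object count, then per object a one-byte index and a two-byte payload length) is a hypothetical layout invented for the example and is not the syntax defined in the patent's tables; `LABEL_POOL`, `pack_stream`, and `decode_stream` are likewise assumed names.

```python
import struct
from typing import Callable, Dict, List, Tuple

# Hypothetical label pool shared by both ends; index 0 is the general decoder.
LABEL_POOL: Dict[str, int] = {"general": 0, "person": 1, "car": 2}


def pack_stream(objects: List[Tuple[str, bytes]]) -> bytes:
    """Encoding end: prefix each object's payload with its category index."""
    out = bytearray(struct.pack(">B", len(objects)))
    for label, payload in objects:
        index = LABEL_POOL.get(label, LABEL_POOL["general"])
        out += struct.pack(">BH", index, len(payload)) + payload
    return bytes(out)


def decode_stream(stream: bytes,
                  decoder_pool: Dict[int, Callable[[bytes], object]]) -> list:
    """Decoding end: read each index and dispatch to the matching decoder."""
    (count,) = struct.unpack_from(">B", stream, 0)
    offset = 1
    results = []
    for _ in range(count):
        index, length = struct.unpack_from(">BH", stream, offset)
        offset += 3                      # 1 byte index + 2 bytes length
        payload = stream[offset:offset + length]
        offset += length
        # Fall back to the general decoder when no dedicated one is stored.
        decoder = decoder_pool.get(index, decoder_pool[0])
        results.append(decoder(payload))
    return results
```

Because only the small index travels with each object, the decoders themselves never need to be retransmitted once both pools are in sync.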

Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional modules is only an example. In practice, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the system can be divided into different functional modules to perform all or part of the functions described above.

Embodiment 3

The present invention also provides a processing device which, as shown in FIG. 5, mainly includes: one or more processors; and a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method provided in the foregoing embodiment.

Further, the processing device also includes at least one input device and at least one output device; within the processing device, the processor, memory, input device, and output device are connected via a bus.

In the embodiment of the present invention, the specific types of the memory, input device, and output device are not limited; for example:

The input device may be a touch screen, an image acquisition device, a physical button, a mouse, or the like;

The output device may be a display terminal;

The memory may be a random access memory (RAM) or a non-volatile memory, such as a disk memory.

Embodiment 4

The present invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the method provided in the foregoing embodiment.

In the embodiment of the present invention, the readable storage medium, as a computer-readable storage medium, may be arranged in the aforementioned processing device, for example as the memory of the processing device. The readable storage medium may also be any of various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc.

The above is only a preferred specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.

Claims (8)

1. A semantic priori coding and decoding method based on semantic structured coding is characterized by comprising the following steps:
At the encoding end, the input image passes through a semantic analysis module to obtain position information and semantic category labels corresponding to semantic objects in the compression characteristics; dividing an input image or compression characteristics corresponding to the input image into a plurality of parts only containing the semantic object on a space level based on the position information of the semantic object; the segmented images or compression features are respectively input into a subsequent coding module to obtain a structured code stream; the encoding end maintains a semantic class label pool, determines the index value of the semantic class label according to the semantic class label obtained by the semantic analysis module, and fills the index value of the semantic class label into a designated position in the structured code stream;
At the decoding end, a decoder pool formed by decoders corresponding to tags in the semantic tag pool one by one is maintained, and the corresponding decoder is selected according to the index value of the semantic category tag in the structured code stream, so that the related code stream in the structured code stream is decoded;
Maintaining, at the decoding end, a decoder pool formed by decoders in one-to-one correspondence with the labels in the semantic label pool comprises: the semantic category label pool maintained by the encoding end stores all semantic category labels and the index values corresponding to the labels; the decoder pool maintained by the decoding end stores the optimal decoders of the semantic categories and a general decoder, wherein the optimal decoders and the general decoder are in one-to-one correspondence with the index values in the semantic category label pool, the general decoder is applicable to every semantic category, and the optimal decoder of each semantic category has optimal decoding performance when the compression features of the corresponding semantic category are used as input;
The decoding end stores the model parameters of the general decoder and the residual values obtained as the difference between the model parameters of the general decoder and those of the optimal decoder of each semantic category.
2. The semantic prior encoding and decoding method based on semantic structured coding according to claim 1, wherein the location information of the semantic object comprises: a bounding box of a semantic object or a semantic segmentation map.
3. The semantic prior encoding and decoding method based on semantic structured coding according to claim 1, wherein the inputting of the segmented image or the compressed feature to the subsequent encoding module, respectively, comprises:
For the segmented image, the subsequent encoding module includes: an encoder, a quantizer, and an entropy encoder; the segmented image sequentially passes through an encoder, a quantizer and an entropy encoder to obtain a structured code stream;
for the segmented compression feature, the subsequent encoding module includes: a quantizer and entropy encoder; and the segmented compression characteristics sequentially pass through a quantizer and an entropy coder to obtain a structured code stream.
4. The semantic prior encoding and decoding method based on semantic structured coding according to claim 1, further comprising:
When a new semantic category which is not contained in the semantic category label pool appears in the application scene, designing an optimal decoder aiming at the new semantic category, updating the decoder pool by utilizing the index value of the new semantic category and the corresponding optimal decoder, and synchronously updating the semantic category of the semantic category label pool and the corresponding index value.
5. The semantic prior encoding and decoding method based on semantic structured coding according to claim 1, wherein the syntax structure of the index value of the semantic category label comprises:
An identifier indicating whether structured coding is supported, the total number of semantic categories contained in the input image, the number of a given semantic category contained in the input image, and the index value corresponding to each semantic category.
6. A semantic prior codec system based on semantic structured coding, characterized in that it is implemented based on the method of any one of claims 1 to 5, the system comprising:
The coding network is applied to the coding end; at the encoding end, the input image passes through a semantic analysis module to obtain position information and semantic category labels corresponding to semantic objects in the compression characteristics; dividing an input image or compression characteristics corresponding to the input image into a plurality of parts only containing the semantic object on a space level based on the position information of the semantic object; the segmented images or compression features are respectively input into a subsequent coding module to obtain a structured code stream; the encoding end maintains a semantic class label pool, determines the index value of the semantic class label according to the semantic class label obtained by the semantic analysis module, and fills the index value of the semantic class label into a designated position in the structured code stream;
The decoding network is applied to the decoding end; at the decoding end, a decoder pool formed by decoders corresponding to the tags in the semantic tag pool one by one is maintained, and the corresponding decoders are selected according to the index values of the semantic category tags in the structured code stream, so that the related code stream in the structured code stream is decoded.
7. A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
8. A readable storage medium storing a computer program, characterized in that the method according to any one of claims 1-5 is implemented when the computer program is executed by a processor.
CN202210925551.3A 2022-08-03 2022-08-03 Semantic priori coding and decoding method and system based on semantic structured coding Active CN115297327B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210925551.3A CN115297327B (en) 2022-08-03 2022-08-03 Semantic priori coding and decoding method and system based on semantic structured coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210925551.3A CN115297327B (en) 2022-08-03 2022-08-03 Semantic priori coding and decoding method and system based on semantic structured coding

Publications (2)

Publication Number Publication Date
CN115297327A CN115297327A (en) 2022-11-04
CN115297327B true CN115297327B (en) 2024-10-29

Family

ID=83826578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210925551.3A Active CN115297327B (en) 2022-08-03 2022-08-03 Semantic priori coding and decoding method and system based on semantic structured coding

Country Status (1)

Country Link
CN (1) CN115297327B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116708843B (en) * 2023-08-03 2023-10-31 清华大学 User experience quality feedback regulation system in semantic communication process
CN120410859B (en) * 2025-07-02 2025-09-16 深圳北理莫斯科大学 Image reconstruction method, system, terminal and readable storage medium for super-resolution of magnetic resonance image

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102724485A (en) * 2012-06-26 2012-10-10 公安部第三研究所 Device and method for performing structuralized description for input audios by aid of dual-core processor
CN110225341A (en) * 2019-06-03 2019-09-10 中国科学技术大学 A kind of code flow structure image encoding method of task-driven

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10977530B2 (en) * 2019-01-03 2021-04-13 Beijing Jingdong Shangke Information Technology Co., Ltd. ThunderNet: a turbo unified network for real-time semantic segmentation
FR3103342B1 (en) * 2019-11-19 2022-07-08 Thales Sa METHOD AND DEVICE FOR COMPRESSION OF DIGITAL IMAGES AND METHOD AND DEVICE FOR DECOMPRESSING ASSOCIATED
CN112866715B (en) * 2021-01-06 2022-05-13 中国科学技术大学 Universal video compression coding system supporting man-machine hybrid intelligence


Also Published As

Publication number Publication date
CN115297327A (en) 2022-11-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant