
CN115695810A - Low bit rate image compression coding method based on semantic communication - Google Patents


Info

Publication number
CN115695810A
CN115695810A (application CN202211292779.XA)
Authority
CN
China
Prior art keywords
data
network
generator
image
encoder
Prior art date
Legal status
Pending
Application number
CN202211292779.XA
Other languages
Chinese (zh)
Inventor
何晨光
黄声显
陈舒怡
陈晧
马宇川
李晶
Current Assignee
Harbin Institute of Technology Shenzhen
Original Assignee
Harbin Institute of Technology Shenzhen
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology Shenzhen filed Critical Harbin Institute of Technology Shenzhen
Priority to CN202211292779.XA
Publication of CN115695810A

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention proposes a low-bit-rate image compression coding method based on semantic communication. It combines semantic coding with LDPC channel coding and takes into account the weak computing power of hardware devices. On the one hand, it achieves a higher compression ratio and definition than traditional image coding; on the other hand, it guarantees reliable transmission of data over a real wireless channel, enabling performance-limited sensor networks to meet the demand for reliable transmission of high-volume traffic. Moreover, thanks to the generalization ability of the neural network model, the invention frees the image decoding process from the constraints of the encoding format: even when retransmission is impossible or bit errors exceed the error-correction range of the channel code, the image can still be recovered normally with acceptable definition.

Description

A Low-Bit-Rate Image Compression Coding Method Based on Semantic Communication

Technical Field

The invention belongs to the technical fields of deep learning and wireless communication, and in particular relates to a low-bit-rate image compression coding method based on semantic communication.

Background

In recent years, wildfires have caused severe damage to the natural environment, and traditional satellite- and UAV-based detection is inefficient and costly. With the development of communication and Internet of Things (IoT) technology, large-scale wireless sensor networks (WSNs) have been deployed for real-time monitoring. However, sensor networks in remote areas have limited capability and struggle to carry high-volume traffic, especially images, which provide effective information about on-site environmental changes. In addition, after decades of development, entropy-based image compression coding has gradually approached the Shannon limit, whereas semantic communication based on deep learning compresses and encodes images according to their semantics, going beyond the Shannon limit and making the transmission of large numbers of images over edge networks feasible.

At present, most research on semantic communication focuses on lightweight data such as text and is implemented only in simulation software over a single channel. Little work addresses the construction and updating of shared knowledge bases for image semantics, or the adaptation of end-to-end transmission to the time-varying and frequency-selective characteristics of physical wireless channels; that is, there is as yet no successful case of semantically encoding images and transmitting them in a real wireless environment.

Summary of the Invention

The purpose of the present invention is to solve the problems that traditional image compression coding offers a low compression ratio and low definition, and that semantic communication is difficult to apply directly in a real wireless environment. To this end, a low-bit-rate image compression coding method based on semantic communication is proposed.

The present invention is achieved through the following technical solution: a low-bit-rate image compression coding method for edge devices based on semantic communication, the method comprising:

Step 1: Prepare for network training. Design the encoder, quantizer, generator, and discriminator required for encoding and decoding, as well as the optimization function and optimizer for network training, and initialize their network parameters. Preprocess the provided dataset by shuffling, batching, and normalizing it.

Step 2: Feed the batched data into the training network built in Step 1, passing it through the encoder, quantizer, and generator in turn; then feed the original image together with the reconstructed image output by the generator into the discriminator, compute the corresponding loss function values, and update the network parameters using stochastic gradient descent with backpropagation.

Step 3: Check whether the loss function value has converged to the preset value. If so, terminate training early and save the corresponding model parameters for offline use on the edge device; otherwise, repeat Step 2.

Step 4: The edge device starts up and automatically loads the pretrained encoder and quantizer saved in Step 3. Captured images pass through the encoder, the quantizer, and a low-density parity-check (LDPC) encoder in turn; a suitable modulation scheme is selected, and the data is transmitted into the wireless environment.

Step 5: The receiver loads the pretrained generator saved in Step 3 and, in turn, demodulates the received signal, performs LDPC decoding, and generates the reconstructed image.

Further, in the encoder and generator network structures described in Step 1:

input denotes the input layer; reflection padding() denotes a reflection padding layer, with the padding size in parentheses; h×w×c conv, stride k denotes a convolutional layer with kernel size h×w, c output channels, and stride k, followed by an instance normalization layer and a ReLU activation layer. The number of channels C in the last convolutional layer of the encoder is the bottleneck layer, used to control the compression ratio. Residual block denotes a residual network block, in which BatchNorm denotes a batch normalization layer.
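To illustrate how the bottleneck layer C controls the compression ratio, the sketch below traces tensor shapes through a hypothetical encoder stack using the standard convolution output-size formula. The input resolution, kernel sizes, strides, and intermediate channel counts are assumptions for illustration; the patent fixes only the roles of the layers, not these values.

```python
def conv_out(size, kernel, stride, pad):
    # Standard convolution output-size formula: floor((n + 2p - k) / s) + 1
    return (size + 2 * pad - kernel) // stride + 1

def encoder_shapes(h, w, layers):
    """Trace (H, W, C) through a list of (kernel, stride, pad, channels) conv layers."""
    shapes = [(h, w, 3)]  # RGB input
    for k, s, p, c in layers:
        h, w = conv_out(h, k, s, p), conv_out(w, k, s, p)
        shapes.append((h, w, c))
    return shapes

# Hypothetical stack: one stride-1 layer, then three stride-2 downsampling convs
# ending in a C-channel bottleneck (the embodiment uses C = 4 or C = 8).
C = 4
shapes = encoder_shapes(256, 256, [(7, 1, 3, 60), (3, 2, 1, 120), (3, 2, 1, 240), (3, 2, 1, C)])
# With these assumed sizes the bottleneck tensor is 32x32xC: 32*32*4 symbols per image,
# far fewer values than the 256*256*3 input pixels.
```

Shrinking C directly shrinks the number of symbols that must be quantized and transmitted, which is why the bottleneck width sets the compression ratio.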

Further, the optimization function is:

min_{E,G} max_D E[f(D(x))] + E[g(D(G(z)))] + λ·E[d(x, x̂)] + β·H(y)

where D(·) and G(·) denote the functions implemented by the discriminator network and the generator network, respectively; min_{E,G} max_D denotes the minimax algorithm used to find the encoder E and generator G that minimize the loss function, together with the discriminator D that maximizes it; f(·) and g(·) are auxiliary functions measuring how realistic a sample is; d(·) is the distortion function between the original and generated images; H(·) denotes the entropy coding algorithm, i.e., the bit overhead required to represent the quantized data; E[·] denotes expectation; λ and β are the weights of the distortion term and the entropy coding term; x and x̂ denote the original and generated images; z denotes the received signal sample; and y denotes the quantized data.

Further, the quantizer in Step 2 is implemented in two parts:

The first part is the forward pass:

Q(z_i) = c_{j*},  j* = argmin_{1≤j≤L} ||z_i − c_j||

where z_i denotes the i-th sample of the data stream, c_j denotes an element of the quantization set, satisfying c_j ∈ C = {c_1, …, c_L}, and L denotes the length of the quantization set;

The second part is the backward pass:

Q̃(z_i) = Σ_{j=1}^{L} c_j · exp(−||z_i − c_j||/σ) / Σ_{l=1}^{L} exp(−||z_i − c_l||/σ)

where exp(·) denotes the exponential function and σ denotes the temperature hyperparameter of the softmax function.
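The hard forward pass and soft backward pass can be sketched as follows. The 5-element quantization set, the sample values, and the σ convention (temperature dividing the distance) are assumptions for illustration; the embodiment fixes only the set length L = 5.

```python
import numpy as np

# Assumed symmetric 5-element quantization set (illustrative; the patent fixes only L = 5).
centers = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

def quantize_forward(z):
    """Hard forward pass: snap each sample to its nearest quantization center."""
    d = np.abs(z[:, None] - centers[None, :])   # |z_i - c_j| for every sample/center pair
    return centers[np.argmin(d, axis=1)]

def quantize_backward(z, sigma=1.0):
    """Soft relaxation used on the backward pass: softmax-weighted mix of centers."""
    d = np.abs(z[:, None] - centers[None, :])
    w = np.exp(-d / sigma)                      # temperature-scaled similarity
    w = w / w.sum(axis=1, keepdims=True)        # softmax over the quantization set
    return w @ centers                          # differentiable surrogate for the hard output

z = np.array([0.2, -1.6, 1.9])
y_hard = quantize_forward(z)                    # each sample mapped to its nearest center
y_soft = quantize_backward(z)                   # close to y_hard, but differentiable in z
```

Using the hard output in the forward pass and the soft expression for gradients is what keeps the gradient chain from being interrupted at the quantizer.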

Further, the loss function corresponding to the discriminators is:

L_D = Σ_{k=1}^{K} ( E[f(D_k(x_k))] + E[g(D_k(x̂_k))] )

where K denotes the number of discriminators, each with the same structure and independent of the others; x_k and x̂_k denote the original and generated images downsampled by a factor of 2^{k−1}, which form the input pair of the k-th discriminator. Each downsampling step provides a higher-level abstraction of the image pair's global features, ensuring high fidelity between the original and the generated image from local features up to global features.

Further, the loss function corresponding to the generator is:

L_G = Σ_{k=1}^{K} E[f(D_k(x̂_k))] + λ·E[d(x, x̂)] + β·H(y)

Further, the Adam optimization algorithm is used to backpropagate the gradients of the obtained loss function values and update the trainable parameters of the discriminator network and the generator network, respectively. The Adam algorithm proceeds as follows:

v_k = β1·v_{k−1} + (1 − β1)·g_k

s_k = β2·s_{k−1} + (1 − β2)·g_k²

v̂_k = v_k / (1 − β1^k)

ŝ_k = s_k / (1 − β2^k)

Δg_k = η·v̂_k / (√ŝ_k + ε)

θ_k = θ_{k−1} − Δg_k

where g_k denotes the stochastic gradient of the k-th batch, v_k the momentum variable corresponding to that gradient, s_k the accumulated variable of the squared gradient, v̂_k and ŝ_k the bias-corrected momentum and accumulation variables, the constants β1 and β2 the hyperparameters of the exponentially weighted moving averages of the gradient and of the squared gradient, respectively, η the learning rate of the optimizer, and ε a small constant added to prevent a zero denominator, taken as 10^−8.
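The six update equations above can be exercised numerically. The quadratic objective, learning rate, and step count below are illustrative choices, not values from the patent:

```python
import numpy as np

def adam_step(theta, grad, v, s, k, beta1=0.9, beta2=0.999, eta=0.01, eps=1e-8):
    """One Adam update following the six equations above."""
    v = beta1 * v + (1 - beta1) * grad          # momentum of the gradient
    s = beta2 * s + (1 - beta2) * grad ** 2     # accumulator of the squared gradient
    v_hat = v / (1 - beta1 ** k)                # bias corrections
    s_hat = s / (1 - beta2 ** k)
    theta = theta - eta * v_hat / (np.sqrt(s_hat) + eps)
    return theta, v, s

# Minimize f(theta) = (theta - 3)^2, whose gradient is 2*(theta - 3).
theta, v, s = 0.0, 0.0, 0.0
for k in range(1, 2001):
    theta, v, s = adam_step(theta, 2 * (theta - 3), v, s, k)
# theta ends up close to the minimizer 3
```

The bias-correction terms matter most in the first few iterations, when v and s are still dominated by their zero initialization.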

Further, a suitable modulation scheme is selected for modulation and the data is transmitted into the wireless environment, according to:

s_data = modulate(LDPC(y; r))

where s_data denotes the transmitted data, modulate(·) the modulation process, LDPC(·) the LDPC encoding process, r the code rate, and y the encoded and quantized data.
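The transmit chain s_data = modulate(LDPC(y; r)) can be sketched as below. A rate-1/2 repetition code is used as a simple stand-in for the LDPC encoder (a real system would use an actual LDPC code), and the bit values are illustrative:

```python
import numpy as np

def repetition_encode(bits, n=2):
    """Rate-1/n repetition code: a stand-in for LDPC encoding, for illustration only."""
    return np.repeat(bits, n)

def bpsk_modulate(bits):
    """BPSK mapping: bit 0 -> +1, bit 1 -> -1."""
    return 1.0 - 2.0 * bits

y_bits = np.array([0, 1, 1, 0])    # bit-mapped quantized semantic data (illustrative)
s_data = bpsk_modulate(repetition_encode(y_bits))   # rate r = 1/2, BPSK
```

The structure mirrors the formula: channel coding first (adding redundancy at rate r), then modulation onto channel symbols.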

Further, the specific process of Step 5 is:

ŷ = LDPC⁻¹(demodulate(r_data); r)

where demodulate(·) denotes the demodulation process, LDPC⁻¹(·) the LDPC decoding process, r_data the data after passing through the wireless channel, and ŷ the input to the generator;

the transmitted and received data satisfy r_data = h·s_data + n,

where h denotes the channel state information and n denotes the additive noise of the channel.
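The channel model r_data = h·s_data + n and the receive chain can be sketched together. As in the transmit-side sketch, a majority-vote repetition decoder stands in for LDPC decoding, and the SNR, seed, and symbol values are illustrative assumptions:

```python
import numpy as np

def channel(s, h=1.0, snr_db=15.0, rng=None):
    """Flat channel model r_data = h * s_data + n, with AWGN at the given SNR."""
    rng = rng or np.random.default_rng(0)
    noise_std = np.sqrt(10 ** (-snr_db / 10))
    return h * s + noise_std * rng.standard_normal(s.shape)

def bpsk_demodulate(r):
    """Hard decision: negative sample -> bit 1, otherwise bit 0."""
    return (r < 0).astype(int)

def repetition_decode(bits, n=2):
    """Majority vote over each group of n repeated bits (stand-in for LDPC decoding)."""
    return (bits.reshape(-1, n).mean(axis=1) > 0.5).astype(int)

s_data = np.array([1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0])  # BPSK symbols for bits 0,1,1,0
r_data = channel(s_data)                           # pass through the modeled wireless channel
y_hat = repetition_decode(bpsk_demodulate(r_data)) # recovered semantic bits, fed to the generator
```

Setting h to a Rayleigh-distributed gain instead of 1.0 would model the fading channel used in the embodiment.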

The beneficial effects of the present invention are:

The present invention combines semantic coding with LDPC channel coding and takes into account the weak computing power of hardware devices. On the one hand, it achieves a higher compression ratio and definition than traditional image coding; on the other hand, it guarantees reliable transmission of data over a real wireless channel, enabling performance-limited sensor networks to meet the demand for reliable transmission of high-volume traffic. Moreover, thanks to the generalization ability of the neural network model, the invention frees the image decoding process from the constraints of the encoding format: even when retransmission is impossible or bit errors exceed the error-correction range of the channel code, the image can still be recovered normally with acceptable definition.

Brief Description of the Drawings

Fig. 1 is a system framework diagram of the image semantic compression coding method.

Fig. 2 is a network structure diagram of the encoder and the generator.

Fig. 3 is an algorithm flowchart of the neural network model training.

Fig. 4 compares the effect of the semantic compression coding method with the traditional image coding method BPG.

Fig. 5 compares the transmission effect of the semantic compression coding method and the traditional image coding method BPG over different channels.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

With reference to Figs. 1-5, the present invention proposes a low-bit-rate image compression coding method for edge devices based on semantic communication, comprising the following steps:

Step 1: Prepare for network training. Design the encoder, quantizer, generator, and discriminator required for encoding and decoding, as well as the optimization function and optimizer for network training, and initialize their network parameters. Preprocess the provided dataset by shuffling, batching, and normalizing it.

The network structures of the encoder and the generator are shown in Fig. 2, where input denotes the input layer; reflection padding() denotes a reflection padding layer, with the padding size in parentheses; h×w×c conv, stride k denotes a convolutional layer with kernel size h×w, c output channels, and stride k, followed by an instance normalization layer and a ReLU activation layer. The number of channels C in the last convolutional layer of the encoder is the bottleneck layer, used to control the compression ratio. Residual block denotes a residual network block, whose structure is shown in the third panel of Fig. 2, in which BatchNorm denotes a batch normalization layer.

The optimization function is as follows:

min_{E,G} max_D E[f(D(x))] + E[g(D(G(z)))] + λ·E[d(x, x̂)] + β·H(y)    (1)

where D(·) and G(·) denote the functions implemented by the discriminator network and the generator network, respectively; min_{E,G} max_D denotes the minimax algorithm used to find the encoder E and generator G that minimize the loss function, together with the discriminator D that maximizes it. f(·) and g(·) are auxiliary functions measuring how realistic a sample is. d(·) is the distortion function between the original and generated images; H(·) denotes the entropy coding algorithm, i.e., the bit overhead required to represent the quantized data; E[·] denotes expectation; λ and β are the weights of the distortion term and the entropy coding term; x and x̂ denote the original and generated images; z denotes the received signal sample; and y denotes the quantized data.

The optimizer adopts the Adam algorithm. The choice of optimizer is not unique and can be made according to requirements.

In this step, the components required for training the generative adversarial network are initialized. The specific network structure can be built according to the capabilities of the hardware; the distortion term and the entropy coding term in the optimization function, as well as the optimizer, can likewise be chosen as required. The key criterion is to ensure that the model trains stably, converges, and ultimately achieves the desired effect. In addition, the dataset can be chosen according to the application scenario to obtain a better training effect, and the batch size and normalization procedure can be traded off according to the training configuration.

Step 2: Feed the batched data into the training network built in Step 1, passing it through the encoder, quantizer, and generator in turn; then feed the original image together with the reconstructed image output by the generator into the discriminator, compute the corresponding loss function values, and update the network parameters using stochastic gradient descent with backpropagation. This comprises the following steps:

Step 2.1: Feed the current batch of data into the encoder and the quantizer:

y = q(E(x; θ))    (2)

where E denotes the encoder network, θ the encoder network parameters, and q the quantization process.

To ensure that the gradient chain is not interrupted during backpropagation, in particular after passing through the quantizer, the implementation logic of the quantizer is divided into two parts:

The first part is the forward pass:

Q(z_i) = c_{j*},  j* = argmin_{1≤j≤L} ||z_i − c_j||    (3)

where z_i denotes the i-th sample of the data stream, c_j denotes an element of the quantization set, satisfying c_j ∈ C = {c_1, …, c_L}, and L denotes the length of the quantization set.

The second part is the backward pass:

Q̃(z_i) = Σ_{j=1}^{L} c_j · exp(−||z_i − c_j||/σ) / Σ_{l=1}^{L} exp(−||z_i − c_l||/σ)    (4)

where exp(·) denotes the exponential function and σ denotes the temperature hyperparameter of the softmax function.

Step 2.2: Use the quantized data from Step 2.1 as the input of the generator to reconstruct the image from its semantics:

x̂ = G(y; θ_G)    (5)

where G(·) denotes the generator network, θ_G the trainable parameters of the generator network, y the quantized data from Step 2.1, and x̂ the generated image.

Step 2.3: Pair the original images of the dataset from Step 1 with the generated images obtained in Step 2.2, feed the image pairs into the discriminators, and compute the corresponding discriminator loss value:

L_D = Σ_{k=1}^{K} ( E[f(D_k(x_k))] + E[g(D_k(x̂_k))] )    (6)

where K denotes the number of multi-scale discriminators, each with the same structure and independent of the others; x_k and x̂_k denote the original and generated images downsampled by a factor of 2^{k−1}, which form the input pair of the k-th discriminator. Each downsampling step provides a higher-level abstraction of the image pair's global features, ensuring high fidelity between the original and the generated image from local features up to global features.
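Building the multi-scale discriminator inputs amounts to constructing an image pyramid with downsampling factors 2^{k−1}. A minimal sketch using 2×2 average pooling (the pooling choice and image size are assumptions; the patent specifies only the downsampling factors):

```python
import numpy as np

def downsample2(img):
    """Factor-2 downsampling by 2x2 average pooling."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2   # crop to even dimensions
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid(img, num_scales=3):
    """Inputs for K multi-scale discriminators: scale k is downsampled by 2**(k-1)."""
    out = [img]
    for _ in range(num_scales - 1):
        out.append(downsample2(out[-1]))
    return out

scales = pyramid(np.ones((64, 64)), num_scales=3)
# shapes: (64, 64), (32, 32), (16, 16) -- one input per discriminator
```

With num_scales=3 this matches the embodiment's three discriminators operating on the original pair and the pairs downsampled by factors of 2 and 4.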

The corresponding generator loss value is:

L_G = Σ_{k=1}^{K} E[f(D_k(x̂_k))] + λ·E[d(x, x̂)] + β·H(y)    (7)

Step 2.4: Use the Adam optimization algorithm to backpropagate the gradients of the loss values obtained in Step 2.3, updating the trainable parameters of the discriminator network and the generator network, respectively. The Adam algorithm proceeds as follows:

v_k = β1·v_{k−1} + (1 − β1)·g_k
s_k = β2·s_{k−1} + (1 − β2)·g_k²
v̂_k = v_k / (1 − β1^k)
ŝ_k = s_k / (1 − β2^k)
Δg_k = η·v̂_k / (√ŝ_k + ε)
θ_k = θ_{k−1} − Δg_k    (8)

where g_k denotes the stochastic gradient of the k-th batch, v_k the momentum variable corresponding to that gradient, s_k the accumulated variable of the squared gradient, v̂_k and ŝ_k the bias-corrected momentum and accumulation variables, the constants β1 and β2 the hyperparameters of the exponentially weighted moving averages of the gradient and of the squared gradient, respectively, η the learning rate of the optimizer, and ε a small value added to prevent a zero denominator, usually 10^−8.

In this step, the adversarial network model of the system is trained: the network structure built in Step 1, the optimizer and learning rate, and the preprocessed dataset are fed into the GAN framework, and the encoder, generator, and discriminator are trained alternately under the guidance of the optimization function.

Step 3: Check whether the loss value has converged to the preset value. If so, terminate training early and save the corresponding model parameters; otherwise, repeat Step 2.

In this step, convergence is judged from the loss values obtained in Step 2. For different network parameters and dataset inputs, even the same network structure may converge to different values, so convergence of the optimization function is only one indicator that learning is complete. The resulting model and its effect should be judged by additional tests, for example whether the quality of the images the model actually generates on the test and validation sets also meets expectations, and hence whether training should be terminated early and the network parameters saved.
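The convergence check in Step 3 can be sketched as a simple early-stopping rule. The loss values, target, and patience below are illustrative assumptions, not values from the patent:

```python
def train_with_early_stopping(losses, target, patience=3):
    """Stop once the loss stays at or below `target` for `patience` consecutive batches.

    `losses` is an iterable of per-batch loss values (here a precomputed list for
    illustration; in practice each value would come from one training step)."""
    streak = 0
    for step, loss in enumerate(losses, start=1):
        streak = streak + 1 if loss <= target else 0
        if streak >= patience:
            return step  # training would be terminated here and the model saved
    return None          # did not converge within the given batches

stop_step = train_with_early_stopping([0.9, 0.5, 0.2, 0.19, 0.18, 0.3], target=0.2)
# stop_step == 5: the loss is at or below 0.2 at steps 3, 4, and 5
```

Requiring several consecutive sub-threshold batches, rather than a single one, guards against stopping on a transient dip in a noisy adversarial loss.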

Step 4: The edge device starts up and automatically loads the pretrained encoder and quantizer saved in Step 3. Captured images pass through the encoder, the quantizer, and a low-density parity-check (LDPC) encoder in turn; a suitable modulation scheme is selected, and the data is transmitted into the wireless environment. The specific process is:

s_data = modulate(LDPC(y; r))    (9)

where s_data denotes the transmitted data, modulate(·) the modulation process, LDPC(·) the LDPC encoding process, r the code rate, and y the encoded and quantized data, obtained from the output of the encoder and quantizer trained in Step 2, i.e., formula (2).

In this step, the edge device directly loads the encoder and quantizer trained in Step 2 and encodes and quantizes the captured images. To ensure that the data can be transmitted reliably in a real wireless environment, channel coding and modulation are then applied. The modulation scheme may be BPSK, 16QAM, etc., and the LDPC code rate can be set according to actual requirements: in general, the higher the code rate, the higher the transmission efficiency but the lower the reliability.

Step 5: The receiver loads the pretrained generator saved in Step 3 and, in turn, demodulates the received signal, performs LDPC decoding, and generates the reconstructed image. The specific process is:

ŷ = LDPC⁻¹(demodulate(r_data); r)    (10)

where demodulate(·) denotes the demodulation process, LDPC⁻¹(·) the LDPC decoding process, r_data the data after passing through the wireless channel, and ŷ the input to the generator.

The transmitted and received data satisfy r_data = h·s_data + n,

where h denotes the channel state information and n denotes the additive noise of the channel.

In this step, the receiving device first demodulates and decodes the received signal to recover the semantics of the image for the subsequent generator. The channel noise here should be understood as abstract semantic noise rather than traditional physical noise. The former causes semantic errors in the information, whose magnitude is determined by the shared semantic library of the transmitter and receiver, in practice the adaptability of the trainable parameters of the encoder and generator networks to new samples, while still guaranteeing that the received semantics are recovered into the corresponding image. The latter causes bit errors; when a bit error occurs in the format header, the entire data packet can no longer be recognized or decoded.

Embodiment 1:

Image semantic compression is carried out according to the method of the detailed description, as follows:

(1) First, determine the hyperparameters required for model training. The training set is the FLAME dataset; the bottleneck layer C is set to 4 and 8, respectively; the quantization set has length L = 5; the optimizer is Adam with a learning rate of 0.0002 and a batch size of 1. In the optimization function, the adversarial loss is the least-squares loss, i.e., f(x) = (x − 1)² and g(x) = x²; the distortion function is the mean squared error, with weight λ = 10; the entropy coding term is not considered for now, i.e., β = 0. The number of multi-scale discriminators is 3, discriminating the original image pair and the image pairs downsampled by factors of 2 and 4.

(2) Second, start training. After the optimization function converges, save the model and its trainable parameters and store them in the edge device.

(3) Third, the edge device loads the model, captures a picture, encodes and quantizes it, applies LDPC coding at a code rate of 1/2 and BPSK modulation; the channel model is chosen as the AWGN channel and the Rayleigh channel respectively.

(4) Fourth, the receiving end receives the data, performs BPSK demodulation and LDPC decoding, and generates the picture.

As can be seen from the above process, the semantic coding method proposed by the present invention imposes no specific format requirements and is therefore more broadly applicable than traditional image coding methods.
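The least-squares losses f(x) = (x-1)^2 and g(x) = x^2, together with the MSE distortion weighted by λ = 10 chosen in step (1), can be sketched in plain Python. With β = 0 the entropy term drops out; the function `generator_objective` is an illustrative name, not part of the patent:

```python
def f(x):
    """Real-sample term of the least-squares GAN loss: (x - 1)^2."""
    return (x - 1) ** 2

def g(x):
    """Generated-sample term of the least-squares GAN loss: x^2."""
    return x ** 2

def mse(x, x_hat):
    """Distortion d(x, x_hat): mean squared error over pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def generator_objective(d_on_fake, x, x_hat, lam=10.0, beta=0.0, bit_cost=0.0):
    """Adversarial term + weighted distortion + (optional) entropy term."""
    return f(d_on_fake) + lam * mse(x, x_hat) + beta * bit_cost

assert f(1.0) == 0.0 and g(0.0) == 0.0
# distortion alone: mse([0, 2], [0, 0]) = 2, weighted by lambda = 10
assert generator_objective(1.0, [0.0, 2.0], [0.0, 0.0]) == 20.0
```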

Example 2:

Simulation verification is carried out according to the method of the specific embodiment:

The simulation conditions are as follows: the pictures are in RGB format with a resolution of 256×256 px and 3 channels, and the maximum pixel value on each channel is 255, i.e. the storage cost of a picture is 24 bits per pixel. The reference image coding format used for comparison is the BPG format, and image quality is measured by the peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM).
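The 24-bit-per-pixel storage figure and the PSNR metric follow directly from these conditions; a minimal sketch is given below (MS-SSIM requires a full multi-scale implementation and is omitted):

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-size pixel lists."""
    err = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if err == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / err)

# an RGB image stores 3 channels x 8 bits = 24 bits per pixel
assert 3 * 8 == 24

a = [100] * 16
b = [110] * 16   # constant error of 10 per pixel -> MSE = 100
assert abs(psnr(a, b) - 10 * math.log10(255 ** 2 / 100)) < 1e-9
```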

As can be seen from Fig. 4, the present invention generates sharper images at a slightly smaller bit cost, and requires a smaller bit cost to generate pictures of comparable sharpness.

Fig. 4 shows that the semantic compression method outperforms the BPG coding method in both compression ratio and image quality.

As can be seen from Fig. 5, under both the AWGN channel and the Rayleigh channel, the semantic coding method provides higher image quality than BPG across different signal-to-noise ratio environments.

Fig. 5 shows that the semantic coding method adapts better to different wireless channel environments.

The low bit rate image compression coding method based on semantic communication proposed by the present invention has been described in detail above. Specific examples have been used herein to explain the principle and implementation of the present invention; the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementation and scope of application in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
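The Adam optimizer used for training in the examples above (learning rate 0.0002) performs the bias-corrected moment updates described in claim 7. A one-parameter sketch, with illustrative default hyperparameters:

```python
import math

def adam_step(theta, grad, state, lr=0.0002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter theta."""
    state["k"] += 1
    k = state["k"]
    state["v"] = beta1 * state["v"] + (1 - beta1) * grad        # momentum v_k
    state["s"] = beta2 * state["s"] + (1 - beta2) * grad ** 2   # squared-grad s_k
    v_hat = state["v"] / (1 - beta1 ** k)                       # bias corrections
    s_hat = state["s"] / (1 - beta2 ** k)
    return theta - lr * v_hat / (math.sqrt(s_hat) + eps)

state = {"k": 0, "v": 0.0, "s": 0.0}
theta = adam_step(1.0, 0.5, state, lr=0.1)
# on the first step the bias corrections give v_hat = grad and s_hat = grad^2
assert abs(theta - (1.0 - 0.1 * 0.5 / (0.5 + 1e-8))) < 1e-9
```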

Claims (9)

1. A low bit rate image compression coding method for edge equipment based on semantic communication, characterized in that the method comprises the following steps:
step one, preparing before network training: designing the encoder, quantizer, generator and discriminator required in the encoding and decoding process, as well as the optimization function and optimizer used in network training, and then initializing the respective network parameters; preprocessing the provided data set by scrambling, batching and normalizing it;
step two, inputting the batched data into the training network established in step one and passing it sequentially through the encoder, the quantizer and the generator, then inputting the original image together with the reconstructed image output by the generator into the discriminator, calculating the corresponding loss function value, and updating the network parameters by back propagation with a stochastic gradient descent method;
step three, judging whether the loss function value converges to a preset value; if so, terminating the training early and saving the corresponding model parameters for the edge equipment to work offline; otherwise, repeating step two;
step four, starting the edge equipment, automatically loading the pre-trained encoder and quantizer saved in step three, passing the captured picture sequentially through the encoder, the quantizer and the low-density parity-check (LDPC) encoder, selecting a suitable modulation mode for modulation, and sending the data into the wireless environment;
and step five, loading the pre-trained generator saved in step three at the receiving end, and sequentially demodulating, LDPC decoding and generating the reconstructed image from the received signal.
2. The method of claim 1, wherein, in the encoder and generator network structures of step one:
Input represents the input layer; Reflection padding () refers to a reflective padding layer, with the padding size given in parentheses; h×w×c conv, stride s refers to a convolution layer with a kernel size of h×w, c channels and a stride of s, followed by an instance normalization layer and a ReLU activation layer; the channel number C of the last convolution layer in the encoder structure is the bottleneck layer, used to control the compression ratio; Residual block refers to a residual network block, in which Batch Norm refers to a batch normalization layer.
3. The method of claim 1, wherein the optimization function is:

min_{E,G} max_D  E[f(D(x))] + E[g(D(G(z, y)))] + λ·E[d(x, x̂)] + β·H(y)

wherein D(·) and G(·) represent the functions corresponding to the discriminator network and the generator network, respectively; min_{E,G} max_D denotes the minimax algorithm used to search for the encoder E and generator G that minimize the loss function and the discriminator D that maximizes it; f(·) and g(·) denote auxiliary functions measuring how realistic a sample is; d(·,·) represents the distortion function between the original picture and the generated picture; H(·) represents the entropy coding algorithm, i.e. the bit cost required to represent the quantized data; E[·] denotes expectation; λ and β represent the weights of the distortion term and the entropy coding term; x and x̂ represent the original image and the generated image; z represents the received signal samples; and y represents the quantized data.
4. The method of claim 1, wherein the quantizer in step two implements a process comprising the following steps:

the first part is the forward derivation process:

y_i = c_{j*},  j* = argmin_j |z_i - c_j|

wherein z_i represents the i-th data sample in the data stream and c_j represents an element of the quantization set, satisfying c_j ∈ 𝒞 = {c_1, …, c_L}, where L represents the length of the quantization set;

the second part is the backward propagation process:

ỹ_i = Σ_{j=1}^{L} [exp(-σ(z_i - c_j)^2) / Σ_{l=1}^{L} exp(-σ(z_i - c_l)^2)] · c_j

where exp(·) represents the exponential function and σ represents the temperature hyperparameter in the softmax function.
5. The method of claim 1, wherein the loss function of the corresponding discriminators is:

L_D = Σ_k { E[f(D_k(x^(k)))] + E[g(D_k(x̂^(k)))] }

wherein k represents the index of the discriminator; the discriminators share the same structure and are mutually independent; for the k-th discriminator, the input is the original image pair downsampled by a factor of 2^(k-1). Each downsampling operation provides a higher abstraction of the global features of the image pair, thereby guaranteeing high fidelity between the original image and the generated image from local features to global features.
6. The method of claim 3, wherein the penalty function for a corresponding generator is:
Figure FDA0003901784190000025
7. The method of claim 1, wherein the trainable parameters in the discriminator network and the generator network are respectively updated by back-propagating the gradients of the obtained loss function values using the Adam optimization algorithm, wherein the Adam optimization algorithm is specifically implemented as follows:

v_k = β_1·v_{k-1} + (1 - β_1)·g_k
s_k = β_2·s_{k-1} + (1 - β_2)·g_k^2
v̂_k = v_k / (1 - β_1^k)
ŝ_k = s_k / (1 - β_2^k)
Δg_k = η·v̂_k / (√(ŝ_k) + ε)
θ_k = θ_{k-1} - Δg_k

wherein g_k represents the stochastic gradient of the k-th batch of data, v_k represents the momentum variable corresponding to the gradient of the k-th batch, s_k represents the accumulated variable of the squared gradient of the k-th batch, v̂_k and ŝ_k represent the bias-corrected momentum and accumulated variables, the constants β_1 and β_2 are respectively the hyperparameters of the exponentially weighted moving average of the gradient and of the squared gradient, η is the learning rate of the optimizer, and the constant ε represents a small value, taken as 10^-8, added to avoid a zero denominator.
8. The method of claim 1, wherein the selecting of a suitable modulation mode for modulation and the sending of the data into the wireless environment follow the specific formula:

s_data = modulate(LDPC(y; r))

wherein s_data represents the transmitted data, modulate(·) represents the modulation process, LDPC(·) represents the LDPC encoding process, r represents the code rate, and y represents the encoded and quantized data.
9. The method according to claim 1, wherein the specific process formula of step five is:

ŷ = LDPC^-1(demodulate(r_data))

wherein demodulate(·) denotes the demodulation process, LDPC^-1(·) denotes the LDPC decoding process, r_data represents the data after passing through the wireless channel, and ŷ represents the input to the generator;

wherein the transmitted data and the received data satisfy r_data = h·s_data + n;

in the formula, h represents the channel state information and n represents the additive noise of the channel.
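The quantizer of claim 4 — a hard nearest-neighbour forward pass with a softmax-weighted soft backward pass — can be sketched in plain Python. The element values of the quantization set below are illustrative, since the patent fixes only its length L = 5:

```python
import math

C = [-2.0, -1.0, 0.0, 1.0, 2.0]   # quantization set of length L = 5 (illustrative values)

def quantize_forward(z):
    """Hard nearest-neighbour quantization used in the forward pass."""
    return min(C, key=lambda c: (z - c) ** 2)

def quantize_backward(z, sigma=1.0):
    """Differentiable soft surrogate: softmax-weighted average of the set."""
    weights = [math.exp(-sigma * (z - c) ** 2) for c in C]
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, C)) / total

assert quantize_forward(0.4) == 0.0    # 0.4 is closest to 0
assert quantize_forward(1.6) == 2.0    # 1.6 is closest to 2
assert abs(quantize_backward(0.0)) < 1e-12   # symmetric weights average to 0
```

The forward pass is non-differentiable, so during training the gradient flows through the soft surrogate instead, with the temperature σ controlling how closely it approximates the hard decision.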
CN202211292779.XA 2022-10-21 2022-10-21 Low bit rate image compression coding method based on semantic communication Pending CN115695810A (en)


Publications (1)

Publication Number Publication Date
CN115695810A true CN115695810A (en) 2023-02-03

Family

ID=85066772


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116029340A (en) * 2023-01-13 2023-04-28 香港中文大学(深圳) Image and semantic information transmission method based on deep learning network



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination