CN115695810A - Low bit rate image compression coding method based on semantic communication - Google Patents
Low bit rate image compression coding method based on semantic communication
- Publication number
- CN115695810A (application CN202211292779.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- network
- generator
- image
- encoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Compression Or Coding Systems Of Tv Signals (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention proposes a low-bit-rate image compression coding method based on semantic communication. The method combines semantic coding with LDPC channel coding and accounts for the weak computing power of edge hardware. On the one hand it achieves a higher compression ratio and image clarity than traditional image coding; on the other hand it guarantees reliable data transmission over a real wireless channel, enabling performance-limited sensor networks to carry high-volume services reliably. Thanks to the generalization ability of the neural network model, the image decoding process is freed from the constraints of a fixed coding format: even when retransmission is impossible or bit errors exceed the error-correction capability of the channel code, the picture can still be recovered with acceptable clarity.
Description
Technical field
The invention belongs to the fields of deep learning and wireless communication, and in particular relates to a low-bit-rate image compression coding method based on semantic communication.
Background
In recent years, wildfires have inflicted severe damage on the natural world, and traditional satellite- and UAV-based detection is inefficient and costly. With the development of communication and Internet of Things (IoT) technology, large-scale wireless sensor networks (WSNs) are being used for real-time monitoring. However, sensor networks in remote areas have limited capability and struggle to carry high-volume traffic, in particular images, which provide rich information about on-site environmental changes. Moreover, after decades of development, entropy-based image compression has gradually approached the Shannon limit, whereas deep-learning-based semantic communication compresses images at the semantic level, going beyond that limit and making it feasible for edge networks to transmit large numbers of images.
Current research on semantic communication mostly targets lightweight data such as text and is implemented in simulation software over a single channel. Little work addresses the construction and updating of shared knowledge bases for image semantics, or the adaptation of end-to-end transmission to the time-varying, frequency-selective characteristics of physical wireless channels; that is, there is as yet no successful case of semantically encoded images being transmitted in a real wireless environment.
Summary of the invention
The purpose of the invention is to solve the low compression ratio and low clarity of traditional image compression coding, as well as the difficulty of applying semantic communication directly in a real wireless environment. To this end, a low-bit-rate image compression coding method based on semantic communication is proposed.
The invention is realized through the following technical solution: a low-bit-rate image compression coding method for edge devices based on semantic communication, the method comprising:
Step 1: Prepare for network training. Design the encoder, quantizer, generator, and discriminator needed for encoding and decoding, as well as the optimization function and optimizer for the training process, and initialize their network parameters; preprocess the provided data set by shuffling, batching, and normalizing it.
Step 2: Feed the batched data into the training network built in Step 1, passing it through the encoder, quantizer, and generator in turn; then feed the original image together with the reconstructed image output by the generator into the discriminator, compute the corresponding loss values, and update the network parameters by stochastic gradient descent with backpropagation.
Step 3: Check whether the loss has converged to the preset value; if so, terminate training early and save the model parameters for offline use on the edge device, otherwise repeat Step 2.
Step 4: The edge device starts up and automatically loads the pre-trained encoder and quantizer saved in Step 3; captured pictures pass through the encoder, the quantizer, and a low-density parity-check (LDPC) encoder in turn, a suitable modulation mode is selected, and the data is transmitted into the wireless environment.
Step 5: The receiving end loads the pre-trained generator saved in Step 3 and performs demodulation, LDPC decoding, and reconstruction of the image on the received signal in turn.
Further, in the encoder and generator network structures described in Step 1:
input denotes the input layer; reflection padding() denotes a reflection padding layer, with the padding size in parentheses; h×w×c conv, stride k denotes a convolutional layer with kernel size h×w, c channels, and stride k, followed by an instance normalization layer and a ReLU activation layer. The number of channels C of the last convolutional layer in the encoder is the bottleneck, which controls the compression ratio. Residual block denotes a residual network block, in which BatchNorm denotes a batch normalization layer.
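Since the bottleneck width C directly sets the size of the quantized latent, the bit rate it implies can be sanity-checked with a few lines. The sketch below assumes a hypothetical total spatial downsampling factor of 16 (the text does not state the encoder's overall stride) together with the C = 8 and quantization-set length L = 5 used later in Example 1:

```python
import math

def bits_per_pixel(h, w, c_bottleneck, n_levels, downsample):
    # Bit cost of the quantized latent per image pixel, assuming the encoder
    # shrinks each spatial dimension by `downsample` and each latent value
    # is stored with log2(n_levels) bits (no entropy coding applied).
    latent_values = (h // downsample) * (w // downsample) * c_bottleneck
    return latent_values * math.log2(n_levels) / (h * w)

bpp = bits_per_pixel(256, 256, c_bottleneck=8, n_levels=5, downsample=16)
ratio = 24.0 / bpp  # raw RGB storage is 24 bits per pixel
```

Under these assumed numbers the latent costs roughly 0.07 bits per pixel, a compression ratio of a few hundred relative to raw RGB; the real figure depends on the actual stride and on the entropy term weighted by β.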
Further, the optimization function is:
min_{E,G} max_D E[f(D(x))] + E[g(D(G(z)))] + λ·E[d(x, x̂)] + β·H(y)
where D(·) and G(·) denote the functions realized by the discriminator network and the generator network respectively; min max denotes the minimax algorithm used to find the encoder E and generator G that minimize the loss function and the discriminator D that maximizes it; f(·) and g(·) are auxiliary functions measuring how realistic a sample is; d(·) is the distortion function between the original and generated images; H(·) denotes the entropy coding algorithm, i.e. the bit overhead needed to represent the quantized data; E[·] denotes expectation; λ and β are the weights of the distortion term and the entropy coding term; x and x̂ denote the original and generated images; z denotes the received signal sample; and y denotes the quantized data.
Further, the implementation of the quantizer in Step 2 comprises the following two parts:
The first part is the forward pass:
ẑ_i = c_j*, where j* = argmin_j |z_i - c_j|
where z_i denotes the i-th data sample of the data stream, c_j denotes an element of the quantization set C = {c_1, …, c_L}, and L denotes the length of the quantization set.
The second part is the backward pass, which replaces the non-differentiable hard assignment with a softmax-weighted combination:
z̃_i = Σ_j [ exp(-σ|z_i - c_j|) / Σ_l exp(-σ|z_i - c_l|) ] · c_j
where exp(·) denotes the exponential function and σ denotes the temperature hyperparameter of the softmax function.
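A minimal NumPy sketch of this two-part quantizer, with a hard nearest-neighbour forward pass and a softmax-weighted surrogate for the backward pass (the absolute-distance form and the center values are illustrative assumptions):

```python
import numpy as np

def quantize_forward(z, centers):
    # hard pass: snap each value to the nearest element of the quantization set
    dist = np.abs(z[..., None] - centers)
    return centers[np.argmin(dist, axis=-1)]

def quantize_soft(z, centers, sigma=1.0):
    # differentiable surrogate: softmax over negative distances, temperature sigma
    logits = -sigma * np.abs(z[..., None] - centers)
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return (w * centers).sum(axis=-1)

centers = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # a length L = 5 quantization set
z = np.array([0.2, -1.4, 1.9])
hard = quantize_forward(z, centers)
```

As sigma grows the soft output approaches the hard one, which is why gradients taken through the soft path remain informative about the hard forward pass.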
Further, the loss function of the discriminator is:
L_D = Σ_k { E[f(D_k(x))] + E[g(D_k(x̂))] }
where k indexes the discriminators, which share the same structure but are mutually independent. The input of the k-th discriminator is the image pair obtained by downsampling the original pair by a factor of 2^(k-1); each downsampling step provides a higher abstraction of the pair's global features, guaranteeing high fidelity between the original and generated images from local features up to global features.
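The input pyramid for the multi-scale discriminators can be sketched as repeated factor-2 downsampling; average pooling on a single-channel image is an assumed, minimal choice of downsampler:

```python
import numpy as np

def downsample2(img):
    # 2x2 average pooling of a 2-D (single-channel) image
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def discriminator_inputs(img, n_discriminators=3):
    # the k-th discriminator (k = 1..n) sees the image downsampled by 2**(k-1)
    scales = [img]
    for _ in range(n_discriminators - 1):
        scales.append(downsample2(scales[-1]))
    return scales

pyramid = discriminator_inputs(np.zeros((256, 256)), n_discriminators=3)
```

Each discriminator would receive the original and generated images at the same scale, so the loss covers local detail at full resolution and global layout at the coarsest scale.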
Further, the corresponding loss function of the generator is:
L_G = Σ_k E[f(D_k(x̂))] + λ·E[d(x, x̂)] + β·H(y)
Further, the Adam optimization algorithm is used to backpropagate the gradients of the computed loss values and update the trainable parameters of the discriminator network and the generator network respectively, where Adam proceeds as follows:
v_k = β1·v_{k-1} + (1 - β1)·g_k
s_k = β2·s_{k-1} + (1 - β2)·g_k²
v̂_k = v_k / (1 - β1^k),  ŝ_k = s_k / (1 - β2^k)
Δg_k = η·v̂_k / (√ŝ_k + ε),  θ_k = θ_{k-1} - Δg_k
where g_k denotes the stochastic gradient of the k-th batch, v_k the momentum variable for that gradient, s_k the accumulator of its squared gradient, v̂_k and ŝ_k the bias-corrected momentum and accumulator, the constants β1 and β2 the hyperparameters of the exponentially weighted moving averages of the gradient and of the squared gradient respectively, η the learning rate of the optimizer, and the constant ε a small value added to keep the denominator nonzero, set to 10⁻⁸.
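A NumPy sketch of one Adam step as described above; β1 = 0.9 and β2 = 0.999 are typical defaults assumed here (the text fixes only ε = 1e-8), and η matches the 0.0002 used in Example 1:

```python
import numpy as np

def adam_step(theta, g, state, k, eta=2e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    # one parameter update; `state` carries (v, s), `k` is the 1-based batch index
    v, s = state
    v = beta1 * v + (1.0 - beta1) * g          # momentum of the gradient
    s = beta2 * s + (1.0 - beta2) * g * g      # accumulator of the squared gradient
    v_hat = v / (1.0 - beta1 ** k)             # bias-corrected momentum
    s_hat = s / (1.0 - beta2 ** k)             # bias-corrected accumulator
    theta = theta - eta * v_hat / (np.sqrt(s_hat) + eps)
    return theta, (v, s)

theta = np.array([1.0])
state = (np.zeros(1), np.zeros(1))
theta, state = adam_step(theta, np.array([0.5]), state, k=1)
```

On the very first step the bias corrections cancel the (1 - β) factors, so the update size is essentially the learning rate times the normalized gradient.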
Further, a suitable modulation mode is selected and the data is transmitted into the wireless environment according to:
s_data = modulate(LDPC(y; r))
where s_data denotes the transmitted data, modulate(·) the modulation process, LDPC(·) the LDPC encoding process, r the code rate, and y the encoded and quantized data.
Further, the specific process of Step 5 is:
ŷ = LDPC⁻¹(demodulate(r_data))
where demodulate(·) denotes the demodulation process, LDPC⁻¹(·) the LDPC decoding process, r_data the data after passing through the wireless channel, and ŷ the input of the generator;
The transmitted and received data satisfy r_data = h·s_data + n,
where h denotes the channel state information and n the additive noise of the channel.
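The transmission chain of Steps 4 and 5 can be sketched end to end for BPSK; the bit-to-symbol mapping, the real-valued Rayleigh gain, and the SNR convention below are illustrative assumptions, and the LDPC encode/decode pair is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def bpsk_modulate(bits):
    # map bit 0 -> +1, bit 1 -> -1
    return 1.0 - 2.0 * bits

def channel(s, snr_db, fading=False):
    # r_data = h * s_data + n: unit gain for AWGN, a Rayleigh draw otherwise
    h = rng.rayleigh(scale=1.0 / np.sqrt(2.0), size=s.shape) if fading else np.ones_like(s)
    noise_std = np.sqrt(10.0 ** (-snr_db / 10.0))  # unit signal power assumed
    return h * s + noise_std * rng.standard_normal(s.shape), h

def bpsk_demodulate(r, h):
    # zero-forcing equalization followed by a hard decision
    return (r / h < 0.0).astype(int)

bits = rng.integers(0, 2, size=1000)
r, h = channel(bpsk_modulate(bits), snr_db=15.0)
ber = np.mean(bpsk_demodulate(r, h) != bits)
```

At 15 dB the hard-decision bit error rate is effectively zero; in the patented scheme, any residual errors at lower SNR would be handed to the LDPC decoder before the generator ever sees the data.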
The beneficial effects of the invention are:
The invention combines semantic coding with LDPC channel coding and accounts for the weak computing power of edge hardware. On the one hand it achieves a higher compression ratio and image clarity than traditional image coding; on the other hand it guarantees reliable data transmission over a real wireless channel, enabling performance-limited sensor networks to carry high-volume services reliably. Thanks to the generalization ability of the neural network model, the image decoding process is freed from the constraints of a fixed coding format: even when retransmission is impossible or bit errors exceed the error-correction capability of the channel code, the picture can still be recovered with acceptable clarity.
Brief description of the drawings
Fig. 1 is the system framework diagram of the image semantic compression coding method.
Fig. 2 is the network structure diagram of the encoder and the generator.
Fig. 3 is the algorithm flow chart for training the neural network model.
Fig. 4 compares the semantic compression coding method with the traditional image coding method BPG.
Fig. 5 compares the transmission performance of the semantic compression coding method and the traditional BPG method over different channels.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.
With reference to Figs. 1-5, the invention proposes a low-bit-rate image compression coding method for edge devices based on semantic communication, comprising the following steps:
Step 1: Prepare for network training. Design the encoder, quantizer, generator, and discriminator needed for encoding and decoding, as well as the optimization function and optimizer for the training process, and initialize their network parameters; preprocess the provided data set by shuffling, batching, and normalizing it.
The network structures of the encoder and generator are shown in Fig. 2, where input denotes the input layer; reflection padding() denotes a reflection padding layer with the padding size in parentheses; h×w×c conv, stride k denotes a convolutional layer with kernel size h×w, c channels, and stride k, followed by an instance normalization layer and a ReLU activation layer. The number of channels C of the last convolutional layer in the encoder is the bottleneck, which controls the compression ratio. Residual block denotes a residual network block, whose structure is shown in the third panel of Fig. 2, where BatchNorm denotes a batch normalization layer.
The optimization function is:
min_{E,G} max_D E[f(D(x))] + E[g(D(G(z)))] + λ·E[d(x, x̂)] + β·H(y)
where D(·) and G(·) denote the functions realized by the discriminator network and the generator network respectively; min max denotes the minimax algorithm used to find the encoder E and generator G that minimize the loss function and the discriminator D that maximizes it. f(·) and g(·) are auxiliary functions measuring how realistic a sample is; d(·) is the distortion function between the original and generated images; H(·) denotes the entropy coding algorithm, i.e. the bit overhead needed to represent the quantized data; E[·] denotes expectation; λ and β are the weights of the distortion term and the entropy coding term; x and x̂ denote the original and generated images; z denotes the received signal sample; and y denotes the quantized data.
The optimizer is the Adam optimizer; the choice of optimizer is not unique and can be made according to requirements.
This step mainly initializes the components needed to train the generative adversarial network. The specific network structure can be built according to the capability of the hardware; the distortion and entropy-coding terms of the optimization function, like the optimizer, can also be chosen according to requirements, the key criterion being that the model trains stably, converges, and reaches the desired effect. In addition, the data set can be chosen to suit the application scenario for better training results, and the batch size and normalization can be adjusted to the training configuration.
Step 2: Feed the batched data into the training network built in Step 1, passing it through the encoder, quantizer, and generator in turn; then feed the original image together with the reconstructed image output by the generator into the discriminator, compute the corresponding loss values, and update the network parameters by stochastic gradient descent with backpropagation. This comprises the following steps:
Step 2.1: Feed the current batch of data into the encoder and quantizer, expressed as:
y = q(E(x; θ))  (2)
where E denotes the encoder network, θ the encoder network parameters, and q the quantization process.
To ensure that the gradient chain is not interrupted during backpropagation, in particular after passing through the quantizer, the implementation logic of the quantizer is divided into two parts:
The first part is the forward pass:
ẑ_i = c_j*, where j* = argmin_j |z_i - c_j|  (3)
where z_i denotes the i-th data sample of the data stream, c_j denotes an element of the quantization set C = {c_1, …, c_L}, and L denotes the length of the quantization set.
The second part is the backward pass:
z̃_i = Σ_j [ exp(-σ|z_i - c_j|) / Σ_l exp(-σ|z_i - c_l|) ] · c_j  (4)
where exp(·) denotes the exponential function and σ denotes the temperature hyperparameter of the softmax function.
Step 2.2: Use the quantized data from Step 2.1 as the input of the generator to reconstruct the image from its semantics, specifically:
x̂ = G(y; φ)  (5)
where G(·) denotes the generator network, φ its trainable parameters, y the quantized data from Step 2.1, and x̂ the generated image.
Step 2.3: Pair the original images of the data set from Step 1 with the generated images obtained in Step 2.2, feed each image pair into the discriminators, and compute the corresponding discriminator loss:
L_D = Σ_k { E[f(D_k(x))] + E[g(D_k(x̂))] }  (6)
where k indexes the multi-scale discriminators, which share the same structure but are mutually independent. The input of the k-th discriminator is the image pair obtained by downsampling the original pair by a factor of 2^(k-1); each downsampling step provides a higher abstraction of the pair's global features, guaranteeing high fidelity between the original and generated images from local features up to global features.
The corresponding generator loss is:
L_G = Σ_k E[f(D_k(x̂))] + λ·E[d(x, x̂)] + β·H(y)  (7)
Step 2.4: Use the Adam optimization algorithm to backpropagate the gradients of the loss values obtained in Step 2.3 and update the trainable parameters of the discriminator and generator networks respectively, where Adam proceeds as follows:
v_k = β1·v_{k-1} + (1 - β1)·g_k
s_k = β2·s_{k-1} + (1 - β2)·g_k²
v̂_k = v_k / (1 - β1^k),  ŝ_k = s_k / (1 - β2^k)
Δg_k = η·v̂_k / (√ŝ_k + ε),  θ_k = θ_{k-1} - Δg_k  (8)
where g_k denotes the stochastic gradient of the k-th batch, v_k the momentum variable for that gradient, s_k the accumulator of its squared gradient, v̂_k and ŝ_k the bias-corrected momentum and accumulator, the constants β1 and β2 the hyperparameters of the exponentially weighted moving averages of the gradient and of the squared gradient respectively, η the learning rate of the optimizer, and the constant ε a small value added to keep the denominator nonzero, usually 10⁻⁸.
This step carries out the actual training of the generative adversarial network model of the system: the network structure built in Step 1, the optimizer and learning rate, and the preprocessed data set are fed into the adversarial training framework, and the encoder, generator, and discriminator are trained alternately under the guidance of the optimization function.
Step 3: Check whether the loss has converged to the preset value; if so, terminate training early and save the model parameters, otherwise repeat Step 2.
In this step, convergence is judged from the loss values obtained in Step 2. For different network parameters and data sets, even the same network structure may converge to different values, so convergence of the optimization function is only one sign that learning is complete. The resulting model and its effect need to be judged by additional tests, such as whether the quality of the images actually generated on the test and validation sets meets expectations, and hence whether training should be terminated early and the network parameters saved.
Step 4: The edge device starts up and automatically loads the pre-trained encoder and quantizer saved in Step 3. Captured pictures pass through the encoder, the quantizer, and a low-density parity-check (LDPC) encoder in turn; a suitable modulation mode is then selected and the data is transmitted into the wireless environment, specifically:
s_data = modulate(LDPC(y; r))  (9)
where s_data denotes the transmitted data, modulate(·) the modulation process, LDPC(·) the LDPC encoding process, r the code rate, and y the encoded and quantized data output by the encoder and quantizer trained in Step 2, i.e. formula (2).
In this step, the edge device directly loads the encoder and quantizer already trained in Step 2 and encodes and quantizes the captured pictures. To guarantee reliable transmission over the real wireless environment, the data is then channel-coded and modulated; the modulation scheme can be BPSK, 16QAM, etc., and the LDPC code rate can be set according to actual needs: usually, the higher the code rate, the higher the transmission efficiency but the lower the reliability.
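The stated rate trade-off is simple arithmetic on the coded block length; the helper below is an idealized sketch (a real LDPC code works with fixed block sizes, and the function name here is hypothetical):

```python
def ldpc_coded_length(info_bits, rate):
    # a rate-r code turns k information bits into about k / r coded bits,
    # so the parity overhead grows as the rate drops
    coded = int(round(info_bits / rate))
    return coded, coded - info_bits

half_rate = ldpc_coded_length(4096, 0.5)       # the rate used in Example 1
three_quarters = ldpc_coded_length(4096, 0.75)
```

The rate-1/2 code of Example 1 doubles the payload on air, buying error-correction margin at the cost of transmission time.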
Step 5: The receiving end loads the pre-trained generator saved in Step 3 and performs demodulation, LDPC decoding, and reconstruction of the image on the received signal in turn, specifically:
ŷ = LDPC⁻¹(demodulate(r_data))  (10)
where demodulate(·) denotes the demodulation process, LDPC⁻¹(·) the LDPC decoding process, r_data the data after passing through the wireless channel, and ŷ the input of the generator.
The transmitted and received data satisfy r_data = h·s_data + n,
where h denotes the channel state information and n the additive noise of the channel.
In this step, the receiver first demodulates and decodes the received signal to recover the semantics of the image for the subsequent generator. The channel noise here should be understood as abstract semantic noise rather than traditional physical noise: the former causes semantic errors in the information, whose magnitude is determined by the shared semantic library of the transmitter and receiver (in practice, the adaptability of the trainable parameters of the encoder and generator networks to new samples), while still guaranteeing that the received semantics are recovered into the corresponding picture; the latter causes bit errors, and when a bit error occurs in a format header, the whole data packet can no longer be recognized or decoded.
Example 1:
Image semantic compression is carried out according to the method of the detailed description, as follows:
(1) First, determine the hyperparameters needed for model training: the training set is the FLAME data set; the bottleneck C is set to 4 and 8 respectively; the quantization set has length L = 5; the optimizer is Adam with a learning rate of 0.0002; the batch size is 1. In the optimization function, the adversarial loss uses the least-squares loss, i.e. f(x) = (x - 1)² and g(x) = x²; the distortion function is the mean squared error with weight λ = 10; the entropy coding term is not considered for now, i.e. β = 0. Three multi-scale discriminators are used, discriminating the original image pair, the pair downsampled by a factor of 2, and the pair downsampled by a factor of 4.
(2) Second, start training. After the optimization function converges, save the model and the associated trainable parameters and store them in the edge device.
(3) Third, the edge device loads the model, captures pictures, encodes, quantizes, and applies LDPC coding with code rate 1/2 and BPSK modulation; the channel model is chosen as an AWGN channel and a Rayleigh channel respectively.
(4) Fourth, the receiving end receives the data, performs BPSK demodulation and LDPC decoding, and generates the picture.
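The least-squares losses chosen in step (1) can be written out directly; the generator's adversarial term below (pushing the discriminator's score on generated images toward the "real" label 1) is one common reading of the least-squares objective rather than the authors' confirmed form, and β = 0 drops the entropy term:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # f(x) = (x - 1)^2 on real samples, g(x) = x^2 on generated ones
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def generator_loss(d_fake, x, x_hat, lam=10.0):
    # adversarial term plus the mean-squared-error distortion, weighted by lambda = 10
    return np.mean((d_fake - 1.0) ** 2) + lam * np.mean((x - x_hat) ** 2)

d_real, d_fake = np.ones(4), np.zeros(4)  # scores of an ideal discriminator
x = np.zeros((2, 2))
```

With an ideal discriminator and a perfect reconstruction both losses vanish, which is the fixed point the alternating training drives toward.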
As can be seen from the above process, the proposed semantic coding method imposes no specific format requirements and is therefore more universally applicable than traditional image coding methods.
Example 2:
Simulation verification is carried out according to the method of the detailed description:
The simulation conditions are: RGB images with a resolution of 256×256 px and 3 channels, with a maximum pixel value of 255 per channel, i.e. a storage cost of 24 bits per pixel. The reference coding format for comparison is BPG, and image quality is measured by the peak signal-to-noise ratio (PSNR) and the multi-scale structural similarity (MS-SSIM).
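Of the two metrics, PSNR follows directly from the stated 8-bit peak value of 255; a minimal sketch:

```python
import numpy as np

def psnr(x, x_hat, peak=255.0):
    # peak signal-to-noise ratio in dB; peak = 255 for 8-bit channels
    mse = np.mean((np.asarray(x, float) - np.asarray(x_hat, float)) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

x = np.full((4, 4), 100.0)
x_off = x + 5.0              # a constant error of 5 gives MSE = 25
value = psnr(x, x_off)
```

MS-SSIM has no equally short closed form; SSIM-family metrics are available in common image-processing libraries.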
As can be seen from Fig. 4, the invention generates images of higher clarity at a slightly lower bit cost, and needs a lower bit cost to generate pictures of comparable clarity. The semantic compression method thus outperforms the BPG coding method in both compression ratio and image quality.
As can be seen from Fig. 5, over both the AWGN channel and the Rayleigh channel, the semantic coding method provides higher image quality than BPG across different signal-to-noise ratios, i.e. it adapts better to different wireless channel environments.
The low-bit-rate image compression coding method based on semantic communication proposed by the invention has been described in detail above. Specific examples have been used to explain the principle and implementation of the invention, and the above description of the embodiments is only intended to help understand the method and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting the invention.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211292779.XA CN115695810A (en) | 2022-10-21 | 2022-10-21 | Low bit rate image compression coding method based on semantic communication |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115695810A true CN115695810A (en) | 2023-02-03 |
Family
ID=85066772
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211292779.XA Pending CN115695810A (en) | 2022-10-21 | 2022-10-21 | Low bit rate image compression coding method based on semantic communication |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115695810A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116029340A (en) * | 2023-01-13 | 2023-04-28 | 香港中文大学(深圳) | Image and semantic information transmission method based on deep learning network |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |