Background
The rise of compressed sensing provides a novel signal acquisition and recovery mechanism. According to compressed sensing theory, the original signal only needs to be projected onto a random basis to obtain a small number of measurement values, and any signal with a sparse or nearly sparse representation in some transform domain can be recovered from those measurement values. In compressed sensing video communication, the measurement end and the reconstruction end are extremely asymmetric: the measurement end is a cyber-physical system characterized by limited physical and computing resources and by cooperative signal acquisition and transmission, while the amply resourced reconstruction end must recover the original signal without a feedback channel.
Video compressed sensing generally adopts a communication architecture of "independent measurement of each frame, multi-frame joint reconstruction", which shifts the computational complexity from the measurement end to the reconstruction end; this extremely simple measurement-end design is well suited to the resource-limited visual sensors of a sensing network. The measurement end independently observes and encodes each frame of the video with the same observation matrix, generating consecutive frame observation vectors that are sent out as a code stream. After receiving the code stream, the reconstruction end combines it into consecutive image group observation vectors; multi-frame joint reconstruction exploits spatio-temporal redundancy to different degrees, so the speed and quality of video reconstruction vary accordingly.
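As a minimal illustration of this measurement-end design, the following Python sketch (all names are hypothetical, not from the original) measures every frame independently with one shared Gaussian observation matrix and emits the frame observation vectors as a code stream:

```python
import numpy as np

def measure_video(frames, M, seed=0):
    """Independently measure each frame with one shared Gaussian matrix.

    frames: iterable of 1-D arrays of length N (vectorized images).
    Returns a list of length-M frame observation vectors (the code stream).
    """
    rng = np.random.default_rng(seed)
    N = frames[0].size
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # shared observation matrix
    return [Phi @ f for f in frames]                # one vector per frame
```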
The original image signal is unavailable during video compressed sensing reconstruction, so reconstruction performance is difficult to evaluate against the original images. Blind video quality evaluation trains on image samples from typical image databases and, through supervised pattern recognition and statistical regression, builds a blind evaluation model of video feature changes that can assess the quality of multiple frames without the originals. Video-BLIINDS and VIIDEO, proposed by Bovik et al., are two typical blind video quality evaluation criteria: Video-BLIINDS is a frequency-domain statistical model of spatio-temporal natural scenes, while VIIDEO is a statistical model of the distribution of differences between consecutive frames. Blind quality evaluation of the reconstructed image group can extract intrinsic characteristics of the multi-frame images and helps recover the structural information of the video signal.
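The published Video-BLIINDS and VIIDEO models involve trained statistical regressions; purely to illustrate the no-reference idea, the toy sketch below scores a reconstructed image group from the statistics of its frame differences alone, without any original frames (the scoring rule is invented for illustration and is not either published criterion):

```python
import numpy as np

def toy_blind_quality(gop):
    """Toy no-reference score from frame-difference statistics.

    gop: list of 2-D arrays (consecutive reconstructed frames).
    Higher is "better" under this invented rule; the real criteria
    (Video-BLIINDS, VIIDEO) use trained natural-scene models.
    """
    diffs = [gop[i + 1] - gop[i] for i in range(len(gop) - 1)]
    # Natural-video frame differences are low-energy and heavy-tailed;
    # penalize large frame-difference variance as a crude proxy.
    var = np.mean([d.var() for d in diffs])
    return 1.0 / (1.0 + var)
```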
Deep learning has shown promising performance in machine vision and image recovery tasks, and compressed sensing deep learning can fully exploit the resources of the reconstruction end to better reconstruct dynamically changing video signals. Long short-term memory (LSTM) networks perform attention-model-based modeling of long time sequences and can express more complex spatio-temporal information; deep learning mechanisms based on LSTM networks help recover the detailed information of video signals.
Disclosure of Invention
The invention aims to solve the technical problem of providing a video compressed sensing reconstruction method based on an LSTM network and image group quality blind evaluation, which combines sparse prior modeling with a data-driven mechanism and helps improve the quality of the reconstructed video.
The technical scheme adopted by the invention to solve the above technical problem is a video compressed sensing reconstruction method based on an LSTM network and image group quality blind evaluation, comprising the following steps:
(1) the reconstruction end receives a frame observation vector code stream and combines it into consecutive image group observation vectors;
(2) using the reconstructed image group of the 1st image group observation vector GMV_1 to train the parameter set of the LSTM network;
(3) for the nth image group observation vector GMV_n, n ≥ 2, executing multi-frame joint iterative reconstruction based on the LSTM network; the stopping condition is that the iteration count reaches the maximum value K, or the residual l2 norm ||R_{n,j}||_2 falls below the threshold resMin, or the image group blind quality Q^b_n exceeds the threshold qMax, thereby completing the recovery of the nth image group; the reconstructed frame F_n in the nth image group serves as the final nth reconstructed frame. After α consecutive image groups have been recovered, if each of them finally stopped because the iteration count reached the maximum value K, proceed to step (4); otherwise, subsequent multi-frame joint iterative reconstruction keeps the current parameter set §* of the LSTM network, and jump to step (5);
(4) the reconstruction end uses the reconstructed image group G_n of the nth image group observation vector GMV_n to retrain the LSTM network;
(5) if image group observation vectors remain to be reconstructed, return to step (3) and continue recovering the image groups one by one; otherwise, output the remaining reconstructed frames F_{n+1}, ..., F_{n+L-1} as the final (n+1)th, ..., (n+L-1)th reconstructed frames, completing the video reconstruction (a control-flow sketch of steps (1)-(5) follows this list).
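Purely as a reading aid, the sketch below mirrors the control flow of steps (1)-(5); the three callables are hypothetical placeholders for the operations named in the steps, not part of the original disclosure:

```python
def reconstruct_video(gop_observations, alpha, K, resMin, qMax,
                      framewise_recover, train_lstm, joint_reconstruct):
    """Control-flow sketch of steps (1)-(5).

    framewise_recover: frame-by-frame image reconstruction of a GMV.
    train_lstm: fits a parameter set to a (G, GMV) reference pair.
    joint_reconstruct: LSTM-based multi-frame joint iterative
    reconstruction; returns the image group and whether it stopped at K.
    """
    G1 = framewise_recover(gop_observations[0])
    params = train_lstm(G1, gop_observations[0])   # step (2): train on (G1, GMV1)
    out_frames, streak = [G1[0]], 0
    gop = G1
    for gmv in gop_observations[1:]:               # step (3): n = 2, 3, ...
        gop, stopped_at_K = joint_reconstruct(gmv, params, K, resMin, qMax)
        out_frames.append(gop[0])                  # F_n is the final nth frame
        streak = streak + 1 if stopped_at_K else 0
        if streak >= alpha:                        # step (4): retrain only when
            params = train_lstm(framewise_recover(gmv), gmv)   # needed
            streak = 0
    out_frames.extend(gop[1:])                     # step (5): flush the rest
    return out_frames
```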
Each image group observation vector in step (1) comprises L frame observation vectors, L ≥ 2, and each frame observation vector contains M measured values.
For the 1st image group observation vector GMV_1 in step (2), the reconstruction end recovers the 1st reconstructed image group G_1 = {F_1, F_2, ..., F_L} frame by frame using an image reconstruction algorithm, then uses (G_1, GMV_1) as a reference data pair to train the parameter set §_1 of the LSTM network, obtaining the current parameter set of the LSTM network §* = §_1.
The multi-frame joint iterative reconstruction based on the LSTM network in step (3) initializes the ith frame residual vector R_{n,j}(:,i) one by one from the frame observation vectors GMV_n(:,i) and takes the initialized residual vectors R_{n,j}(:,i) as the input of the LSTM network. A transformation matrix U converts the LSTM network output for the ith frame image in the jth iteration into a base vector z_{n,j}(:,i), where ncell is the number of LSTM network neurons. The base vector z_{n,j}(:,i) is further input to the softmax layer, yielding the non-zero probability of each element of the ith frame sparse vector; the element with the highest probability is selected and added to the support set of the frame sparse vectors. Finally, the frame sparse vectors {S_{n,j}(:,i)}, i = 1, 2, ..., L, of the jth iteration are found one by one through least squares estimation.
In step (3), the reconstruction end weights each coefficient of the residual vector computed in every iteration according to the probability that the residual coefficient is zero, obtaining a weighted residual minimization problem, which is solved with the Split Bregman iteration algorithm.
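The exact weighted problem is not spelled out above, so the following numpy sketch makes one plausible choice: diagonal weights derived from the zero-probabilities, applied to an l1-regularized weighted least squares problem solved by standard Split Bregman iterations (the weighting rule and all parameters are assumptions):

```python
import numpy as np

def shrink(x, t):
    """Soft thresholding: the l1 proximal step used by Split Bregman."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def weighted_split_bregman(A, y, p_zero, lam=0.1, mu=1.0, iters=50):
    """Sketch: weighted residual minimization via Split Bregman.

    Solves  min_s  lam*||s||_1 + 0.5*||W(A s - y)||_2^2,
    where the diagonal weight W is built from p_zero, the estimated
    probability that each residual coefficient is zero (this weighting
    choice is an assumption, not the patent's exact rule).
    """
    w = 1.0 - p_zero                       # assumed weighting rule
    WA = w[:, None] * A
    Wy = w * y
    H = WA.T @ WA + mu * np.eye(A.shape[1])
    d = b = np.zeros(A.shape[1])
    for _ in range(iters):
        s = np.linalg.solve(H, WA.T @ Wy + mu * (d - b))  # quadratic subproblem
        d = shrink(s + b, lam / mu)                       # sparsity subproblem
        b = b + s - d                                     # Bregman update
    return s
```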
The image group blind quality in step (3) is evaluated by the Video-BLIINDS or VIIDEO criterion.
In step (4), the reconstruction end recovers the nth reconstructed image group G_n = {F_n, F_{n+1}, ..., F_{n+L-1}} frame by frame using an image reconstruction algorithm, then uses (G_n, GMV_n) as a reference data pair to retrain the parameter set §_n of the LSTM network and update the current parameter set of the LSTM network §* = §_n.
When the LSTM network is trained, the LSTM network first sparsely codes the reconstructed image group G_n used for training, sparsely representing the given data to obtain a coefficient matrix; the coefficient matrix is then fixed, and each atom of the LSTM network is updated in turn so that it represents the training reconstructed image group G_n more closely.
Advantageous effects
By adopting the above technical scheme, the invention has the following advantages and positive effects compared with the prior art. Considering the spatio-temporal sparsity of consecutive multi-frame images, the invention provides a video compressed sensing reconstruction method that integrates sparse prior modeling with data-memory-driven learning. The method can consider a large number of frames simultaneously, requires no linearity assumption on object motion, comprehensively reflects the motion information of objects, and helps recover the structure and detail information of multi-frame images as a whole, thereby improving the quality of the reconstructed video.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are intended only to illustrate the invention and not to limit its scope. Furthermore, it should be understood that, after reading the teaching of the invention, those skilled in the art may make various changes or modifications to it, and such equivalent forms likewise fall within the scope defined by the appended claims.
In compressed sensing video communication, the measurement end independently measures, frame by frame, each frame of the original video or of its block-partitioned sub-videos, and sends out a frame observation vector code stream. The reconstruction end receives the code stream and combines it into consecutive image group observation vectors. Each image group observation vector contains L frame observation vectors, where i denotes the frame number within the same image group (1 ≤ i ≤ L) and each frame observation vector contains M measured values. GMV_n denotes the nth image group observation vector (n ≥ 1); the L frame observation vectors are arranged as the L columns of GMV_n, and GMV_n(:,i) denotes the ith frame observation vector. Based on the continuous frame observation vector code stream, Fig. 1 shows the timing relationship of the image group observation vectors for the case where each image group observation vector contains L = 3 frames. The reconstruction end performs multi-frame joint iterative reconstruction based on long short-term memory (LSTM) on every image group observation vector and, because consecutive image groups of a video are strongly correlated, adaptively decides whether to update the current parameter set of the LSTM network according to how the recent iterative reconstructions terminated.
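The grouping step can be sketched as below (names hypothetical); note the stride-1 overlap of consecutive groups, which follows from the indexing G_n = {F_n, ..., F_{n+L-1}} used throughout this description:

```python
import numpy as np

def group_observations(code_stream, L):
    """Pack received frame observation vectors into M x L matrices GMV_n.

    code_stream: list of length-M frame observation vectors. Consecutive
    image groups overlap with stride 1, so GMV_n takes frames n..n+L-1,
    matching G_n = {F_n, ..., F_{n+L-1}}.
    """
    return [np.column_stack(code_stream[n:n + L])
            for n in range(len(code_stream) - L + 1)]
```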
In the reconstructed video, each reconstructed image group contains L reconstructed frames. G_n = {F_n, F_{n+1}, ..., F_{n+L-1}} denotes the nth reconstructed image group, Q^b_n denotes the image group blind quality of G_n, and R_{n,j} denotes the residual vector of the nth image group in the jth iteration. F_n denotes the nth reconstructed frame and contains N pixels. S_{n,j} denotes the image group sparse vector corresponding to the reconstructed image group G_n in the jth iteration. Based on the LSTM network and image group quality blind evaluation, Fig. 2 shows the overall flow chart of the video compressed sensing reconstruction method, which mainly includes the following steps:
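For orientation, the per-group quantities just defined can be collected in a small container (a hypothetical reading aid, not part of the method):

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class GopState:
    """Per-image-group quantities used during reconstruction."""
    GMV: np.ndarray          # M x L image group observation vector
    S: np.ndarray            # image group sparse vector S_{n,j}, one column per frame
    R: np.ndarray            # residual vectors R_{n,j}, one column per frame
    frames: list = field(default_factory=list)  # F_n ... F_{n+L-1}, N pixels each
    Qb: float = 0.0          # image group blind quality Q^b_n
```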
In the first step, as an initialization, the sequence number n is set to 1. Because the original frames cannot be obtained, the reconstruction end trains the LSTM network parameters with the reconstructed image group recovered from the 1st image group observation vector GMV_1. For GMV_1, the reconstruction end recovers the 1st reconstructed image group G_1 = {F_1, F_2, ..., F_L} frame by frame using a total-variation-minimizing image reconstruction algorithm, then uses (G_1, GMV_1) as a reference data pair to train the LSTM network, obtaining its parameter set §_1 as the current parameter set of the LSTM network, §* = §_1.
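A minimal sketch of this first step follows; both callables are placeholders (the frame-wise total-variation reconstruction and the LSTM fitting routine are named in the text but not specified here):

```python
def initialize_lstm_params(gmv1, tv_reconstruct_frame, train_lstm):
    """First step: build the reference pair (G1, GMV1) and train the LSTM.

    tv_reconstruct_frame: recovers one frame from one frame observation
    vector via total-variation-minimizing reconstruction (placeholder).
    train_lstm: fits the LSTM parameter set to a (G, GMV) reference
    data pair (placeholder).
    """
    G1 = [tv_reconstruct_frame(gmv1[:, i]) for i in range(gmv1.shape[1])]
    return G1, train_lstm(G1, gmv1)   # G1 and the current parameter set
```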
In the second step, for the nth image group observation vector GMV_n, the reconstruction end executes multi-frame joint iterative reconstruction based on the LSTM network: in each iteration it computes a residual, weights each residual coefficient according to the probability that the coefficient is zero to form a weighted residual minimization problem, and solves that problem with the Split Bregman iteration algorithm. The stopping condition of the multi-frame joint iterative reconstruction is that the iteration count reaches the maximum value K, or the residual l2 norm ||R_{n,j}||_2 falls below the threshold resMin, or the image group blind quality Q^b_n exceeds the threshold qMax. The image group blind quality is evaluated with the Video-BLIINDS or VIIDEO criterion, larger values indicating better quality. In cooperation with the newly introduced qMax, K may be chosen larger and resMin smaller. The reconstruction end completes the multi-frame joint reconstruction of the image group observation vector GMV_n, obtaining the reconstructed frames {F_{n+i-1} = Ψ·S_{n,j}(:,i)}, i = 1, 2, ..., L, one by one and hence the reconstructed image group G_n = {F_n, F_{n+1}, ..., F_{n+L-1}}; it outputs the reconstructed frame F_n as the final nth reconstructed frame, while the remaining reconstructed frames F_{n+1}, ..., F_{n+L-1} serve as the initial state of the frames with the same time indices in the subsequent image group. Adjacent image groups have similar multi-frame joint sparsity, so the parameter set of the LSTM network generally need not be updated across the recovery of many image groups. After α consecutive image groups have been recovered, if the reconstruction stopped at the maximum iteration count for all α image groups, continue with the third step; otherwise, subsequent image group observation vectors keep the current parameter set §* of the LSTM network, and jump to the fourth step.
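The stopping conditions and the adaptive retraining decision can be made concrete as below, assuming res_norm is ||R_{n,j}||_2 and q_blind is Q^b_n from the chosen blind criterion:

```python
def should_stop(j, K, res_norm, resMin, q_blind, qMax):
    """Stop the joint iterative reconstruction when the iteration count
    reaches K, the residual l2 norm falls below resMin, or the image
    group blind quality exceeds qMax."""
    return j >= K or res_norm < resMin or q_blind > qMax

def needs_retraining(stop_reasons, alpha):
    """Retrain the LSTM parameter set only when the last alpha image
    groups all stopped because the iteration count reached K."""
    return len(stop_reasons) >= alpha and all(
        r == "max_iter" for r in stop_reasons[-alpha:])
```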
In the multi-frame joint iterative reconstruction of this step, L frame sparse vectors are combined against an image group observation vector for joint recovery, and L reconstructed frames are output in frame order at the frame rate to form a reconstructed image group. In the video reconstruction, Φ is a Gaussian random matrix, Ψ is a dual-tree wavelet transform basis, and the observation matrix is A = Φ·Ψ. For any image group observation vector, the flow chart of multi-frame joint iterative reconstruction based on the LSTM network is shown in Fig. 3. The ith frame residual vector of the image group residual vector R_{n,j} is R_{n,j}(:,i) = GMV_n(:,i) − A_j·S_{n,j}(:,i), frame number i = 1, 2, ..., L, where A_j is the matrix containing only the columns of A corresponding to the support set elements of S_{n,j}, and S_{n,j}(:,i) is the ith frame sparse vector of the image group sparse vector S_{n,j}. L is the total number of frames in an image group observation vector, i.e. the number of columns of S_{n,j}. The joint sparsity dependency among the multiple frame observation vectors changes gradually and must be obtained dynamically by computing the conditional probability of each residual vector; the method uses a data-driven LSTM network to infer these probabilities and completes the least squares estimation from GMV_n(:,i) and A_j. Suppose the columns of S_{n,j} are jointly sparse, i.e. the non-zero elements of each vector appear at the same positions as in the other vectors; this means the frame sparse vectors share the same support set. The method initializes the ith frame residual vector R_{n,j}(:,i) one by one from the ith frame observation vector GMV_n(:,i) and feeds these residual vectors to the LSTM network as input; the LSTM network output for the ith frame in the jth iteration is then converted into the base vector z_{n,j}(:,i) by the transformation matrix U. z_{n,j}(:,i) is input to the softmax layer, whose output is expressed as conditional probabilities, thereby giving the non-zero probabilities of the elements of the ith frame sparse vector; the element with the highest probability is selected and added to the support set of the frame sparse vectors. The ith frame sparse vector is then found by least squares estimation, and a new ith frame residual vector is simultaneously computed as the input of the LSTM network in the next iteration.
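A compact numpy sketch of one such iteration follows. Here lstm_step is a hypothetical stand-in for the trained network's forward pass, and adding each frame's top-probability element to one shared support set is our reading of the common-support assumption above:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def joint_greedy_iteration(GMV, A, lstm_step, U, support, R_prev):
    """One iteration j of the LSTM-guided joint reconstruction (sketch).

    GMV: M x L image group observation vector; A: M x n observation
    matrix (A = Phi . Psi). lstm_step maps a frame residual vector to
    the ncell LSTM outputs (placeholder for the trained network). U is
    the ncell x n transformation matrix producing the base vector z.
    support is the shared support set (joint sparsity across frames);
    R_prev holds the previous iteration's residual vectors (GMV itself
    at initialization).
    """
    L, n = GMV.shape[1], A.shape[1]
    for i in range(L):                            # frames one by one
        z = lstm_step(R_prev[:, i]) @ U           # base vector z_{n,j}(:, i)
        p = softmax(z)                            # non-zero probabilities
        support = support | {int(np.argmax(p))}  # grow the shared support
    idx = sorted(support)
    A_j = A[:, idx]                               # support-set columns of A
    S = np.zeros((n, L))
    R = np.empty_like(GMV)
    for i in range(L):                            # least squares per frame
        s, *_ = np.linalg.lstsq(A_j, GMV[:, i], rcond=None)
        S[idx, i] = s
        R[:, i] = GMV[:, i] - A_j @ s             # next iteration's LSTM input
    return support, S, R
```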
In the third step, the reconstruction end trains the LSTM network parameters with the reconstructed image group of the nth image group observation vector GMV_n. The reconstruction end recovers the nth reconstructed image group G_n = {F_n, F_{n+1}, ..., F_{n+L-1}} frame by frame using the total-variation-minimizing image reconstruction algorithm, then uses (G_n, GMV_n) as a reference data pair to retrain the parameter set §_n of the LSTM network and updates the current parameter set of the LSTM network, §* = §_n.
In the third step, training and updating of the LSTM network parameters begin once the reconstruction has stopped at the maximum iteration count for α consecutive image groups. The reconstructed image group G_n and its corresponding image group observation vector GMV_n form the reference data pair (G_n, GMV_n). Solving for the LSTM network parameters requires minimizing the cross-entropy cost function between the conditional probabilities given by the LSTM network and the known probabilities of the reference data pair. The alternating training process first uses the LSTM network to sparsely code the reconstructed image group G_n used for training: the LSTM network is fixed and used to sparsely represent the given data, i.e. to represent the image group observation vector GMV_n as closely as possible with as few coefficients as possible, yielding a coefficient matrix; the coefficient matrix is then fixed, and each atom of the LSTM network (each of its columns) is updated in turn so that it represents the training reconstructed image group G_n more closely. The training of the LSTM network parameters is normally repeated only at long sequence-number intervals; the smaller the value of α, the more stable the quality of the reconstructed video, but the more computing resources are consumed. The updated LSTM network parameter set is used to recover the subsequent image groups.
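A sketch of this alternating loop is given below; the two callables, and the round count, are assumptions introduced purely for illustration (the text does not specify how the sparse coding step or the atom updates are implemented):

```python
def alternate_training(Gn, GMVn, sparse_code, update_atom, n_atoms, rounds=5):
    """Sketch of the alternating training loop described above.

    sparse_code: uses the (fixed) LSTM network to sparsely represent
    the training data, returning a coefficient matrix (placeholder).
    update_atom: refits one atom (column) of the network against the
    training reconstructed image group with the coefficients fixed
    (placeholder).
    """
    for _ in range(rounds):
        coeffs = sparse_code(Gn, GMVn)       # step 1: fix network, code data
        for k in range(n_atoms):             # step 2: fix coefficients,
            update_atom(k, coeffs, Gn)       # refit atoms one by one
    return coeffs
```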
In the fourth step, if image group observation vectors remain to be reconstructed at the reconstruction end, let n = n + 1, jump back to the second step, and repeat the process to continue recovering subsequent image groups; otherwise, the reconstruction end outputs the remaining reconstructed frames F_{n+1}, ..., F_{n+L-1} of G_n as the final (n+1)th, ..., (n+L-1)th reconstructed frames, completing the video reconstruction.
In summary, considering the spatio-temporal sparsity of consecutive multi-frame images, the invention provides a video compressed sensing reconstruction method that integrates sparse prior modeling with data-memory-driven learning. The method can consider a large number of frames simultaneously, requires no linearity assumption on object motion, comprehensively reflects the motion information of objects, and helps recover the structure and detail information of multi-frame images as a whole, thereby improving the quality of the reconstructed video.