
CN118509607B - Real-time video processing and intelligent analysis method based on edge computing - Google Patents

Real-time video processing and intelligent analysis method based on edge computing

Info

Publication number
CN118509607B
CN118509607B
Authority
CN
China
Prior art keywords
real
point
video
time video
prediction
Prior art date
Legal status
Active
Application number
CN202410968764.3A
Other languages
Chinese (zh)
Other versions
CN118509607A (en)
Inventor
秦四海
Current Assignee
Dianji Network Technology Shanghai Co ltd
Original Assignee
Dianji Network Technology Shanghai Co ltd
Priority date
Filing date
Publication date
Application filed by Dianji Network Technology Shanghai Co ltd
Priority to CN202410968764.3A
Publication of CN118509607A
Application granted
Publication of CN118509607B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/11 Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N19/124 Quantisation
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a picture, frame or field

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application relates to the technical field of video coding and provides a real-time video processing and intelligent analysis method based on edge computing, which comprises the following steps: transmitting the real-time video to an edge computing device; determining a motion mask image based on the detection result of moving objects, and determining feature point matching results based on the matching degree of object motion features between adjacent video frames; determining pose motion vectors and single-point pose vectors, respectively, based on the matching results; determining a consistency feature value based on the degree of similarity between the single-point pose vectors of the feature points within the sub-blocks of a coding block and the pose motion vector; determining the PU partition mode based on the consistency feature value; determining a prediction selection queue based on the direction angles corresponding to the feature points in each PU block and the intra-frame prediction angles; and encapsulating the real-time video into NAL streams based on the prediction selection queue using HEVC, completing the analysis and processing of the real-time video. By screening the intra-frame prediction angles, the application reduces the number of intra-frame prediction modes and shortens the encoding time.

Description

Real-time video processing and intelligent analysis method based on edge computing
Technical Field
The application relates to the technical field of video coding, and in particular to a real-time video processing and intelligent analysis method based on edge computing.
Background
With the development of intelligent devices and communication networks, the requirements on real-time video image processing are becoming ever higher, and the traditional centralized cloud computing mode can no longer meet the real-time requirements. Edge computing technology reduces the transmission delay of data by moving computation and data storage to the edge of the network, i.e. to the device or terminal, improves response speed and reduces network bandwidth demand, thereby enabling real-time video image processing such as enhancement and denoising preprocessing of video images, video image recognition, and encoding and compression of video images.
During the processing of real-time video, edge computing completes the processing and analysis of the real-time video data through the relevant computation on video processing equipment such as cameras and encoders, realizing real-time encoding, compression and transmission of the video. At present, High Efficiency Video Coding (HEVC) is one of the mainstream technologies for real-time video compression coding. When HEVC performs intra-frame prediction on video data, the number of selectable prediction modes is expanded from the 8 of H.264 to 35, which greatly increases coding complexity while improving prediction accuracy. However, only a small number of frames in a video sequence exhibit relative motion, and a large amount of redundant information or similar content exists between temporally adjacent frames; such an excessive number of prediction candidates not only ignores the correlation between the motion features and the directions of the feature information in the real-time video, but also reduces the efficiency of the edge computing device in video compression coding.
Disclosure of Invention
The application provides a real-time video processing and intelligent analysis method based on edge computing, which improves the efficiency of the edge computing device in encoding the real-time video and reduces coding complexity by reducing the number of prediction directions and prediction modes used when HEVC encodes the real-time video at each edge node.
In a first aspect, an embodiment of the present application provides a real-time video processing and intelligent analysis method based on edge computing, where the method includes:
transmitting the real-time video acquired by an edge node to the edge computing device of that edge node;
determining a motion mask image for each video frame based on the detection result of moving objects in each frame of the real-time video, and determining feature point matching results between the motion mask images based on the matching degree of object motion features between adjacent video frames of the real-time video;
determining pose motion vectors and single-point pose vectors, respectively, based on the feature point matching results between the motion mask images of adjacent video frames;
dividing the real-time video into a large number of coding blocks using HEVC, and determining a consistency feature value between sub-blocks based on the degree of similarity between the single-point pose vectors of the feature points within the sub-blocks of each coding block and the pose motion vector;
in each edge computing device, determining the PU partition mode used when HEVC predictively encodes the real-time video based on the consistency feature value, and obtaining the partition result of the PU blocks;
determining a prediction selection queue based on the direction angles of the single-point pose vectors of the feature points in each PU block and the 33 intra-frame prediction angles preset in HEVC;
and encapsulating the real-time video in each edge computing device into NAL streams based on the prediction selection queue using HEVC, completing the analysis and processing of the real-time video.
In the above scheme, feature points are first obtained by detecting the moving objects in adjacent video frames of the real-time video, and a motion mask image of each video frame is constructed, so that the matching results between the feature points in the motion mask images of adjacent video frames are determined; the extraction and matching of feature points facilitates the subsequent determination of the motion centroid of each video frame. Secondly, the pose motion vector of each video frame and the single-point pose vector of each feature point are obtained from the feature point matching results between adjacent video frames, quantifying the overall motion state of each video frame and the movement of each individual point. Then, the consistency feature value between sub-blocks is obtained by analyzing the degree of similarity of the object motion trend features between the sub-blocks of a coding block, and the PU partition mode of the CTU is determined in combination with the rate-distortion criterion; for each inter-predicted frame, the PU partitioning within each CTU is determined progressively from the different calculation results, realizing adaptive partitioning in different CTUs so that each coding unit is encoded with as few bits as possible when the real-time video is encoded. Next, the likelihood index of each intra-frame prediction angle is calculated based on the angular deviation between the direction angles of the single-point pose vectors of the feature points in the PU block and the 33 intra-frame prediction angles preset in HEVC, evaluating the likelihood that each intra-frame prediction angle is selected; the 33 intra-frame prediction angles are screened based on the likelihood indexes, the prediction selection queue is constructed and the prediction modes are obtained, so that the total number of intra-frame prediction modes is smaller than 35, the real-time video coding time is shortened and the real-time performance of video coding is improved. Finally, the VCL data sequence is encapsulated into NAL streams through the network abstraction layer (NAL), and the NAL streams are stored in each edge node, completing localized real-time video processing, analysis and storage.
With reference to the first aspect, in one possible implementation manner, the method for determining the motion mask image of each video frame based on the detection result of moving objects in each frame of the real-time video is:
taking two adjacent video frames as input, and determining the pixels in a motion state within the video frames using an optical flow method;
in each video frame, acquiring the connected domains of all such pixels, and setting the gray value of the pixels outside the connected domains to 0 to obtain the motion mask image of each video frame.
In this scheme, the pixels in a motion state are determined by tracking motion across adjacent video frames, and the motion mask image of each video frame is obtained by masking out the pixels that are not in motion, which reduces the interference of non-moving pixels on subsequent feature point detection and improves feature point detection efficiency.
With reference to the first aspect, in one possible implementation manner, the method for determining the feature point matching results between the motion mask images based on the matching degree of object motion features between adjacent video frames of the real-time video is:
taking the motion mask images of two adjacent video frames as input, acquiring the feature points in each motion mask image using a feature point detection method, and obtaining the matching results of the feature points in the two motion mask images using feature point matching.
In this scheme, the feature points in each motion mask image are first determined by a feature point detection method, and the matching results between the feature points of adjacent video frames are then determined by feature point matching, facilitating the subsequent determination of the overall motion centroid of each video frame.
With reference to the first aspect, in one possible implementation manner, the method for determining the pose motion vectors and the single-point pose vectors, respectively, based on the feature point matching results between the motion mask images of adjacent video frames is:
in each motion mask image, determining the centroid point of the motion mask image using the position information of the successfully matched feature points;
taking the vector defined by the line connecting the centroid point in the motion mask image of each video frame to the centroid point in the motion mask image of the adjacent next video frame as the pose motion vector of that video frame;
and taking each feature point in the motion mask image of each video frame as a starting point and its matched feature point as an end point, and taking the vector pointing from the starting point to the end point as the single-point pose vector of that feature point.
In this scheme, the centroid point of each video frame is determined from the position information of all matched feature points in its motion mask image, from which the pose motion vector of each video frame and the single-point pose vector of each feature point are determined, quantifying the overall motion state of each video frame and the movement of each individual point. Because the pose motion direction is determined from the feature point matching results, the pose motion direction of a moving object across consecutive video frames can still be evaluated accurately when some pixels are lost due to the overlapping of foreground and background, which facilitates the analysis of the motion trend features between the sub-blocks of a coding block.
With reference to the first aspect, in one possible implementation manner, the method for determining the consistency feature value between sub-blocks based on the degree of similarity between the single-point pose vectors of the feature points within the sub-blocks of each coding block and the pose motion vector is:
taking the similarity measurement result between the single-point pose vector of each feature point in each sub-block of the coding block and the pose motion vector of the video frame in which the feature point is located as the trend feature value of that feature point;
and taking the sequence formed by all trend feature values in each sub-block as the trend feature sequence of that sub-block, and taking the similarity measurement result between the trend feature sequences of two sub-blocks as the consistency feature value between the two sub-blocks.
In this scheme, the trend feature value of a feature point is calculated from the degree of similarity between the single-point pose vector of each feature point in a sub-block and the pose motion vector of the video frame in which that feature point is located, thereby measuring the motion trend of a single feature point against that of the whole video frame; the consistency feature value between sub-blocks is then determined from the degree of similarity between the trend feature sequences formed by the trend feature values of all feature points in the sub-blocks. The closeness of the motion directions of two sub-blocks is thus evaluated through the consistency between the motion trend of the feature points within the sub-blocks and the motion trend of the whole video frame, facilitating the subsequent determination of the PU partition modes of different coding blocks.
With reference to the first aspect, in one possible implementation manner, in each edge computing device, the method for determining the PU partition mode used when HEVC predictively encodes the real-time video based on the consistency feature value is:
calculating the accumulated sum of the absolute differences between the pixel values of the pixels in each CTU of an inter-predicted frame and those of the co-located CTU in the previous video frame as the initial difference value of that CTU;
if the initial difference value of a CTU is smaller than or equal to a preset value, the CTU adopts the PU partition mode of the co-located CTU in the previous video frame;
if the initial difference value is larger than the preset value, calculating the consistency feature value between any two sub-blocks in each coding block of the CTU;
if the consistency feature values between the sub-blocks in a coding block are all equal, the 2N×2N and skip PU partition modes are used;
if the consistency feature values between the sub-blocks in the coding block are not all equal, calculating the sums of the consistency feature values between sub-blocks in the horizontal direction and in the vertical direction of the coding block as the horizontal consistency feature value and the vertical consistency feature value, respectively;
if the horizontal consistency feature value is larger than the vertical consistency feature value, the 2N×2N, 2N×N and skip PU partition modes are used;
if the horizontal consistency feature value is equal to the vertical consistency feature value, the 2N×2N PU partition mode is used;
if the horizontal consistency feature value is smaller than the vertical consistency feature value, the 2N×2N, N×2N and skip PU partition modes are used.
In this scheme, the initial difference value of each CTU is first determined from the pixel value differences of the pixels in co-located CTUs of adjacent video frames, and a first PU partition decision is made from the initial difference value and the preset value; secondly, a second PU partition decision is made based on whether the consistency feature values between any two sub-blocks in each coding block of the CTU are equal; then, for the case where the consistency feature values between sub-blocks are unequal, the horizontal and vertical consistency feature values are obtained from the consistency feature values of the sub-blocks in the horizontal and vertical directions, and several sets of PU partition modes are determined from the comparison of the horizontal and vertical consistency feature values. By progressively determining the PU partition modes within a CTU from several comparison results under different conditions, better rate-distortion performance is achieved in real-time video processing.
With reference to the first aspect, in one possible implementation manner, the method for determining the prediction selection queue based on the direction angles of the single-point pose vectors of the feature points in each PU block and the 33 intra-frame prediction angles preset in HEVC is:
determining a likelihood index for each of the 33 preset intra-frame prediction angles based on the direction angles of the single-point pose vectors of the feature points in the partitioned PU block;
taking the likelihood indexes of the 33 intra-frame prediction angles as input, obtaining a segmentation threshold using the Otsu thresholding method, taking each intra-frame prediction angle whose likelihood index is larger than the segmentation threshold as a prediction direction, and taking the sequence formed by all prediction directions as the prediction selection queue.
In the above scheme, the likelihood of each intra-frame prediction angle being used as a prediction direction is first evaluated through the angular deviation between the direction angles of the single-point pose vectors of the feature points in the PU block and the 33 intra-frame prediction angles preset in HEVC; secondly, the segmentation threshold of the likelihood indexes is obtained using the Otsu thresholding algorithm and the prediction directions are determined by screening against it, reducing the coding complexity when encoding the real-time video.
With reference to the first aspect, in one possible implementation manner, the method for determining the likelihood indexes of the 33 preset intra-frame prediction angles based on the direction angles of the single-point pose vectors of the feature points in the partitioned PU block is:
taking the sum of the absolute differences between the direction angles of the single-point pose vectors of all feature points in each PU block and each intra-frame prediction angle as the accumulated angular deviation of that intra-frame prediction angle;
taking the ratio of the sum of the accumulated angular deviations of all intra-frame prediction angles to the accumulated angular deviation of each intra-frame prediction angle as the likelihood index of that intra-frame prediction angle.
In the above scheme, the accumulated sum of the absolute angular differences between each intra-frame prediction angle and the direction angles is calculated as the accumulated angular deviation of that intra-frame prediction angle, from which the likelihood of each intra-frame prediction angle serving as a prediction direction is evaluated, facilitating the screening of prediction directions using the likelihood indexes.
With reference to the first aspect, in one possible implementation manner, the method for encapsulating the real-time video in each edge computing device into NAL streams based on the prediction selection queue using HEVC is:
encoding the real-time video into a VCL data sequence based on the prediction selection queue using HEVC;
encapsulating the VCL data sequence into a NAL stream using the network abstraction layer (NAL) of HEVC, the NAL stream being stored in each edge node.
In this scheme, the video coding layer and the network abstraction layer of HEVC are used to process the real-time video into the VCL data sequence and the NAL stream, respectively, facilitating the subsequent processing of the real-time video by the edge node.
With reference to the first aspect, in one possible implementation manner, the method for encoding the real-time video into a VCL data sequence based on the prediction selection queue using HEVC is:
adding the planar mode and the DC mode to the prediction selection queue to obtain the intra-frame prediction modes;
and encoding the real-time video in the video coding layer based on the partition result of the PU blocks and the intra-frame prediction modes using HEVC to obtain the VCL data sequence.
In this scheme, the planar mode and the DC mode are added to the prediction selection queue, which ensures that the coding efficiency for the real-time video remains high after the prediction modes have been screened; the effective screening of the 35 prediction modes of HEVC reduces the coding time of the edge computing device when processing the real-time video and improves processing efficiency.
In a second aspect, a real-time video processing and intelligent analysis system based on edge computing is provided, the system comprising:
a first acquisition module, configured to transmit the real-time video acquired by an edge node to the edge computing device of that edge node;
a matching module, configured to determine a motion mask image for each video frame based on the detection result of moving objects in each frame of the real-time video, and determine feature point matching results between the motion mask images based on the matching degree of object motion features between adjacent video frames of the real-time video;
a second acquisition module, configured to determine pose motion vectors and single-point pose vectors, respectively, based on the feature point matching results between the motion mask images of adjacent video frames;
a calculation module, configured to divide the real-time video into a large number of coding blocks using HEVC and determine the consistency feature value between sub-blocks based on the degree of similarity between the single-point pose vectors of the feature points within the sub-blocks of each coding block and the pose motion vector;
a PU partitioning module, configured to determine, in each edge computing device, the PU partition mode used when HEVC predictively encodes the real-time video based on the consistency feature value, obtaining the partition result of the PU blocks;
a direction selection module, configured to determine a prediction selection queue based on the direction angles of the single-point pose vectors of the feature points in each PU block and the 33 intra-frame prediction angles preset in HEVC;
and a video processing module, configured to encapsulate the real-time video in each edge computing device into NAL streams based on the prediction selection queue using HEVC, completing the analysis and processing of the real-time video.
In a third aspect, a server is provided that includes a memory and a processor. The memory is for storing executable program code and the processor is for calling and running the executable program code from the memory to cause the apparatus to perform the method of the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, a computer readable storage medium is provided, the computer readable storage medium storing computer program code which, when run on a computer, causes the computer to perform the method of the first aspect or any one of the possible implementations of the first aspect.
Drawings
In order to illustrate the embodiments of the application or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the application, and other drawings can be derived from them by a person skilled in the art without inventive effort.
FIG. 1 is a flowchart of a real-time video processing and intelligent analysis method based on edge computing according to an embodiment of the present application;
FIG. 2 is a diagram illustrating real-time video partitioning according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a real-time video processing and intelligent analysis method based on edge computing according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a pose motion vector and single-point pose vectors according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a real-time video processing and intelligent analysis system based on edge computing according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
Referring to FIG. 1, a flowchart of a real-time video processing and intelligent analysis method based on edge computing according to an embodiment of the present application is shown; the method includes the following steps:
101, Transmitting the real-time video acquired by the edge node to the edge computing device of the edge node.
In each edge node, a video acquisition device acquires the local video in real time and transmits the real-time video to the edge computing device in the edge computing system through the edge computing access network (ECA, Edge Computing Access). The video acquisition device includes, but is not limited to, a surveillance camera, a pan-tilt camera and a video recorder; the real-time video includes, but is not limited to, real-time road surveillance video, real-time underwater surveillance video and real-time target surveillance video; the edge computing device includes, but is not limited to, an intelligent video analyzer, an intelligent analysis gateway and a computer.
102, Determining a motion mask image of each video frame based on the detection result of moving objects in each frame of the real-time video, and determining feature point matching results between the motion mask images based on the matching degree of object motion features between adjacent video frames of the real-time video.
Here, before the real-time video in the edge device of each node is encoded with HEVC, the real-time video needs to be divided into coding tree blocks (CTB), coding blocks (CB), prediction blocks (PB) and transform blocks (TB) through sequence division, picture division, block division and so on. As shown in FIG. 2, the real-time video is divided into a number of video sequences, each video sequence 21 is divided into several consecutive groups of video frames 22, each group contains several video frames, each video frame 23 is divided into several coding tree blocks, each coding tree block 24 can be divided into several coding blocks, and each coding block 25 consists of prediction blocks 26 and transform blocks 27. Within a coding tree unit (CTU), a CU larger than the minimum coding block traverses 8 inter PU partition modes, 2 intra PU partition modes and the skip mode.
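For orientation, the following sketch enumerates the PU partition candidates that such a CU traversal covers, together with a minimal quadtree-style coding-unit record. It is an illustrative Python sketch; the class and field names are assumptions of this description, not part of the patent.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class PUPartition(Enum):
    # the 8 inter PU partition modes of HEVC
    SIZE_2Nx2N = "2Nx2N"
    SIZE_2NxN = "2NxN"
    SIZE_Nx2N = "Nx2N"
    SIZE_NxN = "NxN"
    SIZE_2NxnU = "2NxnU"  # asymmetric partitions
    SIZE_2NxnD = "2NxnD"
    SIZE_nLx2N = "nLx2N"
    SIZE_nRx2N = "nRx2N"

INTER_PU_MODES = list(PUPartition)                                 # 8 inter modes
INTRA_PU_MODES = [PUPartition.SIZE_2Nx2N, PUPartition.SIZE_NxN]    # 2 intra modes
# the skip mode additionally reuses the 2Nx2N shape without residual data

@dataclass
class CodingUnit:
    x: int        # top-left offset inside the CTU, in pixels
    y: int
    size: int     # 64, 32, 16 or 8
    pu_candidates: List[PUPartition] = field(default_factory=lambda: list(INTER_PU_MODES))
    children: Optional[List["CodingUnit"]] = None   # quadtree split: 4 children or None

    def split(self) -> List["CodingUnit"]:
        """Split this CU into four equal child CUs (the HEVC quadtree step)."""
        half = self.size // 2
        self.children = [CodingUnit(self.x + dx, self.y + dy, half)
                         for dy in (0, half) for dx in (0, half)]
        return self.children
```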
For each edge node, for each object in motion in the real-time video, the coding blocks corresponding to different regions of the object have motion directions with high similarity, i.e. a strong consistency of motion trend; in the background region of the video, the coding blocks at different positions are static and the complexity of the image information between their sub-blocks is close, so PU partitioning can be performed according to the consistency features of the coding blocks.
The motion of an object in the real-time video may be complex and changeable, but its motion state in each of two adjacent video frames has a unique motion pose; even if the motion amplitudes at different positions of the object are inconsistent, the overall trend between the two poses in adjacent video frames is fixed, i.e. the pose features of the object have a relatively stable and high similarity. Therefore, in the present application, the prediction directions are screened based on the degree of similarity of the object pose features in the frames adjacent to each frame: the more similar a prediction direction is to the pose features in the encoded reference block or the adjacent video frame, the more it should be retained as a necessary prediction direction. The similarity of the object pose features is determined by the matching results of the feature points in adjacent video frames, that is, step 102 is implemented by the steps shown in FIG. 3:
301, Determining the feature points in adjacent video frames of the real-time video using an optical flow method.
Specifically, any two adjacent video frames of the real-time video are taken as input, the optical flow method is used to detect the pixels belonging to objects in a motion state in the two video frames, and connected-domain extraction is performed on these pixels to obtain the moving objects in each video frame. Then, in each video frame, the gray value of every pixel outside any connected domain is set to 0, yielding the motion mask image of that video frame. The optical flow method is a known technique and its specific process is not repeated here.
Further, the motion mask image of each video frame is taken as input, and the feature points within each video frame are output using a feature point detection method, which includes, but is not limited to, SIFT, SURF, FAST and ORB feature point detection.
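A minimal sketch of this masking and detection stage, assuming OpenCV (cv2) and NumPy are available; the flow-magnitude threshold, the minimum connected-domain area and the ORB parameters are illustrative assumptions rather than values given in the patent.

```python
import cv2
import numpy as np

def motion_mask(prev_gray: np.ndarray, curr_gray: np.ndarray,
                mag_thresh: float = 1.0, min_area: int = 50) -> np.ndarray:
    """Dense optical flow -> binary mask covering the connected domains of moving pixels."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    moving = (np.linalg.norm(flow, axis=2) > mag_thresh).astype(np.uint8)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(moving)
    mask = np.zeros_like(moving)
    for lbl in range(1, num):                          # label 0 is the background
        if stats[lbl, cv2.CC_STAT_AREA] >= min_area:   # drop tiny, noisy domains (assumed filter)
            mask[labels == lbl] = 255
    # the motion mask image itself would be cv2.bitwise_and(gray, gray, mask=mask)
    return mask

def masked_feature_points(gray: np.ndarray, mask: np.ndarray):
    """Detect ORB feature points and descriptors only inside the motion mask."""
    orb = cv2.ORB_create(nfeatures=500)
    return orb.detectAndCompute(gray, mask)
```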
302, Determining the matching results between the feature points in two adjacent video frames using feature point matching.
Here, the result of feature point matching depends on the degree of similarity between the feature points. In one embodiment of the application, the feature points in two adjacent video frames are taken as input and the matcher of the ORB feature point detector is used to obtain the feature point matching results.
In another embodiment of the present application, all feature points in each video frame and the feature descriptor of each feature point are obtained. It should be noted that the descriptors produced by a given feature point detection method all have the same dimension. The cosine similarity between the feature descriptor of each feature point in a video frame and that of every feature point in the adjacent video frame is calculated, and the feature point corresponding to the maximum cosine similarity is taken as the feature point that successfully matches the given feature point; the calculation of cosine similarity is not described in detail here. If several feature points in the adjacent video frame share the maximum cosine similarity with one feature point in the current video frame, the Euclidean distances between the feature points are calculated and the feature point with the minimum Euclidean distance is taken as the successfully matched feature point. For example, when several feature descriptors in the (a+1)-th frame all attain the maximum cosine similarity with the i-th feature point of the a-th video frame, the feature point among them with the smallest Euclidean distance to the i-th feature point is taken as its successful match.
It should be understood that this embodiment only provides one similarity measure between feature descriptors, namely the cosine similarity between them; in other embodiments, other similarity measures may be adopted for the purpose of measuring the similarity of the feature descriptors of feature points.
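A sketch of this matching rule, assuming real-valued descriptors (e.g. SIFT or SURF); binary ORB descriptors would normally be compared with Hamming distance instead, so this is one illustrative measure only, and the argument layout is an assumption.

```python
import numpy as np

def match_by_cosine(desc_a: np.ndarray, pts_a: np.ndarray,
                    desc_b: np.ndarray, pts_b: np.ndarray):
    """For every feature point of frame a, pick the frame-b point with maximal cosine
    similarity between descriptors; ties are broken by the smallest Euclidean distance
    between the point coordinates."""
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T                                        # cosine similarity matrix
    matches = []
    for i in range(sim.shape[0]):
        best = np.flatnonzero(np.isclose(sim[i], sim[i].max()))
        if len(best) > 1:                                # several maxima: nearest point wins
            j = best[np.argmin(np.linalg.norm(pts_b[best] - pts_a[i], axis=1))]
        else:
            j = best[0]
        matches.append((i, int(j)))
    return matches
```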
103, Determining the pose motion vectors and the single-point pose vectors, respectively, based on the feature point matching results between the motion mask images of adjacent video frames.
Here, the pose motion vector is used to characterize the overall motion trend of the moving objects in each video frame, and the single-point pose vector is used to characterize the motion trend of a single feature point in each video frame.
Specifically, in the motion mask image of each video frame, the centroid point of the moving object is determined from the coordinate information of all successfully matched feature points. The vector defined by the line connecting the centroid point of the moving object in the motion mask image of one video frame to the centroid point of the moving object in the adjacent next frame is taken as the pose motion vector of that video frame, as shown at 41 in FIG. 4; each feature point in the motion mask image of a video frame is taken as a starting point, its successfully matched feature point in the motion mask image of the next frame is taken as an end point, and the vector pointing from the starting point to the end point is taken as the single-point pose vector, as shown at 42 in FIG. 4. In the application, the centroid is determined only from the successfully matched feature points and the pose motion vector is constructed to represent the overall motion direction, so that the pose motion direction of a moving object in consecutive video frames can still be estimated accurately when some pixels are lost due to the overlapping of the foreground and the background.
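A sketch of this step on top of the matching result above; `matches` is assumed to pair indices of frame-t points with frame-(t+1) points.

```python
import numpy as np

def pose_vectors(pts_t: np.ndarray, pts_t1: np.ndarray, matches):
    """Pose motion vector of frame t (centroid displacement) and the single-point
    pose vectors of its successfully matched feature points."""
    src = np.asarray([pts_t[i] for i, _ in matches], dtype=float)
    dst = np.asarray([pts_t1[j] for _, j in matches], dtype=float)
    pose_motion_vec = dst.mean(axis=0) - src.mean(axis=0)   # next-frame centroid minus current centroid
    single_point_vecs = dst - src                            # one displacement per matched feature point
    return pose_motion_vec, single_point_vecs
```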
104, Determining the consistency feature value between sub-blocks based on the degree of similarity between the single-point pose vectors of the feature points in each coding block and the pose motion vector.
Here, the consistency feature value characterizes the consistency of motion trend between two sub-blocks of the same coding block. The larger the consistency feature value, the more similar the motion trend features of the moving objects in the two sub-blocks; the higher this similarity between sub-blocks, the more a large-size PU partition mode should be adopted for the coding block, saving coding bits while keeping the rate distortion small. The smaller the consistency feature value, the larger the difference in the motion trend features of the moving objects in the two sub-blocks and the larger the difference in their motion directions, and the more finely the coding block should be partitioned, so that the rate-distortion performance is better.
Specifically, for each coding block, the Pearson correlation coefficient between the single-point pose vector of each feature point in each sub-block and the pose motion vector of the video frame in which the feature point is located is calculated as the trend feature value of that feature point, and the sequence formed by the trend feature values of all feature points in each sub-block is taken as the trend feature sequence of that sub-block.
It should be noted that this embodiment only provides one similarity measure between vectors, namely the Pearson correlation coefficient between the single-point pose vector and the pose motion vector; in other embodiments, other similarity measures between vectors may be adopted for the same purpose.
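A sketch of the trend feature values and of a consistency feature value between two sub-blocks. Note that the Pearson correlation of two 2-component vectors is degenerate, so the sketch defaults to cosine similarity, which the text explicitly allows as an alternative measure; the truncation of the two trend sequences to a common length is also an assumption.

```python
import numpy as np

def trend_value(single_vec: np.ndarray, pose_vec: np.ndarray,
                measure: str = "cosine") -> float:
    """Similarity between one single-point pose vector and the frame's pose motion vector."""
    if measure == "pearson":                      # the measure named in the text
        return float(np.corrcoef(single_vec, pose_vec)[0, 1])
    denom = np.linalg.norm(single_vec) * np.linalg.norm(pose_vec)
    return float(single_vec @ pose_vec / denom) if denom > 0 else 0.0

def consistency_value(vecs_a: np.ndarray, vecs_b: np.ndarray,
                      pose_vec: np.ndarray) -> float:
    """Consistency feature value between two sub-blocks: similarity of their trend sequences."""
    seq_a = np.array([trend_value(v, pose_vec) for v in vecs_a])
    seq_b = np.array([trend_value(v, pose_vec) for v in vecs_b])
    n = min(len(seq_a), len(seq_b))               # align lengths (assumed handling)
    if n < 2:
        return 0.0
    return float(np.corrcoef(seq_a[:n], seq_b[:n])[0, 1])
```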
105, In each edge computing device, determining the PU partition mode used when HEVC predictively encodes the real-time video based on the consistency feature value, so as to obtain the partition result of the PU blocks.
According to the rate-distortion criterion of video coding, a coding unit should be encoded with as few bits as possible under a given distortion. Therefore, when the edge computing device of each edge node encodes the real-time video, in addition to the conventional determination of the PU partition mode of the CTUs of an inter-predicted frame based on the change between co-located positions of adjacent video frames, the PU partition mode of each coding tree unit (CTU) is determined from the consistency feature values between the sub-blocks of its coding blocks. The PU partition mode within each CTU is determined as follows:
S1: calculate the sum of the absolute differences between the pixel values of the pixels in the current CTU and those of the spatially co-located CTU in the previous video frame as the initial difference value of the current CTU. If the initial difference value of the current CTU is smaller than or equal to V times the initial difference value of the spatially co-located CTU in the previous video frame, continue to use the PU partition mode of that co-located CTU; in this embodiment V takes the value 0.5. Otherwise, go to S2;
S2: calculate the consistency feature value between the sub-blocks of each coding block in the CTU. If the consistency feature values between all sub-blocks of a coding block are equal, use the 2N×2N and skip PU partition modes; otherwise, determine the PU partition mode from the consistency feature values between sub-blocks in the horizontal and vertical directions;
S3: take the sums of the consistency feature values between sub-blocks in the horizontal direction and in the vertical direction as the horizontal consistency feature value and the vertical consistency feature value, respectively. When the horizontal consistency feature value equals the vertical consistency feature value, use the 2N×2N PU partition mode; when the horizontal consistency feature value is larger than the vertical consistency feature value, use the 2N×2N, 2N×N and skip PU partition modes; when the horizontal consistency feature value is smaller than the vertical consistency feature value, use the 2N×2N, N×2N and skip PU partition modes.
Here, the PU partitioning within each CTU is determined progressively through several calculation results between spatially co-located CTUs of adjacent frames, achieving adaptive partitioning in different CTUs so that each coding unit is encoded with as few bits as possible when the real-time video is encoded.
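A sketch of the S1–S3 decision above. The returned mode sets follow the text; the 2×2 sub-block layout and the pairing used for the horizontal and vertical sums are assumptions of this sketch.

```python
import numpy as np

def ctu_pu_candidates(curr_ctu: np.ndarray, prev_ctu: np.ndarray,
                      prev_initial_diff: float, prev_modes: list,
                      consistency: np.ndarray, v: float = 0.5):
    """Candidate PU partition modes of one CTU; consistency[i, j] is the consistency
    feature value between sub-blocks i and j on a row-major 2x2 grid."""
    # S1: sum of absolute pixel differences against the co-located CTU of the previous frame
    initial_diff = float(np.abs(curr_ctu.astype(np.int32) - prev_ctu.astype(np.int32)).sum())
    if initial_diff <= v * prev_initial_diff:
        return prev_modes                                  # reuse the previous partition mode

    # S2: are all pairwise consistency values equal?
    pairs = [consistency[i, j] for i in range(4) for j in range(i + 1, 4)]
    if np.allclose(pairs, pairs[0]):
        return ["2Nx2N", "skip"]

    # S3: horizontal vs. vertical consistency sums
    horizontal = consistency[0, 1] + consistency[2, 3]     # left-right neighbours
    vertical = consistency[0, 2] + consistency[1, 3]       # top-bottom neighbours
    if horizontal > vertical:
        return ["2Nx2N", "2NxN", "skip"]
    if horizontal < vertical:
        return ["2Nx2N", "Nx2N", "skip"]
    return ["2Nx2N"]
```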
106, Determining a prediction selection queue based on the direction angles of the single-point pose vectors of the feature points in each PU block and the 33 intra-frame prediction angles preset in HEVC.
For each video frame, after the partitioning of the PU blocks is completed according to step 105, the application further screens the prediction directions based on the partition result of the PU blocks, thereby further accelerating encoding. For each feature point in a PU, the direction angle of its single-point pose vector is calculated; during encoding, the feature points of a moving object in the PU block and in neighboring pixel blocks should have a strong correlation in motion direction, so the necessary prediction directions are screened from all direction angles within the PU block together with the 33 intra-frame prediction angles of HEVC. In one embodiment of the present application, the process of screening the necessary prediction directions is as follows:
S1: an angle space of 33 prediction directions is divided on the prediction plane, the differences between the direction angles of all feature points in the whole PU block and the 33 intra-frame prediction angles are counted, and the likelihood index of each angle is calculated from these differences:
P_i = ( Σ_{j=1}^{33} D_j ) / D_i,  with  D_i = Σ_k |θ_k − α_i|
where P_i is the likelihood index of the i-th intra-frame prediction angle, θ_k is the direction angle of the k-th feature point in the PU block, α_i is the i-th intra-frame prediction angle, D_i is the sum of the absolute differences between the direction angles of all feature points and the i-th intra-frame prediction angle, and Σ_{j=1}^{33} D_j is the sum of these absolute differences over all 33 intra-frame prediction angles. The larger the likelihood index of the i-th intra-frame prediction angle, the more consistent that angle is with the motion direction of the feature points in the PU block, and the more important it is to prediction coding;
S2: the 33 likelihood index values are taken as input, the segmentation threshold is determined using the Otsu thresholding method, all angles whose likelihood index is larger than the segmentation threshold are selected as prediction directions, and the selected prediction directions form the prediction selection queue; the Otsu thresholding algorithm is a known technique and its specific process is not repeated here. It should be noted that if fewer than 5 likelihood indexes are larger than the segmentation threshold, the 5 angles with the largest likelihood indexes are selected to form the queue, so as to ensure the accuracy of intra-frame prediction;
S3: to ensure that the coding efficiency remains high after the prediction modes have been screened, the planar mode and the DC mode are added to the prediction selection queue to form the intra-frame prediction modes. Through the above screening process, the total number of intra-frame prediction modes is clearly smaller than 35, which reduces the coding time and improves the real-time performance of encoding.
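A sketch of S1–S3 for one PU block, assuming scikit-image's `threshold_otsu` as the Otsu implementation (any Otsu implementation would do); the uniform spread of the 33 angular directions stands in for the exact HEVC angle table and is an assumption of this sketch.

```python
import numpy as np
from skimage.filters import threshold_otsu

# 33 angular intra-prediction directions; a uniform spread is used here as a
# stand-in for the exact HEVC angle mapping (an assumption of this sketch).
INTRA_ANGLES = np.linspace(-135.0, 45.0, 33)

def prediction_selection_queue(direction_angles: np.ndarray, min_dirs: int = 5):
    """Build the prediction selection queue of one PU block from the direction angles
    of the single-point pose vectors of its feature points."""
    # accumulated angular deviation D_i of every intra prediction angle
    dev = np.abs(direction_angles[:, None] - INTRA_ANGLES[None, :]).sum(axis=0)
    dev = np.maximum(dev, 1e-9)                       # guard against division by zero
    likelihood = dev.sum() / dev                      # P_i = (sum over j of D_j) / D_i
    selected = np.flatnonzero(likelihood > threshold_otsu(likelihood))
    if len(selected) < min_dirs:                      # keep at least 5 angular directions
        selected = np.argsort(likelihood)[-min_dirs:]
    queue = [("angular", int(i)) for i in selected]
    queue += [("planar", None), ("dc", None)]         # modes always added per S3
    return queue
```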
107, Encapsulating the real-time video in each edge computing device into NAL streams based on the prediction selection queue using HEVC, completing the analysis and processing of the real-time video.
In one embodiment of the present application, in each edge node, the process of encoding real-time video using HEVC technology is as follows:
The real-time video is sent to the HEVC video coding layer in the edge computing device of each edge node, the PU partition mode and the intra-frame prediction modes of each CTU in every inter-predicted frame are determined according to the above flow, and the real-time coding result of the real-time video, namely the VCL data sequence, is obtained through the video coding layer.
Further, in order to facilitate the subsequent analysis and storage of the real-time video of all nodes, the VCL data sequence is delivered to the network abstraction layer (NAL), where it is encapsulated into a NAL stream, and each edge computing device thus completes the processing of its local real-time video data. After the NAL streams are obtained, the NAL stream produced by each edge computing device is transmitted to the video stream storage system of its edge node, and localized data processing is completed at each edge node.
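Purely as orientation, a per-node driver could tie the steps together roughly as follows; every callable here is a placeholder (the patent does not define such an interface), with the analysis of steps 102–106 and the HEVC encoder supplied from outside.

```python
def process_node_stream(capture, analyze_frame_pair, hevc_encode_frame, store_nal):
    """Hypothetical per-edge-node driver.

    analyze_frame_pair(prev, curr) -> mode-decision hints (steps 102-106, e.g. the sketches above)
    hevc_encode_frame(frame, hints) -> NAL units for the frame (step 107, stands in for a real encoder)
    store_nal(nal_units)            -> local video stream storage of the edge node
    """
    ok, prev = capture.read()
    while ok:
        ok, curr = capture.read()
        if not ok:
            break
        hints = analyze_frame_pair(prev, curr)
        store_nal(hevc_encode_frame(curr, hints))
        prev = curr
```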
In the application, the feature points are obtained by detecting the moving objects in adjacent video frames of the real-time video, and the motion mask image of each video frame is constructed, so that the matching results between the feature points in the motion mask images of adjacent video frames are determined; the extraction and matching of feature points facilitates the subsequent determination of the motion centroid of each video frame. Secondly, the pose motion vector of each video frame and the single-point pose vector of each feature point are obtained from the feature point matching results between adjacent video frames, quantifying the overall motion state of each video frame and the movement of each individual point. Then, the consistency feature value between sub-blocks is obtained by analyzing the degree of similarity of the object motion trend features between the sub-blocks of a coding block, and the PU partition mode of the CTU is determined in combination with the rate-distortion criterion; for each inter-predicted frame, the PU partitioning within each CTU is determined progressively from the different calculation results, realizing adaptive partitioning in different CTUs so that each coding unit is encoded with as few bits as possible when the real-time video is encoded. Next, the likelihood index of each intra-frame prediction angle is calculated based on the angular deviation between the direction angles of the single-point pose vectors of the feature points in the PU block and the 33 intra-frame prediction angles preset in HEVC, evaluating the likelihood that each intra-frame prediction angle is selected; the 33 intra-frame prediction angles are screened based on the likelihood indexes, the prediction selection queue is constructed and the prediction modes are obtained, so that the total number of intra-frame prediction modes is smaller than 35, the real-time video coding time is shortened and the real-time performance of video coding is improved. Finally, the VCL data sequence is encapsulated into NAL streams through the network abstraction layer (NAL), and the NAL streams are stored in each edge node, completing localized real-time video processing, analysis and storage.
In a second aspect, a real-time video processing and intelligent analysis system based on edge computing is provided; as shown in FIG. 5, the system 500 comprises:
a first acquisition module 501, configured to transmit the real-time video acquired by an edge node to the edge computing device of that edge node;
a matching module 502, configured to determine a motion mask image for each video frame based on the detection result of moving objects in each frame of the real-time video, and determine feature point matching results between the motion mask images based on the matching degree of object motion features between adjacent video frames of the real-time video;
a second acquisition module 503, configured to determine pose motion vectors and single-point pose vectors, respectively, based on the feature point matching results between the motion mask images of adjacent video frames;
a calculation module 504, configured to divide the real-time video into a large number of coding blocks using HEVC and determine the consistency feature value between sub-blocks based on the degree of similarity between the single-point pose vectors of the feature points within the sub-blocks of each coding block and the pose motion vector;
a PU partitioning module 505, configured to determine, in each edge computing device, the PU partition mode used when HEVC predictively encodes the real-time video based on the consistency feature value, obtaining the partition result of the PU blocks;
a direction selection module 506, configured to determine a prediction selection queue based on the direction angles of the single-point pose vectors of the feature points in each PU block and the 33 intra-frame prediction angles preset in HEVC;
and a video processing module 507, configured to encapsulate the real-time video in each edge computing device into NAL streams based on the prediction selection queue using HEVC, completing the analysis and processing of the real-time video.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. The foregoing description of the preferred embodiments of the present invention is not intended to limit the invention; any modifications, equivalent substitutions and improvements made within the principles of the present invention shall fall within the scope of protection of the present invention.

Claims (9)

1.A real-time video processing and intelligent analysis method based on edge computation, characterized in that the method comprises the following steps:
transmitting the real-time video acquired by the edge node to edge computing equipment of the edge node;
Determining a motion mask image of each video frame based on a detection result of a moving object in each frame in the real-time video, and determining a feature point matching result between the motion mask images based on a matching degree of object movement features between adjacent video frames in the real-time video;
Respectively determining gesture motion vectors and single-point gesture vectors based on feature point matching results between motion mask images of adjacent video frames;
Dividing the real-time video into a large number of coding blocks by using an HEVC (high efficiency video coding) technology, and determining a consistency characteristic value between sub-blocks based on the similarity degree between single-point gesture vectors and gesture motion vectors of characteristic points in the sub-blocks in each coding block;
in each edge computing device, determining a PU partition mode when the HEVC technology carries out predictive coding on the real-time video based on the consistency characteristic value, and obtaining a partition result of the PU blocks;
determining a prediction selection queue based on the direction angle of the single-point attitude vector of each feature point in the PU block and 33 preset intra-frame prediction angles in the HEVC technology;
encapsulating the real-time video in each edge computing device into NAL streams based on the prediction selection queue by using HEVC technology, and completing analysis processing of the real-time video;
wherein the method for respectively determining the pose motion vector and the single-point pose vectors based on the feature point matching result between the motion mask images of adjacent video frames is as follows:
in each motion mask image, determining a centroid point using the position information of the successfully matched feature points;
taking the vector defined by the line connecting the centroid point in the motion mask image of each video frame to the centroid point in the motion mask image of the next adjacent video frame as the pose motion vector of that video frame;
taking each feature point in the motion mask image of each video frame as a start point and its matched feature point as an end point, and taking the vector pointing from the start point to the end point as the single-point pose vector of that feature point.
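A minimal NumPy sketch of the centroid, pose motion vector, and single-point pose vector computation described above; the function name and the array-based input format are assumptions made for illustration.

```python
import numpy as np

def pose_vectors(pts_curr, pts_next):
    """pts_curr[i] and pts_next[i] are matched feature points (x, y) in the
    motion mask images of two adjacent frames; returns the frame's pose
    motion vector and the per-point single-point pose vectors."""
    pts_curr = np.asarray(pts_curr, dtype=float)
    pts_next = np.asarray(pts_next, dtype=float)
    centroid_curr = pts_curr.mean(axis=0)            # centroid of matched points, current frame
    centroid_next = pts_next.mean(axis=0)            # centroid of matched points, next frame
    pose_motion_vec = centroid_next - centroid_curr  # centroid-to-centroid connecting vector
    single_point_vecs = pts_next - pts_curr          # start point -> matched end point
    return pose_motion_vec, single_point_vecs
```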
2. The real-time video processing and intelligent analysis method based on edge computing according to claim 1, wherein the method for determining the motion mask image of each video frame based on the detection result of the moving object in each frame of the real-time video is as follows:
taking two adjacent video frames as input, and determining the pixel points in a motion state in the video frames using an optical flow method;
in each video frame, obtaining the connected domains formed by all such pixel points, and setting the gray value of the pixel points outside the connected domains to 0, to obtain the motion mask image of each video frame.
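As one possible reading of claim 2, the sketch below uses OpenCV's Farnebäck dense optical flow to flag moving pixels and keeps only their connected domains; the flow parameters and the magnitude threshold are assumed values, since the claim does not fix a particular optical flow method.

```python
import cv2
import numpy as np

def motion_mask(prev_frame, curr_frame, mag_thresh=1.0):
    """Pixels whose flow magnitude exceeds mag_thresh (an assumed threshold)
    are treated as moving; gray values outside their connected domains are set to 0."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    moving = (mag > mag_thresh).astype(np.uint8)
    # keep only the connected domains of the moving pixels
    _, labels = cv2.connectedComponents(moving)
    mask = labels > 0
    out = curr_gray.copy()
    out[~mask] = 0
    return out
```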
3. The real-time video processing and intelligent analysis method based on edge computing according to claim 1, wherein the method for determining the feature point matching result between the motion mask images based on the matching degree of the object movement features between adjacent video frames of the real-time video is as follows:
taking the motion mask images of two adjacent video frames as input, respectively acquiring the feature points in each motion mask image using a feature point detection method, and obtaining the matching result of the feature points in the two motion mask images using a feature point matching method.
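A sketch of claim 3 using ORB features and brute-force Hamming matching; ORB and BFMatcher are illustrative choices only, since the claim merely requires some feature point detection and matching method.

```python
import cv2

def match_mask_features(mask_a, mask_b, max_matches=200):
    """Detect and match feature points between two motion mask images."""
    orb = cv2.ORB_create()
    kp_a, des_a = orb.detectAndCompute(mask_a, None)
    kp_b, des_b = orb.detectAndCompute(mask_b, None)
    if des_a is None or des_b is None:
        return [], kp_a, kp_b
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    return matches[:max_matches], kp_a, kp_b
```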
4. The real-time video processing and intelligent analysis method based on edge computing according to claim 1, wherein the method for determining the consistency feature value between sub-blocks based on the degree of similarity between the single-point pose vectors of the feature points in the sub-blocks of each coding block and the pose motion vector is as follows:
taking the similarity measurement result between the single-point pose vector of each feature point in each sub-block of the coding block and the pose motion vector of the video frame in which the feature point is located as the trend feature value of the feature point;
taking the sequence formed by all the trend feature values in each sub-block as the trend feature sequence of that sub-block, and taking the similarity measurement result between the trend feature sequences of two sub-blocks as the consistency feature value between the two sub-blocks.
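The claim leaves the similarity measure open; the sketch below assumes cosine similarity for the trend feature values and Pearson correlation for the consistency feature value between two trend feature sequences.

```python
import numpy as np

def trend_values(single_point_vecs, pose_motion_vec):
    """Cosine similarity (an assumed measure) between each single-point pose
    vector in a sub-block and the pose motion vector of its video frame."""
    v = np.asarray(single_point_vecs, dtype=float)
    g = np.asarray(pose_motion_vec, dtype=float)
    denom = np.linalg.norm(v, axis=1) * np.linalg.norm(g) + 1e-9
    return (v @ g) / denom

def consistency_value(trend_seq_a, trend_seq_b):
    """Similarity between the trend feature sequences of two sub-blocks,
    here a Pearson correlation over their common length."""
    a, b = np.asarray(trend_seq_a, float), np.asarray(trend_seq_b, float)
    n = min(len(a), len(b))
    if n < 2:
        return 0.0
    return float(np.corrcoef(a[:n], b[:n])[0, 1])
```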
5. The real-time video processing and intelligent analysis method based on edge computing according to claim 1, wherein the method for determining, in each edge computing device, the PU partition mode used when HEVC predictive coding is applied to the real-time video based on the consistency feature value is as follows:
calculating, as the initial difference value of each CTU, the accumulated sum of the absolute values of the differences between the pixel values of the pixel points in each CTU of the inter-frame prediction frame and those of the co-located CTU in the previous video frame;
if the initial difference value of a CTU is smaller than or equal to a preset value, the CTU adopts the PU partition mode of the co-located CTU in the previous video frame;
if the initial difference value is larger than the preset value, respectively calculating the consistency feature value between any two sub-blocks in each coding block of the CTU;
if the consistency feature values among the sub-blocks in the coding block are all equal, using the 2N×2N and skip PU partition modes;
if the consistency feature values among the sub-blocks in the coding block are not all equal, respectively calculating the sums of the consistency feature values between the sub-blocks in the horizontal direction and in the vertical direction of the coding block, as the horizontal consistency feature value and the vertical consistency feature value;
if the horizontal consistency feature value is larger than the vertical consistency feature value, using the 2N×2N, 2N×N and skip PU partition modes;
if the horizontal consistency feature value is equal to the vertical consistency feature value, using the 2N×2N PU partition mode;
if the horizontal consistency feature value is smaller than the vertical consistency feature value, using the 2N×N, N×2N and skip PU partition modes.
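A compact sketch of the decision tree in claim 5; the helper name, the preset threshold, and the way the pre-computed consistency feature values are passed in are all assumptions for illustration.

```python
import numpy as np

def decide_pu_modes(ctu_curr, ctu_prev, prev_modes, diff_thresh,
                    horiz_consistency, vert_consistency, pairwise_consistency):
    """ctu_curr/ctu_prev: pixel arrays of the CTU and its co-located CTU;
    the remaining inputs are the pre-computed consistency feature values."""
    init_diff = np.abs(ctu_curr.astype(np.int64) - ctu_prev.astype(np.int64)).sum()
    if init_diff <= diff_thresh:
        return prev_modes                          # reuse the co-located CTU's PU partition
    if len(set(pairwise_consistency)) == 1:        # all consistency feature values equal
        return ["2Nx2N", "skip"]
    if horiz_consistency > vert_consistency:
        return ["2Nx2N", "2NxN", "skip"]
    if horiz_consistency == vert_consistency:
        return ["2Nx2N"]
    return ["2NxN", "Nx2N", "skip"]
```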
6. The real-time video processing and intelligent analysis method based on edge computing according to claim 1, wherein the method for determining the prediction selection queue based on the direction angle of the single-point pose vector of each feature point in the PU block and the 33 intra-frame prediction angles preset in HEVC is as follows:
determining a possibility index for each of the 33 preset intra-frame prediction angles based on the direction angles of the single-point pose vectors of the feature points in the partitioned PU block;
taking the possibility indexes of the 33 intra-frame prediction angles as input, obtaining a segmentation threshold using the Otsu thresholding method, taking each intra-frame prediction angle whose possibility index is larger than the segmentation threshold as a prediction direction, and taking the sequence formed by all the prediction directions as the prediction selection queue.
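A sketch of the queue construction in claim 6, using scikit-image's Otsu implementation to obtain the segmentation threshold over the 33 possibility indexes; the function name is a placeholder.

```python
import numpy as np
from skimage.filters import threshold_otsu

def prediction_queue(possibility_indexes):
    """possibility_indexes: 33 values, one per HEVC angular mode; angles whose
    index exceeds the Otsu threshold form the prediction selection queue."""
    idx = np.asarray(possibility_indexes, dtype=float)
    thresh = threshold_otsu(idx)
    return [int(i) for i in np.flatnonzero(idx > thresh)]
```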
7. The real-time video processing and intelligent analysis method based on edge computing according to claim 6, wherein the method for determining the possibility index of each of the 33 preset intra-frame prediction angles based on the direction angles of the single-point pose vectors of the feature points in the partitioned PU block is as follows:
taking the sum of the absolute values of the differences between the direction angles of the single-point pose vectors of all the feature points in each PU block and each intra-frame prediction angle as the angle accumulated deviation of that intra-frame prediction angle;
taking the ratio of the sum of the angle accumulated deviations of all the intra-frame prediction angles to the angle accumulated deviation of each intra-frame prediction angle as the possibility index of that intra-frame prediction angle.
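A vectorized sketch of the possibility index of claim 7; the input formats and the small epsilon guarding against division by zero are assumptions.

```python
import numpy as np

def possibility_indexes(point_angles, prediction_angles):
    """point_angles: direction angles of the single-point pose vectors of the
    feature points in one PU block; prediction_angles: the 33 preset
    intra-frame prediction angles."""
    pa = np.asarray(point_angles, dtype=float)[:, None]       # (num_points, 1)
    qa = np.asarray(prediction_angles, dtype=float)[None, :]  # (1, 33)
    accumulated_dev = np.abs(pa - qa).sum(axis=0)             # per-angle accumulated deviation
    total = accumulated_dev.sum()
    return total / (accumulated_dev + 1e-9)                   # larger index = smaller deviation
```

Larger indexes correspond to prediction angles that deviate less, on aggregate, from the observed single-point pose vectors, which is why the Otsu split in claim 6 retains them as prediction directions.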
8. The real-time video processing and intelligent analysis method based on edge computing according to claim 1, wherein the method for encapsulating the real-time video in each edge computing device into a NAL stream based on the prediction selection queue using HEVC is as follows:
encoding the real-time video into a VCL data sequence based on the prediction selection queue using HEVC;
encapsulating the VCL data sequence into a NAL stream using the network abstraction layer (NAL) of HEVC, the NAL stream being stored in each edge node.
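Claim 8 relies on standard HEVC NAL encapsulation; purely to illustrate the byte-stream structure stored at the edge node, the sketch below wraps already-encoded VCL payloads in Annex-B start codes and two-byte HEVC NAL headers. A real deployment would let an HEVC encoder library produce the NAL stream itself.

```python
def wrap_vcl_to_nal(vcl_payloads, nal_unit_type=1):
    """Minimal Annex-B style wrapping of pre-encoded VCL payloads into NAL units."""
    stream = bytearray()
    for payload in vcl_payloads:
        header = bytes([(nal_unit_type & 0x3F) << 1,  # forbidden_zero_bit=0, 6-bit nal_unit_type
                        (0 << 3) | 1])                # nuh_layer_id=0, nuh_temporal_id_plus1=1
        stream += b"\x00\x00\x00\x01" + header + payload
    return bytes(stream)
```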
9. The real-time video processing and intelligent analysis method based on edge computing according to claim 8, wherein the method for encoding the real-time video into the VCL data sequence based on the prediction selection queue using HEVC is as follows:
adding the planar mode and the DC mode to the prediction selection queue to obtain the intra-frame prediction modes;
encoding the real-time video in the video coding layer based on the partitioning result of the PU blocks and the intra-frame prediction modes using HEVC, to obtain the VCL data sequence.
CN202410968764.3A 2024-07-19 2024-07-19 Real-time video processing and intelligent analysis method based on edge calculation Active CN118509607B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410968764.3A CN118509607B (en) 2024-07-19 2024-07-19 Real-time video processing and intelligent analysis method based on edge calculation


Publications (2)

Publication Number Publication Date
CN118509607A (en) 2024-08-16
CN118509607B (en) 2024-09-27

Family

ID=92240013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410968764.3A Active CN118509607B (en) 2024-07-19 2024-07-19 Real-time video processing and intelligent analysis method based on edge calculation

Country Status (1): CN — CN118509607B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109640087A (en) * 2018-12-30 2019-04-16 深圳市网心科技有限公司 A kind of intra prediction mode decision method, device and equipment
CN110490933A (en) * 2019-09-18 2019-11-22 郑州轻工业学院 Non-linear state space Central Difference Filter method based on single point R ANSAC

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8693551B2 (en) * 2011-11-16 2014-04-08 Vanguard Software Solutions, Inc. Optimal angular intra prediction for block-based video coding
CN110213594A (en) * 2018-12-29 2019-09-06 腾讯科技(深圳)有限公司 For the method, apparatus of intraframe coding, frame encoder, frame coded system and computer-readable storage medium
CN112312131B (en) * 2020-12-31 2021-04-06 腾讯科技(深圳)有限公司 Inter-frame prediction method, device, equipment and computer readable storage medium
WO2023034629A1 (en) * 2021-09-05 2023-03-09 Beijing Dajia Internet Information Technology Co., Ltd. Intra prediction modes signaling
CN119817096A (en) * 2022-09-07 2025-04-11 Oppo广东移动通信有限公司 Image prediction method, device and computer readable storage medium
CN115941955A (en) * 2022-10-27 2023-04-07 华中科技大学 Machine vision-oriented video transmission method and transmission system
CN115834882A (en) * 2022-11-24 2023-03-21 北京奇艺世纪科技有限公司 Intra-frame prediction method and device, electronic equipment and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant