
CN118587415B - A video target detection and tracking method and system based on artificial intelligence - Google Patents

A video target detection and tracking method and system based on artificial intelligence

Info

Publication number
CN118587415B
CN118587415B
Authority
CN
China
Prior art keywords
block
value
grayscale
positioning
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410639450.9A
Other languages
Chinese (zh)
Other versions
CN118587415A (en)
Inventor
曾青青
赵小蕾
张海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xinhua College
Original Assignee
Guangzhou Xinhua College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xinhua College filed Critical Guangzhou Xinhua College
Priority to CN202410639450.9A priority Critical patent/CN118587415B/en
Publication of CN118587415A publication Critical patent/CN118587415A/en
Application granted granted Critical
Publication of CN118587415B publication Critical patent/CN118587415B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video target detection and tracking method and system based on artificial intelligence, belonging to the technical field of video monitoring. The method comprises: step one, marking a video target, extracting a target image of the video target, and performing corresponding verification and correction; step two, performing grayscale processing on the target image to obtain a target grayscale image, and analyzing the target grayscale image to obtain initial identification features of the video target; step three, identifying each pixel block in the initial identification features and determining a corresponding positioning block according to the block attributes of each pixel block, the positioning block being used to determine a pixel block of the video target or a combined block of pixel blocks; step four, acquiring a monitoring video in real time, performing grayscale processing on the monitoring video to obtain a corresponding monitoring grayscale image, and determining a corresponding positioning matching block in the monitoring grayscale image according to the positioning block; and step five, identifying the corresponding video target according to the positioning matching block and the initial identification features.

Description

Video target detection and tracking method and system based on artificial intelligence
Technical Field
The invention belongs to the technical field of video monitoring, and particularly relates to a video target detection and tracking method and system based on artificial intelligence.
Background
With the rapid development of information technology, video monitoring is widely used in many fields. Traditional video monitoring relies mainly on manual observation, which is inefficient and easily misses key information. In recent years, continued progress in artificial intelligence, particularly in deep learning and computer vision, has made automatic video target detection and tracking possible. The invention therefore provides a video target detection and tracking method and system based on artificial intelligence, for faster and more reliable video target detection and tracking.
Disclosure of Invention
In order to solve the above problems, the invention provides a video target detection and tracking method and system based on artificial intelligence.
The aim of the invention is achieved by the following technical solution:
A video target detection and tracking method based on artificial intelligence comprises:
Step one: marking a video target, extracting a target image of the video target, and performing corresponding verification and correction;
Step two: performing grayscale processing on the target image to obtain a target grayscale image, and analyzing the target grayscale image to obtain initial identification features of the video target;
Further, the method for analyzing the target grayscale image comprises:
identifying the grayscale value of each pixel in the target grayscale image, and merging adjacent pixels according to their grayscale values to obtain several pixel blocks and the block attributes corresponding to each pixel block; marking each pixel block in the target grayscale image according to the block attributes, and marking the corresponding block attributes; segmenting the target grayscale image to obtain the initial identification features corresponding to the video target.
Further, the method for merging according to the grayscale values between adjacent pixels comprises:
step SA1, marking each pixel as a single sample, identifying the grayscale difference between the grayscale values of adjacent single samples, merging adjacent single samples whose grayscale difference is smaller than a threshold X1 to obtain a merged sample, and calculating the grayscale value of the merged sample according to the merging formula;
step SA2, determining the samples to be merged of the merged sample, calculating the grayscale difference between the merged sample and each sample to be merged, merging the merged sample with each sample to be merged whose grayscale difference is smaller than the threshold X1 to obtain a new merged sample, and calculating the grayscale value of the new merged sample according to the merging formula;
step SA3, repeating step SA2 until the merged sample has no sample to be merged that meets the merging requirement, and marking the corresponding merged sample as a pixel block;
step SA4, judging whether any merged block remains; when one remains, returning to step SA3; when none remains, identifying the part corresponding to each pixel block, determining the corresponding stable value according to the identified part, identifying the block shape, positional relationship, and grayscale value of each pixel block, and integrating the obtained stable value, part, block shape, positional relationship, and grayscale value into the block attributes of the corresponding pixel block.
Further, the merging formula is: Hd = HB / L, wherein Hd is the grayscale value of the merged sample, HB is the sum of the grayscale values of the single samples within the merged sample, and L is the number of single samples within the merged sample.
Further, the method for determining the stable value includes:
matching the initial value corresponding to each part according to the identified parts;
Acquiring current environmental information, analyzing each part according to the acquired environmental information, and acquiring an adjustment coefficient corresponding to each part;
and calculating a corresponding stable value according to the formula WD=CZ×τ, wherein WD is the stable value, CZ is the initial value, and τ is the adjustment coefficient.
Step three: identifying each pixel block in the initial identification features, and determining a corresponding positioning block according to the block attributes of each pixel block, wherein the positioning block is used to determine a pixel block of the video target or a combined block of pixel blocks;
The method for determining the positioning block comprises the following steps:
identifying block attributes of each pixel block, wherein the block attributes comprise stable values, parts, block shapes, position relations and gray values;
Determining each first combination according to the positional relationships between the pixel blocks; identifying each pixel block in the first combination and marking it as a unit block;
Identifying combined shape data for the first combination;
Identifying stable values corresponding to the unit blocks, and selecting the lowest stable value in the unit blocks as a representative stable value of the first combination;
Calculating a corresponding positioning evaluation value according to the formula QYU = (b1×WDB)×(b2×BA);
wherein QYU is the positioning evaluation value, b1 and b2 are proportionality coefficients with 0 < b1 ≤ 1 and 0 < b2 ≤ 1, WDB is the representative stable value, and BA is the first shape value;
and selecting the first combination with the largest positioning evaluation value as a positioning block.
Further, the calculating method of the first shape value includes:
acquiring preset environmental background characteristics; determining a reference background according to the environmental background characteristics and the combined shape data, identifying the reference similarity value between the reference background and the combined shape data, and counting the reference probability of the reference background;
performing boundary assimilation evaluation on the combined shape data and the environmental background characteristics according to a preset assimilation evaluation model, the model taking x as its input data;
calculating the corresponding first shape value according to the first-shape-value formula, wherein BA is the first shape value, SL is the reference similarity value, and gL is the reference probability.
Step four: acquiring a monitoring video in real time, performing grayscale processing on the monitoring video to obtain a corresponding monitoring grayscale image, and determining a corresponding positioning matching block in the monitoring grayscale image according to the positioning block;
Further, the method for determining the positioning matching block comprises the following steps:
identifying the grayscale value of each pixel in the monitoring grayscale image, and merging adjacent pixels according to their grayscale values to obtain several monitoring blocks and the grayscale value and block shape corresponding to each monitoring block;
identifying the grayscale value and block shape of each pixel block in the positioning block, and screening the monitoring blocks according to the block shapes of the pixel blocks to obtain a screened monitoring grayscale image;
traversing the positioning block over the monitoring grayscale image, calculating positioning matching values in real time, comparing the obtained positioning matching values, and determining the positioning matching block.
Further, the calculation method of the positioning matching value comprises the following steps:
marking the grayscale value of each pixel block in the positioning block as hi, wherein i denotes the corresponding pixel block, i = 1, 2, …, n, and n is a positive integer;
identifying the monitoring block corresponding to each pixel block on the monitoring grayscale image, and marking the grayscale value of the corresponding monitoring block as ki;
calculating the corresponding positioning matching value from the grayscale values hi and ki according to the positioning-matching formula, wherein DPW is the positioning matching value.
Step five: identifying the corresponding video target according to the positioning matching block and the initial identification features.
A video target detection tracking system based on artificial intelligence comprises a target analysis module, a positioning module and a tracking module;
The target analysis module is used for analyzing the marked video target and determining corresponding initial identification characteristics;
The positioning module is used for determining a positioning block corresponding to the video target according to the initial identification characteristics;
The tracking module is used for identifying and tracking the video target: acquiring the monitoring video in real time, performing grayscale processing on the monitoring video to obtain a corresponding monitoring grayscale image, determining a corresponding positioning matching block in the monitoring grayscale image according to the positioning block, and identifying the corresponding video target according to the positioning matching block and the initial identification features.
Compared with the prior art, the invention has the beneficial effects that:
The invention realizes real-time identification and tracking of a video target. In particular, the corresponding positioning matching block is identified quickly through the positioning block, so the video target is positioned quickly and efficiently, and because few resource-intensive models or algorithms are configured during operation, the method runs fast. The invention realizes automatic real-time detection and tracking of targets in a video stream without manual intervention, greatly improving monitoring efficiency.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a functional block diagram of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely in connection with the embodiments. It is obvious that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
As shown in Fig. 1, a video target detection and tracking method based on artificial intelligence comprises:
Step one: marking the video target to be detected and tracked, extracting a target image of the video target, and performing corresponding verification and correction; that is, the extracted target image is shown to an administrator, who determines whether it is a complete image of the marked target and, if not, makes corresponding adjustments such as cropping;
Step two: performing grayscale processing on the verified and adjusted target image to obtain a target grayscale image, and analyzing the target grayscale image to obtain the initial identification features of the video target;
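As a concrete illustration of the grayscale preprocessing, the following minimal sketch converts an image to grayscale. It assumes OpenCV, and the file name is a placeholder; the description above does not prescribe any particular library.

```python
# Minimal sketch of the grayscale preprocessing step (assumes OpenCV).
import cv2

def to_grayscale(image_path: str):
    """Load an image and convert it to a single-channel grayscale image."""
    image = cv2.imread(image_path)                # BGR by OpenCV convention
    if image is None:
        raise FileNotFoundError(image_path)
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

target_gray = to_grayscale("target.png")          # hypothetical file name
```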
The method for analyzing the target grayscale image comprises:
identifying the grayscale value of each pixel in the target grayscale image, and merging adjacent pixels according to their grayscale values to obtain several pixel blocks and the block attributes corresponding to each pixel block; marking each pixel block in the target grayscale image according to the block attributes, and marking the corresponding block attributes; segmenting the target grayscale image, i.e., separating the portion marked as the video target, to obtain the initial identification features corresponding to the video target.
The method for merging according to the grayscale values between adjacent pixels comprises:
step SA1, marking each pixel as a single sample, identifying the grayscale difference between the grayscale values of adjacent single samples, merging adjacent single samples whose grayscale difference is smaller than a threshold X1 to obtain a merged sample, and calculating the grayscale value of the merged sample according to the merging formula;
The merging formula is: Hd = HB / L, wherein Hd is the grayscale value of the merged sample, HB is the sum of the grayscale values of the single samples within the merged sample, and L is the number of single samples within the merged sample.
Step SA2, determining the samples to be merged of the merged sample, namely each single sample and each merged sample adjacent to the merged sample; calculating the grayscale difference between the merged sample and each sample to be merged, merging the merged sample with each sample to be merged whose grayscale difference is smaller than the threshold X1 to obtain a new merged sample, and calculating the grayscale value of the new merged sample according to the merging formula;
Step SA3, repeating step SA2 until the merged sample has no sample to be merged that meets the merging requirement, and marking the corresponding merged sample as a pixel block, wherein the merging requirement is that the grayscale difference is smaller than the threshold X1;
Step SA4, judging whether any merged block remains; when one remains, returning to step SA3; when none remains, all merging is complete, and the part corresponding to each pixel block is identified. The corresponding stable value is determined according to the identified part, and the block shape, positional relationship, and grayscale value of each pixel block are identified, wherein the positional relationship refers to connection and adjacency relations. The obtained stable value, part, block shape, positional relationship, and grayscale value are integrated into the block attributes of the corresponding pixel block.
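The merging of steps SA1 to SA4 can be illustrated with a simple region-growing sketch. The sketch below is an illustration under stated assumptions, not the exact procedure above: it uses 4-adjacency and an arbitrary threshold X1, merges single pixels only (the merging of adjacent merged samples checked in steps SA2 and SA4 is omitted for brevity), and applies the merging formula Hd = HB / L as a running mean.

```python
# Assumed sketch of the gray-value merging (4-adjacency, arbitrary X1).
import numpy as np

def merge_pixels(gray: np.ndarray, x1: float = 8.0):
    """Greedy region growing over a 2-D grayscale array.

    Returns a label map and the mean grayscale value per block
    (the merging formula Hd = HB / L).
    """
    h, w = gray.shape
    labels = -np.ones((h, w), dtype=int)
    block_gray = []                                # Hd per finished block
    next_label = 0
    for y in range(h):
        for x in range(w):
            if labels[y, x] != -1:
                continue
            labels[y, x] = next_label              # step SA1: start a merged sample
            total, count = float(gray[y, x]), 1
            stack = [(y, x)]
            while stack:                           # steps SA2/SA3: grow until no
                cy, cx = stack.pop()               # neighbour meets the requirement
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny, nx] == -1
                            and abs(float(gray[ny, nx]) - total / count) < x1):
                        labels[ny, nx] = next_label
                        total += float(gray[ny, nx])   # update HB
                        count += 1                     # update L
                        stack.append((ny, nx))
            block_gray.append(total / count)       # merging formula Hd = HB / L
            next_label += 1
    return labels, block_gray
```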
The method for determining the stable value comprises the following steps:
Identifying the part corresponding to each pixel block, for example, for a person, parts such as a coat, a palm, hair, a face, or eyes;
Counting the various parts likely to be encountered in video detection and tracking applications, and counting the change probability of each part, such as the probability of a coat being changed or removed, the corresponding probability value being recorded as a plain number (the percentage with the percent sign removed); the initial value corresponding to each part is preset on the basis of these statistics;
Analyzing each part according to the obtained environmental information to obtain the adjustment coefficient corresponding to each part, i.e., setting the corresponding adjustment coefficient according to the influence of the current environment. Specifically, a corresponding intelligent model is established based on a CNN or DNN network, and a corresponding training set is built manually for training, the training set comprising input data (part and environmental information) and output data (adjustment coefficients); after training succeeds, the intelligent model performs the analysis and outputs the corresponding adjustment coefficient;
and calculating a corresponding stable value according to the formula WD=CZ×τ, wherein WD is the stable value, CZ is the initial value, and τ is the adjustment coefficient.
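A minimal sketch of the stable-value computation WD = CZ × τ follows. The per-part initial values and the simple rule standing in for the trained CNN/DNN adjustment-coefficient model are invented for illustration only.

```python
# Sketch of WD = CZ * tau; initial values and the tau rule are assumptions.
INITIAL_VALUES = {"face": 0.95, "hair": 0.80, "coat": 0.40}  # hypothetical CZ per part

def adjustment_coefficient(part: str, environment: dict) -> float:
    """Stand-in for the trained model that outputs tau for a part."""
    tau = 1.0
    if environment.get("low_light") and part == "face":
        tau *= 0.7           # assumed: faces are less reliable in low light
    return tau

def stable_value(part: str, environment: dict) -> float:
    cz = INITIAL_VALUES.get(part, 0.5)        # initial value CZ
    tau = adjustment_coefficient(part, environment)
    return cz * tau                           # WD = CZ * tau

print(stable_value("face", {"low_light": True}))
```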
Step three: identifying each pixel block in the initial identification features, and determining a corresponding positioning block according to the block attributes of each pixel block, wherein the positioning block is used to rapidly determine a pixel block of the video target or a combined block of pixel blocks;
The method for determining the positioning block comprises the following steps:
Identifying block attributes of each pixel block, wherein the block attributes comprise stable values, parts, block shapes, position relations and gray values;
Determining the various pixel-block combination modes according to the positional relationships among the pixel blocks, each combination mode, whether a single pixel block or a combination of pixel blocks, being marked as a first combination; identifying each pixel block in the first combination and marking it as a unit block;
Identifying the combined shape data of the first combination, the combined shape data comprising the overall boundary shape, the area, and the grayscale values of the unit blocks; acquiring preset environmental background characteristics, which represent the various shape data likely to exist in the current environment; since the environmental background in a monitoring setting is essentially fixed, controllable, and predictable, the corresponding environmental background characteristics can be preset according to the actual situation;
Determining the background most similar to the combined shape data according to the preset environmental background characteristics and marking it as the reference background; identifying the similarity between the reference background and the combined shape data and marking it as the reference similarity value; counting the probability of occurrence of the reference background from historical data over a period of time and marking it as the reference probability;
Performing boundary assimilation evaluation on the combined shape data and the environmental background characteristics, because the unit blocks located at the boundary of the first combination may be assimilated by the environment; for example, if both are pure red, their grayscale values differ little, so they blend into the environment easily, their distinctiveness is insufficient, and quick identification against the environment is hindered. A grayscale difference interval is preset, and the assimilation evaluation standard is set according to the grayscale difference interval and the probability of occurrence of backgrounds meeting that interval; that is, the grayscale difference interval must be reached and the corresponding probability of occurrence exceeded for the assimilation evaluation standard to be met. A corresponding assimilation evaluation model is set according to the assimilation evaluation standard, the model taking the combined shape data and the environmental background characteristics as input data x;
The corresponding first shape value is then calculated according to the first-shape-value formula, wherein BA is the first shape value, SL is the reference similarity value, and gL is the reference probability.
Identifying stable values corresponding to the unit blocks, and selecting the lowest stable value in the unit blocks as a representative stable value of the first combination;
Calculating the corresponding positioning evaluation value according to the formula QYU = (b1×WDB)×(b2×BA), wherein QYU is the positioning evaluation value, b1 and b2 are proportionality coefficients with 0 < b1 ≤ 1 and 0 < b2 ≤ 1, WDB is the representative stable value, and BA is the first shape value.
And selecting the first combination with the largest positioning evaluation value as a positioning block.
The positioning block is the optimal pixel block or block combination: it is convenient to identify and position, it preferably contains no more than 5 pixel blocks, and the fewer the pixel blocks, the higher the identification efficiency and precision.
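The selection of the positioning block can be sketched as follows. The Combination structure and the values of b1 and b2 are assumptions; the description above fixes only the formula QYU = (b1×WDB)×(b2×BA) and the rule of keeping the first combination with the largest value.

```python
# Assumed sketch of scoring first combinations and keeping the best one.
from dataclasses import dataclass
from typing import List

@dataclass
class Combination:
    stable_values: List[float]   # WD of each unit block in this first combination
    first_shape_value: float     # BA from the shape / background analysis

def select_positioning_block(combinations: List[Combination],
                             b1: float = 1.0, b2: float = 1.0) -> Combination:
    """Score each first combination by QYU = (b1*WDB) * (b2*BA), keep the largest."""
    best, best_qyu = None, float("-inf")
    for c in combinations:
        wdb = min(c.stable_values)       # representative stable value: lowest WD
        qyu = (b1 * wdb) * (b2 * c.first_shape_value)
        if qyu > best_qyu:
            best, best_qyu = c, qyu
    return best

# Example: the combination with the higher WDB * BA product wins.
print(select_positioning_block([Combination([0.6, 0.9], 0.5),
                                Combination([0.8], 0.7)]))
```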
Step four: acquiring a monitoring video in real time, performing grayscale processing on the current monitoring frame to obtain a monitoring grayscale image, identifying the grayscale value of each pixel in the monitoring grayscale image, and merging adjacent pixels according to their grayscale values to obtain several monitoring blocks and the grayscale value and block shape corresponding to each monitoring block;
Identifying the grayscale value and block shape of each pixel block in the positioning block, and screening the monitoring blocks according to the block shapes of the pixel blocks; that is, monitoring blocks with shapes into which the pixel blocks could not change under the actual circumstances are removed, a corresponding identification evaluation model being established for this evaluation according to basic common sense and the prior art;
Marking the grayscale value of each pixel block in the positioning block as hi, wherein i denotes the corresponding pixel block, i = 1, 2, …, n, and n is a positive integer;
Traversing the positioning block over the monitoring grayscale image and calculating the positioning matching value in real time; if the positioning block currently corresponds to a removed monitoring block, that position is skipped directly rather than calculating a positioning matching value over the resulting hole in the monitoring grayscale image. The obtained positioning matching values are compared to determine the positioning matching block, namely the combination of monitoring blocks corresponding to the positioning block with the smallest positioning matching value; the positioning matching block is an image block on the monitoring grayscale image and is a part of the video target.
The calculation method of the positioning matching value comprises the following steps:
Identifying the monitoring block corresponding to each pixel block on the monitoring grayscale image, and marking the grayscale value of the corresponding monitoring block as ki, where i indicates correspondence with pixel block i. In practical applications, the corresponding monitoring block is generally matched quickly with the aid of a corresponding intelligent model; that is, the corresponding monitoring block is determined directly as the positioning block moves, and the calculation then proceeds directly from the corresponding grayscale values, which improves efficiency. Specifically, the corresponding intelligent model is established according to the prior art, for example on the basis of a neural network.
The corresponding positioning matching value is then calculated from the grayscale values hi and ki according to the positioning-matching formula, wherein DPW is the positioning matching value.
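Since the positioning-matching formula itself is not reproduced in this text, the sketch below assumes a sum of absolute grayscale differences, Σ|hi − ki|, as a stand-in for DPW. Consistent with the description above, the placement with the smallest value is selected and placements over removed (hole) regions are skipped.

```python
# Assumed sketch of the traversal matching in step four.
def dpw(h_values, k_values):
    """Assumed positioning matching value: sum of |hi - ki| (stand-in for DPW)."""
    return sum(abs(h - k) for h, k in zip(h_values, k_values))

def find_matching_block(positions, positioning_gray, monitor_gray_at):
    """Slide the positioning block over candidate placements.

    positions: candidate placements that survived the shape screening
    positioning_gray: list of hi, one per pixel block in the positioning block
    monitor_gray_at: callable returning the ki list at a placement, or None
                     for a removed (hole) region, which is skipped.
    """
    best_pos, best_dpw = None, float("inf")
    for pos in positions:
        k_values = monitor_gray_at(pos)
        if k_values is None:             # positioning block over a removed hole
            continue
        value = dpw(positioning_gray, k_values)
        if value < best_dpw:             # smallest value selects the match
            best_pos, best_dpw = pos, value
    return best_pos, best_dpw
```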
Step five: identifying the corresponding video target according to the positioning matching block and the initial identification features.
The invention realizes real-time identification and tracking of a video target. In particular, the corresponding positioning matching block is identified quickly through the positioning block, so the video target is positioned quickly and efficiently, and because few resource-intensive models or algorithms are configured during operation, the method runs fast. The invention realizes automatic real-time detection and tracking of targets in a video stream without manual intervention, greatly improving monitoring efficiency.
A video target detection tracking system based on artificial intelligence comprises a target analysis module, a positioning module and a tracking module;
The target analysis module is used for analyzing the marked video target and determining the corresponding initial identification characteristic.
The positioning module is used for determining a positioning block corresponding to the video target according to the initial identification characteristics.
The tracking module is used for identifying and tracking the video target: acquiring the monitoring video in real time, performing grayscale processing on the monitoring video to obtain a corresponding monitoring grayscale image, determining a corresponding positioning matching block in the monitoring grayscale image according to the positioning block, and identifying the corresponding video target according to the positioning matching block and the initial identification features.
The above formulas are all dimensionless formulas operating on numerical values. Each formula was obtained by collecting a large amount of data and performing software simulation to approximate the actual situation as closely as possible, and the preset parameters and preset thresholds in the formulas are set by those skilled in the art according to the actual situation or obtained by simulating a large amount of data.
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Those skilled in the art should understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its spirit and scope.

Claims (6)

1. A video target detection and tracking method based on artificial intelligence, characterized in that the method comprises:
Step one: marking a video target, extracting a target image of the video target, and performing corresponding verification and correction;
Step two: performing grayscale processing on the target image to obtain a target grayscale image, and analyzing the target grayscale image to obtain initial identification features of the video target;
Step three: identifying each pixel block in the initial identification features, and determining a corresponding positioning block according to the block attributes of each pixel block, the positioning block being used to determine a pixel block of the video target or a combined block of pixel blocks;
Step four: acquiring a monitoring video in real time, performing grayscale processing on the monitoring video to obtain a corresponding monitoring grayscale image, and determining a corresponding positioning matching block in the monitoring grayscale image according to the positioning block;
Step five: identifying the corresponding video target according to the positioning matching block and the initial identification features;
wherein the method for determining the positioning block comprises:
identifying the block attributes of each pixel block, the block attributes including stable value, part, block shape, positional relationship, and grayscale value;
determining each first combination according to the positional relationships between the pixel blocks; identifying each pixel block in the first combination and marking it as a unit block;
identifying the combined shape data of the first combination; calculating a first shape value;
identifying the stable value corresponding to each unit block, and selecting the lowest stable value among the unit blocks as the representative stable value of the first combination;
calculating the corresponding positioning evaluation value according to the formula QYU = (b1×WDB)×(b2×BA), wherein QYU is the positioning evaluation value, b1 and b2 are proportionality coefficients with 0 < b1 ≤ 1 and 0 < b2 ≤ 1, WDB is the representative stable value, and BA is the first shape value;
selecting the first combination with the largest positioning evaluation value as the positioning block;
the method for calculating the first shape value comprises:
acquiring preset environmental background characteristics; determining a reference background according to the environmental background characteristics and the combined shape data, identifying the reference similarity value between the reference background and the combined shape data, and counting the reference probability of the reference background;
performing boundary assimilation evaluation on the combined shape data and the environmental background characteristics according to a preset assimilation evaluation model, the model taking x as its input data;
calculating the corresponding first shape value according to the first-shape-value formula, wherein BA is the first shape value, SL is the reference similarity value, and gL is the reference probability;
the method for determining the positioning matching block comprises:
identifying the grayscale value of each pixel in the monitoring grayscale image, and merging adjacent pixels according to their grayscale values to obtain several monitoring blocks and the grayscale value and block shape corresponding to each monitoring block;
identifying the grayscale value and block shape of each pixel block in the positioning block, and screening the monitoring blocks according to the block shapes of the pixel blocks to obtain a screened monitoring grayscale image;
traversing the positioning block over the monitoring grayscale image, calculating positioning matching values in real time, and comparing the obtained positioning matching values to determine the positioning matching block;
the method for calculating the positioning matching value comprises:
marking the grayscale value of each pixel block in the positioning block as hi, wherein i denotes the corresponding pixel block, i = 1, 2, …, n, and n is a positive integer;
identifying the monitoring block corresponding to each pixel block on the monitoring grayscale image, and marking the grayscale value of the corresponding monitoring block as ki;
calculating the corresponding positioning matching value according to the positioning-matching formula, wherein DPW is the positioning matching value.

2. The video target detection and tracking method based on artificial intelligence according to claim 1, characterized in that the method for analyzing the target grayscale image comprises:
identifying the grayscale value of each pixel in the target grayscale image, and merging adjacent pixels according to their grayscale values to obtain several pixel blocks and the block attributes corresponding to each pixel block; marking each pixel block in the target grayscale image according to the block attributes, and marking the corresponding block attributes; segmenting the target grayscale image to obtain the initial identification features corresponding to the video target.

3. The video target detection and tracking method based on artificial intelligence according to claim 2, characterized in that the method for merging according to the grayscale values between adjacent pixels comprises:
Step SA1: marking each pixel as a single sample, identifying the grayscale difference between the grayscale values of adjacent single samples, merging adjacent single samples whose grayscale difference is smaller than a threshold X1 to obtain a merged sample, and calculating the grayscale value of the merged sample according to the merging formula;
Step SA2: determining the samples to be merged of the merged sample; calculating the grayscale difference between the merged sample and each sample to be merged, merging the merged sample with each sample to be merged whose grayscale difference is smaller than the threshold X1 to obtain a new merged sample, and calculating the grayscale value of the new merged sample according to the merging formula;
Step SA3: repeating step SA2 until the merged sample has no sample to be merged that meets the merging requirement, and marking the corresponding merged sample as a pixel block;
Step SA4: judging whether any merged block remains; when one remains, returning to step SA3; when none remains, identifying the part corresponding to each pixel block, determining the corresponding stable value according to the identified part, and identifying the block shape, positional relationship, and grayscale value of each pixel block; integrating the obtained stable value, part, block shape, positional relationship, and grayscale value into the block attributes of the corresponding pixel block.

4. The video target detection and tracking method based on artificial intelligence according to claim 3, characterized in that the merging formula is Hd = HB / L, wherein Hd is the grayscale value of the merged sample, HB is the sum of the grayscale values of the single samples within the merged sample, and L is the number of single samples within the merged sample.

5. The video target detection and tracking method based on artificial intelligence according to claim 3, characterized in that the method for determining the stable value comprises:
matching the initial value corresponding to each part according to the identified parts;
acquiring current environmental information, analyzing each part according to the obtained environmental information, and obtaining the adjustment coefficient corresponding to each part;
calculating the corresponding stable value according to the formula WD = CZ×τ, wherein WD is the stable value, CZ is the initial value, and τ is the adjustment coefficient.

6. A video target detection and tracking system based on artificial intelligence, characterized in that it executes the video target detection and tracking method based on artificial intelligence according to any one of claims 1 to 5, the system comprising a target analysis module, a positioning module, and a tracking module;
the target analysis module is used to analyze the marked video target and determine the corresponding initial identification features;
the positioning module is used to determine the positioning block corresponding to the video target according to the initial identification features;
the tracking module is used to identify and track the video target: acquiring the monitoring video in real time, performing grayscale processing on the monitoring video to obtain a corresponding monitoring grayscale image, determining the corresponding positioning matching block in the monitoring grayscale image according to the positioning block, and identifying the corresponding video target according to the positioning matching block and the initial identification features.
CN202410639450.9A 2024-05-22 2024-05-22 A video target detection and tracking method and system based on artificial intelligence Active CN118587415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410639450.9A CN118587415B (en) 2024-05-22 2024-05-22 A video target detection and tracking method and system based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410639450.9A CN118587415B (en) 2024-05-22 2024-05-22 A video target detection and tracking method and system based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN118587415A CN118587415A (en) 2024-09-03
CN118587415B true CN118587415B (en) 2025-01-24

Family

ID=92527533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410639450.9A Active CN118587415B (en) 2024-05-22 2024-05-22 A video target detection and tracking method and system based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN118587415B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103237156A (en) * 2013-04-02 2013-08-07 哈尔滨工业大学 Modified block matching algorithm applied to electronic image stabilization
CN110879999A (en) * 2019-11-14 2020-03-13 武汉兰丁医学高科技有限公司 Micro microscopic image acquisition device based on mobile phone and image splicing and identifying method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101222604B (en) * 2007-04-04 2010-06-09 晨星半导体股份有限公司 Method for calculating motion estimation value and estimating motion vector of image
CN106558042B (en) * 2015-09-29 2020-03-31 阿里巴巴集团控股有限公司 Method and device for positioning key points of image
CN112257819B (en) * 2020-12-23 2021-03-30 恒信东方文化股份有限公司 Image matching method and system
CN115334228B (en) * 2021-04-26 2024-08-20 华为技术有限公司 Video processing method and related device
CN117221493A (en) * 2023-09-27 2023-12-12 河北英创科技有限公司 Early warning system based on AI video analysis and control security protection


Also Published As

Publication number Publication date
CN118587415A (en) 2024-09-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant