Disclosure of Invention
The invention provides a neural network-based intelligent video stream analysis method and a computer readable storage medium, which mainly aim at improving the accuracy and intelligence of video stream analysis and reducing the resources required for video stream analysis.
In order to achieve the above object, the present invention provides a neural network-based intelligent video stream analysis method, including:
receiving an analysis instruction, and confirming an intelligent analysis environment based on the analysis instruction, wherein the intelligent analysis environment comprises an initial video stream and a video analysis system, and the video analysis system comprises a video decomposition unit, an image recognition unit, an image comparison unit and a result feedback unit;
confirming receipt of a video decomposition instruction from the video decomposition unit, analyzing the video decomposition instruction to obtain a first decomposition frame number, and acquiring a decomposition image time sequence by using the first decomposition frame number and the initial video stream, wherein the decomposition image time sequence comprises a plurality of decomposition image nodes, and the decomposition image nodes comprise decomposition images and image times;
acquiring a decomposed video time point set based on the decomposed image time sequence, wherein the decomposed video time point set comprises M decomposed video time points, M is an integer greater than or equal to 0, and a target video stream set is acquired based on the decomposed video time point set and an initial video stream, wherein the target video stream set comprises one or more target video streams, and the following operation is performed on each target video stream in the target video stream set:
acquiring a second decomposition frame number based on the target video stream, and confirming an abnormal image set based on the second decomposition frame number and the target video stream, wherein the abnormal image set comprises N abnormal images, N is an integer greater than or equal to 0, and the abnormal image set is sent to an initiating end of the analysis instruction by using the result feedback unit, so as to realize intelligent analysis of the initial video stream.
Optionally, the acquiring the decomposed video time point set based on the decomposition image time sequence includes:
acquiring an analysis gray image time sequence based on the decomposition image time sequence, and performing the following operations on each analysis gray image in the analysis gray image time sequence:
acquiring an analysis gray level average value based on the analysis gray level image, wherein the analysis gray level average value is an average value of a plurality of gray level values corresponding to the analysis gray level image;
correlating the analysis gray average value with the image time to obtain initial analysis nodes, and summarizing the initial analysis nodes to obtain an initial analysis node set;
obtaining one or more target analysis node sets by utilizing a pre-constructed clustering model and an initial analysis node set;
a decomposed video time point set is confirmed in the initial video stream based on the one or more target analysis node sets.
Optionally, the acquiring the second decomposition frame number based on the target video stream includes:
Acquiring an initial frame number and video duration of a target video stream, acquiring the number of images based on the initial frame number and the video duration, acquiring an image extraction gradient set for extracting images, and extracting an image time sequence from the target video stream based on the number of images and the image extraction gradient set, wherein the image extraction gradient set comprises a plurality of image extraction ratios, and the image time sequence comprises a plurality of initial images;
confirming receipt of an image recognition instruction from the image recognition unit, and confirming an image recognition model set based on the image recognition instruction, wherein the image recognition model set includes a plurality of image recognition models, sequentially extracting initial images from the image time sequence, and performing the following operations on each extracted initial image:
Randomly extracting two image recognition models from the image recognition model set to obtain a first recognition model and a second recognition model, and recognizing the extracted initial image based on the first recognition model and the second recognition model to obtain a first recognition name set and a second recognition name set, wherein the first recognition name set comprises a plurality of first recognition nodes, the first recognition nodes comprise first recognition names and first recognition quantity, the second recognition name set comprises a plurality of second recognition nodes, and the second recognition nodes comprise second recognition names and second recognition quantity;
confirming a target recognition name set by using the first recognition name set and the second recognition name set, wherein the target recognition name set comprises a plurality of target recognition nodes, and the target recognition nodes comprise target recognition names and target recognition quantities, and then performing the following operations on each target recognition name in the target recognition name set:
identifying a local area image corresponding to the target recognition name in the initial image, labeling the local area image by using the target recognition name and the target recognition quantity to obtain a target image labeled with a serial number, gathering the target images to obtain a target image set, and acquiring the second decomposition frame number by using the target image set.
Optionally, the identifying the target recognition name set by using the first recognition name set and the second recognition name set includes:
the following operations are performed for each first identification node in the first identification name set:
judging whether a second recognition node identical to the first recognition node exists in the second recognition name set;
if no second recognition node in the second recognition name set is identical to the first recognition node, randomly extracting a plurality of initial image recognition models from the image recognition model set, wherein the number of the plurality of initial image recognition models is a preset extraction number, and acquiring a plurality of initial recognition name sets by using the plurality of initial image recognition models and the initial image, wherein the initial image recognition models are in one-to-one correspondence with the initial recognition name sets, and the initial recognition name sets comprise a plurality of initial recognition nodes;
Combining a plurality of initial recognition name sets in a combined mode to obtain a plurality of combined recognition name sets, wherein the combined recognition name sets comprise two initial recognition name sets, one or more fusion recognition name sets are confirmed in the plurality of combined recognition name sets, and the two initial recognition name sets in the fusion recognition name sets are identical;
removing any initial recognition name set corresponding to each fusion recognition name set in one or more fusion recognition name sets from the multiple initial recognition name sets to obtain an updated recognition name set, taking the updated recognition name set as the multiple initial recognition name sets, and returning to the step of combining the multiple initial recognition name sets in a combined mode until one or more initial fusion name sets are obtained, wherein the multiple initial recognition name sets corresponding to the initial fusion name sets are identical;
Respectively counting the number of initial identification name sets corresponding to each initial fusion name set in one or more initial fusion name sets to obtain one or more initial fusion numbers, wherein the initial fusion numbers correspond to the initial fusion name sets one by one;
Extracting the maximum initial fusion number from one or more initial fusion numbers to obtain a target checking number, calculating the ratio of the target checking number to the extraction number to obtain a correct proportion, and taking a fusion identification name set corresponding to the target checking number as a target identification name set when the correct proportion is greater than or equal to a preset proportion threshold;
Otherwise, returning to the step of randomly extracting a plurality of initial image recognition models from the image recognition model set until a target recognition name set is obtained.
Optionally, the acquiring the second decomposition frame number by using the target image set includes:
summarizing the target image sets corresponding to the respective target recognition names to obtain a plurality of target image sets, and acquiring a target image time sequence by using the target recognition names, the serial numbers and the target images;
Sequentially extracting initial analysis images from the target image time sequence, and executing the following operations on the extracted initial analysis images:
Confirming a target analysis image in a target image time sequence based on the initial analysis image, wherein the target analysis image is adjacent to and lags behind the initial analysis image in the target image time sequence;
acquiring an initial coordinate point set and a target coordinate point set based on the initial analysis image and the target analysis image;
Mapping the initial coordinate points in the initial coordinate point set and the target coordinate points in the target coordinate point set into a pre-constructed reference coordinate system respectively to obtain a mapped coordinate point set, wherein the mapped coordinate point set comprises a plurality of mapped coordinate points;
counting the number of reference coordinate points corresponding to each mapping coordinate point in the mapping coordinate point set to obtain a reference number set, wherein the reference coordinate points are initial coordinate points or target coordinate points, and the reference number set comprises a plurality of reference numbers;
counting the number of reference numbers equal to 2 in the reference number set to obtain a target reference number, counting the number of mapping coordinate points in the mapping coordinate point set to obtain a comprehensive number, and calculating the ratio of the target reference number to the comprehensive number to obtain a reference number proportion;
if the reference number proportion is smaller than or equal to a preset reference proportion threshold, taking the target analysis image as the extracted initial analysis image, and returning to the step of confirming the target analysis image in the target image time sequence based on the initial analysis image;
if the reference number proportions of all the reference number sets corresponding to the target image time sequence are smaller than or equal to the reference proportion threshold, identifying the target image as a background image, and otherwise identifying the target image as an initial detection image;
after confirming that the initial detection image is a preset target detection image, confirming an adjacent image set in the initial image by using the target detection image, wherein the adjacent image set comprises one or more adjacent images, and the adjacent images are adjacent to the target detection image in the initial image; and
acquiring the second decomposition frame number based on the adjacent image set and the target detection image.
Optionally, the confirming that the initial detection image is a preset target detection image includes:
acquiring a detection image time sequence according to the target recognition name, the serial number and the initial detection images, wherein the detection image time sequence comprises a plurality of initial detection images, performing a graying operation on each initial detection image in the detection image time sequence to obtain a gray detection image time sequence, and sequentially extracting analysis image groups from the gray detection image time sequence based on a preset sliding window, wherein each analysis image group comprises two gray detection images;
the following is performed for each gray detection image in the analysis image group:
The image center coordinates are calculated based on the gray detection image, and the calculation formula is as follows:
$$x_c = \frac{\sum_{i=1}^{n} g_i x_i}{\sum_{i=1}^{n} g_i}, \qquad y_c = \frac{\sum_{i=1}^{n} g_i y_i}{\sum_{i=1}^{n} g_i},$$

wherein $x_c$ and $y_c$ respectively represent the abscissa and the ordinate of the image center coordinates, $n$ represents the total number of pixel points in the gray detection image, $g_i$ represents the gray value corresponding to the $i$-th pixel point in the gray detection image, and $x_i$ and $y_i$ respectively represent the abscissa and the ordinate corresponding to the $i$-th pixel point of the gray detection image;
summarizing the image center coordinates to obtain an image center coordinate set, calculating a comprehensive image change degree based on the image center coordinate set, and comparing the comprehensive image change degree with a preset comprehensive change degree threshold;
and if the comprehensive image change degree is greater than or equal to the comprehensive change degree threshold value, confirming that the initial detection image is a target detection image.
Optionally, the calculating the comprehensive image change degree based on the image center coordinate set includes:
Acquiring an image center coordinate time sequence based on the image center coordinate set, sequentially extracting analysis coordinate sets from the image center coordinate time sequence by utilizing the sliding window, wherein the analysis coordinate sets comprise two image center coordinates, and acquiring Euclidean distances of the two image center coordinates in the analysis coordinate sets to obtain a moving distance;
Extracting a first image center coordinate and a last image center coordinate from an image center coordinate time sequence to obtain an initial center coordinate and a target center coordinate, and acquiring an evaluation distance based on the initial center coordinate and the target center coordinate;
Summarizing the moving distances to obtain a moving distance set, and obtaining a moving distance variance by using the moving distance set, wherein the moving distance variance is the variance of a plurality of moving distances in the moving distance set;
and calculating the comprehensive image change degree based on the moving distance variance and the evaluation distance, wherein the calculation formula is as follows:

$$D = \alpha \, d_e + \beta \, \sigma^2,$$

wherein $D$ represents the comprehensive image change degree, $\alpha$ and $\beta$ are both preset coefficients, $d_e$ represents the evaluation distance, and $\sigma^2$ represents the moving distance variance.
Optionally, the acquiring the second decomposition frame number based on the adjacent image set and the target detection image includes:
Acquiring an adjacent gray level image set and a detection gray level image based on the adjacent image set and the detection image, extracting one or more local area images from the detection gray level image by utilizing a pre-constructed area growth algorithm and a pre-constructed image gray level gradient set, and confirming one or more target area images in the one or more local area images, wherein the image gray level gradient set comprises a plurality of image gradient gray level values, and the target area image is adjacent to at least one adjacent gray level image in the adjacent gray level image set;
the following is performed for each of the one or more target area images:
Acquiring a local area gray average value based on the target area image, wherein the local area gray average value is the average value of a plurality of gray values corresponding to the target area image, and associating the target area image with an adjacent gray image adjacent to the target area image to obtain an associated image set;
Extracting an associated region image from the associated image set by using the region growing algorithm with the local region gray average value as a starting point, and identifying a target discrimination image in the associated region image, wherein the target discrimination image is a region of an adjacent gray image adjacent to the target region image in the associated image set;
acquiring a target discrimination mean value based on the target discrimination image, and calculating discrimination frame numbers according to the target discrimination mean value and the local area gray level mean value, wherein the calculation formula is as follows:
$$\eta = \gamma_1 \cdot \frac{\left| \bar{g}_t - \bar{g}_l \right|}{\bar{g}_l} + \gamma_2,$$

$$F_d = \left\lfloor F_0 \cdot \eta \right\rfloor,$$

wherein $F_d$ represents the discrimination frame number, $F_0$ represents the initial frame number, $\bar{g}_t$ represents the target discrimination mean value, $\bar{g}_l$ represents the local area gray level mean value, $\gamma_1$ and $\gamma_2$ represent preset coefficients, and $\lfloor \cdot \rfloor$ represents the rounding symbol;
and summarizing the discrimination frame numbers to obtain a discrimination frame number set, and confirming the second decomposition frame number based on the discrimination frame number set, wherein the second decomposition frame number is the smallest discrimination frame number in the discrimination frame number set.
Optionally, the confirming the abnormal image set based on the second decomposition frame number and the target video stream includes:
extracting a decomposition image time sequence from the target video stream by using the second decomposition frame number, wherein the decomposition image time sequence comprises a plurality of initial decomposition images, sequentially extracting the initial decomposition images in the decomposition image time sequence, and executing the following operations on the extracted initial decomposition images:
Confirming a target decomposition image in a decomposition image time sequence based on the initial decomposition image, wherein the target decomposition image is adjacent to the initial decomposition image and lags behind the extracted initial decomposition image;
Confirming and receiving an image comparison instruction from an image comparison unit, analyzing the image comparison instruction to obtain a local name database, and respectively confirming a target identification decomposition image set and an initial identification decomposition image set in a target decomposition image and an initial decomposition image by utilizing an image identification model set and the local name database, wherein the target identification decomposition image set comprises a plurality of target identification decomposition images, and the initial identification decomposition image set comprises a plurality of initial identification decomposition images;
Matching a target identification decomposition image in the target identification decomposition image set with an initial identification decomposition image in the initial identification decomposition image set to obtain a plurality of identification decomposition nodes, wherein the identification decomposition nodes comprise the initial identification decomposition image and the target identification decomposition image;
performing the following operations on each identification decomposition node in the plurality of identification decomposition nodes:
Acquiring local offset distances based on the identification decomposition nodes, summarizing the local offset distances to obtain a local offset distance set, extracting the largest local offset distance from the local offset distance set to obtain a target evaluation distance, and comparing the target evaluation distance with a preset evaluation distance threshold;
If the target evaluation distance is greater than or equal to the evaluation distance threshold, confirming the extracted initial decomposition image and the target decomposition image as abnormal images, taking the target decomposition image as the extracted initial decomposition image, and returning to the step of confirming the target decomposition image in the decomposition image time sequence based on the initial decomposition image;
otherwise, obtaining local offset distance variance based on the local offset distance set, and after confirming that the local offset distance variance is greater than or equal to a preset local offset distance threshold, confirming the extracted initial decomposition image and the target decomposition image as abnormal images, taking the target decomposition image as the extracted initial decomposition image, and returning to the step of confirming the target decomposition image in the decomposition image time sequence based on the initial decomposition image;
And summarizing the abnormal images to obtain an abnormal image set.
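As an illustrative sketch of this two-stage decision (the function name, the array representation, and the threshold parameters are assumptions of this example, not part of the claimed method), the per-pair abnormality test might look as follows:

```python
import numpy as np

def is_abnormal_pair(local_offset_distances, evaluation_threshold, variance_threshold):
    """Decide whether a pair of adjacent decomposition images is abnormal."""
    offsets = np.asarray(local_offset_distances, dtype=float)
    if offsets.max() >= evaluation_threshold:   # target evaluation distance check
        return True
    return offsets.var() >= variance_threshold  # local offset distance variance check
```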
In order to achieve the above object, the present invention further provides a neural network-based intelligent video stream analysis system, including:
The analysis environment confirmation module is used for receiving an analysis instruction and confirming an intelligent analysis environment based on the analysis instruction, wherein the intelligent analysis environment comprises an initial video stream and a video analysis system, and the video analysis system comprises a video decomposition unit, an image recognition unit, an image comparison unit and a result feedback unit;
The initial frame number confirming module is used for confirming receipt of a video decomposition instruction from the video decomposition unit, analyzing the video decomposition instruction to obtain a first decomposition frame number, and acquiring a decomposition image time sequence by using the first decomposition frame number and the initial video stream, wherein the decomposition image time sequence comprises a plurality of decomposition image nodes, and the decomposition image nodes comprise decomposition images and image times;
The initial video dividing module is configured to obtain a decomposed video time point set based on the decomposed image time sequence, where the decomposed video time point set includes M decomposed video time points, M is an integer greater than or equal to 0, and obtain a target video stream set based on the decomposed video time point set and the initial video stream, where the target video stream set includes one or more target video streams, and perform the following operations on each target video stream in the target video stream set:
the abnormal image identification module is used for acquiring a second decomposition frame number based on the target video stream, and confirming an abnormal image set based on the second decomposition frame number and the target video stream, wherein the abnormal image set comprises N abnormal images, N is an integer greater than or equal to 0, and the abnormal image set is sent to an initiating end of the analysis instruction by using the result feedback unit, so as to realize intelligent analysis of the initial video stream.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
a memory storing at least one instruction; and
a processor executing the instructions stored in the memory to implement the above neural network-based intelligent video stream analysis method.
In order to solve the above problems, the present invention further provides a computer readable storage medium having at least one instruction stored therein, the at least one instruction being executed by a processor in an electronic device to implement the above-described neural network-based intelligent video stream analysis method.
The invention aims to solve the problems in the background art. The invention confirms receipt of a video decomposition instruction from the video decomposition unit, analyzes the video decomposition instruction to obtain a first decomposition frame number, and acquires a decomposition image time sequence by using the first decomposition frame number and the initial video stream, wherein the decomposition image time sequence comprises a plurality of decomposition image nodes, and the decomposition image nodes comprise decomposition images and image times. A decomposed video time point set is acquired based on the decomposition image time sequence, wherein the decomposed video time point set comprises M decomposed video time points, M is an integer greater than or equal to 0, and a target video stream set is acquired based on the decomposed video time point set and the initial video stream, wherein the target video stream set comprises one or more target video streams. The invention acquires a second decomposition frame number based on each target video stream, and confirms an abnormal image set based on the second decomposition frame number and the target video stream. In this way, the second decomposition frame number used for analyzing each target video stream is obtained by combining the characteristics of that target video stream, so that different second decomposition frame numbers are obtained for different characteristics, which saves the resources required for analyzing the target video streams and further improves the intelligence of the embodiment of the invention. Therefore, the invention can improve the accuracy and intelligence of video stream analysis and reduce the resources required for video stream analysis.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the application provides an intelligent video stream analysis method based on a neural network. The execution subject of the neural network-based intelligent video stream analysis method includes at least one of a server, a terminal, and the like that can be configured to execute the method provided by the embodiment of the application. In other words, the neural network-based intelligent video stream analysis method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server side includes, but is not limited to, a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
Referring to fig. 1, a flow chart of a neural network-based intelligent video stream analysis method according to an embodiment of the invention is shown. In this embodiment, the neural network-based intelligent video stream analysis method includes:
S1, receiving an analysis instruction, and confirming an intelligent analysis environment based on the analysis instruction, wherein the intelligent analysis environment comprises an initial video stream and a video analysis system, and the video analysis system comprises a video decomposition unit, an image recognition unit, an image comparison unit and a result feedback unit.
It should be explained that the analysis instruction is an instruction for realizing analysis of the video, and the intelligent analysis environment is the environment necessary for realizing intelligent analysis of the video. The intelligent analysis environment includes an initial video stream and a video analysis system, where the video analysis system includes a video decomposition unit, an image recognition unit, an image comparison unit, and a result feedback unit; for the specific application of each unit, reference is made to the subsequent embodiments. It can be seen that, when the initial video stream is analyzed, an object to be analyzed can be selected from the initial video stream so as to reduce the resources required for analyzing the initial video stream.
It is understood that the initial video stream refers to the video to be intelligently analyzed. The embodiment of the invention mainly aims at identifying and analyzing modified content in the video stream, thereby ensuring the security and correctness of the video stream. When the initial video stream is analyzed, different frame numbers are set according to the specific content of the video, which further saves the resources required for analyzing the initial video stream.
Illustratively, in order to determine whether the video stream as evidence is tampered with, the analysis instruction is issued by the holder of the video stream, and an intelligent analysis environment is confirmed according to the analysis instruction, where the video stream as evidence is the initial video stream.
S2, confirming receiving of a video decomposition instruction from a video decomposition unit, analyzing the video decomposition instruction to obtain a first decomposition frame number, and obtaining a decomposition image time sequence by using the first decomposition frame number and an initial video stream, wherein the decomposition image time sequence comprises a plurality of decomposition image nodes, and the decomposition image nodes comprise decomposition images and image time.
It should be explained that the first decomposition frame number refers to a preset frame number for dividing the initial video stream. Optionally, the first decomposition frame number is obtained by manual setting. The decomposition image time sequence refers to the sequence corresponding to the decomposition images extracted from the initial video stream according to the first decomposition frame number. For example, if the initial video stream includes 1000 images in total and the preset first decomposition frame number is 10, then the 10th image, the 20th image, the 30th image, ..., the 990th image and the 1000th image are respectively extracted from the initial video stream according to the first decomposition frame number, and the extracted images are ordered from first to last according to the times at which they were shot, so as to obtain the decomposition image time sequence, wherein the extracted images are the decomposition images, and the time corresponding to an image when shooting is the image time. The first decomposition frame number can be set in combination with the frame rate and duration of the initial video stream.
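As an illustrative sketch of this step (the function name, the use of OpenCV, and the per-frame timestamping are assumptions of this example, not requirements of the embodiment), extracting every k-th frame together with its image time might look as follows:

```python
import cv2

def extract_decomposition_timing(video_path, first_decomposition_frame_number):
    """Extract every k-th frame of the video together with its image time."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    nodes = []  # decomposition image nodes: (decomposition image, image time)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        index += 1
        if index % first_decomposition_frame_number == 0:  # e.g. frames 10, 20, ...
            nodes.append((frame, index / fps))  # image time in seconds
    cap.release()
    return nodes
```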
S3, acquiring a decomposed video time point set based on the decomposed image time sequence, wherein the decomposed video time point set comprises M decomposed video time points, M is an integer greater than or equal to 0, and acquiring a target video stream set based on the decomposed video time point set and the initial video stream, wherein the target video stream set comprises one or more target video streams.
It should be appreciated that the acquiring the decomposed video time point set based on the decomposition image time sequence includes:
acquiring an analysis gray image time sequence based on the decomposition image time sequence, and performing the following operations on each analysis gray image in the analysis gray image time sequence:
acquiring an analysis gray level average value based on the analysis gray level image, wherein the analysis gray level average value is an average value of a plurality of gray level values corresponding to the analysis gray level image;
correlating the analysis gray average value with the image time to obtain initial analysis nodes, and summarizing the initial analysis nodes to obtain an initial analysis node set;
obtaining one or more target analysis node sets by utilizing a pre-constructed clustering model and an initial analysis node set;
a decomposed video time point set is confirmed in the initial video stream based on the one or more target analysis node sets.
Further, a graying transformation is performed on each decomposition image in the decomposition image time sequence to obtain the analysis gray image time sequence. The graying transformation refers to the operation of converting a color image into a grayscale image; graying is prior art and is not described herein. Optionally, a k-means clustering algorithm is adopted as the clustering model; other technologies can achieve the same effects and are not described herein. The clustering model classifies the initial video stream by combining the time and the characteristics of the images, wherein the characteristic of an image refers to the average value of the gray values corresponding to the image. For example, 10 initial analysis nodes are clustered by the k-means clustering algorithm to obtain three clusters, and the initial analysis nodes contained in each cluster form one target analysis node set.
It should be explained that each target analysis node set includes a plurality of different initial analysis nodes, and each initial analysis node corresponds to an image time, so that, based on each target analysis node set in one or more target analysis node sets, a time period can be confirmed, and a demarcation point of an adjacent time period can be used as the decomposed video time point. For example, there are three target analysis node sets, wherein the time period corresponding to the first target analysis node set is 10 to 15.8 seconds, the time period corresponding to the second target analysis node set is 15.9 to 20 seconds, and the time period corresponding to the third target analysis node set is 0 to 9.9 seconds, and two decomposed video time points can be confirmed by using the three target analysis node sets, wherein the two decomposed video time points are 9.9 seconds and 15.8 seconds respectively.
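A minimal sketch of this clustering step, assuming scikit-learn's KMeans as the pre-constructed clustering model and a fixed cluster count (both are assumptions of this example), which returns the demarcation points between adjacent time periods, e.g. 9.9 s and 15.8 s in the example above:

```python
import numpy as np
from sklearn.cluster import KMeans

def decomposed_video_time_points(initial_analysis_nodes, n_clusters=3):
    """initial_analysis_nodes: list of (analysis gray average, image time) pairs."""
    data = np.array(initial_analysis_nodes, dtype=float)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(data)
    # Each target analysis node set spans a time period [start, end].
    periods = sorted(
        (data[labels == k][:, 1].min(), data[labels == k][:, 1].max())
        for k in range(n_clusters)
    )
    # The end of each period except the last is a decomposed video time point.
    return [end for _, end in periods[:-1]]
```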
It may be understood that the obtaining the target video stream set based on the decomposed video time point set and the initial video stream refers to dividing the initial video stream by each decomposed video time point in the decomposed video time point set, where the divided initial video stream is the target video stream.
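The division itself reduces to slicing the frame sequence at the decomposed video time points; a sketch, assuming the stream is held as a frame list with a known frame rate:

```python
def split_into_target_streams(frames, fps, decomposed_time_points):
    """Divide the initial video stream (a frame list) at the given time points."""
    cuts = [int(t * fps) for t in sorted(decomposed_time_points)] + [len(frames)]
    target_streams, start = [], 0
    for end in cuts:
        target_streams.append(frames[start:end])  # one target video stream
        start = end
    return target_streams
```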
S4, acquiring a second decomposition frame number based on the target video stream, and confirming an abnormal image set based on the second decomposition frame number and the target video stream, wherein the abnormal image set comprises N abnormal images, N is an integer greater than or equal to 0, and the abnormal image set is sent to an initiating end of the analysis instruction by using the result feedback unit so as to realize intelligent analysis of the initial video stream.
It can be appreciated that the acquiring the second decomposition frame number based on the target video stream includes:
Acquiring an initial frame number and video duration of a target video stream, acquiring the number of images based on the initial frame number and the video duration, acquiring an image extraction gradient set for extracting images, and extracting an image time sequence from the target video stream based on the number of images and the image extraction gradient set, wherein the image extraction gradient set comprises a plurality of image extraction ratios, and the image time sequence comprises a plurality of initial images;
confirming receipt of an image recognition instruction from the image recognition unit, and confirming an image recognition model set based on the image recognition instruction, wherein the image recognition model set includes a plurality of image recognition models, sequentially extracting initial images from the image time sequence, and performing the following operations on each extracted initial image:
Randomly extracting two image recognition models from the image recognition model set to obtain a first recognition model and a second recognition model, and recognizing the extracted initial image based on the first recognition model and the second recognition model to obtain a first recognition name set and a second recognition name set, wherein the first recognition name set comprises a plurality of first recognition nodes, the first recognition nodes comprise first recognition names and first recognition quantity, the second recognition name set comprises a plurality of second recognition nodes, and the second recognition nodes comprise second recognition names and second recognition quantity;
confirming a target recognition name set by using the first recognition name set and the second recognition name set, wherein the target recognition name set comprises a plurality of target recognition nodes, and the target recognition nodes comprise target recognition names and target recognition quantities, and then performing the following operations on each target recognition name in the target recognition name set:
identifying a local area image corresponding to the target recognition name in the initial image, labeling the local area image by using the target recognition name and the target recognition quantity to obtain a target image labeled with a serial number, gathering the target images to obtain a target image set, and acquiring the second decomposition frame number by using the target image set.
Further, the initial frame number refers to the frame rate of the target video stream, the video duration refers to the duration of the target video stream, and the number of images refers to the number of images included in the target video stream, which is the product of the initial frame number and the video duration. The image extraction ratio refers to a ratio for dividing the number of images. For example, if the frame rate of the target video stream is 24 FPS and the video duration is 10 seconds, the number of images is 240; if the image extraction gradient set includes the image extraction ratios 0.2, 0.4, 0.6 and 0.8, it is calculated from these ratios that the 48th frame image, the 96th frame image, the 144th frame image and the 192nd frame image need to be extracted from the target video stream. The four extracted images form the image time sequence, and each image in the image time sequence is an initial image.
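Following the numeric example above, a sketch of turning the image extraction gradient set into concrete frame picks (the default ratios are those of the example; the function name is illustrative):

```python
def extract_image_timing(frames, extraction_ratios=(0.2, 0.4, 0.6, 0.8)):
    """Extract the initial images located at each extraction ratio of the stream."""
    count = len(frames)                                    # e.g. 24 FPS * 10 s = 240
    indices = [int(count * r) for r in extraction_ratios]  # 48, 96, 144, 192
    return [frames[i - 1] for i in indices]                # frame numbers are 1-based
```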
It should be understood that the image recognition model refers to a model for realizing recognition of information in an image. Optionally, a pre-trained neural network model is employed as the image recognition model. The first recognition name refers to the name of information recognized in the initial image by using the first recognition model, the second recognition name refers to the name of information recognized in the initial image by using the second recognition model, the first recognition quantity refers to the quantity corresponding to the first recognition name, and the second recognition quantity refers to the quantity corresponding to the second recognition name. For example, if an image includes 5 pedestrians and 3 vehicles, then in the ideal situation two recognition nodes can be recognized by using a neural network model, where a recognition node is a first recognition node or a second recognition node, the recognition name and the recognition quantity corresponding to the first recognition node are respectively "person" and 5, and the recognition name and the recognition quantity corresponding to the second recognition node are respectively "vehicle" and 3.
It should be noted that the local area image refers to the area of the image corresponding to the target recognition name. It can be seen that the neural network model can be used to realize both the recognition of the initial image and the division of the local area images; other technologies can achieve the same effects and are not described herein. The purpose of labeling the local area image with the target recognition name is to distinguish the content corresponding to different local areas in the initial image. In order to better distinguish different areas, when different contents corresponding to the same recognition name are recognized, serial numbers can be attached; the serial numbers can be assigned by sorting from left to right or from top to bottom, and the largest serial number is the target recognition quantity. For example, if 3 persons exist in a certain image, the area of each of the 3 persons in the initial image is identified as a person, the corresponding areas of the initial image are labeled person-1, person-2 and person-3 respectively, and the target recognition quantity corresponding to the target recognition node is 3. When the same information is recognized, the same serial number is adopted so as to realize analysis of the same image.
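A sketch of turning one model's raw detections into recognition nodes and serial-numbered local area labels; the detection format (name plus bounding box) and the left-to-right ordering are assumptions of this example:

```python
from collections import defaultdict

def build_recognition_nodes(detections):
    """detections: (name, bounding box) pairs from one image recognition model."""
    grouped = defaultdict(list)
    for name, box in detections:  # box: (x, y, w, h)
        grouped[name].append(box)
    nodes, labeled_regions = [], {}
    for name, boxes in grouped.items():
        nodes.append((name, len(boxes)))  # recognition node: (name, quantity)
        # Serial numbers assigned left to right, e.g. person-1, person-2, person-3.
        for serial, box in enumerate(sorted(boxes, key=lambda b: b[0]), start=1):
            labeled_regions[f"{name}-{serial}"] = box
    return nodes, labeled_regions
```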
Further, the identifying the target recognition name set by using the first recognition name set and the second recognition name set includes:
the following operations are performed for each first identification node in the first identification name set:
judging whether a second recognition node identical to the first recognition node exists in the second recognition name set;
if no second recognition node in the second recognition name set is identical to the first recognition node, randomly extracting a plurality of initial image recognition models from the image recognition model set, wherein the number of the plurality of initial image recognition models is a preset extraction number, and acquiring a plurality of initial recognition name sets by using the plurality of initial image recognition models and the initial image, wherein the initial image recognition models are in one-to-one correspondence with the initial recognition name sets, and the initial recognition name sets comprise a plurality of initial recognition nodes;
Combining a plurality of initial recognition name sets in a combined mode to obtain a plurality of combined recognition name sets, wherein the combined recognition name sets comprise two initial recognition name sets, one or more fusion recognition name sets are confirmed in the plurality of combined recognition name sets, and the two initial recognition name sets in the fusion recognition name sets are identical;
removing any initial recognition name set corresponding to each fusion recognition name set in one or more fusion recognition name sets from the multiple initial recognition name sets to obtain an updated recognition name set, taking the updated recognition name set as the multiple initial recognition name sets, and returning to the step of combining the multiple initial recognition name sets in a combined mode until one or more initial fusion name sets are obtained, wherein the multiple initial recognition name sets corresponding to the initial fusion name sets are identical;
Respectively counting the number of initial identification name sets corresponding to each initial fusion name set in one or more initial fusion name sets to obtain one or more initial fusion numbers, wherein the initial fusion numbers correspond to the initial fusion name sets one by one;
Extracting the maximum initial fusion number from one or more initial fusion numbers to obtain a target checking number, calculating the ratio of the target checking number to the extraction number to obtain a correct proportion, and taking a fusion identification name set corresponding to the target checking number as a target identification name set when the correct proportion is greater than or equal to a preset proportion threshold;
Otherwise, returning to the step of randomly extracting a plurality of initial image recognition models from the image recognition model set until a target recognition name set is obtained.
It can be appreciated that, when a second recognition node in the second recognition name set differs from the first recognition node, the first recognition name set is different from the second recognition name set; that is, the results of recognizing the initial image using different image recognition models are different. For example, if the preset extraction number is 5, 5 initial image recognition models are randomly extracted from the image recognition model set, and the initial image is recognized by each of the 5 initial image recognition models respectively to obtain 5 initial recognition name sets; the 5 initial recognition name sets are combined pairwise into 10 combined recognition name sets, and if the two initial recognition name sets of a combined recognition name set are identical, that combined recognition name set is marked as a fusion recognition name set. When the specific application scenes are different, different modes can be adopted to construct the neural network model, thereby improving the accuracy of recognizing information in the image by the neural network model.
It should be explained that different results may exist when different neural network models are used to identify the same object. Therefore, in the embodiment of the present invention, identical initial recognition name sets are gathered together in a cyclic manner, and during the cycle, the identical initial recognition name sets already gathered are continuously removed from the combination body to save the resources required by the cycle, wherein continuously updating the combination body refers to taking the updated recognition name set as the plurality of initial recognition name sets.
It should be appreciated that using different initial image recognition models may recognize different information about the same content in the image, and thus, using a combination may be able to categorize the results of the image recognition by the initial image recognition models as much as possible. When the correct proportion is greater than or equal to a preset proportion threshold value, the result of identifying the initial image by adopting the initial image identification model is accurate, otherwise, the result of identifying the initial image may have larger error.
Further, when the correct proportion is greater than or equal to the proportion threshold, a more accurate target recognition name set is obtained, and further, the accuracy of obtaining the second analysis frame number can be improved by adopting the more accurate target recognition name set, and further, energy consumption required by analyzing the initial video stream is saved. For implementation of specific beneficial effects, please refer to the following examples.
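A compact sketch of this consensus check; grouping identical result sets with collections.Counter is used here in place of the claims' pairwise combination-and-removal loop (an equivalent but simplified formulation), and the model interface and default thresholds are assumptions of this example:

```python
import random
from collections import Counter

def target_recognition_name_set(models, image, extraction_number=5, ratio_threshold=0.6):
    """models: callables mapping an image to a set of (name, quantity) nodes."""
    while True:
        sampled = random.sample(models, extraction_number)
        # Freeze each result so identical initial recognition name sets compare equal;
        # counting them plays the role of the pairwise fusion loop.
        results = [frozenset(model(image)) for model in sampled]
        fused_set, fusion_number = Counter(results).most_common(1)[0]
        if fusion_number / extraction_number >= ratio_threshold:  # correct proportion
            return set(fused_set)  # target recognition name set
```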
It should be explained that the acquiring the second decomposition frame number by using the target image set includes:
summarizing the target image sets corresponding to the respective target recognition names to obtain a plurality of target image sets, and acquiring a target image time sequence by using the target recognition names, the serial numbers and the target images;
Sequentially extracting initial analysis images from the target image time sequence, and executing the following operations on the extracted initial analysis images:
Confirming a target analysis image in a target image time sequence based on the initial analysis image, wherein the target analysis image is adjacent to and lags behind the initial analysis image in the target image time sequence;
acquiring an initial coordinate point set and a target coordinate point set based on the initial analysis image and the target analysis image;
Mapping the initial coordinate points in the initial coordinate point set and the target coordinate points in the target coordinate point set into a pre-constructed reference coordinate system respectively to obtain a mapped coordinate point set, wherein the mapped coordinate point set comprises a plurality of mapped coordinate points;
counting the number of reference coordinate points corresponding to each mapping coordinate point in the mapping coordinate point set to obtain a reference number set, wherein the reference coordinate points are initial coordinate points or target coordinate points, and the reference number set comprises a plurality of reference numbers;
counting the number of reference numbers equal to 2 in the reference number set to obtain a target reference number, counting the number of mapping coordinate points in the mapping coordinate point set to obtain a comprehensive number, and calculating the ratio of the target reference number to the comprehensive number to obtain a reference number proportion;
if the reference number proportion is smaller than or equal to a preset reference proportion threshold, taking the target analysis image as the extracted initial analysis image, and returning to the step of confirming the target analysis image in the target image time sequence based on the initial analysis image;
if the reference number proportions of all the reference number sets corresponding to the target image time sequence are smaller than or equal to the reference proportion threshold, identifying the target image as a background image, and otherwise identifying the target image as an initial detection image;
after confirming that the initial detection image is a preset target detection image, confirming an adjacent image set in the initial image by using the target detection image, wherein the adjacent image set comprises one or more adjacent images, and the adjacent images are adjacent to the target detection image in the initial image; and
acquiring the second decomposition frame number based on the adjacent image set and the target detection image.
It is understood that the target image timing refers to an image sequence corresponding to different initial images of the same object. For example, a person with a number of 1 is identified in each target image in the target image set, and the target images in the target image set are ordered in a sequence from first to last according to the time corresponding to the target images, so as to obtain the target image time sequence.
It should be understood that the initial coordinate point set refers to the set of pixel coordinates corresponding to each pixel in the initial analysis image. The target coordinate point set is acquired in the same manner as the initial coordinate point set, which is not described in detail herein. Optionally, the image coordinate system is used as the reference coordinate system; other technologies can achieve the same effect and are not described herein. It can be seen that, if the initial coordinate points in the initial coordinate point set are the same as the target coordinate points in the target coordinate point set, the object corresponding to the target image is stationary. Therefore, when the reference numbers corresponding to the target image time sequence are all 2, the object corresponding to the target image in the initial video stream may be static, wherein the number of reference number sets corresponding to the target image time sequence is related to the number of target images contained in the target image time sequence. For example, if the target image time sequence includes 5 target images, 4 reference number sets can be acquired by using the 5 target images, the 4 reference number sets being acquired from the first and second target images, the second and third target images, the third and fourth target images, and the fourth and fifth target images, respectively. When the reference number proportion is greater than the reference proportion threshold, it is indicated that the object corresponding to the target image is moving; therefore, it is necessary to determine whether the object corresponding to the target image is moving. Obviously, it is meaningful to analyze an object only when the object is in a moving state.
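A minimal sketch of the coincidence check between two adjacent images of the same labeled object, assuming the coordinate point sets are collections of (x, y) tuples already mapped into the shared reference coordinate system:

```python
def reference_number_proportion(initial_points, target_points):
    """Share of mapped coordinate points covered by both images (reference number 2)."""
    initial, target = set(initial_points), set(target_points)
    mapped = initial | target    # mapped coordinate point set (comprehensive number)
    shared = initial & target    # points whose reference number is 2
    return len(shared) / len(mapped) if mapped else 0.0
```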
Further, the confirming that the initial detection image is a preset target detection image includes:
acquiring a detection image time sequence according to the target recognition name, the serial number and the initial detection images, wherein the detection image time sequence comprises a plurality of initial detection images, performing a graying operation on each initial detection image in the detection image time sequence to obtain a gray detection image time sequence, and sequentially extracting analysis image groups from the gray detection image time sequence based on a preset sliding window, wherein each analysis image group comprises two gray detection images;
the following is performed for each gray detection image in the analysis image group:
The image center coordinates are calculated based on the gray detection image, and the calculation formula is as follows:
$$x_c = \frac{\sum_{i=1}^{n} g_i x_i}{\sum_{i=1}^{n} g_i}, \qquad y_c = \frac{\sum_{i=1}^{n} g_i y_i}{\sum_{i=1}^{n} g_i},$$

wherein $x_c$ and $y_c$ respectively represent the abscissa and the ordinate of the image center coordinates, $n$ represents the total number of pixel points in the gray detection image, $g_i$ represents the gray value corresponding to the $i$-th pixel point in the gray detection image, and $x_i$ and $y_i$ respectively represent the abscissa and the ordinate corresponding to the $i$-th pixel point of the gray detection image;
summarizing the image center coordinates to obtain an image center coordinate set, calculating a comprehensive image change degree based on the image center coordinate set, and comparing the comprehensive image change degree with a preset comprehensive change degree threshold;
and if the comprehensive image change degree is greater than or equal to the comprehensive change degree threshold value, confirming that the initial detection image is a target detection image.
Further, the sliding window refers to a window of fixed size, and the sliding step size of the sliding window is preset. The graying operation and the graying transformation achieve the same effect and are not described in detail herein. The detection image time sequence is acquired in the same manner as the target image time sequence, which is not described here again.
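A sketch of the center-coordinate formula above, reading the reconstruction as a gray-value-weighted centroid (that reading, and the NumPy representation, are assumptions of this example):

```python
import numpy as np

def image_center(gray):
    """Gray-value-weighted center of a gray detection image (the formula above)."""
    gray = np.asarray(gray, dtype=float)
    ys, xs = np.indices(gray.shape)   # ordinate / abscissa of every pixel point
    total = gray.sum()                # sum of the gray values g_i
    return (gray * xs).sum() / total, (gray * ys).sum() / total
```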
It should be explained that the calculating the comprehensive image change degree based on the image center coordinate set includes:
Acquiring an image center coordinate time sequence based on the image center coordinate set, sequentially extracting analysis coordinate sets from the image center coordinate time sequence by utilizing the sliding window, wherein the analysis coordinate sets comprise two image center coordinates, and acquiring Euclidean distances of the two image center coordinates in the analysis coordinate sets to obtain a moving distance;
Extracting a first image center coordinate and a last image center coordinate from an image center coordinate time sequence to obtain an initial center coordinate and a target center coordinate, and acquiring an evaluation distance based on the initial center coordinate and the target center coordinate;
Summarizing the moving distances to obtain a moving distance set, and obtaining a moving distance variance by using the moving distance set, wherein the moving distance variance is the variance of a plurality of moving distances in the moving distance set;
and calculating the comprehensive image change degree based on the moving distance variance and the evaluation distance, wherein the calculation formula is as follows:

$$D = \alpha \, d_e + \beta \, \sigma^2,$$

wherein $D$ represents the comprehensive image change degree, $\alpha$ and $\beta$ are both preset coefficients, $d_e$ represents the evaluation distance, and $\sigma^2$ represents the moving distance variance.
It can be understood that the evaluation distance is acquired based on the initial center coordinate and the target center coordinate in the same manner as the moving distance, which is not described herein. Obviously, there may be objects in the initial video stream that are moving but belong to the background. For example, when leaves are blown by the wind, the leaves are moving while the tree is static. Therefore, dynamic objects that belong to the background can be distinguished from the moving objects actually to be analyzed through the comprehensive image change degree, which saves the energy consumed in analyzing the objects actually to be analyzed and improves the timeliness of that analysis.
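A sketch of this computation under the reconstructed formula above (the linear form and the unit default coefficients are assumptions of this example):

```python
import numpy as np

def comprehensive_change_degree(centers, alpha=1.0, beta=1.0):
    """centers: time-ordered image center coordinates [(x, y), ...]."""
    pts = np.asarray(centers, dtype=float)
    moving = np.linalg.norm(np.diff(pts, axis=0), axis=1)  # moving distance set
    evaluation = np.linalg.norm(pts[-1] - pts[0])          # first to last center
    return alpha * evaluation + beta * moving.var()        # reconstructed D formula
```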
Further, the acquiring the second analysis frame number based on the adjacent image set and the detection image includes:
Acquiring an adjacent gray image set and a detection gray image based on the adjacent image set and the detection image, extracting one or more local area images from the detection gray image by utilizing a pre-constructed region growing algorithm and a pre-constructed image gray gradient set, and confirming one or more target area images in the one or more local area images, wherein the image gray gradient set comprises a plurality of image gradient gray values, and each target area image is adjacent to at least one adjacent gray image in the adjacent gray image set;
the following is performed for each of the one or more target area images:
Acquiring a local area gray average value based on the target area image, wherein the local area gray average value is the average value of a plurality of gray values corresponding to the target area image, and associating the target area image with an adjacent gray image adjacent to the target area image to obtain an associated image set;
Extracting an associated region image from the associated image set by using the region growing algorithm with the local area gray average value as a starting point, and identifying a target discrimination image in the associated region image, wherein the target discrimination image is the region, within the associated image set, of an adjacent gray image adjacent to the target area image;
acquiring a target discrimination mean value based on the target discrimination image, and calculating a discrimination frame number according to the target discrimination mean value and the local area gray average value, wherein the calculation formula is as follows:

\[ r = \frac{\left| \bar{G}_t - \bar{G}_l \right|}{\bar{G}_l} \]

\[ F = \left\lfloor F_0 \cdot \left( \gamma \cdot r + \delta \right) \right\rfloor \]

Wherein, \(F\) represents the discrimination frame number, \(F_0\) represents the initial frame number, \(\bar{G}_t\) represents the target discrimination mean value, \(\bar{G}_l\) represents the local area gray average value, \(r\) represents their relative difference, \(\gamma\) and \(\delta\) are both preset coefficients, and \(\lfloor \cdot \rfloor\) represents the rounding symbol;
And summarizing the discrimination frame numbers to obtain a discrimination frame number set, and confirming the second analysis frame number based on the discrimination frame number set, wherein the second analysis frame number is the smallest discrimination frame number in the discrimination frame number set.
It should be explained that the graying operation is performed on each adjacent image in the adjacent image set and on the detection image, respectively, to obtain the adjacent gray image set and the detection gray image. The method for extracting one or more local area images from the detection gray image by utilizing the region growing algorithm and the image gray gradient set comprises: sequentially extracting the image gradient gray values from the image gray gradient set, and executing the following operations on each extracted image gradient gray value:
Searching in the detection gray image by using the region growing algorithm with the image gradient gray value as a starting point to obtain an initial search area, and counting the number of pixel points corresponding to the initial search area to obtain a statistic number; if the statistic number is greater than or equal to a preset statistical threshold, the initial search area is marked as a local area image, and the local area images are summarized to obtain the one or more local area images.
Further, using a fixed gray value as the starting point of the region growing algorithm and searching in an image with the region growing algorithm are prior art, and will not be described herein. It has been recognized that abnormal factors such as noise points may exist in the image, so setting the statistical threshold removes noise points from the image and improves the accuracy of acquiring the target area images. Saying that a target area image is adjacent to an adjacent gray image means that the areas corresponding to the two images are adjacent.
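The following Python sketch illustrates one plausible realization of the extraction step just described, assuming 4-connected region growing from the first pixel matching the image gradient gray value within a tolerance; the tolerance, the default statistical threshold, and the function names are assumptions of the sketch:

```python
from collections import deque
import numpy as np

def grow_region(gray: np.ndarray, seed_value: int, tol: int = 5) -> np.ndarray:
    """Grow a 4-connected region in the detection gray image, starting from
    the first pixel whose gray value is within `tol` of `seed_value` and
    expanding to neighbors that stay within `tol`. Returns a boolean mask."""
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    matches = np.argwhere(np.abs(gray.astype(int) - seed_value) <= tol)
    if len(matches) == 0:
        return mask                                    # no starting point found
    start = tuple(matches[0])
    mask[start] = True
    queue = deque([start])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(int(gray[ny, nx]) - seed_value) <= tol):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

def extract_local_areas(gray, gradient_values, count_threshold=50):
    """For each image gradient gray value, grow an initial search area and
    keep it as a local area image only if its pixel count reaches the
    statistical threshold, suppressing noise points as described above."""
    areas = []
    for gv in gradient_values:
        region = grow_region(gray, gv)
        if int(region.sum()) >= count_threshold:       # statistic number check
            areas.append(region)
    return areas
```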
It should be understood that the target discrimination image refers to an image area whose gray values are similar to those of the target area image. The higher the similarity between the target discrimination image and the target area image, the smaller the required second analysis frame number. For example, when there is a large difference between the target discrimination image and the target area image, it is easy to discriminate the target discrimination image from the target area image in the initial video stream. The target discrimination mean value is the mean value of the plurality of gray values corresponding to the target discrimination image.
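Under the formula reconstructed above, the discrimination frame number and the subsequent selection of the second analysis frame number might be sketched as follows; the functional form and the default coefficients are assumptions, not the definitive implementation:

```python
import math

def discrimination_frame_number(f0: int, g_target: float, g_local: float,
                                gamma: float = 1.0, delta: float = 0.1) -> int:
    """Sketch of F = floor(F0 * (gamma * r + delta)) with
    r = |G_t - G_l| / G_l: a small gray-level difference (high similarity
    between target discrimination image and target area image) yields a
    small discrimination frame number."""
    r = abs(g_target - g_local) / max(g_local, 1e-9)   # guard against a zero mean
    return math.floor(f0 * (gamma * r + delta))

def second_analysis_frame_number(frame_numbers) -> int:
    """The second analysis frame number is the smallest discrimination
    frame number in the set, as stated above."""
    return min(frame_numbers)
```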
It should be appreciated that the confirming the abnormal image set based on the second analysis frame number and the target video stream includes:
extracting a decomposition image time sequence from the target video stream by using the second analysis frame number, wherein the decomposition image time sequence comprises a plurality of initial decomposition images, and sequentially extracting the initial decomposition images in the decomposition image time sequence, and executing the following operations on each extracted initial decomposition image:
Confirming a target decomposition image in a decomposition image time sequence based on the initial decomposition image, wherein the target decomposition image is adjacent to the initial decomposition image and lags behind the extracted initial decomposition image;
Confirming that an image comparison instruction from the image comparison unit is received, analyzing the image comparison instruction to obtain a local name database, and respectively confirming a target identification decomposition image set in the target decomposition image and an initial identification decomposition image set in the initial decomposition image by utilizing the image identification model set and the local name database, wherein the target identification decomposition image set comprises a plurality of target identification decomposition images, and the initial identification decomposition image set comprises a plurality of initial identification decomposition images;
Matching a target identification decomposition image in the target identification decomposition image set with an initial identification decomposition image in the initial identification decomposition image set to obtain a plurality of identification decomposition nodes, wherein the identification decomposition nodes comprise the initial identification decomposition image and the target identification decomposition image;
The following operations are performed on each identification decomposition node in the plurality of identification decomposition nodes:
Acquiring a local offset distance based on the identification decomposition node, summarizing the local offset distances to obtain a local offset distance set, extracting the largest local offset distance from the local offset distance set to obtain a target evaluation distance, and comparing the target evaluation distance with a preset evaluation distance threshold;
If the target evaluation distance is greater than or equal to the evaluation distance threshold, confirming the extracted initial decomposition image and the target decomposition image as abnormal images, taking the target decomposition image as the extracted initial decomposition image, and returning to the step of confirming the target decomposition image in the decomposition image time sequence based on the initial decomposition image;
otherwise, obtaining local offset distance variance based on the local offset distance set, and after confirming that the local offset distance variance is greater than or equal to a preset local offset distance threshold, confirming the extracted initial decomposition image and the target decomposition image as abnormal images, taking the target decomposition image as the extracted initial decomposition image, and returning to the step of confirming the target decomposition image in the decomposition image time sequence based on the initial decomposition image;
And summarizing the abnormal images to obtain an abnormal image set.
It should be explained that the acquisition mode of the decomposition image time sequence is the same as the acquisition mode of the detection image time sequence, and will not be described here again. The local name database refers to a database in which a plurality of local names are stored. For example, the local names corresponding to a person may include hands, arms, mouth, eyes, etc. The method for respectively confirming the target identification decomposition image set and the initial identification decomposition image set in the target decomposition image and the initial decomposition image by using the image identification model set and the local name database is the same as the method for acquiring the target image, and is not repeated here. Matching a target identification decomposition image with an initial identification decomposition image means matching them by local name and serial number. For example, if the target identification decomposition image set comprises hand-1 and hand-2, and the initial identification decomposition image set also comprises hand-1 and hand-2, then hand-1 of the target set can be matched with hand-1 of the initial set, and hand-2 with hand-2, so as to obtain two identification decomposition nodes. The local offset distance is obtained in the same manner as the evaluation distance, and will not be described in detail here. The local offset distance variance refers to the variance of the plurality of local offset distances in the local offset distance set. It has been recognized that, when the target evaluation distance is greater than or equal to the evaluation distance threshold, a trace of local modification exists in the initial video stream, and when the local offset distance variance is greater than or equal to the local offset distance threshold, the initial video stream may carry a trace of clipping; the initial decomposition images and target decomposition images meeting these conditions are therefore confirmed as abnormal images.
It can be understood that the same target video stream may have been clipped or modified. When a certain portion of the target video stream is modified, the target evaluation distance will be greater than or equal to the evaluation distance threshold; when the target video stream is clipped, spliced, or the like, the local offset distance variance will be greater than or equal to the local offset distance threshold, that is, a visual effect appears in which the picture splits between two frames of the target video stream. Therefore, after the abnormal image set is sent to the initiating end of the analysis instruction, identification of the abnormal image frames in the initial video stream can be realized.
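As a minimal sketch of the per-pair comparison logic described above, assume recognized parts are keyed by "local name-serial number" strings (e.g., "hand-1") mapping to their center coordinates; these data structures and thresholds are illustrative assumptions:

```python
import math

def is_abnormal_pair(initial_parts: dict, target_parts: dict,
                     eval_threshold: float, var_threshold: float) -> bool:
    """Decide whether an initial/target decomposition image pair is abnormal.

    Matched identification decomposition nodes are the keys present in both
    images; each yields a local offset distance (Euclidean, obtained like
    the evaluation distance). A large maximum distance suggests a local
    modification trace; a large variance suggests a clipping/splicing trace.
    """
    common = initial_parts.keys() & target_parts.keys()
    dists = [math.dist(initial_parts[k], target_parts[k]) for k in common]
    if not dists:
        return False
    if max(dists) >= eval_threshold:                   # target evaluation distance check
        return True
    mean = sum(dists) / len(dists)
    variance = sum((d - mean) ** 2 for d in dists) / len(dists)
    return variance >= var_threshold                   # local offset distance variance check
```

For example, is_abnormal_pair({'hand-1': (10, 10), 'hand-2': (40, 10)}, {'hand-1': (10, 11), 'hand-2': (90, 60)}, eval_threshold=30, var_threshold=100) returns True, so both decomposition images of the pair would be confirmed as abnormal images.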
To solve the problems in the background art, the invention confirms that a video decomposition instruction from the video decomposition unit is received, analyzes the video decomposition instruction to obtain a first decomposition frame number, and obtains a decomposition image time sequence by using the first decomposition frame number and the initial video stream, wherein the decomposition image time sequence comprises a plurality of decomposition image nodes, and the decomposition image nodes comprise decomposition images and image times. A decomposition video time point set is obtained based on the decomposition image time sequence, wherein the decomposition video time point set comprises M decomposition video time points, M being an integer greater than or equal to 0, and a target video stream set is obtained based on the decomposition video time point set and the initial video stream, wherein the target video stream set comprises one or more target video streams. The invention acquires the second analysis frame number based on each target video stream and confirms the abnormal image set based on the second analysis frame number and that target video stream; the second analysis frame number used for analyzing a target video stream is thus obtained by combining the characteristics corresponding to that target video stream, so that different characteristics yield different second analysis frame numbers, which saves the resources required for analyzing the target video streams and further improves the degree of intelligence of the embodiment of the invention. Therefore, the invention can improve the accuracy and the degree of intelligence of video stream analysis and reduce the resources required by video stream analysis.
Fig. 2 is a functional block diagram of an intelligent analysis system for implementing video streaming based on a neural network according to an embodiment of the present invention.
The intelligent analysis system 100 for realizing video streaming based on the neural network can be installed in an electronic device. According to the implemented functions, the intelligent analysis system 100 for implementing video streaming based on a neural network may include an analysis environment confirmation module 101, an initial frame number confirmation module 102, an initial video partitioning module 103, and an abnormal image recognition module 104. The modules of the invention, which may also be referred to as units, refer to a series of computer program segments that are stored in the memory of the electronic device, can be executed by the processor of the electronic device, and can perform fixed functions.
The analysis environment confirmation module 101 is configured to receive an analysis instruction, and confirm an intelligent analysis environment based on the analysis instruction, where the intelligent analysis environment includes an initial video stream and a video analysis system, and the video analysis system includes a video decomposition unit, an image recognition unit, an image comparison unit, and a result feedback unit;
The initial frame number confirmation module 102 is configured to confirm that a video decomposition instruction from a video decomposition unit is received, analyze the video decomposition instruction to obtain a first decomposition frame number, and obtain a decomposition image time sequence by using the first decomposition frame number and an initial video stream, where the decomposition image time sequence includes a plurality of decomposition image nodes, and the decomposition image nodes include decomposition images and image time;
The initial video dividing module 103 is configured to obtain a set of resolved video time points based on the resolved image timing sequence, where the set of resolved video time points includes M resolved video time points, M is an integer greater than or equal to 0, and obtain a set of target video streams based on the set of resolved video time points and the initial video streams, where the set of target video streams includes one or more target video streams, and perform the following operations on each target video stream in the set of target video streams:
The abnormal image recognition module 104 is configured to obtain a second analysis frame number based on the target video stream, and confirm an abnormal image set based on the second analysis frame number and the target video stream, where the abnormal image set includes N abnormal images, where N is an integer greater than or equal to 0, and send the abnormal image set to an initiating end of an analysis instruction by using a result feedback unit, so as to implement intelligent analysis on the initial video stream.
In detail, the modules in the intelligent analysis system 100 for implementing video streaming based on neural network in the embodiment of the present invention use the same technical means as the intelligent analysis method for implementing video streaming based on neural network described in fig. 1, and can produce the same technical effects, which are not described herein.
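For orientation only, the four-module composition of system 100 might be skeletonized as follows; the class, attribute, and method names are assumptions, and the bodies are placeholders rather than the claimed implementation:

```python
class IntelligentVideoAnalysisSystem:
    """Skeleton of system 100: the four modules are applied in the order
    described above (environment confirmation -> first decomposition frame
    number -> video partitioning -> abnormal image recognition)."""

    def __init__(self, env_module, frame_module, partition_module, anomaly_module):
        self.analysis_environment_confirmation = env_module    # module 101
        self.initial_frame_number_confirmation = frame_module  # module 102
        self.initial_video_partitioning = partition_module     # module 103
        self.abnormal_image_recognition = anomaly_module       # module 104

    def analyze(self, analysis_instruction):
        env = self.analysis_environment_confirmation(analysis_instruction)
        timing = self.initial_frame_number_confirmation(env)
        target_streams = self.initial_video_partitioning(timing)
        # one abnormal image set per target video stream
        return [self.abnormal_image_recognition(s) for s in target_streams]
```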
Fig. 3 is a schematic structural diagram of an electronic device for implementing an intelligent analysis method for implementing video streaming based on a neural network according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus 12, and may further comprise a computer program stored in the memory 11 and executable on the processor 10, such as an intelligent analysis method program for implementing a video stream based on a neural network.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as the code of the intelligent analysis method program for implementing a video stream based on a neural network, but also for temporarily storing data that has been output or is to be output.
The processor 10 may in some embodiments be composed of integrated circuits, for example, a single packaged integrated circuit, or of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit (Control Unit) of the electronic device; it connects the respective components of the entire electronic device using various interfaces and lines, runs or executes programs or modules stored in the memory 11 (for example, the intelligent analysis method program for implementing a video stream based on a neural network, etc.), and invokes data stored in the memory 11, in order to perform the various functions of the electronic device 1 and process data.
The bus 12 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 12 may be divided into an address bus, a data bus, a control bus, etc. The bus 12 is arranged to enable connection and communication between the memory 11, the at least one processor 10, and other components.
Fig. 3 shows only an electronic device with some of its components; those skilled in the art will understand that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, which may comprise fewer or more components than shown, combine certain components, or arrange the components differently.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to the respective components, and preferably, the power source may be logically connected to the at least one processor 10 through a power management system, so as to perform functions of charge management, discharge management, and power consumption management through the power management system. The power supply may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may also comprise a network interface, optionally the network interface may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a display or an input unit such as a keyboard (Keyboard); optionally, the user interface may also be a standard wired interface or a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
The intelligent analysis method program for implementing a video stream based on a neural network stored in the memory 11 of the electronic device 1 is a combination of a plurality of instructions, which when executed in the processor 10, can implement:
receiving an analysis instruction, and confirming an intelligent analysis environment based on the analysis instruction, wherein the intelligent analysis environment comprises an initial video stream and a video analysis system, and the video analysis system comprises a video decomposition unit, an image recognition unit, an image comparison unit and a result feedback unit;
Confirming to receive a video decomposition instruction from a video decomposition unit, analyzing the video decomposition instruction to obtain a first decomposition frame number, and acquiring a decomposition image time sequence by using the first decomposition frame number and an initial video stream, wherein the decomposition image time sequence comprises a plurality of decomposition image nodes, and the decomposition image nodes comprise decomposition images and image time;
acquiring a decomposed video time point set based on the decomposed image time sequence, wherein the decomposed video time point set comprises M decomposed video time points, M is an integer greater than or equal to 0, and a target video stream set is acquired based on the decomposed video time point set and an initial video stream, wherein the target video stream set comprises one or more target video streams, and the following operation is performed on each target video stream in the target video stream set:
And acquiring a second analysis frame number based on the target video stream, and confirming an abnormal image set based on the second analysis frame number and the target video stream, wherein the abnormal image set comprises N abnormal images, N is an integer greater than or equal to 0, and the abnormal image set is sent to an initiating end of an analysis instruction by utilizing a result feedback unit so as to realize intelligent analysis of the initial video stream.
Specifically, the specific implementation method of the above instructions by the processor 10 may refer to descriptions of related steps in the corresponding embodiments of fig. 1 to 3, which are not repeated herein.
Further, the modules/units integrated in the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. The computer readable storage medium may be volatile or nonvolatile. For example, the computer readable medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, can implement:
receiving an analysis instruction, and confirming an intelligent analysis environment based on the analysis instruction, wherein the intelligent analysis environment comprises an initial video stream and a video analysis system, and the video analysis system comprises a video decomposition unit, an image recognition unit, an image comparison unit and a result feedback unit;
Confirming to receive a video decomposition instruction from a video decomposition unit, analyzing the video decomposition instruction to obtain a first decomposition frame number, and acquiring a decomposition image time sequence by using the first decomposition frame number and an initial video stream, wherein the decomposition image time sequence comprises a plurality of decomposition image nodes, and the decomposition image nodes comprise decomposition images and image time;
acquiring a decomposed video time point set based on the decomposed image time sequence, wherein the decomposed video time point set comprises M decomposed video time points, M is an integer greater than or equal to 0, and a target video stream set is acquired based on the decomposed video time point set and an initial video stream, wherein the target video stream set comprises one or more target video streams, and the following operation is performed on each target video stream in the target video stream set:
And acquiring a second analysis frame number based on the target video stream, and confirming an abnormal image set based on the second analysis frame number and the target video stream, wherein the abnormal image set comprises N abnormal images, N is an integer greater than or equal to 0, and the abnormal image set is sent to an initiating end of an analysis instruction by utilizing a result feedback unit so as to realize intelligent analysis of the initial video stream.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, system and method may be implemented in other manners. For example, the system embodiments described above are merely illustrative; in actual implementation, there may be other manners of dividing the modules.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.