Summary of the invention
The objective of the invention is to provide a highly efficient and secure video monitoring method and system.
The present invention follows the trend toward automated processing in video monitoring and introduces the AVS (Audio Video coding Standard) into video monitoring. By combining techniques such as background/non-background classification and face detection and recognition, the computer system processes the monitoring video automatically in advance. While the validity of the returned content is guaranteed, the amount of information fed back to operators is far smaller than in traditional monitoring systems, which greatly saves human resources and also improves the reliability of the video monitoring system. The invention pioneers the use of AVS in video monitoring, exploiting its technical and patent advantages; as national and local governments actively support the adoption of AVS, the invention has practical value for applications such as digital monitoring, access control, and identification.
The present invention first captures an AVS bitstream with an AVS network camera and, during decoding of the bitstream, uses compressed-domain information to classify each frame as background or non-background. When the classification result shows that the current frame is not background, face detection is performed. When a face is detected, face recognition is performed: the face data is transformed and then compared with the training data. Before the recognition result is fed back to the user, a confidence t is computed that indicates how credible the current recognition result is. When t is less than a threshold t_min, the face is considered not to belong to the current library and is regarded as a stranger; this result is fed back to the user, and after the user confirms, the new face is added to the library. (t_min is obtained from empirical statistics: the higher t_min, the higher the precision; the lower t_min, the higher the recall. A suitable t_min is set by balancing the two according to the actual conditions of the system.) When t is greater than or equal to t_min, the recognition result is considered credible; the result is recorded and the video is labeled. Fig. 1 is the flowchart of this video monitoring system and embodies two characteristics of the invention: the application of AVS and automated processing.
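The per-frame decision cascade described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the three stages are passed in as hypothetical callables (`is_background`, `detect_faces`, `recognize`), and `FrameResult` is an illustrative container. The sketch only shows the control flow, in which background frames are skipped and every recognition is gated by the confidence threshold t_min.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class FrameResult:
    frame_index: int
    is_background: bool
    labels: List[str] = field(default_factory=list)  # identities accepted with t >= t_min
    strangers: int = 0                               # faces with t < t_min, pending user confirmation


def process_frame(
    frame_index: int,
    frame: Any,
    is_background: Callable[[Any], bool],           # hypothetical compressed-domain classifier
    detect_faces: Callable[[Any], List[Any]],       # hypothetical face detector
    recognize: Callable[[Any], Tuple[str, float]],  # hypothetical: returns (identity, confidence t)
    t_min: float,
) -> FrameResult:
    # Stage 1: background frames are skipped, so no face processing is spent on them.
    if is_background(frame):
        return FrameResult(frame_index, True)
    result = FrameResult(frame_index, False)
    # Stage 2: face detection runs only on non-background frames.
    for face in detect_faces(frame):
        # Stage 3: recognition, then the confidence gate.
        identity, t = recognize(face)
        if t < t_min:
            result.strangers += 1           # reported to the user as a stranger
        else:
            result.labels.append(identity)  # recorded and used to label the video
    return result
```

Keeping the stages as injected functions makes it easy to swap the compressed-domain classifier or the recognizer without touching the cascade itself.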
In a specific implementation, the system consists mainly of three parts: a training module, a labeling module, and a retrieval module.
The training module comprises a training module for the monitored-environment background and a training module for the face database, which perform background training and face training respectively. Its inputs are the face sample library and the background sample library, and its outputs are the face features and background features.
The labeling module comprises a background detection module, a face detection module, a face recognition module, and an index-construction part, and labels the input monitoring video automatically. Its inputs are the background features and face features obtained by the training module, together with the monitoring video to be labeled; its output is a search index for that video.
The retrieval module searches a specified monitoring video and supports picture queries, text queries, and video queries. Its inputs are the index of the specified monitoring video and the picture, text, or video segment submitted by the user; its output is the content in the monitoring video that corresponds to what the user submitted.

Fig. 2 shows the main modules of the system, its workflow, and the logical relationships among the modules. As shown, the initial inputs of the system are the face database and the background samples; training yields the background model, the face-feature transformation matrix, and the face-feature library. The monitoring video is then labeled. Labeling first performs background detection and then face detection on images that are not background; each face that appears is feature-transformed and indexed in the index structure. Finally, the user submits text, a picture, or a video through the user interface, the system processes the query according to the type of submitted content, and what is fed back to the user is the positions in the monitoring data where the relevant information appears.
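One way to picture the index structure and a text query is an inverted index from each recognized identity to the frame intervals in which it appears. The sketch below assumes per-frame recognition labels arrive in increasing frame order; `build_index`, `text_query`, and the `max_gap` merging parameter are hypothetical names chosen for illustration, not the patent's actual index format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # inclusive (first_frame, last_frame)


def build_index(labels_per_frame: Iterable[Tuple[int, List[str]]],
                max_gap: int = 0) -> Dict[str, List[Interval]]:
    """Map each identity to the frame intervals in which it was recognized.

    Frames must be given in increasing order. Occurrences separated by at most
    max_gap unlabeled frames are merged into one interval.
    """
    index: Dict[str, List[Interval]] = defaultdict(list)
    for frame_no, identities in labels_per_frame:
        for person in set(identities):
            spans = index[person]
            # Extend the last interval if this frame continues it, else open a new one.
            if spans and frame_no - spans[-1][1] <= max_gap + 1:
                spans[-1] = (spans[-1][0], frame_no)
            else:
                spans.append((frame_no, frame_no))
    return dict(index)


def text_query(index: Dict[str, List[Interval]], name: str) -> List[Interval]:
    # A text query is a direct lookup; a picture or video query would first be
    # run through face recognition to obtain the identity to look up.
    return index.get(name, [])
```

The returned intervals correspond to "the positions in the monitoring data where the relevant information appears."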
The design of the main system modules is as follows:
1) Background training module: computes a background model from the input background video samples. The algorithm works in the HSV color space and computes, for each pixel, the range of values within which that pixel belongs to the background.
Input: background video samples.
Output: background model, used for background comparison.
2) Face training module: processes the faces in the face database using the Fisherface algorithm.
Input: face database.
Output: a transformation matrix computed from the face data in the face database. The matrix transforms an input face into a one-dimensional vector for recognition. Together with the transformation matrix, the center of each person's face samples is output for use in recognition.
3) Background detection module: compares the input frame image with the background model to determine whether the frame is background and, if it is not, which regions belong to the foreground.
Input: background model, frame image.
Output: whether the input frame is background and, if not, which regions belong to the foreground.
4) Face detection module: detects faces in frame images that are not background.
Input: frame image.
Output: detected facial image.
5) Face recognition module: transforms each detected face image into a one-dimensional vector using the transformation matrix obtained in training, and computes its Euclidean-distance similarity to each center to perform recognition.
Input: facial image, transformation matrix.
Output: recognition result.
6) Index structure module: labels the input video, derives a video index from the face recognition results, and builds the index structure from that index.
Input: monitor video.
Output: video index.
7) Retrieval module: the user enters query content through the user interface; the retrieval module searches according to the format of the submitted content and feeds the information back through the user interface.
Input: the query submitted by the user.
Output: information fed back to the user, such as video clips.
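Modules 1) and 3) can be illustrated with a per-pixel HSV range model. The source states only that each pixel's background value range is computed in HSV space; the sketch below assumes that range is the per-channel minimum and maximum over the background samples, widened by a hypothetical `margin`, and that a frame counts as background when its share of out-of-range pixels does not exceed a hypothetical `max_fraction`. Hue wrap-around at red is ignored for simplicity.

```python
import colorsys
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
Frame = Sequence[Sequence[RGB]]          # frame[y][x] = (r, g, b), each 0..255
Range = Tuple[float, float]              # (low, high) for one HSV channel
PixelModel = Tuple[Range, Range, Range]  # H, S, V ranges for one pixel


def to_hsv(pixel: RGB) -> Tuple[float, float, float]:
    r, g, b = (c / 255.0 for c in pixel)
    return colorsys.rgb_to_hsv(r, g, b)  # each component in [0, 1]


def train_background(samples: List[Frame], margin: float = 0.05) -> List[List[PixelModel]]:
    """Per-pixel HSV range observed over the background samples, widened by margin."""
    height, width = len(samples[0]), len(samples[0][0])
    model = []
    for y in range(height):
        row = []
        for x in range(width):
            hsv = [to_hsv(frame[y][x]) for frame in samples]
            row.append(tuple(
                (min(p[ch] for p in hsv) - margin, max(p[ch] for p in hsv) + margin)
                for ch in range(3)))
        model.append(row)
    return model


def foreground_mask(model: List[List[PixelModel]], frame: Frame) -> List[List[bool]]:
    """True where any HSV channel falls outside the pixel's background range."""
    return [
        [any(not (lo <= value <= hi) for value, (lo, hi) in zip(to_hsv(px), model[y][x]))
         for x, px in enumerate(row)]
        for y, row in enumerate(frame)
    ]


def frame_is_background(mask: List[List[bool]], max_fraction: float = 0.0) -> bool:
    """Background if the share of foreground pixels does not exceed max_fraction."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) <= max_fraction * total
```

The `True` cells of the mask are the "regions that belong to the foreground" reported by the background detection module.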
The present invention applies special preprocessing to the AVS video stream. Whether for real-time access-control monitoring or offline processing of stored video, the AVS bitstream is not fully decoded. Instead, AVS compressed-domain information is used for background/non-background classification to determine whether the current image is background, and no further processing is performed on background images, which improves the processing efficiency of the system. In real-time applications, hardware acceleration can also be used to speed up this step.
In the AVS compressed domain, the motion vectors of the macroblocks reflect the motion of objects in the video. In background segments the image is relatively static, whereas the appearance of a person introduces more motion information into the video. Reference [1] proposes using H.264 motion estimation for background/non-background classification; the present invention applies a similar algorithm to the AVS bitstream. Let $mv_i$ be the motion vector of the i-th macroblock in the current image, $0 \le i \le N-1$, where N is the total number of macroblocks in the current image. The motion intensity MV of the current image is calculated with the following formula:
Formula (1)
where $size_i$ denotes the area of the i-th macroblock.
Motion intensity alone cannot fully characterize the motion of objects in the current image, so a second parameter, MS, is introduced to represent the spatial extent of motion in the image:
Formula (2)
In a background image sequence there is no violent motion, so both the motion intensity and the motion scope are limited to small values. Let mv_min be the threshold for MV and ms_min the threshold for MS; both are obtained from empirical statistics. The smaller mv_min and ms_min are, the higher the precision of background discrimination; the larger they are, the higher the recall. Suitable values are set by balancing the two according to the actual conditions of the system. The current image is judged to be background when the following condition holds:
MV<mv_min and MS<ms_min.
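The background test can be sketched from macroblock motion vectors. Formulas (1) and (2) are not reproduced in this text, so the functions below use assumed stand-ins: MV as the area-weighted mean motion-vector magnitude (using $size_i$ as the weight) and MS as the fraction of image area covered by macroblocks with non-zero motion. Only the final decision rule, MV < mv_min and MS < ms_min, comes from the source.

```python
import math
from typing import List, Tuple

Macroblock = Tuple[Tuple[float, float], float]  # ((mv_x, mv_y), size_i in pixels)


def motion_intensity(blocks: List[Macroblock]) -> float:
    # ASSUMED stand-in for formula (1): area-weighted mean motion-vector magnitude.
    total = sum(size for _, size in blocks)
    return sum(size * math.hypot(*mv) for mv, size in blocks) / total


def motion_scope(blocks: List[Macroblock], eps: float = 0.0) -> float:
    # ASSUMED stand-in for formula (2): share of image area whose macroblocks move.
    total = sum(size for _, size in blocks)
    return sum(size for mv, size in blocks if math.hypot(*mv) > eps) / total


def is_background_frame(blocks: List[Macroblock], mv_min: float, ms_min: float) -> bool:
    # Decision rule from the text: background iff MV < mv_min and MS < ms_min.
    return motion_intensity(blocks) < mv_min and motion_scope(blocks) < ms_min
```

Because only motion vectors already present in the bitstream are used, this test can run before full decoding, which is the efficiency argument made above.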
Background/non-background classification not only improves system efficiency but also collects statistics for each monitoring point, from which the environment of that point can be inferred. For example, by counting the distribution of non-background frames over the monitoring sequence, one can learn during which periods a monitoring point is crowded and adjust its deployment accordingly, for instance by raising the recording frame rate during periods of dense pedestrian traffic and lowering it during periods of sparse traffic.
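A minimal sketch of this statistic, assuming each frame's background flag is tagged with its hour of day: the share of non-background frames per hour identifies crowded periods, and a simple policy picks the recording frame rate. The `threshold`, `high_fps`, and `low_fps` values are hypothetical illustrations, not figures from the source.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def busy_hours(frames: Iterable[Tuple[int, bool]], threshold: float) -> Dict[int, float]:
    """frames: (hour_of_day, is_background) pairs.

    Returns the hours whose non-background ratio is at least threshold,
    mapped to that ratio.
    """
    total: Counter = Counter()
    active: Counter = Counter()
    for hour, is_bg in frames:
        total[hour] += 1
        if not is_bg:
            active[hour] += 1
    ratios = {hour: active[hour] / total[hour] for hour in total}
    return {hour: r for hour, r in ratios.items() if r >= threshold}


def recording_fps(hour: int, busy: Dict[int, float],
                  high_fps: int = 25, low_fps: int = 5) -> int:
    # Hypothetical policy: record at a higher frame rate during crowded hours.
    return high_fps if hour in busy else low_fps
```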
After background detection, face detection is performed on images judged not to be background. Face detection uses the AdaBoost algorithm [2]; however, to improve the processing efficiency of the system, we perform local rather than global detection.
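The source does not define "local detection" precisely. One plausible reading, assumed here, is to run the detector only inside the bounding box of the foreground regions reported by background detection. In the sketch below, `detector` is a hypothetical callable standing in for an AdaBoost cascade; it receives the cropped region and returns boxes in region coordinates, which are shifted back to frame coordinates. The `pad` parameter is also an illustrative assumption.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def foreground_bbox(mask: Sequence[Sequence[bool]], pad: int = 0) -> Optional[Box]:
    """Smallest box (plus pad, clipped to the frame) containing all foreground pixels."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for row in mask for x, v in enumerate(row) if v]
    height, width = len(mask), len(mask[0])
    x0, y0 = max(min(xs) - pad, 0), max(min(ys) - pad, 0)
    x1, y1 = min(max(xs) + pad, width - 1), min(max(ys) + pad, height - 1)
    return (x0, y0, x1 - x0 + 1, y1 - y0 + 1)


def detect_faces_locally(frame: Sequence[Sequence[Any]],
                         mask: Sequence[Sequence[bool]],
                         detector: Callable[[List[Sequence[Any]]], List[Box]],
                         pad: int = 8) -> List[Box]:
    box = foreground_bbox(mask, pad)
    if box is None:
        return []  # no foreground: nothing to search
    x, y, w, h = box
    roi = [row[x:x + w] for row in frame[y:y + h]]
    # The detector only scans the region of interest; map its boxes back to the frame.
    return [(fx + x, fy + y, fw, fh) for fx, fy, fw, fh in detector(roi)]
```

Scanning a cropped region instead of the whole frame reduces the number of detector windows, which is the efficiency gain the text refers to.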
After face detection, each detected face image is scaled to a uniform size and scanned from left to right and top to bottom into a sample vector, whose dimensionality is then reduced. We use the classic Fisherface algorithm, which combines PCA with LDA, to extract face projection features [3] (PCA: Principal Component Analysis; LDA: Linear Discriminant Analysis). LDA is applied in the space obtained after PCA dimensionality reduction, yielding the feature vector of the detected face. After feature extraction, a minimum-distance classifier compares the face with the faces in the library to perform recognition.
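At recognition time, the steps above reduce to resizing, raster scanning, and projecting with the trained transformation matrix. The sketch assumes the Fisherface training (PCA followed by LDA) has already produced a mean vector `mean` and a projection matrix `W` stored as a list of basis vectors; it does not perform that training. Nearest-neighbour resizing is an assumed stand-in for "uniform scaling", since the source does not name an interpolation method.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale, image[y][x]


def resize_nearest(img: Image, out_h: int, out_w: int) -> List[List[float]]:
    # Nearest-neighbour scaling to a uniform size (assumed interpolation method).
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def raster_scan(img: Image) -> List[float]:
    # Rows from top to bottom, pixels left to right within each row.
    return [p for row in img for p in row]


def project(sample: Sequence[float], mean: Sequence[float],
            W: Sequence[Sequence[float]]) -> List[float]:
    # W holds one basis vector per output dimension, assumed precomputed offline
    # by Fisherface training; each output component is a dot product.
    centered = [s - m for s, m in zip(sample, mean)]
    return [sum(w * c for w, c in zip(basis, centered)) for basis in W]


def extract_feature(face: Image, mean: Sequence[float], W: Sequence[Sequence[float]],
                    size: Tuple[int, int] = (2, 2)) -> List[float]:
    return project(raster_scan(resize_nearest(face, *size)), mean, W)
```

In practice `size` would match the resolution used during training (the 2x2 default only keeps the example small), and `W` would map that many pixels to a much shorter feature vector.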
Let $f' = (u_0, u_1, \ldots, u_k)$ be the sample vector obtained from face f after Fisherface feature extraction. Its Euclidean distance to each training sample is then computed as:
$$d(f', f_i') = \sqrt{\sum_{j=0}^{k} (u_j - v_j)^2} \qquad (3)$$
where $f_i' = (v_0, v_1, \ldots, v_k)$ denotes the i-th training sample in the library, k is the sample dimension, and $d(f', f_i')$ is the distance between the current sample to be identified and the i-th training sample in the library.
After the distances between f' and all samples in the library have been computed, the 5 samples with the smallest distances, $f_{i1}', f_{i2}', \ldots, f_{i5}'$, are found. The class to which most of these samples belong is denoted c, where a class is the set of samples belonging to the same person. If the 5 samples each belong to a different class, the class of $f_{i1}'$, the sample nearest to f', is taken as c. The recognition confidence t is calculated with the following formula:
Formula (4)
When the confidence t is less than the threshold t_min, the face is regarded as a stranger: the result for f is fed back to the user, and after the user confirms, the new face is added to the library. Otherwise, the recognition result is considered reliable and is recorded. t_min is obtained from empirical statistics: the higher t_min, the higher the precision; the lower t_min, the higher the recall. A suitable t_min is set by balancing the two according to the actual conditions of the system.
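The matching step can be sketched as a 5-nearest-neighbour vote using the Euclidean distance of formula (3). Formula (4) is not reproduced in this text, so the confidence here is a clearly hypothetical placeholder (`vote_fraction`: the share of the 5 nearest samples that belong to c), passed in as a replaceable function. Ties between equally frequent classes are resolved by the nearest sample, which also covers the source's rule for five distinct classes; the `"stranger"` label is an illustrative name.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector f_i', class label)
Neighbour = Tuple[float, str]         # (distance, class label)


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Formula (3): Euclidean distance between feature vectors.
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def vote(f: Sequence[float], library: List[Sample], k: int = 5) -> Tuple[str, List[Neighbour]]:
    nearest = sorted((distance(f, vec), label) for vec, label in library)[:k]
    counts = Counter(label for _, label in nearest)
    best = max(counts.values())
    # Majority class; ties (including k distinct classes) go to the nearest sample's class.
    c = next(label for _, label in nearest if counts[label] == best)
    return c, nearest


def vote_fraction(c: str, nearest: List[Neighbour]) -> float:
    # HYPOTHETICAL placeholder for formula (4), which is not given here.
    return sum(1 for _, label in nearest if label == c) / len(nearest)


def identify(f: Sequence[float], library: List[Sample], t_min: float,
             confidence: Callable[[str, List[Neighbour]], float] = vote_fraction) -> Tuple[str, float]:
    c, nearest = vote(f, library)
    t = confidence(c, nearest)
    # Below t_min the face is treated as a stranger and referred to the user.
    return (c, t) if t >= t_min else ("stranger", t)
```

Swapping in the actual formula (4) only requires passing a different `confidence` function.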
In summary, the AVS-based video monitoring system proposed by the present invention is implemented in the following steps: 1. obtain the AVS bitstream with an AVS camera; 2. perform background classification, face detection, background training, and face training on the AVS bitstream; 3. compare and recognize faces; 4. obtain the query results.