
CN101419670B - Video monitoring method and system based on advanced audio/video encoding standard - Google Patents


Info

Publication number
CN101419670B
CN101419670B (application numbers CN2008102032020A, CN200810203202A)
Authority
CN
China
Prior art keywords
face
background
video
module
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008102032020A
Other languages
Chinese (zh)
Other versions
CN101419670A (en)
Inventor
王新
路红
宋元征
陈桂财
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN2008102032020A priority Critical patent/CN101419670B/en
Publication of CN101419670A publication Critical patent/CN101419670A/en
Application granted granted Critical
Publication of CN101419670B publication Critical patent/CN101419670B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract


The invention belongs to the technical field of video monitoring, and is specifically an AVS (Advanced Audio/Video Coding Standard)-based video monitoring method and an implementation system thereof. Following the development trend of video monitoring, the invention introduces automated processing and the AVS standard into video monitoring, and combines background/non-background classification, face detection, face recognition and other techniques so that surveillance video is first processed automatically by a computer system. While the validity of the returned content is guaranteed, the amount of information fed back to the operator is far smaller than in a traditional monitoring system, which greatly saves human resources and also improves the reliability of the video monitoring system. The invention is the first to exploit the technical and patent advantages of AVS in video surveillance; with national and local governments strongly supporting the application and promotion of AVS, the invention has application value in fields such as digital surveillance, access control and identity recognition.


Description

Video monitoring method and system based on the Advanced Audio/Video Coding Standard
Technical field
The invention belongs to the technical field of video monitoring, and is specifically a video monitoring method based on AVS (the Advanced Audio/Video Coding Standard) and an implementation system thereof.
Background technology
Nowadays, security has received widespread attention, and more and more video monitoring systems have emerged, such as access control systems, attendance systems and identity recognition systems. A video monitoring system allows management personnel to observe, from a control room, the activity of everyone in the protected front-end area and to keep records, providing the security system with real-time image and sound information. However, traditional video monitoring systems require a large expenditure of human resources: detection, recognition and understanding of the monitored video content rely entirely on manual work, which reduces the efficiency of the monitoring system, while security and accuracy also lack assurance. Moreover, there is at present no dedicated video compression standard for digital video surveillance to serve as the core technology of such systems, which causes considerable problems in network transmission and system interoperability.
Summary of the invention
The objective of the invention is to propose an efficient and secure video monitoring method and system.
The present invention follows the video monitoring trend by introducing automated processing and the AVS standard into video monitoring. Combining techniques such as background/non-background classification, face detection and face recognition, the surveillance video is first processed automatically by a computer system. While the validity of the returned content is guaranteed, the amount of information fed back to the operator is far smaller than in a traditional monitoring system, greatly saving human resources and at the same time improving the reliability of the video monitoring system. The invention is the first to exploit the technical and patent advantages of AVS in video surveillance; with national and local governments strongly supporting the application and promotion of AVS, the invention has application value in digital surveillance, access control, identity recognition and similar fields.
The present invention first captures an AVS code stream from an AVS network camera, and uses the compressed-domain information obtained during AVS decoding to classify frames as background or non-background. When the classification result indicates that the current frame is not background, face detection is performed. When a face is detected, face recognition is carried out: the face data is transformed and then compared with the training data. Before the recognition result is fed back to the user, a confidence value t is computed first, which indicates how credible the current recognition result is. When t is below a threshold t_min (t_min is obtained from empirical statistics; the higher t_min, the higher the precision, and the lower t_min, the higher the recall, so a suitable t_min is chosen by balancing the two according to the actual conditions of the system), the face is considered not to belong to the data in the current library and is judged to be a stranger; this result is fed back to the user, and after the user confirms, the new face is added to the library. When the confidence is greater than or equal to the threshold t_min, the recognition result is considered highly credible, so the result is recorded and the video is annotated. Fig. 1 is the flow chart of this video monitoring system, embodying the two characteristics of the invention: the use of AVS and automated processing.
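The decision chain of this paragraph can be sketched as follows. The three helper functions here are trivial stand-ins, not the patent's actual algorithms (those are detailed later in the description); the sketch only illustrates how background filtering, face detection, recognition and the t/t_min decision connect:

```python
T_MIN = 0.85  # confidence threshold; the embodiment later suggests 0.85

# Stand-in helpers so the sketch runs; real versions use the compressed-domain
# background test, AdaBoost face detection and Fisher-Face recognition.
def is_background(frame):
    return frame.get("faces") is None

def detect_faces(frame):
    return frame.get("faces", [])

def recognize(face, library):
    return library.get(face, (None, 0.0))  # (identity, confidence t)

def process_frame(frame, library):
    """Return annotations for one frame, or [] if it is pure background."""
    if is_background(frame):          # compressed-domain test: skip early
        return []
    annotations = []
    for face in detect_faces(frame):
        identity, t = recognize(face, library)
        if t < T_MIN:                 # low confidence: flag as a stranger,
            annotations.append(("stranger", face))   # operator may add to library
        else:                         # high confidence: record and annotate
            annotations.append((identity, face))
    return annotations

library = {"face_a": ("alice", 0.92)}
print(process_frame({"faces": None}, library))              # []
print(process_frame({"faces": ["face_a", "face_x"]}, library))
```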
The system of the specific implementation consists of three main parts: a training module, a labeling module and a retrieval module.
The training module comprises a training module for the monitoring-environment background and a training module for the face database, which perform environment-background training and face training respectively. Its inputs are the face sample library and the background sample library; its outputs are the face features and background features.
The labeling module comprises a background detection module, a face detection module, a face recognition module and an index-structure building part, and automatically annotates the input surveillance video. Its inputs are the background features and face features produced by the training module together with the surveillance video to be annotated; its output is the retrieval index of that video.
The retrieval module searches a specified surveillance video, supporting picture queries, text queries and video queries. Its inputs are the index of the specified surveillance video and the picture, text or video clip submitted by the user; its output is the material in the surveillance video corresponding to the submitted content. Figure 2 shows the main modules of the system, its workflow, and the logical relations between the modules. As shown in the figure, the initial inputs of the system are the face database and the background samples; after training, the background model, the face-feature transformation matrix and the face-feature library are obtained. The surveillance video is then annotated: the annotation process first performs background detection, then runs face detection on images that are not background, and for each face found performs the feature transformation and creates an index entry in the index structure. Finally, the user submits text, a picture or a video through the user interface; the system handles each kind of submitted content accordingly, and what is ultimately fed back to the user is the position where the relevant information appears in the monitoring data.
The design of the system's main modules is as follows:
1) Background training module: computes a background model from the input background video samples. The algorithm is based on the HSV color space and computes, for each pixel, the range of values that belong to the background.
Input: background video samples.
Output: background model, used for background comparison.
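As a minimal illustrative sketch of the per-pixel value-range idea in module 1), assume the model simply stores the per-pixel, per-channel minimum and maximum observed over the training frames; the tolerance, array shapes and synthetic data below are illustrative assumptions, not details from the patent:

```python
import numpy as np

def train_background(samples):
    """samples: array (num_frames, H, W, 3) of HSV frames of the empty scene.
    The model is the per-pixel, per-channel range of values seen in training."""
    stack = np.asarray(samples, dtype=np.float64)
    return stack.min(axis=0), stack.max(axis=0)

def background_mask(frame, model, tol=5.0):
    """True where a pixel of `frame` falls inside its trained range (+/- tol)."""
    lo, hi = model
    inside = (frame >= lo - tol) & (frame <= hi + tol)
    return inside.all(axis=-1)          # all three HSV channels must agree

# Tiny synthetic example: a 2x2 scene, then a frame where one pixel changes.
rng = np.random.default_rng(0)
train = 100.0 + rng.uniform(-2, 2, size=(10, 2, 2, 3))
model = train_background(train)
frame = train[0].copy()
frame[0, 0] = [200.0, 50.0, 50.0]       # foreground object at pixel (0, 0)
print(background_mask(frame, model))
```

A real system would additionally handle hue wrap-around in HSV and update the model over time.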
2) Face training module: processes the faces in the face database. The algorithm is Fisher-Face.
Input: face database.
Output: a transformation matrix computed from the face data in the database, whose purpose is to transform an input face into a one-dimensional vector for recognition; together with the transformation matrix, the center of each face class is output for use in recognition.
3) Background detection module: compares the input frame image with the background model to determine whether the frame is background and, if not, which regions belong to the foreground.
Input: background model, frame image.
Output: whether the input frame is background and, if not, which regions belong to the foreground.
4) Face detection module: detects faces in frame images that are not background.
Input: frame image.
Output: detected face images.
5) Face recognition module: for each detected face image, applies the transformation matrix obtained in training to get a one-dimensional vector, and computes the similarity to each class center using Euclidean distance in order to perform recognition.
Input: face image, transformation matrix.
Output: recognition result.
6) Index structure module: annotates the input video, derives a video index from the face recognition results, and builds an index structure over the index.
Input: surveillance video.
Output: video index.
7) Retrieval module: the user enters a query through the user interface; the retrieval module searches according to the format of the submitted content and feeds information back through the user interface.
Input: the query submitted by the user.
Output: information such as the video clips fed back to the user.
The present invention applies special pre-processing to the AVS video stream. Whether for real-time access-control monitoring or for offline processing of stored video, the AVS code stream is not fully decoded; instead, the compressed-domain information of AVS is used for background/non-background classification to judge whether the current image is background, and if it is, no further work is done, thereby improving the processing efficiency of the system. In real-time applications, hardware processing can additionally be used to accelerate this step.
In the AVS compressed domain, the motion vectors of the macroblocks reflect the motion of objects in the video. In a background segment the image is relatively static, while the appearance of a person introduces more motion information into the video. Document [1] proposes using H.264 motion-estimation information to perform background/non-background classification; the present invention applies a similar algorithm to the AVS code stream. Let m_i be the motion vector of the i-th macroblock in the current image, 0 ≤ i ≤ N−1, where N is the total number of macroblocks in the current image. The motion intensity of the current image is computed with the following formula:

MV = ( Σ_{i=0}^{N−1} size_i · |m_i| ) / ( Σ_{i=0}^{N−1} size_i )    Formula (1)

where size_i denotes the area of the i-th macroblock.
Motion intensity alone cannot fully characterize the motion state of objects in the current image, so another parameter MS is introduced to represent the extent of motion in the image:

MS = Σ_{i=0}^{N−1} b_s_i,  where b_s_i = size_i if m_i ≠ 0, and b_s_i = 0 otherwise    Formula (2)

In a background image sequence there is no violent motion in the images, and both the motion intensity and the motion extent remain small. Let mv_min be the threshold for MV and ms_min the threshold for MS; both are obtained from empirical statistics. The smaller mv_min and ms_min are, the higher the precision of the background decision; the larger they are, the higher the recall, so suitable values are chosen by balancing the two according to the actual conditions of the system. The current image is judged to belong to the background when the following condition is met:

MV < mv_min and MS < ms_min.
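Formulas (1) and (2) and the threshold test can be sketched in a few lines of Python; the macroblock areas, motion vectors and threshold values below are illustrative assumptions, not values from the patent:

```python
import math

def classify_background(blocks, mv_min, ms_min):
    """blocks: list of (size, (mvx, mvy)) pairs, one per macroblock.
    Returns True if the frame is judged to be background, per
    Formula (1) (area-weighted motion intensity MV) and
    Formula (2) (moving-area measure MS)."""
    total_area = sum(size for size, _ in blocks)
    mv = sum(size * math.hypot(vx, vy) for size, (vx, vy) in blocks) / total_area
    ms = sum(size for size, (vx, vy) in blocks if (vx, vy) != (0, 0))
    return mv < mv_min and ms < ms_min

# A static frame: all motion vectors are zero -> background.
static = [(256, (0, 0))] * 4
# A frame with one strongly moving macroblock -> non-background.
moving = [(256, (0, 0))] * 3 + [(256, (8, 6))]

print(classify_background(static, mv_min=0.5, ms_min=100))  # True
print(classify_background(moving, mv_min=0.5, ms_min=100))  # False
```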
Background/non-background classification not only improves the efficiency of the system; it also gathers statistics for each monitoring point, from which environmental information about the point can be inferred. For example, from the distribution of non-background frames in a surveillance sequence one can learn during which periods the monitoring point is crowded, and then deploy accordingly: raise the recording frame rate during relatively busy periods, lower it during periods when few people pass, and so on.
After background detection, face detection is performed on the images judged not to be background. Face detection uses the AdaBoost algorithm [2], but to improve the processing efficiency of the system we do not perform global detection; instead we perform local detection.
The face images produced by face detection are scaled to a uniform size and scanned from left to right and from top to bottom into sample vectors, and the sample vectors are then reduced in dimension. We adopt the classical Fisher-Face algorithm, which combines PCA with LDA, to extract face projection features [3] (PCA: Principal Component Analysis; LDA: Linear Discriminant Analysis). LDA is applied on the space obtained after PCA dimensionality reduction, yielding the feature vector of the detected face. After feature extraction, a minimum-distance classifier compares the face with those in the library for recognition.
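The scan-and-project pipeline can be sketched roughly as follows: PCA via SVD followed by LDA on the reduced space, with per-class centers for nearest-center matching. This is a simplified Fisher-Face under stated assumptions (tiny random "faces", arbitrary dimension choices), not the patent's exact implementation:

```python
import numpy as np

def fisherface_train(X, y, n_pca):
    """X: (n_samples, n_pixels) row-scanned face images; y: class labels.
    Returns the data mean, a transform W mapping images to discriminant
    features, and per-class centers for the nearest-center recogniser."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA: keep the top n_pca principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pca].T
    Z = Xc @ P
    # LDA on the PCA space: maximise between-class over within-class scatter.
    classes = np.unique(y)
    gm = Z.mean(axis=0)
    Sw = np.zeros((n_pca, n_pca))
    Sb = np.zeros((n_pca, n_pca))
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        d = (mc - gm)[:, None]
        Sb += len(Zc) * (d @ d.T)
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)[: len(classes) - 1]
    W = P @ evecs[:, order].real
    centers = {c: ((X[y == c] - mean) @ W).mean(axis=0) for c in classes}
    return mean, W, centers

def recognise(img, mean, W, centers):
    """Nearest-center (minimum Euclidean distance) classification."""
    f = (img - mean) @ W
    return min(centers, key=lambda c: np.linalg.norm(f - centers[c]))

# Illustrative data: two "people", 8x8 images flattened to 64-vectors.
rng = np.random.default_rng(1)
base = {0: rng.normal(0, 1, 64), 1: rng.normal(0, 1, 64)}
X = np.vstack([base[c] + 0.1 * rng.normal(size=64) for c in [0]*6 + [1]*6])
y = np.array([0]*6 + [1]*6)
mean, W, centers = fisherface_train(X, y, n_pca=8)
probe = base[1] + 0.1 * rng.normal(size=64)
print(recognise(probe, mean, W, centers))
```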
Let f' = (u_0, u_1, …, u_k) be the sample vector of face f after Fisher-Face feature extraction. Its distance to each training sample is then computed as:

d(f', f_i') = Σ_{j=0}^{k} (u_j − v_j)^2    Formula (3)

where f_i' = (v_0, v_1, …, v_k) denotes the i-th training sample in the library and k is the sample dimension; d(f', f_i') is the distance between the current sample to be recognized and the i-th training sample in the library.
After the distances between f' and all samples in the library have been computed, the 5 samples with the smallest distances are found: f_{i1}', f_{i2}', …, f_{i5}'. The class c is the class to which the majority of these samples belong, a class here meaning the set of samples belonging to the same person. If the 5 samples each belong to a different class, the class of the sample f_{i1}' closest to f' is taken as c. We compute the recognition confidence t with the following formula:

t = Σ d(f', f_{ij}' | f_{ij}' ∈ c) / Σ_{j=1}^{5} d(f', f_{ij}')    Formula (4)

When the confidence t is below the threshold t_min, the face is judged to be a stranger; the result f is fed back to the user, and after the user confirms, the new face is added to the library. Otherwise the recognition result is considered reliable and the result is recorded. t_min is obtained from empirical statistics: the higher t_min, the higher the precision; the lower t_min, the higher the recall. A suitable t_min is chosen by balancing the two according to the actual conditions of the system.
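The five-nearest-sample vote and Formula (4) transcribe almost literally into code; the labels, distances and threshold below are made up for illustration:

```python
from collections import Counter

def recognition_confidence(neighbors):
    """neighbors: the 5 (label, distance) pairs closest to the probe,
    sorted by distance. Returns (majority class c, confidence t) where
    t = sum of distances to class-c neighbours / sum of all 5 distances
    (Formula (4)); if all labels are distinct, the closest sample's
    class is used as c."""
    labels = [lab for lab, _ in neighbors]
    top, n = Counter(labels).most_common(1)[0]
    c = top if n > 1 else labels[0]     # all distinct -> nearest sample wins
    total = sum(d for _, d in neighbors)
    t = sum(d for lab, d in neighbors if lab == c) / total
    return c, t

T_MIN = 0.85  # illustrative threshold; the embodiment suggests 0.85

# Four of the five nearest samples belong to "alice".
near = [("alice", 1.0), ("alice", 1.1), ("alice", 1.2),
        ("alice", 1.3), ("bob", 3.0)]
c, t = recognition_confidence(near)
print(c, round(t, 3), "known" if t >= T_MIN else "stranger")
```

Note how the "bob" outlier's large distance drags t below the threshold even with a 4-of-5 majority, which is exactly the conservatism the t_min test is meant to provide.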
In summary, the steps of the AVS-based video monitoring system and its implementation method proposed by the present invention are: 1. obtain an AVS code stream with an AVS camera; 2. perform background classification, face detection, background training and face training on the AVS code stream; 3. compare and recognize faces; 4. obtain the query result.
Description of drawings
Fig. 1 is the core flow chart of this video monitoring system.
Fig. 2 shows the main modules and workflow of the system.
Reference numbers in the figures: 1 training module; 2 labeling module; 3 retrieval module; 4 face database; 5 background sample library; 6 background training module; 7 face training module; 8 background model; 9 face-feature transformation matrix; 10 background detection module; 11 face detection module; 12 face recognition module; 13 index structure module; 14 surveillance video; 15 retrieval index; 16 retrieval module.
Embodiment
As an example, consider the application of the present invention in an access control system. The system can be divided into five parts: front-end camera, AVS video database, video processing and comparison/recognition, face database, and entry-information query. In an access control system the camera position is relatively fixed, the shooting angle and background are both fixed, and in an indoor environment such as an office building the lighting does not change violently. Because the driver supplied with the camera does not support segmentation or remote storage, a driver is written on top of the camera's own driver according to the application requirements, so that the video is segmented automatically during shooting and the resulting AVS video segments are stored in the designated database. At the same time, the segmented AVS code streams are processed in order in real time. Background classification is performed first; if a video segment is pure background, it is not processed further. After background detection, face detection is performed on the images judged not to be background; to improve processing efficiency we perform local rather than global detection, as elaborated above and not repeated here. When a face is detected and the confidence t (computed as described above) is below the threshold t_min (described above; in a practical realization t_min may be set to 0.85), feedback such as "this face is not in the library; it is a stranger" is given to remind the user; after the user confirms, the new face can be added to the library and the result stored in the face database. If t is greater than t_min, the recognition result is reliable and the person exists in the original face database; the person's name is then queried and reported automatically, and the time of entry is recorded. This is one practical application of the present invention.
References:
[1] Hui H., Liu H., Wu Y., Liang Y. An intelligent video surveillance technique based on the H.264 video coding standard [J]. Computer Applications, 2005, 25(11): 131-133.
[2] Freund Y., Schapire R. E. A decision-theoretic generalization of on-line learning and an application to boosting [J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139.
[3] Belhumeur P. N., Hespanha J. P., Kriegman D. J. Eigenfaces vs. Fisherfaces: recognition using class specific linear projection [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(7): 711-720.

Claims (4)

1. An AVS-based video monitoring method, characterized in that the concrete steps are as follows: first, an AVS code stream is captured by an AVS network camera, and the compressed-domain information of the AVS code-stream decoding process is used to classify frames as background or non-background; when the classification result indicates that the current frame is not background, face detection is performed; when a face is detected, face recognition is performed, that is, the face data is transformed and compared with the training data; before the recognition result is fed back to the user, a confidence t is computed first, t indicating the credibility of the current recognition result; when the confidence t is less than a threshold t_min, the face is considered not to belong to the data in the current library, is identified as a stranger, and this result is fed back to the user; after the user confirms, the new face is added to the library; when the confidence is greater than or equal to the threshold t_min, the recognition result is considered highly credible, and the result is recorded and the video is annotated; here AVS refers to the Advanced Audio/Video Coding Standard.

2. The method according to claim 1, characterized in that the background classification is performed as follows: let m_i be the motion vector of the i-th macroblock in the current image, 0 ≤ i ≤ N−1, where N is the total number of macroblocks in the current image; the motion intensity of the current image is computed with:

MV = ( Σ_{i=0}^{N−1} size_i · |m_i| ) / ( Σ_{i=0}^{N−1} size_i )    Formula (1)

where size_i denotes the area of the i-th macroblock; the parameter MS represents the extent of motion in the image:

MS = Σ_{i=0}^{N−1} b_s_i,  where b_s_i = size_i if m_i ≠ 0, and b_s_i = 0 otherwise    Formula (2)

the current image is judged to belong to the background when the following conditions are met: MV < mv_min and MS < ms_min, where mv_min is the threshold for MV and ms_min is the threshold for MS.

3. The method according to claim 1, characterized in that face recognition is performed as follows: the face images produced by face detection are scaled to a uniform size, scanned from left to right and from top to bottom into sample vectors, and the sample vectors are then reduced in dimension; the Fisher-Face algorithm combining PCA with LDA is used to extract face projection features; let f' = (u_0, u_1, …, u_k) be the sample vector of face f after Fisher-Face feature extraction, and compute its distance to the training samples:

d(f', f_i') = Σ_{j=0}^{k} (u_j − v_j)^2    Formula (3)

where f_i' = (v_0, v_1, …, v_k) denotes the i-th training sample in the library, k is the sample dimension, and d(f', f_i') is the distance between the current sample to be recognized and the i-th training sample in the library; after the distances between f' and all training samples in the library have been computed, the 5 training samples with the smallest distance to f' are found: f_{i1}', f_{i2}', …, f_{i5}'; the class to which the majority of these 5 training samples belong is taken as class c, a class meaning the set of samples belonging to the same person; if the 5 samples each belong to a different class, the class of the sample with the smallest distance to f' is taken as class c; the recognition confidence t is computed with the following formula:

t = Σ d(f', f_{ij}' | f_{ij}' ∈ c) / Σ_{j=1}^{5} d(f', f_{ij}')    Formula (4)

when the confidence t is less than the threshold t_min, the face is judged to be a stranger, the result f is fed back to the user, and the new face is added to the library after the user confirms; otherwise the recognition result is considered reliable and the result is recorded.

4. An AVS-based video monitoring system, characterized in that the system comprises a training module, a labeling module and a retrieval module:
the training module comprises a background training module for the monitoring-environment background and a face training module for the face database, which perform environment-background training and face training respectively; the inputs of the training module are the face sample library and the background sample library, and its outputs are the face features and background features;
the labeling module is used to automatically annotate the input surveillance video, and comprises a background detection module, a face detection module, a face recognition module and an index-structure building module; the inputs of the labeling module are the background features and face features obtained by the training module and the surveillance video to be annotated, and its output is the retrieval index of the surveillance video to be annotated;
the retrieval module is used to search a specified surveillance video, supporting picture queries, text queries and video queries; its inputs are the index of the specified surveillance video and the picture, text or short video submitted by the user, and its output is the information on the video segments in the surveillance video corresponding to the submitted content;
the background training module computes a background model from the input background video samples; the algorithm is based on the HSV color space and computes, for each pixel, the range of values that belong to the background; the input of the background training module is the background video samples, and its output is the background model used for background comparison;
the face training module processes the faces in the face database using the Fisher-Face algorithm; its input is the face database samples, and its outputs are the transformation matrix computed from the face samples in the database, which transforms an input face sample into a one-dimensional vector, and the center of each face class;
the background detection module compares the input frame image with the background model in order to determine whether the input frame is background; if it is not background, the input frame belongs to the foreground; the inputs of the background detection module are the background model and the input frame image, and its output is the result of judging whether the input frame is background;
the face detection module detects faces in non-background frame images; its input is a non-background frame image, and its output is the detected face images;
the face recognition module uses the transformation matrix on a detected face image to obtain a one-dimensional vector and computes the similarity between this vector and each face center using Euclidean distance; its inputs are the detected face image and the transformation matrix, and its output is the face recognition result;
the index-structure building module annotates the input video, obtains a video index from the face recognition results, and builds an index structure over the index; its input is the surveillance video, and its output is the video index;
the retrieval module lets the user enter a query through the user interface, searches according to the format of the submitted content, and feeds information back through the user interface; its input is the query submitted by the user, and its output is the video segment information fed back to the user.
CN2008102032020A 2008-11-21 2008-11-21 Video monitoring method and system based on advanced audio/video encoding standard Expired - Fee Related CN101419670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102032020A CN101419670B (en) 2008-11-21 2008-11-21 Video monitoring method and system based on advanced audio/video encoding standard

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008102032020A CN101419670B (en) 2008-11-21 2008-11-21 Video monitoring method and system based on advanced audio/video encoding standard

Publications (2)

Publication Number Publication Date
CN101419670A CN101419670A (en) 2009-04-29
CN101419670B true CN101419670B (en) 2011-11-02

Family

ID=40630456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102032020A Expired - Fee Related CN101419670B (en) 2008-11-21 2008-11-21 Video monitoring method and system based on advanced audio/video encoding standard

Country Status (1)

Country Link
CN (1) CN101419670B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101860731B (en) * 2010-05-20 2012-05-30 杭州普维光电技术有限公司 Video information processing method, system and server
CN102223520A (en) * 2011-04-15 2011-10-19 北京易子微科技有限公司 Intelligent face recognition video monitoring system and implementation method thereof
CN102932625A (en) * 2011-08-10 2013-02-13 上海康纬斯电子技术有限公司 Portable digital audio/video acquisition device
CN103475882B (en) * 2013-09-13 2017-02-15 北京大学 Surveillance video encoding and recognizing method and surveillance video encoding and recognizing system
CN104392439B (en) * 2014-11-13 2019-01-11 北京智谷睿拓技术服务有限公司 The method and apparatus for determining image similarity
CN104463117B (en) * 2014-12-02 2018-07-03 苏州科达科技股份有限公司 A kind of recognition of face sample collection method and system based on video mode
CN105654055A (en) * 2015-12-29 2016-06-08 广东顺德中山大学卡内基梅隆大学国际联合研究院 Method for performing face recognition training by using video data
CN106407281B (en) * 2016-08-26 2019-12-24 北京奇艺世纪科技有限公司 Image retrieval method and device
CN109446967B (en) * 2018-10-22 2022-01-04 深圳市梦网视讯有限公司 Face detection method and system based on compressed information
CN112085858A (en) * 2020-06-19 2020-12-15 北京筑梦园科技有限公司 Parking charging method, server and parking charging processing system
CN114360025A (en) * 2022-01-10 2022-04-15 山东工商学院 Image sample screening method and device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1428694A (en) * 2001-12-29 2003-07-09 成都银晨网讯科技有限公司 Embedded human face automatic detection equipment based on DSP and its method
CN1971630A (en) * 2006-12-01 2007-05-30 浙江工业大学 Access control device and check on work attendance tool based on human face identification technique
CN101236599A (en) * 2007-12-29 2008-08-06 浙江工业大学 Face recognition detection device based on multi-camera information fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JP Laid-Open Patent Publication No. 2007-323318 A, published 2007-12-13

Similar Documents

Publication Publication Date Title
CN101419670B (en) Video monitoring method and system based on advanced audio/video encoding standard
Wang et al. Generative neural networks for anomaly detection in crowded scenes
Zhang et al. Weakly supervised video anomaly detection via transformer-enabled temporal relation learning
Liu et al. Learning expressionlets on spatio-temporal manifold for dynamic facial expression recognition
Mahmoodi et al. Violence detection in videos using interest frame extraction and 3D convolutional neural network
Duong et al. Shrinkteanet: Million-scale lightweight face recognition via shrinking teacher-student networks
CN112183468A (en) Pedestrian re-identification method based on multi-attention combined multi-level features
CN110889672A (en) A deep learning-based detection system for student punch-in and class status
CN103593464A (en) Video fingerprint detecting and video sequence matching method and system based on visual features
Babu et al. Compressed domain action classification using HMM
Sanli et al. Face detection and recognition for automatic attendance system
Munjal et al. Knowledge distillation for end-to-end person search
Aakur et al. Action localization through continual predictive learning
Zheng et al. Attention assessment based on multi‐view classroom behaviour recognition
CN117058595A (en) Video semantic features and scalable granularity-aware temporal action detection method and device
Caetano et al. Activity recognition based on a magnitude-orientation stream network
Yin et al. Chinese sign language recognition based on two-stream CNN and LSTM network
Pouthier et al. Active speaker detection as a multi-objective optimization with uncertainty-based multimodal fusion
CN116503945B (en) Small sample action recognition method based on Transformer and dislocation alignment strategy
Shaikh et al. Maivar-t: Multimodal audio-image and video action recognizer using transformers
Zhou et al. Preserve pre-trained knowledge: Transfer learning with self-distillation for action recognition
Zhou et al. Recognizing pair-activities by causality analysis
CN116682430A (en) Miniature centralized control recording and broadcasting system
Mejri et al. Facial Region-Based Ensembling for Unsupervised Temporal Deepfake Localization
Lee et al. Video summarization based on face recognition and speaker verification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111102

Termination date: 20141121

EXPY Termination of patent right or utility model