
CN110490060B - Security protection front-end video equipment based on machine learning hardware architecture - Google Patents


Info

Publication number
CN110490060B
CN110490060B (application CN201910621068.4A)
Authority
CN
China
Prior art keywords
video
target
feature
local
depth
Prior art date
Legal status
Active
Application number
CN201910621068.4A
Other languages
Chinese (zh)
Other versions
CN110490060A (en)
Inventor
寇京珅
Current Assignee
Terminus Beijing Technology Co Ltd
Original Assignee
Terminus Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Terminus Beijing Technology Co Ltd filed Critical Terminus Beijing Technology Co Ltd
Priority to CN201910621068.4A priority Critical patent/CN110490060B/en
Publication of CN110490060A publication Critical patent/CN110490060A/en
Application granted granted Critical
Publication of CN110490060B publication Critical patent/CN110490060B/en


Classifications

    • G06N 3/045: Combinations of networks (computing arrangements based on biological models; neural networks; architecture)
    • G06N 3/08: Learning methods
    • G06V 20/48: Matching video sequences
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • H04N 7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G06F 2218/08: Feature extraction (aspects of pattern recognition specially adapted for signal processing)
    • G06F 2218/12: Classification; Matching


Abstract

The invention discloses a security front-end video device based on a machine learning hardware architecture. Built on a machine learning framework, the device can automatically extract the person target that matches a designated target from the video frame and track it, thereby improving the imaging quality of that target. Because it applies machine learning under a neural network architecture, the device adapts to the time-varying appearance of a person target across different video frames; the machine learning recognition result directly drives the front-end camera equipment, which improves response speed and reduces delay.

Description

Security protection front-end video equipment based on machine learning hardware architecture
Technical Field
The invention relates to the technical field of intelligent security, and in particular to a security front-end video device based on a machine learning hardware architecture.
Background
In application scenarios such as smart cities, smart buildings and smart communities, security video surveillance serves as an infrastructure component and plays an increasingly important role.
A security video surveillance system is generally divided into a front end and a back end, with uplink and downlink data carried between them over a cellular network, a wired network, coaxial cable or various Internet of Things links. The front-end equipment, a video camera and a pan-tilt head, is responsible for capturing and uploading surveillance video for the back end's servers, television walls and other components to archive, analyse and display; the front-end equipment can also adjust its shooting direction according to instructions issued by the back end, changing the framing range of the surveillance picture.
With the development of software and hardware technologies, and especially as intelligent techniques such as image analysis, target extraction and scene recognition have matured, the functions that security video surveillance can provide have diversified, expanding from pure monitoring into tracking, alarming and other capabilities. Full automation has become the development trend, markedly reducing dependence on human observation and remote control.
However, research, development and innovation in intelligent security video surveillance have concentrated on back-end frameworks and algorithms, while the front end has retained the traditional capture-and-upload role; this is far from sufficient. First, the more powerful the intelligent analysis at the back end, the higher the demand on video imaging quality, which requires real-time optimisation of factors such as focus sharpness and imaging brightness. Giving the front-end equipment autonomous adjustment capability is essential to achieving this: relying on remote instructions from the back end not only increases the communication load but clearly cannot keep up in real time. Moreover, the number of front-end surveillance video devices is growing rapidly and deployment points are ever denser, so having the back end process all of this massive data is very difficult; the operations related to autonomous adjustment are therefore best completed by the front-end equipment itself. Designing security front-end video equipment with an intelligent architecture has thus become an urgent task.
Disclosure of Invention
(I) Objects of the invention
In view of the needs of the prior art, the present invention provides a security front-end video device based on a machine learning hardware architecture. Built on a machine learning framework, the device can automatically extract the person target that matches a designated target from the video frame and track it, improving the imaging quality of that target and forming a virtuous circle.
(II) Technical scheme
The security front-end video device of the invention comprises: a video camera device, a networked communication device, a video analysis device, a driving interface device and a three-axis rotating pan-tilt head. The video camera device captures video frames within its field of view. The networked communication device obtains the video frames from the video camera device, uploads them to a back-end control centre, and receives designated-target feature information from the control centre. The video analysis device obtains the video frames from the video camera device and the designated-target feature information from the networked communication device; according to that information, it judges whether a person target satisfying the designated-target feature information exists in the current frame and, if so, determines the target's position in the frame relative to the frame centre. The driving interface device calculates, from the relative position determined by the video analysis device, the displacement required to image the person target at the frame centre, and outputs a driving signal accordingly. The three-axis rotating pan-tilt head rotates according to the driving signal to adjust the field of view of the video camera device.
Preferably, the video analysis device specifically comprises: a video frame acquisition and enhancement module, a target local search module, a depth feature extraction neural network module and a fusion classification module. The video frame acquisition and enhancement module acquires frames from the video camera device frame by frame and applies filtering and colour-enhancement preprocessing. The target local search module traverses the whole frame with a template of preset scale at a fixed step, extracts a feature vector within the template's coverage using a local maximum pooling algorithm, and then reduces the dimensionality of the whole feature vector; it matches the feature vector contained in the designated-target feature information against the extracted local feature vectors, and when the two vectors match it takes the local video region covered by the template as a candidate person target region. The depth feature extraction neural network module extracts depth features from the candidate person target region. The fusion classification module obtains the local feature vector and the depth feature information of the candidate person target region, fuses them, and judges, using the depth features under a supervised training and learning mechanism, whether the candidate person target region belongs to the designated person target.
Preferably, the dimensionality reduction in the target local search module comprises: slicing the feature vector data into blocks of a preset size and then computing the origin moment and central moment of each data block, thereby reducing the dimensionality of the whole feature vector and obtaining the local feature vector of the video frame.
Preferably, the target local search module matches the feature vector contained in the designated-target feature information against an extracted local feature vector as follows: compute the Hamming distance between the two feature vectors, convert it into a matching degree, and preset a confidence threshold for the matching degree; the two feature vectors are considered matched when their matching degree exceeds the confidence threshold.
Preferably, the depth feature extraction neural network module performs depth feature extraction and pooling on the image with a deep residual convolutional neural network: the input candidate person target region is processed in turn by each convolutional layer and maximum pooling layer, each convolutional layer convolving the image region to produce a feature map and each maximum pooling layer pooling the corresponding convolutional layer's output by the maximum-value principle to generate a pooled feature map; the feature map produced by the last pooling layer is taken as the depth feature.
Preferably, the fusion classification module obtains the local feature vector and the depth feature information of the candidate person target region, fuses them, applies fully-connected nonlinear activation through several linearly weighted fully-connected layers, and finally lets the classifier decide whether the candidate person target region belongs to the designated person target.
Preferably, the designated-target feature information received by the networked communication device from the control centre is generated as follows: traverse the person target picture region designated by security staff in a video frame with a template of preset scale at a fixed step, and extract a feature vector from that region within the template's coverage using a local maximum pooling algorithm; then reduce the dimensionality of the whole feature vector. The dimensionality reduction slices the feature vector data into blocks of a preset size, computes the origin moment and central moment of each block, and combines the origin moments and central moments of all blocks to form the designated-target feature information, thereby reducing the dimensionality of the whole feature vector.
Preferably, the fusion classification module represents the total feature information obtained by fusing the local feature vector and the depth feature information as

T_R = <T_L, T_D>

where T_R is the fused total feature information, T_L is the local feature vector and T_D is the depth feature information; the fused total feature information is input into multi-layer fully-connected layers, which apply nonlinear activation and normalization, and the generated feature vector is substituted into a classifier.

Preferably, the nonlinear activation function of the multi-layer fully-connected layers is expressed as

z = h(W_f · T_R + b_f)

where W_f denotes the weight of each fully-connected layer and b_f denotes the bias vector.

Preferably, the multi-layer fully-connected layers normalise the feature vector z produced by the fully-connected nonlinear activation as

z' = (z - μ_z) / σ_z

where μ_z and σ_z denote the mean and variance of the feature vector z, respectively.
(III) Advantageous effects
The invention has the following beneficial effects. Based on the machine learning principle under a neural network framework, the invention adapts to the time-varying appearance of a person target across different video frames. By fusing the designated target's feature information with depth feature information extracted by a deep neural network, it recognises person targets with a high success rate and accuracy: verified on more than 38,000 video frames containing over 1,000 pedestrians, experiments show a success rate above 97.5%. The invention further provides a hardware architecture in which the machine learning recognition result directly drives the front-end camera equipment, improving response speed and reducing delay.
Drawings
The embodiments described below with reference to the drawings are exemplary, intended to explain and illustrate the present invention, and should not be construed as limiting its scope.
FIG. 1 is a schematic structural diagram of a security front-end video device disclosed by the present invention;
FIG. 2 is a schematic structural diagram of a video analysis apparatus in the security front-end video device disclosed in the present invention;
FIG. 3 is a schematic diagram of the depth feature extraction neural network module in the security front-end video device disclosed by the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described in more detail below with reference to the accompanying drawings.
It should be noted that the embodiments described are only some, not all, embodiments of the present invention, and that features of the embodiments in the present application may be combined with one another where no conflict arises. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort fall within the protection scope of the present invention.
The invention provides security front-end video equipment that can be deployed in indoor and outdoor security monitoring spaces. The equipment has a hardware architecture based on machine learning: it can accurately identify and lock onto a person target in the captured video, then automatically adjust the shooting angle to track that specific target continuously and keep it in the central region of the frame, so that photometry and focusing parameters can be set accurately and the imaging of the person is guaranteed.
Specifically, as shown in FIG. 1, the security front-end video device of the present invention comprises: a video camera device, a networked communication device, a video analysis device, a driving interface device and a three-axis rotating pan-tilt head.
The video camera device captures video frames within its field of view. To guarantee the visual quality of the surveillance video and supply good image quality to downstream applications such as person identification and evidence collection, the video frames, and in particular the image of the locked person target within them, should be sharp. Choosing correct exposure and focusing parameters with respect to the person target is therefore critical: with wrong exposure the person appears too dark or too bright and key information such as facial features and clothing cannot be recognised, while with wrong focus the person image blurs and recognisability suffers just as badly. From the standpoint of setting proper exposure and focus parameters, it is advantageous to keep the person target imaged in the central region of the frame as much as possible. First, the exposure parameters of a video camera are determined by metering the frame brightness during framing, and a typical camera supports metering either the average brightness of the whole frame or the average brightness of its central region. Seen against the whole frame, the person target occupies only a local area; if exposure is set from the whole-frame average, it is influenced by the many regions outside the target and easily becomes wrong for the person. For example, when the person target area is dark while other areas of the frame are bright and the contrast is large, exposure parameters determined from the whole-frame average lead to incorrect exposure of the person. Conversely, once the person target is imaged in the central region, the camera can meter the centre instead and set exposure to match the actual light level of the person. Second, imaging the person target in the central region also supports fast, accurate focusing and avoids imaging distortion. Finally, from the tracking standpoint, keeping the person at the frame centre best prevents the person from moving out of the shooting field of view.
The networked communication device obtains video frames from the video camera device, uploads them to the back-end control centre, and receives designated-target feature information from the control centre. Based on 4G, 5G or other Internet of Things communication technologies, it can upload the frames obtained from the video camera device to the control centre in real time. The control centre can display the video on PCs, television walls and similar equipment for security staff to review. If a security worker decides from the video that a certain person target is worth locking onto and tracking continuously, the worker can designate that target by clicking, with a mouse or similar tool, the picture area where the person appears. The control centre then derives the designated-target feature information and sends it to the corresponding security front-end video device, which receives it through the networked communication device.
The control centre obtains the designated-target feature information as follows: traverse the person target picture region designated by the security worker in the video frame with a template of preset scale at a fixed step, and extract a feature vector from that region within the template's coverage using a local maximum pooling algorithm; then reduce the dimensionality of the whole feature vector. The dimensionality reduction slices the feature vector data into blocks of a preset size, computes the origin moment and central moment of each block, and combines the origin moments and central moments of all blocks to form the designated-target feature information, thereby reducing the dimensionality of the whole feature vector.
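As a concrete illustration, the following NumPy sketch implements one plausible reading of this step. The template, step and block sizes, and the choice of first-order origin moment (the mean) and second-order central moment (the variance) per block, are assumptions, since the patent fixes neither the sizes nor the moment orders.

```python
import numpy as np

def local_max_pool_features(region, template=16, step=8):
    """Slide a template-sized window across the designated person region at a
    fixed step and keep the local maximum of each window (local maximum
    pooling), concatenated into one feature vector."""
    h, w = region.shape[:2]
    feats = [region[y:y + template, x:x + template].max()
             for y in range(0, h - template + 1, step)
             for x in range(0, w - template + 1, step)]
    return np.asarray(feats, dtype=np.float32)

def moment_reduce(vec, block=8):
    """Slice the feature vector into blocks of a preset size and keep, per
    block, an origin moment and a central moment (assumed here to be the
    mean and the variance), combined into the designated-target feature
    information."""
    n = (len(vec) // block) * block
    blocks = vec[:n].reshape(-1, block)
    origin = blocks.mean(axis=1)                              # 1st-order origin moment
    central = ((blocks - origin[:, None]) ** 2).mean(axis=1)  # 2nd-order central moment
    return np.concatenate([origin, central])

# e.g.: target_info = moment_reduce(local_max_pool_features(person_region))
```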
The video analysis device obtains the video frames from the video camera device and the designated-target feature information from the networked communication device; according to that information, it judges whether a person target satisfying the designated-target feature information exists in the current frame and, if so, determines the person target's position in the frame relative to the frame centre. How the video analysis device makes this judgement is described in detail below.
The driving interface device calculates, from the relative position determined by the video analysis device, the displacement required to image the person target at the frame centre, and outputs a driving signal according to that displacement.
The three-axis rotating pan-tilt head rotates according to the driving signal and adjusts the field of view of the video camera device, so that the person target stays in the central region of the frame.
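As a simple illustration of the displacement computation, the sketch below converts the target's pixel offset from the frame centre into pan and tilt correction angles. The field-of-view values and the small-angle linear mapping are assumptions; the patent does not specify how the displacement maps to drive signals.

```python
def pan_tilt_correction(target_xy, frame_wh, hfov_deg=60.0, vfov_deg=34.0):
    """Map the person target's pixel offset from the frame centre to the pan
    and tilt angles (degrees) the pan-tilt head should rotate through to
    re-centre the target. The FOV values here are illustrative assumptions."""
    cx, cy = frame_wh[0] / 2.0, frame_wh[1] / 2.0
    dx, dy = target_xy[0] - cx, target_xy[1] - cy
    pan = dx / frame_wh[0] * hfov_deg     # small-angle linear approximation
    tilt = dy / frame_wh[1] * vfov_deg
    return pan, tilt

# e.g. a target at (1200, 400) in a 1920x1080 frame:
# pan, tilt = pan_tilt_correction((1200, 400), (1920, 1080))
```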
Using machine learning with a trained neural network, the video analysis device judges, on the basis of the designated-target feature information, whether a matching person target exists in the video frame and, if so, determines that target's position relative to the frame centre. The structure and operation of the video analysis device are described in detail below with reference to FIG. 2.
As shown in FIG. 2, the video analysis device specifically comprises: a video frame acquisition and enhancement module, a target local search module, a depth feature extraction neural network module and a fusion classification module.
The video frame acquisition and enhancement module acquires frames from the video camera device frame by frame and applies filtering and colour-enhancement preprocessing.
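The patent does not name the concrete filtering or colour-enhancement operators; the sketch below shows one plausible combination (Gaussian denoising plus CLAHE contrast enhancement on the luminance channel) using OpenCV.

```python
import cv2

def preprocess(frame):
    """Denoise the frame, then enhance contrast/colour by applying CLAHE to
    the luminance channel of its YCrCb representation. The operator choice
    is an assumption; the patent only says 'filtering and colour enhancement'."""
    smoothed = cv2.GaussianBlur(frame, (5, 5), 0)
    ycrcb = cv2.cvtColor(smoothed, cv2.COLOR_BGR2YCrCb)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    ycrcb[:, :, 0] = clahe.apply(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```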
The target local search module traverses the whole video frame with a template of preset scale at a fixed step, extracts a feature vector within the template's coverage using the local maximum pooling algorithm, and then reduces the dimensionality of the whole feature vector; the dimensionality reduction slices the feature vector data into blocks of a preset size and computes the origin moment and central moment of each block, yielding the local feature vector of the video frame. The module then matches the feature vector contained in the designated-target feature information against each extracted local feature vector: it computes the Hamming distance between the two vectors, converts that distance into a matching degree, and compares it against a preset confidence threshold; when the matching degree exceeds the threshold, the two vectors are considered matched and the part of the frame covered by the template is taken as a candidate person target region.
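The sketch below shows one way this Hamming-distance matching could look. The median-based binarisation (the patent does not say how real-valued vectors become bit strings) and the 0.8 threshold are assumptions.

```python
import numpy as np

def matching_degree(a, b):
    """Binarise both feature vectors at their own medians (an assumed
    quantisation step), compute the Hamming distance, and convert it to a
    matching degree in [0, 1], where 1 means the bit strings are identical."""
    bits_a, bits_b = a > np.median(a), b > np.median(b)
    hamming = np.count_nonzero(bits_a != bits_b)
    return 1.0 - hamming / len(a)

def is_match(target_vec, local_vec, confidence=0.8):
    """Matched when the matching degree exceeds the preset confidence threshold."""
    return matching_degree(target_vec, local_vec) > confidence
```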
Through this feature vector matching, the target local search module determines one or more candidate person target regions in the current frame according to the designated-target feature information. However, even for the same person, image features change considerably between frames because of differences in shooting angle and imaging conditions. To keep recognition successful and reliable, on the one hand the confidence threshold must be set sensibly: the matching-degree requirement should not be too strict, so the matching stays robust enough to still yield candidate regions from the video frames. On the other hand, depth features are then extracted from the candidate person target regions by the depth feature extraction neural network module, and a supervised training and learning mechanism uses those depth features to judge whether each candidate region belongs to the designated person target.
The depth feature extraction neural network module performs depth feature extraction and pooling on the image with a deep residual convolutional neural network. Specifically, each candidate person target region is resized to a preset size and fed into the network; the network adopts a ResNet5 model, and the input candidate region is processed in turn by each convolutional layer and maximum pooling layer, with the dimensionality reduced step by step, producing the reduced-dimension depth feature of the candidate region. As shown in FIG. 3, the module consists of alternating convolutional and maximum pooling layers: the first convolutional layer convolves the image region to produce feature map F1; the first maximum pooling layer pools F1 by the maximum-value principle to produce pooled feature map C1; the second convolutional layer then yields F2 and the second pooling layer C2; and so on, reducing dimensionality stage by stage, until the feature map Cn after the last pooling layer is taken as the depth feature.
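A minimal PyTorch sketch of this alternating convolution/pooling pipeline follows. The channel widths and number of stages are illustrative, since the patent names a "ResNet5" residual model without giving its layer configuration, and the residual shortcuts are omitted here for brevity.

```python
import torch
import torch.nn as nn

class DepthFeatureNet(nn.Module):
    """Alternating convolution and max-pooling stages as in FIG. 3: each
    convolution yields a feature map Fi, each max-pooling layer reduces it
    to Ci, and the last pooled map Cn is taken as the depth feature."""
    def __init__(self, in_ch=3, widths=(32, 64, 128)):
        super().__init__()
        layers, ch = [], in_ch
        for w in widths:
            layers += [nn.Conv2d(ch, w, kernel_size=3, padding=1),  # Fi
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(kernel_size=2)]                 # Ci
            ch = w
        self.features = nn.Sequential(*layers)

    def forward(self, x):          # x: candidate region resized to a preset size
        return self.features(x)    # last pooled feature map = depth feature

# e.g.: depth_feat = DepthFeatureNet()(torch.randn(1, 3, 64, 64))
```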
The fusion classification module obtains the local feature vector and the depth feature information of the candidate person target region, fuses them, applies fully-connected nonlinear activation through several linearly weighted fully-connected layers, and finally lets the classifier decide whether the candidate person target region belongs to the designated person target. Specifically, the total feature information obtained by fusing the local feature vector and the depth feature information is represented as

T_R = <T_L, T_D>

where T_R is the fused total feature information, T_L is the local feature vector and T_D is the depth feature information. The fused total feature information is input into multi-layer fully-connected layers, whose nonlinear activation function is expressed as

z = h(W_f · T_R + b_f)

where W_f denotes the weight of each fully-connected layer and b_f denotes the bias vector. The feature vector z produced by the fully-connected nonlinear activation is then normalised as

z' = (z - μ_z) / σ_z

where μ_z and σ_z denote the mean and variance of the feature vector z, respectively. The normalised feature vector z' is substituted into the classifier, which may be an SVM classifier; the trained classifier judges whether the candidate person target region belongs to the designated person target. The judgement of the fusion classification module serves as the video analysis device's decision on whether the candidate person target in the video frame belongs to the person target of the designated-target feature information; when the judgement is positive, the video analysis device can determine the relative position of the person target in the frame with respect to the frame centre.
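Putting these formulas together, the NumPy sketch below traces the fusion classification pass. The tanh activation and the trained `weights`, `biases` and `svm` (e.g. an sklearn SVC fitted during the supervised training stage) are assumed inputs, not specified by the patent.

```python
import numpy as np

def fuse_and_classify(t_l, t_d, weights, biases, svm, h=np.tanh):
    """Sketch of the fusion classification pass: T_R = <T_L, T_D> (simple
    concatenation), z = h(W_f . T_R + b_f) through each fully-connected
    layer, the (z - mu_z) / sigma_z normalisation, and a final decision by
    a trained classifier."""
    z = np.concatenate([t_l, t_d])          # fused total feature information T_R
    for W_f, b_f in zip(weights, biases):   # multi-layer fully-connected layers
        z = h(W_f @ z + b_f)                # fully-connected nonlinear activation
    z_norm = (z - z.mean()) / z.std()       # z' = (z - mu_z) / sigma_z
    return svm.predict(z_norm.reshape(1, -1))[0]  # 1: designated person target
```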
Based on the machine learning principle under a neural network framework, the invention adapts to the time-varying appearance of a person target across different video frames. By fusing the designated target's feature information with depth feature information extracted by a deep neural network, it recognises person targets with a high success rate and accuracy: verified on more than 38,000 video frames containing over 1,000 pedestrians, experiments show a success rate above 97.5%. The invention further provides a hardware architecture in which the machine learning recognition result directly drives the front-end camera equipment, improving response speed and reducing delay.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any change or substitution that a person skilled in the art could readily conceive within the technical scope of the present invention falls within that scope. The protection scope of the present invention is therefore defined by the appended claims.

Claims (6)

1. A security front-end video device based on a machine learning hardware architecture, characterised by comprising: a video camera device, a networked communication device, a video analysis device, a driving interface device and a three-axis rotating pan-tilt head; the video camera device is used for capturing video frames within its field of view; the networked communication device is used for obtaining the video frames from the video camera device, uploading them to a back-end control centre, and receiving designated-target feature information from the control centre; the video analysis device is used for obtaining the video frames from the video camera device and the designated-target feature information from the networked communication device, judging according to that information whether a person target satisfying the designated-target feature information exists in the current frame and, where it exists, determining the position of the person target in the frame relative to the frame centre; the driving interface device calculates, from the relative position determined by the video analysis device, the displacement required to image the person target at the frame centre and outputs a driving signal according to that displacement; the three-axis rotating pan-tilt head rotates according to the driving signal to adjust the field of view of the video camera device;
the designated-target feature information received by the networked communication device from the control centre is generated as follows: traversing the person target picture region designated by security staff in a video frame with a template of preset scale at a fixed step, and extracting a feature vector from that region within the template's coverage using a local maximum pooling algorithm; then reducing the dimensionality of the whole feature vector, the dimensionality reduction slicing the feature vector data into blocks of a preset size, computing the origin moment and central moment of each block, and combining the origin moments and central moments of all blocks to form the designated-target feature information, thereby reducing the dimensionality of the whole feature vector;
the video analysis device specifically comprises: a video frame acquisition and enhancement module, a target local search module, a depth feature extraction neural network module and a fusion classification module; the video frame acquisition and enhancement module acquires frames from the video camera device frame by frame and applies filtering and colour-enhancement preprocessing; the target local search module traverses the whole frame with a template of preset scale at a fixed step, extracts a feature vector within the template's coverage using a local maximum pooling algorithm, then reduces the dimensionality of the whole feature vector, matches the feature vector contained in the designated-target feature information against the extracted local feature vectors and, when the two feature vectors match, takes the local video region covered by the template as a candidate person target region; the depth feature extraction neural network module extracts depth features from the candidate person target region; the fusion classification module obtains the local feature vector and the depth feature information of the candidate person target region, fuses them, and judges, using the depth features under a supervised training and learning mechanism, whether the candidate person target region belongs to the designated person target;
the depth feature extraction neural network module performs depth feature extraction and pooling on the image with a deep residual convolutional neural network: the input candidate person target region is processed in turn by each convolutional layer and maximum pooling layer, each convolutional layer convolving the image region to produce a feature map and each maximum pooling layer pooling the corresponding convolutional layer's output by the maximum-value principle to generate a pooled feature map, the feature map produced by the last pooling layer being taken as the depth feature;
the fusion classification module obtains the local feature vector and the depth feature information of the candidate person target region, fuses them, applies fully-connected nonlinear activation through several linearly weighted fully-connected layers, and finally lets a classifier judge whether the candidate person target region belongs to the designated person target.
2. The security front-end video device based on a machine learning hardware architecture of claim 1, wherein the dimensionality reduction of the target local search module comprises: slicing the feature vector data into blocks of a preset size and then computing the origin moment and central moment of each data block, thereby reducing the dimensionality of the whole feature vector and obtaining the local feature vector of the video frame.
3. The security front-end video device based on a machine learning hardware architecture of claim 2, wherein the target local search module matches the feature vector contained in the designated-target feature information against an extracted local feature vector by: computing the Hamming distance between the two feature vectors, converting it into a matching degree, and presetting a confidence threshold for the matching degree, the two feature vectors being considered matched when their matching degree exceeds the confidence threshold.
4. The security front-end video device based on a machine learning hardware architecture of claim 1, wherein the fusion classification module represents the total feature information obtained by fusing the local feature vector and the depth feature information as

T_R = <T_L, T_D>

where T_R is the fused total feature information, T_L is the local feature vector and T_D is the depth feature information; the fused total feature information is input into multi-layer fully-connected layers, which apply nonlinear activation and normalisation, and the generated feature vector is substituted into a classifier.

5. The security front-end video device based on a machine learning hardware architecture of claim 4, wherein the nonlinear activation function of the multi-layer fully-connected layers is expressed as

z = h(W_f · T_R + b_f)

where W_f denotes the weight of each fully-connected layer and b_f denotes the bias vector.

6. The security front-end video device based on a machine learning hardware architecture of claim 5, wherein the multi-layer fully-connected layers normalise the feature vector z produced by the fully-connected nonlinear activation as

z' = (z - μ_z) / σ_z

where μ_z and σ_z denote the mean and variance of the feature vector z, respectively.
CN201910621068.4A 2019-07-10 2019-07-10 Security protection front-end video equipment based on machine learning hardware architecture Active CN110490060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910621068.4A CN110490060B (en) 2019-07-10 2019-07-10 Security protection front-end video equipment based on machine learning hardware architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910621068.4A CN110490060B (en) 2019-07-10 2019-07-10 Security protection front-end video equipment based on machine learning hardware architecture

Publications (2)

Publication Number Publication Date
CN110490060A CN110490060A (en) 2019-11-22
CN110490060B true CN110490060B (en) 2020-09-11

Family

ID=68545944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910621068.4A Active CN110490060B (en) 2019-07-10 2019-07-10 Security protection front-end video equipment based on machine learning hardware architecture

Country Status (1)

Country Link
CN (1) CN110490060B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170213080A1 (en) * 2015-11-19 2017-07-27 Intelli-Vision Methods and systems for automatically and accurately detecting human bodies in videos and/or images
US20190130189A1 (en) * 2017-10-30 2019-05-02 Qualcomm Incorporated Suppressing duplicated bounding boxes from object detection in a video analytics system

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105118082A (en) * 2015-07-30 2015-12-02 科大讯飞股份有限公司 Personalized video generation method and system
CN105718887A (en) * 2016-01-21 2016-06-29 惠州Tcl移动通信有限公司 Shooting method and shooting system capable of realizing dynamic capturing of human faces based on mobile terminal
WO2018215861A1 (en) * 2017-05-24 2018-11-29 Kpit Technologies Limited System and method for pedestrian detection
CN108280418A (en) * 2017-12-12 2018-07-13 北京深醒科技有限公司 The deception recognition methods of face image and device
CN108415937A (en) * 2018-01-24 2018-08-17 博云视觉(北京)科技有限公司 A kind of method and apparatus of image retrieval
CN108229444A (en) * 2018-02-09 2018-06-29 天津师范大学 A kind of pedestrian's recognition methods again based on whole and local depth characteristic fusion
CN109101865A (en) * 2018-05-31 2018-12-28 湖北工业大学 A kind of recognition methods again of the pedestrian based on deep learning
CN109034044A (en) * 2018-06-14 2018-12-18 天津师范大学 A kind of pedestrian's recognition methods again based on fusion convolutional neural networks
CN109063559A (en) * 2018-06-28 2018-12-21 东南大学 A kind of pedestrian detection method returned based on improvement region
CN108960140A (en) * 2018-07-04 2018-12-07 国家新闻出版广电总局广播科学研究院 The pedestrian's recognition methods again extracted and merged based on multi-region feature
CN109325967A (en) * 2018-09-14 2019-02-12 腾讯科技(深圳)有限公司 Method for tracking target, device, medium and equipment
CN109816012A (en) * 2019-01-22 2019-05-28 南京邮电大学 A multi-scale object detection method fused with context information
CN109934177A (en) * 2019-03-15 2019-06-25 艾特城信息科技有限公司 Pedestrian recognition methods, system and computer readable storage medium again

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Integration of Deep Features and Hand-Crafted Features for Person Re-identification; Sutong Zheng et al.; Proceedings of the IEEE International Conference on Multimedia and Expo Workshops (ICMEW) 2017; 20171231; 674-679 *
Pedestrian Detection with Deep Convolutional Neural Network; Xiaogang Chen et al.; ACCV 2014 Workshops; 20151231; Section 1, Section 2, FIG. 1, FIG. 3 *
Person Re-identification in the Wild; Liang Zheng et al.; 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 20171109; 1367-1376 *
Refine Pedestrian Detections by Referring to Features in Different Ways; Jaemyung Lee et al.; 2017 IEEE Intelligent Vehicles Symposium (IV); 20171231; 418-423 *
Person re-identification research based on multi-layer deep feature fusion; 张丽红 et al.; Journal of Test and Measurement Technology (《测试技术学报》); 20181231; Vol. 32, No. 4; 318-322 *
Person re-identification based on a feature fusion network; 种衍杰 et al.; Computer Systems & Applications (《计算机系统应用》); 20181226; Vol. 28, No. 1; 127-133 *
Person re-identification algorithm with multi-scale local feature selection; 徐家臻 et al.; Computer Engineering and Applications (《计算机工程与应用》); 20190321; Vol. 56, No. 2; 141-145 *

Also Published As

Publication number Publication date
CN110490060A (en) 2019-11-22

Similar Documents

Publication Publication Date Title
CN109145759B (en) Vehicle attribute identification method, device, server and storage medium
CN110084165B (en) Intelligent identification and early warning method for abnormal events in open scene of power field based on edge calculation
CN108615226B (en) An Image Dehazing Method Based on Generative Adversarial Networks
CN109887040A (en) Active sensing method and system of moving target for video surveillance
US20220148292A1 (en) Method for glass detection in real scenes
CN110490907A (en) Motion target tracking method based on multiple target feature and improvement correlation filter
CN118570312B (en) A multi-camera collaborative calibration method and application for dynamic vision sensors
CN109934108A (en) A multi-target and multi-type vehicle detection and ranging system and implementation method
CN111931654A (en) Intelligent monitoring method, system and device for personnel tracking
CN105184229A (en) Online learning based real-time pedestrian detection method in dynamic scene
CN110796580B (en) Intelligent traffic system management method and related products
CN103888731A (en) Structured description device and system for mixed video monitoring by means of gun-type camera and dome camera
CN117237844A (en) Firework detection method based on YOLOV8 and fusing global information
CN119672613B (en) A surveillance video information intelligent processing system based on cloud computing
Babu et al. Development and performance evaluation of enhanced image dehazing method using deep learning networks
CN113628251B (en) Smart hotel terminal monitoring method
CN119723421A (en) A method for low-altitude target recognition and real-time tracking in AI video based on deep learning
CN114067273A (en) Night airport terminal thermal imaging remarkable human body segmentation detection method
CN110490060B (en) Security protection front-end video equipment based on machine learning hardware architecture
CN116824641B (en) Gesture classification method, device, equipment and computer storage medium
CN116862832A (en) A method for positioning workers based on three-dimensional real-life models
CN114708544A (en) Intelligent violation monitoring helmet based on edge calculation and monitoring method thereof
CN120068002B (en) Adaptive image processing method and system based on annihilation neural network
CN119579456B (en) An automatic image defogging method based on artificial intelligence
CN116503406B (en) Water conservancy project information management system based on big data

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant