
CN118609061B - Security check equipment control method, device and equipment based on AI identification and storage medium - Google Patents


Info

Publication number
CN118609061B
Authority
CN
China
Prior art keywords
feature, stage, features, image, face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410964886.5A
Other languages
Chinese (zh)
Other versions
CN118609061A (en)
Inventor
张大春
张远彪
杜传宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Amway Electronics Co ltd
Original Assignee
Shenzhen Amway Electronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Amway Electronics Co ltd filed Critical Shenzhen Amway Electronics Co ltd
Priority to CN202410964886.5A
Publication of CN118609061A
Application granted
Publication of CN118609061B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G07CHECKING-DEVICES
    • G07CTIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00Individual registration on entry or exit
    • G07C9/30Individual registration on entry or exit not involving the use of a pass
    • G07C9/32Individual registration on entry or exit not involving the use of a pass in combination with an identity check
    • G07C9/37Individual registration on entry or exit not involving the use of a pass in combination with an identity check using biometric data, e.g. fingerprints, iris scans or voice recognition
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20024Filtering details
    • G06T2207/20032Median filtering
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)

Abstract


The present application relates to the field of AI recognition technology, and discloses a security inspection equipment control method, apparatus, equipment and storage medium based on AI recognition. The method includes: photographing a passing person with an image acquisition terminal on the security inspection equipment to obtain an original face image; performing adaptive Gaussian smoothing and standardization on the original face image to obtain a preprocessed face image; using a multi-stage restoration network to extract features from the preprocessed face image to obtain multi-scale face features; inputting the multi-scale face features into a face recognition model and iteratively matching data-to-class similarity to obtain a candidate identity list; and, combining the biometric data collected by the security inspection equipment, performing multi-modal fusion and Bayesian reasoning on the candidate identity list to obtain a comprehensive identity recognition result and control the opening or closing of the security inspection equipment. The present application improves the accuracy of personnel identification matching, thereby improving the accuracy of security inspection equipment control.

Description

Security check equipment control method, device and equipment based on AI identification and storage medium
Technical Field
The application relates to the technical field of AI (artificial intelligence) identification, in particular to a security inspection equipment control method, device and equipment based on AI identification and a storage medium.
Background
With the increasing demands of social security, the application of efficient and accurate security inspection equipment in public places is becoming more and more important. The traditional security inspection method often depends on manual operation, so that the efficiency is low, and human negligence is easy to occur. In recent years, the rapid development of artificial intelligence technology brings new opportunities to the field of security inspection. Security check equipment based on AI identification is gradually applied to public places such as airports, subway stations and the like, but still faces some challenges.
The identification accuracy of the existing AI identification security check system under a complex environment still needs to be improved. For example, in the case of insufficient light, face shielding, or poor angles, the recognition effect of the system is often unsatisfactory. In addition, the single-mode identification method is easy to be interfered by environmental factors, and is difficult to cope with diversified security inspection scenes. Another significant problem is the balance between security efficiency and safety. Too strict identification criteria may lead to a large number of false positives, increasing security time, while too loose criteria may reduce security. Therefore, how to improve the security inspection efficiency on the premise of ensuring the security becomes one of the key directions of the current research.
Disclosure of Invention
The application provides a security inspection equipment control method, device and equipment based on AI identification and a storage medium, which are used for improving the accuracy of personnel identification matching so as to improve the control accuracy of the security inspection equipment.
In a first aspect, the present application provides a security inspection device control method based on AI identification, where the security inspection device control method based on AI identification includes:
shooting a passing person through an image acquisition terminal on security inspection equipment to obtain an original face image;
performing adaptive Gaussian smoothing and standardization processing on the original face image to obtain a preprocessed face image;
performing feature extraction on the preprocessed face image by adopting a multi-stage restoration network to obtain multi-scale face features;
inputting the multi-scale face features into a preset face recognition model, and obtaining a candidate identity list of the passing person through iterative matching of similarity between data and class;
combining biological characteristic data acquired by the security inspection equipment, and carrying out multi-mode fusion and Bayesian reasoning on the candidate identity list to obtain a comprehensive identity recognition result;
and controlling the security inspection equipment to open or close according to the comprehensive identity recognition result.
In a second aspect, the present application provides a security inspection device control apparatus based on AI identification, including:
the shooting module is used for shooting the passing person through an image acquisition terminal on the security inspection equipment to obtain an original face image;
the processing module is used for carrying out self-adaptive Gaussian smoothing and standardization processing on the original face image to obtain a preprocessed face image;
the extraction module is used for extracting the characteristics of the preprocessed face image by adopting a multi-stage restoration network to obtain multi-scale face characteristics;
the matching module is used for inputting the multi-scale face characteristics into a preset face recognition model, and obtaining a candidate identity list of the passing person through iterative matching of the similarity between data and class;
the reasoning module is used for carrying out multi-mode fusion and Bayesian reasoning on the candidate identity list by combining the biological characteristic data acquired by the security inspection equipment to obtain a comprehensive identity recognition result;
and the control module is used for controlling the security inspection equipment to be opened or closed according to the comprehensive identity recognition result.
A third aspect of the present application provides a security inspection device control device based on AI identification, comprising a memory and at least one processor, wherein the memory stores instructions, and the at least one processor calls the instructions in the memory so that the AI-identification-based security inspection device control device executes the above AI-identification-based security inspection device control method.
A fourth aspect of the present application provides a computer-readable storage medium having instructions stored therein, which when run on a computer, cause the computer to perform the above-described AI-recognition-based security inspection device control method.
In the technical scheme provided by the application, the precision and the robustness of the feature extraction are improved by the application of the multi-stage repair network. Through multi-level feature extraction and optimization, face images in complex environments can be effectively processed, and recognition accuracy is improved. The data-to-class similarity iterative matching mechanism enhances the flexibility and adaptability of face recognition. The matching strategy can be dynamically adjusted, and different recognition scenes can be effectively dealt with. The reliability of the recognition result is improved by the multi-mode fusion and the Bayesian reasoning. By combining multiple biological characteristic data, the system can evaluate the identity information more comprehensively and reduce the false recognition rate. The adaptive gaussian smoothing and normalization process improves the quality of the original image. This step can effectively reduce the effect of ambient noise, providing a clearer image input for subsequent recognition. The use of the supervised attention module optimizes the feature extraction process. By automatically focusing on important feature areas, the system is able to more accurately capture identity feature information. The multi-scale cross-stage feature fusion technique enhances the expressive power of the features. Feature information of different scales and layers can be comprehensively utilized, and accuracy and robustness of recognition are improved. The dynamic threshold screening mechanism improves the adaptability of the system. By adjusting the identification threshold according to the real-time situation, the system can keep high efficiency and accuracy in different security environments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained based on these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an embodiment of a security inspection device control method based on AI identification in an embodiment of the application;
Fig. 2 is a schematic diagram of an embodiment of a security inspection device control apparatus based on AI identification in an embodiment of the present application.
Detailed Description
The embodiment of the application provides a security inspection equipment control method, device and equipment based on AI identification and a storage medium. The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For easy understanding, the following describes a specific flow of an embodiment of the present application, referring to fig. 1, and an embodiment of a security inspection device control method based on AI identification in an embodiment of the present application includes:
step S101, shooting a passing person through an image acquisition terminal on security inspection equipment to obtain an original face image;
It can be understood that the execution subject of the present application may be a security check device control apparatus based on AI identification, or may be a terminal or a server, which is not limited here. The embodiments of the present application are described taking a server as the execution subject by way of example.
In particular, a high-performance image acquisition terminal is integrated on the security inspection device. The terminal generally comprises a high-definition camera and a dedicated image processing chip, so that facial images of passing people can be clearly captured under various lighting conditions. When someone approaches the security inspection equipment, a proximity or infrared sensor automatically triggers the camera to start working. At the same time, the image processing chip begins operating and cooperates with the camera to acquire real-time images. This process involves adaptive adjustment to the ambient light, optimizing exposure time and white balance for the best image quality. Real-time face detection is performed while the images are collected. By means of an integrated computer vision algorithm, the face region is quickly located and locked in the video stream. Common face detection algorithms include deep-learning-based convolutional neural networks and the traditional Haar cascade classifier. After a face is detected, the region is captured with emphasis, ensuring the acquired image is sufficiently clear and contains enough facial feature detail, giving the original face image.
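The disclosure names Haar cascades and CNN detectors without giving code. Purely as an illustration, a minimal capture-and-detect sketch using OpenCV's stock Haar cascade might look like the following; the cascade file name and camera index are assumptions, not part of the disclosure:

```python
import cv2

# Hypothetical illustration of step S101: grab a frame from the acquisition
# terminal and locate the face region with a stock Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)          # image acquisition terminal (assumed index)
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) > 0:
        x, y, w, h = faces[0]                     # most prominent detection
        original_face = frame[y:y + h, x:x + w]   # the "original face image"
cap.release()
```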
Step S102, performing self-adaptive Gaussian smoothing and standardization processing on an original face image to obtain a preprocessed face image;
Specifically, luminance histogram analysis is performed on the original face image; the image brightness distribution characteristics are obtained by counting the number of pixels at each gray level. By analyzing the luminance histogram, areas of the image where the luminance is too high or too low can be identified. Adaptive gamma correction is then applied to the original face image according to its brightness distribution characteristics, adjusting the overall brightness and contrast. Adaptive gamma correction adjusts the gamma value of the image so that darker portions are lightened while lighter portions are appropriately darkened, producing a brightness-equalized image. Edge detection is performed on the brightness-equalized image to identify the intensity and position of object edges. Edge detection generally uses algorithms such as the Sobel operator or the Canny operator, generating an image edge intensity map by computing the gradient of gray-level changes in the image. The edge intensity map highlights areas of large variation, which generally correspond to important details such as facial contours and feature points. The variance of each local area is computed from the edge intensity map to obtain the adaptive Gaussian kernel parameters. The variance reflects the gray-level variation of the image in a local area; by adaptively adjusting the Gaussian kernel parameters, important edge details are preserved while the image is smoothed. The brightness-equalized image is convolved with the adaptive Gaussian kernel to obtain a denoised smooth image. Gaussian smoothing convolves the image with a Gaussian kernel to eliminate high-frequency noise while retaining low-frequency signals, making the image smoother and more natural. Geometric and affine transformations are applied to the smooth image to correct tilt and distortion, yielding the corrected face image. Feature points are located on the corrected face image according to a preset face key point model to obtain the face key point coordinates. The face key point model is usually based on a deep learning algorithm; trained on a large number of face images, it can accurately locate important feature points such as the eye corners, nose tip and mouth corners. Based on the key point coordinates, the corrected face image is cropped and scaled to obtain face images of uniform size. Cropping and scaling normalize face images of different sizes and proportions to the same dimensions, facilitating subsequent feature extraction and recognition. Pixel value normalization is applied to the uniformly sized face images to obtain normalized face images: pixel values are scaled to a fixed range, e.g. between 0 and 1, to eliminate brightness and contrast differences between images and enhance the robustness of the model. Finally, data enhancement is applied to the normalized face images to increase the diversity of training data.
Data enhancement comprises operations such as random rotation, translation, scaling and mirroring; introducing more variation into the training data improves the robustness of the model to changes in pose, illumination and expression, yielding the preprocessed face image.
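A condensed sketch of this preprocessing chain follows. The gamma rule, kernel size and variance-to-sigma mapping are assumptions for illustration; the text does not fix them, and the global variance is used here as a simplified proxy for the per-region variance described above:

```python
import cv2
import numpy as np

def preprocess_face(img: np.ndarray, target=(112, 112)) -> np.ndarray:
    """Illustrative sketch of step S102 on a BGR uint8 face crop."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Adaptive gamma: brighten dark images, darken bright ones (assumed rule).
    mean = gray.mean() / 255.0
    gamma = float(np.clip(np.log(0.5) / np.log(mean + 1e-6), 0.5, 2.0))
    corrected = np.power(img / 255.0, gamma)
    # Edge-aware sigma: larger local variation -> larger kernel parameter,
    # mirroring the adjustment rule stated in the text (mapping assumed).
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    sigma = 0.5 + 1.5 * min((gx ** 2 + gy ** 2).var() / 1e6, 1.0)
    smoothed = cv2.GaussianBlur(corrected, (5, 5), sigma)
    # Uniform size and [0, 1] pixel normalization.
    return cv2.resize(smoothed, target).astype(np.float32)
```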
Step S103, carrying out feature extraction on the preprocessed face image by adopting a multi-stage restoration network to obtain multi-scale face features;
Specifically, a multi-stage restoration network is designed, composed of multiple convolution layers and pooling layers; feature information is extracted from the preprocessed face image through the step-by-step processing of these layers. In the first stage, the preprocessed face image is processed with multi-layer convolution and pooling operations to obtain a low-level feature map. Up-sampling and deconvolution operations are then performed on the first-stage low-level feature map, recovering the resolution of the image while maintaining the integrity of the feature information, to obtain the first-stage preliminary feature representation. To optimize this preliminary feature representation, a first supervised attention module is introduced, which refines it by means of supervised learning. The supervised attention module weights the feature representation according to a predefined target, highlighting important features and suppressing irrelevant information, to obtain the optimized first-stage features. The optimized first-stage features are input into a second-stage encoder-decoder structure, and higher-level semantic features are extracted through more complex convolution and deconvolution operations. The second-stage high-level semantic features capture deeper image information, including the shape and structure of objects. After these high-level semantic features are optimized by the second supervised attention module, the expressive power of the features is improved, giving the optimized second-stage features. The optimized second-stage features are input into a network operating at the original input resolution, which refines the image while maintaining that resolution, producing a spatially accurate feature map. Multi-scale cross-stage feature fusion is performed on the optimized first-stage features, the optimized second-stage features and the spatially accurate feature map. Feature weights for different scales and layers are computed from the primary fusion features to obtain an adaptive feature weight matrix, which automatically adjusts the weight of each feature according to the importance of each layer and scale, making feature fusion more reasonable and effective. Based on the adaptive feature weight matrix, the primary fusion features are combined by weighting to obtain the multi-scale comprehensive features. Nonlinear transformation and normalization are then applied to the multi-scale comprehensive features to obtain the multi-scale face features. The nonlinear transformation processes the features through activation functions (such as ReLU and Sigmoid), giving them stronger expressive power, while normalization eliminates dimensional differences between features so that they are comparable on the same scale.
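The patent describes the architecture only at this level of detail. Purely as an illustrative sketch (layer depths, channel counts and class names are all assumptions, and the attention modules are omitted), the three-branch layout might be wired up as follows in PyTorch:

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One encoder-decoder stage: downsample, then restore resolution."""
    def __init__(self, in_ch, ch=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1),
                                 nn.ReLU(), nn.MaxPool2d(2))
        self.dec = nn.Sequential(nn.ConvTranspose2d(ch, ch, 2, stride=2),
                                 nn.ReLU())
    def forward(self, x):
        return self.dec(self.enc(x))

class MultiStageRestoration(nn.Module):
    """Minimal sketch: two encoder-decoder stages plus an original-resolution
    refinement branch, fused by a 1x1 convolution."""
    def __init__(self, ch=32):
        super().__init__()
        self.stage1 = Stage(3, ch)
        self.stage2 = Stage(ch, ch)   # consumes the first-stage features
        self.refine = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, img):
        f1 = self.stage1(img)    # first-stage features (attention omitted here)
        f2 = self.stage2(f1)     # second-stage, higher-level features
        f3 = self.refine(img)    # spatially accurate, full-resolution map
        return self.fuse(torch.cat([f1, f2, f3], dim=1))

features = MultiStageRestoration()(torch.randn(1, 3, 112, 112))  # (1, 32, 112, 112)
```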
Bilinear interpolation up-sampling is applied to the first-stage low-level feature map, enlarging it to twice the original image resolution to obtain the first up-sampled feature map. By performing bilinear interpolation at each pixel, the up-sampled feature map maintains high image detail and feature continuity. A transposed convolution operation is then applied to the first up-sampled feature map, further enlarging it and enhancing the details and edge information of the feature map during deconvolution, to obtain a preliminary deconvolution feature map. The transposed convolution gradually restores high-level features to the original resolution while effectively preserving the spatial structure and texture information of the image. The preliminary deconvolution feature map is channel-concatenated with the skip-connection features at the original resolution. The skip-connection features are high-resolution features extracted from an early layer of the network; channel concatenation combines them with the preliminary deconvolution feature map to form a fused feature map. A 1x1 convolution is applied to the fused feature map, compressing the number of channels to a reasonable range and giving the channel-compressed first-stage preliminary feature representation. The 1x1 convolution not only reduces computational complexity but also recombines the feature channels through a linear transformation, improving the compactness and expressive power of the representation. The first supervised attention module computes the channel attention weights of the first-stage preliminary feature representation to obtain a channel attention matrix. The channel attention weights reflect the importance of each channel to the feature representation; by weighting the feature channels, important features are enhanced and unimportant ones suppressed, giving channel-enhanced features. Spatial attention weights of the channel-enhanced features are then computed to obtain a spatial attention matrix. By weighting the spatial positions of the feature map, the feature representation of important regions is enhanced and that of unimportant regions suppressed, giving dual-attention features. The dual-attention features are residually connected with the first-stage preliminary feature representation to obtain attention-optimized features. The residual connection preserves the original information in the input features by adding them directly to the output, while the attention mechanism enhances the local details and global semantic information of the representation.
Batch normalization and a ReLU activation are applied to the attention-optimized features: batch normalization stabilizes the distribution of the features in the network and accelerates training, while the ReLU activation gives the features stronger nonlinear expressive power, yielding the optimized first-stage features.
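A minimal sketch of such a dual-attention module is shown below: channel attention, then spatial attention, a residual connection, and finally batch normalization with ReLU. The kernel sizes and squeeze design are assumptions, not values from the disclosure:

```python
import torch
import torch.nn as nn

class SupervisedAttention(nn.Module):
    """Illustrative channel + spatial attention with a residual connection."""
    def __init__(self, ch):
        super().__init__()
        self.channel = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(ch, 1, 7, padding=3),
                                     nn.Sigmoid())
        self.post = nn.Sequential(nn.BatchNorm2d(ch), nn.ReLU())

    def forward(self, feat):
        x = feat * self.channel(feat)   # channel-enhanced features
        x = x * self.spatial(x)         # dual-attention features
        return self.post(feat + x)      # residual connection, then BN + ReLU
```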
The optimized first-stage features are input to the second-stage encoder-decoder structure. The structure typically includes multiple convolution layers and pooling layers, enabling extraction of deeper features during the encoding stage. The optimized first-stage features are processed layer by layer through multi-layer convolution and pooling operations, gradually extracting high-level semantic information to obtain the second-stage coding features. Hole (dilated) convolution is applied to the second-stage coding features; the dilated convolution kernel expands the receptive field without increasing the number of parameters, capturing more context information to obtain expanded receptive field features. These expanded receptive field features are fused with the skip-connection features of the first stage; by combining low-level detail information and high-level semantic information through cross-stage feature fusion, multi-scale semantic features are obtained. A deconvolution operation is performed on the multi-scale semantic features, enlarging the feature map to a higher resolution to obtain the second up-sampled feature map. The deconvolution operation not only restores the spatial dimension of the feature map but also strengthens the fineness of the feature representation by gradually restoring detail information. Self-attention processing is applied to the second up-sampled feature map; the self-attention mechanism adaptively adjusts the feature representation of each position according to the relations within the feature map, producing self-attention-enhanced features. These are residually connected with the corresponding-scale features of the encoding stage, preserving the original information of the encoding-stage features while the self-attention mechanism enhances the local details and global semantics of the representation, giving residual-enhanced features. A channel weight matrix of the residual-enhanced features is computed by the second supervised attention module. The channel weight matrix reflects the importance of each feature channel; by weighting, important channels are highlighted and unimportant channels suppressed, giving channel-enhanced features. Spatial attention analysis is performed on the channel-enhanced features, and a spatial weight matrix is obtained by computing the weight of each spatial position. Spatial attention analysis lets the feature representation of each location adapt to the spatial distribution in the feature map, highlighting important region information in the image. Based on the spatial weight matrix, the channel-enhanced features are weighted, optimizing the spatial information in the feature map. Batch normalization and nonlinear activation are applied to the weighted feature maps: batch normalization stabilizes the distribution of the feature maps in the network and reduces internal covariate shift during training, while the nonlinear activation function (such as ReLU) strengthens the nonlinear expressive power of the feature maps, yielding the optimized second-stage features.
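For the hole-convolution step, the receptive-field claim is easy to see in code: a 3x3 kernel with dilation 2 covers a 5x5 neighborhood using the same nine weights. A hedged sketch, with the channel count assumed:

```python
import torch.nn as nn

# Dilated ("hole") convolution block: a 3x3 kernel with dilation=2 sees a
# 5x5 neighborhood without adding parameters; padding=2 keeps spatial size.
dilated_block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)
```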
Step S104, inputting the multi-scale face characteristics into a preset face recognition model, and obtaining a candidate identity list of passing people through iterative matching of the similarity of data to classes;
Specifically, the multi-scale face features are input into the preset face recognition model and reduced in dimension: the high-dimensional feature vectors are compressed to low-dimensional feature vectors through methods such as principal component analysis or linear discriminant analysis. An initial similarity threshold is constructed from a preset initial query class to obtain the initial matching parameters. The initial query class is typically composed of prior knowledge or predefined samples, and the initial similarity threshold is calculated from the similarity distribution of the initial samples as the starting point of the matching process. A first round of retrieval is performed in the face database based on the compressed feature vector and the initial matching parameters; a preliminary candidate list is obtained by comparing the similarity between the compressed feature vector and the samples in the database. The samples in the preliminary candidate list are the face features most similar to the input features, so possible candidate identities can be screened preliminarily. Cosine distance and Mahalanobis distance are computed between the face features in the preliminary candidate list and the compressed feature vector to obtain a comprehensive distance matrix. The cosine distance measures the angle between two vectors to evaluate their similarity, while the Mahalanobis distance evaluates the distance between samples taking the covariance of the feature distribution into account. The preliminary candidate list is sorted according to the comprehensive distance matrix to obtain an updated candidate list. Top-K samples are selected from the updated candidate list to construct an extended query class, giving the updated query class. Top-K selection takes the K most similar samples from the updated candidate list; the features of these samples form a new query class for the next round of retrieval. A new similarity threshold is calculated from the updated query class to obtain updated matching parameters. The new similarity threshold is derived from the feature distribution of the samples in the updated query class; compared with the initial threshold it is closer to the actual situation, improving matching accuracy. The next iteration of retrieval is then performed with the updated matching parameters: a new round of sample search in the face database produces a new candidate list containing the database samples most similar to the updated query features, so that iterative retrieval gradually approaches the true candidate identities. These steps are repeated, and the candidate list is progressively optimized through multiple rounds of retrieval; each iteration builds on the previous results and the updated matching parameters, continuously adjusting the query class and similarity threshold.
When the preset number of iterations is reached, the final candidate list is obtained and subjected to adaptive threshold screening; the screening standard is flexibly adjusted to the actual situation by the adaptive threshold, ensuring that the final candidate identity list has high accuracy and reliability.
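A simplified sketch of this data-to-class iterative matching follows, using only the cosine term of the comprehensive distance and an assumed mean-plus-standard-deviation threshold rule; the patent does not fix these details, and all parameter values are illustrative:

```python
import numpy as np

def iterative_match(query, gallery, ids, rounds=3, k=5):
    """Illustrative data-to-class iterative matching (step S104).
    `gallery` is an (N, d) array of enrolled features, `ids` their labels."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    query_class = q[None, :]                    # initial query class
    for _ in range(rounds):
        # Similarity of every gallery sample to its nearest query-class member.
        sim = (g @ query_class.T).max(axis=1)
        threshold = sim.mean() + sim.std()      # updated matching parameter
        top = np.argsort(-sim)[:k]              # Top-K candidates this round
        passed = sim[top] >= threshold
        if passed.any():
            top = top[passed]                   # adaptive threshold screening
        # Extend the query class with the Top-K samples for the next round.
        query_class = np.vstack([q[None, :], g[top]])
    return [ids[i] for i in top]                # candidate identity list
```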
Step 105, combining biological characteristic data acquired by security inspection equipment, and carrying out multi-mode fusion and Bayesian reasoning on the candidate identity list to obtain a comprehensive identity recognition result;
Specifically, the gait data, body type data and infrared thermal imaging data acquired by the security inspection equipment are preprocessed. Preprocessing comprises denoising, normalization and alignment: denoising eliminates random noise in the data; normalization scales the data to the same range, ensuring that data of different modalities are comparable in subsequent processing; and alignment places the data of different modalities on the same time axis so that they can be compared and fused. Preprocessing yields standardized biometric data, from which features are extracted: multi-modal biometric feature vectors are extracted from the gait, body type and infrared thermal imaging data using deep learning methods such as convolutional or recurrent neural networks. The multi-modal biometric feature vectors are then aligned with the face features in the candidate identity list. The feature alignment step uses time synchronization and spatial alignment to produce an alignment feature matrix, which places the features of different modalities in a unified representation space and ensures consistency in subsequent processing. Intra-modality relations of the alignment feature matrix are modeled, establishing relations among the features within each modality using methods such as graph neural networks or self-attention, to obtain intra-modality enhanced features. Intra-modality relational modeling captures the intrinsic links between features of the same modality and enhances their expressive power. Cross-modal attention analysis is performed on the intra-modality enhanced features, analyzing the interaction among features of different modalities through a self-attention mechanism to obtain inter-modality interaction features. Cross-modal attention effectively integrates information from different modalities, highlighting important features and suppressing irrelevant ones to give a more accurate and comprehensive feature representation. The inter-modality interaction features are fused together through methods such as weighted averaging or a multi-layer perceptron to obtain the fused feature vector. A prior probability distribution is constructed from the historical identification records and the current security level setting; the prior probability of each identity is obtained through statistical analysis and empirical modeling, reflecting the probability of each identity occurring under the specific scenario and conditions. Based on the fused feature vector and the identity priors, a Bayesian network model is constructed; Bayesian reasoning combines the priors with the currently observed features to obtain the posterior probability distribution over the candidate identities. Maximum a posteriori estimation is applied to the posterior distribution, determining the most probable identity by finding the maximum of the posterior.
Maximum a posteriori estimation ensures that the most reliable identity is selected on the basis of both the prior information and the current observations. The most probable identity is compared with a preset confidence threshold; if its posterior probability exceeds the threshold, that identity is accepted as the final comprehensive identity recognition result. Multi-modal fusion and Bayesian reasoning thus achieve high-accuracy identification in complex environments.
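The Bayesian step reduces to multiplying the identity priors by a likelihood derived from the fused features and normalizing. A minimal sketch, assuming the per-candidate likelihood vector is supplied by the fusion network (the likelihood model itself is not specified by the text):

```python
import numpy as np

def bayes_identity(prior, likelihood, confidence=0.9):
    """Illustrative Bayesian reasoning for step S105.
    prior, likelihood: arrays over the candidate identity list."""
    posterior = prior * likelihood
    posterior /= posterior.sum()          # Bayes' rule, normalized
    best = int(np.argmax(posterior))      # maximum a posteriori estimate
    if posterior[best] >= confidence:     # assumed confidence threshold
        return best, float(posterior[best])
    return None, float(posterior[best])   # identity not determined
```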
And step S106, controlling the security inspection equipment to be opened or closed according to the comprehensive identity recognition result.
Specifically, the comprehensive identity recognition result is combined with the control logic of the security inspection equipment. After the result is generated, it is analyzed and verified to ensure its accuracy and reliability. The comprehensive identity recognition result typically includes the most likely identity and its corresponding confidence score. The recognition result is compared with the preset security check authority and security policies to judge whether the identified person has permission to pass the security check. The confidence score in the result is compared with a preset confidence threshold. The confidence threshold is set by a system administrator according to the specific security requirements; it is usually determined when the system is first deployed and adjusted according to actual conditions during use. If the confidence score is higher than the threshold, the recognition result is considered reliable: the system treats it as a valid identification, triggers the corresponding control logic, and commands the security inspection equipment to open. For example, a gate control system can open the gate to allow the person to pass, a baggage inspection system can start the conveyor belt to continue processing subsequent baggage, and in scenarios requiring a higher level of inspection the system can emit a prompt tone or light signal to notify security personnel to perform further checks. If the confidence score is below the threshold, the system considers the reliability of the current recognition result insufficient and the person's identity undetermined. The system then triggers the alternative control logic, commanding the security equipment to close: the gate control system keeps the gate closed to prevent passage, the baggage inspection system pauses the conveyor belt to wait for manual intervention, and in higher-security scenarios an alarm signal alerts the security personnel. When executing control actions, the system also considers device status and fault handling. For example, while the equipment is opening or closing, the system monitors its state in real time to ensure the action completes; if the equipment fails during execution, the system promptly raises an alarm and activates backup equipment or a manual process to maintain the continuity and safety of the security inspection workflow.
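Reduced to code, the control decision is a threshold comparison plus a permission check. The following sketch is illustrative only; the gate API named in the comments is a hypothetical placeholder, not an interface from the disclosure:

```python
def control_gate(identity, score, threshold=0.9, authorized=frozenset()):
    """Illustrative control logic for step S106."""
    if identity is not None and score >= threshold and identity in authorized:
        return "OPEN"    # e.g. gate.open(); conveyor resumes (hypothetical API)
    return "CLOSED"      # e.g. gate.close(); alert staff for manual check
```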
In the embodiment of the application, the application of the multi-stage repair network improves the precision and the robustness of the feature extraction. Through multi-level feature extraction and optimization, face images in complex environments can be effectively processed, and recognition accuracy is improved. The data-to-class similarity iterative matching mechanism enhances the flexibility and adaptability of face recognition. The matching strategy can be dynamically adjusted, and different recognition scenes can be effectively dealt with. The reliability of the recognition result is improved by the multi-mode fusion and the Bayesian reasoning. By combining multiple biological characteristic data, the system can evaluate the identity information more comprehensively and reduce the false recognition rate. The adaptive gaussian smoothing and normalization process improves the quality of the original image. This step can effectively reduce the effect of ambient noise, providing a clearer image input for subsequent recognition. The use of the supervised attention module optimizes the feature extraction process. By automatically focusing on important feature areas, the system is able to more accurately capture identity feature information. The multi-scale cross-stage feature fusion technique enhances the expressive power of the features. Feature information of different scales and layers can be comprehensively utilized, and accuracy and robustness of recognition are improved. The dynamic threshold screening mechanism improves the adaptability of the system. By adjusting the identification threshold according to the real-time situation, the system can keep high efficiency and accuracy in different security environments.
In a specific embodiment, the process of executing step S102 may specifically include the following steps:
(1) Performing luminance histogram analysis on the original face image to obtain image luminance distribution characteristics, and performing self-adaptive gamma correction on the original face image according to the image luminance distribution characteristics to obtain a luminance equalization image;
(2) Performing edge detection on the brightness equalization image to obtain an image edge intensity image, and calculating the variance of a local area according to the image edge intensity image to obtain a self-adaptive Gaussian kernel parameter;
(3) Performing convolution operation on the brightness equalization image based on the self-adaptive Gaussian kernel parameter to obtain a smooth image after noise reduction, and performing geometric transformation and affine transformation on the smooth image to obtain a corrected face image;
(4) Characteristic point positioning is carried out on the corrected face image according to a preset face key point model to obtain face key point coordinates, and the corrected face image is cut and scaled based on the face key point coordinates to obtain a face image with uniform size;
(5) And carrying out pixel value normalization processing on the face images with the uniform size to obtain normalized face images, and carrying out data enhancement on the normalized face images to obtain preprocessed face images.
Specifically, the original face image is subjected to luminance histogram analysis; the luminance histogram reflects the number of pixels at each gray level in the image. By calculating the histogram, the image brightness distribution characteristics are obtained, representing the brightness distribution of the various areas of the image. Assume that I is the original image and H(k) its luminance histogram; H(k) is calculated as:

H(k) = Σ_{i=1..W} Σ_{j=1..H} δ(I(i, j) − k)

where δ is the Dirac function, W and H are the width and height of the image respectively, I(i, j) is the brightness value at position (i, j), and k is the brightness level. Adaptive gamma correction is then applied to the original face image according to its brightness distribution characteristics, adjusting the contrast so that the brightness distribution becomes more uniform. The gamma correction formula is:

I′(i, j) = 255 · (I(i, j) / 255)^γ

where I′ is the corrected image and γ is the gamma value, adaptively adjusted according to the distribution of the luminance histogram: when the image is dark overall, a smaller γ value is used; when the image is bright overall, a larger γ value is used. After the brightness-equalized image is obtained, edge detection is performed. Edge detection may use the Sobel operator or the Canny operator, detecting edges by computing the gradients of pixel gray values in the image. Assume Gx and Gy represent the gradients of the image in the horizontal and vertical directions respectively; the edge intensity E is calculated as:

E(i, j) = √( Gx(i, j)² + Gy(i, j)² )

where E is the image edge intensity map. The variance of each local area is computed from the edge intensity map to obtain the adaptive Gaussian kernel parameter. The variance is calculated as:

σ² = (1/N) Σ_{i=1..N} (e_i − μ)²

where σ² is the variance, e_i is the i-th pixel value in the edge intensity map, μ is the mean of the local area, and N is the number of pixels in the local area. The adaptive Gaussian kernel parameter σ_g is adjusted according to the variance: the larger the variance, the larger σ_g. The brightness-equalized image is then convolved with the adaptive Gaussian kernel to achieve noise reduction. The convolution is calculated as:

I_s(i, j) = Σ_{m=−k..k} Σ_{n=−k..k} G(m, n) · I(i + m, j + n)

where I_s is the smoothed image, G is the Gaussian kernel, and k is the kernel radius. Geometric and affine transformations are applied to the denoised smooth image to correct its geometric distortion. These transformations can be realized by matrix operations, with the affine transformation matrix:

M = [ a11 a12 tx ; a21 a22 ty ; 0 0 1 ]

where a11, a12, a21 and a22 are the rotation and scaling parameters, and tx and ty are the translation parameters. The smooth image is corrected to a standard position by the transformation matrix M, yielding the corrected face image. Feature points are then located on the corrected face image according to the preset face key point model. Assuming the model contains 68 key points, the key points are located by a convolutional neural network (CNN) to obtain the face key point coordinates (x_i, y_i), i = 1, …, 68. Based on the face key point coordinates, the corrected face image is cropped and scaled so that all face images have a uniform size: assuming the target size is W_t × H_t, the face region is cropped and scaled to the target size through an affine transformation. Pixel value normalization is applied to the uniformly sized face images, scaling pixel values into the range [0, 1] to eliminate brightness and contrast differences between images and obtain the normalized face image. Finally, data enhancement is applied to the normalized face image, generating multiple variants through operations such as rotation, translation, scaling and mirroring to increase the diversity of training data, producing the preprocessed face image. Data enhancement effectively improves the robustness and generalization ability of the model, so that it performs better in practical applications.
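As a concrete illustration of the affine-correction step above, the following sketch builds the 2x3 rotation part of M from two assumed eye landmarks and applies it with OpenCV; all numeric values are invented for the example:

```python
import cv2
import numpy as np

# Level the eye line: rotate about the eye midpoint by the measured angle.
# The landmark coordinates and image size are made-up numbers for the sketch.
left_eye, right_eye = np.array([38.0, 52.0]), np.array([74.0, 48.0])
center = tuple((left_eye + right_eye) / 2)
angle = np.degrees(np.arctan2(right_eye[1] - left_eye[1],
                              right_eye[0] - left_eye[0]))
M = cv2.getRotationMatrix2D(center, angle, 1.0)  # [a11 a12 tx; a21 a22 ty]
img = np.zeros((112, 112, 3), dtype=np.uint8)    # stand-in face image
corrected = cv2.warpAffine(img, M, (112, 112))   # corrected face image
```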
In a specific embodiment, the process of executing step S103 may specifically include the following steps:
(1) Performing multi-layer convolution and pooling operations on the preprocessed face image with a multi-stage restoration network to obtain a first-stage low-level feature map;
(2) Performing up-sampling and deconvolution operations on the first-stage low-level feature map to obtain a first-stage preliminary feature representation, and optimizing the first-stage preliminary feature representation through a first supervised attention module to obtain optimized first-stage features;
(3) Inputting the optimized first-stage features into a second-stage encoder-decoder structure to obtain second-stage high-level semantic features, and optimizing the second-stage high-level semantic features through a second supervised attention module to obtain optimized second-stage features;
(4) Inputting the optimized second-stage features into a network based on the original input resolution to obtain a spatially accurate feature map;
(5) Performing multi-scale cross-stage feature fusion on the optimized first-stage features, the optimized second-stage features and the spatially accurate feature map to obtain primary fusion features;
(6) Calculating feature weights of different scales and layers according to the primary fusion features to obtain an adaptive feature weight matrix, and carrying out weighted combination on the primary fusion features based on the adaptive feature weight matrix to obtain a multi-scale comprehensive feature;
(7) And carrying out nonlinear transformation and normalization processing on the multi-scale comprehensive characteristics to obtain multi-scale face characteristics.
Specifically, a multi-stage restoration network is adopted to perform multi-layer convolution and pooling operations on the preprocessed face image, obtaining the first-stage low-level feature map. A multi-stage restoration network typically comprises multiple convolution layers and pooling layers; through the progressive processing of these layers, rich feature information is extracted from the preprocessed face image. The convolution operation extracts local features by convolving a kernel with the input image, with the formula:

$$F_l(x,y) = \sigma\Big(\sum_{i}\sum_{j} W_l(i,j)\, F_{l-1}(x+i,\, y+j) + b_l\Big);$$

wherein $F_l(x,y)$ is the feature at position $(x,y)$ after the $l$-th layer convolution, $W_l$ are the weights of the $l$-th layer convolution kernel, $F_{l-1}$ is the feature of layer $l-1$, $b_l$ is the bias term of the layer, and $\sigma$ is an activation function (e.g., ReLU). The pooling operation reduces the size of the feature map by downsampling the convolved feature map while preserving important feature information. Up-sampling and deconvolution are then performed on the first-stage low-level feature map to obtain the first-stage preliminary feature representation. The up-sampling operation enlarges the feature-map size by interpolation or similar means, while the deconvolution operation restores the detail of the original image by inverse convolution. The deconvolution formula is:

$$F_l(x,y) = \sigma\Big(\sum_{i}\sum_{j} W_l^{T}(i,j)\, F_{l-1}(x-i,\, y-j) + b_l\Big);$$

wherein $F_l(x,y)$ is the feature at position $(x,y)$ after the $l$-th layer deconvolution; the other symbols are as in the convolution formula. The first-stage preliminary feature representation is optimized by the first supervised attention module, which weights the features by calculating the importance of different positions in the feature map, highlighting important features and suppressing unimportant ones, to obtain the optimized first-stage features. The optimized first-stage features are input to a second-stage encoder-decoder structure. In the second stage, higher-level semantic features are extracted through more complex convolution and deconvolution operations; optimizing these high-level semantic features with the second supervised attention module improves their expressive power and yields the optimized second-stage features. The optimized second-stage features are input into a network operating at the original input resolution, which refines the image while maintaining that resolution, producing a spatially accurate feature map. Multi-scale cross-stage feature fusion is then performed on the optimized first-stage features, the optimized second-stage features and the spatially accurate feature map. By combining features of different stages and scales, the feature information of each stage and each scale is fully utilized, giving richer and more comprehensive primary fusion features. Feature weights of different scales and layers are calculated from the primary fusion features to obtain an adaptive feature weight matrix, which automatically adjusts the weight of each feature according to its importance at each layer and scale, making the fusion more reasonable and effective. Based on the adaptive feature weight matrix, the primary fusion features are weighted and combined into multi-scale comprehensive features, which finally undergo nonlinear transformation and normalization. The nonlinear transformation processes the features through activation functions (such as ReLU or Sigmoid), strengthening their expressive power; normalization eliminates dimensional differences between the features so they are comparable on the same scale, finally yielding the multi-scale face features.
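The two-stage layout described above can be sketched in PyTorch as follows. This is only a schematic sketch: channel counts, kernel sizes and the example 128x128 input are chosen for illustration, not taken from this application.

```python
import torch
import torch.nn as nn

class StageOne(nn.Module):
    """First restoration stage: stacked convolution + pooling gives the
    low-level feature map; upsampling + transposed convolution gives the
    preliminary feature representation at the input resolution."""
    def __init__(self, in_ch=3, feat_ch=32):
        super().__init__()
        self.encode = nn.Sequential(                      # conv + pool layers
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(feat_ch, feat_ch * 2, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.deconv = nn.ConvTranspose2d(feat_ch * 2, feat_ch, 4, stride=2, padding=1)

    def forward(self, x):
        low = self.encode(x)                 # first-stage low-level feature map
        prelim = self.deconv(self.up(low))   # preliminary feature representation
        return low, prelim

feats_low, feats_prelim = StageOne()(torch.randn(1, 3, 128, 128))
```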
In a specific embodiment, the performing step performs up-sampling and deconvolution operations according to the low-level feature map of the first stage to obtain a first-stage preliminary feature representation, and optimizes the first-stage preliminary feature representation by using the first supervised attention module, so that the process of obtaining the optimized first-stage feature may specifically include the following steps:
(1) Performing bilinear interpolation upsampling on the low-level feature map of the first stage to obtain a first upsampled feature map, and performing transposition convolution operation on the first upsampled feature map to obtain a preliminary deconvolution feature map;
(2) Channel splicing is carried out on the preliminary deconvolution feature map and jump connection features with original resolution to obtain a fusion feature map, and 1x1 convolution operation is carried out on the fusion feature map to obtain a first-stage preliminary feature representation after channel compression;
(3) Calculating the channel attention weight of the first-stage preliminary feature representation through the first supervision attention module to obtain a channel attention matrix, and weighting the first-stage preliminary feature representation based on the channel attention matrix to obtain a channel enhancement feature;
(4) Calculating the spatial attention weight of the channel enhancement feature to obtain a spatial attention matrix, and weighting the channel enhancement feature based on the spatial attention matrix to obtain a dual attention feature;
(5) And carrying out residual connection on the dual-attention feature and the first-stage preliminary feature representation to obtain an attention optimization feature, and carrying out batch normalization and ReLU activation function operation on the attention optimization feature to obtain an optimized first-stage feature.
Specifically, bilinear interpolation up-sampling is performed on the first-stage low-level feature map. Bilinear interpolation is a common image-enlargement method: the interpolated pixel value is obtained by linear interpolation over the four neighbouring pixels of each position, enlarging the feature map. Assume the original feature map is $F$ of size $H \times W$; bilinear interpolation yields a first up-sampled feature map $F_{up}$ of size $2H \times 2W$. The bilinear interpolation formula is:

$$F_{up}(x,y) = \sum_{m=0}^{1}\sum_{n=0}^{1} w_{mn}\, F\big(\lfloor x/2 \rfloor + m,\ \lfloor y/2 \rfloor + n\big);$$

wherein $w_{mn}$ are the interpolation weights and $\lfloor \cdot \rfloor$ denotes rounding down. A transposed convolution operation is then performed on the first up-sampled feature map. The transposed convolution further expands the spatial size of the feature map by deconvolution while preserving high-resolution detail information. The transposed convolution formula is:

$$F_{deconv}(x,y) = \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} W^{T}(i,j)\, F_{up}(x-i,\, y-j);$$

wherein $F_{deconv}$ is the preliminary deconvolution feature map, $W^{T}$ are the weights of the transposed convolution kernel, and $k$ is the kernel size. The preliminary deconvolution feature map is then channel-spliced with the jump connection features at the original resolution. The jump connection features are high-resolution features extracted from early layers of the network that contain low-level detail information of the image; splicing the two in the channel dimension fuses feature information of different layers into a fused feature map. A 1x1 convolution is applied to the fused feature map to reduce the number of channels, giving the channel-compressed first-stage preliminary feature representation. The 1x1 convolution performs a linear transformation at each location, effectively compressing the channel count of the feature map while keeping the spatial resolution unchanged. The 1x1 convolution formula is:

$$F_{comp}(x,y,c') = \sigma\Big(\sum_{c=1}^{C} W_{1\times1}(c,c')\, F_{fuse}(x,y,c) + b_{c'}\Big);$$
wherein $F_{comp}$ is the channel-compressed feature map, $W_{1\times1}$ are the weights of the 1x1 convolution kernel, and $C$ is the number of channels. Channel attention weights of the first-stage preliminary feature representation are calculated by the first supervised attention module, which weights the features by computing an importance weight for each channel so as to highlight important features. The channel attention weight is calculated as:

$$M_c = \sigma\big(W_c \cdot \mathrm{GAP}(F_{comp})\big);$$

wherein $M_c$ is the channel attention weight, $\mathrm{GAP}(\cdot)$ denotes global average pooling, and $\sigma$ is an activation function (e.g., Sigmoid). The first-stage preliminary feature representation is weighted based on the channel attention matrix to obtain the channel enhancement features. The weighting applies the channel attention weight to each channel of the feature map:

$$F' = M_c \odot F_{comp};$$
Spatial attention weights of the channel enhancement features are then calculated. Spatial attention computes an importance weight for each location in the feature map so as to emphasize the important areas. The spatial attention weight is calculated as:

$$M_s(x,y) = \sigma\big(W_s \cdot F'(x,y)\big);$$

wherein $M_s$ is the spatial attention weight and $W_s$ is a weight matrix. The channel enhancement features are weighted based on the spatial attention matrix to obtain the dual attention features. The weighting applies the spatial attention weight to each location of the feature map:

$$F'' = M_s \odot F';$$
The dual attention features and the first-stage preliminary feature representation are connected by a residual connection to obtain the attention-optimized features. The residual connection retains the original information by adding the input features to the output features, while the attention mechanism enhances the feature representation:

$$F_{att} = F'' + F_{comp};$$
Batch normalization and a ReLU activation are applied to the attention-optimized features to obtain the optimized first-stage features. Batch normalization reduces internal covariate shift by normalizing the features, accelerating the training process:

$$\hat{F}_c = \frac{F_c - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}};$$

wherein $\mu_c$ and $\sigma_c^2$ are the mean and variance of channel $c$, and $\epsilon$ is a small constant preventing division by zero. The ReLU activation function improves the expressive power of the model by applying a nonlinear transformation to the normalized features:

$$\mathrm{ReLU}(x) = \max(0, x);$$
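A compact PyTorch sketch of this dual attention step follows: channel attention from globally pooled features, spatial attention from a 1x1 convolution, residual connection, then batch normalization and ReLU. The module name `DualAttention` and the layer shapes are assumptions made for illustration; this is a schematic reading of the text, not the patented architecture verbatim.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.channel_fc = nn.Sequential(nn.Linear(ch, ch), nn.Sigmoid())
        self.spatial_conv = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())
        self.bn = nn.BatchNorm2d(ch)

    def forward(self, f):
        b, c, _, _ = f.shape
        # Channel attention: weight each channel by its pooled importance.
        m_c = self.channel_fc(f.mean(dim=(2, 3))).view(b, c, 1, 1)
        f_ce = f * m_c
        # Spatial attention: weight each location of the enhanced map.
        m_s = self.spatial_conv(f_ce)
        f_da = f_ce * m_s
        # Residual connection keeps the original information; then BN + ReLU.
        return torch.relu(self.bn(f_da + f))

out = DualAttention(32)(torch.randn(2, 32, 56, 56))
```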
In a specific embodiment, the executing step inputs the optimized first-stage features into a second-stage encoder-decoder structure to obtain second-stage advanced semantic features, and optimizes the second-stage advanced semantic features through a second supervised attention module, so that the process of obtaining the optimized second-stage features may specifically include the following steps:
(1) Inputting the optimized first-stage features into a second-stage encoder-decoder structure, and performing multi-layer convolution and pooling operations on the optimized first-stage features to obtain second-stage encoding features;
(2) Carrying out hole (dilated) convolution on the second-stage encoding features to obtain expanded receptive field features, and carrying out feature fusion on the expanded receptive field features and the jump connection features of the first stage to obtain multi-scale semantic features;
(3) Performing deconvolution operation on the multi-scale semantic features to obtain a second up-sampling feature map, and performing self-attention mechanism processing on the second up-sampling feature map to obtain self-attention enhancement features;
(4) Residual connection is carried out on the self-attention enhancement features and the corresponding scale features of the coding stage, residual enhancement features are obtained, and a channel weight matrix of the residual enhancement features is calculated through a second supervision attention module;
(5) Weighting the residual enhancement features based on the channel weight matrix to obtain channel enhancement features, and performing spatial attention analysis on the channel enhancement features to obtain a spatial weight matrix;
(6) And weighting the channel enhancement features based on the space weight matrix, and carrying out batch normalization and nonlinear activation to obtain optimized second-stage features.
Specifically, the optimized first-stage features are input to a second-stage encoder-decoder structure and processed layer by layer through multi-layer convolution and pooling operations, extracting higher-level semantic information. The convolution operation extracts local features by convolving the kernel with the input feature map, and the pooling operation reduces the size of the feature map by downsampling the convolved feature map while retaining important feature information. The convolution operation is calculated as:

$$F_l(x,y) = \sigma\Big(\sum_{i}\sum_{j} W_l(i,j)\, F_{l-1}(x+i,\, y+j) + b_l\Big);$$

wherein $F_l(x,y)$ is the feature at position $(x,y)$ after the $l$-th layer convolution, $W_l$ are the weights of the $l$-th layer convolution kernel, $F_{l-1}$ is the feature of layer $l-1$, $b_l$ is the bias term of the layer, and $\sigma$ is an activation function (e.g., ReLU). The second-stage encoding features are obtained through these multi-layer convolution and pooling operations. Hole (dilated) convolution is then applied to the second-stage encoding features; its dilated convolution kernel expands the convolution receptive field without increasing the number of parameters, capturing more context information and giving the expanded receptive field features. The hole convolution is calculated as:

$$F_d(x,y) = \sigma\Big(\sum_{i}\sum_{j} W(i,j)\, F\big(x + r\cdot i,\ y + r\cdot j\big)\Big);$$
wherein $F_d$ is the expanded receptive field feature, $W$ are the convolution kernel weights, $r$ is the dilation rate of the hole convolution, and $F(\cdot)$ is the value at the corresponding position of the input feature map. The expanded receptive field features are fused with the jump connection features of the first stage, combining feature information of different layers into multi-scale semantic features. A deconvolution operation is performed on the multi-scale semantic features, expanding the spatial size of the feature map while retaining high-resolution detail information, to obtain the second up-sampled feature map. The deconvolution is calculated as:

$$F_{up2}(x,y) = \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} W^{T}(i,j)\, F(x-i,\, y-j);$$

wherein $F_{up2}$ is the deconvolved feature map, $W^{T}$ are the weights of the deconvolution kernel, and $k$ is the kernel size. Self-attention processing is applied to the second up-sampled feature map, weighting the features by the importance of different positions in the feature map to obtain the self-attention enhanced features. These are connected by a residual connection to the corresponding-scale features of the encoding stage, giving the residual enhancement features; the residual connection preserves the original information by adding the input features to the output features, while the attention mechanism enhances the representation. The residual connection formula is:

$$F_{res} = F_{sa} + F_{enc};$$
wherein $F_{res}$ is the residual enhancement feature, $F_{sa}$ is the self-attention enhanced feature, and $F_{enc}$ is the corresponding-scale feature of the encoding stage. A channel weight matrix of the residual enhancement features is calculated by the second supervised attention module, which weights the features by computing an importance weight for each channel to highlight important features. The channel attention weight is calculated as:

$$M_c = \sigma\big(W_c \cdot \mathrm{GAP}(F_{res})\big);$$

wherein $M_c$ is the channel attention weight, $W_c$ is a weight matrix, and $\sigma$ is an activation function (e.g., Sigmoid). The residual enhancement features are weighted based on the channel weight matrix to obtain the channel enhancement features. The weighting applies the channel attention weight to each channel of the feature map:

$$F_{ce} = M_c \odot F_{res};$$
Spatial attention analysis is then performed on the channel enhancement features, calculating an importance weight for each position in the feature map to obtain the spatial weight matrix. The spatial attention weight is calculated as:

$$M_s(x,y) = \sigma\big(W_s \cdot F_{ce}(x,y)\big);$$

wherein $M_s$ is the spatial attention weight and $W_s$ is a weight matrix. The channel enhancement features are weighted based on the spatial weight matrix to obtain the dual attention features. The weighting applies the spatial attention weight to each location of the feature map:

$$F_{da} = M_s \odot F_{ce};$$
The dual attention features are then batch-normalized and nonlinearly activated. Batch normalization reduces internal covariate shift by normalizing the features, accelerating the training process:

$$\hat{F}_c = \frac{F_c - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}};$$

wherein $\mu_c$ and $\sigma_c^2$ are the mean and variance of channel $c$, and $\epsilon$ is a small constant preventing division by zero. The nonlinear activation applies a ReLU transformation to the normalized features:

$$\mathrm{ReLU}(x) = \max(0, x);$$
This yields the optimized second-stage features.
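The second-stage path just described — convolution and pooling, hole (dilated) convolution, then deconvolution and fusion with the jump connection features — can be sketched as follows. The dilation rate, channel counts, and the point at which the jump connection features are fused are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StageTwoBlock(nn.Module):
    """Encode, widen the receptive field with a dilated conv, then upsample
    and fuse with first-stage jump-connection (skip) features."""
    def __init__(self, ch=32):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(ch, ch * 2, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))
        # padding = dilation keeps the spatial size for a 3x3 kernel.
        self.dilated = nn.Conv2d(ch * 2, ch * 2, 3, padding=2, dilation=2)
        self.deconv = nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1)

    def forward(self, x, skip):
        enc = self.encode(x)                # second-stage encoding features
        rf = torch.relu(self.dilated(enc))  # expanded receptive field features
        up = self.deconv(rf)                # second up-sampled feature map
        return up + skip                    # fuse with first-stage skip features

blk = StageTwoBlock()
y = blk(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
```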
In a specific embodiment, the process of executing step S104 may specifically include the following steps:
S1, inputting multi-scale face features into a preset face recognition model, and performing dimension reduction processing on the multi-scale face features to obtain compressed feature vectors;
s2, constructing an initial similarity threshold according to a preset initial query class to obtain initial matching parameters;
S3, carrying out first-round retrieval in a face database based on the compressed feature vector and the initial matching parameters to obtain a preliminary candidate list;
S4, calculating cosine distance and Mahalanobis distance for the face features and the compressed feature vectors in the preliminary candidate list to obtain a comprehensive distance matrix;
S5, sorting the preliminary candidate list according to the comprehensive distance matrix to obtain an updated candidate list;
S6, selecting Top-K samples from the updated candidate list, and constructing an extended query class to obtain an updated query class;
S7, calculating a new similarity threshold value based on the updated query class to obtain updated matching parameters;
S8, carrying out next iteration search according to the updated matching parameters to obtain a new candidate list;
and S9, repeating the steps S4 to S8 until the preset iteration times are reached, obtaining a final candidate list, and carrying out self-adaptive threshold screening on the final candidate list to obtain a candidate identity list of the passing person.
Specifically, the multi-scale face features are input into a preset face recognition model and subjected to dimensionality reduction, lowering the feature dimension while retaining the main information. Dimensionality reduction usually adopts methods such as principal component analysis (PCA) or linear discriminant analysis. Assume the multi-scale face features form a matrix $X$; dimensionality reduction yields a compressed feature vector $y$. The PCA dimensionality-reduction formula is:

$$y = W^{T} x;$$

wherein $W$ is a projection matrix comprising the principal components and $y$ is the feature vector after dimensionality reduction. An initial similarity threshold is constructed according to a preset initial query class to obtain the initial matching parameters. The initial query class is made up of predefined samples whose feature distribution is used to calculate the initial similarity threshold, determined from the average similarity and standard deviation between the initial query class samples:

$$\theta_0 = \mu_0 - \alpha\,\sigma_0;$$

wherein $\theta_0$ is the initial similarity threshold, $\mu_0$ is the average similarity between the initial query class samples, $\sigma_0$ is the standard deviation, and $\alpha$ is an adjustment parameter. A first round of retrieval is performed in the face database based on the compressed feature vector and the initial matching parameters; a preliminary candidate list is obtained by comparing the similarity between the compressed feature vector and the database samples. Similarity is typically computed with cosine similarity or Euclidean distance. The cosine similarity is calculated as:

$$\mathrm{sim}(a,b) = \frac{a \cdot b}{\|a\|\,\|b\|};$$

wherein $a$ and $b$ are two feature vectors and $\|\cdot\|$ is the vector norm. Cosine distance and Mahalanobis distance are then calculated for the face features in the preliminary candidate list and the compressed feature vector. Cosine distance measures the angle between two vectors, while Mahalanobis distance takes into account the covariance of the feature distribution. The Mahalanobis distance is calculated as:

$$d_M(a,b) = \sqrt{(a-b)^{T}\,\Sigma^{-1}\,(a-b)};$$
wherein $\Sigma$ is the covariance matrix of the feature distribution. The preliminary candidate list is sorted according to the comprehensive distance matrix to obtain an updated candidate list. The comprehensive distance matrix is a weighted combination of the cosine and Mahalanobis distances:

$$D = \lambda_1 D_{cos} + \lambda_2 D_M;$$

wherein $D$ is the comprehensive distance matrix, $D_{cos}$ is the cosine distance matrix, $D_M$ is the Mahalanobis distance matrix, and $\lambda_1$ and $\lambda_2$ are weight parameters. Top-K samples are then selected from the updated candidate list to construct an extended query class, giving the updated query class: the first K most similar samples are taken from the updated candidate list and their features are used to update the query class. A new similarity threshold is calculated based on the updated query class to obtain the updated matching parameters; the new threshold is determined from the average similarity and standard deviation between the updated query class samples, with the same formula as the initial similarity threshold. The next round of retrieval is performed with the updated matching parameters, and a new candidate list is obtained by comparing the similarity between the compressed feature vector and the database samples. Steps S4 to S8 are repeated until the preset number of iterations is reached, giving the final candidate list, which, after multiple rounds of iterative retrieval and screening, progressively approaches the true candidate identity. Adaptive threshold screening is applied to the final candidate list to obtain the candidate identity list of the passing person: by adjusting the similarity threshold to the actual situation, candidate identities are screened flexibly, ensuring the accuracy and reliability of the recognition result.
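A NumPy sketch of this iterative data-to-class matching follows. The distance weights, iteration count, Top-K size, and the threshold rule $\theta = \mu - \alpha\sigma$ are assumptions made for illustration; this application does not fix these values in the passage above.

```python
import numpy as np

def iterative_search(query, gallery, iters=3, k=5, alpha=1.0, w=(0.5, 0.5)):
    """Rank by a weighted sum of cosine and Mahalanobis distance, expand the
    query class from the Top-K hits each round, then threshold adaptively."""
    cov_inv = np.linalg.pinv(np.cov(gallery, rowvar=False))
    cls = query[None, :]                              # current query class
    for _ in range(iters):
        centroid = cls.mean(axis=0)
        cos_sim = (gallery @ centroid) / (
            np.linalg.norm(gallery, axis=1) * np.linalg.norm(centroid) + 1e-12)
        diff = gallery - centroid
        maha = np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))
        dist = w[0] * (1.0 - cos_sim) + w[1] * maha   # comprehensive distance
        ranked = np.argsort(dist)                     # updated candidate list
        cls = np.vstack([query[None, :], gallery[ranked[:k]]])  # Top-K expansion
    theta = cos_sim[ranked[:k]].mean() - alpha * cos_sim[ranked[:k]].std()
    return ranked[cos_sim[ranked] >= theta]           # adaptive threshold screening

gallery = np.random.randn(200, 64)
candidates = iterative_search(gallery[0] + 0.05 * np.random.randn(64), gallery)
```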
In a specific embodiment, the process of executing step S105 may specifically include the following steps:
(1) Preprocessing gait data, body type data and infrared thermal imaging data acquired by security inspection equipment to obtain standardized biological feature data, and extracting features of the standardized biological feature data to obtain multi-mode biological feature vectors;
(2) Carrying out feature alignment on the multi-mode biological feature vector and the face features in the candidate identity list to obtain an alignment feature matrix, and carrying out intra-mode relation modeling on the alignment feature matrix to obtain intra-mode enhancement features;
(3) Performing cross-modal attention analysis on the intra-modal enhancement features to obtain inter-modal interaction features, and performing feature fusion on the inter-modal interaction features to obtain fusion feature vectors;
(4) Constructing prior probability distribution according to the history identification record and the current security level setting to obtain identity prior probability, and constructing a Bayesian network model based on the fusion feature vector and the identity prior probability to obtain posterior probability distribution;
(5) And carrying out maximum posterior probability estimation on posterior probability distribution to obtain the most probable identity, and comparing the most probable identity with a preset confidence threshold to obtain a comprehensive identity recognition result.
Specifically, the gait data, body type data and infrared thermal imaging data acquired by the security inspection equipment are preprocessed. Preprocessing includes denoising, normalization and alignment, ensuring the consistency and comparability of the different modalities in subsequent processing. Denoising may use a filtering technique such as Gaussian or median filtering to remove random noise from the data. Normalization scales the data to the same range; common methods include min-max normalization and Z-score standardization. Alignment places the data of the different modalities on the same time axis and in the same spatial coordinate system through time synchronization and spatial registration. Features are then extracted from the standardized biometric data using deep learning methods such as convolutional or recurrent neural networks: recurrent networks suit gait data, which has time-series characteristics, while convolutional networks suit body type and infrared thermal imaging data, which have spatial structure. Feature extraction yields the multi-modal biometric feature vector, which is then aligned with the face features in the candidate identity list by the same time-synchronization and spatial-alignment approach, giving an alignment feature matrix. Intra-modal relation modeling is performed on the alignment feature matrix, using a graph neural network or a self-attention mechanism to establish the relations among features within each modality and obtain intra-modal enhanced features; such modeling captures the intrinsic links between features of the same modality and strengthens their expressive power. Cross-modal attention analysis is applied to the intra-modal enhanced features, using a self-attention mechanism to analyze the interactions among the different modal features and obtain inter-modal interaction features. Cross-modal attention effectively integrates information from different modalities, highlighting important features and suppressing irrelevant ones, giving a more accurate and comprehensive feature representation. The inter-modal interaction features are fused, for example by weighted averaging or a multi-layer perceptron, into a fusion feature vector. A prior probability distribution is constructed from the historical identification records and the current security-level setting; it reflects the probability of each identity under the given scene and conditions. Assume the prior probability is $P(I_k)$, wherein $I_k$ denotes the $k$-th candidate identity; the identity prior is obtained through statistical analysis and empirical modeling. Based on the fusion feature vector and the identity prior probability, a Bayesian network model is constructed, and the posterior probability distribution of each candidate identity is obtained by Bayesian inference combining the prior with the currently observed features. The Bayesian network model effectively fuses prior information with the observed data to give a more accurate probability estimate.
The Bayes theorem formula is:

$$P(I_k \mid F) = \frac{P(F \mid I_k)\, P(I_k)}{P(F)};$$

wherein $P(F \mid I_k)$ is the likelihood function, representing the probability of observing the current feature $F$ given identity $I_k$; $P(I_k)$ is the prior probability; and $P(F)$ is a normalization constant. Maximum a posteriori (MAP) estimation is performed on the posterior probability distribution, determining the most probable identity by solving for the maximum of the posterior distribution. The MAP estimate is:

$$\hat{I} = \arg\max_{I_k} P(I_k \mid F);$$
and comparing the most probable identity with a preset confidence threshold, and if the posterior probability exceeds the confidence threshold, recognizing the identity as a final comprehensive identity recognition result.
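The Bayesian decision step admits a very small sketch: posterior via Bayes' theorem, MAP selection, and a confidence-threshold check. In practice the likelihoods would come from the fused multi-modal feature model; here they are plain arrays, and the 0.8 threshold is an arbitrary example, not a value from this application.

```python
import numpy as np

def bayesian_identity(likelihoods, priors, conf_threshold=0.8):
    """likelihoods[k] = P(F | I_k), priors[k] = P(I_k)."""
    joint = likelihoods * priors            # P(F | I_k) * P(I_k)
    posterior = joint / joint.sum()         # divide by normalization constant P(F)
    k_map = int(np.argmax(posterior))       # maximum a posteriori identity
    if posterior[k_map] >= conf_threshold:
        return k_map, posterior[k_map]      # accept: open the security gate
    return None, posterior[k_map]           # reject / fall back to manual check

identity, confidence = bayesian_identity(
    likelihoods=np.array([0.02, 0.70, 0.10]),
    priors=np.array([0.3, 0.5, 0.2]))
```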
The above describes a security inspection device control method based on AI identification in the embodiment of the present application, and the following describes a security inspection device control apparatus based on AI identification in the embodiment of the present application, referring to fig. 2, one embodiment of the security inspection device control apparatus based on AI identification in the embodiment of the present application includes:
the shooting module 201 is used for shooting a passing person through an image acquisition terminal on security inspection equipment to obtain an original face image;
the processing module 202 is configured to perform adaptive gaussian smoothing and normalization processing on an original face image to obtain a preprocessed face image;
the extracting module 203 is configured to perform feature extraction on the preprocessed face image by using a multi-stage restoration network, so as to obtain multi-scale face features;
The matching module 204 is configured to input the multi-scale face features into a preset face recognition model, and obtain a candidate identity list of the passing person through iterative matching of the similarity between the data and the class;
the reasoning module 205 is configured to combine the biometric data collected by the security inspection device to perform multi-modal fusion and bayesian reasoning on the candidate identity list, so as to obtain a comprehensive identity recognition result;
And the control module 206 is used for controlling the security inspection equipment to be opened or closed according to the comprehensive identification result.
The application of the multi-stage restoration network improves the accuracy and robustness of feature extraction through the cooperation of its components. Through multi-level feature extraction and optimization, face images captured in complex environments can be processed effectively, improving recognition accuracy. The data-to-class similarity iterative matching mechanism enhances the flexibility and adaptability of face recognition: the matching strategy can be adjusted dynamically to handle different recognition scenes. Multi-modal fusion and Bayesian reasoning improve the reliability of the recognition result; by combining multiple kinds of biometric data, the system evaluates identity information more comprehensively and reduces the false recognition rate. The adaptive Gaussian smoothing and normalization process improves the quality of the original image, effectively reducing the effect of ambient noise and providing a clearer image input for subsequent recognition. The supervised attention modules optimize the feature extraction process: by automatically focusing on important feature areas, the system captures identity feature information more accurately. The multi-scale cross-stage feature fusion technique enhances the expressive power of the features, comprehensively utilizing feature information of different scales and layers and improving the accuracy and robustness of recognition. The dynamic threshold screening mechanism improves the adaptability of the system: by adjusting the identification threshold according to the real-time situation, the system maintains high efficiency and accuracy in different security environments.
The application also provides a security inspection device control device based on AI identification, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the security inspection device control method based on AI identification in the embodiments.
The present application also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, and may also be a volatile computer readable storage medium, where instructions are stored in the computer readable storage medium, where the instructions when executed on a computer cause the computer to perform the steps of the security inspection device control method based on AI identification.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied, in whole or in the part contributing to the prior art, in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
While the application has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the foregoing embodiments may be modified or equivalents may be substituted for some of the features thereof, and that the modifications or substitutions do not depart from the spirit and scope of the embodiments of the application.

Claims (9)

1. A security inspection equipment control method based on AI identification is characterized by comprising the following steps:
shooting a passing person through an image acquisition terminal on security inspection equipment to obtain an original face image;
performing adaptive Gaussian smoothing and standardization processing on the original face image to obtain a preprocessed face image;
The method comprises the steps of carrying out multi-layer convolution and pooling operations on a preprocessed face image by adopting a multi-stage restoration network to obtain a first-stage low-level feature map, carrying out up-sampling and deconvolution operations on the first-stage low-level feature map to obtain a first-stage preliminary feature representation, optimizing the first-stage preliminary feature representation through a first supervision attention module to obtain optimized first-stage features, inputting the optimized first-stage features into a second-stage encoder-decoder structure to obtain second-stage high-level semantic features, optimizing the second-stage high-level semantic features through a second supervision attention module to obtain optimized second-stage features, inputting the optimized second-stage features into a network based on the original input resolution to obtain a spatially accurate feature map, carrying out multi-scale cross-stage feature fusion on the optimized first-stage features, the optimized second-stage features and the spatially accurate feature map to obtain primary fusion features, calculating feature weights of different scales and layers according to the primary fusion features to obtain an adaptive feature weight matrix, carrying out weighted combination on the primary fusion features based on the adaptive feature weight matrix to obtain multi-scale comprehensive features, and carrying out nonlinear transformation and normalization processing on the multi-scale comprehensive features to obtain multi-scale face features;
Inputting the multi-scale face features into a preset face recognition model, and obtaining a candidate identity list of the passing person through iterative matching of similarity between data and class;
combining biological characteristic data acquired by the security inspection equipment, and carrying out multi-mode fusion and Bayesian reasoning on the candidate identity list to obtain a comprehensive identity recognition result;
And controlling the security inspection equipment to be opened or closed according to the comprehensive identity recognition result.
2. The AI-recognition-based security inspection equipment control method of claim 1, wherein the performing adaptive gaussian smoothing and normalization on the original face image to obtain a preprocessed face image comprises:
performing luminance histogram analysis on the original face image to obtain image luminance distribution characteristics, and performing self-adaptive gamma correction on the original face image according to the image luminance distribution characteristics to obtain a luminance equalization image;
Performing edge detection on the brightness equalization image to obtain an image edge intensity image, and calculating the variance of a local area according to the image edge intensity image to obtain a self-adaptive Gaussian kernel parameter;
Performing convolution operation on the brightness equalization image based on the self-adaptive Gaussian kernel parameter to obtain a smooth image after noise reduction, and performing geometric transformation and affine transformation on the smooth image to obtain a corrected face image;
Performing feature point positioning on the corrected face image according to a preset face key point model to obtain face key point coordinates, and cutting and scaling the corrected face image based on the face key point coordinates to obtain a face image with uniform size;
and carrying out pixel value normalization processing on the face images with the uniform size to obtain normalized face images, and carrying out data enhancement on the normalized face images to obtain preprocessed face images.
3. The AI-recognition-based security inspection equipment control method of claim 1, wherein the performing up-sampling and deconvolution operations according to the first-stage low-level feature map to obtain a first-stage preliminary feature representation, and optimizing the first-stage preliminary feature representation through a first supervisory attention module to obtain an optimized first-stage feature comprises:
performing bilinear interpolation upsampling on the low-level feature map of the first stage to obtain a first upsampled feature map, and performing transposition convolution operation on the first upsampled feature map to obtain a preliminary deconvolution feature map;
performing channel splicing on the preliminary deconvolution feature map and jump connection features with original resolution to obtain a fusion feature map, and performing 1x1 convolution operation on the fusion feature map to obtain a first-stage preliminary feature representation after channel compression;
calculating the channel attention weight of the first-stage preliminary feature representation through a first supervision attention module to obtain a channel attention matrix, and weighting the first-stage preliminary feature representation based on the channel attention matrix to obtain a channel enhancement feature;
Calculating the spatial attention weight of the channel enhancement feature to obtain a spatial attention matrix, and weighting the channel enhancement feature based on the spatial attention matrix to obtain a dual attention feature;
and carrying out residual connection on the dual-attention feature and the first-stage preliminary feature representation to obtain an attention optimization feature, and carrying out batch normalization and ReLU activation function operation on the attention optimization feature to obtain an optimized first-stage feature.
4. The AI-recognition-based security inspection equipment control method of claim 3, wherein inputting the optimized first-stage features into a second-stage encoder-decoder structure to obtain second-stage advanced semantic features, and optimizing the second-stage advanced semantic features through a second supervisory attention module to obtain optimized second-stage features comprises:
Inputting the optimized first-stage characteristics into a second-stage encoder-decoder structure, and performing multi-layer convolution and pooling operation on the optimized first-stage characteristics to obtain second-stage encoding characteristics;
Carrying out hole (dilated) convolution on the second-stage encoding features to obtain expanded receptive field features, and carrying out feature fusion on the expanded receptive field features and the jump connection features of the first stage to obtain multi-scale semantic features;
performing deconvolution operation on the multi-scale semantic features to obtain a second up-sampling feature map, and performing self-attention mechanism processing on the second up-sampling feature map to obtain self-attention enhancement features;
Residual connection is carried out on the self-attention enhancement features and corresponding scale features of the coding stage, residual enhancement features are obtained, and a channel weight matrix of the residual enhancement features is calculated through a second supervision attention module;
Weighting the residual enhancement features based on the channel weight matrix to obtain channel enhancement features, and performing spatial attention analysis on the channel enhancement features to obtain a spatial weight matrix;
And weighting the channel enhancement features based on the space weight matrix, and carrying out batch normalization and nonlinear activation to obtain optimized second-stage features.
5. The AI-recognition-based security inspection equipment control method of claim 4, wherein the inputting the multi-scale face features into a preset face recognition model, and obtaining the candidate identity list of the passing person through iterative matching of data-to-class similarity, comprises:
s1, inputting the multi-scale face features into a preset face recognition model, and performing dimension reduction on the multi-scale face features to obtain compressed feature vectors;
s2, constructing an initial similarity threshold according to a preset initial query class to obtain initial matching parameters;
s3, carrying out first-round retrieval in a face database based on the compressed feature vector and the initial matching parameter to obtain a preliminary candidate list;
S4, calculating cosine distance and Mahalanobis distance for the face features in the preliminary candidate list and the compressed feature vector to obtain a comprehensive distance matrix;
S5, sorting the preliminary candidate list according to the comprehensive distance matrix to obtain an updated candidate list;
s6, selecting Top-K samples from the updated candidate list, and constructing an extended query class to obtain an updated query class;
S7, calculating a new similarity threshold value based on the updated query class to obtain updated matching parameters;
s8, carrying out next iteration retrieval according to the updated matching parameters to obtain a new candidate list;
And S9, repeating the steps S4 to S8 until the preset iteration times are reached, obtaining a final candidate list, and carrying out self-adaptive threshold screening on the final candidate list to obtain the candidate identity list of the passers-by.
6. The AI-recognition-based security inspection equipment control method of claim 5, wherein the performing multi-modal fusion and bayesian reasoning on the candidate identity list in combination with the biometric data collected by the security inspection equipment to obtain a comprehensive identity recognition result comprises:
Preprocessing gait data, body type data and infrared thermal imaging data acquired by the security inspection equipment to obtain standardized biological feature data, and extracting features of the standardized biological feature data to obtain a multi-mode biological feature vector;
Performing feature alignment on the multi-modal biological feature vector and the face features in the candidate identity list to obtain an alignment feature matrix, and performing intra-modal relational modeling on the alignment feature matrix to obtain intra-modal enhancement features;
Performing cross-modal attention analysis on the intra-modal enhancement features to obtain inter-modal interaction features, and performing feature fusion on the inter-modal interaction features to obtain fusion feature vectors;
Constructing prior probability distribution according to the history identification record and the current security level setting to obtain identity prior probability, and constructing a Bayesian network model based on the fusion feature vector and the identity prior probability to obtain posterior probability distribution;
and carrying out maximum posterior probability estimation on the posterior probability distribution to obtain the most probable identity, and comparing the most probable identity with a preset confidence threshold to obtain a comprehensive identity recognition result.
7. A security inspection device control apparatus based on AI identification, for executing the security inspection device control method based on AI identification of any one of claims 1-6, the apparatus comprising:
The shooting module is used for shooting the passing person through an image acquisition terminal on the security inspection equipment to obtain an original face image;
the processing module is used for carrying out self-adaptive Gaussian smoothing and standardization processing on the original face image to obtain a preprocessed face image;
the extraction module is used for extracting the characteristics of the preprocessed face image by adopting a multi-stage restoration network to obtain multi-scale face characteristics;
the matching module is used for inputting the multi-scale face characteristics into a preset face recognition model, and obtaining a candidate identity list of the passing person through iterative matching of the similarity between data and class;
the reasoning module is used for carrying out multi-mode fusion and Bayesian reasoning on the candidate identity list by combining the biological characteristic data acquired by the security inspection equipment to obtain a comprehensive identity recognition result;
and the control module is used for controlling the security inspection equipment to be opened or closed according to the comprehensive identity recognition result.
8. The security inspection equipment control equipment based on the AI identification is characterized by comprising a memory and at least one processor, wherein the memory stores instructions;
The at least one processor invokes the instructions in the memory to cause the AI-identification-based security device control apparatus to perform the AI-identification-based security device control method of any of claims 1-6.
9. A computer-readable storage medium having instructions stored thereon, which when executed by a processor, implement the AI-recognition-based security inspection device control method of any of claims 1-6.
CN202410964886.5A 2024-07-18 2024-07-18 Security check equipment control method, device and equipment based on AI identification and storage medium Active CN118609061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410964886.5A CN118609061B (en) 2024-07-18 2024-07-18 Security check equipment control method, device and equipment based on AI identification and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410964886.5A CN118609061B (en) 2024-07-18 2024-07-18 Security check equipment control method, device and equipment based on AI identification and storage medium

Publications (2)

Publication Number Publication Date
CN118609061A CN118609061A (en) 2024-09-06
CN118609061B true CN118609061B (en) 2025-03-11

Family

ID=92563572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410964886.5A Active CN118609061B (en) 2024-07-18 2024-07-18 Security check equipment control method, device and equipment based on AI identification and storage medium

Country Status (1)

Country Link
CN (1) CN118609061B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118968665B (en) * 2024-10-09 2024-12-17 南京宜讯系统工程有限公司 Intelligent access control management method and system based on multi-mode identification and Internet of things technology
CN119322946B (en) * 2024-10-21 2025-11-21 南京邮电大学 Space data set similarity calculation method based on convolution and pooling
CN120491059B (en) * 2025-07-18 2025-10-28 夏芮智能科技有限公司 Intelligent detection and customs clearance management system for living animals based on millimeter-wave radar
CN120977024B (en) * 2025-10-21 2026-01-23 北京千方科技股份有限公司 Dynamic updating method, device, equipment and storage medium for biological characteristics

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529459A (en) * 2022-04-25 2022-05-24 东莞市兆丰精密仪器有限公司 Method, system and medium for enhancing image edge
CN117496570A (en) * 2023-11-01 2024-02-02 华能(广东)能源开发有限公司汕头电厂 Face recognition method and system based on multi-scale convolution

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241274B (en) * 2021-11-30 2023-04-07 电子科技大学 Small target detection method based on super-resolution multi-scale feature fusion
CN115862097B (en) * 2022-11-25 2025-12-16 西安交通大学 Occlusion face recognition method and device based on multi-attention multi-scale feature learning
CN117333928B (en) * 2023-12-01 2024-03-22 深圳市宗匠科技有限公司 Face feature point detection method and device, electronic equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529459A (en) * 2022-04-25 2022-05-24 东莞市兆丰精密仪器有限公司 Method, system and medium for enhancing image edge
CN117496570A (en) * 2023-11-01 2024-02-02 华能(广东)能源开发有限公司汕头电厂 Face recognition method and system based on multi-scale convolution

Also Published As

Publication number Publication date
CN118609061A (en) 2024-09-06

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 1401, Building 9, Zhongsheng Science and Technology Park, Bulan Road, Shanglilang Community, Nanwan Street, Longgang District, Shenzhen City, Guangdong Province 518000

Patentee after: Shenzhen Amway Electronics Co.,Ltd.

Country or region after: China

Address before: 518000 East fifth floor, Guiyi Industrial Park, No. 1 Xuexiang Zhonghao 1st Road, Bantian street, Longgang District, Shenzhen, Guangdong

Patentee before: Shenzhen Amway Electronics Co.,Ltd.

Country or region before: China

CP03 Change of name, title or address