1. Introduction
As an important part of the eye, the pupil’s shape, size, and dynamic changes contain rich biological information [1]. The color and texture differences between the sclera, iris, and pupil provide a rich feature basis for pupil recognition. Compared with traditional biometrics such as fingerprint and facial recognition, pupil recognition offers higher accuracy and stability [2]; in particular, it maintains a high recognition rate under illumination changes, camouflage interference, and other complex conditions, so it is widely used in access control systems, identity authentication [3], medical assistance, human–computer interaction, and other fields. Among the many biometric technologies, pupil recognition has gradually emerged as a core technology for identity authentication, medical assistance, human–computer interaction, and driver assistance because of these unique advantages. Pupil recognition relies not only on the physiological structure of the eye [4] but also on complex image processing algorithms and machine learning techniques to achieve high-precision, high-stability identification and status monitoring. Its core lies in extracting the key features of the pupil from an image and matching them against a pre-enrolled feature library to perform identity authentication or status monitoring [5]. However, high-precision pupil recognition must overcome adverse factors such as noise interference, illumination changes, and eye movement during image acquisition, as well as the variability of pupil features among individuals; the study of pupil refinement recognition is therefore particularly important.
At present, pupil recognition technology has made significant progress but still faces many challenges. On the one hand, the quality of a pupil image is affected by many factors, such as lighting conditions, eye occlusion, and camera resolution, which may degrade recognition accuracy. On the other hand, pupil features differ between individuals, for example in the complexity of iris texture and in pupil shape, which increases the difficulty of feature extraction and matching. In addition, real-time performance, security, and privacy protection must also be considered. To overcome these challenges, researchers have proposed a variety of fine pupil recognition methods. For example, Khoje, S. et al. [6] located the inner edge of the iris by gray-histogram segmentation and thresholding and used integro-differential operators to narrow the search range and locate the outer edge, achieving high-precision iris localization. In the feature extraction phase, the wavelet transform is used to extract iris texture features; addressing the shortcomings of existing feature extraction methods, improvements are made in preserving texture regions, extracting feature signals along two directions, and detecting singular points using the modulus maxima of the first derivative. Singularities are identified from the locations of the modulus maxima after the wavelet transform and then feature-coded, finally achieving efficient feature matching and completing iris recognition. Although the wavelet transform raises the level of abstraction of feature extraction to a certain extent, the extracted features are not sufficient to fully characterize the complexity and uniqueness of the pupil, especially in complex recognition tasks such as iris recognition across different races, ages, genders, or health conditions. Karthik, B. et al. [7] first designed an iris segmentation algorithm based on a convolutional neural network (CNN) to handle noise factors such as eyeglass occlusion, blur, and angle deviation in iris images. The algorithm improves the fusion of positioning and semantic information from bottom-to-top features, raising the accuracy of iris feature extraction by building a feature pyramid and introducing a bottom-up path. In the mask prediction branch, up-sampling and the convolutional block attention module (CBAM) further enhance the salient information of the feature map and improve iris segmentation accuracy. In the recognition phase, a pre-trained CNN model extracts feature descriptors from the iris images; these descriptors represent complex image features and effectively capture subtle differences in iris texture, and high-precision recognition is achieved by comparing them with iris templates in the database. However, this method mainly relies on a CNN processing visible-light iris images and does not fully consider the spectral characteristics of the pupil at different wavelengths; spectral-domain information can reveal deeper details and changes in the pupil, and the method is limited in this respect. Dronky, M. R. et al. [8] first normalized and denoised the acquired iris images in a preprocessing step to eliminate interference from external factors such as illumination and rotation. Residual information in the iris image is then extracted using ResNet or other deep learning models; the residual image highlights tiny differences and feature details, providing richer and more accurate information for subsequent recognition. A statistics-based local texture descriptor (BSIF) further processes the residual image, and finally a machine learning algorithm classifies the extracted feature vectors, achieving high-precision recognition by comparing the feature vector of the query iris image with those stored in the database. The performance of such a machine learning algorithm depends largely on the diversity and quality of the training data; if the training data do not adequately cover the spectral changes of the pupil at different wavelengths or the weak inter-class differences, the algorithm will have difficulty identifying these subtle changes. Shirke, S. D. et al. [9] first use preprocessing steps, including denoising, illumination equalization, and image enhancement, to improve image quality and the reliability of subsequent processing. A local gradient pattern is then used to extract texture features of the iris image; deep learning is introduced to further learn and represent the features extracted by the local gradient pattern, and the trained deep model classifies the extracted features, achieving high-precision recognition of long-distance iris images. Although the local gradient pattern can extract local texture features, its sensitivity to subtle texture changes may be limited; in iris images, especially remote or low-quality ones, subtle texture differences are difficult to capture accurately, and the local gradient pattern focuses more on the statistical characteristics of local regions than on global or higher-level feature representations. Conti, V. et al. [10] used advanced image processing techniques to extract spatial features of iris images, including local texture patterns, edge contours, and the geometry of the iris region, and introduced the Levenshtein distance as the metric for comparing spatial feature differences between iris images. The Levenshtein distance, however, is designed to measure the difference between two strings by counting the minimum number of insertion, deletion, and replacement operations; applying it to spatial features is not entirely appropriate, because the spatial features of iris images (texture, edges, etc.) are complex image data rather than simple strings. Using the Levenshtein distance directly therefore cannot accurately reflect the real differences between iris images, and it is difficult to capture sufficient distinguishing features.
To capture subtle changes in the pupil and ensure that the network can recognize relatively weak inter-class variations, deep learning models can automatically learn complex feature representations from large amounts of data [11] and extract more abstract and discriminative pupil features. This paper therefore studies a pupil refinement recognition method based on a deep residual network and attention mechanisms, with the goal of improving recognition accuracy and providing more reliable identity authentication and status monitoring services. The method exploits the strong performance of ResNet in image processing: its deep structure can capture more detailed feature information, meeting the demand for extracting subtle and complex features in pupil recognition tasks. By introducing attention mechanisms, the network can focus on the subtle changes and weak inter-class differences of the pupil when extracting features, improving recognition accuracy and robustness; this mechanism simulates the attention process of the human visual system and strengthens the network’s focus on important features. Combining the deep residual network with attention mechanisms leverages the advantages of both, improving the learning efficiency of the network and making it more flexible and accurate when processing complex and variable pupil images, which brings a significant performance improvement to fine pupil recognition. In summary, this study proposes a pupil refinement recognition method based on deep residual networks and attention mechanisms, contributing a new approach to current pupil recognition research that is expected to provide more reliable identity authentication and status monitoring services in practical applications.
The choice of these components is motivated as follows. Firstly, deep residual networks (ResNet) have demonstrated strong performance in image processing: residual connections effectively alleviate the gradient vanishing and exploding problems in training deep neural networks, enabling the network to learn deeper feature representations. Pupil recognition requires extracting subtle and complex features from pupil images, and the deep structure of ResNet, especially variants such as ResNet101 with more layers, can capture precisely this more detailed feature information. Secondly, the attention mechanism, which has made important progress in deep learning in recent years, simulates attention in the human visual system, allowing the network to concentrate on important regions and features; in pupil recognition, the subtle changes and weak inter-class differences of the pupil are exactly what deserve this focus. Thirdly, combining the two lets ResNet provide powerful feature extraction while attention further enhances the network’s focus on important features, improving learning efficiency and making the model more flexible and accurate on complex and varied pupil images. Finally, the selection is also based on practical performance: in previous studies, deep residual networks and attention mechanisms have achieved significant results in multiple image processing tasks, so they can be expected to perform well in fine pupil recognition as well. In summary, deep residual networks and attention mechanisms are chosen as the core components of the proposed method in order to capture subtle changes in the pupil, identify relatively weak inter-class variations, and extract more abstract and discriminative pupil features, thereby improving the accuracy and robustness of pupil recognition.
The introduction of this paper outlines the research background and applications of pupil recognition technology and highlights the challenges faced by current methods. Section 2 details the proposed pupil refinement recognition method based on a deep residual network and attention mechanisms, covering the specific design and implementation of the ResNet101 network structure, the channel attention module, and the external attention module. Section 3 presents the experimental evaluation, in which the proposed method’s recognition accuracy, stability, and processing speed are analyzed and validated through comparative experiments. Finally, Section 4 summarizes the findings and discusses potential future directions for further improving the accuracy and robustness of pupil recognition.
2. Pupil Refinement Recognition Methods
2.1. Pupil Refinement Recognition Model
The pupil refinement recognition model is built on a deep learning framework. The backbone is the deep residual network ResNet101, initialized with parameters pre-trained on ImageNet and with the task-specific classification part (the fully connected layer) removed [12]. The stride of the last residual block is set to 1 to prevent the spatial resolution of the pupil feature map from becoming too small, and a batch normalization (BatchNorm1d) layer is added so that the model inputs follow the same distribution during training, which accelerates the convergence of the network. ResNet101 is a deep convolutional neural network with 101 layers: a 7 × 7 convolutional input layer followed by four stages of residual blocks, each a bottleneck of 1 × 1, 3 × 3, and 1 × 1 convolutional layers equipped with batch normalization and ReLU activations. This structure alleviates the gradient vanishing problem through residual connections and allows deeper networks to be trained. In the pupil refinement recognition model, ResNet101 extracts features from the pupil images; its ImageNet pre-trained parameters are fine-tuned for the pupil recognition task, and the network structure is modified as above to avoid low spatial resolution of the feature maps. To process the infrared-spectrum pupil images, the parameters of the first convolution block of the backbone are kept separate for each spectrum so as to obtain the low-level characteristics specific to each pupil [13], while the parameters of the other, deeper convolution blocks are shared across the spectral data so that pupil data of the same category do not diverge too much after being mapped to the feature space.
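As a concrete illustration, the following is a minimal PyTorch sketch of how such a backbone can be prepared; the use of torchvision’s pretrained ResNet101 and the variable names are assumptions for illustration, not the paper’s exact implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet101 and drop the task-specific head.
backbone = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()  # remove the fully connected classification layer

# layer4 normally downsamples by 2; forcing stride 1 in its first block keeps
# the pupil feature map at a higher spatial resolution, as described above.
backbone.layer4[0].conv2.stride = (1, 1)
backbone.layer4[0].downsample[0].stride = (1, 1)

# BatchNorm1d over the pooled feature vector so inputs to the classifier
# follow the same distribution during training.
bn_neck = nn.BatchNorm1d(2048)

x = torch.randn(4, 3, 224, 224)   # a dummy batch of eye images
feat = bn_neck(backbone(x))       # (4, 2048) normalized pupil features
```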
The method in this article mainly consists of three parts: the backbone network ResNet101, the channel attention module, and the external attention module. The overall model structure is shown in Figure 1.
In this model, the pupil image captured by infrared spectroscopy is input and preprocessed by an image preprocessing module that removes internal noise from the image, providing high-quality data for subsequent feature extraction. The ResNet101 backbone, consisting of multiple convolutional layers, pooling layers, and residual blocks, captures subtle changes in the pupil, identifies weak inter-class variations, and extracts the distinct features of the pupil image. A channel attention module is inserted after a residual block in an intermediate layer of ResNet101; it weights the features of different channels, emphasizing the channel features useful for pupil recognition and suppressing the influence of irrelevant channels, so that more critical and useful pupil features are obtained. An external attention module is inserted at a deeper level, before another residual block or the fully connected layer; it further extracts the key features of the pupil, enhances their expression, and strengthens the model’s ability to identify subtle changes, yielding more abstract and discriminative pupil features. A global average pooling layer then compresses the spatial dimensions of the pupil feature map to reduce the amount of data for subsequent processing while retaining the key feature information. A batch normalization layer stabilizes the training of the pupil refinement recognition model and reduces the impact of internal covariate shift on performance, improving the generalization ability and recognition accuracy of the model. Finally, a Softmax classifier processes the pupil features captured by infrared spectra and outputs the refined pupil recognition results, and a loss function continuously optimizes the model to improve its ability to identify pupil images.
2.2. Pupil Feature Extraction of Backbone Network ResNet101
Preprocessing the pupil images captured by infrared spectroscopy is a crucial step before performing pupil recognition. Effective image preprocessing removes multiple noise sources from the image while preserving the detailed information of the pupil edges. First, Gaussian filtering is used to smooth the original infrared image in order to suppress the thermal and random noise of the device itself; Gaussian filtering preserves the main features of the image while reducing the impact of noise on image quality. Morphological processing is then used to remove the light spots generated by direct exposure to ambient light: by constructing appropriate structuring elements and performing erosion and dilation operations on the image, the light spots can be effectively removed and detailed information restored. Finally, motion blur estimation and restoration algorithms are used to correct the motion blur caused by eye movement or shaking of the capture device; by estimating the blur kernel and applying inverse filtering, clear edges and details in the image can be restored.
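A minimal sketch of such a preprocessing chain with OpenCV and NumPy is shown below; the file name, kernel sizes, and the assumed horizontal motion kernel are illustrative, and in practice the blur kernel would be estimated from the image rather than fixed.

```python
import cv2
import numpy as np

img = cv2.imread("pupil_ir.png", cv2.IMREAD_GRAYSCALE)  # hypothetical IR eye image

# 1) Gaussian filtering suppresses the sensor's thermal and random noise.
smoothed = cv2.GaussianBlur(img, ksize=(5, 5), sigmaX=1.0)

# 2) Morphological opening with a small elliptical element removes bright
#    specular spots while leaving the darker pupil region intact.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
despotted = cv2.morphologyEx(smoothed, cv2.MORPH_OPEN, kernel)

# 3) Simple Wiener-style inverse filtering with an assumed motion kernel.
def wiener_deblur(image, kernel, k=0.01):
    image = image.astype(np.float64) / 255.0
    padded = np.zeros_like(image)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel / kernel.sum()
    H = np.fft.fft2(padded)                       # frequency response of the blur
    G = np.fft.fft2(image)
    F = np.conj(H) / (np.abs(H) ** 2 + k) * G     # Wiener filter
    restored = np.real(np.fft.ifft2(F))
    return np.clip(restored * 255, 0, 255).astype(np.uint8)

motion_kernel = np.ones((1, 9))                   # assumed 9-pixel horizontal blur
restored = wiener_deblur(despotted, motion_kernel)
```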
The backbone network ResNet101 processes the preprocessed infrared-spectrum pupil images, capturing subtle changes in the pupils, identifying weak inter-class variations, and extracting the distinct features of the pupil images [14]; these features contain information on the variations between spectral domains.
In the backbone network ResNet101, three convolutional layers first extract preliminary pupil features from the pupil images captured by infrared spectroscopy while reducing computational complexity and keeping the same receptive field. A maximum pooling layer then compresses the pupil feature dimensions [15]. Finally, self-calibrated convolution is introduced to expand the receptive field of the convolutional layer, capture subtle changes in the pupil, identify relatively weak inter-class changes, and extract the neighboring information of pupil image features across spectral domains.
The pupil image captured by infrared spectroscopy contains a large amount of texture information, but feature extraction with the existing network structure suffers from poor representativeness. The existing structure mainly uses ordinary convolution for pupil feature extraction [16], which ignores the dependency between channels and lacks the ability to extract information neighboring the main pupil features. To address this problem, self-calibrated convolution is introduced in this paper; it can extract and fuse background information to improve the characterization of features [17]. Unlike traditional convolution, self-calibrated convolution can capture subtle changes in the pupil, recognize weak inter-class variations, and extract not only the main pupil features [18] but also the background information adjacent to them; it can therefore use the background information around the pupil texture as features to improve the performance of pupil recognition [19].
Let $X$ denote the initial pupil feature map in the infrared spectrum extracted by the convolutional layers, with size $C \times H \times W$. The input $X$ is split into two parts, $X_1$ and $X_2$, each of size $(C/2) \times H \times W$. $X_1$ is then sent into a special path that is mainly used to collect contextual information, and the output $Y_1$ is obtained by self-calibration of this branch with the convolution kernels $K_2$, $K_3$, and $K_4$. First, for the input branch $X_1$, an average pooling operation is performed, where the pooling kernel has size $r \times r$ and stride $r$. The pooled pupil features are given by Formula (1):

$T_1 = \mathrm{AvgPool}_r(X_1) \quad (1)$

where $X_1$ denotes the input preliminary pupil feature map after splitting, $T_1$ denotes the output pupil feature map after pooling, and $\mathrm{AvgPool}_r(\cdot)$ represents the average pooling operation. The feature transformation of $T_1$ using the second convolution kernel $K_2$ is shown in Formula (2):

$X_1' = \mathrm{Up}(T_1 * K_2) \quad (2)$

where $X_1'$ denotes the output information of the transformed pupil features and $\mathrm{Up}(\cdot)$ is the bilinear interpolation operator, which maps $T_1$ from the smaller feature space back to the original feature space. The calibration operation is represented by Formula (3):

$Y_1' = (X_1 * K_3) \odot \sigma(X_1 + X_1') \quad (3)$

where $Y_1'$ denotes the output of the element-by-element multiplication $\odot$, $\sigma$ is the Sigmoid function, and $K_3$ denotes the third convolutional kernel. The final output of this path is given by Formula (4):

$Y_1 = Y_1' * K_4 \quad (4)$

where the fourth convolution kernel is denoted as $K_4$ and the calibrated output is expressed as $Y_1$. In the second path, a convolution operation is performed, as shown in Formula (5):

$Y_2 = X_2 * K_1 \quad (5)$

where $K_1$ denotes the first convolution kernel, $X_2$ denotes the second part of the split input, and $Y_2$ represents the output information after convolution [20]. This operation helps the network retain spatial context information; the outputs of the two paths are fused to produce a new output $Y = [Y_1, Y_2]$. Using this property of self-calibrated convolution, the receptive field of the convolutional layer in the network can be enlarged so that feature information from several different spaces is fused into a single space, making the infrared-spectrum pupil features $F^{\mathrm{IR}}$ richer and more discriminative; the pupil features $F^{\mathrm{NIR}}$ in the near-infrared spectral domain are extracted in the same way.
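For concreteness, the block below is a minimal PyTorch sketch of a self-calibrated convolution implementing Formulas (1)-(5); the 3 × 3 kernel sizes, equal channel split, and pooling rate r = 4 are assumptions for illustration rather than the paper’s exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfCalibratedConv(nn.Module):
    def __init__(self, channels, r=4):
        super().__init__()
        c = channels // 2
        self.k1 = nn.Conv2d(c, c, 3, padding=1)  # K1: plain second path
        self.k2 = nn.Conv2d(c, c, 3, padding=1)  # K2: transform after pooling
        self.k3 = nn.Conv2d(c, c, 3, padding=1)  # K3: calibration branch
        self.k4 = nn.Conv2d(c, c, 3, padding=1)  # K4: final output convolution
        self.r = r

    def forward(self, x):
        x1, x2 = torch.chunk(x, 2, dim=1)                          # split the input
        t1 = F.avg_pool2d(x1, kernel_size=self.r, stride=self.r)   # Formula (1)
        x1p = F.interpolate(self.k2(t1), size=x1.shape[2:],
                            mode="bilinear", align_corners=False)  # Formula (2)
        y1p = self.k3(x1) * torch.sigmoid(x1 + x1p)                # Formula (3)
        y1 = self.k4(y1p)                                          # Formula (4)
        y2 = self.k1(x2)                                           # Formula (5)
        return torch.cat([y1, y2], dim=1)                          # fuse both paths

out = SelfCalibratedConv(64)(torch.randn(2, 64, 56, 56))  # shape is preserved
```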
2.3. Channel Attention Module
ResNet101 effectively alleviates the gradient vanishing problem in deep network training through residual connections and improves network performance. However, when processing pupil features, its output feature map may contain a great deal of redundant information, and the features of different channels are of different importance to the pupil recognition task. In order to make more efficient use of the effective information in the pupil features of different channels and suppress the influence of irrelevant components, a channel attention module is used to adaptively weight the pupil features.
In order to compute the channel attention efficiently [21], the module first compresses the pupil features extracted in Section 2.2 using maximum pooling and average pooling; the compressed pupil features are then nonlinearly transformed by a parameter-sharing multilayer perceptron (MLP), and finally the output pupil features are summed to complete the adaptive weight assignment. The structure of the channel attention module is shown in Figure 2.

Taking the infrared-spectrum pupil features $F^{\mathrm{IR}}$ extracted in Section 2.2 as an example, the computation of the channel weights $W_c$ can be expressed by Formula (6):

$W_c = \sigma\left(\mathrm{MLP}(F_{\max}^{l}) + \mathrm{MLP}(F_{\mathrm{avg}}^{l})\right) \quad (6)$

where $\sigma$ is the Sigmoid activation function, $\mathrm{MLP}(\cdot)$ is the output feature sequence of the input infrared-spectrum pupil feature matrix after the multi-layer perceptron, $F_{\max}^{l}$ is the maximum-pooled pupil feature of the layer-$l$ inputs, and $F_{\mathrm{avg}}^{l}$ is the global average pooling feature of the layer-$l$ inputs, which can be obtained from Formula (7):

$F_{\mathrm{avg}}^{l}(j) = \frac{1}{w \times w} \sum_{i=1}^{w \times w} F_{i}^{l}(j) \quad (7)$

where $F_{i}^{l}(j)$ is the $i$-th infrared-spectrum pupil feature value of the $j$-th neuron in layer $l$ and $w$ is the width of the pooling area.

The more critical [22,23] and useful infrared-spectrum pupil features output by the channel attention module are then:

$\tilde{F}^{\mathrm{IR}} = W_c \otimes F^{\mathrm{IR}} \quad (8)$

where $\otimes$ denotes channel-wise multiplication. Similarly, the more critical and useful pupil features $\tilde{F}^{\mathrm{NIR}}$ in the near-infrared spectral domain are obtained by the channel attention module.
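A compact PyTorch sketch of this channel attention, following Formulas (6)-(8), is given below; the reduction ratio of the shared MLP is an assumed hyperparameter.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(                    # parameter-sharing MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, f):                            # f: (B, C, H, W)
        b, c, _, _ = f.shape
        f_max = f.amax(dim=(2, 3))                   # max-pooled descriptor
        f_avg = f.mean(dim=(2, 3))                   # average-pooled descriptor, Formula (7)
        w = torch.sigmoid(self.mlp(f_max) + self.mlp(f_avg))  # Formula (6)
        return f * w.view(b, c, 1, 1)                # reweighted features, Formula (8)

f_ir = torch.randn(2, 256, 28, 28)
f_ir_refined = ChannelAttention(256)(f_ir)
```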
2.4. External Attention Module
The channel attention module mainly focuses on the importance of different channels in the feature map and adjusts the channel weights to enhance useful features and suppress useless ones. However, this approach may ignore the spatial information inside the feature map; that is, features at different locations are of different importance to the pupil recognition task. For this reason, an external attention module is constructed in which two external memory units $M_k, M_v \in \mathbb{R}^{S \times d}$ are introduced as learnable parameters. They implicitly learn the features of the whole training set of key and useful pupil features and take into account the potential connections between different samples; this connection is meaningful for processing pupil feature samples, capturing the most essential features of all pupil feature samples, and then extracting more abstract and discriminative pupil features and improving the pupil refinement recognition effect [24]. The structure of the external attention module is shown in Figure 3.

In the implementation, the key and useful pupil features extracted in Section 2.3 are taken as the features to be processed; the key and useful infrared-spectrum pupil features, for example, can be expressed as $\tilde{F}^{\mathrm{IR}} \in \mathbb{R}^{N \times d}$. The input features $\tilde{F}^{\mathrm{IR}}$ are first transformed into a new feature space, similarly to the self-attention mechanism, and the query features $Q$ are obtained by a linear transformation Linear_Q (Conv 1 × 1). The aim is to enhance the expressive power of the pupil features and extract more abstract and discriminative ones. The attention matrix $A$ between the external memory unit $M_k$ and $Q$ is then computed as:

$A = \mathrm{Norm}\left(Q M_k^{T}\right) \quad (9)$

where the element $a_{i,j}$ of $A$ represents the similarity between the $i$-th pixel in $Q$ and the $j$-th memory item in $M_k$. The double-normalization operator Norm avoids the attention failure problem caused by excessively large pupil feature vectors; it normalizes both the rows and the columns of the attention matrix obtained by the matrix multiplication. The realization process can be expressed as follows:

$\tilde{a}_{i,j} = \frac{\exp(a_{i,j})}{\sum_{k}\exp(a_{k,j})}, \qquad \hat{a}_{i,j} = \frac{\tilde{a}_{i,j}}{\sum_{k}\tilde{a}_{i,k}} \quad (10)$

The normalized similarity matrix $\hat{A}$ is then applied to the second external memory unit $M_v$ rather than to the first external memory unit $M_k$. In this way, the capacity of the network is increased, and, combined with the original pupil features, the final pupil features in the infrared spectrum processed by the external attention module are obtained as:

$F_{\mathrm{out}}^{\mathrm{IR}} = \hat{A} M_v + \tilde{F}^{\mathrm{IR}} \quad (11)$

Similarly, the final near-infrared spectral-domain pupil features $F_{\mathrm{out}}^{\mathrm{NIR}}$ processed by the external attention module are obtained.
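The following PyTorch sketch shows one way to realize this external attention per Formulas (9)-(11); the memory size S = 64 and the token layout (a flattened feature map) are assumptions.

```python
import torch
import torch.nn as nn

class ExternalAttention(nn.Module):
    def __init__(self, d, s=64):
        super().__init__()
        self.linear_q = nn.Linear(d, d)        # Linear_Q, acts like a 1x1 conv
        self.mk = nn.Linear(d, s, bias=False)  # external memory unit M_k
        self.mv = nn.Linear(s, d, bias=False)  # external memory unit M_v

    def forward(self, f):                      # f: (B, N, d) pupil feature tokens
        q = self.linear_q(f)
        a = self.mk(q)                         # Q @ M_k^T, Formula (9)
        a = torch.softmax(a, dim=1)            # double normalization, first step
        a = a / (a.sum(dim=2, keepdim=True) + 1e-9)  # second step, Formula (10)
        return self.mv(a) + f                  # A @ M_v plus residual, Formula (11)

tokens = torch.randn(2, 196, 512)              # e.g., a 14x14 feature map flattened
out = ExternalAttention(512)(tokens)
```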
2.5. Fine Pupil Recognition Based on Softmax Classifier
The Softmax classifier is used to process the pupil features $F_{\mathrm{out}}^{\mathrm{IR}}$ and $F_{\mathrm{out}}^{\mathrm{NIR}}$ obtained in Section 2.4 and output the pupil refinement recognition results.

Let $(x, y)$ be a training sample; Softmax [25,26] regression is a generalization of the logistic regression model to multi-classification problems. The function of the logistic regression model is:

$h_{\theta}(x) = \frac{1}{1 + \exp(-\theta^{T} x)} \quad (12)$

where $\theta$ denotes the pupil refinement recognition model parameters and $T$ indicates the transpose symbol.

Let $y$ be the pupil refinement identification label corresponding to the sample and $k$ be the total number of sample categories; for the dichotomous problem, $k$ takes 2.

For multi-classification problems with $k$ classes, given a sample input $x$, the probability value $p(y = j \mid x)$ is calculated for each pupil category $j$:

$h_{\theta}(x) = \begin{bmatrix} p(y = 1 \mid x; \theta) \\ p(y = 2 \mid x; \theta) \\ \vdots \\ p(y = k \mid x; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} \exp(\theta_j^{T} x)} \begin{bmatrix} \exp(\theta_1^{T} x) \\ \exp(\theta_2^{T} x) \\ \vdots \\ \exp(\theta_k^{T} x) \end{bmatrix} \quad (13)$

The pupil refinement recognition result can then be obtained using Formula (13).

When implementing Softmax [27,28] regression, $\theta$ is usually represented by a matrix obtained by listing $\theta_1, \theta_2, \ldots, \theta_k$ row by row. The probability $p(y = j \mid x; \theta)$ of identifying $x$ as category $j$ is given by:

$p(y = j \mid x; \theta) = \frac{\exp(\theta_j^{T} x)}{\sum_{l=1}^{k} \exp(\theta_l^{T} x)} \quad (14)$
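In code, the classifier of Formulas (12)-(14) reduces to a linear layer followed by a softmax; the feature dimension and class count below are illustrative assumptions.

```python
import torch
import torch.nn as nn

d, k = 2048, 200                       # assumed feature size and number of classes
classifier = nn.Linear(d, k)           # rows of the weight matrix play the role of theta_j

features = torch.randn(4, d)           # fused pupil features from Section 2.4
logits = classifier(features)          # theta_j^T x for every class j
probs = torch.softmax(logits, dim=1)   # class probabilities, Formulas (13) and (14)
pred = probs.argmax(dim=1)             # recognized pupil category for each sample
```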
2.6. Loss Function for the Pupil Refinement Recognition Model
The parameters of the pupil refinement recognition model are optimized through the loss function to improve the model’s ability to recognize pupil images and thus the pupil refinement recognition accuracy.

The loss cost function of Softmax [29,30] regression (Softmax Loss) is:

$L_S = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} \mathbf{1}\{y_i = j\} \log \frac{\exp(\theta_j^{T} x_i)}{\sum_{l=1}^{k} \exp(\theta_l^{T} x_i)} \quad (15)$

where $m$ is the number of training samples and $\mathbf{1}\{\cdot\}$ is the indicator function.

In order to ensure separability, the pupil features should be compact within each class and separated between classes. The Center Loss function is therefore introduced to address the problem of low pupil refinement recognition accuracy. The specific calculation formula of Center Loss is:

$L_C = \frac{1}{2} \sum_{i=1}^{m} \left\lVert x_i - c_{y_i} \right\rVert_2^{2} \quad (16)$

where $c_{y_i}$ indicates the center of the pupil features for class $y_i$. During training, the class centers are updated in each iteration according to Formula (17):

$\Delta c_j = \frac{\sum_{i=1}^{m} \mathbf{1}\{y_i = j\} (c_j - x_i)}{1 + \sum_{i=1}^{m} \mathbf{1}\{y_i = j\}} \quad (17)$

In the process of network training, using only the Softmax regression loss can separate the pupil categories but cannot reduce the within-class distance enough to distinguish each pupil category well. Considering the supervisory effect of Softmax [31,32] Loss and Center Loss on the data samples, this paper obtains discriminative features by using the two losses jointly to adjust the parameters of the pupil refinement recognition model. The specific loss function is shown in Formula (18):

$L = L_S + \lambda L_C \quad (18)$

where $\lambda$ indicates the weight coefficient of Center Loss.
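A minimal sketch of the joint objective of Formula (18) in PyTorch is shown below; here the class centers are held as learnable parameters so that an optimizer moves them in the spirit of Formula (17), and the weight lambda = 0.003 is an assumed value.

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats, labels):
        diff = feats - self.centers[labels]           # x_i - c_{y_i}
        return 0.5 * diff.pow(2).sum(dim=1).mean()    # Formula (16), batch-averaged

softmax_loss = nn.CrossEntropyLoss()                  # Softmax Loss, Formula (15)
center_loss = CenterLoss(num_classes=200, feat_dim=2048)
lam = 0.003                                           # assumed Center Loss weight

feats = torch.randn(8, 2048)
logits = torch.randn(8, 200)
labels = torch.randint(0, 200, (8,))
loss = softmax_loss(logits, labels) + lam * center_loss(feats, labels)  # Formula (18)
```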
3. Experimental Analysis
The experiments use the BioID Face database (https://www.bioid.com/facedb/ (accessed on 8 September 2024)). This dataset contains pupil images of the left and right eyes of 200 people acquired simultaneously in the near-infrared band; each person’s left and right eyes each provide 20 near-infrared image samples, giving a total of 8,000 images in the dataset. An overview of the pupil database is given in Table 1.
In the pupil database, pupil images captured by infrared spectra from the same person and angle were randomly selected and preprocessed using the method proposed in this paper. The preprocessing results are shown in Figure 4.
From the analysis of Figure 4, it can be seen that the original infrared-spectrum pupil image indeed contains multiple noise sources, such as thermal noise from the device itself, light spots generated by direct ambient light, and motion blur caused by eye movement or shaking of the capture device. These noise sources not only reduce the clarity of the pupil edge but may also blur or destroy edge information, seriously interfering with the accuracy of pupil recognition. With the preprocessing method proposed in this article, as shown in Figure 4b, the random noise in the image is effectively suppressed and smoothed while the detailed information of the pupil edge is preserved to the greatest extent. This denoising makes the infrared-spectrum pupil image cleaner, laying a solid foundation for subsequent feature extraction and helping to improve its accuracy and robustness.
Using the method described in this article, pupil features are extracted from the preprocessed infrared-spectrum pupil images. Taking an infrared-spectrum pupil image as an example, some pupil feature extraction results are shown in Figure 5.

The analysis of Figure 5 shows that the method in this paper can effectively extract the pupil features within the pupil image, providing favorable data support for subsequent pupil refinement recognition.
To further demonstrate the feature extraction effect of this method, the t-SNE (t-distributed stochastic neighbor embedding) algorithm is used to reduce the dimensionality of the pupil features output by the method; the result is shown in Figure 6, taking the shape and color features of infrared-spectrum pupil images as an example.

Analyzing Figure 6, it can be seen that the method successfully groups the features from different pupils, which proves that the extracted pupil features are not only representative but also form clear boundaries in the multidimensional space, realizing effective clustering. The color features reflect the color distribution of the pupil, while the shape features describe its contour and geometry; the good separation between them indicates that the method can capture and exploit both types of feature information at the same time.
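A small sketch of how such a Figure 6 style projection can be produced with scikit-learn is given below; the random features and labels are placeholders for the actual pupil feature vectors.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
features = rng.normal(size=(400, 2048))   # placeholder pupil feature vectors
labels = np.repeat(np.arange(10), 40)     # 10 identities, 40 samples each

# Project the high-dimensional features to 2-D with t-SNE.
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(features)

plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=8)
plt.title("t-SNE of extracted pupil features")
plt.show()
```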
In this pupil dataset, the pupil images of 10 individuals were randomly selected. Although this number is limited, their pupil characteristics cover different ranges of pupil center position, diameter, and shape (roundness), so from preliminary observation these 10 individuals are reasonably representative in terms of pupil characteristics. Fine pupil recognition was performed on these 10 individuals using the method described in this article, and the results are shown in Table 2.
The analysis of Table 2 shows that the method in this paper can effectively complete pupil refinement recognition, obtaining refined results for three aspects of the pupil, its center position, diameter, and shape parameters, which provide data support for identity authentication, emotion analysis, disease diagnosis, human–computer interaction, and other fields.
The iris recognition method using the wavelet transform in [6], the CNN-based method in [7], the residual-image method in [8], the local-gradient-pattern method in [9], and the Levenshtein-distance method in [10] were compared with the method in this paper, and the pupil refinement recognition effects of the six methods were analyzed. To visually display the recognition performance of each method, distribution maps of their recognition results on the validation set were drawn. The rendering process is as follows: first, the pupil refinement recognition results of the validation-set samples are obtained; then, the recognition value used to determine the category is extracted from each result according to the category discrimination benchmark, so that each tested sample corresponds to exactly one recognition value; finally, the number of extracted recognition values falling in each threshold interval is counted and a histogram of the statistics is drawn, as shown in Figure 7. In the figure, the green bars represent the statistics of the recognition results for known-class samples and the pink bars those for unknown-class samples. The detailed data are shown in Table 3.
From the analysis of Figure 7 and Table 3, it can be seen that the iris recognition method using the wavelet transform achieves a certain recognition effect, but its recognition rates for known- and unknown-class samples are relatively low, at 85.0% and 72.0%, respectively, with an average recognition rate of 78.5% and a high false recognition rate of 15.0%. This indicates that the wavelet transform is insufficiently robust in feature extraction when dealing with complex or highly variable iris images; compared with the method proposed in this article, its recognition performance is significantly lower, especially on unknown-class samples, and its false recognition rate and recognition stability are also worse. The CNN-based iris recognition method uses deep convolutional networks to learn iris features automatically, significantly improving recognition performance: the recognition rates for known- and unknown-class samples reach 90.0% and 80.0%, respectively, with an average recognition rate of 85.0% and a false recognition rate of 10.0%. This shows that CNNs have advantages in handling large-scale datasets and extracting complex features; nevertheless, compared with the method proposed in this paper, there is still room for improvement on unknown-class samples, and the recognition stability and average recognition rate are slightly lower. The residual-image iris recognition method enhances iris features by constructing residual images, improving recognition accuracy: the recognition rates for known- and unknown-class samples are 88.0% and 78.0%, respectively, with an average recognition rate of 83.0% and a false recognition rate of 12.0%. The residual-image method thus performs well on iris images with subtle differences but still lags behind the method proposed in this article, especially on unknown-class samples. The local-gradient-pattern method extracts the local gradient pattern of the iris image to construct feature vectors with high recognition accuracy: the recognition rates for known- and unknown-class samples are 92.0% and 82.0%, respectively, with an average recognition rate of 87.0% and a false recognition rate of 8.0%, showing its advantage on iris images with significant texture changes.

Although the local-gradient-pattern method performs well, compared with the method proposed in this paper there is still room for improvement on unknown-class samples, and its average recognition rate and recognition stability are slightly lower. The Levenshtein-distance iris recognition method measures the similarity between iris features with the Levenshtein distance and achieves a certain recognition effect, but its recognition rates for known- and unknown-class samples are only 86.0% and 76.0%, respectively, with an average recognition rate of 81.0% and a false recognition rate of 14.0%, indicating limited effectiveness on significantly deformed iris images. Compared with the method proposed in this article, it shows a clear gap in recognition performance, especially on unknown-class samples (88.0% vs. 76.0%), and its false recognition rate and recognition stability are also worse. The method proposed in this paper achieves fine recognition of iris images by optimizing the feature extraction and classification strategies: the recognition rates for known- and unknown-class samples are as high as 95.0% and 88.0%, respectively, with an average recognition rate of 91.5%, a false recognition rate of only 5.0%, and a recognition stability index of 9.0, indicating significant advantages in the field of fine pupil recognition. In summary, compared with the other five iris recognition methods, the proposed method demonstrates superior performance and stability in fine pupil recognition. This is mainly due to its optimized feature extraction and classification strategies, which produce a clear separation of recognition values between known- and unknown-class samples on the distribution map, reducing confusion between categories and improving recognition accuracy and stability. The proposed method therefore has broad application prospects in the field of fine pupil recognition.
In order to verify the effectiveness of each module in the pupil refinement recognition method based on a deep residual network and attention mechanism, the contributions of the deep residual network ResNet101, the channel attention module, the external attention module, and the Softmax classifier were verified by comparing the pupil refinement recognition effects under different module combinations. A dataset of pupil images captured by infrared spectroscopy was used to ensure sufficient samples for reliable experiments, and the same deep learning framework and hardware configuration were used to ensure the fairness of the results. Accuracy, recall, and F1 score were used as evaluation metrics. The proposed method was also compared with pupil detection methods based on the Hough transform, edge detection, and machine learning algorithms. The specific experimental results are shown in Table 4.
According to the analysis of Table 4, the proposed pupil refinement recognition method based on a deep residual network and attention mechanism shows significant advantages in accuracy, recall, and F1 score over the traditional pupil detection methods based on the Hough transform, edge detection, and machine learning algorithms. Its accuracy reaches 98.5%, its recall 99.0%, and its F1 score 98.8%, all superior to the comparison methods. When the channel attention module or the external attention module is removed, the recognition effect is still better than that of the traditional methods, but the accuracy, recall, and F1 score all decrease compared with the complete model, which verifies the important role of the two attention modules in improving fine pupil recognition. In summary, this method not only achieves high-precision pupil recognition but also proves the effectiveness of each module through ablation experiments, providing new ideas and technical support for the field of fine pupil recognition.
4. Conclusions
The pupil, as a part of human biometric features, has a unique texture and shape; a refined recognition method can capture its fine features more accurately and thus improve the overall accuracy of biometrics. This paper therefore studies a pupil refinement recognition method based on deep residual networks and attention mechanisms. Using the deep residual network ResNet101 as the backbone for feature extraction, combined with an image preprocessing module, the method can effectively process pupil images captured by infrared spectra, not only removing the internal noise of the images but also capturing the subtle changes and weak inter-class differences of the pupils through the powerful feature extraction ability of ResNet101. Furthermore, by introducing the channel attention module and the external attention module, the method filters out the key features of the pupil and enhances their information expression, thereby extracting more abstract and discriminative pupil features. Finally, the Softmax classifier processes these features to achieve fine-grained recognition of the pupil.
Although the method in this paper has achieved remarkable results, there are still some challenges and problems to be solved. Future research can be carried out from the following aspects:
- (1) Algorithm optimization and efficiency improvement: In response to the complexity and computational requirements of current deep learning models, the model structure and parameters will be further optimized to improve training efficiency and recognition speed. By exploring more efficient feature extraction and fusion methods, efforts will be made to reduce computational complexity and improve the real-time performance of the algorithm to meet the fast-response requirements of practical application scenarios.
- (2) Multi-source data fusion: To further improve the accuracy and robustness of pupil recognition, other biometric information (iris texture, corneal reflection, etc.) will be combined with pupil information for multi-source data fusion. By integrating multiple biological features and fully exploiting their complementarity, the overall performance of the recognition system can be improved and its adaptability to complex environments enhanced.
- (3) Cross-scene adaptability: In response to the various lighting conditions, posture changes, and occlusions encountered in practical applications, more flexible pupil recognition algorithms will be studied. By introducing adaptive learning mechanisms or transfer learning methods, the model can adapt to the data distributions of different scenarios and thus maintain good recognition performance across them, broadening the application scope of pupil recognition technology and improving its practicality and reliability in real scenarios.
In summary, this study has achieved preliminary results in fine pupil recognition, but further efforts are still needed to overcome existing challenges and promote the sustained development of technology. Future research will focus on exploring algorithm optimization, multi-source data fusion, and cross-scene adaptability in order to achieve more efficient, accurate, and robust pupil recognition technology.