A Comprehensive Survey of Masked Faces: Recognition, Detection, and Unmasking
Figure 1. Illustration showcasing the tasks of masked face recognition (MFR), face mask recognition (FMR), and face unmasking (FU), with varied outputs for the same input.
Figure 2. The evolving landscape of MFR and FMR studies from 2019 to 2024. The data were sourced from Scopus using the keyword “Masked face recognition” for MFR and the keywords “Face mask detection”, “Face masks”, and “Mask detection” for FMR.
Figure 3. Samples of masked and unmasked faces from the real-mask masked face datasets used in masked face recognition.
Figure 4. Samples from real masked face datasets used in face mask recognition.
Figure 5. Samples of synthetic masked faces from benchmark datasets.
Figure 6. Illustration of the FMR-Net architecture for face mask recognition, depicting two subtask scenarios: 2-class (with and without mask) and 3-class (with, incorrect, and without mask).
Figure 7. Overview of the GAN network as an example of FU-Net for face mask removal.
Figure 8. Face unmasking outputs from three state-of-the-art models: GANMasker, GUMF, and FFII-GatedCon. The first column shows the input masked face, while the second column displays the original unmasked face for reference.
Figure 9. Three directions in masked face recognition (MFR): face restoration, masked region discarding, and deep learning-based approaches.
Abstract
1. Introduction
1.1. Challenges in MFR
- Scarcity of Datasets: The scarcity of datasets tailored for masked face recognition constitutes a pivotal challenge. Training any deep learning model requires a robust dataset, yet the shortage of publicly available datasets featuring masked faces complicates the development of effective MFR methods. Researchers tackling this challenge often resort to creating synthetic datasets by introducing masks to existing public face datasets like CASIA-WebFace [4], CelebA [5], and LFW [6]. To simulate masked–unmasked pairs, popular methods involve using deep learning-based tools such as MaskTheFace [7] or leveraging generative adversarial networks like CycleGAN [8]. Manual editing using image software, exemplified by the approach in [3], further supplements dataset generation efforts (a minimal mask-overlay sketch is given after this list).
- Training on Synthetic Data: While synthetic datasets offer a practical solution when real-world masked face datasets are scarce, relying solely on AI-generated data introduces certain challenges. Models trained exclusively on synthetic data may overfit to the specific features or artifacts inherent in the data generation process, rather than learning to recognize real-world occlusions. This can result in diminished performance when these models are deployed in real-world scenarios involving masked faces. To mitigate this risk, it is essential to balance the use of synthetic datasets with real-world data or to apply fine-tuning on real-world samples, ensuring the model’s generalizability. Additionally, models trained on AI-generated masks may develop an oversensitivity to synthetic artifacts, further necessitating adversarial training approaches to enhance robustness across a wide range of mask types, whether synthetic or real.
- Dataset Bias: In addition to the scarcity of publicly masked datasets, a prominent challenge lies in the bias inherent in existing benchmark datasets for MFR. Many widely used datasets exhibit a notable skew towards specific demographics, primarily favoring male and Caucasian or Asian individuals. This bias introduces a risk of developing MFR systems that may demonstrate reduced accuracy when applied to individuals from other demographic groups. To mitigate dataset bias in MFR, efforts should be directed towards creating more inclusive and representative benchmark datasets. This involves intentionally diversifying dataset populations to encompass a broader spectrum of demographics, including gender, ethnicity, and age.
- Occlusion Complexity: The complexity introduced by facial occlusion, particularly the masking of the mouth, poses a significant hurdle to existing face recognition methods. The diverse sizes, colors, and types of masks exacerbate the challenge, impacting the training of models for various masked face tasks, including recognition, detection, and unmasking. Strategies to address this complexity vary by task. Recognition methods may employ attention models [9,10] that focus on the upper half of the face or exclusively train on this region. Another approach involves using face mask removal methods as a pre-step before recognition. In unmasking tasks, researchers may introduce a pre-stage to detect the mask area, as demonstrated by generating a binary mask map in the first stage in [2]. Training datasets are further diversified by incorporating various mask types, colors, and sizes to enhance model robustness. These nuanced approaches aim to unravel the intricacies posed by occlusions, ensuring the adaptability of masked face recognition methodologies.
- Real-Time Performance: Integrating masked face recognition into real-world scenarios poses intricate challenges, given the variability in lighting conditions, diverse camera angles, and environmental factors. Maintaining a consistent performance amid these dynamic variables is a significant hurdle. Practical applicability across diverse settings necessitates real-time capabilities for MFR systems. However, the computational demands of deep learning-based MFR methods present a challenge, particularly when striving for real-time functionality on resource-constrained mobile devices. Addressing these real-time performance challenges involves a strategic optimization approach. Efforts focus on enhancing the efficiency of deep learning models without compromising accuracy.
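To make the synthetic-data route above concrete, the sketch below overlays a transparent mask texture on dlib facial landmarks. This is only the general idea behind tools such as MaskTheFace, not their actual implementation; the file paths, the 68-point landmark model, and the chosen anchor landmarks are illustrative assumptions.

```python
# Hypothetical sketch of landmark-based mask overlay for synthetic data
# generation; paths, landmark model, and anchor points are assumptions,
# not the code of MaskTheFace or any surveyed method.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def add_synthetic_mask(face_bgr, mask_rgba):
    """Warp a mask texture (with alpha channel) onto the lower face."""
    rects = detector(cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY))
    if not rects:
        return face_bgr
    pts = predictor(face_bgr, rects[0])
    # 68-point convention: 2/14 = jaw sides, 8 = chin tip, 29 = nose bridge.
    left, right = pts.part(2), pts.part(14)
    chin, nose = pts.part(8), pts.part(29)
    dst = np.float32([[left.x, nose.y], [right.x, nose.y], [chin.x, chin.y]])
    h, w = mask_rgba.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w // 2, h]])
    M = cv2.getAffineTransform(src, dst)
    warped = cv2.warpAffine(mask_rgba, M,
                            (face_bgr.shape[1], face_bgr.shape[0]))
    alpha = warped[..., 3:] / 255.0          # blend via the mask's alpha
    out = face_bgr * (1 - alpha) + warped[..., :3] * alpha
    return out.astype(np.uint8)

masked = add_synthetic_mask(cv2.imread("face.jpg"),
                            cv2.imread("mask.png", cv2.IMREAD_UNCHANGED))
cv2.imwrite("face_masked.jpg", masked)
```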
1.2. Applications of MFR
- Security and Access Control: Strengthening security measures to achieve precise identification, especially in scenarios involving individuals wearing masks. Seamlessly integrating with access control systems to guarantee secure entry across public and private spaces, including restricted areas like airports, government buildings, and data centers. Additionally, implementing facial recognition-based door locks for both residential and office settings, enhancing home and workplace security. Enabling employee authentication protocols for secure entry into workplaces.
- Public Safety: MFR plays a crucial role in safeguarding public safety in crowded spaces. Integrated seamlessly with surveillance systems, MFR empowers law enforcement with enhanced monitoring and rapid response capabilities. This technology aids in identifying suspects and missing persons involved in criminal investigations, proactively detects suspicious activity in public areas, swiftly pinpoints individuals involved in disturbances, and strengthens security measures at events and gatherings. MFR’s potential to enhance public safety and create a secure environment is undeniable.
- Healthcare: Ensuring secure access to medical facilities and patient records, along with verifying the identity of both patients and healthcare workers. Implementing contactless patient tracking to elevate healthcare services while simultaneously fortifying security and privacy within healthcare settings.
- Retail and Customer Service: Delivering tailored and efficient customer service by recognizing individuals, even when their faces are partially obscured. Additionally, optimizing payment processes to elevate the overall shopping experience.
- Human–Computer Interaction: Facilitating secure and personalized interactions with user-authenticated devices while also improving the user experience across a spectrum of applications, including smartphones, computers, and smart home devices.
- Workplace and Attendance Tracking: Facilitating contactless attendance tracking for employees in workplace settings, thereby reinforcing security measures to grant access exclusively to authorized individuals in designated areas.
- Education Institutions: Overseeing and securing entry points in educational institutions to safeguard the well-being of students and staff. Streamlining attendance tracking in classrooms and campus facilities for enhanced efficiency.
The main contributions of this survey are as follows:
- An in-depth exploration of MFR, FMR, and FU within the framework of deep learning methodologies, highlighting the challenges inherent in identifying individuals with partially obscured facial features.
- A comprehensive exploration of evaluation metrics, benchmarking methods, and diverse applications of masked face recognition across security, healthcare, and human-computer interaction domains.
- A detailed analysis of critical datasets and preprocessing methodologies essential for training robust masked face recognition models.
- Tracing the evolutionary trajectory of face recognition within the deep learning paradigm, providing insights into the development of techniques tailored for identifying and verifying individuals under various degrees of facial occlusion.
2. Related Surveys
3. Masked Face Datasets
3.1. Real Mask Datasets
Dataset | Size | Identities/Classes | Access | Aim | Year |
---|---|---|---|---|---|
MAFA [19] | 30,811 | 3 | Public | FMR | 2017 |
MD (Kaggle) [20] | 853 | 3 | Public | FMR | 2019 |
RMFRD [17] | 5000/90,000 | 525 | Public | MFR | 2020 |
MFR2 [7] | 269 | 53 | Public | MFR | 2020 |
MFSR-REC [21] | 11,615 | 1004 | Private | MFR | 2020 |
MFI [22] | 4916 | 669 | Private | MFR | 2020 |
MFV [22] | 400 | 200 | Private | MFR | 2020 |
MFDD [23] | 24,771 | 2 | Private | FMR | 2020 |
FMD (AIZOOTech) [24] | 7971 | 2 | Public | FMR | 2020 |
Moxa3K [25] | 3000 | 2 | Public | FMR | 2020 |
FMLD [26] | 41,934 | 3 | Public | FMR | 2021 |
Sunil’s custom dataset [27] | 7500 | 2 | Public | FMR | 2021 |
Jun’s practical dataset [28] | 4672 | 3 | Private | FMR | 2021 |
ISL-UFMD [29] | 21,316 | 3 | Public | FMR | 2021 |
PWMFD [30] | 9205 | 3 | Public | FMR | 2021 |
WMD [31] | 7804 | 1 | Public | FMR | 2021 |
WMC [31] | 38,145 | 2 | Public | FMR | 2021 |
COMASK20 [32] | 2754 | 300 | Public | MFR | 2022 |
MDMFR (MFR) [33] | 2896 | 226 | Public | MFR | 2022 |
MDMFR (FMD) [33] | 6006 | 2 | Public | FMR | 2022 |
TFM [34] | 107,598 | 2 | Private | FMR | 2022 |
BAFMD [35] | 6264 | 2 | Public | FMR | 2022 |
FMDD [18] | 300,988 | 75,247 | Private | MFR | - |
3.2. Synthetic Mask Datasets
4. Evaluation Metrics
- Accuracy is a fundamental evaluation metric utilized across various domains, including facial recognition tasks. It represents the proportion of correct predictions relative to the total number of samples and is formally defined in Equation (1). (The standard forms of Equations (1)–(16) referenced throughout this list are restated in the block after the list.)
- ERR (Error Rate) is a crucial metric utilized in diverse classification tasks, offering valuable insights into model accuracy by measuring misclassifications relative to dataset size. Unlike accuracy, ERR accounts for both false positives and false negatives, providing a comprehensive assessment of model performance. Its sensitivity to imbalanced data underscores its importance, making it an essential tool for evaluating classification accuracy. Mathematically, ERR is calculated by dividing the sum of false positive and false negative predictions by the total number of instances, as shown in Equation (2).
- Precision quantifies the proportion of accurate positive identifications among all the positive matches detected, and it can be formally expressed as depicted in Equation (3).
- Recall, also known as sensitivity or true positive rate, measures the proportion of true positive instances correctly identified by the system out of all actual positive instances. It is formally defined as shown in Equation (4).
- F1-Score is a pivotal evaluation metric that represents the harmonic mean of precision and recall. This metric offers a balanced measure of the facial recognition model’s performance, accounting for both false positives and false negatives. Particularly valuable for imbalanced datasets, the F1-Score provides a comprehensive assessment of model performance. Unlike accuracy, which may overlook certain types of errors, the F1-Score considers both false positives and false negatives, rendering it a more reliable indicator of a model’s effectiveness. Its calculation is demonstrated in Equation (5).
- ROC (Receiver Operating Characteristic) curves [52] graphically represent the trade-off between sensitivity (true positive rate) and specificity (true negative rate) across various threshold values. This visualization aids in selecting an optimal threshold that strikes a balance between true positive and false positive recognition rates. By examining the ROC curve, decision makers can effectively assess the performance of a classification model and make informed decisions about threshold selection.
- AUC (Area Under the Curve) is a pivotal evaluation metric in classification tasks, offering a comprehensive assessment of a model’s performance. It quantifies the discriminative power of the model across varying threshold values, providing insights into its ability to correctly classify positive and negative instances. A higher AUC value signifies stronger discrimination, indicating a superior model performance. Conversely, an AUC value of 0.5 suggests that the model’s predictive ability is no better than random chance. AUC is instrumental in gauging the effectiveness of classification models and is widely utilized in performance evaluation across diverse domains.
- Confusion Matrix provides a detailed breakdown of the model’s predictions, including true positives, true negatives, false positives, and false negatives. It serves as a basis for computing various evaluation metrics and identifying areas for improvement.
- FAR (False Acceptance Rate) serves as a focused gauge of security vulnerabilities, offering precise insights into the system’s efficacy in thwarting unauthorized access attempts. This pivotal metric plays a crucial role in evaluating the overall security effectiveness of biometric authentication systems, thereby guiding strategic endeavors aimed at bolstering the system reliability and mitigating security threats. Equation (6) delineates its formula, providing a quantifiable framework for assessing system performance.
- FRR (False Rejection Rate) is a crucial metric for evaluating system usability, representing the likelihood of the system inaccurately rejecting a legitimate identity match. Its assessment is integral to gauging the user-friendliness of the system, with a high FRR indicating diminished usability due to frequent denial of access to authorized individuals. Conversely, achieving a lower FRR is essential for improving user satisfaction and optimizing access procedures. The calculation of FRR is depicted in Equation (7).
- EER (Equal Error Rate) denotes the threshold on the ROC curve where the false acceptance rate (FAR) equals the false rejection rate (FRR), signifying the equilibrium point between false acceptance and false rejection rates. A lower EER signifies a superior performance in achieving a balance between these two error rates.
- Specificity, also known as the true negative rate, gauges the system’s proficiency in accurately recognizing negative instances. Specifically, it assesses the system’s capability to correctly identify individuals who are not the intended subjects. Mathematically, specificity is calculated using Equation (8). This metric offers valuable insights into the system’s performance in correctly classifying negatives, contributing to its overall effectiveness and reliability.
- Rank-N Accuracy is a widely employed metric in facial recognition tasks that assesses the system’s capability to prioritize the correct match within the top N retrieved results. It quantifies the percentage of queries for which the correct match is positioned within the top-N-ranked candidates. In the Rank-N Identification Rate evaluation, the system’s output is considered accurate if the true identity of the input is within the top N identities listed by the system. For instance, in a Rank-1 assessment, the system is deemed correct if the true identity occupies the top spot. Conversely, in a Rank-5 evaluation, the system is considered accurate if the true identity is among the top 5 matches. A higher Rank-N Accuracy signifies superior performance in identifying the correct match among the retrieved candidates, providing valuable insights into the system’s efficacy in real-world scenarios. Mathematically, it is represented as depicted in Equation (9).
- Intersection over Union (IoU) quantifies the extent of spatial overlap between the predicted bounding box (P) and the ground truth bounding box (G). Its mathematical representation is shown in Equation (10).
- AP (Average Precision) serves as a crucial measure in assessing object detection systems. It provides insight into how effectively these systems perform across different confidence thresholds, by evaluating their precision–recall performance. AP computes the average precision across all recall values, indicating the model’s ability to accurately detect objects at varying confidence levels. This calculation involves integrating the precision–recall curve, as demonstrated in Equation (11). By considering the precision–recall trade-off comprehensively, AP offers a holistic evaluation of the detection method’s effectiveness.
- mAP (Mean Average Precision) enhances the notion of AP by aggregating the average precision values across multiple object classes. It offers a unified metric summarizing the overall performance of the object detection model across diverse object categories. Mathematically, mAP is calculated as the average of AP values for all classes, as illustrated in Equation (12).
- Dice Score, also known as the Dice Coefficient, is a metric commonly used in image segmentation tasks to assess the similarity between two binary masks or segmentation maps. It quantifies the spatial overlap between the ground truth mask (A) and the predicted mask (B), providing a measure of segmentation accuracy. The Dice Score equation compares the intersection of A and B with their respective areas, as defined in Equation (13).
- PSNR (Peak Signal-to-Noise Ratio) is widely employed in image inpainting to assess the quality of image generation or reconstruction. It quantifies the level of noise or distortion by comparing the maximum possible pixel value to the mean squared error (MSE) between the original and reconstructed images, as depicted in Equation (14).
- SSIM (Structural Similarity Index Measure) [53] evaluates the similarity between two images by considering their luminance, contrast, and structure. It provides a measure of perceptual similarity, accounting for both global and local image features. SSIM is calculated by comparing the luminance, contrast, and structure similarity indexes, as expressed in Equation (15).
- FID (Fréchet Inception Distance) [54] serves as a metric for assessing the likeness between two sets of images. It quantifies the disparity between feature representations of real and generated images within a learned high-dimensional space, typically modeled by a pre-trained neural network. A lower FID score denotes a higher degree of resemblance between the datasets. The calculation of FID involves the application of the Fréchet distance formula, as depicted in Equation (16).
- NIQE (Naturalness Image Quality Evaluator) [55] assesses the quality of an image based on natural scene statistics. It evaluates the level of distortions introduced during image acquisition or processing, providing a measure of image fidelity. NIQE computes the deviation of the image from the expected natural scene statistics, with higher scores indicating greater image distortion.
- BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator) [56] is a no-reference image quality assessment metric. It evaluates the perceived quality of an image by analyzing its spatial domain features, such as local sharpness and contrast. BRISQUE computes a quality score based on the statistical properties of these features, with lower scores indicating higher image quality.
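The bodies of Equations (1)–(16) referenced above did not survive into this version of the text. The block below restates the standard definitions these metrics correspond to, using TP, TN, FP, and FN for true/false positives/negatives, P and G for predicted and ground-truth boxes, A and B for segmentation masks, and (μ, Σ) for feature statistics; these are the textbook forms and may differ cosmetically from the paper’s typeset equations.

```latex
\begin{align}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1}\\
\text{ERR} &= \frac{FP + FN}{TP + TN + FP + FN} \tag{2}\\
\text{Precision} &= \frac{TP}{TP + FP} \tag{3}\\
\text{Recall} &= \frac{TP}{TP + FN} \tag{4}\\
F_1 &= 2\cdot\frac{\text{Precision}\cdot\text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}\\
\text{FAR} &= \frac{FP}{FP + TN} \tag{6}\\
\text{FRR} &= \frac{FN}{FN + TP} \tag{7}\\
\text{Specificity} &= \frac{TN}{TN + FP} \tag{8}\\
\text{Rank-}N~\text{Accuracy} &= \frac{\#\{\text{queries with true identity in top } N\}}{\#\{\text{queries}\}} \tag{9}\\
\text{IoU} &= \frac{|P \cap G|}{|P \cup G|} \tag{10}\\
\text{AP} &= \int_{0}^{1} p(r)\,dr \tag{11}\\
\text{mAP} &= \frac{1}{C}\sum_{c=1}^{C} \text{AP}_c \tag{12}\\
\text{Dice} &= \frac{2\,|A \cap B|}{|A| + |B|} \tag{13}\\
\text{PSNR} &= 10\log_{10}\frac{MAX^2}{\text{MSE}} \tag{14}\\
\text{SSIM}(x,y) &= \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{15}\\
\text{FID} &= \lVert \mu_r - \mu_g \rVert^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right) \tag{16}
\end{align}
```

As a small companion example, the FAR/FRR trade-off and the EER operating point can be read off a ROC computation; the sketch below uses scikit-learn, and the score arrays are placeholder values.

```python
# Sketch: FAR, FRR, and the EER operating point from verification scores;
# y_true / y_score are placeholder arrays, not data from the survey.
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([1, 1, 0, 0, 1, 0])            # 1 = genuine pair, 0 = impostor
y_score = np.array([0.9, 0.7, 0.6, 0.2, 0.8, 0.4])

far, tpr, thr = roc_curve(y_true, y_score)        # FAR = false positive rate
frr = 1 - tpr                                     # FRR = miss rate
i = int(np.argmin(np.abs(far - frr)))             # threshold where FAR ~ FRR
print(f"EER ~= {(far[i] + frr[i]) / 2:.3f} at threshold {thr[i]:.2f}")
```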
5. Masked Face Methods
5.1. Face Mask Recognition Approaches
5.1.1. Convolutional Neural Network
5.1.2. Multi-Stage Detection
Model | Year | Dataset | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|---|---|
Fine-Tuning of InceptionV3 [64] | 2020 | SMFD | 100% | 100% | - | - |
MobileNetV2 + SVM [65] | 2020 | Private Dataset | 97.11% | 95.08% | 94.84% | - |
ResNet50 + SVM + ensemble algorithm [66] | 2021 | RMFRD | 99.64% | - | - | - |
 | | SMFD | 99.49% | - | - | - |
 | | LFW | 100% | - | - | - |
Faster R-CNN + InceptionV2 + BLS [31] | 2021 | WMD Simple Scene | - | 96.46% | 98.20% | 97.32% |
 | | WMD Complex Scene | - | 94.22% | 88.24% | 91.13% |
Max pooling [59] | 2021 | RMFRD + SMFD + Own Dataset | 98.67% | - | - | - |
Average pooling [59] | | | 96.23% | - | - | - |
MobileNetV2 [59] | | | 99.82% | - | - | - |
CNN [60] | 2021 | Kaggle Dataset | 98% | 98% | 97% | 98% |
ResNet50 + bounding box transformation [61] | 2021 | MAFA (face detection) | - | 99.2% | 99% | - |
 | | MAFA (mask detection) | - | 98.92% | 98.24% | - |
RetinaFace + CNN (NASNetMobile) [62] | 2021 | RMFRD + Larxel (Kaggle) | 99.23% | 98.28% | 100% | 99.13% |
RetinaFace + CNN (DenseNet121) [62] | | | 99.49% | 99.70% | 99.12% | 99.40% |
SSDMNV2 [67] | 2021 | Self-made Dataset of Masked Faces | 92.64% | - | - | 93% |
Fusion Transfer Learning [68] | 2022 | RMFRD and MAFA | 97.84% | - | 97.87% | 98.13% |
Customized CNN + Image Preprocessing Techniques [63] | 2023 | RILFD | 97.25% | 96.20% | 97.34% | 96.77% |
 | | MAFA | 95.74% | - | 94.29% | - |
 | | MOXA | 94.37% | - | 95.28% | - |
 | | RMFRD | 99.63% | - | 99.69% | - |
SSD, ResNet-50, and Deep Siamese Neural Network [69] | 2023 | RMFRD + Larxel | 98.24% | - | - | - |
MobileNetV2 and Caffe-based SSD [70] | 2023 | Efficient Face Mask Dataset | 97.81% | - | - | 98% |
CMNV2 [71] | 2023 | Prajna Bhandary dataset | 99.64% | 100% | 99.28% | 99.64% |
Model | Year | Dataset | Accuracy | Precision | Recall | F1-Score | AP | mAP |
---|---|---|---|---|---|---|---|---|
YOLOv3 [27] | 2021 | MAFA and Wider Face | - | - | - | - | 55% | - |
Faster R-CNN [27] | | | - | - | - | - | 62% | - |
SE-YOLOv3 [30] | 2021 | PWMFD | - | - | - | - | 73.7% | - |
 | | | - | - | - | - | AP@50: 99.5% | - |
 | | | - | - | - | - | AP@75: 88.7% | - |
Improved YOLO-v4 (CSPDarkNet53) [72] | 2021 | RMFRD and MaskedFace-Net | - | 93.6% | 97.9% | 95.7% | 84.7% | - |
YOLOv5 [73] | 2021 | Kaggle and MakeML | 96.5% | - | - | - | - | - |
Efficient-YOLOv3 [68] | 2022 | Face Mask Dataset | - | - | - | - | 98.18% | 96.03% |
FMD-Yolo [74] | 2022 | Kaggle (Face Mask Detection Dataset) | - | - | - | - | - | 66.4% |
 | | VOC Mask | - | - | - | - | - | 57.5% |
YOLOv5s-CA [75] | 2023 | Kaggle + Created Dataset from YouTube | - | 95.9% | 92.3% | 94% | - | mAP@0.5: 96.8% |
AI-Yolo [76] | 2023 | Kaggle (WMD-1) | - | - | - | 89.3% | - | 94.1% |
 | | Kaggle (WMD-2) | - | - | - | 78.6% | - | 90.7% |
YOLOv8 [77] | 2023 | Face Mask Detection (FMD) | - | 95% | 95% | - | - | mAP@0.5: 96% |
5.1.3. Single Shot Detector
5.1.4. Transfer Learning
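Many of the tabled detectors follow the same transfer-learning recipe: freeze an ImageNet backbone and train a small classification head for the 2- or 3-class mask task. Below is a minimal Keras sketch of that recipe; the dataset directory, input size, and hyperparameters are illustrative assumptions, not settings from any surveyed paper.

```python
# Minimal transfer-learning sketch for 2-class face mask recognition in the
# spirit of the MobileNetV2-based entries above; paths and hyperparameters
# are placeholder assumptions.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze ImageNet features; train only the head

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1,
                              input_shape=(224, 224, 3)),  # map to [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(2, activation="softmax"),  # with / without mask
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# assumed layout: masks/train/<class_name>/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "masks/train", image_size=(224, 224), batch_size=32)
model.fit(train_ds, epochs=5)
```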
5.1.5. YOLO (You Only Look Once)
5.2. Face Unmasking Approaches
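Most unmasking pipelines surveyed here follow a map-then-inpaint pattern: predict a binary map of the masked region (as in the first stage of [2]), synthesize content for that region, and composite it with the untouched pixels. The PyTorch sketch below shows only that composition logic with toy stand-in networks; it is not any of the surveyed architectures.

```python
# Sketch of two-stage unmasking composition (mask map, then generation);
# both sub-networks are toy stand-ins, not architectures from the papers.
import torch
import torch.nn as nn

class ToyUnmasker(nn.Module):
    def __init__(self):
        super().__init__()
        # stage 1: per-pixel probability that a pixel belongs to the mask
        self.segment = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
        # stage 2: hallucinate RGB content, conditioned on the mask map
        self.generate = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, x):
        m = self.segment(x)                       # (B,1,H,W) mask map
        g = self.generate(torch.cat([x, m], 1))   # generated content
        # keep original pixels outside the mask, generated pixels inside
        return m * g + (1 - m) * x, m

out, mask_map = ToyUnmasker()(torch.rand(1, 3, 128, 128))
```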
Model | Year | Dataset | PSNR | SSIM | FID | NIQE | BRISQUE | MAE | ℓ1 Error | ℓ2 Error |
---|---|---|---|---|---|---|---|---|---|---|
Context Encoders [78] | 2016 | Paris StreetView | 18.58 dB | - | - | - | - | - | 9.37% | 1.96% |
GFC (M5 and Q5) [84] | 2017 | CelebA | 19.5 | 0.784 | - | - | - | - | - | - |
PConv (N/B) [81] | 2018 | Places2 | 18.21/19.04 | 0.468/0.484 | - | - | - | - | 6.45/5.72 | - |
FFII-GatedCon [80] | 2019 | Places2 (rectangular mask) | - | - | - | - | - | - | 8.6% | 2.0% |
 | | Places2 (free-form mask) | - | - | - | - | - | - | 9.1% | 1.6% |
EdgeConnect [83] | 2019 | Places2 | 21.75 | 0.823 | 8.16 | - | - | - | 3.86 | - |
MRGAN [85] | 2019 | Synthetic dataset | 29.91 dB | 0.937 | - | 3.548 | 29.97 | - | - | - |
ERFGOFI (Mask) [86] | 2020 | CelebA and CelebA-HQ | 28.727 | 0.908 | - | 4.425 | 40.883 | - | - | - |
GUMF [3] | 2020 | CelebA | 26.19 dB | 0.864 | 3.548 | 5.42 | 37.85 | - | - | - |
R-MNet-0.4 [82] | 2021 | CelebA-HQ | 40.40 | 0.94 | 3.09 | - | - | 31.91 | - | - |
 | | Paris StreetView | 39.55 | 0.91 | 17.64 | - | - | 33.81 | - | - |
 | | Places2 | 39.66 | 0.93 | 4.47 | - | - | 27.77 | - | - |
GANMasker [2] | 2023 | CelebA | 30.96 | 0.95 | 16.34 | 4.46 | 19.27 | - | - | - |
Model | Year | Dataset | Metric | Result |
---|---|---|---|---|
RePaint [87] | 2022 | CelebA-HQ | LPIPS (Half) | 0.165 |
 | | | LPIPS (Expand) | 0.435 |
 | | ImageNet | LPIPS (Half) | 0.304 |
 | | | LPIPS (Expand) | 0.629 |
DDRM-CC (SR) [89] | 2022 | ImageNet | PSNR | 26.55 |
 | | | SSIM | 0.74 |
 | | | KID | 6.56 |
 | | | NFEs | 20 |
DDNM [90] | 2022 | ImageNet | PSNR | 32.06 |
 | | | SSIM | 0.968 |
 | | | FID | 3.89 |
 | | CelebA | PSNR | 35.64 |
 | | | SSIM | 0.982 |
 | | | FID | 4.54 |
COPAINT-TT [88] | 2023 | CelebA-HQ | LPIPS (Half) | 0.180 |
 | | | LPIPS (Expand) | 0.464 |
 | | ImageNet | LPIPS (Half) | 0.294 |
 | | | LPIPS (Expand) | 0.636 |
5.3. Masked Face Recognition Approaches
5.3.1. Face Restoration
Model | Year | Dataset | Accuracy | mAP | Rank-1 |
---|---|---|---|---|---|
IAMGAN-DCR (VGGFace2) [21] | 2020 | MFSR-REC | 86.5% | 42.7% | 68.1% |
IAMGAN-DCR (CASIA-WebFace) [21] | | MFSR-REC | 82.3% | 37.5% | 67.4% |
LPD [22] | 2020 | MFV | 97.94% | - | - |
 | | MFI | 94.34% | 49.08% | - |
 | | Synthesized LFW | 95.70% | 75.92% | - |
LTM [92] | 2020 | LFW | 95.44% | - | - |
 | | AR1 | 98.0% | - | - |
 | | AR2 | 94.1% | - | - |
CA-MFR [94] | 2021 | Masked-Webface (Case 1) | 91.525% | - | - |
 | | Masked-Webface (Case 2) | 86.853% | - | - |
 | | Masked-Webface (Case 3) | 81.421% | - | - |
 | | Masked-Webface (Case 4) | 92.612% | - | - |
Hariri [95] | 2022 | RMFRD | 91.3% | - | - |
 | | SMFRD | 88.9% | - | - |
UNMaskedArea-MFR (Cosine Similarity) [96] | 2022 | Custom (Indonesian people) dataset | 98.88% | - | - |
5.3.2. Masked Region Discarding
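The simplest form of this direction crops away the occluded lower face and matches embeddings of the periocular region alone. A hedged sketch follows, assuming already-aligned face crops and an external embedding function; the 0.55 cut fraction, the embed() callable, and the threshold are illustrative assumptions, not a surveyed method.

```python
# Sketch of masked-region discarding: keep only the upper (periocular) band
# of an aligned face crop before embedding; parameters are assumptions.
import numpy as np

def discard_masked_region(aligned_face: np.ndarray, keep_frac: float = 0.55):
    """Return the upper part of an aligned HxWx3 face crop."""
    h = aligned_face.shape[0]
    return aligned_face[: int(h * keep_frac)]

def match(face_a, face_b, embed, threshold: float = 0.5):
    """Cosine-match two faces using upper-half crops only."""
    ea = embed(discard_masked_region(face_a))
    eb = embed(discard_masked_region(face_b))
    cos = float(np.dot(ea, eb) /
                (np.linalg.norm(ea) * np.linalg.norm(eb)))
    return cos >= threshold, cos
```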
5.3.3. Deep Learning-Based Masked Face Approaches
6. Limitations of Existing Works
- Dependence on Synthetic Data: Many state-of-the-art MFR and FU models are trained on synthetic datasets that simulate the presence of masks. While these datasets are essential due to the scarcity of real-world masked face data, they often fail to capture the complex variations in mask types, lighting conditions, facial structures, and occlusions seen in practice. As a result, models trained on synthetic data tend to perform well in controlled environments but show significant performance degradation when applied to real-world settings, where facial occlusions may vary unpredictably.
- Lack of Robustness to Diverse Occlusions: Current FU and MFR methods struggle to generalize across a wide range of occlusions. For instance, while a model may perform reasonably well when dealing with standard medical masks, it may falter when confronted with different types of face coverings (e.g., scarves, transparent masks, or masks with varying shapes and patterns). This limitation restricts the versatility and scalability of existing models, as their performance heavily depends on the types of occlusions present in the training data.
- Weakness in Accurately Reconstructing Fine-Grained Features: Face unmasking models typically focus on restoring critical facial features such as the mouth, nose, and chin. However, accurately reconstructing fine-grained details like lip color, facial hair, skin texture, and irregularities (e.g., scars or spots) remains a challenge. These subtleties are crucial for applications requiring high fidelity, such as identification and authentication systems.
- Computational Complexity: Advanced techniques such as generative adversarial network (GAN)-based or diffusion-based models often require substantial computational power. While these models excel at generating visually plausible face reconstructions, their resource-intensive nature makes real-time processing difficult, especially on edge devices or mobile platforms. This limits the applicability of these models in environments where computational resources are constrained, such as CCTV systems or on-device recognition systems.
- Generalization to Unseen Data: Transfer learning and domain adaptation techniques have been explored to enhance the generalization capabilities of MFR models. However, the effectiveness of these methods is limited when faced with real-world masked datasets that differ significantly from the training data. In many cases, the models are sensitive to variations in demographics, lighting, and pose, leading to a decreased performance when applied to diverse populations or environments outside of their training conditions.
7. Future Research Directions
- Real-World Dataset Availability: A major challenge for both MFR and FU is the limited availability of real-world masked face data. Future research should focus on developing methods for collecting and organizing diverse, high-quality masked face datasets that represent a wide range of mask types, facial features, and environmental conditions. These efforts will be essential to enhance the generalization capabilities of models beyond controlled, synthetic datasets.
- Synthetic Dataset Generation: Addressing the challenge of dataset scarcity, generating high-quality synthetic masked face datasets can provide a significant solution. Advanced techniques like GAN-based augmentation, domain adaptation, and multi-task learning can be employed to produce more realistic masked face images that closely simulate real-world conditions and occlusions.
- Lightweight Models for Real-Time Applications: To reconcile computational demands with real-time performance needs, future research should prioritize the development of efficient models for MFR and FU tasks. This involves exploring lightweight architectures and hardware-aware optimizations capable of running effectively on edge devices, while still achieving high accuracy.
- Face Unmasking as a Preprocessing Step: Combining face unmasking with masked face recognition could enhance the robustness of recognition systems. By treating FU as a preprocessing step, models can achieve better accuracy in scenarios with varying degrees of facial occlusion. This approach would also help mitigate the challenges posed by inconsistent facial coverings (a minimal pipeline sketch follows this list).
- Cross-Disciplinary Integration: Finally, the integration of MFR, FMR, and FU with other biometric systems (such as voice recognition or gait analysis) could offer multi-modal solutions that enhance the reliability and accuracy of identity verification systems, especially in security-critical applications.
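A minimal sketch of the unmask-then-recognize pipeline suggested above, where unmask() and embed() stand for any FU and face recognition models; the interfaces, the gallery layout, and the cosine threshold are assumptions rather than published APIs.

```python
# Hedged sketch of face unmasking as a preprocessing step for recognition;
# unmask() / embed() are stand-ins for any FU and FR models.
import numpy as np

def recognize_with_unmasking(img, gallery, unmask, embed, threshold=0.5):
    """Restore the occluded region first, then do ordinary FR matching."""
    restored = unmask(img)              # FU model fills in the masked area
    query = embed(restored)
    query = query / np.linalg.norm(query)
    # rank gallery identities by cosine similarity to the restored query
    scores = {name: float(np.dot(query, g / np.linalg.norm(g)))
              for name, g in gallery.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, scores[best]
    return None, scores[best]           # below threshold: reject the match
```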
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
MFR | Masked face recognition |
FMR | Face mask recognition |
FU | Face unmasking |
GANs | Generative adversarial networks |
SSPS | Single sample per subject |
RMFRD | Real-World Masked Face Recognition Dataset |
MFSR | Masked Face Segmentation and Recognition Dataset |
MFV | Masked Face Verification |
MFI | Masked Face Identification |
MFDD | Masked Face Detection Dataset |
MDMFR | Mask Detection and Masked Facial Recognition Dataset |
FMLD | Face-Mask Label Dataset |
ISL-UFMD | Interactive Systems Labs Unconstrained Face Mask Dataset |
MD-Kaggle | Masked Dataset |
PWMFD | Properly Wearing Masked Face Detection Dataset |
WMD | Wearing Mask Detection |
WMC | Wearing Mask Classification |
BAFMD | Bias-Aware Face Mask Detection |
SMFRD | Simulated Mask Face Recognition Dataset |
CMFD | Correctly Masked Face Dataset |
IMFD | Incorrectly Masked Face Dataset |
LFW | Labeled Faces in the Wild |
IJB-C | IARPA Janus Benchmark-C |
CFP | Celebrities in Frontal-Profile |
ERR | Error rate |
ROC | Receiver Operating Characteristic Curve |
TP | True positive |
TN | True negative |
FP | False positive |
FN | False negative |
AUC | Area Under the Curve |
FAR | False acceptance rate |
FRR | False rejection rate |
EER | Equal Error Rate |
IoU | Intersection over Union |
AP | Average precision |
mAP | Mean Average Precision |
PSNR | Peak Signal-to-Noise Ratio |
MSE | Mean squared error |
SSIM | Structural Similarity Index Measure |
FID | Fréchet Inception Distance |
NIQE | Naturalness Image Quality Evaluator |
BRISQUE | Blind/Referenceless Image Spatial Quality Evaluator |
SOTA | State-of-the-art |
ML | Machine learning |
DL | Deep learning |
CNNs | Convolutional neural networks |
RRFMDS | Rapid Real-Time Face Mask Detection System |
SSD | Single Shot Multibox Detector |
SMFD | Simulated Masked Face Dataset |
SVM | Support Vector Machine |
K-NN | K-Nearest Neighbors |
BLS | Broad Learning System |
CMNV2 | Caffe-MobileNetV2 |
YOLO | You Only Look Once |
SE | Squeeze and Excitation |
SK | Selective Kernel |
FF | Feature Fusion |
CIoU | Complete Intersection over Union |
R-MNet | Reverse Masking Network |
MURF | Masked–unmasked region Fusion |
DDRM | Denoising Diffusion Restoration Models |
DDNM | Denoising Diffusion Null-Space Model |
BoF | Bag-of-features |
RBF | Radial Basis Function |
IAMGAN | Identity Aware Mask GAN |
DCR | Domain Constrained Ranking |
MTArcFace | Multi-Task ArcFace |
LBP | Local Binary Pattern |
CBAM | Convolutional Block Attention Module |
HMFD | HSTU Masked Face Dataset |
EUM | Embedding Unmasking Model |
SRT | Self-Restrained Triplet |
FCNN | Fully Connected Neural Network |
References
- Zhang, S.; Chi, C.; Lei, Z.; Li, S.Z. Refineface: Refinement neural network for high performance face detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 4008–4020. [Google Scholar] [CrossRef]
- Mahmoud, M.; Kang, H.S. GANMasker: A Two-Stage Generative Adversarial Network for High-Quality Face Mask Removal. Sensors 2023, 23, 7094. [Google Scholar] [CrossRef]
- Din, N.U.; Javed, K.; Bae, S.; Yi, J. A novel GAN-based network for unmasking of masked face. IEEE Access 2020, 8, 44276–44287. [Google Scholar] [CrossRef]
- Yi, D.; Lei, Z.; Liao, S.; Li, S.Z. Learning face representation from scratch. arXiv 2014, arXiv:1411.7923. [Google Scholar]
- Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep Learning Face Attributes in the Wild. In Proceedings of the International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Huang, G.B.; Mattar, M.; Berg, T.; Learned-Miller, E. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Proceedings of the Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition, Marseille, France, 7–20 October 2008. [Google Scholar]
- Anwar, A.; Raychowdhury, A. Masked face recognition for secure authentication. arXiv 2020, arXiv:2008.11104. [Google Scholar]
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Lahasan, B.; Lutfi, S.L.; San-Segundo, R. A survey on techniques to handle face recognition challenges: Occlusion, single sample per subject and expression. Artif. Intell. Rev. 2019, 52, 949–979. [Google Scholar] [CrossRef]
- Zhang, Z.; Ji, X.; Cui, X.; Ma, J. A survey on occluded face recognition. In Proceedings of the 2020 9th International Conference on Networks, Communication and Computing, Tokyo, Japan, 18–20 December 2020; pp. 40–49. [Google Scholar]
- Zeng, D.; Veldhuis, R.; Spreeuwers, L. A survey of face recognition techniques under occlusion. IET Biom. 2021, 10, 581–606. [Google Scholar] [CrossRef]
- Alzu’bi, A.; Albalas, F.; Al-Hadhrami, T.; Younis, L.B.; Bashayreh, A. Masked face recognition using deep learning: A review. Electronics 2021, 10, 2666. [Google Scholar] [CrossRef]
- Wang, B.; Zheng, J.; Chen, C.P. A survey on masked facial detection methods and datasets for fighting against COVID-19. IEEE Trans. Artif. Intell. 2021, 3, 323–343. [Google Scholar] [CrossRef]
- Nowrin, A.; Afroz, S.; Rahman, M.S.; Mahmud, I.; Cho, Y.Z. Comprehensive review on facemask detection techniques in the context of COVID-19. IEEE Access 2021, 9, 106839–106864. [Google Scholar] [CrossRef]
- Wang, Z.; Wang, G.; Huang, B.; Xiong, Z.; Hong, Q.; Wu, H.; Yi, P.; Jiang, K.; Wang, N.; Pei, Y.; et al. Masked Face Recognition Dataset and Application. arXiv 2020, arXiv:2003.09093. [Google Scholar]
- Pro, T.D. Face Masks Detection Dataset. Dataset. Available online: https://trainingdata.pro/data-market/face-masks-detection/#header-form (accessed on 25 January 2024).
- Ge, S.; Li, J.; Ye, Q.; Luo, Z. Detecting masked faces in the wild with lle-cnns. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2682–2690. [Google Scholar]
- MVD, A. Mask Dataset. Kaggle Dataset. 2020. Available online: https://www.kaggle.com/datasets/andrewmvd/face-mask-detection/data (accessed on 10 February 2024).
- Geng, M.; Peng, P.; Huang, Y.; Tian, Y. Masked face recognition with generative data augmentation and domain constrained ranking. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 2246–2254. [Google Scholar]
- Ding, F.; Peng, P.; Huang, Y.; Geng, M.; Tian, Y. Masked face recognition with latent part detection. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 2281–2289. [Google Scholar]
- Wang, Z.; Huang, B.; Wang, G.; Yi, P.; Jiang, K. Masked Face Recognition Dataset and Application. IEEE Trans. Biom. Behav. Identity Sci. 2023, 5, 298–304. [Google Scholar] [CrossRef]
- AIZOOTech. AIZOOTech-FaceMasksDetection. GitHub Repository. 2021. Available online: https://github.com/AIZOOTech/FaceMaskDetection/tree/master (accessed on 8 February 2024).
- Roy, B.; Nandy, S.; Ghosh, D.; Dutta, D.; Biswas, P.; Das, T. MOXA: A deep learning based unmanned approach for real-time monitoring of people wearing medical masks. Trans. Indian Natl. Acad. Eng. 2020, 5, 509–518. [Google Scholar] [CrossRef]
- Batagelj, B.; Peer, P.; Štruc, V.; Dobrišek, S. How to Correctly Detect Face-Masks for COVID-19 from Visual Information? Appl. Sci. 2021, 11, 2070. [Google Scholar] [CrossRef]
- Singh, S.; Ahuja, U.; Kumar, M.; Kumar, K.; Sachdeva, M. Face mask detection using YOLOv3 and faster R-CNN models: COVID-19 environment. Multimed. Tools Appl. 2021, 80, 19753–19768. [Google Scholar] [CrossRef]
- Zhang, J.; Han, F.; Chun, Y.; Chen, W. A novel detection framework about conditions of wearing face mask for helping control the spread of COVID-19. IEEE Access 2021, 9, 42975–42984. [Google Scholar] [CrossRef]
- Eyiokur, F.I.; Ekenel, H.K.; Waibel, A. Unconstrained face mask and face-hand interaction datasets: Building a computer vision system to help prevent the transmission of COVID-19. Signal Image Video Process. 2022, 17, 1027–1034. [Google Scholar] [CrossRef] [PubMed]
- Jiang, X.; Gao, T.; Zhu, Z.; Zhao, Y. Real-time face mask detection method based on YOLOv3. Electronics 2021, 10, 837. [Google Scholar] [CrossRef]
- Wang, B.; Zhao, Y.; Chen, C.P. Hybrid transfer learning and broad learning system for wearing mask detection in the COVID-19 era. IEEE Trans. Instrum. Meas. 2021, 70, 1–12. [Google Scholar] [CrossRef]
- Vu, H.N.; Nguyen, M.H.; Pham, C. Masked face recognition with convolutional neural networks and local binary patterns. Appl. Intell. 2022, 52, 5497–5512. [Google Scholar] [CrossRef]
- Ullah, N.; Javed, A.; Ghazanfar, M.A.; Alsufyani, A.; Bourouis, S. A novel DeepMaskNet model for face mask detection and masked facial recognition. J. King Saud-Univ.-Comput. Inf. Sci. 2022, 34, 9905–9914. [Google Scholar] [CrossRef] [PubMed]
- Benitez-Garcia, G.; Takahashi, H.; Jimenez-Martinez, M.; Olivares-Mercado, J. TFM a Dataset for Detection and Recognition of Masked Faces in the Wild. In Proceedings of the 4th ACM International Conference on Multimedia in Asia, Tokyo, Japan, 13–16 December 2022; pp. 1–7. [Google Scholar]
- Kantarcı, A.; Ofli, F.; Imran, M.; Ekenel, H.K. Bias-Aware Face Mask Detection Dataset. arXiv 2022, arXiv:2211.01207. [Google Scholar]
- Yang, S.; Luo, P.; Loy, C.C.; Tang, X. Wider face: A face detection benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5525–5533. [Google Scholar]
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410. [Google Scholar]
- Sengupta, S.; Chen, J.C.; Castillo, C.; Patel, V.M.; Chellappa, R.; Jacobs, D.W. Frontal to profile face verification in the wild. In Proceedings of the 2016 IEEE winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–9. [Google Scholar]
- Moschoglou, S.; Papaioannou, A.; Sagonas, C.; Deng, J.; Kotsia, I.; Zafeiriou, S. Agedb: The first manually collected, in-the-wild age database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 51–59. [Google Scholar]
- Maze, B.; Adams, J.; Duncan, J.A.; Kalka, N.; Miller, T.; Otto, C.; Jain, A.K.; Niggel, W.T.; Anderson, J.; Cheney, J.; et al. Iarpa janus benchmark-c: Face dataset and protocol. In Proceedings of the 2018 International Conference on Biometrics (ICB), Gold Coast, QLD, Australia, 20–23 February 2018; pp. 158–165. [Google Scholar]
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of gans for improved quality, stability, and variation. arXiv 2017, arXiv:1710.10196. [Google Scholar]
- Cao, Q.; Shen, L.; Xie, W.; Parkhi, O.M.; Zisserman, A. Vggface2: A dataset for recognising faces across pose and age. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 67–74. [Google Scholar]
- SB, P. SMFD-GitHub. GitHub Repository. 2020. Available online: https://github.com/prajnasb/observations/tree/master/experiements/data (accessed on 8 February 2024).
- Gurav, O. FMDD-kaggle. Kaggle Dataset. 2020. Available online: https://www.kaggle.com/datasets/omkargurav/face-mask-dataset (accessed on 8 February 2024).
- Deng, H.; Feng, Z.; Qian, G.; Lv, X.; Li, H.; Li, G. MFCosface: A masked-face recognition algorithm based on large margin cosine loss. Appl. Sci. 2021, 11, 7310. [Google Scholar] [CrossRef]
- Cabani, A.; Hammoudi, K.; Benhabiles, H.; Melkemi, M. MaskedFace-Net–A dataset of correctly/incorrectly masked face images in the context of COVID-19. Smart Health 2021, 19, 100144. [Google Scholar] [CrossRef] [PubMed]
- Boutros, F.; Damer, N.; Kirchbuchner, F.; Kuijper, A. Self-restrained triplet loss for accurate masked face recognition. Pattern Recognit. 2022, 124, 108473. [Google Scholar] [CrossRef] [PubMed]
- Pann, V.; Lee, H.J. Effective attention-based mechanism for masked face recognition. Appl. Sci. 2022, 12, 5590. [Google Scholar] [CrossRef]
- Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4690–4699. [Google Scholar]
- King, D.E. Dlib-ml: A machine learning toolkit. J. Mach. Learn. Res. 2009, 10, 1755–1758. [Google Scholar]
- Xiong, Z.; Wang, Z.; Du, C.; Zhu, R.; Xiao, J.; Lu, T. An asian face dataset and how race influences face recognition. In Proceedings of the Advances in Multimedia Information Processing–PCM 2018: 19th Pacific-Rim Conference on Multimedia, Hefei, China, 21–22 September 2018; pp. 372–383. [Google Scholar]
- Zweig, M.H.; Campbell, G. Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine. Clin. Chem. 1993, 39, 561–577. [Google Scholar] [CrossRef]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30, 6629–6640. [Google Scholar]
- Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
- Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 2012, 21, 4695–4708. [Google Scholar] [CrossRef] [PubMed]
- Habib, S.; Alsanea, M.; Aloraini, M.; Al-Rawashdeh, H.S.; Islam, M.; Khan, S. An efficient and effective deep learning-based model for real-time face mask detection. Sensors 2022, 22, 2602. [Google Scholar] [CrossRef] [PubMed]
- Boulila, W.; Alzahem, A.; Almoudi, A.; Afifi, M.; Alturki, I.; Driss, M. A deep learning-based approach for real-time facemask detection. In Proceedings of the 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), Pasadena, CA, USA, 13–16 December 2021; pp. 1478–1481. [Google Scholar]
- Shamrat, F.J.M.; Chakraborty, S.; Billah, M.M.; Al Jubair, M.; Islam, M.S.; Ranjan, R. Face Mask Detection using Convolutional Neural Network (CNN) to reduce the spread of COVID-19. In Proceedings of the 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 3–5 June 2021; pp. 1231–1237. [Google Scholar]
- Goyal, H.; Sidana, K.; Singh, C.; Jain, A.; Jindal, S. A real time face mask detection system using convolutional neural network. Multimed. Tools Appl. 2022, 81, 14999–15015. [Google Scholar] [CrossRef] [PubMed]
- Sethi, S.; Kathuria, M.; Kaushik, T. Face mask detection using deep learning: An approach to reduce risk of Coronavirus spread. J. Biomed. Inform. 2021, 120, 103848. [Google Scholar] [CrossRef]
- Chavda, A.; Dsouza, J.; Badgujar, S.; Damani, A. Multi-stage CNN architecture for face mask detection. In Proceedings of the 2021 6th International Conference for Convergence in Technology (i2ct), Maharashtra, India, 2–4 April 2021; pp. 1–8. [Google Scholar]
- Umer, M.; Sadiq, S.; Alhebshi, R.M.; Alsubai, S.; Al Hejaili, A.; Nappi, M.; Ashraf, I. Face mask detection using deep convolutional neural network and multi-stage image processing. Image Vis. Comput. 2023, 133, 104657. [Google Scholar] [CrossRef]
- Jignesh Chowdary, G.; Punn, N.S.; Sonbhadra, S.K.; Agarwal, S. Face mask detection using transfer learning of inceptionv3. In Proceedings of the Big Data Analytics: 8th International Conference, BDA 2020, Sonepat, India, 15–18 December 2020; pp. 81–90. [Google Scholar]
- Oumina, A.; El Makhfi, N.; Hamdi, M. Control the COVID-19 pandemic: Face mask detection using transfer learning. In Proceedings of the 2020 IEEE 2nd International Conference on Electronics, Control, Optimization and Computer Science (ICECOCS), Kenitra, Morocco, 2–3 December 2020; pp. 1–5. [Google Scholar]
- Loey, M.; Manogaran, G.; Taha, M.H.N.; Khalifa, N.E.M. A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic. Measurement 2021, 167, 108288. [Google Scholar] [CrossRef]
- Nagrath, P.; Jain, R.; Madan, A.; Arora, R.; Kataria, P.; Hemanth, J. SSDMNV2: A real time DNN-based face mask detection system using single shot multibox detector and MobileNetV2. Sustain. Cities Soc. 2021, 66, 102692. [Google Scholar] [CrossRef]
- Su, X.; Gao, M.; Ren, J.; Li, Y.; Dong, M.; Liu, X. Face mask detection and classification via deep transfer learning. Multimed. Tools Appl. 2022, 81, 4475–4494. [Google Scholar] [CrossRef]
- Vignesh Baalaji, S.; Sandhya, S.; Sajidha, S.; Nisha, V.; Vimalapriya, M.; Tyagi, A.K. Autonomous face mask detection using single shot multibox detector, and ResNet-50 with identity retrieval through face matching using deep siamese neural network. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 11195–11205. [Google Scholar] [CrossRef] [PubMed]
- Sheikh, B.u.h.; Zafar, A. RRFMDS: Rapid real-time face mask detection system for effective COVID-19 monitoring. SN Comput. Sci. 2023, 4, 288. [Google Scholar] [CrossRef] [PubMed]
- Kumar, B.A.; Bansal, M. Face mask detection on photo and real-time video images using Caffe-MobileNetV2 transfer learning. Appl. Sci. 2023, 13, 935. [Google Scholar] [CrossRef]
- Yu, J.; Zhang, W. Face mask wearing detection algorithm based on improved YOLO-v4. Sensors 2021, 21, 3263. [Google Scholar] [CrossRef] [PubMed]
- Ieamsaard, J.; Charoensook, S.N.; Yammen, S. Deep learning-based face mask detection using yolov5. In Proceedings of the 2021 9th International Electrical Engineering Congress (iEECON), Pattaya, Thailand, 10–12 March 2021; pp. 428–431. [Google Scholar]
- Wu, P.; Li, H.; Zeng, N.; Li, F. FMD-Yolo: An efficient face mask detection method for COVID-19 prevention and control in public. Image Vis. Comput. 2022, 117, 104341. [Google Scholar] [CrossRef]
- Pham, T.N.; Nguyen, V.H.; Huh, J.H. Integration of improved YOLOv5 for face mask detector and auto-labeling to generate dataset for fighting against COVID-19. J. Supercomput. 2023, 79, 8966–8992. [Google Scholar] [CrossRef]
- Zhang, H.; Tang, J.; Wu, P.; Li, H.; Zeng, N. A novel attention-based enhancement framework for face mask detection in complicated scenarios. Signal Process. Image Commun. 2023, 116, 116985. [Google Scholar] [CrossRef]
- Tamang, S.; Sen, B.; Pradhan, A.; Sharma, K.; Singh, V.K. Enhancing COVID-19 safety: Exploring yolov8 object detection for accurate face mask classification. Int. J. Intell. Syst. Appl. Eng. 2023, 11, 892–897. [Google Scholar]
- Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2536–2544. [Google Scholar]
- Iizuka, S.; Simo-Serra, E.; Ishikawa, H. Globally and locally consistent image completion. Acm Trans. Graph. (Tog) 2017, 36, 1–14. [Google Scholar] [CrossRef]
- Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T.S. Free-form image inpainting with gated convolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4471–4480. [Google Scholar]
- Liu, G.; Reda, F.A.; Shih, K.J.; Wang, T.C.; Tao, A.; Catanzaro, B. Image inpainting for irregular holes using partial convolutions. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 85–100. [Google Scholar]
- Jam, J.; Kendrick, C.; Drouard, V.; Walker, K.; Hsu, G.S.; Yap, M.H. R-mnet: A perceptual adversarial network for image inpainting. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 2714–2723. [Google Scholar]
- Nazeri, K.; Ng, E.; Joseph, T.; Qureshi, F.Z.; Ebrahimi, M. Edgeconnect: Generative image inpainting with adversarial edge learning. arXiv 2019, arXiv:1901.00212. [Google Scholar]
- Li, Y.; Liu, S.; Yang, J.; Yang, M.H. Generative face completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3911–3919. [Google Scholar]
- Khan, M.K.J.; Ud Din, N.; Bae, S.; Yi, J. Interactive removal of microphone object in facial images. Electronics 2019, 8, 1115. [Google Scholar] [CrossRef]
- Din, N.U.; Javed, K.; Bae, S.; Yi, J. Effective removal of user-selected foreground object from facial images using a novel GAN-based network. IEEE Access 2020, 8, 109648–109661. [Google Scholar] [CrossRef]
- Lugmayr, A.; Danelljan, M.; Romero, A.; Yu, F.; Timofte, R.; Van Gool, L. Repaint: Inpainting using denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11461–11471. [Google Scholar]
- Zhang, G.; Ji, J.; Zhang, Y.; Yu, M.; Jaakkola, T.S.; Chang, S. Towards coherent image inpainting using denoising diffusion implicit models. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
- Kawar, B.; Elad, M.; Ermon, S.; Song, J. Denoising diffusion restoration models. Adv. Neural Inf. Process. Syst. 2022, 35, 23593–23606. [Google Scholar]
- Wang, Y.; Yu, J.; Zhang, J. Zero-shot image restoration using denoising diffusion null-space model. arXiv 2022, arXiv:2212.00490. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
- Li, C.; Ge, S.; Zhang, D.; Li, J. Look through masks: Towards masked face recognition with de-occlusion distillation. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 3016–3024. [Google Scholar] [CrossRef]
- Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T.S. Generative image inpainting with contextual attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5505–5514. [Google Scholar]
- Li, Y.; Guo, K.; Lu, Y.; Liu, L. Cropping and attention based approach for masked face recognition. Appl. Intell. 2021, 51, 3012–3025. [Google Scholar] [CrossRef]
- Hariri, W. Efficient masked face recognition method during the COVID-19 pandemic. Signal Image Video Process. 2022, 16, 605–612. [Google Scholar] [CrossRef] [PubMed]
- Firdaus, F.; Munir, R. Masked face recognition using deep learning based on unmasked area. In Proceedings of the 2022 Second International Conference on Power, Control and Computing Technologies (ICPC2T), Raipur, India, 1–3 March 2022; pp. 1–6. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Passalis, N.; Tefas, A. Learning bag-of-features pooling for deep convolutional neural networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5755–5763. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Aswal, V.; Tupe, O.; Shaikh, S.; Charniya, N.N. Single camera masked face identification. In Proceedings of the 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 14–17 December 2020; pp. 57–60. [Google Scholar]
- Wu, G. Masked face recognition algorithm for a contactless distribution cabinet. Math. Probl. Eng. 2021, 2021, 5591020. [Google Scholar] [CrossRef]
- Kumar, M.; Mann, R. Masked face recognition using deep learning model. In Proceedings of the 2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), Greater Noida, India, 17–18 December 2021; pp. 428–432. [Google Scholar]
- Montero, D.; Nieto, M.; Leskovsky, P.; Aginako, N. Boosting masked face recognition with multi-task arcface. In Proceedings of the 2022 16th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Dijon, France, 19–21 October 2022; pp. 184–189. [Google Scholar]
- Golwalkar, R.; Mehendale, N. Masked-face recognition using deep metric learning and FaceMaskNet-21. Appl. Intell. 2022, 52, 13268–13279. [Google Scholar] [CrossRef] [PubMed]
- Kocacinar, B.; Tas, B.; Akbulut, F.P.; Catal, C.; Mishra, D. A real-time cnn-based lightweight mobile masked face recognition system. IEEE Access 2022, 10, 63496–63507. [Google Scholar] [CrossRef]
- Thanathamathee, P.; Sawangarreerak, S.; Kongkla, P.; Nizam, D.N.M. An Optimized Machine Learning and Deep Learning Framework for Facial and Masked Facial Recognition. Emerg. Sci. J. 2023, 7, 1173–1187. [Google Scholar] [CrossRef]
- Faruque, M.O.; Islam, M.R.; Islam, M.T. Advanced Masked Face Recognition using Robust and Light Weight Deep Learning Model. Int. J. Comput. Appl. 2024, 975, 8887. [Google Scholar]
Dataset | Size | Identities/Classes | Access | Aim | Year |
---|---|---|---|---|---|
LFW [6] | 13,233 | 5749 | Public | MFR | 2008 |
CASIA-WebFace [4] | 494,414 | 10,575 | Public | MFR | 2014 |
CelebA [5] | +200,000 | 10,000 | Public | FMR and FU | 2015 |
CFP [38] | 7000 | 500 | Public | MFR | 2016 |
AgeDB [39] | 16,488 | 568 | Public | MFR | 2017 |
IJB-C [40] | 148,876 | 3531 | Public | MFR | 2018 |
CelebA-HQ [41] | 30,000 | - | Public | FMR and FU | 2018 |
VGGFace2 [42] | 3.31 M | 9131 | Public | MFR | 2018 |
SMFRD [23] | 536,721 | 16,817 | Public | MFR | 2020 |
LFW-SM [7] | 64,973 | 5749 | Public | MFR | 2020 |
VGGFace2-mini-SM [7] | 697,084 | 8631 | Public | MFR | 2020 |
SMFD(GitHub) [43] | 1376 | 2 | Public | FMR | 2020 |
FMDD(Kaggle) [44] | 7553 | 2 | Public | FMR | 2020 |
PS-CelebA [3] | 10,000 | - | Private | FU | 2020 |
VGGFace2_m [45] | 666,800 | 8335 | Public | MFR | 2021 |
LFW_m [45] | 26,466 | 5749 | Public | MFR | 2021 |
CF_m [45] | 5000 | 500 | Public | MFR | 2021 |
MaskedFace-Net [46] | 137,016 | 2 | Public | FMR | 2022 |
MS1MV2-Masked [47] | 5.374 M | 85,000 | Public | MFR | 2022 |
CASIA-WebFace_m [48] | 789,296 | 10,575 | Public | MFR | 2022 |
Synthetic-CelebA [2] | 30,000 | - | Private | FU | 2023 |
Model | Year | Dataset | Accuracy | Precision | Recall | F1-Score | EER% |
---|---|---|---|---|---|---|---|
YOLOv3 [102] | 2020 | Custom dataset | 93% | - | - | - | - |
YOLO-face + VGGFace2 [102] | | | 96.8% | - | - | - | - |
RetinaFace + VGGFace2 [102] | | | 94.5% | - | - | - | - |
MFCosface [45] | 2021 | LFW_m | 99.33% | - | - | - | - |
 | | CF_m | 97.03% | - | - | - | - |
 | | MFR2 | 98.50% | - | - | - | - |
 | | RMFRD | 92.15% | - | - | - | - |
MFR-CDC [103] | 2021 | SMFRD | 95.31% | - | - | - | - |
 | | RMFRD | 95.22% | - | - | - | - |
RggNet [104] | 2021 | Custom dataset | 60.8% | 77.7% | 51.9% | - | - |
MTArcFace [105] | 2022 | Masked-LFW | 98.92% | - | - | - | - |
 | | Masked-CFP_FF | 98.33% | - | - | - | - |
 | | Masked-CFP_FP | 88.43% | - | - | - | - |
 | | Masked-AGEDB_30 | 93.17% | - | - | - | - |
 | | MFR2 | 99.41% | - | - | - | - |
Deepmasknet [33] | 2022 | MDMFR | 93.33% | 93.00% | 94.50% | 93.74% | - |
MFR (CNN + LBP) [32] | 2022 | COMASK20 | - | 87% | 87% | 87% | - |
 | | Essex dataset | - | 99% | 97% | 98% | - |
MFR-DML and FaceMaskNet-21 [106] | 2022 | User dataset | 88.92% | - | - | - | - |
 | | RMFRD | 82.22% | - | - | - | - |
 | | User dataset | 88.186% | - | - | - | - |
Att-Based-MFR (CASIA-WebFace_m) [48] | 2022 | LFW_m | 99.43% | 99.30% | 99.56% | 99.43% | - |
 | | AgeDB-30_m | 95.86% | 93.83% | 97.82% | 95.78% | - |
 | | CFP-FP_m | 97.74% | 96.77% | 98.69% | 97.72% | - |
 | | MFR2 | 96.75% | 96.25% | 97.22% | 96.73% | - |
Att-Based-MFR (VGGFace2_m) [48] | 2022 | LFW_m | 99.41% | 99.26% | 99.56% | 99.40% | - |
 | | AgeDB-30_m | 95.38% | 93.10% | 98.11% | 95.53% | - |
 | | CFP-FP_m | 96.98% | 96.17% | 98.40% | 97.27% | - |
 | | MFR2 | 99.00% | 99.50% | 98.45% | 99.02% | - |
Fine-Tuned MobileNet [107] | 2022 | MadFaRe (12 subjects) | 78.41% | - | - | - | - |
ResNet-100-MR-MP (SRT) [47] | 2022 | MFR | - | - | - | - | 0.8270 |
 | | MFR2 | - | - | - | - | 3.4416 |
 | | LFW | - | - | - | - | 0.9667 |
 | | IJB-C | - | - | - | - | 2.9197 |
ResNet-50-MR-MP (SRT) [47] | 2022 | MFR | - | - | - | - | 1.1207 |
 | | MFR2 | - | - | - | - | 6.2578 |
 | | LFW | - | - | - | - | 1.2333 |
 | | IJB-C | - | - | - | - | 3.0833 |
MobileFaceNet-MR-MP (SRT) [47] | 2022 | MFR | - | - | - | - | 3.1866 |
 | | MFR2 | - | - | - | - | 7.8232 |
 | | LFW | - | - | - | - | 2.2667 |
 | | IJB-C | - | - | - | - | 4.6837 |
FaceNet + optimized SVM [108] | 2023 | CASIA + LFW + user dataset | 99.912% | - | - | - | - |
Lightweight CNN [109] | 2024 | HMFD (frontal image) | 98.00% | 98.00% | 97.00% | 98.00% | - |
 | | HMFD (lateral image) | 79.00% | 83.00% | 80.00% | 79.00% | - |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).