Abstract
Skin malignant melanoma is a high-risk tumor with low incidence but high mortality. Early detection and treatment are crucial for a cure. Machine learning studies have focused on classifying melanoma tumors, but these methods are cumbersome and fail to extract deeper features, which limits their ability to distinguish subtle variations in skin lesions and hinders effective early diagnosis. This study introduces a deep learning-based network specifically designed for skin lesion detection, leveraging a novel FCDS-CNN architecture to address class imbalance and improve data quality. FCDS-CNN incorporates data augmentation and class weighting to mitigate the impact of imbalanced classes and improve performance across all lesion types. It also presents a practical, large-scale solution that allows seamless real-world integration to support dermatologists in their early screening processes. The dataset comprises 10,015 images of seven classes of skin lesions, available on Kaggle; data augmentation and class weighting are used to overcome the dominance of one class over the others. The FCDS-CNN achieved an average accuracy of 96%, outperforming pre-trained models such as ResNet, EfficientNet, Inception, and MobileNet in precision, recall, F1-score, and area under the curve. These pre-trained models are more effective for general image classification and struggle with the nuanced features and class imbalances inherent in medical image datasets. The FCDS-CNN demonstrated practical effectiveness by outperforming the compared pre-trained models on these distinct parameters. This work is a testament to the importance of specificity in medical image analysis for skin cancer detection.
Introduction
The most common form of cancer in the world is skin cancer, and its incidence is rising rapidly every year1. It occurs when abnormal skin cells escape the body's regulatory controls, which can result from high levels of ultraviolet (UV) radiation from the sun or from artificial sources such as tanning beds2. Melanoma, basal cell carcinoma (BCC), squamous cell carcinoma (SCC), and other less frequent types constitute the primary classes of skin cancer3. Melanoma is the most critical of these because it causes most skin cancer deaths, although it is the least frequent compared to BCC and SCC. Early detection must be emphasized rigorously, as it doubles the chances of successful treatment and can significantly reduce the mortality rate4.
The way skin cancers are detected has evolved considerably over the years. Earlier, medical practitioners relied primarily on visual examination to arrive at a diagnosis and then arranged for histopathological confirmation. Although effective, this approach depends heavily on the clinician's skill and may not catch cancers in their preliminary stages5. In the early days, practitioners used the naked eye and dermoscopy to identify cancer cases, though other advanced forms of identification existed. These techniques were helpful but largely qualitative and often unsatisfactory. In the late twentieth century, digital dermoscopy emerged, a technological boon that made it feasible for the first time to magnify skin lesions and other formations. This method enhanced diagnostic accuracy by revealing structures and patterns that the naked eye could not see, making it possible to distinguish between benign and malignant tumors6.
Skin cancer has always been among the hardest cancers to diagnose in its preliminary stages, making timely detection essential7. Conventional diagnostic tools involve dermatologists and, because these approaches depend on human judgment, they often lead to ambiguous results8.
The incidence of skin cancer has been on the rise globally, necessitating new diagnostic approaches that can identify skin lesions at an early stage9. Biomedical imaging and diagnostics link to melanoma skin cancer diagnosis through the analysis of bodily abnormalities. Melanoma is the fifth most common cancer noted around the world and is described as the skin cancer with the highest risk because it can spread to other body organs10. As noted above, the early qualitative techniques for identifying lesions were often unsatisfactory11, and digital dermoscopy later improved diagnostic accuracy by revealing structures and patterns invisible to the naked eye, enabling the distinction between benign and malignant tumors8.
However, skin cancer remains a major public health issue and is expected to remain one, as it has not been eradicated despite all the advancements in technology12. The World Health Organization estimates between 2 and 3 million incidences of non-melanoma skin cancer and 132,000 incidences of melanoma worldwide annually. More than 5 million Americans are diagnosed with skin cancer every year, which proves the high demand for better diagnostic methods. This underscores the need to detect the disease early, particularly via new methods that provide a precise diagnosis10.
Dermoscopic imaging, in combination with computer-aided diagnosis (CAD) systems adopted in the past few years, has improved the effectiveness of skin cancer detection techniques. Dermoscopy is an instrument-based optical technique that provides good insight into skin lesions without invasive excisions and incisions. Dermoscopic images are most effective when integrated with deep learning algorithms, which help classify skin cancer at its primary stages13. A few difficulties persist, especially concerning variation in lesion features with skin type, lesion location, and imaging conditions. Furthermore, the lack of balanced datasets, in which benign dermatological lesions dominate the training data, remains a major obstacle to creating models with good generalization capabilities covering a wide range of cases.
To address these issues, this research proposes a deep learning approach: a custom convolutional neural network, FCDS-CNN, alongside fine-tuned pre-trained models for identifying skin cancer in dermoscopic images. The FCDS-CNN is designed with solutions targeted at the variability of lesions and datasets. The pre-trained models serve as baselines against which the value of transfer learning in this particular setting can be accurately compared. The final objective is an efficient diagnostic tool that classifies several types of skin lesions, enhancing early detection of disease and improving patient outcomes.
The significance of this research is considerable. It is well established that the timely diagnosis of skin cancer, especially melanoma, can help minimize fatalities and increase survival rates. A more accurate and easier-to-use tool that can operate independently of dermatologists could greatly relieve their workload and provide a second opinion, making the chances of a missed diagnosis or an unnecessary biopsy very low. In addition, the conclusions of this study could help advance precision medicine, where efficient diagnosis enables earlier treatment.
This research leverages the power of deep learning, specifically Convolutional Neural Networks (CNNs), in conjunction with dermoscopic imaging to address the need for more robust and automated skin cancer detection methods. CNNs excel at image analysis and classification due to their ability to learn intricate patterns and features from large datasets, making them well-suited for analyzing the complex visual information within dermoscopic images. This approach has the potential to overcome the limitations of traditional image-processing techniques and may even surpass expert dermatologists in discerning subtle indicators of malignancy. The proposed FCDS-CNN architecture is designed and optimized for skin cancer detection, incorporating features tailored to dermoscopic image analysis for enhanced diagnostic accuracy. The study also leverages transfer learning from pre-trained models as baselines, drawing on extensive pre-existing image knowledge to reduce training time and data requirements for practical clinical application. Furthermore, it addresses the challenge of class imbalance in skin cancer datasets, where benign lesions typically outnumber malignant ones, employing techniques like class weighting to ensure effective classification of all lesion types. This research makes several significant contributions, including the novel FCDS-CNN architecture and a comprehensive performance evaluation against state-of-the-art pre-trained models, demonstrating superior accuracy and efficiency while offering insights into the role of transfer learning. Ultimately, this work aims to provide a robust and accessible tool for assisting dermatologists in early and accurate skin cancer diagnosis, potentially improving patient outcomes, reducing healthcare burdens, and paving the way for further exploration of deep learning in dermatology and automated medical image analysis.
Research challenges or limitations and research gap
Before the emergence of modern, technologically advanced methods, the assessment of skin cancers depended mainly on physical examination by dermatologists using dermatoscopy to magnify lesions, which relied heavily on the skills of the individual doctors involved. With advances in imaging technology, dermoscopic imaging came into practice to provide a detailed look at the skin surface; clinical imaging with standardized photographs and confocal microscopy, which allows imaging at the cellular level, further strengthened diagnosis. However, current and future research still shows gaps and challenges. Data imbalance across datasets limits current models when they encounter rare lesion types, and high intra-class variance complicates learning due to the considerable variability in appearance within each lesion type. When different classes are visually similar, correct classification becomes challenging, since visually similar lesions may be grouped into different classes. Prior work has made significant advances but typically does not simultaneously optimize recall and precision across classes, especially under skewed and varying class distributions. These problems are compounded by the scarcity of labeled data, particularly for atypical lesions, rendering models prone to overfitting and poor transfer. Furthermore, variations in image quality resulting from diverse imaging methods and conditions make models challenging to train; this study treats this as a pressing research problem that calls for the development of resilient and flexible models to achieve precise and dependable skin cancer diagnosis in healthcare delivery. Nevertheless, questions about harmonization, model scalability, and the integration of multiple imaging modalities still need to be answered to broaden the field and harness the full potential of these technologies.
Motivation
This research aims to present a robust solution to the above problems in skin cancer detection by using four pre-trained models and one convolutional neural network built from scratch by the authors. The rationale behind this work is to tackle the issues that contributed to the shortcomings of previous work: data imbalance, intra-class variability, and inter-class similarity. Using the Kaggle dataset, this study benefits from pre-trained models heavily equipped with prior knowledge, while the specialized CNN model is developed to suit the identified characteristics of the dataset. This dual approach aims to correct previous flaws, offering a model that is as fast as possible while providing accurate results and stable performance in real clinical environments. Ultimately, the goal is to enhance dermatologists' ability to identify skin cancer and ensure that diagnoses are correct and free of errors.
Novel contribution
This work contributes to skin cancer detection by using both pre-trained models and the FCDS-CNN model trained from scratch, viewed through the lens of current methodologies. The study combines deep learning methods with a heterogeneous dataset to increase diagnostic efficacy and the timely, correct classification of skin cancer in practical clinical environments. The following key contributions highlight the novel aspects of this research:
- All four pre-trained models and the FCDS-CNN model were trained separately, and their outcomes were compared closely to identify the most suitable technique for skin cancer detection, allowing direct evaluation of the proposed model against established benchmarks.
- Addresses data shift and high intra-class variability through sophisticated data augmentation and an architecture designed specifically for lesion classification, introducing FCDS-CNN, a novel convolutional neural network optimized for analyzing dermoscopic images.
- Establishes a model that offers both speed and accuracy, making it suitable for deployment in real-world clinical settings, including resource-constrained environments, and enabling instant diagnostic assistance at the point of care.
- Uses a high-quality dataset obtained from Kaggle, ensuring the model is trained on a representative sample of skin lesions and enhancing its potential to generalize to a broader range of cases.
- Aims to develop a tool that assists dermatologists in making more accurate and timely diagnoses, thereby reducing the risk of diagnostic errors and improving patient outcomes.
- Offers a comprehensive comparison between the FCDS-CNN and pre-trained models and discusses transfer learning in skin cancer detection, providing insights into the benefits and limitations of leveraging pre-trained models in this domain.
- Structures the model to be portable and adaptable to other clinical settings and imaging platforms without requiring extensive modification.
Literature review
Different deep learning models have considerably advanced skin cancer lesion detection, classification, and segmentation, each contributing in its own way to more accurate and specific diagnostics. Explainable deep learning models have emerged as a crucial area of research, aiming to provide insight into the decision-making process of these complex architectures, especially for multi-class skin lesion classification14.
Venugopal et al.15 recommended a DNN model based on EfficientNet to diagnose skin cancer. Their research used 58,032 dermoscopic images drawn from the ISIC 2019, ISIC 2020, and HAM10000 databases. Transfer learning with fine-tuning on the newly introduced EfficientNetV2-M proved effective with a comparatively small training set by augmenting generalization capabilities. Because features extracted from the data should be invariant to transformations, data augmentation methods were used. The study reported an accuracy of 95.49% for multiclass classification and an AUC score of 0.99 for binary classification, outperforming existing state-of-the-art methods. The EfficientNetV2-M model was selected because it showed high efficiency on complex datasets in preliminary work, with high accuracy and AUC scores demonstrated during the experiments. Rahman et al.16 evaluated an ensemble-based learning approach using several models, including DenseNet, ResNet, Xception, SeResNeXt, and ResNeXt, on the ISIC 2019 and HAM10000 datasets. They attained a good micro-average recall of 0.94 using the ensemble method for multi-class classification. Architectures like Skin-Net, incorporating multilevel feature extraction, cross-channel correlation, and outlier detection, have shown promising results for this task17. Ensemble methods were applied because the authors aimed to improve skin cancer classification by combining the best features of several models. Pacheco et al.18 conducted a study on the ISIC 2019 dataset, proposing an ensemble of three CNN architectures, ResNet-152, VGG-19, and PNASNet, compared for the highest accuracy. The ensemble used patient metadata to enhance diagnostic specificity. The proposed model maintained a balance between correctly detecting the disease and correctly ruling it out, reporting approximately 90% precision. The authors used this ensemble method with metadata to increase diagnostic detail, providing the classifier with both contextual and image data and increasing the classification's overall stability. Dildar et al.19 categorized and discussed various deep learning approaches for skin cancer detection, including CNNs, ANNs, KNNs, and GANs, using a common ISIC dataset of images of various skin lesions. The review revealed the high efficiency of CNNs in image classification, especially for skin cancer: reported test accuracies ranged from 78% to 97.5%, and the review also described a novel way of improving classification accuracy through GAN-based data augmentation. These methodologies were adopted because they proved superior to conventional machine learning techniques in analyzing large datasets and can efficiently capture complex patterns, an attribute highly beneficial in medical image analysis.
Jicman et al.2 reviewed and evaluated podiatric melanoma, seeking to define the difficulties that may arise in detecting podiatric melanoma in Fitzpatrick skin types 5 and 6. Searching the PubMed database for articles published from January 2012 to June 2022 concerning lower-extremity lesions, the researchers found a lack of literature in this subject area. Their research aims to raise awareness and improve screening practices among clinicians, especially podiatrists, to improve early detection of melanomas in those populations. Gouda et al.20 utilized the ISIC 2018 dataset, comprising 11,527 images, for skin lesion classification. They employed ESRGAN for image enhancement and data augmentation, together with a custom CNN model and the pre-trained models ResNet50, InceptionV3, and Inception-ResNet. InceptionV3 proved the most effective, achieving 85.7% correct image categorization. Their methods focused on improving diagnosis and identifying indications of skin cancer at early stages using deep learning and image optimization. Rajalakshmi et al.21 employed the ISIC 2018 dataset, containing 2,594 images and their masks, for skin lesion segmentation and classification. Their pipeline featured image preprocessing, a convolutional neural network for mole segmentation, and varied data inputs. They used a CNN to differentiate between melanoma and nevus lesions and paid repeated attention to false negatives to improve detection. No overall accuracy was specified, but the authors underlined the efficiency of deep learning in enhancing classification accuracy for dermoscopic images and handling their peculiarities. Jamil et al.3 worked with the HAM10000 dataset of 10,015 dermoscopic images of seven types of skin lesions. To counter class imbalance, they applied data augmentation in a deep learning model based on ResNet152 with transfer learning. Using this sampling technique, they achieved a high accuracy of 86.47%, with a specificity of 87.42% and an F1-score of 0.854. The authors opted for transfer learning because of its efficiency in improving classification accuracy by leveraging pre-stored knowledge, addressing the variety of image types within the database.
Oumoulylte et al.22 worked on binary skin cancer classification using an ISIC dataset of 1,800 benign and 1,497 malignant skin cancer images. They applied CNNs and the transfer learning models VGG16 and VGG19, with VGG16 achieving the best results: an accuracy of 87% with a loss of 0.38, along with the highest recall, precision, and F1-score. VGG16 was the model of choice because of its stability and efficiency in medical image classification, which improves the efficiency and accuracy of skin cancer detection. Sönmez et al.23 classified skin lesions using the “Skin Cancer MNIST: HAM10000” dataset of 10,015 images covering seven lesion classes. The researchers employed data augmentation to balance the dataset and evaluated several transfer learning models. The MobileNetV2 model, developed and assessed in PyTorch, showed the highest accuracy at 80.79%. MobileNetV2 was selected based on its performance, and the study aimed to increase classification accuracy on unbalanced medical image datasets. Almufareh et al.4 proposed a fine-tuned CNN model for detecting melanoma based on the DermIS dataset containing 621 images, DermQuest containing 1,233 images, and the ISIC 2019 dataset containing 25,000 images. They used convolutional and pooling layers for feature extraction and fully connected layers for classification. Employing transfer learning and hyperparameter fine-tuning, they increased detection accuracy by 5% on DermIS, 6% on DermQuest, and 0.81% on ISIC 2019. The study aimed to increase the specificity and sensitivity of melanocytic neoplasm detection, a prerequisite for early diagnosis and better patient prognosis. Sardar et al.1 used the ISIC 2018 dataset for skin lesion classification with CNNs, employing a multi-model ResNet approach based on ResNet50. They retrained the model to improve accuracy, reaching a classification accuracy of about 95.7%. ResNet50 was chosen because it provides good results when training deep networks, especially in the medical field, where the high accuracy required for skin cancer diagnosis and treatment is essential.
Naqvi et al.5 studied deep learning applications in the field of skin cancer. Using databases such as ISIC, HAM10000, and PH2, they established that CNNs perform well in image classification but struggle with generalization and overfit on small datasets. They also pointed to the high training complexity of models such as ResNet and DenseNet, arguing for effective deep networks suited to clinical settings. These methods were preferred over others because their results outperform conventional techniques in diagnosing skin cancer. Ali et al.24 proposed integrating the IoMT into dermatological melanoma lesion segmentation using conditional GANs (cGANs) on three datasets: DermIS, DermQuest, and ISIC 2016. They improved image quality through preprocessing and reduced the class imbalance problem with data augmentation. The cGAN model achieved high accuracy, dice scores, and specificity: 99% on DermQuest, 97% on DermIS, and 95% on ISIC 2016. cGANs were selected to enhance segmentation accuracy because traditional approaches fail in scenarios where dermatoscopic images of melanoma look almost identical to normal skin. Singha et al.25 worked on melanoma classification using the ISIC 2020 dataset, applying transfer learning with ResNet, VGG16, and MobileNetV2. Data augmentation and optimization algorithms were used, and the model achieved a validation accuracy of 98.39%. This approach enhances accuracy while reducing data and training requirements, which is very useful in real-life medical applications.
Dandu et al.10 used transfer learning and hybrid classification methods for melanoma detection on the SIIM-ISIC dataset comprising 10,000 DICOM images. They also used image-improvement methods and incorporated an attribute-selected classifier and Bagging algorithms. Their approach recorded accuracy, precision, and recall of 90.96%, an F1-score of 0.91, and an ROC AUC of 0.95. Improving classification accuracy and minimizing errors was the study's primary concern, which is essential for the early diagnosis of melanoma in the clinical environment. Ali et al.26 employed transfer learning and fine-tuning techniques for multiclass skin cancer classification using EfficientNet-B4 and B5 architectures pre-trained on the ImageNet database, obtaining 87.90% accuracy on the HAM10000 dataset. The selected EfficientNet models demonstrated, in both accuracy and efficiency, that transfer learning and fine-tuning can significantly improve skin cancer diagnosis. Zafar et al.27 utilized the state-of-the-art DenseNet-201 with SMA for feature extraction, after which skin lesions were classified with 91.70% accuracy and good generalization capability across the ISIC 2019, HAM10000, PH2, and MED-NODE databases. DenseNet-201 was selected because of its good classification accuracy, and augmentation with SMA further improved performance, proving the efficacy of pre-trained models and fine feature separation in diagnosing skin cancer.
Table 1 outlines the datasets and techniques used by each author, followed by their reported accuracies, allowing us to determine which techniques are commonly used by other researchers. Based on this information, we situate the accuracy achieved in this study, determine the most effective techniques from previous studies, and identify the new findings of this work.
While deep learning dominates recent research, traditional diagnostic methods for skin cancer, such as visual inspection, dermoscopy, and biopsies, remain the clinical standard. However, these methods suffer from limitations like subjectivity and inter-observer variability, driving the need for more robust and automated solutions28. Before the widespread adoption of deep learning, traditional image processing techniques, including feature extraction, segmentation, and classification using algorithms like SVM and Random Forests, were employed for skin lesion analysis29. However, these methods struggled to handle the complexity and high dimensionality of dermoscopic image data effectively. Automated skin lesion analysis presents unique challenges, including variations in lighting, skin tones, lesion size and shape, and the presence of hair, all demanding robust preprocessing and algorithmic solutions for accurate diagnosis30. Furthermore, skin lesion datasets often exhibit a significant class imbalance, with benign lesions far outnumbering malignant ones31. This imbalance necessitates techniques like data augmentation, class weighting, and cost-sensitive learning to prevent models from being biased toward the majority class. Explainability and interpretability are paramount in medical AI, and methods such as Grad-CAM and SHAP are increasingly used to understand the decision-making process of deep learning models and build trust in their predictions32. Finally, deploying these models in real-world clinical settings requires careful consideration of computational resources, integration with existing clinical workflows, and adherence to regulatory standards for medical AI33. Despite the promising accuracy reported in various studies34, a gap remains in understanding their performance in diverse clinical environments. This research aims to address this gap while also leveraging recent advances in transfer learning and ensemble approaches, which have shown the potential to achieve state-of-the-art performance in skin cancer classification by utilizing pre-trained networks and combining multiple models35. Deep learning has also been applied to other medical image analysis problems, such as detecting knee osteoporosis and osteopenia and classifying knee osteoarthritis, demonstrating the versatility of these techniques.
Methodology
Discussion of methods
This study evaluates both pre-trained models (ResNet152V2, EfficientNetV2B0, InceptionResNetV2, and MobileNetV3) and a custom-designed CNN (FCDS-CNN) for skin lesion classification. The pre-trained models offer diverse architectures, balancing accuracy, efficiency, and computational cost. ResNet152V2 provides high accuracy but demands more resources, while EfficientNetV2B0 and MobileNetV3 prioritize efficiency. InceptionResNetV2 excels in feature extraction. However, these models may not be optimally tuned for dermoscopic images. Therefore, FCDS-CNN is proposed, aiming for high accuracy with a less complex architecture tailored specifically for skin cancer detection. Comparing these models allows us to assess the trade-offs between performance, computational demands, and the benefits of task-specific model design. Key considerations include accuracy, computational cost, convergence speed, and handling class imbalance. This multifaceted approach aims to identify the most suitable architecture for skin cancer detection from dermoscopic images.
Data collection and preprocessing
Identification of skin cancer is one of the most vital areas of study in medical science, and dermoscopic imaging provides features of skin lesions suitable for machine learning. This study uses a dataset of 10,015 dermoscopic images across seven classes: 327 actinic keratoses (AKIEC), 514 basal cell carcinomas (BCC), 1,099 benign keratoses (BKL), 115 dermatofibromas (DF), 6,705 melanocytic nevi (NV), 1,113 melanomas (MEL), and 142 vascular lesions (VASC). These classes range from harmless lesions to life-threatening cancers like melanoma. A further difficulty of this dataset is that the number of samples differs considerably between classes, which is a challenging factor in improving the model recognition rate. Hosny et al. emphasized, given melanoma's aggressiveness, that the disease should be detected early. Esteva et al.40 likewise underline that the quality of dermoscopic images should be enhanced to improve identification, since the lesions under examination may look alike. According to Bansal et al.44, classification can be improved with more sophisticated approaches such as transfer learning and ensemble methods, though these bring challenges of their own. Such findings provide background on the importance of using sophisticated methods in skin cancer diagnosis.
The study identifies various skin diseases, including AKIEC, BCC, BKL, DF, MEL, NV, and VASC, each with unique characteristics. It emphasizes the importance of using advanced methods to identify these diseases accurately, using a variety of dermoscopic images, as shown in Fig. 1a-g.
Dataset distribution
Figure 2 below illustrates the distribution of images within the dataset, divided into two key categories: the training and testing datasets. The training dataset, represented in light coral, accounts for 90% of the total images, equating to 9,013 images. This substantial portion is used to train the machine learning models, enabling them to learn the patterns, features, and characteristics associated with different classes of skin lesions. Conversely, the testing dataset, shown in light green, comprises 10% of the total images, or 1,002 images. This smaller group tests how well the model performs, ensuring it can handle new examples it has not seen before. This distribution reflects common machine learning practice: the bulk of the data is allocated to training, with a smaller portion reserved for testing. This balance yields reliable results and guards against both underfitting and overfitting. Figure 2 shows the distribution between the training and testing datasets, and Table 2 presents a dataset overview, including image types, dimensions, and the training/testing distribution.
Classes distribution
This study investigates skin cancer detection using the dataset categorized into seven classes: AKIEC, DF, NV, MEL, VASC, BCC, and BKL. As visualized through the pie charts, the dataset reveals a significant class imbalance, with the NV class dominating both the training (Fig. 3a, 66.9%) and testing (Fig. 3b, 67.0%) datasets. Classes like MEL and BKL are moderately represented, while AKIEC and DF are particularly scarce, complicating model training and evaluation. Despite this imbalance, the dataset is valuable due to its diverse representation of skin lesion types, providing a robust foundation for developing models capable of distinguishing between various conditions. The consistent distribution between training and testing sets ensures that models are evaluated under realistic conditions. However, the imbalance poses a risk of model bias toward the majority class, so strategies like data augmentation, class weighting, or oversampling are crucial to ensure balanced and accurate detection across all classes. Figure 3a-b shows how many images of each class are in the training and testing datasets.
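As a quick check on this imbalance, the class counts can be read directly from the dataset's metadata. The sketch below assumes the Kaggle HAM10000 metadata file (HAM10000_metadata.csv) and its 'dx' label column, which are part of that public dataset release.

```python
# Inspect the class distribution of the HAM10000 dataset (assumed metadata file).
import pandas as pd

meta = pd.read_csv("HAM10000_metadata.csv")    # one row per image
counts = meta["dx"].value_counts()             # 'dx' holds the lesion class label
print(counts)                                  # nv should dominate (~67% of images)
print((counts / counts.sum() * 100).round(1))  # per-class percentages
```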
Preprocessing
Data preprocessing is one of the crucial steps in deep learning, especially when the input data are images and the goal is classification. It encompasses the techniques used to bring raw data into a form suitable for feeding to neural networks. Preprocessing is critical because it improves data quality, eliminates noise, and standardizes the dataset, enhancing the overall performance and accuracy of the model. For the pre-trained models ResNet152V2, MobileNetV3, InceptionResNetV2, and EfficientNet, preprocessing typically involves resizing all images to a common size of 224 by 224 pixels. This resizing keeps the input data aligned with the input dimensions these models expect, allowing effective training and feature computation. Other preprocessing steps include normalization, which scales the pixel values so that the models converge during training and generalize better. The input data must therefore be preprocessed to meet the expectations of the pre-trained models in order to obtain more reliable skin cancer detection results.
Data preprocessing included resizing all images to a common size of 224x224 pixels to match the input requirements of the pre-trained models (ResNet152V2, MobileNetV3, InceptionResNetV2, and EfficientNet). This resizing ensured consistency and facilitated efficient computation during training. Normalization was also performed, further scaling the pixel values to a specific range (e.g., 0-1) to standardize the dataset and aid in model convergence. For the FCDS-CNN, the input images were resized to 224x224 pixels, maintaining consistency across all models.
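A minimal sketch of these resizing and normalization steps is shown below; the file path is a hypothetical example, and the exact loading pipeline used in the study is not specified.

```python
# Sketch of per-image preprocessing: decode, resize to 224x224, scale to [0, 1].
import tensorflow as tf

IMG_SIZE = (224, 224)  # common input size for all models in this study

def preprocess_image(path):
    raw = tf.io.read_file(path)
    image = tf.image.decode_jpeg(raw, channels=3)  # decode to an HxWx3 tensor
    image = tf.image.resize(image, IMG_SIZE)       # resize to 224x224
    image = tf.cast(image, tf.float32) / 255.0     # normalize pixels to [0, 1]
    return image

img = preprocess_image("data/train/nv/ISIC_0024306.jpg")  # hypothetical example file
```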
Class weighting
Another technique used in deep learning to address imbalanced datasets is class weighting, which gives more weight to the minority classes during training so that the model reduces its bias toward the majority classes. This was implemented by calculating the inverse frequency of each class in the training set and using these values as weights during model training. Class weighting was applied to the FCDS-CNN and the pre-trained models ResNet152V2, MobileNetV3, InceptionResNetV2, and EfficientNet, which is particularly important for skin cancer detection, where some types of cancer are rare. The technique enhanced the recall and precision of minority classes in all the models, improving overall prediction quality. Its main benefit is that class weighting highlights the rarer types of cancer without overlooking crucial cases, leading to generally high model performance. All the models benefited from this approach, but the FCDS-CNN saw the most substantial improvements, indicating that class weighting is quite effective for improving the accuracy and reliability of skin cancer detection models across different classes. A sketch of the weighting calculation follows.
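The sketch below illustrates the inverse-frequency calculation described above; the toy label array stands in for the real training labels, and the exact weight normalization used in the study is an assumption.

```python
# Hedged sketch of inverse-frequency class weighting.
import numpy as np

def inverse_frequency_weights(labels, num_classes=7):
    counts = np.bincount(labels, minlength=num_classes)
    # total / (num_classes * count) gives rare classes proportionally larger weights.
    weights = counts.sum() / (num_classes * np.maximum(counts, 1))
    return {i: float(w) for i, w in enumerate(weights)}

toy_labels = np.array([0, 0, 0, 0, 1, 1, 2])  # class 0 dominates this toy set
class_weight = inverse_frequency_weights(toy_labels, num_classes=3)
print(class_weight)  # {0: 0.58, 1: 1.17, 2: 2.33}: minority classes weigh more
# During training: model.fit(..., class_weight=class_weight)
```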
Data augmentation
Data augmentation techniques are essential in deep learning: they enlarge the variety of the training set by rotating, flipping, scaling, and zooming existing pictures, increasing the model's capability to generalize to real data. These transformations were applied randomly to the training images using TensorFlow's ImageDataGenerator, with specific ranges chosen empirically (e.g., ±20-degree rotation, 0.9-1.1 scaling). No synthetic data generation was used. Data augmentation is crucial for the dataset used in this study, where some classes are under-represented. Through augmentation, more samples can be produced for the minority classes, reducing the model's tendency to learn only from the samples of the more common classes. The approach proved highly practical for both the FCDS-CNN and the pre-trained models ResNet152V2, MobileNetV3, InceptionResNetV2, and EfficientNet. By representing all classes in the training process, data augmentation enhances the precision of skin cancer prediction, as the configuration sketch below illustrates.
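A minimal configuration with the stated ranges (±20-degree rotation, flips, 0.9-1.1 zoom) might look as follows; the directory layout is a hypothetical example.

```python
# Sketch of the augmentation settings described above, using TensorFlow's
# ImageDataGenerator as in the study.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,   # normalization, as in preprocessing
    rotation_range=20,   # random rotations up to ±20 degrees
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.1,      # zoom sampled from the 0.9-1.1 range
)

# Hypothetical directory layout: data/train/<class_name>/*.jpg
train_gen = train_datagen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=32, class_mode="categorical"
)
```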
Hyperparameter tuning and training details
The Adam optimizer was employed to train all models due to its effectiveness in handling sparse gradients and its adaptive learning rate capabilities. The initial learning rate was set to 0.001, and a learning rate schedule (ReduceLROnPlateau) dynamically adjusted the learning rate during training based on validation performance. This helped fine-tune the learning process and prevent premature convergence at local minima. The categorical cross-entropy loss function was utilized, as it is suitable for multi-class classification problems. A batch size of 32 was chosen empirically, balancing training speed and memory usage given the available computational resources (16GB RAM, Intel Core i7, with Google Colab GPUs for training). The dataset was randomly split into training (90%) and validation (10%) sets, with a stratified class distribution in both sets to mitigate potential bias. The number of training epochs varied by model, ranging from 30 to 40, determined using early stopping based on validation accuracy: training was halted if the validation accuracy did not improve for several consecutive epochs (e.g., 5), preventing overfitting and ensuring optimal generalization performance. For the FCDS-CNN, data augmentation techniques, including random rotations (±20 degrees), horizontal/vertical flips, and random zoom (0.9-1.1), were implemented using TensorFlow's ImageDataGenerator. Class weights, calculated as the inverse class frequency, were applied during training to address the class imbalance issue, as detailed above. The models were implemented using TensorFlow and Keras. Model performance was evaluated using accuracy, precision, recall, F1-score, and Area Under the Curve (AUC), calculated on the held-out test set (10% of the data). These metrics comprehensively evaluate the models' classification performance, particularly concerning sensitivity and specificity.
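Putting these pieces together, a hedged sketch of the training configuration is shown below. It assumes the model, data generators, and class weights from the preceding sections, and the callback parameters not stated in the text (reduction factor, plateau patience) are illustrative assumptions.

```python
# Sketch of the training setup: Adam at 1e-3, categorical cross-entropy,
# ReduceLROnPlateau, and early stopping on validation accuracy.
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
)

callbacks = [
    # Reduce the learning rate when validation performance plateaus.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_accuracy",
                                         factor=0.5, patience=3),
    # Stop if validation accuracy fails to improve for 5 consecutive epochs.
    tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=5,
                                     restore_best_weights=True),
]

history = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=40,                  # upper bound; early stopping usually halts sooner
    class_weight=class_weight,  # inverse-frequency weights from the earlier sketch
    callbacks=callbacks,
)
```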
Proposed deep learning-based models
This research proposes five deep learning models for skin cancer detection: four pre-trained models, ResNet152V2, MobileNetV3, EfficientNetV2B0, and InceptionResNetV2, plus the FCDS-CNN, which was created from scratch. The FCDS-CNN utilizes a softmax activation function in its output layer for multi-class classification. When fine-tuned on domain-specific HAM datasets, ResNet50 and other models have classified skin lesions better than traditional machine learning models45. Transfer learning, where rather than training the network from scratch we fine-tune a network already trained on a large dataset, has also been widely used and has been successful in skin cancer detection, especially with small datasets46. While exploring different activation functions could be a valuable extension, this study focused on comparing different architectures for skin lesion classification. Deep learning, a subcategory of machine learning, has dramatically improved the performance of medical image classification, especially for skin disorders8. Some models, such as InceptionV3 and Xception, have been used in dermatological image classification with high accuracy, but they are computationally intensive and prone to overfitting47. The custom model developed here addresses issues found in other methods, such as computational complexity and unbalanced datasets, while offering advantages in cost, speed, and reliability. It can efficiently provide doctors and patients with early skin cancer detection and is well suited to areas lacking funds for more expensive equipment.
ResNet152V2
In this study, one of the strategies used for skin cancer detection was the pre-trained model ResNet152V2. This deep learning architecture belongs to the ResNet family, recognized for its depth and performance on challenging image classification problems. ResNet, short for Residual Networks, is built on the concept of residual learning, which adopts identity shortcuts that skip one or more layers. This architecture makes it possible to train much deeper networks than standard deep networks, which struggle with the vanishing gradient problem.
ResNet152V2, in particular, is a 152-layer deep network architecture, giving it the capacity to learn detailed image features. Due to this depth, it is well suited to operations such as medical image analysis, where slight differences between images are significant in detecting disease. The benefits of using ResNet152V2 are its solid accuracy across different image classification problems, its ability to be trained on various datasets, and the backing of many articles and forums on fine-tuning and integrating the model. Nonetheless, ResNet152V2 has some limitations. Its deep architecture, while effective, consumes abundant computational resources, causing issues in contexts with limited hardware. Also, the more intricate the model, the longer its inference time, which may be undesirable in time-sensitive applications. These factors make it relatively slow for real-time use or deployment in resource-limited environments.
In contrast, the custom-built FCDS-CNN developed in this study was designed explicitly for skin cancer detection and achieved a higher accuracy of 96%, compared to the 91% obtained using ResNet152V2. The FCDS-CNN excels in both performance and resource use, making it preferable when quick and accurate diagnosis is required, especially in poorly funded contexts. For general image classification tasks, ResNet152V2 remains a proven performer; however, the design and flexibility behind the FCDS-CNN make it more suitable for this particular task and easier to deploy for early skin cancer detection.
The architecture of this model as applied in the current study for skin cancer identification, together with its process flow, is depicted in Fig. 4. The figure shows how the model processes input images of size 224×224×3 through ResNet152V2, which is used here to extract latent semantic features. The output from ResNet152V2 is \(7\times 7 \times 2048\), followed by a Global Average Pooling layer producing 2,048 units. Dense layers with 1,024 and 512 units are then deployed, each accompanied by a batch normalization layer and a dropout layer for enhanced learning and reduced overfitting. Lastly, the model outputs prediction probabilities for seven classes via a final dense layer of seven neurons.
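The described head translates naturally into a Keras sketch: a frozen ResNet152V2 backbone followed by global average pooling, 1,024- and 512-unit dense layers with batch normalization and dropout, and a seven-way softmax. The dropout rate is an assumption, as the text does not specify it, and the `build_classifier` helper is hypothetical.

```python
# Minimal transfer-learning head sketch matching the Fig. 4 description.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_classifier(backbone, dense_units=(1024, 512), num_classes=7, dropout=0.5):
    backbone.trainable = False  # freeze pre-trained weights for transfer learning
    x = layers.GlobalAveragePooling2D()(backbone.output)
    for units in dense_units:
        x = layers.Dense(units, activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(backbone.input, outputs)

resnet = tf.keras.applications.ResNet152V2(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)
model = build_classifier(resnet)  # 7x7x2048 backbone output -> GAP -> head
```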
Table 3 shows the detailed architecture of the ResNet152V2 model, along with the output shapes, layers, and number of parameters connected to each. ResNet152V2 has more than 60 million parameters, most of which are trainable. This ensures that ResNet152V2, much deeper and more complex than other architectures, is well utilized in extracting the dense image features vital for classification in this medical imaging task. However, its larger parameter count makes it computationally intensive compared to the customized CNN model designed in this study.
EfficientNetV2B0
Another model we adopted for identifying skin cancer was the pre-trained EfficientNetV2B0. This model belongs to the EfficientNet series, known for high performance at relatively low computational cost. It is designed to be as accurate as possible with the smallest number of parameters and computations, making it amenable to applications where time and resource use are paramount. EfficientNetV2B0 is the smallest model in the EfficientNetV2 family and incorporates improvements in the network scaling technique that cut computational cost while performing well on various image classification tasks. One of its significant strengths is that it can be trained and used for inference faster than more complicated network architectures. This efficiency also makes it suitable for deployment on platforms with low computational power, such as smartphones and embedded systems, without much loss in accuracy. Its ability to operate under limited computational resources suggests it can be used in real-world settings where computing power is constrained.
However, there are limitations to consider when using EfficientNetV2B0. Its architecture is not very deep, so it will likely miss details that newer, deeper models capture, which can matter in highly technical tasks such as skin cancer detection. This can reduce accuracy, especially when it is essential to distinguish between visually similar lesions. In our experiments, EfficientNetV2B0 obtained 88% accuracy, which falls short of the custom CNN-based model.
The FCDS-CNN model introduced in this research aims to provide a solution for skin cancer detection with a higher accuracy of 96%. As explained earlier, the FCDS-CNN was created from scratch with the peculiarities of dermoscopic images in mind, which helped it detect more relevant features. Besides, data augmentation and class weighting made it easier to train the FCDS-CNN on the under-represented classes in the dataset and increased overall accuracy. EfficientNetV2B0 has significant benefits in terms of efficiency and speed; however, the FCDS-CNN offers better performance for the goal of skin cancer detection, which makes it more appropriate for this particular research.
The EfficientNetV2B0 architecture used in the current study is illustrated in Fig. 5 alongside the process flow. The figure shows the model at work: the input layer takes images of size 224×224×3, which are fed into the EfficientNetV2B0 backbone, where several layers filter out the essential features, producing a \(7 \times 7\times 1280\) output. This is reduced to a more manageable 1,280 units via a Global Average Pooling layer. The subsequent dense layers, containing 1,024 and 256 neurons, analyze the extracted features for the final classification of skin cancer into one of the seven classes in the final dense layer. Table 4 captures the model architecture of EfficientNetV2B0, including the output shapes, layer types, and number of parameters at each stage. The table also shows the efficiency of this model: it has approximately 7.5 million parameters, the majority of them trainable. The architecture balances depth against computational complexity, allowing the model to operate with relatively low requirements while ensuring a high accuracy rate.
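The same hypothetical head-builder sketch from the ResNet152V2 section applies here, with the dense stack adjusted to the 1,024- and 256-unit layers described above; InceptionResNetV2 and MobileNetV3 backbones plug in analogously.

```python
# Reusing the hypothetical build_classifier helper with an EfficientNetV2B0
# backbone, whose 7x7x1280 feature map is pooled to 1280 units before the head.
effnet = tf.keras.applications.EfficientNetV2B0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)
effnet_model = build_classifier(effnet, dense_units=(1024, 256))
```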
InceptionResNetV2
This study also used InceptionResNetV2, a pre-trained deep learning model from the Inception family. Its strength lies in combining inception modules with residual connections, making it suitable for complicated tasks like skin cancer diagnosis. This work achieved 93% accuracy using InceptionResNetV2, which is proficient at depicting fine image details thanks to its deep architecture and feature extraction capability. Nevertheless, as with any technique, InceptionResNetV2 is not without its shortcomings: a high demand on computational and time resources and long training and inference times, which become an issue where computing capacity is limited. Its advantage as a flexible and accurate tool for generic image analysis can thus be counterproductive for application-specific tasks where fast turnaround and minimal resource use are necessary. With its focused design, the FCDS-CNN model in this paper proves more precise, practical, and efficient at detecting early skin cancer than the more generalized InceptionResNetV2.
Figure 6 depicts the architecture of the InceptionResNetV2 model used in this paper for skin cancer detection, explaining the various stages of the model. As illustrated, the model takes as input images of size 224×224×3, with InceptionResNetV2 serving as the backbone that extracts higher-level features through its deep architectural design. The output from this backbone, with a dimension of 5×5×1536, is followed by a Global Average Pooling layer that reduces it to 1,536 units. Next comes a fully connected layer with 1,024 units, to which batch normalization and dropout are added, improving the model's resilience and reducing overtraining. Lastly, the model generates class probabilities for images belonging to seven categories through a dense layer with seven outputs.
Table 5 provides a comprehensive assessment of the InceptionResNetV2 structure, including the output shapes, layer types, and number of parameters involved at every stage. The table highlights the model's sizable capacity, with over fifty-six million parameters, indicating its ability to handle complex and large-scale image classification tasks. Despite the model's depth and advanced design, which contribute to its high accuracy, the FCDS-CNN model developed in this study was specifically optimized for the requirements of skin cancer detection, resulting in even better performance.
MobileNetV3
This work also approached skin cancer diagnosis using the pre-trained model MobileNetV3. This very light deep learning model was developed mainly for mobile and embedded systems. It builds on the earlier MobileNetV2 network, adding improvements such as squeeze-and-excitation (SE) blocks and eliminating excess layers and complexity while retaining a high level of accuracy. MobileNetV3 is used widely in environments with limited computational resources, making it suitable for running deep learning models on edge devices. It has numerous advantages, such as a reduced number of parameters and the short time needed for inference, which are essential for real-time applications. Furthermore, the model imposes less computational load and entails fewer parameters than some traditional models, a strength in low-power situations. Despite these advantages, MobileNetV3 has limitations: its simplicity, optimized for quick training, can slightly sacrifice accuracy, especially since it is less adaptable than deeper, more complex models, for example in medical imaging, which requires the identification of subtle patterns.
In the present experiment, the MobileNetV3 model yielded 90% accuracy, which is good but lower than the FCDS-CNN model. The models proposed in this study are also appropriate for skin cancer detection, as they capture highly relevant and descriptive dermoscopic image features. Although MobileNetV3 excels in efficiency and versatile deployment, the FCDS-CNN model outperforms it in accuracy and task-specific optimization. The experimental results show that the FCDS-CNN improves both accuracy and control of computational time, making it more applicable for early skin cancer detection.
Figure 7 summarises the architecture and process flow of the MobileNetV3 model employed in this study for skin cancer detection. The model takes input images of 224×224×3 and processes them with the MobileNetV3 backbone, which efficiently extracts features and reduces the dimensions to 7×7×960. This output is then reduced to 960 units by a Global Average Pooling layer. The resulting feature vector is fed through dense layers with 1024 and 512 neurons, respectively, followed by batch normalization and dropout to enhance the stability of the network. Finally, the last dense layer produces predictions over the seven classes. Table 6 further details the architecture, with output shapes, layer types, and the number of parameters at each stage. Although MobileNetV3 is better in efficiency and speed, the custom CNN model is preferable here because its accuracy is higher and it was trained specifically for this task.
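A comparable Keras sketch of the MobileNetV3 pipeline described above, again with an assumed dropout rate, is shown below:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# MobileNetV3Large backbone; with a 224x224x3 input its feature map is 7x7x960.
base = tf.keras.applications.MobileNetV3Large(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
x = layers.GlobalAveragePooling2D()(base.output)    # 7x7x960 -> 960 units
x = layers.Dense(1024, activation="relu")(x)        # first dense layer
x = layers.Dense(512, activation="relu")(x)         # second dense layer
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.3)(x)                          # assumed dropout rate
outputs = layers.Dense(7, activation="softmax")(x)  # seven lesion classes
model = models.Model(base.input, outputs)
```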
Proposed FCDS-CNN model
The proposed FCDS-CNN offers several advantages over prior work and pre-trained models. FCDS-CNN-style architectures have recently achieved high accuracy in the automated classification of skin lesions, indicating strong potential for enhancing diagnostic accuracy and early detection of skin cancers13. First, the model is optimized for skin cancer detection and leverages features inherent in dermoscopic images. Unlike standard CNNs trained on general image datasets such as ImageNet, the FCDS-CNN is designed specifically to capture the characteristics of dermoscopic images relevant to skin cancer. This task-specific design allows it to learn more discriminative features for accurate lesion classification and to predict aggressive skin cancers more reliably at early stages. Deep learning models of this kind, often combined with transfer learning on small datasets, have proven crucial in other complicated medical image analysis problems and bring improvements to skin cancer detection10.

Second, the FCDS-CNN architecture incorporates two key strategies to handle challenges common in medical image datasets. Prior research has shown that FCDS-CNN-based classification achieves high accuracy on skin lesions; nevertheless, data imbalance and over-specialization to particular datasets remain obstacles to clinical usefulness11. Data augmentation techniques, including rotation, flipping, scaling, and zooming, create variations of existing images, artificially expanding the dataset and reducing the impact of class imbalance. In addition, a class weighting mechanism is applied during training to give more weight to under-represented classes, ensuring that the model learns to identify all lesion types even with a skewed data distribution. These strategies are crucial for improving performance on minority classes and enhancing the overall robustness of the model.

Third, the model’s training time is relatively short because it is comparatively small. Compared to complex pre-trained models with millions of parameters, the FCDS-CNN utilizes a streamlined architecture with optimized layers (as shown in Figure 8 and Table 7) and a reduced parameter count, increasing computational efficiency and making it suitable for deployment in resource-constrained settings. Batch normalization and dropout layers further contribute to regularization, preventing overfitting and improving overall performance. As a result, the model can be implemented both in advanced medical centers and in environments with only rudimentary hardware while still delivering the performance expected of state-of-the-art healthcare facilities. The use of data augmentation and class weighting achieves additional improvement.
Augmentation increases the model’s capacity to classify unseen data by decreasing the probability of overfitting, particularly when operating on imbalanced datasets. Class weighting ensures that minority classes receive enhanced representation during training, so the model does not favor the dominant classes at the expense of the rare ones, making diagnoses more accurate across all kinds of skin cancer.
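As a sketch of how these two strategies can be combined in Keras and scikit-learn (both named in the experimental setup below), assuming integer class labels; the label array here is synthetic and the augmentation ranges are illustrative assumptions:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation pipeline mirroring the transformations named in the text:
# rotation, flipping, and scaling/zooming (ranges are assumptions).
augmenter = ImageDataGenerator(
    rotation_range=30,
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
)

# Class weights inversely proportional to class frequency. The labels here are
# synthetic placeholders; in practice they are the integer ids (0..6) of the
# training images.
labels = np.random.randint(0, 7, size=1000)
weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(labels), y=labels)
class_weight = dict(enumerate(weights))
# Typical use during training:
# model.fit(augmenter.flow(x_train, y_train, batch_size=32),
#           epochs=40, class_weight=class_weight)
```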
Finally, the model’s architecture is scalable: it can be fine-tuned for new datasets and different medical imaging applications. This versatility expands its applicability to future studies or diseases without developing a new model from scratch. Together, these attributes make the FCDS-CNN a strong, flexible, and feasible tool for diagnostic medical imaging, offering significant improvements over conventional methods.
Figure 8 shows the architecture of the custom-built FCDS-CNN for skin cancer detection. It provides a detailed view of the data flow from the input layer through the successive convolutional, batch normalization, and pooling layers to the dense layers connected to the output layer. The model takes dermoscopic images as input, transforms them into a set of features, and uses those features to assign each image to one of the seven lesion categories. The detailed structure of the FCDS-CNN, including the output shapes, layer types, and number of parameters for each layer, is listed in Table 7. Presented together, these show the model’s hierarchical structure, the depth of the network, and how few parameters it needs to achieve such high accuracy. The model can thus capture complex patterns in dermoscopic images while remaining reasonably fast.
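Table 7 gives the exact layer configuration; purely for illustration, a Keras sketch of the pattern described (stacked convolution, batch normalization, and pooling blocks feeding dense layers with dropout) could be assembled as follows. The filter counts, dense sizes, and dropout rate are assumptions, not the published configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_fcds_cnn_sketch(num_classes: int = 7) -> tf.keras.Model:
    """Illustrative stand-in for the FCDS-CNN; the real sizes are in Table 7."""
    return models.Sequential([
        layers.Input(shape=(224, 224, 3)),
        # Repeated conv -> batch norm -> pool blocks, as described in the text.
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),                      # regularization noted in the text
        layers.Dense(num_classes, activation="softmax"),
    ])
```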
Evaluation parameter
In this study, Eq. (1) is used to assess the accuracy of each model. Accuracy is the closeness of the predictions to the true labels and is a key metric in evaluating the models:

$$\text{Accuracy} = \frac{Tr\,Pos + Tr\,Neg}{Tr\,Pos + Tr\,Neg + Fa\,Pos + Fa\,Neg} \tag{1}$$

Where: Tr Pos = True Positives, Tr Neg = True Negatives, Fa Pos = False Positives, Fa Neg = False Negatives
Eq. (2) is used to calculate the precision value, the proportion of predicted positive instances that are truly positive. It is defined by the number of True Positive (TrPos) and False Positive (FaPos) results, so precision captures how reliably the model recognizes positive instances; higher precision means fewer false alarms, so the positive predictions are more reproducible and reliable:

$$\text{Precision} = \frac{Tr\,Pos}{Tr\,Pos + Fa\,Pos} \tag{2}$$

Where: Tr Pos = True Positives, Fa Pos = False Positives
Eq. (3) defines recall, the model’s capacity to discern all of the relevant instances in the data: the ratio of true positives (correct identifications) to the sum of true positives and false negatives (missed cases). A higher recall value indicates that the model recovers more of the genuine positive instances:

$$\text{Recall} = \frac{Tr\,Pos}{Tr\,Pos + Fa\,Neg} \tag{3}$$

Where: Tr Pos = True Positives, Fa Neg = False Negatives
Eq. (4) gives the F1 score, computed as the harmonic mean of recall and precision. It is a balanced metric that considers both how completely the model identifies relevant examples (recall) and how accurately it does so (precision); the harmonic mean combines the two so that the score is high only when the model is effective on both:

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,Tr\,Pos}{2\,Tr\,Pos + Fa\,Pos + Fa\,Neg} \tag{4}$$

Where: Tr Pos = True Positives, Fa Pos = False Positives, Fa Neg = False Negatives
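As a concrete illustration of Eqs. (1)–(4), the sketch below computes the same metrics with scikit-learn, which the experimental setup lists among the libraries used for performance assessment; the label arrays are hypothetical placeholders:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Hypothetical ground-truth and predicted class ids for illustration only.
y_true = np.array([0, 1, 2, 2, 1, 0, 2])
y_pred = np.array([0, 1, 2, 1, 1, 0, 2])

print(accuracy_score(y_true, y_pred))                    # Eq. (1)
print(precision_score(y_true, y_pred, average="macro"))  # Eq. (2), macro-averaged
print(recall_score(y_true, y_pred, average="macro"))     # Eq. (3)
print(f1_score(y_true, y_pred, average="macro"))         # Eq. (4)
```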
Experimental setup
The study was conducted on a Dell XPS 13 laptop running Windows 10, with an Intel Core i7 processor, 16 GB of RAM, and 512 GB of storage. This hardware was used for development and preliminary testing, while the computationally intensive training was performed exclusively on Google Colab, leveraging its GPU resources. The development environment included TensorFlow, Keras, NumPy, Matplotlib, and the pre-trained models; additional libraries such as Pandas, SciPy, and Scikit-learn were used for data manipulation, statistical analysis, and performance assessment. All models were trained with the same hyperparameters, varying only the number of epochs, to ensure a fair comparison of their performance. The FCDS-CNN model was trained for 40 epochs, as shown in Table 8, whereas ResNet152V2, EfficientNetV2B0, MobileNetV3, and InceptionResNetV2 were trained for 30 epochs. The rationale for training the FCDS-CNN longer was based on observed performance trends: initial experiments with 30 epochs yielded a lower accuracy of 83%, and extending training to 40 epochs led to a significant improvement, reaching 96%. In contrast, the pre-trained models showed signs of overfitting beyond 30 epochs, as indicated by a divergence between training and validation loss, so their training was stopped at that point. The remaining hyperparameters, including batch size, number of classes, learning rate, and image size, were identical for all models; Table 8 details these shared values.
Rationale for hyperparameter choices: A batch size of 32 was selected after empirical evaluation, as it offered a balance between computational efficiency and stable training. The learning rate of 0.001 provided a good balance between convergence speed and stability, avoiding large oscillations in the loss function. The image size of 224×224 pixels was selected to match the input requirements of the pre-trained models, ensuring consistency across experiments and leveraging the pre-trained weights effectively.
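Under these choices, a sketch of the shared training configuration might look as follows; the optimizer (Adam) is an assumption, since the text fixes only the learning rate, batch size, image size, and epoch counts, and `train_ds`/`val_ds` are placeholders for the actual data pipelines:

```python
import tensorflow as tf

# Reusing the illustrative build_fcds_cnn_sketch() from the earlier sketch;
# train_ds / val_ds are assumed tf.data pipelines yielding batches of 32
# images of size 224x224x3 with one-hot labels.
model = build_fcds_cnn_sketch(num_classes=7)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# FCDS-CNN: 40 epochs; the pre-trained baselines were stopped at 30 epochs
# because their validation loss began to diverge (overfitting).
history = model.fit(train_ds, validation_data=val_ds, epochs=40,
                    class_weight=class_weight)
```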
The dataset contains seven classes; therefore, the ‘num_classes’ parameter was set to 7 in all models. The class labels were akie, bcc, bkl, df, mel, nv, and vasc. All models processed batches of 32 images, with the image size fixed at 224×224 pixels. In the first round of training, the FCDS-CNN reached an accuracy of 83%, which is comparatively low. Increasing the number of epochs to 40 improved performance, raising the accuracy to 96%, better than all the pre-trained models considered in this research: ResNet152V2 yielded 91%, EfficientNetV2B0 88%, MobileNetV3 90%, and InceptionResNetV2 93%.
Performance analysis of proposed methods
The study compared five models for classifying skin lesion images. The newly designed FCDS-CNN model outperformed the pre-trained models in precision, recall, F1-score, and AUC. These metrics provide a quantitative assessment of the model’s ability to classify skin lesions correctly and indicate its potential for distinguishing different lesion classes. The model also demonstrated the ability to differentiate closely related categories, which is crucial in medical image classification. This advantage stems from FCDS-CNN’s task-specific architecture, tailored to the characteristics of dermoscopic images, which lets it learn more discriminative features than the pre-trained models built on generic image datasets like ImageNet. While pre-trained models offer a foundation of features learned from vast datasets, their generic nature may limit their effectiveness in specialized domains like medical imaging, where disease-relevant features are often subtle and require a more targeted learning approach. InceptionResNetV2 achieved the highest AUC among the pre-trained models but still fell slightly short of the FCDS-CNN. This difference in AUC, a metric that reflects the ability to distinguish between classes regardless of the classification threshold, further highlights the advantages of a task-specific architecture: the FCDS-CNN was designed to capture the specific dataset used in this research, whereas models pre-trained on general datasets may struggle with its particular class differences. The analysis underscores the strength of constructing models that suit the task.
The accuracy comparison of the five models (FCDS-CNN, ResNet152V2, EfficientNetV2B0, InceptionResNetV2, and MobileNetV3) is presented in Fig. 10a-e. All models showed a general trend of increasing training and validation accuracy and decreasing loss over the epochs, indicating that they learn from the training data and improve their ability to generalize to unseen examples. However, convergence speed and final performance varied significantly. Convergence speed, reflecting how quickly a model reaches its optimal performance during training, is an important practical consideration, especially for computationally intensive deep-learning models. The FCDS-CNN model reaches over 95% accuracy while handling imbalanced data and, in particular, rare classes like ‘akie’ and ‘vasc’. This high accuracy, combined with its ability to classify minority classes effectively, is a key strength: in skin cancer diagnosis, accurately identifying rare but potentially aggressive lesions is crucial for timely intervention and improved patient outcomes. Data augmentation and class weighting strengthen its accuracy across all skin cancer types, making it a viable choice for medical applications. Both are essential techniques for addressing imbalanced datasets: data augmentation expands the training data by creating synthetic variations of existing images, while class weighting adjusts the importance of different classes during model training. These techniques improve the model’s ability to learn representative features from all classes, particularly the minority classes, which often contain the most critical information for accurate diagnosis. Despite its deep architecture, ResNet152V2 takes a long time to converge but finally reaches a similar accuracy level. Its slow convergence, despite its large capacity for learning complex features, highlights a potential trade-off between model complexity and training efficiency: more training time is required, but it is very effective at extracting detailed features from complex classes such as ‘akie’ and ‘vasc’. Due to their more efficient architectures, EfficientNetV2B0 and InceptionResNetV2 converge faster than ResNet152V2. EfficientNetV2B0 is optimized for both performance and computational efficiency, making it a strong candidate for high-resolution medical image analysis, while InceptionResNetV2 effectively balances local and global feature extraction, showing strong performance across diverse datasets with stable, consistent accuracy growth.
Moreover, MobileNetV3 stands out for its rapid convergence and ability to achieve competitive accuracy within fewer epochs, making it ideal for real-time and mobile-based applications where quick diagnostic predictions are essential. This speed comes at a cost, however: MobileNetV3 tends to achieve lower overall accuracy than more complex models like the FCDS-CNN. This trade-off between speed and accuracy highlights the importance of choosing a model appropriate to the constraints and requirements of the target application. While it may not reach the final accuracy of the FCDS-CNN, its speed and efficiency make it highly suitable for resource-constrained environments, such as mobile or point-of-care settings, where limited computational power and memory make efficiency a critical factor even at the price of slightly lower accuracy. Consequently, Fig. 10a-e illustrates that all models perform well, with the FCDS-CNN and MobileNetV3 leading in convergence speed, while ResNet152V2, EfficientNetV2B0, and InceptionResNetV2 offer stable, high accuracy over time. The FCDS-CNN stands out for its combination of early convergence and high final accuracy, whereas MobileNetV3 excels in scenarios where training speed and efficiency are critical. These findings demonstrate the varying strengths and weaknesses of the available models across different aspects of performance, allowing informed selection of the most suitable model for the specific requirements of the target application.
The comparison of the five models based on cross-entropy loss, shown in Fig. 11a-e, highlights distinct learning patterns. The FCDS-CNN exhibits a steady, smooth reduction in cross-entropy loss, indicating consistent learning without overfitting; training and validation losses decrease together, demonstrating a good balance between fitting and generalization. Owing to its deeper architecture, ResNet152V2 starts from a larger initial cross-entropy loss and converges toward a smaller value only over more epochs; this slower loss reduction is expected when fine-tuning deep models with many layers, but it eventually attains a low loss value, indicating that it copes well with complex data. EfficientNetV2B0 and InceptionResNetV2 show faster reductions in validation cross-entropy loss; their efficient architectures let them settle into stable low-loss values early in training, achieving fast convergence without high final loss, which suits them to high-performance tasks. MobileNetV3 likewise achieves rapid cross-entropy loss reduction during training, which is especially useful in resource-constrained mobile environments; it can overfit slightly (sometimes noticeably) on larger, more complex datasets but remains competitive in loss given its lightweight nature, though it tends to be less strongly regularized than the larger models. In summary, EfficientNetV2B0 and InceptionResNetV2 combine a fast loss decrease with stable convergence, the FCDS-CNN exhibits a steady and balanced loss reduction, ResNet152V2 converges more slowly to a low loss, and MobileNetV3 attains low loss soon after adequate training.
A detailed assessment of the classification accuracy and misclassification rates of each model for the different skin lesion types is given in Fig. 12a-e through a comparison of confusion matrices. As seen in Fig. 12a, the FCDS-CNN model is effective, with robust performance across both majority and minority classes. With little confusion between similar lesion types, it convincingly separates even closely related categories and is thus especially suited to medical settings where the classes are imbalanced.
ResNet152V2 also performs well in overall classification, but its deeper architecture causes occasional misclassifications, especially for minority classes. This could happen for many reasons, from model complexity to skewness in the dataset or a lack of represented categories for the model to learn from. Its high overall accuracy, though, leaves room for fine-tuning to handle imbalanced data better.
EfficientNetV2B0 produces balanced results across the confusion matrix, separating the classes well even in datasets with high variance. It also performs well on minority-class classification, yielding fewer misclassifications than deeper models such as ResNet152V2, which makes it highly reliable for datasets with a broad range of class distributions. While performing well overall, InceptionResNetV2 makes a few misclassifications between lesion types with similar features, e.g. ‘nv’ and ‘mel’; these differences are often so subtle that even the most advanced models struggle to differentiate them. Nevertheless, it is accurate across multiple classes and discriminates clear, well-separated classes very well.
MobileNetV3, designed for speed and efficiency, has a slightly higher misclassification rate between closely related classes, namely ‘bkl’ and ‘bcc’. Although its overall accuracy remains high, its lightweight architecture makes it difficult to distinguish visually similar lesions. MobileNetV3 nevertheless proves extremely effective in low-resource and real-time settings, and further fine-tuning could improve its class differentiation.
As a result, the confusion matrix comparison in Fig. 12a-e demonstrates that FCDS-CNN and EfficientNetV2B0 perform best across both majority and minority classes. ResNet152V2 works well overall but occasionally misclassifies minority classes as the majority class. InceptionResNetV2 and MobileNetV3 are effective to some extent, with MobileNetV3 shining in speed and efficiency but weaker at distinguishing closely related lesion types.
The performance comparison of the proposed algorithms, based on the data provided, is visualized in Fig. 13a, highlighting differences in accuracy, loss, and class differentiation across the five models: FCDS-CNN, ResNet152V2, EfficientNetV2B0, InceptionResNetV2, and MobileNetV3. The FCDS-CNN demonstrates robust performance, handling imbalanced datasets with minimal confusion between classes, particularly in rare categories like ‘akie’ and ‘vasc’. Its use of data augmentation and class weighting gives it a smooth, steady increase in accuracy nearing 95% without overfitting, and its loss decreases monotonically. As shown in Table 9, the FCDS-CNN delivered robust classification across all classes, including rare types such as ‘akie’ and ‘vasc’, underlining the model’s reliability, with strong metrics for precision, recall, F1-score, and AUC. Although the FCDS-CNN performs well overall, potential improvements such as adaptive learning rates and additional dropout could further enhance its efficiency.
The per-class performance of the FCDS-CNN, summarized in Table 9, reflects robust classification across the various skin lesion categories, including rare types like ‘akie’ and ‘vasc’. Averaged over all classes, precision (0.96), recall (0.96), F1-score (0.96), and AUC (0.97) were high, indicating the capability to deal with imbalanced datasets. Precision values range from 0.95 to 0.98 across classes, strongly suggesting that the model’s positive predictions are reliable; the ‘akie’ and ‘vasc’ classes achieve precision values of 0.97 and 0.98, respectively, demonstrating the model’s success on rare categories. Recall values lie between 0.94 and 0.98, showing the model identifies nearly all relevant cases: for example, the recall of ‘akie’ is 0.95 and that of ‘vasc’ is 0.97, indicating that the model does not lose minority-class samples in its predictions.
The trade-off between precision and recall is consistently strong, with F1-scores in the range of 0.96 to 0.97. In particular, the ‘mel’ class, a notoriously difficult type because it often resembles other lesion types, achieves an F1-score of 0.97, demonstrating that the model retains its accuracy even on complex classes. Regarding AUC (Area Under the Curve), the FCDS-CNN performs exceptionally well across all classes, with values between 0.96 and 0.98, indicating its proficiency in distinguishing benign and malignant lesions. Overall, the FCDS-CNN classifies the various skin lesions with high and consistent scores (precision 0.96, recall 0.96, F1-score 0.96, AUC 0.97). This classification report supports the model’s suitability for real-world medical applications, with balanced precision and recall across all lesion types, including the minority classes.
ResNet152V2’s residual learning architecture provides deep feature extraction, making it suitable for complex tasks, though its slower convergence and higher computational requirements may be a compromise in some situations. Training accuracy exceeds 90%, but its loss curves and validation accuracy indicate overfitting relative to validation performance, as shown in Fig. 13b. While ResNet152V2 performs well overall, it encounters challenges with imbalanced datasets, leading to occasional misclassifications, particularly in minority classes. As detailed in Table 10, ResNet152V2 demonstrates solid performance across the skin lesion types, though its precision, recall, and F1-scores slightly lag behind the FCDS-CNN’s. The model achieves precision values between 0.85 and 0.93, with the ‘mel’ class highest at 0.93 and ‘df’ and ‘akie’ relatively lower at 0.85 and 0.87, respectively, confirming how complicated it is to handle minority classes with fewer data points. Recall values run from 0.84 to 0.92: the model recalls relevant cases in most classes but fares less well on ‘df’ and ‘akie’, with recall values of 0.84 and 0.85, respectively, pointing to a possible problem in recalling minority classes, whereas ‘mel’ does well at 0.92. F1-scores are consistently strong but slightly lower than the FCDS-CNN’s, ranging from 0.84 to 0.92: 0.92 for ‘mel’ against 0.84 and 0.86 for the minority classes ‘df’ and ‘akie’, so performance differs slightly across classes. ResNet152V2 also shows decent separation ability through reasonably high AUC values (0.89 to 0.95); an AUC of 0.95 is recorded for the ‘nv’ class and 0.89 for ‘df’. Overall, the macro-averaged metrics for ResNet152V2 are a precision of 0.89, recall of 0.88, F1-score of 0.88, and AUC of 0.92, showing reliable but more variable performance than on majority classes alone. This classification report presents the model’s key strengths and limitations in dealing with imbalanced datasets.

EfficientNetV2B0 balances accuracy and efficiency, achieving rapid convergence and consistent loss reduction across training and validation datasets. Its architecture manages class variance effectively, resulting in fewer misclassifications than deeper models like ResNet152V2. However, while EfficientNetV2B0 performs well in precision, recall, and AUC, it remains challenged by visually similar classes such as ‘nv’ and ‘bkl’, as shown in Fig. 13c. The model exhibits overfitting tendencies, particularly in the loss curves, indicating that further fine-tuning could improve its performance, especially on imbalanced datasets. As summarized in Table 11, EfficientNetV2B0 performs well across the skin lesion classes, though its precision, recall, and F1-scores are slightly lower than the FCDS-CNN’s. Precision values range from 0.82 to 0.90, with ‘nv’ highest at 0.90 and ‘df’ and ‘akie’ lower at 0.82 and 0.85, respectively, demonstrating strong performance on the more balanced classes but difficulty with minority classes.
The recall values range from 0.78 to 0.88, reflecting EfficientNetV2B0’s ability to identify relevant cases across most classes. However, it struggles with recall in classes like ‘df’ and ‘akie’, whose recall values are 0.78 and 0.80, respectively, suggesting that the model could benefit from additional strategies for under-represented classes. F1-scores vary little across classes, lying between 0.80 and 0.89: the ‘mel’ class performs best at 0.87, compared with 0.80 for ‘df’ and 0.82 for ‘akie’. The model evidently balances precision and recall well but finds the minority classes harder to capture. EfficientNetV2B0 also posts solid AUC values (0.86 to 0.91); the ‘nv’ class achieves a high AUC of 0.89, indicating strong discrimination between positive and negative cases, whereas the ‘df’ class is lower at 0.86. Despite slight inconsistency between classes, EfficientNetV2B0 delivers reliable performance, with macro-averaged precision of 0.86, recall of 0.83, F1-score of 0.84, and AUC of 0.88. This classification report shows that EfficientNetV2B0 handles most classes efficiently but still needs fine-tuning for classes that are visually similar or in the minority. InceptionResNetV2, which combines inception modules and residual connections, is highly effective at capturing local and global features, making it well suited to large and diverse datasets. However, the model faces challenges with overfitting, as evidenced by the gap between training and validation loss shown in Fig. 13d, suggesting it may generalize less effectively to unseen data. While InceptionResNetV2 excels in most tasks, particularly with larger and more distinct lesion types, it struggles to differentiate visually similar lesions such as ‘nv’ and ‘mel’. Moreover, despite its strong overall performance, the model is computationally intensive, requiring substantial resources, which may limit its application in low-resource settings.
As shown in Table 12, InceptionResNetV2 demonstrates high precision, recall, F1-scores, and AUC values across most classes, indicating its strong capability in classifying the various skin lesion types. Precision values range from 0.87 to 0.95, with ‘nv’ achieving the highest precision at 0.95, while ‘vasc’ and ‘df’ exhibit slightly lower precision at 0.87 and 0.88, respectively. This variation shows the model deals better with majority classes than with minority classes.
The model’s high recall values (between 0.87 and 0.94) indicate that it correctly identifies relevant cases, but the ‘df’ and ‘vasc’ classes, with recall values of 0.87 and 0.88, respectively, suggest that recall may need additional fine-tuning for minority classes. F1-scores are similar across classes, running consistently from 0.87 to 0.94, a good balance between precision and recall; for instance, the ‘mel’ class earns an F1-score of 0.92 while ‘df’ scores only 0.87, which further shows the model’s difficulty with under-represented classes. The AUC (Area Under the Curve) values for InceptionResNetV2 are outstanding, between 0.93 and 0.98: the ‘nv’ class reaches 0.98, meaning the model distinguishes benign and malignant lesions better there than in classes such as ‘df’ and ‘vasc’, whose AUC values are 0.93. The macro-averaged metrics (precision 0.91, recall 0.91, F1-score 0.91, AUC 0.95) indicate that the model classifies the different skin lesion types very well, though further fine-tuning is still needed, especially for visually similar and minority classes.
MobileNetV3, optimized for mobile and edge devices, is designed for speed and efficiency, making it particularly suitable for real-time applications. The model converges quickly and performs well across most metrics, but its lightweight design brings trade-offs, especially in classification accuracy on more complex datasets. As shown in Fig. 13e, MobileNetV3 struggles to differentiate closely related classes such as ‘bkl’ and ‘bcc’, and its validation loss shows signs of overfitting. Despite these limitations, MobileNetV3 remains an excellent choice for low-resource environments where fast, real-time predictions are required, although further fine-tuning may be necessary to improve its performance on more complex classification tasks. Its overall performance, presented in Table 13, shows strong metrics across most skin lesion types, with difficulties on the more visually similar classes. Precision values across the classes vary from 0.85 to 0.93, with ‘nv’ the most precise at 0.93 and ‘df’ and ‘vasc’ lower at 0.85 and 0.86, respectively. These scores show that most classes are covered well, with some remaining difficult because they are under-represented or contain more complex lesion types. Recall values stay around 0.80 to 0.95, indicating the model identifies relevant cases across most classes; nevertheless, classes such as ‘df’ and ‘vasc’ have a lower recall of 0.80, meaning the model may miss some minority-class instances.
F1-scores lie between 0.82 and 0.94, reflecting a reasonable balance of precision and recall. While the ‘df’ class achieves a lower F1-score of 0.82, the 0.94 for the ‘nv’ class underlines how much harder the minority classes are for the model. MobileNetV3 also performs well on AUC (Area Under the Curve), with values between 0.90 and 0.94: the highest, 0.94, is achieved for the ‘nv’ class, showing the model’s ability to differentiate between lesion types, while the lower value of 0.90 for the ‘df’ class marks where minority-class performance could improve. The macro-averaged metrics for MobileNetV3 are precision 0.88, recall 0.85, F1-score 0.86, and AUC 0.92, reflecting strong overall performance but highlighting areas where further optimization could enhance its effectiveness, especially with complex or visually similar lesion types. Consequently, the comparison in Fig. 13a-e illustrates that while FCDS-CNN and EfficientNetV2B0 provide the most balanced performance, particularly on imbalanced datasets, ResNet152V2 and InceptionResNetV2 excel in feature extraction at the cost of higher computational complexity. MobileNetV3, although ideal for real-time and mobile-based applications, requires additional optimization to manage complex datasets effectively. The choice of model depends on the specific requirements for computational resources, speed, and dataset complexity.
Result analysis
This research proposed five models for skin lesion classification: four pre-trained models (ResNet152V2, EfficientNetV2B0, MobileNetV3, and InceptionResNetV2) and the FCDS-CNN. Each pre-trained model has its strengths, such as ResNet152V2’s depth for feature extraction, EfficientNetV2B0’s scalability, MobileNetV3’s speed and efficiency in resource-limited environments, and InceptionResNetV2’s ability to handle images at different scales. However, the experimental results revealed weaknesses in the pre-trained models, particularly in distinguishing similar-appearing lesion types under the class imbalance inherent in the dataset. The models struggled with minority classes, frequently misclassifying them as the majority class, ‘nv’ (melanocytic nevi). This difficulty in differentiating visually similar and less frequent lesions highlights a critical limitation of pre-trained models for specialized tasks like skin cancer diagnosis, in contrast to the FCDS-CNN, which is tailored to the dataset. The FCDS-CNN achieved 96% accuracy, outperforming the best pre-trained model, InceptionResNetV2, at 93%. For example, for ‘bkl’ (benign keratosis-like lesions) and ‘nv’ (melanocytic nevi), two frequently confused classes, ResNet152V2 achieved a precision of only 0.57 for ‘bkl’ and 0.90 for ‘nv’, while the FCDS-CNN reached 0.95 and 0.96, respectively (Table 9). This substantial improvement in precision for ‘bkl’ demonstrates the FCDS-CNN’s ability to learn discriminative features specific to this lesion type even with limited training examples, and its superior ability to distinguish these visually similar but clinically distinct lesions, a key challenge posed by the class imbalance in which ‘nv’ represents the majority class. Similarly, MobileNetV3, despite good overall accuracy, had a significantly lower F1-score for ‘bkl’ at 0.58, compared with 0.95 for the FCDS-CNN (Table 13), further emphasizing the FCDS-CNN’s superior balance of precision and recall on minority classes. The confusion matrices (Fig. 12) illustrate the same point, showing noticeable off-diagonal elements for misclassifications between ‘bkl’ and ‘nv’ in the pre-trained models, whereas the FCDS-CNN exhibits a much cleaner separation. This highlights the FCDS-CNN’s effectiveness at correctly classifying minority classes like ‘bkl’, which are under-represented relative to the dominant ‘nv’ class. Unlike the pre-trained models, which were trained on general datasets, the FCDS-CNN targeted the key features for skin lesion classification, and techniques like data augmentation and class weighting were applied to address data imbalance, further contributing to its performance on minority classes. While the pre-trained models showed strength in some areas, the FCDS-CNN’s precision enabled it to outperform them, proving the value of task-specific model design.
Clinical deployment
Several practical deployment considerations must be addressed to realize this potential in a clinical setting. The FCDS-CNN’s relatively streamlined architecture offers flexibility in terms of hardware, suggesting potential deployment on platforms ranging from high-performance workstations in clinical settings to more resource-constrained mobile devices for point-of-care applications. The choice of platform would depend on the specific needs of the clinical workflow, balancing computational power with cost-effectiveness and portability. A crucial factor for clinical usability is inference time. While our initial findings indicate the FCDS-CNN is computationally efficient, rigorous benchmarking on representative hardware is needed to establish precise inference speeds and ensure responsiveness suitable for real-time diagnostic support. Equally critical is the interpretability of the model’s predictions for dermatologists. Explaining the why behind a diagnosis is essential for clinical trust and adoption. Future work will investigate techniques like saliency maps and attention mechanisms to provide visual insights into the model’s decision-making process, making its predictions more transparent and understandable for clinicians. Moreover, seamless integration into existing EHR systems is vital for practical deployment, necessitating careful design of data exchange protocols, user interfaces, and robust data privacy safeguards. Finally, rigorous clinical validation and regulatory approval are prerequisites for any real-world clinical application, ensuring the model’s safety and effectiveness in a real clinical environment.
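As an illustration of such a latency benchmark, the sketch below times single-image inference for a stand-in Keras model; the loop count, warm-up convention, and the trivial stand-in architecture are assumptions, and real measurements would use the trained FCDS-CNN on the target hardware:

```python
import time
import numpy as np
import tensorflow as tf

# Trivial stand-in model so the snippet runs end to end; substitute the
# trained FCDS-CNN in a real benchmark.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(7, activation="softmax"),
])

image = np.random.rand(1, 224, 224, 3).astype("float32")
model(image)                      # warm-up call: builds the graph, allocates memory
runs = 100
t0 = time.perf_counter()
for _ in range(runs):
    model(image)
print(f"{(time.perf_counter() - t0) / runs * 1000:.2f} ms per image")
```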
Result analysis comparison for various proposed architectures
The analysis of the performance indicators and ROC curves of the five proposed models (FCDS-CNN, InceptionV4, ResNet152V2, InceptionResNetV2, and EfficientNetV2B0) supports valuable conclusions about their efficiency in diagnosing skin lesions. The FCDS-CNN attains higher precision, recall, F1-score, and AUC than all the pre-trained models, and its accuracy is the highest at 96%. This means the proposed model distinguishes better among the various classes of skin lesions, making it better suited to medical image classification than the other models.
Figure 14a-b depicts the performance of the proposed skin cancer detection models using metrics such as precision, recall, F1-score, and AUC. The FCDS-CNN shows the highest overall performance across all metrics, followed by InceptionResNetV2. Other models like InceptionV4, ResNet152V2, and EfficientNetV2B0 also perform well, with slight variations across these metrics. The FCDS-CNN is more relevant for the data used in this research, as it outperforms the default models, whose generic architectures may not yield satisfactory performance on narrowly defined problems. Model specialization is therefore necessary for maximum performance on specific tasks, especially in complex areas like medical image classification.
The ROC curve analysis in Table 14 makes clear that the FCDS-CNN is the strongest model, with the highest TPR of 0.97, showing its ability to separate positive from negative cases throughout the dataset. The InceptionResNetV2 model achieves a high AUC of 0.95, though still below the FCDS-CNN’s result. The other pre-trained models, InceptionV4, ResNet152V2, and EfficientNetV2B0, show good but slightly inferior performance.
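For reference, a one-vs-rest multi-class AUC of the kind discussed here can be computed with scikit-learn as sketched below; the labels and softmax scores are randomly generated placeholders, not the study’s data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, k = 100, 7                                  # seven lesion classes
y_true = rng.integers(0, k, size=n)            # placeholder ground truth
y_score = rng.dirichlet(np.ones(k), size=n)    # placeholder softmax outputs

# Macro-averaged one-vs-rest AUC: threshold-independent class separability.
print(roc_auc_score(y_true, y_score, multi_class="ovr", average="macro"))
```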
Result analysis with cutting-edge approaches
This work introduces a novel end-to-end approach based on the FCDS-CNN, outperforming existing pre-trained and custom CNN methods for the classification of skin cancer. As shown in Table 15, the FCDS-CNN achieves 96% accuracy, surpassing models such as the fully convolutional residual network (FCRN, 85.5%), InceptionV3 (82.8%), and VGG-16 (78%). This superior performance is largely due to the custom architecture designed specifically to identify complex patterns in skin lesions.
Unlike the usual pre-trained models, which are trained on a generalized dataset such as ImageNet, the FCDS-CNN was optimized specifically for skin cancer detection. In addition, it adopts class balancing and data augmentation to overcome problems like class imbalance and varying image quality, maintaining high accuracy across all classes. The model also incorporates some domain knowledge, allowing it to differentiate between benign and malignant lesions more accurately. Pre-trained models, by contrast, do not fit skin lesion classification as well, since their fixed architectures are not specific to the task’s particular requirements.
This research presents the FCDS-CNN, a novel approach to automated skin cancer classification that significantly improves over existing methods. The FCDS-CNN’s novelty lies in its integrated approach to handling class imbalance, combining weighted loss functions with advanced data augmentation techniques. This strategy results in a notable 5% improvement in the F1-score for minority classes compared to ensemble methods lacking these enhancements, addressing a crucial challenge in skin cancer datasets. Furthermore, the FCDS-CNN’s streamlined architecture reduces computational overhead, making it a computationally efficient alternative to resource-intensive ensemble methods. The practical implications of this research are substantial. The model’s high accuracy and efficiency make it well-suited for integration into real-time clinical workflows, potentially improving patient survival rates through earlier and more accurate melanoma detection. Its real-time inference capabilities also offer immediate diagnostic feedback during dermoscopy examinations, enhancing the decision-making process for dermatologists. Future research will validate the FCDS-CNN on larger, multi-center datasets to ensure robustness and generalizability across diverse populations and imaging conditions. Additionally, integrating interpretability tools like Grad-CAM and SHAP will enhance clinician trust and provide a deeper understanding of the model’s predictions. In conclusion, the FCDS-CNN represents a significant advancement in automated skin cancer classification. It demonstrates superior performance compared to state-of-the-art methods and holds substantial promise for clinical deployment and improved patient care. Future work will refine the model for broader dermatological applications and enhance its usability within real-world medical environments.
Experimental discussion
This work explores the capability of different deep-learning architectures for skin lesion classification. We experimented with both publicly available pre-trained models and the FCDS-CNN model. The FCDS-CNN’s architecture, well optimized for the dataset, made it perform better than the pre-trained models. It addressed multiple challenges, including varying image sizes, class imbalance, and subtle inter-class differences; measures such as data augmentation, class weighting, and sensible hyperparameter settings improved its generalization ability and accuracy.
Although the FCDS-CNN model performed best, the pre-trained models InceptionV4, ResNet152V2, InceptionResNetV2, and EfficientNetV2B0 posted strong baseline figures; succeeding with them would likely require serious tuning or modification. The results show that task specialization is ideal for non-trivial applications such as medical image classification: on this dataset, a model developed from scratch and customized to the data outperformed the state-of-the-art pre-trained models. This underlines that sensitivity to model architecture and training procedure is the basis of a model’s performance on specific tasks.
Conclusion
This research systematically assessed five deep-learning models for skin cancer analysis and diagnosis. Four were pre-trained architectures, while the fifth was the FCDS-CNN model created from scratch. The goal was to fill the skin cancer classification gap by comparing their performance on a large dataset. The results showed that the proposed FCDS-CNN model, with an accuracy of 96%, was superior to all the pre-trained models: InceptionResNetV2 had the highest pre-trained accuracy at about 93%, followed by ResNet152V2 at about 91%, MobileNetV3 at about 90%, and EfficientNetV2B0 at about 88%. The FCDS-CNN outperformed the pre-trained models thanks to its unique architecture, trained to extract features of skin lesions with high accuracy while addressing challenges like class imbalance, varying image quality, and feature-specific extraction.
This study is valuable for numerous healthcare applications. The proposed model is more appropriate than generic solutions because it is specifically designed for early skin cancer detection. Its higher accuracy and individualized approach make it easier to integrate into clinical practice, helping dermatologists diagnose the disease. With its high precision and focus on skin cancer, the model addresses shortcomings in current diagnostic tools and opens the door to greater diagnostic capability, improved quality of life, and lower skin cancer death rates among patients worldwide.
Data availability
This research used a publicly available dataset hosted on Kaggle.
References
Sardar, M., Niazi, M.M., Nasim, F. et al. Ensemble deep learning methods for detecting skin cancer. Bulletin of Business and Economics (BBE) 13 (2024).
Jicman, P. A., Smart, H., Ayello, E. A. & Sibbald, R. G. Early malignant melanoma detection, especially in persons with pigmented skin. Advances in Skin & Wound Care 36, 69–77 (2023).
Jamil, D., Qazi, F., e Shawar Agha, D., Palaniappan, S. Classification of skin lesion using deep convolutional neural network by applying transfer learning. Journal of Autonomous Intelligence (2023).
Almufareh, M. F., Tariq, N., Humayun, M. & Khan, F. A. Melanoma identification and classification model based on fine-tuned convolutional neural network. Digital Health 10, 20552076241253756 (2024).
Naqvi, M., Gilani, S. Q., Syed, T., Marques, O. & Kim, H.-C. Skin cancer detection using deep learning-a review. Diagnostics 13, 1911 (2023).
Mahmood, T. et al. A brief survey on breast cancer diagnostic with deep learning schemes using multi-image modalities. IEEe Access 8, 165779–165809 (2020).
Musthafa, M. M., TR, M., V, V. K. & Guluwadi, S. Enhanced skin cancer diagnosis using optimized CNN architecture and checkpoints for automated dermatological lesion classification. BMC Medical Imaging 24, 201 (2024).
Jojoa Acosta, M.F., Caballero Tovar, L.Y., Garcia-Zapirain, M.B., Percybrooks, W.S. Melanoma diagnosis using deep learning techniques on dermatoscopic images. BMC Medical Imaging 21, 1–11 (2021).
Shetty, B. et al. Skin lesion classification of dermoscopic images using machine learning and convolutional neural network. Scientific Reports 12, 18134 (2022).
Dandu, R., Murthy, M.V., Kumar, Y.R. Transfer learning for segmentation with hybrid classification to detect melanoma skin cancer. Heliyon 9 (2023).
Ramineni, K., Reddy, L. K.K., Ramana, T.V., Rajesh, V. Classification of skin cancer using integrated methodology. In International Conference on Data Science and Applications, 105–118 (Springer, 2023).
Kalouche, S., Ng, A. & Duchi, J. Vision-based classification of skin cancer using deep learning (2016). Accessed 2021.
Ajel, A.R., Al-Dujaili, A.Q., Hadi, Z.G., Humaidi, A.J. Skin cancer classifier based on convolution residual neural network. International Journal of Electrical & Computer Engineering (2088-8708) 13 (2023).
Usama, M., Naeem, M.A., Mirza, F. Multi-class skin lesions classification using deep features. Sensors 22 (2022).
Venugopal, V., Raj, N. I., Nath, M. K. & Stephen, N. A deep neural network using modified efficientnet for skin cancer detection in dermoscopic images. Decision Analytics Journal 8, 100278 (2023).
Rahman, Z., Hossain, M. S., Islam, M. R., Hasan, M. M. & Hridhee, R. A. An approach for multiclass skin lesion classification based on ensemble learning. Informatics in Medicine Unlocked 25, 100659 (2021).
Benyahia, S., Meftah, B. & Lézoray, O. Multi-features extraction based on deep learning for skin lesion classification. Tissue and Cell 74, 101701, https://doi.org/10.1016/j.tice.2021.101701 (2022).
Pacheco, A. G. & Krohling, R. A. An attention-based mechanism to combine images and metadata in deep learning models applied to skin cancer classification. IEEE Journal of Biomedical and Health Informatics 25, 3554–3563 (2021).
Dildar, M. et al. Skin cancer detection: a review using deep learning techniques. International Journal of Environmental Research and Public Health 18, 5479 (2021).
Gouda, W., Sama, N.U., Al-Waakid, G., Humayun, M., Jhanjhi, N.Z. Detection of skin cancer based on skin lesion images using deep learning. In Healthcare, vol. 10, 1183 (MDPI, 2022).
Rajalakshmi, V., Jasmine, P. & Ebenazer, L. Skin melanoma detection using neural network (2022).
Oumoulylte, M., Alaoui, A.O., Farhaoui, Y., El Allaoui, A., Bahri, A. Convolutional neural network-based skin cancer classification with transfer learning models. Radioelectronic and Computer Systems 75–87 (2023).
Sönmez, A. F. et al. Deep learning-based classification of dermoscopic images for skin lesions. Sakarya University Journal of Computer and Information Sciences 6, 114–122 (2023).
Ali, Z., Naz, S., Zaffar, H., Choi, J. & Kim, Y. An iomt-based melanoma lesion segmentation using conditional generative adversarial networks. Sensors 23, 3548 (2023).
Singha, S., Roy, P. Skin cancer classification and comparison of pretrained models performance using transfer learning. (2022).
Ali, K., Shaikh, Z. A., Khan, A. A. & Laghari, A. A. Multiclass skin cancer classification using efficientnets-a first step towards preventing skin cancer. Neuroscience Informatics 2, 100034 (2022).
Zafar, M. et al. Deeplabv3+-based segmentation and best features selection using slime mould algorithm for multi-class skin lesion classification. Mathematics 11, 364 (2023).
Narayanamurthy, V. et al. Skin cancer detection using non-invasive techniques. RSC Adv. 8, 28095–28130. https://doi.org/10.1039/C8RA04164D (2018).
Pratavieira, S., Andrade, C.T., Salvio, A.G., Bagnato, V., Kurachi, C. Optical imaging as auxiliary tool in skin cancer diagnosis. In Porta, C. A. L. (ed.) Skin Cancers, chap. 7, https://doi.org/10.5772/25465 (IntechOpen, Rijeka, 2011).
Shafiullah, A. et al. Detection of skin cancer: A deep learning approach. In 2023 IEEE Smart World Congress (SWC), 1–8, https://doi.org/10.1109/SWC57546.2023.10448820 (2023).
Calin, M. A., Parasca, S. V., Savastru, R., Calin, M. R. & Dontu, S. Optical techniques for the noninvasive diagnosis of skin cancer. Journal of Cancer Research and Clinical Oncology 139, 1083–1104. https://doi.org/10.1007/s00432-013-1423-3 (2013).
Arun, K.A., Palmer, M. Skin cancer detection using deep learning. In 2024 10th International Conference on Communication and Signal Processing (ICCSP), 1712–1717, https://doi.org/10.1109/ICCSP60870.2024.10543954 (2024).
Magdy, A., Hussein, H., Abdel-Kader, R. F. & Salam, K. A. E. Performance enhancement of skin cancer classification using computer vision. IEEE Access 11, 72120–72133. https://doi.org/10.1109/ACCESS.2023.3294974 (2023).
Jaleel, J. A., Salim, S., Aswin, R. B. Computer aided detection of skin cancer. In 2013 International Conference on Circuits, Power and Computing Technologies (ICCPCT), 1137–1142, https://doi.org/10.1109/ICCPCT.2013.6528879 (2013).
Heibel, H. D., Hooey, L. & Cockerell, C. J. A review of noninvasive techniques for skin cancer detection in dermatology. American Journal of Clinical Dermatology 21, 513–524. https://doi.org/10.1007/s40257-020-00517-z (2020).
Sabouri, P., GholamHosseini, H. Lesion border detection using deep learning. In 2016 IEEE Congress on Evolutionary Computation (CEC), 1416–1421 (IEEE, 2016).
Kalouche, S. Vision-based classification of skin cancer using deep learning (2016).
Hasan, M., Barman, S.D., Islam, S., Reza, A.W. Skin cancer detection using convolutional neural network. In Proceedings of the 2019 5th international conference on computing and artificial intelligence, 254–258 (2019).
Singh, V., Nwogu, I. Analyzing skin lesions in dermoscopy images using convolutional neural networks. In 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 4035–4040 (IEEE, 2018).
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
Brinker, T. J. et al. Comparing artificial intelligence algorithms to 157 German dermatologists: the melanoma classification benchmark. European Journal of Cancer 111, 30–37 (2019).
Tschandl, P. et al. Human-computer collaboration for skin cancer recognition. Nature Medicine 26, 1229–1234 (2020).
Haenssle, H. A. et al. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Annals of Oncology 29, 1836–1842 (2018).
Bansal, N., Sridhar, S. Skin lesion classification using ensemble transfer learning. Lecture Notes in Networks and Systems (2021).
Khan, M.A., Javed, M.Y., Sharif, M., Saba, T., Rehman, A. Multi-model deep neural network based features extraction and optimal selection approach for skin lesion classification. In 2019 international conference on computer and information sciences (ICCIS), 1–7 (IEEE, 2019).
Hosny, K. M., Kassem, M. A. & Fouad, M. M. Classification of skin lesions into seven classes using transfer learning with alexnet. Journal of digital imaging 33, 1325–1334 (2020).
Chaturvedi, S. S., Tembhurne, J. V. & Diwan, T. A multi-class skin cancer classification using deep convolutional neural networks. Multimedia Tools and Applications 79, 28477–28498 (2020).
Sagar, A. & Dheeba, J. Convolutional neural networks for classifying melanoma images. bioRxiv (2020).
DeVries, T. & Ramachandram, D. Skin lesion classification using deep multi-scale convolutional neural networks. arXiv preprint arXiv:1703.01402 (2017).
Rashid, H., Tanveer, M. A. & Khan, H. A. Skin lesion classification using GAN-based data augmentation. In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 916–919 (IEEE, 2019).
Mahbod, A. et al. Fusing fine-tuned deep features for skin lesion classification. Computerized Medical Imaging and Graphics 71, 19–29 (2019).
Dorj, U.-O., Lee, K.-K., Choi, J.-Y. & Lee, M. The skin cancer classification using deep convolutional neural network. Multimedia Tools and Applications 77, 9909–9924 (2018).
Aswin, R., Jaleel, J. A. & Salim, S. Hybrid genetic algorithm–artificial neural network classifier for skin cancer detection. In 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), 1304–1309 (IEEE, 2014).
Kawahara, J., BenTaieb, A. & Hamarneh, G. Deep features to classify skin lesions. In 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), 1397–1400 (IEEE, 2016).
Acknowledgements
The authors would like to thank China's National Key R&D Program for providing the experimental facilities used to perform these experiments. The authors also thank the Artificial Intelligence and Data Analytics (AIDA) Lab, CCIS, Prince Sultan University, Riyadh, Saudi Arabia, for its support.
Funding
This study was supported by the National Key R&D Program of China under project no. 2020YFB2104402.
Author information
Contributions
Khadija, Atika, Iqra, Jianqiang, Yu, Tariq, and Amjad conceived this study. Khadija, Atika, Jianqiang, and Tariq contributed to the design of the research. Iqra, Jianqiang, Yu, Tariq, and Amjad drafted, reviewed, and revised the study. Tariq supervised the work, and Jianqiang provided the funding. All authors have proofread this study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Institutional review board statement
Not applicable. All methods were carried out in accordance with relevant guidelines and regulations.
Informed consent
Not applicable.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Nawaz, K., Zanib, A., Shabir, I. et al. Skin cancer detection using dermoscopic images with convolutional neural network. Sci Rep 15, 7252 (2025). https://doi.org/10.1038/s41598-025-91446-6