
Comprehensive Evaluation of Vision Transformer in MRI-based Brain Tumor Detection

Alaa E. Ahmed
Dept. of Electronic and Electrical Engineering, PHI for Engineering and Technology, 6th of October City, Giza, Egypt
liloelsayed7@[Link]

Muhammad A. Elwany
Dept. of Electronic and Electrical Engineering, PHI for Engineering and Technology, 6th of October City, Giza, Egypt
[Link]@[Link]

Muhammad A. Mahmoud
Dept. of Electronic and Electrical Engineering, PHI for Engineering and Technology, 6th of October City, Giza, Egypt
mmooggh484@[Link]

Mohamed A. Fouad
Dept. of Electronic and Electrical Engineering, PHI for Engineering and Technology, 6th of October City, Giza, Egypt
mf7607069@[Link]

Muhammad A. Al-Ghareeb
Dept. of Electronic and Electrical Engineering, PHI for Engineering and Technology, 6th of October City, Giza, Egypt
MohamedGharib2@[Link]

Tarek Dahab
Shgardi Company, Riyadh, Saudi Arabia
Goldentarek@[Link]

Mohamed R. Shoaib
College of Computing and Data Science (CCDS), Nanyang Technological University (NTU), Singapore
MOHAMEDR003@[Link]

Ahmed F. Abdallah
Dept. of Electronic and Electrical Engineering, PHI for Engineering and Technology, 6th of October City, Giza, Egypt
afelsharkasy@[Link]

Gamal A. El-Sheikh
Dept. of Electronic and Electrical Engineering, PHI for Engineering and Technology, 6th of October City, Giza, Egypt
gaelsheikh@[Link]

Heba M. Emara
Dept. of Electronic and Electrical Engineering, PHI for Engineering and Technology, 6th of October City, Giza, Egypt
Hebam9909@[Link]

Abstract—Brain tumor detection using magnetic resonance imaging (MRI) plays a pivotal role in early diagnosis and treatment planning. This study presents a novel approach for brain tumor detection using MRI images, preceded by preprocessing steps including contrast enhancement, noise removal, and normalization. Leveraging the Vision Transformer (ViT) architecture, the proposed method is examined across different patch sizes (8x8, 16x16, 32x32, 64x64) to comprehensively evaluate its performance. Evaluation is conducted on the Brain Tumor MRI Dataset, comparing the proposed method with four pretrained models: ResNet-50, DenseNet121, EfficientNetB0, and InceptionV3. Performance metrics including accuracy, precision, recall, and F-score are employed for evaluation. Notably, the proposed Vision Transformer achieves an accuracy of 99% with a patch size of 8x8, 97% with a patch size of 16x16, 96% with a patch size of 32x32, and 93% with a patch size of 64x64. These results highlight the superior efficacy of the proposed approach in brain tumor detection compared to the pretrained models. Comparative analysis with state-of-the-art methods further confirms the proposed approach's superiority, indicating its potential for enhancing diagnostic accuracy in clinical settings. This study underscores the promise of Vision Transformer models in medical imaging tasks and their potential for advancing medical diagnostics.

Index Terms—MRI, Brain Tumor, CNN, Vision Transformer, Patch Size

I. INTRODUCTION

A tumor represents an abnormal proliferation of tissue cells devoid of physiological function. Tumors are typically categorized as either benign or malignant and often associated with cancerous conditions. A brain tumor, medically referred to as an intracranial tumor, manifests as an anomalous accumulation of tissue where cellular proliferation occurs uncontrollably. Such tumors can emerge in various regions of the brain or within the surrounding structures, encompassing the protective meninges, the skull base, the brainstem, the sinuses, nasal cavities, and other anatomical sites. Not all brain tumors possess malignant characteristics; those lacking cancerous properties are termed noncancerous or benign brain tumors. While noncancerous brain tumors may gradually enlarge over time, exerting pressure on surrounding brain tissue, malignant brain tumors, or brain cancers, are characterized by rapid growth and invasive properties that enable them to infiltrate and disrupt normal brain tissue architecture [1], [2].

Brain tumors encompass various types, each characterized by the specific cellular composition of the tumor. These types include gliomas, the most prevalent form of malignant brain tumors, which originate from cells resembling glial cells, the supportive cells surrounding nerve cells within brain tissue [3]. Additionally, pituitary tumors may arise within or around the pituitary gland, situated near the base of the brain [4], [5]. Meningiomas are tumors that develop in the membranes enveloping the brain and spinal cord, while pineal tumors originate in or around the pineal gland located at the brain center, responsible for producing the hormone melatonin, which regulates sleep patterns [6], [7].

Detecting brain tumors in their early stages poses a significant challenge, necessitating a comprehensive array of diagnostic tests and imaging scans. These include head CT scans, which utilize X-rays to produce detailed images of the brain, and brain MRI scans employing strong magnetic fields to visualize brain structures with high precision, commonly utilized for tumor detection. PET scans of the brain involve injecting a radioactive tracer into the bloodstream, enabling the visualization of brain tumor cells on PET scan images. Additionally, neurological examinations are conducted, and tissue samples are collected for testing, enhancing the diagnostic process [8]–[11].

Symptoms of brain tumors can vary depending on factors such as size, location, and tumor type. Common symptoms include headaches, seizures, vision or hearing problems, difficulties with balance or coordination, weakness or numbness in limbs, speech impairments, cognitive difficulties, and changes in mood or personality.

This paper introduces an approach for brain tumor detection utilizing MRI images and leveraging the Vision Transformer architecture. The contributions of this work are itemized as follows:
• Comprehensive evaluation of the proposed method across different patch sizes, providing insights into the impact of patch size variation on model performance.
• Comparative analysis with state-of-the-art models, highlighting the superiority of the proposed approach in brain tumor detection.
• Outlining the significance of Vision Transformer models in medical imaging tasks and their potential for advancing diagnostic accuracy in clinical settings.

II. RELATED WORKS

Md Ishtyaq Mahmud et al. [12] proposed a CNN model for brain tumor detection and compared its performance with ResNet-50, VGG16, and Inception V3 architectures. The CNN model achieved an accuracy of 93.3%, an area under the ROC curve (AUC) of 98.43%, a recall of 91.19%, and a loss of 0.25. Hussain Alasmawi et al. [13] investigated a vision transformer-based method for automatically delineating HN tumors and compared it with leading CNN-based models. They utilized multi-modal data from CT and PET scans for segmentation and found that the transformer-based model demonstrated comparable results to CNN-based models. Through cross-validation, they achieved a mean dice similarity coefficient (DSC) of 73.6%, a mean precision of 76.6%, and a mean recall of 76.6%. Similarly, on the testing set, they obtained a DSC of 73.6%, a precision of 77.3%, and a recall of 76.0%.

Jakub Siłka et al. [14] proposed a correlation learning mechanism (CLM) tailored for deep neural network architectures, merging CNN with classical architecture principles. This integration incorporates a supportive neural network to facilitate the classic model in discerning optimal filters for pooling and convolution layers, leading to expedited learning and heightened efficiency. The CLM model achieved a commendable accuracy of approximately 96%, accompanied by precision and recall rates nearing 95%. Akmalbek et al. [15] addressed the intricate challenge of brain tumor detection in MRI scans by leveraging a substantial collection of brain tumor images. Employing transfer learning, they refined a state-of-the-art YOLOv7 model, enhancing its ability to detect various brain tumors such as gliomas, meningiomas, and pituitary tumors. Their deep learning model exhibited promising performance in accurately identifying the presence and precise localization of brain tumors in MRI images, boasting an impressive accuracy of 96.5% compared to traditional techniques.

Alamin et al. [16] proposed a deep learning approach based on transfer learning for the classification of brain tumors. Their methodology involved preprocessing steps and the utilization of various transfer learning architecture reconstruction algorithms, including Xception, ResNet50V2, InceptionResNetV2, and DenseNet201, followed by fine-tuning. The achieved accuracies were 97.40%, 97.68%, 97.36%, and 98.72% for Xception, ResNet50V2, InceptionResNetV2, and DenseNet201, respectively. Notably, ResNet50V2 exhibited the highest accuracy rate of 97.68%. Mushtaq et al. [17] introduced the VGG16 deep learning model for brain tumor identification, achieving a testing accuracy of 97.33%. However, despite its high accuracy, deep learning models such as VGG16 are often regarded as "black boxes," providing predictions without transparent explanations.

III. MATERIAL AND METHOD

Dataset Description

To evaluate the performance of the proposed model, we utilized the brain tumor MRI dataset [18], obtained from Kaggle datasets, for training, validation, and testing purposes. This composite dataset amalgamates three publicly available datasets: Figshare, SARTAJ, and Br35H, comprising a total of 7023 MRIs of the human brain in grayscale and JPG format. The dataset encompasses four distinct classes of brain tumors: Glioma (1321 images for training, 300 for testing), Meningioma (1339 images for training, 306 for testing), No-tumor (1595 images for training, 405 for testing), and Pituitary (1457 images for training, 300 for testing). For training and validation, we allocated 80% and 20% of the images, respectively. Sample images from the dataset are illustrated in Figure 1.

Fig. 1. Samples from the dataset.
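To make the split above concrete, a minimal loading sketch is given below. The directory layout (Training/Testing folders with one subfolder per class), the 224x224 working resolution, and the batch size are assumptions about the Kaggle release, not details stated in the paper.

```python
# Hypothetical loading of the Brain Tumor MRI dataset (folder names and
# image size are assumed, not taken from the paper).
import tensorflow as tf

IMG_SIZE = (224, 224)
BATCH = 32
CLASSES = ["glioma", "meningioma", "notumor", "pituitary"]

# 80%/20% train/validation split, as described in the paper.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "brain-tumor-mri-dataset/Training",
    validation_split=0.2, subset="training", seed=42,
    image_size=IMG_SIZE, batch_size=BATCH, class_names=CLASSES)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "brain-tumor-mri-dataset/Training",
    validation_split=0.2, subset="validation", seed=42,
    image_size=IMG_SIZE, batch_size=BATCH, class_names=CLASSES)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "brain-tumor-mri-dataset/Testing",
    image_size=IMG_SIZE, batch_size=BATCH, class_names=CLASSES)
```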
A. Vision Transformer

The Vision Transformer (ViT) represents a breakthrough in computer vision, offering a novel approach to image processing by leveraging the Transformer architecture initially designed for natural language processing tasks. Unlike conventional convolutional neural networks (CNNs), which process images through hierarchical feature extraction [19], ViT directly models image data as sequences of tokens, enabling more effective capture of global dependencies and long-range interactions. In this paradigm, the original image is divided into a grid of patches, which are then transformed into an embedded sequence. Each sequence element incorporates positional information to retain spatial relationships within the image. The subsequent encoding process, facilitated by multiple transformer encoder layers, further refines these representations through self-attention mechanisms and feed-forward neural networks. This paper explores the inner workings of ViT, elucidating the mathematical formulations and operations involved in processing image data [20]–[23]. The embedded sequence, denoted as z_0, encompasses the class token v_class and each patch x_i multiplied by an encoding matrix E to incorporate positional information, represented as E_pos (Equation 1):

$z_0 = [v_{\text{class}};\, x_1 E;\, x_2 E;\, \dots;\, x_n E] + E_{\text{pos}}$   (1)

In Equation 1, z_0 represents the embedded sequence, v_class is the class token, x_i are the patches, E is the encoding matrix, and E_pos incorporates positional information.
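As an illustration of Equation 1, the following NumPy sketch builds the embedded sequence z_0 from a grayscale image; the 16x16 patch size, 512-dimensional embedding, and random initialization of E and E_pos are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of the ViT embedding of Equation 1 (assumed sizes).
import numpy as np

def embed_patches(image, patch=16, d_model=512, rng=np.random.default_rng(0)):
    """Split a square grayscale image into patches and build z0."""
    h, w = image.shape
    n = (h // patch) * (w // patch)
    # Flatten each patch x_i into a row vector.
    patches = (image[:h // patch * patch, :w // patch * patch]
               .reshape(h // patch, patch, w // patch, patch)
               .transpose(0, 2, 1, 3)
               .reshape(n, patch * patch))
    E = rng.standard_normal((patch * patch, d_model)) * 0.02   # encoding matrix E
    v_class = np.zeros((1, d_model))                           # learnable class token
    E_pos = rng.standard_normal((n + 1, d_model)) * 0.02       # positional term E_pos
    z0 = np.concatenate([v_class, patches @ E], axis=0) + E_pos  # Equation 1
    return z0
```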
The encoder architecture comprises two identical layers, each comprising a multihead self-attention block (MSA) and a fully connected feed-forward dense block (MLP) [24]. The MSA operation (Equation 2) involves layer normalization (LN) applied to the previous layer output z_{l-1} before adding the output of the MSA (z_l):

$z_l = \text{MSA}(\text{LN}(z_{l-1})) + z_{l-1}, \quad l = 1 \dots L$   (2)

In Equation 2, MSA represents the multihead self-attention block output, z_{l-1} is the output of the previous layer, and z_l is the output of the current layer. Similarly, the MLP operation (Equation 3) entails layer normalization (LN) applied to the input z' before adding the output of the MLP (z'_1):

$z'_1 = \text{MLP}(\text{LN}(z')) + z', \quad l = 1 \dots L$   (3)

In Equation 3, MLP represents the output of the fully connected feed-forward dense block, z' is the input to the MLP, and z'_1 is the output of the MLP. At each step of the input sequence, three values - query (Q), key (K), and value (V) - are generated by multiplying the element against three learned matrices (U_QKV) (Equation 4):

$[Q, K, V] = z \times U_{QKV}$   (4)

In Equation 4, Q, K, and V represent the query, key, and value matrices, respectively, and U_QKV is the learned matrix. The dot product of Q and K is then scaled and passed through a softmax function for normalization, resulting in attention outputs A (Equation 5):

$A = \text{softmax}\left( \frac{Q K^T}{D_K} \right)$   (5)

In Equation 5, A represents the attention outputs, QK^T is the dot product of Q and K, and D_K is the dimension of the key matrix [25].
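The following sketch renders Equations 4 and 5 as single-head self-attention in NumPy; the dimensions are illustrative, and the scaling follows the printed form of Equation 5.

```python
# Sketch of single-head self-attention following Equations 4 and 5.
import numpy as np

def self_attention(z, U_qkv, d_k):
    """z: (n, d_model); U_qkv: (d_model, 3*d_k) learned matrix U_QKV."""
    q, k, v = np.split(z @ U_qkv, 3, axis=-1)      # Equation 4: [Q, K, V] = z x U_QKV
    scores = q @ k.T / d_k                         # scaled dot product of Q and K
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    a = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax, Eq. 5
    return a @ v                                   # attention-weighted values
```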
B. The Proposed Approach

Fig. 2. Block diagram of the proposed approach.

This paper introduces an advanced methodology for brain tumor detection, utilizing ViT technology. Illustrated in Figure 2, this approach represents a significant advancement in medical imaging, combining meticulous preprocessing with ViT transformative capabilities. Through detailed analysis, each step of the methodology will be explored, emphasizing key parameters critical to its success. Initially, brain tumor images undergo meticulous preprocessing to optimize their quality, including noise reduction, contrast enhancement, and normalization, with parameters such as Gaussian kernel size for noise reduction set to 3x3, contrast enhancement factor set to 1.5, and normalization range set to [0, 1]. Following preprocessing, the images are partitioned into fixed-size patches through Image Patching, with parameters specifying the patch size set to 32x32 pixels and an overlap of 50% between adjacent patches. This enables detailed analysis of distinct regions within the images.
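A preprocessing and patching sketch matching the stated parameters (3x3 Gaussian kernel, contrast factor 1.5, [0, 1] normalization, 32x32 patches with 50% overlap) might look as follows; OpenCV is an assumed tooling choice, not one named by the authors.

```python
# Sketch of the described preprocessing and overlapping patch extraction
# (OpenCV/NumPy are assumed tooling choices).
import cv2
import numpy as np

def preprocess(img):
    img = cv2.GaussianBlur(img, (3, 3), 0)             # noise reduction, 3x3 kernel
    img = cv2.convertScaleAbs(img, alpha=1.5, beta=0)  # contrast enhancement, factor 1.5
    return img.astype(np.float32) / 255.0              # normalize to [0, 1]

def extract_patches(img, size=32, overlap=0.5):
    stride = int(size * (1 - overlap))                 # 50% overlap -> stride of 16
    return [img[y:y + size, x:x + size]
            for y in range(0, img.shape[0] - size + 1, stride)
            for x in range(0, img.shape[1] - size + 1, stride)]
```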
Subsequently, each patch is transformed into a vector and embedded into a lower-dimensional space using a linear projection, with the embedding dimension set to 512. Additionally, positional encodings are incorporated into each patch embedding to preserve spatial information, utilizing sinusoidal positional encodings with a maximum frequency set to 30. The embedded patches are then processed through a series of transformer encoder blocks, with the number of encoder blocks set to 12, the number of attention heads set to 8, and the feed-forward network dimension set to 2048. This configuration allows the model to effectively focus on relevant patches and learn intricate relationships between them. Following the transformer encoder blocks, a classification head is introduced to make the final prediction regarding tumor presence. This classification head consists of fully connected layers, with the number of hidden units set to 1024 and the output dimension set to the number of tumor classes. A softmax layer is applied to generate probability distributions over the tumor classes. Finally, an MLP processes the output from the classification head to determine the presence or absence of brain tumors. The MLP incorporates parameters such as the number of hidden layers, ReLU activation functions, and a dropout rate of 0.6 to ensure robust decision-making based on the probabilities assigned to each class by the classification head [26].
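Assembling the stated hyperparameters (512-dimensional embeddings, 12 encoder blocks with 8 attention heads and a 2048-unit feed-forward network, a 1024-unit classification head with dropout 0.6 and a softmax output), a minimal Keras sketch of such a model is shown below. Design choices not fixed by the text, such as GELU activations and a learned positional table standing in for the sinusoidal encodings, are assumptions.

```python
# Minimal Keras sketch of the described ViT classifier; anything not stated
# in the text (GELU, pre-norm ordering, learned positions) is an assumption.
import tensorflow as tf
from tensorflow.keras import layers

class PatchEncoder(layers.Layer):
    """Linear projection (embedding dim 512) plus a positional term; a learned
    table stands in here for the sinusoidal encodings described above."""
    def __init__(self, num_patches, d_model=512):
        super().__init__()
        self.proj = layers.Dense(d_model)
        self.pos = layers.Embedding(num_patches, d_model)
        self.num_patches = num_patches

    def call(self, x):
        return self.proj(x) + self.pos(tf.range(self.num_patches))

def encoder_block(x, heads=8, d_model=512, d_ff=2048):
    a = layers.LayerNormalization()(x)
    a = layers.MultiHeadAttention(num_heads=heads, key_dim=d_model // heads)(a, a)
    x = layers.Add()([x, a])                        # residual around the MSA block
    f = layers.LayerNormalization()(x)
    f = layers.Dense(d_ff, activation="gelu")(f)    # feed-forward dimension 2048
    f = layers.Dense(d_model)(f)
    return layers.Add()([x, f])                     # residual around the MLP block

def build_vit(num_patches, num_classes=4):
    inp = layers.Input(shape=(num_patches, 32 * 32))   # flattened 32x32 patches
    x = PatchEncoder(num_patches)(inp)
    for _ in range(12):                                # 12 encoder blocks
        x = encoder_block(x)
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(1024, activation="relu")(x)       # classification head
    x = layers.Dropout(0.6)(x)                         # stated dropout rate
    out = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inp, out)
```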
C. Performance Metrics

The success of a model, whether developed from scratch or replacing an existing one, is often determined by the number of correct predictions made. However, relying solely on classification accuracy may not suffice to fully assess the model's performance [27]. To provide a more comprehensive evaluation, the confusion matrix is employed to illustrate the classifier's estimated outputs. This matrix, typically represented in a table format with four parameters, namely false positives (Fp), false negatives (Fn), true positives (Tp), and true negatives (Tn), describes the model's performance with a known set of test data. These performance metrics are summarized in Table I.

TABLE I
CALCULATION FORMULAS AND EXPLANATIONS OF PERFORMANCE METRICS

Measure   | Formula                                          | Evaluation Focus
Accuracy  | (Tp + Tn) / (Tp + Tn + Fp + Fn)                  | The ratio of correctly estimated samples to all samples. If the model utilized is highly accurate, it might be regarded as the best.
Precision | Tp / (Tp + Fp)                                   | The proportion of positively identified samples that were accurately predicted to be positive samples.
Recall    | Tp / (Tp + Fn)                                   | The percentage of positive values that are considered to be true.
F1-Score  | 2 x (Precision x Recall) / (Precision + Recall)  | The sensitivity harmonic mean; it considers both false positives and false negatives.
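The Table I formulas translate directly into code; the sketch below computes them per class from prediction counts, with one-vs-rest aggregation assumed for this four-class task.

```python
# Sketch of the Table I metrics computed from predictions (one-vs-rest
# aggregation over the four classes is an assumed convention).
import numpy as np

def metrics(y_true, y_pred, num_classes=4):
    scores = {}
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        tn = np.sum((y_pred != c) & (y_true != c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = {"accuracy": (tp + tn) / (tp + tn + fp + fn),
                     "precision": precision, "recall": recall, "f1": f1}
    return scores
```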
IV. RESULTS AND DISCUSSION

This paper investigates the performance of the Vision Transformer compared to four widely used pretrained models (ResNet-50, Inception V3, EfficientNet B0, and DenseNet121) for brain tumor classification. The model parameters for brain tumor classification include a learning rate of 0.001 and a weight decay of 0.0001, which govern the optimization process during the training phase. Training is conducted for a single epoch to assess the initial performance and observe any trends. The patch size is set to 7, determining the size of patches extracted from input images, while the number of patches is calculated based on the image size to ensure comprehensive coverage. Each patch undergoes dimensionality reduction to a projection dimension of 64 before being processed by a multi-head self-attention mechanism with 4 heads. The transformer architecture consists of 8 layers, each containing transformer units with dimensions specified as [128, 64]. Finally, the output of the transformer layers is fed to an MLP head with units configured as [56, 28], facilitating the final classification decision. These parameters collectively define the architecture and training regimen of the model, aiming to optimize performance and accuracy in brain tumor classification. Different patch sizes are examined and evaluated.
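Under the stated optimization settings (learning rate 0.001, weight decay 0.0001, a single training epoch), a training setup of the following form would apply, reusing the build_vit and dataset sketches above; AdamW and the 7x7 patch grid are assumptions, since the text names only the rate, decay, and epoch count.

```python
# Sketch of the stated training regimen; AdamW itself is an assumed choice.
import tensorflow as tf

model = build_vit(num_patches=49)  # e.g., a 7x7 grid of patches (assumed)
model.compile(
    optimizer=tf.keras.optimizers.AdamW(learning_rate=0.001, weight_decay=0.0001),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])
history = model.fit(train_ds, validation_data=val_ds, epochs=1)  # single epoch
```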
Table II provides a comparative analysis of the performance metrics for various deep learning models, including ResNet50, EfficientNetB0, DenseNet121, InceptionV3, and ViT, across different patch sizes. Each row represents a different model, while the columns denote evaluation metrics such as Accuracy, Precision, Recall, and F-score. Among the traditional CNN architectures, ResNet50 shows competitive performance but with slightly lower accuracy compared to EfficientNetB0 and DenseNet121. InceptionV3 stands out with higher accuracy, precision, recall, and F-score, indicating its effectiveness in the classification task. However, the ViT model surpasses these traditional architectures, particularly with smaller patch sizes (8x8 and 16x16), achieving notably higher accuracy, precision, recall, and F-score. This suggests that the ViT model's ability to capture fine-grained details within smaller patches leads to superior performance. As patch size increases (32x32 and 64x64), the performance of ViT slightly diminishes, though it still maintains competitive metrics compared to other models.

Analyzing the confusion matrices across different patch sizes provides valuable insights into how the granularity of patch representations impacts the ViT model's ability to accurately classify images. Figure 3 presents confusion matrices for the proposed ViT model across different patch sizes (8x8, 16x16, 32x32, and 64x64). The ViT model with smaller patch sizes (8x8 and 16x16) gives better discrimination between different classes due to the finer level of detail captured by the smaller patches. Consequently, the confusion matrices for these patch sizes might exhibit clearer diagonal structures, indicating higher correct classifications and lower confusion between classes. As the patch size increases to 32x32 and 64x64, the model might struggle to capture fine-grained features, potentially leading to decreased performance and increased confusion between classes. In the confusion matrices for these larger patch sizes, we might observe more off-diagonal elements, indicating misclassifications and higher levels of confusion.

Fig. 3. Confusion matrices for the proposed ViT model across different patch sizes.

Figure 4 presents the training/validation losses and accuracies for the proposed ViT model across different patch sizes. Smaller patch sizes, like 8x8 and 16x16, are expected to capture finer details within the images, potentially leading to superior performance. Consequently, models trained with smaller patches may exhibit lower training and validation losses and higher accuracies. Conversely, larger patch sizes, such as 32x32 and 64x64, may yield coarser representations, which could result in increased losses and decreased accuracies. Monitoring the gap between training and validation losses helps gauge the model's generalization capabilities, with smaller gaps indicating better generalization and larger gaps suggesting potential overfitting. Similarly, observing the accuracy curves aids in identifying whether the model is overfitting.

Fig. 4. Training/validation losses and accuracies for the proposed ViT model across different patch sizes.

Figure 5 presents a sample of the predicted MRI images using the proposed ViT model with an 8x8 patch size.

Fig. 5. Sample of the predicted MRI images using the proposed ViT model with an 8x8 patch size.

TABLE II
PERFORMANCE COMPARISON OF DEEP LEARNING MODELS AND VIT ACROSS DIFFERENT PATCH SIZES

Model                  | Accuracy | Precision | Recall | F-score
ResNet50               | 88       | 91        | 89     | 89
EfficientNetB0         | 90       | 92        | 90     | 91
DenseNet121            | 91       | 92        | 91     | 91
InceptionV3            | 92       | 93        | 92     | 93
ViT (patch size 8x8)   | 99       | 99        | 98     | 99
ViT (patch size 16x16) | 97       | 97        | 96     | 96
ViT (patch size 32x32) | 96       | 96        | 96     | 96
ViT (patch size 64x64) | 93       | 93        | 93     | 93

TABLE III
COMPARISON OF THE PROPOSED MODEL WITH THE STATE-OF-THE-ART MODELS

Authors                  | Method                                       | Accuracy (%)
Alasmawi et al. [13]     | Vision Transformer                           | 77.6
Siłka et al. [14]        | Correlation Learning Mechanism               | 96
Abdusalomov et al. [15]  | YOLOv7 with Transfer Learning                | 96.5
Talukder et al. [16]     | Transfer Learning with Various Architectures | 97.68
Mushtaq et al. [17]      | VGG16                                        | 97.33
Proposed Model           | ViT, patch size 8x8                          | 99
Proposed Model           | ViT, patch size 16x16                        | 97
Proposed Model           | ViT, patch size 32x32                        | 96
Proposed Model           | ViT, patch size 64x64                        | 93

Table III provides a detailed comparison between the proposed model and several state-of-the-art approaches in medical image analysis, focusing on accuracy as the primary metric. Each row represents a different model, including the Vision Transformer, the Correlation Learning Mechanism, YOLOv7 with transfer learning, transfer learning with various architectures, VGG16, and variations of the proposed model with different patch sizes in the Vision Transformer framework. By organizing the models in descending order of accuracy, readers can easily discern the most effective approaches. Overall, this comprehensive comparison aids researchers and practitioners in understanding the relative performance of different models in the field of medical image analysis, particularly with respect to accuracy and the impact of patch size variations within the proposed model.
V. CONCLUSIONS

This paper presents an approach for brain tumor detection using magnetic resonance imaging (MRI) images, which involves preprocessing steps for contrast enhancement, noise removal, and normalization. Leveraging the Vision Transformer (ViT) architecture, the proposed method was examined across different patch sizes (8x8, 16x16, 32x32, 64x64) to comprehensively evaluate its performance. Evaluation on the Brain Tumor MRI Dataset demonstrated that the proposed Vision Transformer model achieved notable accuracy scores, with 99% accuracy for a patch size of 8x8, 97% for 16x16, 96% for 32x32, and 93% for 64x64. These results indicate the superior efficacy of the proposed approach compared to pretrained models such as ResNet-50, DenseNet121, EfficientNetB0, and InceptionV3. Furthermore, the comparative analysis with state-of-the-art methods confirmed the proposed approach's superiority in brain tumor detection. By outperforming existing models across various evaluation metrics including accuracy, precision, recall, and F-score, the proposed Vision Transformer model demonstrated its potential for enhancing diagnostic accuracy in clinical settings.

The findings of this study underscore the promise of Vision Transformer models in medical imaging tasks and their capacity to advance medical diagnostics. Future research directions could explore further optimizations of the proposed approach, as well as its applicability to other medical imaging modalities and pathological conditions. Ultimately, the continued development and refinement of deep learning techniques hold significant potential for improving healthcare outcomes through more accurate and efficient medical image analysis.

REFERENCES

[1] J. Amin, M. Sharif, A. Haldorai, M. Yasmin, and R. S. Nayak, "Brain tumor detection and classification using machine learning: a comprehensive survey," Complex & Intelligent Systems, vol. 8, no. 4, pp. 3161–3183, 2022.
[2] M. Ali and L. Liu, "Light-weight pointcloud representation with sparse Gaussian process," in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 4931–4937, IEEE, 2023.
[3] M. R. Shoaib, M. R. Elshamy, T. E. Taha, A. S. El-Fishawy, and F. E. Abd El-Samie, "Efficient deep learning models for brain tumor detection with segmentation and data augmentation techniques," Concurrency and Computation: Practice and Experience, vol. 34, no. 21, p. e7031, 2022.
[4] S. Aktar, H. Abdelkhalik, N. H. Turja, Y. Arafa, A. Barai, N. Panda, G. Chennupati, N. Santhi, S. Eidenbenz, and A.-H. Badawy, "BB-ML: Basic block performance prediction using machine learning techniques," arXiv preprint arXiv:2202.07798, 2022.
[5] O. Aaziz, M. Al-Tahat, S. Trecakov, and J. Cook, "IncProf: Efficient source-oriented phase identification for application behavior understanding," in 2022 IEEE International Conference on Cluster Computing (CLUSTER), pp. 616–625, IEEE, 2022.
[6] D. N. Louis, A. Perry, P. Wesseling, D. J. Brat, I. A. Cree, D. Figarella-Branger, C. Hawkins, H. Ng, S. M. Pfister, G. Reifenberger, et al., "The 2021 WHO classification of tumors of the central nervous system: a summary," Neuro-Oncology, vol. 23, no. 8, pp. 1231–1251, 2021.
[7] F. Taher, M. R. Shoaib, H. M. Emara, K. M. Abdelwahab, F. E. Abd El-Samie, and M. T. Haweel, "Efficient framework for brain tumor detection using different deep learning techniques," Frontiers in Public Health, vol. 10, p. 959667, 2022.
[8] T. Uda, "Neuroimaging of brain tumor surgery and epilepsy," 2023.
[9] M. R. Shoaib, M. R. Elshamy, T. E. Taha, A. S. El-Fishawy, and F. E. Abd El-Samie, "Efficient brain tumor detection based on deep learning models," in Journal of Physics: Conference Series, vol. 2128, p. 012012, IOP Publishing, 2021.
[10] M. Ali and L. Liu, "GP-Frontier for local mapless navigation," in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 10047–10053, IEEE, 2023.
[11] H. M. Emara, W. El-Shafai, A. D. Algarni, N. F. Soliman, and F. E. Abd El-Samie, "A hybrid compressive sensing and classification approach for dynamic storage management of vital biomedical signals," IEEE Access, 2023.
[12] M. I. Mahmud, M. Mamun, and A. Abdelgawad, "A deep analysis of brain tumor detection from MR images using deep learning networks," Algorithms, vol. 16, no. 4, p. 176, 2023.
[13] I. Sobirov, O. Nazarov, H. Alasmawi, and M. Yaqub, "Automatic segmentation of head and neck tumor: How powerful transformers are?," in International Conference on Medical Imaging with Deep Learning, pp. 1149–1161, PMLR, 2022.
[14] M. Woźniak, J. Siłka, and M. Wieczorek, "Deep neural network correlation learning mechanism for CT brain tumor detection," Neural Computing and Applications, vol. 35, no. 20, pp. 14611–14626, 2023.
[15] A. B. Abdusalomov, M. Mukhiddinov, and T. K. Whangbo, "Brain tumor detection based on deep learning approaches and magnetic resonance imaging," Cancers, vol. 15, no. 16, p. 4172, 2023.
[16] M. A. Talukder, M. M. Islam, M. A. Uddin, A. Akhter, M. A. J. Pramanik, S. Aryal, M. A. A. Almoyad, K. F. Hasan, and M. A. Moni, "An efficient deep learning model to categorize brain tumor using reconstruction and fine-tuning," Expert Systems with Applications, p. 120534, 2023.
[17] F. Ahmed, M. Asif, M. Saleem, U. F. Mushtaq, and M. Imran, "Identification and prediction of brain tumor using VGG-16 empowered with explainable artificial intelligence," International Journal of Computational and Innovative Sciences, vol. 2, no. 2, pp. 24–33, 2023.
[18] "Brain Tumor MRI Dataset." [Link]com/datasets/masoudnickparvar/brain-tumor-mri-dataset. Accessed: 2024-03-10.
[19] M. R. Shoaib, H. M. Emara, J. Zhao, W. El-Shafai, N. F. Soliman, A. S. Mubarak, O. A. Omer, F. E. Abd El-Samie, and H. Esmaiel, "Deep learning innovations in diagnosing diabetic retinopathy: The potential of transfer learning and the DiaCNN model," Computers in Biology and Medicine, vol. 169, p. 107834, 2024.
[20] M. G. Vilas, T. Schaumlöffel, and G. Roig, "Analyzing vision transformers for image classification in class embedding space," Advances in Neural Information Processing Systems, vol. 36, 2024.
[21] J. Maurício, I. Domingues, and J. Bernardino, "Comparing vision transformers and convolutional neural networks for image classification: A literature review," Applied Sciences, vol. 13, no. 9, p. 5521, 2023.
[22] C.-F. R. Chen, Q. Fan, and R. Panda, "CrossViT: Cross-attention multi-scale vision transformer for image classification," in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 357–366, 2021.
[23] H. Abdelkhalik, S. Aktar, Y. Arafa, A. Barai, G. Chennupati, N. Santhi, N. Panda, N. Prajapati, N. H. Turja, S. Eidenbenz, et al., "BB-ML: Basic block performance prediction using machine learning techniques," in 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), pp. 1975–1982, IEEE, 2023.
[24] H. S. Abdelkhalek, H. Medhat, I. Ziedan, and M. Amal, "Simulation and prediction for a satellite temperature sensors based on artificial neural network," Journal of Aerospace Technology and Management, vol. 11, p. e3719, 2019.
[25] H. Abdelkhalik, Y. Arafa, N. Santhi, and A.-H. A. Badawy, "Demystifying the NVIDIA Ampere architecture through microbenchmarking and instruction-level analysis," in 2022 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–8, IEEE, 2022.
[26] M. Ali, H. Jardali, N. Roy, and L. Liu, "Autonomous navigation, mapping and exploration with Gaussian processes," in Proceedings of Robotics: Science and Systems, Daegu, Republic of Korea, 2023.
[27] I. S. Mohamed, M. Ali, and L. Liu, "GP-guided MPPI for efficient navigation in complex unknown cluttered environments," in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7463–7470, IEEE, 2023.
