DL Midterm Report Topic Id 41
MIDTERM REPORT
Variational autoencoder GAN for medical image generation.
Group members
3. Experimentation
3.1. Dataset
5. Conclusion
1. Introduction
Medical imaging provides essential insights into the body's internal structures but is challenged
by limited high-quality data, privacy concerns, and the need for advanced diagnostic tools.
Variational autoencoders (VAEs) are effective at learning compact representations of complex data, while generative adversarial networks (GANs) excel at
generating realistic synthetic images. Merging these methods can leverage their strengths to
develop a powerful tool to enhance medical image generation and analysis.
In this project, we aim to apply VAE-GAN models to generate brain tumor MRI scans,
providing an accurate diagnostic tool for identifying various brain conditions like cancer,
cerebral infarction, encephalocele, and more.
• The Encoder extracts latent variables of input data x and outputs them in the form of a
vector representing latent space z.
• The Latent space is both the output layer of the encoder network and the input layer
of the decoder network. It is a fully compressed, lower-dimensional embedding of the
input data.
• The Decoder uses the data in latent space to reconstruct the original input, essentially
reversing the encoder.
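The hand-off from encoder to decoder described in the bullets above can be sketched in a few lines. This assumes the standard VAE formulation, in which the encoder outputs a mean and a log-variance per latent dimension and z is sampled with the reparameterization trick; the report does not state this detail, and the names `reparameterize`, `mu`, and `log_var` and all values below are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps, with eps ~ N(0, 1) sampled per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
mu = [0.5, -1.0, 0.0]           # hypothetical encoder means
log_var = [0.0, -2.0, -50.0]    # hypothetical encoder log-variances
z = reparameterize(mu, log_var, rng)  # latent vector passed to the decoder
```

Sampling this way keeps z a differentiable function of the encoder outputs, which is what allows the encoder and decoder to be trained together. A very negative log-variance (the third dimension here) makes z collapse onto the mean.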
Loss function:
Reconstruction loss: an expectation term that measures how close the decoder output is
to the original input.
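For context, the reconstruction term is usually one of two terms in the VAE objective. The form below is the standard one from the general VAE literature and is an assumption as far as this project's implementation is concerned; q(z|x) denotes the encoder distribution, p(x|z) the decoder likelihood, and p(z) the prior over the latent space.

```latex
\mathcal{L}_{\text{VAE}}
  = \underbrace{-\,\mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]}_{\text{reconstruction loss}}
  \;+\; \underbrace{D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,p(z)\big)}_{\text{regularization toward the prior}}
```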
A GAN is made up of two neural networks, which are trained simultaneously, with the generator trying to fool
the discriminator and the discriminator trying to classify real and fake samples accurately.
• The Generator takes random noise as input and produces data from it. Its goal is to
generate data that is as real as possible.
• The Discriminator takes real data and the data generated by the Generator as input and
attempts to distinguish between the two. It outputs the probability that the given data is
real.
Loss function:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
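To make the two expectations concrete, the value function can be estimated from a batch of discriminator outputs. The sketch below uses plain Python and made-up probabilities; `value_function`, `d_real`, and `d_fake` are illustrative names, not part of the project code.

```python
import math


def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: D(x) for real samples, each strictly between 0 and 1
    d_fake: D(G(z)) for generated samples, each strictly between 0 and 1
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical outputs: a discriminator that separates real from fake well...
confident_d = value_function([0.9, 0.8], [0.1, 0.2])
# ...and one that has been fooled and outputs 0.5 for everything.
fooled_d = value_function([0.5, 0.5], [0.5, 0.5])
```

The discriminator is trained to push this value up, while the generator is trained to push it down, which is why the confident case scores higher than the fooled one.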
A VAE is combined with a GAN by collapsing the decoder and the generator into one.
Loss function:
$$\mathcal{L} = \mathcal{L}_{\text{prior}} + \mathcal{L}_{\text{llike}}^{\text{Dis}_l} + \mathcal{L}_{\text{GAN}}$$
With
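As a reference point, Larsen et al. (the first entry in the references) define the three terms roughly as follows. The notation is taken from that paper and is assumed, not confirmed, to match this project's implementation: Dis and Gen play the roles of D and G, Dis_l(x) is the hidden representation of x at the l-th layer of the discriminator, and x̃ is a sample from the decoder.

```latex
\begin{aligned}
\mathcal{L}_{\text{prior}} &= D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,p(z)\big) \\
\mathcal{L}_{\text{llike}}^{\text{Dis}_l} &= -\,\mathbb{E}_{q(z \mid x)}\big[\log p(\text{Dis}_l(x) \mid z)\big],
  \qquad p(\text{Dis}_l(x) \mid z) = \mathcal{N}\big(\text{Dis}_l(x) \mid \text{Dis}_l(\tilde{x}),\, I\big) \\
\mathcal{L}_{\text{GAN}} &= \log \text{Dis}(x) + \log\big(1 - \text{Dis}(\text{Gen}(z))\big)
\end{aligned}
```

The middle term replaces the pixel-wise reconstruction error of a plain VAE with an error measured in the discriminator's feature space.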
3. Experimentation
The model was implemented in PyTorch, using NVIDIA CUDA for acceleration and efficient
training on high-dimensional images.
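A common way to set this up in PyTorch is shown below as a minimal sketch. The tensor shape is a placeholder, since the image size and batch size used in the project are not stated here.

```python
import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical batch of 4 single-channel 64x64 images (placeholder shape).
batch = torch.randn(4, 1, 64, 64).to(device)
```

Models are moved the same way with `.to(device)`, so the same script runs with or without a GPU.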
3.1. Dataset
This project uses a dataset containing various brain tumor MRI images. A brain tumor is a
collection, or mass, of abnormal cells in the brain. When brain tumors grow, they can cause
brain damage and even be life-threatening. The dataset includes over 5000 images categorized
into the following classes:
• Glioma: A glioma is a type of primary tumor that starts in the glial cells of the brain or
spinal cord.
• Notumor: Scans of a brain in normal condition, in which no tumor is present.
• Pituitary: A tumor of the pituitary gland, a small, pea-sized endocrine gland located
at the base of the brain below the hypothalamus.
Architectures for the three networks (Encoder, Decoder, Discriminator) that comprise VAE/GAN. ↓ and ↑
represent down- and upsampling respectively. BNorm denotes batch normalization.
3.4. Devices
• The model was trained on an NVIDIA RTX 2060, and training took 2 hours and 18
minutes. The trained model was then saved to “vae_gan_model.pth”, so it can be loaded
on other devices, including low-end ones, and used without retraining.
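Saving and reloading a trained PyTorch model typically looks like the sketch below. It assumes the checkpoint stores a `state_dict` (the report does not say whether the weights or the whole model object were saved), and the small `nn.Sequential` network is a stand-in for the actual Encoder, Decoder, and Discriminator.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    """Stand-in network; the report's real architecture is not reproduced here."""
    return nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 8))


model = make_model()

# Save only the learned weights to a .pth file (temporary directory for the demo).
path = os.path.join(tempfile.mkdtemp(), "vae_gan_model.pth")
torch.save(model.state_dict(), path)

# On another machine: rebuild the same architecture and load the weights.
# map_location="cpu" lets a GPU-trained checkpoint load on a CPU-only device.
restored = make_model()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # inference mode: no retraining needed
```

Loading with `map_location="cpu"` is what makes a checkpoint trained on a GPU usable on machines without one.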
Image generated by the VAE-GAN model
4.2. Analysis
• Image 2: The generated images show significant improvement. The images now have
clearer structures, with recognizable patterns and features that resemble real images.
While there is still some noise and the images are not perfect, they are far more detailed
than those generated after 1 epoch.
The VAE-GAN model shows noticeable progression in image generation from epoch 1 to
epoch 25. Initially, the generated images are very blurry with no clear features, which means
the model has not yet learned the patterns of the data. By epoch 25, the generated images show
significant improvement, capturing the overall structure and texture of the brain MRI scans.
However, noise is still present, so further training and tuning are needed to sharpen the image details.
5. Conclusion
Integrating Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs)
for brain tumor MRI generation is a promising direction in medical imaging AI. By combining
the structured latent space of VAEs with the realistic image generation capabilities of GANs,
this model can produce synthetic scans that support large-scale research and data sharing
while helping to preserve patient privacy and foster medical innovation.
6. References
- A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther, "Autoencoding beyond pixels using
a learned similarity metric," arXiv, 2015. [Online]. Available: https://arxiv.org/pdf/1512.09300
- S. H. Tsang, "Review: VAE-GAN - Autoencoding beyond pixels using a learned similarity metric,"
Medium, Oct. 10, 2019. [Online]. Available:
https://sh-tsang.medium.com/review-vae-gan-autoencoding-beyond-pixels-using-a-learned-similarity-metric-dc0f8cb74435
- D. Bergmann and C. Stryker, "Variational autoencoder," IBM, June 12, 2024. [Online]. Available:
https://www.ibm.com/think/topics/variational-autoencoder
- P. D. Khanh, "GAN: An overview," phamdinhkhanh.github.io, July 13, 2020. [Online]. Available:
https://phamdinhkhanh.github.io/2020/07/13/GAN.html
- M. Del Pra, "Generative adversarial networks," Medium, Oct. 30, 2023. [Online]. Available:
https://medium.com/@marcodelpra/generative-adversarial-networks-dba10e1b4424