Deep Learning for Mental Illness Prediction
Abstract:
Neuroimaging-based approaches have been extensively applied to study brain disease in recent
years and have deepened our understanding of both cognitively healthy and disordered brain
structure and function. Recent advancements in machine learning techniques have shown
promising outcomes for individualized prediction and characterization of patients with
psychiatric disorders. Studies have utilized features from a variety of neuroimaging modalities,
including structural, functional, and diffusion magnetic resonance imaging data, as well as
jointly estimated features from multiple modalities, to assess patients with heterogeneous brain
disorders such as schizophrenia and bipolar disorder. The proposed method is used to predict
whether a patient has a brain disease and, if so, the type of illness, such as schizophrenia or
bipolar disorder, using the deep learning models CNN and VGG16. Performance metrics such as
accuracy are used to compare the two models and show which one performs better.
TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION
1.1 GENERAL
1.1.1 THE MACHINE LEARNING SYSTEM
1.1.2 FUNDAMENTAL
1.2 JUPYTER
1.3 MACHINE LEARNING
1.4 CLASSIFICATION TECHNIQUES
1.4.1 NEURAL NETWORK AND DEEP LEARNING
1.4.2 METHODOLOGIES - GIVEN INPUT AND EXPECTED OUTPUT
1.5 OBJECTIVE AND SCOPE OF THE PROJECT
1.6 EXISTING SYSTEM
1.6.1 DISADVANTAGES OF EXISTING SYSTEM
1.6.2 LITERATURE SURVEY
1.7 PROPOSED SYSTEM
1.7.1 PROPOSED SYSTEM ADVANTAGES

CHAPTER 2: PROJECT DESCRIPTION
2.1 INTRODUCTION
2.2 DETAILED DIAGRAM
2.2.1 FRONT END DESIGN
2.2.2 BACK END FLOW
2.3 SOFTWARE SPECIFICATION
2.3.1 HARDWARE SPECIFICATION
2.3.2 SOFTWARE SPECIFICATION
2.4 MODULE DESCRIPTION
2.4.1 DATA COLLECTION
2.4.2 DATA AUGMENTATION
2.4.3 DATA SPLITTING
2.4.4 CLASSIFICATION
2.4.5 PERFORMANCE METRICS
2.4.6 CONFUSION MATRIX
2.5 MODULE DIAGRAM
2.5.1 SYSTEM ARCHITECTURE
2.5.2 USE CASE DIAGRAM
2.5.3 CLASS DIAGRAM
CHAPTER 1
INTRODUCTION
1.2 Jupyter
Jupyter, previously known as IPython Notebook, is a web-based, interactive development
environment. Originally developed for Python, it has since expanded to support over 40 other
programming languages including Julia and R.
Jupyter allows notebooks to be written that contain text, live code, images, and equations.
These notebooks can be shared, and can even be hosted on GitHub for free.
For each section of this tutorial, you can download a Jupyter notebook that allows you to edit and
experiment with the code and examples for each topic. Jupyter is part of the Anaconda
distribution; it can be started from the command line using the jupyter command:
SciKit-Learn
SciKit-Learn provides a standardised interface to many of the most commonly used machine
learning algorithms, and is the most popular and frequently used library for machine learning for
Python. As well as providing many learning algorithms, SciKit-Learn has a large number of
convenience functions for common preprocessing tasks (for example, normalisation or k-fold
cross validation).
SciKit-Learn is a very large software library.
Clustering
Clustering algorithms focus on ordering data together into groups. In general clustering
algorithms are unsupervised—they require no y response variable as input. That is to say, they
attempt to find groups or clusters within data where you do not know the label for each sample.
SciKit-Learn has many clustering algorithms, but in this section we will demonstrate
hierarchical clustering on a DNA expression microarray dataset using an algorithm from the
SciPy library.
We will plot a visualisation of the clustering using what is known as a dendrogram, also using
the SciPy library.
The goal is to cluster the data properly in logical groups, in this case into the cancer types
represented by each sample’s expression data. We do this using agglomerative hierarchical
clustering, using Ward’s linkage method:
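The accompanying notebook code is not reproduced in the text; a minimal sketch of this step with SciPy, using random placeholder data in place of the microarray samples, might look like the following:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Placeholder for the gene expression matrix (samples x genes); the real
# microarray data would be loaded here instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 100))

# Agglomerative hierarchical clustering with Ward's linkage method
Z = linkage(X, method='ward')

# Visualise the clustering as a dendrogram
dendrogram(Z)
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()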
1.4 Classification
Previously, we analysed data that was unlabelled—we did not know to what class a sample belonged (this is known
as unsupervised learning). In contrast to this, a supervised problem deals with labelled data,
where we are aware of the discrete classes to which each sample belongs. When we wish to predict
which class a sample belongs to, we call this a classification problem. SciKit-Learn has a number
of algorithms for classification; in this section we will look at the Support Vector Machine.
We will work on the Wisconsin breast cancer dataset, split it into a training set and a test set,
train a Support Vector Machine with a linear kernel, and test the trained model on an unseen
dataset. The Support Vector Machine model should be able to predict if a new sample is
malignant or benign based on the features of a new, unseen sample:
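The original code cell is not shown in the text; a minimal sketch using the copy of the Wisconsin dataset bundled with scikit-learn might look like this:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Load the Wisconsin breast cancer dataset bundled with scikit-learn
X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the samples as an unseen test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a Support Vector Machine with a linear kernel
clf = SVC(kernel='linear')
clf.fit(X_train, y_train)

# Evaluate on the unseen test set: precision, recall, F1 and support per class
print(classification_report(y_test, clf.predict(X_test)))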
You will notice that the SVM model performed very well at predicting the malignancy of new,
unseen samples from the test set—this can be quantified nicely by printing a number of metrics
using the classification report function. Here, the precision, recall, and F1 score (F1 = 2 ·
precision · recall / (precision + recall)) for each class are shown. The support column is a count of the
number of samples for each class.
Support Vector Machines are a very powerful tool for classification. They work well in high
dimensional spaces, even when the number of features is higher than the number of samples.
However, their running time is quadratic in the number of samples, so large datasets can become
difficult to train. Quadratic means that if you increase a dataset in size by 10 times, it will take
100 times longer to train.
Last, you will notice that the breast cancer dataset consisted of 30 features. This makes it
difficult to visualize or plot the data. To aid in visualization of highly dimensional data, we can
apply a technique called dimensionality reduction.
Dimensionality Reduction
Another important method in machine learning, and data science in general, is dimensionality
reduction. For this example, we will look at the Wisconsin breast cancer dataset once again. The
dataset consists of over 500 samples, where each sample has 30 features. The features relate to
images of a fine needle aspirate of breast tissue, and the features describe the characteristics of
the cells present in the images. All features are real values. The target variable is a discrete value
(either malignant or benign) and is therefore a classification dataset.
You will recall from the Iris example in Sect. 7.3 that we plotted a scatter matrix of the data,
where each feature was plotted against every other feature in the dataset to look for potential
correlations (Fig. 3). By examining this plot you could probably find features which would
separate the dataset into groups. Because the dataset only had 4 features we were able to plot
each feature against every other relatively easily. However, as the number of features grows, this
becomes less and less feasible, especially if you consider the gene expression example in Sect.
9.4 which had over 6000 features.
One method that is used to handle highly dimensional data is Principal Component
Analysis, or PCA. PCA is an unsupervised algorithm for reducing the number of dimensions of a
dataset. For example, for plotting purposes you might want to reduce your data down to 2 or 3
dimensions, and PCA allows
you to do this by generating components, which are combinations of the original features, that
you can then use to plot your data.
PCA is an unsupervised algorithm. You supply it with your data, X, and you specify the number
of components you wish to reduce its dimensionality to. This is known as transforming the data:
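The code cell is not reproduced here; a minimal sketch on the breast cancer data, assuming the same scikit-learn copy of the dataset, might look like this:

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA

X, y = load_breast_cancer(return_X_y=True)

# Reduce the 30 original features to 2 principal components for plotting
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (569, 2)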
Again, you would not use this model for new data—in a real world scenario, you would, for
example, perform a 10-fold cross validation on the dataset, choosing the model parameters that
perform best on the cross validation. This model would be much more likely to perform well on
new data. At the very least, you would randomly select a subset, say 30% of the data, as a test set
and train the model on the remaining 70% of the dataset. You would evaluate the model based on
the score on the test set and not on the training set.
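As a hedged illustration of the 10-fold cross-validation described above, reusing the linear SVM and dataset from the earlier sketch:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# 10-fold cross-validation: the mean score is a better estimate of
# generalization performance than the score on the training set
scores = cross_val_score(SVC(kernel='linear'), X, y, cv=10)
print(scores.mean())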
We will use a training set consisting of only 50% of the entire dataset, and test our neural network on
the remaining 50% of the data.
Note: Keras is not installed as part of the Anaconda distribution; to install it, use pip:
Keras additionally requires either Theano or TensorFlow to be installed. In the examples in this
chapter we are using Theano as a backend, however the code will work identically for either
backend. You can install Theano using pip, but it has a number of dependencies that must be
installed first. Refer to the Theano and TensorFlow documentation for more information [12].
Keras is a modular API. It allows you to create neural networks by building a stack of modules,
from the input of the neural network, to the output of the neural network, piece by piece until you
have a complete network. Also, Keras can be configured to use your Graphics Processing Unit,
or GPU. This makes training neural networks far faster than if we were to use a CPU. We begin
by importing Keras:
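The import cell is not shown in the text; a minimal sketch of importing Keras and assembling a small Sequential network (the architecture here is illustrative, not the project's) might look like this:

# Minimal Keras imports for building a small feed-forward network
from keras.models import Sequential
from keras.layers import Dense

# Illustrative network: 30 input features, two output classes
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(30,)))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])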
We may want to view the network’s accuracy on the test set (or its loss on the training set) over time
(measured at each epoch), to get a better idea how well it is learning. An epoch is one complete
cycle through the training data.
Fortunately, this is quite easy to plot as Keras’ fit function returns a history object which we can
use to do exactly this:
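A minimal sketch of this plotting step, assuming the training/test arrays from the earlier split (note that older Keras versions name the history keys 'acc' and 'val_acc'):

import matplotlib.pyplot as plt

# `history` is the object returned by fit when validation data is supplied
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50)

plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='test accuracy')
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='test loss')
plt.xlabel('Epoch')
plt.legend()
plt.show()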
This will result in a plot similar to that shown. Often you will also want to plot the loss on the
test set and training set, and the accuracy on the test set and training set.
Plotting the loss and accuracy can be used to see if you are overfitting (you experience tiny loss
on the training set, but large loss on the test set) and to see when your training has plateaued.
1.5 OBJECTIVE STATEMENT:
To better understand the mental health conditions and provide better patient care, early detection
of mental health problems is an essential step. Different from the diagnosis of other chronic
conditions that rely on laboratory tests and measurements, mental illnesses are typically
diagnosed based on an individual’s self-report to specific questionnaires designed for the
detection of specific patterns of feelings or social interactions [3]. Due to the increasing availability
of data pertaining to an individual’s mental health status, artificial intelligence (AI) and machine
learning (ML) technologies are being applied to improve our understanding of mental health
conditions and have been engaged to assist mental health providers for improved clinical
decision-making.
1. Brain diseases such as schizophrenia and bipolar disorder are highly prevalent and have
been shown to impact an individual’s physical health.
2. We predict the mental stability of individuals using features extracted from brain neuroimages.
3. The analysis predicts whether a person has any mental stability issues and, if so, the
type of illness.
1.6 EXISTING SYSTEM
The machine learning algorithm trained with structural MR images detected schizophrenia in
randomly selected images with reliable performance (area under the receiver operating
characteristic curve [AUC] of 0.96). The algorithm could also identify MR images from
schizophrenia patients in a previously unencountered data set with an AUC of 0.71 to 0.90. The
deep learning algorithm’s classification performance degraded to an AUC of 0.71 when a new
data set with younger patients and a shorter duration of illness than the training data sets was
presented. The brain region contributing the most to the performance of the algorithm was the
right temporal area, followed by the right parietal area. Semi-trained clinical specialists hardly
discriminated schizophrenia patients from healthy controls (AUC: 0.61) in the set of 100
randomly selected brain images.
1.6.1 DISADVANTAGES OF EXISTING SYSTEM
Accuracy is low.
Dataset selection is inconsistent, as the images in each dataset have different pixel dimensions.
Feature extraction is not accurate.
1.6.2 LITERATURE SURVEY
Artificial intelligence (AI) for medical imaging is a technology with great potential. An in-depth
understanding of the principles and applications of magnetic resonance imaging (MRI), machine learning
(ML), and deep learning (DL) is fundamental for developing AI-based algorithms that can meet the
requirements of clinical diagnosis and have excellent quality and efficiency. Moreover, a more
comprehensive understanding of applications and opportunities would help to implement AI-based
methods in an ethical and sustainable manner. This review first summarizes recent research advances in
ML and DL techniques for classifying human brain magnetic resonance images. Then, the application of
ML and DL methods to six typical neurological and psychiatric diseases is summarized, including
Alzheimer’s disease (AD), Parkinson’s disease (PD), major depressive disorder (MDD), schizophrenia
(SCZ), attention-deficit/hyperactivity disorder (ADHD), and autism spectrum disorder (ASD). Finally, the
limitations of the existing research are discussed, and possible future research directions are proposed.
Neuroimaging‐based approaches have been extensively applied to study mental illness in recent
years and have deepened our understanding of both cognitively healthy and disordered brain
structure and function. Recent advancements in machine learning techniques have shown
promising outcomes for individualized prediction and characterization of patients with
psychiatric disorders. Studies have utilized features from a variety of neuroimaging modalities,
including structural, functional, and diffusion magnetic resonance imaging data, as well as jointly
estimated features from multiple modalities, to assess patients with heterogeneous mental
disorders, such as schizophrenia and autism. We use the term “predictome” to describe the use of
multivariate brain network features from one or more neuroimaging modalities to predict mental
illness. In the predictome, multiple brain network‐based features (either from the same modality
or multiple modalities) are incorporated into a predictive model to jointly estimate features that
are unique to a disorder and predict subjects accordingly. To date, more than 650 studies have
been published on subject‐level prediction focusing on psychiatric disorders. We have surveyed
about 250 studies including schizophrenia, major depression, bipolar disorder, autism spectrum
disorder, attention‐deficit hyperactivity disorder, obsessive–compulsive disorder, social anxiety
disorder, posttraumatic stress disorder, and substance dependence. In this review, we present a
comprehensive review of recent neuroimaging‐based predictomic approaches, current trends, and
common shortcomings and share our vision for future directions.
Title: Identifying Schizophrenia Using Structural MRI With a Deep Learning Algorithm
Authors: Jihoon Oh, Baek-Lok Oh, Kyong-Uk Lee, Jeong-Ho Chae and Kyongsik Yun
Abstract: The machine learning algorithm trained with structural MR images detected schizophrenia in
randomly selected images with reliable performance (AUC of 0.96), and could also identify MR images from
schizophrenia patients in a previously unencountered data set with an AUC of 0.71 to 0.90. The deep learning
algorithm’s classification performance degraded to an AUC of 0.71 when a new data set with younger patients
and a shorter duration of illness than the training data sets was presented. The brain region contributing the
most to the performance of the algorithm was the right temporal area, followed by the right parietal area.
Semi-trained clinical specialists hardly discriminated schizophrenia patients from healthy controls (AUC: 0.61)
in the set of 100 randomly selected brain images.
Title: Brain MRI-based 3D Convolutional Neural Networks for Classification of Schizophrenia
and Controls
Authors: Mengjiao Hu, Kang Sim, Juan Helen Zhou, Xudong Jiang, Senior Member, IEEE, Cuntai Guan
Abstract: Convolutional Neural Network (CNN) has been successfully applied to the classification of both natural
images and medical images but limited studies applied it to differentiate patients with schizophrenia
from healthy controls. Given the subtle, mixed, and sparsely distributed brain atrophy patterns of
schizophrenia, the capability of automatic feature learning makes CNN a powerful tool for classifying
schizophrenia from controls as it removes the subjectivity in selecting relevant spatial features. To
examine the feasibility of applying CNN to classification of schizophrenia and controls based on
structural Magnetic Resonance Imaging (MRI), we built 3D CNN models with different architectures and
compared their performance with a handcrafted feature-based machine learning approach. Support
vector machine (SVM) was used as the classifier and Voxel-based Morphometry (VBM) was used as the feature
for handcrafted feature-based machine learning. 3D CNN models with sequential architecture, inception
module and residual module were trained from scratch. CNN models achieved higher cross-validation
accuracy than handcrafted feature-based machine learning. Moreover, testing on an independent
dataset, 3D CNN models greatly outperformed handcrafted feature-based machine learning. This study
underscored the potential of CNN for identifying patients with schizophrenia using 3D brain MR images
and paved the way for imaging-based individual-level diagnosis and prognosis in psychiatric disorders.
This study takes functional magnetic resonance imaging (fMRI) data for schizophrenia as an example: effective
time series are extracted from preprocessed fMRI data, correlation analysis is performed on regions of interest,
and, using transfer learning with the VGG16 network, the functional connectivity of schizophrenia patients and
healthy controls is classified.
Experimental results show that the classification accuracy of fMRI based on VGG16 is up to 84.3%. On
the one hand, it can improve the early diagnosis of schizophrenia, and on the other hand, it can solve
the classification problem of small samples and high-dimensional data and effectively improve the
generalization ability of deep learning models.
Author: Chang Su, Zhenxing Xu, Jyotishman Pathak & Fei Wang
Year:2020
Abstract:
Mental illnesses, such as depression, are highly prevalent and have been shown to impact an
individual’s physical health. Recently, artificial intelligence (AI) methods have been introduced
to assist mental health providers, including psychiatrists and psychologists, for decision-making
based on patients’ historical data (e.g., medical records, behavioral data, social media usage,
etc.). Deep learning (DL), as one of the most recent generation of AI technologies, has
demonstrated superior performance in many real-world applications ranging from computer
vision to healthcare. The goal of this study is to review existing research on applications of DL
algorithms in mental health outcome research. Specifically, we first briefly overview the state-of-
the-art DL techniques. Then we review the literature relevant to DL applications in mental health
outcomes. According to the application scenarios, we categorize these relevant articles into four
groups: diagnosis and prognosis based on clinical data, analysis of genetics and genomics data
for understanding mental health conditions, vocal and visual expression data analysis for disease
detection, and estimation of risk of mental illness using social media data. Finally, we discuss
challenges in using DL algorithms to improve our understanding of mental health conditions and
suggest several promising directions for their applications in improving mental health diagnosis
and treatment.
Author: Lu Chen, Chunchao Xia, Huaiqiang Sun
Year:2020
Abstract:
Deep learning (DL) is a recently proposed subset of machine learning methods that has gained
extensive attention in the academic world, breaking benchmark records in areas such as visual
recognition and natural language processing. Different from conventional machine learning
algorithm, DL is able to learn useful representations and features directly from raw data through
hierarchical nonlinear transformations. Because of its ability to detect abstract and complex
patterns, DL has been used in neuroimaging studies of psychiatric disorders, which are
characterized by subtle and diffuse alterations. Here, we provide a brief review of recent
advances and associated challenges in neuroimaging studies of DL applied to psychiatric
disorders. The results of these studies indicate that DL could be a powerful tool in assisting the
diagnosis of psychiatric diseases. We conclude our review by clarifying the main promises and
challenges of DL application in psychiatric disorders, and possible directions for future research.
TITLE: A deep learning model for detecting mental illness from user
content on social media
Author: Jina Kim, Jieon Lee , Eunil Park , Jinyoung Han
Year:2020
Abstract:
Users of social media often share their feelings or emotional states through their posts. In this
study, we developed a deep learning model to identify a user's mental state based on his/her
posting information. To this end, we collected posts from mental health communities on Reddit.
By analyzing and learning posting information written by users, our proposed model could
accurately identify whether a user's post belongs to a specific mental disorder, including
depression, anxiety, bipolar, borderline personality disorder, schizophrenia, and autism. We
believe our model can help identify potential sufferers with mental illness based on their posts.
This study further discusses the implication of our proposed model, which can serve as a
supplementary tool for monitoring mental health states of individuals who frequently use social
media.
Year: 2021
Abstract:
In recent medical research, tremendous progress has been made in the application of deep
learning (DL) techniques. This article systematically reviews how DL techniques have been
applied to electroencephalogram (EEG) data for diagnostic and predictive purposes in
conducting research on mental disorders. EEG-studies on psychiatric diseases based on the ICD-
10 or DSM-V classification that used either convolutional neural networks (CNNs) or long
short-term memory (LSTM) networks for classification were searched and examined for the
quality of the information they contained in three domains: clinical, EEG-data processing, and
deep learning. Although we found that the description of EEG acquisition and pre-processing
was sufficient in most of the studies, we found that many of them lacked a systematic
characterization of clinical features. Furthermore, many studies used misguided model selection
procedures or flawed testing. It is recommended that the study of psychiatric disorders using DL
in the future must improve the quality of clinical data and follow state-of-the-art model selection
and testing procedures so as to achieve a higher research standard and move toward clinical
significance.
Year:2021
Abstract:
Mental health problems are widely recognized as major public health challenges worldwide.
According to the World Health Organization, 264 million people were affected by depression
globally in 2020. Mental illness, in general, is one of the leading contributors to the global burden of
disease. It was estimated that in England, 105 billion British pounds (US $145 billion) were
spent on mental health services and treatments or lost in productivity at work in 2018, with the
global costs expected to rise to US $6 trillion by 2030. A significant contributor to this cost is
that people living with mental health problems sometimes receive inaccurate assessments. This
highlights the need for effective mental health services and a novel approach for diagnosing
mental health disorders.
1.7 PROPOSED SYSTEM
In the proposed work, early diagnosis of individuals at high risk of future mental illness, while
challenging, is critical in order to delay or prevent disease progression. Since most mental
illnesses typically have an onset in adolescence or early adulthood, early detection could delay,
or even prevent, future onset of these severe illnesses in high-risk adolescents. Predictive-modeling-based
approaches offer promising tools for clinical diagnosis, such as the identification of
neuroimaging-based biomarkers that can support early identification of individuals at risk of
developing mental disorders whose risk would otherwise go unrecognized. A deep learning
algorithm using neuroimaging data could help differentiate healthy adolescents from those
genetically at risk for mental illness. The proposed CNN and VGG16 methods are expected to
give better accuracy.
Algorithms:
CNN
Introduction
In the past few decades, Deep Learning has proved to be a very powerful tool because of its
ability to handle large amounts of data. The interest to use hidden layers has surpassed traditional
techniques, especially in pattern recognition. One of the most popular deep neural networks is
Convolutional Neural Networks.
Since the 1950s, the early days of AI, researchers have struggled to make a system that can
understand visual data. In the following years, this field came to be known as Computer Vision.
In 2012, computer vision took a quantum leap when a group of researchers from the University
of Toronto developed an AI model that surpassed the best image recognition algorithms and that
too by a large margin.
The AI system, which became known as AlexNet (named after its main creator, Alex
Krizhevsky), won the 2012 ImageNet computer vision contest with an amazing 85 percent
accuracy. The runner-up scored a modest 74 percent on the test.
At the heart of AlexNet was Convolutional Neural Networks a special type of neural network
that roughly imitates human vision. Over the years CNNs have become a very important part of
many Computer Vision applications and hence a part of any computer vision course. So let’s take
a look at the workings of CNNs.
Background of CNNs
CNNs were first developed and used around the 1980s. The most that a CNN could do at that
time was recognize handwritten digits. It was mostly used in the postal sectors to read zip codes,
pin codes, etc. The important thing to remember about any deep learning model is that it requires
a large amount of data to train and also requires a lot of computing resources. This was a major
drawback for CNNs at that period, and hence CNNs remained limited to the postal sector and
failed to enter the wider world of machine learning.
In 2012 Alex Krizhevsky realized that it was time to bring back the branch of deep learning that
uses multi-layered neural networks. The availability of large sets of data, to be more specific
ImageNet datasets with millions of labeled images and an abundance of computing resources
enabled researchers to revive CNNs.
We do not really need to go deep into the mathematics to understand what a CNN is or how
it works.
Bottom line is that the role of the ConvNet is to reduce the images into a form that is easier to
process, without losing features that are critical for getting a good prediction.
For simplicity, let’s stick with grayscale images as we try to understand how CNNs work.
The figure above shows what a convolution is. We take a filter/kernel (a 3×3 matrix) and apply it
to the input image to get the convolved feature. This convolved feature is passed on to the next
layer.
In the case of an RGB image, the same convolution is applied to each color channel.
Convolutional neural networks are composed of multiple layers of artificial neurons. Artificial
neurons, a rough imitation of their biological counterparts, are mathematical functions that
calculate the weighted sum of multiple inputs and outputs an activation value. When you input
an image into a ConvNet, each layer generates several activation maps that are passed on to the
next layer.
The first layer usually extracts basic features such as horizontal or diagonal edges. This output is
passed on to the next layer which detects more complex features such as corners or
combinational edges. As we move deeper into the network it can identify even more complex
features such as objects, faces, etc.
Based on the activation map of the final convolution layer, the classification layer outputs a set
of confidence scores (values between 0 and 1) that specify how likely the image is to belong to a
“class.” For instance, if you have a ConvNet that detects cats, dogs, and horses, the output of the
final layer is the possibility that the input image contains any of those animals.
What’s a pooling layer?
Similar to the Convolutional Layer, the Pooling layer is responsible for reducing the spatial size
of the Convolved Feature. This is to decrease the computational power required to process
the data by reducing the dimensions. There are two types of pooling: average pooling and max
pooling. Max pooling is the variant used in this work.
So what we do in Max Pooling is we find the maximum value of a pixel from a portion of the
image covered by the kernel. Max Pooling also performs as a Noise Suppressant. It discards the
noisy activations altogether and also performs de-noising along with dimensionality reduction.
On the other hand, Average Pooling returns the average of all the values from the portion of
the image covered by the kernel. Average Pooling performs dimensionality reduction but is less
effective as a noise-suppressing mechanism. Hence, Max Pooling generally performs better
than Average Pooling.
Limitations:
Despite their power and resource complexity, CNNs provide in-depth results. At the root
of it all, a CNN is just recognizing patterns and details that are so minute and inconspicuous that
they go unnoticed by the human eye. However, when it comes to genuinely understanding the
contents of an image, it still falls short.
VGG-16
The ImageNet Large Scale Visual Recognition Challenge is an annual computer vision
competition. Each year, teams compete on two tasks. The first is to detect objects within an
image coming from 200 classes, which is called object localization. The second is to classify
images, each labeled with one of 1000 categories, which is called image classification. VGG
16 was proposed by Karen Simonyan and Andrew Zisserman of the Visual Geometry Group
Lab of Oxford University in 2014 in the paper “VERY DEEP CONVOLUTIONAL
NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION”. This model won 1st and
2nd place in the above categories in the 2014 ILSVRC challenge.
VGG-16 architecture
This model achieves 92.7% top-5 test accuracy on the ImageNet dataset, which contains over 14 million
images belonging to 1000 classes.
The network takes fixed-size 224×224 RGB images as input, so we have a tensor of shape (224, 224, 3)
as our input. The model processes the input image and outputs a vector of 1000 values.
This vector represents the classification probability for the corresponding class. Suppose we
have a model that predicts that an image belongs to class 0 with probability 0.1, class 1 with
probability 0.05, class 2 with probability 0.05, class 3 with probability 0.03, class 780 with
probability 0.72, class 999 with probability 0.05, and all other classes with probability 0; the
classification vector for this example has these probabilities at the corresponding positions.
To make sure these probabilities add up to 1, we use the softmax function. The softmax function is
defined as:
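The equation image is missing from the text; the standard softmax definition for a vector of scores z of length K is:

softmax(z)_i = e^(z_i) / (e^(z_1) + e^(z_2) + … + e^(z_K)), for i = 1, …, K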
After this we take the 5 most probable candidates into a vector, and the ground truth vector
contains the actual class of the image.
Architecture:
The input to the network is an image of dimensions (224, 224, 3). The first two layers
have 64 channels of 3×3 filters and same padding. Then, after a max-pooling layer of stride (2, 2),
come two convolution layers with 128 filters of size (3, 3), followed by another max-pooling layer
of stride (2, 2). Then there are three convolution layers with 256 filters of size (3, 3), followed by
a max-pooling layer. After that there are two sets of three convolution layers, each with 512 filters
of (3, 3) size and same padding, each set followed by a max-pooling layer. In these convolution
and max-pooling layers, the filters used are of size 3×3, instead of 11×11 as in AlexNet
and 7×7 as in ZF-Net. Some VGG configurations also use 1×1 convolutions to manipulate the
number of input channels. A padding of 1 pixel (same padding) is applied after each
convolution layer to preserve the spatial resolution of the image.
After the stack of convolution and max-pooling layers, we obtain a (7, 7, 512) feature map, which is
flattened into a (1, 25088) feature vector. This is followed by three fully connected
layers: the first takes the flattened feature vector and outputs a (1, 4096) vector, the
second also outputs a vector of size (1, 4096), and the third outputs 1000 values
for the 1000 classes of the ILSVRC challenge. The output of the third fully connected layer is
passed to a softmax layer in order to normalize the classification vector, and the top-5 entries of the
classification vector are used for evaluation. All hidden layers use ReLU as their
activation function. ReLU is more computationally efficient, results in faster learning,
and decreases the likelihood of the vanishing gradient problem.
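As an illustrative check of the architecture described above, the reference VGG-16 can be loaded directly from Keras and inspected (this is the pretrained ImageNet model, not the project's fine-tuned network):

from keras.applications.vgg16 import VGG16

# Load the reference VGG-16 with the ImageNet classification head (1000 classes)
reference_vgg = VGG16(weights='imagenet', include_top=True)
reference_vgg.summary()  # lists the 13 convolutional and 3 fully connected layers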
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
Mental illness is a type of health condition that changes a person’s mind, emotions, or behavior
(or all three), and has been shown to impact an individual’s physical health [1,2]. Mental health
issues including depression, schizophrenia, attention-deficit hyperactivity disorder (ADHD), and
autism spectrum disorder (ASD), etc., are highly prevalent today and it is estimated that around
450 million people worldwide suffer from such problems [1]. In addition to adults, children and
adolescents under the age of 18 years also face the risk of mental health disorders. Moreover,
mental health illnesses have also been one of the most serious and prevalent public health
problems. For example, depression is a leading cause of disability and can lead to an increased
risk for suicidal ideation and suicide attempts [2].
To better understand the mental health conditions and provide better patient care, early detection
of mental health problems is an essential step. Different from the diagnosis of other chronic
conditions that rely on laboratory tests and measurements, mental illnesses are typically
diagnosed based on an individual’s self-report to specific questionnaires designed for the
detection of specific patterns of feelings or social interactions [3]. Due to the increasing availability
of data pertaining to an individual’s mental health status, artificial intelligence (AI) and machine
learning (ML) technologies are being applied to improve our understanding of mental health
conditions and have been engaged to assist mental health providers for improved clinical
decision-making [4,5,6]. As one of the latest advances in AI and ML, deep learning (DL), which
transforms the data through layers of nonlinear computational processing units, provides a new
paradigm to effectively gain knowledge from complex data [7]. In recent years, DL algorithms
have demonstrated superior performance in many data-rich application scenarios, including
healthcare [8,9,10].
In a previous study, Shatte et al. [11] explored the application of ML techniques in mental health.
They reviewed the literature by grouping it into four main application domains: detection and diagnosis;
prognosis, treatment, and support; public health; and research and clinical administration. In
another study, Durstewitz et al. [9] explored the emerging area of application of DL techniques in
psychiatry. They focused on DL in the studies of brain dynamics and subjects’ behaviors, and
presented insights on embedding interpretable computational models into a statistical
context. In contrast, this study aims to provide a scoping review of the existing research applying
DL methodologies on the analysis of different types of data related to mental health conditions.
2.3 SYSTEM SPECIFICATION:
The hardware requirements may serve as the basis for a contract for the implementation of the
system and should therefore be a complete and consistent specification of the whole system.
They are used by software engineers as the starting point for the system design. It shows what
the system does and not how it should be implemented.
PROCESSOR : Intel I5
RAM : 4GB
HARD DISK : 40 GB
2.4 MODULE DESCRIPTION
Data Collection
This experimental data set comes from the public data set of the Center for Biomedical Research
Excellence (COBRE). The Northwestern University Schizophrenia Data and Software Tool
(NUSDAST) is a repository of schizophrenia neuroimaging data collected from over 450
schizophrenia patients and healthy controls. 141 schizophrenia and bipolar disorder patients
and 134 healthy controls from this public dataset were included in the training set after quality
control. A similar dataset contained 148 schizophrenia and bipolar disorder patients and 76 healthy controls.
Data Augmentation
Since the image classes are heavily imbalanced, we augment the training data to obtain a balanced
distribution among the classes. We mirror and rotate the images to create a new, augmented data
set.
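A minimal sketch of the mirroring and rotation described above, using Keras' ImageDataGenerator (the exact augmentation parameters used in the project are not specified, so these values are assumptions):

from keras.preprocessing.image import ImageDataGenerator

# Horizontal/vertical flips approximate mirroring; rotation_range adds rotated copies
augmenter = ImageDataGenerator(horizontal_flip=True,
                               vertical_flip=True,
                               rotation_range=20)

# `flow` yields augmented batches from the (images, labels) arrays during training
augmented_batches = augmenter.flow(x_train, y_train, batch_size=32)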
Splitting of data
After cleaning, the data is normalized before training and testing the model. When the data is split,
we train the algorithm on the training data set and keep the test data set aside. This training process
produces a trained model based on the algorithm's logic and the feature values in the
training data. The basic aim of this scaling step is to bring all the values onto the same scale.
A dataset used for machine learning should be partitioned into three subsets — training,
test, and validation sets.
Training set: -A data scientist uses a training set to train a model and define its optimal
parameters — parameters it has to learn from data.
Test set: - A test set is needed for an evaluation of the trained model and its capability for
generalization. The latter means a model’s ability to identify patterns in new, unseen data
after having been trained on the training data. It is crucial to use different subsets for
training and testing to avoid model overfitting, which is the incapacity for generalization
we mentioned above.
Classification
Once the data is ready, we apply deep learning techniques. We use different classification
and ensemble techniques to predict mental illness; the methods are applied to the brain MRI dataset.
The main objective is to apply deep learning techniques, analyze the performance of these methods,
find their accuracy, and also identify the important features that play a major role in prediction.
Classifier Training:
A classifier is a function that takes features as input and generates a class label prediction. Based on the
learning function and underlying assumptions, different types of classifiers can be developed.
Neuroimaging studies have applied various classifiers for mental illness prediction. The dimensionality
issue associated with the relatively large number of features and the small number of samples should be
accounted for while applying such classification algorithms.
CNN is a type of neural network widely used for image recognition and image
classification. CNN uses supervised learning. A CNN consists of filters, or neurons, that have
biases and weights. Every filter takes some inputs and performs convolution on the
acquired input. The CNN classifier has four layer types: convolutional, pooling, Rectified
Linear Unit (ReLU), and fully connected layers.
i. Convolutional layer
This layer extracts the features from the image which is applied as input. The neurons
convolve the input image and produce a feature map in the output image and this output
image from this layer is fed as an input to the next convolutional layer.
ii. Pooling layer
This layer reduces the spatial size of the feature maps produced by the convolutional layer,
which decreases the computation required; max pooling and average pooling are the common variants.
iii. ReLU layer
ReLU is a non-linear operation which replaces all the negative values in the feature map
with zero. It is an element-wise operation.
iv. Fully Connected layer
A fully connected layer means that each neuron in the previous layer is connected to each neuron
in the next layer. This layer is used to classify the input image into various classes based on the
training dataset.
It has four phases:
1. Model construction
2. Model training
3. Model testing
4. Model evaluation
Model construction depends on the machine learning algorithm; in this project's case, it was
a Convolutional Neural Network. After model construction it is time for model training.
Here, the model is trained using training data and the expected output for this data. Once the
model has been trained it is possible to carry out model testing. During this phase a
second set of data is loaded. This data set has never been seen by the model, and therefore
its true accuracy will be verified. After training and testing are complete, the saved model
can be used on real-world data; this phase is called model evaluation.
VGG16 model:
Transfer learning generally refers to a process where a model trained on one problem is
used in some way on a second related problem. In deep learning, transfer learning is a
technique whereby a neural network model is first trained on a problem similar to the
problem that is being solved. One or more layers from the trained model are then used in
a new model trained on the problem of interest.
Transfer learning has the benefit of decreasing the training time for a neural network
model and can result in lower generalization error.
The weights in re-used layers may be used as the starting point for the training process
and adapted in response to the new problem. This usage treats transfer learning as a type
of weight initialization scheme. This may be useful when the first related problem has a
lot more labeled data than the problem of interest and the similarity in the structure of the
problem may be useful in both contexts.
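A minimal sketch of this weight-reuse idea with VGG16 in Keras, treating the pretrained convolutional base as a frozen feature extractor (the input and layer sizes here are assumptions):

from keras.applications.vgg16 import VGG16
from keras.models import Sequential
from keras.layers import Flatten, Dense

# Pretrained convolutional base, without the ImageNet classification head
base = VGG16(weights='imagenet', include_top=False, input_shape=(48, 48, 3))
base.trainable = False  # freeze the re-used layers; set to True to treat them as weight initialization

transfer_model = Sequential([base,
                             Flatten(),
                             Dense(128, activation='relu'),
                             Dense(3, activation='softmax')])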
PERFORMANCE METRICS:
The data was divided into two portions, training data and testing data, consisting of 70% and 30%
of the data respectively. Both algorithms were applied to the same dataset and results were obtained.
Prediction accuracy is the main evaluation parameter used in this work. Accuracy is the overall
success rate of the algorithm and can be defined using the equation below.
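The equation itself is missing from the text; in terms of the confusion-matrix counts defined in the next subsection, the standard definition is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)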
CONFUSION MATRIX:
The confusion matrix is one of the most commonly used evaluation tools in predictive analysis, mainly
because it is very easy to understand and it can be used to compute other essential metrics such as
accuracy, recall, and precision. It is an N×N matrix that describes the overall performance of a model
on a dataset, where N is the number of class labels in the classification problem.
Accuracy is all correctly predicted positives and negatives (TP + TN) divided by all predictions.
The True Positive (TP), True Negative (TN), False Negative (FN), and False Positive (FP) counts
produced by the algorithms are presented in a table.
True positive (TP) indicates that a sample from the positive class is correctly predicted as positive.
False negative (FN) indicates that a sample from the positive class is incorrectly predicted as negative.
False positive (FP) indicates that a sample from the negative class is incorrectly predicted as positive.
True negative (TN) indicates that a sample from the negative class is correctly predicted as negative.
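A minimal sketch of computing these counts with scikit-learn (the labels below are illustrative, not the project's results):

from sklearn.metrics import confusion_matrix, accuracy_score

# Illustrative labels for a 2-class problem (0 = healthy, 1 = ill)
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

print(confusion_matrix(y_true, y_pred))  # rows = actual class, columns = predicted class
print(accuracy_score(y_true, y_pred))    # (TP + TN) / total in the binary case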
2.5.2 DATA FLOW DIAGRAM:
2.5.4 CLASS DIAGRAM:
The class diagram is a static diagram; it represents the static view of an application. The class diagram
is used not only for visualizing, describing, and documenting different aspects of a system but also for
constructing executable code of the software application.
2.5.6 ACTIVITY DIAGRAM:
The activity diagram is effective for modeling the functionality of the system. This diagram
reflects the activities, the types of flows between these activities, and finally the response of objects to
these activities.
The below state chart diagram describes the flow of control from one state to another state
(event) in the flow of the events from the creation of an object to its termination.
ACTIVITY DIAGRAM FOR BACK END:
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
ANACONDA
It is a free and open-source distribution of the Python and R programming languages for
scientific computing (data science, machine learning applications, large-scale data processing,
predictive analytics, etc.), that aims to simplify package management and deployment.
The big difference between Conda and the pip package manager is in how package dependencies
are managed, which is a significant challenge for Python data science and the reason Conda
exists. Pip installs all Python package dependencies required, whether or not those conflict with
other packages you installed previously.
So your working installation of, for example, Google Tensorflow, can suddenly stop working
when you pip install a different package that needs a different version of the Numpy library.
More insidiously, everything might still appear to work but now you get different results from
your data science, or you are unable to reproduce the same results elsewhere because you didn't
pip install in the same order.
Conda analyzes your current environment, everything you have installed, any version limitations
you specify (e.g. you only want tensorflow>= 2.0) and figures out how to install compatible
dependencies. Or it will tell you that what you want can't be done. Pip, by contrast, will just
install the thing you wanted and any dependencies, even if that breaks other things. Open source
packages can be individually installed from the Anaconda repository, Anaconda Cloud
(anaconda.org), or your own private repository or mirror, using the conda install command.
Anaconda Inc compiles and builds all the packages in the Anaconda repository itself, and
provides binaries for Windows 32/64 bit, Linux 64 bit and MacOS 64-bit. You can also install
anything on PyPI into a Conda environment using pip, and Conda knows what it has installed
and what pip has installed. Custom packages can be made using the conda build command, and
can be shared with others by uploading them to Anaconda Cloud, PyPI or other repositories. The
default installation of Anaconda2 includes Python 2.7 and Anaconda3 includes Python 3.7.
However, you can create new environments that include any version of Python packaged with
conda.
Anaconda Navigator is a desktop Graphical User Interface (GUI) included in Anaconda
distribution that allows users to launch applications and manage conda packages, environments
and channels without using command-line commands. Navigator can search for packages on
Anaconda Cloud or in a local Anaconda Repository, install them in an environment, run the
packages and update them. It is available for Windows, macOS and Linux.
The following applications are available by default in Navigator:
JupyterLab
Jupyter Notebook
QtConsole
Spyder
Glueviz
Orange
Rstudio
Visual Studio Code
Microsoft .NET is a set of Microsoft software technologies for rapidly building and integrating
XML Web services, Microsoft Windows-based applications, and Web solutions. The .NET
Framework is a language-neutral platform for writing programs that can easily and securely
interoperate. There’s no language barrier with .NET: there are numerous languages available to
the developer including Managed C++, C#, Visual Basic and Java Script. The .NET framework
provides the foundation for components to interact seamlessly, whether locally or remotely on
different platforms. It standardizes common data types and communications protocols so that
components created in different languages can easily interoperate.
“.NET” is also the collective name given to various software components built upon the .NET
platform. These will be both products (Visual Studio.NET and Windows.NET Server, for
instance) and services (like Passport, .NET My Services, and so on).
Python is a powerful multi-purpose programming language created by Guido van Rossum. It has
simple easy-to-use syntax, making it the perfect language for someone trying to learn computer
programming for the first time. Python features are:
Easy to code
Free and Open Source
Object-Oriented Language
GUI Programming Support
High-Level Language
Extensible feature
Python is Portable language
Python is Integrated language
Interpreted
Large Standard Library
Dynamically Typed Language
3.3 PYTHON:
Python is a powerful multi-purpose programming language created by Guido van
Rossum.
It has simple easy-to-use syntax, making it the perfect language for someone
trying to learn computer programming for the first time.
Features Of Python :
1.Easy to code:
Python is a high-level programming language. It is very easy to learn compared to other languages
like C, C#, JavaScript, and Java. It is very easy to code in Python, and anybody can learn the basics
of Python in a few hours or days. It is also a developer-friendly language.
3.Object-Oriented Language:
One of the key features of Python is object-oriented programming. Python supports object-oriented
concepts such as classes, objects, and encapsulation.
5. High-Level Language:
Python is a high-level language. When we write programs in python, we do not need to
remember the system architecture, nor do we need to manage the memory.
6.Extensible feature:
Python is an extensible language: we can write some of our Python code in C or C++ and
compile that code as a C/C++ extension.
7. Python is Portable language:
Python is also a portable language. For example, if we have Python code written on Windows
and want to run it on another platform such as Linux, Unix, or macOS, we do not need to change it;
we can run the same code on any platform.
9. Interpreted Language:
Python is an interpreted language: Python code is executed line by line. Unlike languages such as
C, C++, and Java, there is no separate compilation step, which makes it easier to debug our code.
The source code of Python is converted into an intermediate form called bytecode.
APPLICATIONS OF PYTHON :
WEB APPLICATIONS
You can create scalable web apps using frameworks and CMSs (Content Management
Systems) that are built on Python. Some of the popular platforms for creating web apps
are Django, Flask, Pyramid, Plone, and Django CMS.
Sites like Mozilla, Reddit, Instagram and PBS are written in Python.
There are numerous libraries available in Python for scientific and numeric computing.
There are libraries like SciPy and NumPy that are used in general-purpose computing,
and there are domain-specific libraries like EarthPy for earth science, AstroPy for astronomy,
and so on.
Also, the language is heavily used in machine learning, data mining and deep learning.
3.2.2 CREATING SOFTWARE PROTOTYPES
Python is slow compared to compiled languages like C++ and Java. It might not be a
good choice if resources are limited and efficiency is a must.
However, Python is a great language for creating prototypes. For example: You can use
Pygame (library for creating games) to create your game's prototype first. If you like the
prototype, you can use language like C++ to create the actual game.
3.2.3 GOOD LANGUAGE TO TEACH PROGRAMMING
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
Python is a general-purpose programming language with a rich ecosystem of numerical and
scientific libraries. It is used here to implement the deep learning models and the supporting
numerical routines for this project.
import os
import tqdm
import pandas as pd
import numpy as np
import cv2
import random
import tensorflow as tf
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, GlobalAvgPool2D, GlobalMaxPooling2D
Images = []
class_label = []  # assumed: labels are collected alongside the image paths

# Walk the dataset directory (the path is an assumption; the original listing was cut off here)
for dirname, _, filenames in os.walk('./dataset'):
    for filename in filenames:
        img = os.path.join(dirname, filename)
        Images.append(img)
        class_label.append(os.path.basename(dirname))  # assumed: folder name is the class label

# shuffle the two lists with the same order
temp = list(zip(Images, class_label))
random.shuffle(temp)
Images, class_label = zip(*temp)

data = pd.DataFrame(list(zip(Images, class_label)), columns=['image_path', 'class_label'])
from sklearn.utils import resample

df_c0 = data[data['class_label'] == 'schizoprehnia']
df_c1 = data[data['class_label'] == 'healthy']
df_c2 = data[data['class_label'] == 'bipolar']

# resample each class (with replacement) to 300 samples to balance the class distribution
upsampled_df_c0 = resample(df_c0, replace=True, random_state=23, n_samples=300)
upsampled_df_c1 = resample(df_c1, replace=True, random_state=23, n_samples=300)
upsampled_df_c2 = resample(df_c2, replace=True, random_state=23, n_samples=300)
# The image-loading helper was cut off at a page break; this is a minimal reconstruction
# (the 48x48 target size is assumed from the CNN input shape below).
def load_image(path, size=(48, 48)):
    img_bgr = cv2.imread(path)
    #img_bgr = np.array(img_bgr)
    resized = cv2.resize(img_bgr, size)
    return resized

images = np.array([load_image(p) for p in data['image_path']])  # assumed; not shown in the original listing

from sklearn.preprocessing import LabelBinarizer
enc = LabelBinarizer()
y = enc.fit_transform(data['class_label'])

Splitting the data

from sklearn.model_selection import train_test_split  # import was missing in the original listing
x_train, x_test, y_train, y_test = train_test_split(images, y, test_size=0.3, stratify=y, random_state=43)
y_train[:5]
MODEL BUILDING

CNN

# Set up the CNN model
from keras.models import Sequential  # import was missing in the original listing

batch_size = None
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='Same',
                 activation='relu', batch_input_shape=(batch_size, 48, 48, 3)))
# The second convolutional layer was cut off at a page break; its filter count is assumed here.
model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='Same',
                 activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.4))
model.add(GlobalMaxPooling2D())
model.add(Dense(256, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(3, activation="softmax"))
model.summary()
VGG-16

# Set up the VGG16 model
# The base-model definition was missing from the original listing; a typical transfer-learning
# setup is assumed here (ImageNet weights, frozen convolutional base, no classification head).
from keras.applications.vgg16 import VGG16

vgg = VGG16(weights='imagenet', include_top=False, input_shape=(48, 48, 3))
vgg.trainable = False

model_1 = Sequential()
model_1.add(vgg)
model_1.add(Flatten())
model_1.add(Dense(128, activation='relu'))
model_1.add(Dropout(0.2))
model_1.add(Dense(3, activation='softmax'))
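The training and evaluation step is not shown in the original listing; a minimal sketch, assuming categorical cross-entropy, the Adam optimizer, and the 70/30 split prepared above, might look like this:

# Hypothetical training/evaluation step (not shown in the original listing)
for name, m in [("CNN", model), ("VGG16", model_1)]:
    m.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    m.fit(x_train, y_train, validation_data=(x_test, y_test),
          epochs=20, batch_size=32, verbose=1)
    loss, acc = m.evaluate(x_test, y_test, verbose=0)
    print(f"{name} test accuracy: {acc:.3f}")  # used to compare the two models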
4.3 SNAPSHOTS
CHAPTER 5
5.1 CONCLUSION AND REFERENCES
Recent years have witnessed the increasing use of DL algorithms in healthcare and medicine. In
this study, we reviewed existing studies on DL applications to study mental health outcomes. All
the results available in the literature reviewed in this work illustrate the applicability and promise
of DL in improving the diagnosis and treatment of patients with mental health conditions. Also,
this review highlights multiple existing challenges in making DL algorithms clinically actionable
for routine care, as well as promising future directions in this field.
5.2 REFERENCES
1. World Health Organization. The World Health Report 2001: Mental Health: New
Understanding, New Hope (World Health Organization, Switzerland, 2001).
2. Marcus, M., Yasamy, M. T., van Ommeren, M., Chisholm, D. & Saxena, S. Depression: A
Global Public Health Concern (World Federation of Mental Health, World Health Organisation,
Perth, 2012).
3. Hamilton, M. Development of a rating scale for primary depressive illness. Br. J. Soc. Clin.
Psychol. 6, 278–296 (1967).
4. Dwyer, D. B., Falkai, P. & Koutsouleris, N. Machine learning approaches for clinical
psychology and psychiatry. Annu. Rev. Clin. Psychol. 14, 91–118 (2018).
5. Lovejoy, C. A., Buch, V. & Maruthappu, M. Technology and mental health: the role of
artificial intelligence. Eur. Psychiatry 55, 1–3 (2019).
6. Wongkoblap, A., Vadillo, M. A. & Curcin, V. Researching mental health disorders in the era
of social media: systematic review. J. Med. Internet Res. 19,e228 (2017).
7. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436 (2015).
8. Miotto, R., Wang, F., Wang, S., Jiang, X. & Dudley, J. T. Deep learning for healthcare:
review, opportunities and challenges. Brief. Bioinformatics 19,1236–1246 (2017).
9. Durstewitz, D., Koppe, G. & Meyer-Lindenberg, A. Deep neural networks in psychiatry. Mol.
Psychiatry 24, 1583–1598 (2019).
10. Vieira, S., Pinaya, W. H. & Mechelli, A. Using deep learning to investigate the neuroimaging
correlates of psychiatric and neurological disorders: methods and applications. Neurosci.
Biobehav. Rev. 74, 58–75 (2017).
11. Shatte, A. B., Hutchinson, D. M. & Teague, S. J. Machine learning in mental health: a
scoping review of methods and applications. Psychol. Med. 49, 1426–1448 (2019).
12. Murphy, K. P. Machine Learning: A Probabilistic Perspective (MIT Press, Cambridge,
2012).
13. Biship, C. M. Pattern Recognition and Machine Learning (Information Science and
Statistics) (Springer-Verlag, Berlin, 2007).
14. Bengio, Y., Simard, P. & Frasconi, P. Learning long-term dependencies with gradient
descent is difficult. IEEE Trans. Neural Netw. Learn. Syst. 5, 157–166 (1994).
15. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to
document recognition. Proc. IEEE 86, 2278–2324 (1998).
16. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. & Manzagol, P. A. Stacked denoising
autoencoders: learning useful representations in a deep network with a local denoising criterion.
J. Mach. Learn. Res. 11, 3371–3408 (2010).
17. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-
propagating errors. Cogn. modeling. 5, 1 (1988).
18. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780
(1997).
19. Cho, K., Van Merriënboer, B., Bahdanau, D. & Bengio, Y. On the properties of neural
machine translation: encoder-decoder approaches. In Proc. SSST-8, Eighth Workshop on Syntax,
Semantics and Structure in Statistical Translation 103–111 (Doha, Qatar, 2014).
20. Liou, C., Cheng, W., Liou, J. & Liou, D. Autoencoder for words. Neurocomputing
139, 84–96 (2014).