AI/ML Important Questions

The document discusses different types of artificial neural networks including feed forward neural networks, fully connected neural networks, multi-layer perceptrons, and feedback neural networks. It describes the structure and information flow of each type of neural network. The document also covers the advantages and disadvantages of using artificial neural networks.


where T is the training dataset, and O_d and Ô_d are the desired target output and the estimated actual output, respectively, for a training instance d.


The principle of gradient descent is an optimization approach that is used to minimize the cost function by converging to a local minimum point, moving in the negative direction of the gradient; each step size during the movement is determined by the learning rate and the slope of the gradient.
Gradient descent learning is the foundation of the back propagation algorithm used in MLPs. Before we study an MLP, let us first understand the different types of neural networks that differ in their structure, activation function and learning mechanism.
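As a minimal illustrative sketch (not from the text), the gradient descent update rule can be written in Python as follows; the cost function, its gradient and the learning rate used here are assumptions made purely for this example.

```python
import numpy as np

def gradient_descent(gradient_fn, w_init, learning_rate=0.1, n_steps=100):
    """Move in the negative direction of the gradient; step size = learning rate * slope."""
    w = np.array(w_init, dtype=float)
    for _ in range(n_steps):
        grad = gradient_fn(w)            # slope of the cost surface at the current point
        w = w - learning_rate * grad     # step towards a local minimum
    return w

# Example cost J(w) = (w - 3)^2 with gradient dJ/dw = 2 * (w - 3); the minimum is at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w_init=[0.0])
print(w_min)   # converges close to [3.]
```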

10.5 TYPES OF ARTIFICIAL NEURAL NETWORKS


ANNs consist of multiple neurons arranged in layers. There are different types of ANNs that differ by the network structure, the activation function involved and the learning rules used. In an ANN, there are three kinds of layers called the input layer, hidden layer and output layer. Any general ANN would consist of one input layer, one output layer and zero or more hidden layers.


10.5.1 Feed Forward Neural Network


This is the simplest neural network, consisting of neurons arranged in layers, where information is propagated only in the forward direction. This model may or may not contain a hidden layer, and there is no back propagation. Based on the number of hidden layers, they are further classified into single-layered and multi-layered feed forward networks. These ANNs are simple to design and easy to maintain. They are fast but cannot be used for complex learning. They are used for simple classification, simple image processing, etc. The model of a Feed Forward Neural Network is shown in Figure 10.7.

Figure 10.7: Model of a Feed Forward Neural Network

10.5.2 Fully Connected Neural Network


Fully connected neural networks are the ones in which all the neurons in a layer are connected
to all other neurons in the next layer. The model of a fully connected neural network is shown in
Figure 10.8.

Figure 10.8: Model of a Fully Connected Neural Network

10.5.3 Multi-Layer Perceptron (MLP)


This ANN consists of multiple layers, with one input layer, one output layer and one or more hidden layers. Every neuron in a layer is connected to all neurons in the next layer, and thus they are fully connected. The information flows in both directions. In the forward direction, the inputs are multiplied by the weights of the neurons and forwarded to the activation function of the neuron, and the output is passed to the next layer. If the output is incorrect, then in the backward direction the error is back propagated to adjust the weights and biases to get the correct output. Thus, the network learns with the training data. This type of ANN is used in deep learning for complex classification, speech recognition, medical diagnosis, forecasting, etc. They are comparatively complex and slow. The model of an MLP is shown in Figure 10.9.

Figure 10.9: Model of a Multi-Layer Perceptron
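To make the forward flow of an MLP concrete, the following is a small illustrative sketch; the layer sizes, random weights and the sigmoid activation are assumptions for this example, not values from the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.random(3)                      # input layer with 3 features

W_hidden = rng.random((4, 3))          # fully connected: 4 hidden neurons, 3 inputs each
b_hidden = rng.random(4)
W_output = rng.random((1, 4))          # 1 output neuron connected to all 4 hidden neurons
b_output = rng.random(1)

# Forward direction: weighted sums followed by the activation function
hidden = sigmoid(W_hidden @ x + b_hidden)
output = sigmoid(W_output @ hidden + b_output)
print(output)
```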

10.5.4 Feedback Neural Network


Feedback neural networks have feedback connections between neurons that allow information
flow in both directions in the network. The output signals can be sent back to the neurons in the
same layer or to the neurons in the preceding layers. Hence, this network is more dynamic during
training. The model of a feedback neural network is shown in Figure 10.10.

Figure 10.10: Model of a Feedback Neural Network
10.10 ADVANTAGES AND DISADVANTAGES OF ANN

Advantages of ANN
1. ANNs can solve complex problems involving non-linear processes.
2. ANNs can learn and recognize complex patterns and solve problems as humans solve a problem.
3. ANNs have a parallel processing capability and can predict in less time.
4. They have an ability to work with inadequate knowledge and can even handle incomplete and noisy data.
5. They can scale well to larger datasets and outperform other learning mechanisms.

Limitations of ANN
1. An ANN requires processors with parallel processing capability to train the network for many epochs. The function of each node requires CPU capability, which is difficult for very large networks with a large amount of data.
2. They work like a 'black box' and it is exceedingly difficult to understand their working in the inner layers. Moreover, it is hard to understand the relationship between the representations learned at each layer.
3. Modelling with ANNs is also extremely complicated and the development takes a much longer time.
4. Generally, neural networks require more data than traditional machine learning algorithms and they do not perform well on small datasets.
5. They are also more computationally expensive than traditional learning techniques.
Challenges of Clustering Algorithms
A huge collection of data with higher dimensions (i.e., features or attributes) can pose a problem for clustering algorithms. With the arrival of the Internet, billions of data items are available for clustering algorithms. This is a difficult task, as scaling is always an issue with clustering algorithms. Scaling is an issue where some algorithms work well with lower dimension data but do not perform well for higher dimension data. Also, the units of data can pose a problem; for example, some weights given in kilograms and some in pounds can pose a problem in clustering. Designing a proximity measure is also a big challenge.
The advantages and disadvantages of the cluster analysis algorithms are given in Table 13.2.
Table 13.2: Advantages and Disadvantages of Clustering Algorithms

S.No. | Advantages | Disadvantages
1. | Cluster analysis algorithms can handle missing data and outliers. | Cluster analysis algorithms are sensitive to initialization and the order of the input data.
2. | Can help classifiers in labelling the unlabelled data. Semi-supervised algorithms use cluster analysis algorithms to label the unlabelled data and then use classifiers to classify them. | Often, the number of clusters present in the data has to be specified by the user.
3. | It is easy to explain the cluster analysis algorithms and to implement them. | Scaling is a problem.
4. | Clustering is the oldest technique in statistics and it is easy to explain. It is also relatively easy to implement. | Designing a proximity measure for the given data is an issue.
Step 4: Repeat Steps 2-3 till the change is minimal within the threshold value or the parameters do not change at all.

13.8 CLUSTER EVALUATION METHODS

Scan for information on 'Purity', 'Evaluation based on Ground Truth', and 'Similarity-based Measures'.

Evaluating clustering algorithms is a difficult task, as often no benchmark data is available as in classification. Also, in clustering algorithms, domain knowledge is absent most of the time. So, the validation of clustering algorithms is difficult as compared to the validation of classification algorithms. There are three types of measures that can be used for cluster validation:
1. Internal
2. External
3. Relative
Internal metrics quantify the quality of clustering without the use of any external information or knowledge. External metrics use the ground truth or externally supplied labels to quantify the quality of the validation. In the relative measure, different cluster algorithms are compared, or the algorithm is run with multiple parameter values. This measure helps in finding optimal clusters.
Basically, two measures of information, that is, cohesion and separation, are based on the idea that the objects within a cluster should be similar and objects across clusters should be distinct. Alternatively, the average distance within the cluster should be small and the average distance across the clusters should be large.

Cohesion and Separation


Cohesion (or compactness) measures how close the samples are inside the cluster. This ensures that the clusters are homogeneous. Cohesion is measured as the sum of squared errors between the samples and the centroid. The within-cluster sum is given as:

$\text{Cohesion} = \sum_{i=1}^{N} \sum_{m_j \in C_i} (m_j - x_i)^2$    (13.17)

Here, N is the number of clusters, $C_i$ is the i-th cluster, $x_i$ is its centroid and $m_j$ are the samples. A lower within-cluster variation is a necessary condition for greater compactness and high cohesion.


Separation indicates how well a sample differs from other clusters. This is measured as the weighted sum of the differences between the centroid of the dataset and the centroids of the generated clusters. This is given as:

$\text{Separation} = \sum_{i=1}^{N} |C_i| \,(x - x_i)^2$    (13.18)

Here, x is the centroid of the entire dataset, $x_i$ is the centroid of cluster i and $|C_i|$ is the size of the cluster. A larger distance is required for well-separated clusters so that the clusters are perfectly distinct. Sometimes, the connectivity between a sample and the other members of its cluster may be important, indicating the sort of samples that can be put into the clusters. The connectivity value ranges from 0 to infinity. The Dunn index can be computed as:

$\text{Dunn Index} = \dfrac{\alpha \times \text{separation}}{\beta \times \text{compactness}}$    (13.19)

Here, α and β are parameters. The Dunn index is a useful measure that can combine both cohesion and separation.

Silhouette Coefficient
The Silhouette coefficient combines both cohesion and separation. The Silhouette coefficient measures the average distance between clusters. It is given as follows:

$s_i = \dfrac{b_i - a_i}{\max(b_i, a_i)}$    (13.20)

Here, $a_i$ is the distance between the sample and the centroid of the same cluster, and $b_i$ is the distance between the sample and the nearest centroid. The silhouette coefficients of the individual objects can be combined to get the coefficient for the entire clustering as S, given as:

$S = \dfrac{1}{N} \sum_{i=1}^{N} s_i$    (13.21)

The value of the silhouette coefficient $s_i$ is between -1 and +1. When it is closer to 1, the clusters are well formed. The value is zero when the data points are between two clusters, and negative when the clusters are not formed correctly.
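For illustration only, the silhouette coefficient of Eqs. (13.20)-(13.21) can be computed with scikit-learn; the points and cluster labels below are made up for the sketch.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

# Two toy clusters (made-up points purely for illustration)
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels = np.array([0, 0, 0, 1, 1, 1])

s_i = silhouette_samples(X, labels)   # per-sample coefficients, as in Eq. (13.20)
S = silhouette_score(X, labels)       # average over all samples, as in Eq. (13.21)
print(s_i, S)                         # values close to +1 indicate well-formed clusters
```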

Summary
1. Clustering is a technique of partitioning the objects with many attributes into meaningful disjoint
subgroups.
2. Quantitative variables use distance measures such as Euclidean distance, Manhattan distance and ...
Example 6.3: Assess a student's performance during his course of study and predict whether a student will get a job offer or not in his final year of the course. The training dataset T consists of 10 data instances with attributes such as 'CGPA', 'Interactiveness', 'Practical Knowledge' and 'Communication Skills' as shown in Table 6.3. The target class attribute is 'Job Offer'.

Table 6.3: Training Dataset T

S.No. | CGPA | Interactiveness | Practical Knowledge | Communication Skills | Job Offer
1. | ≥9 | Yes | Very good | Good | Yes
2. | ≥8 | No | Good | Moderate | Yes
3. | ≥9 | No | Average | Poor | No
4. | <8 | No | Average | Good | No
5. | ≥8 | Yes | Good | Moderate | Yes
6. | ≥9 | Yes | Good | Moderate | Yes
7. | <8 | Yes | Good | Poor | No
8. | ≥9 | No | Very good | Good | Yes
9. | ≥8 | Yes | Good | Good | Yes
10. | ≥8 | Yes | Average | Good | Yes

Solution:
Iteration 1:
Step 1: Calculate the Entropy for the target class 'Job Offer'.
Entropy_Info(Target Attribute = Job Offer) = Entropy_Info(7, 3)
= -[(7/10) log₂(7/10) + (3/10) log₂(3/10)] = -(-0.3599 - 0.5208) = 0.8807
Step 2: Calculate the Entropy_Info and Gain (Information_Gain) for each of the attributes in the training dataset.
Table 6.4 shows the number of data instances classified with Job Offer as Yes or No for the attribute CGPA.

Table 6.4: Entropy Information for CGPA
CGPA | Job Offer = Yes | Job Offer = No | Total | Entropy
≥9 | 3 | 1 | 4 | 0.8108
≥8 | 4 | 0 | 4 | 0
<8 | 0 | 2 | 2 | 0
Entropy_Info(T, CGPA)
= (4/10)[-(3/4) log₂(3/4) - (1/4) log₂(1/4)] + (4/10)[-(4/4) log₂(4/4)] + (2/10)[-(2/2) log₂(2/2)]
= (4/10)(0.3111 + 0.4997) + 0 + 0
= 0.3243
Gain(CGPA) = 0.8807 - 0.3243 = 0.5564
Table 6.5 shows the number of data instances classified with Job Offer as Yes or No for the attribute Interactiveness.

Table 6.5: Entropy Information for Interactiveness
Interactiveness | Job Offer = Yes | Job Offer = No | Total | Entropy
Yes | 5 | 1 | 6 | 0.6497
No | 2 | 2 | 4 | 0.9994

Entropy_Info(T, Interactiveness)
= (6/10)[-(5/6) log₂(5/6) - (1/6) log₂(1/6)] + (4/10)[-(2/4) log₂(2/4) - (2/4) log₂(2/4)]
= (6/10)(0.2191 + 0.4306) + (4/10)(0.4997 + 0.4997)
= 0.3898 + 0.3998 = 0.7896
Gain(Interactiveness) = 0.8807 - 0.7896 = 0.0911
Table 6.6 shows the number of data instances classified with Job Offer as Yes or No for the attribute Practical Knowledge.

Table 6.6: Entropy Information for Practical Knowledge
Practical Knowledge | Job Offer = Yes | Job Offer = No | Total | Entropy
Very good | 2 | 0 | 2 | 0
Average | 1 | 2 | 3 | 0.9177
Good | 4 | 1 | 5 | 0.7215

Entropy_Info(T, Practical Knowledge)
= (2/10)[-(2/2) log₂(2/2)] + (3/10)[-(1/3) log₂(1/3) - (2/3) log₂(2/3)] + (5/10)[-(4/5) log₂(4/5) - (1/5) log₂(1/5)]
= 0 + (3/10)(0.5280 + 0.3897) + (5/10)(0.2574 + 0.4641)
= 0 + 0.2753 + 0.3608
= 0.6361
Gain(Practical Knowledge) = 0.8807 - 0.6361 = 0.2446
Table 6.7 shows the number of data instances classified with Job Offer as Yes or No for the attribute Communication Skills.

Table 6.7: Entropy Information for Communication Skills
Communication Skills | Job Offer = Yes | Job Offer = No | Total | Entropy
Good | 4 | 1 | 5 | 0.7215
Moderate | 3 | 0 | 3 | 0
Poor | 0 | 2 | 2 | 0

Entropy_Info(T, Communication Skills)
= (5/10)[-(4/5) log₂(4/5) - (1/5) log₂(1/5)] + (3/10)[-(3/3) log₂(3/3)] + (2/10)[-(2/2) log₂(2/2)]
= (5/10)(0.2574 + 0.4641) + 0 + 0
= 0.3609
Gain(Communication Skills) = 0.8807 - 0.3609 = 0.5198
The Gain calculated for all the attributes is shown in Table 6.8:

Table 6.8: Gain
Attributes | Gain
CGPA | 0.5564
Interactiveness | 0.0911
Practical Knowledge | 0.2446
Communication Skills | 0.5198
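The entropy and gain computations above can be reproduced with a short script. The sketch below is written for this illustration (it is not part of the original solution) and recomputes Entropy_Info and Gain for the CGPA attribute of Table 6.3; it keeps full precision, so the results differ slightly from the text's rounded intermediate terms.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy_Info of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def info_gain(rows, attr_index, labels):
    """Gain = Entropy_Info(T) - weighted entropy of the subsets split on one attribute."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr_index], []).append(label)
    weighted = sum(len(subset) / total * entropy(subset) for subset in subsets.values())
    return entropy(labels) - weighted

# CGPA column of Table 6.3 and the Job Offer target class
cgpa = [[">=9"], [">=8"], [">=9"], ["<8"], [">=8"], [">=9"], ["<8"], [">=9"], [">=8"], [">=8"]]
job  = ["Yes", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes"]

print(round(entropy(job), 4))             # ~0.8813 (the text rounds terms and reports 0.8807)
print(round(info_gain(cgpa, 0, job), 4))  # ~0.5568 (the text reports 0.5564)
```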
Step 3: From Table 6.8, choose the attribute for which the entropy is minimum, and therefore the gain is maximum, as the best split attribute.
The best split attribute is CGPA since it has the maximum gain. So, we choose CGPA as the root node. There are three distinct values for CGPA with outcomes ≥9, ≥8 and <8. The entropy value is 0 for ≥8 and <8, with all instances classified as Job Offer = Yes for ≥8 and Job Offer = No for <8. Hence, both ≥8 and <8 end up in leaf nodes. The tree grows with the subset of instances with CGPA ≥9, as shown in Figure 6.3.
Figure 6.3: Decision Tree After Iteration 1
Now, continue the same process for the subset of data instances branched with CGPA ≥9.

Iteration 2:
In this iteration, the same process of computing the Entropy_Info and Gain is repeated for the subset of the training set. The subset consists of 4 data instances, as shown in Figure 6.3.
Entropy_Info(T) = Entropy_Info(3, 1) = -[(3/4) log₂(3/4) + (1/4) log₂(1/4)] = -(-0.3111 - 0.4997) = 0.8108

Entropy_Info(T, Interactiveness) = (2/4)(0) + (2/4)[-(1/2) log₂(1/2) - (1/2) log₂(1/2)]
= 0 + 0.4997 = 0.4997
Gain(Interactiveness) = 0.8108 - 0.4997 = 0.3111

Entropy_Info(T, Practical Knowledge) = 0
Gain(Practical Knowledge) = 0.8108 - 0 = 0.8108

Entropy_Info(T, Communication Skills) = 0
Gain(Communication Skills) = 0.8108 - 0 = 0.8108
The Gain calculated for all the attributes is shown in Table 6.9.

Table 6.9: Gain
Attributes | Gain
Interactiveness | 0.3111
Practical Knowledge | 0.8108
Communication Skills | 0.8108

Here, both the attributes 'Practical Knowledge' and 'Communication Skills' have the same Gain. So, we can construct the decision tree using either 'Practical Knowledge' or 'Communication Skills'. The final decision tree is shown in Figure 6.4.
Figure 6.4: Final Decision Tree

6.2.2 C4.5 Construction


C4.5 is an improvement over ID3. C4.5 works with continuous and discrete attributes and missing values, and it also supports post-pruning. C5.0 is the successor of C4.5; it is more efficient and is used for building smaller decision trees. C4.5 handles missing values by marking them as '?', and these missing attribute values are not considered in the calculations.
Example 6.4: Make use of the Information Gain of the attributes, which was calculated in the ID3 algorithm in Example 6.3, to construct a decision tree using C4.5.
Solution:
Iteration 1:
Step 1: Calculate the Entropy for the target class 'Job Offer'.
Entropy_Info(Target Attribute = Job Offer) = Entropy_Info(7, 3)
= -[(7/10) log₂(7/10) + (3/10) log₂(3/10)] = -(-0.3599 - 0.5208) = 0.8807
Step 2: Calculate the Entropy_Info, Gain (Info_Gain), Split_Info and Gain_Ratio for each of the attributes in the training dataset.
CGPA:
Entropy_Info(T, CGPA) = (4/10)[-(3/4) log₂(3/4) - (1/4) log₂(1/4)] + (4/10)[-(4/4) log₂(4/4)] + (2/10)[-(2/2) log₂(2/2)]
= (4/10)(0.3111 + 0.4997) + 0 + 0
= 0.3243
Gain(CGPA) = 0.8807 - 0.3243 = 0.5564
Split_Info(T, CGPA) = -(4/10) log₂(4/10) - (4/10) log₂(4/10) - (2/10) log₂(2/10)
= 0.5285 + 0.5285 + 0.4641 = 1.5211
Gain_Ratio(CGPA) = Gain(CGPA) / Split_Info(T, CGPA) = 0.5564 / 1.5211 = 0.3658
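A small illustrative sketch (not the textbook's own code) of the C4.5-style Split_Info and Gain_Ratio computation for CGPA; the gain value is taken from the worked example above, and exact values differ slightly from the text's rounding.

```python
import math
from collections import Counter

def split_info(values):
    """Split_Info of an attribute: the entropy of the split proportions themselves."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

# CGPA takes the value >=9 four times, >=8 four times and <8 twice (Table 6.3)
cgpa = [">=9"] * 4 + [">=8"] * 4 + ["<8"] * 2

si = split_info(cgpa)              # ~1.5219 (the text reports 1.5211)
gain_cgpa = 0.5564                 # Gain(CGPA) from the worked example
print(round(gain_cgpa / si, 4))    # Gain_Ratio(CGPA) ~ 0.3656 (the text reports 0.3658)
```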
Interactiveness:
Entropy_Info(T, Interactiveness) = (6/10)[-(5/6) log₂(5/6) - (1/6) log₂(1/6)] + (4/10)[-(2/4) log₂(2/4) - (2/4) log₂(2/4)]
= (6/10)(0.2191 + 0.4306) + (4/10)(0.4997 + 0.4997)
= 0.3898 + 0.3998 = 0.7896
Gain(Interactiveness) = 0.8807 - 0.7896 = 0.0911
Split_Info(T, Interactiveness) = -(6/10) log₂(6/10) - (4/10) log₂(4/10) = 0.9704
Gain_Ratio(Interactiveness) = Gain(Interactiveness) / Split_Info(T, Interactiveness) = 0.0911 / 0.9704 = 0.0939
Practical Knowledge:
Entropy_Info(T, Practical Knowledge) = (2/10)[-(2/2) log₂(2/2)] + (3/10)[-(1/3) log₂(1/3) - (2/3) log₂(2/3)] + (5/10)[-(4/5) log₂(4/5) - (1/5) log₂(1/5)]
= 0 + (3/10)(0.5280 + 0.3897) + (5/10)(0.2574 + 0.4641)
= 0 + 0.2753 + 0.3608 = 0.6361
Gain(Practical Knowledge) = 0.8807 - 0.6361 = 0.2446
Split_Info(T, Practical Knowledge) = -(2/10) log₂(2/10) - (3/10) log₂(3/10) - (5/10) log₂(5/10) = 1.4853
Gain_Ratio(Practical Knowledge) = Gain(Practical Knowledge) / Split_Info(T, Practical Knowledge) = 0.2446 / 1.4853 = 0.1647

Communication Skills:
Entropy_Info(T, Communication Skills) = (5/10)[-(4/5) log₂(4/5) - (1/5) log₂(1/5)] + (3/10)[-(3/3) log₂(3/3)] + (2/10)[-(2/2) log₂(2/2)]
= (5/10)(0.2574 + 0.4641) + 0 + 0
= 0.3609
Gain(Communication Skills) = 0.8807 - 0.3609 = 0.5198
Split_Info(T, Communication Skills) = -(5/10) log₂(5/10) - (3/10) log₂(3/10) - (2/10) log₂(2/10) = 1.4853
Gain_Ratio(Communication Skills) = Gain(Communication Skills) / Split_Info(T, Communication Skills) = 0.5198 / 1.4853 = 0.3500
Table 6.10 shows the Gain_Ratio computed for all the attributes.

Table 6.10: Gain_Ratio
Attribute | Gain_Ratio
CGPA | 0.3658
Interactiveness | 0.0939
Practical Knowledge | 0.1647
Communication Skills | 0.3500

Step 3: Choose the attribute for which the Gain_Ratio is maximum as the best split attribute.
From Table 6.10, we can see that CGPA has the highest gain ratio and it is selected as the best split attribute. We can construct the decision tree placing CGPA as the root node, as shown in Figure 6.5. The training dataset is split into subsets, with 4 data instances in the CGPA ≥9 branch.
Figure 6.5: Decision Tree after Iteration 1


Iteration 2:
Total Samples: 4
Repeat the same process for this resultant dataset with 4 data instances.
Job Offer has 3 instances as Yes and 1 instance as No.

Entropy Info(Target Class =Job Offer) =-log,-lo


4
-0.3112 +0.5
-0.8112
Interactiveness:
Entropy_lnto(T, Interactiveness) 02 1
2 2 2
-0+0.4997
Gain(lnteractiveness) =0.8108 -0.4997 0.3111

2 =0.5 +0.5 =1
Split_Info(T, Interactiveness) =-log, -1
Gain(Interactiveness)
Gain_Ratio(Interactiveness) =
SplitInfo(T, Interactiveness)
0.3112
0.3112
1

Practical Knowledge:
Entropy Into(T, Pracical Knowiedge)
2
8,-lo8,o,-.

=0
Gain(Practical Knowledge) =0.8108

Split Info(T, Pracical Knowledge) =-log,-log, -log, ; =1.5


Gain(Practical Knowledge) 0.8108
=0.5408
Gain_ Ratio(Practical Knowledge) Split_Info(T, Practical Knowledge) 15

Communication Skills:
Entropy_Info(T, Communication Skills) = (2/4)[-(2/2) log₂(2/2)] + (1/4)[-(1/1) log₂(1/1)] + (1/4)[-(1/1) log₂(1/1)] = 0
Gain(Communication Skills) = 0.8112 - 0 = 0.8112
Split_Info(T, Communication Skills) = -(2/4) log₂(2/4) - (1/4) log₂(1/4) - (1/4) log₂(1/4) = 1.5
Gain_Ratio(Communication Skills) = Gain(Communication Skills) / Split_Info(T, Communication Skills) = 0.8112 / 1.5 = 0.5408

Table 6.11 shows the Gain_Ratio computed for all the attributes.

Table 6.11: Gain_Ratio
Attributes | Gain_Ratio
Interactiveness | 0.3112
Practical Knowledge | 0.5408
Communication Skills | 0.5408

Both 'Practical Knowledge' and 'Communication Skills' have the highest gain ratio. So, the best splitting attribute can either be 'Practical Knowledge' or 'Communication Skills', and therefore, the split can be based on any one of these. Here, we split based on 'Practical Knowledge'. The final decision tree is shown in Figure 6.6.


Figure 6.6: Final Decision Tree


Algorithm 13.3: k-means Algorithm
Step 1: Determine the number of clusters before the algorithm is started. This is called k.
Step 2: Choose k instances randomly. These are initial cluster centers.
Step 3: Compute the mean of the initial clusters and assign the remaining samples to the closest cluster based on Euclidean distance or any other distance measure between the instances and the centroids of the clusters.
Step 4: Compute the new centroids again, considering the newly added samples.
Step 5: Repeat Steps 3-4 till the algorithm becomes stable with no more changes in the assignment of instances to clusters.

k-means can also be viewed as a greedy algorithm, as it involves partitioning n samples into k clusters so as to minimize the Sum of Squared Errors (SSE). SSE is a metric that is a measure of error, giving the sum of the squared Euclidean distances of each data point to its closest centroid. It is given as:

$\text{SSE} = \sum_{i=1}^{k} \sum_{x \in C_i} \text{dist}(c_i, x)^2$    (13.14)

Here, $c_i$ is the centroid of the cluster, x is the sample or data point and dist is the Euclidean distance. The aim of the k-means algorithm is to minimize the SSE.
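For illustration, k-means and its SSE objective can be run with scikit-learn as sketched below; the data points and the choice k = 2 are assumptions for the example, and KMeans exposes the SSE of Eq. (13.14) in its inertia_ attribute.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D samples forming two rough groups
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [8.3, 7.7], [7.9, 8.4]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.cluster_centers_)   # the k centroids
print(kmeans.labels_)            # cluster assignment of each sample
print(kmeans.inertia_)           # SSE: sum of squared distances to the closest centroid
```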


Advantages
1. Simple
2. Easy to implement
Disadvantages
1. It is sensitive to the initialization process, as a change of the initial points leads to different clusters.
2. If the samples are large, then the algorithm takes a lot of time.
How to Choose the Value of k?
It is obvious that kis the user specified value specifying the number of clusters that are present.
Obviously, there are no standard rules availalble to pick the value of k. Normally, the k-means
algorithm is run with multiple values of kand within group variance (sum of squares of samples
with its cerntroid) and plotted as a line graph. This plot is called Elbow curve. The optimal or best
value of kcan be determined from the graph. The optimal value of k is identified by the flat or
horizontal part of the Elbow curve.
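A small sketch of how the Elbow curve is typically produced (the data and the range of k values are assumptions for the example):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).random((60, 2))   # made-up 2-D samples

sse = []
k_values = range(1, 8)
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse.append(model.inertia_)                 # within-group sum of squares for this k

plt.plot(list(k_values), sse, marker="o")      # the bend ("elbow") suggests a good k
plt.xlabel("k")
plt.ylabel("SSE")
plt.show()
```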

Complexity
The complexity of k-means algorithm is dependent on the parameters like n, the number of
samples, k, the number of clusters, O(nkd). I is the number of iterations and d is the number of
attributes. The complexity of k-means algorithm is O().




... hence it exhibits the same initial conditions every time the model is run and is likely to get a single possible outcome as the solution.
Bayesian learning differs from probabilistic learning as it uses subjective probabilities (i.e., probability that is based on an individual's belief or interpretation about the outcome of an event, and it can change over time) to infer the parameters of a model. Two practical learning algorithms called Naïve Bayes learning and Bayesian Belief Network (BBN) form the major part of Bayesian learning. These algorithms use prior probabilities and apply Bayes rule to infer useful information. Bayesian Belief Networks (BBN) are explained in detail in Chapter 9.

Scan for information on 'Probability Theory' and for 'Additional Examples'.

8.2 FUNDAMENTALS OF BAYES THEOREM


The Naïve Bayes model relies on Bayes theorem, which works on the principle of three kinds of probabilities called prior probability, likelihood probability, and posterior probability.

Prior Probability
It is the general probability of an uncertain event before an observation is seen or some evidence is collected. It is the initial probability that is believed before any new information is collected.

Likelihood Probability
Likelihood probability is the relative probability of the observation occurring for each class, or the sampling density for the evidence given the hypothesis. It is stated as P(Evidence | Hypothesis), which denotes the likeliness of the occurrence of the evidence given the parameters.

Posterior Probability
It is the updated or revised probability of an event taking into account the observations from the training data. P(Hypothesis | Evidence) is the posterior distribution representing the belief about the hypothesis, given the evidence from the training data. Therefore,
Posterior probability = Prior probability + New evidence

8.3 CLASSIFICATION USING BAYES MODEL


Naïve Bayes classification models work on the principle of Bayes theorem. Bayes' rule is a mathematical formula used to determine the posterior probability, given the prior probabilities of events. Generally, Bayes theorem is used to select the most probable hypothesis from data, considering both prior knowledge and posterior distributions. It is based on the calculation of the posterior probability and is stated as:
P(Hypothesis h | Evidence E)
where Hypothesis h is the target class to be classified and Evidence E is the given test instance.

P(Hypothesis h | Evidence E) is calculated from the prior probability P(Hypothesis h), the likelihood probability P(Evidence E | Hypothesis h) and the marginal probability P(Evidence E). It can be written as:

P(Hypothesis h | Evidence E) = [P(Evidence E | Hypothesis h) × P(Hypothesis h)] / P(Evidence E)    (8.1)
P (Hypothesis h I the training
probability of the hypothesis h without observing
probability that the
the prior
where, P(Hypothesis h) isevidence. It denotes the prior belief or the initial from the training
E
data or considering any (Evidence E) is the prior probability of the evidencethe marginal proba
hypothesish is correct. P is also called
without any knowledge of which hypothesis holds. It
dataset
bility. prior probability of Evidence E given Hypothesistheh.
Hypothesis h) is the training data that
P (Evidence ETprobability of the Evidence E after observing the of Hypothesis h
likelihood probability
It is the (Hypothesis h I Evidence E) is the posterior training data that the
hypothesis his correct. P probability of the hypothesish after observing the
can observe that:
given Evidence E. It is theother words, by the equation of Bayes Eq. (8.1), one
evidence E is correct. In Probability
Probability xLikelihood from
Posterior Probability a Prior posterior probability for a number of hypotheses,
helps in calculating the
Bayes theorem highest probability can be
selected.
formally defined as
which the hypothesis with the probable hypothesis from a set of hypotheses is
This selection of the most Hypothesis.
Maximum A Posteriori (MAP)
Posteriori (MAP) Hypothesis, haP
Maximum A value is considered as
which has the maximum
hypotheses, the hypothesis
Given a set of candidate probable hypothesis is called
hypothesis or most probable hypothesis. This most can be used to find the h
the maximum probable Hypothesis h, Bayes theorem Eq.
(8.1)
the Maximum APosteriori P(Hypothesishl Evidence E)
, =max,, h)
P(Evidence ElHypothesis h)P(Hypothesis
= max,H P(Evidence E)
(8.2)
lHypothesis h)P(Hypothesis h)
= max,, P(Evidence E

h,M.
Maximum Likelihood (ML) Hypothesis, probable, only P(EIh) is used
candidate hypotheses, if every hypothesis is equallymaximum likelihood for P (E Ih)
Given a set of gives the
hypothesis.The hypothesis that
to find the most probable Likelihood (ML) Hypothesis, h,: (8.3)
is called the Maximum h)
max,, P(Evidence EIHypothesis
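As a minimal numeric sketch (the priors and likelihoods below are invented purely to show how Eqs. (8.1) and (8.2) are applied):

```python
# Two candidate hypotheses with assumed prior and likelihood values
priors      = {"h1": 0.6, "h2": 0.4}           # P(h)
likelihoods = {"h1": 0.2, "h2": 0.7}           # P(E | h)

# Unnormalized posteriors P(E | h) * P(h); P(E) is the same for every h
scores = {h: likelihoods[h] * priors[h] for h in priors}

p_evidence = sum(scores.values())              # marginal probability P(E)
posteriors = {h: s / p_evidence for h, s in scores.items()}   # Eq. (8.1)

h_map = max(scores, key=scores.get)            # MAP hypothesis, Eq. (8.2)
print(posteriors, h_map)
```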

Correctness of Bayes Theorem


Consider two events A and B in a sample space S.
A: T F T T F T T F
B: F T T F T F T F
P(A) = 5/8
P(B) = 4/8
P(A | B) = 2/4
P(B | A) = 2/5
P(A | B) = P(B | A) P(A) / P(B) = (2/5 × 5/8) / (4/8) = 2/4
P(B | A) = P(A | B) P(B) / P(A) = (2/4 × 4/8) / (5/8) = 2/5
Let us now consider a numerical example to illustrate the use of Bayes theorem:
Example 8.1: Consider a boy who has a volleyball tournament on the next day, but today he feels sick. It is unusual that there is only a 40% chance he would fall sick since he is a healthy boy. Now, find the probability of the boy participating in the tournament. The boy is very much interested in ...
Figure 10.4: Artificial Neural Network Structure

10.3.3 Activation Functions


Activation functions are mathematical functions associated with each neuron in the neural network that map input signals to output signals. The activation function decides whether to fire a neuron or not based on the input signals the neuron receives. These functions normalize the output value of each neuron either between 0 and 1 or between -1 and +1. Typical activation functions can be linear or non-linear.
Linear functions are useful when the input values can be classified into any one of the two groups and are generally used in binary perceptrons. Non-linear functions, on the other hand, are continuous functions that map the input in the range of (0, 1) or (-1, 1), etc. These functions are useful in learning high-dimensional data or complex data such as audio, video and images.
Below are some of the activation functions used in ANNs (a short code sketch of a few of them follows the list):
1. Identity Function or Linear Function

$f(x) = x$    (10.4)

The value of f(x) increases linearly or proportionally with the value of x. This function is useful when we do not want to apply any threshold. The output would be just the weighted sum of the input values. The output value ranges between -∞ and +∞.

2. Binary Step Function

$y = \begin{cases} 1 & \text{if } f(x) \geq \theta \\ 0 & \text{if } f(x) < \theta \end{cases}$    (10.5)

The output value is binary, i.e., 0 or 1, based on the threshold value θ. If the value of f(x) is greater than or equal to θ, it outputs 1, or else it outputs 0.

3. Bipolar Step Function

$y = \begin{cases} +1 & \text{if } f(x) \geq \theta \\ -1 & \text{if } f(x) < \theta \end{cases}$    (10.6)

The output value is bipolar, i.e., +1 or -1, based on the threshold value θ. If the value of f(x) is greater than or equal to θ, it outputs +1, or else it outputs -1.

4. Sigmoidal Function or Logistic Function

$\sigma(x) = \dfrac{1}{1 + e^{-x}}$    (10.7)

It is a widely used non-linear activation function which produces an S-shaped curve, and the output values are in the range of 0 and 1. It has a vanishing gradient problem, i.e., no change in the prediction for very low input values and very high input values.

5. Bipolar Sigmoid Function

$\sigma(x) = \dfrac{1 - e^{-x}}{1 + e^{-x}}$    (10.8)

It outputs values between -1 and +1.

6. Ramp Function

$f(x) = \begin{cases} 1 & \text{if } x > 1 \\ x & \text{if } 0 \leq x \leq 1 \\ 0 & \text{if } x < 0 \end{cases}$    (10.9)

It is a linear function whose upper and lower limits are fixed.

7. Tanh - Hyperbolic Tangent Function

The Tanh function is a scaled version of the sigmoid function which is also non-linear. It also suffers from the vanishing gradient problem. The output values range between -1 and 1.

$\tanh(x) = \dfrac{2}{1 + e^{-2x}} - 1$    (10.10)



8. ReLU - Rectified Linear Unit Function

This activation function is typically used in deep learning neural network models in the hidden layers. It avoids or reduces the vanishing gradient problem. This function outputs a value of 0 for negative input values and works like a linear function if the input values are positive.

$f(x) = \max(0, x) = \begin{cases} x & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}$    (10.11)

9. Softmax Function

This is a non-linear function used in the output layer that can handle multiple classes. It calculates the probability of each target class, which ranges between 0 and 1. The probability of the input belonging to a particular class is computed by dividing the exponential of the given input value by the sum of the exponential values of all the inputs.

$s(z_i) = \dfrac{e^{z_i}}{\sum_{j=0}^{k} e^{z_j}}$, where i = 0 ... k    (10.12)
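A few of the listed functions, sketched with NumPy for illustration (the input vector is arbitrary):

```python
import numpy as np

def binary_step(x, theta=0.0):
    return np.where(x >= theta, 1, 0)          # Eq. (10.5)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))            # Eq. (10.7)

def tanh(x):
    return 2.0 / (1.0 + np.exp(-2 * x)) - 1.0  # Eq. (10.10)

def relu(x):
    return np.maximum(0.0, x)                  # Eq. (10.11)

def softmax(z):
    e = np.exp(z - np.max(z))                  # subtract the max for numerical stability
    return e / e.sum()                         # Eq. (10.12); the outputs sum to 1

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(binary_step(x), sigmoid(x), tanh(x), relu(x), softmax(x), sep="\n")
```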
10.2 BIOLOGICAL NEURONS
A typical biological neuron has four parts called dendrites, soma, axon and synapse. The body of the neuron is called the soma. Dendrites accept the input information and pass it for processing to the cell body called the soma. A single neuron is connected by axons to around 10,000 neurons, and through these axons the processed information is passed from one neuron to another neuron. A neuron gets fired if the input information crosses a threshold value, and it transmits signals to another neuron through a synapse. A synapse gets fired with electrical impulses called spikes, which are transmitted to another neuron. A single neuron can receive synaptic inputs from one neuron or multiple neurons. These neurons form a network structure which processes input information and gives out a response. The simple structure of a biological neuron is shown in Figure 10.1.
Figure 10.1: A Biological Neuron


10.3 ARTIFICIAL NEURONS


Artificial neurons are similar to biological neurons and are called nodes. A node or a neuron can receive one or more input information and process it. Artificial neurons or nodes are connected by connection links to one another. Each connection link is associated with a synaptic weight. The structure of a single neuron is shown in Figure 10.2.
Figure 10.2: An Artificial Neuron

10.3.1 Simple Model of an Artificial Neuron


The first mathematical model of a biological neuron was designed by McCulloch and Pitts in 1943. It includes two steps:
1. It receives weighted inputs from other neurons.
2. It operates with a threshold function or activation function.
The received inputs are computed as a weighted sum, which is given to the activation function, and if the sum exceeds the threshold value, the neuron gets fired.