CHAPTER 5
Machine Learning (ML) and Data Mining (DM)
AI: Machine Learning (ML)
and Data Mining (DM)
• 5.1 Data Analysis
• 5.2 Perceptron, Linear Classifier
• 5.3 Nearest Neighbour Method:
• Two/Many Classes, Approximation.
• Distance Relevance, Computation Time.
• 5.4 Decision Tree Learning
• 5.5 Cross Validation and Overfitting
• 5.6 Clustering
• K-Means and EM Algorithm
• Hierarchical Clustering
• 5.7 Data Mining
• KNIME tool
AI: Machine Learning
• Artificial Intelligence is the study of how to make computers do things at which, at the
moment, people are better.
• The structure of the intelligent behavior can become so complex that it is very difficult
or even impossible to program close to optimally, even with modern high-level
languages such as PROLOG and Python.
• Machine learning algorithms are even used today to program robots in a way similar to
how humans learn.
AI: Machine Learning
• This chapter describes the important machine learning algorithms and their applications (supervised and unsupervised learning algorithms).
• As an important class of learning algorithms, neural networks will be dealt with in Chap. 6.
• Due to its special place and important role for autonomous robots, reinforcement learning will be dealt with in Chap. 7.
AI: Machine Learning
• Learning vocabulary of a foreign language, or technical terms, or even memorizing a
poem can be difficult for many people. For computers, however, these tasks are quite
simple because they are little more than saving text in a file.
• Thus, memorization is uninteresting for AI. In contrast, the acquisition of mathematical
skills is usually not done by memorization.
• Memorization is not even possible for the addition of natural numbers, because each of the two terms in the sum x + y can take infinitely many values. For every combination of the two values x and y, the triple (x, y, x + y) would have to be stored, which is impossible.
• This poses the question: how do we learn mathematics? The answer is: the teacher explains the process and the students practice it on examples until they no longer make mistakes on new examples. After 50 examples the student (hopefully) understands addition.
AI: ML, Data Analysis
• A fruit farmer wants to automatically divide harvested apples into the merchandise classes
A and B. The sorting device is equipped with sensors to measure two features, size and
color, and then decide which of the two classes the apple belongs to. This is a typical
classification task. Systems which are capable of dividing feature vectors into a finite
number of classes are called classifiers.
• The size is given as the diameter in centimeters and the color as a numeric value between 0 (for green) and 1 (for red). The data are visualized as points in a scatterplot in the figure.
AI: ML, Data Analysis
• BayWa company apple sorting equipment; apples classified into merchandise
classes A and B in feature space.
• The curve drawn into the diagram divides the classes and can then be applied to arbitrary new apples.
AI: ML, Data Analysis
• The task in machine learning consists of generating a function which calculates the class
value (A or B) for a new apple from the two features size and color.
AI: ML, Data Analysis
• Exercise 5.1: Generate a sample of 100 apples with randomly distributed size (3–8 cm) and color (0 = green to 1 = red) in Matlab. Then plot them as shown below:
a. Draw i) a line and ii) a curve through (1, 5) and (0.3, 8) to separate class A and class B apples.
b. Make class A apples red and class B apples blue.
c. Enter the apples (0.6, 5) and (0.9, 8) and automatically classify each as A or B.
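Since no solution listing is given on the slide, the following is a minimal Matlab sketch for the line case of part a), assuming the features are plotted as (color, size) and that a straight line through (0.3, 8) and (1, 5) separates the classes; which side counts as class A is an arbitrary assumption:

```matlab
% Sketch for Exercise 5.1: random apples in the (color, size) plane,
% separated by a straight line through (0.3, 8) and (1, 5) (assumptions).
n = 100;
color = rand(n, 1);               % 0 = green ... 1 = red
sz    = 3 + 5*rand(n, 1);         % diameter in cm, uniformly in 3..8
m = (5 - 8) / (1 - 0.3);          % slope of the dividing line
b = 8 - m*0.3;                    % intercept
isA = sz > m*color + b;           % which side counts as class A is an assumption
figure; hold on;
plot(color(isA),  sz(isA),  'r*');    % b) class A apples in red
plot(color(~isA), sz(~isA), 'b*');    % b) class B apples in blue
fplot(@(c) m*c + b, [0 1], 'k');      % a) dividing line
xlabel('color'); ylabel('size [cm]');
newApples = [0.6 5; 0.9 8];           % c) classify two new apples
for i = 1:size(newApples, 1)
    if newApples(i,2) > m*newApples(i,1) + b, cls = 'A'; else, cls = 'B'; end
    fprintf('Apple (%.1f, %.1f) -> class %s\n', newApples(i,1), newApples(i,2), cls);
end
```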
AI: ML, Data Analysis
• The task clearly becomes more difficult, and above all much harder to visualize, when the objects to be classified are described not by just two but by many features. In practice, 30 or more features are often used.
• A “good” division means that the percentage of falsely classified objects is as small as
possible.
• Functional structure of a learning agent for apple sorting (left) and in general (right).
• The learning agent: formally, we can describe a learning agent as a function which maps a feature vector to a discrete class value or, in general, to a real number.
AI: ML, Data Analysis
• This function is not programmed, rather it comes into existence or changes itself during
the learning phase, influenced by the training data.
• During learning, the agent is fed with the already classified data. Thereafter the agent
constitutes as good a mapping as possible from the feature vector to the function value
(e.g. merchandise class).
AI: ML, Data Analysis
• Data mining: the task of a learning machine is to extract knowledge from training data, i.e., to extract knowledge and trends from raw data.
• Variable agent (more precisely, a class of agents): here we have to decide which learning algorithm will be used. Once it is chosen, the class of all learnable functions is determined.
• Training data (experience): the training data (sample) contain the knowledge which the
learning algorithm is supposed to extract and learn.
AI: ML, Data Analysis
• Test data: important for evaluating whether the trained agent can generalize well from
the training data to new data.
• Performance measure: for the apple sorting device, the number of correctly classified
apples. We need it to test the quality of an agent.
• Knowing the performance measure is usually much easier than knowing the agent’s function.
• For example, it is easy to measure the performance (time) of a 10,000 meter runner. However, this
does not at all imply that the referee who measures the time can run as fast.
• The referee only knows how to measure the performance, but not the “function” of the agent whose
performance he is measuring.
AI: ML, Data Analysis
• Challenges from electronic business and knowledge management:
• Based on a customer's actions on a web page, the owner of an Internet business would like to infer a relationship between the customer and the products that interest that customer (for example, www.amazon.com).
AI: ML, Data Analysis
• Whenever large amounts of data are available, one can attempt to use these data for
the analysis of customer preferences in order to show customer-specific advertisements.
• The emerging field of preference learning is dedicated to this purpose.
AI: ML, Data Analysis
• The standard deviation si of a variable is a measure of its average deviation from the mean value:
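A reconstruction of the missing formula, assuming the usual sample definition with the mean value $\bar{x}_i$ of the $n$ values $x_i^1, \dots, x_i^n$ of variable $x_i$:

$$ s_i = \sqrt{\frac{1}{n-1}\sum_{p=1}^{n}\left(x_i^{p} - \bar{x}_i\right)^{2}} $$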
• Whether two variables xi and xj are statistically dependent (correlated) is important for the
analysis of multidimensional data
AI: ML, Data Analysis
• In this sum, the summand yields a positive entry for the pth data vector exactly when the deviations of the ith and jth components from their averages both have the same sign.
• If they have different signs, the entry is negative. Therefore, the covariance σ12,13 of the two different fever values should be positive.
• To be able to compare the degree of dependence in the case of multiple variables, we define the correlation coefficient (see the formulas after this list).
• The matrix K of all correlation coefficients contains values between −1 and 1, is symmetric,
and all of its diagonal elements have the value 1.
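The covariance and the correlation coefficient referred to above are reconstructed here in their usual form (an assumption, since the slide's own formulas are not reproduced):

$$ \sigma_{ij} = \frac{1}{n-1}\sum_{p=1}^{n}\left(x_i^{p} - \bar{x}_i\right)\left(x_j^{p} - \bar{x}_j\right), \qquad K_{ij} = \frac{\sigma_{ij}}{s_i\, s_j} $$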
AI: ML, Data Analysis
• The correlation matrix for all 16 variables is given below:
• The correlation matrix as a frequency graph. Here black means Kij ≈ 0 (uncorrelated) and white |Kij| ≈ 1 (strongly correlated).
AI: ML, Data Analysis
• Exercise 5.3: Input the correlation data from the slide/book and visualize it (a Matlab sketch follows after this list):
• Thus we can very quickly see which variables display a weak or strong dependence.
• We can see, for example, that the variables 7, 9, 10 and 14 show the strongest correlation with the class variable appendicitis and are therefore more important for the diagnosis than the other variables.
• We also see, however, that the variables 9 and 10 are strongly correlated.
• This could mean that one of these two values is potentially sufficient for the diagnosis.
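A minimal Matlab sketch for visualizing such a correlation matrix; the matrix K must be typed in from the book, and the small matrix below is only hypothetical placeholder data:

```matlab
% Hypothetical placeholder values; replace with the full correlation matrix from the book.
K = [ 1.0  0.1  0.7  0.3
      0.1  1.0  0.2  0.6
      0.7  0.2  1.0  0.4
      0.3  0.6  0.4  1.0 ];
imagesc(abs(K), [0 1]);     % black = uncorrelated, white = strongly correlated
colormap(gray); colorbar;
xlabel('variable j'); ylabel('variable i');
```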
AI: ML, Data Analysis
• Exercise 5.4: Gather data from 10–100 students at your university and compute a) the mean and b) the standard deviation of each variable/feature, and c) the correlation matrix; plot the output. What do you learn from it?
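One way to compute these statistics in Matlab, assuming the collected student data are stored in a matrix X with one row per student and one column per variable (the numbers below are hypothetical):

```matlab
% X: n-by-m data matrix, one row per student, one column per feature (e.g. height, weight, age)
X = [170 65 21; 182 80 23; 165 58 20; 175 72 22];
xbar = mean(X)              % a) mean of each variable
s    = std(X)               % b) standard deviation of each variable
K    = corrcoef(X)          % c) correlation matrix
imagesc(abs(K), [0 1]); colormap(gray); colorbar;   % plot the correlations
```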
AI: Machine Learning (ML)
and Data Mining (DM)
• 5.1 Data Analysis
• 5.2 Perceptron, Linear Classifier
• 5.3 Nearest Neighbour Method:
• Two/Many Classes, Approximation.
• Distance Relevance, Computation Time.
• 5.4 Decision Tree Learning
• 5.5 Cross Validation and Overfitting
• 5.6 Clustering
• K-Means and EM Algorithm
• Hierarchical Clustering
• 5.7 Data Mining
• KNIME tool
AI: ML, Linear Classifier
• In the apple sorting classification example, a curved dividing line is drawn between the two
classes. A simpler case is shown below.
• Here the two-dimensional training examples can be separated by a straight line. We call
such a set of training data linearly separable.
• A linearly separable two-dimensional data set. The equation for the dividing straight line is:
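The specific equation from the figure is not reproduced on the slide; in general, a dividing straight line in the two-dimensional feature space can be written as

$$ a_1 x_1 + a_2 x_2 = \theta , $$

so that all points with $a_1 x_1 + a_2 x_2 > \theta$ lie on one side and all others on the other side; in higher-dimensional feature spaces this becomes a dividing hyperplane.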
AI: ML, Linear Classifier
• The AND function is linearly separable, but the XOR function is not. For AND, for example, the line x2 = −x1 + 3/2 separates the true from the false interpretations.
• The XOR function has no separating straight line. Clearly, the XOR function has a more complex structure than the AND function in this regard.
AI: ML, Perceptron
• The perceptron is a very simple classification algorithm. It is equivalent to a two-layer neural
network with activation by a threshold function.
• Graphical representation of a perceptron as a two-layer neural network:
• Mathematically, the perceptron is a function which maps a feature vector to a function value; the input variables xi are called features.
• As the formula reconstructed below shows, all points x above the hyperplane are classified as positive (P(x) = 1), and all others as negative (P(x) = 0).
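A reconstruction of the perceptron's decision function in its common form without an explicit threshold (assuming, as in the standard treatment, a separating hyperplane through the origin):

$$ P(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w}\cdot\mathbf{x} = \sum_{i=1}^{n} w_i x_i > 0, \\ 0 & \text{otherwise.} \end{cases} $$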
AI: ML, Perceptron
• The perceptron should output the value 1 for all x ∈ M+. This is the case when w∙x > 0. If it is not, x is added to the weight vector w so that w moves in the right direction.
• If this step is repeated often enough, the value w∙x eventually becomes positive.
• Analogously, for falsely classified negative training data the perceptron computes an ever smaller value, which at some point becomes negative.
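In compact form, the standard perceptron update rule for a misclassified training example reads (a sketch, not the slide's own notation):

$$ \mathbf{w} \leftarrow \mathbf{w} + \mathbf{x} \quad \text{if } \mathbf{x}\in M_+ \text{ and } \mathbf{w}\cdot\mathbf{x} \le 0, \qquad \mathbf{w} \leftarrow \mathbf{w} - \mathbf{x} \quad \text{if } \mathbf{x}\in M_- \text{ and } \mathbf{w}\cdot\mathbf{x} \ge 0. $$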
AI: ML, Perceptron
• Exercise 5.5: A perceptron is to be trained on the sets M+ = {(0, 1.8), (2, 0.6)} and M− = {(−1.2, 1.4), (0.4, −1)}. w = (1, 1) is used as the initial weight vector. The training data and the line defined by the weight vector, w∙x = x1 + x2 = 0, are shown in the figure.
AI: ML, Perceptron
• Exercise 5.5 (continued): Below is a Matlab program which optimizes the weight vector for these points over the iterations.
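The original program is not reproduced on the slide; the following is a minimal sketch of perceptron learning on the data of Exercise 5.5 (it reaches a separating weight vector after five weight changes, matching the description on the next slide):

```matlab
% Perceptron learning on the data of Exercise 5.5 (minimal sketch)
Mplus  = [ 0    1.8;  2    0.6 ];    % positive examples, one per row
Mminus = [-1.2  1.4;  0.4 -1   ];    % negative examples
w = [1 1];                           % initial weight vector
changed = true;
while changed
    changed = false;
    for i = 1:size(Mplus,1)
        if w * Mplus(i,:)' <= 0      % falsely classified positive example
            w = w + Mplus(i,:); changed = true;
        end
    end
    for i = 1:size(Mminus,1)
        if w * Mminus(i,:)' >= 0     % falsely classified negative example
            w = w - Mminus(i,:); changed = true;
        end
    end
    disp(w)                          % weight vector after each pass over the data
end
```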
AI: ML, Perceptron
• Because the line is defined by w∙x = 0, the weight vector w is orthogonal to it. In the first iteration through the loop of the learning algorithm, the only falsely classified training example is (−1.2, 1.4), because w∙x = 1·(−1.2) + 1·1.4 = 0.2 ≥ 0 although this point belongs to M−.
• This results in w = (1, 1) − (−1.2, 1.4) = (2.2, −0.4), as drawn in the second image in the top row of the figure. The other images show how, after a total of five changes, the dividing line comes to lie between the two classes.
AI: Machine Learning and
Data Mining
• 5.1 Data Analysis
• 5.2 Perceptron, Linear Classifier
• 5.3 Nearest Neighbour Method:
• Two/Many Classes, Approximation.
• Distance Relevance, Computation Time.
• 5.4 Decision Tree Learning
• 5.5 Cross Validation and Overfitting
• 5.6 Clustering
• K-Means and EM Algorithm
• Hierarchical Clustering
• 5.7 Data Mining
• KNIME tool
AI: ML, Nearest Neighbor
• For a perceptron, knowledge available in the training data is extracted and saved in a
compressed form in the weights wi. Thereby information about the data is lost.
• However, this is only needed if the system is supposed to generalize from the training data to new data, that is, to find a compact representation of the data in the form of a function which classifies new data as well as possible.
• Saved knowledge is not easily applicable to new, unknown examples. After sufficiently long
practice, the knowledge stored in training examples is transformed into an internal
representation in the brain.
AI: ML, Nearest Neighbor
• What does similarity mean in the formal context we are constructing? We represent the
training samples as usual in a multidimensional feature space and define:
• The smaller their distance in feature space, the more similar two examples are.
• In this example with negative and positive training examples, the nearest neighbor method assigns the new point marked in black to the negative class.
AI: ML, Nearest Neighbor
• In the previous figure, the nearest neighbor to the black point is a negative example, so the point is assigned to the negative class.
• The distance d(x, y) between two points can, for example, be measured by the Euclidean distance (reconstructed below).
• In many applications, certain features are more important than others. Therefore it is often
sensible to scale the features differently by weights wi. The formula then reads
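Reconstruction of the two distance formulas (the standard Euclidean distance and its feature-weighted variant):

$$ d(\mathbf{x},\mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}, \qquad d_w(\mathbf{x},\mathbf{y}) = \sqrt{\sum_{i=1}^{n} w_i\,(x_i - y_i)^2}. $$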
AI: ML, Nearest Neighbor
• In contrast to the perceptron, the nearest neighbor method does not generate a line that
divides the training data points.
• An imaginary line separating the two classes certainly exists. We can generate this by first
generating the so-called Voronoi diagram.
• A set of points together with their Voronoi diagram (left) and the dividing line generated for the two classes M+ and M− (right).
AI: ML, Nearest Neighbor
• The class membership will then be taken from the nearest neighbor.
• Nearest neighbor method is significantly more powerful than the perceptron. It is capable of
correctly realizing arbitrarily complex dividing lines (in general: hyperplanes).
• However, there is a danger here. A single erroneous point can in certain circumstances lead
to very bad classification results.
AI: ML, Nearest Neighbor
• To prevent false classifications due to single outliers, it is recommended to smooth out the dividing surface somewhat.
• This can be accomplished, for example, with the k-NearestNeighbor algorithm shown in the figure, which makes a majority decision among the k nearest neighbors.
AI: ML, Nearest Neighbor
• We use modified training examples, each with a sequence of consecutive inverted bits. The percentage of correctly classified test examples is shown as a function of the number of inverted bits.
• For up to eight inverted bits, all patterns are correctly identified. Past that point, the number of errors quickly increases.
AI: ML, Nearest Neighbor
• We now train a perceptron with a threshold on six simple graphical binary patterns of 5×5 pixels each, represented in the figure below.
• The figure shows the six patterns used for training; the pattern on the far right is one of the 22 test patterns derived from the first pattern by inverting a sequence of four bits.
• The training data can be learned by PerceptronLearning in four iterations over all patterns.
• Training pattern number 2 (class M+) has a Hamming distance of 9 to the two training examples numbers 4 and 5 from the other class.
AI: ML, Nearest Neighbor
• Exercise 5.6: The code below generates two figures with 5×5 letter patterns for U and T in Matlab.
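The original listing is not reproduced on the slide; a minimal sketch that builds two 5×5 binary letter patterns (the exact pixel layouts are assumptions) and displays them as figures:

```matlab
% 5x5 binary patterns for the letters U and T (hypothetical pixel layouts)
U = [1 0 0 0 1
     1 0 0 0 1
     1 0 0 0 1
     1 0 0 0 1
     1 1 1 1 1];
T = [1 1 1 1 1
     0 0 1 0 0
     0 0 1 0 0
     0 0 1 0 0
     0 0 1 0 0];
figure; imagesc(U); colormap(gray); axis equal tight; title('U');
figure; imagesc(T); colormap(gray); axis equal tight; title('T');
```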
AI: ML, Nearest Neighbor
• This means that the test pattern is very likely close to the patterns of the other class.
• Quite clearly we see that nearest neighbor classification (blue plot) is superior to the
perceptron (grey plot) in this application for up to eight false bits.
• Relative correctness of the perceptron as a function of the number of inverted bits in the test
data
AI: ML, Nearest Neighbor
• Nearest neighbor classification can also be applied to more than two classes.
• For the k nearest neighbor method, the class is to be determined as the class with the most
members among the k nearest neighbors.
• If the number of classes is large, then it usually no longer makes sense to use classification
algorithms because the size of the necessary training data grows quickly with the number of
classes.
• In certain circumstances important information is lost during classification of many classes
AI: ML, Nearest Neighbor
• Example: An autonomous robot (a Braitenberg vehicle) is shown below:
• It is supposed to learn to move away from light. This means it should learn as optimally as
possible to map its sensor values onto a steering signal which controls the driving direction.
• The robot is equipped with two simple light sensors on its front side. The learning agent,
which is supposed to avoid light (left), represented as a classifier (middle), and as an
approximation (right).
AI: ML, Nearest Neighbor
• From the two sensor signals (sl for the left and sr for the right sensor), the ratio x = sr/sl is calculated.
• To control the electric motors of the two wheels, this value x is mapped to the difference v = Ur − Ul of the voltages Ur and Ul of the right and left motors, respectively.
• The learning agent’s task is now to avoid a light signal. It must therefore learn a mapping f
which calculates the “correct” value v = f(x).
• We carry out an experiment in which, for a few measured values x, we find as optimal a
value v as we can.
• These values are plotted as data points in figure and shall serve as training data for the
learning agent.
AI: ML, Nearest Neighbor
• During nearest neighbor classification each point in the feature space is classified like its
nearest neighbor among the training data.
• The function for steering the motors is then a step function with large jumps (figure middle). If
we want finer steps, more training data is required.
• The k nearest neighbor method can be applied in a simple way to the approximation problem. In the algorithm k-NearestNeighbor, after the set V = {x1, x2, …, xk} of the k nearest neighbors is determined, their average function value (see the formula below) is calculated and taken as an approximation 𝑓 for the query vector x.
• The larger k becomes, the smoother the function 𝑓 is.
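Reconstruction of the averaging step: the approximation $\hat{f}$ at the query point $\mathbf{x}$ is the mean of the function values of its $k$ nearest neighbors $\mathbf{x}_1, \dots, \mathbf{x}_k$:

$$ \hat{f}(\mathbf{x}) = \frac{1}{k}\sum_{i=1}^{k} f(\mathbf{x}_i). $$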
AI: ML, Nearest Neighbor
• In the practical application of both the discrete and the continuous variants of the k nearest neighbor method, problems often occur.
• As k becomes large, there typically exist more neighbors at a large distance than at a small distance.
• During the majority decision in the algorithm k-NearestNeighbor, the "votes" are therefore weighted with a weight which decreases with the square of the distance; the constant a determines how quickly the weights decrease. The resulting equations are reconstructed below.
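A reconstruction of the distance weighting and the resulting weighted approximation (the exact form of the weight is an assumption consistent with the text: it decreases with the square of the distance and is controlled by the constant a):

$$ w_i = \frac{1}{1 + a\, d(\mathbf{x}, \mathbf{x}_i)^2}, \qquad \hat{f}(\mathbf{x}) = \frac{\sum_{i=1}^{k} w_i\, f(\mathbf{x}_i)}{\sum_{i=1}^{k} w_i}. $$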
AI: ML, Nearest Neighbor
• This ensures that the influence of points asymptotically approaches zero as distance
increases.
AI: ML, Nearest Neighbor
• To get a feeling for these methods, the figure illustrating the k-nearest neighbor method (in
the upper row) is compared with its distance weighted optimization.
• Due to the averaging, both methods can generalize, or in other words, cancel out noise, if
the number of neighbors for k-nearest neighbor or the parameter a is set appropriately.
• The diagrams show nicely that the distance weighted method gives a much smoother
approximation than k-nearest neighbor.
• With respect to approximation quality, this very simple method can compete well with
sophisticated approximation algorithms such as nonlinear neural networks, support vector
machines, and Gaussian processes.
AI: ML, Nearest Neighbor
• Training is accomplished in all variants of the nearest neighbor method by simply saving all
training vectors together with their labels (class values), or the function value f(x).
• Thus there is no other learning algorithm that learns as quickly. However, answering a query
for classification or approximation of a vector x can become very expensive.
• Just finding the k nearest neighbors among n training data requires a cost which grows linearly with n. For classification or approximation, there is additionally a cost which is linear in k.
• The total computation time thus grows as Θ(n + k). For large amounts of training data, this can lead to problems.
AI: ML, Nearest Neighbor
• Exercise 5.7: Below is Matlab code that creates 20 samples each of a "*-red" and a "*-blue" group in the region [-0.5 1.5 0 1]. The "o-black" point located at (0.5, 0.5) is to be classified into one of the groups using the nearest neighbor method.
a) How do rand() + 0.5 and rand() − 0.5 create the left/right groups in line 5? How would you modify the code to create the groups above/below each other?
b) Explain how lines 10–13 find the distance from the target to each of the points.
c) Continue the code to classify "o-black" using k = 1, and d) using k = 5. [Hint: use sort()]
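The referenced listing is not shown on the slide; a minimal sketch of the setup and of a k-nearest-neighbor majority vote for parts c) and d) could look as follows (group placement and variable names are assumptions; implicit expansion requires Matlab R2016b or later):

```matlab
% Two groups of 20 points each and a k-NN classification of the target (sketch)
n = 20;
red  = [rand(n,1) - 0.5, rand(n,1)];     % x shifted left  into [-0.5, 0.5]
blue = [rand(n,1) + 0.5, rand(n,1)];     % x shifted right into [ 0.5, 1.5]
target = [0.5 0.5];
axis([-0.5 1.5 0 1]); hold on;
plot(red(:,1),  red(:,2),  'r*');
plot(blue(:,1), blue(:,2), 'b*');
plot(target(1), target(2), 'ko');
% Distances from the target to all 40 points
pts    = [red; blue];
labels = [ones(n,1); 2*ones(n,1)];       % 1 = red, 2 = blue
d = sqrt(sum((pts - target).^2, 2));
% c)/d) majority vote among the k nearest neighbors (k = 1 or k = 5)
k = 5;
[~, idx] = sort(d);                      % point indices ordered by distance
vote = mode(labels(idx(1:k)));           % most frequent class among the k nearest
if vote == 1, disp('target classified as red'), else, disp('target classified as blue'), end
```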
AI: Machine Learning and
Data Mining
• 5.1 Data Analysis
• 5.2 Perceptron, Linear Classifier
• 5.3 Nearest Neighbour Method:
• Two/Many Classes, Approximation.
• Distance Relevance, Computation Time.
• 5.4 Decision Tree Learning
• 5.5 Cross Validation and Overfitting
• 5.6 Clustering
• K-Means and EM Algorithm
• Hierarchical Clustering
• 5.7 Data Mining
• KNIME tool
AI: ML, Decision Tree Learning
• A devoted skier who lives near the High Sierra, a beautiful mountain range in California, wants a decision tree to help him decide whether it is worthwhile to drive his car to a ski resort in the mountains.
• We thus have a two-class problem (ski: yes/no) based on the variables listed.
Au-VME CE4715 Artificial Intelligence; Chapter Five – Machine Learning and Data Mining 59
AI: ML, Decision Tree Learning
• First the attribute with the highest information gain (Snow_Dist) is chosen for the root node from the set of all attributes. For each attribute value (≤ 100, > 100) there is a branch in the tree. This process is then repeated recursively for every branch.
• During generation of the nodes, the attribute with the highest information gain among the attributes not yet used is always chosen, in the spirit of a greedy strategy.
• If we look at the binary variable skiing in the above example, then D (the set of training data) can be described as follows (formulas reconstructed below).
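The formula itself is not reproduced on the slide; in the standard formulation, D is characterized by the estimated class probabilities, from which the entropy and the information gain used for attribute selection are computed (a reconstruction, not the slide's own notation):

$$ p_i = \frac{|D_i|}{|D|}, \qquad H(D) = -\sum_{i} p_i \log_2 p_i, \qquad G(D, A) = H(D) - \sum_{v \in \mathrm{values}(A)} \frac{|D_v|}{|D|}\, H(D_v), $$

where $D_i$ is the subset of $D$ belonging to class $i$ and $D_v$ the subset with attribute value $A = v$.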
AI: ML, Decision Tree Learning
• The entropy function (H) for the case of two classes. We see the maximum at p = 1/2 and
the symmetry with respect to swapping p and 1–p.
AI: Machine Learning and
Data Mining
• 5.1 Data Analysis
• 5.2 Perceptron, Linear Classifier
• 5.3 Nearest Neighbour Method:
• Two/Many Classes, Approximation.
• Distance Relevance, Computation Time.
• 5.4 Decision Tree Learning
• 5.5 Cross Validation and Overfitting
• 5.6 Clustering
• K-Means and EM Algorithm
• Hierarchical Clustering
• 5.7 Data Mining
• KNIME tool
AI: ML, Overfitting
• Many learning algorithms have the problem of overfitting.
• Powerful learning algorithms, such as decision tree learning, can adapt the complexity of the learned model to the complexity of the training data. This leads to overfitting if there is noise in the data.
• For the k nearest neighbor method this complexity parameter is k, the number of nearest neighbors, while for neural networks it is the number of hidden neurons.
AI: ML, Overfitting
• The whole data set X is divided into k equally sized blocks. Then the algorithm is trained k times, each time on k − 1 blocks and tested on the remaining block.
• The k computed errors are averaged, and the parameter value with the smallest mean error is chosen to train the final model on the whole data set X.
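A minimal Matlab sketch of k-fold cross-validation, here illustrated with the polynomial degree as the complexity parameter (data, names and the choice of parameter are assumptions for illustration):

```matlab
% k-fold cross-validation for choosing a polynomial degree (sketch)
x = linspace(0, 1, 40)';  y = sin(2*pi*x) + 0.2*randn(40,1);   % noisy sample data
k = 5;  fold = repmat(1:k, 1, 8)';       % fold label for each of the 40 points
degrees = 1:8;  err = zeros(size(degrees));
for d = degrees
    e = zeros(k, 1);
    for f = 1:k
        train = fold ~= f;               % k-1 blocks for training
        p = polyfit(x(train), y(train), d);
        e(f) = mean((polyval(p, x(~train)) - y(~train)).^2);   % error on remaining block
    end
    err(d) = mean(e);                    % average the k errors
end
[~, best] = min(err);
fprintf('degree with smallest mean error: %d\n', best);
```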
AI: ML, Overfitting
• Exercise 5.8: Below is Matlab code that generates a sin() wave with a frequency of 100 Hz.
a) How many samples are there in one time period; what is the size of the vector t?
b) If you run the code again, do you expect to see the red * in the same y-location (above or below)? What about the x-location (left or right)? Explain.
c) Use the polyfit() function to fit y and yn; which gives more error, and which is better?
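The referenced listing is not reproduced on the slide; a plausible minimal sketch (frequency from the exercise, sampling rate and noise level are assumptions) that generates the sine, adds noisy samples and fits both with polyfit():

```matlab
% 100 Hz sine with noisy samples and polynomial fits (sketch)
f  = 100;                          % frequency in Hz
t  = 0:1e-4:0.01;                  % one period sampled at 10 kHz (assumption)
y  = sin(2*pi*f*t);                % clean signal
yn = y + 0.2*randn(size(t));       % noisy samples (red *)
plot(t, y, 'b-', t, yn, 'r*');
% c) fit both the clean and the noisy data with a polynomial
[pc, ~, muc] = polyfit(t, y,  5);  % centering/scaling avoids conditioning warnings
[pn, ~, mun] = polyfit(t, yn, 5);
err_clean = mean((polyval(pc, t, [], muc) - y).^2);
err_noisy = mean((polyval(pn, t, [], mun) - y).^2);
fprintf('MSE of clean fit: %.4g, of noisy fit: %.4g\n', err_clean, err_noisy);
```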
AI: ML, Overfitting
• Overfitting can be illustrated by the example of function approximation using polynomials. If one chooses the degree of the polynomial as the complexity parameter, degree 1 yields a straight line, which is not a good approximation for non-trivial data.
• If, on the other hand, the polynomial degree equals n − 1 for n points, it tends to hit every point and reduces the error on the training data to zero. Here the data points are noisy samples of a sine function, where g(0, 0.2) denotes normally distributed noise with zero mean and a standard deviation of 0.2.
• In the left image, in addition to the data points, the underlying sine function (black) is drawn, as well as a straight line (red) approximated using the method of least squares. In addition, a seventh degree polynomial (green) is interpolated through the points, hitting each point, as well as a fourth degree polynomial (blue), which comes closest to the sine curve.
AI: Machine Learning and
Data Mining
• 5.1 Data Analysis
• 5.2 Perceptron, Linear Classifier
• 5.3 Nearest Neighbour Method:
• Two/Many Classes, Approximation.
• Distance Relevance, Computation Time.
• 5.4 Decision Tree Learning
• 5.5 Cross Validation and Overfitting
• 5.6 Clustering
• K-Means and EM Algorithm
• Hierarchical Clustering
• 5.7 Data Mining
• KNIME tool
AI: ML, Clustering
• If we search in a search engine for the term "mars", we will get results like "the planet Mars" and "Mars, the chocolate, confectionery and beverage conglomerate", which are semantically quite different.
• Simple two-dimensional example with four clearly separated clusters:
• Google, for example, still lists the results in an unstructured way. It would be better if the
search engine separated the clusters and presented them to the user accordingly because
the user is usually interested in only one.
AI: ML, Clustering
• What distinguishes clustering from supervised learning is that the training data are unlabeled.
• Thus the pre-structuring of the data by the supervisor is missing. Rather, finding structures is
the whole point of clustering.
• In a cluster, the distance of neighboring points is typically smaller than the distance between
points of different clusters.
• Therefore the choice of a suitable distance metric for points, that is, for the objects to be grouped, and for clusters is of fundamental importance.
• As before, the Euclidean distance d between two vectors x and y can be used.
AI: ML, Clustering
• The top left diagram shows a set of 2-D data points with four obvious clusters.
• The OMRk algorithm was run on this data with p = 30 and kmax = 9.
• The figure shows the best partition together with its quality S for each k.
• The algorithm finds the maximum value S = 0.786 at k = 5.
AI: ML, Clustering
• The EM algorithm, which can approximate the differences in point density by using normal distributions, performs significantly better, as shown in the figure.
• The EM algorithm finds almost exactly the aforementioned natural partition for k = 4.
AI: ML, Clustering
• Exercise 5.9: Below is Matlab code that generates two clusters of 20 points each: blue in the middle and red in the lower left.
a) Modify the code to generate another cluster in the upper right corner in green. If you want it to be more concentrated, what should be done?
b) Randomly generate a point in the region [-5 5 -5 5] and then assign it to a cluster based on the clustering given above.
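The referenced listing is not shown on the slide; a minimal sketch (cluster centers and spreads are assumptions) including the additional green cluster from part a) and a nearest-centroid assignment as one way to solve part b):

```matlab
% Point clusters in the region [-5 5 -5 5] (sketch)
n = 20;
blue  = 1.0*randn(n,2) + [ 0  0];   % cluster in the middle
red   = 1.0*randn(n,2) + [-3 -3];   % cluster in the lower left
green = 0.5*randn(n,2) + [ 3  3];   % a) upper right; smaller factor -> more concentrated
axis([-5 5 -5 5]); hold on;
plot(blue(:,1),  blue(:,2),  'b*');
plot(red(:,1),   red(:,2),   'r*');
plot(green(:,1), green(:,2), 'g*');
% b) random point, assigned to the cluster with the nearest centroid
p = -5 + 10*rand(1,2);
centroids = [mean(blue); mean(red); mean(green)];
[~, c] = min(sum((centroids - p).^2, 2));
names = {'blue (middle)', 'red (lower left)', 'green (upper right)'};
fprintf('point (%.2f, %.2f) assigned to the %s cluster\n', p(1), p(2), names{c});
plot(p(1), p(2), 'ko');
```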
AI: Machine Learning and
Data Mining
• 5.1 Data Analysis
• 5.2 Perceptron, Linear Classifier
• 5.3 Nearest Neighbour Method:
• Two/Many Classes, Approximation.
• Distance Relevance, Computation Time.
• 5.4 Decision Tree Learning
• 5.5 Cross Validation and Overfitting
• 5.6 Clustering
• K-Means and EM Algorithm
• Hierarchical Clustering
• 5.7 Data Mining
• KNIME tool
AI: ML, Data Mining tool KNIME
• Exercise 5.10: Download the following software and
implement a data mining exercise / application:
• https://www.knime.com/data-mining
• https://www.knime.com/data-mining-software
AI: ML, Summary
• The following taxonomy gives an overview of the important learning algorithms:
• Supervised learning
• Lazy learning
– k nearest neighbor method (classification + approximation)
– Locally weighted regression (approximation)
– Case-based learning (classification + approximation)
• Eager learning
– Decision tree induction (classification)
– Learning of Bayesian networks (classification + approximation)
– Neural networks (classification + approximation)
– Gaussian processes (classification + approximation)
– Support vector machines
– Wavelets, splines, radial basis functions.
Solution to Exercises
Ch5 Machine Learning and Data Mining
• Exercise 5.3: Input the correlation data from the slide/book: