QUESTION ONE
a) Define the following terms (4 Marks)
i. Data normalization
ii. Data binning
b) With the help of a diagram, illustrate the Knowledge Discovery Process. (8 Marks)
c) Giving examples, discuss why the data cleaning stage of the KDD process is
necessary. (3 Marks)
d) Discuss any five desired features of a cluster analysis algorithm. (5 Marks)
e) A grocery shop sells six items: Bread, Cheese, Eggs, Juice, Milk and
Yogurt. The shopkeeper keeps a record of the transactions as follows.
TRANSACTION ID    ITEMS
100               Bread, Cheese, Eggs, Juice
200               Bread, Cheese, Juice
300               Bread, Milk, Yogurt
400               Bread, Juice, Milk
500               Cheese, Juice, Milk
Using the improved naïve algorithm, find the association rules with 50% support
and 75% confidence. (10 Marks)
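For checking a hand-worked answer to part (e), the following is a minimal sketch rather than a model solution. It takes 50% as the minimum support and 75% as the minimum confidence, and it assumes that "improved naïve" means counting only the itemsets that actually occur in some transaction instead of every possible combination of the six items. The function names are illustrative and not part of the question.

```python
from collections import Counter
from itertools import combinations

# Transactions copied from the table in part (e).
TRANSACTIONS = {
    100: {"Bread", "Cheese", "Eggs", "Juice"},
    200: {"Bread", "Cheese", "Juice"},
    300: {"Bread", "Milk", "Yogurt"},
    400: {"Bread", "Juice", "Milk"},
    500: {"Cheese", "Juice", "Milk"},
}


def count_itemsets(transactions):
    """Count every non-empty itemset that occurs in at least one transaction."""
    counts = Counter()
    for items in transactions.values():
        ordered = sorted(items)
        for size in range(1, len(ordered) + 1):
            for combo in combinations(ordered, size):
                counts[frozenset(combo)] += 1
    return counts


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets meeting the minimum support."""
    n = len(transactions)
    counts = count_itemsets(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(transactions, min_support, min_confidence):
    """Return sorted (antecedent, consequent, confidence) tuples."""
    n = len(transactions)
    counts = count_itemsets(transactions)
    rules = []
    for itemset, count in counts.items():
        # Rules need at least two items and a frequent itemset.
        if len(itemset) < 2 or count / n < min_support:
            continue
        ordered = sorted(itemset)
        for size in range(1, len(ordered)):
            for lhs in combinations(ordered, size):
                lhs = frozenset(lhs)
                # confidence(X -> Y) = count(X and Y together) / count(X)
                confidence = count / counts[lhs]
                if confidence >= min_confidence:
                    rhs = itemset - lhs
                    rules.append(
                        (tuple(sorted(lhs)), tuple(sorted(rhs)), confidence)
                    )
    return sorted(rules)


if __name__ == "__main__":
    for lhs, rhs, conf in association_rules(TRANSACTIONS, 0.5, 0.75):
        print(f"{', '.join(lhs)} -> {', '.join(rhs)} (confidence {conf:.0%})")
```

Enumerating the subsets of each transaction is only practical for short transactions like these; the Apriori approach asked for in Question Three prunes candidates level by level instead.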
QUESTION TWO
a) Discuss four issues that arise as a result of data mining and explain how to overcome
them. (4 Marks)
QUESTION THREE
a) Define the following terms (2 Marks)
i. Data warehousing
ii. Data mining
b) Using two items X and Y, define the following terms. (2 Marks)
i. Support
ii. Confidence
c) In the context of association rules mining, describe the following terms (2 Marks)
i. Frequent item-sets
ii. Confident rules
d) Consider a retail shop with the following set of transactions
Using the improved Apriori algorithm find the association rules with minimum
support of 22% and 70% confidence. (14 Marks)
QUESTION FOUR
a) Discuss four types of distances in clustering (4 Marks)
b) Using appropriate examples, describe the following types of data (4 Marks)
i. Ordinal data
ii. Nominal data
c) Discuss four ways in which the data that has been mined can be visually presented to
the user. (4 Marks)
d) Describe four categories of data mining systems, stating the basis of
categorization for each. (4 Marks)
e) Discuss four applications of data mining in real life (4 Marks)
QUESTION FIVE
The table below contains training data used to classify animals. Read it and answer
the questions that follow.