UNIVERSITY EXAMINATIONS: 2012/2013

THIRD YEAR EXAMINATION FOR THE BACHELOR OF SCIENCE IN INFORMATION TECHNOLOGY
BIT 4204 DATA WAREHOUSING AND DATA MINING

DATE: DECEMBER, 2012 TIME: 2 HOURS


INSTRUCTIONS: Answer Question ONE and any other TWO

QUESTION ONE
a) Define the following terms (4 Marks)
i. Data normalization
ii. Data binning
b) With the help of a diagram, illustrate the Knowledge Discovery Process. (8 Marks)
c) Giving examples, discuss the reasons why the data cleaning stage in the KDD process is
necessary. (3 Marks)
d) Discuss any five desired features of a cluster analysis algorithm. (5 Marks)
e) A grocery shop sells six items which are Bread, Cheese, Eggs, Juice, Milk and
Yogurt. The shopkeeper also keeps a record of the transactions as follows.
TRANSACTION ID    ITEMS
100               Bread, Cheese, Eggs, Juice
200               Bread, Cheese, Juice
300               Bread, Milk, Yogurt
400               Bread, Juice, Milk
500               Cheese, Juice, Milk

Using the improved naïve algorithm, find the association rules with 50% support and 75%
confidence. (10 Marks)
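
Note: an answer to part (e) can be checked with a short script. The sketch below is one possible reading, not a model answer. It assumes that "improved naïve" means counting only the item combinations that actually occur in some transaction, and it takes 50% as the minimum support and 75% as the minimum confidence. All function and variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Transactions copied from the table in Question One (e).
TRANSACTIONS = {
    100: {"Bread", "Cheese", "Eggs", "Juice"},
    200: {"Bread", "Cheese", "Juice"},
    300: {"Bread", "Milk", "Yogurt"},
    400: {"Bread", "Juice", "Milk"},
    500: {"Cheese", "Juice", "Milk"},
}


def count_occurring_itemsets(transactions):
    """Count every item combination that appears in at least one transaction."""
    counts = Counter()
    for items in transactions.values():
        for size in range(1, len(items) + 1):
            for combo in combinations(sorted(items), size):
                counts[frozenset(combo)] += 1
    return counts


def frequent_itemsets(counts, n_transactions, min_support):
    """Keep itemsets whose support (count / n) meets the threshold."""
    return {
        itemset: count
        for itemset, count in counts.items()
        if Fraction(count, n_transactions) >= min_support
    }


def association_rules(frequent, min_confidence):
    """Split each frequent itemset into lhs -> rhs and keep confident rules."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so the lookup below cannot fail.
                confidence = Fraction(count, frequent[lhs])
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


counts = count_occurring_itemsets(TRANSACTIONS)
# Assumed thresholds: 50% minimum support, 75% minimum confidence.
frequent = frequent_itemsets(counts, len(TRANSACTIONS), Fraction(50, 100))
rules = association_rules(frequent, Fraction(75, 100))
for lhs, rhs, confidence in rules:
    print(sorted(lhs), "->", sorted(rhs), f"confidence {float(confidence):.0%}")
```

Exact fractions are used instead of floats so that a rule sitting exactly on the 75% threshold is not lost to rounding.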

QUESTION TWO
a) Discuss four issues that arise as a result of data mining and explain how to overcome
them. (4 Marks)

b) Discuss five characteristics of OLAP (5 Marks)


c) Discuss six differences between OLAP and OLTP systems (6 Marks)
d) Discuss any five factors that you would consider when selecting and acquiring data
mining software. (5 Marks)

QUESTION THREE
a) Define the following terms (2 Marks)
i. Data warehousing
ii. Data mining
b) Using two items X and Y, define the following terms. (2 Marks)
i. Support
ii. Confidence
c) In the context of association rules mining, describe the following terms (2 Marks)
i. Frequent item-sets
ii. Confident rules
d) Consider a retail shop with the following set of transactions:

TID    List of items
100    11, 12, 15
200    12, 14
300    12, 13
400    11, 12, 14
500    11, 13
600    12, 13
700    11, 13
800    11, 12, 13, 15
900    11, 12, 13

Using the improved Apriori algorithm, find the association rules with a minimum
support of 22% and 70% confidence. (14 Marks)
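
Note: the level-wise search in part (d) can be sketched in a few lines. This is a plain Apriori (join, prune by frequent subsets, count), not any particular "improved" variant, and the item codes are taken exactly as printed in the table. Names such as `apriori` and `confident_rules` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Transactions TID 100..900, item codes as printed in the question.
TRANSACTIONS = [
    {11, 12, 15}, {12, 14}, {12, 13}, {11, 12, 14}, {11, 13},
    {12, 13}, {11, 13}, {11, 12, 13, 15}, {11, 12, 13},
]


def apriori(transactions, min_support):
    """Return {frequent itemset: support count}, found level by level."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count step: keep candidates that meet the support threshold.
        level = {}
        for candidate in candidates:
            count = sum(1 for t in transactions if candidate <= t)
            if Fraction(count, n) >= min_support:
                level[candidate] = count
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confident_rules(frequent, min_confidence):
    """Generate lhs -> rhs rules from frequent itemsets of size >= 2."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = Fraction(count, frequent[lhs])
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


frequent = apriori(TRANSACTIONS, Fraction(22, 100))
rules = confident_rules(frequent, Fraction(70, 100))
for lhs, rhs, confidence in sorted(rules, key=lambda r: (len(r[0]), sorted(r[0]))):
    print(sorted(lhs), "->", sorted(rhs), f"confidence {float(confidence):.0%}")
```

With nine transactions, 22% support corresponds to an itemset appearing in at least two of them, which is why the threshold is compared as an exact fraction rather than a rounded count.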

QUESTION FOUR
a) Discuss four types of distances in clustering (4 Marks)
b) Using appropriate examples, describe the following types of data (4 Marks)
i. Ordinal data
ii. Nominal data
c) Discuss four ways in which the data that has been mined can be visually presented to
the user. (4 Marks)
d) Describe four categories of data mining systems showing the basis for the
categorization for each. (4 Marks)
e) Discuss four applications of data mining in real life (4 Marks)

QUESTION FIVE
The table below contains the training data used to classify animals. Read it and answer
the questions that follow.

Name        Eggs  Pouch  Flies  Feathers  Class
Cockatoo    Yes   No     Yes    Yes       Bird
Dugong      No    No     No     No        Mammal
Echidna     Yes   Yes    No     No        Marsupial
Emu         Yes   No     No     Yes       Bird
Kangaroo    No    Yes    No     No        Marsupial
Koala       No    Yes    No     No        Marsupial
Kookaburra  Yes   No     Yes    Yes       Bird
Owl         Yes   No     Yes    Yes       Bird
Penguin     Yes   No     No     Yes       Bird
Platypus    Yes   No     No     No        Mammal
a) Using the split algorithm, find the attribute that has the highest information gain.
(16 Marks)
b) Draw the decision tree for the table above (4 Marks)
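
Note: the calculation asked for in part (a) can be checked with the sketch below. It assumes that "the split algorithm" refers to entropy-based information gain as used in ID3-style decision tree induction; the paper does not define the term, so treat that as an assumption. The helper names are illustrative.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Eggs", "Pouch", "Flies", "Feathers"]

# Rows copied from the training table: name, four attributes, class.
ROWS = [
    ("Cockatoo", "Yes", "No", "Yes", "Yes", "Bird"),
    ("Dugong", "No", "No", "No", "No", "Mammal"),
    ("Echidna", "Yes", "Yes", "No", "No", "Marsupial"),
    ("Emu", "Yes", "No", "No", "Yes", "Bird"),
    ("Kangaroo", "No", "Yes", "No", "No", "Marsupial"),
    ("Koala", "No", "Yes", "No", "No", "Marsupial"),
    ("Kookaburra", "Yes", "No", "Yes", "Yes", "Bird"),
    ("Owl", "Yes", "No", "Yes", "Yes", "Bird"),
    ("Penguin", "Yes", "No", "No", "Yes", "Bird"),
    ("Platypus", "Yes", "No", "No", "No", "Mammal"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the whole set minus the weighted entropy after the split."""
    labels = [row[-1] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Attribute columns start at index 1 because index 0 holds the animal name.
gains = {name: information_gain(ROWS, i + 1) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name:<9}{gain:.4f}")
print("Best split:", best)
```

On this table Feathers separates all five birds from the rest, so it comes out with the highest gain (about 1 bit), followed by Pouch, Eggs and Flies. The attribute with the highest gain becomes the root of the tree in part (b).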
