Unit 4
INTRODUCTION
j) Apriori principle: If an itemset is frequent, then all of its subsets are also
frequent. For example, if {c, d, e} is a frequent itemset, then its subsets
{c}, {d}, {e}, {c, d}, {d, e}, {c, e} must also be frequent.
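As a minimal Python sketch of the example above, the non-empty proper subsets of {c, d, e} can be enumerated with itertools:

    from itertools import combinations

    itemset = ('c', 'd', 'e')   # a frequent itemset from the example above
    # every non-empty proper subset must also be frequent (Apriori principle)
    subsets = [set(s) for r in range(1, len(itemset))
               for s in combinations(itemset, r)]
    print(subsets)   # six subsets: {c}, {d}, {e}, {c,d}, {c,e}, {d,e}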
Que 3: Explain Apriori algorithm in detail with
example.
(Or)
How to find frequent item sets using candidate
generation?
Once the frequent itemsets are generated, strong association rules can be derived
using the confidence of each rule:

    confidence(A ⇒ B) = support_count(A ∪ B) / support_count(A)

where,
support_count(A ∪ B): the number of transactions containing both A and B together.
support_count(A): the number of transactions containing A.
Association rule generation is a two-step process.
1) Generate all non-empty proper subsets of the frequent itemset.
2) For every subset s, compute the confidence of the rule s ⇒ (frequent itemset − s)
and retain the rule if

    support_count(frequent itemset) / support_count(s) ≥ minimum confidence
For example,
consider the frequent itemset {I1, I2, I5}. Its non-empty proper subsets are
{I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, {I5}, and the
possible association rules are:
1) {I1, I2} ⇒ I5
2) {I1, I5} ⇒ I2
3) {I2, I5} ⇒ I1
4) I1 ⇒ {I2, I5}
5) I2 ⇒ {I1, I5}
6) I5 ⇒ {I1, I2}
If the minimum confidence is, say, 70%, then the 2nd, 3rd, and 6th association
rules will be retained.
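The two-step procedure can be sketched in Python. The support counts below are hypothetical, chosen only so that the 2nd, 3rd, and 6th rules clear the 70% threshold as stated; in practice they come from the frequent-itemset mining step:

    from itertools import combinations

    # Hypothetical support counts, for illustration only
    support_count = {
        frozenset({'I1', 'I2', 'I5'}): 2,
        frozenset({'I1', 'I2'}): 4, frozenset({'I1', 'I5'}): 2,
        frozenset({'I2', 'I5'}): 2,
        frozenset({'I1'}): 6, frozenset({'I2'}): 7, frozenset({'I5'}): 2,
    }

    def rules(itemset, min_conf):
        l = frozenset(itemset)
        for r in range(1, len(l)):                 # all non-empty proper subsets
            for s in map(frozenset, combinations(l, r)):
                conf = support_count[l] / support_count[s]
                if conf >= min_conf:               # keep only strong rules
                    yield set(s), set(l - s), conf

    for antecedent, consequent, conf in rules({'I1', 'I2', 'I5'}, 0.70):
        print(antecedent, '=>', consequent, f'({conf:.0%})')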
Que 5: What are the procedures for generating
candidates in the Apriori algorithm?
(Or)
What are the techniques used in apriori_gen?
Ans:
Brute force method: all possible k-itemsets are enumerated as candidates, and
infrequent candidate k-itemsets are deleted.
Fk-1 * F1 method: as the name suggests, each frequent (k-1)-itemset is combined
with frequent 1-itemsets to generate the candidate k-itemsets.
Fk-1 * Fk-1 method: the set of frequent (k-1)-itemsets is joined with itself to
generate Ck. In this method, two (k-1)-itemsets are merged only if their first
k-2 elements are identical and only their last elements differ.
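A Python sketch of the Fk-1 * Fk-1 join, together with the standard subset-based pruning step; representing itemsets as sorted tuples is an assumption of this sketch:

    from itertools import combinations

    def apriori_gen(freq_k_minus_1):
        """freq_k_minus_1: set of sorted tuples, each a frequent (k-1)-itemset."""
        candidates = set()
        for a in freq_k_minus_1:
            for b in freq_k_minus_1:
                # join step: first k-2 elements identical, last elements differ
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = a + (b[-1],)
                    # prune step: every (k-1)-subset must itself be frequent
                    if all(s in freq_k_minus_1
                           for s in combinations(cand, len(cand) - 1)):
                        candidates.add(cand)
        return candidates

    # Joining {1,2},{1,3},{2,3},{1,5} yields (1,2,3), (1,2,5), (1,3,5); pruning
    # removes (1,2,5) and (1,3,5) because (2,5) and (3,5) are not frequent.
    print(apriori_gen({(1, 2), (1, 3), (2, 3), (1, 5)}))   # {(1, 2, 3)}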
Que 6: What are the methods for generating support
counts for candidates?
(Or)
Discuss support counting using a hash tree.
• In level 1
o Itemsets with 1, 4, or 7 as the first element are mapped to the left
subtree.
o Itemsets with 2, 5, or 8 as the first element are mapped to the middle
subtree.
o Itemsets with 3, 6, or 9 as the first element are mapped to the right
subtree.
• In level 2
o Consider the leftmost subtree and apply the same hash function used at
level 1.
o Itemsets with 1, 4, or 7 as the second element are mapped to the left
subtree.
o Itemsets with 2, 5, or 8 as the second element are mapped to the middle
subtree.
o Itemsets with 3, 6, or 9 as the second element are mapped to the right
subtree.
Repeat the same process for every subtree until all candidate 3-itemsets are
stored at leaf nodes.
Now, the candidate itemsets stored at the leaf nodes are compared against each
transaction. If a candidate is a subset of the transaction, its support count is
incremented.
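The branch mapping above (1, 4, 7 to the left; 2, 5, 8 to the middle; 3, 6, 9 to the right) matches the hash function h(p) = p mod 3. Below is a simplified Python sketch, assuming a fixed-depth tree for 3-itemsets (a real hash tree splits a leaf only when it overflows):

    from itertools import combinations

    def h(item):
        return item % 3        # 1,4,7 -> 1; 2,5,8 -> 2; 3,6,9 -> 0

    class HashTree:
        def __init__(self, k=3):
            self.k = k
            self.root = {}     # nested dicts, one level per item position

        def insert(self, itemset):          # itemset: sorted tuple of length k
            node = self.root
            for item in itemset[:-1]:       # hash on each element in turn
                node = node.setdefault(h(item), {})
            node.setdefault(h(itemset[-1]), []).append([itemset, 0])

        def count(self, transaction):
            # hash every k-subset of the transaction down to a leaf
            for sub in combinations(sorted(transaction), self.k):
                node = self.root
                for item in sub[:-1]:
                    node = node.get(h(item), {})
                for entry in node.get(h(sub[-1]), []):
                    if entry[0] == sub:
                        entry[1] += 1       # candidate is a subset: increment

    tree = HashTree()
    for cand in [(1, 2, 3), (1, 4, 7), (2, 5, 8), (1, 2, 5)]:
        tree.insert(cand)
    tree.count([1, 2, 3, 5])    # increments counts of (1,2,3) and (1,2,5)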
[Figure: itemset lattice over the items {A, B, C, D, E}, from the null set at
the top down to the full itemset ABCDE at the bottom.]
In the above figure, if 'AD' is a frequent itemset and all supersets containing
'AD' are infrequent, then 'AD' is called a maximal frequent itemset.
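A small Python sketch of this check, assuming the set of all frequent itemsets is already available; by the Apriori principle it is enough to test the immediate (one-item-larger) supersets:

    def is_maximal(itemset, frequent, all_items):
        """frequent: set of frozensets; all_items: the full item universe."""
        s = frozenset(itemset)
        # maximal: s is frequent and no one-item extension of s is frequent
        return s in frequent and all(
            s | {item} not in frequent for item in all_items - s)

    frequent = {frozenset('A'), frozenset('D'), frozenset('AD')}
    print(is_maximal('AD', frequent, set('ABCDE')))   # True: no frequent superset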
Example of FP-Growth
• We first consider I5, which is the last item in L, rather than the first.
I5 occurs in two branches of the FP-tree. The paths formed by these
branches are {(I2, I1, I5: 1), (I2, I1, I3, I5: 1)}. Therefore,
considering I5 as a suffix, its corresponding two prefix paths are (I2,
I1: 1) and (I2, I1, I3: 1), which form its conditional pattern base. Its
conditional FP-tree contains only a single path, {I2: 2, I1: 2}; I3 is not
included because its support count of 1 is less than the minimum
support count. The single path generates all the combinations of
frequent patterns: {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}.
• For I4, its two prefix paths form the conditional pattern base, {{I2, I1:
1}, {I2: 1}}, which generates a single-node conditional FP-tree, {I2: 2},
and derives one frequent pattern, {I2, I4: 2}.
• Similar to the above analysis, I3's conditional pattern base is {{I2, I1:
2}, {I2: 2}, {I1: 2}}. Its conditional FP-tree has two branches, (I2: 4, I1:
2) and (I1: 2), as shown in the figure, which generate the set of patterns
{{I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}}.
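The single-path case in the I5 step can be sketched directly: every combination of nodes along the path, appended to the suffix, is a frequent pattern whose support is the minimum count among the chosen nodes. The helper below is a hypothetical illustration, not the full FP-Growth recursion:

    from itertools import combinations

    def single_path_patterns(path, suffix):
        """path: (item, count) pairs along a single-path conditional FP-tree."""
        for r in range(1, len(path) + 1):
            for combo in combinations(path, r):
                pattern = {item for item, _ in combo} | set(suffix)
                support = min(count for _, count in combo)
                yield pattern, support

    # I5's conditional FP-tree is the single path (I2: 2, I1: 2)
    for pattern, support in single_path_patterns([('I2', 2), ('I1', 2)], {'I5'}):
        print(pattern, support)   # {I2,I5}: 2, {I1,I5}: 2, {I2,I1,I5}: 2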