Compact Representation of Frequent Itemsets
Compact Representation of Frequent Itemsets
The number of frequent itemsets generated by the Apriori algorithm can often be very large, so
it is beneficial to identify a small representative set from which every frequent itemset can be
derived. There are two types of approaches we have. One such approach is using maximal
frequent itemsets. And another one is closed frequent item set.
A maximal frequent itemset is a frequent itemset for which none of its immediate supersets
are frequent. To illustrate this concept, consider the example given below:
The support counts are shown on the top left of each node. Assume support count threshold
= 50%, that is, each item must occur in 2 or more transactions. Based on that threshold, the
frequent itemsets are a, b, c, d, ab, ac, and ad (shaded nodes).
Out of these 7 frequent itemsets, 3 are identified as maximal frequent (having red
outline):
ab: Immediate supersets abc and abd are infrequent.
ac: Immediate supersets abc and acd are infrequent.
ad: Immediate supersets abd and bcd are infrequent.
The remaining 4 frequent nodes (a, b, c, and d) cannot be maximal frequent because they all
have at least 1 immediate superset that is frequent.
Advantage: Maximal frequent itemsets provide a compact representation of all the frequent
itemsets for a particular dataset. In the above example, all frequent itemsets are subsets of the
maximal frequent itemsets, since we can obtain sets a, b, c, and d by enumerating subsets of ab,
ac, and ad (including the maximal frequent itemsets themselves).
Disadvantage: The support count of maximal frequent itemsets does not provide any
information about the support count of their subsets. This means that an additional traversal of
data is needed to determine the support count for non-maximal frequent itemsets, which may
be undesirable in certain cases.
It is a frequent itemset that is both closed and its support is greater than or equal to minsup. An
itemset is closed in a data set if there exists no superset that has the same support count as this
original itemset.
In the above example Closed item sets are a, ab, ac, ad, abc, abd.
Closed frequency item set is{ a, ab, ac, ad, abc, abd }