4. Apriori Algorithm
Recap: Steps for Finding Frequent Itemsets
1 Prepare data and set minsup
2 Create a list of frequent itemsets (support ≥ minsup) of length 1
3 Create a list of itemsets of length 2 by combining the frequent itemsets of length 1
4 Prune itemsets whose support is less than minsup
5 Create a list of itemsets of length 3 from the pruned list
6 Prune itemsets whose support is less than minsup
• Continue in the same way: lengthen the itemsets by one item per round and keep only those with support ≥ minsup.
• Stop when no frequent itemsets of the new length remain.
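The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the mlxtend implementation: the function name `apriori_sketch` and the toy transactions are hypothetical, and candidates are built by simply taking unions of pairs of frequent itemsets from the previous round.

```python
from itertools import combinations


def apriori_sketch(transactions, minsup):
    """Return {itemset: support} for every itemset with support >= minsup."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset
        return sum(itemset <= b for b in baskets) / n

    def prune(candidates):
        # Keep only candidates whose support reaches minsup
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= minsup}

    # Frequent itemsets of length 1
    current = prune({frozenset([item]) for b in baskets for item in b})
    frequent = {}
    k = 1
    while current:  # stop when no frequent itemsets of this length remain
        frequent.update(current)
        k += 1
        # Combine frequent itemsets of length k-1 into candidates of length k
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        current = prune(candidates)
    return frequent


# Hypothetical toy data, not the dataset used in these slides
transactions = [
    {"bread", "milk"},
    {"bread", "eggs", "milk"},
    {"eggs", "milk"},
    {"bread", "eggs"},
]
frequent = apriori_sketch(transactions, minsup=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With minsup = 0.5, the three single items and the three pairs survive, while the only length-3 candidate (support 0.25) is pruned, so the loop stops.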
Recap: Association Rule Selection
• Step 1. Generate rules from frequent itemsets
• Step 2. Select rules: Confidence ≥ minconf
• Step 3. Select rules: Lift > 1.0
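Steps 2 and 3 can be illustrated with a small sketch that computes confidence and lift directly from transaction counts. The helper names `rule_metrics` and `keep_rule`, the toy transactions, and the minconf value of 0.7 are assumptions made for illustration only.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    a, c = frozenset(antecedent), frozenset(consequent)
    count_a = sum(a <= b for b in baskets)          # baskets containing the antecedent
    count_c = sum(c <= b for b in baskets)          # baskets containing the consequent
    count_ac = sum((a | c) <= b for b in baskets)   # baskets containing both
    confidence = count_ac / count_a                 # support(A and C) / support(A)
    lift = confidence / (count_c / n)               # confidence / support(C)
    return {"support": count_ac / n, "confidence": confidence, "lift": lift}


def keep_rule(metrics, minconf):
    # Step 2: confidence >= minconf; Step 3: lift > 1.0
    return metrics["confidence"] >= minconf and metrics["lift"] > 1.0


# Hypothetical toy data, not the dataset used in these slides
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"eggs"},
    {"bread"},
]
bread_milk = rule_metrics(transactions, {"bread"}, {"milk"})
eggs_bread = rule_metrics(transactions, {"eggs"}, {"bread"})
print(bread_milk, keep_rule(bread_milk, minconf=0.7))
print(eggs_bread, keep_rule(eggs_bread, minconf=0.7))
```

Here bread → milk has confidence 0.75 and lift of about 1.25, so it is kept; eggs → bread has confidence 0.5 and lift below 1, so it is dropped.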
Frequent Itemsets
# Import Apriori algorithm
from mlxtend.frequent_patterns import apriori
# Compute frequent itemsets
frequent_itemsets = apriori(onehot, min_support=0.0005,
                            max_len=4, use_colnames=True)
# Print number of itemsets
print(len(frequent_itemsets))
19788
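The `apriori` call expects `onehot` to be a one-hot encoded DataFrame: one row per transaction, one boolean column per item. The slides do not show how it was built, so the following is only a sketch of one possible way to build such a frame with pandas; the toy transactions are hypothetical. mlxtend also provides a `TransactionEncoder` in `mlxtend.preprocessing` for this step.

```python
import pandas as pd

# Hypothetical toy transactions, not the dataset used in these slides
transactions = [
    ["almonds", "burgers"],
    ["burgers", "eggs"],
    ["almonds", "eggs", "burgers"],
]

# One column per distinct item, in alphabetical order
items = sorted({item for t in transactions for item in t})

# One row per transaction; True where the transaction contains the item
onehot = pd.DataFrame(
    [[item in t for item in items] for t in transactions],
    columns=items,
)

# The mean of a boolean column is the support of that single item
print(onehot.mean())
```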
# Print frequent itemsets
print(frequent_itemsets.head())
    support             itemsets
0  0.020397            (almonds)
1  0.008932  (antioxydant juice)
2  0.004666          (asparagus)
3  0.033329            (avocado)
4  0.004533        (babies food)
Computing Association Rules
# Import association rules
from mlxtend.frequent_patterns import association_rules
# Compute association rules
Rules = association_rules(frequent_itemsets,
                          metric="support",
                          min_threshold=0.005)
Rules
                 antecedents             consequents  antecedent support  consequent support   support  confidence      lift
0                  (almonds)               (burgers)            0.020397            0.087188  0.005199    0.254902  2.923577
1                  (burgers)               (almonds)            0.087188            0.020397  0.005199    0.059633  2.923577
2                  (almonds)             (chocolate)            0.020397            0.163845  0.005999    0.294118  1.795099
3                (chocolate)               (almonds)            0.163845            0.020397  0.005999    0.036615  1.795099
4                  (almonds)                  (eggs)            0.020397            0.179709  0.006532    0.320261  1.782108
…                          …                       …                   …                   …         …           …         …
1935  (spaghetti, olive oil)              (pancakes)            0.022930            0.095054  0.005066    0.220930  2.324260
1936   (pancakes, olive oil)             (spaghetti)            0.010799            0.174110  0.005066    0.469136  2.694478
1937             (spaghetti)   (pancakes, olive oil)            0.174110            0.010799  0.005066    0.029096  2.694478
1938              (pancakes)  (spaghetti, olive oil)            0.095054            0.022930  0.005066    0.053296  2.324260
1939             (olive oil)   (spaghetti, pancakes)            0.065858            0.025197  0.005066    0.076923  3.052910
# Filter rules by antecedent support, support, confidence, and lift
filtered_rules = Rules[(Rules['antecedent support'] > 0.01) &
                       (Rules['support'] > 0.009) &
                       (Rules['confidence'] > 0.5) &
                       (Rules['lift'] > 1.00)]
filtered_rules
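                           antecedents      consequents  antecedent support  consequent support   support  confidence  leverage
1406               (ground beef, eggs)  (mineral water)            0.019997            0.238368  0.010132    0.506667  0.005365
1593  (ground beef, frozen vegetables)  (mineral water)            0.016931            0.238368  0.009199    0.543307  0.005163
1737               (ground beef, milk)  (mineral water)            0.021997            0.238368  0.011065    0.503030  0.005822
• The last column is leverage (support − antecedent support × consequent support), not lift.
• Lift = confidence / consequent support is about 2.13, 2.28, and 2.11 for these rules, so all three pass the lift > 1.00 filter.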