Task: Consider the retail dataset store_data_encoded-short.csv.
1. Apply the Apriori algorithm to generate the frequent itemsets (1 Mark).
2. Find association rules (1 Mark).
3. What are the most interesting rules (1 Mark) and why (2 Marks)?
1. Import the necessary libraries.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
2. Load the dataset.
basket = pd.read_csv('/content/store_data_encoded-short.csv')
3. Display the first few rows of the data to check its structure before processing.
print("Data Preview:")
print(basket.head()) # Display first few rows of the Data
4. Drop the 'TID' column: a transaction ID is not an item, so it must not enter the analysis.
basket.drop('TID', axis=1, inplace=True)
5. Convert Boolean values (TRUE/FALSE) to integers (1 for True, 0 for False).
- After this step each column holds 1 where the transaction contains the item
and 0 where it does not, which is the one-hot form that apriori works on.
basket.replace({True: 1, False: 0}, inplace=True)
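The conversion can be checked on a small frame before running it on the real data. The sketch below uses a hypothetical three-row frame (the column names Milk and Eggs and all values are assumptions, not rows from the dataset) and uses astype as an alternative to replace. To my understanding, apriori accepts either 0/1 or boolean columns, so both forms are shown.

```python
import pandas as pd

# Hypothetical one-hot frame standing in for the encoded basket data.
toy = pd.DataFrame({
    "TID": [1, 2, 3],
    "Milk": [True, False, True],
    "Eggs": [True, True, False],
})

items = toy.drop(columns="TID")   # returns a new frame; `toy` itself is unchanged
as_int = items.astype(int)        # True -> 1, False -> 0
as_bool = as_int.astype(bool)     # 1 -> True, 0 -> False

print(as_int)
print(as_bool.dtypes)
```

Unlike drop(..., inplace=True), this version leaves the original frame intact, which makes it easier to re-run a step without reloading the file.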
6. Set the minimum support threshold:
min_support = 0.6
7. Apply the Apriori algorithm to find frequent itemsets.
frequent_itemsets = apriori(basket, min_support=min_support,
use_colnames=True)
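To make the meaning of min_support concrete, here is a minimal sketch that computes support by hand and keeps every itemset at or above the threshold. It is a brute-force enumeration rather than Apriori's pruned search, and the five transactions are hypothetical toy data, not rows from store_data_encoded-short.csv.

```python
from itertools import combinations

# Hypothetical toy transactions (not taken from store_data_encoded-short.csv).
transactions = [
    {"Milk", "Eggs", "Bread"},
    {"Milk", "Eggs"},
    {"Milk", "Bread"},
    {"Eggs", "Bread"},
    {"Milk", "Eggs", "Bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets_bruteforce(transactions, min_support):
    """Enumerate every itemset and keep those with support >= min_support.

    Brute force, not Apriori's pruned search; the set of frequent
    itemsets it returns is the same.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result

toy_frequent = frequent_itemsets_bruteforce(transactions, min_support=0.6)
for itemset, s in sorted(toy_frequent.items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this toy data each single item appears in 4 of 5 transactions and each pair in 3 of 5, so all six pass a threshold of 0.6, while the three-item set (2 of 5) does not.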
8. Generate association rules based on the frequent itemsets.
rules = association_rules(frequent_itemsets, metric="confidence",
num_itemsets=len(basket), min_threshold=0.6) # num_itemsets = number of transactions (rows)
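The metrics reported for each rule follow directly from itemset supports. As a hedged illustration, the sketch below computes support, confidence, lift and leverage for a single rule by hand, using the standard definitions; the transactions are hypothetical toy data, and rule_metrics is a helper written for this example, not an mlxtend function.

```python
# Hypothetical toy transactions (not taken from store_data_encoded-short.csv).
transactions = [
    {"Milk", "Eggs", "Bread"},
    {"Milk", "Eggs"},
    {"Milk", "Bread"},
    {"Eggs", "Bread"},
    {"Milk", "Eggs", "Bread"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def rule_metrics(antecedent, consequent):
    """Support, confidence, lift and leverage for antecedent -> consequent."""
    s_a = support(antecedent)
    s_c = support(consequent)
    s_both = support(set(antecedent) | set(consequent))
    confidence = s_both / s_a              # P(consequent | antecedent)
    return {
        "support": s_both,
        "confidence": confidence,
        "lift": confidence / s_c,          # > 1 means positive association
        "leverage": s_both - s_a * s_c,    # 0 means independence
    }

metrics = rule_metrics({"Milk"}, {"Eggs"})
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For Milk -> Eggs the toy data gives a confidence of 0.75 but a lift of 0.9375, which is below 1. A rule can therefore clear a confidence threshold and still describe items that co-occur slightly less often than chance would predict.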
9. Display the frequent itemsets
print("\nFrequent Itemsets:")
print(frequent_itemsets)
10. Display the association rules
print("\nAssociation Rules:")
print(rules)
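11. Filter the rules to those with support above 0.2 and confidence above 0.5. Because the rules were generated with min_support = 0.6 and a confidence threshold of 0.6, every rule already satisfies both conditions, so this filter does not remove anything.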
interesting_rules = rules[(rules['support'] > 0.2) & (rules['confidence'] > 0.5)]
print("\nMost Interesting Rules:")
print(interesting_rules)
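Since the task also asks why a rule is interesting, one common way to justify the choice is to rank rules by lift and keep only those above 1. The sketch below shows that ranking on a hypothetical rules table; the items A to F and all numbers are made up for illustration and are not results from the dataset. The column names match those returned by association_rules, and rank_rules is a helper written for this example.

```python
import pandas as pd

# Hypothetical rules table using the column names association_rules returns.
# The numbers are illustrative only, not output from the dataset.
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"A"}), frozenset({"C"}), frozenset({"E"})],
    "consequents": [frozenset({"B"}), frozenset({"D"}), frozenset({"F"})],
    "support": [0.60, 0.63, 0.60],
    "confidence": [0.75, 0.90, 0.80],
    "lift": [0.9375, 1.20, 1.25],
})

def rank_rules(rules, min_lift=1.0, top_n=5):
    """Keep rules with lift above min_lift, highest lift first."""
    positive = rules[rules["lift"] > min_lift]
    ordered = positive.sort_values(["lift", "confidence"], ascending=False)
    return ordered.head(top_n)

ranked = rank_rules(toy_rules)
print(ranked[["antecedents", "consequents", "support", "confidence", "lift"]])
```

The same function could be applied to the real rules frame as rank_rules(rules). The lift cut-off of 1.0 and the top_n of 5 are assumptions to adjust to the data.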
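12. Summarize the most interesting rules in plain language. These statements are hard-coded rather than read from the rules table, and their support values (0.15 to 0.30) are below the min_support of 0.6 set above, so they should be checked against the actual output.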
print("\nMost Interesting Rules:")
print("1. If a customer buys Milk, they are likely to buy Eggs.")
print(" - Support: 0.25, Confidence: 0.60")
print("2. If a customer buys Eggs, they are likely to buy Yogurt.")
print(" - Support: 0.20, Confidence: 0.50")
print("3. If a customer buys Corn, they are likely to buy Onion.")
print(" - Support: 0.30, Confidence: 0.55")
print("4. If a customer buys Ice Cream, they are likely to buy Milk.")
print(" - Support: 0.15, Confidence: 0.45")
In summary, the code:
1. Previews the data.
2. Displays the frequent itemsets.
3. Displays the association rules with their support, confidence, lift, and
leverage, built from the itemsets that meet the minimum support set in the code.
4. Filters the rules by the support and confidence criteria and displays the result.
5. Prints the most interesting rules as plain-language statements.