
How Does the Apriori Algorithm Work?

1. Identifying Frequent Itemsets: The algorithm begins by scanning the dataset to count individual items (1-itemsets) and their frequencies. It then applies a minimum support threshold, which determines whether an itemset is considered frequent.

2. Generating Candidate Itemsets: Once frequent 1-itemsets (single items) are identified, the algorithm generates candidate 2-itemsets by combining frequent items. This process continues iteratively, forming larger itemsets (k-itemsets) until no more frequent itemsets can be found.

3. Pruning Infrequent Itemsets: The algorithm employs a pruning technique based on the Apriori property, which states that if an itemset is infrequent, all of its supersets must also be infrequent. This significantly reduces the number of candidate combinations that need to be evaluated.

4. Generating Association Rules: After identifying frequent itemsets, the algorithm generates association rules that describe how items relate to one another, using metrics like support, confidence, and lift to evaluate the strength of these relationships. A minimal end-to-end sketch of steps 1–3 follows below.
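
Putting the first three steps together, here is a minimal sketch in Python. The five-transaction dataset and the 50% minimum support are assumptions for illustration, not the document's actual data; rule generation (step 4) is worked through in Step 5 below.

```python
# Hypothetical transactions, assumed for illustration only.
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Milk"},
]
min_support = 0.5  # itemset must appear in at least 50% of transactions

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Level 1: frequent single items.
items = {item for t in transactions for item in t}
levels = [{frozenset([i]) for i in items if support({i}) >= min_support}]

# Levels k = 2, 3, ...: join frequent (k-1)-itemsets, prune by support.
# Joining only frequent itemsets is exactly the Apriori-property pruning.
k = 2
while levels[-1]:
    candidates = {a | b for a in levels[-1] for b in levels[-1] if len(a | b) == k}
    levels.append({c for c in candidates if support(c) >= min_support})
    k += 1

for level in levels[:-1]:  # the last level is empty by construction
    print([tuple(sorted(s)) for s in level])
```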

Key Metrics of Apriori Algorithm

• Support: This metric measures how frequently an itemset appears in the dataset relative to the total number of transactions. A higher support indicates a more significant presence of the itemset in the dataset. In short, support tells us how often a particular item or combination of items appears across all transactions (“Bread is bought in 20% of all transactions.”)

• Confidence: Confidence assesses the likelihood that an item Y is purchased when item X is purchased, providing insight into the strength of the association between the two items. In short, confidence tells us how often items go together. (“If bread is bought, butter is bought 75% of the time.”)

• Lift: Lift evaluates how much more likely two items are to be purchased together compared to being purchased independently. A lift greater than 1 suggests a positive association, a lift of 1 indicates independence, and a lift below 1 suggests a negative association. In short, lift shows how strong the connection is between items. (“Bread and butter are much more likely to be bought together than by chance.”) All three metrics are shown as simple functions in the sketch after this list.
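
Expressed as code, the three metrics are one-liners. A minimal sketch, using a hypothetical transaction list (the data below is assumed, not taken from the document):

```python
# Hypothetical five-transaction dataset, assumed for illustration.
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Milk"},
]

def support(itemset):
    # Support(A) = transactions containing A / total transactions
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

def confidence(x, y):
    # Confidence(X -> Y) = Support(X u Y) / Support(X)
    return support(set(x) | set(y)) / support(x)

def lift(x, y):
    # Lift(X -> Y) = Confidence(X -> Y) / Support(Y)
    return confidence(x, y) / support(y)

print(support({"Bread"}))                 # 0.8
print(confidence({"Bread"}, {"Butter"}))  # 0.6 / 0.8 = 0.75
print(lift({"Bread"}, {"Butter"}))        # 0.75 / 0.6 = 1.25 (> 1: positive)
```
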
Step 1: Setting the Parameters

• Minimum Support Threshold: 50% (an itemset must appear in at least 3 of the 5 transactions). This threshold is computed from the formula:

Support(A) = (Number of transactions containing itemset A) / (Total number of transactions)

• Minimum Confidence Threshold: 70% (you can change the parameter values to suit the use case and problem statement). This threshold is computed from the formula:

Confidence(X→Y) = Support(X∪Y) / Support(X)

Step 2: Find Frequent 1-Itemsets

Let's count how many transactions include each item in the dataset (i.e., calculate the frequency of each item).

All items have support ≥ 50%, so they qualify as frequent 1-itemsets. Any item with support below 50% would be omitted from the frequent 1-itemsets.
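
The original table of counts is not reproduced in this text, so the sketch below assumes a hypothetical five-transaction dataset that is consistent with the support counts quoted in Step 5 (Bread: 4, Butter: 3, {Bread, Butter}: 3, {Bread, Milk}: 3):

```python
from collections import Counter

# Hypothetical dataset; the real transactions are not shown in the document.
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Milk"},
]

counts = Counter(item for t in transactions for item in t)
for item, n in sorted(counts.items()):
    pct = 100 * n / len(transactions)
    verdict = "frequent" if pct >= 50 else "omitted"
    print(f"{item}: {n}/5 = {pct:.0f}% -> {verdict}")
# Bread: 4/5 = 80%, Butter: 3/5 = 60%, Milk: 4/5 = 80% -> all frequent
```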

Step 3: Generate Candidate 2-Itemsets

Combine the frequent 1-itemsets into pairs and calculate their support.

For this use case, we get 3 item pairs, (Bread, Butter), (Bread, Milk) and (Butter, Milk), and calculate their support as in Step 2.
Frequent 2-itemsets:

• {Bread, Butter} and {Bread, Milk} both meet the 50% threshold, but {Butter, Milk} does not meet the threshold, so it is omitted.
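
Under the same assumed dataset, the pair supports can be checked with itertools.combinations:

```python
from itertools import combinations

# Same hypothetical dataset as in Step 2.
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Milk"},
]

for pair in combinations(["Bread", "Butter", "Milk"], 2):
    n = sum(set(pair) <= t for t in transactions)
    verdict = "kept" if n / len(transactions) >= 0.5 else "omitted"
    print(f"{pair}: {n}/5 -> {verdict}")
# ('Bread', 'Butter'): 3/5 -> kept
# ('Bread', 'Milk'): 3/5 -> kept
# ('Butter', 'Milk'): 2/5 -> omitted
```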

Step 4: Generate Candidate 3-Itemsets

• Combine the frequent 2-itemsets into groups of 3 and calculate their support.

• For the triplets, we have only one candidate, {Bread, Butter, Milk}, and we calculate its support.

Since this does not meet the 50% threshold, there are no frequent 3-itemsets.
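
The single triplet candidate can be checked the same way (again on the assumed dataset):

```python
# Same hypothetical dataset as in the previous steps.
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Milk"},
]

triple = {"Bread", "Butter", "Milk"}
n = sum(triple <= t for t in transactions)
print(f"{sorted(triple)}: {n}/5 = {100 * n / 5:.0f}%")  # 2/5 = 40% < 50% -> pruned
```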

Step 5: Generate Association Rules

Now we generate rules from the frequent itemsets and calculate confidence.

Rule 1: Bread → Butter (if a customer buys bread, the customer will also buy butter)

• Support of {Bread, Butter} = 3.

• Support of {Bread} = 4.

• Confidence = 3/4 = 75% (Passes threshold).

Rule 2: Butter → Bread (if a customer buys butter, the customer will also buy bread)

• Support of {Bread, Butter} = 3.

• Support of {Butter} = 3.

• Confidence = 3/3 = 100% (Passes threshold).

Rule 3: Bread → Milk (if a customer buys bread, the customer will also buy milk)

• Support of {Bread, Milk} = 3.

• Support of {Bread} = 4.

• Confidence = 3/4 = 75% (Passes threshold).
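
The confidences above follow directly from the stated support counts, and the lift for the Bread/Butter rule can be derived from the same numbers (total transactions = 5, per Step 1). A short check:

```python
total = 5  # total transactions, per Step 1

# Support counts as stated in the rules above.
count = {
    frozenset({"Bread"}): 4,
    frozenset({"Butter"}): 3,
    frozenset({"Bread", "Butter"}): 3,
    frozenset({"Bread", "Milk"}): 3,
}

def confidence(x, y):
    # Confidence(X -> Y) = count(X u Y) / count(X)
    return count[frozenset(x | y)] / count[frozenset(x)]

print(confidence({"Bread"}, {"Butter"}))  # 3/4 = 0.75 -> passes 70%
print(confidence({"Butter"}, {"Bread"}))  # 3/3 = 1.00 -> passes 70%
print(confidence({"Bread"}, {"Milk"}))    # 3/4 = 0.75 -> passes 70%

# Lift(Bread -> Butter) = Confidence / Support(Butter) = 0.75 / (3/5) = 1.25
print(confidence({"Bread"}, {"Butter"}) / (count[frozenset({"Butter"})] / total))
```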
