
UNIT 4

Association Analysis: Basic Concepts and Algorithms: Problem Definition, Frequent Itemset Generation, Apriori Principle, Apriori Algorithm, Rule Generation, Compact Representation of Frequent Itemsets, FP-Growth Algorithm. (Tan & Vipin)

INTRODUCTION

Que 1: What is association analysis?


Association analysis is used to discover interesting relationships hidden in large data sets. The relationships hidden between data elements are represented in the form of association rules.

For example, in a market basket dataset, a typical association rule is:

{Milk} → {Bread} [s = 2%, c = 50%]

This type of analysis is called market basket analysis.

Applications of Association Analysis:

1) Market basket analysis: Buying trends of customers can be analyzed, and retailers can make offers on sets of items to increase sales.
2) Earth science: Associations between ocean, land, and pollution measurements and their effect on temperature can be analyzed.
3) Medical diagnosis: Relationships between food habits, exercise, and disease can be analyzed.
Que 2: Write short notes on:
a) Binary representation
b) Itemset
c) Transaction width
d) Support count
e) Support
f) Confidence
g) Frequent itemset
h) Anti-monotone property
i) Support-based pruning
j) Apriori principle

a) Binary representation of association data:

Each transaction is represented as a row of 1s and 0s, where '1' represents the presence of an item in the transaction and '0' represents its absence.
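For illustration, a small Python sketch (the items and transactions are made up) that converts a list of market basket transactions into this 0/1 representation:

# Convert a list of transactions into a binary (0/1) representation.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
]

items = sorted(set().union(*transactions))            # all distinct items
binary = [[1 if item in t else 0 for item in items]   # 1 = present, 0 = absent
          for t in transactions]

print(items)
for row in binary:
    print(row)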

b) Itemset: Let I be the set of all items. In association analysis, a collection of item(s) forms an itemset.
{ } → null itemset
{brush, paste} → 2-itemset

c) Transaction width: The number of items present in a transaction is called the transaction width.

d) Support count: The number of transactions that contain the itemset, written σ(X).
For example, if {Bread, Milk} is present in 3 transactions (say transactions 1, 4 and 5), then the support count of the itemset {Bread, Milk} is 3.

e) Support: Support represents how often a rule is applicable to a dataset:

support(X → Y) = σ(X ∪ Y) / N

where X and Y are itemsets and N represents the total number of transactions. In other words, out of all N transactions, support measures how many contain both X and Y.

f) Confidence: Confidence represents how often Y appears in transactions containing X:

confidence(X → Y) = σ(X ∪ Y) / σ(X)

That is, out of all transactions containing X, confidence measures how many also contain Y.

g) Frequent itemset: An itemset is called a frequent itemset if its support is greater than or equal to the minimum support threshold defined by the user.

h) Anti-monotone property: The support of an itemset never exceeds the support of any of its subsets. For example, if {brush, toothpaste} has a support count of 5, the itemset {brush, toothpaste, bread} can never have a support count greater than 5.

i) Support-based pruning: If an itemset is infrequent, then all supersets containing that itemset must also be infrequent. For example, if the itemset {a, b} is infrequent, then every superset containing both 'a' and 'b' is also infrequent and can be pruned (removed) from the search.

j) Apriori principle: If an itemset is frequent, then all of its subsets are also frequent. For example, if {c, d, e} is a frequent itemset, then its subsets {c}, {d}, {e}, {c,d}, {c,e}, {d,e} must also be frequent.
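To make the definitions of support and confidence concrete, here is a small Python sketch; the transactions and the rule {Milk} → {Bread} are made up for illustration:

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer"},
    {"Milk", "Diapers", "Cola"},
    {"Bread", "Milk", "Diapers"},
    {"Bread", "Milk", "Cola"},
]
N = len(transactions)
X, Y = {"Milk"}, {"Bread"}

support = support_count(X | Y, transactions) / N                    # sigma(X u Y) / N
confidence = support_count(X | Y, transactions) / support_count(X, transactions)

print(f"support = {support:.2f}, confidence = {confidence:.2f}")    # 0.60, 0.75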
Que 3: Explain Apriori algorithm in detail with
example.
(Or)
How to find frequent item sets using candidate
generation?

Notation used in the above algorithm:
• k = size of the itemset
• Fk = frequent k-itemset
• I = set of items (in the example below, I = {I1, I2, I3, I4, I5})
• i = an individual item in I
• σ({i}) = support count of item i
• Ck = candidate k-itemset
• T = transaction database
• t = an individual transaction
• Ct = candidates in Ck that are present in transaction t
• c = an individual candidate (itemset) in the candidate set
• σ(c) = support count of candidate c
• apriori_gen = a function call that generates candidate sets (for example, the 2-item candidate set is generated from the frequent 1-itemsets)
Algorithm Explanation Step by step: //Optional
1. k is the size of the itemset. Its value is initially set to 1 because the 1-itemsets should be generated first.
2. The support count of each 1-itemset is derived; the itemsets satisfying the minimum support are kept and the rest are deleted (F1 is generated here).
a. Note: Here I is the set of items (in the example below, I = {I1, I2, I3, I4, I5}).
b. And i is an individual item.
c. σ({i}) is the support count of item i (the number of times the item i is found in our dataset).
3. Repeat the following process.
4. Increment the value of k to generate the (k+1)-item candidate set.
5. After generating the 1-item frequent set F1 in step 2, generate the 2-item candidate set using the function call apriori_gen on F1.
The 2-item candidate set C2 is generated from the 1-item frequent set using one of the following approaches:
i. Brute force approach
ii. Fk-1 × F1 approach
iii. Fk-1 × Fk-1 approach
6. Consider each individual transaction (t) from our transaction dataset (T), and
7. Use the function call 'subset' on Ck to identify the candidate itemsets present in that transaction. Let these itemsets be Ct.
Note: delete the itemsets from the 2-item candidate set whose support count is zero, and call the rest Ct.
8. For each individual itemset 'c' in 'Ct',
9. find the support count of each candidate c by scanning each transaction t and incrementing its count: σ(c) = σ(c) + 1.
10. End loop.
11. End loop.
12. Repeat step 4 to step 9 until the biggest frequent itemset is generated.
a. The support count σ(c) of each candidate c is measured, and the candidates whose support count satisfies the minimum support count are retained. This forms Fk.
13. This process is repeated until the support counts of all itemsets fall below the minimum support count.
14. At the end, we have the biggest frequent itemsets.
//
Example of apriori algorithm
Consider a Transaction dataset T

Apply apriori algorithm
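A minimal Python sketch of the loop above, assuming a nine-transaction dataset over the items I1 to I5 (the same items used in the FP-Growth example later) and a minimum support count of 2; the candidate generation uses a simplified Fk-1 × Fk-1 style merge followed by support-based pruning:

from itertools import combinations

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_support_count = 2

def support_count(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)

# F1: frequent 1-itemsets
items = {item for t in transactions for item in t}
frequent = [{frozenset([i]) for i in items
             if support_count({i}, transactions) >= min_support_count}]

k = 1
while frequent[-1]:
    prev = frequent[-1]
    # apriori_gen: merge frequent k-itemsets into (k+1)-item candidates,
    # then prune candidates that have an infrequent k-subset.
    candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
    candidates = {c for c in candidates
                  if all(frozenset(s) in prev for s in combinations(c, k))}
    # Count support by scanning the transaction database.
    frequent.append({c for c in candidates
                     if support_count(c, transactions) >= min_support_count})
    k += 1

for level, itemsets in enumerate(frequent, start=1):
    for itemset in sorted(itemsets, key=sorted):
        print(level, sorted(itemset))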


Que 4: How to generate association rules from
frequent item sets in apriori. (Or)
What is confidence based pruning?

Once frequent itemsets are generated, strong association rules can be generated using the confidence of the rule:

confidence(A ⇒ B) = support_count(A ∪ B) / support_count(A)

where
support_count(A ∪ B) is the number of transactions containing both A and B together, and
support_count(A) is the number of transactions containing A.
Association rule generation is a two-step process:
1) Generate all non-empty proper subsets S of a frequent itemset F.
2) For every subset S, output the rule S ⇒ (F − S) if

support_count(F) / support_count(S) ≥ minimum confidence

For example, consider the frequent itemset {I1, I2, I5}. Its non-empty proper subsets are:
{I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, {I5}, and the possible association rules are:
1) I1 ∧ I2 ⇒ I5   2) I1 ∧ I5 ⇒ I2   3) I2 ∧ I5 ⇒ I1
4) I1 ⇒ I2 ∧ I5   5) I2 ⇒ I1 ∧ I5   6) I5 ⇒ I1 ∧ I2

For instance, the confidence of rule 1, I1 ∧ I2 ⇒ I5, is

(number of transactions containing I1, I2, I5) / (number of transactions containing I1, I2) = 2/4 = 50%

If the minimum confidence is, say, 70%, then the 2nd, 3rd and 6th association rules are retained.
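A minimal Python sketch of this confidence-based pruning for the frequent itemset {I1, I2, I5}, assuming the same nine-transaction dataset as in the Apriori sketch above and a minimum confidence of 70%:

from itertools import combinations

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support_count(itemset, transactions):
    return sum(1 for t in transactions if set(itemset) <= t)

frequent_itemset = frozenset({"I1", "I2", "I5"})
min_confidence = 0.70

whole = support_count(frequent_itemset, transactions)
for size in range(1, len(frequent_itemset)):                 # non-empty proper subsets
    for antecedent in combinations(sorted(frequent_itemset), size):
        consequent = frequent_itemset - set(antecedent)
        confidence = whole / support_count(antecedent, transactions)
        status = "retained" if confidence >= min_confidence else "pruned"
        print(f"{set(antecedent)} => {set(consequent)}: {confidence:.0%} ({status})")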
Que 5: What are the procedures for generating
candidates in apriori algorithm?
(Or)
What are the techniques used in apriori_gen?
Ans:
Brute force method: All possible k-itemsets are generated, and the infrequent k-item candidates are deleted.

Fk-1 × F1 method: As the name suggests, each frequent (k-1)-itemset is combined with frequent 1-itemsets to generate the k-item candidate set.

Fk-1 × Fk-1 method: The frequent (k-1)-itemsets are combined with each other to generate Ck. Two (k-1)-itemsets are merged only if their first k-2 items are identical and they differ only in their last item.
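A small Python sketch of the Fk-1 × Fk-1 merge condition (the frequent 2-itemsets below are made up for illustration); pruning of candidates with infrequent subsets would follow this step:

def apriori_gen(frequent_prev):
    """frequent_prev: sorted tuples, each a frequent (k-1)-itemset."""
    candidates = set()
    for i, a in enumerate(frequent_prev):
        for b in frequent_prev[i + 1:]:
            # Merge only when the first k-2 items are identical and
            # only the last item differs.
            if a[:-1] == b[:-1] and a[-1] != b[-1]:
                candidates.add(tuple(sorted(set(a) | set(b))))
    return sorted(candidates)

F2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
print(apriori_gen(F2))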
Que 6: What are the methods for generating support
counts for candidates?
(Or)
Discuss about support counting using Hash Tree.

There are three methods for generating support counts for candidates:

1) Compare itemsets against each transaction
2) Enumerate the itemsets of each transaction
3) Support counting using a hash tree

1) Compare itemsets against each transaction: Compare each candidate itemset against each transaction and increment the support counts of the candidates contained in that transaction.

2) Enumerate the itemsets:

Arrange the items of each transaction in increasing (lexicographic) order. In level 1, take 1, 2, and 3 as prefixes. In level 2, take each set from level one and use the next elements as prefixes. This process repeats until the 3-itemsets contained in the transaction are generated.
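As a small sketch, the 3-itemsets contained in a single transaction can be enumerated in this prefix order with itertools; the transaction {1, 2, 3, 5, 6} is used only for illustration:

from itertools import combinations

transaction = (1, 2, 3, 5, 6)
# Enumerate every 3-itemset of the transaction in lexicographic order;
# subsets beginning with 1 come first, then 2, then 3 (the level-1 prefixes).
for subset in combinations(transaction, 3):
    print(subset)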
3) Support counting using a hash tree:

Here, candidate itemsets are hashed into different buckets and stored in a hash tree.

In a hash tree, the user needs to specify two things:
1) The hash function: here it is h(p) = p mod 3
2) Max leaf size: the maximum number of itemsets stored in a leaf node (if the number of candidate itemsets exceeds the max leaf size, the node is split)
For example,
1 mod 3 = 1    2 mod 3 = 2    3 mod 3 = 0
4 mod 3 = 1    5 mod 3 = 2    6 mod 3 = 0
7 mod 3 = 1    8 mod 3 = 2    9 mod 3 = 0

• In level 1
o The itemsets containing 1, 4 and 7 as first elements are mapped to
left side of the tree.
o The itemsets containing 2, 5, and 8 as first elements are mapped
to middle of the tree.
o The itemsets containing 3, 6, and 9 as first element are mapped to
right side of the tree.

• In level 2
o Consider the leftmost sub tree, and apply the same hash function
applied at level one.
o The itemsets containing 1, 4 and 7 as second elements are mapped
to left side of the tree.
o The itemsets containing 2, 5, and 8 as second elements are
mapped to middle of the tree.
o The itemsets containing 3, 6, and 9 as second elements are mapped to the right side of the tree.
Repeat the same process for every subtree until all candidate 3-itemsets are placed in leaf nodes.
Now, the candidate itemsets stored at the leaf nodes are compared against each transaction; if a candidate is a subset of the transaction, its support count is incremented.
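A simplified Python sketch of this idea: candidate 3-itemsets are placed into leaves addressed by hashing their first, second and third items with h(p) = p mod 3; the max-leaf-size splitting rule is omitted, and the example candidate itemsets are chosen only for illustration:

from collections import defaultdict

def h(p):
    return p % 3

candidates = [(1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8),
              (1, 5, 9), (1, 3, 6), (2, 3, 4), (5, 6, 7), (3, 4, 5),
              (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8)]

# Each leaf is addressed by the hash values of the first, second and third item.
tree = defaultdict(list)
for itemset in candidates:
    tree[tuple(h(item) for item in itemset)].append(itemset)

for path, leaf in sorted(tree.items()):
    print(path, leaf)

# Simplified support counting: the candidates stored in a leaf are compared
# against a transaction; a real hash tree visits only the leaves reachable
# by hashing the items of the transaction.
transaction = {1, 2, 3, 5, 6}
for leaf in tree.values():
    for itemset in leaf:
        if set(itemset) <= transaction:
            print("increment support of", itemset)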

Que 7: Write the computational complexity of the Apriori algorithm.
• Choice of minimum support threshold
– lowering support threshold results in more frequent itemsets
– this may increase number of candidates and max length of
frequent itemsets
• Dimensionality (number of items) of the data set
– more space is needed to store support count of each item
– if number of frequent items also increases, both computation and
I/O costs may also increase
• Size of database
– since Apriori makes multiple passes, run time of algorithm may
increase with number of transactions
• Average transaction width
– transaction width increases with denser data sets
– This may increase max length of frequent itemsets and traversals
of hash tree (number of subsets in a transaction increases with its
width)

Que 8: What are the advantages and disadvantages of the Apriori algorithm?
Disadvantages of Apriori
• The candidate generation could be extremely slow (pairs, triplets,
etc.).
• The counting method iterates through all of the transactions
each time.
• Constant items make the algorithm a lot heavier.
• Huge memory consumption
Advantages of Apriori
The Apriori algorithm is easy to understand and implement, and it finds all the frequent itemsets.
Que 9: Write a short note on:
1) Maximal frequent itemset
2) Closed itemset
3) Closed frequent itemset
Maximal frequent itemset: An itemset is maximal frequent if none of its
immediate supersets is frequent.

Itemset lattice over the items {A, B, C, D, E}:

null
A   B   C   D   E
AB  AC  AD  AE  BC  BD  BE  CD  CE  DE
ABC  ABD  ABE  ACD  ACE  ADE  BCD  BCE  BDE  CDE
ABCD  ABCE  ABDE  ACDE  BCDE
ABCDE

In the above lattice, if 'AD' is a frequent itemset and all supersets containing 'AD' are infrequent, then 'AD' is called a maximal frequent itemset.

Closed itemset: An itemset is closed if none of its immediate supersets has the same support as the itemset.
Itemset Support
{A} 4
{B} 5
{C} 3
{D} 4
{A,B} 4
{A,C} 2
{A,D} 3
{B,C} 3
{B,D} 4
{C,D} 3
• Consider the itemset {A}:
o {A} has a support count of 4.
o {A, B} also has a support count of 4.
o So, {A} is not a closed itemset.

Closed frequent itemset: If an itemset is closed and its support count is greater than or equal to min_sup, then it is called a closed frequent itemset.
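A small Python sketch that checks the closed property using the support table above; only the 1-itemsets are tested here, because the table lists supports up to 2-itemsets only:

supports = {
    frozenset("A"): 4, frozenset("B"): 5, frozenset("C"): 3, frozenset("D"): 4,
    frozenset("AB"): 4, frozenset("AC"): 2, frozenset("AD"): 3,
    frozenset("BC"): 3, frozenset("BD"): 4, frozenset("CD"): 3,
}

def is_closed(itemset, supports):
    """Closed: no immediate superset has the same support as the itemset."""
    itemset = frozenset(itemset)
    return all(supports[s] != supports[itemset]
               for s in supports
               if len(s) == len(itemset) + 1 and itemset < s)

print("A", is_closed("A", supports))   # False: {A,B} also has support 4
print("B", is_closed("B", supports))   # True: no immediate superset has support 5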
Que: How to generate frequent itemsets without
candidates?
(Or)
Explain FP Growth algorithm

Apriori uses a generate-and-test approach: it generates candidate itemsets and tests whether they are frequent.
1. Generation of candidate itemsets is expensive (in both space and time).
2. Support counting is expensive:
• Subset checking (computationally expensive)
• Multiple database scans (I/O)
FP-Growth allows frequent itemset discovery without candidate itemset generation. It is a two-step approach:
• Build a compact data structure called the FP-tree.
• Extract frequent itemsets directly from the FP-tree.

Algorithm for FP-Tree


Step 1:
• Scan DB once, find frequent 1-itemset
• Sort frequent items in frequency descending order, f-list
• Scan DB again, construct FP-tree
o FP-Growth reads 1 transaction at a time and maps it to a path
o Fixed order is used, so paths can overlap when transactions
share items
Step 2:
• Generating conditional pattern base
o Starting at the frequent item header table in the FP-tree
o Traverse the FP-tree by following the link of each frequent item
o Accumulate all of transformed prefix paths of the item to form
its conditional pattern base
• For each conditional pattern-base
o Accumulate the count for each item in the conditional pattern
base
o Construct the FP-tree for the frequent items of the pattern base
• The frequent itemsets are all the combinations of its sub-paths, each of which is a frequent pattern.
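A minimal Python sketch of Step 1 (building the FP-tree); it assumes the nine-transaction dataset over I1 to I5 used in the example below with a minimum support count of 2, and it omits the mining step:

from collections import Counter, defaultdict

transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
min_support_count = 2

# Scan 1: find frequent items and sort them by descending support (the f-list).
counts = Counter(item for t in transactions for item in t)
f_list = [item for item, c in counts.most_common() if c >= min_support_count]

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

root = Node(None, None)
header_table = defaultdict(list)        # item -> node links into the tree

# Scan 2: insert each transaction as a path, its items ordered by the f-list;
# transactions sharing a prefix overlap on the same path and add to its counts.
for t in transactions:
    node = root
    for item in sorted((i for i in t if i in f_list), key=f_list.index):
        if item not in node.children:
            node.children[item] = Node(item, node)
            header_table[item].append(node.children[item])
        node = node.children[item]
        node.count += 1

def show(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"{child.item}: {child.count}")
        show(child, depth + 1)

show(root)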

Example of FP Growth

The first scan of the database is the same as


Apriori, which derives the set of frequent items
(1-itemsets) and their support counts
(frequencies). Let the minimum support count
be 2. The set of frequent items is sorted in the
order of descending support count. This
resulting set or list is denoted L. Thus, we have
L = {{I2: 7}, {I1: 6}, {I3: 6}, {I4: 2}, {I5: 2}}.
Mining of the FP-tree is summarized in the above table.
//Optional

• We first consider I5, which is the last item in L, rather than the first.
I5 occurs in two branches of the FP-tree. The paths formed by these
branches are {(I2, I1, I5: 1) and (I2, I1, I3, I5: 1)}. Therefore,
considering I5 as a suffix, its corresponding two prefix paths are (I2,
I1: 1) and (I2, I1, I3: 1), which form its conditional pattern base. Its
conditional FP-tree contains only a single path, {I2: 2, I1: 2}; I3 is not
included because its support count of 1 is less than the minimum
support count. The single path generates all the combinations of
frequent patterns: {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}.

• For I4, its two prefix paths form the conditional pattern base, {{I2 I1:
1}, {I2: 1}}, which generates a single-node conditional FP-tree, {I2: 2},
and derives one frequent pattern, {I2, I4: 2}.

• Similar to the above analysis, I3’s conditional pattern base is {{I2, I1:
2}, {I2: 2}, {I1: 2}}. Its conditional FP-tree has two branches, (I2: 4, I1:
2) and (I1: 2), as shown in Figure, which generates the set of patterns,
{{I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}}.

• Finally, I1’s conditional pattern base is {{I2: 4}}, whose FP-tree


contains only one node, (I2: 4), which generates one frequent pattern,
{I2, I1: 4}.
Another version of the algorithm exists (as in the Kamber textbook); students can learn either one of the two algorithms.
Que: What are the advantages and disadvantages of FP-Tree?
Advantages:
• no candidate generation, no candidate test
• compressed database: FP-tree structure
• no repeated scan of entire database
• basic operations are counting local frequent items and building sub FP-trees; no pattern search and matching
• leads to focused search of smaller databases
Disadvantage:
Parallelizing the FP-Growth technique may lead to bottlenecks in shared memory.

Que: Compare and contrast Apriori and FP Growth.


Apriori
• Technique: Generate candidate singletons, pairs, triplets, etc.
• Runtime: Candidate generation is extremely slow; runtime increases exponentially with the number of different items.
• Memory usage: Stores candidate singletons, pairs, triplets, etc.
• Parallelizability: Candidate generation is very parallelizable.

FP-Growth
• Technique: Insert items, sorted by frequency, into a pattern tree.
• Runtime: Runtime increases linearly, depending on the number of transactions and items.
• Memory usage: Stores a compact version of the database.
• Parallelizability: Data are very inter-dependent; each node needs the root.
