UNIT II
Unit-2 - Associations and Correlations
Market Basket Analysis – Apriori Algorithm –
Mining Frequent Itemsets without Candidate Generation –
Mining Frequent Itemsets Using Vertical Data Format –
Mining Closed Frequent Itemsets –
Mining Multilevel Association Rules –
Mining Multidimensional Association Rules –
Correlation Analysis –
Constraint-Based Association Mining
Mining Frequent Patterns, Associations and Correlations – Mining Methods – Mining Various Kinds of
Association Rules – Correlation Analysis – Constraint Based Association Mining
Association Mining
• Association rule mining:
– Finding frequent patterns, associations, correlations, or causal structures among sets of
items or objects in transaction databases, relational databases, and other information
repositories.
• Applications:
– Basket data analysis, cross-marketing, catalog design, loss-leader analysis,
clustering, classification, etc.
• Examples:
– Rule form: “Body → Head [support, confidence]”
– buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]
– major(x, “CS”) ∧ takes(x, “DB”) → grade(x, “A”) [1%, 75%]
Association Rule: Basic Concepts
• Given: (1) database of transactions, (2) each transaction is a list of items (purchased by
a customer in a visit)
• Find: all rules that correlate the presence of one set of items with that of another set of items
– E.g., 98% of people who purchase tires and auto accessories also get automotive
services done
• Applications
– * ⇒ Maintenance Agreement (what should the store do to boost Maintenance Agreement sales?)
– Home Electronics ⇒ * (what other products should the store stock up on?)
– Attached mailing in direct marketing
– Detecting “ping-pong”ing of patients, faulty “collisions”
Rule Measures: Support and Confidence
• Find all rules X ∧ Y ⇒ Z with minimum support and confidence
– support, s: the probability that a transaction contains {X, Y, Z}
– confidence, c: the conditional probability that a transaction containing {X, Y} also contains Z
With minimum support 50% and minimum confidence 50%, we have (for the transactions below):
– A ⇒ C (support 50%, confidence 66.6%)
– C ⇒ A (support 50%, confidence 100%)
Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F
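The support and confidence figures above can be verified directly. Below is a minimal Python sketch; the transactions come from the table, while the function names are purely illustrative.

    # Toy transaction database from the table above
    transactions = {
        2000: {"A", "B", "C"},
        1000: {"A", "C"},
        4000: {"A", "D"},
        5000: {"B", "E", "F"},
    }

    def support(itemset):
        """Fraction of transactions containing every item in `itemset`."""
        itemset = set(itemset)
        hits = sum(1 for items in transactions.values() if itemset <= items)
        return hits / len(transactions)

    def confidence(lhs, rhs):
        """Conditional probability that a transaction containing `lhs` also contains `rhs`."""
        return support(set(lhs) | set(rhs)) / support(lhs)

    print(support({"A", "C"}))        # 0.5   -> both A => C and C => A have 50% support
    print(confidence({"A"}, {"C"}))   # 0.667 -> confidence of A => C
    print(confidence({"C"}, {"A"}))   # 1.0   -> confidence of C => A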
Association Rule Mining: A Road Map
• Boolean vs. quantitative associations (Based on the types of values handled)
– buys(x, “SQLServer”) ∧ buys(x, “DMBook”) → buys(x, “DBMiner”) [0.2%, 60%]
– age(x, “30..39”) ∧ income(x, “42..48K”) → buys(x, “PC”) [1%, 75%]
• Single-dimensional vs. multidimensional associations (see the examples above)
• Single level vs. multiple-level analysis
– What brands of beers are associated with what brands of diapers?
• Various extensions
– Correlation, causality analysis
• Association does not necessarily imply correlation or causality
– Maxpatterns and closed itemsets
– Constraints enforced
• E.g., small sales (sum < 100) trigger big buys (sum > 1,000)?
Market – Basket analysis
A market basket is a collection of items purchased by a customer in a single transaction, which
is a well-defined business activity. For example, a customer's visits to a grocery store or an online
purchase from a virtual store on the Web are typical customer transactions. Retailers accumulate
huge collections of transactions by recording business activities over time. One common analysis run
against a transactions database is to find sets of items, or itemsets, that appear together in many
transactions. A business can use knowledge of these patterns to improve the Placement of these items
in the store or the layout of mail- order catalog page and Web pages. An itemset containing i items is
called an i-itemset. The percentage of transactions that contain an itemset is called the itemset's
support. For an itemset to be interesting, its support must be higher than a user-specified minimum.
Such itemsets are said to be frequent.
Figure: Market basket analysis.
Rule support and confidence are two measures of rule interestingness. They respectively
reflect the usefulness and certainty of discovered rules. For example, a support of 2% for the rule
buys(computer) ⇒ buys(financial_management_software) means that 2% of all the transactions under
analysis show that a computer and financial management software are purchased together. A
confidence of 60% means that 60% of the customers who
purchased a computer also bought the software. Typically, association rules are considered
interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
Mining Frequent Patterns
Apriori is a method that mines the complete set of frequent itemsets using candidate generation.
Apriori property & The Apriori Algorithm.
Apriori property
• All nonempty subsets of a frequent itemset must also be frequent.
– If an itemset I does not satisfy the minimum support threshold, min_sup, then I is not
frequent, i.e., support(I) < min_sup.
– If an item A is added to the itemset I, then the resulting itemset (I ∪ A) cannot occur
more frequently than I.
• Monotonic functions are functions that move in only one direction.
• This property is called anti-monotone.
• If a set cannot pass a test, all of its supersets will fail the same test as well.
• This property is monotonic in failing the test.
The Apriori Algorithm
• Join step: Ck is generated by joining Lk−1 with itself.
• Prune step: Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
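A minimal Python sketch of these two steps, run on the small transaction database from the support/confidence example earlier, is given below. It is an illustrative sketch rather than an optimized implementation; the variable and function names are not from the text.

    from itertools import combinations

    def apriori(transactions, min_count):
        """Return {frozenset(itemset): support count} for all frequent itemsets."""
        # L1: frequent 1-itemsets
        counts = {}
        for t in transactions:
            for item in t:
                key = frozenset([item])
                counts[key] = counts.get(key, 0) + 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent = dict(current)
        k = 2
        while current:
            # Join step: union pairs of frequent (k-1)-itemsets that yield a k-itemset
            prev = list(current)
            candidates = set()
            for i in range(len(prev)):
                for j in range(i + 1, len(prev)):
                    union = prev[i] | prev[j]
                    if len(union) == k:
                        # Prune step: every (k-1)-subset must itself be frequent
                        if all(frozenset(sub) in current
                               for sub in combinations(union, k - 1)):
                            candidates.add(union)
            # Count the surviving candidates with one scan of the database
            counts = {c: 0 for c in candidates}
            for t in transactions:
                for c in candidates:
                    if c <= t:
                        counts[c] += 1
            current = {s: c for s, c in counts.items() if c >= min_count}
            frequent.update(current)
            k += 1
        return frequent

    db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
    print(apriori(db, min_count=2))   # {A}: 3, {B}: 2, {C}: 2, {A, C}: 2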
Mining Frequent Itemsets Without Candidate Generation (FP-Growth)
FP-growth mines the complete set of frequent itemsets without candidate generation.
• Compress a large database into a compact, Frequent-Pattern tree (FP-tree) structure
– highly condensed, but complete for frequent pattern mining
– avoid costly database scans
• Develop an efficient, FP-tree-based frequent pattern mining method
– A divide-and-conquer methodology: decompose mining tasks into smaller ones
– Avoid candidate generation: sub-database test only!
Construct FP-tree from a Transaction DB (min_support = 0.5, i.e., an item must appear in at least 3 of the 5 transactions)

TID   Items bought                  (Ordered) frequent items
100   {f, a, c, d, g, i, m, p}      {f, c, a, m, p}
200   {a, b, c, f, l, m, o}         {f, c, a, b, m}
300   {b, f, h, j, o}               {f, b}
400   {b, c, k, s, p}               {c, b, p}
500   {a, f, c, e, l, p, m, n}      {f, c, a, m, p}
Steps:
1. Scan DB once, find frequent 1-itemset (single item pattern)
2. Order frequent items in frequency descending order
3. Scan DB again, construct FP-tree
Header Table
Item   frequency   head of node-links
f 4
c 4
a 3
b 3
m 3
p 3
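Steps 1 and 2 can be sketched in a few lines of Python; the item order below is taken from the header table above (descending frequency, with f listed before c):

    from collections import Counter

    transactions = [
        {"f", "a", "c", "d", "g", "i", "m", "p"},
        {"a", "b", "c", "f", "l", "m", "o"},
        {"b", "f", "h", "j", "o"},
        {"b", "c", "k", "s", "p"},
        {"a", "f", "c", "e", "l", "p", "m", "n"},
    ]
    min_count = 3   # min_support = 0.5 over 5 transactions

    # Step 1: scan the DB once and find the frequent 1-itemsets
    counts = Counter(item for t in transactions for item in t)
    freq = {i: c for i, c in counts.items() if c >= min_count}
    print(freq)      # f:4, c:4, a:3, b:3, m:3, p:3 (printed order may vary)

    # Step 2: keep only frequent items and order them by descending frequency
    # (ties broken here by the header-table order: f, c, a, b, m, p)
    rank = {item: i for i, item in enumerate(["f", "c", "a", "b", "m", "p"])}
    ordered = [sorted((i for i in t if i in freq), key=rank.__getitem__)
               for t in transactions]
    print(ordered)   # [['f','c','a','m','p'], ['f','c','a','b','m'], ['f','b'], ...]

Step 3 then inserts each ordered transaction into the FP-tree, sharing common prefixes and incrementing node counts along the way.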
Benefits of the FP-tree Structure
• Completeness:
– never breaks a long pattern of any transaction
– preserves complete information for frequent pattern mining
• Compactness
– reduce irrelevant information—infrequent items are gone
– frequency descending ordering: more frequent items are more likely to be shared
– never larger than the original database (not counting node-links and counts)
– Example: for the Connect-4 DB, the compression ratio can be over 100
Mining Frequent Patterns Using FP-tree
• General idea (divide-and-conquer)
– Recursively grow frequent pattern path using the FP-tree
• Method
– For each item, construct its conditional pattern-base, and then its conditional FP-tree
– Repeat the process on each newly created conditional FP-tree
– Until the resulting FP-tree is empty, or it contains only one path (single path will
generate all the combinations of its sub-paths, each of which is a frequent
pattern)
Major Steps to Mine FP-tree
1) Construct conditional pattern base for each node in the FP-tree
2) Construct conditional FP-tree from each conditional pattern-base
3) Recursively mine conditional FP-trees and grow frequent patterns obtained so far
If the conditional FP-tree contains a single path, simply enumerate all the patterns
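A compact Python sketch of this divide-and-conquer procedure follows. For simplicity it rebuilds a small FP-tree for each conditional pattern base instead of reusing node-links, so it illustrates the steps rather than an optimized implementation; all names are illustrative.

    from collections import defaultdict

    class FPNode:
        """One FP-tree node: an item, its count, its parent, and its children."""
        def __init__(self, item, parent):
            self.item, self.parent = item, parent
            self.count = 0
            self.children = {}

    def build_fptree(transactions, min_count):
        """Scan the DB, keep frequent items, and insert ordered transactions into a tree."""
        counts = defaultdict(int)
        for t in transactions:
            for item in t:
                counts[item] += 1
        freq = {i: c for i, c in counts.items() if c >= min_count}
        root = FPNode(None, None)
        header = defaultdict(list)   # header table: item -> nodes holding that item
        for t in transactions:
            # frequency-descending order (ties broken alphabetically; any fixed order works)
            items = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
            node = root
            for item in items:
                if item not in node.children:
                    node.children[item] = FPNode(item, node)
                    header[item].append(node.children[item])
                node = node.children[item]
                node.count += 1
        return header, freq

    def fp_growth(transactions, min_count, suffix=()):
        """Recursively mine frequent patterns; returns {pattern tuple: support count}."""
        header, freq = build_fptree(transactions, min_count)
        patterns = {}
        for item in sorted(freq, key=freq.get):      # least frequent first
            pattern = (item,) + suffix
            patterns[pattern] = freq[item]
            # Conditional pattern base: prefix paths leading to each occurrence of `item`
            cond_db = []
            for node in header[item]:
                path, parent = [], node.parent
                while parent.item is not None:
                    path.append(parent.item)
                    parent = parent.parent
                cond_db.extend([path] * node.count)
            # Recurse on the conditional FP-tree built from that pattern base
            patterns.update(fp_growth(cond_db, min_count, pattern))
        return patterns

    db = [list("facdgimp"), list("abcflmo"), list("bfhjo"), list("bcksp"), list("afcelpmn")]
    print(fp_growth(db, min_count=3))   # maps each frequent pattern (tuple) to its support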
Principles of Frequent Pattern Growth
• Pattern growth property
– Let α be a frequent itemset in DB, B be α's conditional pattern base, and β be an
itemset in B. Then α ∪ β is a frequent itemset in DB iff β is frequent in B.
• “abcdef” is a frequent pattern, if and only if
– “abcde” is a frequent pattern, and
– “f” is frequent in the set of transactions containing “abcde”
Why Is Frequent Pattern Growth Fast?
• Our performance study shows
– FP-growth is an order of magnitude faster than Apriori, and is also faster than
tree-projection
• Reasoning
– No candidate generation, no candidate test
– Use compact data structure
– Eliminate repeated database scan
– The basic operations are counting and FP-tree building
Mining multilevel association rules from transactional databases.
• Items often form hierarchy.
• Items at the lower level are expected to have lower support.
• Rules regarding itemsets at
appropriate levels could be quite useful.
• Transaction database can be encoded based on dimensions and levels
• We can explore shared multi-level mining
Figure: A concept hierarchy for items. Food is at the top level; Milk and Bread are at the next level;
Milk is divided into 2% and Skim, and Bread into Wheat and White; brand-level items such as Fraser
and Sunset appear at the lowest level.
TID   Items
T1    {111, 121, 211, 221}
T2    {111, 211, 222, 323}
T3    {112, 122, 221, 411}
T4    {111, 121}
T5    {111, 122, 211, 221, 413}
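As a rough illustration of this encoding (assuming the first digit identifies the level-1 category, the second the level-2 type, and the third the brand; this is the usual reading but an assumption here), higher-level itemsets can be counted by truncating the codes:

    from collections import Counter

    # Encoded transactions from the table above
    transactions = [
        {"111", "121", "211", "221"},
        {"111", "211", "222", "323"},
        {"112", "122", "221", "411"},
        {"111", "121"},
        {"111", "122", "211", "221", "413"},
    ]

    def level_counts(level):
        """Count level-`level` items by truncating each code to its first `level` digits."""
        counts = Counter()
        for t in transactions:
            # several items may share the same ancestor; a set counts it once per transaction
            counts.update({code[:level] for code in t})
        return counts

    print(level_counts(1))   # level-1 categories, e.g. '1': 5, '2': 4, ...
    print(level_counts(2))   # level-2 types, e.g. '11': 5, '12': 4, '22': 4, '21': 3, ...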
Mining Multi-Level Associations
• A top-down, progressive deepening approach:
– First find high-level strong rules: milk → bread [20%, 60%].
– Then find their lower-level “weaker” rules: 2% milk → wheat bread [6%, 50%].
• Variations on mining multiple-level association rules:
– Level-crossing association rules: 2% milk → Wonder wheat bread
– Association rules with multiple, alternative hierarchies: 2% milk → Wonder bread
Multi-level Association: Uniform Support vs. Reduced Support
• Uniform support: the same minimum support for all levels
– (+) Only one minimum support threshold is needed, and there is no need to examine
itemsets containing any item whose ancestors do not have minimum support.
– (−) Lower-level items do not occur as frequently. If the support threshold is
• too high, low-level associations are missed
• too low, too many high-level associations are generated
• Reduced Support: reduced minimum support at lower levels
– There are 4 search strategies:
• Level-by-level independent
• Level-cross filtering by k-itemset
• Level-cross filtering by single item
• Controlled level-cross filtering by single item
Multi-level Association: Redundancy Filtering
• Some rules may be redundant due to “ancestor” relationships between items.
• Example
– milk → wheat bread [support = 8%, confidence = 70%]
– 2% milk → wheat bread [support = 2%, confidence = 72%]
• We say the first rule is an ancestor of the second rule.
• A rule is redundant if its support is close to the “expected” value, based on the rule’s ancestor
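As a worked illustration (assuming, purely for the sake of the example, that 2% milk accounts for about one quarter of all milk sold): the expected support of the second rule is roughly 8% × 1/4 = 2%. Since the observed support is also 2% and the confidences are similar, the second rule conveys no extra information and can be filtered out as redundant.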
Multi-Level Mining: Progressive Deepening
• A top-down, progressive deepening approach:
– First mine high-level frequent items:
milk (15%), bread (10%)
– Then mine their lower-level “weaker” frequent
itemsets: 2% milk (5%), wheat bread (4%)
• Different min_support thresholds across multiple levels lead to different search strategies:
– If the same min_support is adopted across all levels, then toss t if any of t's ancestors
is infrequent.
– If a reduced min_support is adopted at lower levels, then examine only those
descendants whose ancestors' support is frequent/non-negligible.
Correlation in detail.
• Interest (correlation, lift)
– takes both P(A) and P(B) into consideration
– P(A ∧ B) = P(A) × P(B) if A and B are independent events
– A and B are negatively correlated if the interest value is less than 1; otherwise A and B
are positively correlated
X   1 1 1 1 0 0 0 0
Y   1 1 0 0 0 0 0 0
Z   0 1 1 1 1 1 1 1

Itemset   Support   Interest
X, Y      25%       2
X, Z      37.5%     0.9
Y, Z      12.5%     0.57
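The interest (lift) column can be reproduced directly from the three rows above; a minimal Python sketch:

    # The three binary attributes over 8 transactions, copied from the table above
    X = [1, 1, 1, 1, 0, 0, 0, 0]
    Y = [1, 1, 0, 0, 0, 0, 0, 0]
    Z = [0, 1, 1, 1, 1, 1, 1, 1]
    n = len(X)

    def interest(a, b):
        """Lift: P(a and b) / (P(a) * P(b)); a value of 1 means independence."""
        p_a = sum(a) / n
        p_b = sum(b) / n
        p_ab = sum(1 for i in range(n) if a[i] and b[i]) / n
        return p_ab / (p_a * p_b)

    print(interest(X, Y))   # 2.0            -> positively correlated
    print(interest(X, Z))   # 0.857 (~ 0.9)  -> slightly negatively correlated
    print(interest(Y, Z))   # 0.571          -> negatively correlated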
χ² Correlation
• χ² measures correlation between categorical attributes:

χ² = Σ (observed − expected)² / expected
            game          not game      sum (row)
video       4000 (4500)   3500 (3000)   7500
not video   2000 (1500)    500 (1000)   2500
sum (col.)  6000          4000          10000
• expected(i,j) = count(row i) * count(column j) / N
• χ² = (4000 − 4500)²/4500 + (3500 − 3000)²/3000 + (2000 − 1500)²/1500 + (500 − 1000)²/1000 = 555.6
• Since χ² is large and the observed count for (game, video) is less than the expected count, game
and video are negatively correlated
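The same figure can be checked with a few lines of Python over the contingency table (values copied from above):

    # Observed counts from the 2x2 contingency table
    observed = {("video", "game"): 4000, ("video", "not game"): 3500,
                ("not video", "game"): 2000, ("not video", "not game"): 500}
    row_totals = {"video": 7500, "not video": 2500}
    col_totals = {"game": 6000, "not game": 4000}
    n = 10000

    chi2 = 0.0
    for (row, col), obs in observed.items():
        expected = row_totals[row] * col_totals[col] / n   # expected(i, j)
        chi2 += (obs - expected) ** 2 / expected

    print(round(chi2, 1))   # 555.6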
Numeric correlation
• Correlation concept in statistics
– Used to study the relationship existing between 2 or more numeric variables
– A correlation is a measure of the linear relationship between variables
Ex: number of hours spent studying in a class with grade received
– Outcomes:
• positively related
• Not related
• negatively related
– Statistical relationships
• Covariance
• Correlation coefficient
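A minimal sketch of the correlation coefficient for the studying-hours example (the numbers below are made up purely for illustration):

    import math

    # Hypothetical data: hours spent studying vs. grade received
    hours  = [2, 4, 5, 7, 9]
    grades = [55, 62, 70, 78, 90]

    def pearson(x, y):
        """Pearson correlation coefficient: covariance / (std_x * std_y)."""
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
        std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
        std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
        return cov / (std_x * std_y)

    print(pearson(hours, grades))   # close to +1, i.e., positively related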
Constraint-Based Mining in detail.
• Interactive, exploratory mining of gigabytes of data?
– Could it be real? — Making good use of constraints!
• What kinds of constraints can be used in mining?
– Knowledge type constraint: classification, association, etc.
– Data constraint: SQL-like queries
• Find product pairs sold together in Vancouver in Dec.’98.
– Dimension/level constraints:
• in relevance to region, price, brand, customer category.
– Rule constraints
• small sales (price < $10) trigger big sales (sum > $200).
– Interestingness constraints:
• strong rules (min_support ≥ 3%, min_confidence ≥ 60%).
Rule Constraints in Association Mining
• Two kinds of rule constraints:
– Rule form constraints: meta-rule guided mining.
• P(x, y) ∧ Q(x, w) → takes(x, “database systems”).
– Rule (content) constraint: constraint-based query optimization (Ng, et al., SIGMOD’98).
• sum(LHS) < 100 ∧ min(LHS) > 20 ∧ count(LHS) > 3 ∧ sum(RHS) > 1000
• 1-variable vs. 2-variable constraints (Lakshmanan, et al. SIGMOD’99):
– 1-var: a constraint confining only one side (L/R) of the rule, e.g., as shown above.
– 2-var: A constraint confining both sides (L and R).
• sum(LHS) < min(RHS) ∧ max(RHS) < 5 × sum(LHS)
Constraint-Based Association Query
• Database: (1) trans(TID, Itemset), (2) itemInfo(Item, Type, Price)
• A constrained association query (CAQ) is of the form {(S1, S2) | C},
– where C is a set of constraints on S1, S2, including a frequency constraint
• A classification of (single-variable) constraints:
– Class constraint: S ⊆ A, e.g., S ⊆ Item
– Domain constraint:
• S θ v, θ ∈ {=, ≠, <, ≤, >, ≥}, e.g., S.Price < 100
• v θ S, θ is ∈ or ∉, e.g., snacks ∈ S.Type
• V θ S, or S θ V, θ ∈ {⊆, ⊂, ⊄, =, ≠}
– e.g., {snacks, sodas} ⊆ S.Type
– Aggregation constraint: agg(S) θ v, where agg ∈ {min, max, sum, count, avg} and
θ ∈ {=, ≠, <, ≤, >, ≥}
• e.g., count(S1.Type) = 1, avg(S2.Price) < 100
Constrained Association Query Optimization Problem
• Given a CAQ = {(S1, S2) | C}, the algorithm should be:
– sound: it only finds frequent sets that satisfy the given constraints C
– complete: all frequent sets that satisfy the given constraints C are found
• A naïve solution:
– Apply Apriori to find all frequent sets, and then test them for constraint satisfaction
one by one.
• Our approach:
– Comprehensively analyze the properties of constraints and push them as deeply as
possible inside the frequent set computation.
Categories of Constraints.
1. Anti-monotone and Monotone Constraints
• A constraint Ca is anti-monotone iff, for any pattern S not satisfying Ca, none of the super-
patterns of S can satisfy Ca
• A constraint Cm is monotone iff, for any pattern S satisfying Cm, every super-pattern of S also
satisfies it
2. Succinct Constraint
• A subset of items Is ⊆ I is a succinct set if it can be expressed as σp(I) for some selection
predicate p, where σ is the selection operator
• SP ⊆ 2^I is a succinct power set if there is a fixed number of succinct sets I1, …, Ik ⊆ I such that
SP can be expressed in terms of the strict power sets of I1, …, Ik using union and minus
• A constraint Cs is succinct provided SAT_Cs(I) is a succinct power set
3. Convertible Constraint
• Suppose all items in patterns are listed in a total order R
• A constraint C is convertible anti-monotone iff a pattern S satisfying the constraint implies
that each suffix of S w.r.t. R also satisfies C
• A constraint C is convertible monotone iff a pattern S satisfying the constraint implies
that each pattern of which S is a suffix w.r.t. R also satisfies C
Property of Constraints: Anti-Monotone
• Anti-monotonicity: if a set S violates the constraint, any superset of S violates the constraint.
• Examples:
– sum(S.Price) ≤ v is anti-monotone
– sum(S.Price) ≥ v is not anti-monotone
– sum(S.Price) = v is partly anti-monotone
• Application:
– Push “sum(S.Price) ≤ 1000” deeply into iterative frequent set computation.
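A minimal sketch of pushing such an anti-monotone constraint into the level-wise search (the item prices below are made up for illustration): any candidate whose price sum already exceeds the bound can be dropped before support counting, since no superset can satisfy the constraint either.

    price = {"A": 300, "B": 700, "C": 900, "D": 200}   # illustrative item prices
    MAX_SUM = 1000                                     # constraint: sum(S.Price) <= 1000

    def may_satisfy(itemset):
        """Anti-monotone check: once the sum exceeds the bound, no superset can recover."""
        return sum(price[i] for i in itemset) <= MAX_SUM

    # Inside an Apriori-style loop, prune candidates before counting their support:
    candidates = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "D"}]
    pruned = [c for c in candidates if may_satisfy(c)]
    print(pruned)   # keeps {'A','B'} (1000) and {'A','D'} (500); drops {'A','C'} and {'B','C'}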
Example of Convertible Constraints: avg(S) ≥ v
• Let R be the value-descending order over the set of items
– E.g., I = {9, 8, 6, 4, 3, 1}
• avg(S) ≥ v is convertible monotone w.r.t. R
– If S is a suffix of S1, then avg(S1) ≥ avg(S)
• {8, 4, 3} is a suffix of {9, 8, 4, 3}
• avg({9, 8, 4, 3}) = 6 ≥ avg({8, 4, 3}) = 5
– If S satisfies avg(S) ≥ v, so does S1
• {8, 4, 3} satisfies the constraint avg(S) ≥ 4, and so does {9, 8, 4, 3}
Property of Constraints: Succinctness
• Succinctness:
– For any sets S1 and S2 satisfying C, S1 ∪ S2 satisfies C
– Given A1, the set of size-1 itemsets satisfying C, any set S satisfying C is based on A1,
i.e., S contains a subset belonging to A1
• Example:
– sum(S.Price) ≥ v is not succinct
– min(S.Price) ≤ v is succinct
• Optimization:
– If C is succinct, then C is pre-counting prunable. The satisfaction of the constraint
alone is not affected by the iterative support counting.