Pattern Mining: Advanced Methods
Mining various kinds of patterns
• People may like to uncover more complex patterns
• Multilevel associations
• Involve concepts at different abstraction levels
• Multidimensional associations
• Involve more than one dimension or predicate (e.g., relating what a customer buys to the customer's age)
• Quantitative associations
• Involve numeric attributes (e.g., age, salary)
• Rare patterns
• Suggest interesting, although rare item combinations
• Negative patterns
• Shows negative correlation between items
Mining multilevel associations
• Strong associations are discovered at high abstraction levels
• (eg: buying bread and milk)
• There may be a need to drill down to find novel patterns at more detailed levels
• (eg: buying what kind of bread and what kind of milk together)
• It is interesting to mine patterns at multiple abstraction levels
Mining multilevel association rules
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts
• The example hierarchy has five levels (0 through 4)
• Concept hierarchies for nominal attributes are either
• Specified by experts, or
• Generated from data, based on the analysis of
• Product specifications
• Attribute values
• Data distributions
• Concept hierarchies for numeric attributes are generated using discretization techniques
• It is difficult to find interesting patterns in primitive-level data
• Eg:
• Dell Studio XPS 16 Notebook
• Logitech VX Nano Cordless Laser Mouse
• Each occurs in a very small fraction of transactions
• It is difficult to find strong associations involving specific items
• It is easier to find strong associations between generalized abstractions such as
• Dell Notebook
• Cordless Mouse
Multiple-level/multilevel association rules
• Association rules generated from multiple abstraction levels
• Can be mined using concept hierarchies under support-confidence framework
• In general, a top-down strategy can be employed
• Counts are accumulated at each concept level
• Starting at level 1
• Working downward to more specific concept levels
• Until no more frequent itemsets can be found
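The top-down strategy above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the hierarchy, the transactions, the function names, and the cap on itemset size are all assumptions made for the example, and a single minimum count is used at every level.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept hierarchy: each primitive item maps to its ancestors,
# ordered from level 1 (most general) down to the item itself (level 3).
HIERARCHY = {
    "Dell notebook": ["computer", "notebook", "Dell notebook"],
    "HP notebook": ["computer", "notebook", "HP notebook"],
    "cordless mouse": ["accessory", "mouse", "cordless mouse"],
    "wired mouse": ["accessory", "mouse", "wired mouse"],
}

# Hypothetical transactions over primitive-level items.
TRANSACTIONS = [
    {"Dell notebook", "cordless mouse"},
    {"Dell notebook", "wired mouse"},
    {"HP notebook", "cordless mouse"},
    {"Dell notebook", "cordless mouse"},
]


def generalize(transaction, level):
    """Replace every item by its ancestor at the given concept level."""
    return frozenset(HIERARCHY[item][level - 1] for item in transaction)


def frequent_itemsets(transactions, min_count, max_size=2):
    """Brute-force count of all itemsets up to max_size (fine for tiny data)."""
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_count}


def mine_top_down(transactions, min_count, depth=3):
    """Accumulate counts level by level, starting at level 1 and working down."""
    results = {}
    for level in range(1, depth + 1):
        generalized = [generalize(t, level) for t in transactions]
        frequent = frequent_itemsets(generalized, min_count)
        if not frequent:  # stop once a level yields no frequent itemsets
            break
        results[level] = frequent
    return results


levels = mine_top_down(TRANSACTIONS, min_count=3)
for level, itemsets in levels.items():
    print(level, itemsets)
```

With this toy data, the pair (accessory, computer) is frequent at level 1, while at level 3 only the single items "Dell notebook" and "cordless mouse" reach the minimum count.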
Multiple-level/multilevel association rules
• Using same/uniform support for all levels
• Using reduced minimum support at lower levels
• Using item-group based minimum support
• Using uniform minimum support for all levels
• Same minimum support threshold is used at each level
• Users specify only one minimum support threshold
• An Apriori-like optimization technique can be adopted
• It relies on the fact that an ancestor is a superset of its descendants
• Search procedure is simplified
• Search avoids examining itemsets containing any item whose ancestors do not have minimum support
• Drawbacks
• Items at lower abstraction levels are unlikely to occur as frequently as items at higher abstraction levels
• If minimum support threshold is too high
• Could miss some meaningful associations occurring at low abstraction levels
• If threshold is too low
• May generate many uninteresting associations occurring at high abstraction levels
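The Apriori-like pruning used with uniform support can be illustrated with a small sketch. The item names, support values, and the helper `prune_by_ancestor` are hypothetical and chosen only for the example; the idea shown is that a descendant is not worth counting when its ancestor already fails the minimum support.

```python
# Hypothetical parent links in a concept hierarchy (child -> ancestor).
PARENT = {
    "laptop computer": "computer",
    "desktop computer": "computer",
    "inkjet printer": "printer",
}

# Hypothetical supports already counted at the higher level.
ANCESTOR_SUPPORT = {"computer": 0.10, "printer": 0.02}


def prune_by_ancestor(items, parent, ancestor_support, min_sup):
    """Keep only items whose ancestor meets the minimum support.

    An ancestor covers every transaction containing one of its descendants,
    so a descendant can never be more frequent than its ancestor.
    """
    return [item for item in items if ancestor_support[parent[item]] >= min_sup]


candidates = prune_by_ancestor(
    ["laptop computer", "desktop computer", "inkjet printer"],
    PARENT,
    ANCESTOR_SUPPORT,
    min_sup=0.05,
)
print(candidates)
```

Here "inkjet printer" is skipped without being counted because "printer" itself falls below the threshold.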
• Using reduced minimum support at lower levels
• Each abstraction level has its own minimum support threshold
• The deeper the abstraction level,
• The smaller the corresponding threshold
• For mining multilevel patterns with reduced support
• The support threshold should be smallest at the lowest abstraction level
• For the final pattern/rule extraction,
• The thresholds associated with the corresponding items should be enforced so that only interesting associations are output
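A small sketch may help compare a uniform threshold with reduced thresholds at lower levels. The support values, item names, and the function `frequent_by_level` are assumptions made for illustration only.

```python
# Hypothetical supports (fraction of transactions) at two abstraction levels.
SUPPORT = {
    1: {"computer": 0.10},
    2: {"laptop computer": 0.06, "desktop computer": 0.04},
}


def frequent_by_level(support, min_sup_by_level):
    """Return, per level, the items meeting that level's own threshold."""
    return {
        level: sorted(item for item, sup in items.items()
                      if sup >= min_sup_by_level[level])
        for level, items in support.items()
    }


# Uniform support: the same 5% threshold at both levels.
uniform = frequent_by_level(SUPPORT, {1: 0.05, 2: 0.05})

# Reduced support: 5% at level 1, only 3% at the deeper level 2.
reduced = frequent_by_level(SUPPORT, {1: 0.05, 2: 0.03})

print("uniform:", uniform)
print("reduced:", reduced)
```

Under the uniform threshold "desktop computer" is lost at level 2, whereas the reduced threshold keeps both level-2 items.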
• Using item or group-based minimum support
• It is sometimes desirable to set up user-specific, item-based, or group-based minimum support thresholds when mining multilevel rules
• Eg:
• A user may set minimum support thresholds based on product price or items of interest
• For example, a separate threshold for cameras priced over $1000
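Group-based thresholds can be sketched as a lookup from item to group to threshold. Everything named here is hypothetical (the groups, the values, and the helper functions). In particular, taking the lowest threshold among the items of a mixed itemset is an assumption of this sketch, one common choice rather than something the notes prescribe.

```python
DEFAULT_MIN_SUP = 0.05

# Hypothetical group-specific thresholds set by the user.
GROUP_MIN_SUP = {"expensive camera": 0.01}

# Hypothetical assignment of items to groups.
ITEM_GROUP = {"camera over $1000": "expensive camera"}


def item_threshold(item):
    """Threshold of the item's group, or the default if it has no group."""
    return GROUP_MIN_SUP.get(ITEM_GROUP.get(item), DEFAULT_MIN_SUP)


def itemset_threshold(itemset):
    """Assumption: a mixed itemset uses the lowest threshold of its items."""
    return min(item_threshold(item) for item in itemset)


def is_frequent(itemset, support):
    return support >= itemset_threshold(itemset)


print(is_frequent({"camera over $1000"}, 0.02))
print(is_frequent({"bread"}, 0.02))
```

An expensive camera with 2% support passes its group threshold, while an ungrouped item with the same support does not meet the default.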
• Side effect of multilevel association rules
• Generation of many redundant rules across multiple abstraction levels
Mining Multidimensional associations
• Multidimensional associations
• Association rules containing multiple predicates
• Example: a rule containing three predicates
• Age
• Occupation
• Buys
• Each predicate occurs only once in the rule (no repeated predicates)
• Interdimensional association rules
• Multidimensional association rules with no repeated predicates
• Hybrid-dimensional association rules
• Multidimensional association rules with repeated predicates
• Contain multiple occurrences of some predicates
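The distinction between interdimensional and hybrid-dimensional rules comes down to whether any predicate is repeated, which can be checked mechanically. The rule representation (lists of predicate/value pairs), the example rules, and the function `classify_rule` are illustrative assumptions.

```python
from collections import Counter


def classify_rule(antecedent, consequent):
    """Classify a rule by how its predicates (dimensions) are used.

    Both arguments are lists of (predicate, value) pairs.
    """
    counts = Counter(pred for pred, _ in antecedent + consequent)
    if len(counts) == 1:
        return "single-dimensional"   # only one predicate, e.g. buys => buys
    if all(c == 1 for c in counts.values()):
        return "interdimensional"     # several predicates, none repeated
    return "hybrid-dimensional"       # several predicates, some repeated


# Hypothetical rules for illustration.
inter = classify_rule(
    [("age", "20...29"), ("occupation", "student")],
    [("buys", "laptop")],
)
hybrid = classify_rule(
    [("age", "20...29"), ("buys", "laptop")],
    [("buys", "printer")],
)
print(inter, hybrid)
```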
• Database attributes can be nominal or quantitative
• Nominal
• 'Names of things'
• Have finite number of possible values
• No ordering among the values
• Eg:
• Occupation
• Brand
• Color
• Quantitative
• Numeric
• Have implicit ordering among values
• Eg:
• Age
• Income
• Price
• Treatment of quantitative attributes (first approach: static discretization)
• Quantitative attributes are discretized using predefined concept hierarchies
• Occurs before mining
• Replace the original numeric values by intervals
• 0...20K, 21K...40K, and so on
• Discretization is static and predetermined
• Discretized numeric attributes are treated as nominal attributes
• We refer to this as
• Mining multidimensional association rules using static discretization of quantitative attributes
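Static discretization can be sketched as a fixed lookup applied before mining. The interval boundaries, labels, and the function `discretize_income` are hypothetical. The point is only that the intervals are fixed in advance and the resulting labels are then handled like nominal values.

```python
import bisect

# Hypothetical predefined interval upper bounds (inclusive), fixed before mining.
UPPER_BOUNDS = [20_000, 40_000, 60_000]
LABELS = ["0...20K", "21K...40K", "41K...60K", "over 60K"]


def discretize_income(income):
    """Map a numeric income to its predefined interval label."""
    return LABELS[bisect.bisect_left(UPPER_BOUNDS, income)]


# Hypothetical records: the numeric value is replaced by its interval label.
records = [{"income": 35_000}, {"income": 20_000}, {"income": 75_000}]
nominal = [{"income": discretize_income(r["income"])} for r in records]
print(nominal)
```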
• Treatment of quantitative attributes (second approach: dynamic discretization)
• Quantitative attributes are discretized or clustered into ‘bins’ based on data distribution
• Discretization is dynamic
• Treats numeric attribute values as quantities
• We refer to this as mining
• Dynamic quantitative association rules
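As a contrast with predefined intervals, bins can be derived from the data itself. The sketch below uses equal-frequency binning, which is only one of several possible ways to bin by data distribution and is an assumption here, as are the sample ages and the function name.

```python
def equal_frequency_bins(values, n_bins):
    """Split sorted values into n_bins groups of (nearly) equal size.

    Returns a list of (low, high) value ranges, one per non-empty bin.
    """
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        chunk = ordered[start:end]
        if chunk:
            bins.append((chunk[0], chunk[-1]))
        start = end
    return bins


# Hypothetical ages: the bin boundaries follow the data distribution.
ages = [18, 19, 21, 22, 23, 35, 36, 60, 61]
bins = equal_frequency_bins(ages, 3)
print(bins)
```

Because the boundaries depend on the data, a different set of ages would yield different bins, unlike the static case.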
• Multidimensional association rules
• Search for frequent predicate sets (rather than frequent itemsets)
• A k-predicate set
• Contains k conjunctive predicates
• Eg:
• Set of predicates {age, occupation, buys} is a 3-predicate set
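Searching for frequent k-predicate sets can be sketched by counting combinations of (predicate, value) pairs over relational records. The records, thresholds, and the function `frequent_predicate_sets` are hypothetical, and the brute-force enumeration is meant only for tiny examples.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records with already-discretized attribute values.
RECORDS = [
    {"age": "20...29", "occupation": "student", "buys": "laptop"},
    {"age": "20...29", "occupation": "student", "buys": "laptop"},
    {"age": "30...39", "occupation": "engineer", "buys": "laptop"},
    {"age": "20...29", "occupation": "student", "buys": "printer"},
]


def frequent_predicate_sets(records, k, min_count):
    """Count every conjunction of k (predicate, value) pairs per record."""
    counts = Counter()
    for record in records:
        for combo in combinations(sorted(record.items()), k):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_count}


three_sets = frequent_predicate_sets(RECORDS, k=3, min_count=2)
two_sets = frequent_predicate_sets(RECORDS, k=2, min_count=3)
print(three_sets)
print(two_sets)
```

With this data, the only frequent 3-predicate set combines age, buys, and occupation, matching the {age, occupation, buys} example in the notes.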