Homework 1 Data
Problem 1
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

trans_encode = TransactionEncoder()
fitted = trans_encode.fit(transactions).transform(transactions)
df = pd.DataFrame(fitted, columns=trans_encode.columns_)
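For reference, the frequent itemsets printed below appear to have been produced with mlxtend's apriori function; the following is a minimal sketch, where the 0.3 minimum-support threshold is an assumption inferred from the output:

from mlxtend.frequent_patterns import apriori

# Mine frequent itemsets from the one-hot encoded DataFrame (0.3 support threshold assumed)
freq_itemsets = apriori(df, min_support=0.3, use_colnames=True)
print(freq_itemsets)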
support itemsets
0 0.4 (Data Science)
1 0.5 (Introduction to AI)
2 0.6 (Machine Learning)
3 0.4 (Mathematics)
4 0.6 (Python)
5 0.3 (Machine Learning, Data Science)
6 0.3 (Python, Data Science)
7 0.3 (Machine Learning, Introduction to AI)
8 0.3 (Python, Introduction to AI)
9 0.4 (Machine Learning, Python)
10 0.3 (Machine Learning, Python, Introduction to AI)
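The rules table below matches the output format of mlxtend's association_rules; a minimal sketch, where the 0.6 minimum-confidence threshold is an assumption inferred from which rules appear:

from mlxtend.frequent_patterns import association_rules

# Generate rules from the frequent itemsets; the 0.6 confidence cutoff is assumed
rules = association_rules(freq_itemsets, metric="confidence", min_threshold=0.6)
print(rules)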
   antecedents                              consequents                   antecedent support  consequent support  support  confidence      lift  leverage  conviction  zhangs_metric
0  (Data Science)                           (Machine Learning)                           0.4                 0.6      0.3    0.750000  1.250000      0.06         1.6       0.333333
1  (Data Science)                           (Python)                                     0.4                 0.6      0.3    0.750000  1.250000      0.06         1.6       0.333333
2  (Introduction to AI)                     (Machine Learning)                           0.5                 0.6      0.3    0.600000  1.000000      0.00         1.0       0.000000
3  (Introduction to AI)                     (Python)                                     0.5                 0.6      0.3    0.600000  1.000000      0.00         1.0       0.000000
4  (Python)                                 (Machine Learning)                           0.6                 0.6      0.4    0.666667  1.111111      0.04         1.2       0.250000
5  (Machine Learning)                       (Python)                                     0.6                 0.6      0.4    0.666667  1.111111      0.04         1.2       0.250000
6  (Machine Learning, Python)               (Introduction to AI)                         0.4                 0.5      0.3    0.750000  1.500000      0.10         2.0       0.555556
7  (Machine Learning, Introduction to AI)   (Python)                                     0.3                 0.6      0.3    1.000000  1.666667      0.12         inf       0.571429
8  (Python, Introduction to AI)             (Machine Learning)                           0.3                 0.6      0.3    1.000000  1.666667      0.12         inf       0.571429
9  (Introduction to AI)                     (Machine Learning, Python)                   0.5                 0.4      0.3    0.600000  1.500000      0.10         1.5       0.666667
Some of the key strengths of the Apriori algorithm are its relative simplicity and readability: its level-wise candidate itemset generation and its use of support/confidence thresholds are straightforward and intuitive. Another strength is that it incorporates pruning to remove unlikely itemsets from consideration early on, which reduces the number of support calculations and speeds up computation.
On the other hand, a glaring disadvantage of this algorithm is that it needs multiple scans of the database, one for each candidate itemset size, in order to calculate supports and generate itemsets. For modern applications with enormous datasets, this leads to high computational cost and long runtimes.
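To make the candidate generation, pruning, and repeated database scans concrete, here is a small illustrative sketch of the level-wise Apriori procedure (a toy implementation written for this discussion, not the mlxtend code used above):

from itertools import combinations

def apriori_levelwise(transactions, min_support=0.3):
    """Toy level-wise Apriori: one full database scan per candidate size k."""
    transactions = [set(t) for t in transactions]
    n = len(transactions)
    # First scan: find frequent 1-itemsets
    items = {item for t in transactions for item in t}
    frequent = {frozenset([i]) for i in items
                if sum(i in t for t in transactions) / n >= min_support}
    all_frequent, k = set(frequent), 2
    while frequent:
        # Candidate generation: join pairs of frequent (k-1)-itemsets
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Pruning: drop candidates with an infrequent (k-1)-subset (downward closure)
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Another full scan of the database to count candidate supports
        frequent = {c for c in candidates
                    if sum(c <= t for t in transactions) / n >= min_support}
        all_frequent |= frequent
        k += 1
    return all_frequent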
Problem 2
Compared to the Apriori algorithm, the FP-Growth algorithm's greatest strength is its computational speed. Because FP-Growth requires only two scans of the database (rather than one scan per candidate size) and its scanning cost grows only linearly with the number of transactions, it is significantly faster than the Apriori algorithm.
In terms of relative weaknesses, the FP-Growth algorithm requires a substantial amount of setup, since the FP-tree and the conditional FP-trees for each frequent item must be built before mining. These steps are relatively complex to implement and read compared to the simplicity of the Apriori algorithm.
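Since mlxtend also provides an fpgrowth function with the same interface as apriori, the comparison can be made directly on the Problem 1 DataFrame (df); a minimal sketch of the drop-in swap:

from mlxtend.frequent_patterns import apriori, fpgrowth

# Same interface, different algorithm: FP-Growth builds an FP-tree in two database
# scans and mines it recursively, instead of scanning once per candidate level.
freq_ap = apriori(df, min_support=0.3, use_colnames=True)
freq_fp = fpgrowth(df, min_support=0.3, use_colnames=True)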
Problem 3
# One-hot encode the Problem 3 transactions (t2) into a boolean DataFrame
te = TransactionEncoder()
te_ary = te.fit(t2).transform(t2)
df1 = pd.DataFrame(te_ary, columns=te.columns_)
support itemsets
1 0.7 (Python)
4 0.3 (Mathematics)
After generating the above frequent itemsets, we then apply the budget and timeslot constraints. Because no individual course costs more than 3800 dollars, the following single-item itemsets comply with the constraints: {Machine Learning}, {Python}, {Introduction to AI}, {Data Science}, and {Mathematics}. Next, we eliminate itemsets whose items' combined cost exceeds 3800 dollars, removing {Machine Learning, Introduction to AI} and {Machine Learning, Python, Introduction to AI}. This leaves {Machine Learning, Python}, {Python, Introduction to AI}, {Python, Data Science}, and {Machine Learning, Mathematics}. In short, the compliant frequent itemsets are the following (represented by their index numbers in the above chart): 0, 1, 2, 3, 4, 5, 7, 9, and 10.
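One way to apply the budget constraint is as a post-filter over the mined itemsets. In the sketch below, course_costs is a hypothetical placeholder dictionary; the real per-course prices come from the assignment data:

from mlxtend.frequent_patterns import apriori

# Placeholder prices (all set to 1000 here) -- substitute the actual course costs
course_costs = dict.fromkeys(df1.columns, 1000)
budget = 3800

freq = apriori(df1, min_support=0.3, use_colnames=True)
within_budget = freq[freq["itemsets"].apply(
    lambda s: sum(course_costs[c] for c in s) <= budget)]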
The budget constraint is in fact anti-monotone: for any itemset that satisfies the budget constraint, all of its subsets also satisfy it. This is useful to consider because anti-monotone constraints can be used very handily in the pruning process for frequent itemset mining. If an itemset violates a given anti-monotone constraint, then we can prune away all of its supersets, which decreases computational cost and runtime (there are fewer candidates to consider).
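A sketch of how this anti-monotone check could be pushed inside the level-wise loop from Problem 1 (fits_budget and course_costs are hypothetical names, not part of the assignment code):

def fits_budget(itemset, costs, budget=3800):
    # Anti-monotone check: if an itemset fails this, every superset fails it too
    return sum(costs[c] for c in itemset) <= budget

# Placed right after candidate generation, this discards over-budget candidates
# before any support counting is done for them or for their supersets:
#     candidates = {c for c in candidates if fits_budget(c, course_costs)}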
One monotone constraint that could be applied to this dataset is a budget minimum of 3000 dollars (this situation could arise if an individual had to spend over a certain amount to receive government or company financial aid). This constraint is monotone because if any itemset violates it, all of that itemset's subsets violate it as well; equivalently, once an itemset satisfies the minimum, every superset also satisfies it. For a similar reason as above, this constraint is useful to consider: once an itemset is known to satisfy it, the constraint never needs to be re-checked for any of its supersets, which eliminates superfluous checks during mining.
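A matching sketch for the monotone minimum-spend constraint (again with hypothetical names); the key difference is when the check can be skipped rather than when candidates can be pruned:

def meets_minimum(itemset, costs, minimum=3000):
    # Monotone check: once an itemset satisfies this, every superset satisfies it too
    return sum(costs[c] for c in itemset) >= minimum

# During mining, supersets of an itemset that already meets the minimum never need
# to be re-checked; only itemsets still below 3000 dollars require further testing.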