In [1]: import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
In [5]: df = pd.read_csv("FB-1 (1).csv")
df.head()
Out[5]: status_id num_reactions num_comments num_shares num_likes num_loves num_wows num_hahas num_sads
0 246675545449582_1649696485147474 529 512 262 432 92 3 1 1
1 246675545449582_1649426988507757 150 0 0 150 0 0 0 0
2 246675545449582_1648730588577397 227 236 57 204 21 1 1 0
3 246675545449582_1648576705259452 111 0 0 111 0 0 0 0
4 246675545449582_1645700502213739 213 0 0 204 9 0 0 0
Q5) Which of the variables in the dataset is not significant for doing Principal
Component Analysis?
ANS - The variable 'status_id' is not significant for Principal Component Analysis, as it is only a record identifier. Let us drop that column before proceeding.
In [6]: df_new = df.drop(['status_id'],axis = 1)
df_new.head()
Out[6]: num_reactions num_comments num_shares num_likes num_loves num_wows num_hahas num_sads num_angrys status_link status_phot
0 529 512 262 432 92 3 1 1 0 0
1 150 0 0 150 0 0 0 0 0 0
2 227 236 57 204 21 1 1 0 0 0
3 111 0 0 111 0 0 0 0 0 0
4 213 0 0 204 9 0 0 0 0 0
In [ ]:
Q6) After doing z-score scaling on the dataset, what is the value of the 2nd observation of the variable 'num_hahas'?
In [7]: from scipy.stats import zscore
df_new=df_new.apply(zscore)
df_new.head()
Out[7]: num_reactions num_comments num_shares num_likes num_loves num_wows num_hahas num_sads num_angrys status_link status_phot
0 0.646104 0.323350 1.686879 0.482727 1.983266 0.196196 0.076713 0.473570 -0.155748 -0.094957 -1.24599
1 -0.173192 -0.252206 -0.304144 -0.144720 -0.318454 -0.147879 -0.176010 -0.152587 -0.155748 -0.094957 0.80257
2 -0.006738 0.013089 0.129017 -0.024571 0.206938 -0.033187 0.076713 -0.152587 -0.155748 -0.094957 -1.24599
3 -0.257499 -0.252206 -0.304144 -0.231495 -0.318454 -0.147879 -0.176010 -0.152587 -0.155748 -0.094957 0.80257
4 -0.037003 -0.252206 -0.304144 -0.024571 -0.093286 -0.147879 -0.176010 -0.152587 -0.155748 -0.094957 0.80257
ANS - The value of the 2nd observation of the variable 'num_hahas' is -0.176010.
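As a cross-check, the same value can be recomputed from the z-score formula z = (x - mean) / std. This is a minimal sketch against the original (unscaled) dataframe, assuming scipy's default population standard deviation (ddof=0):
In [ ]: col = df['num_hahas']
manual = (col - col.mean()) / col.std(ddof=0) #population std, matching scipy.stats.zscore
manual.iloc[1] #the 2nd observation, approx. -0.176010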
In [ ]:
Q7) Apply PCA taking all features and extract 6 components and Find out the eigenvector of
the 5th component
In [9]: #Apply PCA taking all features
from sklearn.decomposition import PCA
pca = PCA(n_components=6, random_state=123)
pca_transformed = pca.fit_transform(df_new)
In [10]: #Extract eigen vectors
pca.components_
Out[10]: array([[ 0.29363054, 0.34749787, 0.44325444, 0.2517696 , 0.46125508,
0.29634039, 0.30885435, 0.16313058, 0.23724676, -0.00138341,
-0.23261371, 0.01379735],
[ 0.60664114, -0.230746 , -0.20491048, 0.6406539 , -0.16591724,
0.01626203, -0.13903343, -0.11041549, -0.12687418, 0.06418546,
0.03655064, 0.21318874],
[ 0.11200241, -0.087548 , -0.00392859, 0.10570202, 0.05181555,
0.21154873, 0.101801 , -0.04987934, 0.08923166, -0.23521304,
0.64341911, -0.65653464],
[ 0.00104601, -0.01595734, 0.03483879, -0.00173808, 0.03336338,
0.03375172, 0.01780145, -0.25206584, -0.042459 , 0.89259956,
-0.07188694, -0.35877103],
[ 0.08189114, 0.1862877 , -0.06986598, 0.1020903 , -0.13942737,
-0.37729947, -0.13429183, 0.81640504, 0.12355741, 0.20996813,
0.11148599, -0.15861432],
[-0.08520722, -0.43754044, -0.19674073, -0.09669555, -0.00487 ,
0.37224941, 0.05770028, 0.17312055, 0.66984295, 0.19021436,
0.12087046, 0.28612412]])
ANS - [ 0.08189114, 0.1862877, -0.06986598, 0.1020903, -0.13942737, -0.37729947, -0.13429183, 0.81640504, 0.12355741, 0.20996813, 0.11148599, -0.15861432]
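Since pca.components_ stores one eigenvector per row, ordered by decreasing eigenvalue, the answer can also be read off directly by row index; a minimal sketch:
In [ ]: pca.components_.shape #(6, 12): one row per component, one column per feature
pca.components_[4] #eigenvector (loadings) of the 5th principal component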
In [ ]:
Q8) What is the eigenvector associated with the second principal component?
ANS - [ 0.60664114, -0.230746 , -0.20491048, 0.6406539 , -0.16591724, 0.01626203, -0.13903343, -0.11041549, -0.12687418, 0.06418546,
0.03655064, 0.21318874]
In [ ]:
Q9) Using the scaled dataset, find out the eigenvalues.
In [11]: #Check the eigen values
#Note: This is always returned in descending order
pca.explained_variance_
Out[11]: array([3.596288 , 1.78479109, 1.2511225 , 1.02089676, 0.95528279,
0.84959164])
ANS - [3.596288, 1.78479109, 1.2511225, 1.02089676, 0.95528279, 0.84959164]
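These eigenvalues can also be recovered by eigendecomposing the sample covariance matrix of the scaled data; sklearn divides by n - 1, as np.cov does by default. A minimal sketch to verify the six values above:
In [ ]: eigvals = np.linalg.eigvalsh(np.cov(df_new.T)) #eigenvalues of the sample covariance matrix
np.sort(eigvals)[::-1][:6] #largest six, in descending order; should match pca.explained_variance_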
In [ ]:
Q10) Using the given dataset, what are the explained variances?
In [12]: #Check the explained variance for each PC
#Note: Explained variance = (eigen value of each PC)/(sum of eigen values of all PCs)
pca.explained_variance_ratio_
Out[12]: array([0.29964816, 0.14871149, 0.10424542, 0.08506266, 0.07959561,
0.07078926])
ANS - [0.29964816, 0.14871149, 0.10424542, 0.08506266, 0.07959561, 0.07078926]
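Each ratio is the corresponding eigenvalue divided by the total variance of the scaled data (all 12 features), not by the sum of only the six retained eigenvalues. A minimal sketch, assuming the sample variance (ddof=1) that sklearn uses internally:
In [ ]: total_var = df_new.var(ddof=1).sum() #total variance across all 12 scaled features
pca.explained_variance_ / total_var #should reproduce pca.explained_variance_ratio_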
In [ ]: