Foundations of Data Analytics

1. The document contains exercises from a CS course that involve loading and analyzing datasets in R. It loads an automobiles dataset, counts distinct attribute combinations with and without missing values, and calculates summary statistics such as the median and mean price of four-door cars. It also loads an abalone dataset, plots the relationship between two variables, identifies outliers, and calculates Pearson correlations.
2. Key results include: the number of cars whose make starts with M is 39; 36 distinct attribute combinations occur in the automobiles data when missing values are counted, and 34 when they are excluded; the median price of four-door cars is 11245 and the mean is 13565.67. For the abalone data, outliers are identified and the column pairs with a correlation above 0.95 are listed.
3. Empirical CDF

Uploaded by

akhileshpandey023


CS 910 Exercise Sheet 2: Trying out tools

Question 1
## Load the .csv file into the data frame Auto (the file has no header row)
Auto <- read.csv("C:\\Users\\Akhilesh Pandey\\Desktop\\Automobiles.csv", header = FALSE,
    sep = ",")
## Tabulate the first letter of the make column (V3) and store it as a data frame
count <- as.data.frame(table(substr(Auto$V3, start = 1, stop = 1)))
## Keep the rows whose first letter is M or m
subset(count, Var1 == "m" | Var1 == "M")
##   Var1 Freq
## 8    m   39
The number of cars whose make starts with M is 39.
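The counting step above can be sketched on a small scale. This is a minimal illustration on hypothetical toy data (the vector `makes` is invented and is not taken from the Automobiles file); it upper-cases the first letter so that "m" and "M" land in the same bucket without a two-sided condition.

```r
# Hypothetical toy data standing in for the make column (Auto$V3)
makes <- c("mazda", "Mercury", "mitsubishi", "audi", "bmw", "mazda")

# First letter of each make, upper-cased so "m" and "M" fall in one bucket
first_letter <- toupper(substr(makes, 1, 1))

# Frequency of each initial letter
letter_counts <- table(first_letter)

# Number of makes starting with M, in either case
m_count <- sum(first_letter == "M")

print(letter_counts)
print(m_count)
```

On the real data the same idea would apply to `Auto$V3`, assuming the make column is read as character or factor.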
Question 2 (a)
## Cross-tabulate the six required columns and store the result in the data frame count
count <- as.data.frame(table(Auto$V4, Auto$V5, Auto$V6, Auto$V7, Auto$V8, Auto$V9))
## Remove the combinations with zero occurrences
count <- subset(count, as.numeric(count$Freq) > 0)
## Count the remaining rows
nrow(count)
## [1] 36
The total number of unique combinations that occur in the data, counting combinations with a missing value ("?") in one of the vectors, is 36.
(b)
## Cross-tabulate the six required columns and store the result in the data frame count
count <- as.data.frame(table(Auto$V4, Auto$V5, Auto$V6, Auto$V7, Auto$V8, Auto$V9))
## Remove the combinations with zero occurrences
count <- subset(count, as.numeric(count$Freq) > 0)
## Save the list of columns and remove the rows with "?" in any field
collist <- c("Var1", "Var2", "Var3", "Var4", "Var5", "Var6")
sel <- apply(count[, collist], 1, function(row) !"?" %in% row)
count <- count[sel, ]
## Count the remaining rows
nrow(count)
## [1] 34
The total number of unique combinations that occur in the data, after excluding combinations with a missing value ("?") in any of the vectors, is 34.
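As a cross-check on the two counts, the distinct combinations can also be obtained with `unique()` on the rows, which avoids building the full cross-tabulation. The sketch below runs on a hypothetical three-column toy data frame (the column names `fuel`, `doors`, `body` are invented for illustration); it is not the method used above, only an alternative under the assumption that "?" marks a missing value.

```r
# Hypothetical toy data; "?" marks a missing value, as in the Automobiles file
toy <- data.frame(
  fuel  = c("gas", "gas", "diesel", "gas", "gas"),
  doors = c("two", "two", "four", "?", "four"),
  body  = c("sedan", "sedan", "sedan", "sedan", "wagon"),
  stringsAsFactors = FALSE
)

# (a) distinct combinations that occur, with "?" counted as an ordinary value
combos <- unique(toy)
n_all <- nrow(combos)

# (b) the same, after dropping combinations that contain "?" in any column
has_missing <- apply(combos == "?", 1, any)
n_complete <- nrow(combos[!has_missing, , drop = FALSE])

print(c(n_all, n_complete))
```

For the real data the equivalent call would be `unique(Auto[, c("V4", "V5", "V6", "V7", "V8", "V9")])`.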

Question 3
## Select the cars with four doors
q3.Auto <- subset(Auto, V6 == "four")
## Convert the price column (V26) to numeric; "?" entries become NA
q3.Auto$V26 <- as.numeric(as.character(q3.Auto$V26))
## Ignore the NA values and display the median
median(q3.Auto$V26, na.rm = TRUE)

## [1] 11245
The median price of four-door cars is 11245.
## Ignore the NA values and display the mean
mean(q3.Auto$V26, na.rm= TRUE)

## [1] 13565.67
The mean price of four-door cars is 13565.67.
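The conversion step above turns "?" into NA as a side effect of `as.numeric()`, which also emits a coercion warning. An alternative, sketched here on a hypothetical five-row miniature of the file (the values are invented), is to declare "?" as a missing-value marker at load time with the `na.strings` argument of `read.csv`, so the price column is read as numeric directly.

```r
# Hypothetical miniature of the file: column 1 = number of doors, column 2 = price
csv_text <- "four,10000\ntwo,8000\nfour,?\nfour,14000\nfour,12000"

# Treat "?" as NA while reading, so V2 is parsed as a numeric column
toy <- read.csv(text = csv_text, header = FALSE, na.strings = "?",
                stringsAsFactors = FALSE)

# Four-door cars only; the row with a missing price is kept but ignored below
four_door <- subset(toy, V1 == "four")
med_price <- median(four_door$V2, na.rm = TRUE)
mean_price <- mean(four_door$V2, na.rm = TRUE)

print(c(med_price, mean_price))
```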
Question 4
## Load the .csv file into the data frame Abal (the file has a header row)
Abal <- read.csv("C:\\Users\\Akhilesh Pandey\\Desktop\\Abalone.csv", header = TRUE,
    sep = ",")
## Plot Length against Height
plot(Abal$Height, Abal$Length, main = "Scatterplot showing Height and Length of Abalone",
    xlab = "Height", ylab = "Length", pch = 1, ylim = c(0, 1.2))
## Add the least-squares regression line (red) and a lowess smoother (blue)
abline(lm(Abal$Length ~ Abal$Height), col = "red")
lines(lowess(Abal$Height, Abal$Length), col = "blue")

[Figure: Scatterplot showing Height and Length of Abalone. x-axis: Height; y-axis: Length.]

## Equation of the fitted regression line
equation <- lm(formula = Abal$Length ~ Abal$Height)
equation
##
## Call:
## lm(formula = Abal$Length ~ Abal$Height)
##
## Coefficients:
## (Intercept)  Abal$Height
##      0.1925       2.3761

Outliers are values in a dataset that differ markedly from the bulk of the data and therefore stand out. They can arise for many reasons, e.g. data being entered incorrectly or missing values being recorded as a placeholder. In our plot, the points (0, 0.43) and (0, 0.315) are outliers because the Height has been entered as 0 for these observations. The points (0.515, 0.705) and (1.13, 0.455) are also outliers because they lie very far from the regression line.
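The two kinds of outlier described above (impossible values and points far from the regression line) can also be flagged programmatically. The following is a minimal sketch on hypothetical synthetic data, not the Abalone file: it lists zero heights directly and flags points whose standardized residual exceeds 3 in absolute value. The threshold of 3 is a common rule of thumb and is an assumption here, as are the variable names.

```r
# Hypothetical synthetic data: length depends linearly on height plus noise
set.seed(1)
height <- runif(200, 0.05, 0.25)
len <- 0.2 + 2.4 * height + rnorm(200, sd = 0.03)

# Plant two suspicious records: a zero height (index 201) and a point
# far above the line at an ordinary height (index 202)
height <- c(height, 0, 0.15)
len <- c(len, 0.43, 1.0)

fit <- lm(len ~ height)

# Impossible values: a height of exactly zero
zero_height <- which(height == 0)

# Points far from the fitted line: |standardized residual| > 3 (rule of thumb)
std_res <- rstandard(fit)
flagged <- which(abs(std_res) > 3)

print(zero_height)
print(flagged)
```

Residuals alone can miss points with extreme x-values, because such points pull the fitted line towards themselves; a visual check of the scatterplot remains useful alongside any numeric rule.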

Question 5
## Keep only the numeric columns
nAbal <- Abal[sapply(Abal, is.numeric)]
## Form all combinations of two columns
combo <- combn(colnames(nAbal), 2)
## Calculate the Pearson product-moment correlation coefficient (PPCC) of each pair
PPCCnAbal <- apply(combo, 2, function(x) cor(nAbal[, x[1]], nAbal[, x[2]]))
## Store the result as a data frame
PPCCnAbal <- as.data.frame(PPCCnAbal)
## Transpose so that each pair becomes a row
combo <- t(combo)
## Bind the pairs to their coefficients
soln <- cbind(combo, PPCCnAbal)
## Keep the pairs with a coefficient above 0.95
subset(soln, as.numeric(as.character(soln$PPCCnAbal)) > 0.95)
##               1              2 PPCCnAbal
## 1        Length       Diameter 0.9868116
## 19 Whole.weight Shucked.weight 0.9694055
## 20 Whole.weight Viscera.weight 0.9663751
## 21 Whole.weight   Shell.weight 0.9553554

The combinations for which the Pearson product-moment correlation coefficient is more than 0.95 are (Length, Diameter), (Whole.weight, Shucked.weight), (Whole.weight, Viscera.weight) and (Whole.weight, Shell.weight).

Question 6
## Take the rows with sex Male
Abal_m <- subset(Abal, as.character(Abal$Sex) == "M")
## Calculate the ECDF of this subset
ecdf.male.rings <- ecdf(Abal_m$Rings)
## Take the rows with sex Female
Abal_f <- subset(Abal, as.character(Abal$Sex) == "F")
## Calculate the ECDF of this subset
ecdf.female.rings <- ecdf(Abal_f$Rings)
## Take the rows with sex Infant
Abal_i <- subset(Abal, as.character(Abal$Sex) == "I")
## Calculate the ECDF of this subset (built from Abal_i, not Abal_m)
ecdf.infant.rings <- ecdf(Abal_i$Rings)
## Plot the ECDF for males
plot(ecdf.male.rings, main = "Empirical CDF of various Sexes", ylab = "Quantiles of diff Sexes",
    xlab = "Number of Rings", pch = 19, col = "blue")
## Add the female ECDF
lines(ecdf.female.rings, pch = 20, col = "red")
## Add the infant ECDF
lines(ecdf.infant.rings, pch = 20, col = "green")

[Figure: Empirical CDF of various Sexes. x-axis: Number of Rings; y-axis: Quantiles of diff Sexes.]
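Building the three ECDFs by hand invites copy-paste slips between the subsets. One way to avoid that, sketched here on hypothetical toy data (the values are invented and only the column names `Sex` and `Rings` mirror the Abalone file), is to split the ring counts by sex and apply `ecdf()` to every group in one step.

```r
# Hypothetical toy data standing in for Abal$Sex and Abal$Rings
toy <- data.frame(
  Sex   = c("M", "M", "M", "F", "F", "F", "I", "I", "I", "I"),
  Rings = c(9, 10, 12, 10, 11, 14, 5, 6, 7, 8)
)

# One ECDF per sex, each built from its own subset
ecdfs <- lapply(split(toy$Rings, toy$Sex), ecdf)

# Each ECDF is a function: proportion of the group with at most 8 rings
p8 <- sapply(ecdfs, function(f) f(8))
print(p8)
```

Each element of `ecdfs` can then be passed to `plot()` or `lines()` in the same way as the individually named ECDFs above.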
