SRI RAMAKRISHNA MISSION VIDYALAYA
COLLEGE OF ARTS AND SCIENCE
[AUTONOMOUS]
Accredited by NAAC with A Grade
Coimbatore – 641 020
June – 2022
DEPARTMENT OF COMPUTER APPLICATIONS
NAME :
REG. NO :
SEMESTER : IV
SUBJECT : NME: Data Science and Big data Analytics
SUBJECT CODE : 20UCA4NM2
BONAFIDE CERTIFICATE
REGISTER NO:______________
This is to certify that this is a bonafide record of work done by
_______________________ in “NME: Data Science and Big Data Analytics
Lab” (20UCA4NM2) for the IV Semester during the academic year 2021 - 2022.
Submitted for the Semester Practical Examinations held on ______________.
Head of the Department Staff in Charge
Internal Examiner External Examiner
INDEX
S.NO PARTICULARS PAGE.NO SIGN
1 Adding two matrices using array
2 Simple Calculator
3 Multiplication Table
4 Find Sum, Mean and Product of Vector
5 To Convert Decimal into Binary using Recursion
6 K-Means clustering technique
7 Hierarchical Clustering
8 Linear Regression
9 To Visualize the data using histogram
10 To Visualize the data using Box plot
11 To Visualize the data using Scatter plot
1. Adding two matrices using array
Aim
To write an R program that adds two matrices using arrays.
Algorithm
Step 1: Start the process.
Step 2: Initialize two matrices and assign them to two different variables.
Step 3: Print the sum of the two matrices as the result.
Step 4: In the same way as addition, perform element-wise subtraction,
multiplication and division on the two matrices.
Step 5: Print the results of the subtraction, multiplication and division.
Step 6: Stop the process.
R Program
# Create two 2x3 matrices.
m1 = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
print("Matrix-1:")
print(m1)
m2 = matrix(c(0, 1, 2, 3, 0, 2), nrow = 2)
print("Matrix-2:")
print(m2)
# Element-wise addition
result = m1 + m2
print("Result of addition")
print(result)
# Element-wise subtraction
result = m1 - m2
print("Result of subtraction")
print(result)
# Element-wise multiplication (not the algebraic matrix product)
result = m1 * m2
print("Result of multiplication")
print(result)
# Element-wise division; dividing by 0 gives Inf
result = m1 / m2
print("Result of division:")
print(result)
Output Screen
[1] "Matrix-1:"
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
[1] "Matrix-2:"
     [,1] [,2] [,3]
[1,]    0    2    0
[2,]    1    3    2
[1] "Result of addition"
     [,1] [,2] [,3]
[1,]    1    5    5
[2,]    3    7    8
[1] "Result of subtraction"
     [,1] [,2] [,3]
[1,]    1    1    5
[2,]    1    1    4
[1] "Result of multiplication"
     [,1] [,2] [,3]
[1,]    0    6    0
[2,]    2   12   12
[1] "Result of division:"
     [,1]     [,2] [,3]
[1,]  Inf 1.500000  Inf
[2,]    2 1.333333    3
Result
Thus, the program was executed and verified successfully.
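Supplementary sketch: the `*` operator in the program multiplies the matrices element by element. The short example below contrasts that with the algebraic matrix product `%*%`, which needs the second matrix transposed here because both matrices are 2x3. It also shows one possible way to avoid the `Inf` values in the division. The name `safe_div` and the choice of `NA` for a zero divisor are assumptions made only for illustration.

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
m2 <- matrix(c(0, 1, 2, 3, 0, 2), nrow = 2)

# Element-wise product: result has the same shape as the inputs (2x3)
elementwise <- m1 * m2

# Algebraic matrix product: (2x3) %*% (3x2) gives a 2x2 matrix
matprod <- m1 %*% t(m2)

# Illustrative choice: report NA instead of Inf where the divisor is 0
safe_div <- ifelse(m2 == 0, NA, m1 / m2)

print(elementwise)
print(matprod)
print(safe_div)
```

Whether `NA` or `Inf` is the better result for a zero divisor depends on how the values are used afterwards; the original program keeps R's default of `Inf`.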
2. Simple Calculator
Aim
To write an R program that performs arithmetic operations as a simple
calculator.
Algorithm
Step 1: Start the process.
Step 2: Define the functions add, subtract, multiply and divide to
perform the addition, subtraction, multiplication and division operations.
Step 3: Read the choice of operation and two numbers from the user as
integer values.
Step 4: Use switch() to select the operator and the matching function,
then print the result.
Step 5: Stop the process.
R Program
add <- function(x, y) {
  return(x + y)
}
subtract <- function(x, y) {
  return(x - y)
}
multiply <- function(x, y) {
  return(x * y)
}
divide <- function(x, y) {
  return(x / y)
}
# take input from the user
print("Select operation.")
print("1.Add")
print("2.Subtract")
print("3.Multiply")
print("4.Divide")
choice = as.integer(readline(prompt="Enter choice[1/2/3/4]: "))
num1 = as.integer(readline(prompt="Enter first number: "))
num2 = as.integer(readline(prompt="Enter second number: "))
# switch() picks the alternative at position `choice`
operator <- switch(choice,"+","-","*","/")
result <- switch(choice, add(num1, num2), subtract(num1, num2),
                 multiply(num1, num2), divide(num1, num2))
print(paste(num1, operator, num2, "=", result))
Output
[1] "Select operation."
[1] "1.Add"
[1] "2.Subtract"
[1] "3.Multiply"
[1] "4.Divide"
Enter choice[1/2/3/4]: 4
Enter first number: 20
Enter second number: 4
[1] "20 / 4 = 5"
Result
Thus, the program was executed and verified successfully.
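Supplementary sketch: `readline()` only works in an interactive session, and `switch()` returns `NULL` when the number does not match any alternative. The example below wraps the same `switch()` logic in a function that can be called directly and that rejects an invalid choice. The function name `calculate` is hypothetical and not part of the original program.

```r
# calculate() is a hypothetical helper introduced for illustration
calculate <- function(choice, num1, num2) {
  # Reject anything other than 1, 2, 3 or 4 (including NA)
  if (!(choice %in% 1:4)) {
    stop("choice must be 1, 2, 3 or 4")
  }
  operator <- switch(choice, "+", "-", "*", "/")
  result <- switch(choice,
                   num1 + num2,
                   num1 - num2,
                   num1 * num2,
                   num1 / num2)
  paste(num1, operator, num2, "=", result)
}

print(calculate(4, 20, 4))
print(calculate(1, 7, 5))
```

The first call reproduces the line shown in the Output section, "20 / 4 = 5".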
3. Multiplication Table
Aim
To write an R program that prints the multiplication table of a
given number.
Algorithm
Step 1: Start the process.
Step 2: Read the number whose multiplication table is to be printed.
Step 3: Use a for loop to print the multiplication table.
Step 4: The for loop steps the multiplier i from 1 to 10.
Step 5: In each iteration, print the given number, the multiplier and
their product.
Step 6: Stop the process.
R Program
# R Program to print the multiplication table (from 1 to 10)
# take input from the user
num = as.integer(readline(prompt = "Enter a number: "))
# use for loop to iterate 10 times
for(i in 1:10) {
  print(paste(num,'x', i, '=', num*i))
}
Output:
Enter a number: 7
[1] "7 x 1 = 7"
[1] "7 x 2 = 14"
[1] "7 x 3 = 21"
[1] "7 x 4 = 28"
[1] "7 x 5 = 35"
[1] "7 x 6 = 42"
[1] "7 x 7 = 49"
[1] "7 x 8 = 56"
[1] "7 x 9 = 63"
[1] "7 x 10 = 70"
Result
Thus, the program was executed and verified successfully.
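Supplementary sketch: because `paste()` and `*` are vectorized in R, the same table can be built without an explicit loop. The number 7 is fixed here instead of being read from the user so that the example runs non-interactively; the variable name `table_lines` is an illustrative assumption.

```r
num <- 7
i <- 1:10

# One vectorized call builds all ten lines of the table
table_lines <- paste(num, "x", i, "=", num * i)

# writeLines() prints each element on its own line, without the [1] prefix
writeLines(table_lines)
```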
4. Find Sum, Mean and Product of Vector.
Aim
To write an R program that finds the sum, mean and product of a vector.
Algorithm
Step 1: Start the process.
Step 2: Initialize the variable x, which holds the set of values.
Step 3: Find the sum, mean and product of the vector using the
sum(), mean() and prod() functions.
Step 4: Print the sum, mean and product.
Step 5: Stop the process.
R Program
x = c(10, 20, 30)
print("Sum:")
print(sum(x))
print("Mean:")
print(mean(x))
print("Product:")
print(prod(x))
Output:
[1] "Sum:"
[1] 60
[1] "Mean:"
[1] 20
[1] "Product:"
[1] 6000
Result
Thus, the program was executed and verified successfully.
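Supplementary sketch: real data often contains missing values, and `sum()`, `mean()` and `prod()` all return `NA` if any element is `NA`. The example below, which adds an `NA` to the vector purely for illustration, shows the `na.rm = TRUE` argument that skips missing values.

```r
x <- c(10, 20, NA, 30)

print(sum(x))                  # NA, because one element is missing
print(sum(x, na.rm = TRUE))    # 60
print(mean(x, na.rm = TRUE))   # 20
print(prod(x, na.rm = TRUE))   # 6000
```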
5. To Convert Decimal into Binary using Recursion
Aim
To write an R program that converts a decimal number into binary using
recursion.
Algorithm
STEP 1: Call the function convert_to_binary().
STEP 2: Pass the decimal number to be converted to the function as its
argument (n in the program).
STEP 3: Inside the function, check whether the number is greater than 1; if
so, call convert_to_binary() again with the integer part of n/2 as the argument.
STEP 4: After the recursive call returns, print the remainder of n divided
by 2, so that the binary digits appear from most to least significant.
R Program
# Program to convert decimal number into binary number using
# recursive function
convert_to_binary <- function(n) {
  if(n > 1) {
    # Convert the higher-order bits first
    convert_to_binary(as.integer(n/2))
  }
  # Then print the lowest-order bit of n
  cat(n %% 2)
}
OUTPUT
> convert_to_binary(52)
110100
Result
Thus, the program was executed and verified successfully.
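Supplementary sketch: the program prints the digits with `cat()`, so the result cannot be stored or checked. The variant below returns the binary digits as a character string instead, still using recursion. The function name `to_binary_string` is hypothetical; `strtoi()` with `base = 2` is used only to convert the string back as a check.

```r
# to_binary_string() is a hypothetical helper introduced for illustration
to_binary_string <- function(n) {
  # Only non-negative whole numbers are handled in this sketch
  stopifnot(n >= 0, n == floor(n))
  if (n < 2) {
    return(as.character(n))
  }
  # Higher-order bits first, then the lowest-order bit
  paste0(to_binary_string(n %/% 2), n %% 2)
}

bits <- to_binary_string(52)
print(bits)                     # "110100"
print(strtoi(bits, base = 2))   # 52
```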
6. K-Means clustering technique.
Aim
To write an R program that applies the K-means clustering technique.
Algorithm
Step 1: Install the relevant packages and load their libraries.
Step 2: Load and inspect the data set.
Step 3: Eliminate the target variable (Species).
Step 4: Choose the number of clusters using the elbow point technique.
Step 5: Implement K-means.
Step 6: Plot the data points in clusters.
Step 7: Run K-means with K = 3.
Step 8: Plot the new clustered graph.
R Program
# Installing Packages
install.packages("ClusterR")
install.packages("cluster")
# Loading package
library(ClusterR)
library(cluster)
# Removing initial label of
# Species from original dataset
iris_1 <- iris[, -5]
# Fitting K-Means clustering Model
# to training dataset
set.seed(240) # Setting seed
kmeans.re <- kmeans(iris_1, centers = 3, nstart = 20)
kmeans.re
# Cluster identification for
# each observation
kmeans.re$cluster
# Confusion Matrix
cm <- table(iris$Species, kmeans.re$cluster)
cm
# Model Evaluation and visualization
plot(iris_1[c("Sepal.Length", "Sepal.Width")])
plot(iris_1[c("Sepal.Length", "Sepal.Width")],
     col = kmeans.re$cluster)
plot(iris_1[c("Sepal.Length", "Sepal.Width")],
     col = kmeans.re$cluster,
     main = "K-means with 3 clusters")
## Plotting cluster centers
kmeans.re$centers
kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")]
# cex is symbol size, pch is symbol
points(kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")],
       col = 1:3, pch = 8, cex = 3)
## Visualizing clusters
y_kmeans <- kmeans.re$cluster
clusplot(iris_1[, c("Sepal.Length", "Sepal.Width")],
         y_kmeans,
         lines = 0,
         shade = TRUE,
         color = TRUE,
         labels = 2,
         plotchar = FALSE,
         span = TRUE,
         main = paste("Cluster iris"),
         xlab = 'Sepal.Length',
         ylab = 'Sepal.Width')
Output:
Model kmeans.re:
Three clusters of sizes 50, 62 and 38 are formed. The between-cluster
sum of squares is 88.4% of the total sum of squares.
Cluster identification:
Confusion Matrix:
K-means with 3 clusters plot:
Plotting cluster centers:
Plot of clusters:
Result
Thus, the program was executed and verified successfully.
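Supplementary sketch: Step 4 of the algorithm names the elbow point technique, which the program itself does not show. One common way to apply it, sketched below under the assumption that K from 1 to 10 is a reasonable range to try, is to run `kmeans()` for each K and plot the total within-cluster sum of squares (`tot.withinss`). The K at which the curve bends is a candidate for the number of clusters. The names `k_values` and `wss` are illustrative.

```r
iris_1 <- iris[, -5]   # drop the Species label, as in the program
set.seed(240)

k_values <- 1:10       # assumed range of K to try
# Total within-cluster sum of squares for each K
wss <- sapply(k_values, function(k) {
  kmeans(iris_1, centers = k, nstart = 20)$tot.withinss
})

plot(k_values, wss, type = "b",
     xlab = "Number of clusters K",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow plot")
```

Reading the bend from the plot is a judgment call, so this is a guide rather than a rule.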
7. Hierarchical Clustering
Aim
To write an R program that performs hierarchical clustering.
Algorithm
Step 1: Install the relevant packages and load their libraries.
Step 2: Treat each data point as a single-point cluster, which forms N clusters.
Step 3: Take the two closest data points and merge them into one cluster,
which leaves N-1 clusters.
Step 4: Take the two closest clusters and merge them into one cluster,
which leaves N-2 clusters.
Step 5: Repeat step 4 until there is only one cluster.
R Program
install.packages("cluster")     # for clustering algorithms
install.packages("tidyverse")   # for data manipulation
install.packages("factoextra")  # for clustering visualization
# Load the packages
library("cluster")
library("tidyverse")
library("factoextra")
# Keep the four numeric columns; scale() cannot handle the Species factor
data <- iris[, -5]
print(head(data))       # show a sample of the data set
data <- na.omit(data)   # remove missing values
data <- scale(data)     # scale the variables (features)
# matrix of Dissimilarity
dis_mat <- dist(data, method = "euclidean")
# Hierarchical clustering using Average Linkage
# (hclust() expects the dissimilarity matrix, not the raw data)
cluster <- hclust(dis_mat, method = "average")
# Hierarchical clustering using Complete Linkage
cluster <- hclust(dis_mat, method = "complete")
# Cut the tree into 3 clusters
cutree(cluster, k = 3)
# Dendrogram plot
plot(cluster)
# or agnes can be used to compute hierarchical clustering
cluster2 <- agnes(data, method = "complete")
cutree(as.hclust(cluster2), k = 3)
# Dendrogram plot
pltree(cluster2)
Output :
Result
Thus, the program was executed and verified successfully.
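Supplementary sketch: one way to judge a hierarchical clustering of the iris data is to cut the tree into three groups and cross-tabulate them against the known species. The example below uses only base R (`dist()`, `hclust()`, `cutree()`, `table()`), so it runs without the extra packages. The variable names `features`, `tree` and `groups` are illustrative assumptions.

```r
# Scale the four numeric columns; Species is kept aside for comparison
features <- scale(iris[, -5])

# Euclidean dissimilarity matrix and complete-linkage tree
dis_mat <- dist(features, method = "euclidean")
tree <- hclust(dis_mat, method = "complete")

# Cut the dendrogram into 3 clusters
groups <- cutree(tree, k = 3)

# Rows are the true species, columns are the cluster numbers
print(table(iris$Species, groups))
```

The cluster numbers are arbitrary labels, so a good result is one where each species falls mostly into a single column, not necessarily on the diagonal.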
8. Linear Regression
Aim
To write an R program that performs linear regression.
Algorithm
Step 1: Start the process.
Step 2: Load the data set into a data frame.
Step 3: Fit the model with
lm([target variable] ~ [predictor variables], data = [data source])
Step 4: Review the fitted model with summary().
Step 5: Stop the process.
R Program
library(readxl)
# Upload the data
ageandheight <- read_excel("ageandheight.xls", sheet = "Hoja2")
# Create the linear regression
lmHeight = lm(height~age, data = ageandheight)
summary(lmHeight) # Review the results
# Create a linear regression with two variables
lmHeight2 = lm(height~age + no_siblings, data = ageandheight)
summary(lmHeight2) # Review the results
OUTPUT
Call:
lm(formula = Pressure ~ Temperature + I(Temperature^2), data = pressure)
Residuals:
    Min      1Q  Median      3Q     Max
-4.6045 -1.6330  0.5545  1.1795  4.8273
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      33.750000   3.615591   9.335 3.36e-05 ***
Temperature      -1.731591   0.151002 -11.467 8.62e-06 ***
I(Temperature^2)  0.052386   0.001338  39.158 1.84e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.074 on 7 degrees of freedom
Multiple R-squared: 0.9996, Adjusted R-squared: 0.9994
F-statistic: 7859 on 2 and 7 DF, p-value: 1.861e-12
plot(lmTemp2$residuals, pch = 16, col = "red")
Result
Thus, the program was executed and verified successfully.
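Supplementary sketch: the program depends on an Excel file that may not be at hand. As a stand-in, and only as an assumption for illustration, the example below fits the same kind of model on R's built-in `cars` data set (stopping distance against speed), then reads off the coefficients, the R-squared value and predictions for new speeds. The names `fit` and `new_speeds` are illustrative.

```r
# Fit stopping distance as a linear function of speed (built-in data set)
fit <- lm(dist ~ speed, data = cars)

# Intercept and slope of the fitted line
print(coef(fit))

# Proportion of variance explained by the model
print(summary(fit)$r.squared)

# Predict the stopping distance for two new speeds
new_speeds <- data.frame(speed = c(10, 20))
print(predict(fit, newdata = new_speeds))
```

The column name in `newdata` must match the predictor name used in the formula, otherwise `predict()` cannot find the variable.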
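9. To Visualize the data using histogram.
Aim
To write an R program that visualizes data using a histogram.
Algorithm
Step 1: Start the process.
Step 2: Inspect the built-in airquality data set with str().
Step 3: Extract the Temp column and draw its histogram with hist().
Step 4: Redraw the histogram with a title, axis label, axis limits,
colour and density scale.
Step 5: Stop the process.
R Program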
> str(airquality)
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
Temperature <- airquality$Temp
hist(Temperature)
# histogram with added parameters
hist(Temperature,
main="Maximum daily temperature at La Guardia Airport",
xlab="Temperature in degrees Fahrenheit",
xlim=c(50,100),
col="darkmagenta",
freq=FALSE
)
OUTPUT
Result
Thus, the program was executed and verified successfully.
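Supplementary sketch: `hist()` also returns the numbers behind the picture. Calling it with `plot = FALSE` gives the bin boundaries, counts and densities without drawing anything, which helps explain what `freq = FALSE` changes in the program: the bar heights become densities whose areas sum to 1. The variable name `h` is illustrative.

```r
Temperature <- airquality$Temp

# Compute the histogram without drawing it
h <- hist(Temperature, plot = FALSE)

print(h$breaks)   # bin boundaries
print(h$counts)   # number of observations in each bin

# Every observation falls into exactly one bin
print(sum(h$counts) == length(Temperature))

# With freq = FALSE the bars show densities; bar areas sum to 1
print(sum(h$density * diff(h$breaks)))
```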
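10. To Visualize the data using Box plot.
Aim
To write an R program that visualizes data using a box plot.
Algorithm
Step 1: Start the process.
Step 2: Inspect the built-in airquality data set with str().
Step 3: Draw a box plot of the Ozone column with boxplot().
Step 4: Redraw the box plot with a title, axis labels, colours,
horizontal orientation and a notch.
Step 5: Stop the process.
R Program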
> str(airquality)
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
boxplot(airquality$Ozone)
boxplot(airquality$Ozone,
main = "Mean ozone in parts per billion at Roosevelt Island",
xlab = "Parts Per Billion",
ylab = "Ozone",
col = "orange",
border = "brown",
horizontal = TRUE,
notch = TRUE
)
OUTPUT
Result
Thus, the program was executed and verified successfully.
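Supplementary sketch: the values that a box plot draws can be obtained directly with `boxplot.stats()`, which omits missing values. The example below prints the five numbers behind the box and whiskers, the number of non-missing observations, and the points that would be drawn as outliers. The variable name `s` is illustrative.

```r
ozone <- airquality$Ozone

s <- boxplot.stats(ozone)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(s$stats)

# Number of non-missing observations used
print(s$n)

# Values beyond the whiskers, drawn as individual points
print(s$out)
```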
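11. To Visualize the Scatter plot.
Aim
To write an R program that visualizes data using a scatter plot.
Algorithm
Step 1: Start the process.
Step 2: Install and load the readr package and read a data file with
the read_*() functions.
Step 3: Take the wt and mpg columns of mtcars as x and y.
Step 4: Draw the scatter plot with plot().
Step 5: Add a regression line with abline() and a smoothed fit with lines().
Step 6: Stop the process.
R Program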
# Installing
install.packages("readr")
# Loading
library("readr")
# General function (file and delim are placeholders):
#   read_delim(file, delim, col_names = TRUE)
# Read comma (",") separated values:
#   read_csv(file, col_names = TRUE)
# Read semicolon (";") separated values
# (this is common in European countries):
#   read_csv2(file, col_names = TRUE)
# Read tab separated values:
#   read_tsv(file, col_names = TRUE)
# Read a txt file, named "mtcars.txt"
my_data <- read_tsv("mtcars.txt")
# Read a csv file, named "mtcars.csv"
my_data <- read_csv("mtcars.csv")
# Read a txt file chosen interactively
my_data <- read_tsv(file.choose())
# Read a csv file chosen interactively
my_data <- read_csv(file.choose())
# Read a file whose fields are separated by "|"
my_data <- read_delim(file.choose(), delim = "|")
x <- mtcars$wt
y <- mtcars$mpg
# Plot with main and axis titles
# Change point shape (pch = 19) and remove frame.
plot(x, y, main = "Main title",
xlab = "X axis title", ylab = "Y axis title",
pch = 19, frame = FALSE)
# Add regression line
plot(x, y, main = "Main title",
xlab = "X axis title", ylab = "Y axis title",
pch = 19, frame = FALSE)
abline(lm(y ~ x, data = mtcars), col = "blue")
# Add loess fit
plot(x, y, main = "Main title",
xlab = "X axis title", ylab = "Y axis title",
pch = 19, frame = FALSE)
lines(lowess(x, y), col = "blue")
OUTPUT
Result
Thus, the program was executed and verified successfully.
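Supplementary sketch: a scatter plot is often read together with a number that summarizes the trend. The example below, which uses only the built-in `mtcars` data set so that no file has to be read, prints the correlation between weight and mileage and the coefficients of the fitted line, then draws the plot with both the regression line and the lowess curve. The axis labels and the red colour for the lowess curve are illustrative choices.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Strength and direction of the linear relationship
print(cor(x, y))

# Intercept and slope of the least-squares line
fit <- lm(y ~ x)
print(coef(fit))

# Scatter plot with the regression line and a lowess smooth
plot(x, y, pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit, col = "blue")
lines(lowess(x, y), col = "red")
```

A negative correlation and a negative slope both say the same thing here: heavier cars tend to travel fewer miles per gallon.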