R
*We can convert from one type to another with the following functions:
as.numeric()
as.integer()
as.complex()
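As a small illustration of the conversion functions listed above (the variable names are only for this sketch):

```r
x <- 3.7      # a numeric (double) value
y <- "42"     # a character string that contains a number

x_int <- as.integer(x)   # drops the fractional part: 3
y_num <- as.numeric(y)   # parses the string: 42
x_cplx <- as.complex(x)  # 3.7+0i

class(x_int)   # "integer"
class(y_num)   # "numeric"
class(x_cplx)  # "complex"
```

Note that as.integer() truncates rather than rounds, so 3.7 becomes 3.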
*min() and max() are built-in mathematical functions in R.
*sqrt() is the square root function.
*abs() is the absolute value function.
*The ceiling() function rounds a number upwards to its nearest integer.
*The floor() function rounds a number downwards to its nearest integer,
and returns the result.
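The math functions above can be tried on a few sample values (the vector and variable names are illustrative):

```r
values <- c(4.2, -7.5, 16)

lowest   <- min(values)    # -7.5
highest  <- max(values)    # 16
root     <- sqrt(16)       # 4
distance <- abs(-7.5)      # 7.5
up       <- ceiling(4.2)   # 5
down     <- floor(4.2)     # 4
down_neg <- floor(-7.5)    # -8, because floor() always rounds downwards
```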
*The cat() function prints its arguments. A \n inside a string is printed
as a line break.
*The nchar() function returns the number of characters in a string.
*The grepl() function is used to check if a character or a sequence of
characters is present inside a string.
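A short sketch of these three string functions, using a made-up sample string:

```r
text <- "Hello World"

# cat() prints the string; each \n starts a new line in the output
cat("Line one\nLine two\n")

n_chars <- nchar(text)        # 11 (the space counts as a character)
has_h   <- grepl("H", text)   # TRUE
has_x   <- grepl("X", text)   # FALSE
```

By default grepl() treats its first argument as a regular expression. Passing fixed = TRUE makes it search for the literal text instead.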
*Arithmetic operators: + addition
- subtraction
* multiplication
/ division
^ exponent
%% modulus (remainder from division)
%/% integer division
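The arithmetic operators can be checked with two sample numbers (a and b are arbitrary values chosen for this sketch):

```r
a <- 17
b <- 5

sum_ab  <- a + b     # 22
diff_ab <- a - b     # 12
prod_ab <- a * b     # 85
quot_ab <- a / b     # 3.4
power   <- a ^ 2     # 289
remain  <- a %% b    # 2
int_div <- a %/% b   # 3

# Integer division and modulus fit together like this:
int_div * b + remain == a   # TRUE
```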
*R comparison Operators:
== Equal
!= Not equal
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
*R logical operators:
& Element-wise logical AND operator. It returns TRUE for each position
where both elements are TRUE.
&& Logical AND operator for single values. It returns TRUE if both
statements are TRUE.
| Element-wise logical OR operator. It returns TRUE for each position
where at least one of the elements is TRUE.
|| Logical OR operator for single values. It returns TRUE if at least
one of the statements is TRUE.
! Logical NOT. It returns FALSE if the statement is TRUE, and TRUE if
the statement is FALSE.
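The difference between the element-wise and the single-value operators is easiest to see on small vectors. The following is a minimal sketch with made-up values:

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

and_elem <- x & y   # TRUE FALSE FALSE (compared position by position)
or_elem  <- x | y   # TRUE TRUE TRUE
not_x    <- !x      # FALSE TRUE FALSE

# && and || work on single values, which is what if() conditions need
age <- 25
is_working_age <- age >= 18 && age < 65   # TRUE
is_outside     <- age < 18 || age >= 65   # FALSE
```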
*R miscellaneous operators:
: Creates a series of numbers in a sequence x <- 1:10
%in% Find out if an element belongs to a vector x %in% y
%*% Matrix Multiplication x <- Matrix1 %*% Matrix2
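A small sketch of the three miscellaneous operators. The matrices m1 and m2 are invented for the example, and diag(2) is used only because multiplying by the identity matrix gives an easy result to check:

```r
x <- 1:10                        # the sequence 1, 2, ..., 10

three_in_x <- 3 %in% x           # TRUE
several    <- c(3, 12) %in% x    # TRUE FALSE (one answer per element)

m1 <- matrix(1:4, nrow = 2)      # 2 x 2 matrix, filled column by column
m2 <- diag(2)                    # 2 x 2 identity matrix
product <- m1 %*% m2             # matrix product, equal to m1 here
```

Note that %*% is true matrix multiplication, while * multiplies two matrices element by element.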
*The & symbol (and) is a logical operator, and is used to combine
conditional statements.
*The | symbol (or) is a logical operator, and is used to combine
conditional statements.
*R has two main loop commands: while loops and for loops (a repeat loop
also exists, which runs until a break statement is reached).
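As a minimal sketch, both loop types can be used to add the numbers 1 to 5 (the variable names are illustrative):

```r
# while loop: runs as long as the condition is TRUE
i <- 1
total_while <- 0
while (i <= 5) {
  total_while <- total_while + i
  i <- i + 1
}

# for loop: runs once for each item in a sequence
total_for <- 0
for (value in 1:5) {
  total_for <- total_for + value
}

total_while   # 15
total_for     # 15
```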
*To create a function, use the function() keyword.
*There are two ways to create a nested function:
Call a function within another function.
Write a function within a function.
*R also accepts function recursion, which means a defined function can
call itself.
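The function concepts above (creating a function, the two kinds of nesting, and recursion) can be sketched as follows. All function names here are made up for the example:

```r
# A simple function created with function()
add <- function(a, b) {
  a + b
}

# Nested function, way 1: call a function within another function call
nested_call <- add(add(2, 2), add(3, 3))   # 4 + 6 = 10

# Nested function, way 2: write a function within a function
outer_fn <- function(x) {
  inner_fn <- function(y) {
    x + y
  }
  inner_fn              # return the inner function
}
add_three <- outer_fn(3)
inner_result <- add_three(5)               # 3 + 5 = 8

# Recursion: the function calls itself until it reaches the stop condition
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}
fact_5 <- factorial_rec(5)                 # 120
```

A recursive function always needs a stop condition (here n <= 1), otherwise it would keep calling itself.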
*Variables created outside of a function are global variables. They can
be used anywhere, both inside of functions and outside.
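A short sketch of how a global variable behaves inside functions (the names greeting, describe and shadow are illustrative):

```r
greeting <- "awesome"            # global variable

describe <- function() {
  paste("R is", greeting)        # reads the global variable
}

shadow <- function() {
  greeting <- "fantastic"        # local variable with the same name
  paste("R is", greeting)
}

from_global <- describe()        # "R is awesome"
from_local  <- shadow()          # "R is fantastic"
greeting                         # still "awesome"
```

Assigning with <- inside a function creates a local variable, so the global value stays unchanged.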
*A vector is simply a list of items that are of the same type. To combine
the list of items to a vector, use the c() function and separate the items
by a comma.
*To create a vector with numerical values in a sequence, use
the : operator.
*To find out how many items a vector has, use the length() function.
*You can access the vector items by referring to its index number inside
brackets []. The first item has index 1, the second item has index 2, and
so on.
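The basic vector operations described above, as a small sketch with sample data:

```r
fruits  <- c("banana", "apple", "orange")   # character vector
numbers <- 1:10                             # numeric sequence 1, 2, ..., 10

n_fruits  <- length(fruits)      # 3
first     <- fruits[1]           # "banana" (indexes start at 1)
first_and_third <- fruits[c(1, 3)]   # "banana" "orange"
```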
* To repeat vectors, use the rep() function :
repeat_each <- rep(c(1,2,3), each = 3)
repeat_each
repeat_independent <- rep(c(1,2,3), times = c(5,2,1))
repeat_independent
* To make bigger or smaller steps in a sequence, use the seq() function :
numbers <- seq(from = 0, to = 100, by = 20)
numbers
* To create a list, use the list() function.
* To find out if a specified item is present in a list, use the %in%
operator.
* To add an item to the end of the list, use the append() function.
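The three list operations above can be combined in one short sketch (thislist and its items are sample data):

```r
thislist <- list("apple", "banana", "cherry")

has_apple <- "apple" %in% thislist        # TRUE
has_kiwi  <- "kiwi" %in% thislist         # FALSE

thislist <- append(thislist, "orange")    # add an item to the end
n_items  <- length(thislist)              # 4
last     <- thislist[[4]]                 # "orange"
```

append() returns a new list, so the result has to be assigned back to the variable.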
*Creation of a matrix using the matrix function :
thismatrix <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)
*You can access the items by using [ ] brackets, for example
thismatrix[1, 2]. The first number "1" in the bracket specifies the
row-position, while the second number "2" specifies the column-position.
*The whole row can be accessed if you specify a comma after the
number in the bracket : thismatrix[2,].
*The whole column can be accessed if you specify a comma before the
number in the bracket : thismatrix[,2].
*More than one row can be accessed if you use the c() function :
thismatrix[c(1,2),].
*More than one column can be accessed if you use the c() function :
thismatrix[, c(1,2)].
*Use the cbind() function to add additional columns in a Matrix.
*Use the rbind() function to add additional rows in a Matrix.
*Use the dim() function to find the number of rows and columns in a
Matrix.
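Putting the matrix notes together, here is a sketch that reuses the thismatrix example from above. The names with_column and with_row are added for illustration:

```r
# matrix() fills the values column by column:
#      [,1] [,2]
# [1,]    1    4
# [2,]    2    5
# [3,]    3    6
thismatrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)

item       <- thismatrix[1, 2]   # row 1, column 2: 4
second_row <- thismatrix[2, ]    # 2 5
second_col <- thismatrix[, 2]    # 4 5 6

with_column <- cbind(thismatrix, c(7, 8, 9))   # now 3 rows, 3 columns
with_row    <- rbind(thismatrix, c(10, 11))    # now 4 rows, 2 columns

dim(thismatrix)    # 3 2
dim(with_column)   # 3 3
dim(with_row)      # 4 2
```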
*Use the array() function to create an array.
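A minimal sketch of array(), assuming a three-dimensional array is wanted. The dim argument gives the size of each dimension, and the values 1:24 are just sample data:

```r
# 4 rows, 3 columns, 2 layers, filled with the numbers 1 to 24
thisarray <- array(1:24, dim = c(4, 3, 2))

dims <- dim(thisarray)          # 4 3 2
item <- thisarray[2, 3, 2]      # row 2, column 3, layer 2: 22
```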
*Data Frames are data displayed in a table format.
*Data Frames can have different types of data inside them. While the
first column can be character, the second and third can be numeric or
logical. However, all values within a single column must have the same
type.
*Use the data.frame() function to create a data frame.
*Example:
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)
*Use the summary() function to summarize the data from a Data Frame.
For the previous example we get the following output when Training is
stored as a factor (the default before R 4.0; in R 4.0 and later it is a
character column unless stringsAsFactors = TRUE is set) :
     Training     Pulse          Duration
 Other   :1   Min.   :100.0   Min.   :30.0
 Stamina :1   1st Qu.:110.0   1st Qu.:37.5
 Strength:1   Median :120.0   Median :45.0
              Mean   :123.3   Mean   :45.0
              3rd Qu.:135.0   3rd Qu.:52.5
              Max.   :150.0   Max.   :60.0
*We can use single brackets [ ], double brackets [[ ]] or $ to access
columns from a data frame:
Data_Frame[1]
Data_Frame[["Training"]]
Data_Frame$Training
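The three access styles do not return the same kind of object, which the following sketch tries to show. It rebuilds the Data_Frame example so that it runs on its own, and the result names are illustrative:

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

first_col <- Data_Frame[1]               # a data frame with one column
by_name   <- Data_Frame[["Training"]]    # the column itself, as a vector
by_dollar <- Data_Frame$Training         # same result as [["Training"]]

is.data.frame(first_col)        # TRUE
is.data.frame(by_name)          # FALSE
identical(by_name, by_dollar)   # TRUE
```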
*Factors are used to categorize data. Examples of factors are:
Demography: Male/Female
Music: Rock, Pop, Classic, Jazz
Training: Strength, Stamina
*To create a factor, use the factor() function and add a vector as
argument.
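A small sketch of creating a factor from a character vector (the genre values are sample data based on the Music example above):

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic",
                        "Pop", "Jazz", "Rock", "Jazz"))

genre_levels <- levels(music_genre)   # "Classic" "Jazz" "Pop" "Rock"
n_levels     <- nlevels(music_genre)  # 4
genre_counts <- table(music_genre)    # how often each category occurs
```

The levels are the distinct categories, sorted alphabetically by default.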
*The plot() function is used to draw points (markers) in a diagram.
The function takes parameters for specifying points in the diagram.
Parameter 1 specifies points on the x-axis.
Parameter 2 specifies points on the y-axis.
*To draw more points we use vectors :
plot(x, y)
*If you want to draw dots in a sequence, on both the x-axis and the
y-axis, use the : operator :
plot(1:10)
*Use col="color" to add a color to the points :
plot(1:10, col="red")
*Use pch with a value from 0 to 25 to change the point shape format :
plot(1:10, pch=25)
*To create a line, use the plot() function and add the type parameter
with a value of "l" :
plot(1:10, type="l")
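*To change the width of the line, use the lwd parameter (1 is default,
while 0.5 means 50% smaller, and 2 means 100% larger) :
plot(1:10, type="l", lwd=2)
*Use the barplot() function to draw a bar chart :
# x-axis values
x <- c("A", "B", "C", "D")
# y-axis values
y <- c(2, 4, 6, 8)
barplot(y, names.arg = x)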
The x variable represents values in the x-axis (A,B,C,D)
The y variable represents values in the y-axis (2,4,6,8)
Then we use the barplot() function to create a bar chart of the
values
names.arg defines the names of each observation in the x-axis
*If you want the bars to be displayed horizontally instead of vertically,
use horiz=TRUE
barplot(y, names.arg = x, horiz = TRUE)
*To sort the values, use the sort() function.
*We can use the summary() function to get a statistical summary of the
data.
*The summary() function returns six statistical numbers for each
numeric variable:
Min
First quartile (25th percentile)
Median
Mean
Third quartile (75th percentile)
Max
*The min() and max() functions can be used to find the lowest or
highest value in a set.
*We can use the which.max() and which.min() functions to find the
index position of the max and min value in the table.
*The mean() function in R is used to find the mean in a dataset.
*The median() function is used to find the middle value in a dataset.
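The statistics functions above can be tried on a small vector. This sketch uses the Pulse values from the data frame example, and the variable names are illustrative:

```r
pulse <- c(100, 150, 120)

sorted_pulse <- sort(pulse)     # 100 120 150
stats   <- summary(pulse)       # Min, 1st Qu., Median, Mean, 3rd Qu., Max
lowest  <- min(pulse)           # 100
highest <- max(pulse)           # 150
pos_max <- which.max(pulse)     # 2 (the highest value is the second item)
pos_min <- which.min(pulse)     # 1
average <- mean(pulse)          # 123.3333
middle  <- median(pulse)        # 120
```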
*R has no built-in function to find the statistical mode (the mode()
function returns the storage type of an object), but we can use the
following code, where y is a vector of values :
Data_x <- y
names(sort(-table(Data_x)))[1]
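One possible way to wrap the same idea in a reusable helper is sketched below. The name stat_mode is made up for this example and is not part of R:

```r
# Hypothetical helper: returns the most frequent value of a vector
stat_mode <- function(v) {
  counts <- table(v)                  # frequency of each distinct value
  names(counts)[which.max(counts)]    # name of the most frequent one
}

most_common <- stat_mode(c(2, 4, 4, 6, 8, 4))   # "4"
```

The result is a character string because it is taken from the names of the frequency table. If several values are tied, this sketch returns only the first of them.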
*R is not sensitive to spaces around operators and arguments: x<-1 and
x <- 1 do the same thing.
*R is case sensitive.
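*T and F are shorthand for TRUE and FALSE. They are ordinary variables
that can be overwritten, so TRUE and FALSE are safer.
*The backslash \ is the escape character. It is used to put a special
character, such as a double quote, inside a string.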
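A brief sketch of escaping a double quote inside a string (the sentence is sample text):

```r
# Without the backslashes, the inner quotes would end the string early
quote_text <- "We are the so-called \"Vikings\", from the north."

cat(quote_text, "\n")          # prints the text with plain double quotes

quote_length <- nchar("\"")    # 1: the backslash is not part of the string
```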
*We can create a factor with the function as.factor() .
*We create a factor from a character vector as below:
Example: fact1 <- as.factor(c("male", "female"))
*A list in R can contain anything: a list of lists, characters, a list of
vectors, or whatever.
*In a data frame all columns must have the same number of rows. To
store vectors of different lengths together, use a list.
*Indexes in R start at 1.
FactoMineR (principal component analysis):
install.packages("FactoMineR")
install.packages("factoextra")
library(FactoMineR)
library(factoextra)
# Load a built-in dataset (iris dataset in this example)
data(iris)
# Perform PCA
res_pca <- PCA (iris[, 1:4], graph = FALSE)
# Print summary
summary(res_pca)
K-means clustering :
# Generate some example data
set.seed(123)
data <- data.frame(x = rnorm(100), y = rnorm(100))
# Perform K-means clustering with K = 3
k <- 3
kmeans_result <- kmeans(data, centers = k)
# View the clustering results
print(kmeans_result)
# Plot the data with cluster assignments
plot(data, col = kmeans_result$cluster, main = "K-means Clustering")
points (kmeans_result$centers, col = 1:k, pch = 8, cex = 2)
Hierarchical Clustering:
# Generate some example data
set.seed(123)
data <- data.frame(x = rnorm(100), y = rnorm(100))
# Compute hierarchical clustering using Euclidean distance and complete linkage
hc_result <- hclust(dist(data), method = "complete")
# Plot the dendrogram
plot(hc_result, main = "Hierarchical Clustering Dendrogram")
# Cut the dendrogram to create clusters
k <- 3
cluster_cut <- cutree(hc_result, k)
# Plot the data with cluster assignments
plot(data, col = cluster_cut, main = "Hierarchical Clustering")
DBSCAN:
install.packages("dbscan")
library(dbscan)
# Generate some example data
set.seed(123)
data <- data.frame( x = c(rnorm(50, mean = 5), rnorm(50, mean = 10)),
y = c(rnorm(50), rnorm(50, mean = 5)) )
# Perform DBSCAN clustering
dbscan_result <- dbscan(data, eps = 2, minPts = 5)
# Plot the data with cluster assignments
plot(data, col = dbscan_result$cluster + 1, pch = 16, main = "DBSCAN Clustering")
Spectral clustering:
install.packages("kernlab")
library(kernlab)
# Generate some example data
set.seed(123)
data <- matrix(rnorm(200), ncol = 2)
# Perform spectral clustering
num_clusters <- 3
spectral_result <- specc(data, centers = num_clusters)
# Plot the data with cluster assignments
plot(data, col = spectral_result, pch = 16, main = "Spectral Clustering")
#########################################
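# Hierarchical clustering of the four numeric iris columns
X = iris[, 1:4]
# Euclidean distance with complete linkage
D = dist(X, method = "euclidean")
h1 = hclust(d = D, method = "complete")
plot(h1)
# Cut the tree into 3 clusters and plot the first two variables
p1 = cutree(h1, 3)
plot(X[, 1:2], col = p1, pch = p1)
# Same analysis with Manhattan distance
D = dist(X, method = "manhattan")
h2 = hclust(d = D, method = "complete")
p2 = cutree(h2, 3)
plot(X[, 1:2], col = p2, pch = p2)
# NbClust() comes from the NbClust package, which must be installed and loaded
install.packages("NbClust")
library(NbClust)
# Compare validity indices to suggest a number of clusters for k-means
NbClust(iris[, 1:4], method = "kmeans")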