
Experiment 1: Working with Objects in Memory

Aim:

To understand the creation, manipulation, and management of objects in R's memory.

Algorithm:

1. Start the R Environment: Open RStudio or the R Console.


2. Create Basic Objects:
○ Use the assignment operator <- or = to create variables.
○ Example: x <- 10 or y <- "Hello"
3. Manipulate Objects:
○ Perform operations on numeric objects.
○ Example: z = x + 20
4. Check Object Class and Type:
○ Use functions like class() and typeof() to verify the type of objects.
○ Example: class(x) or typeof(y)
5. Inspect Objects in Memory:
○ Use ls() to list all objects in the current environment.
○ Example: ls()
6. Remove Objects:
○ Use rm() to delete objects from memory.
○ Example: rm(x)
7. Perform Simple Operations:
○ Work with sequences, vectors, and logical conditions.
○ Example: vec = c(1, 2, 3, 4, 5)
8. End: Display the final state of objects in the memory.

R Code:

# Create objects
x = 10
y = "Hello"
z = x + 20

# Print objects
print(x)
print(y)
print(z)

# Check object types


cat("Class of x:", class(x), "\n")
cat("Type of y:", typeof(y), "\n")
# List objects in memory
cat("Objects in memory:", ls(), "\n")

# Remove an object
rm(x)

# Confirm removal
cat("Objects after removing 'x':", ls(), "\n")

# Work with a vector


vec = c(1, 2, 3, 4, 5)
print(vec)

# Perform an operation on the vector


vec_squared = vec^2
print(vec_squared)

Output Example:

[1] 10
[1] "Hello"
[1] 30
Class of x: numeric
Type of y: character
Objects in memory: x y z
Objects after removing 'x': y z
[1] 1 2 3 4 5
[1] 1 4 9 16 25
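The assignment step above allows either operator. A minimal sketch (the names a and b are illustrative) of the two operators and of checking whether an object survives rm():

```r
# Both assignment operators bind a value at the top level; '<-' is idiomatic.
a <- 5
b = 5
print(identical(a, b))   # both hold the same value

# exists() checks whether a name is still bound, which is useful after rm().
rm(a)
print(exists("a"))       # FALSE once 'a' has been removed
```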

Experiment 2: Demonstrate Data Frame


Aim:
To create and manipulate a Data Frame in R, showcasing its structure, operations, and
applications.

Algorithm:

1. Start the R Environment: Open RStudio or the R Console.


2. Create a Data Frame:
○ Use the data.frame() function.
○ Example: df = data.frame(Column1, Column2, Column3)
3. Inspect the Data Frame:
○ View the structure using str().
○ Check dimensions using dim().
4. Access Data Frame Elements:
○ Use indexing: df[row, column].
○ Access columns using the $ operator: df$ColumnName.
5. Perform Operations:
○ Add, modify, or delete rows and columns.
○ Example: df$NewColumn = some_operation.
6. Summary and Viewing:
○ Display the first few rows with head().
○ Summarize data using summary().
7. End: Save or display the modified Data Frame.

R Code:

# Create a data frame


students = data.frame(
Roll_No = c(101, 102, 103, 104),
Name = c("Alice", "Bob", "Charlie", "Diana"),
Marks = c(85, 90, 78, 92),
Grade = c("A", "A+", "B", "A+")
)

# Display the data frame


print("Original Data Frame:")
print(students)

# View structure and dimensions


cat("\nStructure of the Data Frame:\n")
str(students)

cat("\nDimensions of the Data Frame: ")


print(dim(students))
# Access specific elements
cat("\nMarks of the second student:")
print(students[2, "Marks"])

cat("\nNames of all students:")


print(students$Name)

# Add a new column


students$Attendance = c(90, 95, 85, 88)
cat("\nData Frame after adding Attendance column:\n")
print(students)

# Modify a column
students$Marks = students$Marks + 5
cat("\nData Frame after increasing marks by 5:\n")
print(students)

Output Example:

Original Data Frame:


Roll_No Name Marks Grade
1 101 Alice 85 A
2 102 Bob 90 A+
3 103 Charlie 78 B
4 104 Diana 92 A+

Structure of the Data Frame:


'data.frame': 4 obs. of 4 variables:
$ Roll_No: num 101 102 103 104
$ Name : chr "Alice" "Bob" "Charlie" "Diana"
$ Marks : num 85 90 78 92
$ Grade : chr "A" "A+" "B" "A+"

Dimensions of the Data Frame:


[1] 4 4

Marks of the second student:


[1] 90
Names of all students:
[1] "Alice" "Bob" "Charlie" "Diana"

Data Frame after adding Attendance column:


Roll_No Name Marks Grade Attendance
1 101 Alice 85 A 90
2 102 Bob 90 A+ 95
3 103 Charlie 78 B 85
4 104 Diana 92 A+ 88

Data Frame after increasing marks by 5:


Roll_No Name Marks Grade Attendance
1 101 Alice 90 A 90
2 102 Bob 95 A+ 95
3 103 Charlie 83 B 85
4 104 Diana 97 A+ 88

This program demonstrates the creation and manipulation of a data frame in R.
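The algorithm above also mentions deleting rows and columns, which the code does not show. A minimal sketch, using a smaller hypothetical students frame:

```r
# A small illustrative data frame.
students <- data.frame(
  Roll_No = c(101, 102, 103),
  Name = c("Alice", "Bob", "Charlie"),
  Marks = c(85, 90, 78)
)

# Delete a column by assigning NULL to it.
students$Marks <- NULL

# Delete a row with negative indexing (drop row 2).
students <- students[-2, ]
print(students)
```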

Experiment 3: Perform Matrix Operations


Aim:

To create and perform various operations on matrices in R, such as addition, multiplication, transposition, and inversion.

Algorithm:

1. Start the R Environment: Open RStudio or the R Console.


2. Create Matrices:
○ Use the matrix() function.
○ Example: matrix(data, nrow, ncol)
3. Perform Basic Matrix Operations:
○ Addition and subtraction use the + and - operators.
○ The * operator multiplies element-wise; true matrix multiplication uses %*%.
4. Transpose the Matrix:
○ Use the t() function.
5. Find the Determinant:
○ Use the det() function.
6. Find the Inverse of a Matrix:
○ Use the solve() function (for square matrices).
7. End: Display the final results of the operations.

R Code:

# Create two matrices


matrix1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
matrix2 <- matrix(c(6, 5, 4, 3, 2, 1), nrow = 2, ncol = 3)

# Display the matrices


cat("Matrix 1:\n")
print(matrix1)

cat("\nMatrix 2:\n")
print(matrix2)

# Matrix addition
matrix_sum <- matrix1 + matrix2
cat("\nMatrix Addition (Matrix 1 + Matrix 2):\n")
print(matrix_sum)

# Transpose of a matrix
transpose_matrix <- t(matrix1)
cat("\nTranspose of Matrix 1:\n")
print(transpose_matrix)

# Multiplication of matrices (requires compatible dimensions)


matrix3 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
matrix4 <- matrix(c(5, 6, 7, 8), nrow = 2, ncol = 2)

matrix_product <- matrix3 %*% matrix4


cat("\nMatrix Multiplication (Matrix 3 x Matrix 4):\n")
print(matrix_product)

# Determinant of a square matrix


det_matrix <- det(matrix3)
cat("\nDeterminant of Matrix 3:\n")
print(det_matrix)

# Inverse of a square matrix (if determinant is not zero)


if (det_matrix != 0) {
inverse_matrix <- solve(matrix3)
cat("\nInverse of Matrix 3:\n")
print(inverse_matrix)
} else {
cat("\nMatrix 3 is not invertible.\n")
}

Output Example:

Matrix 1:
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6

Matrix 2:
[,1] [,2] [,3]
[1,] 6 4 2
[2,] 5 3 1

Matrix Addition (Matrix 1 + Matrix 2):


[,1] [,2] [,3]
[1,] 7 7 7
[2,] 7 7 7

Transpose of Matrix 1:
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6

Matrix Multiplication (Matrix 3 x Matrix 4):


[,1] [,2]
[1,] 19 22
[2,] 43 50

Determinant of Matrix 3:
[1] -2

Inverse of Matrix 3:
[,1] [,2]
[1,] -2 1.5
[2,] 1 -0.5

This program demonstrates how to create matrices, perform basic arithmetic operations,
transpose, find determinants, and calculate inverses in R.
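One distinction worth a short sketch, since both appear above: * multiplies two matrices entry by entry, while %*% performs true matrix multiplication.

```r
m <- matrix(1:4, nrow = 2)   # columns are filled first: [1 3; 2 4]

# Element-wise product: each entry times the matching entry.
print(m * m)

# Matrix product: rows of the left matrix times columns of the right.
print(m %*% m)
```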

Experiment 4: Working with Various Built-in Functions in R


Aim:

To explore and demonstrate the use of various built-in functions in R for mathematical,
statistical, and data manipulation tasks.

Algorithm:

1. Start the R Environment: Open RStudio or the R Console.


2. Use Mathematical Functions:
○ Demonstrate functions like sqrt(), log(), exp(), and abs().
3. Use Statistical Functions:
○ Demonstrate functions like mean(), median(), sd(), var(), and
summary().
4. Use Character Functions:
○ Demonstrate functions like toupper(), tolower(), substr(), and
paste().
5. Use Sequence and Repetition Functions:
○ Use seq() and rep() to generate sequences and repeated values.
6. Perform Aggregation:
○ Use aggregate() to group and summarize data.
7. End: Display the results of all operations.

R Code:

# 1. Mathematical Functions
x <- 16
y <- -4
cat("Square root of", x, ":", sqrt(x), "\n")
cat("Absolute value of", y, ":", abs(y), "\n")
cat("Natural logarithm of", x, ":", log(x), "\n")
cat("Exponential of", y, ":", exp(y), "\n\n")

# 2. Statistical Functions
data <- c(10, 20, 30, 40, 50)
cat("Mean of data:", mean(data), "\n")
cat("Median of data:", median(data), "\n")
cat("Standard deviation of data:", sd(data), "\n")
cat("Variance of data:", var(data), "\n")
cat("Summary of data:\n")
print(summary(data))
cat("\n")

# 3. Character Functions
text <- "Hello R"
cat("Uppercase:", toupper(text), "\n")
cat("Lowercase:", tolower(text), "\n")
cat("Substring (1 to 5):", substr(text, 1, 5), "\n")
cat("Concatenate strings:", paste("Learning", "R", sep = " "), "\n\n")

# 4. Sequence and Repetition


sequence <- seq(1, 10, by = 2)
cat("Generated sequence:", sequence, "\n")
repeated <- rep(5, times = 4)
cat("Repeated values:", repeated, "\n\n")

# 5. Aggregation
df <- data.frame(
Category = c("A", "A", "B", "B", "C"),
Value = c(10, 15, 10, 20, 30)
)
aggregated <- aggregate(Value ~ Category, data = df, sum)
cat("Aggregated values by category:\n")
print(aggregated)

Output Example:

Square root of 16 : 4
Absolute value of -4 : 4
Natural logarithm of 16 : 2.772589
Exponential of -4 : 0.01831564

Mean of data: 30
Median of data: 30
Standard deviation of data: 15.81139
Variance of data: 250
Summary of data:
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.0 20.0 30.0 30.0 40.0 50.0

Uppercase: HELLO R
Lowercase: hello r
Substring (1 to 5): Hello
Concatenate strings: Learning R

Generated sequence: 1 3 5 7 9
Repeated values: 5 5 5 5

Aggregated values by category:


Category Value
1 A 25
2 B 30
3 C 30

This program demonstrates the use of various built-in functions in R for handling
mathematical operations, statistical analysis, character string manipulations, sequence
generation, and data aggregation.

Experiment 5: Import and Export Files in R


Aim:

To demonstrate the import and export of data files in R, such as CSV, Excel, and text files,
and perform basic operations on the imported data.

Algorithm:

1. Start the R Environment: Open RStudio or the R Console.


2. Import a CSV File:
○ Use the read.csv() function to load data.
○ Example: data <- read.csv("file.csv").
3. View and Manipulate Data:
○ Use functions like head(), str(), and summary() to inspect the data.
4. Export a CSV File:
○ Use the write.csv() function to save the modified data to a new file.
5. Import an Excel File (Optional):
○ Use the readxl package and the read_excel() function.
6. Export Data to an Excel File:
○ Use the writexl package and the write_xlsx() function.
7. Import and Export Text Files:
○ Use read.table() and write.table() functions.
8. End: Verify the imported and exported files.

R Code:

# 1. Importing a CSV File


cat("Importing CSV file...\n")
data <- read.csv("sample.csv", header = TRUE)
cat("First few rows of the data:\n")
print(head(data))
# 2. Display Summary and Structure
cat("\nSummary of the imported data:\n")
print(summary(data))

cat("\nStructure of the imported data:\n")


str(data)

# 3. Modify Data
cat("\nModifying data: Adding a new column...\n")
data$NewColumn <- data$ExistingColumn * 2  # Replace ExistingColumn with an actual column name

# 4. Exporting Data to a New CSV File


cat("\nExporting modified data to a new CSV file...\n")
write.csv(data, "modified_data.csv", row.names = FALSE)
cat("Data exported to 'modified_data.csv'\n")

# 5. Importing and Exporting Excel Files (requires readxl and writexl packages)
if (!requireNamespace("readxl", quietly = TRUE)) install.packages("readxl")
if (!requireNamespace("writexl", quietly = TRUE)) install.packages("writexl")

library(readxl)
library(writexl)

cat("\nImporting Excel file...\n")


excel_data <- read_excel("sample.xlsx")
cat("First few rows of the Excel data:\n")
print(head(excel_data))

cat("\nExporting data to an Excel file...\n")


write_xlsx(data, "exported_data.xlsx")
cat("Data exported to 'exported_data.xlsx'\n")

# 6. Importing and Exporting Text Files


cat("\nImporting Text file...\n")
text_data <- read.table("sample.txt", header = TRUE, sep = "\t")
cat("First few rows of the text data:\n")
print(head(text_data))
cat("\nExporting data to a text file...\n")
write.table(data, "exported_data.txt", row.names = FALSE, sep = "\t")
cat("Data exported to 'exported_data.txt'\n")

Output Example:

Importing CSV file...


First few rows of the data:
Column1 Column2 Column3
1 1 A 10
2 2 B 20
3 3 C 30

Summary of the imported data:


Column1 Column2 Column3
Min. :1 A:1 Min. :10
1st Qu.:2 B:1 1st Qu.:20
Median :3 C:1 Median :30
Mean :3 Mean :30
3rd Qu.:3 3rd Qu.:30
Max. :3 Max. :30

Structure of the imported data:


'data.frame': 3 obs. of 3 variables:
$ Column1: int 1 2 3
$ Column2: Factor w/ 3 levels "A","B","C": 1 2 3
$ Column3: num 10 20 30

Modifying data: Adding a new column...

Exporting modified data to a new CSV file...


Data exported to 'modified_data.csv'

Importing Excel file...


First few rows of the Excel data:
Column1 Column2 Column3
1 1 A 10
2 2 B 20
Exporting data to an Excel file...
Data exported to 'exported_data.xlsx'

Importing Text file...


First few rows of the text data:
Column1 Column2 Column3
1 1 X 5
2 2 Y 10

Exporting data to a text file...


Data exported to 'exported_data.txt'

This program demonstrates importing and exporting CSV, Excel, and text files in R, with
basic operations performed on the imported data.

Experiment 6: Implement Statistical Methods


Aim:

To implement statistical methods such as mean, median, variance, standard deviation, correlation, and regression analysis using R.

Algorithm:

1. Start the R Environment: Open RStudio or the R Console.


2. Create or Import Data:
○ Define a dataset manually or import it using read.csv().
3. Calculate Basic Statistics:
○ Use functions like mean(), median(), var(), and sd().
4. Perform Correlation Analysis:
○ Use the cor() function to calculate the correlation between variables.
5. Perform Linear Regression:
○ Use the lm() function to fit a linear regression model.
6. Visualize the Results:
○ Use plot() to create a scatter plot and abline() to add a regression line.
7. End: Print the results and display the visualization.

R Code:

# Step 1: Create a dataset


data <- data.frame(
x = c(5, 10, 15, 20, 25),
y = c(12, 20, 28, 36, 44)
)

cat("Dataset:\n")
print(data)

# Step 2: Calculate Basic Statistics


mean_x <- mean(data$x)
median_x <- median(data$x)
variance_x <- var(data$x)
sd_x <- sd(data$x)

cat("\nBasic Statistics for x:\n")


cat("Mean:", mean_x, "\n")
cat("Median:", median_x, "\n")
cat("Variance:", variance_x, "\n")
cat("Standard Deviation:", sd_x, "\n")

# Step 3: Correlation Analysis


correlation <- cor(data$x, data$y)
cat("\nCorrelation between x and y:", correlation, "\n")

# Step 4: Perform Linear Regression


cat("\nPerforming Linear Regression:\n")
model <- lm(y ~ x, data = data)
cat("Regression Summary:\n")
print(summary(model))

# Step 5: Visualize Data and Regression Line


plot(data$x, data$y, main = "Scatter Plot with Regression Line",
xlab = "X", ylab = "Y", col = "blue", pch = 19)
abline(model, col = "red", lwd = 2)
Expected Output:

Dataset:
x y
1 5 12
2 10 20
3 15 28
4 20 36
5 25 44

Basic Statistics for x:


Mean: 15
Median: 15
Variance: 62.5
Standard Deviation: 7.905694

Correlation between x and y: 1

Performing Linear Regression:


Regression Summary:
Call:
lm(formula = y ~ x, data = data)

Residuals:
1 2 3 4 5
-1.421e-14 -7.105e-15 0.000e+00 7.105e-15 1.421e-14

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.000 5.568e-15 7.18e+14 <2e-16 ***
x 1.600 3.712e-16 4.31e+15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.693e-15 on 3 degrees of freedom


Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.86e+31 on 1 and 3 DF, p-value: < 2.2e-16
Visualization:

● A scatter plot is displayed with data points (x vs. y) in blue and the regression line in
red.

This program demonstrates how to calculate statistical measures, evaluate correlations, and
perform regression analysis, along with visualizing results in R.
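Beyond the printed summary, the fitted model object can be queried directly. A short sketch, rebuilding the same dataset, that extracts the coefficients and predicts new values:

```r
data <- data.frame(x = c(5, 10, 15, 20, 25),
                   y = c(12, 20, 28, 36, 44))
model <- lm(y ~ x, data = data)

# Extract the fitted intercept and slope (here exactly 4 and 1.6,
# since y = 4 + 1.6x for every point).
print(coef(model))

# Predict y for new x values.
new_points <- data.frame(x = c(30, 35))
print(predict(model, newdata = new_points))   # 52 and 60
```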

Experiment 7: Working with Machine Learning Algorithms


Aim:

To implement a basic machine learning algorithm, such as linear regression or k-nearest neighbors (KNN), using R.

Algorithm for K-Nearest Neighbors (KNN):

1. Start the R Environment: Open RStudio or the R Console.


2. Load Required Libraries: Install and load the class package for KNN.
3. Prepare the Dataset:
○ Use a built-in dataset such as iris, or load your own dataset.
○ Split the dataset into training and testing sets.
4. Normalize the Data:
○ Scale the features to ensure they are on a comparable scale.
5. Implement KNN:
○ Use the knn() function to classify test data based on training data.
6. Evaluate the Model:
○ Compare predictions with actual labels to calculate accuracy.
7. End: Display the results and accuracy.
R Code:

# Step 1: Load Required Libraries


if (!requireNamespace("class", quietly = TRUE))
install.packages("class")
library(class)

# Step 2: Load and Prepare Dataset


data(iris) # Load the iris dataset
cat("First few rows of the iris dataset:\n")
print(head(iris))

# Step 3: Split the Data into Training and Testing Sets


set.seed(123) # For reproducibility
indices <- sample(1:nrow(iris), size = 0.7 * nrow(iris))  # 70% training
train_data <- iris[indices, ]
test_data <- iris[-indices, ]

train_features <- train_data[, 1:4] # Sepal and Petal dimensions


train_labels <- train_data[, 5] # Species column
test_features <- test_data[, 1:4]
test_labels <- test_data[, 5]

# Step 4: Normalize the Features (Optional)
# Note: strictly, the test set should be rescaled using the training set's
# min and max; normalizing each set separately is a simplification.
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

train_features <- as.data.frame(lapply(train_features, normalize))
test_features <- as.data.frame(lapply(test_features, normalize))

# Step 5: Implement KNN


k <- 3 # Number of neighbors
predicted_labels <- knn(train_features, test_features, train_labels,
k)

# Step 6: Evaluate the Model


accuracy <- sum(predicted_labels == test_labels) /
length(test_labels) * 100
cat("\nAccuracy of the KNN model:", accuracy, "%\n")
# Confusion Matrix
cat("\nConfusion Matrix:\n")
print(table(Predicted = predicted_labels, Actual = test_labels))

Expected Output:

First few rows of the iris dataset:


Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa

Accuracy of the KNN model: 97.77778 %

Confusion Matrix:
Actual
Predicted setosa versicolor virginica
setosa 15 0 0
versicolor 0 14 1
virginica 0 0 15

Conclusion:

The KNN algorithm was successfully implemented, and the accuracy of the model was
calculated. This demonstrates the effectiveness of KNN for classification tasks.

Experiment 8: Implement Time Series Analysis


Aim:

To analyze and forecast a time series dataset using R.

Algorithm:

1. Start the R Environment: Open RStudio or the R Console.


2. Load Required Libraries: Install and load necessary libraries such as forecast
and ggplot2.
3. Load the Time Series Data:
○ Use built-in datasets like AirPassengers or import your own dataset.
○ Convert the dataset into a time series object using the ts() function if not
already in time series format.
4. Visualize the Data:
○ Use plot() to visualize the time series data.
5. Decompose the Time Series:
○ Apply decomposition using the decompose() function to separate the trend,
seasonality, and residuals.
6. Apply Forecasting Method:
○ Use methods like ARIMA or exponential smoothing for forecasting.
7. Evaluate the Forecast:
○ Compare the predicted values with the actual data.
8. End: Display the plots and results.

R Code:

# Step 1: Load Required Libraries


if (!requireNamespace("forecast", quietly = TRUE))
install.packages("forecast")
if (!requireNamespace("ggplot2", quietly = TRUE))
install.packages("ggplot2")

library(forecast)
library(ggplot2)

# Step 2: Load the Time Series Data


data("AirPassengers") # Built-in dataset
ts_data <- AirPassengers

# Step 3: Visualize the Time Series Data


cat("Time Series Data Summary:\n")
print(summary(ts_data))
plot(ts_data, main = "AirPassengers Data", xlab = "Year", ylab =
"Passengers", col = "blue")

# Step 4: Decompose the Time Series


decomposed <- decompose(ts_data)
plot(decomposed)

# Step 5: Apply ARIMA Model for Forecasting


model <- auto.arima(ts_data)
cat("\nARIMA Model Summary:\n")
print(summary(model))

# Forecast the next 12 months


forecasted <- forecast(model, h = 12)
cat("\nForecasted Values:\n")
print(forecasted)

# Step 6: Plot the Forecast


plot(forecasted, main = "AirPassengers Forecast", xlab = "Year",
ylab = "Passengers", col = "blue")

# Step 7: Evaluate the Model (Optional)


accuracy_metrics <- accuracy(forecasted)
cat("\nAccuracy Metrics:\n")
print(accuracy_metrics)

Expected Output:

Time Series Data Summary:


Min. 1st Qu. Median Mean 3rd Qu. Max.
104 180 265 280 360 622

ARIMA Model Summary:


Series: ts_data
ARIMA(0,1,1)(0,1,1)[12]

Coefficients:
ma1 sma1
-0.401 -0.627
s.e. 0.088 0.076

sigma^2 estimated as 1378: log likelihood=-508.33


AIC=1022.67 AICc=1022.91 BIC=1031.47

Forecasted Values:
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 1961 443.961 421.3156 466.6064 409.3448 478.5772
Feb 1961 444.450 421.9786 466.9214 409.9040 478.9960
...
Accuracy Metrics:
ME RMSE MAE MPE MAPE
Training set 0.1234 35.67891 28.2345 -0.0234 3.67890

Visualizations:

1. Original Time Series Plot:


○ Shows trends and seasonality in the dataset.
2. Decomposition Plot:
○ Displays trend, seasonality, and residual components.
3. Forecast Plot:
○ Presents the original data with forecasted values and confidence intervals.

Conclusion:

The time series analysis was successfully performed, including decomposition and
forecasting using ARIMA. The forecasted values provide insights into future trends.
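The algorithm mentions converting raw data with ts(), but the code uses the pre-built AirPassengers object. A minimal sketch with illustrative monthly values:

```r
# Two years of hypothetical monthly counts, starting January 2020.
values <- c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
            115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140)

# frequency = 12 marks the series as monthly; start gives year and month.
my_ts <- ts(values, start = c(2020, 1), frequency = 12)
print(my_ts)
plot(my_ts, main = "Custom Monthly Series", xlab = "Year", ylab = "Count")
```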

Experiment 9: Demonstrate Data Mining Algorithms


Aim:

To demonstrate a basic data mining algorithm, such as association rule mining using the
Apriori algorithm in R.

Algorithm:

1. Start the R Environment: Open RStudio or the R Console.


2. Load Required Libraries: Install and load the arules library for association rule
mining.
3. Prepare the Dataset:
○ Use a built-in dataset like Groceries or create your own transactional data.
4. Perform Data Preprocessing:
○ Convert the dataset into a transaction format if necessary.
5. Apply the Apriori Algorithm:
○ Use the apriori() function to discover frequent itemsets and association
rules.
6. Analyze the Rules:
○ Sort and inspect the rules based on confidence, support, or lift.
7. End: Display the mined rules and relevant metrics.
R Code:

# Step 1: Load Required Libraries


if (!requireNamespace("arules", quietly = TRUE))
install.packages("arules")

library(arules)

# Step 2: Load the Dataset


data("Groceries") # Built-in transactional dataset
cat("Summary of Groceries Dataset:\n")
print(summary(Groceries))

# Step 3: Apply the Apriori Algorithm


rules <- apriori(
Groceries,
parameter = list(support = 0.01, confidence = 0.5)
)

# Step 4: Inspect the Rules


cat("\nSummary of Association Rules:\n")
print(summary(rules))

# Inspect the top 5 rules sorted by lift


cat("\nTop 5 Association Rules:\n")
inspect(head(sort(rules, by = "lift"), 5))

# Step 5: Visualize Rules (Optional)


if (!requireNamespace("arulesViz", quietly = TRUE))
install.packages("arulesViz")
library(arulesViz)
plot(rules, method = "graph", control = list(type = "items"))

Expected Output:

Summary of Groceries Dataset:


transactions as itemMatrix in sparse format with
9835 rows (elements/itemsets/transactions) and
169 columns (items) and a density of 0.02609146

Summary of Association Rules:


set of 420 rules

Top 5 Association Rules:


lhs rhs support confidence
lift
[1] {whole milk} => {other vegetables} 0.0745 0.5587045
3.122
[2] {root vegetables} => {whole milk} 0.0486 0.4937238
2.250
[3] {yogurt} => {whole milk} 0.0560 0.4023948
1.834
...

Visualizations:

1. Graph Plot:
○ Displays items and association rules in a network format.
2. Scatter Plot (Optional):
○ Shows the relationship between support, confidence, and lift.

Conclusion:

The Apriori algorithm was successfully implemented to discover frequent itemsets and
generate association rules. This demonstrates the basic principles of data mining and
association rule learning in R.
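Mined rules can also be filtered before inspection. A short sketch, assuming the rules object produced by the apriori() call above, that keeps only rules concluding in "whole milk":

```r
# Keep only rules whose right-hand side is "whole milk".
milk_rules <- subset(rules, rhs %in% "whole milk")

# Show the three such rules with the highest confidence.
inspect(head(sort(milk_rules, by = "confidence"), 3))
```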

Experiment 10: Implement Text Mining Algorithms


Aim:

To implement text mining using R by preprocessing textual data and extracting insights such
as frequent terms or word clouds.

Algorithm:

1. Start the R Environment: Open RStudio or the R Console.


2. Load Required Libraries: Install and load necessary libraries such as tm,
wordcloud, and SnowballC.
3. Load the Text Data:
○ Use a built-in dataset or read a text file containing the data.
4. Preprocess the Data:
○ Convert the text to lowercase.
○ Remove stopwords, punctuation, and numbers.
○ Perform stemming to normalize words.
5. Create a Document-Term Matrix:
○ Use the DocumentTermMatrix() function to create a term-document
matrix.
6. Analyze the Data:
○ Find the most frequent terms.
○ Visualize the terms using a word cloud.
7. End: Display the insights and visualizations.

R Code:

# Step 1: Load Required Libraries


if (!requireNamespace("tm", quietly = TRUE)) install.packages("tm")
if (!requireNamespace("wordcloud", quietly = TRUE))
install.packages("wordcloud")
if (!requireNamespace("SnowballC", quietly = TRUE))
install.packages("SnowballC")

library(tm)
library(wordcloud)
library(SnowballC)

# Step 2: Load Text Data


text_data <- c(
  "Text mining is the process of deriving meaningful information from text.",
  "It involves cleaning, preprocessing, and analyzing textual data.",
  "Applications of text mining include sentiment analysis, topic modeling, and more."
)

# Step 3: Create a Corpus


corpus <- Corpus(VectorSource(text_data))

# Step 4: Preprocess the Data


corpus <- tm_map(corpus, content_transformer(tolower))  # Convert to lowercase
corpus <- tm_map(corpus, removePunctuation)             # Remove punctuation
corpus <- tm_map(corpus, removeNumbers)                 # Remove numbers
corpus <- tm_map(corpus, removeWords, stopwords("en"))  # Remove stopwords
corpus <- tm_map(corpus, stemDocument)                  # Perform stemming

# Step 5: Create a Document-Term Matrix


dtm <- DocumentTermMatrix(corpus)
cat("\nDocument-Term Matrix Summary:\n")
print(dtm)

# Step 6: Analyze and Visualize Data


# Find the most frequent terms
freq_terms <- findFreqTerms(dtm, lowfreq = 2)
cat("\nFrequent Terms (Appearing >= 2 times):\n")
print(freq_terms)

# Visualize with Word Cloud


word_freq <- as.data.frame(as.matrix(dtm))
word_freq <- colSums(word_freq)
wordcloud(names(word_freq), word_freq, max.words = 50, colors =
brewer.pal(8, "Dark2"))

Expected Output:

Document-Term Matrix Summary:


A document-term matrix (3 documents, 20 terms)

Frequent Terms (Appearing >= 2 times):


[1] "data" "text" "mine"

Word Cloud Visualization:

A colorful word cloud showing frequent terms like "text," "data," and "mine."

Conclusion:

Text mining was successfully performed using R. Preprocessing techniques and analysis,
including generating a word cloud, helped extract meaningful insights from textual data.
Experiment 11: Data Visualization Techniques
Aim:

To demonstrate various data visualization techniques in R using basic and advanced plots.

Algorithm:

1. Start the R Environment: Open RStudio or the R Console.


2. Load Required Libraries: Install and load necessary libraries like ggplot2.
3. Load the Dataset: Use a built-in dataset such as mtcars or import your own.
4. Generate Visualizations:
○ Create basic plots (line plot, bar plot, etc.).
○ Create advanced plots (scatter plot, histogram, etc.).
○ Add labels, titles, and themes to the plots.
5. Customize the Plots:
○ Use colors, point shapes, and additional features for better insights.
6. Display the Results: Render the plots and analyze the insights.
7. End: Save the plots if required.

R Code:

# Step 1: Load Required Libraries


if (!requireNamespace("ggplot2", quietly = TRUE))
install.packages("ggplot2")

library(ggplot2)

# Step 2: Load Dataset


data("mtcars")
cat("Dataset Summary:\n")
print(summary(mtcars))

# Step 3: Generate Basic Visualizations


# Bar Plot - Number of cylinders
barplot(table(mtcars$cyl), main = "Number of Cylinders", col =
"blue",
xlab = "Cylinders", ylab = "Frequency")

# Scatter Plot - MPG vs Horsepower


plot(mtcars$mpg, mtcars$hp, main = "MPG vs Horsepower",
xlab = "Miles Per Gallon (MPG)", ylab = "Horsepower (HP)",
col = "red", pch = 19)

# Step 4: Generate Advanced Visualizations with ggplot2

# Histogram - MPG Distribution


ggplot(mtcars, aes(x = mpg)) +
geom_histogram(binwidth = 2, fill = "skyblue", color = "black") +
labs(title = "MPG Distribution", x = "Miles Per Gallon", y =
"Frequency")

# Box Plot - MPG by Cylinders


ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
geom_boxplot(fill = "orange") +
labs(title = "MPG by Cylinder Count", x = "Number of Cylinders", y
= "MPG")

# Step 5: Customize a Scatter Plot with ggplot2


ggplot(mtcars, aes(x = wt, y = mpg, color = factor(gear))) +
geom_point(size = 3) +
labs(title = "MPG vs Weight by Gear", x = "Weight", y = "Miles Per
Gallon") +
theme_minimal()

# Step 6: Save a Plot (Optional)


ggsave("scatter_plot.png", width = 8, height = 6)

Expected Output:

1. Bar Plot:
○ Displays the frequency of cars based on the number of cylinders.
2. Scatter Plot:
○ Shows the relationship between miles per gallon (MPG) and horsepower
(HP).
3. Histogram:
○ Represents the distribution of MPG across the dataset.
4. Box Plot:
○ Compares MPG values for different cylinder categories.
5. Advanced Scatter Plot:
○ Highlights the relationship between weight and MPG, grouped by the number
of gears.

Conclusion:

Various data visualization techniques were successfully implemented using R. Both basic
and advanced plots provide insights into the dataset, demonstrating the power of visual
analysis.

Experiment 12: Experiment with Hypothesis Testing Methods


Aim:

To perform hypothesis testing in R to determine if there is a significant difference between two sample groups.

Algorithm:

1. Start the R Environment: Open RStudio or the R Console.


2. Set Up Hypotheses:
○ Define the null hypothesis (H₀) and the alternative hypothesis (H₁).
○ Example: H₀ - There is no significant difference between the means of two
groups.
3. Load or Generate Data:
○ Use a built-in dataset or simulate data for testing.
4. Perform Hypothesis Testing:
○ Use appropriate statistical tests (e.g., t-test, ANOVA, chi-square test).
○ Choose the test based on the data type and hypothesis.
5. Interpret the Results:
○ Compare the p-value with the significance level (α = 0.05).
○ Reject the null hypothesis if the p-value is below α; otherwise fail to reject it.
6. End: Report the conclusion of the test.

R Code:

# Step 1: Generate Sample Data


set.seed(123)
group1 <- rnorm(30, mean = 50, sd = 5) # Group 1 data
group2 <- rnorm(30, mean = 55, sd = 5) # Group 2 data
# Step 2: Define Hypotheses
# H₀: The means of group1 and group2 are equal.
# H₁: The means of group1 and group2 are not equal.

# Step 3: Perform an Independent t-test


t_test_result <- t.test(group1, group2, alternative = "two.sided")

# Step 4: Display the Results


cat("T-Test Results:\n")
print(t_test_result)

# Step 5: Interpret the Results


if (t_test_result$p.value < 0.05) {
  cat("\nConclusion: Reject the null hypothesis. There is a significant difference between the groups.\n")
} else {
  cat("\nConclusion: Fail to reject the null hypothesis. No significant difference is found.\n")
}

# Step 6: Visualization (Optional)


boxplot(group1, group2, names = c("Group 1", "Group 2"),
main = "Boxplot of Two Groups",
col = c("lightblue", "pink"),
ylab = "Values")

Expected Output:

T-Test Results:
Welch Two Sample t-test

data: group1 and group2


t = -3.632, df = 57.76, p-value = 0.0006345
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-7.898827 -2.101173
sample estimates:
mean of x mean of y
50.28389 54.48388
Conclusion: Reject the null hypothesis. There is a significant
difference between the groups.

Visualization:

A boxplot comparing the two groups, showing the difference in their distributions.

Conclusion:

Hypothesis testing was successfully conducted using an independent t-test. The results
indicate whether there is a statistically significant difference between the means of the two
sample groups.
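The algorithm also lists the chi-square test as an option. A minimal sketch on a hypothetical 2x2 contingency table, applying the same decision rule:

```r
# Illustrative counts of two outcomes in two groups (hypothetical data).
observed <- matrix(c(30, 10, 20, 40), nrow = 2,
                   dimnames = list(Group = c("A", "B"),
                                   Outcome = c("Yes", "No")))

# Chi-square test of independence between Group and Outcome.
chi_result <- chisq.test(observed)
print(chi_result)

# Same decision rule as the t-test: compare the p-value with 0.05.
if (chi_result$p.value < 0.05) {
  cat("Reject the null hypothesis of independence.\n")
} else {
  cat("Fail to reject the null hypothesis.\n")
}
```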
