UNIT – V STUDY MATERIAL
Interfacing R with Other Languages
Interfacing R with other programming languages allows you to combine R's statistical
capabilities with the performance or specialized libraries of languages like Python, C++, or
Java. Below are the most common and useful interfaces:
1. Interfacing R with Python using reticulate
Access powerful Python libraries (e.g., TensorFlow, pandas, NumPy, scikit-learn).
Seamlessly integrate R and Python workflows.
Installation:
install.packages("reticulate")
Basic Example:
library(reticulate)
# Run Python code from R
py_run_string("import numpy as np; x = np.array([1, 2, 3]); x = x * 2")
py$x # Access the Python variable x in R
Calling Python Functions:
np <- import("numpy")
np$mean(c(1, 2, 3)) # Calls numpy’s mean
Running a Python script:
py_run_file("script.py")
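Converting Objects Between R and Python:
A minimal sketch, assuming the pandas Python package is installed (the object names here are only illustrative):
library(reticulate)
pd <- import("pandas", convert = FALSE) # keep results as Python objects
# Build a pandas DataFrame from an R list, then convert it back to an R data.frame
df_py <- pd$DataFrame(data = list(a = c(1, 2, 3), b = c("x", "y", "z")))
py_to_r(df_py)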
2. Interfacing R with C/C++ using Rcpp
Significant performance improvements for computationally heavy tasks.
Use C++ functions directly in R.
Installation:
install.packages("Rcpp")
Basic Example:
library(Rcpp)
cppFunction('
int factorial(int n) {
  if (n <= 1) return 1;
  else return n * factorial(n - 1);
}
')
factorial(5) # Output: 120
Using External C++ Files:
sourceCpp("your_cpp_code.cpp")
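A sketch of what such a file might contain, written out from R for convenience (the file name example.cpp and the function timesTwo are only illustrative); the // [[Rcpp::export]] attribute is what makes the C++ function callable from R:
library(Rcpp)
# Write a small illustrative C++ source file, then compile it with sourceCpp()
writeLines(c(
  "#include <Rcpp.h>",
  "using namespace Rcpp;",
  "",
  "// [[Rcpp::export]]",
  "NumericVector timesTwo(NumericVector x) {",
  "  return x * 2;",
  "}"
), "example.cpp")
sourceCpp("example.cpp")
timesTwo(c(1, 2, 3)) # 2 4 6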
3. Interfacing R with Java using rJava
Use Java libraries directly in R.
Important for working with enterprise software systems and Java-based APIs.
Installation:
install.packages("rJava")
Basic Example:
library(rJava)
.jinit() # Start the Java Virtual Machine
a <- .jnew("java/lang/String", "Hello from Java")
.jcall(a, "I", "length") # Calls the length method
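As a further sketch using the same String object a, other methods can be called in the same way; rJava returns Java strings as R character values:
.jstrVal(a) # Retrieve the Java String back as an R character value
.jcall(a, "Ljava/lang/String;", "toUpperCase") # "HELLO FROM JAVA"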
Use Cases:
Python: Deep learning, image processing, web scraping.
C/C++: High-performance simulations, numerical methods.
Java: Working with JVM libraries, interacting with large-scale enterprise apps.
Parallel R
R is traditionally single-threaded, but for large computations or simulations, parallel
computing can drastically improve performance. R offers several packages to support parallel
execution on multiple cores or processors.
1. Why Use Parallel R?
Faster computation by distributing tasks across CPU cores.
Efficient for loops, simulations, bootstrapping, and large data processing.
2. Core Packages for Parallelism
a. parallel Package (Built-in)
R’s base package, available by default.
Example: Parallel Apply (parLapply)
library(parallel)
# Detect available cores
cores <- detectCores()
cl <- makeCluster(2) # Use 2 cores
# Parallel version of lapply
result <- parLapply(cl, 1:5, function(x) x^2)
stopCluster(cl)
print(result)
Other Functions:
parSapply() – parallel version of sapply
mclapply() – fork-based parallelism (Unix/macOS only)
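Minimal sketches of both (mclapply() relies on forking, so it runs in parallel only on Unix/macOS):
library(parallel)
# parSapply: like parLapply but simplifies the result to a vector
cl <- makeCluster(2)
parSapply(cl, 1:5, function(x) x^2)
stopCluster(cl)
# mclapply: fork-based, no cluster object needed (Unix/macOS only)
mclapply(1:5, function(x) x^2, mc.cores = 2)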
b. foreach + doParallel: Elegant Parallel Loops
More readable and scalable than parLapply.
Example: Using foreach with %dopar%
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
result <- foreach(i = 1:5) %dopar% {
  i^2
}
stopCluster(cl)
print(result)
Benefits:
More readable syntax.
Supports nested loops and conditional logic inside the loop body.
c. future and furrr: Parallel and Functional
Modern approach using futures for parallelism.
library(future)
library(furrr)
plan(multisession, workers = 2) # Run tasks in 2 background R sessions
future_map(1:5, ~ .x^2)
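When the parallel work is finished, the plan can be switched back:
plan(sequential) # return to ordinary sequential execution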
Tips for Effective Parallel R
Avoid using large global variables inside parallel code.
Always stop the cluster (stopCluster) to free resources.
Use detectCores() to dynamically scale across different machines.
Use clusterExport() to share variables with worker processes (see the sketch below).
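A minimal sketch of clusterExport() (the variable name scale_factor is only illustrative):
library(parallel)
cl <- makeCluster(2)
scale_factor <- 10 # defined in the main R session
clusterExport(cl, "scale_factor") # copy it to every worker
parLapply(cl, 1:5, function(x) x * scale_factor)
stopCluster(cl)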
Basic Statistics in R
R is designed for statistical computing, and it provides extensive built-in functions for
descriptive and inferential statistics.
1. Descriptive Statistics
These summarize and describe features of a dataset.
Measures of Central Tendency
data <- c(2, 4, 6, 8, 10)
mean(data) # Arithmetic mean
median(data) # Middle value
Measures of Dispersion
var(data) # Variance
sd(data) # Standard deviation
range(data) # Minimum and maximum
Five-Number Summary (plus the Mean)
summary(data) # Min, 1st Qu., Median, Mean, 3rd Qu., Max
Frequency Tables
table(c("A", "B", "A", "C", "B", "A"))
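Relative frequencies can be obtained from the same table:
prop.table(table(c("A", "B", "A", "C", "B", "A"))) # proportions instead of counts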
2. Graphical Summary
Histograms and Boxplots
hist(data, main = "Histogram", col = "lightblue")
boxplot(data, main = "Boxplot")
Density Plot
plot(density(data), main = "Density Plot")
3. Inferential Statistics
These help draw conclusions about a population from sample data.
One-Sample t-Test
Test if the mean of a sample differs from a given value.
t.test(data, mu = 5)
Two-Sample t-Test
group1 <- c(5, 6, 7, 8)
group2 <- c(7, 8, 9, 10)
t.test(group1, group2)
Paired t-Test
before <- c(10, 12, 14, 16)
after <- c(11, 14, 13, 17)
t.test(before, after, paired = TRUE)
4. Correlation and Covariance
Pearson Correlation
x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 8)
cor(x, y) # Correlation coefficient
cov(x, y) # Covariance
5. Chi-Square Test
Used for categorical data to test independence.
# Create a contingency table
obs <- matrix(c(20, 30, 50, 100), nrow = 2)
chisq.test(obs)
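For readability, the rows and columns can be labelled before testing (the labels below are purely illustrative):
dimnames(obs) <- list(Group = c("A", "B"), Outcome = c("Yes", "No"))
chisq.test(obs)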
6. ANOVA (Analysis of Variance)
To compare means across multiple groups.
data(iris)
anova_result <- aov(Sepal.Length ~ Species, data = iris)
summary(anova_result)
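If the ANOVA is significant, a common follow-up is a pairwise post-hoc comparison:
TukeyHSD(anova_result) # which pairs of species differ in mean Sepal.Length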
Linear Models (LM) & Generalized Linear Models (GLM) in R
1. Linear Models (LM)
Linear models describe the relationship between a continuous response variable and one or
more predictor variables using a linear equation.
Model Form:
Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + ε
Basic Linear Model in R
data(mtcars)
model <- lm(mpg ~ wt + hp, data = mtcars)
summary(model)
Interpretation:
Estimate: Coefficients (slopes)
Pr(>|t|): p-values for testing if coefficients are significantly different from zero
R-squared: Goodness of fit
Residuals: Differences between actual and predicted values
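These quantities can also be extracted directly from the fitted model:
coef(model) # estimated coefficients
summary(model)$coefficients # estimates, standard errors, t values, p-values
summary(model)$r.squared # R-squared
head(residuals(model)) # first few residuals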
Diagnostic Plots
par(mfrow = c(2, 2))
plot(model)
These help check assumptions:
Linearity
Normality of residuals
Homoscedasticity (equal variance)
Independence
Model with Interaction Term
lm(mpg ~ wt * hp, data = mtcars)
2. Generalized Linear Models (GLM)
GLMs extend LMs to handle non-normal response distributions using a link function.
Model Structure:
g(E(Y)) = β₀ + β₁X₁ + …
Where g() is the link function, and E(Y) is the expected value.
GLM Syntax in R
glm(formula, family = <distribution>, data = ...)
Common Families and Link Functions:
Family   | Link Function | Use Case
gaussian | identity      | Linear regression
binomial | logit         | Logistic regression (binary)
poisson  | log           | Count data
a. Logistic Regression (Binomial Family)
glm_model <- glm(vs ~ mpg + wt, data = mtcars, family = binomial)
summary(glm_model)
Used when the response variable is binary (0/1).
vs: engine shape (0 = V-shaped, 1 = straight).
b. Poisson Regression
# Example: modeling counts (hypothetical)
counts <- rpois(100, lambda = 5)
group <- gl(2, 50)
glm(counts ~ group, family = poisson)
Model Evaluation
For Logistic Regression
# Predicted probabilities
predicted_probs <- predict(glm_model, type = "response")
# Confusion matrix
predicted_class <- ifelse(predicted_probs > 0.5, 1, 0)
table(predicted_class, mtcars$vs)
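A simple summary of the confusion matrix is the overall classification accuracy:
mean(predicted_class == mtcars$vs) # proportion of correctly classified cars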
When to Use LM vs. GLM?
Situation                       | Use LM | Use GLM
Response is continuous          | ✅     | ✅ (gaussian)
Response is binary              | ❌     | ✅ (binomial/logit)
Response is count data          | ❌     | ✅ (poisson/log)
Non-constant variance or skewed | ❌     | ✅ (use link function)
Non-linear Models in R
Non-linear models are used when the relationship between variables can't be explained well by
a straight line.
Model Form:
y = f(x, θ) + ε
Using nls() for Non-Linear Least Squares
# Simulated exponential growth data
x <- 1:10
y <- 5 * exp(0.4 * x) + rnorm(10, sd = 2)
# Fit non-linear model: y = a * exp(b * x)
model <- nls(y ~ a * exp(b * x), start = list(a = 1, b = 0.1))
summary(model)
Visualizing the Fit
plot(x, y, main = "Non-linear Fit")
lines(x, predict(model), col = "red", lwd = 2)
Time Series and Auto-Correlation in R
Time series analysis involves data points ordered in time, often with the goal of forecasting
future values, identifying trends, or detecting patterns like seasonality and cyclicity.
1. What is a Time Series?
A time series is a sequence of observations taken at successive equally spaced points in time.
Examples: stock prices, temperature data, sales over months.
2. Creating Time Series in R
Use ts() to create a time series object.
data <- c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119)
ts_data <- ts(data, start = c(2020, 1), frequency = 12)
plot(ts_data, main = "Monthly Time Series", ylab = "Value")
start: starting year and period
frequency: 12 (monthly), 4 (quarterly), 1 (yearly)
3. Time Series Components
A time series typically has 4 components:
Trend: Long-term progression
Seasonality: Regular periodic fluctuations
Cyclic: Irregular, long-term patterns
Residual: Random noise
Decomposition Example
decompose() needs at least two full seasonal periods, so a longer monthly series such as the built-in AirPassengers data is used here:
decomposed <- decompose(AirPassengers)
plot(decomposed)
4. Auto-Correlation
Auto-correlation measures how the current values of a series are related to its past values (lags). It is a key concept for building models like ARIMA.
Auto-Correlation Function (ACF)
acf(ts_data, main = "ACF Plot")
Partial Auto-Correlation Function (PACF)
pacf(ts_data, main = "PACF Plot")
5. ARIMA Model (AutoRegressive Integrated Moving Average)
A widely used model for univariate forecasting.
Automatic ARIMA
library(forecast)
fit <- auto.arima(ts_data)
summary(fit)
forecasted <- forecast(fit, h = 12)
plot(forecasted)
auto.arima() selects the best (p,d,q) model
h = 12 forecasts 12 future time points
6. Stationarity Check
Time series must be stationary for ARIMA.
Augmented Dickey-Fuller Test (ADF Test)
library(tseries)
adf.test(ts_data)
p < 0.05: reject the null hypothesis of a unit root, i.e. the series can be treated as stationary
If not, you may need to difference the series:
diffed_data <- diff(ts_data)
plot(diffed_data)
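After differencing, the ADF test can be run again to check whether the differenced series is stationary:
adf.test(diffed_data)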
7. Seasonal Decomposition Using Loess (STL)
More robust to irregular seasonality. Like decompose(), stl() needs at least two full seasonal periods, so the built-in AirPassengers series is used here as well:
fit_stl <- stl(AirPassengers, s.window = "periodic")
plot(fit_stl)
✅ Applications
Sales forecasting
Temperature prediction
Website traffic analysis
Financial time series
Clustering in R – Detailed Explanation with Examples
Clustering is an unsupervised learning technique that groups similar data points into clusters.
R offers powerful tools to perform and visualize clustering for both numerical and categorical
data.
1. Types of Clustering Methods
Method       | Description                                  | Suitable For
K-Means      | Partitions data into K clusters              | Numerical features
Hierarchical | Creates a tree of clusters (dendrogram)      | Numerical features
DBSCAN       | Density-based clustering                     | Arbitrary cluster shapes, noisy data
Model-Based  | Assumes data comes from a mixture of models  | Probabilistic methods
2. K-Means Clustering
a. Load Data and Apply Clustering
data(iris)
set.seed(123)
kmodel <- kmeans(iris[, 1:4], centers = 3)
print(kmodel)
b. Compare with True Species Labels
table(kmodel$cluster, iris$Species)
c. Visualize Clusters
library(ggplot2)
iris$Cluster <- as.factor(kmodel$cluster)
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Cluster)) +
  geom_point(size = 3) +
  labs(title = "K-Means Clustering on Iris Data")
3. Hierarchical Clustering
a. Compute Distance Matrix and Apply Clustering
d <- dist(iris[, 1:4]) # Euclidean distance
hc <- hclust(d, method = "complete")
plot(hc, labels = iris$Species, main = "Hierarchical Clustering Dendrogram")
b. Cut into Clusters
cutree(hc, k = 3)
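As with k-means, the resulting cluster assignments can be compared with the true species labels:
clusters_hc <- cutree(hc, k = 3)
table(clusters_hc, iris$Species)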
4. DBSCAN – Density-Based Clustering
library(dbscan)
# Using scaled Iris data
data <- scale(iris[, 1:4])
db <- dbscan(data, eps = 0.5, minPts = 5)
plot(data[, 1:2], col = db$cluster + 1L) # colour the scaled points by cluster; noise (cluster 0) appears in black
eps: neighborhood radius
minPts: minimum points to form a cluster
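A common heuristic for choosing eps is the k-nearest-neighbour distance plot from the dbscan package; eps is placed near the "elbow" of the curve (the 0.5 below simply matches the example above):
kNNdistplot(data, k = 5) # sorted distances to each point's 5th nearest neighbour
abline(h = 0.5, lty = 2) # candidate eps near the elbow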
5. Clustering Evaluation
Silhouette Score (Higher = Better)
library(cluster)
sil <- silhouette(kmodel$cluster, dist(iris[, 1:4]))
plot(sil)
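The average silhouette width gives a single summary value (closer to 1 means better-separated clusters):
mean(sil[, "sil_width"])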
✅ Use Cases of Clustering
Customer segmentation (marketing)
Document grouping (text mining)
Image compression (pixel clustering)
Anomaly detection (outliers form their own cluster)