Statistical Tests in R

CMT 315

1 Correlation
Correlation measures the strength and direction of the relationship between two variables. The most
common types of correlation are Pearson's correlation coefficient and Spearman's rank correlation
coefficient.

1.1 Pearson Correlation Coefficient


The Pearson correlation coefficient, denoted as r, measures the linear relationship between two continuous variables.

1.1.1 Formula
The Pearson correlation coefficient is given by:

r = ∑(Xi − X̄)(Yi − Ȳ) / √( ∑(Xi − X̄)² ∑(Yi − Ȳ)² )    (1)
where:

• Xi ,Yi are individual data points,

• X̄, Ȳ are the means of X and Y ,

• The numerator is proportional to the covariance between X and Y,

• The denominator scales the covariance by the product of the standard deviations, so r always lies in [−1, 1].

1.1.2 Interpretation
• r = 1: Perfect positive correlation

• r = −1: Perfect negative correlation

• r = 0: No linear correlation

1.2 R Code for Pearson Correlation


# Sample Data
x <- c(10, 20, 30, 40, 50)
y <- c(5, 15, 25, 35, 45)

# Compute Pearson correlation
cor(x, y, method = "pearson")
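As a quick sanity check, Equation (1) can be evaluated by hand and compared with the built-in function. A minimal sketch (the name r_manual is just illustrative):

```r
# Sample data (same as above)
x <- c(10, 20, 30, 40, 50)
y <- c(5, 15, 25, 35, 45)

# Numerator: sum of products of deviations (proportional to the covariance)
num <- sum((x - mean(x)) * (y - mean(y)))
# Denominator: square root of the product of the sums of squared deviations
den <- sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_manual <- num / den
all.equal(r_manual, cor(x, y, method = "pearson"))  # TRUE
```

Since y = x − 5 here, the relationship is perfectly linear and r is exactly 1.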

1.3 Spearman Rank Correlation


The Spearman rank correlation coefficient, denoted as ρ (rho), measures the strength and direction of
a monotonic relationship between two variables by ranking the data.

1.3.1 Formula
ρ = 1 − 6 ∑ di² / (n(n² − 1))    (2)
where:
• di is the difference between the ranks of corresponding values of X and Y ,
• n is the number of observations.

1.3.2 Interpretation
Similar to Pearson correlation, ρ ranges from −1 to 1, with the same interpretation.

1.3.3 R Code for Spearman Correlation


# Compute Spearman correlation
cor(x, y, method = "spearman")
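Equation (2) can likewise be checked against cor() by ranking the data explicitly (a small sketch; rho_manual is an illustrative name):

```r
x <- c(10, 20, 30, 40, 50)
y <- c(5, 15, 25, 35, 45)

d <- rank(x) - rank(y)                        # rank differences d_i
n <- length(x)
rho_manual <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

rho_manual == cor(x, y, method = "spearman")  # TRUE
```

Note that the di² formula is exact only when there are no ties; with ties, cor(..., method = "spearman") computes the Pearson correlation of the ranks instead.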

1.4 Differences Between Pearson and Spearman Correlation


• Pearson measures linear relationships, while Spearman measures monotonic relationships.
• Pearson uses actual data values, while Spearman uses ranked values.
• Spearman is more robust to outliers and non-normal distributions.

1.5 Conclusion
Both Pearson and Spearman correlation coefficients are useful depending on the nature of the data. Pearson is appropriate for linear relationships, whereas Spearman is suitable for monotonic relationships and non-parametric data.

2 Simple Linear Regression


2.1 Introduction
Simple Linear Regression (SLR) is a statistical method that models the relationship between a dependent variable (Y) and a single independent variable (X) using a linear equation.

2.2 Model Formulation


The simple linear regression model is given by:
Y = β0 + β1 X + ε (3)
where:
• Y is the dependent variable (response variable),
• X is the independent variable (predictor),
• β0 is the intercept,
• β1 is the slope,
• ε is the random error term.

2.3 Estimation of Parameters
Using the method of least squares, the estimates of β0 and β1 are obtained as:

β̂1 = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Xi − X̄)²    (4)

β̂0 = Ȳ − β̂1 X̄    (5)


where:

• X̄ = (1/n) ∑ Xi is the mean of X,

• Ȳ = (1/n) ∑ Yi is the mean of Y.
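Equations (4) and (5) translate directly into R. The following sketch (the names b1_hat and b0_hat are illustrative) reproduces the coefficients that lm() reports for the data used in Section 2.5:

```r
X <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Y <- c(2.3, 2.5, 3.1, 3.8, 4.2, 4.8, 5.0, 5.6, 6.1, 6.8)

# Least-squares estimates from Equations (4) and (5)
b1_hat <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
b0_hat <- mean(Y) - b1_hat * mean(X)

# Compare with lm(): coef() returns (intercept, slope)
all.equal(unname(coef(lm(Y ~ X))), c(b0_hat, b1_hat))  # TRUE
```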

2.4 Goodness of Fit: R²

The coefficient of determination (R²) measures how well the regression line fits the data:

R² = 1 − SSres / SStot    (6)
where:

• SStot = ∑(Yi − Ȳ)² (Total Sum of Squares),

• SSres = ∑(Yi − Ŷi)² (Residual Sum of Squares).
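Equation (6) can be verified against the R-squared value that summary() reports. A small sketch using the same data as Section 2.5:

```r
X <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Y <- c(2.3, 2.5, 3.1, 3.8, 4.2, 4.8, 5.0, 5.6, 6.1, 6.8)
fit <- lm(Y ~ X)

ss_res <- sum(residuals(fit)^2)   # SSres: residual sum of squares
ss_tot <- sum((Y - mean(Y))^2)    # SStot: total sum of squares
r2_manual <- 1 - ss_res / ss_tot

all.equal(r2_manual, summary(fit)$r.squared)  # TRUE
```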

2.5 Implementation in R
# Load necessary library
library(ggplot2)

# Sample data
X <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Y <- c(2.3, 2.5, 3.1, 3.8, 4.2, 4.8, 5.0, 5.6, 6.1, 6.8)

# Fit linear model
model <- lm(Y ~ X)

# Display model summary
summary(model)

# Plot the regression line (base graphics)
plot(X, Y, main = "Simple Linear Regression", xlab = "X", ylab = "Y", pch = 19, col = "blue")
abline(model, col = "red", lwd = 2)

# Same plot with ggplot2
ggplot(data.frame(X, Y), aes(x = X, y = Y)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", col = "red") +
  labs(title = "Simple Linear Regression", x = "X", y = "Y")

3 Wilcoxon Signed-Rank Test (One-Sample & Paired)
The Wilcoxon Signed-Rank Test is a non-parametric alternative to the one-sample and paired t-tests.
It is used when the assumptions of normality are violated. This test assesses whether the median
of a single sample differs from a specified value (one-sample test) or whether the median difference
between paired observations is zero (paired test).

3.1 Assumptions
• The data are paired (for paired tests) or from a single sample (for one-sample tests).

• The differences (for paired tests) or sample values (for one-sample tests) are measured on at
least an ordinal, ideally continuous, scale.

• The differences are symmetrically distributed around the median.

3.2 Hypotheses
For the one-sample test:

H0 : The median of the sample is equal to a specified value m0 .


HA : The median of the sample is different from m0 .

For the paired test:

H0 : The median of the differences is zero.


HA : The median of the differences is not zero.

3.3 Test Statistic


For a paired test:

1. Compute differences di = Xi −Yi .

2. Rank absolute differences |di |, ignoring zeros.

3. Compute the test statistic:

W = ∑ R+    (7)

where the sum runs over the ranks R+ assigned to the positive differences.

3.4 Decision Rule


Compare the test statistic W to the critical value from the Wilcoxon Signed-Rank table (for small
samples) or use the normal approximation for large samples.
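The three steps above can be traced by hand in R. A self-contained sketch (the sample x matches the one used in the R code below; V mirrors the name wilcox.test() gives the one-sample statistic):

```r
x <- c(10, 12, 14, 15, 17, 18, 19, 21, 23, 25)
d <- x - 15          # differences from the hypothesised median m0 = 15
d <- d[d != 0]       # step 2: ignore zero differences
r <- rank(abs(d))    # rank the absolute differences (ties get average ranks)
W <- sum(r[d > 0])   # step 3: sum of ranks of positive differences

W  # 34.5, the value wilcox.test(x, mu = 15) reports as V
```

Because of the zero difference and the tied ranks, wilcox.test() warns that an exact p-value cannot be computed and falls back to the normal approximation.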

3.5 R Code
# One-sample test
x <- c(10, 12, 14, 15, 17, 18, 19, 21, 23, 25)
wilcox.test(x, mu = 15, alternative = "two.sided")

# Paired test
y <- c(9, 11, 13, 14, 16, 17, 18, 20, 22, 24)
wilcox.test(x, y, paired = TRUE, alternative = "two.sided")

4 Mann-Whitney U Test (Wilcoxon Rank-Sum Test)


Used for comparing two independent groups.

4.1 Formula
U = n1 n2 + n1(n1 + 1)/2 − R1    (8)
where R1 is the sum of ranks for group 1.

4.2 R Code
group1 <- c(10, 15, 14, 18, 20)
group2 <- c(12, 17, 16, 19, 22)
wilcox.test(group1, group2, alternative = "two.sided")
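Equation (8) can be computed by hand and related to the W statistic that wilcox.test() prints. A sketch (note the two use different but equivalent conventions):

```r
group1 <- c(10, 15, 14, 18, 20)
group2 <- c(12, 17, 16, 19, 22)
n1 <- length(group1)
n2 <- length(group2)

R1 <- sum(rank(c(group1, group2))[seq_len(n1)])  # rank sum of group 1
U  <- n1 * n2 + n1 * (n1 + 1) / 2 - R1           # Equation (8)

W <- unname(wilcox.test(group1, group2)$statistic)
c(U = U, W = W)  # U = 16, W = 9; they satisfy U + W = n1 * n2
```

R's wilcox.test() reports W = R1 − n1(n1 + 1)/2, whereas Equation (8) gives the U statistic for the other group; the two always sum to n1·n2, and either can be used with the appropriate table.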

5 Kruskal-Wallis Test
Used for comparing more than two independent groups (non-parametric ANOVA).

5.1 Formula
H = 12/(N(N + 1)) ∑ (Rj²/nj) − 3(N + 1)    (9)

where R j is the sum of ranks for group j and n j is the sample size for group j.

5.2 R Code
groupA <- c(15, 18, 21, 24, 27)
groupB <- c(17, 20, 23, 26, 29)
groupC <- c(19, 22, 25, 28, 31)
data <- data.frame(values = c(groupA, groupB, groupC),
group = factor(rep(1:3, each = 5)))
kruskal.test(values ~ group, data = data)
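Equation (9) can be verified against kruskal.test() when there are no ties, as is the case here. A minimal sketch:

```r
values <- c(15, 18, 21, 24, 27,   # group A
            17, 20, 23, 26, 29,   # group B
            19, 22, 25, 28, 31)   # group C
g <- factor(rep(1:3, each = 5))

N  <- length(values)
Rj <- tapply(rank(values), g, sum)   # rank sum R_j per group
nj <- tapply(values, g, length)      # group sizes n_j
H  <- 12 * sum(Rj^2 / nj) / (N * (N + 1)) - 3 * (N + 1)

all.equal(H, unname(kruskal.test(values, g)$statistic))  # TRUE
```

With tied values, kruskal.test() additionally applies a tie correction, so the plain formula would then differ slightly.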

6 Spearman’s Rank Correlation


Measures the strength and direction of a monotonic relationship.

6.1 Formula
ρ = 1 − 6 ∑ di² / (n(n² − 1))    (10)
where di is the rank difference.

6.2 R Code
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 5, 6, 7)
cor.test(x, y, method = "spearman")

7 Kendall’s Tau Correlation


Used for ordinal data or small sample sizes.

7.1 Formula
τ = (C − D) / (C + D)    (11)
where C is the number of concordant pairs and D is the number of discordant pairs.

7.2 R Code
cor.test(x, y, method = "kendall")
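Counting the concordant and discordant pairs explicitly shows how Equation (11) arises. A small sketch using the data from Section 6.2:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 5, 6, 7)

p <- combn(length(x), 2)   # all pairs of indices (i, j)
s <- sign(x[p[1, ]] - x[p[2, ]]) * sign(y[p[1, ]] - y[p[2, ]])
C <- sum(s > 0)            # concordant pairs
D <- sum(s < 0)            # discordant pairs

tau <- (C - D) / (C + D)
all.equal(tau, unname(cor.test(x, y, method = "kendall")$estimate))  # TRUE
```

Here both sequences are strictly increasing, so all 10 pairs are concordant and τ = 1.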

8 Friedman Test
Used for comparing repeated measures across multiple treatments.

8.1 Formula
Q = 12/(nk(k + 1)) ∑ Rj² − 3n(k + 1)    (12)

where n is the number of blocks (subjects), k is the number of treatments (conditions), and Rj is the
rank sum for treatment j.

8.2 R Code
treatment1 <- c(15, 20, 25)
treatment2 <- c(18, 22, 28)
treatment3 <- c(16, 21, 27)
friedman.test(y = c(treatment1, treatment2, treatment3),
groups = rep(1:3, each = 3),
blocks = rep(1:3, 3))
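Equation (12) can be traced by hand by ranking within each block. A sketch with the same data arranged as a matrix (rows = blocks/subjects, columns = treatments):

```r
# Rows = blocks (subjects), columns = treatments
m <- cbind(t1 = c(15, 20, 25),
           t2 = c(18, 22, 28),
           t3 = c(16, 21, 27))

R  <- apply(m, 1, rank)   # rank within each block; columns of R are blocks
Rj <- rowSums(R)          # rank sum per treatment
n  <- nrow(m)             # number of blocks
k  <- ncol(m)             # number of treatments

Q <- 12 * sum(Rj^2) / (n * k * (k + 1)) - 3 * n * (k + 1)
Q  # 6, matching friedman.test(m)$statistic
```

Note that friedman.test() also accepts a matrix directly (treatments in columns, blocks in rows), which is often less error-prone than the y/groups/blocks form shown above.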
