R Studio How To

This document provides an overview of common commands in R studio for conducting data analysis. It covers general commands for working with datasets, summary statistics, probability distributions, confidence intervals, plots, correlation matrices, regression models, and residual diagnostics. The document serves as a reference for important functions to attach, save, modify and analyze data in R studio.

Uploaded by

Gerad Teo
R studio how to…

GENERAL COMMANDS
ATTACH A DATASET
SAVE A DATASET
CREATE A NEW DATASET FROM AN EXISTING DATASET
DELETE VARIABLES FROM A DATASET
EXCLUDE NA FROM A DATASET
COMBINE DATAFRAMES
RENAME A DATASET
ESTIMATE THE LOGARITHM OF A SERIES
RECODE A CATEGORICAL VARIABLE

SUMMARY STATISTICS COMMANDS
PRODUCE SUMMARY STATISTICS
ESTIMATE THE MEAN
ESTIMATE THE VARIANCE
ESTIMATE THE STANDARD DEVIATION

THE NORMAL AND THE STUDENT-T DISTRIBUTION
ESTIMATE THE PROBABILITY DENSITY FUNCTION FOR THE NORMAL DISTRIBUTION
ESTIMATE THE INTEGRAL FROM −∞ TO Q OF THE PDF OF THE NORMAL DISTRIBUTION WHERE Q IS A Z-SCORE
ESTIMATE THE Z-SCORE OF THE PTH QUANTILE OF THE NORMAL DISTRIBUTION
ESTIMATE A VECTOR OF NORMALLY DISTRIBUTED RANDOM NUMBERS
ESTIMATE THE DENSITY FUNCTION FOR THE T DISTRIBUTION WITH DF DEGREES OF FREEDOM
ESTIMATE THE DISTRIBUTION FUNCTION FOR THE T DISTRIBUTION WITH DF DEGREES OF FREEDOM
ESTIMATE THE QUANTILE FUNCTION FOR THE T DISTRIBUTION WITH DF DEGREES OF FREEDOM
ESTIMATE RANDOM NUMBERS WITH T DISTRIBUTION

CONFIDENCE INTERVALS
ESTIMATE A 90% CONFIDENCE INTERVAL
ESTIMATE A 95% CONFIDENCE INTERVAL
ESTIMATE A 99% CONFIDENCE INTERVAL
ESTIMATE A P-VALUE T-DISTRIBUTION

PLOTS COMMANDS
PRODUCE A HISTOGRAM
ADJUST THE X- AND Y-AXIS OF A HISTOGRAM
ADJUST THE NUMBER OF BINS OF A HISTOGRAM
PRODUCE A SCATTER PLOT
ADD A MAIN TITLE IN A SCATTER PLOT
ADD THE REGRESSION LINE IN A SCATTER PLOT
PLOT TWO GRAPHS SIDE BY SIDE
PLOT TWO GRAPHS ON THE SAME AXIS
PLOT A GRAPH AND ADD A LEGEND

CORRELATION MATRIX COMMANDS
ESTIMATE THE CORRELATION MATRIX

REGRESSION COMMANDS
ESTIMATE AN OLS MODEL
STORE THE RESIDUALS OF AN OLS MODEL
ESTIMATE ROBUST STANDARD ERRORS
ASSESS WHETHER TWO VARIABLES ARE JOINTLY SIGNIFICANT

Dr. Maria Giamouzi 1


RESIDUAL DIAGNOSTIC COMMANDS
ESTIMATE THE WHITE TEST - TESTING FOR HETEROSKEDASTICITY
ESTIMATE WHITE TEST - TESTING FOR HETEROSKEDASTICITY
ESTIMATE BREUSCH AND PAGAN TEST - TESTING FOR HETEROSKEDASTICITY
PLOT THE REGRESSION LINE AND THE DATA
ESTIMATE THE RESET TEST
ESTIMATE JARQUE-BERA TEST TO ASSESS THE NORMALITY OF THE RESIDUALS
ESTIMATE THE AIC VALUES
ESTIMATE THE BIC VALUES

CROSS VALIDATION COMMANDS
CONSTRUCT THE IN-SAMPLE AND OUT-OF-SAMPLE PERIOD
ESTIMATE THE OLS MODEL USING THE IN-SAMPLE AND OUT-OF-SAMPLE PERIOD

General Commands

attach a dataset
attach(name of the dataset)
example – attach(data)

save a dataset
save(name of the new dataset,file="name of the file.Rda")
example - save(combined,file="inc_literacy_2016.Rda")

create a new dataset from an existing dataset


NameOfTheNewDataset <- ExistingDatasetName[,c("NameOfVariable#1", "NameOfVariable#2",…)]
example - temp <- income[,c("Country.Name","X2016")]

delete variables from a dataset


NameOfTheNewDataset <- subset(NameOfExistingDataset, select = -c(NameOfVariable#1ToBeEliminated,
NameOfVariable#2ToBeEliminated,…))
example – hprices2 <- subset(hprices1, select = -c(prices))

exclude NA from a dataset


NameOfDataset_NA <- na.exclude(NameOfDataset)

combine dataframes
NewDatasetName <- cbind(NameOfDataframe1, NameOfDataframe2,…)
example - combined <- cbind(temp,literacy[,"X2016"])

rename a dataset
NewName <- OldName

estimate the logarithm of a series


New_variable_name <- log(Name_of_the_Series)

recode a categorical variable


library(car)
accepted <- recode(Variable_Name, " 'yes' = 0; 'no' = 1; ", as.factor = FALSE)
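
The general commands above can be tried out on a dataset that ships with R. The sketch below uses the built-in airquality data; the object names (ozone_temp, no_day and so on) are illustrative assumptions and do not come from the original notes.

```r
# Built-in data frame with columns Ozone, Solar.R, Wind, Temp, Month, Day.
data(airquality)

# Create a new dataset from two variables of an existing dataset.
ozone_temp <- airquality[, c("Ozone", "Temp")]

# Exclude rows that contain NA.
ozone_temp_complete <- na.exclude(ozone_temp)

# Delete a variable: select = -c(...) drops the listed columns.
no_day <- subset(airquality, select = -c(Day))

# Store the logarithm of a series as a new column.
ozone_temp_complete$log_ozone <- log(ozone_temp_complete$Ozone)

# Combine columns; both inputs must have the same number of rows.
combined <- cbind(ozone_temp, Month = airquality[, "Month"])

dim(ozone_temp_complete)
names(no_day)
names(combined)
```

Note that attach() is not needed here, because every column is referenced through its data frame with [ , ] or $.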

Summary Statistics Commands

produce summary statistics


summary (Name_of_dataset)

estimate the mean


mean(Name_of_variable, na.rm=TRUE)

estimate the variance


var(Name_of_variable, na.rm = TRUE)

estimate the standard deviation


sqrt(var(Name_of_variable, na.rm = TRUE))
or
sd(Name_of_variable, na.rm = TRUE)
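
A small worked example may help to show what na.rm = TRUE does and that the two ways of computing the standard deviation agree. The vector x is made up for illustration.

```r
# A short series with one missing value (illustrative data).
x <- c(2, 4, 4, NA, 5, 7)

mean(x)                # NA, because the missing value propagates
mean(x, na.rm = TRUE)  # 4.4, the mean of the five observed values
var(x, na.rm = TRUE)   # 3.3, the sample variance (divides by n - 1)

# sd() is the square root of var(); both lines give the same number.
sd_direct   <- sd(x, na.rm = TRUE)
sd_from_var <- sqrt(var(x, na.rm = TRUE))
all.equal(sd_direct, sd_from_var)
```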

The Normal and the Student- t Distribution

estimate the probability density function for the normal distribution


dnorm(state the value)

estimate the integral from −∞ to q of the pdf of the normal distribution where q is a Z-score
pnorm(value)

estimate the Z-score of the pth quantile of the normal distribution


qnorm(probability)

estimate a vector of normally distributed random numbers


rnorm(number of random numbers to be generated,mean,sd)
example - rnorm(10,mean=5,sd = 2)

estimate the density function for the t distribution with df degrees of freedom
dt(value, df)

estimate the distribution function for the t distribution with df degrees of freedom
pt(value, df)

estimate the quantile function for the t distribution with df degrees of freedom
qt(probability, df)

estimate random numbers with t distribution


rt(number of random numbers to be generated, df)
example - rt(10,20)
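
The d/p/q/r functions fit together as density, cumulative probability, quantile and random draws. The following sketch, with values chosen only for illustration, shows that the quantile function undoes the distribution function and that the t quantile approaches the normal one as the degrees of freedom grow.

```r
# Density of the standard normal at 0 equals 1/sqrt(2*pi).
d0 <- dnorm(0)

# pnorm() gives P(Z <= q); qnorm() is its inverse.
p <- pnorm(1.96)   # roughly 0.975
z <- qnorm(p)      # recovers 1.96

# The t distribution has heavier tails, so its 97.5% quantile is larger
# than the normal one for small df and converges to it for large df.
t_38    <- qt(0.975, df = 38)
t_large <- qt(0.975, df = 1e6)

# Random draws; set.seed() makes the example reproducible.
set.seed(1)
normal_draws <- rnorm(10, mean = 5, sd = 2)
t_draws      <- rt(10, df = 20)

c(d0 = d0, p = p, z = z, t_38 = t_38, t_large = t_large)
```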

Confidence Intervals

estimate a 90% Confidence Interval


ci = 0.90
dfs = state the degrees of freedom
tval <- qt((ci +0.5*(1 - ci)), dfs)
upper_bound <- coefficient + tval*Standard_Error
lower_bound <- coefficient - tval*Standard_Error

estimate a 95% Confidence Interval


ci = 0.95
dfs = 38
tval <- qt((ci +0.5*(1 - ci)), dfs)
upper_bound <- coefficient + tval*Standard_Error
lower_bound <- coefficient - tval*Standard_Error

estimate a 99% Confidence Interval


ci = 0.99
dfs = 38
tval <- qt((ci +0.5*(1 - ci)), dfs)
upper_bound <- coefficient + tval*Standard_Error
lower_bound <- coefficient - tval*Standard_Error

estimate a p-value t-distribution


pvalue <- pt(t_statistic_value, degrees_of_freedom, lower.tail = TRUE) # lower-tail probability, P(T <= t)
pvalue <- pt(t_statistic_value, degrees_of_freedom, lower.tail = FALSE) # upper-tail probability, P(T > t)
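
As a check on the recipe above, the interval can be computed by hand for a fitted regression and compared with R's own confint(). The sketch assumes a simple regression of mpg on wt in the built-in mtcars data; the model and object names are illustrative.

```r
# Illustrative model on built-in data.
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient, standard error and residual degrees of freedom for wt.
coefficient    <- coef(summary(fit))["wt", "Estimate"]
Standard_Error <- coef(summary(fit))["wt", "Std. Error"]
dfs            <- df.residual(fit)

# 95% interval: ci + 0.5*(1 - ci) = 0.975 is the upper quantile needed.
ci   <- 0.95
tval <- qt(ci + 0.5 * (1 - ci), dfs)
upper_bound <- coefficient + tval * Standard_Error
lower_bound <- coefficient - tval * Standard_Error

# Built-in equivalent for comparison.
confint(fit, "wt", level = ci)

# Two-sided p-value: twice the upper-tail probability of |t|.
t_statistic_value <- coefficient / Standard_Error
pvalue_two_sided  <- 2 * pt(abs(t_statistic_value), dfs, lower.tail = FALSE)
```

Doubling the tail probability of the absolute t statistic gives the two-sided p-value that summary() reports for each coefficient.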

Plots commands

produce a histogram
hist(name_of_variable)

adjust the x- and y-axis of a histogram


hist(name_of_variable, xlim=c(lower value,max value), ylim=c(lower value,max value))

adjust the number of bins of a histogram


hist(name_of_variable, breaks=number_of_bins)

produce a scatter plot


plot(name_of_variable_x, name_of_variable_y)

add a main title in a scatter plot


plot(name_of_variable_x, name_of_variable_y, main="TITLE")

add the regression line in a scatter plot


abline(lm(name_of_variable_y~name_of_variable_x))

plot two graphs side by side


par(mfrow=c(1,2))
hist(variable name 1)
hist(variable name 2)

plot two graphs on the same axis


par(mfrow=c(1,1)) # set the plot to a single axes
p1 <- hist(variable name 1)
p2 <- hist(variable name 2)
plot(p1, col="green", xlim=c(lower bound,upper bound), main = "TITLE")
#the col option specifies the colouring of the bars
plot(p2, col="red", xlim=c(lower bound,upper bound), add=T)
#the add=T option adds the second distribution onto the existing plot

plot a graph and add a legend


png(filename="Name_of_Graph.png")
#this line, prior to the plotting commands, leads to the plot being saved in the current directory,
#with the specified filename and format
plot(Name_of_variable_1, type = "l", col = "green", main = "TITLE")
lines(Name_of_variable_2, type = "l", col = "black")
legend("topright", legend=c("Name of Variable 1", "Name of Variable 2"), col=c("green", "black"), lwd = 2)
dev.off() # this line 'finishes' the command to save the plot to the file named above
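
The plotting commands can be combined into one saved figure. The sketch below is only an illustration on the built-in mtcars data: it writes to a temporary file and uses pdf() rather than png(), an assumption made because pdf() is available on every R installation.

```r
# Temporary output file (illustrative; use your own file name in practice).
plot_file <- tempfile(fileext = ".pdf")
pdf(file = plot_file)

# Two panels side by side.
par(mfrow = c(1, 2))

# Left panel: histogram with a chosen number of breaks.
hist(mtcars$mpg, breaks = 10, main = "Histogram of mpg", xlab = "mpg")

# Right panel: scatter plot with the fitted OLS line and a legend.
plot(mtcars$wt, mtcars$mpg, main = "mpg against wt", xlab = "wt", ylab = "mpg")
abline(lm(mpg ~ wt, data = mtcars), col = "red")
legend("topright", legend = "OLS fit", col = "red", lwd = 2)

# Close the device so the file is actually written.
dev.off()

file.exists(plot_file)
```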

Correlation Matrix Commands

estimate the correlation matrix


install.packages("corrplot")
Correlation_Matrix <- cor(na.omit(Name_of_dataset))

library(corrplot)
corrplot(Correlation_Matrix,method="number",type="lower")
or
corrplot(Correlation_Matrix,method="number",type="upper")
or
corrplot(Correlation_Matrix,method="circle")
or
corrplot(Correlation_Matrix,method="pie")
or
corrplot(Correlation_Matrix,method="color")

Here you can find additional commands on how to modify your correlation matrix.
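
Before plotting, it can help to inspect the matrix itself. The sketch below uses three columns of the built-in mtcars data (an illustrative choice) and base R only, so it runs without corrplot installed.

```r
# Pick a few numeric variables (illustrative selection).
vars <- mtcars[, c("mpg", "wt", "hp")]

# na.omit() drops incomplete rows before the correlations are computed.
Correlation_Matrix <- cor(na.omit(vars))

# Rounded for reading; the diagonal is 1 and the matrix is symmetric.
round(Correlation_Matrix, 2)

# A single pairwise correlation can be read off by name.
mpg_wt <- Correlation_Matrix["mpg", "wt"]
mpg_wt
```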

Regression Commands

estimate an OLS model


Name_the_model <- lm(dependent_variable ~ independent_variable_1 + independent_variable_2 + …)
summary(Name_the_model)

store the residuals of an OLS model


Name_the_model <- lm(dependent_variable ~ independent_variable_1 + independent_variable_2 + …)
summary(Name_the_model)

resids <- resid(Name_the_model)

estimate robust standard errors


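library(car) # hccm() comes from the car package
cov1 <- hccm(Name_the_model)
robustse <- sqrt(diag(cov1))
standard_se <- sqrt(diag(vcov(Name_the_model)))
t_values_with_RSE <- Name_the_model$coefficients/robustse
df <- df.residual(Name_the_model) # residual degrees of freedom
pvalues <- 2*pt(abs(t_values_with_RSE), df, lower.tail = FALSE) # two-sided p-values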

assess whether two variables are jointly significant


linearHypothesis(Name_the_model, c("Variable_name_1 = 0", "Variable_name_2 = 0"))

or
H0 <- c("Variable_name_1 = 0", "Variable_name_2 = 0")
#we have specified each parameter = 0 separately, but the function tests both restrictions jointly, so it is a
#joint test
linearHypothesis(Name_the_model, H0)
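
For a linear model with the usual (non-robust) standard errors, the same kind of joint test can be reproduced in base R by comparing a restricted and an unrestricted model. The sketch below is an illustration on the built-in mtcars data; the choice of variables is an assumption, and it tests whether hp and qsec are jointly zero.

```r
# Unrestricted model and the model with hp and qsec removed.
unrestricted <- lm(mpg ~ wt + hp + qsec, data = mtcars)
restricted   <- lm(mpg ~ wt, data = mtcars)

# anova() on nested models reports the F-test of the dropped terms.
joint_test <- anova(restricted, unrestricted)
f_stat  <- joint_test$F[2]
p_value <- joint_test$`Pr(>F)`[2]

# The same F statistic from the residual sums of squares.
rss_r <- sum(resid(restricted)^2)
rss_u <- sum(resid(unrestricted)^2)
q     <- 2  # number of restrictions
f_manual <- ((rss_r - rss_u) / q) / (rss_u / df.residual(unrestricted))

c(F = f_stat, F_manual = f_manual, p_value = p_value)
```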

Residual Diagnostic Commands

estimate the White test - testing for heteroskedasticity


Name_the_model <- lm(dependent_variable ~ independent_variable_1 + independent_variable_2 + …)
summary(Name_the_model)

resids <- resid(Name_the_model)
sqres <- resids^2

white1 <- lm(sqres ~ independent_variable_1 + independent_variable_2 + … + I(independent_variable_1^2) + … +
I(independent_variable_1*independent_variable_2) + …)
summary(white1)

estimate White test - testing for heteroskedasticity


Name_the_model <- lm(dependent_variable ~ independent_variable_1 + independent_variable_2 + …)
summary(Name_the_model)

sqres <- resid(Name_the_model)^2
yhat <- Name_the_model$fitted.values
white2 <- lm(sqres ~ yhat + I(yhat^2))
summary(white2)

estimate Breusch and Pagan Test - testing for heteroskedasticity


library(wooldridge)
library(car)
Name_the_model <- lm(dependent_variable ~ independent_variable_1 + independent_variable_2 + …)
summary(Name_the_model)
ncvTest(Name_the_model)

H0: the residuals are homoscedastic


H1: the residuals are heteroscedastic

If the p-value is less than the significance level, we reject the null hypothesis, which implies that the
residuals are heteroscedastic.

If the p-value is greater than the significance level, we fail to reject the null hypothesis, so there is no
evidence against homoscedastic residuals.
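
The idea behind the test can be sketched by hand in base R. The version below is the studentized LM form (n times the R-squared of an auxiliary regression of the squared residuals on the regressors), which is an assumption of this example and will generally not reproduce the exact statistic printed by ncvTest(). The mtcars model is illustrative.

```r
# Illustrative model on built-in data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Auxiliary regression: squared residuals on the original regressors.
sqres <- resid(fit)^2
aux   <- lm(sqres ~ wt + hp, data = mtcars)

# LM statistic = n * R^2, chi-squared with k degrees of freedom under H0,
# where k is the number of regressors in the auxiliary regression.
n       <- nobs(fit)
k       <- length(coef(aux)) - 1
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = k, lower.tail = FALSE)

c(LM = lm_stat, df = k, p_value = p_value)
```

A small p-value points towards heteroscedasticity; a large one means the test found no evidence against constant variance.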

plot the regression line and the data


plot(independent_variable, dependent_variable, main="TITLE")
abline(lm(dependent_variable ~ independent_variable))

estimate the RESET test
Name_the_model <- lm(dependent_variable ~ independent_variable_1 + independent_variable_2 + …)
summary(Name_the_model)

resids <- Name_the_model$residuals
yhats <- Name_the_model$fitted.values
reseteq <- lm(resids ~ I(yhats^2))
summary(reseteq)
reseteq2 <- lm(resids ~ I(yhats^2) + I(yhats^3))
summary(reseteq2)

estimate Jarque-Bera test to assess the normality of the residuals


install.packages("xts")
install.packages("tseries")

library(tseries)
Name_the_model <- lm(dependent_variable ~ independent_variable_1 + independent_variable_2 + …)
summary(Name_the_model)

resids <- Name_the_model$residuals
jarque.bera.test(resids)

H0: The residuals are normally distributed


H1: The residuals are not normally distributed

If the p-value is less than the significance level, we reject the null hypothesis, which implies that the
residuals are not normally distributed.
If the p-value is greater than the significance level, we fail to reject the null hypothesis, so there is no
evidence against normally distributed residuals.
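
The Jarque-Bera statistic combines the sample skewness S and kurtosis K of the residuals as JB = n/6 * (S^2 + (K - 3)^2 / 4) and is compared with a chi-squared distribution with 2 degrees of freedom. The sketch below computes it in base R for an illustrative mtcars model, assuming moments that divide by n.

```r
# Illustrative model on built-in data.
fit    <- lm(mpg ~ wt + hp, data = mtcars)
resids <- resid(fit)
n      <- length(resids)

# Central moments of the residuals (dividing by n).
centred <- resids - mean(resids)
m2 <- mean(centred^2)
m3 <- mean(centred^3)
m4 <- mean(centred^4)

# Skewness and kurtosis; a normal distribution has S = 0 and K = 3.
skew <- m3 / m2^1.5
kurt <- m4 / m2^2

# Jarque-Bera statistic and its chi-squared(2) p-value.
jb      <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
p_value <- pchisq(jb, df = 2, lower.tail = FALSE)

c(skewness = skew, kurtosis = kurt, JB = jb, p_value = p_value)
```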

estimate the AIC values


Name_the_model <- lm(dependent_variable ~ independent_variable_1 + independent_variable_2 + …)
summary(Name_the_model)
library(stats)
aic <- AIC(Name_the_model)

estimate the BIC values


Name_the_model <- lm(dependent_variable ~ independent_variable_1 + independent_variable_2 + …)
summary(Name_the_model)
library(nlme)
bic <- BIC(Name_the_model)
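
Information criteria are mainly useful for comparing models fitted to the same data, where the lower value is preferred. The sketch below compares two illustrative mtcars models and recomputes both criteria from the log-likelihood, using AIC = -2*logLik + 2*k and BIC = -2*logLik + log(n)*k, where k counts all estimated parameters including the error variance.

```r
# Two candidate models on the same data (illustrative choice).
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Side-by-side comparison; lower is better.
AIC(small, large)
BIC(small, large)

# Manual calculation for the larger model.
ll <- logLik(large)
k  <- attr(ll, "df")   # coefficients plus the error variance
n  <- nobs(large)
aic_manual <- -2 * as.numeric(ll) + 2 * k
bic_manual <- -2 * as.numeric(ll) + log(n) * k

c(AIC = aic_manual, BIC = bic_manual)
```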

Cross Validation Commands

construct the in-sample and out-of-sample period


In_Sample_Period <- Dataset_Name[1:200,]
Out_of_Sample_Period <- Dataset_Name [-(1:200),]

200 is just an arbitrary cut-off that needs to be adjusted depending on the data under investigation.

estimate the OLS model using the in-sample and out-of-sample period
Name_the_model <- lm(dependent_variable ~ independent_variable_1 + independent_variable_2 + …,
data = In_Sample_Period)
summary(Name_the_model)

# Then use this model to predict the "out-of-sample" values
ypred2 <- predict(Name_the_model, newdata = Out_of_Sample_Period)
errors2 <- Out_of_Sample_Period$Dependent_Variable - ypred2
plot(errors2)

or
Name_the_model <- lm(dependent_variable ~ independent_variable_1 + independent_variable_2 + …,
data = In_Sample_Period)
summary(Name_the_model)
errors <- resid(Name_the_model)

Name_the_model_2 <- lm(dependent_variable ~ independent_variable_1 + independent_variable_2 + …,
data = Out_of_Sample_Period)
summary(Name_the_model_2)
resids <- resid(Name_the_model_2)

sde1 <- sqrt(var(errors))
sde2 <- sqrt(var(resids))
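
The first approach, fitting on the in-sample period and predicting the out-of-sample period, can be sketched end to end as follows. The mtcars data, the cut-off of 24 rows and the use of root mean squared error as the summary measure are all assumptions of this example; mtcars is not ordered in time, so the split only illustrates the mechanics.

```r
# Split the data at an illustrative cut-off.
n_in <- 24
In_Sample_Period     <- mtcars[1:n_in, ]
Out_of_Sample_Period <- mtcars[-(1:n_in), ]

# Fit on the in-sample period only.
fit_in <- lm(mpg ~ wt + hp, data = In_Sample_Period)

# Predict the held-out rows and compare with their observed values.
ypred      <- predict(fit_in, newdata = Out_of_Sample_Period)
errors_out <- Out_of_Sample_Period$mpg - ypred

# Root mean squared error in and out of sample.
rmse_in  <- sqrt(mean(resid(fit_in)^2))
rmse_out <- sqrt(mean(errors_out^2))

c(rmse_in = rmse_in, rmse_out = rmse_out)
```

An out-of-sample error much larger than the in-sample error suggests that the model fits the estimation period better than it generalises.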
