Last Modified January 26, 2007
Basic Descriptive Statistics Using R
In the following handout words and symbols in bold are R functions and words and
symbols in italics are entries supplied by the user; underlined words and symbols are
optional entries (all current as of version R-2.4.1). Sample texts from an R session are
highlighted with gray shading.
Measures of Central Tendency
• mean(object) – provides the mean of the object’s elements
> quarters = c(5.683, 5.620, 5.551, 5.549, 5.536,
+ 5.552, 5.548, 5.539, 5.554, 5.552, 5.684, 5.632)
> mean(quarters)
[1] 5.583333
• median(object) – provides the median of the object’s elements
> median(quarters)
[1] 5.552
• mode – there is no built-in function for finding an object’s statistical mode (R’s own
mode function reports an object’s storage type instead); however, the command
table(object) creates a frequency table for the object’s elements, and the mode is the
element in this table with the greatest frequency
> table(quarters)
5.536 5.539 5.548 5.549 5.551 5.552 5.554  5.62 5.632 5.683 5.684
    1     1     1     1     1     2     1     1     1     1     1
• midrange – there is no built-in function for reporting the midrange; the command
shown below uses the functions for an object’s maximum (max) and minimum (min)
to calculate and print the object’s midrange
> midrange = (max(quarters) + min(quarters))/2; midrange
[1] 5.61
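Reading the mode off the frequency table by eye works for a small object, but the same lookup can be done in code. The following is a minimal sketch, not a built-in facility: the names counts and modes are chosen here for illustration, and it relies on the fact that table stores the distinct values as character labels.

```r
quarters = c(5.683, 5.620, 5.551, 5.549, 5.536, 5.552,
             5.548, 5.539, 5.554, 5.552, 5.684, 5.632)

# Frequency table of the distinct values
counts = table(quarters)

# Keep the label(s) whose count equals the largest count,
# then convert the character labels back to numbers
modes = as.numeric(names(counts)[counts == max(counts)])
modes
```

For quarters this returns 5.552, the only value that occurs twice. If several values tie for the greatest frequency, all of them are returned.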
Measures of Spread
• var(object) – provides the sample variance of the object’s elements
> var(quarters)
[1] 0.003116606
• sd(object) – provides the sample standard deviation of the object’s elements
> sd(quarters)
[1] 0.05582657
• standard error of the mean – there is no built-in function for reporting the standard
error of the mean; the command shown below uses the functions for the object’s
standard deviation (sd) and number of elements (length), together with the
square-root function (sqrt), to calculate and print the object’s standard
error of the mean
> sem = sd(quarters)/sqrt(length(quarters)); sem
[1] 0.01611574
• range – there is no built-in function that reports the range as a single value (R’s
range function returns the minimum and maximum elements instead); the command shown
below uses the functions for an object’s maximum (max) and minimum (min)
elements to calculate and print the object’s range
> range = (max(quarters) - min(quarters)); range
[1] 0.148
• IQR(object) – provides the object’s interquartile range; note – this value may differ
slightly from that provided by other programs because there is no single accepted
definition for the upper and lower fourths (FU and FL)
> IQR(quarters)
[1] 0.07425
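To make the note about differing definitions concrete, the sketch below compares two rules that ship with R: the default quantile rule (type 7), which IQR uses, and the hinges returned by fivenum. The variable names are illustrative only, and other programs may use yet another rule.

```r
quarters = c(5.683, 5.620, 5.551, 5.549, 5.536, 5.552,
             5.548, 5.539, 5.554, 5.552, 5.684, 5.632)

# Default rule: quartiles from quantile() with type = 7
q7 = quantile(quarters, c(0.25, 0.75), type = 7)
iqr_default = unname(q7[2] - q7[1])

# Alternative rule: lower and upper hinges are elements 2 and 4 of fivenum()
hinges = fivenum(quarters)[c(2, 4)]
iqr_hinges = hinges[2] - hinges[1]

c(IQR = IQR(quarters), type7 = iqr_default, hinges = iqr_hinges)
```

For quarters the first two entries are both 0.07425, while the hinge-based spread is 0.0775, so the same data give slightly different answers under the two definitions.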
Quantitative and Visual Representations of a Distribution’s Shape
• skew(object) – provides the skewness for an object; this function is not included in R,
but is available from the file “skew&kurt.RData,” which is available on the course’s
I-drive account.
> skew(quarters)
[1] 0.8508155
• kurt(object) – provides the kurtosis for an object relative to that of a normal
distribution; this function is not included in R, but is available from the file
“skew&kurt.RData,” which is available on the course’s I-drive account.
> kurt(quarters)
[1] -1.075001
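The source of skew and kurt is not shown in this handout, so the following is only a sketch of how two such functions could be written. It assumes moment-based definitions that divide by n and scale by the sample standard deviation (sd); skew_sketch and kurt_sketch are hypothetical names, not the functions in the course file.

```r
quarters = c(5.683, 5.620, 5.551, 5.549, 5.536, 5.552,
             5.548, 5.539, 5.554, 5.552, 5.684, 5.632)

# Hypothetical stand-in for skew(): average cubed deviation
# divided by the cube of the sample standard deviation
skew_sketch = function(x) {
  n = length(x)
  sum((x - mean(x))^3) / (n * sd(x)^3)
}

# Hypothetical stand-in for kurt(): average fourth-power deviation
# divided by sd^4, minus 3 so that a normal distribution scores 0
kurt_sketch = function(x) {
  n = length(x)
  sum((x - mean(x))^4) / (n * sd(x)^4) - 3
}

skew_sketch(quarters)
kurt_sketch(quarters)
```

On quarters these definitions give roughly 0.85 and -1.08, in line with the output shown above, though that agreement does not confirm how the course functions are implemented. Other packages scale these statistics differently and will report somewhat different values.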
• hist(object) – creates a histogram of the object’s elements with the number of
bins chosen by R.
> hist(quarters)
• boxplot(object 1, object 2…, names, horizontal = TRUE) – creates a boxplot of the
object’s elements (for multiple objects, a boxplot is drawn for each); names is a
vector containing the names of the objects, which adds labels on the x-axis when
plotting more than one boxplot. Setting horizontal to TRUE (the default value is
FALSE) creates a horizontal boxplot.
> boxplot(quarters)
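The optional entries for the two plotting functions are easier to see in use. The sketch below suggests a bin count to hist through its breaks argument and draws two labeled horizontal boxplots. Splitting quarters into first_six and last_six is done purely for illustration; those objects, and the names h and b, are not part of the handout.

```r
quarters = c(5.683, 5.620, 5.551, 5.549, 5.536, 5.552,
             5.548, 5.539, 5.554, 5.552, 5.684, 5.632)

# breaks is only a suggestion; R may adjust it to get round cut points
h = hist(quarters, breaks = 8)

# Two illustrative objects made from the same data
first_six = quarters[1:6]
last_six = quarters[7:12]

# One boxplot per object, labeled by names and drawn horizontally
b = boxplot(first_six, last_six,
            names = c("first six", "last six"),
            horizontal = TRUE)
```

Both functions also return what they computed, so h$counts holds the bin counts and b$stats holds the five values drawn for each box.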