A (Very) Short Introduction To R: Paul Torfs & Claudia Brauer
4 August 2017
1 Introduction

R is a powerful language and environment for statistical computing and graphics. It is a free, open-source (so-called “GNU”) project which is similar to the commercial S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S, and is widely used as an educational language and research tool.
The main advantages of R are the fact that R is freeware and that there is a lot of help available online. It is quite similar to other programming packages such as MatLab (not freeware), but more user-friendly than programming languages such as C++ or Fortran. You can use R as it is, but for educational purposes we prefer to use R in combination with the RStudio interface (also freeware), which has an organized layout and several extra options.
This document contains explanations, examples and exercises, which can also be understood (hopefully) by people without any programming experience. Going through all text and exercises takes about 1 or 2 hours. Examples of frequently used commands and error messages are listed on the last two pages of this document and can be used as a reference while programming.

2 Getting started

2.1 Install R
To install R on your computer (legally for free!), go to the home website of R¹:
www.r-project.org
and do the following (assuming you work on a Windows computer):
1. click download CRAN in the left bar
2. choose a download site
3. choose Windows as target operating system
4. click base
5. choose Download R 3.4.1 for Windows² and choose default answers for all questions
It is also possible to run R and RStudio from a USB stick instead of installing them. This is useful when you don’t have administrator rights on your computer. See our separate note “How to use portable versions of R and RStudio” for help on this topic.

2.2 Install RStudio
After finishing this setup, you should see an “R” icon on your desktop. Clicking on this would start up the standard interface. We recommend, however, using the RStudio interface.³ To install RStudio, go to:
www.rstudio.com
and do the following:
1. under Download RStudio, click Download
2. below RStudio Desktop (free), click Download
3. click the version for your operating system
4. download the .exe file and run it (choose default answers for all questions)

2.3 RStudio layout
The RStudio interface consists of several windows (see Figure 1).

¹ On the R-website you can also find an older version of “A (very) short introduction to R”: cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf. The latest version of this document is always published on www.github.com/ClaudiaBrauer/A-very-short-introduction-to-R.
² At the moment of writing 3.4.1 was the latest version. Choose the most recent one.
³ There are many other (freeware) interfaces, such as Tinn-R.
Figure 1: The editor, workspace, console and plots windows in RStudio.
• Bottom left: console window (also called command window). Here you can type commands after the “>” prompt and R will then execute your command. This is the most important window, because this is where R actually does stuff.
You can change the size of the windows by dragging the grey bars between the windows.

2.4 Working directory
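The working directory can also be inspected and changed from the console. The lines below are a minimal sketch using the base R functions getwd() and setwd(); the temporary directory is only a stand-in for a folder of your own, and the variable name old_dir is made up for this illustration.

```r
# Show the directory R currently reads from and writes to
old_dir <- getwd()
print(old_dir)

# Switch to another directory; tempdir() is a placeholder for your own folder
setwd(tempdir())
print(getwd())

# Go back to where you started
setwd(old_dir)
```

Storing the old directory first makes it easy to return to it afterwards.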
2.5 Libraries
R can do many statistical and data analyses. They are organized in so-called packages or libraries. With the standard installation, most common packages are installed.
To get a list of all installed packages, go to the packages window or type library() in the console window. If the box in front of the package name is ticked, the package is loaded (activated) and can be used.
There are many more packages available on the R website. If you want to install and use a package (for example, the package called “geometry”) you should:
• Install the package: click install packages in the packages window and type geometry or type install.packages("geometry") in the command window.
• Load the package: check box in front of geometry or type library("geometry") in the command window.

3 First examples of R commands

3.1 Calculator

ToDo
Compute the difference between 2014 and the year you started at this university and divide this by the difference between 2014 and the year you were born. Multiply this with 100 to get the percentage of your life you have spent at this university. Use brackets if you need them.

If you use brackets and forget to add the closing bracket, the “>” on the command line changes into a “+”. The “+” can also mean that R is still busy with some heavy computation. If you want R to quit what it was doing and give back the “>”, press ESC (see the reference list on the last page).

3.2 Workspace
You can also give numbers a name. By doing so, they become so-called variables which can be used later. For example, you can type in the command window:
> a = 4
You can see that a appears in the workspace window, which means that R now remembers what a is.⁴ You can also ask R what a is (just type a ENTER in the command window):
> a
[1] 4
or do calculations with a:
> a * 5
[1] 20
If you specify a again, it will forget what value it had before. You can also assign a new value to a using the old one.
To remove all variables from the workspace, type rm(list=ls()) or click “clear all” in the workspace window. You can see that RStudio then empties the workspace window. If you only want to remove the variable a, you can type rm(a).

ToDo
Repeat the previous ToDo, but with several steps in between. You can give the variables any name you want, but the name has to start with a letter.

⁴ Some people prefer to use <- instead of = (they do the same thing). <- consists of two characters, < and -, and represents an arrow pointing at the object receiving the value of the expression.
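The workspace operations described in Section 3.2 can be tried out in one go. This is a small sketch with made-up numbers; the variable names total, passed and percentage are chosen only for this illustration.

```r
a <- 4            # assign the value 4 to a (same effect as a = 4)
a * 5             # use a in a calculation
a <- a + 10       # overwrite a, using its old value
print(a)          # a is now 14

# Splitting a calculation into named steps (made-up numbers)
total <- 12
passed <- 9
percentage <- passed / total * 100
print(percentage)

rm(a)             # remove only a from the workspace
exists("a", inherits = FALSE)   # a is gone
```

Naming the intermediate steps makes it easier to check each value in the workspace window.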
3.3 Scalars, vectors and matrices
Like in many other programs, R organizes numbers in scalars (a single number – 0-dimensional), vectors (a row of numbers, also called arrays – 1-dimensional) and matrices (like a table – 2-dimensional).
The a you defined before was a scalar. To define a vector with the numbers 3, 4 and 5, you need the function⁵ c, which is short for concatenate (paste together).
> b=c(3,4,5)
Matrices and other 2-dimensional structures will be introduced in Section 6.

3.4 Functions
If you would like to compute the mean of all the elements in the vector b from the example above, you could type
> (3+4+5)/3
But when the vector is very long, this is very boring and time-consuming work. This is why things you do often are automated in so-called functions. Some functions are standard in R or in one of the packages. You can also program your own functions (Section 11.3). When you use a function to compute a mean, you’ll type:
> mean(x=b)
Within the brackets you specify the arguments. Arguments give extra information to the function. In this case, the argument x says of which set of numbers (vector) the mean should be computed (namely of b). Sometimes, the name of the argument is not necessary: mean(b) works as well.

ToDo
Compute the sum of 4, 5, 8 and 11 by first combining them into a vector and then using the function sum.

The function rnorm, as another example, is a standard R function which creates random samples from a normal distribution. Hit the ENTER key and you will see 10 random numbers as:
1 > rnorm(10)
2 [1] -0.949 1.342 -0.474 0.403
3 [5] -0.091 -0.379 1.015 0.740
4 [9] -0.639 0.950
• Line 1 contains the command: rnorm is the function and the 10 is an argument specifying how many random numbers you want, in this case 10 numbers (typing n=10 instead of just 10 would also work).
• Lines 2-4 contain the results: 10 random numbers organised in a vector with length 10.
Entering the same command again produces 10 new random numbers. Instead of typing the same text again, you can also press the upward arrow key (↑) to access previous commands. If you want 10 random numbers from a normal distribution with mean 1.2 and standard deviation 3.4 you can type
> rnorm(10, mean=1.2, sd=3.4)
showing that the same function (rnorm) may have different interfaces and that R has so-called named arguments (in this case mean and sd). By the way, the spaces around the “,” and “=” do not matter.
Comparing this example to the previous one also shows that for the function rnorm only the first argument (the number 10) is compulsory, and that R gives default values to the other so-called optional arguments.⁶
RStudio has a nice feature: when you type rnorm( in the command window and press TAB, RStudio will show the possible arguments (Fig. 2).

3.5 Plots
R can make graphs. This is a very simple⁷ example:
1 > x = rnorm(100)
2 > plot(x)
• In the first line, 100 random numbers are assigned to the variable x, which becomes a vector by this operation.
• In the second line, all these values are plotted in the plots window.

⁵ See next Section for the explanation of functions.
⁶ Use the help function (Sect. 4) to see which values are used as default.
⁷ See Section 7 for slightly less trivial examples.
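The difference between positional, named and default arguments described in Section 3.4 can be seen side by side in the sketch below. The call to set.seed is an addition that is not discussed in the text; it only fixes the random number generator so that repeated runs give the same numbers. The names r1 and r2 are chosen for this illustration.

```r
b <- c(3, 4, 5)       # combine three numbers into a vector
mean(x = b)           # argument given by name
mean(b)               # same result, argument matched by position

set.seed(1)           # assumption: fixed seed, only for repeatable output
r1 <- rnorm(10)                              # optional arguments keep their defaults
r2 <- rnorm(n = 10, mean = 1.2, sd = 3.4)    # all arguments given by name
length(r1)            # both vectors contain 10 numbers
length(r2)
```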
Figure 2: RStudio shows possible arguments when you press TAB after the function name and bracket.
ToDo
Plot 100 normal random numbers.

4 Help and documentation
There is a large amount of (free) documentation and help available. Some help is automatically installed. Typing in the console window the command
> help(rnorm)
gives help on the rnorm function. It gives a description of the function, possible arguments and the values that are used as default for optional arguments. Typing
> example(rnorm)
gives some examples of how the function can be used. An HTML-based global help can be called with:
> help.start()

5 Scripts
R is an interpreter that uses a command line based environment. This means that you have to type commands, rather than use the mouse and menus. This has the advantage that you do not always have to retype all commands and are less likely to get complaints of arms, neck and shoulders.
You can store your commands in files, the so-called scripts. These scripts typically have file names with the extension .R, e.g. foo.R. You can open an editor window to edit these files by clicking File and New or Open file...
You can run (send to the console window) part of the code by selecting lines and pressing CTRL+ENTER or clicking Run in the editor window. If you do not select anything, R will run the line your cursor is on. You can always run the whole script with the console command source, so e.g. for the script in the file foo.R you type:
> source("foo.R")
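To see the source command at work without creating a file by hand, the sketch below writes a two-line script to a temporary folder and then runs it. The use of tempdir() and writeLines is an assumption made for this illustration; in practice you would write the script in the RStudio editor and save it in your working directory.

```r
# Write a small script to a temporary file (stand-in for your own foo.R)
script_file <- file.path(tempdir(), "foo.R")
writeLines(c("x <- rnorm(100)",
             "m <- mean(x)"),
           con = script_file)

# Run every line of the script, as if it were typed in the console
source(script_file)
print(m)
```

After source has finished, the variables created by the script (here x and m) are available in the workspace.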
6 Data structures
If you are unfamiliar with R, it makes sense to just retype the commands listed in this section. Maybe you will not need all these structures in the beginning, but it is always good to have at least a first glimpse of the terminology and possible applications.

6.1 Vectors

> mat=matrix(data=c(9,2,3,4,5,6),ncol=3)
> mat
     [,1] [,2] [,3]
[1,]    9    3    5
[2,]    2    4    6
The argument data specifies which numbers should be in the matrix. Use either ncol to specify the number of columns or nrow to specify the number of rows.
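The matrix example above can be extended slightly to show that ncol and nrow lead to the same result, and that R fills a matrix column by column. The byrow argument is a standard argument of matrix that is not mentioned in the text; the names m_col, m_row and m_byrow are made up for this sketch.

```r
m_col <- matrix(data = c(9, 2, 3, 4, 5, 6), ncol = 3)
m_row <- matrix(data = c(9, 2, 3, 4, 5, 6), nrow = 2)
identical(m_col, m_row)   # TRUE: 6 numbers in 3 columns means 2 rows

# Fill the matrix row by row instead of column by column
m_byrow <- matrix(data = c(9, 2, 3, 4, 5, 6), ncol = 3, byrow = TRUE)

m_col[1, 2]     # 3: first row, second column
m_byrow[1, 2]   # 2: the same position after filling by row
dim(m_col)      # 2 rows and 3 columns
```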
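The data frame discussion that follows refers to a data frame called t with columns x, y and z. As a minimal sketch of such a data frame, the lines below build one with illustrative values (an assumption, chosen so that the mean of column z is 26/3) and select the column z in two ways.

```r
# Illustrative values only; they are an assumption made for this sketch
t <- data.frame(x = c(11, 12, 14),
                y = c(19, 20, 21),
                z = c(10, 9, 7))
print(t)

mean(t$z)        # select column z with the $ notation
mean(t[["z"]])   # select column z by name with double square brackets
names(t)         # the column names
```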
9 [1] 8.666667
10 > mean(t[["z"]])
11 [1] 8.666667
• In lines 1-2 a typical data frame called t is constructed. Its columns have the names x, y and z.
• Lines 8-11 show two ways of how you can select the column called z from the data frame called t.

ToDo
Make a script file which constructs three random normal vectors of length 100. Call these vectors x1, x2 and x3. Make a data frame called t with three columns (called a, b and c) containing respectively x1, x1+x2 and x1+x2+x3. Call plot(t) for this data frame. Can you understand the results? Rerun this script a few times.

Figure 3: A simple histogram plot (histogram of rnorm(100); x-axis: rnorm(100), y-axis: Frequency).

7 Graphics
Plotting is an important statistical activity. So it should not come as a surprise that R has many plotting facilities. The following lines show a simple plot:
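As a minimal sketch of what such plotting commands can look like, the lines below draw 100 random numbers, add a horizontal line and make a histogram. The titles, colors and labels are choices made for this illustration; the arguments used (type, col, main, xlab, ylab, lty, h) are the standard ones listed in the reference section.

```r
x <- rnorm(100)

# Plot the values against their index number
plot(x, type = "p", col = "blue", main = "100 normal random numbers",
     xlab = "index", ylab = "value")

# Add a dashed red horizontal line at the mean of x
abline(h = mean(x), col = "red", lty = 2)

# Histogram of the same numbers in a new plot
hist(x)
```

In RStudio the plots appear in the plots window, where you can browse back to the first one with the arrow buttons.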
To learn more about formatting plots, search for par in the R help. Google “R color chart” for a pdf file with a wealth of color options.
To copy your plot to a document, go to the plots window, click the “Export” button, choose the nicest width and height and click Copy or Save.

8 Reading and writing data files
• In lines 1-2, a simple example data frame is constructed and stored in the variable d.
• Lines 3-7 show the content of this data frame: two columns (called a and b), each containing three numbers.
• Line 8 writes this data frame to a text file, called tst0.txt. The argument row.names=FALSE prevents row names from being written to the file. Because nothing is specified about col.names, the default option col.names=TRUE is chosen and column names are written to the file. Figure 4 shows the resulting file (opened in an editor, such as Notepad), with the column names (a and b) in the first line.
• Lines 10-11 illustrate how to read a file into a data frame. Note that the column names are also read. The data frame also appears in the workspace window.

Figure 4: The files tst0.txt of section 8 (left) and tst1.txt from the ToDo below (right) opened in two text editors.

> j = c(1,2,NA)
Computing statistics of incomplete data sets is strictly speaking not possible. Maybe the largest value occurred during the weekend when you didn’t measure. Therefore, R will say that it doesn’t know what the largest value of j is:
> max(j)
[1] NA
If you don’t mind about the missing data and want to compute the statistics anyway, you can add the argument na.rm=TRUE (Should I remove the NAs? Yes!).
> max(j, na.rm=TRUE)
[1] 2
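Writing a data frame to a file, reading it back and dealing with a missing value can be combined in one short sketch. The values, the file name tst_example.txt and the use of tempdir() are assumptions made for this illustration; in your own work you would write to your working directory.

```r
# A small data frame with one missing value (illustrative numbers)
d <- data.frame(a = c(3, 4, 5), b = c(12, NA, 54))

# Write it to a text file without row names; column names are written by default
data_file <- file.path(tempdir(), "tst_example.txt")   # hypothetical file name
write.table(d, file = data_file, row.names = FALSE)

# Read the file back; header=TRUE says the first line holds the column names
d2 <- read.table(file = data_file, header = TRUE)
print(d2)

max(d2$b)                  # NA, because one value is missing
max(d2$b, na.rm = TRUE)    # 54, the largest of the available values
```

The missing value is written to the file as NA and is recognised as missing again when the file is read.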
10 Classes
The exercises you did before were nearly all with numbers. Sometimes you want to specify something which is not a number, for example the name of a measurement station or data file. In that case you want the variable to be a character string instead of a number.
An object in R can have several so-called classes. The most important three are numeric, character and POSIX (date-time combinations). You can ask R what class a certain variable is by typing class(...).

10.1 Characters
To tell R that something is a character string, you should type the text between quotation marks, otherwise R will start looking for a defined variable with the same name:
> m = "apples"
> m
[1] "apples"
> n = pears
Error: object ‘pears’ not found
Of course, you cannot do computations with character strings:
> m + 2
Error in m + 2 : non-numeric argument to binary operator

10.2 Dates
Dates and times are complicated. R has to know that 3 o’clock comes after 2:59 and that February has 29 days in some years. The easiest way to tell R that something is a date-time combination is with the function strptime:
• In line 3 the argument format specifies how the character string should be read. In this case the year is denoted first (%Y), then the month (%m), day (%d), hour (%H), minute (%M) and second (%S). You don’t have to specify all of them, as long as the format corresponds to the character string.

ToDo
Make a graph with on the x-axis: today, Sinterklaas 2017 and your next birthday and on the y-axis the number of presents you expect on each of these days. Tip: make two vectors first.

11 Programming tools
When you are building a larger program than in the examples above or if you’re using someone else’s scripts, you may encounter some programming statements. In this Section we describe a few tips and tricks.

11.1 If-statement
The if-statement is used when certain computations should only be done when a certain condition is met (and maybe something else should be done when the condition is not met). An example:
> w = 3
> if( w < 5 )
{
d=2
}else{
d=10
}
> d
[1] 2
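Returning to the classes and dates of Sections 10.1 and 10.2, the sketch below shows class and strptime in use. The date strings and variable names are made up for this illustration, and the argument tz="UTC" is an addition that fixes the time zone so that the result does not depend on the computer's settings.

```r
m <- "apples"
class(m)          # "character"
class(4)          # "numeric"

# Full date-time string: year, month, day, hour, minute and second
date1 <- strptime("2017-12-05 18:30:00",
                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Only year, month and day: the format must match the character string
date2 <- strptime("20171224", format = "%Y%m%d", tz = "UTC")

class(date1)      # a POSIX date-time class

# Time between the two moments, expressed in days
days_between <- as.numeric(difftime(date2, date1, units = "days"))
print(days_between)
```

Because R knows these objects are date-times, it can compute the time between them, which is not possible with plain character strings.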
1 > a = c(1,2,3,4)
2 > b = c(5,6,7,8)
3 > f = a[b==5 | b==8]
4 > f
5 [1] 1 4
• In lines 1 and 2 two vectors are made.
• In line 3 you say that f is composed of those elements of vector a for which b equals 5 or 8.

these statements are evaluated without any explicit error messages.

ToDo
Make a vector from 1 to 100. Make a for-loop which runs through the whole vector. Multiply the elements which are smaller than 5 or larger than 90 with 10 and the other elements with 0.1.
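The programming statements of this section (if-statement, for-loop, your own function and logical conditions as index) can be combined in one small sketch. The function size_label, its argument limit and the example vector are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: label a number as "small" or "large"
size_label <- function(value, limit = 5) {
  if (value < limit) {
    label <- "small"
  } else {
    label <- "large"
  }
  return(label)
}

v <- c(2, 7, 4, 9)
labels <- character(length(v))   # empty character vector, one slot per element

# The counter i runs through the positions of v
for (i in seq_along(v)) {
  labels[i] <- size_label(v[i])
}
print(labels)                    # "small" "large" "small" "large"

# Selecting elements with a logical condition, without a loop
v[v < 5 | v > 8]                 # 2 4 9
```

Note that the if-statement inside the function receives one number at a time; the condition on the last line works on the whole vector at once.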
12 Some useful references

12.1 Functions
This is a subset of the functions explained in the R reference card.

Data creation
• read.table: read a table from file. Arguments: header=TRUE: read first line as titles of the columns; sep=",": numbers are separated by commas; skip=n: don’t read the first n lines.
• write.table: write a table to file
• c: paste numbers together to create a vector
• array: create a vector. Arguments: dim: length
• matrix: create a matrix. Arguments: ncol and/or nrow: number of columns/rows
• data.frame: create a data frame
• list: create a list
• rbind and cbind: combine vectors into a matrix by row or column

Extracting data
• x[n]: the nth element of a vector
• x[m:n]: the mth to nth element
• x[c(k,m,n)]: specific elements
• x[x>m & x<n]: elements between m and n
• x$n: element of list or data frame named n
• x[["n"]]: idem
• [i,j]: element at ith row and jth column
• [i,]: row i in a matrix

Information on variables
• length: length of a vector
• ncol or nrow: number of columns or rows in a matrix
• class: class of a variable
• names: names of objects in a list
• print: show variable or character string on the screen (used in scripts or for-loops)
• return: give back the result of a function (used in functions)
• is.na: test if variable is NA
• as.numeric or as.character: change class to number or character string
• strptime: change class from character to date-time (POSIX)

Statistics
• sum: sum of a vector (or matrix)
• mean: mean of a vector
• sd: standard deviation of a vector
• max or min: largest or smallest element
• rowSums (or rowMeans, colSums and colMeans): sums (or means) of all numbers in each row (or column) of a matrix. The result is a vector.
• quantile(x,c(0.1,0.5)): the 0.1 and 0.5 sample quantiles of vector x

Data processing
• seq: create a vector with equal steps between the numbers
• rnorm: create a vector with random numbers with normal distribution (other distributions are also available)
• sort: sort elements in increasing order
• t: transpose a matrix
• aggregate(x,by=list(y),FUN="mean"): split data set x into subsets (defined by y) and compute means of the subsets. Result: a new data frame.
• na.approx: interpolate (in zoo package). Argument: vector with NAs. Result: vector without NAs.
• cumsum: cumulative sum. Result is a vector.
• rollmean: moving average (in the zoo package)
• paste: paste character strings together
• substr: extract part of a character string

Fitting
• lm(v1~v2): linear fit (regression line) between vector v1 on the y-axis and v2 on the x-axis
• nls(v1~a+b*v2, start=list(a=1,b=0)): nonlinear fit. Should contain equation with variables (here v1 and v2) and parameters (here a and b) with starting values
• coef: returns coefficients from a fit
• summary: returns all results from a fit

Plotting
• plot(x): plot x (y-axis) versus index number (x-axis) in a new window
• plot(x,y): plot y (y-axis) versus x (x-axis) in a new window
• image(x,y,z): plot z (color scale) versus x (x-axis) and y (y-axis) in a new window
• lines or points: add lines or points to a previous plot
• hist: plot histogram of the numbers in a vector
• barplot: bar plot of vector or data frame
• contour(x,y,z): contour plot
• abline: draw line (segment). Arguments: a,b for intercept a and slope b; or h=y for horizontal line at y; or v=x for vertical line at x.
• curve: add function to plot. Needs to have an x in the expression. Example: curve(x^2)
• legend: add legend with given symbols (lty or pch and col) and text (legend) at location (x="topright")
• axis: add axis. Arguments: side – 1=bottom, 2=left, 3=top, 4=right
• mtext: add text on axis. Arguments: text (character string) and side
• grid: add grid
• par: plotting parameters to be specified before the plots. Arguments: e.g. mfrow=c(1,3): number of figures per page (1 row, 3 columns); new=TRUE: draw plot over previous plot.

Plotting parameters
These can be added as arguments to plot, lines, image, etc. For help see par.
• type: "l"=lines, "p"=points, etc.
• col: color – "blue", "red", etc.
• lty: line type – 1=solid, 2=dashed, etc.
• pch: point type – 1=circle, 2=triangle, etc.
• main: title – character string
• xlab and ylab: axis labels – character string
• xlim and ylim: range of axes – e.g. c(1,10)
• log: logarithmic axis – "x", "y" or "xy"

Programming
• function(arglist){expr}: function definition: do expr with list of arguments arglist
• if(cond){expr1}else{expr2}: if-statement: if cond is true, then expr1, else expr2
• for(var in vec) {expr}: for-loop: the counter var runs through the vector vec and does expr each run
• while(cond){expr}: while-loop: while cond is true, do expr each run

12.2 Keyboard shortcuts
There are several useful keyboard shortcuts for RStudio (see Help → Keyboard Shortcuts):
• CTRL+ENTER: send commands from script window to command window
• ↑ or ↓ in command window: previous or next command
• CTRL+1, CTRL+2, etc.: change between the windows
Not R-specific, but very useful keyboard shortcuts:
• CTRL+C, CTRL+X and CTRL+V: copy, cut and paste
• ALT+TAB: change to another program window
• ↑, ↓, ← or →: move cursor
• HOME or END: move cursor to begin or end of line
• Page Up or Page Down: move cursor one page up or down
• SHIFT+↑/↓/←/→/HOME/END/PgUp/PgDn: select

12.3 Error messages
• No such file or directory or Cannot change working directory
Make sure the working directory and file names are correct.
• Object ‘x’ not found
The variable x has not been defined yet. Define x or write quotation marks if x should be a character string.
• Argument ‘x’ is missing without default
You didn’t specify the compulsory argument x.
• +
R is still busy with something or you forgot closing brackets. Wait, type } or ) or press ESC.
• Unexpected ’)’ in ")" or Unexpected ’}’ in "}"
The opposite of the previous. You try to close something which hasn’t been opened yet. Add opening brackets.
• Unexpected ‘else’ in "else"
Put the else of an if-statement on the same line as the last bracket of the “then”-part: }else{.
• Missing value where TRUE/FALSE needed
Something goes wrong in the condition-part (if(x==1)) of an if-statement. Is x NA?
• The condition has length > 1 and only the first element will be used
In the condition-part (if(x==1)) of an if-statement, a vector is compared with a scalar. Is x a vector? Did you mean x[i]?
• Non-numeric argument to binary operator
You are trying to do computations with something which is not a number. Use class(...) to find out what went wrong or use as.numeric(...) to transform the variable to a number.
• Argument is of length zero or Replacement is of length zero
The variable in question is NULL, which means that it is empty, for example created by c(). Check the definition of the variable.
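Some of the error messages listed in Section 12.3 can be provoked on purpose to see what they look like. The sketch below uses tryCatch, a base R function that is not discussed in this document, to catch each error and return its message instead of stopping; the helper name show_error is hypothetical. The exact wording of the messages may differ between R versions and language settings.

```r
# Hypothetical helper: run an expression and return the error message, if any
show_error <- function(expr) {
  tryCatch({
    expr            # the expression is evaluated here
    "no error"
  }, error = function(e) conditionMessage(e))
}

show_error(undefined_variable + 1)   # the variable has not been defined
show_error("apples" + 2)             # computation with a character string
show_error(if (NA == 1) "yes")       # NA in the condition of an if-statement
show_error(1 + 1)                    # "no error"
```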
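Returning to the function list in Section 12.1, the sketch below tries out lm, coef and aggregate on made-up data. All numbers and variable names are assumptions for this illustration, and set.seed is used only to make the random noise repeatable. Note that the by argument of aggregate has to be a list.

```r
set.seed(42)                                 # assumption: fixed seed for repeatable noise
v2 <- 1:20
v1 <- 3 + 2 * v2 + rnorm(20, sd = 0.5)       # straight line plus some noise

fit <- lm(v1 ~ v2)                           # linear fit of v1 against v2
coef(fit)                                    # intercept and slope of the fitted line

# Mean per group: the grouping variable is wrapped in list()
values <- c(1, 2, 3, 4, 5, 6)
group <- c("a", "a", "b", "b", "c", "c")
agg <- aggregate(values, by = list(group), FUN = "mean")
print(agg)
```

The fitted slope should be close to the value 2 used to generate the data, and agg contains one row per group.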