Biol2001 R-Lecture 1
Course Convener
Dr. Stephen Zozaya A/Prof. Ana Sequeira
Part II Probability & Statistics, weeks 4-6
Part III Statistical Models, weeks 7-12
The R-part – organization
• Lectures are in person. It really is a good idea to attend.
• Lecture recordings will be available on Echo360, and slides on Wattle.
• Workshops: The 3-hour workshop has two time-slots to choose from:
• We will learn the fundamentals of R and computer programming, starting from a complete
novice level.
• This part of the BIOL2001/6200 course requires the use of a computer (Windows, macOS, or
Linux) with R and RStudio installed. Please bring a fully charged laptop to the workshops.
• To program in R we will use RStudio. Please install it on your laptop.
• How to install R and RStudio on your own laptop
Go to: https://www.rstudio.com/products/rstudio/download/#download
1. Install R.
2. Download and Install RStudio (the free version 'RStudio Desktop').
R and RStudio are supported on Windows, macOS, and Linux (Ubuntu, Fedora, Debian, OpenSUSE).
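Once both are installed, a quick way to confirm that R itself is working is to type a couple of commands into the R console. The following is only a minimal sanity check, not part of the official installation steps; the version string printed will depend on your own installation.

```r
# Print the version of R that is currently running
print(R.version.string)

# A trivial calculation to confirm that the console responds
1 + 1
```

If both commands print a result without an error message, R is ready to use.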
• RStudio installed and ready to go
What is R?
• R is an open source programming language and software environment for statistical
computing and graphics that is supported by the R Foundation for Statistical Computing. The
R language is widely used among statisticians and data miners for developing statistical
software and data analysis.
https://en.wikipedia.org/wiki/R_(programming_language)
• R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New
Zealand.
• R implements a wide variety of statistical and graphical techniques, including linear and
nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering.
• R is easily extensible through functions and extensions, and the R community is noted for its active
contributions in terms of packages.
• Many of R's standard functions are written in R itself, which makes it easy for users to follow the
algorithmic choices made. For computationally intensive tasks, C, C++, and Fortran code can be linked
and called at run time.
What is the S language?
S is a statistical programming language developed primarily by John Chambers, Rick
Becker, and Allan Wilks of Bell Laboratories. The aim of the language, as expressed by John
Chambers, is “to turn ideas into software, quickly and faithfully”.
Knowing more than one language can help us to find the right tool for the job and avoid the
golden hammer pitfall (Law of The Instrument).
“It is tempting, if the only tool you have (know) is a hammer, to treat everything as if it was a
nail”. https://en.wikipedia.org/wiki/Law_of_the_instrument
Why R?
• R is one of the leading tools for statistics, data analysis, and machine learning in science.
• Popularity in the field. It’s good to speak the language your colleagues speak.
• R is interactively interpreted:
• You can open an R session and work in it as you would work with a notebook.
• R works with you in a Read – Evaluate – Print Loop (REPL), meaning that when you type
a command and then press the Enter key, R reads the command, evaluates it,
immediately prints its output, and is ready to read your next command.
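The loop described above can be seen by typing a few commands at the console prompt, one at a time. This is a small illustrative sketch; the particular commands are arbitrary choices and any valid R expression behaves the same way.

```r
# Each line is read, evaluated, and its result printed
# before R waits for the next command.
2 + 3          # prints [1] 5
sqrt(16)       # prints [1] 4
toupper("r")   # prints [1] "R"
```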
RStudio IDE
What is R Studio?
• R Studio (written as RStudio without the space) is a free and open-source
Integrated Development Environment (IDE) for R. https://en.wikipedia.org/wiki/RStudio
[Figure: the RStudio IDE window, with panes labelled: R console; your scripts (programs); variables, structures, data, etc.; history of your commands]
RStudio is a convenient tool for programming in R.
Variables and Assignments in R
In computer programming, an assignment statement sets and/or re-sets the value stored in
the storage location(s) denoted by a variable name; in other words, it copies a value into the
variable. https://en.wikipedia.org/wiki/Assignment_(computer_science)
• Assigning a value to a variable
x <- 8.34
left-hand side: the variable x
assignment operator: <- (the symbol ←)
right-hand side: the value 8.34
The value 8.34 is assigned to the variable x.
The right-hand side (the expression) is evaluated first, and the resulting value is then assigned
to the left-hand side (the variable).
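A short sketch of this evaluation order is shown below. The variable name y is hypothetical and used only for illustration; x and 8.34 follow the example above.

```r
x <- 8.34      # the value 8.34 is assigned to the variable x
y <- x * 2     # the right-hand side is evaluated first (16.68), then stored in y
x <- x + 1     # re-assignment: the old value of x is read, then replaced

print(x)       # 9.34
print(y)       # 16.68
```

Note that y keeps the value 16.68 after x is re-assigned, because the assignment copied a value into y rather than linking the two variables.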
> x <- 10L          # note the 'L': it denotes that the number is intended
> typeof(x)         # to be an integer, e.g. 10, not a double, e.g. 10.0
[1] "integer"
> e <- 2.718281828459
> typeof(e)
[1] "double"
> isTen <- x == 10
> typeof(x == 10)
[1] "logical"
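To make the effect of the 'L' suffix concrete, the sketch below stores the same number once as a double and once as an integer. The variable names a and b are hypothetical, chosen only for this illustration.

```r
a <- 10     # no 'L' suffix: stored as a double
b <- 10L    # 'L' suffix: stored as an integer

typeof(a)         # "double"
typeof(b)         # "integer"
a == b            # TRUE: the two values compare as equal
identical(a, b)   # FALSE: the stored types differ
```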
Special Values
• There are a few special values that are used in R:

value       description                                                   example
NA          Not Available: represents a missing value                     v <- NA
Inf, -Inf   Infinity, -Infinity: number too large (positive or negative)  2 ^ 2000, -2 ^ 2000
NaN         Not a Number: the result of the evaluation is undefined       0/0
NULL        NULL object: no value, undefined, empty                       v <- NULL

value       typeof()    example
NA          logical     typeof(NA)
Inf, -Inf   double      typeof(Inf)
NaN         double      typeof(NaN)
NULL        NULL        typeof(NULL)
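These special values can be detected with dedicated test functions from base R. The sketch below is illustrative only; the variable v follows the table above, and the choice of which checks to show is an assumption about what is most useful at this stage.

```r
v <- NA
is.na(v)                 # TRUE: v holds a missing value
is.nan(0 / 0)            # TRUE: 0/0 is Not a Number
is.na(0 / 0)             # TRUE: NaN is also treated as missing
is.finite(2 ^ 2000)      # FALSE: the result overflowed to Inf
is.infinite(-2 ^ 2000)   # TRUE: the result is -Inf
is.null(NULL)            # TRUE
length(NULL)             # 0: NULL is empty
length(NA)               # 1: NA is a value, although a missing one
```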
Data Structures
• A data structure is a way of organizing and storing data in a computer.
• Relation and difference between data structure and data type:
• A data structure is an abstract description of a way of organizing data.
• A data type describes pieces of data that all share a common property.
• This is easier to understand with an example:
Vector of integer numbers, e.g. the series [1, 2, 3, 4]:
The vector is the data structure; integer is the data type.
A vector is a way of organizing and storing data (a 1-dimensional, ordered list of elements);
integer describes the kind of value each element holds.
• Scalar: a single value of a specified data type, e.g. 0, 9, TRUE, “Home”. A scalar can be seen
as a trivial data structure storing only a single element of a given type.
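The integer vector example above can be reproduced in R to see the structure and the type separately. This is a minimal sketch; the variable names series and words are hypothetical.

```r
series <- c(1L, 2L, 3L, 4L)   # c() combines values into a vector

is.vector(series)   # TRUE: the data structure is a vector
typeof(series)      # "integer": the data type of its elements
length(series)      # 4: number of elements stored

# The same structure can hold a different data type
words <- c("Home", "Work")
is.vector(words)    # TRUE
typeof(words)       # "character"
```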
Vectors
• In most programming languages, a scalar is the basic way of storing data. That is how you
store, for example, the result of a summation of numbers, and the numbers themselves.
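R differs from most languages on this point: it has no separate scalar structure, and a single value is simply a vector of length 1. The sketch below, which goes slightly beyond the text above, illustrates this with the result of a summation; the variable name total is hypothetical.

```r
total <- sum(c(1, 2, 3, 4))   # summation of the numbers 1 to 4

total                 # 10
is.vector(total)      # TRUE: the single result is itself a vector
length(total)         # 1: a vector with exactly one element
identical(9, c(9))    # TRUE: a lone value and a 1-element vector are the same thing
```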
Homework
• Find/Install and open RStudio
• If you feel adventurous try to:
• calculate
• create a variable year and assign the value 2020 to it
• check whether year + 10 is equal to 2030
• check the data type of the result of the comparison above
End of Lecture 1, Part I of BIOL2001