Biol2001 R-Lecture 1

Introductory Lecture slides to biology R coding course University Level

Uploaded by

Tony Tan
Copyright: © All Rights Reserved
Introduction to Quantitative Biology

BIOL2001 / BIOL6200, Semester 1, 2024
Part I – Programming in R
Lecture 1

Assoc. Prof. Marcin Adamski


marcin.adamski@anu.edu.au
Research School of Biology
College of Science Assoc. Dean Education
What is academic integrity?
Academic integrity means acting with the values of honesty, trust, fairness, respect, and responsibility in your learning and in your assessment.
At ANU, we expect you to complete your own work honestly.
The importance of acting with integrity extends beyond academic study and beyond graduation: integrity is a core value of ethical professionals in all fields.
If you don’t complete your own work, gaps in your knowledge and skills may put yourself and others at risk in the future.

How does AI fit?
Plagiarism and cheating are breaches of academic integrity under the ANU Academic Integrity Rule (2021). Breaches of the rule can result in penalties that include exclusion from the University or the revoking of your degree.
Our ANU Academic Integrity Rule, Policy, Procedure and
Academic Integrity: Best Practice Principles for Learners, and the Student Assessment
(Coursework) Policy have been updated and will cover AI tools.
Courses that use AI tools such as ChatGPT will offer clear guidance on how they can be used
and how that use should be acknowledged. Some courses may specify that they not be used.
Check with the course convenor.
The unacknowledged use of AI tools constitutes a breach of academic integrity.
If you are unsure of what is appropriate, ask for help. You can also visit: Academic Skills
and Wellbeing support

LEARNING AND TEACHING @ANU | ACADEMIC INTEGRITY 2023


The Other Lecturers

Course Convener
Dr. Stephen Zozaya: Part II, Probability & Statistics, weeks 4 - 6
A/Prof. Ana Sequeira: Part III, Statistical Models, weeks 7 - 12
The R-part – organization
• Lectures are in-person. It really is a good idea to attend.
• Lecture recordings will be available on Echo360, and slides on Wattle.
• Workshops: The 3-hour workshop has two time-slots to choose from:

1. 9am – 12pm on Wednesday


2. 2pm – 5pm on Wednesday
3. 2pm – 5pm on Thursday
• Choose one workshop slot – sign up on My Timetable.
• Number of students per time-slot is limited. Sign up early if you have a preference!
• Bring your laptop to the workshops; you will do coding on it.
• Be warned: These are long and exhausting workshops. They are worth the effort. Trust me.
The R-part – organization
• During lectures we will learn the theory; during workshops we will program in R.
• Sometimes the lectures may seem a bit confusing. Don’t worry – it will make sense in the workshops. Programming is a practical skill.
• It will be helpful to have the lecture slides and your own notes as a reference for the workshops.
• We will learn new concepts during the lectures and during the workshops.
• A workshop manual will be available on the course Wattle site at the beginning of the week. Please get familiar with it before the workshop starts.
• The workshop manual, including model solutions to the workshop tasks, will be posted after the workshop (after the second session is finished).
The R-part – organization
• Grading of Part I (25% of BIOL2001 score):
• There will be a 40-minute in-class test at the beginning of workshops #2 and #3 (2 × 9% of course mark) and an assignment after workshop #3 (7% of course mark).
• The tests will be delivered through the course Wattle site. Tests will open at 5 minutes past the workshop start time – 2:05pm on Wed/Thu. Please, don’t be late!
• You really should attend the tests; it's a lot of points... To be allowed to take the test at a different time you will need a medical certificate, or my approval before the test.
• Assignment: You will have 7 days to complete your assignment. Standard ANU penalty for late submission applies.
• If you have any questions or comments, ask on the course Wattle forum or email me directly at marcin.adamski@anu.edu.au.

No chats are allowed during the tests: not physical, not online; not with artificial nor natural intelligence.
The R-part – organization
• You are not expected to have any experience in R or any other programming language.

• We will learn the fundamentals of R and computer programming, starting from complete novice level.

• This part of the BIOL2001/6200 course requires the use of a computer (Windows, OS-X, or Linux) with R and RStudio installed. Please bring a fully charged laptop to the workshops.
• To program in R we will use RStudio. Please install it on your laptop.
The R-part – organization
• How to install R and RStudio on your own laptop

Go to: https://www.rstudio.com/products/rstudio/download/#download

1. Install R.
2. Download and Install RStudio (the free version 'RStudio Desktop').

R and RStudio are supported on Windows, OS-X, Linux (Ubuntu, Fedora, Debian, OpenSUSE).
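Once both are installed, one quick way to confirm that R itself is working is to type a couple of commands in the console. The lines below are only a minimal sketch: `R.version.string` is a built-in value holding the installed version, the text it prints depends on your installation, and the variable name `check` is just an illustrative choice.

```r
# Show which version of R is installed (the text varies between installations)
print(R.version.string)

# A first calculation to confirm that the console evaluates commands
check <- 2 + 2
print(check)
```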
The R-part – organization
• RStudio installed and ready to go
What is R?
• R is an open source programming language and software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.
https://en.wikipedia.org/wiki/R_(programming_language)
• R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.
• R is an implementation of the S programming language. R is named partly after the first names of the first two R authors and partly as a play on the name of S. The project was conceived in 1992, with an initial version released in 1995.

What is the S language?
S is a statistical programming language developed primarily by John Chambers and Rick Becker and Allan Wilks of Bell Laboratories. The aim of the language, as expressed by John Chambers, is “to turn ideas into software, quickly and faithfully”.
What is R?
• R and its libraries implement a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others.
• R is easily extensible through functions and extensions, and the R community is noted for its active contributions in terms of packages.
• Many of R's standard functions are written in R itself, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++, and Fortran code can be linked and called at run time.
Programming languages
R is not the only programming language.

There are other languages:


• General purpose: C++, Python, Java, Fortran, Pascal, Ada, Basic, Cobol, …
• Web-programming: PHP, JavaScript.
• Relational databases: SQL.
• Logic programming: Prolog.
• Machine languages, assemblers: Intel x86-64, Sun, ARM.
• Statistical: R, S, SAS.
• MATLAB – Matrix algebra, simulations, engineering, …

Knowing more than one language can help us to find the right tool for the job and avoid the
golden hammer pitfall (Law of The Instrument).
“It is tempting, if the only tool you have (know) is a hammer, to treat everything as if it was a
nail”. https://en.wikipedia.org/wiki/Law_of_the_instrument
Why R?
• R is a leading tool for statistics, data analysis, and machine learning in science.

• Popularity in the field. It’s good to speak the language your colleagues speak.

• R is a programming language, not a software package, meaning:
• It is not limited to a set of pre-defined operations (functions).
• You have access to a wealth of functions already created by others.
• You can create your own functions, modules, and workflows.

• R is interactively interpreted:
• You can open an R session and work in it as you would work with a notebook.
• R will work with you in a Read – Evaluate – Print Loop (REPL), meaning that when you type a command and then press the Enter key, R reads the command, evaluates it, immediately prints its output, and is ready to read your next command.

• R is system-independent, meaning it is available for Windows, OS-X and Linux.


Why R?
• There is an astounding selection of already written extensions (packages, also known as libraries). There is usually more than one package available for most problems and tasks you may encounter.

R primary package repositories (currently over 20k packages):
• CRAN – the primary one, data from which is plotted on the left
• Bioconductor – for the analysis and comprehension of high-throughput genomic data.
• GitHub – growing quickly and gaining popularity
R if you feel aRtsy
Nadieh Bremer: https://www.r-graph-gallery.com/207-nadieh-bremer
Antonio Sánchez: https://fronkonstin.com/2017/07/18/plants
Victor Perrier: https://www.r-graph-gallery.com/144-droid-bb-8-data-art
https://www.interactive-maths.com/batman-equation-agg.html
What is R Studio?
• R Studio (written as RStudio without the space) is a free and open-source
Integrated Development Environment (IDE) for R. https://en.wikipedia.org/wiki/RStudio
R console
My computer screen often looks like this, but most of us prefer to enjoy some visual candy…
RStudio IDE
RStudio is a convenient tool for programming in R. The window brings together:
• your scripts (programs), data, etc.
• variables, structures, history of your commands
• R console (now it is only a part of the environment)
• your plots, files, R help, etc.
Let the story begin…
R is interactively interpreted (actually it is one of two modes in which it can be used)
Read – Evaluate – Print Loop (REPL)
• R presents a prompt: ‘>’ symbol waiting to read your command.
• After you type your command e.g. ‘2 + 2’ and hit ENTER, R evaluates it.
• And prints the output: ‘[1] 4’.
• Then presents another prompt waiting to read your next command.

The actual session may look like this:

> 2 + 2 <ENTER key>    ← this is the command
[1] 4                  ← this is the output
>                      ← and this is the next prompt

We will use blue to show the command we enter and black to show the R output.
Operators
• Arithmetic operators return the result of arithmetic evaluation

operator   use             description
+          x + y           addition
-          x - y           subtraction
*          x * y           multiplication
/          x / y           division
^, **      x ^ y, x ** y   power
%%         x %% y          modulo*
%/%        x %/% y         integer division

*) modulo is the remainder after dividing one number by another

Example session in R:
> 3 + 4
[1] 7
> 22 / 7
[1] 3.142857
> 22 %/% 7
[1] 3
> 22 %% 7
[1] 1
>
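Integer division and modulo belong together: the quotient times the divisor, plus the remainder, gives back the original number. The sketch below checks this for the numbers used on the slide and also shows the order in which R evaluates mixed operators. The variable names are illustrative only.

```r
a <- 22
b <- 7

quotient  <- a %/% b   # integer division
remainder <- a %% b    # modulo: what is left over

# The two results reassemble the original number
rebuilt <- quotient * b + remainder
print(rebuilt == a)

# '^' is evaluated before '*', and '*' before '+'
p <- 2 + 3 * 2 ^ 2     # 2 + 3 * 4
print(p)

# Both spellings of the power operator give the same result
print(2 ^ 10 == 2 ** 10)
```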
Operators
• Logical operators return TRUE or FALSE

operator   use      description
<          x < y    less than
<=         x <= y   less than or equal to, ≤
>          x > y    greater than
>=         x >= y   greater than or equal to, ≥
==         x == y   exactly equal to
!=         x != y   not equal to
!          !x       negation of x; not x
|          x | y    x OR y
&          x & y    x AND y

Example session in R:
> 3 < 4
[1] TRUE
> 3 == 4
[1] FALSE
> !TRUE
[1] FALSE
> (3 < 4) & (!FALSE)
[1] TRUE
>
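Comparisons can be combined with AND, OR and NOT to build larger conditions. A minimal sketch, with illustrative variable names and values:

```r
x <- 3
y <- 4

both_true   <- (x < y) & (x != y)   # TRUE AND TRUE
either_true <- (x > y) | (x == 3)   # FALSE OR TRUE
negated     <- !(x <= y)            # NOT TRUE

print(both_true)
print(either_true)
print(negated)
```

The parentheses are there for readability; they make it obvious which comparison is evaluated before the results are combined.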
Operators
• Other operators

operator domain   examples
matrix            multiplication, product, determinant
set               union, intersection, inclusion
indexing          []
date/time         addition, subtraction, conversion
text              sub-setting, searching, concatenation
assignment        <-, =, ->

We will learn about most of them later.
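As a preview, here is one small example from several of these domains. This is only a sketch: in R many of these operations are written as ordinary functions rather than symbols, and the values, the date, and the variable names below are made up for illustration.

```r
# Matrix multiplication with %*% (a 2x2 matrix times the identity matrix)
m <- matrix(1:4, nrow = 2)
same_m <- m %*% diag(2)

# Set operations
u <- union(c(1, 2, 3), c(3, 4))
i <- intersect(c(1, 2, 3), c(3, 4))
is_in <- 2 %in% c(1, 2, 3)

# Indexing with []
friends <- c("Lea", "Tia", "Amy")
second <- friends[2]

# Date arithmetic: adding 1 moves the date forward by one day
next_day <- as.Date("2024-02-19") + 1

# Text: concatenation, searching, sub-setting
joined <- paste("Home", "Sweet", "Home")
found  <- grepl("Sweet", joined)
part   <- substr(joined, 1, 4)

print(u)
print(second)
print(next_day)
print(part)
```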


Variables
• What variable is:
A variable is a symbolic name for information. The variable's name represents the information
the variable contains. They are called variables because the represented information can
change but the operations on the variable remain the same.
http://www.cs.utah.edu/~germain/PPS/Topics/variables.html
$\frac{\partial}{\partial x^{2}}\bigl(h(x)\,\varphi(x)\bigr) + \frac{\partial}{\partial y^{2}}\bigl(h(y)\,\varphi(y)\bigr) = 0$
x = 8.34 ← variable

• Variables and assignments:

In computer programming, an assignment statement sets and/or re-sets the value stored in
the storage location(s) denoted by a variable name; in other words, it copies a value into the
variable. https://en.wikipedia.org/wiki/Assignment_(computer_science)
Variables and Assignments in R
• Assigning value to variable

symbol           <-                    value
left-hand side   assignment operator   right-hand side

x = 8.34 ← value 8.34 assigned to variable x

The right-hand side, or the expression, is evaluated and the value of the evaluation is assigned to the left-hand side, the variable.

• R’s dirty secrets:
• Left and right sides may be reversed.
• There is more than one assignment operator: ‘<-’, ‘<<-’, ‘=’, ‘->>’, ‘->’.
• Not all variable symbols are allowed (well, even this is not exactly true).
• We will deal with them when the right time comes…
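To make the first two secrets concrete, the sketch below assigns the same value in three ways. The variable names are illustrative; `<<-` and `->>` are left out because they only matter inside functions, a topic for later.

```r
# Leftward assignment: the form used throughout these slides
x <- 8.34

# '=' also assigns when used on its own line like this
y = 8.34

# Rightward assignment: the sides are reversed, value first, variable last
8.34 -> z

print(x)
print(y)
print(z)
print(x == y & y == z)
```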
Variables and Assignments in R
• A few examples:
> x <- 10
> y <- (2 + 8) ^ 3
> e <- 2.718281828459
> isTen <- x == 10
> welcome <- "Hallo World!"
• Data Types, the type of variable:
• integer – integer numbers such as 0, 1, 98, -7
• double (float) – floating-point numbers such as 0.1, -2.1, 98.0
• logical – Boolean logical values: TRUE and FALSE
• character – text such as “Home Sweet Home”, “R”, “23”, “TRUE” – character variables are always shown enclosed in quotation marks.
Variables and Assignments in R
• R is case sensitive!
> x <- 10
> X <- 20
will create two variables: x (lower case) and X (upper case)
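A short sketch of what these assignments leave behind, using the same names as the slide. Printing a variable shows the value it currently holds.

```r
x <- 10
X <- 20
y <- (2 + 8) ^ 3     # the right-hand side is evaluated first, then stored
isTen <- x == 10     # compares lower-case x, which holds 10

print(y)
print(isTen)
print(X == 10)       # upper-case X is a different variable and holds 20
```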
Types of Variables
• To check the type of a variable or evaluation of an expression use:

typeof(x) where x is an ‘R object’ e.g. variable or expression

> x <- 10L
> typeof(x)
[1] "integer"
> e <- 2.718281828459
> typeof(e)
[1] "double"
> isTen <- x == 10
> typeof(x == 10)
[1] "logical"

Note the ‘L’ in 10L – it denotes that the number is intended to be integer, e.g. 10, not double, e.g. 10.0.
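The effect of the ‘L’ suffix is easiest to see side by side. The following is a small sketch with illustrative variable names: without the suffix R stores a number as double, and quotation marks turn digits into text.

```r
whole <- 10L     # the L suffix requests an integer
plain <- 10      # without the suffix, R stores a double
text  <- "23"    # quotation marks make this character, not a number

print(typeof(whole))
print(typeof(plain))
print(typeof(text))

# Arithmetic that mixes integer and double gives a double
print(typeof(whole + plain))

# The values still compare as equal even though the types differ
print(whole == plain)
```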
Special Values
• There are a few special values that are used in R
value       description                                             example
NA          Not Available – represents missing values               v <- NA
Inf, -Inf   Infinity, -Infinity – number too big (plus or minus)    2 ^ 2000, -2 ^ 2000
NaN         Not a Number – result of evaluation makes no sense      0 / 0
NULL        NULL object – no value, undefined, empty                v <- NULL

value       typeof()   call
NA          logical    typeof(NA)
Inf, -Inf   double     typeof(Inf)
NaN         double     typeof(NaN)
NULL        NULL       typeof(NULL)
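Special values need special tests, because ordinary comparison does not work on them. The sketch below uses a made-up vector of measurements with one missing value; the names are illustrative.

```r
v <- c(2.1, NA, 8.7)

# A missing value makes the whole sum missing unless it is removed first
total_with_na <- sum(v)
total_without <- sum(v, na.rm = TRUE)

print(is.na(v))
print(total_with_na)
print(total_without)

# Dedicated test functions for the other special values
print(is.infinite(2 ^ 2000))
print(is.nan(0 / 0))
print(is.null(NULL))

# Comparing with NA gives NA, so use is.na() rather than ==
print(NA == NA)
```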
Data Structures
• Data structure is a way of organizing and storing data in a computer.
• Relation and difference between data structure and data type
• Data structure is an abstract description of a way of organizing data.
• Data type describes pieces of data that all share a common property.
• It is easier to understand with an example:
Vector of integer numbers, e.g. a series [1, 2, 3, 4]:
Vector is the data structure, integer is the data type.
Vector is a way of organizing and storing data (1-dimensional, ordered list of elements);
integer is a property of the data being stored.

• Scalar – a single value of a specified data type, e.g. 0, 9, TRUE, “Home”. Scalar can be seen
as a naïve data structure storing only a single element of given type.
Vectors
• In most programming languages a scalar is the basic way of storing data. That is how you store a result from e.g. summation of numbers, and the numbers themselves.

• There are no scalars in R.

• The simplest data structure is the vector. A vector is a sequence of values (scalars), e.g. (1, 4, 3, -7, 2) or ("Lea", "Tia", "Amy", "Abi", "Leo", "Ian"). The first one is a vector of numbers, the second one a vector of texts. Precisely speaking, these are atomic vectors. ‘Atomic’ means that all the elements of the vector are of the same type, e.g. integer, double, character (text) or logical. In other words, an atomic vector is homogeneous, as opposed to heterogeneous.
A scalar (a single value) is represented by a vector with only one element, e.g. (1), or ("Lea").

How to create a vector?
• c("Lea", "Tia") – with the function c(), combine
• 1:5 – with the ':' sequence operator
• seq(1, 5) – with the function seq(), sequence
Easy!
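The three ways of creating a vector can be tried directly. This sketch also checks the claim that a single value is a vector of length one, and shows what ‘atomic’ means in practice. Variable names are illustrative.

```r
friends     <- c("Lea", "Tia")   # combine values with c()
one_to_five <- 1:5               # sequence operator
also_1_to_5 <- seq(1, 5)         # seq() function

print(length(friends))
print(one_to_five)
print(all(one_to_five == also_1_to_5))

# A single value is a vector with one element
single <- 8
print(is.vector(single))
print(length(single))

# Mixing types in c() does not give a mixed vector:
# every element is converted to one common type
mixed <- c(1, TRUE, "Tia")
print(typeof(mixed))
```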
Data Structures in R cont.
homogeneous

Vector:
• 8
• (1, 2, 4, 5, 6, 8)
• (TRUE, FALSE, TRUE, TRUE)
• ("Lea", "Tia", "Amy", "Abi", "Ian")

Matrix:
1 4 7     Lea Abi May
2 5 8     Tia Ian Rob
3 6 9     Amy Tom Pam

heterogeneous

List:
• (1, TRUE, "Tia", 2, 8, FALSE)
• ((1, -8), TRUE, ("Tia", "Ian"), 2, FALSE)

Data Frame:
state   sex      weight_gain
ACT     male     2.1
ACT     male     8.7
NSW     female   1.2

How to create a vector, list, matrix and data frame?
• vector: use the c(), seq() function, or the : operator
• list: use the list() function
• matrix: use the matrix() function
• data frame: use the data.frame(), or read.table() functions
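The examples on this slide can be built with the functions just listed. A minimal sketch follows; the variable names are illustrative, and the data frame is typed in by hand rather than read from a file.

```r
# Atomic vector: all elements share one type
v <- c(1, 2, 4, 5, 6, 8)

# List: elements may differ in type and may themselves be vectors
l <- list(c(1, -8), TRUE, c("Tia", "Ian"), 2, FALSE)

# Matrix: by default the values fill the matrix column by column
m <- matrix(1:9, nrow = 3)

# Data frame: each column is a vector, and columns may differ in type
df <- data.frame(
  state       = c("ACT", "ACT", "NSW"),
  sex         = c("male", "male", "female"),
  weight_gain = c(2.1, 8.7, 1.2)
)

print(m)
print(df)
print(sapply(l, typeof))   # the type of each list element
```

Printing `m` shows 1, 4, 7 across the first row, as on the slide, because `matrix()` fills the first column before moving to the second.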
Do Not Stress
• It will all make sense when we start using R.
• During the workshops we will go through many examples and exercises till we all feel
comfortable with all the theory.
• If you are looking for on-line R training get started here:
https://www.datacamp.com/courses/free-introduction-to-r.

Homework
• Find/Install and open RStudio
• If you feel adventurous try to:
• calculate
• create variable year and assign value 2020 to it
• check if year + 10 is equal to 2030
• check data type of the result of the comparison above
End of Lecture 1, Part I of BIOL2001
