BIOSTAT 607 R MODULE: LECTURE 3

Biostatistics 607: Module 1

ASSIGNMENTS AND DUE DATES
The first homework assignment is due on September 13th before
midnight.
The first quiz is due on September 13th before midnight.

TODAY’S MAIN TOPICS
Finish up discussion of functions in R (default arguments).
Comments in R
Vectors

FUNCTIONS

DEFINING YOUR OWN FUNCTION
There are three key components of a function definition in R.
Function name : the name which will be used to call the function
Arguments : values to pass to a function as input.
Return value : value returned by a function as output.
The general form for writing your own R function is
function_name <- function(params) {
  ## function_name is the name of the function
  ## params: the name(s) of the input variable(s) used within the function

  statement1            ## statements executed when the function is called;
  statement2            ## they convert params into the value to be
  ...                   ## returned
  return(return_value)  ## return the variable return_value
}

DEFINING YOUR OWN FUNCTION: EXAMPLES
Example. Let’s write a function that takes a number as an input and
returns the square of that number.
## define a new function named square
square <- function(x) {  ## function name: square, argument: x
  return(x*x)            ## returns x*x
}

square(10)               ## example of using the function square
[1] 100

DEFINING FUNCTIONS: AN EXAMPLE
Let’s write a function called PositiveEven that takes a number
(assumed to be an integer) as input and outputs another number
according to the following rule:
If the input number is positive, return 2 if it is even and 1 if it is odd.
If the input number is not positive, return -2 if it is even and -1 if it
is odd.

DEFINING PositiveEven
PositiveEven <- function(x) {
  if( x > 0 && x%%2==0 ) {
    return_value <- 2
  } else if( x > 0 && x%%2==1 ){
    return_value <- 1
  } else if( x <= 0 && x%%2==0) {
    return_value <- -2
  } else {
    return_value <- -1
  }
  return( return_value )
}

DEFINING FUNCTIONS: EXAMPLES
Now, let’s look at a few examples of calling our function PositiveEven:
PositiveEven(3)
[1] 1
PositiveEven(-6)
[1] -2
PositiveEven(0)
[1] -2
PositiveEven(4)
[1] 2

DEFINING FUNCTIONS: EXAMPLES
We could make our function PositiveEven a bit more user-friendly by
throwing an error whenever the user does not input an integer.
PositiveEvenSafe <- function(x) { # Function named PositiveEvenSafe
  if( x%%1 != 0) { # x%%1 will equal 0 if x is an integer
    stop("x must be an integer")
    # The stop function will stop the execution
    # of the function and will signal an error
  }
  if( x > 0 && x%%2==0 ) {
    return_value <- 2
  } else if( x > 0 && x%%2==1 ){
    return_value <- 1
  } else if( x <= 0 && x%%2==0) {
    return_value <- -2
  } else {
    return_value <- -1
  }
  return( return_value )
}

DEFINING FUNCTIONS: EXAMPLES
PositiveEvenSafe(3)
[1] 1
PositiveEvenSafe(-6)
[1] -2
PositiveEvenSafe(2)
[1] 2
PositiveEvenSafe(7.1)
Error in PositiveEvenSafe(7.1) : x must be an integer

RULES FOR CHOOSING FUNCTION NAMES
The same rules that apply to variable names also apply to function names.
Examples:

Valid function names    Invalid function names
i                       2things
my_function             location@
answer42                _user.name
.name                   .3rd

RESERVED WORDS
You cannot use a reserved word as a function name or a variable
name.
You can use built-in function names (for example, print) for your own
functions, but this is NOT RECOMMENDED.
The following are the reserved words in R:

if             else        while          function        for
in             next        break          repeat          TRUE
FALSE          NULL        Inf            NaN             NA
NA_integer_    NA_real_    NA_complex_    NA_character_

You can find the list of reserved words in R by typing

?reserved

DEFAULT ARGUMENT VALUES
We can provide default values for function parameters
by adding = default_value after the parameter name.
If an argument is specified in the function call, the specified value is used.
Otherwise, the default value is used.
In the function definition, it is generally better to put parameters without
default values before those with default values.
When calling a function, an argument must be supplied for every
parameter that does not have a default value.
Unlike Python, R lets you mix parameters with and without default
values in an arbitrary order (though I don’t recommend it).
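As a small illustration of the last point, here is a sketch in which a parameter with a default value comes before one without. The function name scale_first and its parameters are made up for this example and are not part of the lecture.

```r
# Hypothetical function (not from the lecture): the parameter with a
# default value (scale) is listed before the one without a default (x).
scale_first <- function(scale = 2, x) {
  return(scale * x)
}

res_default    <- scale_first(x = 5)  # scale keeps its default: 2 * 5 = 10
res_positional <- scale_first(3, 5)   # positional matching: scale = 3, x = 5, giving 15
## scale_first(5)                     # error: 5 is matched to scale, so x is missing
```

The commented-out call shows why this ordering is awkward: a single positional argument is matched to scale, which leaves x without a value.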

EXAMPLE: DEFAULT ARGUMENTS
As an example, let’s write a function that adds 3 numbers and, as a
default, sets one of these numbers to zero:
add3 <- function(x, y, z=0) {
  return(x + y + z)
}

The default value for z here is 0.

add3(1, 2) ## omit z
[1] 3
add3(1, 2, 0) ## this should give the same as add3(1,2)
[1] 3
add3(1, 2, 3) ## set z to 3 instead of 0
[1] 6

SPECIFYING ARGUMENTS WITH KEYWORDS
We can specify how arguments are passed to parameters not only by
their order but by names with keyword arguments.
Keyword arguments have to do with how you call the function - not with
the function definition itself.
For example, we could call our function add3 with keywords in the
following way:
add3(2, 2, 1) # Call function using original positions
[1] 5
add3(x=2, y=2, z=1) # Call function using keywords
[1] 5
add3(y=2, x=2, z=1) # With keywords, position does not matter
[1] 5

ANOTHER EXAMPLE OF DEFAULT ARGUMENTS
The function foo below has parameters x, y, z, and w.
The default value of z is 0, and the default value of w is TRUE.
foo <- function(x, y, z=0, w=TRUE) {
  if(w) {
    1000*x + 100*y + 10*z ## the last evaluated expression is returned,
                          ## so this is equivalent to return(...)
  } else {
    1000*x - 100*y + 10*z
  }
}
foo(9,3,5,TRUE) ## specify all arguments
[1] 9350
foo(9,3,5) ## omit argument w
[1] 9350
foo(9, 3) ## omit both z and w
[1] 9300

CALLING foo WITH KEYWORD ARGUMENTS
## foo(9) ## this will cause an error because y is missing
foo(x=9, y=5) ## specify x and y as keyword arguments
[1] 9500
foo(y=5, x=9) ## when using keywords, argument order doesn't matter
[1] 9500
foo(9, y=5) ## specify x as positional, y as keyword argument
[1] 9500
foo(9, z=3, y=5) ## y,z are keyword arguments, x is positional
[1] 9530

QUESTION
Suppose we define the function quiz as
quiz <- function(bool_var1, x=0, bool_var2 = TRUE) {
  y <- 0
  if(bool_var1 && bool_var2) {
    y <- x + 2
  } else {
    if(bool_var1) {
      y <- x - 2
    }
  }
  return(y)
}

What value does the following function call return?

quiz(FALSE, 1.3)

EXERCISE
Write an R function that implements the following mathematical
function:

$$
L(x, y) =
\begin{cases}
0 & \text{if } x = 0 \text{ and } y = 0 \\
1 & \text{if } x \neq 0 \text{ and } y = 0 \\
|x| & \text{if } y = 1 \\
x^2 & \text{if } y = 2
\end{cases}
$$

The function should have user-provided arguments x and y and should
return NA if y does not equal 0, 1, or 2.

SOLUTION
Lfn <- function(x, y) {
  ## && is used because each condition is a single TRUE/FALSE value
  if(x==0 && y==0) {
    ans <- 0
  } else if(x!=0 && y==0) {
    ans <- 1
  } else if(y==1){
    ans <- abs(x) ## abs computes absolute value
  } else if(y==2){
    ans <- x*x
  } else {
    ans <- NA
  }
  return(ans)
}

EXERCISE
Write an R function called PropGtZero which returns the proportion of
three entered numbers which are greater than 0.
The function should have the following function definition:
PropGtZero <- function(x, y, z, gt=TRUE) {

}

If gt=TRUE, then PropGtZero should return the proportion of the
numbers x, y, z which are greater than 0.
If gt=FALSE, then PropGtZero should return the proportion of the
numbers x, y, z which are less than or equal to 0.
If one or more of x, y, z is NA, the function should return NA.
For example, PropGtZero(3,2,-2) should return 2/3.

COMMENTS IN R

COMMENTS IN R
The comment symbol in R is the hashmark #.
Comments allow you to write notes in English (or any other human
language) within your R programs.
Comments are basically pieces of text the computer will ignore when
interpreting your code.
You can use comments to help explain what your code is doing.
Writing comments becomes more helpful as your code becomes more
complex.
Writing comments can make code more readable for others.

COMMENTS IN R
In R, the hashmark symbol # marks the beginning of a comment.
Everything on a line following the hashmark symbol is ignored.
An example
# This is an example of a comment

x <- 42

# x <- 64

x
[1] 42

COMMENTS IN R
# More
# examples
# of comments

x <- 42 ## x <- 24

# x <- 64

x
[1] 42

VECTORS 1

VECTORS IN R
The most basic data type in R is the vector.
As we mentioned previously, if we assign the number 42 to the variable x,
R will treat x as a vector.
x <- 42 ## the x value is 42
x ## print the value of x
[1] 42
x[1] ## What does this do?
[1] 42

Here, x is considered to be a vector with length 1.

Technically, there are two kinds of vectors in R: atomic vectors and lists.
Vectors that are homogeneous (all elements have the same type) are more
technically referred to as atomic vectors in R.
We will just refer to any atomic vector as a vector.

R ALWAYS STORES DATA AS A “COLLECTION”
Dimension        Homogeneous                Heterogeneous
1-Dimension      Atomic Vector              List
2-Dimensions     Matrix                     Data Frame
>2-Dimensions    Multi-dimensional array
There is no “0-dimensional data” in R.
Even a single-valued object is considered to be a “vector” with length 1.

Source: http://adv-r.had.co.nz/Data-structures.html
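The distinction between atomic vectors and lists, and the fact that a single value is a vector of length 1, can be checked directly with base R functions. This is only a sketch; the variable names num_vec and mixed are made up for the illustration.

```r
num_vec <- c(1.5, 2, 3)          # atomic vector: every element is a double
mixed   <- list(1.5, "a", TRUE)  # list: elements may have different types

is.atomic(num_vec)   # TRUE
is.atomic(mixed)     # FALSE
is.list(mixed)       # TRUE
typeof(num_vec)      # "double"
typeof(mixed)        # "list"
length(42)           # 1: a single number is a vector of length 1
```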

CREATING VECTORS IN R WITH c()
The most straightforward way to create vectors in R is to use the
concatenate function c()
This links together a group of values into a single vector.
You can also create a single vector from multiple vectors using c.
Examples:
x <- c(1,2,3) # a vector with elements 1, 2, and 3
x
[1] 1 2 3
y <- c(x, 4, 5) # a vector with elements 1,2,3,4,5
y
[1] 1 2 3 4 5
z <- c(x, y) # a vector with elements 1,2,3,1,2,3,4,5
z
[1] 1 2 3 1 2 3 4 5

CREATING VECTORS IN R WITH c()
You are not limited to using numbers with c().
For example, you can use c() to create a vector of characters or logicals
char_vec <- c("cat", "dog", "hamster") # vector of characters
char_vec
[1] "cat" "dog" "hamster"
log_vec <- c(TRUE, FALSE, TRUE, TRUE) # vector of logicals
log_vec
[1] TRUE FALSE TRUE TRUE

CREATING VECTORS WITH SPECIFIC PATTERNS - COLON
It is often very useful to be able to create vectors with certain patterns.
The colon operator : can be used to create a sequence of numbers.
The code from:end will create a vector of numbers starting at from and
increasing (or decreasing) by 1 until reaching the end.
Examples:
x <- 1:5 # creates the vector (1,2,3,4,5)
x
[1] 1 2 3 4 5
y <- 22:28
y
[1] 22 23 24 25 26 27 28

CREATING VECTORS WITH PATTERNS - COLON
z <- 0:-5 # use : to create a decreasing vector
z
[1] 0 -1 -2 -3 -4 -5

You can even use a number with a decimal point as the starting or
ending number (but this is not done that frequently).
w <- 2.3:6.8 # it keeps increasing by 1 until it reaches the
             # largest value that does not exceed 6.8
w
[1] 2.3 3.3 4.3 5.3 6.3

CREATING VECTORS WITH PATTERNS - COLON
Be careful when using something like a:b-1 when creating a vector
b <- 6
u <- 1:b - 1 # This does not create the vector 1,2,...,b-1
u
[1] 0 1 2 3 4 5
u <- 1:(b-1) # use this to create vector 1,2,...,b-1
u
[1] 1 2 3 4 5

CREATING VECTORS WITH PATTERNS - seq()
The function seq is a useful function for creating vectors that have
desired starting and ending values.
seq provides more flexibility than the colon operator :
You can use seq to create a sequence with different increments than 1
seq(1, 11, by=2) # sequence that increases by 2
[1] 1 3 5 7 9 11
seq(1, 10, by=2) # stops at 9 since 11 is larger than 10
[1] 1 3 5 7 9
seq(1, 11, by=2.54) # increment by non-integer amount
[1] 1.00 3.54 6.08 8.62

CREATING VECTORS WITH PATTERNS - seq()
Use the length.out argument in seq to create an equally-spaced
vector with a given length.
seq(1, 11, length.out=11) # same as 1:11
[1] 1 2 3 4 5 6 7 8 9 10 11
seq(1, 11, length.out=6) # vector of length 6, with equal increments
[1] 1 3 5 7 9 11
# using length.out is convenient
seq(21.5, 48.2, length.out=5) # don't have to work out correct increment
[1] 21.500 28.175 34.850 41.525 48.200
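For reference, the increment that length.out works out for you is (to - from) / (length.out - 1). The sketch below checks this for the last example; the variable names from, to, n, and step are chosen here for illustration.

```r
from <- 21.5
to   <- 48.2
n    <- 5

step <- (to - from) / (n - 1)                        # implied increment: 6.675
by_version  <- seq(from, by = step, length.out = n)  # built from the increment
len_version <- seq(from, to, length.out = n)         # built from the endpoints

all.equal(by_version, len_version)  # TRUE (compared with a numerical tolerance)
```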

CREATING VECTORS WITH rep()
The rep() (replicate) function is very useful for creating vectors that
have any kind of repeated pattern.
The basic form of rep is
rep(x, times)

rep produces a vector that repeats the vector x the number of times given
by times.
rep(7, 3) # just creates the vector 7,7,7
[1] 7 7 7
rep(c(2,4,6), 3) # repeats c(2, 4, 6) three times
[1] 2 4 6 2 4 6 2 4 6

CREATING VECTORS WITH rep()
Using rep inside of c():
c(10:12, rep(c(2,4,6), 3))
[1] 10 11 12 2 4 6 2 4 6 2 4 6

Using rep with the keyword argument each repeats every element of x that
many times before moving on to the next element of x.
rep(c(2,4,6), each=4) # repeat each element 4 times
[1] 2 2 2 2 4 4 4 4 6 6 6 6
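The times and each arguments can also be combined, and times may itself be a vector giving a separate count for each element. A brief sketch of both forms (the variable names are chosen for this example):

```r
both     <- rep(c(2, 4, 6), each = 2, times = 2)  # each is applied first, then times
per_elem <- rep(c(2, 4, 6), times = c(1, 2, 3))   # one repeat count per element

both      # 2 2 4 4 6 6 2 2 4 4 6 6
per_elem  # 2 4 4 6 6 6
```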

EXTRACTING VECTOR ELEMENTS
You can extract the kth element of a vector by using
vector_name[k]

For example:
x <- c(1,3,5,100)
x[2] # second element of x
[1] 3
x[4] # fourth element of x
[1] 100

EXTRACTING VECTOR ELEMENTS
You can also extract several elements at once by placing a vector of
indices, vec_index, inside the brackets:
vector_name[ vec_index ]

For example:
x <- c(1,3,5,100, 1250)
x[ c(1,3) ] # extract first and third elements of x
[1] 1 5
x[ 3:5 ] # extract elements 3 through 5 of x
[1] 5 100 1250

QUESTION
Suppose we define the vector x as
x <- 1:10

What will be the value of

x[ seq(1, 10, by=2)][3]

a. 3
b. 9
c. 5
d. 4

UPDATING VECTOR ELEMENTS
You can change the value of the kth element of a vector by using
vector_name[k] <- new_value
x <- c(1,3,5,100)
x[2] <- 6 # you may update a single element
print(x)
[1] 1 6 5 100

You can also update multiple elements of a vector by placing a vector of
indices inside brackets []
x[1:3] <- rep(10,3) # update first 3 elements of x
print(x)
[1] 10 10 10 100

SUBSETTING A VECTOR WITH A LOGICAL EXPRESSION
We mentioned before how you can take a subset of a vector by specifying
the vector indices.
You can also subset a vector using a logical expression
x <- c(10, 2, 21, 15)
y <- x[x > 8] # returns all elements of x greater than 8
z <- x[x > 12] # returns all elements of x greater than 12
y
[1] 10 21 15
z
[1] 21 15

You can think of the expression x[x > 8] as doing the following:
x[c(TRUE, FALSE, TRUE, TRUE)]
[1] 10 21 15
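A logical expression can also be used on the left-hand side of an assignment to update only the elements that satisfy a condition. A minimal sketch using the same vector (the name keep is introduced here for illustration):

```r
x <- c(10, 2, 21, 15)
keep <- x > 8      # logical vector: TRUE FALSE TRUE TRUE
x[keep] <- 0       # overwrite only the elements where keep is TRUE
x                  # 0 2 0 0
```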

SUBSETTING A VECTOR WITH A LOGICAL EXPRESSION
Subsetting vectors with logical expressions is very useful when you want
to compute statistics from a subset of your data.
For example, if we have a vector named agevec which stores a
collection of patient ages
agevec <- c(38, 51, 43, 72, 61, 55, 27, 64, 47)

You can count how many patients are older than 50

agevec > 50
[1] FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE
sum(agevec > 50) ## how many are older than 50?
[1] 5

SUBSETTING A VECTOR WITH A LOGICAL EXPRESSION
You can compute the mean age among the patients older than 50
agevec[agevec > 50]
[1] 51 72 61 55 64
mean( agevec[agevec > 50] ) ## average age among those older than 50?
[1] 60.6
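Because TRUE counts as 1 and FALSE as 0, taking the mean of the logical vector itself gives the proportion of patients older than 50. A short sketch with the same agevec (the name prop_over_50 is made up for this example):

```r
agevec <- c(38, 51, 43, 72, 61, 55, 27, 64, 47)

prop_over_50 <- mean(agevec > 50)  # proportion older than 50
prop_over_50                       # 5/9, about 0.556
```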

THE WHICH FUNCTION
You can find the indices of a vector that satisfy a certain condition using
the which function.
x <- c(10, 2, 21, 15)
which(x > 20) # shows that x[3] > 20
[1] 3
which(x > 12) # shows that x[3] > 12 and x[4] > 12
[1] 3 4

The which function really just returns the indices where a logical vector
is TRUE
which( c(FALSE, TRUE, FALSE) )
[1] 2
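The indices returned by which can themselves be used for subsetting, which ties this function back to logical subsetting. A brief sketch (the name idx is chosen here for illustration):

```r
x <- c(10, 2, 21, 15)

idx <- which(x > 12)             # 3 4
x[idx]                           # 21 15
identical(x[idx], x[x > 12])     # TRUE: same result as logical subsetting
length(idx)                      # 2: how many elements satisfy the condition
```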

USEFUL METHODS FOR VECTORS
The length function can tell you how many elements are in your vector:
x <- 9:0
x
[1] 9 8 7 6 5 4 3 2 1 0
length(x) # length of the vector
[1] 10
typeof(x) # type of elements
[1] "integer"
sum(x) # sum of values
[1] 45

MORE USEFUL OPERATIONS ON VECTORS
R has functions which allow you to compute all the well-known summary
statistics from a numeric vector.
x <- 1:5
mean(x) # average of vector elements
[1] 3
var(x) # variance (the denominator is length(x)-1)
[1] 2.5
sd(x) # standard deviation (the denominator is length(x)-1)
[1] 1.581139
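To make the denominator explicit, the sample variance and standard deviation can be computed by hand and compared with var and sd. This is only a check of the formula; the names n, manual_var, and manual_sd are introduced for the example.

```r
x <- 1:5
n <- length(x)

manual_var <- sum((x - mean(x))^2) / (n - 1)  # 10 / 4 = 2.5
manual_sd  <- sqrt(manual_var)                # about 1.581139

all.equal(manual_var, var(x))  # TRUE
all.equal(manual_sd, sd(x))    # TRUE
```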

MORE USEFUL OPERATIONS ON VECTORS
x <- 1:5
max(x) # maximum value
[1] 5
min(x) # minimum value
[1] 1
median(x) # median
[1] 3

VECTORS WITH DIFFERENT DATA TYPES IN R
As we mentioned before, R vectors are not limited to having numeric
elements.
The main restriction is that vectors must have elements which are all the
same data type.
x <- c(1, 2.5, 42) ## numeric vector
print(x)
[1] 1.0 2.5 42.0
y <- c("hello","world","biostat607") ## character vector
print(y)
[1] "hello" "world" "biostat607"
z <- c(TRUE, FALSE, FALSE) ## logical vector
print(z)
[1] TRUE FALSE FALSE

VECTORS WITH “MIXED” DATA TYPES
You can “create” a vector that has mixed data types, but R will
automatically convert the types of some of the elements so that all
elements have the same type.
x <- c(TRUE, FALSE, FALSE) ## homogeneous logical vector
print(x)
[1] TRUE FALSE FALSE
x <- c(TRUE, FALSE, 2) ## contains logical and numeric values
print(x) ## R translates logical TRUE/FALSE into numeric 1/0
[1] 1 0 2
x <- c(1, 2, "3") ## numeric + character
print(x) ## R translates numeric values into characters
[1] "1" "2" "3"

VECTORS WITH “MIXED” DATA TYPES
x <- c(TRUE, 2, "3") ## logical + numeric + character
print(x) ## R translates logical and numeric values into characters
[1] "TRUE" "2" "3"
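The conversions above follow a fixed order: logical is converted to integer, integer to double, and double to character, so the result takes the most general type present. A short sketch using typeof to show this (the variable names are made up for the example):

```r
t_logical   <- typeof(c(TRUE, FALSE))       # "logical"
t_integer   <- typeof(c(TRUE, 2L))          # "integer" (2L is an integer literal)
t_double    <- typeof(c(TRUE, 2L, 2.5))     # "double"
t_character <- typeof(c(TRUE, 2.5, "3"))    # "character"
```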

EXPLICITLY CHANGING THE DATA TYPES
You can convert a vector to another type using as.logical,
as.numeric, or as.character.
x <- as.logical(c(0,1,2,3)) # numeric to logical conversion
print(x)
[1] FALSE TRUE TRUE TRUE
x <- as.numeric(c(TRUE,FALSE, T,F)) # logical to numeric
print(x)
[1] 1 0 1 0
x <- as.character(c(0,1,2,3)) # numeric to string
print(x)
[1] "0" "1" "2" "3"

SOMETIMES CONVERSION DOES NOT WORK
## When a string cannot be converted to a number, as.numeric
## returns NA for that element (and gives a warning)
as.numeric(c("123","12.3","123a"))
[1] 123.0 12.3 NA
## Only strings such as "TRUE", "FALSE", "T", and "F" convert to
## logical values; any other string becomes NA
as.logical(c("TRUE","FALSE", "T","TF",0))
[1] TRUE FALSE TRUE NA NA
as.integer(c(123, 12.3, "123", "123a"))
[1] 123 12 123 NA

MATHEMATICAL OPERATIONS WITH VECTORS
When doing mathematical operations with two vectors of the same
length, R will perform addition, subtraction, multiplication, division
element-by-element.
x <- c(10, 5, 0)
y <- 1:3
x+y # element-wise addition
[1] 11 7 3
x*y # element-wise multiplication
[1] 10 10 0
x^y # element-wise power
[1] 10 25 0

MATHEMATICAL OPERATIONS WITH VECTORS
Multiplying or dividing a vector by a single number multiplies (or divides)
each element by that number
x <- c(10, 5, 0, -5)

3*x
[1] 30 15 0 -15
x/2
[1] 5.0 2.5 0.0 -2.5

Adding a single number to a vector (or subtracting it) likewise adds it to
(or subtracts it from) each element
x <- c(10, 5, 0, -5)

3 + x # Actually an example of recycling with a one-element vector
[1] 13 8 3 -2

RECYCLING RULES
You can actually add/subtract vectors of different lengths.
When doing this, R recycles the values in the shorter vector.
R will print a warning message if the length of the longer vector is
not a multiple of the length of the shorter vector.
c(1, 2, 4) + c(6, 0, 9, 10) ## gives a warning: 4 is not a multiple of 3
[1] 7 2 13 11

What the above code is doing is adding the vector c(1, 2, 4, 1) to
the vector c(6, 0, 9, 10).

RECYCLING RULES
Note that if we add a vector of length 3 to a vector of length 6 we will
get no warning message
c(1, 2, 4) + c(6, 0, 9, 10, 11, 12)
[1] 7 2 13 11 13 16

This adds the vector c(1, 2, 4, 1, 2, 4) to the vector
c(6, 0, 9, 10, 11, 12).
I personally do not use recycling rules much when the length of both
vectors is 2 or more.
It’s probably good to be aware of recycling rules if you are getting this
type of warning message.
You may find it helpful to use these recycling rules if you are, for
example, adding one vector to another vector that has a simple,
repeating pattern.
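One way to see what recycling does is to build the recycled vector explicitly with rep and its length.out argument. The sketch below assumes the same two vectors as above; the names short, long, and recycled are chosen for illustration.

```r
short <- c(1, 2, 4)
long  <- c(6, 0, 9, 10, 11, 12)

recycled <- rep(short, length.out = length(long))  # 1 2 4 1 2 4
recycled + long                                    # 7 2 13 11 13 16
identical(short + long, recycled + long)           # TRUE
```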
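
LOGICAL OPERATIONS WITH VECTORS
c(TRUE, TRUE, FALSE) & c(TRUE,FALSE,FALSE) # element-wise
[1] TRUE FALSE FALSE
c(TRUE, TRUE, FALSE) | c(TRUE,FALSE,FALSE) # element-wise
[1] TRUE TRUE FALSE
## && and || are meant for single TRUE/FALSE values (for example, inside if).
## Older versions of R silently used only the first elements of longer
## vectors; since R 4.3.0, operands of length greater than 1 are an error,
## so the first elements are extracted explicitly here.
c(TRUE, TRUE, FALSE)[1] && c(TRUE,FALSE,FALSE)[1] # only first values
[1] TRUE
c(TRUE, TRUE, FALSE)[1] || c(TRUE,FALSE,FALSE)[1] # only first values
[1] TRUE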
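When a whole logical vector needs to be reduced to a single TRUE/FALSE (for example, before using it in an if statement), the base R functions any and all are one option. This sketch goes slightly beyond the slide; the variable names a, b, and result are made up for the example.

```r
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

any(a & b)   # TRUE: at least one position is TRUE in both vectors
all(a | b)   # FALSE: the third position is FALSE in both vectors

result <- "condition not met"
if (any(a) && !all(b)) {   # both sides are single TRUE/FALSE values
  result <- "some of a are TRUE and not all of b are TRUE"
}
result
```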

QUESTION
Suppose
x <- rep(c(1, 5, 10), each=3)

What is the value of

sum( x[x > 5] )

a. 45
b. 30
c. 48
d. 33

SET OPERATIONS ON VECTORS
You can also do set operations with vectors.
When working with set operations, you should think of the set associated
with a vector as the collection of unique elements from that vector.
x <- c(1,2,3,3,4,5) # note the repeated 3
y <- c(1,3,3,5,7,9) # note the repeated 3
intersect(x,y) # set intersection, note that repeated 3 is dropped
[1] 1 3 5
union(x,y) # set union
[1] 1 2 3 4 5 7 9
setdiff(x,y) # set difference x - y
[1] 2 4

MORE SET OPERATIONS WITH VECTORS
x <- 1:5 # x is c(1,2,3,4,5)
y <- c(1,3,3,5,7,9) # y is c(1,3,3,5,7,9)
x %in% y # membership test
[1] TRUE FALSE TRUE FALSE TRUE
match(x, y) # find indices of first matching values
[1] 1 NA 2 NA 4
setdiff(x, y) # set difference x-y
[1] 2 4
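The "set associated with a vector" can be obtained with unique, and it is worth noting that subsetting with %in% keeps duplicates while intersect drops them. A brief sketch with the vectors from the earlier set-operations example (the result names are made up here):

```r
x <- c(1, 2, 3, 3, 4, 5)
y <- c(1, 3, 3, 5, 7, 9)

set_x   <- unique(x)        # 1 2 3 4 5: the set associated with x
kept    <- x[x %in% y]      # 1 3 3 5: logical subsetting keeps the repeated 3
as_sets <- intersect(x, y)  # 1 3 5: the set operation drops duplicates
```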

NA VALUES
Missing data in R is usually represented by the value NA.
NA stands for “Not Available”
You can create a vector with NA values by just typing in NA for one of the
vector elements.
x <- c(1, 5, NA, 4) # The third element of this vector is NA
typeof(x)
[1] "double"

You can type in NA for either numeric or character variables.
R will automatically convert everything to the appropriate type.
y <- c("cat", NA, "dog") # The second element of this vector is NA
typeof(y)
[1] "character"

USING FUNCTIONS WITH NA VALUES
Many of the built-in R functions will return NA if the input numeric vector
contains any NA values.
For example, if we try to compute the standard deviation of the vector x
x <- c(1, 5, NA, 4, 7) # The third element of this vector is NA
mx <- sd(x) # mx will have the value NA
mx
[1] NA

You can compute the standard deviation of the non-NA values by
including the argument na.rm = TRUE
sx <- sd(x, na.rm=TRUE) # sx should have the standard deviation of 1,5,4,7
sx
[1] 2.5
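The same na.rm argument is available in other common summary functions such as mean, sum, and max. A short sketch with the same vector (the result names are chosen for this example):

```r
x <- c(1, 5, NA, 4, 7)

mean_all <- mean(x)                # NA, because x contains a missing value
mean_obs <- mean(x, na.rm = TRUE)  # (1 + 5 + 4 + 7) / 4 = 4.25
sum_obs  <- sum(x, na.rm = TRUE)   # 17
max_obs  <- max(x, na.rm = TRUE)   # 7
```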

USING FUNCTIONS WITH NA VALUES
In the function sd, the argument na.rm is an example of an argument
with a default value.
You can see this by looking at the function definition for sd
sd <- function(x, na.rm = FALSE) {
  ## function body omitted
}

The default value of na.rm is FALSE.
So, you need to include na.rm = TRUE if you want sd to ignore
missing values.

THE FUNCTION is.na()
The function is.na() is often very useful when you’re working with
data that has missing values.
When applied to a vector, is.na() will return a vector of logical values
with the same length as the input vector.
The kth element of is.na(x) will be TRUE if the kth element of x is
missing.
Otherwise, the kth element of is.na(x) will be FALSE.
x <- c(10, 3, 5, NA, 1, NA) # Elements 4 and 6 of x have NA values
is.na(x)
[1] FALSE FALSE FALSE TRUE FALSE TRUE

You can also use is.na() directly on matrices and data frames.
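Combined with sum, which, and logical subsetting from earlier in the lecture, is.na() gives a quick way to count, locate, and drop missing values. The sketch below also applies it to a small matrix; the names n_missing, where_missing, observed, and m are made up for the example.

```r
x <- c(10, 3, 5, NA, 1, NA)

n_missing     <- sum(is.na(x))    # 2: number of missing values
where_missing <- which(is.na(x))  # 4 6: positions of the missing values
observed      <- x[!is.na(x)]     # 10 3 5 1: only the observed values

m <- matrix(c(1, NA, 3, 4), nrow = 2)  # filled column by column
is.na(m)                               # logical matrix, TRUE in row 2, column 1
```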
