
DSC2608 Learning_Unit_1

The document outlines the learning objectives for programming in R, emphasizing its importance in economics for statistical analysis and data visualization. It covers the installation of R and RStudio, basic commands, variable assignment, and data types, providing examples of expressions and operations. By the end of the module, students will be able to create various data visualizations and utilize R for empirical research and data-driven decision-making.

Uploaded by

Progress Zingowo



1. Programming in R
Learning objectives and outcomes: When you reach the end of learning unit 1, you should be able to
do the following:
1. Install R and write and run code in the R programming language.
2. Demonstrate the use of expressions, assigning of variables, and operations in R.
3. Demonstrate the use of data types such as numbers and strings, including logical comparisons.
4. Demonstrate the use of sequences such as arrays and ranges.
5. Demonstrate the use of tables with row and/or column manipulation.

1.1 Getting started with R

1.1.1 What is R and why is it useful in economics? R is a software environment and programming language used for statistical computing and graphics. It is equally accessible to researchers, practitioners, and students because it is open source and freely available. Owing to its adaptability, functionality, and extensibility, R is widely used in a variety of disciplines, including economics.
R is useful in economics for several reasons, including the following:
• It offers a broad range of statistical and econometric approaches, including linear and nonlinear modelling, time-series analysis, panel data analysis, and machine learning algorithms. These methods enable economists to analyse economic relationships, test hypotheses, and make data-driven predictions.
• Combined with an applied quantitative modelling curriculum, R enables you to perform empirical research and analyse economic data. In this way, you can help develop economic theory and contribute knowledge that informs policy decisions with real-world effects.
• Because many organisations increasingly demand data analytics skills, experience with R can be an advantage in the job market. R coding is therefore highly relevant, especially in data-driven professions like economics. It is also a transferable skill that can be applied in many other fields, including finance and banking, healthcare, and marketing.
Visit the following page for more details on why R is crucial in economics and why you should choose
it: https://www.core-econ.org/why-doing-economics-has-embraced-r/
In summary, students need to study R coding because it is a widely used language in econometrics and data analysis. R provides a powerful set of tools for data manipulation, visualisation, and statistical analysis, and is popular among researchers and practitioners in many fields. Learning R can also help students develop critical thinking skills and communicate their findings more effectively.
By the end of this module, you will be able to create various types of data visualisations, such as graphs, charts and other illustrations that help you understand and communicate patterns and relationships in data. These include boxplots, scatter plots, density plots, histograms, linear regression plots and classification tree diagrams, as demonstrated by the examples in the following figures.

Figure 1.1: Bar chart of the number of customers who have defaulted on credit card payments versus
those who have not

Figure 1.2: Histogram of individual customers’ credit card balance


Figure 1.3: Scatter plot of the relationship between income and balance of customers’ credit cards

Figure 1.4: Boxplot of credit card balance for customers who defaulted versus those who have not

Figure 1.5: Density plot of credit card balance for customers who defaulted versus those who did not

Figure 1.6: Scatter plots of Sales against TV, Radio and Newspaper advertisement budgets

Figure 1.7: Classification tree of default against income, balance, and student predictors

1.1.2 Finding and installing R and RStudio. The R Core Team maintains a network of servers, called the Comprehensive R Archive Network (CRAN), that hosts installation files and documentation for R.
You can access CRAN at http://cran.r-project.org/ or https://cran.rstudio.com/, or by searching online for "CRAN R". See the R FAQ (r-project.org) for general information about R and the R for Windows FAQ (rstudio.com) for Windows-specific information. R is available for Windows, macOS and Unix-like operating systems, and installation files and instructions can be downloaded from the CRAN site. Note the following:
• Download the version compatible with your operating system (OS).
• Install R before you install RStudio.
• RStudio is an integrated development environment (IDE) that provides a user-friendly interface and a number of tools for interacting with R. It enables efficient writing, execution, and management of R code. However, R itself is the programming language; think of RStudio as a tool that makes it easier to work with R on your computer.
1.1.3 Getting started with RStudio. To get started, open RStudio just as you would open any other application on your computer. The landing page shown in Figure 1.8 will appear on the screen. It usually has four panes, each of which serves a specific purpose.

Figure 1.8: The RStudio landing page

RStudio has keyboard shortcuts for running all or part of the code in a script. The following are some of the most useful shortcuts:
• Ctrl + Enter: Run the current line or selection.
• Ctrl + Shift + Enter: Run all lines (source the whole script with echo).
• Ctrl + Alt + B: Run from the beginning to the current line.
• Ctrl + Alt + E: Run from the current line to the end.
It is not necessary to memorise these shortcuts and commands, as you can always refer to the RStudio cheatsheets at https://www.rstudio.com/resources/cheatsheets/. They give you quick access to common commands and syntax, guide you through the most useful features of RStudio, and list the many keyboard shortcuts built into RStudio.
1.1.4 R commands, assignment and objects. To use R, you mainly need to learn the R language itself. The R language combines functional and object-oriented programming styles.
In R, the instructions you provide are referred to as commands. Commands do not require semicolons to indicate the end of a statement; a statement usually ends at the end of a line, and the next command starts on a new line (a semicolon is only needed to place several commands on one line). The assignment operator <- is used to assign a value to a name. The variables you create in R are called objects.
Note that commands containing an assignment update objects in your R environment. Commands without assignment, as well as calls to R objects by name, generate output either in the console or in the Plots pane.
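A minimal sketch of the difference described above, using hypothetical object names (x, y, p, q): an assignment on its own prints nothing, whereas calling the object by name prints its value. Wrapping an assignment in parentheses both stores and prints the value.

```r
# An assignment updates the environment but prints nothing
x <- 25

# Calling the object by name prints its value in the console
x             # prints [1] 25

# A command without assignment is evaluated and printed, but not stored
x + 5         # prints [1] 30; x is still 25

# Parentheses around an assignment both store and print the value
(y <- x * 2)  # prints [1] 50

# Two commands on one line must be separated by a semicolon
p <- 1; q <- 2
```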

Note the following:

• Various operations are performed by calling functions. To carry out a specific operation, you call the appropriate function. For example, to calculate the square root of the number 25, you would use sqrt(25).
• Commands, for example 3 + 2
• Assignments, for example x <- 25 (stores the value 25 in a variable called x)
• Functions, for example print("Welcome to DSC2608")
• Computations, for example 31 + 3; x + 5
• Combinations, for example y = sqrt(16); y = 15 + 5
1.1.5 Operators and functions. In R, operators and functions are both essential building blocks used to perform a variety of operations and computations. An overview of R’s operators and functions is given below:
Operators are symbols (such as + or <-) that stand for particular operations to be carried out on data or variables. R provides several types of operators, including arithmetic, assignment, relational, logical and indexing operators. Functions, on the other hand, are reusable chunks of code that carry out particular actions or operations. They accept arguments as input values and return output values. Functions can be user-defined functions written by the programmer or built-in functions provided by R, such as sqrt(), mean(), sum() and length().
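To illustrate the difference between built-in and user-defined functions, here is a sketch of a small user-defined function. The name growth_rate and its arguments are hypothetical, chosen only for illustration; the function computes the percentage change between two values.

```r
# A hypothetical user-defined function: percentage change from old to new
growth_rate <- function(old, new) {
  (new - old) / old * 100
}

# Calling the user-defined function with named arguments
growth_rate(old = 200, new = 210)   # returns 5

# Built-in functions are called in the same way
values <- c(4, 9, 16)
sqrt(values)     # 2 3 4
mean(values)     # 9.666667
sum(values)      # 29
length(values)   # 3
```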
In R you can carry out operations on objects using operators and functions. There are several operators
that can be used in R. Most of the common ones are summarised in Table 1.1.

Table 1.1: Arithmetic, relational, logic and indexing operators in R

Arithmetic
+         addition
-         subtraction
*         multiplication
/         division
^         exponentiation
%/%       integer division
%%        modulo (remainder)

Relational
a == b    Is a equal to b? (Do not confuse with =.)
a != b    Is a not equal to b?
a < b     Is a less than b?
a > b     Is a greater than b?
a <= b    Is a less than or equal to b?
a >= b    Is a greater than or equal to b?

Logic
!         not
&         and (element-wise)
|         or (element-wise)
&&        sequential and (single values, evaluated left to right)
||        sequential or (single values, evaluated left to right)
isTRUE()  check whether a logical value is TRUE

Indexing
$         part of a data frame or list
[ ]       part of a data frame, array or list
[[ ]]     part of a list
@         part of an S4 object
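The relational and logic operators in Table 1.1 can be tried out directly in the console. A short sketch, using hypothetical values for an inflation rate and a target rate:

```r
inflation <- 6.2   # hypothetical inflation rate (%)
target <- 4.5      # hypothetical target rate (%)

# Relational operators return TRUE or FALSE
inflation > target    # TRUE
inflation == target   # FALSE
inflation != target   # TRUE

# & and | work element by element on vectors
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)    # TRUE FALSE FALSE
c(TRUE, FALSE, TRUE) | c(FALSE, FALSE, FALSE)  # TRUE FALSE TRUE

# && and || compare single values and stop as soon as the result is known
inflation > target && inflation < 10   # TRUE
!(inflation > target)                  # FALSE

# isTRUE() returns TRUE only for a single TRUE value
isTRUE(inflation > target)   # TRUE
isTRUE(c(TRUE, TRUE))        # FALSE
```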

1.2 Expressions, variables and operations

1.2.1 Expressions in R. An expression in R is a combination of values, variables, operators and functions that can be evaluated to yield a result. It represents a calculation or operation to be performed. The following are examples of arithmetic expressions in R:
> 3 + 2    # addition
[1] 5
> 3 - 2    # subtraction
[1] 1
> 3 * 2    # multiplication
[1] 6
> 9 / 2    # division
[1] 4.5
> 9 %% 2   # modulo (remainder)
[1] 1
> 9 %/% 2  # integer division
[1] 4
> 4^2      # exponentiation
[1] 16
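When an expression combines several operators, R applies an order of operations: ^ first, then the special operators such as %% and %/% (which bind more tightly than * and /), then multiplication and division, and finally addition and subtraction. Parentheses override this order. A short sketch:

```r
2 + 3 * 4     # 14: multiplication before addition
(2 + 3) * 4   # 20: parentheses are evaluated first
2 * 3^2       # 18: the exponent is applied before multiplication
-2^2          # -4: ^ binds more tightly than the unary minus
(-2)^2        # 4

# Integer division and modulo together recover the original number
9 %/% 2 * 2 + 9 %% 2   # 9
```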

R also has built-in mathematical functions and constants, including the following:

> pi                 # the constant pi
[1] 3.141593
> sqrt(25)           # square root
[1] 5
> log(1)             # natural logarithm (base e)
[1] 0
> log(1, base = 10)  # base-10 logarithm
[1] 0
> exp(0)             # exponential function e^x; exp(1) returns the constant e
[1] 1

To find out whether an object (such as a function or a constant) exists in R, you can use the exists() function, which returns either TRUE or FALSE. For instance, pi is a built-in object that holds the value of the mathematical constant π, approximately 3.141593, so the following returns TRUE:
> exists("pi")
[1] TRUE
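exists() returns FALSE for names that have not been defined yet. A small sketch; the name my_gdp_data is hypothetical and is assumed not to exist in a fresh R session:

```r
exists("sqrt")          # TRUE: sqrt is a built-in function
exists("my_gdp_data")   # FALSE until an object with this name is created

my_gdp_data <- c(1.2, 0.8, 1.5)
exists("my_gdp_data")   # TRUE after the assignment
```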

1.2.2 Variable assignment and operation in R. In R, a variable is a fundamental element that lets you give a specific name to a piece of data and store it. For example, you can store a date, the number 6 or the text "Hello" under names of your choice, and later retrieve the stored data by calling the variable name. In programming, an identifier is a unique name that you assign to a variable, function or object to distinguish it from others.
Use the operator <- (or =) for variable assignment. To see what is contained in a variable, type its name and R will print the content.
The following R command creates an object named “a” and assigns the value 2022 to it. If “a” had
previously been created in the script, the original value would be overwritten. This means that objects
can be created and their data can be changed using the assignment operator. R has a case-sensitive
syntax. The variables “a” and “A” can coexist and have different values in the R environment.
> # variable assignment
> a <- 2022
> a
[1] 2022
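A minimal sketch of the two points made above: assignment overwrites an existing object, and names are case sensitive. The value assigned to A is arbitrary.

```r
a <- 2022
a <- 2023   # overwrites the previous value of a
A <- 5      # a different object, because R is case sensitive

a   # 2023
A   # 5
```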

This object, referred to as “a”, is stored in your workspace. You can always see what is stored in the workspace by using the ls() function:
> ls()
[1] "a"

To remove objects from the workspace, use the rm() function:

> # remove objects: variables, functions and datasets
> rm(a)
> # remove all objects from the workspace
> rm(list = ls())
> # removing an object that no longer exists produces a warning
> rm(a)
Warning message:
In rm(a) : object 'a' not found

1.3 Data types and data structures

1.3.1 Basic data types in R. R works with numerous data types that can be used to store different
kinds of data. The following are the most common types of data:
• Numeric. This data type is used to store numbers, including integers and decimal numbers.
Examples of numeric data in R include the following:
> # numeric (decimal number)
> x <- 3.5
> x
[1] 3.5
> class(x)
[1] "numeric"
> # whole numbers are also stored as numeric (double) by default
> y <- 5
> y
[1] 5
> class(y)
[1] "numeric"
> z <- x + y
> z
[1] 8.5
> class(z)
[1] "numeric"

• Character. This data type is used to store text (or string) values. To display output or results,
you can either use the print() function or just the variable name. Examples of character data
in R include the following:
> # character data
> module_name <- "Welcome to DSC2608 - Applied Quantitative Modelling"
> print(module_name)
[1] "Welcome to DSC2608 - Applied Quantitative Modelling"
> class(module_name)
[1] "character"

• Logical. This data type is used to store Boolean values, which can be either TRUE or FALSE. In the following example, is_recession is a logical variable that is set to FALSE. This variable could be used to represent whether or not the economy is currently in a recession.
> # logical (or Boolean)
> is_recession <- FALSE
> class(is_recession)
[1] "logical"

• Factor. This data type is used to hold categorical data where each value corresponds to a certain
category. The following are some examples of factor data in R:
> credit_rating <- c("AAA", "BBB", "BB", "B", "CCC")
> rating <- factor(c("BBB", "AAA", "BB", "B", "CCC"),
+                  levels = credit_rating)
> credit_rating
[1] "AAA" "BBB" "BB"  "B"   "CCC"
> rating
[1] BBB AAA BB  B   CCC
Levels: AAA BBB BB B CCC

A factor variable called rating is created using the factor() function. The levels argument defines the possible values of the factor and their order, in this case the credit ratings from AAA to CCC.
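The examples above leave two details implicit, and both can be checked directly. First, whole numbers such as 5 are stored as numeric (double) values; to create a true integer, add the suffix L. Second, the levels of a factor can be inspected with levels() and nlevels(). A short sketch that re-creates the rating factor from above:

```r
y <- 5
class(y)     # "numeric"
typeof(y)    # "double"

n <- 5L      # the L suffix creates an integer
class(n)     # "integer"

# re-create the factor from the credit rating example
credit_rating <- c("AAA", "BBB", "BB", "B", "CCC")
rating <- factor(c("BBB", "AAA", "BB", "B", "CCC"), levels = credit_rating)

levels(rating)       # "AAA" "BBB" "BB" "B" "CCC"
nlevels(rating)      # 5
as.integer(rating)   # 2 1 3 4 5: position of each value among the levels
```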
1.3.2 Data structures. R provides a wide range of data structures for storing and manipulating data. The following are a few common R data structures:
• Vectors. To assign multiple values to a variable, we can use an R object called a vector. A vector is a sequence of data elements of the same data type, such as numbers or characters. The members of a vector are called components (or elements). The c() function builds a vector by concatenating (combining) its arguments. Here are some examples of vectors in R:
> # create a numeric vector of stock prices: daily closing prices
> # over a certain period of time for the S&P 500 index
> stock_prices <- c(4000, 4015, 4030, 4025, 4010, 4025, 4040, 4055, 4045,
+                   4060, 4080, 4090)
> class(stock_prices)
[1] "numeric"

> # currency codes
> currencies <- c("ZAR", "EUR", "USD", "GBP", "CNY")
> # create a vector of quarterly GDP growth rates
> gdp_growth <- c(0.5, 0.8, 0.6, 0.4, 0.7)

To access a specific element in the vector, use square brackets and specify the position of the element you want to retrieve. For example, to access the third element of the vector gdp_growth, you would use gdp_growth[3]:
> gdp_growth[3]
[1] 0.6

• Matrices. A matrix is a sequence/collection of data elements of the same data type arranged in
a two-dimensional rectangular layout with rows and columns.
The matrix() function in R can be used to construct a matrix. It accepts a vector of data and
the dimensions of the matrix as inputs or arguments.
> # create a 3 x 4 matrix from the stock_prices vector
> stock.price_matrix <- matrix(stock_prices, nrow = 3, ncol = 4, byrow = TRUE)
> # print the resulting matrix
> stock.price_matrix
     [,1] [,2] [,3] [,4]
[1,] 4000 4015 4030 4025
[2,] 4010 4025 4040 4055
[3,] 4045 4060 4080 4090

Note that the nrow argument specifies the number of rows, the ncol argument specifies the number of columns, and the byrow argument specifies that the matrix should be filled row by row (as opposed to column by column, which is the default). To access a specific element in the matrix, use variable_name[i, j], where i is the row index and j is the column index. To access an entire row, use variable_name[i, ], and to access an entire column, use variable_name[, j].
For example, to access the element in the second row and third column of stock.price_matrix, you would use stock.price_matrix[2, 3], as follows:

> # access the element in the second row and third column
> stock.price_matrix[2, 3]
[1] 4040

• Data frames. A data frame is used for storing data tables. It is a list of vectors of equal length.
Unlike matrices, it can gather vectors containing different variable types. You can create a data
frame in R using the data.frame() function.
> # create a data frame df with 3 columns: price, currency and logical.operator
> df <- data.frame(price = c(2000, 109.26, 139866.50),
+                  currency = c("ZAR", "USD", "EUR"),
+                  logical.operator = c(TRUE, TRUE, FALSE))

> # return the class of each column in df
> class(df$price)
[1] "numeric"
> class(df$currency)
[1] "character"
> class(df$logical.operator)
[1] "logical"

Specific elements in data frames are selected in the same way as in matrices: the command df[i, j] returns the element in the ith row and jth column. For example, here is how to retrieve the element in the third row and first column of the df data frame:
> # retrieve the element in the third row and first column
> df[3, 1]
[1] 139866.5

• Lists. A list in R allows you to gather a variety of objects under one name in an ordered way. These objects can be matrices, vectors, data frames or even other lists, and they do not need to be related to each other.
The following example creates a list called company_info that contains four elements with different data types:
> company_info <- list(
+   company_name = "XYZ Holdings Ltd",
+   share_price = 219.34,
+   num_employees = 152000,
+   is_public = TRUE)

This R code creates a list named company_info containing details about a company called XYZ Holdings Ltd: the company name, share price, total number of employees and whether or not the company is publicly traded. Generally, lists are useful for storing and organising information of different data types, which could, for instance, be used to perform various analyses on individual companies.
To access the ith object in a list, write list_name[[i]]. If the objects in the list are named, you can instead use the $ operator followed by the object's name, which is convenient when you do not know its position (index) in the list.
For instance, use company_info$share_price to retrieve the share price in the company_info list.

> # access the fourth element in the list
> company_info[[4]]
[1] TRUE
> # access the share_price element
> company_info$share_price
[1] 219.34
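The text above describes several indexing forms that the examples do not show. The following sketch re-creates objects from this section so that it runs on its own. It demonstrates entire matrix rows and columns, vector selection by several positions, negative indices (which drop elements), logical conditions, data frame columns by name, and list elements by name with [[ ]].

```r
stock_prices <- c(4000, 4015, 4030, 4025, 4010, 4025, 4040, 4055, 4045,
                  4060, 4080, 4090)
stock.price_matrix <- matrix(stock_prices, nrow = 3, ncol = 4, byrow = TRUE)
stock.price_matrix[2, ]   # entire second row: 4010 4025 4040 4055
stock.price_matrix[, 4]   # entire fourth column: 4025 4055 4090

gdp_growth <- c(0.5, 0.8, 0.6, 0.4, 0.7)
gdp_growth[c(1, 3)]            # first and third elements: 0.5 0.6
gdp_growth[-1]                 # all elements except the first
gdp_growth[gdp_growth > 0.5]   # elements meeting a condition: 0.8 0.6 0.7

df <- data.frame(price = c(2000, 109.26, 139866.50),
                 currency = c("ZAR", "USD", "EUR"),
                 logical.operator = c(TRUE, TRUE, FALSE))
df$currency              # a column by name: "ZAR" "USD" "EUR"
df[df$price > 1000, ]    # rows where price exceeds 1000

company_info <- list(company_name = "XYZ Holdings Ltd", share_price = 219.34,
                     num_employees = 152000, is_public = TRUE)
company_info[["company_name"]]   # an element by name: "XYZ Holdings Ltd"
```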

1.4 Sequences

1.4.1 Basic sequences. An important way of creating vectors is to generate a sequence of numbers. The colon operator : generates integer sequences and is very helpful for creating and iterating through ranges of values. It can also be combined with other operations to build more complex sequences.
For instance, 1:10 generates the sequence of numbers from 1 to 10 in steps of 1; you only need to specify the first and last values, separated by a colon.
> # create a sequence of numbers from 1 to 10
> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
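The colon operator can be combined with arithmetic to build other sequences. Note that : is evaluated after ^ but before * and -, which is a common source of mistakes. A short sketch (n is a hypothetical variable):

```r
10:1          # counting down: 10 9 8 ... 1
2 * (1:5)     # even numbers: 2 4 6 8 10
(1:5)^2       # squares: 1 4 9 16 25
1:3 * 2       # same as (1:3) * 2, giving 2 4 6

n <- 5
1:n - 1       # (1:n) - 1, giving 0 1 2 3 4
1:(n - 1)     # 1 2 3 4
```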

More generally, any arithmetic progression can be generated with the function seq(). Its main arguments are used as follows:
• seq(from, to) specifies the first and last values (with a step of 1).
• seq(from, to, by = ) specifies the first and last values and the step size.
• seq(length.out = ) creates the sequence 1, 2, 3, and so on, up to length.out.
• seq(from, to, length.out = ) creates the given number of equally spaced values between from and to.
• seq(along.with = ) takes the length of the sequence from another object.
The following illustrates each of these uses:
> seq(from = 1, to = 10)
 [1]  1  2  3  4  5  6  7  8  9 10
> # generates a sequence of numbers from 0 to 15 with a step of 3
> seq(from = 0, to = 15, by = 3)
[1]  0  3  6  9 12 15
> # generates the integer sequence 1, 2, ..., 15
> seq(length.out = 15)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
> # generates 5 evenly spaced numbers between -5 and 5, inclusive
> seq(from = -5, to = 5, length.out = 5)
[1] -5.0 -2.5  0.0  2.5  5.0
> # generates a sequence of numbers from 0 to 10 with the default step of 1
> # and assigns it to the variable s1
> s1 <- seq(0, 10)
> s1
 [1]  0  1  2  3  4  5  6  7  8  9 10
> # generates evenly spaced numbers from 1 to 5,
> # with the same length as s1 (11 values)
> s2 <- seq(1, 5, along.with = s1)
> s2
 [1] 1.0 1.4 1.8 2.2 2.6 3.0 3.4 3.8 4.2 4.6 5.0
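To obtain evenly spaced numbers between 0 and 1, specify from, to and length.out together. Base R also provides the shortcuts seq_len() and seq_along(), which are commonly used when looping over a vector. A sketch that re-creates s1 from above:

```r
seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

seq_len(4)        # 1 2 3 4, the same values as seq(length.out = 4)

s1 <- seq(0, 10)
seq_along(s1)     # 1 2 ... 11: one index per element of s1

seq(from = 10, to = 0, by = -2.5)   # a decreasing sequence: 10.0 7.5 5.0 2.5 0.0
```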

Sometimes it is necessary to have repeated values. In such instances, the function rep() can be used; it replicates (repeats) the values in its first argument:
> # generates a vector of length 10 in which each element is equal to 5
> rep(5, 10)
 [1] 5 5 5 5 5 5 5 5 5 5
> # each number from 5 to 10 is repeated twice
> rep(5:10, each = 2)
 [1]  5  5  6  6  7  7  8  8  9  9 10 10
> # length.out specifies the length of the output vector
> rep(5, length.out = 10)
 [1] 5 5 5 5 5 5 5 5 5 5
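The times and each arguments of rep() behave differently: times repeats the whole vector, while each repeats every element in turn. A sketch using quarter and year labels chosen for illustration:

```r
quarters <- c("Q1", "Q2", "Q3", "Q4")

rep(quarters, times = 2)          # "Q1" "Q2" "Q3" "Q4" "Q1" "Q2" "Q3" "Q4"
rep(c("2022", "2023"), each = 4)  # "2022" four times, then "2023" four times

# length.out recycles the vector until the requested length is reached
rep(1:3, length.out = 7)          # 1 2 3 1 2 3 1
```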

Within the Additional Resources folder on the module site, you can find an Exercise Manual that
includes exercises aimed at evaluating your understanding of each Learning Unit.
Complete Activity 1.1 in the Exercise Manual before you proceed to the next subsection.
1.4.2 Arrays. An array is a multidimensional R data object that can store data in two or more dimensions; a matrix is the special case of a two-dimensional array. An array is created using the array() function, which takes a vector as input and uses the values in the dim argument to set the dimensions. The following is an example of a 3 × 3 array in R created from a matrix of stock prices:
> # create a matrix with stock prices
> stock.prices_matrix <- matrix(c(100, 150, 200, 125, 175, 225, 150, 200, 250),
+                               nrow = 3, ncol = 3, byrow = TRUE)
> # convert stock.prices_matrix to a 3 x 3 array
> stock.prices_array <- array(stock.prices_matrix, dim = c(3, 3))
> # print stock.prices_array
> print(stock.prices_array)
     [,1] [,2] [,3]
[1,]  100  150  200
[2,]  125  175  225
[3,]  150  200  250

Here is another example. Imagine you have information about the gross domestic product (GDP) of three countries (X, Y and Z) for four years (2019, 2020, 2021 and 2022), measured in billions of rand. You can represent these data in an array using the following code:
> # create an array for GDP data (3 countries x 4 years = 12 values)
> gdp_array <- array(c(100, 120, 140, 150, 200, 220, 240, 260, 300,
+                      330, 350, 380),
+                    dim = c(3, 4))
> # print gdp_array
> print(gdp_array)
     [,1] [,2] [,3] [,4]
[1,]  100  150  240  330
[2,]  120  200  260  350
[3,]  140  220  300  380

You can also name the rows and columns of gdp_array by using rownames(), colnames() or dimnames(). This helps you organise and label data in a clear and meaningful way.

> # name the rows and columns of gdp_array
> rownames(gdp_array) <- c("Country X", "Country Y", "Country Z")
> colnames(gdp_array) <- c("Yr_2019", "Yr_2020", "Yr_2021", "Yr_2022")
> # print the updated gdp_array
> print(gdp_array)
          Yr_2019 Yr_2020 Yr_2021 Yr_2022
Country X     100     150     240     330
Country Y     120     200     260     350
Country Z     140     220     300     380

As with vectors, lists and matrices, you can access individual array elements by their indices. For instance, print the element in the second row and third column of gdp_array:
> # second row and third column element of gdp_array
> print(gdp_array[2, 3])
[1] 260
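All of the arrays above have two dimensions, so they behave like matrices. Here is a sketch of a genuinely three-dimensional array with dim = c(2, 3, 4), that is, 2 rows, 3 columns and 4 layers. The values 1 to 24 and the dimension labels are placeholders, not real data.

```r
# 24 placeholder values arranged into 2 rows x 3 columns x 4 layers
arr <- array(1:24, dim = c(2, 3, 4))

dim(arr)        # 2 3 4
arr[2, 3, 1]    # row 2, column 3, layer 1: 6
arr[, , 2]      # the whole second layer, a 2 x 3 matrix

# dimnames() labels all three dimensions at once (hypothetical labels)
dimnames(arr) <- list(c("r1", "r2"), c("c1", "c2", "c3"),
                      c("L1", "L2", "L3", "L4"))
arr["r1", "c2", "L4"]   # 21
```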

Complete Activity 1.2 in the Exercise Manual before you proceed to the next section.

1.5 Tables

Tables are a fundamental object type for representing datasets. A table can be viewed in two ways:
1. as a sequence of named columns that each describe a single aspect of all entries in a dataset
2. as a sequence of rows that each contain all information about a single entry in a dataset
Tables are similar to arrays in that they can store multiple values. Table 1.2 presents some key differences
between tables and arrays:

Table 1.2: Differences between tables and arrays

| Tables | Arrays |
|---|---|
| Tables have a fixed number of columns, and each column has a defined data type or format. | Arrays can be multidimensional; in R, the elements of an array are usually all of the same data type. |
| Tables typically have named columns, which allows easy reference to specific attributes of the data. | Arrays, on the other hand, usually use numerical indices to access elements. |
| Tables are more flexible when it comes to adding or removing columns, as the structure remains consistent across rows. | In arrays, adding or removing elements might require reshaping or resizing the entire array. |

A specific type of table commonly used in R for statistical and data analysis is the data frame. It is a two-dimensional, table-like data structure in which each column can contain values of a different type (e.g., character, numeric or logical) and has its own name. Data frames are particularly useful for working with heterogeneous datasets, where different variables may have different data types.
Subsections 1.5.1 to 1.5.4 describe some useful table operations:
1.5.1 Binding. Columns can be bound together when two datasets, a dataset and a vector, or two vectors have the same number of values (or, for datasets, the same number of rows). They can then be placed together into one dataset using cbind() (base R) or bind_cols() (from the dplyr package). This is different from merging (which is discussed later in the study guide): binding places the data side by side by position, without matching rows on a key. Similarly, rows can be bound with rbind() or bind_rows(); when rbind() is applied to data frames, their column names must match. Let us create two data frames, df1 and df2, containing student IDs and module names, and perform some row and column binding operations.
> # create a data frame with student IDs and the economics modules
> df1 <- data.frame(student_id = 1:5,
+                   module = c(rep("Microeconomics", 2), rep("Macroeconomics", 3)))
> # print the data frame to the console
> df1
  student_id         module
1          1 Microeconomics
2          2 Microeconomics
3          3 Macroeconomics
4          4 Macroeconomics
5          5 Macroeconomics

Create another data frame, df2.

> # create another data frame with student IDs and additional economics modules
> df2 <- data.frame(student_id = 6:10,
+                   module = c(rep("Econometrics", 1), rep("Development Economics", 4)))
> # print the data frame to the console
> df2
  student_id                module
1          6          Econometrics
2          7 Development Economics
3          8 Development Economics
4          9 Development Economics
5         10 Development Economics

Combine df1 and df2 using the rbind() function.

> # combine the two data frames using rbind
> df_row_combined <- rbind(df1, df2)
> # print the combined data frame to the console
> df_row_combined
   student_id                module
1           1        Microeconomics
2           2        Microeconomics
3           3        Macroeconomics
4           4        Macroeconomics
5           5        Macroeconomics
6           6          Econometrics
7           7 Development Economics
8           8 Development Economics
9           9 Development Economics
10         10 Development Economics

Combine df1 and df2 using the cbind() function.

> # combine the two data frames using cbind
> df_bind_combined <- cbind(df1, df2)
> # print the combined data frame to the console
> df_bind_combined
  student_id         module student_id                module
1          1 Microeconomics          6          Econometrics
2          2 Microeconomics          7 Development Economics
3          3 Macroeconomics          8 Development Economics
4          4 Macroeconomics          9 Development Economics
5          5 Macroeconomics         10 Development Economics
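The text above notes that a dataset can also be bound to a vector, and that rbind() needs matching column names. The following base R sketch re-creates df1 from above; the year_of_study values and the extra student rows are hypothetical.

```r
df1 <- data.frame(student_id = 1:5,
                  module = c(rep("Microeconomics", 2), rep("Macroeconomics", 3)))

# bind a vector as a new column: it must have one value per row (5 here)
df1_years <- cbind(df1, year_of_study = c(1, 1, 2, 2, 2))
ncol(df1_years)      # 3

# bind a single new row: its column names must match those of df1
new_student <- data.frame(student_id = 11, module = "Public Finance")
df1_extended <- rbind(df1, new_student)
nrow(df1_extended)   # 6

# mismatched column names make rbind() stop with an error
bad_row <- data.frame(id = 12, module = "Labour Economics")
result <- tryCatch(rbind(df1, bad_row), error = function(e) "names do not match")
result               # "names do not match"
```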

1.5.2 Sorting. In R, data can be sorted either in alphabetical order, based on character values, or in numerical order, based on numeric values, and in either ascending or descending order. The sort() function sorts a vector; to reorder the rows of a data frame, the order() function is used instead, as shown in the second example below.
The following example shows how to create a vector of student IDs using the c() function and then sort it. The vector contains ten elements, each of which is a unique student ID:
> # create a vector of student IDs
> student_id <- c(1011, 1005, 1013, 1017, 1009, 1001, 1003, 1015, 1007, 1019)
> student_id
 [1] 1011 1005 1013 1017 1009 1001 1003 1015 1007 1019
> # sort the student IDs in ascending order (from lowest to highest)
> sorted_ids_ascending <- sort(student_id)
> sorted_ids_ascending
 [1] 1001 1003 1005 1007 1009 1011 1013 1015 1017 1019
> # sort the student IDs in descending order (from highest to lowest)
> sorted_ids_descending <- sort(student_id, decreasing = TRUE)
> sorted_ids_descending
 [1] 1019 1017 1015 1013 1011 1009 1007 1005 1003 1001

Here is another example, which creates a data frame of student marks and sorts it in descending order of marks. It also adds a result column indicating whether each student passed or failed, given that a mark of 50 or more is a pass:
> # create a data frame of student marks
> marks <- data.frame(
+   student_name = c("Tlou", "Jane", "Piers", "Sarah", "Thato"),
+   DSC2608_marks = c(80, 75, 44, 39, 92)
+ )
> # print the student marks data frame
> print(marks)
  student_name DSC2608_marks
1         Tlou            80
2         Jane            75
3        Piers            44
4        Sarah            39
5        Thato            92

> # add a column to indicate whether the student passed or failed
> marks$result <- ifelse(marks$DSC2608_marks >= 50, "pass", "fail")
> # sort the data frame from the highest to the lowest DSC2608_marks
> marks <- marks[order(-marks$DSC2608_marks), ]
> # print the sorted data frame
> print(marks)
  student_name DSC2608_marks result
5        Thato            92   pass
1         Tlou            80   pass
2         Jane            75   pass
3        Piers            44   fail
4        Sarah            39   fail

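The minus sign preceding marks$DSC2608_marks negates the marks. Because order() sorts in ascending
order by default, sorting the negated values puts the highest mark first, so the rows appear in
descending order of marks. The minus sign only works for numeric columns;
order(marks$DSC2608_marks, decreasing = TRUE) gives the same result and also works for character columns.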
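As a quick check of the explanation above, the following sketch recreates the marks data frame and compares the minus-sign approach with the decreasing argument of order(). The object names by_minus, by_flag and by_name_desc are illustrative and not part of the original example.

```r
# Recreate the marks data frame from the example above
marks <- data.frame(
  student_name = c("Tlou", "Jane", "Piers", "Sarah", "Thato"),
  DSC2608_marks = c(80, 75, 44, 39, 92)
)

# Descending order via the minus sign (numeric columns only)
by_minus <- marks[order(-marks$DSC2608_marks), ]

# Descending order via the decreasing argument (works for any column type)
by_flag <- marks[order(marks$DSC2608_marks, decreasing = TRUE), ]

# Both approaches select the same rows in the same order
print(identical(by_minus, by_flag))   # TRUE

# The minus sign cannot be applied to text, so use decreasing = TRUE instead
by_name_desc <- marks[order(marks$student_name, decreasing = TRUE), ]
print(by_name_desc)
```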
The following R code uses the kable() function from the knitr package to create a formatted table
of the students' DSC2608 marks and results. The knitr package improves the formatting of output to
make it easier to read. The align argument sets the alignment of each column (l = left, c = centre)
and col.names sets the column headings of the printed table. Refer to Section 1.6 for an explanation
of how to use packages in R.
> library(knitr)
> # create a table using kable()
> kable(marks, align = c("l", "c", "c"),
+       col.names = c("Student Name", "DSC2608 Marks", "Result"))

|   |Student Name | DSC2608 Marks | Result |
|:--|:------------|:-------------:|:------:|
|5  |Thato        |      92       |  pass  |
|1  |Tlou         |      80       |  pass  |
|2  |Jane         |      75       |  pass  |
|3  |Piers        |      44       |  fail  |
|4  |Sarah        |      39       |  fail  |
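The row names 5, 1, 2, 3 and 4 in the table above are left over from sorting. If they are not wanted, kable() accepts row.names = FALSE. The sketch below assumes the knitr package is installed and sets format = "pipe" explicitly so the output is a plain-text table; the object name tbl is illustrative.

```r
library(knitr)

# Recreate the sorted marks data frame with its result column
marks <- data.frame(
  student_name = c("Tlou", "Jane", "Piers", "Sarah", "Thato"),
  DSC2608_marks = c(80, 75, 44, 39, 92)
)
marks$result <- ifelse(marks$DSC2608_marks >= 50, "pass", "fail")
marks <- marks[order(-marks$DSC2608_marks), ]

# row.names = FALSE drops the leftover row numbers from the printed table
tbl <- kable(marks, format = "pipe", row.names = FALSE,
             align = c("l", "c", "c"),
             col.names = c("Student Name", "DSC2608 Marks", "Result"))
print(tbl)
```

A base R alternative is to reset the row names with rownames(marks) <- NULL before calling kable().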

1.5.3 Transformation. New columns can be created, or existing ones modified, by applying
transformations to them. Transformations include adding, subtracting, multiplying, dividing and
raising to a power, and functions such as log() and exp() can also be applied. Instead of using
ifelse() with the $ operator as above, an additional column, pass_fail, can be added to the marks
data frame using the mutate() function from the dplyr package.
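> library(dplyr)
> # start again from the original, unsorted marks data frame (without the result column)
> marks <- data.frame(
+   student_name = c("Tlou", "Jane", "Piers", "Sarah", "Thato"),
+   DSC2608_marks = c(80, 75, 44, 39, 92)
+ )
> # additional transformation using the mutate() function from the dplyr package
> marks <- marks %>%
+   mutate(pass_fail = ifelse(DSC2608_marks >= 50, "pass", "fail"))
> marks
  student_name DSC2608_marks pass_fail
1         Tlou            80      pass
2         Jane            75      pass
3        Piers            44      fail
4        Sarah            39      fail
5        Thato            92      pass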

The code uses the mutate() function from the dplyr package to add a new column named pass_fail to
the data frame, which indicates whether a student passed or failed the DSC2608 module based on their
DSC2608 mark. The pipe operator %>% passes the data frame on its left as the first argument of the
function on its right. The ifelse() function assigns "pass" to the new column when the mark is greater
than or equal to 50 and "fail" when it is less than 50.
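The transformations listed at the start of this section can all be written inside a single mutate() call. This is a minimal sketch, assuming dplyr is installed; the new column names (marks_plus_five, proportion and so on) are hypothetical and chosen only to label each kind of transformation.

```r
library(dplyr)

marks <- data.frame(
  student_name = c("Tlou", "Jane", "Piers", "Sarah", "Thato"),
  DSC2608_marks = c(80, 75, 44, 39, 92)
)

# Each new (illustrative) column shows one kind of transformation
marks_transformed <- marks %>%
  mutate(
    marks_plus_five = DSC2608_marks + 5,     # addition
    proportion      = DSC2608_marks / 100,   # division
    marks_squared   = DSC2608_marks^2,       # raising to a power
    log_marks       = log(DSC2608_marks),    # natural logarithm
    back_to_marks   = exp(log_marks)         # exp() reverses log()
  )
print(marks_transformed)
```

Columns created earlier in the same mutate() call, such as log_marks, can be used by later arguments, which is why back_to_marks can be computed directly.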
Using the same marks data frame, you can give the columns more descriptive names using the
rename() function from the dplyr package, which takes arguments in the form new_name = old_name.
The student_name column is renamed Student_Name, the DSC2608_marks column is renamed
DSC2608_Marks and the pass_fail column is renamed Results.
> library(dplyr)
> # rename the columns in the marks data frame to more descriptive names
> marks <- marks %>%
+   rename(Student_Name = student_name, DSC2608_Marks = DSC2608_marks,
+          Results = pass_fail)
> marks
  Student_Name DSC2608_Marks Results
1         Tlou            80    pass
2         Jane            75    pass
3        Piers            44    fail
4        Sarah            39    fail
5        Thato            92    pass
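The same renaming can also be done without dplyr. The sketch below uses base R's names() and match() functions; old_names and new_names are illustrative helper vectors, and the data frame is recreated so that the example runs on its own.

```r
# Recreate the marks data frame as it is just before renaming
marks <- data.frame(
  student_name = c("Tlou", "Jane", "Piers", "Sarah", "Thato"),
  DSC2608_marks = c(80, 75, 44, 39, 92)
)
marks$pass_fail <- ifelse(marks$DSC2608_marks >= 50, "pass", "fail")

# Base R: find the position of each old name and overwrite it with the new name
old_names <- c("student_name", "DSC2608_marks", "pass_fail")
new_names <- c("Student_Name", "DSC2608_Marks", "Results")
names(marks)[match(old_names, names(marks))] <- new_names
print(names(marks))
```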

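1.5.4 Filtering/Subsetting. Use a relational (comparison) operator, such as ==, >, <, <=, >= or !=, to
filter the data or find a subset. Note that the equality operator is two equals signs (==); a single = is
used for assignment and for naming function arguments. The result of a comparison is a logical vector
of TRUE and FALSE values, and only the elements or rows where the value is TRUE are kept.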
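To see what a comparison actually returns, the following sketch applies several of these operators to a plain numeric vector. The name dsc_marks is hypothetical; it holds the same five DSC2608 marks used in this section.

```r
dsc_marks <- c(80, 75, 44, 39, 92)

# Each comparison returns one TRUE/FALSE value per element
passed_flags <- dsc_marks >= 50
print(passed_flags)            # TRUE TRUE FALSE FALSE TRUE
print(dsc_marks == 75)         # FALSE TRUE FALSE FALSE FALSE
print(dsc_marks != 75)         # TRUE FALSE TRUE TRUE TRUE

# Using the logical vector as an index keeps only the TRUE positions
print(dsc_marks[passed_flags]) # 80 75 92

# TRUE counts as 1 and FALSE as 0, so sum() counts the passes
print(sum(passed_flags))       # 3
```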
You can use the filter() function from the dplyr package to keep only the rows of a data frame that
meet certain conditions. For instance, the following R code filters the marks data frame to create a
new data frame named passed that includes only the records of students who passed the DSC2608
module, based on the Results column. The filtered data frame is printed to the console using the
print() function:
> # filter the data to only include students who passed the DSC2608 module
> passed <- marks %>% filter(Results == "pass")
> print(passed)
  Student_Name DSC2608_Marks Results
1         Tlou            80    pass
2         Jane            75    pass
3        Thato            92    pass

Similarly, the following R code creates a new data frame named failed. The filter condition is based
on the Results column of the marks data frame, and only the records with the value "fail" in the
Results column are selected:
> # filter the data to only include students who failed the DSC2608 module
> failed <- marks %>% filter(Results == "fail")
> print(failed)
  Student_Name DSC2608_Marks Results
1        Piers            44    fail
2        Sarah            39    fail
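filter() can also combine several conditions. In this sketch, which assumes dplyr is installed and recreates the renamed marks data frame, conditions separated by a comma must all hold (logical AND), while | keeps a row if either condition holds (logical OR). The thresholds 90 and 40 are arbitrary values chosen for illustration.

```r
library(dplyr)

marks <- data.frame(
  Student_Name = c("Tlou", "Jane", "Piers", "Sarah", "Thato"),
  DSC2608_Marks = c(80, 75, 44, 39, 92),
  Results = c("pass", "pass", "fail", "fail", "pass")
)

# Comma-separated conditions must all be TRUE (logical AND)
passed_below_90 <- marks %>% filter(Results == "pass", DSC2608_Marks < 90)
print(passed_below_90)   # Tlou and Jane

# | keeps a row when either condition is TRUE (logical OR)
extremes <- marks %>% filter(DSC2608_Marks >= 90 | DSC2608_Marks < 40)
print(extremes)          # Sarah and Thato
```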

Next, create a vector called exam_marks that contains the values 87, 62, 55, 41 and 96, one exam mark
per student in the same row order as the marks data frame. Then use the pipe operator %>% with the
mutate() function to add it as a new column, Exam_marks, and assign the result to a new data frame
called updated_marks:
> # create a vector of exam marks
> exam_marks <- c(87, 62, 55, 41, 96)
> # add the new column to the marks data frame
> updated_marks <- marks %>% mutate(Exam_marks = exam_marks)
> # print the updated marks data frame
> updated_marks
  Student_Name DSC2608_Marks Results Exam_marks
1         Tlou            80    pass         87
2         Jane            75    pass         62
3        Piers            44    fail         55
4        Sarah            39    fail         41
5        Thato            92    pass         96
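Adding a vector as a column only makes sense if it has one value per row, listed in the same order as the rows. The sketch below checks this with stopifnot() and then adds the column with base R's $ assignment, which gives the same result as the mutate() call above. The data frame is recreated so the example is self-contained.

```r
marks <- data.frame(
  Student_Name = c("Tlou", "Jane", "Piers", "Sarah", "Thato"),
  DSC2608_Marks = c(80, 75, 44, 39, 92),
  Results = c("pass", "pass", "fail", "fail", "pass")
)
exam_marks <- c(87, 62, 55, 41, 96)

# The new column needs exactly one value per row
stopifnot(length(exam_marks) == nrow(marks))

# Base R equivalent of mutate(Exam_marks = exam_marks)
updated_marks <- marks
updated_marks$Exam_marks <- exam_marks
print(updated_marks)
```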

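Finally, the following R code creates a new data frame called pass_distinction, which contains only
the rows from the updated_marks data frame where Exam_marks is greater than or equal to 75 and the
Results column is equal to "pass". This is done with logical operators and indexing: the two conditions
are combined with & (logical AND), the resulting logical vector selects the rows, and the empty position
after the comma keeps all the columns.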
> # students who scored 75 or more on the exam and passed
> pass_distinction <- updated_marks[updated_marks$Exam_marks >= 75 &
+                                     updated_marks$Results == "pass", ]
> pass_distinction
  Student_Name DSC2608_Marks Results Exam_marks
1         Tlou            80    pass         87
5        Thato            92    pass         96
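The same subset can be obtained with the filter() function used earlier in this section. This sketch assumes dplyr is installed; pass_distinction_dplyr is an illustrative name.

```r
library(dplyr)

updated_marks <- data.frame(
  Student_Name = c("Tlou", "Jane", "Piers", "Sarah", "Thato"),
  DSC2608_Marks = c(80, 75, 44, 39, 92),
  Results = c("pass", "pass", "fail", "fail", "pass"),
  Exam_marks = c(87, 62, 55, 41, 96)
)

# In filter(), a comma between conditions plays the role of &
pass_distinction_dplyr <- updated_marks %>%
  filter(Exam_marks >= 75, Results == "pass")
print(pass_distinction_dplyr)
```

Judging by the filter() output shown earlier in this section, filter() renumbers the rows, so the row names here are expected to be 1 and 2 rather than the 1 and 5 kept by base R indexing.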

Complete Activity 1.3 in the Exercise Manual before you proceed to the next section.

1.6 Importing packages and datasets, viewing data

R packages are collections of R functions and datasets. Some standard packages are included with the
R installation, while others can be installed from CRAN by running the install.packages("package
name") function in the R or RStudio console. Some packages must be downloaded manually (for example,
from http://cran.r-project.org/ or another website) and then installed from the downloaded file. A
package needs to be installed only once, but it must be loaded in each new session by using the
following R syntax:
library("package name")
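Because installing is needed only once but loading is needed in every session, scripts often install a package only when it is missing. The following is a sketch of that pattern, using dplyr as an example package; requireNamespace() returns FALSE when a package is not installed.

```r
# Install the package only if it is missing (needed once per computer)
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Load the package (needed once per R session)
library(dplyr)

# List the packages attached in the current session
print(search())
```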
Data are often available in different formats, ready to be imported into R. R can read files in many
formats, for example, .txt, .csv and .xls/.xlsx. The following functions can be used to read frequently
used file types in R:
• Text files: use read.table() for space-separated files, or read.table() with sep = "," for comma-separated files
• CSV files: use read_csv() from the readr package (used by the RStudio Import Dataset interface)
• Excel files: use read_excel() from the readxl package (used by the RStudio Import Dataset interface)
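The sketch below writes the marks data to a temporary CSV file (so no existing file is touched) and reads it back with base R and, optionally, readr. The readr part runs only if that package is installed, and the file path comes from tempfile(), so the exact location will differ between computers.

```r
# A small example file, written to a temporary location
marks <- data.frame(
  student_name = c("Tlou", "Jane", "Piers", "Sarah", "Thato"),
  DSC2608_marks = c(80, 75, 44, 39, 92)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(marks, csv_path, row.names = FALSE)

# Base R: read.csv() is read.table() with header = TRUE and sep = "," preset
marks_csv <- read.csv(csv_path)
marks_table <- read.table(csv_path, sep = ",", header = TRUE)
print(marks_csv)

# readr alternative (returns a tibble), run only if readr is installed
if (requireNamespace("readr", quietly = TRUE)) {
  marks_readr <- readr::read_csv(csv_path)
  print(marks_readr)
}
```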
There are several ways to view a dataset or get an overview of it. Firstly, you can open the entire
dataset in a spreadsheet-style viewer by clicking on its name in the global environment (the
Environment pane in RStudio) or by using the View(data_name) function.
A specific column can be extracted by using the data_name$column_name command. The first k rows or
the last k rows can be extracted by using the head(data_name, k) or tail(data_name, k) command,
respectively.
A quick overview of the dataset can be obtained with the summary(data_name) command, which reports
summary statistics (such as the minimum, mean and maximum) for each column, or the str(data_name)
command, which shows the number of observations and variables and the data type of each column.
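The commands above can be tried on the small marks data frame used in Section 1.5. View() is left out of this sketch because it opens an interactive window rather than printing to the console.

```r
marks <- data.frame(
  student_name = c("Tlou", "Jane", "Piers", "Sarah", "Thato"),
  DSC2608_marks = c(80, 75, 44, 39, 92)
)

print(head(marks, 2))       # first 2 rows: Tlou and Jane
print(tail(marks, 2))       # last 2 rows: Sarah and Thato
print(marks$DSC2608_marks)  # one column extracted as a vector
str(marks)                  # 5 obs. of 2 variables, with their types
print(summary(marks))       # min, quartiles, mean and max of numeric columns
```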
Complete Activity 1.4 in the Exercise Manual before you proceed with the formal assessment for
Learning Unit 1.
Now that you have reached the end of Learning Unit 1, complete Assessment 1 as outlined in the
Activities section on the module site and ensure that you submit the completed assessment for formal
evaluation.
