
BA - Unit 4

This lesson covers data structures in R, including vectors, matrices, lists, factors, and data frames, along with their operations and applications. It aims to equip learners with the skills to effectively organize and manipulate data using R's built-in functions and control flows. The lesson emphasizes the importance of selecting appropriate data structures for various programming tasks to enhance code efficiency and clarity.


LESSON 4
Data Structures in R
Ms. Asha Yadav
Assistant Professor
Department of Computer Science
School of Open Learning
University of Delhi

STRUCTURE
4.1 Learning Objectives
4.2 Introduction
4.3 Vectors
4.4 Matrices
4.5 Lists
4.6 Factors
4.7 Data Frames
4.8 Conditionals and Control Flows
4.9 Loops
4.10 Apply Family
4.11 Summary
4.12 Answers to In-Text Questions
4.13 Self-Assessment Questions
4.14 References
4.15 Suggested Readings

4.1 Learning Objectives


By the end of this chapter, you should be able to:
Understand and work with vectors, matrices, arrays, lists, factors, and data frames in R.
Use conditionals and control flows to add logic to your programs.
Implement loops for repeated code and enhance efficiency by using the apply family of functions.

Department of Distance & Continuing Education, Campus of Open Learning,
School of Open Learning, University of Delhi

Identify when and how to select among various data structures and control mechanisms.
Write cleaner, more efficient R code using the strength of functional programming.

4.2 Introduction
In this lesson we will discuss data structures and their use in organizing and processing data more efficiently. You may think of a data structure as a blueprint that indicates how to arrange and store data. The design of each structure is deliberate, as it allows data to be accessed and manipulated in specific, structured ways. In programming and statistical software like R, we use specialized methods or functions to interact with these structures. These tools make it easier to work with data of all shapes and forms. R offers six key data structures to work with: vectors, matrices, arrays, lists, factors, and data frames.
These can be further divided into two categories: homogeneous and heterogeneous structures. The first three (vectors, matrices, and arrays) are like neat, organized boxes in which everything is of the same type; hence they are called homogeneous. Heterogeneous structures, on the other hand, are data frames and lists, which allow greater flexibility: elements of various types can coexist in them. A factor is a special data structure used for handling categorical data (nominal or ordinal). In the subsequent sections we will discuss these data structures.
A point to remember for those who are already familiar with programming: R has no scalar types. Numbers, strings, and other scalars are simply vectors of length one.

4.3 Vectors
The vector is one of the basic data structures in the R programming language. It is used to store multiple values of the same type, also called mode. It is one-dimensional and can hold numeric, character, logical, or other values, but all the values must have the same mode. Vectors are fundamental to R; hence most operations are performed on vectors. Various types of vectors are shown in Table 4.1 below:



BUSINESS ANALYTICS

Table 4.1: Types of Vectors

Creating a Vector
You can create vectors using the c() function, which stands for combine or concatenate. Vectors are stored contiguously in memory, just like arrays in C; hence the size of a vector is determined at the time of creation. Thus, any modification such as adding or deleting elements leads to reassignment (internally, a new vector is created under the same name). Code to create and display a few vectors is shown below in code window 1.

Code Window 1
Another point to note is that the c() function also allows you to modify or reassign an existing vector, as shown in code window 2.

Code Window 2
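A minimal sketch of creating vectors with c() and of using c() to build a modified vector. All names and values here are assumptions chosen for illustration, and the results are stored under new names so that each step stays visible.

```r
# Hypothetical example vectors (names and values are illustrative)
num_vec <- c(1.5, 2.5, 3.5)        # numeric
chr_vec <- c("a", "b", "c")        # character
lgl_vec <- c(TRUE, FALSE, TRUE)    # logical

v1 <- c(1, 2, 3, 4, 5)
v_end <- c(v1, 10)                 # 10 appended at the end
v_4th <- c(v1[1:3], 10, v1[4:5])   # 10 inserted at the 4th place
print(v_end)
print(v_4th)
```

Assigning the result back to v1 (for example, v1 <- c(v1, 10)) would replace the original vector instead of keeping both versions.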

This will add 10 to the vector v1 either at the end or at the 4th place, as instructed.
Vectors are useful for analysis because R supports many operations on them. In this section we will explore the operations that can be used on vectors.
Length: We can obtain the length of a vector using the length() function. This is useful when iterating over a vector in a loop.

This will return 5 as output.


Indexing and Subset: We can use indexing to refer to a particular element of a vector, and we can also extract subsets using indexing. Note that vector indices start from 1 instead of 0, and a subset range is inclusive at both ends.

This will give 3 and (3,23,4) as output. You can also give a negative index to omit a value; for example, print(v1[-2]) will output all values except the second one.
You can also filter vectors by applying logical expressions that return TRUE/FALSE for each element; the output consists of the elements for which the expression is TRUE.

This will give output 30 40 50 and 20 30 40.
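A sketch that is consistent with the outputs described above. The vectors v1 and v2 and their values are assumptions chosen so that the results match the text.

```r
v1 <- c(7, 3, 23, 4, 15)      # hypothetical values
print(v1[2])                  # single element
print(v1[2:4])                # inclusive range of positions 2 to 4
print(v1[-2])                 # everything except the second element

v2 <- c(10, 20, 30, 40, 50)   # hypothetical values
print(v2[v2 > 20])            # keep elements greater than 20
print(v2[v2 > 10 & v2 < 50])  # combine two conditions with &
```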


Element-wise Operations: We can apply simple operations to all the elements of a vector.

This will give (3 4 5 6 7), (2 4 6 8 10), (-1 0 1 2 3) as output respectively. We can apply all operators, such as arithmetic, logical, and relational operators, in an element-wise fashion. The various operators are
already discussed in lesson 3.
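A sketch of element-wise operations that matches the outputs quoted above, assuming a hypothetical vector v holding the values 1 to 5:

```r
v <- c(1, 2, 3, 4, 5)   # hypothetical vector
print(v + 2)            # add 2 to every element
print(v * 2)            # double every element
print(v - 2)            # subtract 2 from every element
print(v > 3)            # relational operators also work element-wise
```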
Vectorized Functions: R offers many built-in functions that can be applied to a vector as a whole (rather than element-wise) and return a result computed over the whole vector, as shown in Table 4.2.
Table 4.2: Vectorized Functions

Combining and Modifying Vectors: Apart from applying operations on a single vector, we can also apply functions to two or more vectors, as shown in Table 4.3.
Table 4.3: Combined Vector Operations

Note: One important point when applying an operation to two vectors is that such operations expect both vectors to be the same length. In case of a length mismatch, R automatically recycles, or repeats, the shorter vector until it is long enough to match the longer one, as shown in code window 3. R issues a warning if the longer length is not a multiple of the shorter length.


Code Window 3
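A minimal sketch of recycling, assuming two hypothetical vectors whose lengths are 4 and 2:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20)
recycled_sum <- a + b   # b is reused as 10, 20, 10, 20
print(recycled_sum)
```

Because 4 is a multiple of 2, no warning is produced here.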
Miscellaneous Functions: There are certain functions shown below
in Table 4.4 which can be used with vectors, as required.
Table 4.4: Miscellaneous Functions

Thus, vectors in R support a wide range of operations, from simple arithmetic to advanced indexing and subsetting. Because operations on vectors are vectorized, you can apply them directly to entire vectors without writing loops, which makes code more efficient and concise.

4.4 Matrices
Now that you have understood vectors and the various operations that can be applied to them, let’s talk about matrices. You can think of a matrix as an enhanced vector: it is really nothing but a vector with two extra attributes, namely the number of rows and the number of columns. As with vectors, matrices are also homogeneous. However, do not mix up one-row or one-column matrices with vectors; they are not the same.
Matrices are actually a special case of a broader concept in R called arrays. While matrices have just two dimensions (rows and columns), arrays can have more. For instance, a three-dimensional array has rows, columns, and layers, adding an extra level of organization to your data. Matrices are useful in R because of the wide range of operations that you can carry out on them. Many of these operations build on what you already know about vectors, such as subsetting and vectorization, and extend them to two dimensions. The added structure of rows and columns makes matrices ideal for mathematical operations, data manipulation, and statistical modelling. The various operations on matrices are discussed below:
Creation: Matrices are generally created using the matrix() function; the data in a matrix is stored in column-major order by default. The ‘nrow’ parameter specifies the number of rows, and ‘ncol’ specifies the number of columns. We can use ‘byrow = TRUE’ to fill the matrix row-wise instead of column-wise. Code to create a matrix using the matrix() function and by using vectors is shown below in code window 4.

Code Window 4
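A minimal sketch of the two ways of creating a matrix described above. The variable names and values are assumptions for illustration.

```r
m_col  <- matrix(1:6, nrow = 2, ncol = 3)                # filled column-wise (default)
m_row  <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled row-wise
m_bind <- rbind(c(1, 2, 3), c(4, 5, 6))                  # built from two vectors as rows
print(m_col)
print(m_row)
print(m_bind)
```

The first row of m_col is 1 3 5, while the first row of m_row is 1 2 3, which shows the effect of byrow.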

We can also add or delete rows and columns from matrices as shown below in code windows 5 and 6.

Code Window 5

Code Window 6

R provides several operations for matrices, including addition, multiplication, and scalar operations, as shown in code window 7.

Code Window 7
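A sketch of common matrix operations, assuming two hypothetical 2 × 2 matrices A and B. It contrasts the element-wise operator * with the matrix product operator %*%.

```r
A <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
B <- matrix(5:8, nrow = 2)   # columns are (5, 6) and (7, 8)
print(A + B)                 # element-wise addition
print(A * B)                 # element-wise multiplication
print(A %*% B)               # matrix multiplication
print(2 * A)                 # scalar multiplication
print(rowSums(A))            # sum of each row
print(colMeans(A))           # mean of each column
```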

As you may have noticed in code window 7, element-wise multiplication (*) and matrix multiplication (%*%) are two different operations. Some other useful functions are rowSums() and colSums(), which give the sums of rows/columns, and rowMeans() and colMeans(), which give the means of rows/columns.
Just as with vectors, indexing and subsetting can be done on matrices. You can access specific elements, rows, or columns using indices, as shown in code window 8.

Code Window 8

You can also assign values to submatrices, for example mat[c(1,3),] <- matrix(c(1,1,8,12),nrow=2), which assigns new values to the first and third rows of the matrix. A negative index excludes that element; for example, mat[-2,] omits the second row from the output.
Matrix filtering is a powerful operation, just as it is for vectors; it enables efficient subsetting and selection of data from a matrix based on logical criteria. Some examples are shown below in code window 9.

Code Window 9
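A sketch of matrix indexing, submatrix assignment, and filtering. It assumes a hypothetical 3 × 2 matrix mat, chosen so that the assignment quoted in the text applies to it.

```r
mat <- matrix(1:6, nrow = 3)                        # hypothetical 3 x 2 matrix
print(mat[2, 1])                                    # single element
print(mat[, 2])                                     # whole second column
mat[c(1, 3), ] <- matrix(c(1, 1, 8, 12), nrow = 2)  # overwrite rows 1 and 3
print(mat[-2, ])                                    # omit the second row
print(mat[mat > 4])                                 # all elements greater than 4
print(mat[mat[, 2] > 5, ])                          # rows whose 2nd column exceeds 5
```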

You can also give names to the rows and columns of a matrix using the dimnames() function or by specifying them during the creation of the matrix (as shown in code window 10).

Code Window 10

Arrays
An array in R is a data structure that can store data in more than two dimensions; in this sense, arrays are an extension of matrices. While a matrix is constrained to two dimensions, with rows and columns, an array can have three or more dimensions. Arrays are useful for organizing and manipulating data that has more than two axes, such as 3D spatial data or multi-dimensional experimental results.
An array can be created using the array() function with arguments for the data, the dimensions, and the dimension names, as shown in code window 11.

Code Window 11
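A minimal sketch of creating and indexing a three-dimensional array. The dimension sizes and names are assumptions for illustration.

```r
arr <- array(1:24, dim = c(2, 3, 4),
             dimnames = list(c("r1", "r2"),
                             c("c1", "c2", "c3"),
                             paste0("layer", 1:4)))
print(dim(arr))                    # 2 rows, 3 columns, 4 layers
print(arr[1, 2, 3])                # row 1, column 2, layer 3
print(arr["r2", "c3", "layer1"])   # the same kind of access by name
```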

Array elements can be accessed in the same manner as those of vectors or matrices. We can also name the dimensions.

Code Window 12

We can reshape an array's dimensions as shown.

Code Window 13

4.5 Lists
In R, a list is an extremely flexible data structure: it can store any kind of data together, including numbers, characters, vectors, matrices, and even other lists. This flexibility distinguishes lists from vectors and matrices, which require all elements to be of the same type. A list is useful for organizing complex data where different types may coexist. In R, lists are used frequently, not only for storing results from statistical models but also in general for organizing heterogeneous data:
You create a list by using the “list()” function, and individual elements of the list are accessed using double square brackets “[[ ]]”. For instance, “list(42, "Hello", c(1, 2, 3))” generates a list that has a number, a string, and a vector.


Code Window 14
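A sketch of creating a list and accessing its elements, using the example list from the text. The named list is an added assumption that illustrates access by name.

```r
my_list <- list(42, "Hello", c(1, 2, 3))
print(my_list[[2]])        # double brackets return the element itself
print(my_list[[3]][2])     # second value of the vector stored at position 3
print(class(my_list[2]))   # single brackets return a sub-list

named <- list(id = 42, greeting = "Hello")   # hypothetical named list
print(named$greeting)                        # access by name with $
```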

Indexing, subsetting, and accessing the elements of a list are shown in code window 15.

Code Window 15

We can find the size of a list using length(); we can also add or delete elements.

Code Window 16
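A minimal sketch of measuring a list and adding or deleting elements, assuming a small hypothetical list lst:

```r
lst <- list(a = 1, b = "two")
lst$c <- c(3, 4)              # add a new element named c
n_after_add <- length(lst)
lst$a <- NULL                 # delete an element by assigning NULL
print(n_after_add)
print(names(lst))
print(length(lst))
```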

4.6 Factors
Factors are another type of R object. A factor is created from a vector and stores the vector's values along with a record of the distinct values in that vector, called levels. Factors are mainly used for nominal or categorical data.

Code Window 17
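A sketch of creating a factor with 8 values and 3 levels, as the text describes. The underlying values are assumptions for illustration.

```r
x <- c(5, 12, 13, 12, 5, 13, 12, 5)   # hypothetical values
fac <- factor(x)
print(fac)
print(length(fac))    # number of stored values
print(levels(fac))    # distinct values, stored as character labels
```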

As shown in code window 17, the factor fac has 8 values but only 3 distinct levels. Levels are very useful, as shown in code window 18:

Code Window 18
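A sketch of the three cases discussed next, assuming the same hypothetical values as before. suppressWarnings() is used only to silence the "invalid factor level" warning that R raises in case 2.

```r
fac <- factor(c(5, 12, 13, 12, 5, 13, 12, 5))

fac[2] <- 5                        # case 1: 5 is an existing level
case1 <- as.character(fac[2])

suppressWarnings(fac[2] <- 15)     # case 2: 15 is not a level, so NA is stored
case2_is_na <- is.na(fac[2])

fac3 <- factor(c(5, 12, 13), levels = c(5, 12, 13, 15))
fac3[2] <- 15                      # case 3: 15 was declared as a level up front
print(case1)
print(case2_is_na)
print(fac3)
```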

Here in the code window, case 1 shows that we tried to assign a value to index 2 of the factor, and the assignment succeeded because the value belonged to a predefined level. In case 2, NA was assigned to index 2 of the factor instead of 15 because 15 was not present among the factor levels. In case 3 we anticipated a new level that was not present in the initial vector and supplied it in the factor definition. Thus, illegal values cannot be assigned to factors.
Two functions commonly used with factors are split() and by(). As the name suggests, the split() function is used to divide an object (such as a vector, data frame, or list) into subsets based on a grouping factor. It is particularly useful when you want to break down your data into smaller groups according to a factor (like a categorical variable).

Code Window 19
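A minimal sketch of split(), assuming a hypothetical numeric vector data and a grouping factor grp with levels A, B, and C:

```r
data <- c(10, 20, 30, 40, 50, 60)                  # hypothetical values
grp  <- factor(c("A", "B", "C", "A", "B", "C"))    # hypothetical grouping
groups <- split(data, grp)                         # list with one element per level
print(groups)
```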

As shown in the code above, the vector data is split into groups A, B, and C according to the corresponding factor. The by() function, in contrast, is used to apply a function to subsets of a data object that have been grouped by a factor. It is used in scenarios where you want to perform operations like calculating the mean, sum, or other statistical measures for each group, as shown in code window 20.

Code Window 20
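A sketch of by() computing a per-group mean. The data frame df and its columns are assumptions for illustration.

```r
df <- data.frame(score = c(10, 20, 30, 40, 50, 60),
                 grp   = factor(c("A", "B", "C", "A", "B", "C")))
# Apply a function to the rows belonging to each level of grp
group_means <- by(df, df$grp, function(d) mean(d$score))
print(group_means)
```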

Thus, the “split()” function in R splits an object into subsets based on a factor and returns the grouped data without applying any function to those subsets. On the other hand, the “by()” function is used when you want to apply a function, such as “mean” or “sum”, to each group formed by a factor. Both functions are essential for working with grouped data in R, allowing users to organize and analyze data based on categorical variables.

4.7 Data Frames


A data frame is a two-dimensional, tabular data structure commonly used for storing and manipulating data. It is very similar to a table or spreadsheet, where each column can store data of a different type (numeric, character, logical) and each row is an observation or record. Data frames are flexible and allow easy access to subsets of data, modification of values, and application of functions across columns or rows. They are created using the “data.frame()” function and are the default structure for most data analysis tasks in R, especially statistical modeling, data visualization, and manipulation. The most important feature of data frames is that they keep the integrity of the data intact. All columns within a data frame are of the same length, and meaningful column names can be assigned for easy interpretation and management of data.
Data frame creation is shown in code window 21.

Code Window 21
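A minimal sketch of building a data frame d from two vectors name and age, following the description in the text. The values are assumptions for illustration.

```r
name <- c("Anil", "Bina", "Chetan")   # hypothetical values
age  <- c(28, 35, 42)
d <- data.frame(name, age, stringsAsFactors = FALSE)
print(d)
str(d)   # name is kept as character, age as numeric
```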

As we can see in the code above, data frame d is created with the vectors name and age. The last parameter (stringsAsFactors) specifies whether you want string vectors to be treated as factors or not; this parameter defaulted to TRUE before R 4.0.0 and defaults to FALSE from R 4.0.0 onward.
Elements of a data frame can be accessed in several ways, depending on whether you want to select columns, rows, or specific cells.

Code Window 22
Subsets can be extracted from data frames based on row and column selection, using logical conditions, or by using the subset() function, as shown in code window 23.

Code Window 23
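A sketch of accessing and subsetting a data frame, assuming the same hypothetical data frame d as before:

```r
d <- data.frame(name = c("Anil", "Bina", "Chetan"),
                age  = c(28, 35, 42),
                stringsAsFactors = FALSE)
print(d$age)                  # a column by name
print(d[2, ])                 # the second row
print(d[3, "name"])           # a single cell
older <- d[d$age > 30, ]      # rows selected by a logical condition
same  <- subset(d, age > 30, select = c(name, age))   # the same with subset()
print(older)
print(same)
```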

Data frames can handle missing values as well; NA (Not Available) is used to represent missing or undefined data in R.

Code Window 24
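A minimal sketch of working with NA values in a data frame. The data is an assumption for illustration.

```r
d <- data.frame(name = c("Anil", "Bina", "Chetan"),
                age  = c(28, NA, 42))
print(is.na(d$age))                 # locate missing values
print(mean(d$age, na.rm = TRUE))    # ignore NA when averaging
complete <- na.omit(d)              # drop rows that contain NA
print(complete)
```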

We can use rbind() or cbind() to combine two data frames row-wise or column-wise, provided they have the same columns in the case of rbind() and the same number of rows in the case of cbind(). We can also use the merge() function to combine two data frames by matching rows based on common columns.

Code Window 25
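A sketch of the three ways of combining data frames. All data frames and column names are assumptions for illustration.

```r
d1 <- data.frame(id = 1:2, name = c("Anil", "Bina"))
d2 <- data.frame(id = 3:4, name = c("Chetan", "Divya"))
rows <- rbind(d1, d2)                          # same columns, stacked row-wise
cols <- cbind(d1, city = c("Delhi", "Pune"))   # same number of rows, extra column
ages <- data.frame(id = c(1, 3), age = c(28, 42))
joined <- merge(rows, ages, by = "id")         # keeps only ids present in both
print(rows)
print(cols)
print(joined)
```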

IN-TEXT QUESTIONS

1. What is the primary difference between a matrix and an array in R?
2. Write an R code snippet to create a 3x3 matrix.
3. How do lists differ from vectors in R?
4. What makes a data frame unique compared to a matrix?

4.8 Conditionals and Control Flows


Decision making refers to the process of choosing among several alternative courses of action based on certain conditions or criteria. It allows programs to make choices based on logical evaluations and is typically implemented with control structures like “if”, “else”, and “switch”. These structures allow the program to execute different blocks of code depending on whether certain conditions are true or false, thus controlling the flow of execution. Decision making is essential in creating dynamic and responsive applications that can adapt to changing inputs or situations, so that the program behaves correctly under all circumstances. The decision-making process is depicted in the flow chart below:

Figure 4.1
There are three decision-making constructs in R programming: if, if…else, and switch.
The if statement in R is the simplest form of decision making. It evaluates a condition; if the condition is TRUE, the code block inside the if is executed, and if it is FALSE, the block is skipped. The syntax is shown below:

As shown in code window 26, a block is executed only when its Boolean condition is true.

Code Window 26
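A minimal sketch of the if statement. The variable x and the threshold are assumptions for illustration.

```r
x <- 10   # hypothetical value
reached <- FALSE
if (x > 5) {
  reached <- TRUE               # runs only because the condition is TRUE
  print("x is greater than 5")
}
if (x > 50) {
  print("x is greater than 50") # skipped: the condition is FALSE
}
```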

If you want some instructions to execute when a condition is true and others when it is false, then if..else can be used. For example: I will go out if it is raining; otherwise I won’t. The structure of if..else is shown below:

Code window 27 shows if..else code in R.


Code Window 27

If we need to check multiple conditions, an if..else if..else ladder can be used. The syntax is given below:

For example, if we need to print grades based on marks, the code is given in code window 28.

Code Window 28
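A sketch of an if..else if..else ladder that assigns a grade from marks. The function name grade_for and the grade boundaries are assumptions for illustration.

```r
# Hypothetical grading rule: boundaries are illustrative only
grade_for <- function(marks) {
  if (marks >= 90) {
    "A"
  } else if (marks >= 75) {
    "B"
  } else if (marks >= 50) {
    "C"
  } else {
    "F"
  }
}
print(grade_for(82))
```

Only the first branch whose condition is TRUE is executed, so the order of the conditions matters.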

In addition to the if..else if..else ladder, a switch statement can also be used. It lets you check whether a variable matches any value from a list. Each possible value is called a “case,” and the variable is compared against these cases to find a match. Switch statements can be very straightforward and efficient for handling multiple conditions. Some rules of switch are:
If the variable being tested isn’t a character string, it’s automatically converted to an integer.
You can have as many cases as you want. Each case is written as an argument to switch(), usually in the form name = value, and cases are separated by commas.
If the variable is an integer between 1 and “nargs() - 1” (the maximum number of arguments), the matching case’s value is evaluated, and its result is returned.
If the variable is a character string, it is matched exactly against the case names.
If there are several matches, only the first one is used.
There is no default keyword.
If there’s no match but there is an unnamed case (an unnamed argument in “...”), then its value is returned. If there is more than one unnamed case, an error is raised.
Syntax and example of switch.

Code Window 29
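A sketch of switch() with character and integer selectors. The function describe_day and its labels are assumptions for illustration.

```r
describe_day <- function(day) {
  switch(day,
         "sat" = ,                  # empty value falls through to the next case
         "sun" = "weekend",
         "mon" = "start of week",
         "other weekday")           # unnamed value is used when nothing matches
}
print(describe_day("sat"))
print(describe_day("wed"))
print(switch(2, "first", "second", "third"))   # an integer selects by position
```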

4.9 Loops
Like any other programming language, R has loops. They are basic constructs that allow a block of code to be executed repeatedly. R implements several kinds of loops: for, while, and repeat. Each loop type is suited to different tasks, depending on the kind of control flow needed. We will discuss the code and syntax of each of these loops in this section.
For Loop: It is used to iterate over the elements of an iterable object, such as a vector, list, or sequence, using a loop control variable. The code of a for loop is given in code window 30.

Code Window 30
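A minimal sketch of a for loop over a vector. The vector v is an assumption; the values visited are also collected in seen so the effect of the loop can be inspected afterwards.

```r
v <- c(10, 20, 30)   # hypothetical vector
seen <- c()
for (value in v) {
  print(value)             # print the current element
  seen <- c(seen, value)   # record it
}
```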

The above code iterates over a vector and prints its elements one by one. We can write code to iterate over other data structures in the same manner.
Like the for loop, the while loop repeatedly executes a block of code, in this case for as long as its condition remains TRUE. Here, however, the loop control variable needs to be initialized outside the loop. Code using a while loop to print the sum of 5 numbers is shown below in code window 31; the iteration variable is incremented inside the loop.


Code Window 31
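A sketch of a while loop that sums five numbers. It assumes the numbers are 1 to 5, which the text does not specify.

```r
i <- 1          # loop control variable, initialized outside the loop
total <- 0
while (i <= 5) {
  total <- total + i
  i <- i + 1    # incremented inside the loop
}
print(total)
```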

The third type of iterative statement, repeat, loops indefinitely until explicitly stopped using a break statement.

Code Window 32
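A minimal sketch of a repeat loop. The counter and the stopping value are assumptions for illustration.

```r
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break   # without this break the loop would never end
}
print(count)
```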

We can also have nested loops for complex operations where iteration is needed at several levels. For example, if you want to print columns for each row, nested code is shown in code window 33.

Code Window 33
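A sketch of nested loops that compute the product of i and j for every pair. The ranges 1 to 3 and the matrix products are assumptions for illustration.

```r
products <- matrix(0, nrow = 3, ncol = 3)
for (i in 1:3) {          # outer loop: one pass per row
  for (j in 1:3) {        # inner loop: one pass per column
    products[i, j] <- i * j
    cat(i, "x", j, "=", i * j, "\n")
  }
}
```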

Here, the outer loop takes each value of i, and for every value of i, the inner loop takes each value of j. This structure ensures that one calculation, the product of i and j, is performed for each pair of values taken by i and j. The result of this calculation is then printed. This pattern is typically applied in tasks that require the calculation of tables, pairwise comparisons, or generally any combinatorial operation involving several variables.
The next and break statements can be used to control a loop: next skips the current iteration and moves to the next one, while break terminates the loop entirely, as seen in the repeat loop. Code is given in code window 34.


Code Window 34
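A minimal sketch of next and break inside a for loop. The range and the two conditions are assumptions for illustration.

```r
kept <- c()
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 7) break        # stop the loop once i exceeds 7
  kept <- c(kept, i)
}
print(kept)
```

The loop keeps the odd numbers up to 7 and ends when it reaches 9.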

Thus, in R, loops are very helpful when automating repetitive tasks. The “for” loop iterates over the elements of a sequence, such as a vector or list, executing a block of code for each element. The “while” loop continues to execute as long as a specified condition is “TRUE”, which makes it a good choice for tasks where the number of iterations is not known beforehand. A “repeat” loop runs endlessly until it is stopped by a break statement, which must be provided; it is ideal when the stopping condition is more complex to express.
Although loops are very general, R’s vectorized operations and apply-family functions are often faster alternatives for handling large datasets or simple operations, so they are generally preferred.
IN-TEXT QUESTION
5. Write an R code snippet using an if-else statement to check if a number is even or odd.


4.10 Apply Family
The apply family in R includes functions like apply, lapply, sapply, vapply, tapply, mapply, and rapply. It is a very useful and powerful feature of R. These functions provide alternatives to loops for applying functions across various data structures like vectors, matrices, arrays, lists, factors, and data frames. They are generally more concise and can improve code readability; they can also improve performance, since explicit loops can be slower than vectorized operations. In this section we will discuss these functions one by one, along with code.
apply() is used to operate on the margins of a matrix or array. It applies a given function along the rows or columns of a matrix or higher-dimensional array. The syntax is apply(X, MARGIN, FUN), where X is a matrix or array, MARGIN specifies the dimension to operate over (1 for rows, 2 for columns), and FUN is the function to apply.

Code Window 35
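A sketch of apply() on a 3 × 3 matrix, assuming the matrix holds the values 1 to 9:

```r
m <- matrix(1:9, nrow = 3)        # hypothetical 3 x 3 matrix
row_totals <- apply(m, 1, sum)    # MARGIN = 1: apply sum to each row
col_totals <- apply(m, 2, sum)    # MARGIN = 2: apply sum to each column
print(row_totals)
print(col_totals)
```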

In this code, the apply function is used over a 3 × 3 matrix to calculate the sums of rows and columns.
lapply() is used to apply a function to each element of a list, and it returns a list. The code for lapply() is given in code window 36.

Code Window 36
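A minimal sketch of lapply(), assuming a hypothetical list nums of two integer vectors:

```r
nums <- list(a = 1:3, b = 4:6)
squares <- lapply(nums, function(x) x^2)   # the result is always a list
print(squares)
```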

The sapply() function works like lapply(), but it attempts to simplify the output into a vector or matrix when possible.

Code Window 37
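A minimal sketch of sapply() on the same kind of hypothetical list; because each result is a single number, the output is simplified to a named vector:

```r
nums <- list(a = 1:3, b = 4:6)
totals <- sapply(nums, sum)   # simplified to a named vector instead of a list
print(totals)
```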

vapply() is also like lapply() and sapply(), but it lets you specify the expected output type for better reliability.

Code Window 38
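A minimal sketch of vapply(), where FUN.VALUE declares that each call must return exactly one numeric value. The list nums is an assumption for illustration.

```r
nums <- list(a = 1:3, b = 4:6)
means <- vapply(nums, mean, FUN.VALUE = numeric(1))
print(means)
```

If the function returned anything other than a single number, vapply() would stop with an error rather than silently changing the shape of the result.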

tapply() applies a function to subsets of a vector, defined by a factor or a list of factors. It takes three input parameters: the data vector, the factor(s) to group by, and the function to apply.

Code Window 39
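A sketch of tapply() computing group totals. The vectors sales and region are assumptions for illustration.

```r
sales  <- c(100, 200, 150, 80)                            # hypothetical values
region <- factor(c("North", "South", "North", "South"))   # hypothetical groups
totals <- tapply(sales, region, sum)                      # one sum per region
print(totals)
```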

mapply() can be used to apply a function element-wise over multiple arguments (a multivariate version of sapply()); code is shown below.

Code Window 40
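A minimal sketch of mapply(), assuming two hypothetical vectors that are paired element by element:

```r
# The function receives the 1st elements together, then the 2nd, and so on
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
print(sums)
```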

If you want to recursively apply a function to the elements of a list, you can use rapply(); it can also handle nested lists.

Code Window 41
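A sketch of rapply() squaring the numeric elements of a nested list. The list nested is an assumption for illustration; its character element is left untouched because it does not match the requested classes.

```r
nested <- list(a = 1:3, b = list(c = 4, d = "text"))   # hypothetical nested list
flat <- rapply(nested, function(x) x^2,
               classes = c("integer", "numeric"), how = "unlist")
kept <- rapply(nested, function(x) x^2,
               classes = c("integer", "numeric"), how = "replace")
print(flat)   # a flat vector of the squared values
print(kept)   # the original nested structure with squared numbers
```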

In this code we have passed a nested list to rapply(), and x^2 is applied to each element of the list. The classes argument restricts the function to elements of specific classes, and the how argument controls the structure of the result, for example “unlist” for a vector or “replace” for a nested list.
Table 4.5 shows the various functions of the apply family.

Table 4.5 Apply Family Functions

4.11 Summary
In this chapter we have covered some of the basic building blocks in R that serve as the foundation for manipulating data and controlling programs. Vectors are one-dimensional structures that hold elements of the same type, whereas matrices extend this concept to two dimensions, and arrays generalize further to n dimensions. Lists, however, are containers that can hold elements of different types, making them very versatile. Factors are used to represent categorical data in a statistically efficient manner. Data frames, a hybrid structure that combines the features of lists and matrices, are ideal for organizing tabular data. To add logic to your programs, tools like “if”, “else”, and “switch” provide decision-making capabilities. For repetitive operations, loops like “for”, “while”, and “repeat” are necessary; however, the apply family of functions provides more efficient alternatives, enabling concise and functional programming. This chapter has laid a solid foundation for dealing with data, writing efficient code, and solving complex programming problems in R.

4.12 Answers to In-Text Questions


1. A matrix is limited to two dimensions, while an array can have more than two dimensions.
2. mat <- matrix(1:9, nrow = 3, ncol = 3)
   print(mat)
3. Lists can hold elements of different types, while vectors must have elements of the same type.
4. Data frames allow columns to have different types (e.g., numeric, character), unlike matrices.
5. num <- 4
   if (num %% 2 == 0) {
     print("Even")
   } else {
     print("Odd")
   }

4.13 Self-Assessment Questions


1. What is the difference between a vector and a matrix in R? Provide an example of each.
2. Explain how a list differs from a vector and give a practical example of when you would use a list instead of a vector.
3. Describe the structure of a data frame and explain why it is particularly useful for working with tabular data.
4. Write an R code snippet using an if-else statement to determine whether a number is positive, negative, or zero.
5. What is the purpose of the apply family of functions, and how do they improve code efficiency compared to traditional loops?

4.14 References
Wickham, H., & Grolemund, G. (2017). R for Data Science. O’Reilly Media.
Matloff, N. (2011). The Art of R Programming. No Starch Press.
Crawley, M. J. (2012). The R Book. Wiley.

4.15 Suggested Readings


Garrett Grolemund’s Hands-On Programming with R for beginners exploring R fundamentals.
Hadley Wickham’s Advanced R for a deeper dive into R’s programming capabilities.