Question Bank Answers

The document provides a comprehensive overview of R programming, including outputs of various R statements, explanations of basic data types (Numeric, Integer, Character, Logical, Complex), flow control statements (if-else and switch), and logical operators with examples. It details the outputs of specific R functions and operations, illustrating how to manipulate and evaluate data in R. Additionally, it covers the use of logical operators for conditional statements and comparisons.

Uploaded by

Mr. LION
Copyright
© © All Rights Reserved

Module -1

1. Find the output of the following:(5M)


i. sum(7:10)
ii. 7:12+12:17
iii. c(3,1,8,6,7)+c(9,2,5,7,1)
iv. mean(7:10)
v. median(7,8,9,10)
Ans:
i. sum(7:10)
This sums the numbers from 7 to 10, inclusive.
Output:
sum(7:10) = 7 + 8 + 9 + 10 = 34

ii. 7:12 + 12:17


This is an operation involving two sequences of numbers.
First, the sequence 7:12 is 7, 8, 9, 10, 11, 12.
Then, the sequence 12:17 is 12, 13, 14, 15, 16, 17.
Adding them element-wise:
(7 + 12), (8 + 13), (9 + 14), (10 + 15), (11 + 16), (12 + 17)
Output:
19, 21, 23, 25, 27, 29

iii. c(3,1,8,6,7) + c(9,2,5,7,1)


This adds two vectors element-wise:
(3 + 9), (1 + 2), (8 + 5), (6 + 7), (7 + 1)
Output:
12, 3, 13, 13, 8

iv. mean(7:10)
The sequence 7:10 is 7, 8, 9, 10.
The mean is the sum of these numbers divided by the number of elements:
mean = (7 + 8 + 9 + 10) / 4 = 34 / 4 = 8.5

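v. median(7,8,9,10)
median() takes a single vector as its first argument, x. In the call median(7, 8, 9, 10) only 7 is treated as the data: 8 is matched to na.rm and the remaining values fall into "...", so in current versions of R the call returns 7.
Output:
7
To get the median of all four values, pass them as one vector: median(c(7, 8, 9, 10)).
For the sequence 7, 8, 9, 10, there are 4 numbers, so the median is the average of the two middle values, which are 8 and 9.
median = (8 + 9) / 2 = 17 / 2 = 8.5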
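
The five statements of Question 1 can be checked by running them directly. The short sketch below does that; the variable names are only illustrative. Note that median(7, 8, 9, 10) as written passes only 7 as data, so the vector form is shown next to it.

```r
# Question 1, parts i-v, evaluated directly
q1_sum  <- sum(7:10)                          # 7 + 8 + 9 + 10
q1_seq  <- 7:12 + 12:17                       # element-wise sum of two sequences
q1_vec  <- c(3, 1, 8, 6, 7) + c(9, 2, 5, 7, 1)
q1_mean <- mean(7:10)

# Only the first argument is treated as data in this form
q1_median_as_written <- median(7, 8, 9, 10)
# All four values are used when they are passed as one vector
q1_median_of_vector  <- median(c(7, 8, 9, 10))

print(q1_sum)
print(q1_seq)
print(q1_vec)
print(q1_mean)
print(q1_median_as_written)
print(q1_median_of_vector)
```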

2. Determine the output of following R statement


(i) c(1,2,3,4,5)+c(6,7,8,9,10)
(ii) -1:4*-2:3
(iii) identical(2^3,2**3)
(iv) 5:9%/%2
(v) c(2,4-2,1+1)==0
Ans: Let's evaluate each R statement step by step:
(i) c(1,2,3,4,5) + c(6,7,8,9,10)
This adds two vectors element-wise:
(1 + 6), (2 + 7), (3 + 8), (4 + 9), (5 + 10)
Output:
7, 9, 11, 13, 15

(ii) -1:4 * -2:3


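Unary minus binds more tightly than the : operator, and : binds more tightly than *, so the expression is read as (-1:4) * (-2:3).
The sequence -1:4 generates -1, 0, 1, 2, 3, 4.
The sequence -2:3 generates -2, -1, 0, 1, 2, 3.
The two sequences are multiplied element-wise: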
(-1 * -2), (0 * -1), (1 * 0), (2 * 1), (3 * 2), (4 * 3)
Output:
2, 0, 0, 2, 6, 12

(iii) identical(2^3, 2**3)

In R, both ^ and ** are used for exponentiation, so 2^3 and 2**3 both evaluate to 8.
The identical() function checks if the values are the same.
Output:
TRUE

(iv) 5:9 %/% 2


The sequence 5:9 generates 5, 6, 7, 8, 9.
The %/% operator performs integer division (floor of the division result).
Each value is divided by 2:
5 %/% 2 = 2, 6 %/% 2 = 3, 7 %/% 2 = 3, 8 %/% 2 = 4, 9 %/% 2 = 4
Output:
2, 3, 3, 4, 4

(v) c(2, 4-2, 1+1) == 0


The vector c(2, 4-2, 1+1) evaluates to c(2, 2, 2).
The comparison c(2, 2, 2) == 0 checks if each element equals 0:
2 == 0, 2 == 0, 2 == 0
Output:
FALSE, FALSE, FALSE
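
As a quick check, the five statements of Question 2 can be evaluated as written. The sketch below stores each result under an illustrative name (the names are not part of the question) so the outputs can be compared with the answers above.

```r
# Question 2, parts i-v, evaluated exactly as written
q2_add   <- c(1, 2, 3, 4, 5) + c(6, 7, 8, 9, 10)
q2_prod  <- -1:4 * -2:3          # read as (-1:4) * (-2:3)
q2_same  <- identical(2^3, 2**3) # ^ and ** are the same operator
q2_intdv <- 5:9 %/% 2            # integer division of each element
q2_zero  <- c(2, 4 - 2, 1 + 1) == 0

print(q2_add)
print(q2_prod)
print(q2_same)
print(q2_intdv)
print(q2_zero)
```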

3. Explain the basic data types of R with examples


Ans: R provides several basic data types that are used to store different kinds of data. The primary data types are explained below with examples:

1. Numeric
• Description: Represents numbers, which can be whole numbers or real (floating-point) numbers.
• Decimal values are called numeric in R.
• It is the default computational data type.
• If we assign a decimal value to a variable x, x will be of numeric type.
• If we assign a whole number to a variable k (for example k <- 42), it is still saved as a numeric (double) value, not as an integer.
• This can be confirmed with the is.integer() function, which returns FALSE for k.
Examples:
• num1 <- 42 # A whole number, stored as numeric (double)
• num2 <- 3.14 # A floating-point number
• num3 <- -7.5 # A negative number

2. Integer
• Description: Represents whole numbers explicitly specified as integers. Use the L suffix to denote an integer.
• Examples:
• int1 <- 10L # Integer
• int2 <- -25L # Negative integer
• is.integer(int1) # TRUE

3. Character
• Description: Represents text or string values. Strings are enclosed in quotes (single or double).
• The character type is used to represent string values in R.
• Objects can be converted into character values using the as.character() function.
• The paste() function can be used to concatenate two character values.
• Examples:
• char1 <- "Hello" # A string
• char2 <- 'World' # Another string
• paste(char1, char2) # "Hello World"

4. Logical
• Description: Represents Boolean values (TRUE or FALSE).
• Logical values are created when two values are compared.
• The logical operators are "&" (and), "|" (or) and "!" (negation/not).
• Examples:
• logical1 <- TRUE # Logical value TRUE
• logical2 <- FALSE # Logical value FALSE
• logical3 <- 5 > 3 # TRUE

5. Complex
• Description: Represents complex numbers with real and imaginary parts.
• Taking the square root of -1 directly, sqrt(-1), returns NaN with a warning. If the value is first converted into a complex number, sqrt(as.complex(-1)), the result is another complex number, 0+1i.
• Examples:
• complex1 <- 2 + 3i # Complex number (2 is the real part, 3 is the imaginary part)
• complex2 <- 5 - 1i # Complex number
Example Code in R:
# Numeric
num <- 3.14
print(num)

# Integer
int <- 5L
print(int)

# Character
char <- "Data Science"
print(char)

# Logical
logical_val <- TRUE
print(logical_val)

# Complex
comp <- 4 + 5i
print(comp)

R automatically assigns the appropriate data type when you create variables, but you can
check or convert data types using functions like is.numeric(), is.character(), as.integer(), etc.
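
The checking and conversion functions mentioned above can be tried on a few values. This is a minimal sketch with illustrative variable names; it also shows that a whole number typed without the L suffix is stored as a double, and that the square root of -1 needs a complex input.

```r
k    <- 42      # whole number without the L suffix: stored as double
int1 <- 10L     # explicit integer

type_k    <- typeof(k)          # "double"
type_int  <- typeof(int1)       # "integer"
type_chr  <- typeof("Hello")    # "character"
type_lgl  <- typeof(5 > 3)      # "logical"
type_cplx <- typeof(2 + 3i)     # "complex"

converted <- as.integer(3.9)    # conversion truncates towards zero: 3L

# sqrt(-1) gives NaN (with a warning); the complex version gives 0+1i
real_root    <- suppressWarnings(sqrt(-1))
complex_root <- sqrt(as.complex(-1))

print(c(type_k, type_int, type_chr, type_lgl, type_cplx))
print(is.integer(k))
print(complex_root)
```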

4. Explain the following flow control statements in R with suitable examples for each
i. If and else statement
ii.Switch Statement
ANS:
i. If and else statement
If statement: takes a logical value and executes the following statement only if the value is
TRUE.
a <- 33
b <- 200

if (b > a) {
print("b is greater than a")
}
In this example we use two variables, a and b, which are used as a part of the if statement to
test whether b is greater than a. As a is 33, and b is 200, we know that 200 is greater than
33, and so we print to screen that "b is greater than a".

Else If: The else if keyword is R's way of saying "if the previous conditions were not
true, then try this condition".
a <- 33
b <- 33
if (b > a) {
print("b is greater than a")
} else if (a == b) {
print("a and b are equal")
}
In this example a is equal to b, so the first condition is not true, but the else if condition is
true, so we print to screen that "a and b are equal".
You can use as many else if statements as you want in R.

If Else: The else keyword catches anything which isn't caught by the preceding
conditions:
a <- 200
b <- 33

if (b > a) {
print("b is greater than a")
} else if (a == b) {
print("a and b are equal")
} else {
print("a is greater than b")
}
In this example, a is greater than b, so the first condition is not true, and the else if condition
is also not true, so we go to the else branch and print to screen that "a is greater than b".

ifelse(): This function takes three arguments. The first is a logical condition (which may be a
vector), the second is the value returned for elements where the condition is TRUE, and the
third is the value returned for elements where the condition is FALSE.

ii. Switch Statement
• If there are many else statements, the code becomes hard to read; in such cases the
switch() function is a better choice.
• The first argument of switch() is an expression that returns a string
value or an integer.
• This is followed by several named arguments that provide the results when the name
matches the value of the first argument.
• Here also we can execute multiple statements enclosed by curly braces.
• If there is no match, switch() returns NULL. It is therefore safer to supply a default
value, given as a final unnamed argument, for the case where nothing matches.
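
The notes describe ifelse() and switch() without showing them in use. The sketch below is one possible illustration; the data (marks) and the helper day_type() are hypothetical names chosen for this example only.

```r
# ifelse(): vectorised choice between two values
marks  <- c(35, 72, 48)
result <- ifelse(marks >= 40, "pass", "fail")
print(result)

# switch(): pick a result by matching a string against argument names
day_type <- function(day) {
  switch(day,
         "Sat" = ,          # empty value: falls through to the next one
         "Sun" = "weekend",
         "weekday")         # final unnamed argument acts as the default
}
print(day_type("Sun"))
print(day_type("Mon"))

# With an integer first argument, switch() selects by position
second <- switch(2, "first", "second", "third")

# Without a default, an unmatched value gives NULL
no_match <- switch("x", a = 1)
```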
5. Find the output for the following
i. seq(1:5)
ii.length(1:7)
iii.rep(1:3,4)
iv.c("Week","WEEK","week","weak")=="week"
v.5:9%%2
Ans:
Let’s evaluate each statement step by step:

(i) seq(1:5)
The 1:5 generates the sequence 1, 2, 3, 4, 5.
When seq() receives a single vector with more than one element, it returns the index
sequence 1, 2, ..., up to the length of that vector (the same result as seq_along()). Because
1:5 has five elements, the result is 1 to 5, which here happens to match the input.
Output:
1, 2, 3, 4, 5

(ii) length(1:7)
The 1:7 generates the sequence 1, 2, 3, 4, 5, 6, 7.
The length() func on returns the number of elements in this sequence.
Output:
7

(iii) rep(1:3, 4)
The rep() function repeats the elements of the vector 1:3 a specified number of times.
1:3 generates 1, 2, 3, and rep(1:3, 4) repeats this entire sequence 4 times.
Output:
1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3

(iv) c("Week", "WEEK", "week", "weak") == "week"


The == operator performs element-wise comparison of the vector with the string "week".
• "Week" == "week" → FALSE (case-sensitive comparison)
• "WEEK" == "week" → FALSE
• "week" == "week" → TRUE
• "weak" == "week" → FALSE
Output:
FALSE, FALSE, TRUE, FALSE

(v) 5:9 %% 2
The 5:9 generates the sequence 5, 6, 7, 8, 9.
The %% operator computes the remainder when each element is divided by 2:
• 5 %% 2 = 1
• 6 %% 2 = 0
• 7 %% 2 = 1
• 8 %% 2 = 0
• 9 %% 2 = 1
Output:
1, 0, 1, 0, 1
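
The answers to Question 5 can be confirmed by running the statements. The sketch below uses illustrative variable names and adds one extra call, seq(c(10, 20, 30)), which is not part of the question, to show that seq() on a longer vector returns index positions rather than the values themselves.

```r
q5_seq    <- seq(1:5)
q5_length <- length(1:7)
q5_rep    <- rep(1:3, 4)
q5_equal  <- c("Week", "WEEK", "week", "weak") == "week"
q5_mod    <- 5:9 %% 2

# Extra illustration: positions 1, 2, 3 are returned, not 10, 20, 30
seq_of_values <- seq(c(10, 20, 30))

print(q5_seq)
print(q5_length)
print(q5_rep)
print(q5_equal)
print(q5_mod)
print(seq_of_values)
```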
6. List the logical operators in R with suitable examples.
Ans:
Logical operators in R are used to perform logical (Boolean) operations. These operators
return logical values (TRUE or FALSE) and are often used in conditional statements and data
filtering. The logical operators in R are listed below with examples:

1. AND (& and &&)

• Description: Returns TRUE if both conditions are TRUE.
• &: Vectorized, works element-wise on vectors.
• &&: Works on single (length-one) logical values and short-circuits: the right-hand side is evaluated only if the left-hand side is TRUE. Older versions of R silently used only the first element of a longer vector; R 4.3.0 and later raise an error instead.
Examples:
# & operator
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE) # Output: TRUE, FALSE, FALSE

# && operator
TRUE && FALSE # Output: FALSE
TRUE && TRUE # Output: TRUE

2. OR (| and ||)
• Description: Returns TRUE if at least one condition is TRUE.
• |: Vectorized, works element-wise on vectors.
• ||: Works on single (length-one) logical values and short-circuits: the right-hand side is evaluated only if the left-hand side is FALSE.
Examples:
# | operator
c(TRUE, FALSE, TRUE) | c(FALSE, TRUE, FALSE) # Output: TRUE, TRUE, TRUE

# || operator
FALSE || TRUE # Output: TRUE
FALSE || FALSE # Output: FALSE
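
The practical difference between the vectorized and the double operators is short-circuit evaluation. A minimal sketch, assuming a hypothetical variable x that may be NULL: with && and ||, the right-hand side is skipped when the left-hand side already decides the result.

```r
x <- NULL

# The left side is FALSE, so `x > 0` is never evaluated
is_positive <- !is.null(x) && x > 0
print(is_positive)

# The left side is TRUE, so the right side is skipped as well
is_missing_or_small <- is.null(x) || x < 10
print(is_missing_or_small)

# The single operators always evaluate both sides, element by element
both <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
print(both)
```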

3. NOT (!)
• Description: Negates the logical value.
• Returns: TRUE becomes FALSE, and FALSE becomes TRUE.
Examples:
!TRUE # Output: FALSE
!FALSE # Output: TRUE
!c(TRUE, FALSE, TRUE) # Output: FALSE, TRUE, FALSE

4. Element-Wise Equality and Inequality

• Description: Used to compare elements in vectors.
  o ==: Checks for equality.
  o !=: Checks for inequality.
Examples:
# Equality
c(1, 2, 3) == c(1, 4, 3) # Output: TRUE, FALSE, TRUE

# Inequality
c(1, 2, 3) != c(1, 4, 3) # Output: FALSE, TRUE, FALSE

5. Greater Than, Less Than, and Related Operators

• Description: Comparison operators return logical values.
  o >: Greater than
  o <: Less than
  o >=: Greater than or equal to
  o <=: Less than or equal to
Examples:
# Greater than
c(5, 3, 8) > c(4, 3, 9) # Output: TRUE, FALSE, FALSE

# Less than or equal to
c(5, 3, 8) <= c(5, 4, 8) # Output: TRUE, TRUE, TRUE

6. xor()
• Description: Exclusive OR; returns TRUE if one (and only one) of the two logical
values is TRUE.
Examples:
xor(TRUE, FALSE) # Output: TRUE
xor(TRUE, TRUE) # Output: FALSE
xor(FALSE, FALSE) # Output: FALSE

7. Logical Evaluation in Subsetting

• Logical operators are commonly used to filter data in vectors or data frames.
Example:
x <- c(10, 15, 20, 25)
x[x > 15 & x <= 25] # Output: 20, 25 (values between 15 and 25)

Summary of Logical Operators:

Operator  Description               Example                            Output
&         Element-wise AND          c(TRUE, FALSE) & c(FALSE, TRUE)    FALSE, FALSE
&&        Single-value AND          TRUE && FALSE                      FALSE
|         Element-wise OR           c(TRUE, FALSE) | c(FALSE, TRUE)    TRUE, TRUE
||        Single-value OR           FALSE || TRUE                      TRUE
!         NOT                       !TRUE                              FALSE
==        Equality                  5 == 5                             TRUE
!=        Inequality                5 != 3                             TRUE
>         Greater than              5 > 3                              TRUE
<         Less than                 3 < 5                              TRUE
>=        Greater than or equal to  5 >= 5                             TRUE
<=        Less than or equal to     3 <= 5                             TRUE
xor()     Exclusive OR              xor(TRUE, FALSE)                   TRUE
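7. Develop an R program to find the factorial of a given number using recursive function calls.
Ans:
# Recursive factorial: n! = n * (n - 1)!, with 0! = 1 as the stopping case.
# Note: this definition masks R's built-in factorial() for the current session.
factorial <- function(n) {
  if (n == 0) {
    return(1)
  } else {
    return(n * factorial(n - 1))
  }
}
result <- factorial(5)
print(result)   # [1] 120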

8. Explain repeat, while and for loops with R programming examples (10)
Ans:
There are three kinds of loops in R, namely
• Repeat
• While
• For
Repeat Loops
• The repeat loop is the simplest loop in R; it executes the same code until it is forced to
stop.
• It is similar to the do while statement in other languages.
• A break statement is used when the loop has to be exited.
• The next statement skips the rest of the statements in the current iteration and moves on to
the next iteration.
Repeat with break statement
a <- 1
repeat {
  print(a)        # prints 1, 2, 3
  a <- a + 1
  if (a == 4) {
    break
  }
}
Repeat with break and next statement
a <- 0
repeat {
  a <- a + 1
  if (a == 4) {
    next          # skip printing 4
  }
  print(a)        # prints 1, 2, 3, 5, 6
  if (a == 6) {
    break
  }
}
While Loops
• A while loop is like a repeat loop in reverse.
• The repeat loop executes the code and then checks the condition, but a while
loop checks the condition first and then executes the code.
• So it is possible that the code is not executed even once, if the
condition already fails on entry to the first iteration.
• A break statement is used when the loop has to be exited.
• The next statement skips the rest of the statements in the current iteration and moves on to
the next iteration.
While with break statement
a <- 0
while (a < 6) {
  a <- a + 1
  if (a == 4) {
    break
  }
  print(a)        # prints 1, 2, 3
}
For Loops
• The for loop is used when we know how many times the code needs to be
repeated.
• The for loop accepts an iterating variable and a vector.
• It repeats the loop, assigning each element of the vector to the iterating variable in turn.
• Here too, if there are multiple statements to execute, we can use curly
braces.
• The vector being iterated over can be an integer, numeric, character or logical vector, and it
can even be a list.
for(i in 1:5)
{
j<-i*i
message("The square value of ",i," is ",j)
}

10. Develop R code to calculate the following financial metrics in order to assess the
financial statement of an organization being supplied with 2 vectors of data: Monthly
Revenue = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 155, 165] and Monthly Expenses
= [30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85] for the financial year
(i) Profit for each month.
(ii) Profit after tax for each month (Tax Rate is 30%).
(iii) Profit margin for each month, equal to profit after tax divided by revenue.
(iv) Good Months – where the profit after tax was greater than the mean for the year.
(v) Bad Months – where the profit after tax was less than the mean for the year.
(vi) The best month – where the profit after tax was max for the year.
(vii) The worst month – where the profit after tax was min for the year (10)
ANS:
Program no 2
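
The referenced program is not reproduced in these notes. As a rough sketch of one way to answer the question with vector arithmetic, the code below uses the two vectors given in the question; the variable names and the choice to report months by their position (1 to 12) are assumptions made for this illustration.

```r
revenue  <- c(50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 155, 165)
expenses <- c(30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85)
tax_rate <- 0.30

# (i) profit, (ii) profit after tax, (iii) profit margin per month
profit           <- revenue - expenses
profit_after_tax <- profit * (1 - tax_rate)
profit_margin    <- profit_after_tax / revenue

# (iv)/(v) months above and below the yearly mean of profit after tax
mean_pat    <- mean(profit_after_tax)
good_months <- which(profit_after_tax > mean_pat)
bad_months  <- which(profit_after_tax < mean_pat)

# (vi)/(vii) positions of the largest and smallest profit after tax
best_month  <- which.max(profit_after_tax)
worst_month <- which.min(profit_after_tax)

print(profit)
print(round(profit_after_tax, 2))
print(round(profit_margin, 2))
print(good_months)
print(bad_months)
print(c(best = best_month, worst = worst_month))
```

Because every step operates on whole vectors, no loop is needed; which() turns the logical comparison into month positions.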
11. Develop R Code to Perform the following:
a) Assign different type of values to variables and display the type of variable. Assign
different types such as Double, Integer, Logical, Complex and Character and understand
the difference between each data type.
b) Demonstrate Arithmetic and Logical Operations with simple examples.
c) Demonstrate generation of sequences and creation of vectors.
d) Demonstrate Creation of Matrices
e) Demonstrate the Creation of Matrices from Vectors using Binding Function.
f) Demonstrate element extraction from vectors, matrices and arrays
ANS:
Program no- 1
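
The referenced program is not included in these notes. The following is a compact sketch covering parts (a) to (f) under the assumption that a few lines per part are enough; all variable names and sample values are illustrative.

```r
# a) values of different types and their storage type
values <- list(3.14, 5L, TRUE, 2 + 3i, "text")
types  <- sapply(values, typeof)
print(types)

# b) arithmetic and logical operations
arith <- 7 %/% 2 + 7 %% 2        # integer quotient 3 plus remainder 1
logic <- (5 > 3) & !(2 > 4)

# c) sequences and vectors
s <- seq(2, 10, by = 2)
v <- c(10, 20, 30)

# d) a matrix, filled column by column
m <- matrix(1:6, nrow = 2)

# e) matrices built from vectors with the binding functions
by_col <- cbind(1:3, 4:6)        # 3 rows, 2 columns
by_row <- rbind(1:3, 4:6)        # 2 rows, 3 columns

# f) element extraction from a vector, a matrix and an array
arr      <- array(1:24, dim = c(2, 3, 4))
from_vec <- v[2]
from_mat <- m[2, 3]
from_arr <- arr[1, 2, 3]
print(c(from_vec, from_mat, from_arr))
```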

12. Explain Environments and Functions in R


ANS:
1.5.1. Environments
• In R the variables that we create need to be stored in an environment.
• Environments are themselves another type of variable.
• We can assign them, manipulate them and pass them as arguments to functions.
• They are like lists that are used to store different types of variables.
• When a variable is assigned at the command prompt, it goes by default into the
global environment.
• When a function is called, an environment is automatically created to store the
function-related variables.
• A new environment is created using the function new.env().
> newenvironment <- new.env()
• We can assign variables into a newly created environment using double square
brackets or the dollar operator.
• The assign() function can also be used to assign variables to an environment.
• We can use the get() function to retrieve the values stored in an environment.

• The functions ls() and ls.str() take an environment argument and list its contents.
• We can test if a variable exists in an environment using the exists() function.
• An environment can be converted into a list using the function as.list(), and a list can
be converted into an environment using the function as.environment() or the
function list2env().
• All environments are nested, so every environment has a parent environment.
The empty environment sits at the top of the hierarchy without any parent. The
exists() and get() functions also look for variables in the parent environments.
To change this behaviour we need to pass the argument inherits = FALSE.

• The word frame is often used interchangeably with the word environment. The function
parent.frame() refers to the environment from which a function was called.
• The variables assigned from the command prompt are stored in the global
environment. The functions and variables from R's base package are stored in
the base environment.
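
The environment functions listed above can be tried in a few lines. This is a minimal sketch; the environment name e and the stored values are illustrative only.

```r
e <- new.env()

# Three ways to store a variable in the environment
e[["a"]] <- 1
e$b <- "two"
assign("d", 3, envir = e)

# Retrieve a value and list the contents
a_value  <- get("a", envir = e)
contents <- sort(ls(e))
print(contents)

# exists() searches parent environments unless inherits = FALSE
found_here       <- exists("a", envir = e)
found_via_parent <- exists("sum", envir = e)
found_only_here  <- exists("sum", envir = e, inherits = FALSE)

# Convert between environments and lists
as_list <- as.list(e)
e2      <- list2env(list(x = 10, y = 20))
print(sort(ls(e2)))
```

Here sum is found through the chain of parent environments (it lives in the base environment), but not when the search is restricted to e itself.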

• Functions
• A function and its environment together are called a closure.

• When we load a package, the functions in that package are stored in an
environment on the search path.

• Functions are another data type, and hence we can assign them, manipulate them and
pass them as arguments to other functions.

• Typing the function name at the command prompt lists the code associated with
the function.

• For example, typing readLines or matrix at the prompt lists the code of the functions readLines() and matrix().

• When we call a function by passing values to it, the values are called arguments.

• The lines of code of the function can be seen between the curly braces as the body of
the function.
• In R, an explicit return statement is not required to return a value (although return() is available).

• The last value that is calculated in a function is returned by default in R.

To create user-defined functions

We simply assign the function to a name, as we do for other variables.

Example

Consider a function named cube that takes one argument x. Here cube is the name of the function and x is the argument passed to it.
The content within the curly braces is the body of the function. (Note: If the body is a single line of
code we can omit the curly braces.) Once a function is defined, it can be called like any
other function in R by passing its arguments.
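
The original listing for the cube example is not reproduced in these notes. A reconstruction consistent with the description (a function named cube with a single argument x) might look like this:

```r
# The last evaluated expression is returned, so no return() is needed
cube <- function(x) {
  x^3
}

# Single-line body: the curly braces can be omitted
cube_short <- function(x) x^3

print(cube(3))
print(cube_short(1:3))   # works element-wise on vectors as well
```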

• The functions formals(), args() and formalArgs() can fetch the arguments defined
for a function. The body of the function can be retrieved using the body() and
deparse() functions.
• Functions can be passed as arguments to other functions and they can be
returned from other functions.
• For calling a function, there is another function called do.call(), to which we can
pass the function name and its arguments as arguments.
• A typical use of this function is with rbind(), to
concatenate two data frames.
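
The points above can be illustrated with a short sketch. The data frames df1 and df2 and the helper functions square() and apply_twice() are hypothetical examples, not taken from the original notes.

```r
# do.call() calls a function with its arguments supplied as a list
df1 <- data.frame(id = 1:2, name = c("a", "b"))
df2 <- data.frame(id = 3:4, name = c("c", "d"))
combined <- do.call(rbind, list(df1, df2))   # same result as rbind(df1, df2)
print(combined)

# Inspecting a function's arguments and body
square    <- function(x) x^2
arg_names <- names(formals(square))
body_text <- deparse(body(square))

# Passing a function as an argument to another function
apply_twice <- function(f, x) f(f(x))
print(apply_twice(square, 3))
```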
13. Explain Initiating R with examples

ANS:

1.3 Initiating R

1.3.1 First Program

Open the R GUI, find the command prompt, type the command below and hit enter to run
the command

> sum(1:5)

[1] 15

The result above shows that the command returns 15: it has taken
the integers from 1 to 5 as input and performed the sum operation on them.

In the above command sum() is a function that takes the argument 1:5, which means a
vector that consists of the sequence of integers from 1 to 5.

1.3.2 Help in R

• There are many ways to get help from R.

• If a function name or dataset name is known, then we can type ? followed by the name.

• If the name is not known, then we need to type ?? followed by a term that is related to
what we are searching for.

• Keywords, special characters and search terms made up of two separate words need to be enclosed
in double or single quotes.

• The symbol # is used to comment a line in an R program.

?name or help() or help.search()

1.3.3 Assigning Variables

• The result of the operations in R can be stored for reuse.

• The values can be assigned to the variables using the symbol "<-" or "=", of which the
symbol "<-" is preferred.

• There is no concept of variable declaration in R.

• The variable type is assumed based on the value assigned.

Examples

> x<-1:3
>x

[1] 1 2 3

> Y=4:6

>Y

[1] 4 5 6

> x+3*Y-2

[1] 11 15 19

• Variable names consist of letters, numbers, dots and underscores. A variable
name must start with a letter, or with a dot that is not followed by a number.

• A variable name must not be a reserved word.

• To create a global variable (a variable available everywhere) we use the symbol "<<-".

Example

> x<<-exp(exp(1))

>x

[1] 15.15426

• Assignment can also be done using the assign() function.

• For global assignment the same function assign() can be used, by including the
extra argument envir = globalenv().

• To see the value of a variable, simply type the name of the variable at the command
prompt.

• The same thing can be done using the print() function.

• If assignment and printing of a value have to be done in one line, we can do this
in two ways.

• The first method is to separate the two statements with a semicolon.

• The second method is to wrap the assignment in parentheses ().
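
The assignment and printing options described above can be sketched as follows; the variable names p, q, r and s are illustrative.

```r
# assign() takes the variable name as a string
assign("p", 10)

# The envir argument selects where the variable is created
assign("q", 20, envir = globalenv())

# Assign and print on one line, method 1: separate with a semicolon
r <- 5; print(r)

# Method 2: wrap the assignment in parentheses
(s <- r * 2)
```

Wrapping an assignment in parentheses makes R print the assigned value, which it would otherwise return invisibly.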

1.3.4 Basic Mathema cal Opera ons

• Vectors and the c() function in R help us to avoid loops.

• The statistical functions in R can take vectors as input and produce results.

• The sum() function takes vector arguments and produces results.

• Like the "+" operator, all other operators in R take vectors as inputs and
produce results element by element.

• The subtraction and multiplication operations work in the same element-wise way.

The exponentiation operator is represented using the symbol "^" or "**".

That the two are equivalent can be checked using the function identical().

The division operator is of three types.

1. Ordinary division is represented using the "/" symbol.

2. The integer division operator is represented using the "%/%" symbol.

3. The modulo (remainder) operator is represented using the "%%" symbol.

The other mathematical functions are the trigonometric functions sin(), cos(), tan(),
asin(), acos(), atan() and the logarithmic and exponential functions log(), exp(),
log1p() and expm1().

All these mathematical functions can operate on vectors as well as individual elements.

Comparison or relational operators

The operator "==" is used for comparing two values.

For checking inequality of values the operator "!=" is used.

The other relational operators are "<", ">", "<=" and ">=".

The relational operators also take vectors as input
and operate on them.
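
A few of the operators and functions above, applied to small vectors. This is only a sketch with illustrative values and names.

```r
# Arithmetic operators work element by element
sum_vec  <- 1:5 + 6:10
diff_vec <- c(2, 3, 5) - c(1, 1, 1)
prod_vec <- -2:2 * -2:2

# Both exponentiation symbols give the same value
same_power <- identical(2^3, 2**3)

# The three kinds of division
ordinary  <- 7 / 2      # 3.5
quotient  <- 7 %/% 2    # 3
remainder <- 7 %% 2     # 1

# Mathematical functions also accept vectors
logs <- log(c(1, exp(1)))      # 0 and 1
tiny <- log1p(0) + expm1(0)    # both are 0

# Relational operators compare element by element
equal_vec   <- c(1, 2, 3) == c(1, 5, 3)
greater_vec <- c(1, 2, 3) >= c(3, 2, 1)

print(sum_vec)
print(prod_vec)
print(equal_vec)
```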


MODULE-2
1. Develop an R program to create two 3 x 3 matrices A and B and perform the
following operations: i) transpose of the matrix, ii) addition, iii) subtraction, iv)
multiplication, v) access the first row of matrix A

ANS:

# Create two 3x3 matrices (values are filled column by column)
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)
B <- matrix(c(9, 8, 7, 6, 5, 4, 3, 2, 1), nrow = 3, ncol = 3)
print("Matrix A:")
print(A)
print("Matrix B:")
print(B)

# 1. Transpose of the matrices
transpose_A <- t(A)
transpose_B <- t(B)
print("Transpose of Matrix A:")
print(transpose_A)
print("Transpose of Matrix B:")
print(transpose_B)

# 2. Addition of two matrices
C <- A + B
print("Addition of Matrix A and B:")
print(C)

# 3. Subtraction of two matrices
C <- A - B
print("Subtraction of Matrix A and B:")
print(C)

# 4. Multiplication of two matrices (%*% is the matrix product; * would be element-wise)
C <- A %*% B
print("Matrix Multiplication of A and B:")
print(C)

# 5. Access the first row of Matrix A
first_row_A <- A[1, ]
print("First row of Matrix A:")
print(first_row_A)

2. Describe the following with R programming example


(i) Creation of list
(ii) Assigning the names to elements of the list,
(iii) Accessing the elements of the list using index and names,
(iv) Conversion of the vector to list,
(v) Combine two lists. (10)
ANS:
Here is a detailed description of the requested operations with R programming examples:

(i) Creation of a List

• A list in R can store elements of different types (e.g., numeric, character, vector,
matrix, etc.).
• Use the list() function to create a list.
Example:
# Creating a list
my_list <- list(1:5, "Hello", matrix(1:4, nrow = 2))
print(my_list)

(ii) Assigning Names to Elements of the List

• You can name elements of a list using the names() function or directly during
creation.
Example:
# Assigning names to elements
names(my_list) <- c("Numbers", "Greeting", "Matrix")
print(my_list)

# Directly assigning names during creation
my_list_named <- list(Numbers = 1:5, Greeting = "Hello", Matrix = matrix(1:4, nrow = 2))
print(my_list_named)

(iii) Accessing Elements of the List

• Use indexing with double square brackets [[ ]] or the $ operator to access elements.
• For named lists, access elements using their names.
Example:
# Access by index
print(my_list[[1]]) # Access the first element (Numbers)

# Access by name
print(my_list$Greeting) # Access the "Greeting" element
print(my_list[["Matrix"]]) # Access the "Matrix" element

(iv) Conversion of a Vector to a List

• Use the as.list() function to convert a vector to a list.
Example:
# Converting a vector to a list
my_vector <- c(10, 20, 30, 40)
my_list_from_vector <- as.list(my_vector)
print(my_list_from_vector)

(v) Combining Two Lists

• Use the c() function to combine two or more lists into a single list.
Example:
# Creating two lists
list1 <- list(A = 1:3, B = "R")
list2 <- list(C = TRUE, D = matrix(1:4, nrow = 2))

# Combining the lists
combined_list <- c(list1, list2)
print(combined_list)

3. Determine the output of following R statement


(i)paste(c("Pine", "Red"),"Apple"),
(ii)substring("The cat is on the wall",3,10),
(iii)strsplit("I like Banana, Orange and Pineapple" , " "),
(iv)gl(5,3,labels=c('one','two','three','four','five'))
ANS:
Here's a detailed explanation of the outputs for the provided R statements:

(i) paste(c("Pine", "Red"), "Apple")

The paste() function combines its arguments into character strings. When given a vector, it
pairs each element with the second argument.
Code:
paste(c("Pine", "Red"), "Apple")
Output:
[1] "Pine Apple" "Red Apple"
Explanation:
• The c("Pine", "Red") creates a character vector with two elements.
• Each element is concatenated with "Apple", separated by a space.

(ii) substring("The cat is on the wall", 3, 10)

The substring() function extracts a portion of a string based on start and end positions.
Code:
substring("The cat is on the wall", 3, 10)
Output:
[1] "e cat is"
Explanation:
• The string "The cat is on the wall" is accessed from position 3 to position 10, inclusive (8 characters).
• Characters at these positions are "e cat is".

(iii) strsplit("I like Banana, Orange and Pineapple", " ")


The strsplit() function splits a string into substrings based on a specified delimiter.
Code:
strsplit("I like Banana, Orange and Pineapple", " ")
Output:
[[1]]
[1] "I" "like" "Banana," "Orange" "and" "Pineapple"
Explanation:
• The string is split at every space " ".
• The output is a list where the first (and only) element contains the split substrings.

(iv) gl(5, 3, labels = c('one', 'two', 'three', 'four', 'five'))


The gl() function generates factors by specifying the number of levels, repetitions, and
labels.
Code:
gl(5, 3, labels = c('one', 'two', 'three', 'four', 'five'))
Output:
[1] one one one two two two three three three four four four five five five
Levels: one two three four five
Explanation:
• 5 specifies 5 levels.
• 3 specifies each level repeats 3 times.
• labels assigns the names for the levels.
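
The four string and factor statements can be verified by running them. The sketch below stores each result under an illustrative name; note in particular that positions 3 to 10 of the sentence cover eight characters, ending with the "s" of "is".

```r
q3_paste <- paste(c("Pine", "Red"), "Apple")
q3_sub   <- substring("The cat is on the wall", 3, 10)
q3_split <- strsplit("I like Banana, Orange and Pineapple", " ")
q3_gl    <- gl(5, 3, labels = c("one", "two", "three", "four", "five"))

print(q3_paste)
print(q3_sub)
print(nchar(q3_sub))      # number of characters extracted
print(q3_split[[1]])
print(q3_gl)
```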

4. Describe cbind and rbind functions with example (5M)

Ans:
The cbind() and rbind() functions in R are used to combine data objects (vectors,
matrices, or data frames) by columns or rows, respectively.

1. cbind() (Column Bind)

The cbind() function combines vectors, matrices, or data frames column-wise, creating a
matrix or data frame.
Syntax:
cbind(..., deparse.level = 1)
Example:
# Combining vectors using cbind
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
result_cbind <- cbind(vec1, vec2)
print(result_cbind)
# Combining matrices using cbind (both have 2 rows and 3 columns)
mat1 <- matrix(1:6, nrow = 2)
mat2 <- matrix(7:12, nrow = 2)
result_cbind_matrix <- cbind(mat1, mat2)
print(result_cbind_matrix)
Output:
     vec1 vec2
[1,]    1    4
[2,]    2    5
[3,]    3    6

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    3    5    7    9   11
[2,]    2    4    6    8   10   12

2. rbind() (Row Bind)

The rbind() function combines vectors, matrices, or data frames row-wise, creating a
matrix or data frame.
Syntax:
rbind(..., deparse.level = 1)
Example:
# Combining vectors using rbind
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
result_rbind <- rbind(vec1, vec2)
print(result_rbind)

# Combining matrices using rbind


mat1 <- matrix(1:6, ncol = 2)
mat2 <- matrix(7:12, ncol = 2)
result_rbind_matrix <- rbind(mat1, mat2)
print(result_rbind_matrix)
Output:
[,1] [,2] [,3]
vec1 1 2 3
vec2 4 5 6

[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
[4,] 7 10
[5,] 8 11
[6,] 9 12
5. Explain the date and time functions with examples (5)
Ans:
R provides several functions to handle and manipulate date and time values. These
functions are part of the base package and work with objects of class Date and POSIXt.

1. Current Date and Time

• Sys.Date(): Returns the current date.
• Sys.time(): Returns the current date and time.
Example:
# Current date
current_date <- Sys.Date()
print(current_date)

# Current date and time
current_time <- Sys.time()
print(current_time)
Output (the values depend on when and where the code is run):
[1] "2025-01-23"
[1] "2025-01-23 10:25:30 UTC"

2. Date Conversion
• as.Date(): Converts a character string to a Date object.
• as.POSIXct() or as.POSIXlt(): Converts a character string to a date-time object with
a time zone.
Example:
# Convert character to Date
date_string <- "2025-01-23"
date_object <- as.Date(date_string)
print(date_object)

# Convert character to POSIXct
datetime_string <- "2025-01-23 10:30:45"
datetime_object <- as.POSIXct(datetime_string)
print(datetime_object)
Output (the time zone shown is that of the system; UTC here):
[1] "2025-01-23"
[1] "2025-01-23 10:30:45 UTC"

3. Formatting Dates and Times

• format(): Formats date-time objects to a desired structure.
  o %Y: Year (e.g., 2025)
  o %m: Month (e.g., 01)
  o %d: Day (e.g., 23)
  o %H: Hour (24-hour clock)
  o %M: Minute
  o %S: Second
Example:
# Format current date
formatted_date <- format(Sys.Date(), "%d-%m-%Y")
print(formatted_date)

# Format current time
formatted_time <- format(Sys.time(), "%H:%M:%S")
print(formatted_time)
Output:
[1] "23-01-2025"
[1] "10:25:30"

4. Date Arithmetic
You can perform arithmetic operations on Date and POSIXct objects.
Example:
# Add or subtract days
today <- Sys.Date()
next_week <- today + 7
last_week <- today - 7

# cat() would print a Date as its underlying day count, so format() it first
cat("Today:", format(today), "\n")
cat("Next week:", format(next_week), "\n")
cat("Last week:", format(last_week), "\n")
Output (for a run on 2025-01-23):
Today: 2025-01-23
Next week: 2025-01-30
Last week: 2025-01-16

5. Extract Components of Date-Time


 weekdays(): Day of the week.
 months(): Month name.
 quarters(): Quarter of the year.
Example:
# Extract components
current_date <- Sys.Date()
print(weekdays(current_date)) # Day of the week
print(months(current_date)) # Month name
print(quarters(current_date)) # Quarter
Output:
[1] "Thursday"
[1] "January"
[1] "Q1"

6. Parsing Dates and Times
 strptime(): Parses a character string into a date-time object.
Example:
# Parse custom date-time
custom_datetime <- strptime("23/01/2025 10:30", format = "%d/%m/%Y %H:%M")
print(custom_datetime)
Output:
[1] "2025-01-23 10:30:00"

7. Time Difference
 difftime(): Calculates the difference between two dates or times.
Example:
start_date <- as.Date("2025-01-01")
end_date <- as.Date("2025-01-23")

date_difference <- difftime(end_date, start_date, units = "days")
print(date_difference)
Output:
Time difference of 22 days

8. Rounding Dates and Times
 round() and trunc(): Round or truncate date-times to a given unit (e.g., hours,
minutes). Base R does not define floor() or ceiling() for date-time objects.
Example:
rounded_time <- round(Sys.time(), "hours")
print(rounded_time)
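The date and time functions above can be combined in one short workflow. The following is a minimal sketch that uses fixed dates (assumed values, chosen only for illustration) instead of Sys.Date(), so the results do not change from day to day.

```r
# Fixed, illustrative dates (assumed values, not taken from the question bank)
start <- as.Date("2025-01-01")
end   <- as.Date("2025-01-23")

# Difference in days as a plain number
days_between <- as.numeric(difftime(end, start, units = "days"))

# A weekly sequence of dates between the two endpoints
weekly <- seq(start, end, by = "week")

# Parse a date-time in UTC and truncate it to the hour
stamp      <- as.POSIXct("2025-01-23 10:30:45", tz = "UTC")
stamp_hour <- trunc(stamp, "hours")

print(days_between)
print(format(weekly, "%d-%m-%Y"))
print(format(stamp_hour, "%Y-%m-%d %H:%M:%S"))
```

Setting tz explicitly keeps the parsed time independent of the machine's local time zone.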

6. Determine the output of the following examples (5M)
i. gl()
ii. paste(c("Pine","Red"),"Apple",sep="-")
iii. cat("Black\nBerry",fill=TRUE)
iv. substr("The cat is on the wall",5,10)
v. date1<-strptime("22:15:45 22|08|2015","%H:%H:%S %d|%m|%y")
ans:
Here’s a detailed explanation of the provided R statements and their respective outputs:

(i) gl()
The gl() function generates factors by specifying the number of levels, repetitions, and
labels. Called with no arguments, gl() raises an error because the number of levels and
the number of repetitions are required, so the example below uses gl(3, 2).
Code:
# Example: gl(3, 2)
output <- gl(3, 2, labels = c("A", "B", "C"))
print(output)
Output:
[1] A A B B C C
Levels: A B C
Explanation:
 3 specifies the number of levels.
 2 specifies the number of repetitions for each level.
 labels assigns names to each level ("A", "B", "C").
(ii) paste(c("Pine", "Red"), "Apple", sep = "-")
The paste() function concatenates strings, using the specified separator (sep).
Code:
paste(c("Pine", "Red"), "Apple", sep = "-")
Output:
[1] "Pine-Apple" "Red-Apple"
Explanation:
 The first argument is a vector c("Pine", "Red").
 Each element is combined with "Apple", separated by "-".

(iii) cat("Black\nBerry", fill = TRUE)
The cat() function outputs a string with specified formatting. The \n introduces a new
line, and fill = TRUE wraps output at the console width and ends it with a newline.
Code:
cat("Black\nBerry", fill = TRUE)
Output:
Black
Berry
Explanation:
 The \n splits "Black\nBerry" into two lines: "Black" and "Berry".
 fill = TRUE adds the final newline after "Berry"; the split into two lines comes from \n.

(iv) substr("The cat is on the wall", 5, 10)
The substr() function extracts a substring from the input string, based on start and end
positions.
Code:
substr("The cat is on the wall", 5, 10)
Output:
[1] "cat is"
Explanation:
 The characters from positions 5 to 10 ("cat is") are extracted from the input string.

(v) date1 <- strptime("22:15:45 22|08|2015", "%H:%H:%S %d|%m|%y")
The strptime() function parses a character string into a date-time object, based on the
specified format. The format string printed in the question repeats %H and uses the
two-digit year code %y, so it does not describe the input; the answer below uses the
corrected format "%H:%M:%S %d|%m|%Y".
Code:
date1 <- strptime("22:15:45 22|08|2015", "%H:%M:%S %d|%m|%Y")
print(date1)
Output:
[1] "2015-08-22 22:15:45"
Explanation:
 "22:15:45 22|08|2015" is parsed using the format "%H:%M:%S %d|%m|%Y".
 %H: Hours, %M: Minutes, %S: Seconds.
 %d: Day, %m: Month, %Y: Year (4 digits).
7. Develop an R program to create a DataFrame with the following details and do the
following operations: (10M)
i. Subset the dataframe and display the details of only those items whose price is
greater than or equal to 350.
ii. Subset the dataframe and display only the items where the category is "Office
Supplies" and "Desktop Supplies".
iii. Create another dataframe called "item-details" with three different fields itemCode,
ItemQtyonHand and ItemRecorderLvl and merge the two data frames.
ANS:
Program no. 7
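Since the item table referred to in the question is not reproduced here, the following is only a sketch of how the three operations could be written. Every value in both data frames, and the column names itemCategory and itemPrice, are assumptions made for illustration.

```r
# Hypothetical item data; the original table of details is not available
items <- data.frame(
  itemCode     = c(1001, 1002, 1003, 1004, 1005),
  itemCategory = c("Electronics", "Desktop Supplies", "Office Supplies",
                   "USB", "CD Drive"),
  itemPrice    = c(700, 300, 350, 400, 800)
)

# i. Items whose price is greater than or equal to 350
costly <- subset(items, itemPrice >= 350)
print(costly)

# ii. Items in the "Office Supplies" or "Desktop Supplies" category
supplies <- subset(items,
                   itemCategory %in% c("Office Supplies", "Desktop Supplies"))
print(supplies)

# iii. Second data frame; a syntactic name is used instead of "item-details"
item_details <- data.frame(
  itemCode        = c(1001, 1002, 1003, 1004, 1005),
  ItemQtyonHand   = c(15, 10, 25, 8, 12),
  ItemRecorderLvl = c(5, 5, 10, 3, 4)
)

# Merge the two data frames on the shared itemCode column
merged <- merge(items, item_details, by = "itemCode")
print(merged)
```

The name item_details replaces "item-details" because a hyphen is not allowed in an ordinary R variable name.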

8. Determine the output of the following functions applied on the given
dataframe. (5M)
x<-c(5,6,7,8)
y<-c(15,16,17,18)
z<-c(25,26,27,28)
G<-data.frame(x,y,z)
i) colSums(G[,1:2])
ii) colMeans(G[,1:3])
iii) rowSums(G[1:3,])
iv) rowMeans(G[2:4,])
v) colMeans(G[1:2,])
ANS:
Here’s the solution, step by step:

Given Dataframe
x <- c(5, 6, 7, 8)
y <- c(15, 16, 17, 18)
z <- c(25, 26, 27, 28)
G <- data.frame(x, y, z)
print(G)
Output:
x y z
1 5 15 25
2 6 16 26
3 7 17 27
4 8 18 28

(i) colSums(G[, 1:2])
Code:
colSums(G[, 1:2])
Explanation:
 G[, 1:2] selects the first two columns (x and y).
 colSums() computes the sum of each column.
Output:
 x  y
26 66

(ii) colMeans(G[, 1:3])
Code:
colMeans(G[, 1:3])
Explanation:
 G[, 1:3] selects all three columns (x, y, and z).
 colMeans() computes the mean of each column.
Output:
   x    y    z
 6.5 16.5 26.5

(iii) rowSums(G[1:3,])
Code:
rowSums(G[1:3,])
Explanation:
 G[1:3,] selects the first three rows of all columns.
 rowSums() computes the sum of each row; the result is named by the row names 1, 2, 3.
Output:
 1  2  3
45 48 51

(iv) rowMeans(G[2:4,])
Code:
rowMeans(G[2:4,])
Explanation:
 G[2:4,] selects rows 2 to 4 of all columns.
 rowMeans() computes the mean of each row; the result is named by the row names 2, 3, 4.
Output:
 2  3  4
16 17 18

(v) colMeans(G[1:2,])
Code:
colMeans(G[1:2,])
Explanation:
 G[1:2,] selects the first two rows of all columns.
 colMeans() computes the mean of each column.
Output:
   x    y    z
 5.5 15.5 25.5

9. Explain the following string functions with suitable examples.
i) cat()
ii) sprintf()
iii) strsplit()
iv) substr()
v) toupper()
ANS:
Here’s an explanation of the specified string functions in R, along with examples:
(i) cat()
The cat() function concatenates and outputs strings or variables to the console without
quotes. It handles special characters like \n for newlines and spaces.
Syntax:
cat(..., sep = " ", fill = FALSE, labels = NULL)
Example:
cat("Hello", "World!\n", "How", "are", "you?", sep = "-")
Output:
Hello-World!
-How-are-you?
Explanation:
 Strings are concatenated with the sep separator ("-" in this case).
 The newline character (\n) starts a new line in the output. Because the separator is
also written after "World!\n", the second line begins with "-".

(ii) sprintf()
The sprintf() function is used for formatted string creation, similar to C-style string
formatting.
Syntax:
sprintf(fmt, ...)
Example:
sprintf("The result of %d + %d is %d", 3, 4, 3 + 4)
Output:
[1] "The result of 3 + 4 is 7"
Explanation:
 %d is a placeholder for integers.
 Additional placeholders include %f for floats and %s for strings.

(iii) strsplit()
The strsplit() function splits a string into substrings based on a delimiter.
Syntax:
strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
Example:
strsplit("I like Banana, Orange, and Pineapple", split = ", ")
Output:
[[1]]
[1] "I like Banana" "Orange" "and Pineapple"
Explanation:
 The string is split wherever the delimiter ", " is found.

(iv) substr()
The substr() function extracts or replaces parts of a string based on start and end positions.
Syntax:
substr(x, start, stop)
Example:
substr("The quick brown fox", 5, 9)
Output:
[1] "quick"
Explanation:
 Extracts characters from position 5 to 9.

(v) toupper()
The toupper() function converts all characters in a string to uppercase.
Syntax:
toupper(x)
Example:
toupper("hello world")
Output:
[1] "HELLO WORLD"
Explanation:
 Converts the entire string to uppercase letters.
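The five string functions are often used together. The sketch below, which assumes a made-up input line of the form "name,score", splits the line, capitalises the first letter of the name, formats the score, and prints the result.

```r
# Illustrative input line (assumed format: name,score)
record <- "alice,82.456"

# Split on the comma; strsplit() returns a list, so take its first element
parts <- strsplit(record, split = ",", fixed = TRUE)[[1]]

# Capitalise only the first letter of the name with substr() and toupper()
name <- parts[1]
substr(name, 1, 1) <- toupper(substr(name, 1, 1))

# Format the score to one decimal place with sprintf()
line <- sprintf("%s scored %.1f", name, as.numeric(parts[2]))

# cat() prints the result without quotes or an index prefix
cat(line, "\n")
```

Here substr() appears on the left-hand side of an assignment, which is the "replace" use mentioned in its description.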

10. Describe the following with R programming examples
(i) Creation of an Array
(ii) Assigning the names to elements of the array,
(iii) Accessing the elements of the array using index and names,
ANS:
10. Explanation with R Programming Examples

(i) Creation of an Array
An array in R is a multi-dimensional data structure that can store data of the same type
(numeric, character, etc.). Arrays are created using the array() function.
Syntax:
array(data, dim = c(dim1, dim2, ...), dimnames = list(...))
Example:
# Create an array with 2 rows, 3 columns, and 2 layers
arr <- array(1:12, dim = c(2, 3, 2))
print(arr)
Output:
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

(ii) Assigning Names to Elements of the Array
You can assign names to rows, columns, and layers of an array using the dimnames
argument.
Example:
# Create an array with named dimensions
arr <- array(1:12, dim = c(2, 3, 2),
             dimnames = list(
               c("Row1", "Row2"),         # Row names
               c("Col1", "Col2", "Col3"), # Column names
               c("Layer1", "Layer2")      # Layer names
             ))
print(arr)
Output:
, , Layer1
Col1 Col2 Col3
Row1 1 3 5
Row2 2 4 6

, , Layer2
Col1 Col2 Col3
Row1 7 9 11
Row2 8 10 12

(iii) Accessing Elements of the Array Using Index and Names


You can access elements in an array using numeric indices or names (if names are assigned).
Example Using Indices:
# Access the element in Row1, Col2, Layer2
element <- arr[1, 2, 2]
print(element)
Output:
[1] 9
Example Using Names:
# Access the element in Row2, Col3, Layer1
element <- arr["Row2", "Col3", "Layer1"]
print(element)
Output:
[1] 6
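Names can also be attached after the array has been created, and whole slices can be accessed by name. The following is a small sketch of both; the object name arr2 is introduced only for this illustration.

```r
# Create an unnamed array first
arr2 <- array(1:12, dim = c(2, 3, 2))

# Assign names afterwards with dimnames()
dimnames(arr2) <- list(c("Row1", "Row2"),
                       c("Col1", "Col2", "Col3"),
                       c("Layer1", "Layer2"))

# Access a whole layer by name (the result is a 2 x 3 matrix)
layer2 <- arr2[, , "Layer2"]
print(layer2)

# Access one row of one layer by mixing a numeric index and a name
row1_layer1 <- arr2[1, , "Layer1"]
print(row1_layer1)
```

Leaving an index position empty, as in arr2[, , "Layer2"], selects everything along that dimension.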
MODULE -3
1. Describe the following data frame manipulation functions with examples (10)
(i) with(),
(ii) within(),
(iii) order()
ANS:
Data Frame Manipulation Functions in R:
Here’s a brief theory and examples for each of the requested functions:

(i) with()
Theory: The with() function allows you to evaluate an expression within the context of a
data frame or list, making it easier to refer to columns directly without repeatedly typing the
data frame name. It is useful for simplifying code when performing operations on multiple
columns in a data frame.
Syntax:
with(data, expression)
 data: A data frame or list.
 expression: The R expression to evaluate within the data frame or list context.
Example:
# Create a data frame
df <- data.frame(name = c("John", "Jane", "Jim"),
age = c(25, 30, 28),
salary = c(50000, 60000, 55000))

# Use with() to access columns directly
with(df, {
  avg_salary <- mean(salary)
  print(avg_salary)
})
Explanation:
 In the above example, we are using with() to directly access the salary column in the
data frame df to calculate the average salary without having to write df$salary.

(ii) within()
Theory: The within() function allows modification of a data frame or list by evaluating an
expression within its context. Unlike with(), which only evaluates the expression, within()
returns a modified copy, so it can change existing columns or add new ones.
Syntax:
within(data, expression)
 data: The data frame or list.
 expression: The R expression to evaluate, which may include changes or additions to
columns.
Example:
# Create a data frame
df <- data.frame(name = c("John", "Jane", "Jim"),
age = c(25, 30, 28),
salary = c(50000, 60000, 55000))
# Use within() to add a new column
df <- within(df, {
  salary_increase <- salary * 1.1 # Increase salary by 10%
})
print(df)
Explanation:
 within() allows us to add the salary_increase column to df, which contains the
salaries increased by 10%. The modification is performed within the context of the
data frame.

(iii) order()
Theory: The order() function returns the indices that would sort a vector or column of a data
frame in a specific order (ascending or descending). It is commonly used for sorting data
frames based on one or more columns.
Syntax:
order(x, decreasing = FALSE, na.last = TRUE)
 x: The vector or column to sort.
 decreasing: Logical value indicating if sorting should be in descending order (TRUE) or
ascending (FALSE).
 na.last: Controls whether NA values should be sorted at the beginning or the end.
Example:
# Create a data frame
df <- data.frame(name = c("John", "Jane", "Jim"),
age = c(25, 30, 28),
salary = c(50000, 60000, 55000))

# Sort data frame by salary in descending order
df_sorted <- df[order(df$salary, decreasing = TRUE), ]
print(df_sorted)
Explanation:
 order(df$salary, decreasing = TRUE) returns the row indices of df arranged by the
salary column in descending order. Indexing df with these indices gives df_sorted,
whose rows run from highest to lowest salary.

Summary of Functions:
Function | Description | Example
with() | Evaluates an expression within the context of a data frame without specifying the data frame name repeatedly. | with(df, { avg_salary <- mean(salary) })
within() | Modifies a data frame by evaluating an expression within its context, including adding or changing columns. | df <- within(df, { salary_increase <- salary * 1.1 })
order() | Returns the indices that would reorder the data by one or more columns; used to sort a vector or data frame. | df_sorted <- df[order(df$salary, decreasing = TRUE), ]
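The three functions can be chained in one short script. The sketch below reuses the example data frame from above with an added dept column (an assumption made for illustration) and sorts by two keys at once.

```r
# Example data frame with an assumed dept column added for illustration
df <- data.frame(name   = c("John", "Jane", "Jim"),
                 age    = c(25, 30, 28),
                 salary = c(50000, 60000, 55000),
                 dept   = c("IT", "HR", "IT"))

# within(): add a derived column
df <- within(df, bonus <- salary * 0.05)

# with(): compute a summary without writing df$ repeatedly
avg_bonus <- with(df, mean(bonus))

# order(): sort by dept ascending, then by salary descending within each dept
df_sorted <- df[order(df$dept, -df$salary), ]

print(avg_bonus)
print(df_sorted)
```

Negating a numeric column inside order() is a common way to mix ascending and descending keys in a single call.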
2. Design a data frame in R for storing about 10 employee details. Create a CSV file named
“input.csv” that defines all the required information about the employee such as id, name,
salary, start_date, dept. Import into R and do the following analysis.
i) Find the total number of rows and columns
ii) Find the maximum salary
iii) Retrieve the details of the employee with maximum salary
iv) Retrieve all the employees working in the IT Department.
v) Retrieve the employees in the IT Department whose salary is greater than 20000
ANS:
Program 9
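The referenced program is not reproduced here, so the following is only a sketch of one way to carry out the five steps. All employee records are made up for illustration, and the file is written to a temporary directory rather than the working directory.

```r
# Hypothetical employee records (all values are made up for illustration)
employees <- data.frame(
  id     = 1:10,
  name   = c("Asha", "Ravi", "Meena", "Kiran", "Divya",
             "Arjun", "Sneha", "Vikram", "Pooja", "Nikhil"),
  salary = c(18000, 25000, 32000, 21000, 15000,
             40000, 19500, 27000, 23000, 30000),
  start_date = c("2019-01-10", "2018-03-15", "2017-07-01", "2020-02-20",
                 "2021-06-05", "2016-11-11", "2022-01-03", "2019-09-09",
                 "2020-12-01", "2018-08-18"),
  dept   = c("IT", "HR", "IT", "Finance", "IT",
             "Operations", "HR", "IT", "Finance", "IT")
)

# Create input.csv in a temporary directory so no file is left behind
csv_path <- file.path(tempdir(), "input.csv")
write.csv(employees, csv_path, row.names = FALSE)

# Import the CSV file into R
emp <- read.csv(csv_path, stringsAsFactors = FALSE)

dims       <- dim(emp)                                    # i) rows and columns
max_salary <- max(emp$salary)                             # ii) maximum salary
top_emp    <- subset(emp, salary == max_salary)           # iii) top earner
it_emp     <- subset(emp, dept == "IT")                   # iv) IT employees
it_high    <- subset(emp, dept == "IT" & salary > 20000)  # v) IT, salary > 20000

print(dims)
print(max_salary)
print(top_emp)
print(it_emp)
print(it_high)
```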

3. Develop R code to illustrate the concept of the following Grouping Functions
(i) apply(),
(ii) lapply(),
(iii) mapply(),
(iv) rapply(),
(v) tapply()
Ans:

(i) apply()
The apply() function is used to apply a function to the rows or columns of a matrix or array. It
takes three main arguments: the data (matrix/array), the margin (1 for rows, 2 for columns),
and the function to apply.
Syntax:
apply(X, MARGIN, FUN, ...)
 X: The matrix or array.
 MARGIN: The margin to apply the function to. 1 for rows, 2 for columns.
 FUN: The function to apply.
Example:
# Create a matrix
mat <- matrix(1:12, nrow = 3, ncol = 4)
print("Original Matrix:")
print(mat)

# Apply the sum function to each row
row_sum <- apply(mat, 1, sum)
print("Sum of each row:")
print(row_sum)

# Apply the mean function to each column
col_mean <- apply(mat, 2, mean)
print("Mean of each column:")
print(col_mean)
Explanation:
 apply(mat, 1, sum) calculates the sum of each row.
 apply(mat, 2, mean) calculates the mean of each column.
(ii) lapply()
The lapply() function applies a function to each element of a list or vector and returns a list.
It is similar to sapply(), but always returns a list.
Syntax:
lapply(X, FUN, ...)
 X: The list or vector.
 FUN: The function to apply.
Example:
# Create a list
lst <- list(a = 1:5, b = 6:10, c = 11:15)

# Apply the sum function to each element of the list
result <- lapply(lst, sum)
print("Sum of each list element:")
print(result)
Explanation:
 lapply(lst, sum) applies the sum() function to each element of the list lst, returning a
list of sums.

(iii) mapply()
The mapply() function applies a function to multiple arguments or lists in parallel. It is a
multivariate version of sapply().
Syntax:
mapply(FUN, ..., MoreArgs = NULL)
 FUN: The function to apply.
 ...: The arguments or lists to apply the function to.
Example:
# Two numeric vectors
list1 <- c(1, 2, 3)
list2 <- c(4, 5, 6)

# Apply the addition function to corresponding elements of both vectors
result <- mapply(sum, list1, list2)
print("Sum of corresponding elements:")
print(result)
Explanation:
 mapply(sum, list1, list2) applies the sum() function to each pair of elements from
list1 and list2 and returns the results in a vector (5, 7, 9).

(iv) rapply()
The rapply() function applies a function recursively to each element of a list,
including nested lists, and by default (how = "unlist") returns the results as a vector.
Syntax:
rapply(object, f, classes = "ANY", how = "unlist", ...)
 object: The list to apply the function to.
 f: The function to apply.
 classes: The class of the elements to apply the function to (default is "ANY").
 how: How the result is returned ("unlist", "replace" or "list").
Example:
# Nested list
nested_list <- list(a = list(1:3), b = list(4:6), c = list(7:9))

# Apply the sum function recursively
result <- rapply(nested_list, sum)
print("Sum of each nested element:")
print(result)
Explanation:
 rapply(nested_list, sum) applies the sum() function to each element of the nested list
and simplifies the result to a named vector (a = 6, b = 15, c = 24).

(v) tapply()
The tapply() function applies a function to subsets of a vector, which are defined by a factor
or grouping variable. It is commonly used for grouped analysis.
Syntax:
tapply(X, INDEX, FUN, ...)
 X: The vector to apply the function to.
 INDEX: A factor or list of factors that define the grouping.
 FUN: The function to apply.
Example:
# Create a vector and a factor for grouping
values <- c(10, 20, 30, 40, 50)
groups <- factor(c("A", "A", "B", "B", "A"))

# Apply the mean function to each group
result <- tapply(values, groups, mean)
print("Mean of each group:")
print(result)
Explanation:
 tapply(values, groups, mean) calculates the mean of values for each level of groups
(A and B).

Summary of Functions:
Function | Description | Example
apply() | Applies a function to rows or columns of a matrix. | apply(mat, 1, sum) to sum rows.
lapply() | Applies a function to each element of a list/vector, returns a list. | lapply(lst, sum) for sum of list elements.
mapply() | Applies a function to multiple arguments or lists in parallel. | mapply(sum, list1, list2) for element-wise addition.
rapply() | Applies a function recursively to a list, including nested lists. | rapply(nested_list, sum) for sum of nested elements.
tapply() | Applies a function to subsets of a vector, defined by a factor. | tapply(values, groups, mean) for group means.
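The main practical difference between these functions is the shape of what they return. The following sketch, with inputs chosen only for illustration, puts the return types side by side; sapply() is included because the text above compares lapply() and mapply() to it.

```r
lst <- list(a = 1:5, b = 6:10, c = 11:15)

# lapply() always returns a list
as_list <- lapply(lst, sum)

# sapply() simplifies the same result to a named vector
as_vector <- sapply(lst, sum)

# mapply() with an anonymous function of two arguments, applied pairwise
powers <- mapply(function(base, power) base^power, c(2, 3, 4), c(3, 2, 1))

# tapply() returns one value per group level
totals <- tapply(c(10, 20, 30, 40, 50), c("A", "A", "B", "B", "A"), sum)

print(class(as_list))
print(as_vector)
print(powers)
print(totals)
```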
4. List the types of files. Describe the functions used for importing and exporting various
types of files with example programs.
ANS:
Types of Files
1. Text and CSV files
2. Unstructured Files
3. XML and HTML Files
4. JSON and YAML Files
5. Excel Files
6. SAS, SPSS and MATLAB Files
7. Web Data

Summary of Import/Export Functions:
File Type | Import Function | Export Function | Package
Text | read.table() | write.table() | utils (base R)
CSV | read.csv() | write.csv() | utils (base R)
Excel | read_excel() | write_xlsx() | readxl / writexl
RData | load() | save() | base
RDS | readRDS() | saveRDS() | base
JSON | fromJSON() | toJSON() | jsonlite
XML | xmlParse() | saveXML() | XML
SPSS | read_sav() | write_sav() | haven
SQL | dbReadTable() | dbWriteTable() | DBI
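As an example program for the base R entries in the table, the sketch below writes a small data frame (assumed values) to CSV, tab-delimited text, and RDS files and reads each one back. Temporary file paths are used so nothing is written to the working directory; the Excel, JSON, XML, SPSS and SQL functions need add-on packages and are not shown.

```r
# Small illustrative data frame (assumed values)
scores <- data.frame(id = 1:3, score = c(90.5, 78.0, 85.25))

# Temporary file paths
csv_file <- tempfile(fileext = ".csv")
txt_file <- tempfile(fileext = ".txt")
rds_file <- tempfile(fileext = ".rds")

# CSV: write.csv() / read.csv()
write.csv(scores, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file)

# Tab-delimited text: write.table() / read.table()
write.table(scores, txt_file, sep = "\t", row.names = FALSE, quote = FALSE)
from_txt <- read.table(txt_file, header = TRUE, sep = "\t")

# RDS: saveRDS() / readRDS() stores a single R object
saveRDS(scores, rds_file)
from_rds <- readRDS(rds_file)

print(from_csv)
print(from_txt)
print(from_rds)
```

Setting row.names = FALSE when exporting avoids an extra unnamed column of row numbers appearing on re-import.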

5. Develop R code to demonstrate the concept of data reshaping using cbind() and rbind()
functions with relevant input and output.
ANS:
5. Data Reshaping using cbind() and rbind() in R
In R, cbind() (column bind) and rbind() (row bind) are functions used to combine data. These
functions are commonly used for reshaping data by adding columns or rows to an existing
data frame or matrix.
Here is a demonstration of both functions with relevant input and output.

1. cbind() (Column Binding)
The cbind() function combines objects by columns. This means it adds the provided vectors
or matrices as new columns to the existing matrix or data frame.
Example:
# Create two matrices (both have 3 rows)
matrix1 <- matrix(1:6, nrow = 3, ncol = 2) # 3 rows, 2 columns
matrix2 <- matrix(7:12, nrow = 3, ncol = 2) # 3 rows, 2 columns

# Print original matrices


cat("Matrix 1:\n")
print(matrix1)
cat("\nMatrix 2:\n")
print(matrix2)
# Using cbind to combine matrices by columns
combined_matrix <- cbind(matrix1, matrix2)
cat("\nCombined Matrix (by Columns):\n")
print(combined_matrix)
Output:
Matrix 1:
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6

Matrix 2:
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12

Combined Matrix (by Columns):


[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
Explanation:
 The two matrices matrix1 and matrix2 each have 3 rows and 2 columns. By using
cbind(), we bind them together column-wise, resulting in a matrix with 3 rows and 4
columns.

2. rbind() (Row Binding)
The rbind() function combines objects by rows. It appends the provided vectors or matrices
as new rows to the existing matrix or data frame.
Example:
# Create two matrices (both have 2 columns)
matrix1 <- matrix(1:4, nrow = 2, ncol = 2) # 2 rows, 2 columns
matrix2 <- matrix(5:8, nrow = 2, ncol = 2) # 2 rows, 2 columns

# Print original matrices


cat("Matrix 1:\n")
print(matrix1)
cat("\nMatrix 2:\n")
print(matrix2)

# Using rbind to combine matrices by rows


combined_matrix_row <- rbind(matrix1, matrix2)
cat("\nCombined Matrix (by Rows):\n")
print(combined_matrix_row)
Output:
Matrix 1:
[,1] [,2]
[1,] 1 3
[2,] 2 4

Matrix 2:
[,1] [,2]
[1,] 5 7
[2,] 6 8

Combined Matrix (by Rows):


[,1] [,2]
[1,] 1 3
[2,] 2 4
[3,] 5 7
[4,] 6 8
Explanation:
 The two matrices matrix1 and matrix2 each have 2 rows and 2 columns. By using
rbind(), we bind them together row-wise, resulting in a matrix with 4 rows and 2
columns.

Combining Data Frames Using cbind() and rbind()
You can also use these functions to combine data frames. Below is an example of combining
two data frames using both cbind() and rbind().
Example:
# Create two data frames
df1 <- data.frame(ID = 1:3, Name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(Age = c(25, 30, 35), City = c("New York", "Los Angeles", "Chicago"))

# Print original data frames


cat("Data Frame 1:\n")
print(df1)
cat("\nData Frame 2:\n")
print(df2)

# Using cbind to combine data frames by columns


combined_df_col <- cbind(df1, df2)
cat("\nCombined Data Frame (by Columns):\n")
print(combined_df_col)

# Using rbind to combine data frames by rows (the column names must match,
# so the new row is appended to the four-column combined_df_col, not to df1)
df3 <- data.frame(ID = 4, Name = "David", Age = 40, City = "Miami")
combined_df_row <- rbind(combined_df_col, df3)
cat("\nCombined Data Frame (by Rows):\n")
print(combined_df_row)
Output:
Data Frame 1:
  ID    Name
1  1   Alice
2  2     Bob
3  3 Charlie

Data Frame 2:
  Age        City
1  25    New York
2  30 Los Angeles
3  35     Chicago

Combined Data Frame (by Columns):
  ID    Name Age        City
1  1   Alice  25    New York
2  2     Bob  30 Los Angeles
3  3 Charlie  35     Chicago

Combined Data Frame (by Rows):
  ID    Name Age        City
1  1   Alice  25    New York
2  2     Bob  30 Los Angeles
3  3 Charlie  35     Chicago
4  4   David  40       Miami
Explanation:
 cbind() combines df1 and df2 by adding df2's columns to df1.
 rbind() combines combined_df_col and df3 by adding a new row with the same column
names. Calling rbind(df1, df3) would fail, because df1 has only two columns while df3
has four.
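When several pieces have to be stacked, do.call() can pass them all to rbind() at once. The sketch below, using made-up records, also shows that rbind() stops with an error when the column sets differ; try() is used only so that the script keeps running after the error.

```r
# Illustrative list of data frames that share the same columns
parts <- list(
  data.frame(ID = 1:2, Name = c("Alice", "Bob")),
  data.frame(ID = 3,   Name = "Charlie"),
  data.frame(ID = 4,   Name = "David")
)

# do.call() passes every list element to rbind() in one call
people <- do.call(rbind, parts)

# cbind() adds a new column; its length must equal the number of rows
people <- cbind(people, Age = c(25, 30, 35, 40))

# rbind() fails when the column sets differ; try() captures the error
bad <- try(rbind(people, data.frame(ID = 5, Name = "Eve")), silent = TRUE)

print(people)
print(inherits(bad, "try-error"))
```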

6. Determine the output of the following string manipulating functions.
i) grep("my", "This is my pen")
ii) sub("my", "your", "This is my pen")
iii) str_detect("This is my pen", "my")
iv) str_split("I like mangoes, oranges and pineapples", ",")
v) str_count("I like mangoes, oranges and pineapples", "s")
ans:
Let's break down each of the string manipulation functions and determine the expected
output:
1. grep()
grep() is used to search for matches to a pattern within a character vector and returns the
indices of the elements that contain the pattern.
Syntax:
grep(pattern, x)
 pattern: The regular expression pattern to search for.
 x: The character vector to search within.
Example:
grep("my", "This is my pen")
Explanation:
 The function looks for occurrences of the string "my" within the sentence "This is my
pen".
 The output is the index of the element containing "my". The input is a character
vector with a single element, and that element contains "my", so the index is 1.
Output:
[1] 1
The result 1 is the position of the matching element in the vector, not the position of
"my" inside the string.

2. sub()
sub() is used to replace the first occurrence of a pattern in a string with a replacement string.
Syntax:
sub(pattern, replacement, x)
 pattern: The pattern to match.
 replacement: The string to replace the pattern.
 x: The character vector to search within.
Example:
sub("my", "your", "This is my pen")
Explanation:
 The function searches for the first occurrence of the substring "my" and replaces it
with "your".
 Here, the first occurrence of "my" in "This is my pen" will be replaced by "your",
resulting in the string "This is your pen".
Output:
[1] "This is your pen"

3. str_detect() (from the stringr package)
str_detect() checks if a pattern is present in a string and returns a logical value (TRUE or
FALSE).
Syntax:
str_detect(string, pattern)
 string: The string to check.
 pattern: The pattern to search for.
Example:
library(stringr)
str_detect("This is my pen", "my")
Explanation:
 This function checks if the substring "my" exists within the string "This is my pen".
 Since "my" appears in the sentence, it returns TRUE.
Output:
[1] TRUE

4. str_split() (from the stringr package)
str_split() splits a string into substrings based on a specified delimiter.
Syntax:
str_split(string, pattern)
 string: The string to split.
 pattern: The delimiter on which to split the string.
Example:
library(stringr)
str_split("I like mangoes, oranges and pineapples", ",")
Explanation:
 The function splits the string "I like mangoes, oranges and pineapples" at each
occurrence of the comma ",".
 The result is a list of substrings; the space after the comma is kept at the start of
the second substring.
Output:
[[1]]
[1] "I like mangoes" " oranges and pineapples"

5. str_count() (from the stringr package)
str_count() counts the number of occurrences of a pattern in a string.
Syntax:
str_count(string, pattern)
 string: The string to search within.
 pattern: The pattern to count.
Example:
library(stringr)
str_count("I like mangoes, oranges and pineapples", "s")
Explanation:
 This function counts how many times the letter "s" appears in the string "I like
mangoes, oranges and pineapples".
 In this case, "s" appears 3 times: once each in "mangoes", "oranges" and "pineapples".
Output:
[1] 3
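The stringr results above can be cross-checked with base R functions, which need no extra package. The following is a minimal sketch; grepl(), gsub() and gregexpr() are base R counterparts chosen here for illustration and are not part of the original question.

```r
text <- "I like mangoes, oranges and pineapples"

# grepl(): logical test, the base counterpart of str_detect()
has_my <- grepl("my", "This is my pen", fixed = TRUE)

# gsub() replaces every match, whereas sub() replaces only the first
all_replaced   <- gsub("s", "S", text, fixed = TRUE)
first_replaced <- sub("s", "S", text, fixed = TRUE)

# Count matches by locating them with gregexpr()
hits    <- gregexpr("s", text, fixed = TRUE)[[1]]
n_match <- sum(hits > 0)

print(has_my)
print(all_replaced)
print(first_replaced)
print(n_match)
```

gregexpr() returns -1 when there is no match, which is why the count uses sum(hits > 0) rather than length(hits).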

MODULE-4
1. Write a note on the main graphical Packages in R.
Ans:

Main Graphical Packages
1. Base graphics: has some limitations
2. Lattice graphics: the results of the graph can be saved
3. ggplot2 graphics: breaks down graphs into many parts ("gg" stands for "grammar of
graphics")
Main Graphical Packages in R
R provides several graphical packages to create a variety of plots and visualizations. The
most commonly used graphical packages in R include:

1. Base R Graphics
Base R comes with built-in functions for creating a wide range of standard plots. It is
simple to use and doesn’t require any additional packages for basic visualizations.
Features:
 Basic Plotting Functions: Functions like plot(), hist(), boxplot(), barplot(), and pie()
create basic plots (plot() draws a scatter plot by default), and lines() adds lines to an
existing plot.
 Customization: The plots can be customized with the par(), title(), and axis() functions.
 Layering: Multiple graphical elements like points, lines, and text can be layered on
the same plot using additional commands.
Example:
# Base R plot example
plot(1:10, type = "o", col = "blue", xlab = "X-Axis", ylab = "Y-Axis")
While Base R graphics are useful for quick and simple visualizations, they lack the flexibility
and advanced features required for more complex and interactive visualizations.

2. ggplot2 (Grammar of Graphics)
ggplot2 is one of the most popular and powerful visualization libraries in R. It is based on
the "grammar of graphics" concept, which provides a systematic way of describing and
building plots.
Features:
 Layered Approach: You can build plots step by step, adding layers such as points,
lines, and texts incrementally.
 Elegant Syntax: It uses an intuitive syntax, making it easy to create complex plots
with minimal code.
 Customizable: You can customize almost every aspect of the plot, including the
theme, axes, legends, colors, and more.
 Aesthetics and Geometries: Visual elements are separated into aesthetics (like color,
size, shape) and geometries (points, lines, bars).
 Wide Range of Plot Types: ggplot2 can create scatter plots, bar plots, line charts,
histograms, heatmaps, boxplots, and more.
Example:
# ggplot2 example
library(ggplot2)
data(mpg)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  labs(title = "Engine Displacement vs. Highway MPG")
ggplot2 is favored for its rich visualization capabilities and its ability to create publication-
ready plots.

3. lattice
lattice is another powerful package for creating high-level data visualizations, designed
to create plots in a consistent and customizable way.
Features:
 Trellis Plots: lattice is well-known for its support for trellis (or small multiple) plots,
which allow you to display multiple similar plots in a grid layout.
 Paneling: The grid system in lattice allows for conditional plotting, where data is split
based on a factor variable.
 Reproducibility: The syntax is consistent across different plot types, making it easy to
apply across multiple plot layouts.
Example:
# lattice example
library(lattice)
data(mpg, package = "ggplot2") # the mpg dataset ships with ggplot2
xyplot(hwy ~ displ | class, data = mpg,
       main = "Engine Displacement vs. Highway MPG by Class")
lattice is preferred for multi-panel plots and working with conditioned plots, but it may
require more effort in customization compared to ggplot2.

4. plotly
plotly is a package used to create interactive and dynamic plots. It is especially useful for
web applications and interactive data exploration.
Features:
 Interactive Plots: You can hover, zoom, and click to explore the data.
 Integration with ggplot2: It can convert ggplot2 plots to interactive plots with
minimal code.
 Web-Ready Visualizations: It integrates seamlessly with web technologies, making it
a great option for dashboards and web applications.
Example:
# plotly example
library(plotly)
p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()
ggplotly(p)
plotly is excellent for creating interactive, web-based visualizations and is commonly
used in data-driven web applications.

5. shiny
shiny is an R package for building interactive web applications. Although it is primarily
used for interactive applications, it also supports embedding various types of plots and
charts.
Features:
 Web Applications: Create complete web applications with interactivity, including
dynamic user interfaces and reactive plots.
 Integration with Plotting Packages: It can display interactive plots created with
ggplot2, plotly, and other libraries.
 Custom UI Elements: Includes custom input controls (sliders, buttons, drop-downs)
to control the visualization.
Example:
library(shiny)

# Define UI
ui <- fluidPage(
  sliderInput("slider", "Select a value", min = 1, max = 10, value = 5),
  plotOutput("plot")
)

# Define server
server <- function(input, output) {
  output$plot <- renderPlot({
    plot(1:input$slider, 1:input$slider)
  })
}

# Run the application
shinyApp(ui = ui, server = server)
shiny is great for building interactive applications that can respond to user inputs
dynamically, such as real-time plots and data visualizations.

6. highcharter
highcharter is an R wrapper for the Highcharts JavaScript library, allowing the creation of
interactive charts and visualizations.
Features:
 Interactive Charts: highcharter creates interactive visualizations that can be easily
integrated into web applications.
 Advanced Chart Types: Supports charts such as line charts, bar charts, pie charts,
and more advanced charts like heatmaps, radar charts, and tree maps.
Example:
library(highcharter)
highchart() %>%
  hc_chart(type = "line") %>%
  hc_add_series(data = c(1, 3, 2, 4, 5), name = "Example Line Chart")
highcharter is ideal for creating interactive and aesthetically pleasing charts, especially
when building dashboards and web apps.

2. Write the basic syntax for creating a pie chart and explain each parameter listed in
the syntax. Also write an R program to create a pie chart for the given list of flowers
with count [Rose=25, Lotus=35, Lilly=10, Sunflower=5, Jasmine=15]. Draw the
created output chart. (10)
ANS:
Basic Syntax for Creating a Pie Chart in R
The basic syntax for creating a pie chart in R is:
pie(x, labels = names(x), edges = 200, radius = 0.8,
    clockwise = FALSE, init.angle = if (clockwise) 90 else 0,
    density = NULL, angle = 45, col = NULL, border = NULL,
    lty = NULL, main = NULL, ...)
Explanation of Parameters:
 x: A vector of non-negative numeric values representing the data to be plotted in the
pie chart. These values are typically the counts or proportions of categories.
 labels: A vector of labels corresponding to the segments of the pie chart. The default
is names(x); if x has no names, R labels the segments "1", "2", "3", etc.
 edges: The number of edges of the polygon used to approximate the circular outline.
A higher value gives a smoother pie chart.
 radius: The radius of the pie chart. The default value is 0.8. Increasing the value
makes the pie chart larger.
 clockwise: A logical value (TRUE or FALSE). If TRUE, the slices are drawn clockwise;
if FALSE, counterclockwise.
 init.angle: The starting angle of the first slice in degrees. The default is 0 (3 o'clock)
when drawing counterclockwise and 90 (12 o'clock) when clockwise = TRUE.
 density: The density of shading lines for the pie chart segments. If NULL, no shading
lines are drawn.
 angle: The slope of the shading lines in degrees (default 45); it has an effect only
when density is set.
 col: A vector of colors to fill the segments of the pie chart. If not specified, R will use
default colors.
 border: The color of the border surrounding the segments. With the default NULL the
border is drawn in the default foreground color; NA omits the border.
 lty: The line type used for the slice borders.
 main: The main title of the pie chart.
 ...: Additional graphical parameters for customization, such as cex for text size or
font for font type.

R Program to Create a Pie Chart


We will create a pie chart for the list of flowers with their respec ve counts:
 Flowers with Count:
o Rose = 25
o Lotus = 35
o Lilly = 10
o Sunflower = 5
o Jasmine = 15
R Code:
# Data for flowers and their counts
flowers <- c(25, 35, 10, 5, 15)
flower_names <- c("Rose", "Lotus", "Lilly", "Sunflower", "Jasmine")

# Create the pie chart
pie(flowers, labels = flower_names,
    col = rainbow(length(flowers)),   # Color palette for the slices
    main = "Flower Distribution",     # Title of the pie chart
    radius = 0.8,                     # Radius of the pie chart
    clockwise = TRUE)                 # Draw the chart clockwise
Explanation of the Code:
• flowers: A vector containing the count of each flower.
• flower_names: A vector containing the names of the flowers.
• col = rainbow(length(flowers)): The rainbow() function is used to generate a vector
of colors. The length of the color vector is equal to the number of flower categories.
• main = "Flower Distribution": The main title of the pie chart is set to "Flower
Distribution".
• radius = 0.8: The radius is set to 0.8, making the pie chart moderately sized.
• clockwise = TRUE: The pie chart will be drawn in a clockwise direction.
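The chart above labels each slice with the flower name only. A common refinement is to add each flower's share of the total to its label. The following is a minimal sketch using the same data; pct and slice_labels are illustrative names that do not appear in the original answer.

```r
# Data from the question
flowers <- c(25, 35, 10, 5, 15)
flower_names <- c("Rose", "Lotus", "Lilly", "Sunflower", "Jasmine")

# Share of each flower in the total, rounded to whole percentages
pct <- round(100 * flowers / sum(flowers))

# Build labels such as "Rose 28%" (illustrative variable names)
slice_labels <- paste0(flower_names, " ", pct, "%")

# Same chart as before, now with percentage labels
pie(flowers, labels = slice_labels,
    col = rainbow(length(flowers)),
    main = "Flower Distribution (%)",
    clockwise = TRUE)
```

Because each percentage is rounded separately, the labels need not add up to exactly 100.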

3. For the following data, plot a line plot.

male <- c(1000, 2000, 1500, 4000, 800)
female <- c(700, 300, 600, 1200, 800)
child <- c(1000, 1200, 1500, 800, 2000)
wages <- c("Male", "Female", "Children")
ANS:
# Data
male <- c(1000, 2000, 1500, 4000, 800)
female <- c(700, 300, 600, 1200, 800)
child <- c(1000, 1200, 1500, 800, 2000)
wages <- c("Male", "Female", "Children")

# Plotting
plot(male, type = "o", col = "blue", xlab = "Data Points", ylab = "Wages",
main = "Wages by Category", ylim = range(0, max(male, female, child)))
lines(female, type = "o", col = "red")
lines(child, type = "o", col = "green")

# Adding a legend
legend("topright", legend = wages, col = c("blue", "red", "green"), lty = 1, pch = 1)
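The plot()/lines() sequence above can also be written as a single call. As a hedged alternative, the sketch below binds the three vectors into a matrix and draws one line per column with matplot() from base graphics; wage_matrix and line_cols are illustrative names.

```r
male   <- c(1000, 2000, 1500, 4000, 800)
female <- c(700, 300, 600, 1200, 800)
child  <- c(1000, 1200, 1500, 800, 2000)

# One column per category; matplot() draws one line per column
wage_matrix <- cbind(Male = male, Female = female, Children = child)
line_cols <- c("blue", "red", "green")

matplot(wage_matrix, type = "o", pch = 1, lty = 1, col = line_cols,
        xlab = "Data Points", ylab = "Wages", main = "Wages by Category",
        ylim = c(0, max(wage_matrix)))

# The legend entries come straight from the column names
legend("topleft", legend = colnames(wage_matrix),
       col = line_cols, lty = 1, pch = 1)
```

With this form the y-axis range and the legend follow the data automatically when a category is added or removed.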

4. Write the syntax for plotting the histogram and plot the histogram for the following
data. x <- c(45,33,31,23,58,47,39,58,28,55,42,27)
Ans:
Syntax for Plotting a Histogram in R
The basic syntax to create a histogram in R is:
hist(x,
     breaks = "Sturges", # Number of bins or breakpoints (optional)
     col = NULL,         # Color of the bars (optional)
     main = NULL,        # Main title (optional)
     xlab = NULL,        # Label for the x-axis (optional)
     ylab = NULL,        # Label for the y-axis (optional)
     border = NULL,      # Color of the borders (optional)
     xlim = NULL,        # Limits for the x-axis (optional)
     ylim = NULL,        # Limits for the y-axis (optional)
     ...)
• x: A numeric vector of data values.
• breaks: Defines the number of bins or the specific breakpoints. By default ("Sturges")
R calculates the number of bins automatically. A single number is treated only as a
suggestion, because R rounds the breakpoints to "pretty" values.
• col: Color of the bars in the histogram.
• main: The title of the histogram.
• xlab: Label for the x-axis.
• ylab: Label for the y-axis.
• border: Color for the borders of the bars.
• xlim and ylim: Control the limits of the x-axis and y-axis.
R Code to Plot Histogram


We have the following data:
x <- c(45, 33, 31, 23, 58, 47, 39, 58, 28, 55, 42, 27)
Now, let's plot the histogram for this data.
R Code:
# Data
x <- c(45, 33, 31, 23, 58, 47, 39, 58, 28, 55, 42, 27)
# Plotting the histogram
hist(x,
     col = "skyblue",             # Color of the bars
     main = "Histogram of Data",  # Title of the histogram
     xlab = "Values",             # X-axis label
     ylab = "Frequency",          # Y-axis label
     border = "black",            # Border color for the bars
     breaks = 6,                  # Suggested number of bins
     xlim = c(20, 60),            # Limits for the x-axis
     ylim = c(0, 4))              # Limits for the y-axis
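Since breaks = 6 is only a suggestion, it helps to check which bins R actually chose before fixing xlim and ylim. The sketch below calls hist() with plot = FALSE, which returns the computed bins without drawing anything; h is an illustrative variable name.

```r
x <- c(45, 33, 31, 23, 58, 47, 39, 58, 28, 55, 42, 27)

# Compute the histogram without drawing it
h <- hist(x, breaks = 6, plot = FALSE)

h$breaks          # bin boundaries chosen by R
h$counts          # number of observations in each bin
length(h$counts)  # number of bins actually used (may differ from 6)

# A y-axis limit that is guaranteed to fit the tallest bar
y_limit <- c(0, max(h$counts) + 1)
```

The values in h$breaks and h$counts can then be used to pick axis limits that do not cut off any bar.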

a. Describe the line plot and histogram with examples


Ans:
Line Plots:

A line chart/line plot is a graph that connects a series of points by drawing line
segments between them.

The plot() function in R is used to create the line graph in base graphics, as in the figure.

This function takes a vector of numbers as input together with a few more parameters
listed below.

Syntax: plot(y, type, col, xlab, ylab, main)

y → numeric vector

type → takes the value "p" (only points), "l" (only lines) or "o" (both lines and points)

xlab → label of x-axis

ylab → label of y-axis

main → title of the chart

col → colour of the points and lines


Example

> male <- c(1000, 2000, 1500, 4000, 800)

> female <- c(700, 300, 600, 1200, 800)

> child <- c(1000, 1200, 1500, 800, 2000)

> wages <- c("Male", "Female", "Children")

> color <- c("red", "blue", "green")

> plot(male, type="o", col="red", xlab="Month", ylab="Wages", main="Monthly wages", ylim=c(0,5000))

> lines(female, type="o", col="blue")

> lines(child, type="o", col="green")

> legend("topleft", wages, cex=0.8, fill=color)

Histograms

A histogram shows how often the values of a variable fall into each of a set of ranges
(bins). It is similar to a bar chart, but a histogram groups values into continuous ranges.
In R, histograms in base graphics are drawn using the function hist(), as in the figure, which
takes a vector of numbers as input together with a few more parameters listed below.

Syntax: hist(v, main, xlab, xlim, ylim, breaks, col, border)

v → numeric vector

main → title of the chart

col → colour of the bars

xlab → label of x-axis

xlim → range of x-axis

ylim → range of y-axis

border → border colour

breaks → suggested number of bins, or a vector of breakpoints between bins

Example

x<-c(45,33,31,23,58,47,39,58,28,55,42,27)

hist(x, xlab="Age", col="blue", border="red", xlim=c(20,60), ylim=c(0,3), breaks=5)
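When exact class intervals are wanted, breaks can be given as a vector of breakpoints instead of a single number. The sketch below assumes 10-year-wide classes from 20 to 60 for the same data; the class width and the name edges are illustrative choices, not part of the original answer.

```r
x <- c(45, 33, 31, 23, 58, 47, 39, 58, 28, 55, 42, 27)

# Explicit breakpoints: classes (20,30], (30,40], (40,50], (50,60]
edges <- seq(20, 60, by = 10)

h <- hist(x, breaks = edges,
          xlab = "Age", col = "blue", border = "red",
          main = "Histogram with explicit breaks")

h$counts  # 3 3 3 3: each class holds three of the twelve values
```

A breakpoint vector must cover the whole range of the data; otherwise hist() stops with an error.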
6. Let us use the built-in dataset airquality, which has daily air quality measurements in
New York, May to September 1973. Develop an R program to generate a histogram by using
appropriate arguments for the following statements.
a) Assigning names, using the airquality data set.
b) Change colors of the Histogram
c) Remove Axis and Add labels to Histogram
d) Change Axis limits of a Histogram
e) Add Density curve to the histogram.
ANS:
To work with the airquality dataset in R and create a histogram with various customizations,
let's break down each part of the task:
a) Assigning names using the airquality dataset
The airquality dataset contains daily air quality measurements in New York from May to
September in 1973. It has the variables Ozone, Solar.R, Wind, Temp, Month, and Day.
We will use the Ozone variable for the histogram in this example, and we will assign the
column names to the dataset.
b) Change colors of the Histogram
We can specify the color of the bars in the histogram using the col argument.
c) Remove Axis and Add labels to Histogram
We can suppress the default axes using the xaxt and yaxt arguments, redraw them with
axis(), and then manually add labels using the mtext() function.
d) Change Axis limits of a Histogram
To change the axis limits, use the xlim and ylim arguments.
e) Add a Density curve to the Histogram
To add a density curve over the histogram, we draw the histogram on the density scale
(probability = TRUE) and then use the lines() function with density().
R Code:
# Load the airquality dataset
data(airquality)

# a) Assigning names to the dataset (already assigned, but we can reassign or modify)
names(airquality) <- c("Ozone", "Solar.R", "Wind", "Temp", "Month", "Day")
head(airquality)

# b) Change colors of the Histogram
hist(airquality$Ozone,
     col = "lightblue",                   # Color of the bars
     main = "Histogram of Ozone Levels",  # Title of the histogram
     xlab = "Ozone Concentration",        # X-axis label
     ylab = "Frequency",                  # Y-axis label
     border = "black")                    # Border color of bars

# c) Remove Axis and Add labels to Histogram
hist(airquality$Ozone,
     col = "lightblue",
     main = "Histogram of Ozone Levels",
     xlab = "",      # Leave the default labels empty; they are added with mtext() below
     ylab = "",
     border = "black",
     xaxt = "n",     # Remove x-axis
     yaxt = "n")     # Remove y-axis

# Add custom axes
axis(1, at = seq(0, 180, by = 20))  # Custom x-axis ticks
axis(2, at = seq(0, 40, by = 10))   # Custom y-axis ticks

# Add labels below and to the left of the plot
mtext("Ozone Concentration", side = 1, line = 2.5, cex = 1.2)
mtext("Frequency", side = 2, line = 2.5, cex = 1.2)

# d) Change Axis limits of the Histogram
# (limits chosen wide enough that no bar is cut off)
hist(airquality$Ozone,
     col = "lightgreen",
     main = "Histogram of Ozone Levels",
     xlab = "Ozone Concentration",
     ylab = "Frequency",
     border = "black",
     xlim = c(0, 180),  # Set the x-axis limits
     ylim = c(0, 40))   # Set the y-axis limits

# e) Add Density curve to the histogram
hist(airquality$Ozone,
     col = "lightblue",
     main = "Histogram of Ozone Levels with Density Curve",
     xlab = "Ozone Concentration",
     ylab = "Density",      # The y-axis shows density, not counts
     border = "black",
     probability = TRUE)    # Normalize the histogram to plot density
# Add density curve in red; Ozone contains NA values, so na.rm = TRUE is required
lines(density(airquality$Ozone, na.rm = TRUE), col = "red", lwd = 2)
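The na.rm = TRUE argument in the density() call matters because the Ozone column has missing values: hist() silently drops them, while density() stops with an error unless told to remove them. The sketch below checks this on the built-in dataset; the variable names are illustrative.

```r
ozone <- datasets::airquality$Ozone

n_total   <- length(ozone)      # number of daily records
n_missing <- sum(is.na(ozone))  # days with no ozone reading

# hist() ignores the missing values, so the counts cover only observed days
h <- hist(ozone, plot = FALSE)
n_plotted <- sum(h$counts)

# density() refuses NA values unless na.rm = TRUE is given
fails_without_na_rm <- inherits(try(density(ozone), silent = TRUE), "try-error")
d <- density(ozone, na.rm = TRUE)

c(total = n_total, missing = n_missing, plotted = n_plotted)
```

Here n_plotted equals n_total minus n_missing, which confirms that the histogram and the density curve are based on the same observed values.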

7. Explain the exploratory data analysis


Ans:
Exploratory Data Analysis
• Exploratory Data Analysis (EDA) is a visual-based method used to analyse data sets
and summarize their main characteristics.
• EDA shows how to use visualization and transformation to explore data in a
systematic way.
EDA is an iterative cycle of the following steps:
1. Generate questions about data.
2. Search for answers by visualizing, transforming and modelling data.
3. Use what is learnt to refine questions and/or generate new questions.
Exploratory Data Analysis (EDA) is an approach for data analysis that employs a variety of
techniques to
1. Maximize insight into a data set
2. Uncover underlying structure
3. Extract important variables
4. Detect outliers and anomalies
5. Test underlying assumptions
6. Develop parsimonious models
7. Determine optimal factor settings.
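One pass through the question, visualize, refine cycle can be sketched on the built-in airquality dataset. The question asked here (is ozone related to temperature?) and the variable names are illustrative assumptions, not part of the original answer.

```r
aq <- datasets::airquality

# Step 1: generate a question -- is ozone related to temperature?
summary(aq$Ozone)   # first look at the variable, including its NA count

# Step 2: search for an answer by visualizing and summarizing
plot(aq$Temp, aq$Ozone,
     xlab = "Temperature (F)", ylab = "Ozone (ppb)",
     main = "Ozone vs Temperature")
r <- cor(aq$Temp, aq$Ozone, use = "complete.obs")  # ignore rows with NA
r

# Step 3: refine the question -- does the ozone level differ by month?
boxplot(Ozone ~ Month, data = aq,
        xlab = "Month", ylab = "Ozone (ppb)",
        main = "Ozone by Month")
```

The positive correlation found in step 2 leads to the refined question in step 3, and the box plot by month would in turn suggest further questions.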

8. Explain different parts of the box plot.


Ans:
A box plot (also known as a box-and-whisker plot) is a graphical representation of a dataset
that shows the distribution, central tendency, and variability of the data. It is useful for
identifying outliers and comparing multiple datasets.
Different Parts of a Box Plot
1. Box (Interquartile Range - IQR):
o The box represents the middle 50% of the data, specifically the interquartile
range (IQR). The IQR is the range between the first quartile (Q1) and the
third quartile (Q3).
o Q1 (First Quartile): The value below which the lowest 25% of the data fall.
o Q3 (Third Quartile): The value below which the lowest 75% of the data fall.
o The box spans from Q1 to Q3, with a line in the middle representing the
median.
2. Median (Q2 - Second Quartile):
o The median (also called the second quartile (Q2)) is the middle value of the
data when it is sorted in ascending order. It divides the dataset into two equal
halves.
o In the box plot, the median is represented by a line inside the box.
3. Whiskers:
o The whiskers are the lines extending from the box to the smallest and largest
values within a certain range, typically 1.5 times the IQR.
o The lower whisker extends from the first quartile (Q1) to the smallest value
within 1.5 times the IQR below Q1.
o The upper whisker extends from the third quartile (Q3) to the largest value
within 1.5 times the IQR above Q3.
4. Outliers:
o Outliers are values that lie beyond the whiskers. They are typically defined as
data points that are more than 1.5 times the IQR above Q3 or below Q1.
o Outliers are usually represented by individual dots or symbols outside the
whiskers.
5. Notch (optional):
o In some box plots, there is a notch around the median. This notch represents
the confidence interval around the median. If the notches of two box plots do
not overlap, it suggests that their medians are significantly different.
6. Box Plot Summary:
o The box plot provides a visual summary of the data distribution. It helps us
identify:
• The central tendency (median)
• The spread of the data (IQR)
• The range of the data (whiskers)
• Potential outliers
Example of a Box Plot Structure


Consider a dataset with values:
data <- c(1, 2, 5, 8, 8, 9, 10, 15, 18, 20, 22, 25, 30)
Here's what each part of the box plot represents:
• Box: The box spans from Q1 = 8 to Q3 = 20, so the IQR is 20 - 8 = 12.
• Median (Q2): The median is represented by a line in the middle of the box. In this
case, the median is 10 (the 7th of the 13 sorted values).
• Whiskers: The whiskers extend from the box to the smallest and largest values within
1.5 * IQR of Q1 and Q3, that is, within 8 - 18 = -10 and 20 + 18 = 38. Here they reach
the minimum (1) and the maximum (30).
• Outliers: Any data points outside the whiskers would be considered outliers. In this
dataset there are none, because even the largest value, 30, lies below the upper limit of 38.
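The quartiles, whisker limits and outliers for this vector can be checked numerically rather than read off a plot. The sketch below uses fivenum(), which returns the five-number summary that boxplot() draws, together with the 1.5 * IQR rule; the variable names are illustrative.

```r
data <- c(1, 2, 5, 8, 8, 9, 10, 15, 18, 20, 22, 25, 30)

# Minimum, lower hinge (Q1), median, upper hinge (Q3), maximum
five <- fivenum(data)
five  # 1 8 10 20 30

# Interquartile range and the 1.5 * IQR fences
iqr <- five[4] - five[2]              # 20 - 8 = 12
lower_fence <- five[2] - 1.5 * iqr    # 8 - 18 = -10
upper_fence <- five[4] + 1.5 * iqr    # 20 + 18 = 38

# Points outside the fences would be drawn as outliers
outliers <- data[data < lower_fence | data > upper_fence]
length(outliers)  # 0

# boxplot.stats() reports the same information in one call
boxplot.stats(data)$out  # numeric(0): no outliers
```

For this vector the fences are -10 and 38, so every value lies inside them and the whiskers run from 1 to 30.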

Box Plot Example in R


# Example data
data <- c(1, 2, 5, 8, 8, 9, 10, 15, 18, 20, 22, 25, 30)

# Creating a box plot
boxplot(data,
        main = "Box Plot Example",
        ylab = "Values",
        col = "lightblue",
        border = "darkblue",
        notch = TRUE)  # Add notch for confidence interval; with so few points
                       # R may warn that the notches extend beyond the box
In the plot:
• The box spans from Q1 to Q3, with a line in the middle indicating the median.
• The whiskers extend to the smallest and largest values within the acceptable range.
• The outlier values (if any) are plotted as dots outside the whiskers.
9. Plot the bar plot for the following data, both horizontal and vertical.
x <- matrix(c(1000,900,1500,4400,800,2100,1700,2900,3800), nrow=3, ncol=3)
years <- c("2011","2012","2013")
city <- c("Chennai","Mumbai","Kolkata")
Ans:
R Code for Vertical and Horizontal Bar Plots:
# Create matrix data
x <- matrix(c(1000, 900, 1500, 4400, 800, 2100, 1700, 2900, 3800), nrow=3, ncol=3)

# Assign row names and column names to the matrix
years <- c("2011", "2012", "2013")
city <- c("Chennai", "Mumbai", "Kolkata")
rownames(x) <- years
colnames(x) <- city

# View the matrix
print(x)

# Vertical Bar Plot
barplot(x,
        beside = TRUE,                                    # Group bars for each city
        col = c("lightblue", "lightgreen", "lightcoral"), # Bar colors
        main = "Bar Plot (Vertical)",                     # Title
        xlab = "Cities",                                  # X-axis label
        ylab = "Values",                                  # Y-axis label
        legend.text = years,                              # Add a legend for years
        args.legend = list(x = "topright"))               # Position legend

# Horizontal Bar Plot
barplot(x,
        beside = TRUE,                                    # Group bars for each city
        col = c("lightblue", "lightgreen", "lightcoral"), # Bar colors
        main = "Bar Plot (Horizontal)",                   # Title
        xlab = "Values",                                  # X-axis label
        ylab = "Cities",                                  # Y-axis label
        legend.text = years,                              # Add a legend for years
        args.legend = list(x = "topright"),               # Position legend
        horiz = TRUE)                                     # Horizontal bars
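barplot() also returns the positions of the bars, which can be used to print each value above its bar. The sketch below is one possible refinement of the vertical plot; bp, the ylim value and the legend position are illustrative choices.

```r
x <- matrix(c(1000, 900, 1500, 4400, 800, 2100, 1700, 2900, 3800),
            nrow = 3, ncol = 3,
            dimnames = list(c("2011", "2012", "2013"),
                            c("Chennai", "Mumbai", "Kolkata")))

# With beside = TRUE, barplot() returns a matrix of bar midpoints
# that has the same shape as x
bp <- barplot(x, beside = TRUE,
              col = c("lightblue", "lightgreen", "lightcoral"),
              main = "Bar Plot with Value Labels",
              xlab = "Cities", ylab = "Values",
              ylim = c(0, 5000),               # headroom for the labels
              legend.text = TRUE,              # legend taken from the row names
              args.legend = list(x = "topleft"))

# Write each value just above its bar
text(bp, x, labels = x, pos = 3, cex = 0.8)

# Total per city across the three years
colSums(x)
```

Setting legend.text = TRUE reuses the row names of the matrix, so the years do not have to be passed a second time.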

11. Describe the following with examples boxplot(), bwplot(), ggplot()


Ans:
In R, the functions boxplot(), bwplot(), and ggplot() are used for visualizing data in various
ways. They help you understand the distribution and relationships within your dataset.
Here's an explanation of each function, along with examples:
1. boxplot()
A boxplot is a graphical display of the distribution of a dataset that highlights the median,
quartiles, and potential outliers. It shows the minimum, lower quartile (Q1), median, upper
quartile (Q3), and maximum of the dataset.
Basic Syntax:
boxplot(data,
main = "Boxplot", # Title of the plot
xlab = "X-axis Label", # Label for x-axis
ylab = "Y-axis Label", # Label for y-axis
col = "lightblue", # Color for the boxes
border = "darkblue") # Border color of boxes
Example:
# Creating a vector of data
data <- c(1, 2, 5, 8, 8, 9, 10, 15, 18, 20, 22, 25, 30)

# Plotting a basic boxplot
boxplot(data,
        main = "Boxplot Example",
        ylab = "Values",
        col = "lightgreen",
        border = "darkgreen")
In this example:
• The boxplot will show the distribution of the data vector.
• The box will span from the first quartile (Q1) to the third quartile (Q3), and the line
inside the box will represent the median.
• Whiskers will extend from the box to the minimum and maximum values within the
acceptable range (1.5 * IQR from Q1 and Q3).
• Outliers (if any) will be plotted as individual points.
2. bwplot() (from the lattice package)

The bwplot() function is part of the lattice package and is used to create box-and-whisker
plots. This function is similar to boxplot() but provides additional customization and
functionality for grouping variables.
Basic Syntax:
bwplot(y ~ x, data,          # y: numeric values, x: grouping factor
       main = "Boxplot",
       xlab = "X-axis Label",
       ylab = "Y-axis Label",
       col = "lightblue")
Example:
# Load the lattice package
library(lattice)

# Create a data frame
data <- data.frame(
  category = c("A", "B", "A", "B", "A", "B"),
  values = c(1, 3, 5, 2, 7, 4)
)

# Plotting the box-and-whisker plot using bwplot
bwplot(values ~ category,
       data = data,
       main = "Box-and-Whisker Plot Example",
       ylab = "Values",
       xlab = "Category",
       col = "lightblue")
In this example:
• bwplot(values ~ category) creates a boxplot for the values grouped by category.
• This plot will show the distribution of the values for each category (A and B) in the
data.
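With only three values per category, the boxes in the example above carry little information. As a hedged illustration on a larger dataset, the sketch below draws one box per month for the built-in airquality data; the choice of dataset and the name p are assumptions made for this example.

```r
library(lattice)

# One box per month; factor() turns the numeric month into a grouping variable
p <- bwplot(Ozone ~ factor(Month),
            data = airquality,
            main = "Ozone by Month (lattice)",
            xlab = "Month",
            ylab = "Ozone")

# lattice functions return a plot object; print() draws it
print(p)
```

At the console the plot is drawn automatically, but inside a function or a loop the explicit print() call is needed.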

3. ggplot() (from the ggplot2 package)

ggplot() is a function from the ggplot2 package used for creating a wide range of plots,
including scatter plots, line plots, histograms, and box plots. ggplot() follows a grammar of
graphics that allows for greater flexibility and customization in plotting.
Basic Syntax:
ggplot(data, aes(x, y)) +
  geom_boxplot() +
  labs(title = "Boxplot",
       x = "X-axis Label",
       y = "Y-axis Label") +
  theme_minimal()  # Optional: Apply a minimal theme
Example:
# Load ggplot2 package
library(ggplot2)

# Create a data frame
data <- data.frame(
  category = c("A", "B", "A", "B", "A", "B"),
  values = c(1, 3, 5, 2, 7, 4)
)

# Plotting a boxplot using ggplot
ggplot(data, aes(x = category, y = values)) +
  geom_boxplot(fill = "lightblue", color = "black") +
  labs(title = "Boxplot using ggplot",
       x = "Category",
       y = "Values") +
  theme_minimal()
In this example:
• aes(x = category, y = values) defines the aesthetics for the plot, where x is the
category and y is the values.
• geom_boxplot() adds the boxplot layer to the plot.
• labs() is used to set the title and axis labels.
• theme_minimal() applies a minimal theme to the plot (you can also use other
themes like theme_bw(), theme_light(), etc.).
ggplot() is very powerful because it allows you to build complex plots incrementally and
supports layering. It is widely used for creating highly customizable and aesthetically
pleasing visualizations.
Comparison of boxplot(), bwplot(), and ggplot():

Feature     | boxplot() (Base R)                          | bwplot() (lattice)               | ggplot() (ggplot2)
Flexibility | Limited customization                       | Good customization with grouping | High flexibility and customization
Syntax      | Simple and straightforward                  | Requires ~ formula syntax        | Uses a layered approach with + operator
Themes      | Limited, requires manual adjustments        | Has predefined themes            | Supports various themes and fine control over elements
Grouping    | Can group by categories using a formula (~) | Grouping by formula syntax ~     | Supports grouping within aes()
MODULE-5
