M2 Dar

Uploaded by

2405samrudhi

Data structures in R

Matrices:
• Matrices are nothing but a bunch of vectors stacked together!
• While vectors have one dimension, matrices have two dimensions, determined by rows and columns.
• Finally, like vectors and scalars, matrices can contain only one type of data: numeric, character, or logical.
Creating Matrices using matrix(), cbind(), rbind()
A matrix can be created with matrix(), or by binding vectors together with cbind() and rbind(). It can also be created using the array() function where the dimension of the array is two.
The argument byrow = TRUE in the matrix() function assigns the elements row-wise. If this argument is not specified, the elements are filled column-wise by default.
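As a minimal sketch of the creation functions named above (the variable names are illustrative, not from the slides), the following builds the same kind of matrix in several ways:

```r
# Fill column-wise (the default) and row-wise
m_col <- matrix(1:6, nrow = 2)                # columns are filled first
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # rows are filled first

# Bind vectors together as columns or as rows
m_cbind <- cbind(c(1, 2), c(3, 4))  # each vector becomes a column
m_rbind <- rbind(c(1, 2), c(3, 4))  # each vector becomes a row

# An array with two dimensions is a matrix
m_arr <- array(1:6, dim = c(2, 3))

print(m_col)
print(m_row)
print(is.matrix(m_arr))
```

Printing m_col and m_row side by side makes the effect of byrow visible: the same six values land in different cells.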

• The dim() function returns the dimensions of an array or a matrix.
• The functions nrow() and ncol() return the number of rows and the number of columns of a matrix, respectively.
• The length() function also works for matrices and arrays; it returns the total number of elements. It is also possible to assign new dimensions to a matrix or an array using the dim() function.
• The functions rownames(), colnames() and dimnames() can be
used to fetch the row names, column names and dimension
names of matrices and arrays respectively.
• The element at the nth row and mth column can be extracted using the expression M[n, m]. The entire nth row can be extracted using M[n, ] and, similarly, the mth column can be extracted using M[, m].
• It is possible to extract more than one column or row.
• The columns of two matrices can be combined using the cbind() function and, similarly, the rows of two matrices can be combined using the rbind() function.
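The inspection and extraction functions above can be sketched on a small named matrix. The matrix M and its row and column names are assumptions made for illustration:

```r
M <- matrix(1:6, nrow = 2, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

dim(M)        # 2 3
nrow(M)       # 2
ncol(M)       # 3
length(M)     # 6, the total number of elements
rownames(M)   # "r1" "r2"
colnames(M)   # "a" "b" "c"
dimnames(M)   # list holding both name vectors

M[2, 3]       # single element: 6
M[1, ]        # entire first row
M[, 2]        # entire second column
M[, c(1, 3)]  # more than one column at once

# Assigning new dimensions reshapes the data (the dimnames are dropped)
N <- M
dim(N) <- c(3, 2)

# Combine matrices by column and by row
wide <- cbind(M, M)   # 2 x 6
tall <- rbind(M, M)   # 4 x 3
```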

#matrix deconstruction
• The arithmetic operators "+", "-", "*" and "/" work element-wise on matrices and arrays, provided the matrices or arrays are of conformable sizes (the same dimensions). Matrix multiplication is done using the operator "%*%".
• Matrix multiplication has specific requirements:
1. Number of Columns in First Matrix: The number of columns in the first matrix (M1) must be equal to the number of rows in the second matrix (M2).
2. Dimensions: If M1 is an m×n matrix and M2 is a p×q matrix, then matrix multiplication is possible if and only if n=p. The resulting matrix will have dimensions m×q.
The power operator "^" also works element-wise on matrices. To find the inverse of a square, non-singular matrix, the function solve() can be used.
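A short sketch contrasting element-wise operators with the matrix product and the inverse. The matrices A, B, M1 and M2 are example values chosen here, not taken from the slides:

```r
A <- matrix(c(2, 0, 1, 3), nrow = 2)   # columns (2, 0) and (1, 3)
B <- matrix(c(1, 2, 3, 4), nrow = 2)   # columns (1, 2) and (3, 4)

A + B    # element-wise addition
A - B    # element-wise subtraction
A * B    # element-wise product, not matrix multiplication
A / B    # element-wise division
A ^ 2    # every element squared

P <- A %*% B                   # true matrix product of two 2 x 2 matrices

M1 <- matrix(1:6,  nrow = 2)   # 2 x 3
M2 <- matrix(1:12, nrow = 3)   # 3 x 4
dim(M1 %*% M2)                 # 2 4, i.e. m x q

A_inv <- solve(A)              # inverse of A
A %*% A_inv                    # identity matrix, up to rounding error
```

Swapping the operands to M2 %*% M1 raises a "non-conformable arguments" error, because M2 has 4 columns while M1 has 2 rows.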
Arrays
In R, arrays are data objects that can store data in more than two dimensions. An array is created with the array() function, which takes a vector of values as input and shapes it according to the dimensions given in the dim argument.
Syntax:
array_name <- array(data, dim = c(row_size, column_size, matrices), dimnames = dim_names)
How to create?
In R, array creation is quite simple: pass a vector to the array() function. With three dimensions, the data is stored as a stack of matrices.
Naming rows and columns
In R, we can give names to the rows, columns, and matrices of the array. This is done with the dimnames argument of the array() function, which takes a list of name vectors.
Naming the rows and columns is optional. Names only make the rows and columns easier to tell apart.
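The creation and naming steps can be sketched as follows. The vectors v1 and v2 and all the names are hypothetical example values:

```r
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6, 7, 8, 9)

# Two 3 x 3 matrices; the 9 values are recycled to fill all 18 cells
arr <- array(c(v1, v2), dim = c(3, 3, 2))

# The same array with names for rows, columns and matrices
named <- array(
  c(v1, v2),
  dim = c(3, 3, 2),
  dimnames = list(
    c("r1", "r2", "r3"),
    c("c1", "c2", "c3"),
    c("m1", "m2")
  )
)

named["r2", "c3", "m1"]   # row 2, column 3 of the first matrix: 8
named[, , "m2"]           # the whole second matrix
```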
Lists in R
Lists are versatile data structures in R that can hold a mix of objects of different types and sizes. You can create a list using the list() function. The elements of a list can be named at creation or later using the names() function.
Accessing list elements: Once you've created the list, you can access its elements by index or by name, using the dollar sign ($) notation, double square brackets ([[ ]]) or single square brackets ([ ]).

Dollar sign ($): accesses a named element of a list directly.
Double square brackets ([[ ]]): access a single element by position or by name.
Single square brackets ([ ]): return a sublist containing the selected elements.
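A minimal sketch of the access styles described above. The list contents are made-up example values:

```r
# Hypothetical example data
person <- list(name = "Asha", age = 30, scores = c(90, 85))

person$name        # named element via $
person[["age"]]    # the same kind of access, by name
person[[3]]        # element by position: the numeric vector of scores
person["age"]      # single brackets return a list of length 1
person[c(1, 3)]    # sublist with two elements

# Names can also be assigned after creation
x <- list(1, "a", TRUE)
names(x) <- c("num", "chr", "lgl")
x$chr
```

The difference between person[["age"]] and person["age"] is the usual stumbling block: the first is the number itself, the second is still a list.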
Recursive vs. Atomic
• Lists can be nested, that is, a list can be an element of another list. Atomic vectors, arrays and matrices cannot be nested in this way; they are atomic.
• The functions is.recursive() and is.atomic() show whether a variable is of a recursive or an atomic type, respectively.

1.Atomic Types:
•Atomic types are the basic building blocks in R. They include:
•Numeric (e.g., 1, 3.14)
•Integer (e.g., 1L, 42L)
•Logical (e.g., TRUE, FALSE)
•Character (e.g., "hello", "world")
•Complex (e.g., 1+2i, 3-4i)
•These types cannot contain other data structures. They are typically used for storing a single type of data.
2.Recursive Types:
•Recursive types can contain other recursive types, which means they can nest
within each other. The primary recursive data structures in R include:
• Lists: Can contain any type of data, including other lists, vectors,
matrices, or even functions.
• Expressions: A special type of list used for holding R expressions.
[Figures: Atomic Example; Nested List Example; Recursive Example]
In summary, lists in R are recursive and can be nested, while vectors, arrays, and matrices are atomic and cannot contain other data structures of the same type. The functions is.atomic() and is.recursive() are useful for distinguishing between these types.
• A vector can be converted to a list using the function as.list(). Similarly, a list can be converted into a vector, provided the list contains scalar elements of the same type. This is done using conversion functions such as as.numeric() and as.character().
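The atomic versus recursive distinction and the conversions can be sketched like this. All object names and values are illustrative assumptions:

```r
v <- c(1, 2, 3)                                     # atomic vector
m <- matrix(1:4, nrow = 2)                          # atomic matrix
l <- list(a = 1, b = list(c = "nested", d = TRUE))  # nested list

is.atomic(v); is.recursive(v)   # TRUE, FALSE
is.atomic(m); is.recursive(m)   # TRUE, FALSE
is.atomic(l); is.recursive(l)   # FALSE, TRUE

l$b$c                           # reach into the inner list: "nested"

# Vector to list and back again
v_list <- as.list(v)            # list of three length-1 elements
back   <- as.numeric(v_list)    # numeric vector again
chars  <- as.character(list("x", "y"))
```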
DATA FRAMES
Data frames hold data in a table format, like a spreadsheet. A data frame is a list of vectors of equal length (not necessarily of the same basic data type).
By default, the row names are automatically numbered from 1 to the number of rows in the data frame. It is also possible to provide row names manually using the row.names argument.
It is possible to create a data frame from vectors of different lengths, as long as the shorter ones can be recycled a whole number of times to match the length of the longest.
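A small sketch of data frame creation, covering default row names, manual row names and recycling. Column names and values are invented for the example:

```r
# Manual row names through the row.names argument
df <- data.frame(
  name  = c("A", "B", "C", "D"),
  score = c(10, 20, 30, 40),
  row.names = c("r1", "r2", "r3", "r4")
)

# Default row names are "1", "2", "3"
default_rows <- data.frame(x = 1:3)
rownames(default_rows)

# The length-2 vector is recycled twice to match the length-4 vector
recycled <- data.frame(id = 1:4, group = c("g1", "g2"))
print(recycled)

# A length-2 vector next to a length-3 vector would fail,
# because 3 is not a multiple of 2.
```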

Access Items: We can use single brackets [ ], double brackets [[ ]] or $ to access columns from a data frame.
If we need to fetch a subset of a data frame by selecting a few columns and specifying conditions on the rows, we can use the subset() function to do this.
The functions rbind() and cbind() can also be applied on data frames as we do for matrices. The only condition for rbind() is that the column names should match, but cbind() does not check even if the column names are duplicated.
The merge() function can be applied to merge two data frames provided they have common column names. By default, merge() joins on all the common columns; to join on only some of them, name the columns in the by argument.
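The access, subsetting, binding and merging operations above might look like this in practice. The data frames students, more and cities are hypothetical:

```r
students <- data.frame(id    = 1:4,
                       name  = c("A", "B", "C", "D"),
                       marks = c(55, 72, 90, 64))

students["marks"]     # single brackets: a one-column data frame
students[["marks"]]   # double brackets: the column as a vector
students$marks        # $: the column as a vector

# Rows with marks above 60, keeping only two columns
top <- subset(students, marks > 60, select = c(name, marks))

# rbind() needs matching column names
more <- data.frame(id = 5L, name = "E", marks = 81)
all_students <- rbind(students, more)

# cbind() adds columns side by side
with_grade <- cbind(students, grade = c("C", "B", "A", "B"))

# merge() joins on the shared column "id" by default
cities    <- data.frame(id = c(1L, 2L, 4L), city = c("X", "Y", "Z"))
merged    <- merge(students, cities)
merged_by <- merge(students, cities, by = "id")   # same join, stated explicitly
```

Only ids 1, 2 and 4 appear in both data frames, so merged has three rows.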
• The functions colSums(), colMeans(), rowSums() and rowMeans() can be applied on data frames that have numeric values.

Points to Remember
1.Numeric Data Only: These functions operate on numeric data. If
your data frame contains non-numeric columns, you’ll need to
exclude them or convert them to numeric if appropriate.
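A sketch of the four summary functions on a data frame that also has a non-numeric column. The data frame scores is invented here, and using sapply() with is.numeric to drop non-numeric columns is one possible approach, not the only one:

```r
scores <- data.frame(name = c("A", "B", "C"),
                     math = c(80, 90, 70),
                     art  = c(60, 50, 100))

# Keep only the numeric columns before summarising
num <- scores[sapply(scores, is.numeric)]

colSums(num)    # math 240, art 210
colMeans(num)   # math 80,  art 70
rowSums(num)    # 140 140 170
rowMeans(num)   # 70 70 85
```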
Assignment Questions:
1. Develop an R program to create two 3*3 matrices A and B and perform the following operations: i) Transpose ii) Addition iii) Subtraction iv) Division

2. Perform colSums(), colMeans(), rowSums() and rowMeans() for the data frames that have numeric values as below.
s <- c(5, 6, 7, 8)
y <- c(25, 26, 27, 28)
G <- data.frame(s, y, y)
Factors
• Factors store categorical data. They behave like strings in some contexts and like integers in others.
1. levels()
•Purpose: Retrieves or sets the levels of a factor.
•Usage:
•Get Levels: levels(factor_variable)
•Set Levels: levels(factor_variable) <- c("new_level1", "new_level2")
2. nlevels()
•Purpose: Returns the number of levels of a
factor.
•Usage: nlevels(factor_variable)

3. relevel()
•Purpose: Changes the reference level of a factor
•Usage: relevel(factor_variable, ref = "new_reference_level")
4. droplevels()
•Purpose: Removes unused levels from a factor.
•Usage: droplevels(factor_variable)
5. is.na()
•Purpose: Checks for NA (missing) values in a vector or factor.
•Usage: is.na(vector_or_factor)

6. ordered()
•Purpose: Converts a factor to an ordered factor, which represents ordinal data.
•Usage: ordered(factor_variable, levels = c("level1", "level2", ...))
7. cut()
•Purpose: Divides continuous data into intervals or bins and returns them as a factor (or as integer bin codes when labels = FALSE).
•Usage: cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE)
8.table()
•Purpose: Creates a contingency table of counts for categorical data.
•Usage: table(vector_or_factor)

9. nrow() and ncol()
•Purpose: Return the number of rows or columns of a data frame or matrix, respectively. They are not directly related to factors but are useful for data manipulation.
•Usage: nrow(data_frame_or_matrix) and ncol(data_frame_or_matrix)
10.gl()
•Purpose: Generates factors based on a pattern. It is useful for creating factors for
experimental designs.
•Usage: gl(n, k, length = n*k, labels = NULL)

11. interaction()
•Purpose: Creates a factor that represents the interaction between two or more factors. This is useful for analyzing how the combination of categorical variables influences a response variable.
•Usage: interaction(factor1, factor2, ...)
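The factor functions listed above can be tried together in one short session. This is a sketch with made-up data (sizes, ages and the group labels are assumptions):

```r
sizes <- factor(c("small", "large", "medium", "small"))
levels(sizes)      # alphabetical order: "large" "medium" "small"
nlevels(sizes)     # 3

# Make "small" the reference (first) level
sizes_ref <- relevel(sizes, ref = "small")

# Subsetting keeps unused levels until they are dropped
kept <- sizes[sizes != "medium"]
nlevels(kept)               # still 3
nlevels(droplevels(kept))   # 2

is.na(factor(c("a", NA, "b")))   # FALSE TRUE FALSE

# Ordered factor: comparisons follow the level order
sizes_ord <- ordered(sizes, levels = c("small", "medium", "large"))
sizes_ord < "large"              # TRUE FALSE TRUE TRUE

# Bin continuous values into a factor and count them
ages <- c(5, 17, 30, 64)
age_group <- cut(ages, breaks = c(0, 18, 65), labels = c("minor", "adult"))
table(age_group)                 # minor 2, adult 2

# Patterned factors and their interaction
g  <- gl(2, 3, labels = c("ctrl", "trt"))   # ctrl ctrl ctrl trt trt trt
f2 <- gl(3, 1, length = 6)                  # 1 2 3 1 2 3
gi <- interaction(g, f2)                    # one level per combination
nlevels(gi)                                 # 6
```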
Summary
•levels(): Get/set levels of a factor.
•nlevels(): Count levels of a factor.
•relevel(): Change the reference level.
•droplevels(): Drop unused levels.
•is.na(): Check for missing values.
•ordered(): Convert to an ordered factor.
•cut(): Bin continuous data into factors.
•table(): Create a contingency table of counts.
•gl(): Generate factors based on patterns.
•interaction(): Compute the interaction between factors.
Strings
Strings are stored in character vectors. Most string manipulation functions act on character
vectors.
String Manipulation Functions
1.c()
•Purpose: Create a character vector.
•Usage: c("string1", "string2", ...)
2.paste()
•Purpose: Concatenate strings with a
separator.
•Usage: paste(string1, string2, ..., sep = " ")
3.paste0()
•Purpose: Concatenate strings without any
separator.
•Usage: paste0(string1, string2, ...)
4. toString()
•Purpose: Convert a vector (for example, a numeric vector) to a single character string with elements separated by a comma and a space.
•Usage: toString(numeric_vector)
5.cat()
•Purpose: Print objects, similar to paste(), but outputs
directly to the console. It does not return a value.
•Usage: cat(..., sep = " ", fill = FALSE)

6. noquote()
•Purpose: Print strings without quotes.
•Usage: noquote(string_vector)

7.formatC()
•Purpose: Format numbers with specific parameters such as width, digits, and format.
•Usage: formatC(x, format = "f", digits = 2, width = 10, flag = "-")
8. sprintf()
•Purpose: Format strings and numbers using placeholders.
•Usage: sprintf(format_string, value1, value2, ...)

9. toupper()
•Purpose: Convert characters in a string to uppercase.
•Usage: toupper(string)
10.tolower()
•Purpose: Convert characters in a string to lowercase.
•Usage: tolower(string)

11.substr() or substring()
•Purpose: Extract a substring from a string.
•Usage: substr(string, start, stop) or substring(string, first, last)
12.strsplit()
•Purpose: Split a string into substrings based on a delimiter.
•Usage: strsplit(string, split = " ")
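A compact tour of the string functions above, as a sketch. The input strings are arbitrary example values:

```r
words <- c("alpha", "beta")        # hypothetical example values

paste(words, "x")                  # "alpha x" "beta x"
paste(words, collapse = ", ")      # "alpha, beta"
paste0("id", 1:3)                  # "id1" "id2" "id3"
toString(1:4)                      # "1, 2, 3, 4"

cat("a", "b", sep = "-")           # prints a-b to the console
cat("\n")
print(noquote(words))              # printed without quotes

formatC(3.14159, format = "f", digits = 2, width = 8)   # "    3.14"
sprintf("%s has %d items", "cart", 3L)                  # "cart has 3 items"

toupper("abc")                     # "ABC"
tolower("ABC")                     # "abc"
substr("statistics", 1, 4)         # "stat"
strsplit("a b c", split = " ")[[1]]   # "a" "b" "c"
```

strsplit() returns a list with one element per input string, which is why the example extracts the first element with [[1]].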

Assignment
3. a. Create a character vector words containing the following strings: "apple", "banana", "cherry", "date", and "berry".
b. Use the paste() function to concatenate these words into a single string with each word separated by a comma and a space.
c. Apply toupper(), tolower(), substr() and noquote() to the resulting string and print each result.
Directory Functions
1. Getting and Setting the Working Directory
•getwd(): Returns the current working directory.

• setwd(): sets the working directory to the specified path.

2. File and Directory Manipulation


•basename(): Extracts the file name from a given path.
Ex: basename("/path/to/file.txt") # returns "file.txt"

• dirname(): Extracts the directory path from a given file path.
Ex: dirname("/path/to/file.txt") # returns "/path/to"
3. Constructing File Paths
•file.path(): Constructs file paths by automatically inserting the appropriate path separator

4. R Home Directory
•R.home(): Returns the installation directory of R. This directory contains R files and libraries.

5. Relative and Absolute Paths: Relative paths refer to a location in relation to the current directory.
Ex: "data/file.txt" specifies a file located in a subdirectory called data within the current directory.
Absolute paths specify the complete path from the root of the file system. Ex: "C:/Users/data/file.txt" refers to the same file regardless of the current directory.
•Relative Path Notations:
➢ . denotes the current directory.
➢ .. denotes the parent directory.
➢ ~ denotes the home directory.

•path.expand(): Expands a leading ~ in a path to the user's home directory. To convert a relative path to an absolute one, use normalizePath().
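The directory functions can be exercised safely as sketched below. Switching to tempdir() is a choice made for this example so that no real project directory is affected:

```r
old <- getwd()          # remember the current working directory
setwd(tempdir())        # move to R's session temp directory
getwd()
setwd(old)              # and move back

# Build a path without typing separators by hand
p <- file.path("path", "to", "file.txt")   # "path/to/file.txt"

basename("/path/to/file.txt")   # "file.txt"
dirname("/path/to/file.txt")    # "/path/to"

R.home()            # where R is installed (machine dependent)
path.expand("~")    # home directory (machine dependent)
```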


Dates and Times
1. POSIXct:
•Represents the number of seconds since the beginning of 1970 (UTC).
•Suitable for efficient storage and arithmetic calculations.

2.POSIXlt:
•Stores date-time as a list of individual components (seconds, minutes, hours, etc.).
•Useful for accessing specific components easily.
3. Date:
•Represents dates only, stored as the number of days since 1 January 1970.
•Ideal when time is not significant.

4. Conversion Functions
•Use as.Date() to convert various formats to Date class.
•Use as.POSIXlt() to convert POSIXct to POSIXlt for easier access to components.
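A minimal sketch of the three classes and the conversions between them. The timestamp is an arbitrary example, and UTC is set explicitly so the result does not depend on the machine's time zone:

```r
ct <- as.POSIXct("2024-03-15 14:30:00", tz = "UTC")
as.numeric(ct)            # seconds since 1970-01-01 00:00:00 UTC

lt <- as.POSIXlt(ct, tz = "UTC")
lt$hour                   # 14
lt$min                    # 30
lt$mon + 1                # 3, months are stored as 0 to 11
lt$year + 1900            # 2024, years are stored relative to 1900

d <- as.Date("2024-03-15")
as.numeric(d)             # days since 1970-01-01
as.Date(ct)               # drop the time part of a POSIXct
```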
Date Conversions
Date conversions are essential when working with CSV files that store dates as strings.
In R, you can use the strptime() function to convert these strings into date-time objects
(POSIXlt).
Parsing Strings to Dates
1. Using strptime():
•The strptime() function parses a date-time string based on a specified format. If the string does not match the format, it returns NA.
Syntax: strptime(x, format, tz = "")
Format Specifiers
➢ %H: Hour in 24-hour format
➢ %M: Minutes
➢ %S: Seconds
➢ %d: Day of the month (01 to 31)
➢ %m: Month (01 to 12)
➢ %Y: Year (four digits)
Converting Dates Back to Strings
2.Using strftime():
•This function converts date-time objects back into formatted strings. You can specify how you want the
output to look.

Format Specifiers for strftime()
•%I: Hour in 12-hour format
•%p: AM/PM
•%M: Minutes
•%A: Full name of the day of the week
•%B: Full name of the month
•%Y: Year with century (four digits); use %y for the year without century (00-99)
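Parsing and formatting can be sketched as a round trip. The input string is invented, and the parsed value is converted to POSIXct with an explicit tz so the output is the same on any machine:

```r
# Parse a string into a date-time (POSIXlt)
parsed <- strptime("15/03/2024 14:30:05",
                   format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
class(parsed)                    # "POSIXlt" "POSIXt"

# A string that does not match the format gives NA
bad <- strptime("2024-03-15", format = "%d/%m/%Y", tz = "UTC")
is.na(bad)                       # TRUE

# Format a date-time back into strings
ct <- as.POSIXct(parsed)
strftime(ct, "%Y-%m-%d %H:%M:%S", tz = "UTC")   # "2024-03-15 14:30:05"
strftime(ct, "%I:%M", tz = "UTC")               # "02:30"
strftime(ct, "%Y", tz = "UTC")                  # "2024"
strftime(ct, "%y", tz = "UTC")                  # "24"
strftime(ct, "%A, %d %B %Y %p", tz = "UTC")     # day and month names, AM/PM
```

The text produced by %A, %B and %p depends on the session locale, so the last line is shown without an expected result.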
Time Zones in R
1. Getting Default Time Zone
You can retrieve the system's default time zone using Sys.timezone().

Specifying Time Zones
1. Using strptime() and strftime():
•You can specify a time zone when formatting dates.
•If no time zone is provided, R uses the default time zone.
Warning Messages
R may issue warnings when given manual offsets such as "UTC-5" or "UTC+5" as time zones, because it does not recognize them as named time zones, but it usually still returns the correct time.
Working with POSIXlt and POSIXct
Time zone changes do not apply to POSIXlt objects directly. You should convert to POSIXct first.
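A sketch of formatting one instant in different time zones, assuming the time zone database that normally ships with R is available. "Asia/Kolkata" is simply an example of a named zone with a fixed offset of 5 hours 30 minutes from UTC:

```r
Sys.timezone()     # the system default (machine dependent, may be NA)

ct <- as.POSIXct("2024-03-15 12:00:00", tz = "UTC")

strftime(ct, "%H:%M", tz = "UTC")            # "12:00"
strftime(ct, "%H:%M", tz = "Asia/Kolkata")   # "17:30"

# For a POSIXlt value, convert to POSIXct before changing the time zone
lt <- as.POSIXlt(ct, tz = "UTC")
strftime(as.POSIXct(lt), "%H:%M", tz = "Asia/Kolkata")   # "17:30"
```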
Calculations with Dates and Times in R
In R, you can perform arithmetic operations with date and time objects. Here’s how to manipulate POSIXct, POSIXlt, and Date
classes:
1. POSIXct and POSIXlt
When you add a number to POSIXct or POSIXlt objects, it shifts the time by that many seconds.
Example: Adding Seconds
2. Date Class
When you add a number to a Date object, it shifts by that many days.
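Both kinds of arithmetic can be sketched with one example date. The values are arbitrary and UTC is used to keep the results machine independent:

```r
ct <- as.POSIXct("2024-03-15 12:00:00", tz = "UTC")
ct + 60              # 60 seconds later: 12:01:00
ct + 24 * 60 * 60    # one day later, expressed in seconds

lt <- as.POSIXlt(ct, tz = "UTC")
lt + 60              # also shifts by seconds; the result is a POSIXct

d <- as.Date("2024-03-15")
d + 7                            # "2024-03-22"
as.Date("2024-03-22") - d        # time difference of 7 days
```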
