Module 2

Uploaded by

sandeep07uma
R Programming
Introduction : R is a flexible and powerful open-source implementation of S, the language for statistics developed by John
Chambers and others at Bell Labs.
Five reasons to learn and use R:
• R is open source and completely free.
• R is often preferred over earlier statistical packages such as SPSS and Minitab.
• R provides hundreds of built-in statistical functions as well as its own built-in programming language.
• R is used in teaching and performing computational statistics.
• Getting help from the R user community is easy.
RStudio :
RStudio is an integrated development environment (IDE) for the R programming language. It provides a
user-friendly interface that enhances the R programming experience. Here are some key features and concepts related to
RStudio.
• Script Editor : RStudio includes a script editor where you can write and execute R code. The editor provides syntax
highlighting, code completion, and other features to make coding more efficient.
• Console : The console is where R code is executed interactively. You can type and run commands directly, making it
a useful environment for testing code snippets or exploring data.
• Workspace : RStudio keeps track of your R session's workspace, which includes variables, data frames, and other
objects that you have created or loaded. You can view and manage your workspace using the "Environment" tab.
• File Navigation : RStudio provides a file browser to navigate your project's directory structure. You can create,
open, and save R scripts, data files, and other project-related materials.
• Plots and Output Panes : RStudio has dedicated panes for viewing plots, console output, and other results. Plots
generated in R scripts are displayed in the "Plots" pane, and console output is shown in the "Console" pane.
• Packages and Help : The "Packages" tab allows you to manage and install R packages easily. RStudio also provides a
help pane that displays documentation and information about functions and packages.
• Version Control : RStudio has built-in support for version control systems such as Git and SVN. You can manage
your version-controlled projects directly within the IDE.
• Integrated Development Environment (IDE) : RStudio is designed to be a comprehensive IDE for R, providing an
integrated workflow for coding, running scripts, managing projects, and visualizing data.
• R Markdown : RStudio supports R Markdown, a format that integrates R code and narrative text into a single
document. This is widely used for creating dynamic and reproducible reports.
• Tasks and Sessions : RStudio allows you to manage multiple R sessions and run background jobs. This is useful for
parallel processing and managing resource-intensive tasks.
RStudio enhances the R programming experience by providing a user-friendly interface, tools for project
management, and integration with various R-related technologies. It has become a popular choice for data
scientists, statisticians, and researchers working with the R programming language.
Run R : On Windows, you can find R in the Start menu; on macOS/Linux, you can open a terminal and type R to start
the console.
In the R console, you can type your R code directly and press Enter to execute it.

In RStudio, you can create a new script by clicking on "File" -> "New File" -> "R Script" or open an existing script.
Write your R code in the script editor. You can run individual lines or selected portions of code by pressing Ctrl +
Enter (Windows/Linux) or Cmd + Enter (Mac).
You can run the entire script by clicking on the "Source" button or by using the Ctrl + Shift + S (Windows/Linux) or
Cmd + Shift + S (Mac) keyboard shortcut.
Using the R Console as a Calculator :
We can use the R console as a calculator for addition, subtraction, multiplication, division, and so on.
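The original screenshots for this section are not available, so the following is a minimal sketch of calculator-style use, assuming a standard R console. The specific numbers are illustrative only.

```r
# Each line can be typed at the R prompt; the result is printed as [1] value
2 + 3      # addition:         5
7 - 4      # subtraction:      3
6 * 5      # multiplication:   30
20 / 8     # division:         2.5
2 ^ 10     # exponentiation:   1024
17 %% 5    # remainder:        2
17 %/% 5   # integer division: 3
```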
R labels each output value with a number in square brackets. The [1] is simply the index of the first element of the
vector.
The "=" sign can be used to assign a value to an object, while the double equals sign "==" tests for equality.

Assigning a value in R :
In R, the conventional way to assign a value to a variable, vector, etc. is the left-pointing arrow "<-".
Assigning a Single Value to a Variable:

We can use 'x' as a label for a single value.
> x <- 2
> x
[1] 2
> x^2
[1] 4
> x^x
[1] 4
Vectors and assignment

R operates on named data structures. The simplest such structure is the numeric vector, which is a single
entity consisting of an ordered collection of numbers. To set up a vector named x, say, consisting of five
numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the R command
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

This is an assignment statement using the function c() which in this context can take an arbitrary number of
vector arguments and whose value is a vector got by concatenating its arguments end to end (see footnote 1).

A number occurring by itself in an expression is taken as a vector of length one.

Notice that the assignment operator (‘<-’) consists of the two characters ‘<’ (“less than”) and ‘-’
(“minus”) occurring strictly side-by-side, and that it ‘points’ to the object receiving the value of the expression. In
most contexts the ‘=’ operator can be used as an alternative.

Assignment can also be made using the function assign(). An equivalent way of making the same
assignment as above is with:

> assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))

The usual operator, <-, can be thought of as a syntactic short-cut to this.

Assignments can also be made in the other direction, using the obvious change in the assignment operator.
So the same assignment could be made using

> c(10.4, 5.6, 3.1, 6.4, 21.7) -> x


If an expression is used as a complete command, the value is printed and lost (see footnote 2). So now if we were to use the
command

> 1/x

the reciprocals of the five values would be printed at the terminal (and the value of x, of course, unchanged).

The further assignment

> y <- c(x, 0, x)

would create a vector y with 11 entries consisting of two copies of x with a zero in the middle place.

Vector arithmetic

Vectors can be used in arithmetic expressions, in which case the operations are performed element by element.
Vectors occurring in the same expression need not all be of the same length. If they are not, the value of the
expression is a vector with the same length as the longest vector which occurs in the expression. Shorter vectors
in the expression are recycled as often as need be (perhaps fractionally) until they match the length of the longest
vector. In particular a constant is simply repeated. So with the above assignments the command

> v <- 2*x + y + 1

generates a new vector v of length 11 constructed by adding together, element by element, 2*x
repeated 2.2 times, y repeated just once, and 1 repeated 11 times.
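The recycling rule can be checked with a short sketch. The vector a is a hypothetical example added here to show exact recycling. Note that fractional recycling, as in 2*x + y + 1, makes R emit a warning that the longer length is not a multiple of the shorter one, so the warning is suppressed below.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
y <- c(x, 0, x)                  # length 11

# Exact recycling: c(10, 20) is repeated twice to match length 4
a <- c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24

# Fractional recycling: x (length 5) is recycled 2.2 times to reach length 11
v <- suppressWarnings(2 * x + y + 1)
length(v)                        # 11
v[6]                             # 2 * x[1] + 0 + 1 = 21.8
```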

Footnote 1: With other than vector types of argument, such as list mode arguments, the action of c() is rather
different. See Section 6.2.1 [Concatenating lists], page 27.
Footnote 2: Actually, it is still available as .Last.value before any other statements are executed.

The elementary arithmetic operators are the usual +, -, *, / and ^ for raising to a power. In addition all of
the common arithmetic functions are available. log, exp, sin, cos, tan, sqrt, and so on, all have their usual
meaning. max and min select the largest and smallest elements of a vector respectively. range is a function
whose value is a vector of length two, namely c(min(x), max(x)). length(x) is the number of elements in x,
sum(x) gives the total of the elements in x, and prod(x) their product.

Two statistical functions are mean(x) which calculates the sample mean, which is the same as
sum(x)/length(x), and var(x) which gives

sum((x-mean(x))^2)/(length(x)-1)
or sample variance. If the argument to var() is an n-by-p matrix the value is a p-by-p sample covariance matrix
got by regarding the rows as independent p-variate sample vectors.
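As a quick check of the formulas above, the following sketch computes the mean and sample variance by hand and compares them with the built-in functions. The names manual_mean and manual_var are introduced here for illustration only.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

manual_mean <- sum(x) / length(x)
manual_var  <- sum((x - mean(x))^2) / (length(x) - 1)

all.equal(manual_mean, mean(x))   # TRUE
all.equal(manual_var, var(x))     # TRUE
range(x)                          # 3.1 21.7, i.e. c(min(x), max(x))
```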

sort(x) returns a vector of the same size as x with the elements arranged in increasing order; however there
are other more flexible sorting facilities available (see order() or sort.list() which produce a permutation
to do the sorting).

Note that max and min select the largest and smallest values in their arguments, even if they are given several
vectors. The parallel maximum and minimum functions pmax and pmin return a vector (of length equal to their
longest argument) that contains in each element the largest (smallest) element in that position in any of the
input vectors.
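The difference between max and pmax is easiest to see side by side. The vectors a and b below are hypothetical values chosen for illustration.

```r
a <- c(3, 8, 1)
b <- c(5, 2, 9)

max(a, b)    # single largest value over both vectors: 9
min(a, b)    # single smallest value over both vectors: 1
pmax(a, b)   # element-wise maximum: 5 8 9
pmin(a, b)   # element-wise minimum: 3 2 1
```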

For most purposes the user will not be concerned if the “numbers” in a numeric vector are integers, reals
or even complex. Internally calculations are done as double precision real numbers, or double precision complex
numbers if the input data are complex.

To work with complex numbers, supply an explicit complex part. Thus

sqrt(-17)

will give NaN and a warning, but

sqrt(-17+0i)
will do the computations as complex numbers.

Generating regular sequences

R has a number of facilities for generating commonly used sequences of numbers. For example 1:30 is the vector
c(1, 2, ..., 29, 30). The colon operator has high priority within an expression, so, for example 2*1:15 is
the vector c(2, 4, ..., 28, 30). Put n <- 10 and compare the sequences 1:n-1 and 1:(n-1).
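The comparison suggested above can be run directly. Because the colon binds more tightly than subtraction, 1:n-1 is read as (1:n) - 1.

```r
n <- 10

1:n-1      # (1:n) - 1, that is 0 1 2 ... 9   (ten values)
1:(n-1)    # 1 2 ... 9                        (nine values)
2*1:15     # 2 * (1:15), that is 2 4 ... 30
```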

The construction 30:1 may be used to generate a sequence backwards.

The function seq() is a more general facility for generating sequences. It has five arguments, only some of
which may be specified in any one call. The first two arguments, if given, specify the beginning and end of the
sequence, and if these are the only two arguments given the result is the same as the colon operator. That is
seq(2,10) is the same vector as 2:10.

Arguments to seq(), and to many other R functions, can also be given in named form, in which case the
order in which they appear is irrelevant. The first two arguments may be named from=value and to=value;
thus seq(1,30), seq(from=1, to=30) and seq(to=30, from=1) are all the same as 1:30. The next two arguments
to seq() may be named by=value and length=value, which specify a step size and a length for the sequence
respectively. If neither of these is given, the default by=1 is assumed.

For example
> seq(-5, 5, by=.2) -> s3

generates in s3 the vector c(-5.0, -4.8, -4.6, ..., 4.6, 4.8, 5.0). Similarly

> s4 <- seq(length=51, from=-5, by=.2)

generates the same vector in s4.

The fifth argument may be named along=vector, which is normally used as the only argument to create
the sequence 1, 2, ..., length(vector), or the empty sequence if the vector is empty (as it can be).

A related function is rep() which can be used for replicating an object in various complicated ways. The
simplest form is

> s5 <- rep(x, times=5)

which will put five copies of x end-to-end in s5. Another useful version is

> s6 <- rep(x, each=5)

which repeats each element of x five times before moving on to the next.
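A short sketch of seq() and rep() follows. It spells out the full argument names length.out and along.with, of which length= and along= in the text are abbreviations that rely on R's partial argument matching. The two-letter character vector is a hypothetical input chosen so that the difference between times and each is easy to see.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

s3 <- seq(-5, 5, by = 0.2)
s4 <- seq(from = -5, by = 0.2, length.out = 51)
all.equal(s3, s4)              # TRUE: both hold -5.0, -4.8, ..., 5.0

seq(along.with = x)            # 1 2 3 4 5
seq_along(x)                   # same result, the usual idiom

rep(c("a", "b"), times = 2)    # "a" "b" "a" "b"
rep(c("a", "b"), each = 2)     # "a" "a" "b" "b"
```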
Logical vectors

As well as numerical vectors, R allows manipulation of logical quantities. The elements of a logical vector can have
the values TRUE, FALSE, and NA (for “not available”, see below). The first two are often abbreviated as T and F,
respectively. Note however that T and F are just variables which are set to TRUE and FALSE by default, but are not
reserved words and hence can be overwritten by the user. Hence, you should always use TRUE and FALSE.

Logical vectors are generated by conditions. For example

> temp <- x > 13

sets temp as a vector of the same length as x with values FALSE corresponding to elements of x
where the condition is not met and TRUE where it is.

The logical operators are <, <=, >, >=, == for exact equality and != for inequality. In addition if c1 and c2 are
logical expressions, then c1 & c2 is their intersection (“and”), c1 | c2 is their union (“or”), and !c1 is the negation
of c1.

Logical vectors may be used in ordinary arithmetic, in which case they are coerced into numeric vectors, FALSE
becoming 0 and TRUE becoming 1. However there are situations where logical vectors and their coerced numeric
counterparts are not equivalent, for example see the next subsection.
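The coercion of logical values to 0 and 1 makes counting straightforward. The sketch below reuses the vector x from earlier; the thresholds 5 and 11 are arbitrary illustrative choices.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

temp <- x > 13        # FALSE FALSE FALSE FALSE TRUE
sum(temp)             # number of elements above 13: 1
mean(x > 5)           # proportion of elements above 5: 0.8
x > 5 & x < 11        # TRUE TRUE FALSE TRUE FALSE
!temp                 # TRUE TRUE TRUE TRUE FALSE
```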

Missing values
In some cases the components of a vector may not be completely known. When an element or value is “not
available” or a “missing value” in the statistical sense, a place within a vector may be reserved for it by assigning
it the special value NA. In general any operation on an NA becomes an NA. The motivation for this rule is simply
that if the specification of an operation is incomplete, the result cannot be known and hence is not available.

The function is.na(x) gives a logical vector of the same size as x with value TRUE if and only if the
corresponding element in x is NA.

> z <- c(1:3,NA); ind <- is.na(z)

Notice that the logical expression x == NA is quite different from is.na(x) since NA is not really a value but a
marker for a quantity that is not available. Thus x == NA is a vector of the same length as x all of whose values are
NA as the logical expression itself is incomplete and hence undecidable.

Note that there is a second kind of “missing” values which are produced by numerical computation, the so-called
Not a Number, NaN, values. Examples are

> 0/0

or

> Inf - Inf



which both give NaN since the result cannot be defined sensibly.

In summary, is.na(xx) is TRUE both for NA and NaN values. To differentiate these,
is.nan(xx) is only TRUE for NaNs.
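The behaviour described above can be sketched as follows. The vector w is a hypothetical example, and the na.rm argument of mean() is not mentioned in the text; it is included as one common way to skip missing values.

```r
z <- c(1:3, NA)

z == NA                  # NA NA NA NA: the comparison itself is undecidable
is.na(z)                 # FALSE FALSE FALSE TRUE

w <- c(1, NA, 0/0)       # the last element is NaN
is.na(w)                 # FALSE TRUE TRUE   (TRUE for both NA and NaN)
is.nan(w)                # FALSE FALSE TRUE  (TRUE only for NaN)

mean(z)                  # NA, because one element is unknown
mean(z, na.rm = TRUE)    # 2, after dropping the missing value
```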

Missing values are sometimes printed as <NA> when character vectors are printed without quotes.

Character vectors

Character quantities and character vectors are used frequently in R, for example as plot labels. Where needed
they are denoted by a sequence of characters delimited by the double quote character, e.g., "x-values", "New
iteration results".

Character strings are entered using either matching double (") or single (') quotes, but are printed using
double quotes (or sometimes without quotes). They use C-style escape sequences, using \ as the escape
character, so \ is entered and printed as \\, and inside double quotes " is entered as \". Other useful escape
sequences are \n, newline, \t, tab and \b, backspace; see ?Quotes for a full list.

Character vectors may be concatenated into a vector by the c() function; examples of their use will emerge
frequently.
The paste() function takes an arbitrary number of arguments and concatenates them one by one into character
strings. Any numbers given among the arguments are coerced into character strings in the evident way, that is,
in the same way they would be if they were printed. The arguments are by default separated in the result by
a single blank character, but this can be changed by the named argument, sep=string, which changes it to
string, possibly empty.

For example

> labs <- paste(c("X","Y"), 1:10, sep="")

makes labs into the character vector

c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")

Note particularly that recycling of shorter vectors takes place here too; thus c("X", "Y") is repeated 5 times to
match the sequence 1:10.
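A few more calls illustrate the default separator and the sep argument. The strings "Sample", "a" and "b" are hypothetical inputs.

```r
labs <- paste(c("X", "Y"), 1:10, sep = "")
labs[1:4]                     # "X1" "Y2" "X3" "Y4"

paste("Sample", 1:3)          # default separator is one blank:
                              # "Sample 1" "Sample 2" "Sample 3"
paste("a", "b", sep = "-")    # "a-b"
```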

Index vectors; selecting and modifying subsets of a data set

Subsets of the elements of a vector may be selected by appending to the name of the vector an index vector in
square brackets. More generally any expression that evaluates to a vector may have subsets of its elements
similarly selected by appending an index vector in square brackets immediately after the expression.
Such index vectors can be any of four distinct types.

1. A logical vector. In this case the index vector is recycled to the same length as the vector from which elements
are to be selected. Values corresponding to TRUE in the index vector are selected and those corresponding to
FALSE are omitted. For example

> y <- x[!is.na(x)]

creates (or re-creates) an object y which will contain the non-missing values of x, in the same order. Note that
if x has missing values, y will be shorter than x. Also

> (x+1)[(!is.na(x)) & x>0] -> z

creates an object z and places in it the values of the vector x+1 for which the corresponding value in x was
both non-missing and positive.
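To make the two indexing commands concrete, the sketch below applies them to a small hypothetical vector that contains a missing value, a negative value and a zero.

```r
x <- c(4, NA, -2, 7, 0)

y <- x[!is.na(x)]                    # non-missing values: 4 -2 7 0
z <- (x + 1)[(!is.na(x)) & x > 0]    # x+1 where x is known and positive: 5 8

length(x)    # 5
length(y)    # 4, shorter than x because the NA was dropped
```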

1. Write any 3 math functions in R.
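The original answer to this question is not available. As one possible sketch, three of R's built-in math functions are shown below; the choice of sqrt, abs and round and the input values are illustrative assumptions.

```r
sqrt(16)             # square root: 4
abs(-7.5)            # absolute value: 7.5
round(3.14159, 2)    # rounding to 2 decimal places: 3.14
```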



2. Define constants in R
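The original answer to this question is not available. As a hedged sketch, R ships with a small set of built-in constants (documented under ?Constants), which can be inspected as follows.

```r
pi               # 3.141593 (printed to 7 significant digits)
LETTERS[1:3]     # "A" "B" "C"  (upper-case alphabet)
letters[26]      # "z"          (lower-case alphabet)
month.name[1]    # "January"    (full month names)
month.abb[12]    # "Dec"        (three-letter month abbreviations)
```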
6. What are the advantages of R?

R offers several advantages, particularly for data analysis and statistical computing:
1. Statistical Capabilities: R is specifically designed for statistics, making it powerful for complex analyses, from basic
to advanced techniques.
2. Extensive Libraries: It has a vast collection of packages (CRAN) for various statistical methods, data manipulation,
visualization, and machine learning.
3. Data Visualization: R excels in data visualization with libraries like ggplot2, enabling the creation of high-quality,
customizable graphics.
4. Community Support: R has a large and active community, providing resources, tutorials, and forums for help and
collaboration.
5. Reproducible Research: Tools like R Markdown facilitate the integration of code, results, and narrative, promoting
reproducible analysis.
6. Integration with Other Tools: R can interface with databases, web APIs, and other programming languages like
Python, enhancing its versatility.
7. Cross-Platform Compatibility: R runs on various operating systems (Windows, macOS, Linux), ensuring accessibility
for many users.
8. Open Source: Being open-source means that R is free to use and continuously improved by its community.
7. Explain Data Frames in R.
In R, a Data Frame is a two-dimensional, tabular data structure that is used for storing data. It is similar to a spreadsheet or SQL table,
where data is organized in rows and columns. Each column can contain different types of data (e.g., numeric, character, or factor),
making data frames highly versatile for handling diverse datasets.
Key Features of Data Frames:
1. Rows and Columns: Each row represents an observation, while each column represents a variable.
2. Different Data Types: Columns can have different data types (e.g., numeric, character, logical). This allows you to store mixed types of
data in a single structure.
3. Column Names: Each column has a name, making it easier to reference and manipulate specific variables.
4. Row Names: Rows can also have names, which can be useful for indexing or identifying specific observations.
5. Flexibility: Data frames can be easily modified by adding or removing columns and rows.
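The features listed above can be sketched with a small data frame. The students table, its column names and its values are hypothetical and exist only for illustration.

```r
# Three observations (rows) of three variables (columns) with different types
students <- data.frame(
  name   = c("Asha", "Ravi", "Meena"),   # character
  age    = c(21L, 23L, 22L),             # integer
  passed = c(TRUE, FALSE, TRUE),         # logical
  stringsAsFactors = FALSE               # keep names as character on older R versions
)

str(students)                  # compact view of column names and types
students$age                   # one column, referenced by name
students[students$passed, ]    # rows where passed is TRUE

# Flexibility: add a new column
students$score <- c(88.5, 61.0, 79.5)
dim(students)                  # 3 rows, 4 columns
```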

1. Explain R functions with suitable examples?


R functions are a fundamental part of R programming, allowing you to encapsulate code for reuse and improve
organization. Functions take inputs (arguments), perform operations, and often return outputs.
Basic Structure of a Function
A typical function in R is defined using the function keyword:
my_function <- function(arg1, arg2) {
  # Code to execute
  result <- arg1 + arg2
  return(result)
}
Example 1: A Simple Function
Here's a simple function that adds two numbers:
add_numbers <- function(a, b) {
  # Named total rather than sum to avoid masking the built-in sum() function
  total <- a + b
  return(total)
}

# Using the function
result <- add_numbers(3, 5)
print(result) # Output: 8
Example 2: Function with Default Arguments
You can set default values for arguments:
greet <- function(name, message = "Hello") {
  paste(message, name)
}

# Using the function with and without the default message
print(greet("Alice")) # Output: "Hello Alice"
print(greet("Bob", "Hi")) # Output: "Hi Bob"
Example 3: Function Returning Multiple Values
R functions can return a list to convey multiple outputs:
calculate_stats <- function(values) {
  mean_value <- mean(values)
  sd_value <- sd(values)
  return(list(mean = mean_value, sd = sd_value))
}

# Using the function
stats <- calculate_stats(c(1, 2, 3, 4, 5))
print(stats$mean) # Output: 3
print(stats$sd) # Output: 1.581139
Conclusion
R functions are versatile and powerful tools for structuring your code, making it reusable and easier to read. By
understanding how to create and utilize functions, you can enhance your data analysis and statistical programming
capabilities.

2. Write an R program to find the classes of the elements in a vector.

In R, you can combine the sapply function with the class function to determine the class of each element.
Note that c() builds an atomic vector, which coerces all of its elements to a single type, so a list is used here
to hold elements of different types:
# Create a sample list with different types of elements
sample_vector <- list(1L, 2.5, "Hello", TRUE, list(a = 1))

# Function to get the class of each element
get_classes <- function(vec) {
  sapply(vec, class)
}

# Find and print classes
classes <- get_classes(sample_vector)
print(classes)
Explanation:
1. Sample Vector: This list contains different types of elements: an integer, a numeric, a character string, a logical, and a list.
2. get_classes Function: This function uses sapply to apply the class function to each element.
3. Print Classes: Finally, it prints the class of each element: "integer", "numeric", "character", "logical" and "list".
You can run this code in your R environment to see the results.

3. Explain Variables and Data Types in R Programming.

In R programming, variables are used to store data values, and they are fundamental to any programming task.
Variables can hold various types of data, which are categorized into different data types. Here’s a breakdown:
Variables
• Definition: A variable is a name that refers to a value. You create a variable by assigning a value to it using the <-
operator (or =).
• Naming: Variable names can include letters, numbers, underscores, and periods, but they cannot start with a
number. R is case-sensitive, so myVar and myvar would be considered different variables.
Data Types
R has several basic data types:
1. Numeric: This is the default data type for numbers in R. Numbers written without the L suffix, whether whole or
decimal, are stored as double-precision values.
Example:
x <- 5 # Numeric
y <- 3.14 # Numeric (decimal)
2. Integer: R can specifically represent integers using the L suffix.
Example:
z <- 5L # Integer
3. Character: This type represents text data, and it is enclosed in either single or double quotes.
Example:
name <- "Alice" # Character
4. Logical: This data type represents Boolean values: TRUE or FALSE.
Example:
is_valid <- TRUE # Logical
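The type of each variable can be confirmed with class() and typeof(). This sketch reuses the example variables above.

```r
x <- 5
z <- 5L
name <- "Alice"
is_valid <- TRUE

class(x)          # "numeric"
typeof(x)         # "double": how a numeric value is stored internally
class(z)          # "integer"
class(name)       # "character"
class(is_valid)   # "logical"
is.numeric(z)     # TRUE: integers also count as numeric
```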
4. What is RStudio? Explain its features.
RStudio is an integrated development environment (IDE) specifically designed for R, a programming language widely
used for statistical computing and data analysis. Here are some of its key features:
1. User-Friendly Interface: RStudio provides a clean, organized workspace with multiple panes for scripts, console,
environment, and files, making it easier to navigate and manage projects.
2. Script Editor: The built-in script editor supports syntax highlighting, code completion, and code folding, which helps
in writing and organizing R scripts efficiently.
3. Integrated Console: Users can execute R commands directly in the console, allowing for quick testing and
exploration of code.
4. Visualization Tools: RStudio includes tools for visualizing data and plots, making it easy to create and display
graphics inline.
5. Project Management: RStudio supports project-based workflows, allowing users to manage files, scripts, and
datasets more effectively within a project structure.
6. Package Management: It provides tools for installing, updating, and managing R packages, facilitating the use of
additional libraries for various tasks.
7. Debugging and Profiling Tools: RStudio includes debugging features that help identify and fix errors in code, as well
as profiling tools to analyze the performance of R scripts.
8. Version Control Integration: RStudio supports integration with Git and other version control systems, making it
easier to manage changes and collaborate on projects.
9. R Markdown Support: Users can create dynamic reports and presentations using R Markdown, combining code,
output, and narrative text in a single document.
10. Customizable Layouts: The interface can be customized according to user preferences, allowing for a personalized
coding environment.
11. Cross-Platform: RStudio runs on Windows, macOS, and Linux, making it accessible to a wide range of users.
Overall, RStudio enhances the R programming experience by providing tools and features that streamline coding,
data analysis, and reporting tasks.
