
Unit I

R is a free, open-source programming language primarily used for statistical computing and data visualization, featuring strong graphic design capabilities and a wide range of packages. It supports quick calculations, is platform-independent, and is well-suited for data analysis and machine learning. The document also covers variable assignment, data types, and the installation process for R and RStudio.

Uploaded by

royalff065

UNIT - 1 1

UNIT I - Chapter 1
What is R?

R is a popular programming language and software environment for statistical
computing, graphical presentation, and reporting. Its most common use is to
analyse and visualize data.

The following are the important features of R:


1. Open-Source
You don’t have to pay any money to download R on your computer. It is free and
open-source software. Furthermore, you can contribute to the development of
R, customize its packages, and add more features.

2. Strong Ability to Design Graphics


R has mature graphics libraries that make it possible to create both static and
interactive graphics. As a result, data visualization and representation are
relatively simple. R can generate a wide variety of plots, from straightforward
charts to intricate, interactive ones.

3. Extensive Range of Packages


CRAN, the Comprehensive R Archive Network, contains over 10,000 packages and
extensions that help handle a wide range of data science challenges. R has
packages for many subjects, such as astronomy, biology, and so forth. While R
was developed for academic purposes, it is now also widely used in industry.

4. No Compilation
The R language is interpreted rather than compiled. As a result, no compiler is
required to turn code into an executable program before it can run. R code is
evaluated step by step by the interpreter. This removes the separate compile
step and shortens the edit-and-run cycle when developing an R script.

5. Enables Quick Calculations


R supports a wide range of complicated operations on vectors, arrays, data
frames, and other data objects of various sizes. Many of these operations are
vectorized, meaning they act on a whole object at once, which makes them fast.
R also includes a rich set of operators for these calculations.
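
As a small illustration of the vectorized calculations described above, the sketch below multiplies two vectors element by element without writing a loop. The variable names and numbers are made up for the example.

```r
# Element-wise arithmetic on whole vectors; no explicit loop is needed.
prices   <- c(10, 20, 30)     # hypothetical unit prices
quantity <- c(1, 2, 3)        # hypothetical quantities

totals <- prices * quantity   # multiplies matching elements
print(totals)
print(sum(totals))            # adds up all elements of the result
```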


6. Compatibility with Multiple Platforms


R allows for cross-platform compatibility. It runs on the major operating
systems (Windows, macOS, and Linux), and an R script written on one of them
generally runs unchanged on the others.

Why Use R?

• It is a great resource for data analysis, data visualization, data science,
  and machine learning
• It provides many statistical techniques (such as statistical tests,
  classification, clustering and data reduction)
• It is easy to draw graphs in R, like pie charts, histograms, box plots,
  scatter plots, etc.
• It works on different platforms (Windows, Mac, Linux)
• It is open-source and free
• It has large community support
• It has many packages (libraries of functions) that can be used to solve
  different problems

Advantages of R over other programming languages:

• It’s open-source. No fees or licenses are needed, so it’s a low-risk
  venture if you’re developing a new program.

• It’s platform-independent. R runs on all major operating systems, so
  developers only need to create one program that can work on competing
  systems. This independence is yet another reason why R is cost-effective.

• It has lots of packages. For example, the R language has more than
  10,000 packages stored in the CRAN repository, and the number is
  continuously increasing.

• It’s great for statistics. Statistical analysis is central to much of
  today’s data work, and R shines in this regard. As a result, many
  programmers prefer it over other languages for statistical tool development.

• It’s well suited for machine learning. R handles machine learning tasks
  such as regression and classification, and it offers many features and
  packages for artificial neural network development.

• R lets you perform data wrangling. R offers a host of packages that
  help data analysts turn unstructured, messy data into a structured format.

• R is still growing. R keeps evolving, with frequent updates and upgrades,
  thanks to a solid, supportive community.

R Command Prompt

Once you have the R environment set up, it’s easy to start the R command
prompt by typing the following command at your command prompt:

$ R

This will launch the R interpreter and you will get a prompt > where you can
start typing your program as follows:

> myString <- "Hello, World!"
> print(myString)
[1] "Hello, World!"

R Script File

Usually, you will do your programming by writing your programs in script files
and then executing those scripts at your command prompt with the help of the R
interpreter called Rscript. So let's start by writing the following code in a
text file called test.R:

# My first program in R Programming
myString <- "Hello, World!"
print(myString)

Save the above code in a file test.R and execute it at the Linux command prompt
as given below. Even if you are using Windows or another system, the syntax
will remain the same.

$ Rscript test.R

Comments
Comments are explanatory text in your R program, and the interpreter ignores
them while executing your actual program. A single-line comment is written
using # at the beginning of the statement as follows:
# My first program in R Programming

Installation of R

R is a very popular programming language, and to work with it we have to
install two things: R and RStudio. R and RStudio work together when we create a
project in R.

Installing R on the local computer is very easy. First, we must know which
operating system we are using so that we can download the matching installer.

The official site https://cloud.r-project.org provides binary files for major
operating systems including Windows, Linux, and macOS. In some Linux
distributions, R is installed by default, which we can verify from the console
by entering R.

To install R, we can either get it from the site https://cloud.r-project.org or
use commands from the terminal.

Installing R Packages from the CRAN Repository: Alternative Method

If we work with R in an IDE, we can use the menu instead of the
install.packages() function to install the necessary packages from the CRAN
repository. For example, in RStudio, the most popular IDE for R, we need to
complete the following steps:

• Click Tools → Install Packages
• Select Repository (CRAN) in the Install from: slot
• Type the package name (or several package names, separated with a
  white space or comma)
• Leave Install dependencies ticked as by default
• Click Install
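
For comparison with the menu steps above, here is a minimal sketch of the install.packages() route. The helper install_if_missing() is a hypothetical name introduced only for this example, and the repository URL is the site mentioned earlier in these notes. The call uses "stats", which ships with R, so nothing is downloaded when the sketch runs.

```r
# Hypothetical helper: install a package from CRAN only if it is missing.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # Report whether the package can be loaded afterwards.
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" is part of every R installation, so no download happens here.
ok <- install_if_missing("stats")
print(ok)
```

Replacing "stats" with the name of a package that is not yet installed would trigger the download from CRAN.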

Use the help() command


To find information for a particular function, such as the function print, type
help('print') on the R command line and press enter (I recommend using quotes
whenever you use this command, but there are some special cases when they are
unnecessary). This will open up a window with information on how to use the
required function.
Typing help() on the R command line and pressing enter will open a window
telling you a bit about how to use the help() command.
The find() function, on the other hand, returns the location (such as a
package) where objects of a given name can be found.
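
The two functions can be tried as in the sketch below. It stores the result of help() in a variable instead of displaying it, so that it also runs outside an interactive session; in an interactive session, typing help("print") on its own opens the help page.

```r
# Look up the documentation for print() without displaying it.
h <- help("print")

# find() reports where an object named "print" lives on the search path.
where <- find("print")
print(where)
```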
To list all the packages installed on your system, call library() with no
arguments:
library()

When we execute the above code, it produces the following result. It may vary
depending on the local settings of your PC.
Packages in library ‘C:/Program Files/R/R-3.2.2/library’:

base     The R Base Package
boot     Bootstrap Functions (Originally by Angelo Canty for S)
class    Functions for Classification

To list the packages currently attached in the R environment, use search():
search()

When we execute the above code, it produces the following result. It may vary
depending on the local settings of your PC.
[1] ".GlobalEnv"        "package:stats"     "package:graphics"
[4] "package:grDevices" "package:utils"     "package:datasets"
[7] "package:methods"   "Autoloads"         "package:base"

Install package manually


Go to the R Packages page to download the package needed. Save the package as
a .zip file in a suitable location on the local system.
Now you can run the following command to install this package in the R
environment.
install.packages(file_name_with_path, repos = NULL, type = "source")

# Install the package named "XML"
install.packages("E:/XML_3.98-1.3.zip", repos = NULL, type = "source")

To read data from the keyboard, we use scan() and readline(); to write to the
screen, we use print() and cat().
Read Data Values: scan() reads data into a vector or a list from the console
or from a file.
Keywords: File, connection.
For example:
> #Author DataFlair
> inp = scan()
> inp
With readLines(), we read multiple lines from a connection.
Keywords: File, connection.

We can use readline() for inputting a line from the keyboard in the form of a
string:

For example:
> str = readline()
> str
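
Because scan() and readline() wait for someone to type, they are awkward to demonstrate in a script. The sketch below is one way to try the same reading functions non-interactively: it passes the input through the text argument of scan() and through an in-memory text connection for readLines(). The sample values are made up.

```r
# scan() normally waits for keyboard input; the text argument supplies it inline.
nums <- scan(text = "10 20 30", quiet = TRUE)
print(nums)

# readLines() reads every line from a connection; here an in-memory one.
con <- textConnection(c("first line", "second line"))
lines <- readLines(con)
close(con)
print(lines)
```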

Printing to the screen: In interactive mode, one can print the value of a
variable by just typing the variable name or expression. The print() function
can be used in batch mode as:
print(x)
The argument can be any object. It is often more convenient to use cat()
instead of print(), because print() takes only one expression and prefixes its
result with an index such as [1], which may be a nuisance. Here is an example:
> print("DataFlair")
[1] "DataFlair"
> cat("DataFlair\n")
DataFlair
> int <- 24
> cat(int, "DataFlair", "Big Data\n")
24 DataFlair Big Data

NA
In R, the NA values are used to represent missing values. (NA stands for “not
available.”) You may encounter NA values in text loaded into R (to represent
missing values) or in data loaded from databases (to replace NULL values).

Inf and -Inf


If a computation results in a number that is too big, R will return Inf for a
positive number and -Inf for a negative number (meaning positive and negative
infinity, respectively).
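
A short sketch of both special values follows. The numbers are chosen only for illustration: 2^1024 is just past the largest value a double can hold, and the ages vector is made-up data with one missing entry.

```r
too_big <- 2^1024           # exceeds the largest representable double
print(too_big)              # Inf
print(-2^1024)              # -Inf
print(is.finite(too_big))   # FALSE

# NA marks a missing value; many functions can be told to skip it.
ages <- c(25, NA, 31)       # hypothetical data with one missing entry
print(is.na(ages))
print(mean(ages, na.rm = TRUE))
```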

UNIT 1- Chapter 3
R - Variables

A variable is a name given to a memory location, which is used to store values
in a computer program. Variables in R can store numbers (real and complex),
words, matrices, and even tables.
R is a dynamically typed language, which means that, unlike in statically typed
languages, we do not have to declare the data type of a variable before we can
use it in our program.

For a variable name to be valid, it should follow these rules:

• It should contain only letters, numbers, dots, and underscores.
• It should not start with a number (eg:- 2iota)
• It should not start with a dot followed by a number (eg:- .2iota)
• It should not start with an underscore (eg:- _iota)
• It should not be a reserved keyword.
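
One way to check these rules in code is sketched below. It relies on the base function make.names(), which rewrites an invalid name into a valid one, so a name that comes back unchanged is taken to be valid. The helper is_valid_name() is a hypothetical name introduced for this example.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged.
is_valid_name <- function(x) make.names(x) == x

candidates <- c("var.1", "var_name2.", ".iota", "2iota", ".2iota", "_iota", "if")
result <- is_valid_name(candidates)
names(result) <- candidates
print(result)
```

The first three candidates pass; the remaining four break the rules on starting characters or reserved keywords.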

Variable Assignment-Assigning values to the variable

Variables can be assigned values using the leftward (<-), rightward (->), and
equal to (=) operators. The values of the variables can be printed using the
print() or cat() function.

The cat() function combines multiple items into a continuous print output.
Eg:
# Assignment using equal operator.
var.1 = c(0,1,2,3)
# Assignment using leftward operator.
var.2 <- c("learn","R")
# Assignment using rightward operator.
c(TRUE,1) -> var.3

Finding Variables - Using ls( )

To know all the variables currently available in the workspace we use
the ls() function. The ls() function can also use patterns to match the
variable names.

print(ls())
o/p:
[1] "my var" "my_new_var" "my_var" "var.1"
[5] "var.2" "var.3" "var.name" "var_name2."
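
The pattern matching mentioned above can be sketched as follows. The pattern argument of ls() is a regular expression; the variables here are created only for the example, and my_var is an extra made-up name that the pattern should skip.

```r
var.1 <- c(0, 1, 2, 3)
var.2 <- c("learn", "R")
my_var <- 10                      # hypothetical extra variable

# List only the names that start with "var".
matches <- ls(pattern = "^var")
print(matches)
```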

Deleting Variables
Variables can be deleted by using the rm() function. Below we delete the
variable var.3. Printing the variable afterwards throws an error.
rm(var.3)
print(var.3)
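
To see the effect of rm() without stopping a script on the error, one option is the sketch below. It uses exists() to test for the variable and tryCatch() to capture the error message instead of halting.

```r
var.3 <- c(TRUE, 1)        # create the variable
rm(var.3)                  # delete it
print(exists("var.3"))     # FALSE once the variable is gone

# tryCatch() captures the error that print(var.3) would raise.
msg <- tryCatch(print(var.3), error = function(e) conditionMessage(e))
print(msg)
```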

UNIT 1 - CHAPTER 2
R - DATA TYPES

There are six data types in R:

• Numeric
• Integer
• Complex
• Character
• Logical
• Raw

Numeric
A numeric data type is the most common type in R, and contains any number with or
without a decimal, like: 10.5, 55, 787:

Eg:

x <- 10.5
y <- 55
# Print values of x and y
print(x)
print(y)

# Print the class name of x and y
class(x)
class(y)

o/p:
[1] 10.5
[1] 55
[1] "numeric"
[1] "numeric"

Integer
Integers are numeric data without decimals. This is used when you are certain that you
will never create a variable that should contain decimals. To create an integer variable,
you must use the letter L after the integer value:

Eg:
x <- 1000L
y <- 55L

# Print values of x and y


print(x)
print(y)

# Print the class name of x and y


class(x)
class(y)

o/p:
[1] 1000
[1] 55
[1] "integer"
[1] "integer"
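
The effect of the L suffix can be checked directly, as in this small sketch. The variable names are arbitrary.

```r
a <- 55     # no L suffix: stored as a double
b <- 55L    # L suffix: stored as an integer

print(is.integer(a))
print(is.integer(b))
print(typeof(a))
print(typeof(b))

# as.integer() converts an existing number to the integer type.
a_int <- as.integer(a)
print(class(a_int))
```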

Complex

A complex number is written with an "i" as the imaginary part:

Eg:
x <- 3+5i
y <- 5i

# Print values of x and y


print(x)
print(y)
# Print the class name of x and y
class(x)
class(y)

o/p:
[1] 3+5i
[1] 0+5i
[1] "complex"
[1] "complex"

Character Data Type

This data type is used to specify character or string values in a variable. In
programming, a string is a sequence of characters. R has no separate type for a
single character: 'A' is simply a string of length one.
You can use single quotes ' ' or double quotes " " to represent strings; R
treats both the same way.
As a convention in these notes, we use:

‘’ - for single-character values
“” - for longer strings

Eg:
# create a string variable
fruit <- "Apple"
print(class(fruit))
# create a character variable
my_char <- 'A'
print(class(my_char))

o/p:
[1] "character"
[1] "character"

Logical:
The logical data type in R is also known as boolean data type. It can only have
2 values: TRUE or FALSE.
Eg:
x <- TRUE
print(x)
print(class(x))
y <- FALSE
print(y)
print(class(y))

o/p:
[1] TRUE
[1] "logical"
[1] FALSE
[1] "logical"

Raw Data Type


A raw data type specifies values as raw bytes. You can use the following
functions to convert character data to raw data and vice versa:
charToRaw() - converts character data to raw data
rawToChar() - converts raw data to character data

eg:
# convert character to raw

raw_variable <- charToRaw("Welcome to Programiz")


print(raw_variable)
print(class(raw_variable))
# convert raw to character
char_variable <- rawToChar(raw_variable)
print(char_variable)
print(class(char_variable))

o/p:

[1] 57 65 6c 63 6f 6d 65 20 74 6f 20 50 72 6f 67 72 61 6d 69 7a
[1] "raw"
[1] "Welcome to Programiz"
[1] "character"

R-objects
There are many types of R-objects. The frequently used ones are –

Vectors
Lists
Matrices
Arrays
Factors
Data Frames

Vectors

Vectors are one of the basic types of objects in R programming. An atomic
vector stores elements of a single data type: character, double, integer, raw,
logical, or complex.
To create a vector with more than one element, use the c() function, which
combines the elements into a vector.

Eg:
# Create a vector.
apple <- c('red','green',"yellow")
print(apple)
# Get the class of the vector.
print(class(apple))

o/p:
[1] "red" "green" "yellow"
[1] "character"

Because a vector holds values of one type under a single name, it replaces many
separate variables. For example, suppose we need to record the ages of 5
employees. Instead of creating 5 separate variables, we can simply create a
vector.
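
The employee example could look like the sketch below. The five ages are made-up values.

```r
# One vector instead of five separate variables (hypothetical ages).
ages <- c(34, 28, 45, 39, 52)

print(length(ages))   # number of elements
print(ages[2])        # indexing starts at 1, so this is the second age
print(mean(ages))     # a single call works on the whole vector
```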

Lists
A list is an R-object which can contain many different types of elements inside
it, like vectors, functions and even another list.
A list is created with the list() function.

Eg:
# list with similar type of data
list1 <- list(24, 29, 32, 34)
# list with different type of data
list2 <- list("Ranjy", 38, TRUE)

print(list1)
print(list2)

o/p:
[[1]]
[1] 24

[[2]]
[1] 29

[[3]]
[1] 32

[[4]]
[1] 34

[[1]]
[1] "Ranjy"

[[2]]
[1] 38

[[3]]
[1] TRUE
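
Individual list elements are usually reached with double brackets or, when the elements are named, with the $ operator. The sketch below reuses the values from list2; the element names (name, age, active) are assumptions added for the example.

```r
# Same values as list2, with hypothetical element names added.
person <- list(name = "Ranjy", age = 38, active = TRUE)

print(person[[1]])           # double brackets return the element itself
print(person$age)            # $ accesses an element by name
print(class(person["age"]))  # single brackets return a smaller list
print(length(person))
```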

Matrices
A matrix is a two-dimensional rectangular data set. It can be created using a vector
input to the matrix function.
Syntax:
matrix(vector, nrow, ncol, byrow = TRUE)
Here:
vector - the data items of the same type
nrow - number of rows
ncol - number of columns
byrow (optional) - if TRUE, the matrix is filled row-wise. By default, the
matrix is filled column-wise.

Eg:
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)

o/p:
     [,1] [,2] [,3]
[1,] "a"  "a"  "b"
[2,] "c"  "b"  "a"
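
To make the effect of byrow concrete, the sketch below fills the same six numbers both ways. The names by_col and by_row are arbitrary.

```r
# Default: values fill the matrix column by column.
by_col <- matrix(1:6, nrow = 2, ncol = 3)
# byrow = TRUE: values fill the matrix row by row.
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

print(by_col)
print(by_row)
print(by_row[2, 1])   # element in row 2, column 1
```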

Arrays
Arrays can be of any number of dimensions. The array function takes a dim attribute
which creates the required number of dimensions. In the below example we create an
array with two elements which are 3x3 matrices each.
Eg:
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)

o/p:
, , 1

     [,1]     [,2]     [,3]
[1,] "green"  "yellow" "green"
[2,] "yellow" "green"  "yellow"
[3,] "green"  "yellow" "green"

, , 2

     [,1]     [,2]     [,3]
[1,] "yellow" "green"  "yellow"
[2,] "green"  "yellow" "green"
[3,] "yellow" "green"  "yellow"
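
Elements of an array are addressed with one index per dimension. The sketch below rebuilds the array from the example and picks out a single element and a whole 3x3 slice.

```r
a <- array(c('green', 'yellow'), dim = c(3, 3, 2))

print(dim(a))         # size of each dimension
print(a[1, 2, 2])     # row 1, column 2 of the second matrix
first <- a[, , 1]     # leaving an index empty selects everything along it
print(first)
```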

Factors
Factors are R-objects which are created using a vector. A factor stores the
vector along with the distinct values of its elements as labels. The labels are
always character, irrespective of whether the input vector is numeric,
character, Boolean, etc. Factors are useful in statistical modeling.
Factors are created using the factor() function. The nlevels() function gives
the count of levels.
Eg:
# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')

# Create a factor object.
factor_apple <- factor(apple_colors)

# Print the factor.
print(factor_apple)
print(nlevels(factor_apple))

o/p:
[1] green green yellow red red red green
Levels: green red yellow
[1] 3
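
A few more base functions show what a factor stores internally. The sketch below reuses the apple_colors example: levels() lists the labels, as.integer() shows the codes stored for each element, and table() counts how often each level occurs.

```r
apple_colors <- c('green', 'green', 'yellow', 'red', 'red', 'red', 'green')
factor_apple <- factor(apple_colors)

print(levels(factor_apple))       # the distinct labels, sorted alphabetically
print(as.integer(factor_apple))   # the code stored for each element
counts <- table(factor_apple)     # how often each level occurs
print(counts)
```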

Data Frames
Data frames are tabular data objects. Unlike in a matrix, each column of a data
frame can contain a different mode of data. The first column can be numeric
while the second column can be character and the third column can be logical. A
data frame is a list of vectors of equal length.
Data frames are created using the data.frame() function.

Eg:
# Create the data frame.
BMI <- data.frame(
   gender = c("Male", "Male", "Female"),
   height = c(152, 171.5, 165),
   weight = c(81, 93, 78),
   Age = c(42, 38, 26)
)
print(BMI)

o/p:
  gender height weight Age
1   Male  152.0     81  42
2   Male  171.5     93  38
3 Female  165.0     78  26
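
Once a data frame exists, columns and rows can be selected and new columns added. The sketch below rebuilds the BMI data frame from the example; the added bmi column and its formula (weight in kg divided by height in metres squared) are an illustration and assume the height values are in centimetres.

```r
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age = c(42, 38, 26)
)

print(BMI$height)              # one column, returned as a vector
older <- BMI[BMI$Age > 30, ]   # rows where Age is above 30
print(older)

# Add a derived column (assumes height is in cm and weight in kg).
BMI$bmi <- BMI$weight / (BMI$height / 100)^2
print(round(BMI$bmi, 1))
```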
