SCTR Unit 1
Techniques Using R
Unit-1
Statistics:
Statistics concerns the collection, organization, analysis, interpretation,
and presentation of data. It also guides how data collection is planned,
in terms of
• Experimental designs
• Statistical surveys.
Statistics is considered a mathematical science that works with numerical
data. In short, statistics is a crucial process that helps us make
decisions based on data.
Computing: Computing is the act of calculating something. It is any goal-
oriented activity requiring, benefiting from, or creating computing
machinery. It includes the study and experimentation of algorithmic
processes, and development of both hardware and software.
R:
R is a programming language that is widely used in statistics.
R is becoming more and more popular for two major reasons:
1. R is open source.
2. R has most of the latest statistical methods.
R is available on widely used platforms such as Windows, Linux, and
macOS, and it is widely used as statistical software and as a data
analysis tool.
Features of R
Basic Statistics: The most common basic statistics terms are the mean,
mode, and median. These are all known as “Measures of Central
Tendency.” So using the R language we can measure central tendency very
easily.
Static graphics: R is rich with facilities for creating and developing
interesting static graphics. R contains functionality for many plot types
including graphic maps, mosaic plots, biplots, and the list goes on.
Probability distributions: Probability distributions play a vital role in
statistics and by using R we can easily handle various types of probability
distribution such as Binomial Distribution, Normal Distribution, Chi-
squared Distribution and many more.
Data analysis: R provides a large, coherent, and integrated collection of
tools for data analysis.
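As a minimal sketch of the basic statistics and probability distribution features described above, the following computes the three measures of central tendency and evaluates a few distribution functions. The scores vector and the helper stat_mode() are made up for illustration; base R has no built-in function for the statistical mode (its mode() function returns an object's storage mode instead).

```r
# Hypothetical sample data (not from the notes)
scores <- c(4, 8, 8, 5, 10, 8, 5)

mean_score   <- mean(scores)     # arithmetic mean, 48 / 7
median_score <- median(scores)   # middle value of the sorted data: 8

# Illustrative helper: return the most frequent value
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}
mode_score <- stat_mode(scores)  # 8 occurs three times

# Probability distributions
p_binom <- dbinom(2, size = 4, prob = 0.5)  # P(X = 2) for Binomial(4, 0.5): 0.375
p_norm  <- pnorm(0)                         # P(Z <= 0) for the standard normal: 0.5
q_chisq <- qchisq(0.95, df = 1)             # 95th percentile of chi-squared, 1 df
```

Each distribution in base R follows the same naming pattern: the prefixes d, p, q, and r give the density, cumulative probability, quantile, and random draws respectively.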
Advantages of R Programming:
R is one of the most comprehensive statistical analysis packages, and new
technology and concepts often appear first in R.
R is open source, so you can install and run it freely, anywhere and at
any time.
R is cross-platform: it runs on GNU/Linux, Windows, and macOS.
In R, everyone is welcome to provide new packages, bug fixes, and code
enhancements.
Disadvantages of R:
Operator    Description
+           Addition
-           Subtraction
*           Multiplication
/           Division
^ or **     Exponentiation
Further Operators:
Operator    Description            Example
%%          Modulus (remainder)    25 %% 6
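The operators listed above can be tried directly at the R prompt. This is a small sketch; the variable names a and b are chosen only for illustration.

```r
a <- 25
b <- 6

sum_ab  <- a + b     # 31
diff_ab <- a - b     # 19
prod_ab <- a * b     # 150
quot_ab <- a / b     # 4.166667 (division always returns a numeric result)
pow_1   <- 2 ^ 3     # 8
pow_2   <- 2 ** 3    # 8, ** is accepted as a synonym for ^
rem_ab  <- a %% b    # 1, the remainder of 25 divided by 6
```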
R displays the value of an expression automatically, but it also has a print() function if you want to use it. This might be useful if you
are familiar with other programming languages, such as Python, which often use the print()
function to output values.
print("Hello World!")
There are also times when you must use the print() function to produce output, for example inside
a for loop:
for (x in 1:10) {
  print(x)
}
R Comments
Comments can be used to explain R code and to make it more readable. They can also be used to prevent execution
when testing alternative code.
Comments start with a #. When executing code, R ignores everything from the # to the end of the line.
Example
# This is a comment
"Hello World!"
Comments do not have to be text that explains the code; they can also be used to prevent R from executing code:
Example
# "Good morning!"
"Good night!"
R VARIABLES
name <- "John"
age <- 40
name
# output "John"
age
# output 40
From the example above, name and age are variables,
while "John" and 40 are values.
Data Types in R Programming
The fundamental or atomic data types in R Programming are as follows:
•Numeric
•Integer
•Complex
•Character
•Logical
•Numeric:
In R, if we assign any decimal value to a variable it becomes a variable of a
numeric data type.
For example, the statement below assigns a numeric data type to the
variable “x”.
x = 45.6
And, the following statement is used to print the data type of the variable
“x”:
class(x)
Output:- [1] "numeric"
•Integer:
To create an integer variable in R, we call the as.integer() function while
assigning a value to the variable.
For example:-
e = as.integer(3)
class(e)
Output: [1] "integer"
Another way of creating an integer variable is to add the suffix L to the number:
x = 5L
class(x) # the L suffix marks the value as an integer
Output: [1] "integer"
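The difference between the numeric and integer types can be checked with class() and typeof(). The following is a small sketch with made-up variable names; note that a whole number written without the L suffix is still stored as numeric.

```r
x <- 45.6
y <- 5                   # no L suffix: stored as numeric (double)
z <- 5L                  # L suffix: stored as integer
w <- as.integer(3.9)     # conversion drops the fractional part, giving 3

class_y  <- class(y)     # "numeric"
class_z  <- class(z)     # "integer"
type_x   <- typeof(x)    # "double", the internal storage type of numeric values
y_is_int <- is.integer(y)  # FALSE
```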
•Complex Data Types:
The complex data type is used to store complex numbers, which have a real part and an imaginary part.
We use the suffix i to specify the imaginary part.
For example,
# 2i represents imaginary part
complex_value <- 3 + 2i
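As a short sketch, the parts of a complex value can be inspected with the base R functions Re(), Im(), and Mod(). The second value, other_value, is added here only for illustration.

```r
complex_value <- 3 + 2i

value_class <- class(complex_value)  # "complex"
real_part   <- Re(complex_value)     # 3
imag_part   <- Im(complex_value)     # 2

other_value <- 3 + 4i                # illustrative second value
modulus     <- Mod(other_value)      # 5, since sqrt(3^2 + 4^2) = 5
```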
Logical operators (the outputs shown correspond to x = TRUE and y = FALSE):
x & y    Output: [1] FALSE
x | y    Output: [1] TRUE
!x       Output: [1] FALSE
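The three outputs above are what the logical operators return when x is TRUE and y is FALSE. Those starting values are an assumption inferred from the outputs, since the original assignments are not shown. A minimal sketch:

```r
x <- TRUE    # assumed value, consistent with the outputs shown
y <- FALSE   # assumed value, consistent with the outputs shown

and_result <- x & y   # FALSE: both operands must be TRUE
or_result  <- x | y   # TRUE: at least one operand is TRUE
not_result <- !x      # FALSE: negation of TRUE

logical_class <- class(x)   # "logical"

# The operators work element by element on vectors
vec_and <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE
```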
Missing Values
In R, NA stands for Not Available. Each cell of your data that displays NA
is a missing value.
• Not available values are sometimes enclosed by < and >, i.e. <NA>.
• NaN stands for Not a Number and represents an undefined or
unrepresentable value. It appears, for instance, when you divide zero by
zero (0/0); dividing a nonzero number by zero gives Inf or -Inf instead.
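A short sketch of how missing and undefined values behave in practice. The readings vector is made up for illustration; is.na(), is.nan(), and the na.rm argument are part of base R.

```r
readings <- c(12, NA, 7)      # hypothetical data with one missing value

na_flags   <- is.na(readings)              # FALSE TRUE FALSE
mean_raw   <- mean(readings)               # NA, because one value is missing
mean_clean <- mean(readings, na.rm = TRUE) # 9.5, missing value removed first

undefined <- 0 / 0            # NaN
infinite  <- 1 / 0            # Inf, not NaN

nan_check    <- is.nan(undefined)   # TRUE
nan_is_na    <- is.na(undefined)    # TRUE: is.na() also reports NaN
na_is_not_nan <- is.nan(NA)         # FALSE: NA is missing, not "not a number"
```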
R – Objects:
Every programming language has its own data types for storing values, so that the user can
assign values of these types to variables and perform operations on them. Operations are
performed according to the data type. These data types can be character, integer, float, long,
etc. Based on the data type, memory is allocated to the variable. For example, in C a character
variable typically takes 1 byte of memory and an integer variable 2 or 4 bytes, and other data
types have their own memory allocation. Unlike in such languages, variables in R are not
declared with a data type: a variable is a name assigned to an object, and the object carries
the data type.
Types of R Objects:
There are 5 basic types of objects in the R language:
• Vectors
• Matrices
• Factors
• Arrays
• Data Frames
R Vector
• In R, a sequence of elements that share the same data type is known as a vector.
• To combine items into a vector, use the c() function and separate the items with commas.
• A vector can hold logical, integer, double, character, complex, or raw data.
• The elements contained in a vector are known as its components.
• We can check the type of a vector with the typeof() function.
• Vectors are classified into two kinds: atomic vectors and lists.
• Both have three common properties: type (typeof()), length (length()), and attributes (attributes()).
• In an atomic vector, all the elements are of the same type, but in a list, the elements can be of different
data types.
Atomic Vector
Example
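# Three atomic vectors, each holding a single data type
vec <- c(3, 4, 5, 6)
char_vec <- c("shubham", "nishka", "gunjan", "sumit")
logic_vec <- c(TRUE, FALSE, FALSE, TRUE)
# A list can hold all three, even though their types differ
out_list <- list(vec, char_vec, logic_vec)
out_list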
Output:
[[1]]
[1] 3 4 5 6
[[2]]
[1] "shubham" "nishka" "gunjan" "sumit"
[[3]]
[1] TRUE FALSE FALSE TRUE
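The three common properties of vectors (type, length, attributes) and the difference between atomic vectors and lists can be checked as follows. This is a minimal sketch; the names mixed and item_list are illustrative.

```r
vec <- c(3, 4, 5, 6)

vec_type     <- typeof(vec)       # "double"
vec_length   <- length(vec)       # 4
attrs_before <- attributes(vec)   # NULL: a plain vector has no attributes

names(vec) <- c("a", "b", "c", "d")
attrs_after <- attributes(vec)    # now a list containing $names

# An atomic vector holds one type, so mixed input is coerced
mixed      <- c(1, "two", TRUE)   # becomes "1" "two" "TRUE"
mixed_type <- typeof(mixed)       # "character"

# A list keeps each element's own type
item_list  <- list(1, "two", TRUE)
item_types <- sapply(item_list, typeof)   # "double" "character" "logical"
```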
MATRICES
byrow is a logical flag. If TRUE, the input vector elements are arranged by row; if FALSE (the default), by column.
dimnames is a list of the names assigned to the rows and columns.
Output:
The 3x3 matrix:
  c d e
a 1 2 3
b 4 5 6
c 7 8 9
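A call along the following lines would produce the 3x3 matrix shown above. The exact code from the original notes is not available, so the arguments here are an assumption reconstructed from the output.

```r
# Row names a, b, c and column names c, d, e are taken from the output above
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("c", "d", "e")))
m

element_bd <- m["b", "d"]   # 5: row b, column d

# With byrow = FALSE (the default) the same data fills column by column
m_col <- matrix(1:9, nrow = 3, ncol = 3)
element_12 <- m_col[1, 2]   # 4: first row, second column
```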
ARRAYS
The array() function is used to create an n-dimensional array.
It takes the dim attribute as an argument and creates each dimension with the length specified there.
Syntax:
array(data, dim = c(nrow, ncol, nmat), dimnames = names)
, , 2
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3
, , 3
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3
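Output of this shape can be reproduced with a call like the one below. The original call is not shown in the notes, so the data 1:3 and the dimensions are an assumption that matches the printed slices.

```r
# 1:3 is recycled to fill a 3 x 3 x 3 array, column by column
arr <- array(1:3, dim = c(3, 3, 3))

arr_dim  <- dim(arr)       # 3 3 3
slice_2  <- arr[, , 2]     # the second 3x3 matrix
one_cell <- arr[2, 1, 3]   # row 2, column 1, matrix 3: 2
```

Indexing uses one position per dimension; leaving a position empty selects everything along that dimension.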
Factors:
Factors are used to categorize data. Examples of factors are:
Demography: Male/Female
Music: Rock, Pop, Classic, Jazz
Training: Strength, Stamina
To create a factor, use the factor() function and pass a vector as the argument.
Example
# Create a factor
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
# Print the factor
music_genre
Result:
[1] Jazz Rock Classic Classic Pop Jazz Rock Jazz
Levels: Classic Jazz Pop Rock
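Once a factor exists, its categories can be inspected and counted. The sketch below reuses the music_genre factor from the example; levels(), nlevels(), and table() are base R functions.

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic",
                        "Pop", "Jazz", "Rock", "Jazz"))

genre_levels <- levels(music_genre)      # "Classic" "Jazz" "Pop" "Rock" (sorted)
genre_count  <- nlevels(music_genre)     # 4
genre_table  <- table(music_genre)       # Classic 2, Jazz 3, Pop 1, Rock 2

# Internally a factor stores integer codes that index into the levels
genre_codes <- as.integer(music_genre)   # 2 4 1 1 3 2 4 2
```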
Data Frames
Data frames hold data in a table format, and each column can have a different
type: while the first column can be character, the second and third can be
numeric or logical.
output:
  Training Pulse Duration
1 Strength   100       60
2  Stamina   150       30
3    Other   120       45
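A data frame matching the output above could be built as follows. The code that created it is not shown in the notes, so the variable name Data_Frame and the exact call are assumptions based on the printed table.

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse    = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE   # keep Training as character in every R version
)
Data_Frame

col_classes <- sapply(Data_Frame, class)          # character, numeric, numeric
pulse_mean  <- mean(Data_Frame$Pulse)             # mean of the Pulse column
high_pulse  <- Data_Frame[Data_Frame$Pulse > 110, ]   # rows with Pulse above 110
```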
Importing Data into R:
scan():
The `scan()` function in R is used to read data from a file or a connection.
It's a versatile function that can be used to import various types of data, but
it's generally used for reading basic types of data like numbers, strings, and
logical values.
It reads the data sequentially and allows you to specify various parameters
for handling different types of data. Here's an example of how to use
`scan()` to read data from a file:
Suppose you have a file named "data.txt" with the following contents:
1 2 3 4 5
6 7 8 9 10
# Read data from the file "data.txt"
data <- scan("data.txt")
# Print the data
print(data)
In this case, the output will be:
[1] 1 2 3 4 5 6 7 8 9 10
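To try this without creating "data.txt" by hand, the sketch below writes the same two lines to a temporary file first. The temporary file and the variable names are illustrative; the what and quiet arguments are part of `scan()`.

```r
# Create a temporary file with the same contents as data.txt
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2 3 4 5", "6 7 8 9 10"), tmp)

# By default scan() reads whitespace-separated numbers
nums <- scan(tmp, quiet = TRUE)

# The what argument sets the type to read; here every item is read as text
words <- scan(tmp, what = character(), quiet = TRUE)

unlink(tmp)   # remove the temporary file
```

Setting quiet = TRUE suppresses the "Read 10 items" message that `scan()` prints by default.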
read.table():
The read.table() function in R is used to read tabular data from files, such
as text files, CSV files, or TSV (tab-separated values) files. It's a versatile
function that can handle various file formats and options.
Here's the basic syntax and usage of the read.table() function:
read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".", fill = FALSE,
comment.char = "#", ...)
# Parameters:
# - file: Name of the file to read.
# - header: Logical. Whether the first row contains column names.
# - sep: Separator between values in the file; "" means any whitespace.
# - quote: Quote characters used to enclose character strings.
# - dec: Character used for decimal points.
# - fill: Logical. Whether to pad rows that have fewer fields than the others.
# - comment.char: Character that indicates comments.
# - ...: Additional parameters to control reading.
# Returns a data frame containing the read data.
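As a minimal sketch of these parameters in use, the example below writes a small comma-separated file to a temporary location and reads it back. The file contents and the variable name people are made up for illustration.

```r
# Create a small comma-separated file with a header row
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,age", "John,40", "Asha,35"), tmp)

# header = TRUE uses the first row as column names; sep = "," splits on commas
people <- read.table(tmp, header = TRUE, sep = ",", stringsAsFactors = FALSE)
people

col_names <- names(people)   # "name" "age"
first_age <- people$age[1]   # 40

unlink(tmp)   # remove the temporary file
```

For comma-separated files, read.csv() is a wrapper around read.table() with header = TRUE and sep = "," as its defaults.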