SCTR Unit 1

Uploaded by

Pradeep Kumar

Statistical Computing Techniques Using R

Unit-1
Statistics:
Statistics concerns the collection, organization, analysis, interpretation,
and presentation of data. Statistics is also used to plan how data will be
collected, in terms of
• Experimental designs
• Statistical surveys.
Statistics is considered a mathematical science that works with numerical
data. In short, statistics is a process that helps us make decisions based
on data.
Computing: Computing is the act of calculating something. It is any goal-
oriented activity requiring, benefiting from, or creating computing
machinery. It includes the study and experimentation of algorithmic
processes, and development of both hardware and software.
R:
R is a programming language that is widely used in statistics.
R is becoming more and more popular for two major reasons:
1. R is open source.
2. R has most of the latest statistical methods.
R is available on widely used platforms such as Windows, Linux, and
macOS, and it serves both as statistical software and as a data analysis
tool.
Features of R
Basic Statistics: The most common basic statistics terms are the mean,
mode, and median. These are all known as “Measures of Central
Tendency.” So using the R language we can measure central tendency very
easily.
Static graphics: R is rich with facilities for creating and developing
interesting static graphics. R contains functionality for many plot types
including graphic maps, mosaic plots, biplots, and the list goes on.
Probability distributions: Probability distributions play a vital role in
statistics and by using R we can easily handle various types of probability
distribution such as Binomial Distribution, Normal Distribution, Chi-
squared Distribution and many more.
Data analysis: R provides a large, coherent, and integrated collection of
tools for data analysis.
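The features above can be tried in a few lines. The sketch below uses a made-up vector named scores (an assumption for illustration, not data from these notes). Note that base R has no built-in function for the statistical mode, since mode() reports the storage mode of an object, so the most frequent value is found here with table().

```r
# Hypothetical sample data, chosen only for illustration
scores <- c(4, 8, 8, 15, 16, 23, 42)

# Measures of central tendency
mean_score   <- mean(scores)
median_score <- median(scores)

# Statistical mode: count each value, then pick the most frequent one
counts     <- table(scores)
mode_score <- as.numeric(names(counts)[which.max(counts)])

# Probability distributions
p_binom <- dbinom(3, size = 10, prob = 0.5)  # P(X = 3) for Binomial(10, 0.5)
p_norm  <- pnorm(1.96)                       # P(Z <= 1.96) for standard normal
q_chisq <- qchisq(0.95, df = 1)              # 95th percentile of chi-squared(1)

print(c(mean = mean_score, median = median_score, mode = mode_score))
print(c(binomial = p_binom, normal = p_norm, chi_squared = q_chisq))
```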
Advantages of R Programming:
R is one of the most comprehensive statistical analysis packages, and new
techniques and concepts often appear first in R.
R is open source, so you can install and run it freely, anywhere and at
any time.
R is cross-platform: it runs on GNU/Linux, Windows, and macOS.
In R, everyone is welcome to provide new packages, bug fixes, and code
enhancements.
Disadvantages of R:

• In the R programming language, the quality of some packages is less
than perfect.
• R commands give little thought to memory management, so R may
consume all available memory.
• Because R comes with no warranty or official support, there is
basically nobody to complain to if something doesn't work.
• R code can be slower than equivalent code in other programming
languages such as Python and MATLAB.
Arithmetic Operators:

Operator    Description
+           Addition
-           Subtraction
*           Multiplication
/           Division
^ or **     Exponentiation
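A short sketch of the arithmetic operators in use. The variable names a, b, and results are assumptions made for this illustration.

```r
a <- 7
b <- 2

# Collect one result per operator in a named vector
results <- c(
  sum        = a + b,   # addition
  difference = a - b,   # subtraction
  product    = a * b,   # multiplication
  quotient   = a / b,   # division (not integer division)
  power      = a ^ b,   # exponentiation
  power_alt  = a ** b   # ** is accepted as an alternative spelling of ^
)
print(results)
```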
Further Operators:

Description                  R Symbol   Example
Comment                      #          # This is a comment
Assignment                   <-         x <- 5
Assignment                   ->         5 -> x
Concatenation function       c          c(1,2,4)
Modulus (remainder)          %%         25 %% 6
Sequence from a to b by h    seq        seq(a,b,h)
Sequence operator            :          0:3
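The entries in this table can be run together as one small script. The variable names below are assumptions chosen for the illustration.

```r
# This is a comment: R ignores it

x <- 5                 # leftward assignment
10 -> y                # rightward assignment, same effect as y <- 10

v         <- c(1, 2, 4)          # combine values into a vector
remainder <- 25 %% 6             # modulus: remainder of 25 divided by 6
steps     <- seq(0, 1, 0.25)     # from 0 to 1 in steps of 0.25
whole     <- 0:3                 # integers from 0 to 3

print(remainder)
print(steps)
print(whole)
```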


R Syntax

1. To output text in R, use single or double quotes:


Example
"Hello World!"
2. To output numbers, just type the number (without quotes):
Example
5
10
25
3. To do simple calculations, add numbers together:
Example
5+5
R Print Output
Unlike many other programming languages, R lets you output a value without using a print
function: typing an expression at the top level prints its result.
Example
"Hello World!"

However, R does have a print() function available if you want to use it. This might be familiar if you
know other programming languages, such as Python, which often use the print() function to
produce output.
print("Hello World!")

There are times when you must use the print() function, for example inside a for loop, where
results are not printed automatically:
for (x in 1:10) {
  print(x)
}
R Comments
Comments can be used to explain R code and to make it more readable. They can also be used to prevent execution
when testing alternative code.

Comments start with a #. When executing code, R ignores everything from the # to the end of the line.

This example uses a comment before a line of code:

Example
# This is a comment
"Hello World!"

This example uses a comment at the end of a line of code:


Example
"Hello World!" # This is a comment

Comments do not have to be text that explains the code; they can also be used to prevent R from executing a line of code:
Example
# "Good morning!"
"Good night!"
R VARIABLES

• Variables are containers for storing data values.
• R does not have a command for declaring a variable.
• A variable is created the moment you first assign a value to it.
• To assign a value to a variable, use the <- sign.
• To output (or print) the variable value, just type the variable name.
Example:
name <- "John"
age <- 40

name
# output "John"
age
# output 40
From the example above, name and age are variables,
while "John" and 40 are values.
Data Types in R Programming
The fundamental or atomic data types in R Programming are as follows:
•Numeric
•Integer
•Complex
•Character
•Logical
•Numeric:
In R, if we assign any decimal value to a variable it becomes a variable of a
numeric data type.
For example, the statement below assigns a numeric data type to the
variable “x”.
x = 45.6
And, the following statement is used to print the data type of the variable
“x”:
class(x)
Output:- [1] "numeric"
•Integer:
To create an integer variable in R, we call the as.integer() function while
assigning a value to the variable.
For example:-
e = as.integer(3)
class(e)
Output: [1] "integer"
Another way of creating an integer variable is to use the suffix L:
x = 5L
class(x) # the L suffix marks the value as an integer
Output: [1] "integer"
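The difference between the numeric and integer types is easy to miss, because a whole number written without the L suffix is still stored as numeric. A minimal sketch, with variable names chosen only for illustration:

```r
plain  <- 5                 # no suffix: stored as numeric (double)
marked <- 5L                # L suffix: stored as integer
cut    <- as.integer(3.9)   # conversion drops the fractional part

print(class(plain))         # class of a value written without L
print(typeof(plain))        # underlying storage type
print(class(marked))
print(is.integer(plain))    # FALSE even though 5 is a whole number
print(cut)
```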
•Complex Data Types:
The complex data type is used to store numbers that have a real part and an imaginary part.
We use the suffix i to specify the imaginary part.
For example,
# 2i represents imaginary part
complex_value <- 3 + 2i

# print class of complex_value
print(class(complex_value))
Output:
[1] "complex"
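Base R provides functions for taking a complex value apart. The sketch below reuses the value 3 + 2i from the example; the variable name z is an assumption.

```r
z <- 3 + 2i

real_part <- Re(z)     # real part
imag_part <- Im(z)     # imaginary part
modulus   <- Mod(z)    # distance from the origin, sqrt(3^2 + 2^2)
conjugate <- Conj(z)   # same real part, imaginary part negated

# sqrt() of a negative number needs a complex input to return a complex result
root <- sqrt(as.complex(-1))

print(c(real_part, imag_part, modulus))
print(conjugate)
print(root)
```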
•Character Data Types:
The character data type is used to store character or string values in a variable.
In programming, a string is a sequence of characters.
For example, 'A' is a single character and "Apple" is a string.
You can use single quotes '' or double quotes "" to represent strings.
R has no separate type for single characters: 'A' and "Apple" are both of
type character, and the two kinds of quotes are interchangeable.
For example,
# create a string variable
fruit <- "Apple"
print(class(fruit))
# create a character variable
my_char <- 'A'
print(class(my_char))
Output
[1] "character"
[1] "character"
•Logical Data Types:
The logical data type in R is also known as boolean data type.
It can only have two values: TRUE and FALSE.
For example,
bool1 <- TRUE
print(bool1)
print(class(bool1))
bool2 <- FALSE
print(bool2)
print(class(bool2))
Output
[1] TRUE
[1] "logical"
[1] FALSE
[1] "logical"
The three standard logical operations, AND (&), OR (|), and NOT (!), yield a
value of the logical data type.
For example:-
x = TRUE
y = FALSE

x & y
Output: [1] FALSE
x | y
Output: [1] TRUE
!x
Output: [1] FALSE
Missing Values
In R, NA stands for Not Available. Each cell of your data that displays NA
is a missing value.
• Not-available values are sometimes printed enclosed by < and >, i.e. <NA>,
for example in factor or character columns of a data frame.
• NaN stands for Not a Number and represents an undefined or
unrepresentable value. It appears, for instance, when you compute 0/0.
Dividing a non-zero number by zero gives Inf or -Inf instead.
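A brief sketch of how missing and undefined values behave in practice. The vector readings is made-up data for illustration.

```r
readings <- c(10, NA, 30)          # hypothetical data with one missing value

missing_flags <- is.na(readings)   # which elements are missing
plain_mean    <- mean(readings)    # NA: a missing value propagates
clean_mean    <- mean(readings, na.rm = TRUE)  # drop NA before averaging

undefined <- 0 / 0                 # NaN
infinite  <- 1 / 0                 # Inf, not NaN

print(missing_flags)
print(clean_mean)
print(c(undefined, infinite))

# NaN counts as missing for is.na(), but NA is not NaN
print(is.na(NaN))
print(is.nan(NA))
```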
R – Objects:
Every programming language has its own data types to store values or any information
so that the user can assign these data types to the variables and perform operations
on them. Operations are performed according to the data types. These data types
can be character, integer, float, long, etc. Based on the data type, memory/storage is
allocated to the variable. For example, in the C language character variables are assigned
1 byte of memory, integer variables 2 or 4 bytes of memory, and other data
types have different memory allocations. Unlike in many other programming languages,
a variable in R is not declared with a data type; instead, a name is bound to an object,
and the object carries the type.
Type of R – Objects:
There are 5 basic types of objects in the R language:
• Vectors
• Matrices
• Factors
• Arrays
• Data Frames
R Vector

• In R, a sequence of elements which share the same data type is known as a vector.
• To combine items into a vector, use the c() function and separate the items with commas.
• A vector can hold the logical, integer, double, character, complex, or raw data type.
• The elements contained in a vector are known as the components of the vector.
• We can check the type of a vector with the typeof() function.
• Vectors are classified into two kinds: atomic vectors and lists.
• Both have three common properties: type (typeof()), length (length()), and attributes (attributes()).
• In an atomic vector, all the elements are of the same type, but in a list, the elements can be of different
data types.
Atomic Vector

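In R, there are four commonly used types of atomic vectors: logical, integer, double (numeric), and character.
Complex and raw vectors also exist but are used less often.

Atomic vectors are created with the help of the c() function.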
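As a rough illustration of atomic vectors and their three common properties, the sketch below builds one vector of each common type. All variable names are assumptions for this example.

```r
num_vec <- c(1.5, 2.5, 3.5)        # double
int_vec <- c(1L, 2L, 3L)           # integer
chr_vec <- c("a", "b")             # character
lgl_vec <- c(TRUE, FALSE)          # logical

# The three common properties: type, length, attributes
print(typeof(num_vec))
print(length(num_vec))
print(attributes(num_vec))         # NULL: a plain vector has no attributes

# Naming the elements adds a "names" attribute
named <- c(first = 1, second = 2)
print(attributes(named))

# Mixing types in c() does not fail: everything is coerced to one type
mixed <- c(1, "a", TRUE)
print(typeof(mixed))
print(mixed)
```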
R LISTS
In R, lists are the second type of vector.
A list is an R object that can contain elements of different types, such as numbers, strings, vectors, and even another list.
It can also contain a function or a matrix as one of its elements.
In other words, a list is a generic vector that holds other objects.

Example
vec <- c(3,4,5,6)
char_vec<-c("shubham","nishka","gunjan","sumit")
logic_vec<-c(TRUE,FALSE,FALSE,TRUE)
out_list<-list(vec,char_vec,logic_vec)
out_list
Output:

[[1]]
[1] 3 4 5 6
[[2]]
[1] "shubham" "nishka" "gunjan" "sumit"
[[3]]
[1] TRUE FALSE FALSE TRUE
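Once a list exists, its elements are usually reached with [[ ]], with $ (when the elements are named), or with [ ]. The sketch below is a variation on the example above in which the elements are given names; the names numbers, people, and flags are assumptions for illustration.

```r
out_list <- list(
  numbers = c(3, 4, 5, 6),
  people  = c("shubham", "nishka", "gunjan", "sumit"),
  flags   = c(TRUE, FALSE, FALSE, TRUE)
)

first_element <- out_list[[1]]       # by position: returns the element itself
second_person <- out_list$people[2]  # by name, then index into the vector
sub_list      <- out_list["flags"]   # single brackets return a list

print(first_element)
print(second_person)
print(class(sub_list))
print(length(out_list))
```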
MATRICES

Matrices are used in R to store values as a 2-dimensional array.

The data and the number of rows and columns are defined in the matrix() function.
Syntax:
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)

byrow is a logical value. If TRUE, the input vector elements are arranged by row; if FALSE (the default), they are arranged by column.
dimnames is a list of the names assigned to the rows and columns.

A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
rownames(A) = c("a", "b", "c")
colnames(A) = c("c", "d", "e")
cat("The 3x3 matrix:\n")
print(A)

Output:
The 3x3 matrix:
  c d e
a 1 2 3
b 4 5 6
c 7 8 9
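The effect of the byrow argument is easiest to see by filling the same data both ways. A minimal sketch, with variable names chosen for illustration:

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: fill column by column
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # fill row by row

print(by_col)
print(by_row)

first_row_by_col <- by_col[1, ]   # first row when filled by column
first_row_by_row <- by_row[1, ]   # first row when filled by row
corner           <- by_row[2, 3]  # element in row 2, column 3

print(dim(by_row))                # number of rows, number of columns
```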
ARRAYS
The array() function is used to create an n-dimensional array.
This function takes the dim attribute as an argument and creates each dimension with the length specified in it.

Syntax:
array(data, dim = c(nrow, ncol, nmat), dimnames = names)

nrow : Number of rows
ncol : Number of columns
nmat : Number of matrices of dimensions nrow * ncol
dimnames : Default value = NULL.
Example:
# Create 3-dimensional array
# and filling values by column
arr <- array(c(1, 2, 3), dim = c(3, 3, 3))
print(arr)
Output:
, , 1

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3

, , 2

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3

, , 3

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3
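To show the dimnames argument and element access, here is a small sketch with distinct values so that each position can be told apart. The dimension names r1, c1, m1 and so on are assumptions made for this example.

```r
arr <- array(
  1:24,
  dim = c(2, 3, 4),                       # 2 rows, 3 columns, 4 matrices
  dimnames = list(
    c("r1", "r2"),
    c("c1", "c2", "c3"),
    paste0("m", 1:4)
  )
)

by_position <- arr[2, 3, 1]               # row 2, column 3, matrix 1
by_name     <- arr["r1", "c1", "m2"]      # same idea, using the names
first_slice <- arr[, , 1]                 # the whole first matrix

print(dim(arr))
print(by_position)
print(by_name)
print(first_slice)
```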
Factors:
Factors are used to categorize data. Examples of factors are:
Demography: Male/Female
Music: Rock, Pop, Classic, Jazz
Training: Strength, Stamina
To create a factor, use the factor() function and pass a vector as the argument.
Example
# Create a factor
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
# Print the factor
music_genre
Result:
[1] Jazz Rock Classic Classic Pop Jazz Rock Jazz
Levels: Classic Jazz Pop Rock
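A few base R functions help inspect a factor once it is created. This sketch reuses the music_genre factor from the example; the helper variable names are assumptions.

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic",
                        "Pop", "Jazz", "Rock", "Jazz"))

genre_levels <- levels(music_genre)      # the distinct categories
genre_count  <- nlevels(music_genre)     # how many categories there are
genre_table  <- table(music_genre)       # how often each category occurs
genre_codes  <- as.integer(music_genre)  # internal integer code of each element

print(genre_levels)
print(genre_count)
print(genre_table)
print(genre_codes)
```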
Data Frames

Data Frames are data displayed in a table format.

Data Frames can hold different types of data. While the first column can be character, the second and third can be numeric or logical.

However, all values within a single column should have the same type.

Use the data.frame() function to create a data frame.


EXAMPLE:
# Create a data frame
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# Print the data frame
Data_Frame

Output:
  Training Pulse Duration
1 Strength   100       60
2  Stamina   150       30
3    Other   120       45
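The rule that each column has a single type, while different columns may differ, can be checked directly. The sketch below rebuilds the same data frame and shows a few common ways to access it; the helper variable names are assumptions.

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse    = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

column_types <- sapply(Data_Frame, class)           # one type per column
pulse_values <- Data_Frame$Pulse                    # a single column as a vector
one_cell     <- Data_Frame[2, "Duration"]           # row 2 of the Duration column
high_pulse   <- Data_Frame[Data_Frame$Pulse > 110, ]  # rows meeting a condition

print(column_types)
print(pulse_values)
print(one_cell)
print(high_pulse)
print(c(rows = nrow(Data_Frame), columns = ncol(Data_Frame)))
```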
Importing Data into R:
scan():
The `scan()` function in R is used to read data from a file or a connection.
It's a versatile function that can import various types of data, but
it's generally used for reading basic types of data like numbers, strings, and
logical values.
It reads the data sequentially and allows you to specify various parameters
for handling different types of data. Here's an example of how to use
`scan()` to read data from a file:
Suppose you have a file named "data.txt" with the following contents:
1 2 3 4 5
6 7 8 9 10
# Read data from the file "data.txt"
data <- scan("data.txt")
# Print the data
print(data)
In this case, the output will be:
[1] 1 2 3 4 5 6 7 8 9 10
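The example above can be made self-contained by writing the file first. The sketch below uses temporary files (an assumption, so that nothing is left behind) and also shows the what and sep arguments of scan() for reading comma-separated strings.

```r
# Write the sample numbers to a temporary file, then read them back
num_file <- tempfile(fileext = ".txt")
writeLines(c("1 2 3 4 5", "6 7 8 9 10"), num_file)
numbers <- scan(num_file, quiet = TRUE)   # default: read numeric values

# Reading text needs what = character(); sep sets the field separator
txt_file <- tempfile(fileext = ".txt")
writeLines("red,green,blue", txt_file)
colours <- scan(txt_file, what = character(), sep = ",", quiet = TRUE)

print(numbers)
print(colours)

# Remove the temporary files
unlink(c(num_file, txt_file))
```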
read.table():
The read.table() function in R is used to read tabular data from files, such
as text files, CSV files, or TSV (tab-separated values) files. It's a versatile
function that can handle various file formats and options.
Here's a simplified form of the syntax, showing the most commonly used
arguments of read.table():
read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
fill = !blank.lines.skip, comment.char = "#", ...)
# Parameters:
# - file: Name of the file to read.
# - header: Logical. Whether the first row contains column names.
# - sep: Separator used to separate values in the file. The default "" means any white space.
# - quote: Set of quote characters used to enclose character strings.
# - dec: Character used for decimal points.
# - fill: Logical. Whether to pad shorter lines with blank fields (NA in numeric columns).
# - comment.char: Character that indicates comments.
# - ...: Additional parameters to control reading.
# Returns a data frame containing the read data.
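A minimal sketch of read.table() in use, assuming a small comma-separated file with a header row. The file is written to a temporary location first so that the example runs on its own; the column names are borrowed from the data frame example earlier in these notes.

```r
# Create a small comma-separated file to read
csv_file <- tempfile(fileext = ".csv")
writeLines(
  c("Training,Pulse,Duration",
    "Strength,100,60",
    "Stamina,150,30",
    "Other,120,45"),
  csv_file
)

# header = TRUE: first line holds the column names
# sep = ",": fields are separated by commas
training <- read.table(csv_file, header = TRUE, sep = ",")

print(training)
str(training)      # column types are inferred from the file contents

unlink(csv_file)   # remove the temporary file
```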
