MDPN460 Lecture03

MDPN460 – Industrial Engineering Lab
Lecture 3

Basic Statistical Analysis Using R Programming

Today’s Lecture
●
Basic statistical analysis using R
– More on vectors
– Matrices and arrays
– Data storage in R
– Packages, libraries and repositories
– Some built-in graphics functions
●
Introduction to the paper airplane factory software
●
A free course on Machine Learning with R:
https://www.youtube.com/watch?v=Liws4MShq1A
Numeric vectors in R
●
As shown in the last lecture, a numeric vector is an
ordered collection of numbers.
●
The c() function is used to collect things together into a
vector.
●
The c() function can be used to join vectors as in the
following example:
> v1 <- c(10:5)
> v2 <- c(22, 25, 65)
> v3 <- c(v1, v2)
> v3
[1] 10 9 8 7 6 5 22 25 65
> v4 <- c(v3, 2:6)
> v4
[1] 10 9 8 7 6 5 22 25 65 2 3 4 5 6
Extracting elements from vectors
●
To access a single element, write its index in square
brackets after the vector’s name.
●
Indexing in R is one-based: the first element of a
vector is at index 1, not 0.
> V <- 20:100
> V[50]
[1] 69
> V[1]
[1] 20
> V[80]
[1] 99
> V[81]
[1] 100
> V[82]
[1] NA
Extracting sub-vectors from
vectors
●
Sub-vectors can be accessed using a colon between two
indexes.
> V[10:20]
[1] 29 30 31 32 33 34 35 36 37 38 39
●
Or you can specify indexes using c()
> V[c(2,5,50)]
[1] 21 24 69
●
You can exclude indexes
> V[-(20:70)]
[1] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 90 91 92
[23] 93 94 95 96 97 98 99 100
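
The indexing forms above can be tried together in one short sketch. The vector v and the result names below are hypothetical, chosen only for this illustration.

```r
# Hypothetical vector used only for this illustration
v <- 20:100

first_three <- v[1:3]            # positions 1 to 3
picked      <- v[c(2, 5, 50)]    # arbitrary positions collected with c()
trimmed     <- v[-(1:78)]        # drop positions 1 to 78, keep the rest
beyond      <- v[length(v) + 1]  # one past the end gives NA, not an error

print(first_three)
print(picked)
print(trimmed)
print(beyond)
```

Note that a negative index removes positions rather than counting from the end of the vector.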

Vector Arithmetic
●
Arithmetic operations on numeric vectors are applied
element by element.
> x <- c(0, 1, 3, 7, 12, 20, 99)
> x * 2
[1] 0 2 6 14 24 40 198
> x / 2
[1] 0.0 0.5 1.5 3.5 6.0 10.0 49.5
> x^2
[1] 0 1 9 49 144 400 9801
> (x - 5) / 2
[1] -2.5 -2.0 -1.0 1.0 3.5 7.5 47.0
> (x * 3) %% 2
[1] 0 1 1 1 0 0 1
> (x * 3) %/% 2
[1] 0 1 4 10 18 30 148
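
The same operators also work between two vectors of equal length, pairing up corresponding elements. The following is a minimal sketch; the vectors a and b are made up for the illustration.

```r
# Two hypothetical vectors of the same length
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)

total     <- a + b    # element-wise sum
ratio     <- b / a    # element-wise division
remainder <- b %% 3   # remainder after dividing each element by 3
quotient  <- b %/% 3  # integer part of each division

print(total)
print(ratio)
print(remainder)
print(quotient)
```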
Simple Patterned Vectors
●
We have seen the use of the : operator for producing
simple sequences of integers. Patterned vectors can also
be produced using the seq() function as well as the rep()
function.
> seq(1, 21, by = 2)
[1] 1 3 5 7 9 11 13 15 17 19 21
> rep(3, 12) # repeat the value 3, 12 times
[1] 3 3 3 3 3 3 3 3 3 3 3 3
> rep(seq(2, 20, by = 2), 2) # repeat the pattern 2 4 ... 20, twice
[1] 2 4 6 8 10 12 14 16 18 20 2 4 6 8 10 12 14 16 18 20
> rep(c(1, 4), c(3, 2)) # repeat 1, 3 times and 4 twice
[1] 1 1 1 4 4
> rep(c(1, 4), each = 3) # repeat each value 3 times
[1] 1 1 1 4 4 4
> rep(1:10, rep(2, 10)) # repeat each value twice
[1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10
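
A few further variations on seq() and rep() may be useful. This is a small sketch using only standard arguments of these two functions; the variable names are arbitrary.

```r
# Ask for a number of points instead of a step size
quarters <- seq(0, 1, length.out = 5)

# A negative step counts downwards
countdown <- seq(10, 1, by = -3)

# 'each' is applied first, then the result is repeated 'times' times
blocks <- rep(c(1, 4), each = 2, times = 2)

print(quarters)
print(countdown)
print(blocks)
```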
Vectors with random patterns
●
The sample() function allows us to simulate
things like the results of the repeated tossing of
a 6-sided die.
> sample(1:6, size = 8, replace = TRUE) # an imaginary die is tossed 8 times
[1] 3 4 4 2 1 6 6 5
> sample(1:6, size = 8, replace = TRUE) # an imaginary die is tossed 8 times
[1] 4 2 5 2 6 3 2 4
> sample(1:6, size = 8, replace = TRUE) # an imaginary die is tossed 8 times
[1] 2 6 4 2 2 6 2 6
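
Each call to sample() gives a different result, as the three runs show. If repeatable results are needed (for a lab report, say), one common approach is to fix the random seed with set.seed() first. The sketch below assumes an arbitrary seed value of 460; any integer would do.

```r
# The seed value 460 is arbitrary, chosen only for this illustration
set.seed(460)
first_run <- sample(1:6, size = 8, replace = TRUE)

# Resetting the same seed reproduces the same sequence of tosses
set.seed(460)
second_run <- sample(1:6, size = 8, replace = TRUE)

same <- identical(first_run, second_run)
print(first_run)
print(same)
```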

Character vectors
●
Scalars and vectors can be made up of strings of
characters instead of numbers.
●
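All elements of a vector must be of the same type; if
types are mixed, R converts the elements to a common
type (for example, numbers become strings).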

> Student <- c("Ahmed", "Yasser", "Mona", "Amr")

> sample(Student, size = 3, replace = TRUE)
[1] "Mona" "Yasser" "Mona"
> sample(Student, size = 3, replace = FALSE)
[1] "Amr" "Mona" "Ahmed"
> More.Student <- c(Student, 322)
> sample(More.Student, size = 3, replace = FALSE)
[1] "Ahmed" "322" "Yasser"
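
The number 322 is printed in quotes above because c() converted it to a string. A short sketch of how to check this and convert back, using a hypothetical vector named mixed:

```r
# Mixing a string and a number: the number is converted to a string
mixed <- c("Ahmed", 322)

mixed_class <- class(mixed)             # type of the whole vector
second      <- mixed[2]                 # now the string "322", not the number
recovered   <- as.numeric(mixed[2]) + 1 # convert back before doing arithmetic

print(mixed_class)
print(second)
print(recovered)
```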

Basic operations on character
vectors
●
There are two basic operations you might want to
perform on character vectors.
●
To take substrings, use substr() . It takes arguments
substr(x, start, stop) , where x is a vector of character
strings, and start and stop say which characters to keep.
> Initial <- substr(More.Student, 1, 1)
> Initial
[1] "A" "Y" "M" "A" "3"
> Initial <- substr(More.Student, 1, 2)
> Initial
[1] "Ah" "Ya" "Mo" "Am" "32"

Basic operations on character
vectors
●
The other basic operation is building up strings by
concatenation within elements. Use the paste() function
for this.
> paste("Name:", Student, sep=" ")
[1] "Name: Ahmed" "Name: Yasser" "Name: Mona" "Name: Amr"
> paste("Name:", Student, "Initial:", Initial, sep=" ")
[1] "Name: Ahmed Initial: Ah" "Name: Yasser Initial: Ya" "Name: Mona Initial: Mo"
[4] "Name: Amr Initial: Am" "Name: Ahmed Initial: 32"
> paste("Name:", More.Student, "Initial:", Initial, sep=" ")
[1] "Name: Ahmed Initial: Ah" "Name: Yasser Initial: Ya" "Name: Mona Initial: Mo"
[4] "Name: Amr Initial: Am" "Name: 322 Initial: 32"
> picker <- paste("Picked student:", Student, sep = " ")
> sample(picker, size=4, replace = FALSE)
[1] "Picked student: Mona" "Picked student: Ahmed" "Picked student: Yasser"
[4] "Picked student: Amr"
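
Two related variations are sketched below: paste0(), which joins without a separator, and the collapse argument of paste(), which joins all elements into a single string. The students vector here is a shortened, hypothetical copy used only for the illustration.

```r
# Hypothetical data for this illustration
students <- c("Ahmed", "Yasser", "Mona")

ids      <- paste0("S", 1:3)                   # no separator between pieces
labels   <- paste(ids, students, sep = ": ")   # element-wise, custom separator
one_line <- paste(students, collapse = ", ")   # a single string from all elements

print(ids)
print(labels)
print(one_line)
```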
Factors
●
Factors offer an alternative way to store character
data. For example, a factor with four elements and
having the two levels, control and treatment can be
created using:
> grp <- c("control", "treatment", "control", "treatment")
> grp
[1] "control" "treatment" "control" "treatment"
> grp <- factor(grp)
> grp
[1] control treatment control treatment
Levels: control treatment
> levels(grp)
[1] "control" "treatment"
> as.integer(grp)
[1] 1 2 1 2
Factors
●
The levels() function can be used to change factor
labels as well. For example, suppose we wish to
change the "control" label to "placebo" . Since
"control" is the first level, we change the first
element of the levels(grp) vector:
> levels(grp)
[1] "control" "treatment"
> as.integer(grp)
[1] 1 2 1 2
> levels(grp)[1] <- "placebo"
> grp
[1] placebo treatment placebo treatment
Levels: placebo treatment
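
Two things often done with factors are counting the observations per level and fixing the order of the levels explicitly. The sketch below uses made-up data; by default the levels are sorted alphabetically, which is why "control" came first above.

```r
# Hypothetical group labels for this illustration
grp <- factor(c("control", "treatment", "control", "treatment", "control"))

counts    <- table(grp)    # number of observations in each level
n_levels  <- nlevels(grp)  # how many distinct levels there are

# Setting the level order explicitly changes the underlying integer codes
grp2  <- factor(c("control", "treatment", "control"),
                levels = c("treatment", "control"))
codes <- as.integer(grp2)

print(counts)
print(n_levels)
print(codes)
```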
Matrices and Arrays
●
To arrange values into a matrix, we use the matrix()
function:
> m1 <- matrix(1:6, nrow = 2, ncol = 3)
> m1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
●
We can then access elements using two indices. For
example, the value in the first row, second column is
> m1[1, 2]
[1] 3

●
Alternatively, a single index can be used; it counts
elements down the columns, one column after another:
> m1[6]
[1] 6
> m1[4]
[1] 4
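
By default matrix() fills column by column. The byrow argument changes this, as the following sketch shows; the names by_col and by_row are arbitrary.

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # filled down columns
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled across rows

col_entry <- by_col[1, 2]  # first row, second column of the column-filled matrix
row_entry <- by_row[1, 2]  # same position in the row-filled matrix

# A single index still counts down the columns, whatever the fill order was
fourth <- by_row[4]
shape  <- dim(by_row)

print(by_row)
print(c(col_entry, row_entry, fourth))
print(shape)
```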
Accessing whole rows or columns
●
Whole rows or columns of matrices may be
selected by leaving one index blank:
> m1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> m1[2,]
[1] 2 4 6
> m1[,2]
[1] 3 4

More general arrays
●
A more general way to store data is in an array. Arrays have
multiple indices, and are created using the array function:
> a1 <- array(sample(1:24), c(3, 4, 2))
> a1
, , 1

     [,1] [,2] [,3] [,4]
[1,]   21    7   13   14
[2,]   17   18   23   22
[3,]    5    8    4   11

, , 2

     [,1] [,2] [,3] [,4]
[1,]   12   15   16    1
[2,]    2   19   24    9
[3,]    3    6   20   10
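
Array elements are accessed with one index per dimension. The sketch below fills a hypothetical array with 1:24 in order (rather than a random sample) so that the results are predictable.

```r
# 3 rows, 4 columns, 2 layers, filled in order for predictability
a <- array(1:24, dim = c(3, 4, 2))

one_value    <- a[2, 3, 1]  # row 2, column 3, layer 1
layer2_start <- a[1, 1, 2]  # first element of the second layer
layer2       <- a[, , 2]    # leaving indices blank selects the whole layer

print(one_value)
print(layer2_start)
print(dim(layer2))
```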
Data storage in R
●
As in any programming language, numerical values
in R are stored and processed internally in
binary format.
●
This leads to rounding errors, as shown in the
following example.
> n <- 1:10
> 1.25 * (n * 0.8) - n
[1] 0.000000e+00 0.000000e+00 4.440892e-16 0.000000e+00 0.000000e+00 8.881784e-16
[7] 8.881784e-16 0.000000e+00 0.000000e+00 0.000000e+00
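
Because of such rounding errors, testing floating-point results with == can fail unexpectedly. One common approach is to compare with a small tolerance using all.equal(), sketched below with arbitrary variable names.

```r
# Exact comparison fails because 0.1, 0.2 and 0.3 have no exact binary form
exact_match <- (0.1 + 0.2) == 0.3

# all.equal() compares up to a small tolerance; wrap it in isTRUE()
# because it returns a description of the difference when values differ
close_match <- isTRUE(all.equal(0.1 + 0.2, 0.3))

# The same idea applied to the example above
n <- 1:10
scaled <- 1.25 * (n * 0.8)
all_close <- isTRUE(all.equal(scaled, as.numeric(n)))

print(exact_match)
print(close_match)
print(all_close)
```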

Dates and Times
●
When looking at dates over historical time periods,
changes to the calendar (such as the switch from the
Julian calendar to the modern Gregorian calendar
that occurred in various countries between 1582 and
1923) affect the interpretation of dates.
●
Times are also messy, because there is often an
unstated time zone (which may change for some
dates due to daylight savings time), and some years
have “leap seconds” added in order to keep standard
clocks consistent with the rotation of the earth.

Dates and Times
●
In R, the base package has the function strptime() to
convert from strings (e.g. "2020-12-25" , or
"12/25/20" ) to an internal numerical representation,
and format() to convert back for printing.
●
The ISOdate() and ISOdatetime() functions are used
when numerical values for the year, month, day, etc.
are known. Other functions are available in the
chron package.
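
A minimal sketch of these functions follows. The format strings use the standard conversion codes (%Y four-digit year, %y two-digit year, %m month, %d day); the time zone "UTC" is an assumption made here so that the result does not depend on the computer's settings.

```r
# Parse the two string styles mentioned above
iso_style <- strptime("2020-12-25", format = "%Y-%m-%d", tz = "UTC")
us_style  <- strptime("12/25/20",   format = "%m/%d/%y", tz = "UTC")

# Convert back to text in a chosen layout
iso_text <- format(iso_style, "%d/%m/%Y")
us_text  <- format(us_style, "%Y-%m-%d")

# Build a date-time from numbers; ISOdate() defaults to 12:00 GMT
built      <- ISOdate(2020, 12, 25)
built_text <- format(built, "%Y-%m-%d %H:%M")

print(iso_text)
print(us_text)
print(built_text)
```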

Missing values and other
special values
●
The missing value symbol is NA . Missing values
often arise in real data, but they can also arise
because of the way calculations are performed.
> some.evens <- NULL # creates a vector with no elements
> some.evens[seq(2, 20, 2)] <- seq(2, 20, 2)
> some.evens
[1] NA 2 NA 4 NA 6 NA 8 NA 10 NA 12 NA 14 NA 16 NA 18 NA 20

●
What happened here is that we assigned values
to elements 2,4, . . . ,20 but never assigned
anything to elements 1,3, . . . ,19, so R uses NA to
signal that the value is unknown.
Missing values and other
special values
●
Consider the following:
> x <- c(0, 1, 2)
> x / x
[1] NaN 1 1

●
The NaN symbol denotes a value that is “not a
number”. Here it arises from attempting to compute
the indeterminate form 0/0.
●
This symbol is sometimes used when a
calculation does not make sense. In other cases,
special values may be shown, or you may get an
error or warning message.
Missing values and other
special values
●
Consider the following:
> x <- c(0, 1, 2)
> 1 / x
[1] Inf 1.0 0.5

●
Here R has tried to evaluate 1/0 and reports the
infinite result as “Inf”.
●
When there may be missing values, the is.na()
function should be used to detect them. For
instance,
> is.na(some.evens)
[1] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
[16] FALSE TRUE FALSE TRUE FALSE
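
The special values can be told apart with is.na(), is.nan() and is.finite(), and many summary functions accept na.rm = TRUE to skip missing values. The following sketch uses a made-up vector named readings.

```r
# Hypothetical data containing NA, NaN (from 0/0) and Inf (from 1/0)
readings <- c(4, NA, 7, 0 / 0, 1 / 0)

missing  <- is.na(readings)      # TRUE for both NA and NaN
not_num  <- is.nan(readings)     # TRUE only for NaN
ordinary <- is.finite(readings)  # TRUE only for ordinary numbers

# A single NA makes the mean NA unless it is removed first
plain_mean <- mean(c(4, NA, 7))
clean_mean <- mean(c(4, NA, 7), na.rm = TRUE)

print(missing)
print(not_num)
print(ordinary)
print(clean_mean)
```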

Packages, libraries, and
repositories
●
In R, a package is a module containing functions,
data, and documentation. R always contains the
base packages (e.g. base , stats , graphics );
these contain things that everyone will use.
●
There are also contributed packages (e.g. knitr
and chron); these are modules written by others
to use in R.
●
When you start your R session, you will have
some packages loaded and available for use,
while others are stored on your computer in a
library.
Check for loaded packages
●
To be sure a package is loaded, run code like
> library(knitr)
> search()
[1] ".GlobalEnv" "package:knitr" "tools:rstudio" "package:stats"
[5] "package:graphics" "package:grDevices" "package:utils" "package:datasets"
[9] "package:methods" "Autoloads" "package:base"

●
The search() function returns the names of the
currently attached packages and environments.
●
The order of the list is the order in which R
searches them when resolving function calls.
Online repositories
●
Thousands of contributed packages are available,
though you may have only a few dozen installed on your
computer. If you try to use one that isn’t already there,
you will receive an error message:
> library(funStats)
Error in library(funStats) : there is no package called ‘funStats’
●
This means that the package doesn’t exist on your
computer, but it might be available in a repository online.
The biggest repository of R packages is known as CRAN.
To install a package from CRAN, you can run a command
like
install.packages("knitr")
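
To avoid the error shown above, a script can check whether a package is installed before loading it. The sketch below uses requireNamespace() for the check; the helper name is_installed is hypothetical, and the installation step is left as a message so that nothing is downloaded without the user's consent.

```r
# Hypothetical helper: TRUE if the package can be found in a library
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

have_stats <- is_installed("stats")  # a base package, always present

pkg <- "knitr"
if (is_installed(pkg)) {
  message("Package '", pkg, "' is installed; load it with library(", pkg, ")")
} else {
  message("Package '", pkg, "' is missing; try install.packages(\"", pkg, "\")")
}
```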
Loading packages in RStudio

●
Within RStudio, click on the Packages tab in the
Output Pane, choose Install, and enter the name
in the resulting dialog box.

Getting help

●
There are many functions in R which are
designed to do all sorts of things.
●
The online help facility can help you to see what
a particular function is supposed to do. There are
a number of ways of accessing the help facility.
– Type help() with the name of the function
between the parentheses, e.g. help(mean).
– Type “?” followed by the function name.
– Hit “F1” in RStudio.
Finding help when you do not
know the function name
●
One way to explore the help system is to use help.start() .
This brings up an Internet browser, such as Google
Chrome or Firefox.
●
The browser will show you a menu of several options,
including a listing of installed packages. (The base
package contains many of the routinely used functions;
other commonly used functions are in utils or stats)
●
You can get to this page within RStudio by using the Help
| R Help menu item.

Finding help when you do not
know the function name
●
Another function that is often used is help.search() ,
abbreviated as a double question mark. For example, to
see if there are any functions that do optimization
(finding minima or maxima), type
> ??optimization

> help.search("nonlinear programming")

●
Web search engines such as Google can also be useful
for finding help on R. Including ‘R’ as a keyword in such a
search will often bring up the relevant R help page.

Installing packages
●
The name of the R package that is needed is usually
listed at the top of the help page. You can usually
install it by typing
> install.packages("packagename")

●
This will work as long as the package is available in
the CRAN repository.
●
Google may also find discussions of similar
questions to yours on sites like
https://stackoverflow.com/, where discussions about
R are common.
Some built-in graphics functions
●
Basic plots such as the histogram, the scatterplot
and the pie chart are built into R. Try the code
below:
> hist(islands)
> x <- seq(1, 10)
> y <- x^2 - 10 * x
> plot(x,y)
> curve(expr = sin, from = 0, to = 6 * pi)
> curve(x^2 - 10 * x, from = 1, to = 10)
> pie.sales <- c(0.12, 0.3, 0.26, 0.16, 0.04, 0.12)
> names(pie.sales) <- c("Blueberry", "Cherry",
+ "Apple", "Boston Cream", "Other", "Vanilla Cream")
> pie(pie.sales) # default colours

Some elementary built-in
statistics functions
●
Basic statistics functions are built into R. The
following is a list of such functions.
median(x) # computes the median or 50th percentile of the data in x
var(x) # computes the variance of the data in x
summary(x) # computes several summary statistics on the data in x
length(x) # number of elements in x
min(x) # minimum value of x
max(x) # maximum value of x
pmin(x, y) # pairwise minima of corresponding elements of x and y
pmax(x, y) # pairwise maxima of x and y
range(x) # minimum and maximum of the data in x; diff(range(x)) gives their difference
IQR(x) # interquartile range: difference between 1st and 3rd
# quartiles of data in x
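
A short sketch applying several of these functions to a small made-up data set follows. Note that range() returns two values, the minimum and the maximum; their difference is obtained with diff(range(x)).

```r
# Hypothetical data for this illustration
x <- c(2, 4, 4, 5, 7, 9, 11)
y <- c(3, 3, 6, 1, 8, 9, 10)

med     <- median(x)       # middle value of the sorted data
s2      <- var(x)          # sample variance (divides by n - 1)
limits  <- range(x)        # smallest and largest value
spread  <- diff(range(x))  # largest minus smallest
iqr     <- IQR(x)          # third quartile minus first quartile
lower   <- pmin(x, y)      # element-wise minima of x and y
upper   <- pmax(x, y)      # element-wise maxima of x and y

print(summary(x))
print(c(med, s2, spread, iqr))
print(limits)
print(lower)
print(upper)
```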

Lab Assignment #2

The paper airplane factory

