An Introduction to R, RStudio, R Markdown

Savaş Dayanık

24 09 2019

Introduction to R, RStudio
You can type straight text and math, for example,

$$\text{tips} = X\beta + \varepsilon$$

Some tips:
• This is a bullet. You can emphasize any important information by enclosing it in single asterisks.
• If it is absolutely important, then double the asterisks.
• To insert a new R chunk, use the Insert menu or press Ctrl+Alt+I. For all keybindings, the Tools menu is your friend.
– To run a single R statement in a chunk, press Ctrl+Enter.
– To run all statements in the same chunk, press Ctrl+Shift+Enter.
– To comment out any part of code or text, highlight it and press Ctrl+Shift+C.

Analyze tips dataset

A waiter recorded the values of several variables that he thinks are important in determining the tip amount, and he
wants us to analyze the relation between the tips he received and those factors.
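# Packages used by the code in this document (the setup chunk is not shown
# in the source, so this list is an assumption)
library(tidyverse)  # read_csv(), dplyr and tidyr verbs, ggplot2
library(magrittr)   # extract2(), multiply_by()
library(pander)     # pander() tables

# Read the data; [,-1] drops the first, unnamed column
d <- suppressWarnings(read_csv("tips.csv",
                               col_types = cols(
                                 X1 = col_double(),
                                 OBS = col_double(),
                                 TOTBILL = col_double(),
                                 TIP = col_double(),
                                 SEX = col_character(),
                                 SMOKER = col_character(),
                                 DAY = col_character(),
                                 TIME = col_character(),
                                 SIZE = col_double()
                               ))[,-1]) %>%
  select(-OBS) %>%
  # Convert the categorical columns to factors; DAY gets an explicit level order
  mutate(
    SEX = factor(SEX),
    DAY = factor(DAY, levels = c("thurs","fri","sat","sun"),
                 labels = c("THU", "FRI", "SAT", "SUN") ),
    TIME = factor(TIME),
    SMOKER = factor(SMOKER))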

## New names:
## * `` -> `...1`

d %>% distinct(DAY)

## # A tibble: 4 x 1
## DAY
## <fct>
## 1 SUN
## 2 SAT
## 3 THU
## 4 FRI
d %>% count(DAY)

## # A tibble: 4 x 2
## DAY n
## <fct> <int>
## 1 THU 62
## 2 FRI 19
## 3 SAT 87
## 4 SUN 76
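The two outputs above list the days in different orders because distinct() keeps rows in order of first appearance, while count() sorts by factor level. A minimal sketch with a made-up four-element vector (not the real data) shows how the levels and labels arguments fix that order:

```r
# Hypothetical raw values, in the order the days first appear in the data
raw_day <- c("sun", "sat", "thurs", "fri")

# levels sets the order; labels renames each level, position by position
day <- factor(raw_day,
              levels = c("thurs", "fri", "sat", "sun"),
              labels = c("THU", "FRI", "SAT", "SUN"))

levels(day)       # level order used by count(), summary(), and plot axes
as.integer(day)   # internal codes: position of each value among the levels
table(day)        # counts are reported in level order, not data order
```

Without the levels argument, factor() would sort the levels alphabetically, which puts "fri" before "thurs".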
d %>%
head() %>%
pander(caption = "(\\#tab:data) A glimpse over the data")

Table 1: (#tab:data) A glimpse over the data

TOTBILL TIP SEX SMOKER DAY TIME SIZE

16.99 1.01 F no SUN dinner 2
10.34 1.66 M no SUN dinner 3
21.01 3.5 M no SUN dinner 3
23.68 3.31 M no SUN dinner 2
24.59 3.61 F no SUN dinner 4
25.29 4.71 M no SUN dinner 4

# pander(another(yetanother(d)))
#
# d %>%
# yetanother() %>%
# another() %>%
# pander()

Table 1 shows the first six rows of the tips data set, which actually has 244 rows. Let us describe the
variables in the table briefly:
TOTBILL Total bill paid by the party
TIP Tip left by the party
SEX Gender of the person who paid the bill (F, M)
SMOKER Whether the bill payer smokes or not (yes, no)
DAY Day of the week when the party had the meal (thurs, fri, sat, sun)
TIME Time of day when the party had the meal (lunch, dinner)
SIZE Number of people in the party
Below is a summary of each variable:

d %>%
summary() %>%
pander()

Table 2: Table continues below

TOTBILL TIP SEX SMOKER DAY TIME

Min. : 3.07 Min. : 1.000 F: 87 no :151 THU:62 dinner:176
1st Qu.:13.35 1st Qu.: 2.000 M:157 yes: 93 FRI:19 lunch : 68
Median :17.80 Median : 2.900 NA NA SAT:87 NA
Mean :19.79 Mean : 2.998 NA NA SUN:76 NA
3rd Qu.:24.13 3rd Qu.: 3.562 NA NA NA NA
Max. :50.81 Max. :10.000 NA NA NA NA

SIZE
Min. :1.00
1st Qu.:2.00
Median :2.00
Mean :2.57
3rd Qu.:3.00
Max. :6.00

boxplot(d$TOTBILL, main = "TOTALBILL")

boxplot(d$TIP, main = "TIP")
boxplot(d$SIZE, main = "SIZE")

Figure 1: Boxplots for TOTALBILL on the left, TIP in the middle, and SIZE on the right.

Boxplots in Figure 1 show that all numerical variables have right-skewed distributions.
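One quick numerical check of the skewness claim, sketched here with the medians and means copied from the summary table above: in a right-skewed distribution the long upper tail usually pulls the mean above the median. This is only a rule of thumb, not a formal test.

```r
# Medians and means copied from the summary table of d
summ <- data.frame(
  variable = c("TOTBILL", "TIP", "SIZE"),
  median   = c(17.80, 2.900, 2.00),
  mean     = c(19.79, 2.998, 2.57)
)

# A positive gap (mean above median) is consistent with right skew
summ$gap <- summ$mean - summ$median
summ
```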
d %>%
##select_if(is.numeric) %>%
gather(variable, value, TOTBILL, TIP, SIZE) %>%
ggplot(aes(variable, value)) +
# geom_boxplot(aes(fill = DAY)) +
geom_boxplot(aes(fill = TIME)) +
coord_flip()

[Figure: horizontal boxplots of value for each variable (TOTBILL, TIP, SIZE), filled by TIME (dinner, lunch).]
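The gather() call above reshapes the three numeric columns into long format so that a single ggplot() call can draw all the boxplots. In current tidyr, gather() is superseded by pivot_longer(); a rough equivalent, sketched on the six rows of Table 1 rather than the full data, might look like this:

```r
library(tidyr)

# The six rows shown in Table 1 (numeric columns and TIME only)
toy <- data.frame(
  TOTBILL = c(16.99, 10.34, 21.01, 23.68, 24.59, 25.29),
  TIP     = c(1.01, 1.66, 3.50, 3.31, 3.61, 4.71),
  SIZE    = c(2, 3, 3, 2, 4, 4),
  TIME    = factor(rep("dinner", 6))
)

# One row per (observation, variable) pair; TIME is carried along unchanged
long <- pivot_longer(toy,
                     cols = c(TOTBILL, TIP, SIZE),
                     names_to = "variable",
                     values_to = "value")
long
```

The result has 6 x 3 = 18 rows and can be passed to ggplot(aes(variable, value)) exactly as in the code above.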
Scatterplot
Modern version
g <- ggplot(d, aes(TOTBILL, TIP)) +
geom_point() +
geom_abline(intercept = 0, slope = .18, col = "red") +
geom_text(x=45, y=45*.18, label="18% tip\nline",
col="red", hjust = 0, vjust=1 )
print(g)

[Figure: scatterplot of TIP against TOTBILL with the red 18% tip line.]
plot(g)

[Figure: the same scatterplot of TIP against TOTBILL, drawn again by plot(g).]
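The red line is TIP = 0.18 × TOTBILL, so a point above the line is a tip of more than 18% of the bill. A small sketch on the six rows of Table 1, using the same 18% threshold as the plot:

```r
# TOTBILL and TIP for the six rows shown in Table 1
totbill <- c(16.99, 10.34, 21.01, 23.68, 24.59, 25.29)
tip     <- c(1.01, 1.66, 3.50, 3.31, 3.61, 4.71)

tip_rate <- tip / totbill          # tip as a fraction of the bill
above    <- tip > 0.18 * totbill   # TRUE when the point lies above the line

data.frame(totbill, tip, tip_rate = round(tip_rate, 3), above)
```

Among these six rows only the last one lies above the line.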
d

## # A tibble: 244 x 7
## TOTBILL TIP SEX SMOKER DAY TIME SIZE
## <dbl> <dbl> <fct> <fct> <fct> <fct> <dbl>
## 1 17.0 1.01 F no SUN dinner 2
## 2 10.3 1.66 M no SUN dinner 3
## 3 21.0 3.5 M no SUN dinner 3
## 4 23.7 3.31 M no SUN dinner 2
## 5 24.6 3.61 F no SUN dinner 4
## 6 25.3 4.71 M no SUN dinner 4
## 7 8.77 2 M no SUN dinner 2
## 8 26.9 3.12 M no SUN dinner 4
## 9 15.0 1.96 M no SUN dinner 2
## 10 14.8 3.23 M no SUN dinner 2
## # i 234 more rows
g + facet_grid(DAY+TIME~SMOKER+SEX, labeller = label_both) +
theme(strip.text.y = element_text(angle = 0))

[Figure: scatterplots of TIP against TOTBILL with the 18% tip line, faceted by DAY and TIME in rows (THU dinner, THU lunch, FRI dinner, FRI lunch, SAT dinner, SUN dinner) and by SMOKER and SEX in columns.]
Let us also calculate the correlation between TOTBILL and TIP:
cor(d$TOTBILL, d$TIP)

## [1] 0.6757341
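By default cor() computes the Pearson correlation, that is, the covariance divided by the product of the standard deviations. A sketch that checks this by hand on the six rows of Table 1 (so the value differs from the one obtained on all 244 rows):

```r
x <- c(16.99, 10.34, 21.01, 23.68, 24.59, 25.29)  # TOTBILL, Table 1
y <- c(1.01, 1.66, 3.50, 3.31, 3.61, 4.71)        # TIP, Table 1

# Pearson correlation written out from its definition;
# the 1/(n - 1) factors of covariance and variances cancel
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_manual
all.equal(r_manual, cor(x, y))   # the built-in gives the same number
```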
d %>%
group_by(SMOKER, SEX, DAY, TIME) %>%
summarize(cor = cor(TOTBILL, TIP),
count = n())

## `summarise()` has grouped output by 'SMOKER', 'SEX', 'DAY'. You can override
## using the `.groups` argument.
## # A tibble: 20 x 6
## # Groups: SMOKER, SEX, DAY [16]
## SMOKER SEX DAY TIME cor count
## <fct> <fct> <fct> <fct> <dbl> <int>
## 1 no F THU dinner NA 1
## 2 no F THU lunch 0.881 24
## 3 no F FRI dinner NA 1
## 4 no F FRI lunch NA 1
## 5 no F SAT dinner 0.623 13
## 6 no F SUN dinner 0.849 14
## 7 no M THU lunch 0.798 20
## 8 no M FRI dinner 1 2
## 9 no M SAT dinner 0.920 32
## 10 no M SUN dinner 0.706 43
## 11 yes F THU lunch 0.869 7

## 12 yes F FRI dinner 0.949 4
## 13 yes F FRI lunch 0.374 3
## 14 yes F SAT dinner 0.448 15
## 15 yes F SUN dinner -0.665 4
## 16 yes M THU lunch 0.629 10
## 17 yes M FRI dinner 0.926 5
## 18 yes M FRI lunch -0.305 3
## 19 yes M SAT dinner 0.621 27
## 20 yes M SUN dinner -0.0835 15
d %>%
mutate(crazy = factor(sample(floor(seq(n())/2)))) %>%
group_by(crazy) %>%
ggplot(aes(TOTBILL, TIP)) +
geom_line(aes(col=crazy)) +
geom_point(aes(col=crazy)) +
theme(legend.position = "none")

[Figure: TIP against TOTBILL, with randomly paired observations joined by line segments and colored by pair.]
d %>%
mutate(crazy = factor(sample(floor(seq(n())/2)))) %>%
group_by(crazy) %>%
summarize(cor = cor(TOTBILL, TIP)) %>%
xtabs(~cor, .)

## Warning: There were 4 warnings in `summarize()`.

## The first warning was:
## i In argument: `cor = cor(TOTBILL, TIP)`.
## i In group 17: `crazy = 16`.

## Caused by warning in `cor()`:
## ! the standard deviation is zero
## i Run `dplyr::last_dplyr_warnings()` to see the 3 remaining warnings.
## cor
## -1 1
## 28 89
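The table contains only -1 and 1 because each random group holds at most two observations. A straight line through two points fits perfectly, so the correlation is +1 or -1 depending on the slope, and it is undefined (NA, with the warning shown above) when one of the variables is constant within the pair. A sketch using rows from the printout of d, plus one hypothetical pair with equal tips:

```r
# Rows 1 and 2 of d: the larger bill has the smaller tip
r_down <- cor(c(16.99, 10.34), c(1.01, 1.66))

# Rows 2 and 3 of d: the larger bill has the larger tip
r_up <- cor(c(10.34, 21.01), c(1.66, 3.50))

# Hypothetical pair with identical tips: zero standard deviation gives NA
r_flat <- suppressWarnings(cor(c(8.77, 12.00), c(2, 2)))

c(down = r_down, up = r_up, flat = r_flat)
```

So the 28 versus 89 split only says that most random pairs have an upward slope; it does not measure how strong the overall relation is.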

Tables are useful to analyze categorical (qualitative, factor) data

SMOKER, SEX, DAY, TIME
d_tbl <- d %>%
mutate_if(is.character, factor) %>%
select_if(is.factor) %>%
xtabs(~ . , .)

d_tbl %>% ftable

## TIME dinner lunch

## SEX SMOKER DAY
## F no THU 1 24
## FRI 1 1
## SAT 13 0
## SUN 14 0
## yes THU 0 7
## FRI 4 3
## SAT 15 0
## SUN 4 0
## M no THU 0 20
## FRI 2 0
## SAT 32 0
## SUN 43 0
## yes THU 0 10
## FRI 5 3
## SAT 27 0
## SUN 15 0
library(vcd)

## Loading required package: grid

structable( DAY + TIME ~ SEX + SMOKER, d_tbl)

## DAY THU FRI SAT SUN

## TIME dinner lunch dinner lunch dinner lunch dinner lunch
## SEX SMOKER
## F no 1 24 1 1 13 0 14 0
## yes 0 7 4 3 15 0 4 0
## M no 0 20 2 0 32 0 43 0
## yes 0 10 5 3 27 0 15 0
(margin.table(d_tbl, c(1,2)) / sum(d_tbl)) %>% round(2) %>% multiply_by(100)

## SMOKER
## SEX no yes
## F 22 14

## M 40 25
# dimnames(d_tbl)

d_tbl %>%
margin.table(c(1,2))%>%
mosaic(type="expected")

[Figure: mosaic plot of the expected counts for SEX (F, M) by SMOKER (no, yes).]

d_tbl %>%
margin.table(c(1,2))%>%
mosaic(gp = gpar(fill = rep(c("pink", "lightblue"), each=2)))

[Figure: mosaic plot of SEX (F, M) by SMOKER (no, yes), filled in pink and light blue.]

Tiles are aligned within each block. Therefore, we tend to think SMOKER and SEX are independent.
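Aligned tiles mean that the split into smokers and non-smokers is about the same for both sexes. A sketch of the same comparison in numbers, using the SEX by SMOKER counts obtained by summing the rows of the ftable output above; under independence the expected count of a cell is (row total × column total) / grand total:

```r
# SEX by SMOKER counts, summed over DAY and TIME from the ftable output
obs <- matrix(c(54, 33,
                97, 60),
              nrow = 2, byrow = TRUE,
              dimnames = list(SEX = c("F", "M"), SMOKER = c("no", "yes")))

# Expected counts if SEX and SMOKER were independent
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

round(expected, 1)
round(obs - expected, 1)   # small differences: the tiles line up
```

Every observed count is within one unit of its expected value, which matches the visual impression from the mosaic plot.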
dimnames(d_tbl)

## $SEX
## [1] "F" "M"
##
## $SMOKER
## [1] "no" "yes"
##
## $DAY
## [1] "THU" "FRI" "SAT" "SUN"
##
## $TIME
## [1] "dinner" "lunch"
d_tbl %>%
margin.table(1)%>%
mosaic()

[Figure: mosaic plot of SEX (F, M).]

d_tbl %>%
margin.table(c(1,3))%>%
mosaic(gp =gpar(fill = rep(c("pink", "lightblue"), each=4)))

[Figure: mosaic plot of SEX (F, M) by DAY (THU, FRI, SAT, SUN), filled in pink and light blue.]

library(RColorBrewer)

d_tbl %>%
margin.table(c(1,3))%>%
mosaic(gp =gpar(fill = brewer.pal(4, "PuOr"))) # picked diverging palette

[Figure: mosaic plot of SEX (F, M) by DAY (THU, FRI, SAT, SUN), filled with the PuOr palette.]

d_tbl %>%
margin.table(c(1,3))%>%
mosaic(type="expected")

[Figure: mosaic plot of the expected counts for SEX (F, M) by DAY (THU, FRI, SAT, SUN).]

Because the DAY tiles within each SEX block are significantly misaligned, we cannot expect SEX and DAY to be
independent. So they seem to be related. Can I measure the strength of the relation? Later.
• Tile areas are proportional to the cell counts of the corresponding table.
• Tiles within blocks are aligned across blocks: this strongly suggests that SEX and SMOKER are independent
(random variables).
How can we check the relation between every pair of categorical variables?
library(vcd)
d %>%
mutate_if(is.character, factor) %>%
select_if(is.factor) %>%
xtabs(~. , .) %>%
pairs(diag_panel = pairs_barplot(var_offset = 1.3,
rot = -30,
just_leveltext = "left",
gp_leveltext = gpar(fontsize = 8)),
shade = TRUE)

[Figure: pairs plot of SEX, SMOKER, DAY, and TIME, with bar charts of the level counts on the diagonal.]

Independence tests
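As a rough sketch of what such a test could look like, assuming Pearson's chi-squared test is meant (the source does not name a test), here it is applied to the SEX by DAY counts summed from the ftable output above. A small p-value is evidence against independence, in line with the misaligned tiles in the mosaic plot:

```r
# SEX by DAY counts, summed over SMOKER and TIME from the ftable output
sex_day <- matrix(c(32,  9, 28, 18,
                    30, 10, 59, 58),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(SEX = c("F", "M"),
                                  DAY = c("THU", "FRI", "SAT", "SUN")))

test <- chisq.test(sex_day)   # Pearson chi-squared test of independence
test$expected                 # counts expected under independence
test$p.value                  # compare with the chosen significance level
```

With the full table available, chisq.test(margin.table(d_tbl, c(1, 3))) should give the same result without retyping the counts.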

Digression: What does pipe %>% do?

5*((mean(extract2(d, "TOTBILL"), na.rm=TRUE))^2 )

## [1] 1957.418
versus
d %>%
  # select(TOTBILL) %>%
  extract2("TOTBILL") %>%
  mean(na.rm=TRUE) %>%
  `^`(2) %>%
  `*`(5)

## [1] 1957.418
Which one is more readable and easier to modify?
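In short, x %>% f(y) is rewritten as f(x, y): the left-hand side becomes the first argument of the call on the right. A self-contained sketch on the six TOTBILL values of Table 1, using magrittr's raise_to_power() and multiply_by() aliases in place of the backquoted operators:

```r
library(magrittr)

x <- c(16.99, 10.34, 21.01, 23.68, 24.59, 25.29)  # TOTBILL, Table 1

nested <- 5 * mean(x, na.rm = TRUE)^2   # read from the inside out

piped <- x %>%                          # read from top to bottom
  mean(na.rm = TRUE) %>%
  raise_to_power(2) %>%
  multiply_by(5)

all.equal(nested, piped)
```

Adding or removing a step in the piped version means adding or deleting one line, with no parentheses to rebalance.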

Hair color, Eye color, gender

Go to the Google form and fill it in for
• yourself,
• your mother,
• your father,
• your siblings
