Tidyverse: Core Packages in Tidyverse

The document discusses the tidyverse package in R which contains packages for data science tasks like wrangling, visualization, and modeling. It provides examples of using dplyr to select, filter, group, and summarize data; using tidyr to gather, spread, separate, and unite data; using ggplot2 to create density and scatter plots; and using stargazer and lfe packages to output regression tables and fit linear models with group fixed effects.

Uploaded by

Abhishek

Tidyverse

The tidyverse is a collection of essential R packages for data science. The packages under the
tidyverse umbrella help us work with data at every stage of an analysis, covering tasks such as
subsetting, transforming, and visualizing.

● Start by installing the tidyverse:

install.packages("tidyverse")

Core packages in Tidyverse

Data Wrangling & Transformations | Data Import & Management | Data Visualization
dplyr                            | tibble                   | ggplot2
tidyr                            | readr                    |
stringr                          |                          |
forcats                          |                          |

dplyr

Commonly used dplyr functions:

● select(): Select columns from your dataset
● filter(): Keep only the rows that meet your criteria
● group_by(): Group observations by one or more variables. The data itself does not
change; the result is a grouped data frame, and later steps such as summarise() run once per group
● summarise(): Collapse the data (or each group) into a single row of summary statistics
● arrange(): Sort rows by one or more columns, in ascending or descending order
● left_join(), right_join(), full_join(), inner_join(): Perform left, right, full, and inner joins
● mutate(): Create new columns while preserving the existing variables

Throughout this subsection, the code examples use the food demand dataset, which
can be accessed here: https://www.kaggle.com/shivashi11/food-demand-prediction

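Code examples:

library(dplyr)
# `data` (orders) and `fc` (fulfilment centers) are assumed to be data frames
# already loaded from the dataset's files
joined_data <- left_join(data, fc, by = "center_id")

Let’s chain three dplyr functions in one pipeline to summarise the data.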

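# center_type is assumed to come from the fulfilment-center table,
# so the pipeline starts from joined_data
joined_data %>%
  select(center_type, num_orders) %>%
  filter(center_type == "TYPE_A") %>%
  summarise(avg_A = mean(num_orders))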
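
The remaining verbs from the list above (group_by(), arrange(), and mutate()) can be sketched in the same style. The example below is only an illustration: the `orders` data frame and its values are made up, with column names borrowed from the food demand data, and `revenue` is a hypothetical derived column.

```r
library(dplyr)

# Made-up orders table; the column names mirror the food demand data,
# but the values are invented for illustration.
orders <- data.frame(
  center_type = c("TYPE_A", "TYPE_A", "TYPE_B", "TYPE_B", "TYPE_C"),
  num_orders  = c(100, 200, 50, 150, 80),
  base_price  = c(10, 12, 9, 11, 10)
)

by_type <- orders %>%
  group_by(center_type) %>%                 # one group per center type
  summarise(avg_orders = mean(num_orders),  # one summary row per group
            n_rows     = n()) %>%
  arrange(desc(avg_orders))                 # largest average first

# mutate() adds a column and keeps the existing ones
orders_with_revenue <- orders %>%
  mutate(revenue = num_orders * base_price)

print(by_type)
print(orders_with_revenue)
```

Because summarise() runs once per group, `by_type` has one row per center type rather than a single overall average.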

tidyr

Commonly used tidyr functions:

● gather(): “Gathers” multiple columns from your dataset and converts them
into key-value pairs (superseded by pivot_longer() in current tidyr)
● spread(): Takes a key column and a value column and “spreads” them into multiple
columns (superseded by pivot_wider())
● separate(): Splits a single column into several columns
● unite(): The opposite of separate(). It combines two or more columns into one

The code below unites two binary variables into a single column:

library(tidyr)

# unite() replaces the deprecated unite_()
data %>%
  unite("email_home", emailer_for_promotion, homepage_featured) %>%
  head()
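
To show that separate() undoes unite(), here is a minimal sketch on a made-up data frame. The `flags` table and its values are assumptions for illustration; only the column names come from the dataset.

```r
library(tidyr)

# Made-up data: two 0/1 flags, named after the dataset's columns.
flags <- data.frame(
  emailer_for_promotion = c(0, 1, 1),
  homepage_featured     = c(0, 0, 1)
)

# unite() pastes the columns together, using "_" as the default separator ...
united <- unite(flags, "email_home", emailer_for_promotion, homepage_featured)

# ... and separate() splits them back apart on the same separator.
# convert = TRUE turns the resulting strings back into numbers.
restored <- separate(united, email_home,
                     into = c("emailer_for_promotion", "homepage_featured"),
                     sep = "_", convert = TRUE)

print(united)
print(restored)
```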


Another example of how tidyr works:

# A separate name keeps the food demand `data` intact for the later examples
long_data <- data.frame(variable1 = rep(LETTERS[1:3], each = 3),
                        variable2 = rep(paste0("factor", c(1, 2, 3)), 3),
                        num = 1:9)
head(long_data)
spread(long_data, variable2, num)
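
In current tidyr, the same reshaping is usually written with pivot_wider() and pivot_longer(), which supersede spread() and gather(). The sketch below rebuilds the same toy table under the hypothetical name `long_tbl` and round-trips it from long to wide and back.

```r
library(tidyr)

# Same toy table as above, under a hypothetical name
long_tbl <- data.frame(
  variable1 = rep(LETTERS[1:3], each = 3),
  variable2 = rep(paste0("factor", 1:3), 3),
  num       = 1:9
)

# Long -> wide: one column per value of variable2
wide_tbl <- pivot_wider(long_tbl, names_from = variable2, values_from = num)

# Wide -> long: gather every column except variable1 back into pairs
back_to_long <- pivot_longer(wide_tbl, cols = -variable1,
                             names_to = "variable2", values_to = "num")

print(wide_tbl)
print(back_to_long)
```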

ggplot2

ggplot2 is often used to produce charts and visualizations because of its ease of use and
flexibility.

The example below creates a density chart:

install.packages("ggplot2")  # can be skipped if the tidyverse is already installed
library(ggplot2)

ggplot(data = data, aes(x = num_orders)) +
  geom_density(adjust = 1, fill = "#0c4c8a") +
  theme_minimal()

The example below creates a scatterplot:

ggplot(data = data, aes(x = checkout_price, y = base_price)) +
  geom_point(color = "#1f9e89") +
  theme_minimal()
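
For readers without the dataset at hand, here is a minimal sketch of the same scatterplot on simulated data, with one more aesthetic mapped to colour. The `prices` data frame, the `discounted` flag, and the 15% threshold are all assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)
# Simulated prices; the column names mirror the food demand data
prices <- data.frame(base_price = runif(50, 100, 500))
prices$checkout_price <- prices$base_price * runif(50, 0.7, 1.0)
# Hypothetical flag: checkout price more than 15% below base price
prices$discounted <- prices$checkout_price < 0.85 * prices$base_price

# Store the plot in an object so it can be printed or saved later
p <- ggplot(prices, aes(x = checkout_price, y = base_price, color = discounted)) +
  geom_point() +
  labs(x = "Checkout price", y = "Base price", color = "Discount > 15%") +
  theme_minimal()
```

Calling print(p) draws the plot. Keeping it in an object also makes it easy to add further layers later.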

stargazer

stargazer is an R package that creates LaTeX code, HTML code and ASCII text for
well-formatted regression tables, with multiple models side-by-side, as well as for summary
statistics tables, data frames, vectors and matrices.

stargazer excels in at least three respects: its ease of use, the large number of models it
supports, and its beautiful aesthetics.

● One can install stargazer from CRAN in the usual way:

install.packages("stargazer")
library(stargazer)

● To create a summary statistics table from the ‘attitude’ data frame (which should be
available with your default installation of R), simply run the following:

stargazer(attitude)

● To output the contents of the first four rows of some data frame, specify the part of the
data frame you would like to see, and set the summary option to FALSE:

stargazer(attitude[1:4,], summary=FALSE, rownames=FALSE)


Now, let us create a simple regression table with three side-by-side models (two Ordinary
Least Squares (OLS) models and one probit regression model) fitted with the lm() and glm()
functions and stored as linear.1, linear.2, and probit.model. Setting the align argument to
TRUE aligns the coefficients in each column along the decimal point.

stargazer(linear.1, linear.2, probit.model, title="Results", align=TRUE)
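
The call above needs three fitted model objects. The sketch below shows one way to create them from the built-in attitude data. The choice of regressors, the `high.rating` outcome, and its threshold of 70 are assumptions for illustration, and `attitude_bin` is a hypothetical name for the augmented copy of the data. type = "text" prints an ASCII table to the console instead of LaTeX code.

```r
# Two OLS models on the built-in attitude data
linear.1 <- lm(rating ~ complaints + privileges + learning + raises + critical,
               data = attitude)
linear.2 <- lm(rating ~ complaints + privileges + learning, data = attitude)

# Binary outcome for the probit model: TRUE when rating exceeds 70
# (the threshold is an illustrative choice)
attitude_bin <- transform(attitude, high.rating = rating > 70)
probit.model <- glm(high.rating ~ learning + critical + advance,
                    data = attitude_bin, family = binomial(link = "probit"))

# Print the side-by-side table if stargazer is installed
if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(linear.1, linear.2, probit.model,
                       title = "Results", align = TRUE, type = "text")
}
```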

lfe

The lfe (Linear Group Fixed Effects) package is intended for linear models with multiple group
fixed effects, i.e. with two or more factors with a large number of levels. It performs similar
functions to lm(), but it uses a special method for projecting out multiple group fixed effects
from the normal equations, so it is faster. It is a generalization of the within estimator. This
may be required if the groups have high cardinality (many levels), resulting in tens or hundreds
of thousands of dummy variables. It is also useful if one only wants to control for the group
effects, without actually estimating them.

Code example:

library(lfe)

# Use a single thread; the previous options are restored at the end
oldopts <- options(lfe.threads=1)

# Simulate two covariates and three grouping factors
x <- rnorm(1000)
x2 <- rnorm(length(x))
id <- factor(sample(10,length(x),replace=TRUE))
firm <- factor(sample(3,length(x),replace=TRUE,prob=c(2,1.5,1)))
year <- factor(sample(10,length(x),replace=TRUE,prob=c(2,1.5,rep(1,8))))

# One random effect per level of each factor
id.eff <- rnorm(nlevels(id))
firm.eff <- rnorm(nlevels(firm))
year.eff <- rnorm(nlevels(year))
y <- x + 0.25*x2 + id.eff[id] + firm.eff[firm] +
  year.eff[year] + rnorm(length(x))

# Fit with id, firm, and year projected out as fixed effects
est <- felm(y ~ x+x2 | id + firm + year)
summary(est)

# Recover the estimated group effects with standard errors
getfe(est,se=TRUE)
# compare with an ordinary lm
summary(lm(y ~ x+x2+id+firm+year-1))
options(oldopts)
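
The within estimator that lfe generalizes can be illustrated in base R for the simplest case of a single grouping factor. The sketch below uses simulated data with an assumed true slope of 2. It is not how lfe is implemented; it only shows that demeaning by group gives the same slope as a regression with one dummy per group.

```r
set.seed(42)
n     <- 500
group <- factor(sample(20, n, replace = TRUE))
g_eff <- rnorm(nlevels(group))
x     <- rnorm(n) + g_eff[group]   # regressor correlated with the group effect
y     <- 2 * x + g_eff[group] + rnorm(n)

# Within transformation: subtract the group means from y and x
y_within <- y - ave(y, group)
x_within <- x - ave(x, group)
beta_within <- coef(lm(y_within ~ x_within - 1))[["x_within"]]

# The same slope from a regression with one dummy per group
beta_dummy <- coef(lm(y ~ x + group))[["x"]]

print(c(within = beta_within, dummies = beta_dummy))
```

With one factor the two approaches are interchangeable. The case for a dedicated package arises with several high-cardinality factors, where building all the dummy columns becomes impractical.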
