Tidyverse: Core Packages in Tidyverse
Tidyverse is a collection of essential R packages for data science. The packages under the
tidyverse umbrella help us work with data at every stage of an analysis: subsetting,
transforming, visualizing, and so on.
install.packages("tidyverse")
Core packages: tidyr, readr, stringr, forcats
dplyr
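Code examples:
library(dplyr)
# Attach the columns of fc to data by matching rows on center_id
joined_data <- left_join(data, fc, by = "center_id")
# Average number of orders for TYPE_A centers; uses the joined table,
# assuming center_type is supplied by fc
joined_data %>%
  select(center_type, num_orders) %>%
  filter(center_type == "TYPE_A") %>%
  summarise(avg_A = mean(num_orders))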
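The snippets above assume that data and fc already exist. A self-contained sketch of the same join-then-summarise pattern, using two small made-up data frames (all values here are hypothetical and only the column names follow the example above), might look like this:

```r
library(dplyr)

# Hypothetical order records: two orders for center 10, one each for 20 and 30
data <- data.frame(
  center_id  = c(10, 10, 20, 30),
  num_orders = c(100, 300, 50, 70)
)

# Hypothetical lookup table mapping each center to a type
fc <- data.frame(
  center_id   = c(10, 20, 30),
  center_type = c("TYPE_A", "TYPE_B", "TYPE_C")
)

# Keep every row of data and add the matching center_type from fc
joined_data <- left_join(data, fc, by = "center_id")

# Mean number of orders across the TYPE_A rows only
avg_a <- joined_data %>%
  filter(center_type == "TYPE_A") %>%
  summarise(avg_A = mean(num_orders))
```

left_join() keeps all rows of the left table, so orders from a center missing in fc would remain in the result with center_type set to NA.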
tidyr
● gather(): “Gathers” multiple columns from your dataset and converts them into key-value
pairs.
● spread(): Takes a key column and a value column and “spreads” them into multiple columns.
● separate(): Splits a single column into multiple columns.
● unite(): The opposite of separate(). It combines two or more columns into one.
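The four functions can be sketched on small made-up tables as follows. The data frames and column names (wide, dates, store, quarter, sales) are hypothetical and chosen only for illustration. Note that recent tidyr versions mark gather() and spread() as superseded by pivot_longer() and pivot_wider(), although both remain available.

```r
library(tidyr)

# Hypothetical wide table: one row per store, one column per quarter
wide <- data.frame(
  store = c("A", "B"),
  q1 = c(10, 20),
  q2 = c(30, 40)
)

# gather(): collapse the q1 and q2 columns into key-value pairs
long <- gather(wide, key = "quarter", value = "sales", q1, q2)

# spread(): turn the key-value pairs back into one column per quarter
wide_again <- spread(long, key = quarter, value = sales)

# separate(): split one column into several at the "-" separator
dates <- data.frame(date = c("2020-01", "2021-06"))
split_dates <- separate(dates, date, into = c("year", "month"), sep = "-")

# unite(): paste the pieces back into a single column
united <- unite(split_dates, "date", year, month, sep = "-")
```

Here gather() followed by spread() returns to the original shape, and separate() followed by unite() restores the original strings, which is a quick way to see that each pair consists of inverse operations.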
Below is code that unites two binary variables into a single column:
# Combine the two flags into one column, email_home (values joined with "_")
data %>%
  unite("email_home", emailer_for_promotion, homepage_featured) %>%
  head()
ggplot2
install.packages("ggplot2")
library(ggplot2)
# Density plot of num_orders
ggplot(data = data) +
  aes(x = num_orders) +
  geom_density(adjust = 1, fill = "#0c4c8a") +
  theme_minimal()
Below is an example that creates a scatterplot:
ggplot(data = data) +
  aes(x = checkout_price, y = base_price) +
  geom_point(color = "#1f9e89") +
  theme_minimal()
Stargazer
Stargazer is an R package that creates LaTeX code, HTML code and ASCII text for
well-formatted regression tables, with multiple models side-by-side, as well as for summary
statistics tables, data frames, vectors and matrices.
Stargazer excels in at least three respects: its ease of use, the large number of models it
supports, and its beautiful aesthetics.
install.packages("stargazer")
library(stargazer)
● To create a summary statistics table from the ‘attitude’ data frame (which should be
available with your default installation of R), simply run the following:
stargazer(attitude)
● To output the contents of the first four rows of some data frame, specify the part of the
data frame you would like to see, and set the summary option to FALSE:
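A minimal sketch of such a call, assuming the built-in attitude data frame and rows 1 to 4 as the part to display (the rownames = FALSE setting is an optional choice made here to hide the row labels):

```r
library(stargazer)

# attitude ships with base R (datasets package).
# summary = FALSE prints the selected rows themselves
# instead of a summary statistics table.
stargazer(attitude[1:4, ], summary = FALSE, rownames = FALSE)
```

By default the table is emitted as LaTeX code. Passing type = "text" prints an ASCII version instead, which is convenient for a quick look in the console.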
Now, let us try to create a simple regression table with three side-by-side models – two Ordinary
Least Squares (OLS) and one probit regression model – using the lm() and glm() functions.
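The three models could be fitted along the following lines. This is a sketch that assumes the built-in attitude data frame. The choice of regressors and the cut-off of 70 used to build the binary outcome high.rating are illustrative assumptions, not requirements of stargazer.

```r
# Two OLS models on the built-in attitude data
linear.1 <- lm(rating ~ complaints + privileges + learning + raises + critical,
               data = attitude)
linear.2 <- lm(rating ~ complaints + privileges + learning, data = attitude)

# Assumed binary outcome for the probit model: TRUE when rating exceeds 70
attitude$high.rating <- attitude$rating > 70

# Probit regression: a binomial GLM with a probit link
probit.model <- glm(high.rating ~ learning + critical + advance,
                    data = attitude, family = binomial(link = "probit"))
```

The assignment to attitude$high.rating modifies a copy in the current workspace, so the dataset shipped with R is left untouched.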
We can set the align argument to TRUE, so that coefficients in each column are aligned along
the decimal point.
stargazer(linear.1, linear.2, probit.model, title="Results", align=TRUE)
lfe
lfe (Linear Group Fixed Effects) is a package intended for linear models with multiple group
fixed effects, i.e. with two or more factors that each have a large number of levels. Its felm()
function plays a similar role to lm(), but it uses a special method for projecting out multiple
group fixed effects from the normal equations, which makes it faster. The method is a
generalization of the within estimator. This may be required if the groups have high cardinality
(many levels), resulting in tens or hundreds of thousands of dummy variables. It is also useful if
one only wants to control for the group effects, without actually estimating them.
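To make the within estimator concrete, here is a small base R sketch for the single-factor case. It uses simulated data with hypothetical names (g, g_eff, x_w, y_w) and does not show how lfe works internally. It only illustrates that subtracting group means gives the same slope as including one dummy variable per group.

```r
set.seed(42)
n <- 200

# One grouping factor with 5 levels and a hypothetical effect per level
g <- factor(sample(5, n, replace = TRUE))
g_eff <- rnorm(nlevels(g))

# Regressor correlated with the group effect, and the outcome
x <- rnorm(n) + g_eff[g]
y <- 2 * x + g_eff[g] + rnorm(n)

# Dummy-variable regression: the group effects are estimated explicitly
b_dummy <- coef(lm(y ~ x + g))["x"]

# Within estimator: subtract group means, then regress without an intercept
x_w <- x - ave(x, g)
y_w <- y - ave(y, g)
b_within <- coef(lm(y_w ~ x_w - 1))["x_w"]

# The two slope estimates agree up to numerical precision
same_slope <- isTRUE(all.equal(unname(b_dummy), unname(b_within)))
```

With a single factor, demeaning is enough. With several high-cardinality factors, simple demeaning by one factor no longer removes all the effects, which is the situation felm() is designed for.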
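Code example:
library(lfe)
# Run single-threaded; keep the old setting so it can be restored
oldopts <- options(lfe.threads = 1)
# Two covariates
x <- rnorm(1000)
x2 <- rnorm(length(x))
# Three grouping factors: individual, firm and year
id <- factor(sample(10, length(x), replace = TRUE))
firm <- factor(sample(3, length(x), replace = TRUE, prob = c(2, 1.5, 1)))
year <- factor(sample(10, length(x), replace = TRUE, prob = c(2, 1.5, rep(1, 8))))
# One random effect per level of each factor
id.eff <- rnorm(nlevels(id))
firm.eff <- rnorm(nlevels(firm))
year.eff <- rnorm(nlevels(year))
# Outcome: covariates plus the three group effects plus noise
y <- x + 0.25 * x2 + id.eff[id] + firm.eff[firm] +
  year.eff[year] + rnorm(length(x))
# Fit with the group effects projected out
est <- felm(y ~ x + x2 | id + firm + year)
summary(est)
# Recover the estimated group effects with standard errors
getfe(est, se = TRUE)
# compare with an ordinary lm
summary(lm(y ~ x + x2 + id + firm + year - 1))
options(oldopts)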