R-tutorial-3
Çağatay Ünal
2024-10-17
Lecture 3
As in the last lesson, we are going to download some data
and then manage it, edit it, and visualise it.
Downloading
We are going to use World Bank data again. Let's move to the World
Development Indicators database. Our website is:
https://databank.worldbank.org/
After finding the World Development Indicators database, we will be
interested in 3 different countries: Germany, Italy, and Turkiye. After
selecting these countries, continue with the series. There are 1488
different indicators/series. For our economics purposes, we will select
“Final consumption expenditure (% of GDP)”.
Mini Summary
Final consumption expenditure (formerly total consumption) is the
sum of household final consumption expenditure (private
consumption) and general government final consumption
expenditure (general government consumption). (worldbank.data)
Downloading 2
We will download this data as a CSV file. After the download
finishes, open the zip archive and move the CSV file to your working
directory.
How do we find the working directory? Open RStudio, go to Tools,
then Global Options. There you will see the default working
directory option, and you can easily change where you want to
save your work.
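The working directory can also be checked and changed from the console. This is a minimal sketch using the base R functions getwd() and setwd(). The use of tempdir() is only an assumption so that the example runs on any machine; replace it with your own folder.

```r
# Show the current working directory
current_dir <- getwd()
print(current_dir)

# Change the working directory. tempdir() is a placeholder here;
# use the path of your own course folder instead.
setwd(tempdir())
print(getwd())

# Go back to the original folder
setwd(current_dir)
```

Files without a full path, such as the downloaded CSV, are looked up in this folder.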
Data Cleaning 1
Now we are going to import our data. You basically have two
options. The first one is importing it from the right-hand pane of
RStudio. With the default RStudio layout, you will see the Files tab
there. Find the data you downloaded and import it from there.
The second option is the Import Dataset button in the upper-right
pane. Click it, choose “From Text (readr)”, find your data, and
import it.
Data Cleaning 2
A quick and important warning: “always” give your data a short,
meaningful name while you are importing it.
After you have imported it, please copy the latest code from the console, like this:
library(readr)
data <- read_csv("afe4aa7c-584a-4bcc-b540-ba55a787153d_Seri
Data Cleaning 3
And please “always” remember to load the packages we installed
before. They will help us.
# if necessary:
# install.packages("tibble")
library(tibble)
library(ggplot2)
library(tidyr)
Data Cleaning 4
We will transpose this data, because we “always” need the
variables as columns and the observations as rows.
If you are not familiar with the “transpose of a matrix”, please search
for exactly that phrase on Google. You will remember this operation
from high school. It is very fundamental.
And now we need to transpose it:
data_t <- t(data)
View(data_t)
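To see what t() does, here is a small sketch on toy objects (the names m and df and their values are made up for illustration). It also shows a side effect worth knowing: transposing a data frame with mixed column types returns a character matrix.

```r
# A small matrix with 2 rows and 3 columns
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
m_t <- t(m)   # rows become columns, columns become rows
dim(m)        # 2 rows, 3 columns
dim(m_t)      # 3 rows, 2 columns

# A data frame with text and numbers turns into a character matrix
df <- data.frame(country = c("Germany", "Italy"), x2020 = c(1.5, 2.5))
df_t <- t(df)
is.matrix(df_t)     # TRUE
is.character(df_t)  # TRUE: the numbers are now stored as text
```

This is why the later steps convert the result back to a data frame and then to numeric values.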
Data Cleaning 5
Now we remove every column that contains NA values from our data.
data_clean <- data_t[, colSums(is.na(data_t)) == 0]
View(data_clean)
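The filter above can be tried on a toy matrix first. In this sketch the matrix toy and its values are made up. The third column is entirely NA, which is roughly what empty or footer lines in a CSV look like after transposing (an assumption about the downloaded file).

```r
# Toy character matrix, filled column by column; the last column is all NA
toy <- matrix(c("Germany", "70.1", "71.2",
                "Italy",   "78.3", "79.0",
                NA,        NA,     NA),
              nrow = 3,
              dimnames = list(c("Country Name", "2020", "2021"), NULL))

# Count the NA values in each column and keep the columns that have none
keep <- colSums(is.na(toy)) == 0
toy_clean <- toy[, keep, drop = FALSE]  # drop = FALSE keeps the matrix shape
toy_clean
```

Adding drop = FALSE is optional, but it prevents R from turning the result into a plain vector when only one column survives.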
Data Cleaning 6
As you can see, we do not need the first 4 rows. They are
irrelevant.
The first line of code uses the first row as the column names.
colnames(data_clean) <- data_clean[1, ]
And now we are going to remove our first four rows from the data.
data_clean <- data_clean[-(1:4), ]
View(data_clean)
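The same two steps on a toy matrix, as a sketch. The matrix toy and its values are made up, and it has only two label rows instead of the four in the lecture data.

```r
# Toy matrix: two label rows followed by two year rows
toy <- matrix(c("Germany", "DEU", "70.1", "71.2",
                "Italy",   "ITA", "78.3", "79.0"),
              ncol = 2,
              dimnames = list(c("Country Name", "Country Code", "2020", "2021"),
                              NULL))

colnames(toy) <- toy[1, ]  # the first row becomes the column names
toy <- toy[-(1:2), ]       # drop the label rows (four of them in the lecture data)
toy
```

The order matters: the names have to be copied before the row that holds them is removed.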
Data Cleaning 7
We need to convert our data to a data frame. Otherwise it is
basically just a matrix of loose numbers and words. To process it
further, you “always” need to convert it to a data frame, because
most of the work in R (roughly 90%) is done with data frames.
data_clean <- data.frame(data_clean)
Data Cleaning 8
Oh, there is something wrong with the first column. The years are
stored as row names, not as a real column. This often happens when
you transpose your data. To make it a column:
data_clean <- rownames_to_column(data_clean, var = "Years")
View(data_clean)
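A small sketch of what rownames_to_column() from the tibble package does, using a made-up data frame toy whose years are stored as row names:

```r
library(tibble)

# Toy data frame: the years live in the row names, not in a column
toy <- data.frame(Germany = c("70.1", "71.2"),
                  Italy   = c("78.3", "79.0"),
                  row.names = c("2020", "2021"))

# Move the row names into a new first column called "Years"
toy <- rownames_to_column(toy, var = "Years")
names(toy)
toy$Years
```

After this step the years can be used like any other variable, for example as the x-axis of a plot.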
Data Cleaning 9
What is gsub? It is basically a find-and-replace function. It
replaces every piece of text that matches a given pattern.
The basic usage is:
# gsub(pattern, replacement, x, ignore.case = FALSE, fixed = FALSE)
And we are going to use it to strip the “[YR....]” suffix from the years:
data_clean$Years <- gsub("\\s*\\[YR[0-9]{4}\\]", "", data_clean$Years)
View(data_clean)
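The pattern can be tested on a few strings before it is applied to the real column. In this sketch the vector labels is made up, in the style of the World Bank year labels.

```r
# Year labels in the style "2019 [YR2019]"
labels <- c("2019 [YR2019]", "2020 [YR2020]", "2021 [YR2021]")

# \\s*      any whitespace before the bracket
# \\[YR     a literal "[YR"
# [0-9]{4}  exactly four digits
# \\]       a literal "]"
years <- gsub("\\s*\\[YR[0-9]{4}\\]", "", labels)
years
```

Everything the pattern matches is replaced with an empty string, so only the four-digit year remains.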
Data Cleaning 10
To do some math or to visualise the data, we need to convert all the
columns to numeric.
data_clean[] <- lapply(data_clean, function(x) as.numeric(as.character(x)))
View(data_clean)
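A sketch of the same conversion on a made-up data frame toy. It assumes that missing values appear as the text "..", which as.numeric() cannot read. Such entries become NA, and R prints a warning (suppressed here to keep the output short).

```r
# Toy data frame where every column is still text
toy <- data.frame(Years   = c("2020", "2021"),
                  Germany = c("70.1", ".."),
                  stringsAsFactors = FALSE)

# toy[] <- ... replaces the columns but keeps the data frame structure
toy[] <- lapply(toy, function(x) suppressWarnings(as.numeric(as.character(x))))
str(toy)
```

If you see NA values after the conversion, check the original text in that cell before continuing.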
Data Cleaning 11
What is pivot_longer?
It reshapes data from a “wide” format to a “long” format.
# pivot_longer(data, cols, names_to = "name", values_to = "value")
In our example:
data_long <- pivot_longer(data_clean, cols = -Years, names_to = "variable", values_to = "value")
View(data_long)
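To make the reshaping visible, here is a sketch with a tiny made-up wide table (the values are invented for illustration only):

```r
library(tidyr)

# Wide format: one column per country
wide <- data.frame(Years   = c(2020, 2021),
                   Germany = c(70.1, 71.2),
                   Italy   = c(78.3, 79.0))

# Long format: one row per year and country
long <- pivot_longer(wide, cols = -Years,
                     names_to = "variable", values_to = "value")
long
```

The two country columns become four rows. The country name goes to the column variable and the number to the column value, which is the layout ggplot2 expects when colouring lines by group.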
Data Visualisation 1
ggplot(data_long, aes(x = Years, y = value, color = variable)) +
  geom_line(linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  labs(title = "Data Visualization",
       x = "Year",
       y = "Value",
       color = "Variable") +
  theme_minimal()
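[Figure: line chart titled “Data Visualization” with Year on the x-axis, Value on the y-axis (ticks from 70.0 to 80.0), and a “Variable” legend listing Germany, Italy, and Turkiye.]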