
R Tutorial3

The document is a tutorial on managing and visualizing World Bank data using R, focusing on final consumption expenditure for Germany, Italy, and Turkiye. It outlines steps for downloading, cleaning, and transforming the data, including importing it into R, transposing it, and converting it to a data frame. The tutorial concludes with a data visualization example using ggplot to display the trends over the years.


R-tutorial-3

Çağatay Ünal

2024-10-17
Lecture 3

As in the last lesson, we are going to download some data and then manage it, edit it, and visualise it.
Downloading

We are going to use World Bank data again. Let's go to the World Development Indicators database. Our website is:
https://databank.worldbank.org/
After finding the World Development Indicators database, we are interested in 3 different countries: Germany, Italy, and Turkiye. After selecting these countries, continue with the series. There are 1488 different indicators/series. For our economics purposes, we will use “Final consumption expenditure (% of GDP)”.
Mini Sum

Final consumption expenditure (formerly total consumption) is the sum of household final consumption expenditure (private consumption) and general government final consumption expenditure (general government consumption). (worldbank.data)
Downloading 2

We will download this data as a CSV file. After the download finishes, unzip the archive and move the CSV file to your working directory.
How do we find the working directory? Open RStudio, go to Tools, then Global Options. There you will see the default working directory option, and you can easily change where you want to save your work.
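The working directory can also be checked from the console. This is a minimal sketch using base R only; the folder in the commented setwd() call is a hypothetical placeholder, not a real path.

```r
# Ask R which folder it currently treats as the working directory.
wd <- getwd()
print(wd)

# List the CSV files R can see there, to confirm the download is in place.
csv_files <- list.files(wd, pattern = "\\.csv$")
print(csv_files)

# To switch folders for the current session only (hypothetical path):
# setwd("path/to/your/folder")
```

If the downloaded file does not show up in the list, it is probably still in your Downloads folder.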
Data Cleaning 1

Now we are going to import our data. You basically have two options. The first one is importing it from the lower-right pane of RStudio. In the default RStudio layout you will see the Files tab there. Find the file you downloaded and click it.
The second option is the Import Dataset button in the upper-right pane. Click it, then click “From Text (readr)”. Find your data and import it.
Data Cleaning 2

A quick but important warning: “always” give your data a short, clear name while you're importing it.
After you have imported it, please copy the latest console code, like this:

library(readr)
data <- read_csv("afe4aa7c-584a-4bcc-b540-ba55a787153d_Seri
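As a hedged illustration of the import step, the sketch below writes a tiny stand-in CSV to a temporary file and reads it back under a descriptive name. The file contents, the numbers, and the names csv_path and consumption_raw are assumptions made up for this example; your real file will have a different name and many more columns.

```r
library(readr)

# Hypothetical stand-in for the downloaded file, with made-up values.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Country Name,Country Code,2000 [YR2000],2001 [YR2001]",
  "Germany,DEU,76.1,76.5",
  "Italy,ITA,78.2,78.0"
), csv_path)

# Import it under a descriptive name instead of the long default one.
consumption_raw <- read_csv(csv_path)

print(dim(consumption_raw))    # rows, columns
print(names(consumption_raw))  # column headers as read from the file
```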
Data Cleaning 3

And please “always” remember to load the packages we installed before; they will help you.

# if necessary:

# install.packages("tibble")

library(tibble)
library(ggplot2)
library(tidyr)
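If you are not sure which packages are already installed, a small check like the following can help. It is only a sketch: it reports what is missing and installs nothing, and the names pkgs and missing_pkgs are chosen for this example.

```r
# Packages used in this lecture.
pkgs <- c("readr", "tibble", "ggplot2", "tidyr")

# requireNamespace() returns TRUE when a package is installed.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are installed.")
}
```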
Data Cleaning 4

We will transpose this data, because we “always” need the variables as columns and the observations as rows. In the downloaded file each country is a row and each year is a column; we want the opposite.
If you're not familiar with the “transpose of a matrix”, please type exactly that into Google. You will remember this process from high school. It's very fundamental.
And now we need to transpose it:

data_t <- t(data)

View(data_t)
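To see what t() does, here is a small sketch on a made-up table (the values and the names wide and wide_t are assumptions for illustration). Note that the result is a matrix of text, not a data frame, because the table mixes words and numbers.

```r
# Toy table: one row per country, one column per year (made-up values).
wide <- data.frame(
  country = c("Germany", "Italy"),
  y2000 = c(76.1, 78.2),
  y2001 = c(76.5, 78.0)
)

wide_t <- t(wide)

print(dim(wide))             # 2 rows, 3 columns
print(dim(wide_t))           # 3 rows, 2 columns
print(is.matrix(wide_t))     # t() returns a matrix
print(is.character(wide_t))  # every cell is now text
```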
Data Cleaning 5

Now we remove every column that contains an NA value from our data.

data_clean <- data_t[, colSums(is.na(data_t)) == 0]

View(data_clean)
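The column filter can be tried on a small made-up matrix first. In this sketch the third column is empty, so it is the only one dropped; the names m, keep, and m_clean are chosen for this example.

```r
# Toy matrix, filled column by column; the third column is all NA.
m <- matrix(
  c("Germany", "76.1", "Italy", "78.2", NA, NA),
  nrow = 2,
  dimnames = list(c("Country Name", "2000 [YR2000]"), NULL)
)

# Count the NA cells in each column and keep the columns with none.
keep <- colSums(is.na(m)) == 0
m_clean <- m[, keep, drop = FALSE]

print(keep)
print(m_clean)
```

Keep in mind that a single NA is enough to drop a whole column, so this removes more than only the missing cells.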
Data Cleaning 6

As you can see, we do not need the first 4 rows. They are irrelevant once we have taken the column names from the first row.
The first line of code uses the first row as the column names.

colnames(data_clean) <- data_clean[1, ]

And now we are going to remove our first four rows from the data.

data_clean <- data_clean[-(1:4), ]

View(data_clean)
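The two steps above can be sketched on a toy matrix. The row layout (country name, country code, series name, series code, then the years) is an assumption based on what the transposed data looks like in this lecture, and all values are made up or placeholders.

```r
# Toy transposed matrix: four label rows followed by two year rows.
m <- matrix(
  c("Germany", "Italy",
    "DEU", "ITA",
    "Series name placeholder", "Series name placeholder",
    "CODE", "CODE",
    "76.1", "78.2",
    "76.5", "78.0"),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Country Name", "Country Code", "Series Name", "Series Code",
      "2000 [YR2000]", "2001 [YR2001]"),
    NULL
  )
)

# Step 1: use the first row (country names) as the column names.
colnames(m) <- m[1, ]

# Step 2: drop the four label rows, keeping only the year rows.
m <- m[-(1:4), ]

print(m)
```

The order matters: if you drop the rows first, the country names are gone before you can use them.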
Data Cleaning 7

We need to convert our data to a data frame. After t(), it is only a matrix of text, so R treats the values as plain strings rather than as variables we can work with. To do any further processing, you “always” need to convert it to a data frame, because about 90% of what we do in R works with data frames.

data_clean <- data.frame(data_clean)
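A quick way to check the conversion, sketched on a made-up matrix (the names m and df are assumptions for this example). The year labels survive as row names, and the columns are still text at this point.

```r
# Toy character matrix with year labels as row names (made-up values).
m <- matrix(
  c("76.1", "76.5", "78.2", "78.0"),
  nrow = 2,
  dimnames = list(c("2000 [YR2000]", "2001 [YR2001]"),
                  c("Germany", "Italy"))
)

df <- data.frame(m)

print(is.data.frame(df))  # now a data frame
print(rownames(df))       # year labels are kept as row names
print(sapply(df, class))  # column types; not numeric yet
```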


Data Cleaning 8

Ohh. There is something wrong with the first column. It does not seem like a column: the years are stored as row names. This often happens when you transpose your data. To make it a column:
data_clean <- rownames_to_column(data_clean, var = "Years")

View(data_clean)
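Here is the same call on a toy data frame, as a sketch with made-up values; df and df2 are names chosen for this example.

```r
library(tibble)

# Toy data frame whose years live in the row names (made-up values).
df <- data.frame(
  Germany = c(76.1, 76.5),
  Italy = c(78.2, 78.0),
  row.names = c("2000 [YR2000]", "2001 [YR2001]")
)

# Move the row names into a real first column called "Years".
df2 <- rownames_to_column(df, var = "Years")

print(names(df2))
print(df2$Years)
```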
Data Cleaning 9

What is gsub? It is basically a find-and-replace function. It helps us change every piece of text that matches the same pattern.
The basic command is:
# gsub(pattern, replacement, x, ignore.case = FALSE, fixed = FALSE)

We are going to remove the “ [YR2000]”-style suffix from each year label:


data_clean$Years <- gsub("\\s*\\[YR[0-9]{4}\\]", "", data_clean$Years)

View(data_clean)
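To see the pattern at work outside the data frame, here is a short sketch on two sample labels (labels and years are names chosen for this example). The pattern reads: optional spaces, then "[YR", four digits, and "]".

```r
labels <- c("2000 [YR2000]", "2001 [YR2001]")

# Replace the matching suffix with an empty string.
years <- gsub("\\s*\\[YR[0-9]{4}\\]", "", labels)

print(years)  # "2000" "2001"
```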
Data Cleaning 10

For doing some math or visualising, we need to convert all the columns to numeric.
data_clean[] <- lapply(data_clean, function(x) as.numeric(as.character(x)))

View(data_clean)
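A small sketch of the conversion on made-up text columns (df is a name chosen for this example). Writing df[] on the left keeps the result a data frame instead of a plain list.

```r
# Toy data frame where every column is still text (made-up values).
df <- data.frame(
  Years = c("2000", "2001"),
  Germany = c("76.1", "76.5")
)

# Convert each column to numeric, one column at a time.
df[] <- lapply(df, function(x) as.numeric(as.character(x)))

print(sapply(df, is.numeric))
print(df$Germany + 1)  # arithmetic works now
```

Any cell that is not a valid number becomes NA with a warning, so check the result after converting.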
Data Cleaning 11

What is pivot_longer?
It reshapes data from a “wide” format to a “long” format.
# pivot_longer(data, cols, names_to = "name", values_to = "value")

In our example, every column except Years is stacked into a variable/value pair:
data_long <- pivot_longer(data_clean, cols = -Years, names_to = "variable", values_to = "value")

View(data_long)
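The reshaping is easier to see on a tiny table. This is a sketch with made-up values; wide and long are names chosen for this example.

```r
library(tidyr)

# Toy wide table: one column per country (made-up values).
wide <- data.frame(
  Years = c(2000, 2001),
  Germany = c(76.1, 76.5),
  Italy = c(78.2, 78.0)
)

# Stack the country columns: names go to "variable", numbers to "value".
long <- pivot_longer(wide, cols = -Years,
                     names_to = "variable", values_to = "value")

print(long)
print(dim(long))  # 4 rows (2 years x 2 countries), 3 columns
```

The long format is what ggplot needs in order to draw one line per country from a single column.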
Data Visualisation 1
ggplot(data_long, aes(x = Years, y = value, color = variable)) +
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_line(linewidth = 1) +
  labs(title = "Data Visualization",
       x = "Year",
       y = "Value",
       color = "Variable") +
  theme_minimal()

[Figure: line chart titled “Data Visualization”. x-axis: Year; y-axis: Value (ticks from 70.0 to 80.0); legend “Variable” with one line each for Germany, Italy, and Turkiye.]
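For practice, here is a self-contained sketch of the same kind of chart on made-up numbers, with labels that name the indicator. The toy values and the names toy_long and p are assumptions for illustration only, not the real World Bank figures.

```r
library(ggplot2)

# Toy long-format data: three years for two countries (made-up values).
toy_long <- data.frame(
  Years = rep(c(2000, 2001, 2002), times = 2),
  variable = rep(c("Germany", "Italy"), each = 3),
  value = c(76.1, 76.5, 76.3, 78.2, 78.0, 78.4)
)

# Store the plot in an object so it can be reused or saved later.
p <- ggplot(toy_long, aes(x = Years, y = value, color = variable)) +
  geom_line() +
  labs(title = "Final consumption expenditure",
       x = "Year",
       y = "% of GDP",
       color = "Country") +
  theme_minimal()

# Typing p at the console (or calling print(p)) draws the chart.
```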
