UNIT 11
WORKING WITH VECTOR DATA
USING R
Structure
11.1 Introduction
     Expected Learning Outcomes
11.2 Data Visualisation in R
     Types of Data Visualizations
     Advantages
     Limitations
     Application Areas
11.3 Combining Vectors, Matrix or Data Frames by Rows in R Language
     Combine Matrix Using rbind() Function
     Combine Data Frame using rbind() Function
11.4 Clipping a Raster
     Methods of Clipping Raster Using Shapefile in R
     Clipping Raster Using Shapefile in R
11.5 Understanding Raster and Vector Data
     Explanation of Shapefile Data Set
11.6 Geometric Measurement
11.7 Common Mistakes while using shapefile in R
11.8 Summary
11.9 Terminal Questions
11.10 References and Further/Suggested Readings
11.11 Answers
11.1 INTRODUCTION
Data visualization is the technique of conveying insights from data using visual cues such as graphs,
charts, maps, and many others. It is useful because it makes large quantities of data easier to
understand intuitively, and thereby supports better decisions about that data.
11.2 DATA VISUALISATION
Popular data visualization tools include Tableau, Plotly, R,
Google Charts, Infogram, and Kibana. These platforms
have different capabilities, functionality, and use cases. They also require
different skill sets. This topic focuses on the use of R for data visualization.
R is a language designed for statistical computing, graphical data
analysis, and scientific research. It is often preferred for data visualization because
its packages offer flexibility while requiring very little code.
Consider the following Air Quality data set (the airquality data set used in the examples below)
for visualization in R:
Ozone   Solar.R   Wind (mph)   Temp (°F)   Month   Day
41      190       7.4          67          5       1
36      118       8.0          72          5       2
12      149       12.6         74          5       3
18      313       11.5         62          5       4
NA      NA        14.3         56          5       5
28      NA        14.9         66          5       6
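The rows in the table can be reproduced directly in R. The following is a minimal sketch, assuming the table is taken from the airquality data set that ships with base R (the same object used by the plotting examples in this unit); the variable names first_rows and na_counts are only illustrative.

```r
# Load the built-in airquality data set (part of the datasets package)
data(airquality)

# Show the first six rows, which correspond to the table above
first_rows <- head(airquality, 6)
print(first_rows)

# Count the missing values (NA) in each column
na_counts <- colSums(is.na(airquality))
print(na_counts)
```

Checking for NA values early is useful because several plotting and summary functions need to be told how to handle them.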
11.2.1 Types of Data Visualisations
Some of the visualization types that R offers, illustrated with the Air Quality data set
above, are:
Bar Plot: A bar plot represents data points as bars whose lengths are
proportional to the value of each data item. The bars can be drawn
horizontally or vertically. Bar plots are generally used for plotting
continuous and categorical variables. Setting the horiz parameter to
TRUE gives a horizontal bar plot, and setting it to FALSE gives a vertical one.
Example 1:
Coding:
# Horizontal Bar Plot for
# Ozone concentration in air
barplot(airquality$Ozone,
        main = 'Ozone Concentration in air',
        xlab = 'ozone levels', horiz = TRUE)
output:
Figure 1: Ozone Concentration in Air
Example 2:
Coding:
# Vertical Bar Plot for
# Ozone concentration in air
barplot(airquality$Ozone, main = 'Ozone Concentration in air',
        xlab = 'ozone levels', col = 'blue', horiz = FALSE)
Output:
Figure 2: Ozone Concentration in Air
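Because bar plots are often used to compare a continuous quantity across categories, it can help to summarise the data per category first. The following is a small sketch, not part of the original examples, that plots the mean ozone level for each month; the name mean_ozone and the axis labels are illustrative choices.

```r
data(airquality)

# Mean ozone per month, ignoring missing values
mean_ozone <- tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)

# One bar per month; names.arg labels the bars with abbreviated month names
barplot(mean_ozone,
        names.arg = month.abb[as.integer(names(mean_ozone))],
        main = "Mean Ozone Concentration by Month",
        xlab = "Month", ylab = "Ozone",
        col = "blue")
```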
Histogram: A histogram is like a bar chart in that it uses bars of varying height to
represent the distribution of data. However, in a histogram the values are grouped into
consecutive intervals called bins. Continuous values are
grouped and displayed in these bins, whose size can be varied.
Example 1:
Coding:
# Histogram for Maximum Daily Temperature
data(airquality)
hist(airquality$Temp,
     main = "La Guardia Airport's\nMaximum Temperature (Daily)",
     xlab = "Temperature (Fahrenheit)",
     xlim = c(50, 125), col = "yellow",
     freq = TRUE)
output:
Figure 3: Maximum Temperature
For a histogram, the parameter xlim can be used to specify the interval within
which all values are to be displayed.
Another parameter, freq, when set to TRUE plots the frequency (count) of the
values in each bin. When it is set to FALSE, probability
densities are plotted on the y-axis, such that the total area of the histogram
adds up to one.
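The difference between the two settings of freq can be checked numerically. The sketch below computes the histogram without drawing it and verifies that the counts add up to the number of observations and that the bar areas of the density version add up to one; the variable names are illustrative.

```r
data(airquality)

# Compute the histogram without drawing it, to inspect its components
h <- hist(airquality$Temp, plot = FALSE)

# freq = TRUE plots counts: they add up to the number of observations
total_count <- sum(h$counts)

# freq = FALSE plots densities: bar area (density * bin width) adds up to one
total_area <- sum(h$density * diff(h$breaks))

print(total_count)
print(total_area)

# Draw the density version of the histogram
hist(airquality$Temp, freq = FALSE,
     main = "Density of Maximum Daily Temperature",
     xlab = "Temperature (Fahrenheit)", col = "yellow")
```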
Histograms are used in the following scenarios:
• To verify an equal and symmetric distribution of the data.
• To identify deviations from expected values.
Box Plot: The statistical summary of the given data is presented graphically
using a boxplot. A boxplot depicts information like the minimum and maximum
data point, the median value, first and third quartile, and interquartile range.
Example:
# Box plot for average wind speed
data(airquality)
boxplot(airquality$Wind,
        main = "Average wind speed\nat La Guardia Airport",
        xlab = "Miles per hour", ylab = "Wind",
        col = "orange", border = "brown",
        horizontal = TRUE, notch = TRUE)
output:
Figure 4: Average Wind Speed
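The numbers behind a box plot can also be printed. The following sketch uses boxplot.stats() from base R to obtain the values that the plot draws. Note that, with default settings, the whiskers extend only to the most extreme points within 1.5 times the interquartile range of the box, so points beyond them are reported separately as outliers.

```r
data(airquality)

# Values drawn by the box plot:
# lower whisker, lower hinge, median, upper hinge, upper whisker
bp <- boxplot.stats(airquality$Wind)
print(bp$stats)

# Points beyond the whiskers, drawn individually as outliers
print(bp$out)

# Interquartile range of the wind speed
wind_iqr <- IQR(airquality$Wind)
print(wind_iqr)
```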
Scatter Plot: A scatter plot is composed of many points on a Cartesian plane.
Each point denotes the value taken by two parameters and helps us easily
identify the relationship between them.
# Scatter plot for Ozone Concentration per month
data(airquality)
plot(airquality$Ozone, airquality$Month,
main ="Scatterplot Example",
xlab ="Ozone Concentration in parts per billion",
ylab =" Month of observation ", pch = 19)
Figure 5:Ozone Concentration
Scatter Plots are used in the following scenarios:
• To show whether an association exists between bivariate data.
• To measure the strength and direction of such a relationship.
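The strength and direction of a relationship seen in a scatter plot can be summarised with a correlation coefficient. The sketch below is an illustration that pairs ozone with temperature (a different pair of variables from the example above) and uses cor() with use = "complete.obs" so that rows containing NA are skipped.

```r
data(airquality)

# Pearson correlation between ozone and temperature,
# using only rows where both values are present
r_ozone_temp <- cor(airquality$Ozone, airquality$Temp, use = "complete.obs")
print(r_ozone_temp)

# Scatter plot of the same two variables
plot(airquality$Temp, airquality$Ozone,
     main = "Ozone versus Temperature",
     xlab = "Temperature (Fahrenheit)",
     ylab = "Ozone Concentration in parts per billion",
     pch = 19)
```

A value close to +1 or -1 indicates a strong linear relationship, and the sign gives its direction.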
Map visualization in R: Here we use the maps package to visualize and
display geographical maps in the R programming language.
install.packages("maps")
# Load the required library
library(maps)
# Read the dataset; read.csv() already returns a data frame
df <- read.csv("worldcities.csv")
# Draw the world map
map(database = "world")
# Mark the first 500 cities on the map
# (x is the longitude, y is the latitude)
points(x = df$lng[1:500], y = df$lat[1:500], col = "Red")
Figure 6: Geographical maps
3D Graphs in R: Here we will use the persp() function, which is used to
create 3D surfaces in perspective view. This function draws perspective plots
of a surface over the x–y plane.
Syntax: persp(x, y, z)
Parameters: This function accepts several parameters, including x, y and z, where x
and y are vectors defining the locations along the x- and y-axes, and z is a matrix
containing the height of the surface at each (x, y) location.
Return Value: persp() returns the viewing transformation matrix for projecting 3D
coordinates (x, y, z) into the 2D plane using homogeneous 4D coordinates (x, y,
z, t).
# Define the height of the cone at each (x, y) location
cone <- function(x, y){
  sqrt(x ^ 2 + y ^ 2)
}
# prepare variables.
x <- y <- seq(-1, 1, length.out = 30)
z <- outer(x, y, cone)
# plot the 3D surface
# Adding Titles and Labeling Axes to Plot
persp(x, y, z,
main="Perspective Plot of a Cone",
zlab = "Height",
theta = 30, phi = 15,
col = "orange", shade = 0.4)
Figure 7: Perspective Plot of Cone
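The matrix returned by persp() can be used to add further elements to the plot. The following sketch, which is an illustration rather than part of the original example, stores the returned matrix and passes it to trans3d() (from the grDevices package) to mark the tip of the cone at the origin.

```r
cone <- function(x, y) sqrt(x^2 + y^2)
x <- y <- seq(-1, 1, length.out = 30)
z <- outer(x, y, cone)

# persp() draws the surface and returns the 4 x 4 viewing transformation matrix
pmat <- persp(x, y, z, theta = 30, phi = 15,
              col = "orange", shade = 0.4,
              main = "Cone with its Tip Marked")

# trans3d() projects a 3D point into the 2D coordinates of the plot
tip <- trans3d(x = 0, y = 0, z = 0, pmat = pmat)

# Mark the tip of the cone on the existing plot
points(tip$x, tip$y, pch = 19, col = "red")
```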
11.2.2 Advantages
R has the following advantages over other tools for data visualization:
• R offers a broad collection of visualization libraries along with extensive
online guidance on their usage.
• R also offers data visualization in the form of 3D models and multipanel
charts.
• Through R, we can easily customize our data visualization by changing
axes, fonts, legends, annotations, and labels.
11.2.3 Limitations
R also has the following disadvantages:
• R is only preferred for data visualization when done on an individual
standalone server.
• Data visualization using R is slow for large amounts of data as compared to
other counterparts.
11.2.4 Application Areas
• Presenting analytical conclusions of the data to the non-analyst
departments of your company.
• Health monitoring devices use data visualization to track any anomaly in
blood pressure, cholesterol and others.
• To discover repeating patterns and trends in consumer and marketing data.
• Meteorologists use data visualization for assessing prevalent weather
changes throughout the world.
• Real-time maps and geo-positioning systems use visualization for traffic
monitoring and estimating travel time.
11.3 COMBINING VECTORS, MATRIX OR DATA
FRAMES BY ROWS IN R LANGUAGE
In this section, we discuss how to combine vectors, matrices, or data
frames by rows in the R programming language using the rbind function.
Programming Functions of rbind
The rbind function combines or concatenates data frames or matrices by rows.
The name rbind stands for row bind, which indicates that it appends the rows of one object to
another.
Figure 8: Combine Vectors, Matrix or Data Frames by Rows
Syntax:
rbind(x1, x2, ..., deparse.level = 1)
Parameters:
x1, x2: vectors, matrices, or data frames
deparse.level: This value determines how the row names are generated for vector arguments.
The default value of deparse.level is 1.
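The effect of deparse.level is easiest to see by printing the row names that rbind() produces. The following is a small sketch with two illustrative vectors.

```r
x <- 1:3
y <- 4:6

# deparse.level = 0: no row names are constructed
m0 <- rbind(x, y, deparse.level = 0)
print(rownames(m0))

# deparse.level = 1 (default): row names are taken from the argument names
m1 <- rbind(x, y)
print(rownames(m1))

# deparse.level = 2: expressions such as y + 1 are also turned into labels
m2 <- rbind(x, y + 1, deparse.level = 2)
print(rownames(m2))
```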
Combine Vectors Using rbind() Function
We will create two vectors and combine them with the help of the rbind function.
# R program to illustrate
# rbind function
# Initializing two vectors
x <- 2:7
y <- c(2, 5)
# print the value of x
x
# print the value of y
y
# Calling rbind() function
rbind(x, y)
Figure 9: Resultant of Combine matrices
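In this example the two vectors have different lengths. rbind() recycles the shorter vector until it fills a row of the same length as the longest one, as the following sketch shows (the name m is illustrative).

```r
x <- 2:7        # six elements
y <- c(2, 5)    # two elements

m <- rbind(x, y)

# The result has one row per vector and as many columns as the longest vector
print(dim(m))

# y is recycled to fill all six columns
print(m["y", ])
```

R gives a warning when the length of the longer vector is not a multiple of the shorter one, so it is worth checking lengths before combining.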
11.3.1 Combine Matrix Using rbind() Function
We will create two matrices and combine them with the help of the rbind function.
Figure 10: Resultant of combine two Matrix or Data
# Create the first matrix
a1 <- matrix(1:9, 3, 3)
a1
# Create the second matrix
a2 <- matrix(10:18, 3, 3)
a2
# Combine both matrices by rows
rbind(a1, a2)
11.3.2 Combine Data Frame Using rbind() Function
We will create two data frames and combine them with the help of the rbind function.
# Create the first data frame
df1 <- data.frame(
Name = c("Anurag", "Nishant", "Jayesh"),
Age = c(25, 30, 22),
Score = c(90, 85, 92)
)
df1
# Create the second data frame
df2 <- data.frame(
  Name = c("Vipul", "Shivang", "Pratham"),
  Age = c(28, 35, 27),
  Score = c(88, 91, 89)
)
df2
# Combine the data frames by rows
combined_df <- rbind(df1, df2)
# Display the result
print(combined_df)
Figure 11: Resultant of combine data frame
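For data frames, rbind() matches columns by name rather than by position, and it stops with an error when the column names differ. The following sketch illustrates both cases with small, made-up data frames; tryCatch() is used only so that the error message can be printed without stopping the script.

```r
df1 <- data.frame(Name = c("Anurag", "Nishant"), Age = c(25, 30))

# Same column names in a different order: rbind() matches them by name
df2 <- data.frame(Age = c(28, 35), Name = c("Vipul", "Shivang"))
combined <- rbind(df1, df2)
print(combined)

# Different column names: rbind() stops with an error
df3 <- data.frame(Name = "Pratham", Years = 27)
result <- tryCatch(rbind(df1, df3), error = function(e) conditionMessage(e))
print(result)
```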
Combine Data Frames Having Missing Values
We will create two data frames having missing values and combine them with
the help of the rbind function.
Figure 12: Resultant of combine data frame with missing values
# Create the first data frame
df1 <- data.frame(
Name = c("Anurag", "Nishant", "Jayesh"),
Age = c(25, NA, 22),
Score = c(90, 85, NA)
)
df1
# Create the second data frame
df2 <- data.frame(
Name = c("Vipul", "Shivang", "Pratham"),
Age = c(NA, 35, 27),
Score = c(88, NA, 89)
)
df2
# Combine the data frames by rows
combined_df<- rbind(df1, df2)
# Display the result
print(combined_df)
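rbind() carries the NA values over unchanged, so it is often useful to count them or filter them out after combining. The sketch below rebuilds the two data frames from this example and then uses is.na() and complete.cases() from base R; the names na_per_column and complete_rows are illustrative.

```r
df1 <- data.frame(Name = c("Anurag", "Nishant", "Jayesh"),
                  Age = c(25, NA, 22), Score = c(90, 85, NA))
df2 <- data.frame(Name = c("Vipul", "Shivang", "Pratham"),
                  Age = c(NA, 35, 27), Score = c(88, NA, 89))
combined_df <- rbind(df1, df2)

# Count the missing values in each column
na_per_column <- colSums(is.na(combined_df))
print(na_per_column)

# Keep only the rows that have no missing values
complete_rows <- combined_df[complete.cases(combined_df), ]
print(complete_rows)
```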
11.4 CLIPPING A RASTER
Clipping raster data using shapefiles is a common task in
spatial data analysis. It lets you focus on specific regions of interest within a
larger raster dataset, based on the boundaries described by a shapefile. This
section will guide you through creating a raster and a shapefile, performing the
clipping, and interpreting the results using the R programming
language.
11.5 UNDERSTANDING RASTER AND SHAPEFILE
DATA
Here we will discuss raster and shapefile data.
Raster Data: Raster data represents geographic areas as a grid of cells or
pixels, each with a value. It is commonly used for continuous data; raster formats
include GeoTIFF and GRID.
Shapefile Data: A shapefile is a vector data format in GIS composed of
multiple files (*.shp, *.shx, *.dbf) that define geometric shapes (points, lines,
polygons). Shapefiles represent discrete features on the Earth’s surface.
Explanation of Shapefile Spatial Data Set
A shapefile is a widely used file format in Geographic Information
Systems (GIS) designed to store vector data. Vector data represents
geographical features using points, lines, and polygons, and each
shapefile consists of a set of files that together describe these
features and their attributes. Here is a more detailed breakdown of each
component:
1. *.shp File: Geometry Data
Purpose: The *.shp file contains the actual geometric
information of the geographic features. This includes the coordinates that
define points, lines, or polygons.
Details: For example, if you have a shapefile representing rivers, the
*.shp file will store the coordinates that trace the course of every river.
2. *.shx File: Shape Index Data
Purpose: The *.shx file provides an index to the shapes in the *.shp
file. It helps GIS software quickly find and retrieve the
geometric data.
Details: This index improves performance by allowing
efficient access to specific parts of the *.shp file, which is crucial
when dealing with large datasets.
3. *.dbf File: Attribute Data
Purpose: The *.dbf file holds the attribute data associated with each
geometric feature. This data is stored in a tabular format, much like a
spreadsheet or database table.
Details: Each row in the *.dbf file corresponds to a shape in the
*.shp file, and each column holds a particular attribute, such as
name, population, or land use type.
4. *.prj File: Coordinate System and Projection
Purpose: The *.prj file contains information about the coordinate system
and map projection used by the shapefile.
Details: This is crucial for ensuring that the shapefile’s spatial data
is correctly aligned with other spatial datasets. It specifies how the
round Earth is projected onto a flat map and includes details such as
the coordinate system (for example, WGS84) and projection type (for example, UTM).
5. *.shp.xml File: Metadata
Purpose: The *.shp.xml file contains metadata about the
shapefile, such as information about the data’s origin, the
method used for data collection, and other descriptive details.
Details: Metadata is crucial for understanding the context of the data, its
accuracy, and its intended use. This file can help users interpret
the shapefile’s content and assess its suitability for specific
applications.
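The component files described above can be observed by writing a small shapefile and listing the folder it was written to. The following is a minimal sketch using the sf package; the polygon, the attribute value, and the file name demo_shape.shp are hypothetical, and the files are written to a temporary folder so that nothing in the working directory is changed.

```r
library(sf)

# A single square polygon with one attribute and a WGS84 CRS
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
demo_sf <- st_sf(name = "demo", geometry = st_sfc(square, crs = 4326))

# Write it to a temporary folder (hypothetical file name demo_shape.shp)
out_dir <- file.path(tempdir(), "shapefile_demo")
dir.create(out_dir, showWarnings = FALSE)
shp_path <- file.path(out_dir, "demo_shape.shp")
st_write(demo_sf, shp_path, delete_layer = TRUE, quiet = TRUE)

# List the component files that were created alongside the .shp file
component_files <- list.files(out_dir)
print(component_files)
```

A metadata file (*.shp.xml) is usually produced by GIS software rather than by this kind of script, so it may not appear in the listing.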
The following script shows one method of creating a raster and a vector file (shapefile) in R.
Figure 13: Creation of vector file
# Install required packages
install.packages("raster")
install.packages("sf") # For handling spatial vector data
# Load libraries
library(raster)
library(sf)
# Define the extent of the raster
# (named ext so that it does not shadow the extent() function)
ext <- extent(0, 10, 0, 10)
# Create a raster object with a resolution of 1x1
r <- raster(ext, nrow = 10, ncol = 10)
# Populate the raster with values (e.g., a gradient)
values(r) <- 1:ncell(r)
# Write the raster to a file
writeRaster(r, "example_raster.tif", format = "GTiff", overwrite = TRUE)
# Convert raster to polygons
polygons <- as(r, "SpatialPolygonsDataFrame")
# Convert to sf object
polygons_sf<- st_as_sf(polygons)
# Write the polygons to a shapefile
st_write(polygons_sf, "example_shapefile.shp", delete_layer = TRUE)
11.4.1 Methods of Clipping Raster Using Shapefile in R
• The script above saves the raster and the shapefile it creates in the current working
directory of the R session (for example, in RStudio).
• Define the spatial extent of the raster using the extent function. Here,
the extent is from 0 to 10 in both the x and y directions.
• Create a raster object with the desired extent and resolution. The
resolution is set by specifying the number of rows (nrow) and
columns (ncol).
• Populate the raster with values. In this case, a gradient is created by
assigning values from 1 to the number of cells in the raster.
• Write the raster to a file in GeoTIFF format using the writeRaster
function.
• Convert the raster to polygons and save it as a shapefile.
• View the contents of the shapefile example_shapefile.shp created by the R script:
• Open VS Code and create a file demo.R.
• Copy and paste the code given below and save it.
• Click Run and Debug.
# Load the sf package
library(sf)
# Read the shapefile written by the script above
my_sf <- st_read("example_shapefile.shp")
# Print the sf object to see the attribute data
print(my_sf)
# Plot the spatial data
plot(my_sf)
Figure 14: Plotted spatial data
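Beyond printing and plotting, an sf object read from a shapefile can be inspected piece by piece. The sketch below is self-contained: it writes two hypothetical square features to a temporary shapefile, reads them back, and then looks at the attribute table, geometry type, bounding box, and CRS. The helper make_square and the file name inspect_demo.shp are assumptions made for the illustration.

```r
library(sf)

# Helper (hypothetical) that builds a unit square with its lower-left corner at (x0, y0)
make_square <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1),
                        c(x0, y0 + 1), c(x0, y0))))
}

# Two square features with one attribute column
cells <- st_sf(layer = c(1, 2),
               geometry = st_sfc(make_square(0, 0), make_square(1, 0), crs = 4326))

# Round trip through a temporary shapefile
shp_path <- file.path(tempdir(), "inspect_demo.shp")
st_write(cells, shp_path, delete_layer = TRUE, quiet = TRUE)
my_sf <- st_read(shp_path, quiet = TRUE)

# Attribute table without the geometry column
print(st_drop_geometry(my_sf))

# Geometry type, bounding box, and CRS of the layer
geom_types <- unique(st_geometry_type(my_sf))
bbox <- st_bbox(my_sf)
print(geom_types)
print(bbox)
print(st_crs(my_sf))
```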
11.4.2 Clipping Raster Using Shapefile in R
To use shapefiles in R, you often convert them to sf objects or Spatial objects
(from the sp package) to perform spatial operations such as clipping.
Clipping a Raster
To clip a raster, you can use a polygon as a mask boundary. Here is how you
could do it: Create a polygon that defines the area to clip. Use the mask
function to clip the raster using the polygon.
Figure 15:Clipping raster using shapefile in R
# Install required packages
install.packages("raster")
install.packages("sf")
# Load libraries
library(raster)
library(sf)
# Define the extent of the raster
# (named ext so that it does not shadow the extent() function)
ext <- extent(0, 10, 0, 10)
# Create a raster object with a resolution of 1x1
r <- raster(ext, nrow = 10, ncol = 10)
# Set the CRS for the raster
crs(r) <- CRS("+proj=longlat +datum=WGS84")
# Populate the raster with values (e.g., a gradient)
values(r) <- 1:ncell(r)
# Create a polygon for clipping
polygon <- st_as_sf(st_sfc(st_polygon(list(rbind(c(2, 2), c(8, 2),
c(8, 8), c(2, 8), c(2, 2))))))
# Set the CRS for the polygon
polygon <- st_set_crs(polygon, "+proj=longlat +datum=WGS84")
# Clip the raster using the polygon
clipped_raster<- mask(r, polygon)
# Plot the original and clipped raster
par(mfrow = c(1, 2))
plot(r, main = "Original Raster")
plot(clipped_raster, main = "Clipped Raster")
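The mask function keeps the full grid of the original raster and only sets the cells outside the polygon to NA. When a smaller output grid is wanted, a common approach is to call crop() before mask(). The following sketch compares the two results on the same raster and polygon as above; converting the polygon to an sp object with as(..., "Spatial") and the id attribute column are choices made for this illustration.

```r
library(raster)
library(sf)

# Same 10 x 10 raster as in the example above
r <- raster(extent(0, 10, 0, 10), nrows = 10, ncols = 10,
            crs = "+proj=longlat +datum=WGS84")
values(r) <- 1:ncell(r)

# Same clipping square, with an id attribute so it converts to a Spatial object
square <- st_polygon(list(rbind(c(2, 2), c(8, 2), c(8, 8), c(2, 8), c(2, 2))))
polygon <- st_sf(id = 1,
                 geometry = st_sfc(square, crs = "+proj=longlat +datum=WGS84"))
polygon_sp <- as(polygon, "Spatial")

# mask() alone keeps the original grid and sets outside cells to NA
masked_only <- mask(r, polygon_sp)

# crop() first trims the grid to the bounding box of the polygon
cropped <- crop(r, polygon_sp)
cropped_and_masked <- mask(cropped, polygon_sp)

print(dim(masked_only))         # rows, columns, layers of the full grid
print(dim(cropped_and_masked))  # rows, columns, layers of the trimmed grid
```

Cropping first also reduces the number of cells that mask() has to process, which can matter for large rasters.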
11.6 GEOMETRIC MEASUREMENT
Geometric measurement mainly concerns the area, length, and distance of
geometries. These operations are performed by the functions st_area, st_length,
and st_distance in package sf, respectively. Geometric measurement is very
useful for tasks such as calculating population density (area) or the
distance between two points (distance), along with many other practical problems. Note,
however, that the precision of a geometric measurement depends on the
coordinate reference system (CRS) of the map. If we measure geometries in a geographic
coordinate system (latitude–longitude, e.g., EPSG:4326), we may
obtain a wrong result. This is because a geographic coordinate system defines
a location on the three-dimensional spherical surface of the earth with
longitudes and latitudes in degrees, so treating those degrees as
two-dimensional planar coordinates can produce large deviations. The measurement functions
perform well when the geometries are in an appropriate projected coordinate
system, which defines the X and Y of two-dimensional geometry. Here, we do
not discuss the theory behind the measurement differences among
CRSs, which requires more technical knowledge of GIS; we only show
how to do the calculation using package sf.
st_area is used to calculate the area of a polygon. Take the data lnd in package
spData, for instance. lnd records all of the boroughs in London
with attributes such as name, official code, area, and so forth.
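The three measurement functions can be tried on simple shapes whose area and length are known in advance. The sketch below does not use lnd; instead it builds a hypothetical 1000 m by 2000 m rectangle and two points in a projected CRS (UTM zone 31N, EPSG:32631, chosen only for illustration), so that the results are easy to check by hand.

```r
library(sf)

# A 1000 m x 2000 m rectangle in a projected CRS (EPSG:32631);
# the coordinates are hypothetical and chosen for easy checking
rect <- st_polygon(list(rbind(c(500000, 4000000), c(501000, 4000000),
                              c(501000, 4002000), c(500000, 4002000),
                              c(500000, 4000000))))
rect_sfc <- st_sfc(rect, crs = 32631)

# Two points that are 3000 m apart in x and 4000 m apart in y
pts <- st_sfc(st_point(c(500000, 4000000)), st_point(c(503000, 4004000)),
              crs = 32631)

# A straight line joining the two points
line <- st_sfc(st_linestring(rbind(c(500000, 4000000), c(503000, 4004000))),
               crs = 32631)

rect_area <- st_area(rect_sfc)               # area, in square metres
line_length <- st_length(line)               # length, in metres
pt_distance <- st_distance(pts[1], pts[2])   # 1 x 1 distance matrix, in metres

print(rect_area)
print(line_length)
print(pt_distance)
```

The results carry units, so as.numeric() can be used when a plain number is needed, for example when dividing a population by an area.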
11.7 COMMON MISTAKES WHILE USING
SHAPEFILE IN R
• CRS Mismatch: Failing to reproject the shapefile to the same CRS as
the raster can result in errors or wrong outputs.
• Wrong Package Functions: Ensure you use functions from the appropriate
packages. For example, crop() and mask() belong to the raster package.
• Memory Management: Large raster files can consume significant
memory. Be mindful of memory usage and consider using functions
that allow processing in chunks.
• Incorrect Paths: Always check that the file paths are correctly
specified, particularly when reading or writing files.
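Two of these checks can be sketched in a few lines of R. The example below, which is only an illustration, compares the CRS of a polygon with a target CRS and reprojects it with st_transform() when they differ, then checks that a file exists before it is read. The target CRS (EPSG:32631) stands in for the CRS of a raster, and the file path is hypothetical.

```r
library(sf)

# A polygon in geographic coordinates (EPSG:4326)
poly_ll <- st_sfc(st_polygon(list(rbind(c(2, 2), c(8, 2), c(8, 8),
                                        c(2, 8), c(2, 2)))),
                  crs = 4326)

# Hypothetical target CRS, standing in for the CRS of the raster
target_crs <- st_crs(32631)

# Reproject only when the two CRSs differ
if (!(st_crs(poly_ll) == target_crs)) {
  poly_proj <- st_transform(poly_ll, target_crs)
} else {
  poly_proj <- poly_ll
}
same_crs <- st_crs(poly_proj) == target_crs
print(same_crs)

# Check that a file exists before reading it (hypothetical path)
shp_path <- "example_shapefile.shp"
if (!file.exists(shp_path)) {
  message("File not found: ", shp_path)
}
```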
11.8 SUMMARY
Clipping a raster using a shapefile in R is an effective technique in
geospatial analysis. With the right packages and methods, you can
efficiently extract relevant portions of raster data based on
vector data. By understanding the potential challenges, such as CRS
mismatches and memory problems, you can ensure accurate and
efficient clipping in your spatial workflows.
11.9 TERMINAL QUESTIONS
11.10 REFERENCES AND FURTHER/SUGGESTED
READINGS
11.11 ANSWERS
11. GLOSSARY
CRS: Coordinate Reference System
str(data) # structure
dim(data) # dimensions
View(data) # open View window of data
head(data) # beginning of the data frame
tail(data) # end of the data frame
names(data) # names of the columns
rownames(data) # names of the rows
colnames(data) # names of the columns
Data Objects in R: These objects, composed of multiple atomic data elements,
are the bread and butter of R: vectors and data frames.