Module 6
Data Analytics with R
A.I. Kalsekar Technical Campus, New Panvel
Data Analytics with R (Syllabus)
• Exploring Basic features of R, Exploring RGUI, Exploring RStudio, Handling
Basic Expressions in R, Variables in R, Working with Vectors, Storing and
Calculating Values in R, Creating and using Objects, Interacting with users,
Handling data in R workspace, Executing Scripts, Creating Plots, Accessing help
and documentation in R
• Reading datasets and Exporting data from R, Manipulating and Processing Data in
R, Using functions instead of script, built-in functions in R
• Data Visualization: Types, Applications
What is R?
➢ R is a programming language and free software environment created by Ross Ihaka and Robert Gentleman
in the early 1990s.
➢ The language is named R after the first letter of the first names of its
two authors (Robert Gentleman and Ross Ihaka).
➢ R is an open-source programming language mostly used for statistical computing and
data analysis and is available across widely used platforms like Windows, Linux, and
MacOS.
➢ It generally comes with the command-line interface and provides a vast list of packages
for performing tasks.
➢R is an interpreted language that supports both procedural programming and
object-oriented programming.
➢ R is one of the most important tools used by researchers, data analysts, statisticians and
marketers for retrieving, cleaning, analyzing, visualizing and presenting data.
➢ R allows integration with procedures written in other programming languages such as
C, C++, .NET, Python and FORTRAN to improve efficiency.
➢ The R language provides a wide range of statistical and graphical techniques,
including linear and nonlinear modeling, time series analysis, clustering, and
more.
Why R Programming Language?
Why R ?
• R is a leading tool for machine learning, statistics, and data
analysis. Objects, functions, and packages can easily be created in R.
• It’s a platform-independent language. This means it runs on all major
operating systems.
• It’s a free, open-source language. That means anyone can install it in any
organization without purchasing a license.
• R is not only a statistics package; it also integrates
with other languages (C, C++). Thus, you can easily interact with
many data sources and statistical packages.
• R has a vast community of users, and it’s growing
day by day.
• R is currently one of the most requested programming languages in the Data
Science job market.
History of R….
The R project was conceived in 1992. An initial version, known as R 0.16, was released in 1995, and the first
stable version, R 1.0.0, followed in 2000.
R quickly gained popularity among statisticians, data analysts, and researchers due to its flexibility,
extensibility, and powerful statistical capabilities.
R version 4.3.1 was released on 16-06-2023.
1991     R is created in New Zealand by Ross Ihaka and Robert Gentleman.
1993     August: first announcement of R to the public.
1995     Martin Mächler convinces Ross and Robert to use the GNU General Public License, making R free software.
1996     A public mailing list is created.
1997     The R Core Group is formed. The core group controls the source code for R.
2000     R version 1.0.0 is released.
2013     R version 3.0.2 is released in September.
2014-16  R versions 3.2.x to 3.3.x are released.
2017     R version 3.4.0 is released in April.
2023     R version 4.3.1 is released.
2025     R version 4.5.1 is released.
Timeline of R Programming Language
•1995 → First public release of R 0.16 by Ross Ihaka and Robert Gentleman
•1997 → Formation of the R Core Team to maintain and develop R.
•2000 (February) → Release of R 1.0.0, the first stable version.
•2004 → R becomes widely adopted in universities and research institutions.
•2013 → Release of R 3.0.0 (“Masked Marvel”), marking improvements in big data handling.
•2015 → CRAN (Comprehensive R Archive Network) grows to 7,000+ packages.
•2019 → Release of R 3.6.0 with performance improvements.
•2020 (April) → Release of R 4.0.0, introducing major updates (new defaults, stringsAsFactors = FALSE by
default).
•2023 (April) → Release of R 4.3.0 with modern enhancements in performance and usability.
•2024 (April) → Release of R 4.4.0 (“Puppy Cup”).
•2025 (April) → Release of R 4.5.0 (“How About a Twenty-Six”).
•2025 (June) → Latest version R 4.5.1 (“Great Square Root”).
Features of R Programming Language.
R is a programming language that supports both procedural and object-oriented programming.
R can be integrated with other technologies and frameworks such as Hadoop and
HDFS. It can also integrate with other programming languages such as C, C++, Python, Java,
FORTRAN and JavaScript.
Open-source free language: anyone can install it in any organization
without purchasing a license.
R Packages: One of the major features of R is its wide availability of
libraries. R has CRAN (Comprehensive R Archive Network), which is a
repository holding more than 15,000 packages.
Powerful Graphics: R can produce publication-quality graphs and plots
with its base packages alone. Added packages like ggplot2 and plotly extend the possibilities further.
No need for a compiler: The R language is interpreted. It does not need a compiler to convert the code into
a program.
Cross-platform support: R is cross-platform, that is, it runs on Windows, Linux, and macOS.
Performs fast calculations: You can perform a wide variety of complex operations on
vectors, arrays, data frames and other data objects of varying sizes.
Vast community of users: The R programming language has a vast
community of users and it’s growing day by day.
Machine learning with R Language
➢ The mlr package, which stands for Machine Learning in R, has become
highly popular.
➢ This package provides a common interface to many machine learning algorithms, along with other
tools that help with machine learning.
Programming in R:
• Since R is syntactically similar to other widely used languages, it is easy to
learn and code in.
• Programs can be written in R in any of the widely used IDEs, such as RStudio, Rattle,
Tinn-R, etc. After writing the program, save the file with the extension .r.
• To run the program, use the following command on the command line:
Rscript file_name.r
Advantages of R:
➢ R is one of the most comprehensive statistical analysis packages, and new technology and
concepts often appear first in R.
➢ R is open source, so you can run R anywhere and at any
time.
➢ R is cross-platform: it runs on GNU/Linux, Windows, and macOS.
➢ In R, everyone is welcome to provide new packages, bug fixes, and code enhancements.
Disadvantages of R:
• In the R programming language, the quality of some packages is less than perfect.
• R commands give little thought to memory management, so R
may consume all available memory.
• There is no vendor to complain to if something doesn’t work.
• R can be slower than other programming languages such as
Python and MATLAB, depending on the task and on how the code is written.
Exploring RGUI
Exploring R Studio
➢ RStudio is an integrated development environment (IDE) for R.
➢ The IDE is a GUI where you can write your code, see the results, and also see the
variables that are generated during the course of programming.
➢ RStudio is available as both open-source and commercial software.
➢ RStudio is also available in both Desktop and Server versions.
➢ RStudio is also available for various platforms such as Windows, Linux, and macOS.
After the installation process is over,
the R Studio interface looks like:
• The Console panel (left panel) is where R waits for you to tell it what
to do, and where you see the results generated when you type in commands.
• To the top right, you have the Environment/History panel. It contains 2 tabs:
• Environment tab: It shows the variables that are generated during the course of
programming in a temporary workspace.
• History tab: In this tab, you’ll see all the commands that have been used
since you started using RStudio.
• To the bottom right, you have another panel, which contains multiple tabs, such as Files,
Plots, Packages, Help, and Viewer.
• The Files tab shows the files and directories that are available within the default
workspace of R.
• The Plots tab shows the plots that are generated during the course of programming.
• The Packages tab shows the packages that are already
installed and also gives a user interface to install new packages.
• The Help tab is the most important one, where you can get help from the R
documentation on the functions that are built into R.
• The last tab is the Viewer tab, which can be used to see the local web
content that’s generated using R.
Basic Expressions in R
• "hello world"
• 100+200
• a <- 60
• b <- 68
• c = a+b
• c
• a<b
• a>b
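Typed at the R console, the expressions above evaluate roughly as sketched below. The comments show what the console would print; this is a minimal illustration rather than part of the original slides.

```r
"hello world"      # a character string is echoed back
100 + 200          # arithmetic: 300
a <- 60            # assignment with <-
b <- 68
c = a + b          # assignment with = also works at the top level
c                  # 128
a < b              # TRUE
a > b              # FALSE
```

R is case sensitive, so `c` and `C` are different names; typing `C` would not show the value stored in `c`.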
Variables in R
• A variable in R can store an atomic vector, a group of atomic vectors, or a combination of many R
objects. A valid variable name consists of letters, numbers, and the dot or underscore characters. The
variable name starts with a letter, or with a dot that is not followed by a number.
Variable Name          Validity   Reason
var_name2.             valid      Has letters, numbers, dot and underscore.
var_name%              invalid    Has the character '%'. Only dot (.) and underscore are allowed.
2var_name              invalid    Starts with a number.
.var_name, var.name    valid      Can start with a dot (.), but the dot (.) should not be followed by a number.
.2var_name             invalid    The starting dot is followed by a number, making it invalid.
_var_name              invalid    Starts with _, which is not valid.
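One way to check the naming rules above programmatically is sketched below. It relies on the base function make.names(), which rewrites a string into a syntactically valid name; the helper variable names (candidates, is_valid) are chosen here for illustration only.

```r
# Candidate names taken from the table above.
candidates <- c("var_name2.", "var_name%", "2var_name", ".var_name",
                "var.name", ".2var_name", "_var_name")

# make.names() returns a syntactically valid version of each string,
# so a string that comes back unchanged was already a valid name.
is_valid <- make.names(candidates) == candidates

print(data.frame(name = candidates, valid = is_valid))
```

The same check also flags reserved words such as `if`, which cannot be used as variable names either.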
R - Data Types
Data Type   Example                        Verify
Logical     TRUE, FALSE                    v <- TRUE
                                           print(class(v))
                                           [1] "logical"
Numeric     12.3, 5, 999                   v <- 23.5
                                           print(class(v))
                                           [1] "numeric"
Integer     2L, 34L, 0L                    v <- 2L
                                           print(class(v))
                                           [1] "integer"
Complex     3 + 2i                         v <- 2+5i
                                           print(class(v))
                                           [1] "complex"
Character   'a', "good", "TRUE", '23.4'    v <- "TRUE"
                                           print(class(v))
                                           [1] "character"
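The five checks in the table can also be run in one pass. The sketch below is an illustration, with the sample values and variable names chosen here; it applies class() to one value of each basic type.

```r
# One sample value per basic data type.
values <- list(TRUE, 23.5, 2L, 2 + 5i, "TRUE")

# class() reports the data type of each value.
types <- sapply(values, class)
print(types)

# A number written without the L suffix is stored as numeric (double),
# even when it happens to be a whole number.
print(class(5))
print(class(5L))
```

Note that "TRUE" in quotes is a character string, not a logical value.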
Programming Exercises
1. Write an R program to create three vectors: numeric data, character data and
logical data. Display the content of the vectors and their type.
2. The numbers below are the rainfall amounts for the first ten days of 1996. Read them
into a vector using the c() function.
0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
a) What was the mean rainfall? What was the standard deviation?
b) Calculate the cumulative rainfall ('running total') over these ten days.
Confirm that the last value of the vector that this produces is equal to the
total sum of the rainfall.
c) Which day saw the highest rainfall (write code to get the answer)?
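One possible approach to exercises 1 and 2 is sketched below. The vector contents in exercise 1 and all variable names are made up for illustration; the rainfall figures are the ones given in exercise 2.

```r
# Exercise 1: three vectors of different types (sample values are hypothetical).
num_vec <- c(1.5, 2, 3.25)
chr_vec <- c("a", "b", "c")
lgl_vec <- c(TRUE, FALSE, TRUE)
print(num_vec); print(class(num_vec))
print(chr_vec); print(class(chr_vec))
print(lgl_vec); print(class(lgl_vec))

# Exercise 2: rainfall for the first ten days.
rainfall <- c(0.1, 0.6, 33.8, 1.9, 9.6, 4.3, 33.7, 0.3, 0.0, 0.1)

# a) mean and standard deviation
mean_rain <- mean(rainfall)
sd_rain   <- sd(rainfall)
print(mean_rain)
print(sd_rain)

# b) running total; its last value should match the overall sum
running_total <- cumsum(rainfall)
print(running_total)
last_matches_sum <- isTRUE(all.equal(running_total[length(running_total)],
                                     sum(rainfall)))
print(last_matches_sum)

# c) index (day number) of the largest value
wettest_day <- which.max(rainfall)
print(wettest_day)
```

all.equal() is used for the comparison in part b) because floating-point sums can differ by a tiny rounding error.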
3. Write an R program to create a simple bar plot of the marks in four subjects.
marks = c(70, 95, 80, 74)
barplot(marks, main = "Comparing marks of 4 subjects",
        xlab = "Subject", ylab = "Marks",
        names.arg = c("English", "Science", "Math.", "Hist."),
        col = "green", horiz = FALSE)
4. Write an R program to compute the sum, mean and product of the elements of a given vector.
nums = c(10, 20, 30)
print('Original vector:')
print(nums)
print(paste("Sum of vector elements:",sum(nums)))
print(paste("Mean of vector elements:",mean(nums)))
print(paste("Product of vector elements:",prod(nums)))
5. Write an R program to list the distinct values in a given vector.
v = c(10, 10, 10, 20, 30, 40, 40, 40, 50)
print("Original vector:")
print(v)
print("Distinct values of the said vector:")
print(unique(v))
6. Write an R program to find the elements of a given vector that are not in
another given vector.
a = c(0, 10, 10, 10, 20, 30, 40, 40, 40, 50, 60)
b = c(10, 10, 20, 30, 40, 40, 50)
print("Original vector-1:")
print(a)
print("Original vector-2:")
print(b)
print("Elements of a that are not in b:")
result = setdiff(a, b)
print(result)
7. Write an R program to reverse the order of a given vector.
v = c(0, 10, 10, 10, 20, 30, 40, 40, 40, 50, 60)
print("Original vector:")
print(v)
rv = rev(v)
print("The said vector in reverse order:")
print(rv)
8. Write an R program to concatenate the elements of a character vector into a single string.
a = c("Python","NumPy", "Pandas")
print(a)
x = paste(a, collapse = "")
print("Concatenation of the said string:")
print(x)
9. Write an R program to add 3 to each element in a given vector. Print the original and new vector.
# c() drops NULL elements, so v holds 1 2 3 4
v = c(1, 2, NULL, 3, 4, NULL)
print("Original vector:")
print(v)
# add 3, keeping only the non-missing, positive elements of v
new_v = (v+3)[(!is.na(v)) & v > 0]
print("New vector:")
print(new_v)
R - Data Structures
• While programming in any language, you need to use various variables to store
various information. Variables are nothing but reserved memory locations to store values. This
means that when you create a variable, you reserve some space in memory.
• In R, variables are not declared with a data type. Instead, variables are assigned R-objects,
and the data type of the R-object becomes the data type of the variable. There are many types of
R-objects. The frequently used ones are:
• Vectors
• Lists
• Matrices
• Arrays
• Factors
• Data Frames
1. Vector:
• A sequence of elements of the same data type. Most basic data structure in R.
• Example:
v <- c(1, 2, 3, 4, 5) # numeric vector
names <- c("A", "B", "C") # character vector
2. List :
• Can hold elements of different data types (numbers, strings, vectors, etc.).
• Example:
mylist <- list(1, "Hello", TRUE, c(2,3,4))
3. Matrix
•A 2D structure where all elements must be of the same data type.
•Example:
m <- matrix(1:9, nrow=3, ncol=3)
4. Array: Multi-dimensional version of a matrix.
Example:
arr <- array(1:12, dim = c(3,2,2))
5. Data Frame
•A table-like structure with rows and columns.
•Each column can have a different data type (numeric, character, logical).
•Most used structure for datasets.
•Example:
df <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 28),
Passed = c(TRUE, FALSE, TRUE)
)
6. Factor
•Used to represent categorical data (like labels, levels).
•Example:
gender <- factor(c("Male", "Female", "Female", "Male"))
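The sketch below rebuilds the six structures from the examples above and inspects them with a few base functions (length(), dim(), str(), levels()). It is meant as an illustration of how the structures differ, not as an exhaustive tour.

```r
v      <- c(1, 2, 3, 4, 5)
mylist <- list(1, "Hello", TRUE, c(2, 3, 4))
m      <- matrix(1:9, nrow = 3, ncol = 3)
arr    <- array(1:12, dim = c(3, 2, 2))
df     <- data.frame(
  Name   = c("Alice", "Bob", "Charlie"),
  Age    = c(25, 30, 28),
  Passed = c(TRUE, FALSE, TRUE)
)
gender <- factor(c("Male", "Female", "Female", "Male"))

print(class(v))         # every element of a vector has the same type
print(length(mylist))   # a list counts its top-level elements: 4
print(m[2, 3])          # matrices are filled column by column: 8
print(dim(arr))         # 3 rows, 2 columns, 2 layers
str(df)                 # one line per column, showing each column's type
print(df$Age[df$Passed])  # ages of the rows where Passed is TRUE
print(levels(gender))   # the distinct categories, sorted alphabetically
```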
Creating Plots
data()          # list the built-in datasets
data(cars)      # load the cars dataset
cars
cars$speed
cars$dist
plot(cars$speed, cars$dist, xlab = "speed", ylab = "Distance", main = "Cars speed and distance")
# bar heights come from demand; Time supplies the bar labels
barplot(BOD$demand, names.arg = BOD$Time, xlab = "Time", ylab = "Demand",
        main = "Biochemical Oxygen Demand", col = "red", border = "black")
Rainfall_data <- c(18, 23, 29, 24, 12)
month <- c("jun", "july", "aug", "sept", "oct")
png(filename = "bar_chart.png")   # open a PNG file device
barplot(Rainfall_data, xlab = "Month", ylab = "Rainfall",
        main = "Rainfall variation in monsoon season",
        names.arg = month, col = "black", border = "red")
dev.off()                         # close the device so the file is written
Accessing help and documentation in R
• The help() function and ? help operator in R provide access to the documentation
pages for R functions, data sets, and other objects, both for packages in the
standard R distribution and for contributed packages.
• To access documentation for the standard lm (linear model) function, for example,
enter the command help(lm) or help("lm"), or ?lm or ?"lm" (i.e., the quotes are
optional).
• help()
• help(lm)
Reading Data into R
# Read data into R using the read.csv function
# Set working directory
setwd("C:/Users/91776/Desktop/Bank_project")
# Read data from csv file
read.csv("be.csv")
student <- read.csv("be.csv")
# view the data frame object in a window
View(student)
# print data frame object to console
print(student)
# view just the names of the variables in the data frame
names(student)
studentbe <- read.csv("be.csv")
# remove data frame
remove(studentbe)
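The syllabus also covers exporting data from R. A minimal sketch using write.csv() follows; the student records and the temporary file are made up here so that the example runs without be.csv.

```r
# Hypothetical student records standing in for the contents of a CSV file.
student <- data.frame(
  Name  = c("Asha", "Ravi", "Meera"),
  Marks = c(82, 74, 91)
)

# Write to a temporary file; row.names = FALSE avoids an extra index column.
out_file <- tempfile(fileext = ".csv")
write.csv(student, out_file, row.names = FALSE)

# Read the file back and confirm the data survived the round trip.
student_back <- read.csv(out_file)
print(student_back)
print(names(student_back))
```

In practice a path such as "students_out.csv" would replace tempfile(), and the file would be written to the current working directory.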
Built-in functions in R
• The functions which are already created or
defined in the programming framework are
known as a built-in function.
• R has a rich set of functions that can be used
to perform almost every task for the user.
Math Functions
• R provides various mathematical functions for performing calculations, such as finding the
absolute value or the square root of a number.
In R, the following functions are used:
1. abs(x): It returns the absolute value of input x.
   Example: x <- -4; print(abs(x))
   Output: [1] 4
2. sqrt(x): It returns the square root of input x.
   Example: x <- 4; print(sqrt(x))
   Output: [1] 2
3. ceiling(x): It returns the smallest integer which is larger than or equal to x.
   Example: x <- 4.5; print(ceiling(x))
   Output: [1] 5
4. floor(x): It returns the largest integer which is smaller than or equal to x.
   Example: x <- 2.5; print(floor(x))
   Output: [1] 2
5. trunc(x): It returns the value of input x truncated toward zero (the fractional part is dropped).
   Example: x <- c(1.2, 2.5, 8.1); print(trunc(x))
   Output: [1] 1 2 8
6. cos(x), sin(x), tan(x): They return the cosine, sine and tangent of input x (in radians).
   Example: x <- 4; print(cos(x)); print(sin(x)); print(tan(x))
   Output: [1] -0.6536436
           [1] -0.7568025
           [1] 1.157821
7. log(x): It returns the natural logarithm of input x.
   Example: x <- 4; print(log(x))
   Output: [1] 1.386294
8. log10(x): It returns the common (base 10) logarithm of input x.
   Example: x <- 4; print(log10(x))
   Output: [1] 0.60206
9. exp(x): It returns e raised to the power x.
   Example: x <- 4; print(exp(x))
   Output: [1] 54.59815
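The rounding functions above differ mainly on negative and half-way values. The sketch below, with input values chosen here for illustration, puts them side by side and also checks that log() and exp() undo each other. round() is a base R function that is not in the table above.

```r
x <- c(-4.7, -2.5, 2.5, 4.7)

floor_x   <- floor(x)     # always rounds down:      -5 -3  2  4
ceiling_x <- ceiling(x)   # always rounds up:        -4 -2  3  5
trunc_x   <- trunc(x)     # drops the fraction:      -4 -2  2  4
round_x   <- round(x)     # nearest, halves to even: -5 -2  2  5

print(rbind(x, floor_x, ceiling_x, trunc_x, round_x))

# log() and exp() are inverse operations; log10() uses base 10.
back_to_4   <- log(exp(4))
log10_1000  <- log10(1000)
root_of_abs <- sqrt(abs(-16))
print(c(back_to_4, log10_1000, root_of_abs))
```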
String Function
• R provides various string functions to perform tasks. These string functions
allow us to extract sub string from string, search pattern etc.
• String manipulation is a process used for handling and analyzing strings.
String functions help manipulate the contents of a string.
• There are the following string functions in R:
1. paste(): It concatenates strings together, separating them with the sep string. It allows us to combine
   multiple strings into a single string.
   Example:
   string1 <- "Hello"
   string2 <- "world"
   result <- paste(string1, string2, sep = ", ")
   print(result)
   Output: "Hello, world"
2. substr(): It extracts substrings from a character vector by specifying the starting and ending positions.
   Example:
   text <- "Hello World.."
   subs <- substr(text, start = 1, stop = 5)
   print(subs)
   Output: "Hello"
3. toupper(): It converts a given string into uppercase letters.
   Example:
   text <- "Hello World.."
   up_text <- toupper(text)
   print(up_text)
   Output: "HELLO WORLD.."
4. tolower(): It converts a given string into lowercase letters.
   Example:
   text <- "Hello World.."
   lo_text <- tolower(text)
   print(lo_text)
   Output: "hello world.."
5. sub(): It finds a pattern in a given character vector and replaces the first match with a specified
   replacement text.
   Example:
   text <- "Hello World.."
   new_text <- sub("World", "everyone", text)
   print(new_text)
   Output: "Hello everyone.."
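A few related base functions are often used alongside the ones listed above. The sketch below, with a sample string chosen here for illustration, contrasts sub() with gsub() (which replaces every match) and shows nchar(), paste0() and the collapse argument of paste().

```r
text <- "Hello World, hello R"

first_only <- sub("l", "L", text)    # replaces only the first "l"
every_one  <- gsub("l", "L", text)   # replaces every "l"
print(first_only)
print(every_one)

n_chars <- nchar(text)               # number of characters, spaces included
word    <- substr(text, 7, 11)       # characters 7 to 11
print(n_chars)
print(word)

joined_plain <- paste0("R", "Studio")                    # paste with no separator
joined_dash  <- paste(c("a", "b", "c"), collapse = "-")  # collapse a vector into one string
print(joined_plain)
print(joined_dash)
```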
Statistical Probability Functions
➢ R provides extensive statistical probability functions, allowing programmers to
analyze and work with probability distributions.
➢ These functions cover the normal, binomial, Poisson, and uniform distributions, among others.
➢ We can calculate cumulative probabilities, quantiles, and densities and generate
random numbers using these functions.
1. pnorm(): It calculates a given number's cumulative probability (area under the curve) in a standard
   normal distribution.
   Example:
   x <- 4.78
   cum_prob <- pnorm(x)
   print(cum_prob)
   Output: 0.9999991
2. qnorm(): It calculates a given probability's quantile (inverse cumulative probability) in a standard
   normal distribution.
   Example:
   x <- 0.75
   quant <- qnorm(x)
   print(quant)
   Output: 0.6744898
3. dnorm(): It calculates a given number's probability density in a standard normal distribution.
   Example:
   x <- 1.43
   dens <- dnorm(x)
   print(dens)
   Output: 0.1435046
4. rnorm(): It generates random numbers from a standard normal distribution.
   Example:
   rnum <- rnorm(10)
   print(rnum)
   Output (the values differ on every run): 0.1017659 -1.5608056 -0.8775891 -1.3254433 -0.1229205
   -0.3095503 0.5354532 0.6446904 -0.8226329 -0.5761977
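The p/q/d/r naming pattern shown for the normal distribution carries over to the other distributions mentioned above. The sketch below is an illustration with parameter values chosen here; dbinom(), ppois() and punif() are the base R functions for the binomial, Poisson and uniform distributions.

```r
set.seed(42)                      # fix the seed so the random draws are reproducible
draws <- rnorm(1000)              # 1000 draws from the standard normal
print(mean(draws))                # close to 0
print(sd(draws))                  # close to 1

# pnorm() and qnorm() are inverses of each other.
p <- pnorm(1.96)                  # about 0.975
q <- qnorm(p)                     # back to 1.96
print(c(p, q))

# Same pattern, other distributions.
p_three_heads <- dbinom(3, size = 10, prob = 0.5)  # P(exactly 3 heads in 10 fair tosses)
p_at_most_two <- ppois(2, lambda = 3)              # P(X <= 2) for a Poisson with mean 3
p_uniform     <- punif(0.25, min = 0, max = 1)     # P(U <= 0.25) on [0, 1]
print(c(p_three_heads, p_at_most_two, p_uniform))
```

Calling set.seed() before rnorm() is what makes a random example repeatable from run to run.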
Other Statistical Functions
1. cor(): It measures the correlation coefficient between two given vectors, that is, the strength and
   direction of the linear relationship between the two variables.
   Example:
   x <- c(1, 5, 15, 20)
   y <- c(2, 6, 18, 24)
   corr <- cor(x, y)
   print(corr)
   Output: 0.9996147
2. var(): It computes the sample variance of a given vector.
   Example:
   x <- c(5, 7, 9, 12, 15)
   varn <- var(x)
   print(varn)
   Output: 15.8
3. cov(): It measures the covariance between two vectors.
   Example:
   x <- c(1, 2, 3, 4, 5)
   y <- c(6, 7, 8, 9, 10)
   covr <- cov(x, y)
   print(covr)
   Output: 2.5
4. median(): It computes the sample median of a given numeric vector.
   Example:
   df <- c(1, 2, 7, 12, 15)
   med_value <- median(df)
   print(med_value)
   Output: 7
5. sd(): It computes the standard deviation of a given set of values.
   Example:
   df <- c(1, 3, 5, 12, 20)
   std_dev <- sd(df)
   print(std_dev)
   Output: 7.79102
6. range(): It returns a vector with two elements representing a given dataset's minimum and maximum values.
   Example:
   df <- c(1, 3, 4, 5, 9, 10)
   rang <- range(df)
   print(rang)
   Output: 1 10
7. diff(): It computes the lagged differences between consecutive elements in a given vector.
   Example:
   x <- c(4, 8, 12, 16, 20)
   dif <- diff(x)
   print(dif)
   Output: 4 4 4 4
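The functions above are related to one another. As a rough sketch, reusing the vectors from the examples, the code below recomputes the sample variance by hand (dividing by n - 1, as var() does) and rebuilds the correlation from the covariance and the standard deviations.

```r
x <- c(5, 7, 9, 12, 15)

# Sample variance by hand: squared deviations from the mean, divided by n - 1.
manual_var <- sum((x - mean(x))^2) / (length(x) - 1)
print(manual_var)
print(var(x))

# The standard deviation is the square root of the variance.
print(sqrt(var(x)))
print(sd(x))

# Correlation is covariance rescaled by both standard deviations.
a <- c(1, 5, 15, 20)
b <- c(2, 6, 18, 24)
manual_cor <- cov(a, b) / (sd(a) * sd(b))
print(manual_cor)
print(cor(a, b))
```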
Other Useful Functions
1. unique(): It extracts only the unique elements or rows from the input object and returns a vector,
   data frame, or array with duplicate elements removed.
   Example:
   x <- c(1, 2, 3, 2, 4, 1, 4, 3, 5)
   unique_values <- unique(x)
   print(unique_values)
   Output: 1 2 3 4 5
2. sort(): It sorts a vector in ascending order by default.
   Example:
   x <- c(5, 2, 7, 1, 4, 9, 8)
   sort_data <- sort(x)
   print(sort_data)
   Output: 1 2 4 5 7 8 9
3. rev(): It returns the reversed version of a data object.
   Example:
   x <- c(39, 40, 41, 42, 43, 44, 45)
   rev_x <- rev(x)
   print(rev_x)
   Output: 45 44 43 42 41 40 39
4. length(): It determines the length, or the number of elements, in a vector or an object.
   Example:
   x <- c(1, 2, 3, 4, 5, 12, 15, 18)
   x_length <- length(x)
   print(x_length)
   Output: 8
Data Visualization in R
• Data visualization is the technique of delivering insights from data using visual cues
such as graphs, charts, maps, and many others.
• It makes large quantities of data easier to understand and
thereby supports better decisions.
• Data Visualization in R Programming Language
➢ Popular data visualization tools include Tableau, Plotly, R, Google
Charts, Infogram, and Kibana. The various data visualization platforms have
different capabilities, functionality, and use cases.
➢ R is a language that is designed for statistical computing, graphical data analysis,
and scientific research.
➢ It is often preferred for data visualization because its packages offer flexibility and require
little coding.
R provides a series of packages for data visualization. These packages
are as follows:
Types of Data Visualizations
1. Bar Plot
➢ There are two types of bar plots, horizontal and vertical, which represent data
points as horizontal or vertical bars with lengths proportional to the value of
the data item.
➢ They are generally used for continuous and categorical variable plotting.
➢ By setting the horiz parameter to TRUE or FALSE, we can get horizontal or vertical
bar plots respectively.
# Vertical bar plot
barplot(airquality$Ozone,
        main = 'Ozone Concentration in air',
        xlab = 'ozone levels',
        horiz = FALSE)
# Horizontal bar plot
barplot(airquality$Ozone,
        main = 'Ozone Concentration in air',
        xlab = 'ozone levels',
        horiz = TRUE)
Bar plots are used for the following scenarios:
•To perform a comparative study between the various data categories in the data set.
•To analyze the change of a variable over time in months or years.
Types of Data Visualizations contd…
2. Histogram
➢ A histogram is like a bar chart as it uses bars of varying height to represent data
distribution.
➢ However, in a histogram values are grouped into consecutive intervals called bins.
➢ In a Histogram, continuous values are grouped and displayed in these bins whose size
can be varied.
Histograms are used in the following scenarios:
•To verify an equal and symmetric distribution of the data.
•To identify deviations from expected values.
data(airquality)
hist(airquality$Temp,
     main = "La Guardia Airport's\nMaximum Temperature(Daily)",
     xlab = "Temperature(Fahrenheit)",
     xlim = c(50, 125), col = "yellow",
     freq = TRUE)
The parameter xlim specifies the
range of x-axis values to be displayed.
When freq is set to TRUE, the y-axis shows the frequency (count) of
values in each bin; when it is set to FALSE,
the y-axis shows probability densities.
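To see what the freq setting changes, the bins can be computed without drawing anything. The sketch below is an illustration that uses the plot = FALSE argument of hist() and inspects the counts and densities it returns; the variable names are chosen here.

```r
data(airquality)

# plot = FALSE returns the histogram object instead of drawing it.
h <- hist(airquality$Temp, plot = FALSE)

print(h$breaks)    # bin boundaries
print(h$counts)    # bar heights used when freq = TRUE
print(h$density)   # bar heights used when freq = FALSE

# Counts add up to the number of observations ...
total_count <- sum(h$counts)
print(total_count)

# ... while densities are scaled so that the total bar area is 1.
total_area <- sum(h$density * diff(h$breaks))
print(total_area)
```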
# freq = TRUE: counts on the y-axis
hist(airquality$Temp, main = "La Guardia Airport's\nMaximum Temperature(Daily)",
     xlab = "Temperature(Fahrenheit)",
     xlim = c(50, 125),
     col = "yellow",
     freq = TRUE)
# freq = FALSE: probability densities on the y-axis
hist(airquality$Temp, main = "La Guardia Airport's\nMaximum Temperature(Daily)",
     xlab = "Temperature(Fahrenheit)",
     xlim = c(50, 125),
     col = "yellow",
     freq = FALSE)
Types of Data Visualizations contd…
3. Box Plot
➢ A boxplot presents the statistical summary of the given data graphically.
➢ A boxplot depicts information like the minimum and maximum data point, the median
value, the first and third quartiles, and the interquartile range.
Box plots are used:
•To give a comprehensive statistical description of the data through a visual cue.
•To identify outliers, that is, points that lie far beyond the quartiles (by default in R, more than 1.5 times the interquartile range from the box).
data(airquality)
boxplot(airquality$Wind,
        main = "Average wind speed\nat La Guardia Airport",
        xlab = "Miles per hour", ylab = "Wind",
        col = "orange", border = "brown",
        horizontal = TRUE, notch = TRUE)
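The numbers behind a box plot can be inspected directly. The sketch below is an illustration using the base functions fivenum() and boxplot.stats() on the same wind data; the variable names are chosen here.

```r
data(airquality)
wind <- airquality$Wind

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(wind)
print(five)

# boxplot.stats() returns the values boxplot() actually draws.
bs <- boxplot.stats(wind)
print(bs$stats)   # lower whisker end, lower hinge, median, upper hinge, upper whisker end
print(bs$out)     # points beyond the whiskers, drawn individually as outliers

# Width of the box (distance between the hinges).
box_width <- bs$stats[4] - bs$stats[2]
print(box_width)
```

Points appear in bs$out when they lie more than 1.5 times the box width beyond a hinge, which is the default rule boxplot() applies.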
Types of Data Visualizations contd…
4. Scatter Plot
➢ A scatter plot is composed of many points on a
Cartesian plane. Each point denotes the values taken by
two variables and helps us easily identify the
relationship between them.
Scatter Plots are used in the following scenarios:
•To show whether an association exists between bivariate
data.
•To measure the strength and direction of such a
relationship.
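As a small illustration of both uses, the sketch below draws a scatter plot of the built-in cars dataset, adds a fitted straight line, and reports the correlation coefficient as a numeric measure of strength and direction. The choice of dataset and the trend line are additions made here.

```r
data(cars)

# Strength and direction of the linear relationship.
r <- cor(cars$speed, cars$dist)
print(r)   # positive: stopping distance tends to grow with speed

# The scatter plot itself, with a fitted straight line for reference.
plot(cars$speed, cars$dist,
     xlab = "Speed", ylab = "Stopping distance",
     main = "Speed and stopping distance")
abline(lm(dist ~ speed, data = cars), col = "red")
```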
Application Areas:
• Presenting analytical conclusions of the data to the non-analyst departments of your
company.
• Health monitoring devices use data visualization to track any anomaly in blood pressure,
cholesterol and others.
• To discover repeating patterns and trends in consumer and marketing data.
• Meteorologists use data visualization for assessing prevalent weather changes throughout
the world.
• Real-time maps and geo-positioning systems use visualization for traffic monitoring and
estimating travel time.
Data Visualization using R Base Package
• 1) Scatter Diagram
• 2) Line Chart
• 3) Bar Chart
• 4) Histogram
• 5) Boxplot
• 6) correlation matrix
Data Visualization using R Base Package
# mtcars dataset
mtcars
# pressure dataset
pressure
# airquality dataset
airquality
# scatterplot
plot(mtcars$mpg, mtcars$disp)
# change the X and Y labels and also give some title
plot(mtcars$disp, mtcars$mpg, xlab = "disp", ylab = "mpg", main = "disp and mpg")
# To change the symbol, use pch (plot character)
# In R base plot functions, two further options are available: lty stands for line type, and lwd for line width.
plot(mtcars$disp, mtcars$mpg, xlab = "disp", ylab = "mpg", main = "disp and mpg", pch = 2)
# To change the color of the symbol, use col and bg; HTML hex color codes can also be used
plot(mtcars$disp, mtcars$mpg, xlab = "disp", ylab = "mpg", main = "disp and mpg",
     pch = 21, col = "red3", bg = "slateblue3", lwd = 5)
# plotting by colour by group
plot(mtcars$disp, mtcars$mpg, xlab = "disp", ylab = "mpg", main = "disp and mpg",
     pch = 21, col = mtcars$cyl, bg = "slateblue3", lwd = 5)
# line chart
pressure
plot(pressure$temperature, pressure$pressure)
# to add lines
plot(pressure$temperature, pressure$pressure, type = "l")
# create line chart with points together
plot(pressure$temperature, pressure$pressure, type = "b")
# to change line type
plot(pressure$temperature, pressure$pressure, type = "l", lty = 1, lwd = 3, col = "red",
     main = "line chart of temperature and pressure",
     xlab = "temperature", ylab = "pressure")
# when we change to both
plot(pressure$temperature, pressure$pressure, type = "b", lty = 1, lwd = 3, col = rainbow(7), pch = 3,
     main = "line chart of temperature and pressure",
     xlab = "temperature", ylab = "pressure")
# Bar chart
# for a categorical variable
# we want to see the frequency distribution
# using barplot()
barplot(mtcars$cyl)   # draws one bar per car, not a frequency distribution
# need to aggregate the data first
barplot(table(mtcars$cyl))
barplot(table(mtcars$cyl), main = "Bar chart showing distribution of \n number of cylinders",
        xlab = "No of cylinders", ylab = "Frequency", col = "pink", border = "blue")
barplot(table(mtcars$cyl), main = "Bar chart showing distribution of \n number of cylinders",
        xlab = "No of cylinders", ylab = "Frequency", col = "pink", border = NA)
barplot(table(mtcars$cyl), main = "Bar chart showing distribution of \n number of cylinders",
        xlab = "No of cylinders", ylab = "Frequency", col = c("pink", "blue", "red"), border = NA)
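The aggregation step is what turns raw values into bar heights. The short sketch below, with variable names chosen here, shows what table() returns for the cylinder column and how prop.table() converts the counts into shares.

```r
# Count how many cars have 4, 6 and 8 cylinders.
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)          # 11 cars with 4, 7 with 6, 14 with 8 cylinders

# The same information as proportions of all cars.
cyl_share <- prop.table(cyl_counts)
print(round(cyl_share, 2))
```

Passing cyl_share instead of cyl_counts to barplot() would draw the same bars on a proportion scale.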
Topic : Some Important previous year questions