
Sanju - R


DATA ANALYTICS USING R LANGUAGE
PRACTICAL FILE

Submitted to: Er. Anju Godara, Assistant Professor, CSE Department
Submitted by: Sanjana, CSE 6th Semester, 21098116780054
Index
Sr.No   Program Name   Date   Signature

1   Introduction to R programming and how to install RStudio and packages in R.
2   How to install RStudio and packages in R, and a few basic commands to get started, such as the
    install package, find, help, remove and description commands.
3   Explore the datasets mtcars, airquality, AirPassengers and iris, see the list of datasets available
    in the package, write a description of each dataset including its number of attributes and
    instances, and apply the summary(), head(), tail() and str() functions to these datasets.
4   Write an R program to create a vector of a specified type and length. Create vectors of numeric,
    complex, logical and character types of length 6.
5   Write a program to find the sum, mean and product of a vector.
6   Write an R program to create a factor corresponding to the heights in the women data set, which
    contains the height and weight for a sample of women.
7   Write an R program to create a list of elements using vectors, matrices and functions. Print the
    content of the list.
8   Write an R program to create a data frame which contains details of 5 employees and display a
    summary of the data.
9   Write an R program to create inner, outer, left and right joins (merge) from two given data frames.
10  Write an R program to create a bell curve of a random normal distribution.
11  Program to visualize data using a bar chart and a box plot.
12  Program to visualize data using a histogram and a scatter plot.
13  Write an R program to create a simple bar plot of five subjects' marks.
Practical No.1

Aim: Introduction to R programming and how to install RStudio and packages in R

Introduction to R programming:

R is a programming language and free software environment developed by Ross Ihaka and Robert
Gentleman in 1993. It offers an extensive catalog of statistical and graphical methods, including
machine learning algorithms, linear regression, time series analysis and statistical inference, to
name a few. Most R libraries are written in R itself, but C, C++ and Fortran code is preferred for
computationally heavy tasks.
R is trusted not only in academia; many large companies also use it, including Uber, Google,
Airbnb and Facebook.
Data analysis with R proceeds in a series of steps: programming, transforming, discovering,
modeling and communicating the results.
Program: R is a clear and accessible programming tool.
Transform: R is made up of a collection of libraries designed specifically for data science.

Discover: Investigate the data, refine your hypotheses and analyze them.

Model: R provides a wide array of tools to capture the right model for your data.
Communicate: Integrate code, graphs and outputs into a report with R Markdown, or build Shiny
apps to share with the world.
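
The steps above can be sketched on the built-in mtcars data set. This is only an illustrative outline using base R; the kpl column and the choice of a simple linear model are assumptions made for the example, not part of the original practical.

```r
# Transform: derive a new column from the built-in mtcars data.
# kpl (kilometres per litre) is a hypothetical column added for illustration.
cars <- mtcars
cars$kpl <- cars$mpg * 0.425144

# Discover: see how fuel economy varies with the number of cylinders.
mean_mpg_by_cyl <- aggregate(mpg ~ cyl, data = cars, FUN = mean)
print(mean_mpg_by_cyl)

# Model: fit a simple linear regression of mpg on weight.
fit <- lm(mpg ~ wt, data = cars)
print(coef(fit))

# Communicate: draw a scatter plot with the fitted regression line.
plot(cars$wt, cars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight")
abline(fit)
```

In a real report the same code could sit inside an R Markdown document so that the text, table and plot are rendered together.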

What is R used for?

● Statistical inference
● Data analysis
● Machine learning algorithms

Installation of RStudio on Windows:

Step – 1: With base R installed, move on to installing RStudio. To begin, go to the RStudio
download page and click the download button for RStudio Desktop.
Step – 2: Click the link for the Windows version of RStudio and save the .exe file.

Step – 3: Run the .exe and follow the installation instructions.

3.a. Click Next on the welcome window.

3.b. Enter/browse the path to the installation folder and click Next to proceed.
3.c. Select the folder for the Start menu shortcut, or choose not to create shortcuts,
and then click Next.

3.d. Wait for the installation process to complete.


3.e. Click Finish to end the installation.

Install the R Packages

In RStudio, if you require a particular library, follow these instructions:
• First, run RStudio.
• Click on the Packages tab, then click Install. The Install Packages dialog box will appear. In
the Packages field, type the name of the package you want to install and then click Install. This
will install the package you searched for or give you a list of matching packages based on the
text you typed.
Program No.2

Aim: How to install R packages, and a few basic commands to get started, such as the
install package, find, help, remove and description commands.

Packages:
A package is a convenient way to organize work and share it with others. Typically,
a package includes code (not only R code!), documentation for the package and the
functions inside it, some tests to check that everything works as it should, and data sets.

Packages in R:

Packages in the R programming language are collections of R functions, compiled code and sample
data. They are stored under a directory called "library" within the R environment. R installs
a group of packages during installation, and only these default packages are available when
the R console starts. Other packages that are already installed must be loaded explicitly
before an R program can use them.

DATA MANAGEMENT: dplyr, tidyr, foreign, haven etc.

DATA VISUALIZATION: ggplot2, ggvis, lattice, igraph etc.

DATA PRODUCTS: shiny, slidify, knitr, markdown etc.

DATA MODELLING AND SIMULATION: MASS, forecast, bootstrap, broom, nlme, ROCR, party etc.

Install the R Packages:

In RStudio, if you require a particular library, follow these instructions:

• First, run RStudio.
• Click on the Packages tab, then click Install. The Install Packages dialog box will appear.
• In the Packages field, type the name of the package you want to install and then click
Install. This will install the package you searched for or give you a list of matching
packages based on the text you typed.

A few commands:

• Install an R package:
To install a package, pass its name to the following command:
install.packages("package name")
Example:
install.packages("ggplot2")

• To check what packages are installed on your computer:
installed.packages() returns a matrix with one row per installed package. To check for a
specific package, look for its name among the row names:
"package name" %in% rownames(installed.packages())
Example:
"ggplot2" %in% rownames(installed.packages())

• To remove a specific package:
To remove a package, pass its name to the following command:
remove.packages("package name")
Example:
remove.packages("ggplot2")

• For a description of a package:
To get details about a package, pass its name to the following command:
packageDescription("package name")
Example:
packageDescription("ggplot2")

• To find a package:
To find the directory where a package is installed, pass its name to the following command:
find.package("package name")
Example:
find.package("ggplot2")

• To install multiple packages at a time:
To install more than one package, pass a vector of package names to the following command:
install.packages(c("package name", "package name"))
Example:
install.packages(c("ggplot2", "tidyr", "dplyr"))

• To get help about a package:
To get help about a package, pass its name to the following command:
help(package = "package name")
Example:
help(package = "dplyr")

• Library of packages:
To get a list of all installed packages, use the following command:
library()
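
The commands above can be combined into a small check that reports which packages are missing before anything is installed. This is a minimal sketch: ensure_installed is a hypothetical helper name, and it installs only when install = TRUE is passed explicitly.

```r
# Hypothetical helper: report which of the requested packages are not
# installed, and install them only when install = TRUE.
ensure_installed <- function(pkgs, install = FALSE) {
  # requireNamespace() returns TRUE if the package can be loaded.
  present <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  absent <- pkgs[!present]
  if (install && length(absent) > 0) {
    install.packages(absent)
  }
  absent
}

# "stats" and "utils" ship with R, so no package is reported as absent here.
print(ensure_installed(c("stats", "utils")))
```

Calling ensure_installed(c("ggplot2", "dplyr"), install = TRUE) would then install only the packages that are not already present.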
Program No.3
Aim: Explore the datasets and write a description of the datasets mtcars, airquality, iris and
HairEyeColor, and see the list of datasets available in the package. Each description includes the
number of attributes and instances. Apply the summary(), head(), tail() and str() functions to these
datasets.

Datasets:
A dataset is a collection of data, usually presented as a table. R ships with many built-in datasets
that can be used as demo data to illustrate how R functions work.

The most commonly used built-in datasets include:

• airquality - New York Air Quality Measurements
• AirPassengers - Monthly Airline Passenger Numbers 1949-1960
• mtcars - Motor Trend Car Road Tests
• iris - Edgar Anderson's Iris Data

These are a few of the most used built-in datasets. To learn about the others,
see the documentation of the R datasets package.

Display R datasets
To display a dataset, use one of the following:

o The name of the dataset, typed directly at the console
o print(dataset name)

To list the available datasets, use data(); to load one into the workspace, use data(dataset name).

Description of Datasets:

1. HairEyeColor:
Description:
Distribution of hair and eye color and sex in 592 statistics students.
Usage:
HairEyeColor

Format:
A 3-dimensional array resulting from cross-tabulating 592 observations on 3 variables. The
variables and their levels are as follows:
No Name Levels
1 Hair Black, Brown, Red, Blond
2 Eye Brown, Blue, Hazel, Green
3 Sex Male, Female

Details:
The Hair × Eye table comes from a survey of students at the University of Delaware
reported by Snee (1974). The split by Sex was added by Friendly (1992a) for didactic
purposes.
This data set is useful for illustrating various techniques for the analysis of contingency
tables, such as the standard chi-squared test or, more generally, log-linear modelling, and
graphical methods such as mosaic plots, sieve diagrams or association plots.
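
The techniques mentioned above can be tried directly on the table. The following is a minimal sketch using base R only; collapsing over Sex before the test is a choice made for this example.

```r
# Collapse the 3-dimensional table over Sex to get the Hair x Eye table.
hair_eye <- margin.table(HairEyeColor, margin = c(1, 2))
print(hair_eye)

# Total number of students counted in the table.
print(sum(HairEyeColor))

# Chi-squared test of independence between hair color and eye color.
test <- chisq.test(hair_eye)
print(test)

# Mosaic plot of the full Hair x Eye x Sex table.
mosaicplot(HairEyeColor, main = "Hair, eye color and sex", color = TRUE)
```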

2. Iris:
The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by the
British statistician and biologist Ronald Fisher in his 1936 paper The use of multiple
measurements in taxonomic problems as an example of linear discriminant analysis. It is
sometimes called Anderson's Iris data set because Edgar Anderson collected the data to
quantify the morphologic variation of Iris flowers of three related species. Two of the three
species were collected in the Gaspé Peninsula "all from the same pasture, and picked on
the same day and measured at the same time by the same person with the same
apparatus".

Usage:
iris
iris3

Format:
iris is a data frame with 150 cases (rows) and 5 variables (columns) named Sepal.Length,
Sepal.Width, Petal.Length, Petal.Width, and Species. iris3 gives the same data arranged as
a 3-dimensional array of size 50 by 4 by 3, as represented by S-PLUS. The first dimension
gives the case number within the species subsample, the second the measurements with
names Sepal L., Sepal W., Petal L., and Petal W., and the third the species.

3. airquality:

Description:
Daily air quality measurements in New York, May to September 1973.

Usage:
airquality

Format:
A data frame with 153 observations on 6 variables.
[1] Ozone numeric Ozone (ppb)
[2] Solar.R numeric Solar R (lang)
[3] Wind numeric Wind (mph)
[4] Temp numeric Temperature (degrees F)
[5] Month numeric Month (1--12)
[6] Day numeric Day of month (1--31)

Details:
Daily readings of the following air quality values for May 1, 1973 (a Tuesday) to
September 30, 1973.
Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island
Solar.R: Solar radiation in Langleys in the frequency band 4000–7700 Angstroms from 0800 to
1200 hours at Central Park
Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport
Temp: Maximum daily temperature in degrees Fahrenheit at LaGuardia Airport

4. mtcars:
Description:
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel
consumption and 10 aspects of automobile design and performance for 32 automobiles
(1973–74 models).

Usage:
mtcars

Format:
A data frame with 32 observations on 11 (numeric) variables.

[1] mpg Miles/(US) gallon
[2] cyl Number of cylinders
[3] disp Displacement (cu.in.)
[4] hp Gross horsepower
[5] drat Rear axle ratio
[6] wt Weight (1000 lbs)
[7] qsec 1/4 mile time
[8] vs Engine (0 = V-shaped, 1 = straight)
[9] am Transmission (0 = automatic, 1 = manual)
[10] gear Number of forward gears
[11] carb Number of carburetors

A few commands for data exploration:

Library command:
Syntax: library(help = "datasets")

Summary command:
Syntax: summary(dataset name)

Structure command:
Syntax: str(dataset name)

View command:
Syntax: View(dataset name)

Head command:
Syntax: head(dataset name, n)
n = number of observations to show

Tail command:
Syntax: tail(dataset name, n)
n = number of observations to show

Column command:
Syntax: ncol(dataset name)

Row command:
Syntax: nrow(dataset name)

Data command:
Syntax: data(dataset name)
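
Applied to the datasets described above, these commands might be used as follows. This is a short sketch in base R; printing only three rows with head() and tail() is an arbitrary choice for the example.

```r
# The datasets package is attached by default; it is loaded here for clarity.
library(datasets)

# Number of instances (rows) and attributes (columns) of each data frame.
for (name in c("mtcars", "airquality", "iris")) {
  d <- get(name)
  cat(name, ":", nrow(d), "instances,", ncol(d), "attributes\n")
}

# Structure, summary, and the first and last rows of one dataset.
str(iris)
print(summary(iris))
print(head(iris, 3))
print(tail(iris, 3))

# HairEyeColor is a 3-dimensional table, so dim() is used instead of nrow()/ncol().
print(dim(HairEyeColor))

# airquality contains missing values, which summary() reports as NA's.
print(sum(is.na(airquality$Ozone)))
```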


Practical No.4
Aim: Write an R program to create a vector of a specified type and length. Create vectors of numeric,
complex, logical and character types of length 6.

# vector(mode, length) creates a vector filled with the default value of that mode
x = vector("numeric", 6)
print("Numeric Type:")
print(x)
# named z rather than c so that the built-in c() function is not masked
z = vector("complex", 6)
print("Complex Type:")
print(z)
l = vector("logical", 6)
print("Logical Type:")
print(l)
chr = vector("character", 6)
print("Character Type:")
print(chr)
Output:
Practical No.5

Aim: Write a program to find the sum, mean and product of a vector.

X = c(20, 30, 40, 10)
print("Sum:")
print(sum(X))
print("Mean:")
print(mean(X))
print("Product:")
print(prod(X))
Output:
Practical No. 6

Aim: Write an R program to create a factor corresponding to the heights in the women
data set, which contains the height and weight for a sample of women.

INPUT

data = women
print("Women data set of height and weights:")
print(data)
# cut() divides the height range into 4 equal-width intervals and returns a factor
height_f = cut(women$height, 4)
print("Factor corresponding to height:")
print(table(height_f))
Output:
Program No. 7

Aim: Write an R program to create a list of elements using vectors, matrices and functions. Print
the content of the list.

# The list holds a numeric vector, a character vector, a 2 x 2 matrix and a function
l = list(
  c(1, 3, 4, 6, 7, 10),
  month.abb,
  matrix(c(2, -6, 1, -4), nrow = 2),
  asin
)
print("Content of the list:")
print(l)
Output:
Program No. 8

Aim: Write an R program to create a data frame which contains details of 5 employees and display a
summary of the data.

Employees = data.frame(
  Name = c("Baliram", "Anil", "Muskan", "Neha", "LAURA MARTIN"),
  Gender = c("M", "M", "F", "F", "M"),
  Age = c(21, 23, 18, 20, 32),
  Designation = c("Clerk", "Manager", "Executive", "CEO", "ASSISTANT"),
  SSN = c("123-34-2346", "123-44-779", "556-24-433", "123-98-987", "679-77-576")
)
print("Summary of the data:")
print(summary(Employees))
Output:
Program No. 9

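Aim: Write an R program to create inner, outer, left and right joins (merge) from two given data frames.

df1 = data.frame(numid = c(13, 15, 11, 12))
df2 = data.frame(numid = c(14, 16, 12, 13))

# Inner join: keep only the numid values present in both data frames
print("Inner Join:")
result = merge(df1, df2, by = "numid")
print(result)

# Left outer join: keep every row of df1
print("Left outer Join:")
result = merge(df1, df2, by = "numid", all.x = TRUE)
print(result)

# Right outer join: keep every row of df2
print("Right outer Join:")
result = merge(df1, df2, by = "numid", all.y = TRUE)
print(result)

# Full outer join: keep every row of both data frames
print("Outer Join:")
result = merge(df1, df2, by = "numid", all = TRUE)
print(result)

# Cross join: every combination of rows from df1 and df2
print("Cross Join:")
result = merge(df1, df2, by = NULL)
print(result)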
Output:
Program No. 10

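Aim: Write an R program to create a bell curve of a random normal distribution.

# Draw 10000 values from a normal distribution with mean 500 and sd 100,
# round them down to integers, and plot how often each value occurs
n = floor(rnorm(10000, 500, 100))
counts = table(n)
barplot(counts)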
Output:
Practical No. 11

Aim: Write a program to visualize data using a bar chart and a box plot.

Bar Chart:
# x-axis values
x = c("A", "B", "C", "D", "E")
# y-axis values
y = c(1, 3, 6, 7, 8)
barplot(y, names.arg = x)

Box Plot:

boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders",
        ylab = "Miles Per Gallon", main = "R Boxplot Example")
Output:
Practical No. 12
Aim: Write a program to visualize data using a histogram and a scatter plot.

Histogram:

v = c(60, 10, 39, 17, 55, 13, 21, 38, 16, 24, 12)
hist(v, xlab = "Weight", ylab = "Frequency", col = "yellow", border = "red")

Scatterplot:

x <- c(4, 6, 7, 7, 1, 2, 10, 3, 11, 12, 9, 8)
y <- c(99, 85, 86, 88, 150, 105, 88, 94, 78, 77, 80, 100)
plot(x, y, main = "Observation of Cars", xlab = "Car age", ylab = "Car speed")
Output:
Program No. 13

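Aim: Write an R program to create a simple bar plot of five subjects' marks.

# The fifth subject ("Maths", 80) is an assumed value, added so that the data
# matches the five subjects named in the aim
marks = c(75, 90, 85, 70, 80)
# With horiz = FALSE the subjects run along the x-axis and the marks along the y-axis
barplot(marks,
        main = "Comparing marks of 5 subjects",
        xlab = "Subject",
        ylab = "Marks",
        names.arg = c("Science", "English", "GK", "Hindi", "Maths"),
        col = "darkred",
        horiz = FALSE)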
Output:
