Sanju - R
LANGUAGE
PRACTICAL FILE
Introduction to R programming:
R is a programming language and free software environment created by Ross Ihaka and Robert Gentleman in
1993. R offers an extensive catalog of statistical and graphical methods, including machine
learning algorithms, linear regression, time series analysis and statistical inference, to name a few. Most
R libraries are written in R itself, but C, C++ and Fortran code is preferred for computationally
heavy tasks.
R is trusted not only in academia; many large companies also use the R programming language,
including Uber, Google, Airbnb and Facebook.
Data analysis with R proceeds in a series of steps: programming, transforming, discovering,
modeling and communicating the results.
Program: R is a clear and accessible programming tool.
Transform: R offers a collection of libraries designed specifically for data science.
Discover: Investigate the data, refine your hypotheses and analyze them.
Model: R provides a wide array of tools to capture the right model for your data.
Communicate: Integrate code, graphs and output into a report with R Markdown, or build Shiny apps
to share with the world.
3.b. Enter/browse the path to the installation folder and click Next to proceed.
3.c. Select the folder for the start menu shortcut or click on do not create shortcuts
and then click Next.
Aim: Install R packages and define a few basic commands to get started, such as the
install, find, help, remove and description commands.
Packages:
A package is a convenient way to organize work and share it with others. Typically,
a package includes code (not only R code!), documentation for the package and the
functions inside it, tests to check that everything works as it should, and data sets.
Packages in R:
Packages in the R programming language are collections of R functions, compiled code and sample
data. They are stored in a directory called "library" within the R environment. R installs
a set of packages during installation, but when the R console starts, only
the default packages are available. Other packages that are already installed
must be loaded explicitly before an R program can use them.
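As a minimal sketch of explicit loading, the following uses the tools package, which ships with R but is assumed here not to be attached in a fresh session:

```r
# Packages attached in the current session
print(search())

# 'tools' is installed with R but is not attached by default,
# so it has to be loaded explicitly with library()
library(tools)

# After loading, the package appears on the search path
print("package:tools" %in% search())
```

The same pattern applies to any installed package: library(name) attaches it for the rest of the session.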
In RStudio, if you require a particular library, then you can go through the following instructions:
Few commands:
• Install an R package:
To install a package, pass its name to the following command:
install.packages("package name")
Example:
install.packages("ggplot2")
• To check what packages are installed on your computer:
installed.packages() does not take a package name. It returns a matrix describing every installed
package, so a specific package is checked against the row names of that matrix:
installed.packages()
Example:
"ggplot2" %in% rownames(installed.packages())
• To remove a package:
remove.packages("package name")
Example:
remove.packages("ggplot2")
• To see the description of a package:
packageDescription("package name")
Example:
packageDescription("ggplot2")
• To find a package:
To find where a package is installed, pass its name to the following command:
find.package("package name")
Example:
find.package("ggplot2")
• To install several packages at once:
Example:
install.packages(c("ggplot2", "tidyr", "dplyr"))
• To get help on a package:
Example:
help(package = "dplyr")
• Library of packages:
To get a list of all installed packages, we use the following command:
library()
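The commands above can be tried without installing or removing anything by pointing them at stats, a package that ships with R. A minimal sketch; is_installed is a hypothetical helper name, not an R function:

```r
# Hypothetical helper (not part of R): TRUE if the named package is installed
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

# 'stats' ships with every R installation
print(is_installed("stats"))

# Directory where the package is installed
print(find.package("stats"))

# Version field from the package's DESCRIPTION file
print(packageDescription("stats")$Version)
```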
Program No.3
Aim: Explore the datasets and write descriptions of the mtcars, airquality, iris and HairEyeColor datasets,
and see the list of datasets available in a package. Each description includes the
number of attributes and instances. Apply the summary(), head(), tail() and str() functions to these
datasets.
Datasets:
A dataset is a collection of data presented in a table. The R programming language has many built-in datasets
that can be used as demo data to illustrate how R functions work.
In R, there are many datasets we can try, but the most commonly used built-in datasets are:
These are a few of the most used built-in datasets. To learn about other built-in datasets,
please visit The R Datasets Package.
Display R datasets
To display the dataset, we simply use the following commands:
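A minimal sketch of listing and displaying built-in datasets, assuming the standard datasets package that ships with R:

```r
# data(package = ...) lists the datasets in one package;
# the $results component holds that listing as a character matrix
ds <- data(package = "datasets")$results
print(head(ds[, c("Item", "Title")]))

# Printing a dataset by name displays it; head() limits the output to 6 rows
print(head(mtcars))
```

Calling data() with no arguments lists the datasets of all attached packages.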
Description of Datasets:
1. HairEyeColor:
Description:
Distribution of hair and eye color and sex in 592 statistics students.
Usage:
HairEyeColor
Format:
A 3-dimensional array resulting from cross-tabulating 592 observations on 3 variables. The
variables and their levels are as follows:
No  Name  Levels
1   Hair  Black, Brown, Red, Blond
2   Eye   Brown, Blue, Hazel, Green
3   Sex   Male, Female
Details:
The Hair × Eye table comes from a survey of students at the University of Delaware
reported by Snee (1974). The split by Sex was added by Friendly (1992a) for didactic
purposes.
This data set is useful for illustrating various techniques for the analysis of contingency
tables, such as the standard chi-squared test or, more generally, log-linear modelling, and
graphical methods such as mosaic plots, sieve diagrams or association plots.
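The array structure and the chi-squared test mentioned above can be sketched as follows, using only base R functions:

```r
# Dimensions: 4 hair colours x 4 eye colours x 2 sexes
print(dim(HairEyeColor))

# Total number of students counted in the table
print(sum(HairEyeColor))

# Collapse over Sex to get the two-way Hair x Eye table
hair_eye <- margin.table(HairEyeColor, c(1, 2))
print(hair_eye)

# Chi-squared test of independence between hair and eye colour
print(chisq.test(hair_eye))
```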
2. Iris:
The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by the
British statistician and biologist Ronald Fisher in his 1936 paper The use of multiple
measurements in taxonomic problems as an example of linear discriminant analysis. It is
sometimes called Anderson's Iris data set because Edgar Anderson collected the data to
quantify the morphologic variation of Iris flowers of three related species. Two of the three
species were collected in the Gaspé Peninsula "all from the same pasture, and picked on
the same day and measured at the same time by the same person with the same
apparatus".
Usage:
iris
iris3
Format:
iris is a data frame with 150 cases (rows) and 5 variables (columns) named Sepal.Length,
Sepal.Width, Petal.Length, Petal.Width, and Species. iris3 gives the same data arranged as
a 3-dimensional array of size 50 by 4 by 3, as represented by S-PLUS. The first dimension
gives the case number within the species subsample, the second the measurements with
names Sepal L., Sepal W., Petal L., and Petal W., and the third the species.
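The two layouts described above can be compared directly; a minimal sketch:

```r
# iris: one row per flower, four measurements plus the species
print(dim(iris))
print(table(iris$Species))

# iris3: the same measurements as a case x measurement x species array
print(dim(iris3))
print(dimnames(iris3)[[2]])
```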
3. Airquality:
Description
Daily air quality measurements in New York, May to September 1973.
Usage:
airquality
Format:
A data frame with 153 observations on 6 variables.
[1] Ozone numeric Ozone (ppb)
[2] Solar.R numeric Solar R (lang)
[3] Wind numeric Wind (mph)
[4] Temp numeric Temperature (degrees F)
[5] Month numeric Month (1--12)
[6] Day numeric Day of month (1--31)
Daily readings of the following air quality values for May 1, 1973 (a Tuesday) to
September 30, 1973.
Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island
Solar.R: Solar radiation in Langleys in the frequency band 4000–7700 Angstroms from 0800 to
1200 hours at Central Park
Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport
Temp: Maximum daily temperature in degrees Fahrenheit at LaGuardia Airport
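Some of the daily readings in airquality are missing (NA), which affects functions such as mean(). A minimal sketch of checking the size of the data frame and handling the missing values:

```r
# Number of observations (rows) and variables (columns)
print(dim(airquality))

# Count the missing values in each column
print(colSums(is.na(airquality)))

# na.rm = TRUE skips the missing readings when averaging
print(mean(airquality$Ozone, na.rm = TRUE))
```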
4. Mtcars:
Description:
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel
consumption and 10 aspects of automobile design and performance for 32 automobiles
(1973–74 models).
Usage:
mtcars
Format:
command:
Head command:
Column command:
Row command:
Data command:
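The functions named in the aim can be tried on mtcars as follows; a minimal sketch using only base R:

```r
print(summary(mtcars))  # minimum, quartiles, mean and maximum of each column
print(head(mtcars))     # first 6 rows
print(tail(mtcars))     # last 6 rows
str(mtcars)             # compact structure: type and first values of each column
print(ncol(mtcars))     # number of attributes (columns)
print(nrow(mtcars))     # number of instances (rows)
```

The same calls work unchanged on iris and airquality; HairEyeColor is an array, so ncol() and nrow() report only its first two dimensions.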
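# Create an empty vector of length 6 for each basic type
x = vector("numeric", 6)
print("Numeric Type:")
print(x)
# Named cplx rather than c to avoid masking the c() function
cplx = vector("complex", 6)
print("Complex Type:")
print(cplx)
l = vector("logical", 6)
print("Logical Type:")
print(l)
chr = vector("character", 6)
print("Character Type:")
print(chr)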
Output:
Practical No.5
X = c(20, 30, 40, 10)
print("Sum:")
print(sum(X))
print("Mean:")
print(mean(X))
print("Product:")
print(prod(X))
Output:
Practical No. 6
INPUT
data = women
print("Women data set of height and weights:")
print(data)
# Split the heights into 4 equal-width intervals
height_f = cut(women$height, 4)
print("Factor corresponding to height:")
print(table(height_f))
Output:
Program No. 7
Aim: Write an R program to create a list of elements using vectors, matrices and a function. Print the content
of the list.
l = list(
  c(1, 3, 4, 6, 7, 10),
  month.abb,
  matrix(c(2, -6, 1, -4), nrow = 2),
  asin
)
print("Content of the list:")
print(l)
Output:
Program No. 8
Aim: Write an R program to create a data frame which contains details of 5 employees and display
a summary of the data.
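A minimal sketch of such a program. The employee names, ages, departments and salaries below are invented placeholder values, not data from this practical:

```r
# Hypothetical details of 5 employees (all values are made up for illustration)
employees <- data.frame(
  name       = c("Asha", "Ravi", "Meena", "John", "Sara"),
  age        = c(28, 35, 41, 30, 25),
  department = c("HR", "IT", "Finance", "IT", "Sales"),
  salary     = c(30000, 45000, 52000, 40000, 28000),
  stringsAsFactors = FALSE
)

print("Details of the employees:")
print(employees)

print("Summary of the data:")
print(summary(employees))
```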
Aim: Write an R program to create inner, outer, left and right joins (merge) from two given data frames.
print(result)
print(result)
print("Outer Join:")
print(result)
print("Cross Join:")
print(result)
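The merge() calls that produce each result are not shown above. A minimal sketch of the joins, assuming two hypothetical data frames df1 and df2 that share an id column (names and values are invented for illustration):

```r
# Two hypothetical data frames sharing the key column "id"
df1 <- data.frame(id = c(1, 2, 3, 4), name = c("A", "B", "C", "D"))
df2 <- data.frame(id = c(3, 4, 5, 6), score = c(70, 80, 90, 60))

# Inner join: only ids present in both data frames
inner <- merge(df1, df2, by = "id")
# Outer join: every id from either data frame
outer <- merge(df1, df2, by = "id", all = TRUE)
# Left join: every id from df1
left <- merge(df1, df2, by = "id", all.x = TRUE)
# Right join: every id from df2
right <- merge(df1, df2, by = "id", all.y = TRUE)
# Cross join: every row of df1 paired with every row of df2
cross <- merge(df1, df2, by = NULL)

print(inner)
print(outer)
print(left)
print(right)
print(nrow(cross))
```

Rows without a match in the other data frame are filled with NA in the outer, left and right joins.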
Output:
Program No. 10
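# 10000 normal draws with mean 500 and standard deviation 100, rounded down
n = floor(rnorm(10000, 500, 100))
# Count how often each value occurs
t = table(n)
barplot(t)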
Output:
Practical No. 11
Aim: Write a program to visualize data using a bar chart and a box plot.
Bar Chart:
# x-axis values
x = c("A", "B", "C", "D", "E")
# y-axis values
y = c(1, 3, 6, 7, 8)
barplot(y, names.arg = x)
Box Plot:
v = c(60,10,39,17,55,13,21,38,16,24,12)
# boxplot() draws the box plot named in the aim; hist() would draw a histogram instead
boxplot(v, ylab = "Weight", col = "yellow", border = "red")
Scatterplot:
x <- c(4,6,7,7,1,2,10,3,11,12,9,8)
y <- c(99,85,86,88,150,105,88,94,78,77,80,100)
plot(x, y, main="Observation of Cars", xlab="Car age", ylab="Car speed")
Output:
Program No. 13
Aim: Write an R program to create a simple bar plot of five subjects' marks.
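A minimal sketch of such a program. The subject names and marks are invented placeholder values:

```r
# Hypothetical marks in five subjects (values are made up for illustration)
marks <- c(Maths = 78, Science = 85, English = 69, History = 74, Computer = 92)

# barplot() uses the vector names as bar labels and
# returns the x positions of the bar midpoints
mids <- barplot(
  marks,
  main = "Marks in five subjects",
  xlab = "Subject",
  ylab = "Marks",
  col  = "steelblue",
  ylim = c(0, 100)
)
```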