Unit1-R (1)
For data analysis, we need proper tools. Extracting patterns directly from a large set of
numbers arranged in rows and columns is almost impossible. To work with data
productively, we need tools such as R.
Introduction to R
The R programming language originated in 1993. Adoption of R started in the data-related
research industry and has been growing rapidly for the last decade. Today the R
programming language has become the lingua franca of data science.
R as a programming language
The R programming language has been evolving and developing for more than 20 years. The
goal is to make the language easy and flexible so that complex statistical computing, data
exploration, and visualization operations can be performed.
Ease of use and flexibility are conflicting goals. A tool can help you finish a variety of
statistical analysis tasks by clicking a few buttons, but it won't be flexible if you need
customization or automation, or if your work needs to be reproducible. On the other
hand, a programming language can be flexible so that you can transform data and make
complicated graphs, but it may not be easy to learn. R is known for striking a good balance
between the two.
R as a computing environment
R is lightweight and ready to use. R is smaller and easier to deploy than other
statistical software such as MATLAB and SAS.
The R programming language has gained importance in the data science community for the
following reasons:
Free of charge: R is available free of charge. In other words, you do not need to buy a
license to install R or to use it commercially.
This is a confidential publication. All rights reserved. This document may not, in a whole 1
or in part, be copied, reproduced, translated, photocopied, or reduced to any medium
without prior and express written consent from Samatrix Consulting Pvt Ltd.
Open-source: R is open source. Thousands of developers around the globe work
constantly to add new packages, review the source code, and fix bugs. The source code is
also available, so you can dig into it to fix a bug or improve the functionality of a
package.
Popular: R is a popular programming language for statistical analysis, data mining, and
visualization.
Reproducible: When using software based on a graphical user interface, you only need to
choose from menus and click buttons. However, it is hard to accurately reproduce what you
have done automatically without writing scripts. In R, an analysis is written as a script, so
it can be rerun step by step to produce the same results.
Rich online resources: R is known for its huge, rapidly increasing number of online
resources. There are more than 7,500 packages available at CRAN (short for Comprehensive
R Archive Network), a worldwide network of mirror servers from which you can get
identical, up-to-date R distributions and packages.
Strong community: The R community consists not only of R developers but also, in the
majority, of R users from a wide range of backgrounds such as statistics, econometrics,
finance, bioinformatics, mechanical engineering, physics, medicine, and so on.
A great number of R developers actively contribute to open source projects or packages
written in R. The goal of the community is to make data analysis, exploration, and
visualization easier and more interesting.
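To illustrate the reproducibility point above, here is a minimal sketch of an analysis written as a script. The data are simulated and the variable names are hypothetical, chosen only for illustration; fixing the random seed is one common way to make such a script give the same numbers on every run.

```r
# A tiny analysis written as a script, so every step is recorded and repeatable.
set.seed(42)                                  # fix the random seed so results repeat
heights <- rnorm(100, mean = 170, sd = 10)    # simulated data (hypothetical)

# Compute two summary statistics and print them.
summary_stats <- c(mean = mean(heights), sd = sd(heights))
print(round(summary_stats, 2))
```

Running the script again from the top reproduces exactly the same values, which is difficult to guarantee when the same steps are performed by clicking through menus.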
Installing R
If you are a Windows user, you can download an installer for the latest version. Then run the
Windows installer to install R. Even though the installation process is easy, many users face
issues during the installation.
When choosing the components to install, the installer displays four components in the
Windows drop-down. Keep the default options as shown below.
The next step is to select additional tasks. Keep the default options.
R is now installed on your system. You can use R either in the command prompt or
in the R GUI.
Even though you can start using R directly, we recommend RStudio for editing and
debugging R scripts. R is the back end and RStudio is the front end.
RStudio
RStudio is an integrated development environment (IDE) that serves as a user interface for
R programming. It is open source and available for free on multiple platforms such as
Windows, macOS, and Linux.
RStudio is known for its powerful features that boost productivity in data analysis and
visualization. RStudio supports various advanced features such as syntax highlighting,
autocompletion, multi-tabbed views, file management, graphics viewport, package
management, integrated help viewer, code formatting, version control, interactive
debugging, and many more.
The screenshot of the user interface of RStudio for the Windows operating system is given
below. The main window consists of several parts. Each part is known as a pane and
performs a different function. The panes have been designed to help data analysts work
with the data.
The Console
The R console is embedded in RStudio. It works like a command prompt or terminal.
RStudio submits the commands that you type at the console to the R engine, which is
responsible for executing them. In other words, RStudio passes the user's input to the
R engine and presents the results back to the user.
You can use the console to execute a command, define a variable, or evaluate an expression
interactively to compute a statistical measure, transform data, or produce charts.
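The console tasks described above can be sketched as follows. This is only an illustration: the vector `x`, its values, and the bar labels are hypothetical. Each line can be typed at the console prompt one at a time.

```r
x <- c(12, 15, 9, 20, 18)    # define a variable (hypothetical sample values)
x * 2                        # evaluate an expression
mean(x)                      # compute a statistical measure: 14.8
log_x <- log(x)              # transform the data
barplot(x, names.arg = c("a", "b", "c", "d", "e"))   # produce a chart
```

When typed interactively, the result of each expression is printed immediately below the command, and the chart appears in the plots pane of RStudio.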
The Editor
While working with data, we not only type commands at the console but also write scripts
in the editor. A script is a set of commands that represents a logic flow. The editor is
useful for editing R scripts, markdown documents, web pages, and many other types of
configuration files.
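As a rough illustration of what a script is, the sketch below writes three commands to a temporary file and then runs them in order with `source()`. The file name and variable names are hypothetical; in practice you would type the commands in the editor and save the file yourself rather than creating it with `writeLines()`.

```r
# Create a small script file (stands in for a file saved from the editor).
script_path <- file.path(tempdir(), "summary_script.R")   # hypothetical file name
writeLines(c(
  "values <- c(3, 5, 8, 13)",
  "total <- sum(values)",
  "average <- mean(values)"
), script_path)

# Run every command in the script, from top to bottom.
source(script_path)
print(total)      # 29
print(average)    # 7.25
```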
The code editor is more advanced than a plain text editor. It supports features
such as syntax highlighting, autocompletion of R code, and debugging with
the breakpoint. You may also use the following shortcut keys:
The environment pane
The environment pane shows the variables and functions that have been created and that
are available for repeated use. By default, it shows the variables in the global environment,
which is the user workspace where you are working.
Whenever you create a new object, a new entry appears in the environment pane. You
can see the variable name and a short description of its value. When you change the
value of a symbol, the change is reflected in the environment pane.
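What the environment pane displays can also be inspected from code. The following is a small sketch with hypothetical object names; `ls()` lists the objects in the current workspace and `str()` prints a short description similar to the one shown in the pane.

```r
scores <- c(70, 85, 90)           # a numeric vector (hypothetical values)
double_it <- function(v) v * 2    # a user-defined function

ls()                              # names of the objects in the workspace
str(scores)                       # short description of the value

scores <- double_it(scores)       # changing the value updates the entry
rm(double_it)                     # removing an object removes its entry
```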
The history pane
The history pane shows the expressions previously evaluated in the console. You can
repeat tasks that were performed previously from the history pane, or by simply pressing
the up arrow key in the console.
The files pane
In the files pane, you can see the files in a folder, navigate between folders, create new
folders, and delete or rename folders and files. When you work on an RStudio project,
you can view and organize the project files in the files pane.
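The same file operations are available as base R functions. The sketch below works inside a temporary folder so that it does not touch your own files; the folder and file names are hypothetical.

```r
work_dir <- file.path(tempdir(), "demo_project")    # hypothetical folder name
dir.create(work_dir, showWarnings = FALSE)          # create a new folder
file.create(file.path(work_dir, "notes.txt"))       # create an empty file
file.rename(file.path(work_dir, "notes.txt"),
            file.path(work_dir, "readme.txt"))      # rename the file

files_now <- list.files(work_dir)                   # list the folder contents
print(files_now)

unlink(work_dir, recursive = TRUE)                  # delete the folder and its files
```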
The plots pane
You can use the plots pane to see the graphics produced by R code. If there is more than
one plot, previous plots are stored. You can view all the plots by navigating back and forth.
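As a brief illustration, the two commands below each produce a plot from `cars`, a small data set that ships with R. In RStudio, each plot appears in the plots pane, and the arrow buttons let you move between them. The axis labels and title are illustrative choices.

```r
# Scatter plot of stopping distance against speed.
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Histogram of speed; hist() also returns the computed bins.
h <- hist(cars$speed, main = "Distribution of speed", xlab = "Speed (mph)")
```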
The packages pane
You can view all the installed packages in the packages pane. From this pane, you can
install or update a package from CRAN, or remove an existing package from your library.
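The package operations above correspond to a handful of R functions, sketched below. The calls that would modify your library are commented out, and `jsonlite` is used only as an example package name; substitute whichever package you need.

```r
installed <- rownames(installed.packages())   # names of the installed packages
head(installed)                               # show the first few names
"stats" %in% installed                        # check whether a package is installed
library(stats)                                # load an installed package

# The calls below change your library, so they are commented out here.
# install.packages("jsonlite")    # install a package from CRAN
# update.packages(ask = FALSE)    # update installed packages
# remove.packages("jsonlite")     # remove a package from the library
```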
The Help Pane
The R platform provides detailed documentation, which you can find in the Help
pane. Using this documentation, you can learn how to use the functions. There are
several ways to look up the documentation of a function:
• Type the function name in the Search box and find it directly
• Type the function name in the console and press F1
• Type ? before the function name in the console and execute it
In practice, you don't have to remember all of R's functions; you only need to remember
how to get help with a function you are not familiar with.
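The console-based ways of getting help can be sketched as follows. The functions looked up here (`mean`, `median`, `sd`) are arbitrary examples.

```r
?mean              # open the help page for mean()
help("median")     # the same request written as a function call
example(mean)      # run the examples from the help page of mean()

# List the argument names of a function without opening its help page.
sd_args <- names(formals(sd))
print(sd_args)
```

In RStudio, the first two commands display the documentation in the Help pane.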
The Viewer Pane
The Viewer pane is a new feature; it was introduced as an increasing number of R packages
combine the functionality of both R and existing JavaScript libraries to make rich and
interactive presentations of data.