
1. Getting Started with R

For data analysis, we need proper tools. Extracting patterns directly from a large set of
numbers arranged in rows and columns is almost impossible. To work with data, we need
tools such as R to boost productivity.

Introduction to R

The R programming language is used in statistical computing, data exploration, analysis, and
visualization. R is free and open-source, and it has a strong and rapidly growing community. It
has more than 17,000 packages that enable R to deal with problems in a wide range of fields.

The R programming language originated in 1993. The adoption of R, which started in the
data-related research industry, has been growing rapidly for the last decade. Today, the R
programming language has become the lingua franca of data science.

R is not just a programming language. It is a comprehensive computing environment that is
supported by a strong and active community and has a rapidly growing ecosystem.

R as a programming language

The R programming language has been evolving over the last 20 years. The goal is to make
the language easy and flexible so that complex statistical computing, data exploration, and
visualization operations can be performed.

Ease of use and flexibility are conflicting goals. A tool can help you finish a variety of
statistical analysis tasks with a few button clicks, but it won't be flexible if you need
customization or automation, or if your work needs to be reproducible. On the other hand, a
programming language can be flexible enough to let you transform data and make complicated
graphs, but it may not be easy to learn. R is known for striking a good balance between
the two.

R as a computing environment

R is lightweight and ready to use. It is smaller and easier to deploy than other
statistical software such as MATLAB and SAS.

The need for R

The R programming language has gained importance in the data science community for the
following reasons:

Free of charge: R is available free of charge. In other words, you need not buy a license to
install R or to use it commercially.
This is a confidential publication. All rights reserved. This document may not, in a whole 1
or in part, be copied, reproduced, translated, photocopied, or reduced to any medium
without prior and express written consent from Samatrix Consulting Pvt Ltd.
Open-source: R is open source. Thousands of developers around the globe work constantly to
add new packages, review the source code, and fix bugs. Because the source code is
available, you can dig into it to fix a bug or improve the functionality of a package.

Popular: R is a popular programming language for statistical analysis, data mining, and
visualization.

Flexible: R supports dynamic scripting. It allows programming styles in multiple paradigms,
including functional programming and object-oriented programming. It also supports
flexible metaprogramming. Its flexibility enables you to perform highly customized and
comprehensive data transformation and visualization.
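
As a rough illustration of the paradigms mentioned above, the following sketch uses only base R. The names new_circle, area, and expr are made up for this example and are not part of R.

```r
# Functional style: pass an anonymous function to sapply()
squares <- sapply(1:5, function(x) x^2)

# Object-oriented style (S3): a constructor, a generic, and a method
new_circle <- function(radius) {
  structure(list(radius = radius), class = "circle")
}
area <- function(shape) UseMethod("area")
area.circle <- function(shape) pi * shape$radius^2
unit_area <- area(new_circle(1))

# Metaprogramming: capture an expression without evaluating it,
# then evaluate it later with values supplied in a list
expr <- quote(a + b)
result <- eval(expr, list(a = 1, b = 2))
```

Each of the three parts can be run on its own at the console.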

Reproducible: When using software based on a graphical user interface, you only need to
choose from menus and click buttons. However, it is hard to reproduce accurately and
automatically what you have done without writing scripts. In R, the analysis is written as a
script, so it can be rerun to repeat the same steps.
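
A minimal sketch of what a reproducible script can look like is given here. It assumes simulated data (the scores are generated, not real) and uses set.seed() so that the random numbers are the same on every run.

```r
# Fix the random seed so the simulated data is the same on every run
set.seed(42)
scores <- rnorm(100, mean = 50, sd = 10)
first_run <- c(mean = mean(scores), sd = sd(scores))

# Rerunning the same steps with the same seed gives the same summary
set.seed(42)
scores_again <- rnorm(100, mean = 50, sd = 10)
second_run <- c(mean = mean(scores_again), sd = sd(scores_again))

identical(first_run, second_run)
```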

Rich Online Resources: R is known for the huge, rapidly increasing number of online
resources. There are more than 7,500 packages available at CRAN (short for Comprehensive
R Archive Network), a worldwide network of mirror servers from which you can get
identical, up-to-date, R distributions and packages.

Strong community: The R community consists not only of R developers but also of R users (the
majority) from a wide range of backgrounds such as statistics, econometrics, finance,
bioinformatics, mechanical engineering, physics, medicine, and so on.
A great number of R developers actively contribute to open-source projects or packages
written in R. The goal of the community is to make data analysis, exploration, and
visualization easier and more interesting.

Installing R

To install R, go to the official website (https://www.r-project.org/), click download R
(https://cran.r-project.org/mirrors.html), choose a nearby mirror (for India,
https://mirror.niser.ac.in/cran/), select the download for your operating system, select base
as the subdirectory, and click “Download R 3.2.3 for Windows”. The latest version at the time
of writing was 3.2.3. It may be different when you install R.
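
Once R is installed, one way to confirm which version you have is to ask R itself. The lines below are a small sketch using base R only. The comparison against 3.2.3 simply reuses the version number mentioned above.

```r
# Full version string of the running R installation
R.version.string

# Version as an object that can be compared
installed_version <- getRversion()

# TRUE if the installed version is 3.2.3 or newer
installed_version >= "3.2.3"
```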

If you are a Windows user, you can download an installer for the latest version and run it
to install R. Even though the installation process is easy, many users face issues during
the installation.

When you choose the components to install, the Windows installer displays four components
in a drop-down. Install the default options as shown below.

The next step is to select additional tasks. Select the default options.

The installer now starts copying the files to your hard drive.

Now R has been installed on your system. You can either use R in the command prompt or
in the R GUI.

Even though you can start using R directly, we recommend RStudio for editing and
debugging R scripts. R is the back end and RStudio is the front end.

Windows users may also install Rtools from http://cran.rstudio.com/bin/windows/Rtools/.
With Rtools, you can write C++ code, compile it, and call it from R. You can also use C/C++
code from other sources.

RStudio

RStudio is an integrated development environment (IDE) for R programming. It is open-source
and available for free on multiple platforms such as Windows, Mac, and Linux.

RStudio is known for its powerful features to boost productivity in data analysis and
visualization. RStudio supports various advanced features such as syntax highlighting,
autocompletion, multi-tabbed views, file management, graphics viewport, package
management, integrated help viewer, code formatting, version control, interactive
debugging, and many more.

RStudio can be downloaded from https://www.rstudio.com/products/rstudio/download.
The preview version with new features can be downloaded from
https://www.rstudio.com/products/rstudio/download/preview. Note that RStudio does not
include R, so make sure that R is installed before you start working in RStudio.

Once you complete the installation of RStudio and start it, you see the following user
interface of RStudio.

RStudio’s User Interface

The screenshot of the user interface of RStudio for the Windows operating system is given
below. The main window consists of several parts, each known as a pane, and each pane
performs a different function. The panes have been designed to help data analysts work
with the data.

The Console

The R Console is embedded in RStudio. It works like a command prompt or terminal.
RStudio submits the commands that you type at the console to the R engine, which is
responsible for executing them. In other words, RStudio passes the user's input to the R
engine and presents the results back to the user.

You can use the console to execute a command, define a variable, or evaluate an expression
interactively to compute a statistical measure, transform data, or produce charts.
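
The kinds of commands described above might look like the following when typed at the console one line at a time. This is only a sketch with made-up numbers; the variable names are chosen for illustration.

```r
# Define a variable
x <- c(4, 8, 15, 16, 23, 42)

# Evaluate an expression interactively
sum(x) / length(x)

# Compute a statistical measure
avg <- mean(x)

# Transform data: center on the mean and scale by the standard deviation
scaled <- (x - avg) / sd(x)

# Produce a chart
barplot(x, names.arg = seq_along(x), main = "Values of x")
```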

The Editor

While working with data, we not only type commands at the console but also write scripts
(sets of commands that represent a logic flow) in the editor. The editor is useful for
editing R scripts, markdown documents, web pages, and many other types of files, such as
configuration files.

The code editor is more advanced than a plain text editor. It supports advanced
functionalities such as syntax highlighting, autocompletion of R code, and debugging with
breakpoints. You may also use the following shortcut keys:

• Ctrl + Enter – Execute the current line or the selected lines
• Ctrl + Shift + S – Source the current document, that is, evaluate all the expressions in
the current document
• Tab or Ctrl + Space – Show an autocompletion list of variables and functions, matching as
you type
• Breakpoint – Click in the left margin next to a line number to set a breakpoint. When
you execute the script, the program pauses at this line and waits for you to debug.
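
Sourcing a document can also be done from code with the base R function source(). The sketch below writes a tiny two-line script to a temporary file (the file contents and variable names are invented for this example) and then evaluates all of its expressions in a separate environment.

```r
# Write a small script to a temporary file
script_path <- tempfile(fileext = ".R")
writeLines(c("radius <- 2",
             "circle_area <- pi * radius^2"),
           script_path)

# Evaluate every expression in the script, keeping its variables
# in their own environment instead of the global workspace
script_env <- new.env()
source(script_path, local = script_env)

# Variables created by the script are now available in script_env
script_env$circle_area
```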

The environment pane

The environment pane exhibits the variables and functions that have been created and that
are available for repeated use. By default, variables are shown in the global environment,
which is the user workspace where you are working.

Whenever you create a new object, you can find a new entry in the Environment pane. You
can see the variable name and the short description of its values. When you change the
value of a symbol, the change is reflected in the environment pane.
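
What the environment pane displays can also be inspected from code. The following is a minimal sketch with a made-up variable named sales. When run at the console, ls() lists the objects in the global environment.

```r
# Create a new object; at the console it appears in the environment pane
sales <- c(120, 95, 143)

# List the objects in the current environment
had_sales <- "sales" %in% ls()

# Show a short description of the value, similar to what the pane shows
str(sales)

# Change the value; the pane reflects the change
sales <- sales * 2

# Remove the object; its entry disappears
rm(sales)
```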

The history pane

In the history pane, you can see the expressions previously evaluated in the console. You
can also repeat a task that was performed previously by pressing the up arrow key in the
console.

The File Pane

In the file pane, you can see the files in a folder, navigate between folders, create new
folders, and delete or rename folders and files. When you work on an RStudio project, you
can view and organize the project files in the File pane.
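
The same file operations are available as base R functions. The sketch below works inside a temporary folder so that nothing on your disk is touched permanently. The folder and file names are invented for this example.

```r
# Create a new folder inside the session's temporary directory
project_dir <- file.path(tempdir(), "demo_project")
dir.create(project_dir, showWarnings = FALSE)

# Create an empty file, then rename it
old_path <- file.path(project_dir, "analysis.R")
new_path <- file.path(project_dir, "clean_data.R")
file.create(old_path)
file.rename(old_path, new_path)

# See the files in the folder
files_now <- list.files(project_dir)

# Delete the folder and everything in it
unlink(project_dir, recursive = TRUE)
```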

The plots pane

You can use the plots pane to see the graphics produced by R code. If there is more than
one plot, previous plots are stored. You can view all the plots by navigating back and forth.
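
As a small sketch, the two base R plotting calls below each produce a graphic. In RStudio they would appear one after another in the plots pane, where you can move back and forth between them. The height and weight values are made up for illustration.

```r
# Made-up sample data
heights <- c(150, 160, 165, 170, 172, 180)
weights <- c(52, 58, 63, 70, 68, 80)

# First plot: a scatter plot
plot(heights, weights,
     xlab = "Height (cm)", ylab = "Weight (kg)",
     main = "Weight vs. height")

# Second plot: a histogram; the scatter plot stays in the plot history
hist(weights, xlab = "Weight (kg)", main = "Distribution of weight")
```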

The package pane

You can view all the installed packages in the package pane. From this pane, you can
install or update packages from CRAN, or remove an existing package from your library.
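
The package pane corresponds to a handful of base R functions. The sketch below lists installed packages and loads one. The install, update, and remove calls are left as comments because they need an internet connection and change your library, and jsonlite is only an example package name.

```r
# Names of all packages installed in your library
installed <- rownames(installed.packages())

# Check whether a particular package is installed
has_stats <- "stats" %in% installed

# Load an installed package so that its functions can be used
library(stats)

# The following calls need a network connection and modify your library:
# install.packages("jsonlite")   # install a package from CRAN
# update.packages(ask = FALSE)   # update installed packages
# remove.packages("jsonlite")    # remove a package from your library
```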

The Help Pane

R provides detailed documentation. You can find the documentation in the Help pane. Using
this documentation, you can learn how to use the functions.

Ways to view the documentation of a function are:

• Type the function name in the Search box and find it directly
• Type the function name in the console and press F1
• Type ? before the function name and execute it

In practice, you don't have to remember all of R's functions; you only need to remember
how to get help with a function you are not familiar with.
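
The ways of getting help from the console can be sketched as follows, using mean and sd purely as example functions. All of the calls are part of base R.

```r
# Open the documentation page of a function
?mean

# The same thing written as a function call
help("sd")

# Show just the arguments a function accepts
sd_args <- args(sd)
sd_args

# Run the examples from a function's documentation page
example(mean)
```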

The Viewer Pane

The Viewer pane is a new feature; it was introduced as an increasing number of R packages
combine the functionality of both R and existing JavaScript libraries to make rich and
interactive presentations of data.
