DSC551 (PROGRAMMING FOR DATA SCIENCE)
CHAPTER 1
INTRODUCTION TO R PROGRAMMING
PREPARED BY: DR NIK NUR FATIN FATIHAH BINTI SAPRI
What is R programming?
R is a programming language and environment specifically designed for
statistical computing and data analysis.
R provides a wide variety of statistical techniques (linear and nonlinear
modelling, classical statistical tests, time-series analysis, classification,
clustering, …) and graphical techniques, and is highly extensible.
How to install R programming?
YOU CAN CHOOSE EITHER TO USE R OR R STUDIO
TYPE OF R PROGRAMMING INSTALLATION
R: https://cran.r-project.org/bin/windows/base/R-4.3.1-win.exe
R Studio: https://posit.co/download/rstudio-desktop/
Instructions
1) Go to https://www.rstudio.com/products/rstudio/download/
2) In the ‘Installers for Supported Platforms’ section, choose and click the R Studio installer based on your operating system. The download should begin as soon as you click.
3) Click Next..Next..Finish.
4) Download Complete.
Features of R Studio
[Screenshot of the R Studio window with four labelled panes: R SCRIPT, R CONSOLE, R ENVIRONMENT, GRAPHICAL PLOT]
Features of R (Old Version)
[Screenshot of the R window with labelled parts: R SCRIPT, R CONSOLE, R PROMPT]
Note that the “+” symbol indicates an incomplete command.
Why Learn R?
• Freely available, open-source tool.
• Widely used for statistical data analysis, data manipulation, data mining and machine learning.
• Very flexible and highly customizable.
• Excellent for data visualization.
• Popular for academic research.
• Has a large community of users.
Application of R?
• Sentiment Analysis
• Stock-Market Modeling
• Behaviour Analysis
• Software Development
• Business Intelligence
• Fraud Detection
• Weather Forecasting
• Record Keeping
How to install R packages
1) Select the CRAN mirror (country server)
2) Choose any country server
3) Select Install packages
4) Choose the package to be installed
Otherwise, after selecting the CRAN mirror, you can just type
install.packages("packageName")
Eg: install.packages("MASS")
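The command-line route can be sketched as below. This is a minimal sketch that assumes an internet connection and a reachable CRAN mirror. The requireNamespace() check is an addition not shown in the slides, used here only to skip the download when the package is already installed.

```r
# Install MASS only if it is not already available
# (assumption: a CRAN mirror can be reached from this machine).
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS")
}

# Installing is done once; loading is done in every new session.
library(MASS)

# Confirm that the package is attached to the search path.
mass_loaded <- "package:MASS" %in% search()
print(mass_loaded)
```

Note that install.packages() needs the package name in straight quotes; curly quotes copied from a slide or word processor cause a syntax error.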
How to install R packages
1) At the graphical output pane, click on the “Packages” tab.
2) Type the package name in the search field and tick (/) the box next to it.
3) If the package is NOT LISTED, click Install and type the package name in the search field of the dialog that opens.
Basic Built-in functions in R
Functions Description
q() To quit R
c() To combine elements
rm(object) To remove the object from the current environment
str(object) To check the structure of the data
objects() To list the names of all objects
typeof(object) To check type of data/object
length(object) To check the number of observations
is.numeric(object) To check if the data is numeric
is.na(object) To check missing data/values
is.character(object) To check if the data is character/non numeric
mode(object) To check the mode/type of the data
mean(object) To compute the mean of a certain numerical object
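As an illustration, several of the functions in the table can be tried on one small vector. The vector `scores` and its values are hypothetical, chosen only to include a missing value.

```r
# Hypothetical sample vector with one missing value (not from the slides).
scores <- c(4, 8, NA, 15)

str(scores)                              # compact display of the structure
type_of_scores <- typeof(scores)         # "double"
n_scores       <- length(scores)         # 4 (the NA is counted as an element)
numeric_check  <- is.numeric(scores)     # TRUE
char_check     <- is.character(scores)   # FALSE
missing_flags  <- is.na(scores)          # FALSE FALSE TRUE FALSE
mode_of_scores <- mode(scores)           # "numeric"

# mean() returns NA when a value is missing, unless na.rm = TRUE is set.
mean_with_na    <- mean(scores)                  # NA
mean_without_na <- mean(scores, na.rm = TRUE)    # 9

objects()    # lists the names of the objects created so far
rm(scores)   # removes 'scores' from the current environment
```

The q() function is left out of the sketch on purpose, because calling it ends the R session.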
Type of Data/Object in R
Data Types   Description                                                     Example
Logical      Also known as the Boolean data type.                            TRUE and FALSE
Numeric      Represents all real numbers, with or without decimal values.    12.3, 5, 999
Integer      Specifies whole numbers, i.e. values without decimal points     -8, 0, 40, 100, 888
             (typed with an L suffix in R, e.g. 40L).
Complex      Specifies values with a real part and an imaginary part.        3 + 2i
Character    Specifies character or string values in a variable.             “A”, “Apple”, “Hello”, “Welcome to Programming Class”
You can use the class() function to check the data type of an object.
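A short sketch of checking each data type with class(). The name `type_check` is hypothetical. One point worth noting is that a whole number typed without the L suffix is stored as numeric, not integer.

```r
# Collect the class of one example value per data type.
type_check <- c(
  logical_value   = class(TRUE),      # "logical"
  numeric_value   = class(12.3),      # "numeric"
  whole_number    = class(5),         # "numeric": no L suffix, so not an integer
  integer_value   = class(40L),       # "integer": the L suffix requests integer storage
  complex_value   = class(3 + 2i),    # "complex"
  character_value = class("Apple")    # "character"
)
print(type_check)
```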
Type of Operators in R
Arithmetic:  +  -  /  *  ^  %%  %/%
Relational:  <  >  <=  >=  ==  !=
Logical:     !  &  &&  |  ||
Assignment:  Left assignment (<-, =, <<-) and right assignment (->, ->>)
Assignment
Operator                       Description                                 Example
Left assignment: <-, =, <<-    To assign values to data/object/vectors.    x<-6, x=6, x<<-6
Right assignment: ->, ->>      To assign values to data/object/vectors.    5->x, 4->>x
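The assignment operators can be tried as in the sketch below. The function `add_one` and the variable `counter` are hypothetical names, introduced only to show how <<- differs from <-: it assigns to a variable in an enclosing environment instead of creating a local one.

```r
x <- 6     # left assignment (the most common style)
y = 7      # left assignment using =
8 -> z     # right assignment: the value is on the left, the name on the right

# <<- looks for 'counter' outside the function and updates it there.
counter <- 0
add_one <- function() {       # hypothetical helper, not from the slides
  counter <<- counter + 1
}
add_one()

print(c(x, y, z, counter))    # 6 7 8 1
```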
Arithmetic
Given x=10 and y=2, complete the output in the following table:
Operators Description Example/output
+ Addition x+y
- Subtraction x-y
* Multiplication x*y
/ Division y/x
^ Exponentiation x^y
%% Remainder of division y%%x
%/% Integer division y%/%x
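One way to check the answers is to run the expressions in R. The sketch below collects them in a named vector; the name `arithmetic_results` is hypothetical. Note that R is case-sensitive, so the variable must be written as lowercase y.

```r
x <- 10
y <- 2

arithmetic_results <- c(
  addition         = x + y,     # 12
  subtraction      = x - y,     # 8
  multiplication   = x * y,     # 20
  division         = y / x,     # 0.2
  exponentiation   = x ^ y,     # 100
  remainder        = y %% x,    # 2 (2 divided by 10 leaves a remainder of 2)
  integer_division = y %/% x    # 0 (10 goes into 2 zero whole times)
)
print(arithmetic_results)
```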
Relational
Given x=5 and y=16, complete the output in the following table:
Operators Description Example
< Less than x<y
> Greater than x>y
<= Less than or equal to x<=y
>= Greater than or equal to x>=y
== Equal to x==y
!= Not equal to x!=y
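The relational operators return a logical value, which can be checked with a sketch like the following. The name `relational_results` is hypothetical.

```r
x <- 5
y <- 16

relational_results <- c(
  less_than        = x < y,     # TRUE
  greater_than     = x > y,     # FALSE
  less_or_equal    = x <= y,    # TRUE
  greater_or_equal = x >= y,    # FALSE
  equal_to         = x == y,    # FALSE
  not_equal_to     = x != y     # TRUE
)
print(relational_results)
```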
Logical
Given k=c(TRUE,FALSE,0,6) and m=c(FALSE,TRUE,FALSE,TRUE), complete the output in the following table:
Operators Description Example/output
! Logical NOT (gives the opposite logical value). !k
& Logical AND, applied element by element. k&m
&& Logical AND for single values; gives TRUE only if both are TRUE. Since R 4.3.0, operands longer than one element cause an error (older versions used only the first element of each vector). k[1]&&m[1]
| Logical OR, applied element by element. k|m
|| Logical OR for single values; gives TRUE if at least one of them is TRUE. The same length-one restriction applies. k[1]||m[1]
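A sketch for checking the logical operators follows. Since R 4.3.0, && and || give an error when an operand has more than one element, so the sketch passes the first elements explicitly. The result names are hypothetical.

```r
# Mixing logical values and numbers gives the numeric vector 1 0 0 6.
k <- c(TRUE, FALSE, 0, 6)
m <- c(FALSE, TRUE, FALSE, TRUE)

# Zero counts as FALSE and any non-zero number counts as TRUE.
not_k   <- !k       # FALSE TRUE TRUE FALSE
k_and_m <- k & m    # FALSE FALSE FALSE TRUE
k_or_m  <- k | m    # TRUE TRUE FALSE TRUE

# && and || compare single values, so the first elements are selected here.
first_and <- k[1] && m[1]   # FALSE
first_or  <- k[1] || m[1]   # TRUE

print(list(not_k, k_and_m, k_or_m, first_and, first_or))
```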
Computations in R
Complete the following table:
Functions Output Notes
abs(3-6) abs ( ) – to find absolute value
sqrt(16)
3^10
exp(1.7)
log(10)
log10(100)
pi
round(pi,2)
floor(14.7)
ceiling(14.7)
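The table can be checked by running each function in R. The sketch below stores the results in a named vector; the name `computation_results` is hypothetical, and values marked "about" are rounded in the comments.

```r
computation_results <- c(
  absolute    = abs(3 - 6),      # 3
  square_root = sqrt(16),        # 4
  power       = 3^10,            # 59049
  exponential = exp(1.7),        # about 5.4739
  natural_log = log(10),         # about 2.3026 (log() is the natural logarithm)
  log_base_10 = log10(100),      # 2
  pi_value    = pi,              # about 3.1416
  rounded_pi  = round(pi, 2),    # 3.14
  floor_value = floor(14.7),     # 14 (rounds down)
  ceiling_val = ceiling(14.7)    # 15 (rounds up)
)
print(computation_results)
```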
Exercises
a) Install the “fBasics” package in R/R Studio.
b) List TWO (2) types of data/object. Provide an example for each type of data/object listed.
c) State TWO (2) advantages of R programming language.
d) Write the following expression in R.
e) Name the built-in functions used to tabulate/summarize the data according to classes/groups.
f) Given that the circumference of a cake is c = 11.3563 cm, use this information to find the output of
the following commands:
i. c1=round(c,2)
ii. floor(c1)
iii. ceiling(c1)