R Lect1 Introduction

R PROGRAMMING FOR DATA SCIENCES
Dr. Athira B
Lect1 - Introduction
Data Science
• “Analyze a huge volume of data about a
specific problem with the purpose of creating
patterns in scientific fields like Statistics,
Machine Learning and Pattern Recognition”.
• These patterns take multiple forms, such as
associations, anomalies, clusters and classes.
• These patterns are also termed models.
• Data are generated by shopping carts, medical
records, social media posts, banking and
stock market operations, and so on.
• Data come in a wide variety of types (images, videos,
real-time data, DNA sequences), a characteristic of Big Data.
• Methods and tools used: Hadoop, MapReduce,
Hive, MongoDB, GraphPD
• The two main goals of practical data science
are prediction and description: it creates models
that can be used both to predict and to describe data.
KNOWLEDGE DISCOVERY IN DATABASES (KDD)

• 5 steps: Data Collection, Preprocessing,
Transformation, Data Mining,
Interpretation/Evaluation
1. DATA COLLECTION
– either automatically, e.g. using sensors, or
manually, e.g. via a questionnaire
2. PREPROCESSING
– cleansing data: handling defective, false or missing
data.
3. TRANSFORMATION
– converting the data into a common format so that
they can be processed later. This step is mostly used
for smoothing data and removing noise.
4. DATA MINING
– an algorithm is used for model generation. Clean
and transformed data are now ready to be used
by an algorithm in order to create a model,
usually for categorization or prediction.
5. INTERPRETATION AND EVALUATION
– interpret and evaluate the results
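The five KDD steps can be illustrated with a toy sketch (in Python, purely for illustration). Everything in it is an assumption: the hard-coded readings stand in for collected sensor data, out-of-range values are treated as defective, a moving average serves as the smoothing transformation, and the "mining" step is deliberately trivial (the mean of the smoothed series used as a naive forecast), evaluated by mean absolute error on two hypothetical held-out readings.

```python
from statistics import mean

# Step 1, data collection: hypothetical hard-coded temperature readings (None = missing).
raw_readings = [21.0, 22.5, None, 23.0, 80.0, 22.0, 21.5]


def preprocess(values, low=-50.0, high=60.0):
    """Step 2: drop missing readings and readings outside a plausible range (treated as defective)."""
    return [v for v in values if v is not None and low <= v <= high]


def transform(values, window=3):
    """Step 3: smooth the series with a simple moving average to reduce noise."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def mine(values):
    """Step 4 (deliberately trivial): the 'model' is the mean of the smoothed series, used as a forecast."""
    return mean(values)


def evaluate(model, actual):
    """Step 5: mean absolute error of the forecast against held-out observations."""
    return mean(abs(model - a) for a in actual)


clean = preprocess(raw_readings)       # [21.0, 22.5, 23.0, 22.0, 21.5]
smoothed = transform(clean)            # three 3-point averages
model = mine(smoothed)
error = evaluate(model, [22.0, 23.0])  # hypothetical held-out readings
print(clean, round(model, 2), round(error, 2))
```

In a real project the mining step would be an actual learning algorithm (for example one of the methods below); the point here is only how each step feeds the next.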
• Examples
• Using the temperatures recorded during the summer
seasons of the previous 15 years, we try to predict
the summer temperatures of the next 15 years.
• Telecommunication companies reward not only clients who
spend lots of money but also clients referred to as “guides”.
• After 9/11, Bill Clinton announced that, after examining many
databases, FBI agents discovered that 5 of the perpetrators
were registered in them. One of them owned 30 credit cards
with a negative balance of $250,000 and had lived in the US
for less than two years.
• Finding a phone number from a phonebook
• Finding information about Paris on the
internet
• Finding the average of exam grades
• Searching for the medical records of a
patient with a particular disease in order to
further analyze that patient's medical record.
Data Mining Methods
• Data mining methods are classified into different categories,
depending on the data types and the type of knowledge extracted.
1. CLASSIFICATION:
– a predictive method; the resulting model is called a ‘classifier’
• In classification, the outcome we want to predict is the
class of the samples.
• A class takes discrete values from a finite set.
• In contrast, in prediction with methods like
regression, the target variable can be any real number.
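A minimal classification sketch in Python, assuming made-up two-feature samples whose labels come from the finite set {"low", "high"}. It uses a 1-nearest-neighbour rule, one simple kind of classifier chosen here only for brevity, not the method the lecture prescribes:

```python
import math

# Hypothetical labelled samples: (feature vector, class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
]


def classify(sample, data):
    """1-nearest-neighbour classifier: return the label of the closest training sample."""
    _, label = min(data, key=lambda pair: math.dist(pair[0], sample))
    return label


print(classify((2.0, 1.0), training))  # low
print(classify((7.0, 9.0), training))  # high
```

Whatever the input, the output is always one of the discrete labels seen in training, which is what distinguishes classification from regression.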
2. REGRESSION
• Regression is a process similar to classification,
whose goal is learning (in other words, training) a
function.
• It is also a predictive method.
• Using some independent variables, its goal
is to predict the values of a dependent
variable.
• Example: the independent variable is the area of a house in square meters
and the dependent variable is its selling price in thousands of dollars.
• Linear regression fits a line to the samples of the dataset.
• If the relationship is roughly linear, the best-fitting line lets us estimate
answers to questions like: “What is the selling price of a 150-square-meter house?”.
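The house-price example can be sketched in Python with ordinary least squares for a line price = a + b · area. The five (area, price) pairs below are invented for illustration; with them the fitted line predicts about 309.75 thousand dollars for a 150 m² house.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict(a, b, x):
    """Value of the fitted line at x."""
    return a + b * x


# Hypothetical data: house area in square meters, selling price in thousands of dollars.
areas = [50, 80, 100, 120, 200]
prices = [112, 168, 211, 249, 410]

a, b = fit_line(areas, prices)
print(round(predict(a, b, 150), 2))  # 309.75
```

Here the slope is about 1.99 thousand dollars per extra square meter; the estimate is only as good as the assumption that price grows linearly with area.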
3. CLUSTERING
• Clustering is a descriptive method.
• Given a dataset, the goal of clustering is to
create clusters (groups of samples with the same or
similar features).
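A minimal k-means sketch in Python; k-means is one common clustering algorithm, used here only as an example. The points, k = 2, the iteration cap and the naive initialisation with the first k points are all assumptions for illustration:

```python
import math


def kmeans(points, k, max_iter=20):
    """Plain k-means: assign each point to its nearest centroid, then move centroids to cluster means."""
    centroids = list(points[:k])  # naive initialisation with the first k points (an assumption)
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; keep the old one if a cluster is empty.
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (9, 9), (1.5, 2), (8, 8), (1, 0.5), (9.5, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 0.5)], [(9, 9), (8, 8), (9.5, 8.5)]]
```

No labels are given in advance: the two groups emerge from the similarity of the points alone, which is why clustering is a descriptive rather than predictive method.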
4. EXTRACTION AND ASSOCIATION ANALYSIS
• Association rules discover hidden relationships
between features of a dataset.
• A classic example of association rules in practice is
market basket analysis in a supermarket, where
the data are customer transactions.
• E.g., consider the transactions {bread, milk}, {bread, diapers, beer,
eggs}, {milk, diapers, beer, soda}, {bread, milk, diapers, beer}
and {bread, milk, diapers, soda}.
• A candidate rule could be that whoever buys milk and bread
also buys eggs and soda; such rules are then checked against the transactions.
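Association rules are commonly scored by support (the fraction of transactions that contain an itemset) and confidence (how often the consequent appears among the transactions that contain the antecedent). The Python sketch below computes both on the five transactions listed above. On these five baskets the rule {milk, bread} → {eggs, soda} has confidence 0, since none of the three baskets with milk and bread contains eggs or soda, whereas {diapers} → {beer} reaches 3/4.

```python
# The five transactions from the slide.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "soda"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "soda"},
]


def support_count(itemset, db):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in db if itemset <= t)


def support(itemset, db):
    """Fraction of transactions that contain itemset."""
    return support_count(itemset, db) / len(db)


def confidence(antecedent, consequent, db):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support_count(antecedent | consequent, db) / support_count(antecedent, db)


print(support({"milk", "bread"}, transactions))                        # 0.6
print(confidence({"milk", "bread"}, {"eggs", "soda"}, transactions))  # 0.0
print(confidence({"diapers"}, {"beer"}, transactions))                 # 0.75
```

Algorithms such as Apriori avoid scoring every possible rule by only extending itemsets whose support is already above a chosen threshold.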
5. VISUALIZATION
• Data visualization helps us understand not only
the data themselves but also the correlations
that may exist between them.
6. ANOMALY DETECTION
• Anomaly detection focuses on finding deviations in
data relative to similar data collected in the past
or to the typical values of these data.
• Some examples of anomaly detection are the
following:
• Fraud detection based on a user profile
• Finding dysfunctional objects in industrial production
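One simple way to flag deviations from past data is a z-score test: a new value is reported when it lies more than a chosen number of standard deviations from the historical mean. The Python sketch assumes a hypothetical history of one user's transaction amounts and a threshold of 3, in the spirit of profile-based fraud detection; the threshold and the data are illustrative choices, not recommendations.

```python
from statistics import mean, stdev


def find_anomalies(history, new_values, threshold=3.0):
    """Return the new values whose z-score w.r.t. the history exceeds the threshold."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation of past data
    return [v for v in new_values if abs(v - mu) / sigma > threshold]


# Hypothetical past transaction amounts for one user (the "profile").
past_amounts = [20, 25, 22, 30, 18, 27, 24, 21, 26, 23]
new_amounts = [24, 29, 250]
print(find_anomalies(past_amounts, new_amounts))  # [250]
```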
