09 Handout 1

The document discusses data science, including what it is, its history and uses. It covers topics like analytics, machine learning, and the role of data scientists. Data science involves obtaining insights from data using skills like programming, business and analytics. It can be used for applications such as customer service, self-driving cars, and making predictions.

Uploaded by

oracion.rovjapheth


IT2302

INTRODUCTION TO DATA SCIENCE

Overview (Campbell, 2021)
Let us first understand what data science is before we delve into its various aspects. In simple
terms, data science applies mathematics and statistics to obtain useful and meaningful insights
and trends from raw data or information. Using programming, business, and
analytical skills, you can process and manage the data set. This sounds tough, and most people do not know
how to work with data science or how to develop the required skills effectively.
The field of data science goes back to its roots in statistics. Having said that, this field is a combination of
programming, business acumen, and statistics. It is essential to learn more about each topic to have an
idea of how you approach the learning process. The art of finding any hidden insights and trends from the
data set goes way back. Ancient Egyptians analyzed census data to help them collect taxes efficiently.
They also used data analysis to forecast when there could be floods in the Nile. It is important to learn
from past data to identify trends or insights in the data set. This helps the business make informed
decisions.
Regardless of the industry, every company is looking for ways to manage and store large volumes of data.
This was a challenge for most companies until 2010. The introduction of Hadoop (a software framework
for distributed storage and processing of big data) and other platforms has given organizations an easier
way to store large volumes of data. Now, companies can focus on methods and solutions to process
this information, and data science provides those methods. This is why data science is often described
as the future of technology.
Data science is a mix of numerous algorithms, tools, principles, and languages to identify the hidden
patterns within the variables in the data set. This may lead you to wonder how this is different from what
has been done on data for years. The answer is that earlier, we could only use tools and algorithms to
explain the variables in the data set, but using data science, it becomes easier to predict the outcomes.
A data analyst uses a historical data set mainly to explain what has happened and what is happening in the present.
A data scientist, on the other hand, not only examines the data to obtain insights but also uses
advanced algorithms to estimate the probability that an event will occur. A data scientist looks at the data
from various angles and aspects.
Data science is used to make informed decisions based on predictions made using the existing data set.
It is centered on building, cleaning, and organizing datasets. On the other hand, data analytics pertains
to analyzing data to answer questions, extract insights, and identify trends. This is accomplished using
different tools, techniques, and frameworks that vary depending on the type of analysis being conducted
(Stobierski, 2021). Thus, you can apply numerous analytics to a data set to obtain information. We will
discuss these in brief in the subsequent sections.
Analytics
It is the systematic investigation of data or statistics. It is used to discover, interpret, and communicate
meaningful patterns in data.
• Predictive causal. If you want to develop a model to predict the possibilities or outcomes of a future
event, you need to use predictive causal analytics. Let us assume you work for a credit company, and
you loan people money based on their credit. You will be concerned with your customer's ability to
repay the amount you have lent to them. You can develop models to perform predictive analysis using
the payment history. This can help you determine if the customer will pay you on time or not.

09 Handout 1 *Property of STI

student.feedback@sti.edu

• Prescriptive. You may need to use a model to make the required decisions and modify the parameters
based on the data set or question. To do this, you need to use prescriptive analytics. This form of
analytics is more about providing the right information to make an informed decision. You can also use
this type of analytics to predict a range of associated outcomes and prescribed actions. An example of
this type of analytics is a self-driving car. You can run numerous algorithms (a procedure or formula for
solving a problem, based on conducting a sequence of specified actions) on the data collected from
the cars and use the results to make the car more intelligent. This makes it easier for the car to make
the right decisions to turn, slow down, speed up, or identify the direction to take.
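
The credit company example above can be made concrete with a very small sketch. It is only an illustration under stated assumptions, not the method a real lender would use: the function names, the sample payment histories, and the 0.5 decision threshold are all hypothetical, and the "model" is simply a smoothed on-time payment rate.

```python
def repayment_probability(payment_history):
    """Estimate the chance that the next payment arrives on time.

    payment_history is a list of booleans (True = paid on time).
    Laplace smoothing (+1 / +2) keeps the estimate away from 0 and 1
    when only a few payments have been observed.
    """
    on_time = sum(payment_history)
    return (on_time + 1) / (len(payment_history) + 2)


def likely_to_pay_on_time(payment_history, threshold=0.5):
    """Turn the probability into a yes/no answer (the threshold is an assumption)."""
    return repayment_probability(payment_history) >= threshold


# Hypothetical payment histories for two customers.
reliable = [True, True, True, False]
risky = [False, False, True, False]

print(repayment_probability(reliable), likely_to_pay_on_time(reliable))
print(repayment_probability(risky), likely_to_pay_on_time(risky))
```

A production model would use many more variables (income, loan size, credit score) and would be validated on held-out data before anyone relied on it.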
Data science and machine learning are both popular buzzwords today. These two (2) terms are often
thrown together but should not be used interchangeably. Although data science includes machine
learning, it is a vast field with many different tools.
Machine Learning
It is a group of computational algorithms that perform pattern recognition, classification, and prediction
by learning from existing data.

• Make predictions. Numerous machine-learning algorithms allow you to make predictions using
unstructured, semi-structured, and structured data sets. Let us assume you work for a finance company
and you have the transactional data available. You need to develop a model to determine the trend of
future transactions. To perform this analysis, you need to use a supervised machine-learning algorithm.
Such algorithms train the model on an existing, labeled data set. You can also use supervised
machine learning algorithms to develop and train a model to detect future fraud based on historical
information.
• Pattern discovery. You may think that not every data set has variables you can use to make the necessary
predictions. This is not true. There is a hidden pattern in every data set, and you need to find those patterns to
make the required predictions. To do this, you need to use an unsupervised model since you do not
have any pre-defined labels in the data set (using which you can group the variables). One (1) of the
most common algorithms used to identify patterns is clustering. Let us assume you work for a phone
company, and you are tasked with identifying where to set up cell towers in an area to establish a
network. You can then use the clustering algorithm to identify where you can set up towers to ensure
every user in the area receives the optimum signal strength.
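
The cell tower example above can be sketched with k-means, one common clustering algorithm. This is a minimal illustration, not a network planning tool: the user coordinates are made up, the function name k_means is our own, and starting from the first k points is an assumption chosen to keep the run repeatable.

```python
import math


def k_means(points, k, iterations=20):
    """Plain k-means on 2-D points; returns the k cluster centres."""
    # Assumption: start from the first k points so the run is deterministic.
    centres = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Step 2: move each centre to the mean of its assigned points.
        new_centres = []
        for i, members in enumerate(clusters):
            if members:
                x = sum(m[0] for m in members) / len(members)
                y = sum(m[1] for m in members) / len(members)
                new_centres.append((x, y))
            else:
                new_centres.append(centres[i])  # keep a centre that lost all points
        if new_centres == centres:
            break  # assignments are stable, so stop early
        centres = new_centres
    return centres


# Hypothetical user locations forming two neighbourhoods.
users = [(0, 0), (10, 10), (0, 1), (1, 0), (1, 1), (10, 11), (11, 10), (11, 11)]
towers = k_means(users, k=2)
print(towers)
```

No labels are supplied: the algorithm groups users only by how close they are to one another, which is what makes it unsupervised. Real tower placement would also have to account for terrain, capacity, and cost.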
Why Use Data Science? (Campbell, 2021)
In the past, organizations managed small volumes of data. It was easy to analyze and understand the data
and relationships within the data set using some business intelligence tools. Most traditional business
intelligence tools only worked on structured data sets, but most of the data collected today are
semi-structured or unstructured.
Simple business intelligence tools cannot process this type of data, especially since large volumes of data
are collected from different instruments. For this reason, we need to develop advanced and complex
analytical algorithms and tools to process, analyze, and draw some insights from the data.
This is not the only reason data science has gained popularity. Let us look at how data science is
used in different domains:
• Customer Service. How great would it be if you could know exactly what your customers want? Do you
think you can use existing data to learn more about your customers, such as purchase history, browsing
history, income, and age? You may have had this data with you in the past, as well. By using
different mathematical and statistical models, you can effectively work with large volumes of data and
identify the right products to recommend to your customers. This is a great way to bring more business
to your firm.
• Self-Driving Cars. How would you feel if your car could drive you home? Numerous companies are
trying to develop and improve the workings of a self-driving car. The cars collect live information from
various sensors, such as lasers, radars, and cameras, to create a map of the surrounding environment.
The algorithm in the car uses this data to decide whether to speed up, slow down, park, stop, overtake, etc.
These algorithms are often machine learning algorithms.
• Predictions. Let us now consider how you can use data science in predictive analytics. Consider
weather forecasting. The algorithms used collect and analyze data from aircraft, satellites, radars, ships, and other
sources. This helps you build the required models. You can use these models
to predict the occurrence of any natural calamities. Using this information, you can take the necessary
measures to save lives.
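
As a rough sketch of the customer service case above, purchase histories can be turned into product recommendations by counting which items are bought together. The data, the function name recommend, and the scoring rule are hypothetical; real recommender systems are considerably more sophisticated.

```python
from collections import Counter


def recommend(purchase_histories, customer_basket, top_n=2):
    """Suggest items that often appear alongside the customer's items."""
    owned = set(customer_basket)
    scores = Counter()
    for basket in purchase_histories:
        items = set(basket)
        if items & owned:  # this basket shares something with the customer
            for item in items - owned:
                scores[item] += 1
    # Sort by score (descending), then by name, so ties are resolved predictably.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories of earlier customers.
histories = [
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["phone", "case"],
    ["laptop", "bag", "dock"],
]
print(recommend(histories, ["laptop"]))
```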
Who is a Data Scientist? (Campbell, 2021)
If you search for "data scientist" on the Internet, you may come across numerous definitions. A data scientist
uses data science to answer business questions and concerns. The term data scientist was coined
when people recognized that such a person uses data, various mathematical or statistical functions,
operations, and other scientific fields and applications to make sense of the data in the database.
Functions Performed by Data Scientists
Data scientists crack various data problems using their expertise in specific scientific disciplines. They work
with different mathematical, statistical, and computer science elements, but they do not necessarily have to
be experts in all these fields. They use suitable technologies to develop the right solutions
and reach conclusions crucial for the organization's development and growth. A data scientist finds a way
to present the data in a form more useful than the raw data in the data set. Data scientists work with both
structured and unstructured data.
Differences between Data Science and Business Intelligence (Campbell, 2021)
Before we look at the differences between data science and business intelligence, let us understand these
terms better.
Using business intelligence (BI), an organization can find insight and hindsight in the existing data set to
describe various trends in the data set. Through BI, businesses can take data from internal and external
sources, prepare that data, and run queries on the data set to obtain the required information. They can
then create the required dashboards to answer different questions or identify solutions to various
business problems. BI can also help businesses evaluate certain future events. On the other hand, data
science takes a different, forward-looking approach to data. Using data science, you can analyze current or past data
to explain the insights in the data set and to predict future outcomes. This is one (1) way organizations
make informed decisions.
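
The contrast between hindsight and a forward-looking approach can be illustrated in a few lines. This is a sketch only: the monthly sales figures and both function names are made up, and a straight-line trend is just one simple way to forecast.

```python
def describe(sales):
    """Hindsight, BI style: summarise what has already happened."""
    return {"total": sum(sales), "average": sum(sales) / len(sales)}


def forecast_next(sales):
    """Foresight: fit a least-squares line through the values and extend it one step.

    Needs at least two values, otherwise the slope is undefined.
    """
    n = len(sales)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(sales) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sales))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * n  # value of the line at the next time step


monthly_sales = [100, 110, 120, 130]  # hypothetical figures
print(describe(monthly_sales))
print(forecast_next(monthly_sales))
```

The first function answers "what happened?", which is the kind of question a dashboard covers. The second uses the same past data to estimate what is likely to happen next.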
Now that you have an idea of what data science is, let us look at the lifecycle of data science. Most people rush
into using the models they develop on the data sets without understanding the basics of data science.
You need to understand these basics and assess the business requirements before you rush into using the
model. Make sure to follow the data science life cycle phases to ensure your results are accurate.
Lifecycle
This section gives you a brief overview of the phases in the data science lifecycle.


• Phase One: Discovery. Before you work on the project, you need to understand the following: business
requirements, specifications, required or approved budget, and priorities. If you want to pursue a
career in data science, you need to possess the ability to ask important questions. You need to assess
if you have the right resources, people, technology, data, and time to support the work done on the
project. This phase involves framing the problem and identifying the initial hypothesis you want to test.
• Phase Two: Data Preparation. Once you have identified the resources needed to work on the
analysis, you need to develop or identify an analytical sandbox where you can perform the testing and
analysis of the data. Before modeling the data, you need to process, explore, and condition it. You also
need to perform extract-transform-load-transform operations to move the data into the sandbox
environment. Programming languages can be used to clean, transform, and visualize the
data used in the analysis. They also help you identify the outliers in the data and
the relationships between variables. Once the data is
cleaned and prepared, you can perform different types of analysis on the data.
• Phase Three: Plan the Model. During this phase, you need to identify the techniques and methods to
help you draw the relationship between the different variables in the data set. These relationships will
help you determine the algorithms you can use in the next phase of the lifecycle. To do this, you need
to apply exploratory data analytics methods and tools using various formulas and visualization
methods. Let us look at some tools used for this below:
o R: This programming language has various modeling capabilities. It is also a good platform to
use and develop the right models if you are a beginner.
o SQL: This provides a set of methods to perform analysis within the database using different
predictive models and mining functions.
o SAS/ACCESS: This tool can be used to access data from various storage platforms, like
Hadoop, and use that data to create a reusable and repeatable model.
Numerous modeling tools are available on the market, but R is commonly used. At the end
of this phase, you will have the required insights into your data that will help you determine the algorithm
to use. The next phase is where you apply this algorithm and develop the model.
• Phase Four: Build the Model. Now that you have decided which algorithm to use, you must split the
data set into training and testing data sets. In this phase, you need to consider the existing tools and
determine if they are sufficient for building a model. Make sure you identify a robust environment to
run the models. To develop the model, you need to analyze different techniques, such as clustering,
classification, and association.
• Phase Five: Operate the Model. In this phase, you run the data through the model and deliver the
reports and necessary technical documents. You may also need to run the model in the
production environment to test whether it works as intended. This gives you an idea of how the model
performs on real-time data. You can also determine any constraints in the model.
• Phase Six: Communicate the Results. It is important to evaluate if the model has given you the needed
results. You can do this by analyzing your hypotheses. This is the last phase of the data science lifecycle
and is where you identify the key findings and communicate the same to the organization. You can
determine the results of the model based on the criteria you identified in the first phase.
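
The phases above can be sketched end to end on a toy data set. Everything here is an assumption made for illustration: the records, the function names, the 25% test fraction, and the very simple one-threshold classifier, which also assumes that class 1 has the larger values.

```python
import random
from statistics import mean


def prepare(records):
    """Phase Two: drop records with a missing value (a very basic cleaning step)."""
    return [(x, label) for x, label in records if x is not None]


def split(records, test_fraction=0.25, seed=0):
    """Phase Four: shuffle, then hold out a fraction of the data for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def build(train):
    """Phase Four: 'train' a model, here a threshold halfway between the class means."""
    mean_0 = mean(x for x, label in train if label == 0)
    mean_1 = mean(x for x, label in train if label == 1)
    return (mean_0 + mean_1) / 2


def predict(threshold, x):
    """Assumption: class 1 has the larger values."""
    return 1 if x > threshold else 0


def evaluate(threshold, test):
    """Phases Five and Six: measure accuracy on data the model has not seen."""
    correct = sum(predict(threshold, x) == label for x, label in test)
    return correct / len(test)


# Hypothetical records: (measured value, label), with one missing value.
records = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (None, 0),
           (20, 1), (21, 1), (22, 1), (23, 1), (24, 1), (25, 1)]

clean = prepare(records)
train, test_set = split(clean)
threshold = build(train)
accuracy = evaluate(threshold, test_set)
print(len(train), len(test_set), threshold, accuracy)
```

Keeping the test records out of the training step is the point of the split in Phase Four: the accuracy then reflects how the model behaves on data it has not seen, which is what you report in Phase Six.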

References:
Campbell, A. (2021). Data science for beginners: Comprehensive guide to most important basics in data science. Alex Campbell.
Stobierski, T. (2021). What's the difference between data analytics & data science? https://online.hbs.edu/blog/post/data-analytics-vs-data-science
