09 Handout 1

The document discusses data science, including what it is, its history and uses. It covers topics like analytics, machine learning, and the role of data scientists. Data science involves obtaining insights from data using skills like programming, business and analytics. It can be used for applications such as customer service, self-driving cars, and making predictions.

IT2302

INTRODUCTION TO DATA SCIENCE


Overview (Campbell, 2021)
Let us first understand what data science is before we delve into its various aspects. In simple
terms, data science applies mathematics and statistics to obtain useful and meaningful insights
and trends from raw data or information. Using programming, business, and
analytical skills, you can process and manage the data set. This may sound tough, and most people do not know
how to work with data science or how to develop the necessary skills effectively.
The field of data science goes back to its roots in statistics. Having said that, this field is a combination of
programming, business acumen, and statistics. It is essential to learn more about each topic to have an
idea of how you approach the learning process. The art of finding any hidden insights and trends from the
data set goes way back. Ancient Egyptians analyzed census data to help them collect taxes efficiently.
They also used data analysis to forecast when there could be floods in the Nile. It is important to learn
from past data to identify trends or insights in the data set. This helps the business make informed
decisions.
Regardless of the industry, every company is looking for ways to manage and store large volumes of data.
This was a challenge for most companies until 2010. The introduction of Hadoop (a software framework
for distributed storage and processing of big data) and other platforms has given organizations an easier
way to store large volumes of data. Now, companies can focus on methods and solutions to process
that information, which is the role of data science. For this reason, data science is expected to remain
central to the future of technology.
Data science is a mix of numerous algorithms, tools, principles, and languages to identify the hidden
patterns within the variables in the data set. This may lead you to wonder how this is different from what
has been done on data for years. The answer is that earlier, we could only use tools and algorithms to
explain the variables in the data set, but using data science, it becomes easier to predict the outcomes.
A data analyst uses a historical data set mainly to explain what is happening in the present.
A data scientist, on the other hand, not only looks at the data to obtain insights but also uses
advanced algorithms to identify the probability of the occurrence of an event. A data scientist looks at the data
from various angles and aspects.
Data science is used to make informed decisions based on predictions made using the existing data set.
It is centered on building, cleaning, and organizing datasets. On the other hand, data analytics pertains
to analyzing data to answer questions, extract insights, and identify trends. This is accomplished using
different tools, techniques, and frameworks that vary depending on the type of analysis being conducted
(Stobierski, 2021). Thus, you can apply numerous analytics to a data set to obtain information. We will
discuss these in brief in the subsequent sections.
Analytics
Analytics is the systematic investigation of data or statistics. It is used to discover, interpret, and communicate
meaningful patterns in data.
• Predictive causal. If you want to develop a model to predict the possibilities or outcomes of a future
event, you need to use predictive causal analytics. Let us assume you work for a credit company, and
you loan people money based on their credit. You will be concerned with your customers' ability to
repay the amount you have lent to them. You can develop models to perform predictive analysis using
the payment history. This can help you determine if the customer will pay you on time or not.

09 Handout 1 *Property of STI


 student.feedback@sti.edu Page 1 of 4
• Prescriptive. You may need to use a model to make the required decisions and modify the parameters
based on the data set or question. To do this, you need to use prescriptive analytics. This form of
analytics is more about providing the right information to make an informed decision. You can also use
this type of analytics to predict a range of associated outcomes and prescribed actions. An example of
this type of analytics is a self-driving car. You can run numerous algorithms (a procedure or formula for
solving a problem, based on conducting a sequence of specified actions) on the data collected from
the cars and use the results to make the car more intelligent. This makes it easier for the car to make
the right decisions to turn, slow down, speed up, or identify the direction to take.
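The credit example above can be sketched in a few lines of Python. This is only a minimal illustration and not a method taken from the source: the function name `on_time_probability`, the sample payment histories, and the use of Laplace (add-one) smoothing are all assumptions chosen to keep the example small.

```python
def on_time_probability(payment_history):
    """Estimate the chance that the next payment is on time.

    payment_history is a list of booleans (True = paid on time).
    Laplace (add-one) smoothing keeps the estimate away from 0 and 1
    when a customer has only a few recorded payments.
    """
    on_time = sum(payment_history)
    return (on_time + 1) / (len(payment_history) + 2)


# Hypothetical payment histories for two customers.
reliable = [True, True, True, True, False, True, True, True]
risky = [False, True, False, False]

p_reliable = on_time_probability(reliable)
p_risky = on_time_probability(risky)
print(f"reliable customer: {p_reliable:.2f}")
print(f"risky customer: {p_risky:.2f}")
```

A real credit model would use many more variables (income, loan size, and so on); the point here is only that a prediction is derived from historical data.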
Data science and machine learning are both popular buzzwords today. These two (2) terms are often
thrown together but should not be used interchangeably. Although data science includes machine
learning, it is a vast field with many different tools.
Machine Learning
Machine learning is a group of computational algorithms that perform pattern recognition, classification, and
prediction by learning from existing data.

• Make predictions. Numerous machine-learning algorithms allow you to make predictions using
unstructured, semi-structured, and structured data sets. Let us assume you work for a finance company
and you have the transactional data available. You need to develop a model to determine the trend of
future transactions. To perform this analysis, you need to use a supervised machine-learning algorithm.
Such algorithms are used to train the machine with an existing data set. You can also use supervised
machine learning algorithms to develop and train a model to detect future frauds based on historical
information.
• Pattern discovery. Not every data set has labeled variables you can use to make the necessary predictions.
This does not mean the data is unusable. A data set can still contain hidden patterns, and you need to find
those patterns to make the required predictions. To do this, you need to use an unsupervised model since you do not
have any pre-defined labels in the data set (using which you can group the variables). One (1) of the
most common algorithms used to identify patterns is clustering. Let us assume you work for a phone
company, and you are tasked with identifying where to set up cell towers in an area to establish a
network. You can then use the clustering algorithm to identify where you can set up towers to ensure
every user in the area receives the optimum signal strength.
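The cell tower example above can be sketched with k-means, one common clustering algorithm. The source does not name a specific algorithm, so k-means, the function name `k_means`, the user coordinates, and the evenly spaced starting centers are all assumptions made for illustration. The sketch also treats distance as a stand-in for signal strength, which a real network plan would not do.

```python
import math


def k_means(points, k, iterations=20):
    """Group 2-D points into k clusters and return the cluster centers."""
    # Simple deterministic start: pick k evenly spaced points as initial centers.
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for center, cluster in zip(centers, clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centers.append(center)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers


# Hypothetical user locations (x, y) in kilometers, forming two neighborhoods.
users = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
towers = k_means(users, k=2)
print(towers)
```

No labels are supplied anywhere: the algorithm discovers the two groups of users from the coordinates alone, which is what makes it unsupervised.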
Why Use Data Science? (Campbell, 2021)
In the past, organizations managed small volumes of data. It was easy to analyze and understand the data
and relationships within the data set using some business intelligence tools. Most traditional business
intelligence tools only worked on structured data sets, but most of the data collected today are
semi-structured or unstructured.
Simple business intelligence tools cannot process this type of data, especially since large volumes of data
are collected from different instruments. For this reason, we need to develop advanced and complex
analytical algorithms and tools to process, analyze, and draw some insights from the data.
This is not the only reason data science has gained popularity. Let us look at how data science is
used in different domains:
• Customer Service. How great would it be if you could know exactly what your customers want? Do you
think you can use existing data to learn more about your customers, such as purchase history, browsing
history, income, and age? You may have had this data with you in the past, as well. Since you use
different mathematical and statistical models, you can effectively work with large volumes of data and
identify the right products to recommend to your customers. This is a great way to bring more business
to your firm.
• Self-Driving Cars. How would you feel if your car could drive you home? Numerous companies are
trying to develop and improve the workings of a self-driving car. The cars collect live information from
various sensors, such as lasers, radars, and cameras, to create a map of the surrounding environment.
The algorithm in the car uses this data to decide whether to speed up, slow down, park, stop, overtake, etc.
These algorithms are often machine learning algorithms.
• Predictions. Let us now consider how you can use data science in predictive analytics. Consider
weather forecasting. The algorithms collect and analyze data from aircraft, satellites, radars, ships, and
other sources. This helps you build the required models. You can use these models
to predict the occurrence of natural calamities. Using this information, you can take the necessary
measures to save lives.
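The customer service case above (recommending products from purchase history) can be illustrated with a small co-purchase recommender. This is a sketch under stated assumptions: the function name `recommend`, the customer names, and the scoring rule (count shared products) are hypothetical and not taken from the source.

```python
from collections import Counter


def recommend(purchases, customer, top_n=2):
    """Suggest products bought by customers with overlapping purchase history."""
    owned = purchases[customer]
    scores = Counter()
    for other, basket in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & basket)  # shared products measure similarity
        for product in basket - owned:
            scores[product] += overlap
    # Highest score first; ties are broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, score in ranked[:top_n] if score > 0]


# Hypothetical purchase histories (customer -> set of products).
purchases = {
    "ana": {"laptop", "mouse"},
    "ben": {"laptop", "mouse", "keyboard"},
    "cara": {"laptop", "monitor"},
    "dan": {"phone", "charger"},
}
suggestions = recommend(purchases, "ana")
print(suggestions)
```

Products bought by customers whose history resembles the target customer's are ranked first, and products from customers with nothing in common are never suggested.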
Who is a Data Scientist? (Campbell, 2021)
If you search for the term "data scientist" on the Internet, you may come across numerous definitions. A data scientist
uses data science to answer business questions and concerns. The term data scientist was coined
when people recognized that such a person uses data, various mathematical or statistical functions,
operations, and other scientific fields and applications to make sense of the data in the database.
Functions Performed by Data Scientists
Data scientists crack various data problems using their expertise in specific scientific disciplines. They work
with different mathematical, statistical, and computer science elements, but they do not necessarily have to
be experts in all of these fields. They use various technologies to develop the right solutions
and reach conclusions crucial for the organization's development and growth. A data scientist finds a way
to present the data in a more useful form than the raw data available in the data set. They work with both
structured and unstructured data.
Differences between Data Science and Business Intelligence (Campbell, 2021)
Before we look at the differences between data science and business intelligence, let us understand these
terms better.
Using business intelligence (BI), an organization can find insight and hindsight in the existing data set to
describe various trends in the data set. Through BI, businesses can take data from internal and external
sources, prepare that data, and run queries on the data set to obtain the required information. They can
then create the required dashboards to answer different questions or identify solutions to various
business problems. BI can also help businesses evaluate certain future events. On the other hand, data
science is a different approach to looking at data. You can take a forward-looking approach and explain
any information or insight in the data set. Using data science, you can analyze current or past data
to help you predict future outcomes. This is one (1) way most organizations do their best to make
informed decisions.
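The contrast between the two approaches can be shown on the same data. In the sketch below, the BI-style part summarizes what already happened, while the data-science-style part fits a trend and projects it forward. The sales figures, the helper `fit_line`, and the choice of a straight-line (least-squares) trend are assumptions made for illustration only.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# Hypothetical monthly sales figures (month number -> units sold).
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

# BI-style (descriptive) view: summarize what already happened.
average_sales = mean(sales)
best_month = months[sales.index(max(sales))]

# Data-science-style (predictive) view: project the trend forward.
slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept

print(average_sales, best_month, forecast_month_7)
```

Both views use the same six data points; only the question differs ("what happened?" versus "what is likely to happen next?").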
Now that you have an idea of what data science is, let us look at the lifecycle of data science. Most people rush
into using the models they develop on the data sets without understanding the basics of data science.
You need to understand these basics and assess the business requirements before you rush into using the
model. Make sure to follow the data science life cycle phases to ensure your results are accurate.
Lifecycle
This section gives you a brief overview of the phases in the data science lifecycle.
• Phase One: Discovery. Before you work on the project, you need to understand the following: business
requirements, specifications, required or approved budget, and priorities. If you want to pursue a
career in data science, you need to possess the ability to ask important questions. You need to assess
if you have the right resources, people, technology, data, and time to support the work done on the
project. This phase involves framing the problem and identifying the initial hypothesis you want to test.
• Phase Two: Data Preparation. Once you have identified the resources needed to work on the
analysis, you need to develop or identify an analytical sandbox where you can perform the testing and
analysis of the data. Before modeling the data, you need to process, explore, and condition it. You also
need to perform the following operations to move the data into the sandbox environment: extract,
transform, load, and transform (ETLT). Programming languages can be used to clean, transform, and
visualize the data used in the analysis. These programming languages help you identify the outliers in the data. You
can also use the information to develop or identify a relationship between variables. Once the data is
cleaned and prepared, you can perform different types of analysis on the data.
• Phase Three: Plan the Model. During this phase, you need to identify the techniques and methods to
help you draw the relationship between the different variables in the data set. These relationships will
help you determine the algorithms you can use in the next phase of the lifecycle. To do this, you need
to apply exploratory data analytics methods and tools using various formulas and visualization
methods. Let us look at some tools used for this below:
o R: This programming language has various modeling capabilities. It is also a good platform to
use and develop the right models if you are a beginner.
o SQL: This provides a set of methods to perform analysis within the database using different
predictive models and mining functions.
o SAS/ACCESS: This tool can be used to access data from various storage platforms, like
Hadoop, and use that data to create a reusable and repeatable model.
The market has numerous tools to develop modeling techniques, but R is commonly used. At the end
of this phase, you will have the required insights in your data that will help you determine the algorithm
to use. The next phase is where you apply this algorithm and develop the model.
• Phase Four: Build the Model. Now that you have decided which algorithm to use, you must split the
data set into training and testing data sets. In this phase, you need to consider the existing tools and
determine if they are sufficient for building a model. Make sure you identify a robust environment to
run the models. To develop the model, you need to analyze different techniques, such as clustering,
classification, and association.
• Phase Five: Operate the Model. In this phase, you run the data through the model and deliver the
reports and necessary technical documents. Additionally, you may also need to run the model in the
production environment to test if it works as intended. This gives you an idea of how the model
performs on real-time data. You can also determine any constraints in the model.
• Phase Six: Communicate the Results. It is important to evaluate if the model has given you the needed
results. You can do this by analyzing your hypotheses. This is the last phase of the data science lifecycle
and is where you identify the key findings and communicate the same to the organization. You can
determine the results of the model based on the criteria you identified in the first phase.
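The preparation, model-building, and evaluation phases above can be sketched end to end on a toy data set. Everything named here is an assumption for illustration: the functions `prepare`, `split`, `train`, `predict`, and `accuracy`, the transaction data, the 80/20 split, and the nearest-centroid classifier, which stands in for whichever classification technique a real project would select in Phase Three.

```python
import random


def prepare(records):
    """Phase Two: drop records with missing values."""
    return [r for r in records if r[0] is not None]


def split(records, test_fraction=0.2, seed=0):
    """Phase Four: shuffle, then split into training and testing sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train(training):
    """Phase Four: learn the mean feature value (centroid) of each class."""
    centroids = {}
    for label in {label for _, label in training}:
        values = [value for value, lab in training if lab == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def predict(centroids, value):
    """Classify a value by its nearest class centroid."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def accuracy(centroids, testing):
    """Phases Five and Six: score the model on data it has not seen."""
    hits = sum(predict(centroids, value) == label for value, label in testing)
    return hits / len(testing)


# Hypothetical data: (transaction amount, label); two records are incomplete.
raw = [(v, "normal") for v in range(1, 11)]
raw += [(v, "suspicious") for v in range(21, 31)]
raw += [(None, "normal"), (None, "suspicious")]

clean = prepare(raw)
training, testing = split(clean)
model = train(training)
score = accuracy(model, testing)
print(len(training), len(testing), score)
```

The model is scored only on the testing set, which it never saw during training. That separation is the reason Phase Four splits the data before building the model.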

References:
Campbell, A. (2021). Data science for beginners: Comprehensive guide to most important basics in data science. Alex Campbell.
Stobierski, T. (2021). What's the difference between data analytics & data science?
https://online.hbs.edu/blog/post/data-analytics-vs-data-science
