test (1)
(Lesson 2)
- Data analytics is the process of examining and analyzing raw data sets to:
+ Draw conclusions
+ Derive more information
+ Improve businesses, products, and services
In addition to supporting business decisions, data analytics is also used by data scientists and researchers
to verify scientific models and theories.
1. Descriptive analytics
Descriptive analytics is designed to access information about the past.
It is the conventional form of analytics.
It focuses on the summarized view of facts.
Its purpose is to summarize the findings.
Techniques of descriptive analytics include data aggregation and data mining.
Data aggregation is the process of gathering and expressing information in a
summarized form.
Tools used for data aggregation include MS Excel, MATLAB, SPSS and STATA.
A company report is an example of descriptive analytics.
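Data aggregation can be illustrated with a short Python sketch. The sales records and the helper name aggregate_by_region below are hypothetical, invented only to show raw rows being gathered into a summarized form (count, total, and mean per group).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw sales records: (region, amount). Invented for illustration.
sales = [
    ("North", 120.0),
    ("South", 80.0),
    ("North", 200.0),
    ("South", 100.0),
    ("East", 50.0),
]


def aggregate_by_region(records):
    """Group amounts by region and return count, total, and mean per region."""
    grouped = defaultdict(list)
    for region, amount in records:
        grouped[region].append(amount)
    return {
        region: {"count": len(amounts), "total": sum(amounts), "mean": mean(amounts)}
        for region, amounts in grouped.items()
    }


summary = aggregate_by_region(sales)
for region, stats in sorted(summary.items()):
    print(region, stats)
```

The printed summary is the kind of table a company report would contain: one line per region instead of one line per sale.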
2. Diagnostic analytics
Diagnostic analytics helps you identify why something happened in the past.
It takes a deeper look at data to understand the root cause of events.
It has a limited ability to provide actionable insights.
It provides an understanding of causal relationships and sequences.
Diagnostic analytics techniques: drill-down, data discovery, data mining,
correlation.
They can be used to look for relationships, possibly causal, between two or more data sets. Correlation alone does not prove causation.
Diagnostic analytics is helpful for those concerned with day-to-day operations.
For example, it helps identify why a sales representative has sold fewer items than
usual.
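As a rough sketch of the correlation technique, the Python below computes a Pearson correlation coefficient by hand. The two lists (calls made and items sold by a sales representative) are hypothetical numbers chosen for illustration. A high value only suggests where to drill down next; it is not proof of a cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly figures for one sales representative.
calls_made = [10, 20, 30, 40, 50]
items_sold = [2, 4, 6, 8, 10]

r = pearson(calls_made, items_sold)
print(f"correlation between calls made and items sold: {r:.2f}")
```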
3. Predictive analytics
Predictive analytics predicts future outcomes in terms of the probability that an event will occur.
Typical applications include:
Analyzing sentiment, where opinions posted on social media are collected to
predict a person’s sentiment.
Identifying the target audience for a promotional campaign.
Forecasting weather, plan-failure prediction, and travel product recommender
systems.
Predictive analytics tools:
Statistics and machine learning algorithms such as random forests and support vector machines (SVM).
Popular tools for predictive analytics: Python, R, and RapidMiner.
Trained data scientists and machine learning experts build these models.
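A minimal sketch of the predictive idea in Python: fit a model to past data, then use it to estimate a future value. For brevity it uses a simple least-squares line rather than a random forest or SVM, and the monthly sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(months, monthly_sales)
forecast_month_6 = slope * 6 + intercept
print(f"forecast for month 6: {forecast_month_6:.1f}")
```

Real predictive models are built the same way in outline (learn from history, then predict), but with more features, a held-out test set, and an estimate of uncertainty.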
4. Prescriptive analytics
Prescriptive analytics recommends actions to take in response to a predicted future outcome.
It creates and updates the relationships between action and outcome using a
feedback system.
It helps in making optimal recommendations during the decision-making process.
It helps in mitigating the possible risks based on the available predictive analytics.
It has the power to suggest favorable solutions and ease the decision-making
process.
It is the final frontier of advanced analytics.
It is used by recommendation engines in companies.
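One simple way to picture a prescriptive step, sketched in Python under stated assumptions: given a predicted probability of success for each possible action, recommend the action with the highest expected value. The action names, probabilities, gains, and costs are all hypothetical, and real prescriptive systems typically use optimization or simulation and update these estimates from feedback.

```python
# Hypothetical actions with a predicted success probability, the gain if the
# action succeeds, and the cost of taking the action. Invented for illustration.
actions = {
    "send_discount": {"p_success": 0.30, "gain": 100.0, "cost": 10.0},
    "phone_call": {"p_success": 0.50, "gain": 100.0, "cost": 35.0},
    "do_nothing": {"p_success": 0.05, "gain": 100.0, "cost": 0.0},
}


def expected_value(action):
    """Expected payoff of an action: probability-weighted gain minus cost."""
    return action["p_success"] * action["gain"] - action["cost"]


def recommend(candidates):
    """Return the name of the action with the highest expected value."""
    return max(candidates, key=lambda name: expected_value(candidates[name]))


best = recommend(actions)
print("recommended action:", best)
```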
- Observation: a single row or record of data from the database. Any data set can be
viewed as a set of observations. An observation is the unit of analysis on
which the measurements are taken. It is also known as a case, record, or row.
- Data Sampling: a statistical analysis technique used to select, manipulate, and analyze a
representative subset of data points. Data sampling helps identify patterns and trends in the
larger data set. It is cost effective because only the representative
sample is surveyed. It enables data scientists, predictive modelers, and data analysts to produce
accurate findings.
- Data Set: a collection of data or the total data captured about a particular use case. It can
hold information such as medical, insurance, and loan approval records. It is not limited to
numbers and text and may include collections of images or videos.
- Prediction: The goal of prediction is to move from what has happened to providing the
best assessment of what will happen.
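The data sampling definition above can be illustrated with a small Python sketch. The population is synthetic (random values generated with a fixed seed), and the sample size of 500 is an arbitrary choice for illustration. The point is that a statistic computed on a random sample approximates the same statistic on the full data set.

```python
import random
from statistics import mean

# Fixed seed so the sketch is reproducible; the data itself is synthetic.
rng = random.Random(42)

# Synthetic "larger data set": 10,000 observations.
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

# Simple random sample without replacement: 500 observations.
sample = rng.sample(population, k=500)

print(f"population mean: {mean(population):.2f}")
print(f"sample mean:     {mean(sample):.2f}")
```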
Structured Data: data that is processed, stored, and retrieved in a fixed format.
Unstructured Data: data that lacks any specific form or structure. It is
typically text-heavy but may also contain data such as dates, numbers, and facts. About 80%
of business data is unstructured.
Example: Email
Semi-Structured Data: data that contains both structured and unstructured
elements.
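The three data types can be contrasted with a short Python sketch. All sample values (names, amounts, the email text, and the date) are hypothetical. CSV stands in for structured data, JSON for semi-structured data, and free text for unstructured data.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields.
structured = "id,name,amount\n1,Alice,250\n2,Bob,120\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: named fields, but one of them holds free text.
semi_structured = '{"id": 3, "name": "Carol", "notes": "Called on Monday about a refund"}'
record = json.loads(semi_structured)

# Unstructured: free text (for example an email body). Facts such as dates
# have to be extracted, here with a regular expression.
unstructured = "Hi team, Carol called on 2024-03-04 asking for a refund of 75 dollars."
dates = re.findall(r"\d{4}-\d{2}-\d{2}", unstructured)

print(rows[0]["name"], record["notes"], dates)
```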
At the nominal level of measurement, the values of a variable are used only to classify data.
At this level, words, letters, and alphanumeric symbols can be used. Example: People in
the female gender category are classified as F and those in the male gender category are
classified as M.
The interval level of measurement classifies and orders the measurements. It also
specifies that the distances between each interval on the scale are equivalent. Example:
Temperature in centigrade where the distance between 80 degrees and 100 degrees is the
same as the distance between 1000 degrees and 1020 degrees.
At the ratio level of measurement, the scale has a true zero, which means the absence of the
quantity being measured. Although the properties of ratio measurement are similar to those of
the interval level, this true zero makes it different from the other levels of measurement.
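The difference between the interval and ratio levels can be checked with a little arithmetic, sketched here in Python. Celsius is an interval scale (its zero is arbitrary), while Kelvin has a true zero and is therefore a ratio scale. The temperatures 50 and 100 are arbitrary example values.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin (a scale with a true zero)."""
    return celsius + 273.15


# Interval property: equal distances mean the same thing anywhere on the scale.
diff_low = 100 - 80
diff_high = 1020 - 1000
print(diff_low == diff_high)  # both differences are 20 degrees

# Ratios are NOT meaningful on an interval scale: 100 C is not "twice as hot"
# as 50 C, even though 100 / 50 equals 2.
naive_ratio = 100 / 50

# On the Kelvin scale the ratio is meaningful, and it is much smaller.
kelvin_ratio = celsius_to_kelvin(100) / celsius_to_kelvin(50)
print(f"naive Celsius ratio: {naive_ratio:.2f}, Kelvin ratio: {kelvin_ratio:.2f}")
```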
- Data visualization tools provide access to trends, outliers, and patterns in data.
- Data in user-friendly charts helps businesses gain insights to make the right decisions.
- They help organize and present important findings from the data.
- Decision makers see patterns, trends, and correlations in the data being analyzed.
The three types of data science are data analytics, machine learning, and data mining.
- Data Analytics: the process of examining and analyzing raw data sets to:
● Draw conclusions
● Derive information
- Machine Learning:
- Data Mining:
- Business Understanding: the first stage of the data science methodology, which
lays the foundation for a successful end result.
● This stage identifies key business sponsors, steering committee, and internal
sponsors.
● It helps understand business and customer needs and identify who needs the
analytical solution.
- Analytic Approach
● It identifies the analytic methods, hardware and software, data content, formats,
and representations to be used.
- Data Requirements
● The data requirements stage identifies the necessary data, along with its initial
source and appropriate format.
● This stage has multiple sub-stages including data acquisition, data wrangling, data
analysis and data modeling.
- Data Collection
● In the collection stage, data scientists identify and gather the available relevant data,
because good-quality input data is required for good output.
● Data scientists evaluate the volume and properties of the data and understand the
distribution of each attribute.
- Data Understanding
- Data Preparation
● The data preparation stage includes activities to construct a data set for data
modeling.
● This stage includes cleaning of data, eliminating duplicates, formatting data from
multiple sources, and transforming data into more useful variables.
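The data preparation activities above (cleaning, eliminating duplicates, formatting data from multiple sources, and deriving more useful variables) can be sketched in Python. The two sources, their field names, and the helper names parse_date and prepare are hypothetical. Treating rows with the same customer name and signup date as duplicates is an assumption made for this example.

```python
from datetime import datetime

# Two hypothetical sources with inconsistent formatting.
source_a = [
    {"customer": " Alice ", "signup": "2023-01-15", "spend": "250"},
    {"customer": "Bob", "signup": "2023-02-01", "spend": "120"},
]
source_b = [
    {"customer": "alice", "signup": "15/01/2023", "spend": "250"},  # duplicate of Alice
    {"customer": "Dana", "signup": "10/03/2023", "spend": ""},  # missing spend
]


def parse_date(text):
    """Parse a date written as YYYY-MM-DD or DD/MM/YYYY."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def prepare(*sources):
    """Clean, unify, and de-duplicate rows from several sources."""
    seen = set()
    cleaned = []
    for source in sources:
        for row in source:
            name = row["customer"].strip().title()  # clean whitespace and case
            signup = parse_date(row["signup"])  # unify date formats
            spend = float(row["spend"]) if row["spend"] else None  # mark missing values
            key = (name, signup)
            if key in seen:  # eliminate duplicates
                continue
            seen.add(key)
            cleaned.append(
                {
                    "customer": name,
                    "signup": signup,
                    "signup_month": signup.month,  # derived variable
                    "spend": spend,
                }
            )
    return cleaned


dataset = prepare(source_a, source_b)
for row in dataset:
    print(row)
```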
- Modeling
● The modeling stage applies a predictive model to historical data to obtain the
outcome.
● This stage helps organizations gain intermediate insights and future trends, leading
to strategic improvements.
- Evaluation
● Once the model is developed, data scientists evaluate the model to understand its
quality and ensure that it addresses the business problem.
● In model evaluation, diagnostic measures are computed and outputs such as tables
and graphs are evaluated.
● During the evaluation phase, the data mining results are evaluated for novelty and
usefulness.
- Deployment
- Feedback
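As an illustration of the diagnostic measures computed in the Evaluation stage above, the Python sketch below builds a confusion matrix for a binary classifier and derives accuracy, precision, and recall from it. The actual and predicted labels are hypothetical, and these three measures are only common examples; the right measures depend on the business problem.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
    }


def evaluate(actual, predicted):
    """Return the confusion counts plus accuracy, precision, and recall."""
    c = confusion_counts(actual, predicted)
    total = c["tp"] + c["tn"] + c["fp"] + c["fn"]
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        **c,
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Hypothetical labels: 1 = event happened, 0 = it did not.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = evaluate(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value}")
```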