
What is Data Analysis? Methods, Techniques & Tools
From https://hackr.io/blog/what-is-data-analysis-methods-techniques-tools

1) What Is Data Analysis?


Data Analysis is the systematic application of statistical and logical techniques to describe the scope of the data, modularize its structure, condense its representation, illustrate it via images, tables, and graphs, and evaluate statistical trends and probabilities in order to derive meaningful conclusions. These analytical procedures enable us to draw the underlying inferences from data by eliminating the surrounding noise. Data generation is a continual process, which makes data analysis a continuous, iterative process in which collection and analysis happen simultaneously. Ensuring data integrity is one of the essential components of data analysis.
Data analysis is used in many domains, ranging from transportation, risk and fraud detection, customer interaction, city planning, healthcare, and web search to digital advertising, and more.
Consider healthcare: with the outbreak of the Coronavirus pandemic, hospitals faced the challenge of treating as many patients as possible under intense pressure, and data analysis lets them monitor machine and resource usage in such scenarios to achieve efficiency gains.
Before diving any deeper, ensure the following prerequisites for proper data analysis:

 Ensure availability of the necessary analytical skills
 Ensure appropriate implementation of data collection methods and analysis
 Determine the statistical significance of the results
 Check for inappropriate or misleading analysis
 Ensure that inferences are legitimate and unbiased
 Ensure the reliability and validity of the data, data sources, analysis methods, and
derived inferences
 Account for the extent of the analysis
2) Data Analysis Methods
There are two main methods of Data Analysis:
1. Qualitative Analysis

This approach mainly answers questions such as ‘why,’ ‘what’ or ‘how.’ Each of these questions
is addressed via qualitative techniques such as questionnaires, attitude scaling, standard
outcomes, and more. Such analysis usually takes the form of texts and narratives, which might
also include audio and video recordings.

2. Quantitative Analysis
Generally, this analysis is measured in terms of numbers. The data present themselves in
terms of measurement scales and lend themselves to further statistical manipulation.

The other techniques include:

3. Text analysis
Text analysis is a technique for analyzing text to extract machine-readable facts. It aims to create
structured data out of free, unstructured content. The process consists of slicing and dicing
heaps of unstructured, heterogeneous files into easy-to-read, manageable, and interpretable data pieces.
It is also known as text mining, text analytics, and information extraction.
The ambiguity of human language is the biggest challenge of text analysis. For example,
humans know that “Red Sox Tames Bull” refers to a baseball match. Still, if this text is fed to
a computer without background knowledge, it generates several linguistically valid
interpretations. Even people who are not interested in baseball might have trouble
understanding it.
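As a minimal sketch of text analysis, the snippet below (standard library only; the sample sentences and the extraction pattern are illustrative assumptions, not a fixed method) turns free text into structured records plus word frequencies:

```python
import re
from collections import Counter

# Unstructured free text (illustrative sample data).
text = """Red Sox Tames Bull. The Red Sox won 5-3 on Tuesday.
Yankees lost 2-4 on Monday. The Red Sox play again on Friday."""

# Structured extraction: pull (team, score) pairs with a simple pattern.
pattern = re.compile(r"(\w+(?:\s\w+)?)\s(?:won|lost)\s(\d+)-(\d+)")
records = [
    {"team": team, "for": int(a), "against": int(b)}
    for team, a, b in pattern.findall(text)
]

# Word frequencies are another common text-mining output.
words = re.findall(r"[a-z]+", text.lower())
freq = Counter(words)

print(records)
print(freq.most_common(3))
```

Real text-analytics systems use far more sophisticated linguistic models, but the goal is the same: structured, machine-readable data out of free text.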
4. Statistical analysis
Statistics involves data collection, interpretation, and validation. Statistical analysis is the
technique of performing statistical operations to quantify and summarize the data. Quantitative
data here includes descriptive data such as survey and observational data, which is why this is
also called descriptive analysis. Various tools are available for statistical data analysis,
such as SAS (Statistical Analysis System), SPSS (Statistical Package for the Social Sciences),
StatSoft, and more.
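A small sketch of what descriptive statistical analysis looks like, using only Python's standard `statistics` module; the daily sales figures are assumed sample data:

```python
import statistics

# Illustrative sample: daily sales figures (assumed data).
sales = [120, 135, 110, 150, 145, 130, 125]

# Basic descriptive summary of the sample.
summary = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "stdev": round(statistics.stdev(sales), 2),  # sample standard deviation
    "min": min(sales),
    "max": max(sales),
}
print(summary)
```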
5. Diagnostic analysis
Diagnostic analysis goes a step further than statistical analysis, providing a more in-depth
answer to the questions the data raises. It is also referred to as root cause analysis, as it
includes processes like data discovery, data mining, and drill-down and drill-through.
The functions of diagnostic analytics fall into three categories:
Identify anomalies: After performing statistical analysis, analysts must identify areas
requiring further study, because the data raise questions that cannot be answered just by
looking at them.
Drill into the Analytics (discovery): Identification of the data sources helps analysts explain
the anomalies. This step often requires analysts to look for patterns outside the existing data
sets. It requires pulling in data from external sources, thus identifying correlations and
determining if they are causal in nature.
Determine Causal Relationships: Hidden relationships are uncovered by looking at events
that might have resulted in the identified anomalies. Probability theory, regression analysis,
filtering, and time-series data analytics can all be useful for uncovering hidden stories in the
data.
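The anomaly-identification step can be illustrated with a simple z-score rule; the order counts and the two-standard-deviation threshold below are assumptions for the sketch, not a prescribed method:

```python
import statistics

# Illustrative daily order counts; the last value is an injected anomaly.
orders = [102, 98, 101, 99, 103, 97, 100, 250]

mu = statistics.mean(orders)
sigma = statistics.stdev(orders)

# Flag points more than 2 standard deviations from the mean as candidates
# for further diagnostic drill-down.
anomalies = [x for x in orders if abs(x - mu) / sigma > 2]
print(anomalies)
```

Each flagged point then becomes the starting question for the discovery and causal steps that follow.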
6. Predictive analysis
Predictive analysis uses historical data and feeds it into a machine learning model to find
critical patterns and trends. The model is then applied to current data to predict what will
happen next. Many organizations prefer it because of advantages such as growing volumes and
types of data, faster and cheaper computing, easy-to-use software, tighter economic conditions,
and the need for competitive differentiation.

The following are the common uses of predictive analysis:


Fraud Detection: Combining multiple analytics methods improves pattern detection and helps
prevent criminal behavior.
Optimizing Marketing Campaigns: Predictive models help businesses attract, retain, and
grow their most profitable customers. It also helps in determining customer responses or
purchases, promoting cross-sell opportunities.
Improving Operations: The use of predictive models also involves forecasting inventory and
managing resources. For example, airlines use predictive models to set ticket prices.
Reducing Risk: The credit score used to assess a buyer’s likelihood of default for purchases is
generated by a predictive model that incorporates all data relevant to a person’s
creditworthiness. Other risk-related uses include insurance claims and collections.
7. Prescriptive Analysis
Prescriptive analytics suggests various courses of action and outlines the potential implications
of each, building on the results of predictive analysis. Generating automated decisions or
recommendations with prescriptive analysis requires specific algorithms as well as clear
direction from those utilizing the analytical techniques.
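A toy sketch of how a prescriptive rule might turn a prediction into an action; the inventory scenario, function name, and thresholds are all hypothetical:

```python
# Hypothetical prescriptive rule: map a demand forecast to a recommended action.
def recommend(predicted_demand, current_stock):
    """Suggest an action given a demand forecast (thresholds are assumptions)."""
    if predicted_demand > current_stock:
        return f"Reorder {predicted_demand - current_stock} units"
    if predicted_demand < 0.5 * current_stock:
        return "Run a promotion to clear excess stock"
    return "No action needed"

print(recommend(120, 80))   # forecast exceeds stock
print(recommend(30, 100))   # stock well above forecast
```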

3) Data Analysis Process


Once you set out to collect data for analysis, you can be overwhelmed by the amount of
information you find, making it hard to reach a clear, concise decision. With so much data to
handle, you need to identify the data relevant to your analysis in order to derive an accurate
conclusion and make informed decisions. The following simple steps help you identify and sort
your data for analysis.

1. Data Requirement Specification - define your scope:


 Define short and straightforward questions whose answers you need in order to make a
decision.
 Define measurement parameters.
 Define which parameters you will take into account and which you are willing to
negotiate.
 Define your unit of measurement, e.g., time, currency, salary, and more.
2. Data Collection
 Gather your data based on your measurement parameters.
 Collect data from databases, websites, and many other sources. This data may not be
structured or uniform, which takes us to the next step.
3. Data Processing
 Organize your data and make sure to add side notes, if any.
 Cross-check data with reliable sources.
 Convert the data as per the scale of measurement you have defined earlier.
 Exclude irrelevant data.
4. Data Analysis
 Once you have collected your data, perform sorting, plotting, and identifying
correlations.
 As you manipulate and organize your data, you may need to traverse your steps again
from the beginning. You may need to modify your question, redefine parameters, and
reorganize your data.
 Make use of the different tools available for data analysis.
5. Infer and Interpret Results
 Review if the result answers your initial questions
 Review if you have considered all parameters for making the decision
 Review if there is any hindering factor for implementing the decision.
 Choose data visualization techniques to communicate the message better. These
visualization techniques may be charts, graphs, color coding, and more.
Once you have an inference, always remember that it is only a hypothesis; real-life scenarios
may still interfere with your results. In data analysis, a few related terms identify with
different phases of the process.
1. Data Mining
This process involves methods for finding patterns in the data sample.

2. Data Modelling
This refers to how an organization organizes and manages its data.

4) Data Analysis Techniques


There are different techniques for data analysis depending on the question at hand, the
type of data, and the amount of data gathered. Each focuses on taking in new data,
mining insights, and drilling down into the information to turn facts and figures into
decision-making parameters. Accordingly, the different techniques of data analysis can be
categorized as follows:

1. Techniques based on Mathematics and Statistics


Descriptive Analysis: Descriptive analysis considers historical data and Key Performance
Indicators, and describes performance relative to a chosen benchmark. It takes into account
past trends and how they might influence future performance.
Dispersion Analysis: Dispersion is the extent to which a data set is spread around its center.
This technique allows data analysts to determine the variability of the factors under study.
Regression Analysis: This technique works by modeling the relationship between a
dependent variable and one or more independent variables. A regression model can be linear,
multiple, logistic, ridge, non-linear, life data, and more.
Factor Analysis: This technique helps to determine if there exists any relationship between a
set of variables. This process reveals other factors or variables that describe the patterns in the
relationship among the original variables. Factor Analysis leaps forward into useful clustering
and classification procedures.
Discriminant Analysis: This is a classification technique in data mining. It distinguishes the
points belonging to different groups based on variable measurements. In simple terms, it
identifies what makes two groups different from one another, which helps in classifying new items.
Time Series Analysis: In this kind of analysis, measurements are spanned across time, which
gives us a collection of organized data known as time series.
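As a small illustration of time series analysis, the following sketch smooths an assumed series of monthly readings with a simple trailing moving average, one of the most basic time series operations:

```python
# Illustrative monthly temperature readings (assumed data).
series = [3, 4, 8, 12, 17, 21, 23, 22, 18, 12, 7, 4]

def moving_average(values, window=3):
    """Smooth a time series with a simple trailing moving average."""
    return [
        round(sum(values[i - window + 1 : i + 1]) / window, 2)
        for i in range(window - 1, len(values))
    ]

smoothed = moving_average(series)
print(smoothed)
```

Smoothing like this makes the underlying seasonal trend easier to see before applying heavier models.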
2. Techniques based on Artificial Intelligence and Machine Learning
Artificial Neural Networks: A neural network is a biologically inspired programming
paradigm that presents a brain metaphor for processing information. An Artificial Neural
Network is a system that changes its structure based on the information that flows through the
network. ANNs can tolerate noisy data and can be highly accurate, which makes them
dependable in business classification and forecasting applications.
Decision Trees: As the name stands, it is a tree-shaped model representing a classification or
regression model. It divides a data set into smaller subsets, simultaneously developing into a
related decision tree.
Evolutionary Programming: This technique combines different types of data analysis
using evolutionary algorithms. It is a domain-independent technique that can explore
large search spaces and manages attribute interactions very efficiently.
Fuzzy Logic: This technique works with degrees of truth rather than crisp true/false values,
which helps handle the uncertainties inherent in data mining.
3. Techniques based on Visualization and Graphs
Column Chart, Bar Chart: Both of these charts are used to present numerical differences
between categories. The column chart uses the height of its columns to reflect the
differences; in a bar chart, the axes are interchanged.
Line Chart: This chart represents the change of data over a continuous interval of time.
Area Chart: This concept is based on the line chart. It also fills the area between the polyline
and the axis with color, representing better trend information.
Pie Chart: It is used to represent the proportion of different classifications. It is only suitable
for only one series of data. However, it can be made multi-layered to represent the proportion
of data in different categories.
Funnel Chart: This chart represents the proportion of each stage and reflects the size of each
module. It helps in comparing rankings.
Word Cloud Chart: It is a visual representation of text data. It requires a large amount of
data, and the differences in word frequency need to be large enough for users to perceive the
most prominent terms. It is not a very precise analytical technique.
Gantt Chart: It shows the actual timing and the progress of the activity compared to the
requirements.
Radar Chart: It is used to compare multiple variables at once, showing which variables
in the data have higher values and which have lower values. A radar chart suits
comparisons across categories and series along with proportional representation.
Scatter Plot: It shows the distribution of variables in points over a rectangular coordinate
system. The distribution in the data points can reveal the correlation between the variables.
Bubble Chart: It is a variation of the scatter plot. Here, in addition to the x and y
coordinates, the bubble area represents the 3rd value.
Gauge: It is a kind of materialized chart. Here the scale represents the metric, and the pointer
represents the dimension. It is a suitable technique to represent interval comparisons.
Frame Diagram: It is a visual representation of a hierarchy in an inverted tree structure.
Rectangular Tree Diagram: This technique is used to represent hierarchical relationships
but at the same level. It makes efficient use of space and represents the proportion
represented by each rectangular area.
Map
Regional Map: It uses color to represent value distribution over a map partition.
Point Map: It represents the geographical distribution of data as points on a geographical
background. When all points are the same size, the map conveys only location; when points are
drawn as bubbles, their size can also represent the magnitude of the data in each region.
Flow Map: It represents the relationship between an inflow area and an outflow area. It
represents a line connecting the geometric centers of gravity of the spatial elements. The use
of dynamic flow lines helps reduce visual clutter.
Heat Map: This represents the weight of each point in a geographic area. The color here
represents the density.

5) Data Analysis Tools


There are several data analysis tools available in the market, each with its own set of
functions. The selection of tools should always be based on the type of analysis performed
and the type of data worked. Here is a list of a few compelling tools for Data Analysis.

1. Excel
It has various compelling features, and with additional plugins installed, it can handle a
massive amount of data. So, if your data does not approach big-data scale, Excel can be a
versatile tool for data analysis.
2. Tableau
It falls under the BI tool category, made for the sole purpose of data analysis. The essence of
Tableau is its pivot tables and pivot charts, and it works towards representing data in the most
user-friendly way. It additionally has a data-cleaning feature along with brilliant analytical
functions.
3. Power BI
It initially started as a plugin for Excel but was later split off to develop into one of the
most popular data analytics tools. It comes in three versions: Free, Pro, and Premium. Its
PowerPivot and DAX language can implement sophisticated advanced analytics, similar to writing
Excel formulas.
4. FineReport
FineReport comes with a straightforward drag-and-drop operation, which helps design
various reports and build a data decision analysis system. It can directly connect to all kinds
of databases, and its format is similar to that of Excel. Additionally, it provides a variety
of dashboard templates and several self-developed visual plug-in libraries.

5. R & Python
These are powerful and flexible programming languages. R excels at statistical
analysis, such as normal distributions, cluster classification algorithms, and regression
analysis. It also performs individual predictive analyses, such as a customer's behavior,
spending, and preferred items based on their browsing history, and extends into machine
learning and artificial intelligence.

6. SAS
It is a programming language for data analytics and data manipulation that can easily
access data from any source. SAS has introduced a broad set of customer-profiling products
for web, social media, and marketing analytics, which can predict customer behavior and
manage and optimize communications.

Conclusion
This is our complete beginner's guide on "What is Data Analysis". If you want to learn more
about data analysis, Complete Introduction to Business Data Analysis is a great introductory
course.

Data analysis is key to any business, whether starting a new venture, making marketing
decisions, continuing with a particular course of action, or going for a complete shutdown.
The inferences and statistical probabilities calculated from data analysis help ground the
most critical decisions while minimizing human bias. Different analytical tools have
overlapping functions and different limitations, but they are also complementary. Before
choosing a data analysis tool, it is essential to consider the scope of work, infrastructure
limitations, economic feasibility, and the final report to be prepared.
