Chapter 16 Exploring

The document discusses various techniques for exploratory data analysis including visualizations like frequency tables, bar charts, histograms, stem-and-leaf displays, boxplots, and cross-tabulation tables. It emphasizes that exploratory data analysis uses visual representations to provide insights and diagnostics about relationships in the data. Cross-tabulation tables in particular examine relationships between categorical variables and serve as a framework for statistical testing and decision-making. Guidelines are also provided for effective usage of percentages in data presentation and analysis.

Uploaded by

kingbahbry
Copyright
© All Rights Reserved
Chapter 16 Exploring, Displaying, and Examining Data

Learning Objectives
After reading this chapter, you should understand:
• That exploratory data analysis techniques provide insights and data diagnostics by
emphasizing visual representations of the data.
• How cross-tabulation is used to examine relationships involving categorical variables, serves
as a framework for later statistical testing, and makes an efficient tool for data visualization
and later decision-making.

> Exploratory Data Analysis


In exploratory data analysis, the researcher has the flexibility to respond to the patterns
revealed in the preliminary analysis of the data. Patterns in the collected data guide the data
analysis or suggest revisions to the preliminary data analysis plan. This flexibility is an
important attribute of this approach. When the researcher is attempting to show causation,
confirmatory data analysis is required.
Confirmatory data analysis is an analytical process guided by classical statistical inference,
relying on significance tests and confidence intervals.

Exhibit 16-1 underscores the importance of data visualization as an integral element in the
data analysis process and as a necessary step prior to hypothesis testing.
Summary statistics, as you will see momentarily, may obscure, conceal, or even misrepresent
the underlying structure of the data. When numerical summaries are used exclusively and
accepted without visual inspection, the selection of confirmatory models may be based on
flawed assumptions. For these reasons, data analysis should begin with visual inspection.
After that, it is not only possible but also desirable to cycle between exploratory and
confirmatory approaches.
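
The point that a numerical summary can conceal structure can be illustrated with a minimal sketch. The two samples below are hypothetical, constructed only so that their means coincide; they are not taken from the chapter's exhibits.

```python
from statistics import mean, median

# Hypothetical data: two samples constructed to share the same mean.
sample_a = [4, 5, 6, 7, 8]     # roughly symmetric
sample_b = [1, 1, 1, 1, 26]    # one extreme value in the right tail


def dot_display(values):
    """Return one text line per distinct value, with one '*' per observation."""
    lines = []
    for v in sorted(set(values)):
        lines.append(f"{v:>3} | " + "*" * values.count(v))
    return "\n".join(lines)


for name, sample in (("A", sample_a), ("B", sample_b)):
    # The means agree, but the medians and the displays do not.
    print(f"Sample {name}: mean={mean(sample)}, median={median(sample)}")
    print(dot_display(sample))
```

Reporting only the mean would suggest the two samples are alike; the crude display makes the difference in shape obvious before any confirmatory model is chosen.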
Frequency Tables, Bar Charts, and Pie Charts
Frequency tables, bar charts, and pie charts are basic techniques for displaying the values of a
single variable. The chapter's example arrays the minimum age that respondents consider
appropriate for social networking. For variables with numerous values, however, these
displays are not informative: in the annual purchases of PrimeSell's top 50 customers, the
observations are spread thinly across many distinct values.

Exhibit 16-2 A Frequency Table (Minimum Age for Social Networking)
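
A frequency table of this kind can be sketched in a few lines. The category labels and responses below are hypothetical and are not the values shown in Exhibit 16-2; the function name is also an assumption made for illustration.

```python
from collections import Counter

# Hypothetical responses (category labels), not the data in Exhibit 16-2.
responses = ["13", "16", "18", "13", "15", "16", "13", "18", "16", "13"]


def frequency_table(values):
    """Return rows of (category, count, percent of all responses)."""
    counts = Counter(values)
    total = len(values)
    return [(cat, counts[cat], 100 * counts[cat] / total)
            for cat in sorted(counts)]


for category, count, percent in frequency_table(responses):
    print(f"{category:>4} {count:>3} {percent:>6.1f}%")
```

The same counts could feed a bar chart or pie chart; the table is simply the most compact form.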

Histograms
Histograms are effective for interval-ratio data whose values can be grouped into intervals.
They help the analyst visualize the shape of a distribution, including skewness, kurtosis, and
gaps, as in the display of PrimeSell's average annual purchases (Exhibit 16-5). They are
unsuitable for a nominal variable such as minimum age for social networking, because its
categories have no order.
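
As a rough sketch of how a histogram groups values into equal-width intervals, the code below bins the twenty purchase values that these notes list in the stem-and-leaf section. The interval start, the width of 5, and the number of intervals are assumptions chosen for illustration; Exhibit 16-5 may use different intervals.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


# The twenty values quoted in the stem-and-leaf section of these notes.
purchases = [54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, 59,
             61, 62, 63, 66, 66, 67, 69, 69]

counts = histogram_counts(purchases, start=50, width=5, n_bins=5)
for i, c in enumerate(counts):
    low = 50 + 5 * i
    # Empty intervals are still printed, so gaps remain visible.
    print(f"{low}-{low + 4}: " + "*" * c)
```

Note that the individual values are lost once they are counted into intervals, which is the limitation the stem-and-leaf display addresses.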

Stem-and-Leaf Displays
The stem-and-leaf display (Exhibit 16-6) is a technique that is closely related to the
histogram. It shares some of the histogram's features but offers several unique advantages.
• In contrast to histograms, which lose information by grouping data values into
intervals, the stem-and-leaf presents actual data values that can be inspected directly,
without the use of enclosed bars or asterisks as the representation medium. Because
the values are kept in order, the display also aids in finding summary statistics and in
linking a value back to its source.
• Visualization is the second advantage of stem-and-leaf displays. The range of values
is apparent at a glance, and both shape and spread impressions are immediate.
Patterns in the data are easily observed.
Each line or row in the display is referred to as a stem, and each piece of information on the
stem is called a leaf. In the first stem, there are 12 items (leaves) in the data set whose first
digit is 5:
5 | 455666788889 representing 54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, 59
The second line shows that there are eight average annual purchase values whose first digit
is 6:
6 | 12366799 representing 61, 62, 63, 66, 66, 67, 69, 69
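
The construction of the two stems above can be reproduced with a short sketch. It assumes non-negative two-digit integers, with the tens digit as the stem and the units digit as the leaf; the function name is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each tens digit (stem) to a string of sorted units digits (leaves)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(str(v % 10))
    return {stem: "".join(leaves) for stem, leaves in sorted(stems.items())}


# The twenty values listed for the first two stems.
values = [54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, 59,
          61, 62, 63, 66, 66, 67, 69, 69]

display = stem_and_leaf(values)
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```

Every original value can be read back from the display, which is what distinguishes it from a histogram.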

Pareto Diagrams
The Pareto diagram is a bar chart whose percentages sum to 100 percent. The data are
derived from a multiple-choice, single-response scale; a multiple-choice, multiple-response
scale; or frequency counts of words (or themes) from content analysis. The respondents’
answers are sorted in decreasing importance, with bar height in descending order from left to
right.
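
The ordering behind a Pareto diagram can be sketched as follows. The categories and counts are hypothetical; the sketch sorts them in descending order and reports each bar's percentage along with the running total, which reaches 100 percent at the last bar.

```python
def pareto_rows(counts):
    """Return (label, percent, cumulative percent), largest category first."""
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    for label, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        pct = 100 * n / total
        cumulative += pct
        rows.append((label, pct, cumulative))
    return rows


# Hypothetical frequency counts from a single-response question.
complaints = {"Billing error": 10, "Slow repair": 40,
              "Other": 5, "Shipping damage": 25}

rows = pareto_rows(complaints)
for label, pct, cum in rows:
    print(f"{label:<16} {pct:6.2f}%  cumulative {cum:6.2f}%")
```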

Boxplots
Boxplots summarize distribution details like location, spread, and outliers based on resistant
statistics, offering a concise visualization distinct from stem-and-leaf displays.
Exhibit 16-8
The boxplot, or box-and-whisker plot, is another technique used frequently in exploratory
data analysis. A boxplot reduces the detail of the stem-and-leaf display and provides a
different visual image of the distribution’s location, spread, shape, tail length, and outliers.
Boxplots are extensions of the five-number summary of a distribution. This summary consists
of the median, the upper and lower quartiles, and the largest and smallest observations.
• The median and quartiles are used because they are particularly resistant statistics.
Resistance is a characteristic that provides insensitivity to localized misbehavior in
data.
• The mean and standard deviation are considered nonresistant statistics, because they
are susceptible to the effects of extreme values in the tails of the distribution and do
not represent typical values well under conditions of asymmetry.
• Boxplots may be constructed easily by hand or by computer programs. The
ingredients of the plot are
1) The rectangular plot that encompasses 50% of the data values,
2) A center line marking the median and spanning the width of the box,
3) The edges of the box, called hinges, and
4) The whiskers that extend from the right and left hinges to the largest and
smallest values that lie within 1.5 times the interquartile range (IQR) from
either edge of the box. Values beyond that distance are plotted individually
as outliers.
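
The ingredients listed above can be computed with a small sketch. Several conventions exist for locating the hinges; this one assumes Tukey-style hinges (the median of each half, with the overall median counted in both halves when the sample size is odd), so other software may give slightly different quartiles. The value 95 appended in the example is hypothetical and is there only to show an outlier.

```python
from statistics import median


def boxplot_summary(values):
    """Five-number summary plus whisker ends and outliers (1.5 x IQR rule)."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2                      # odd n: median sits in both halves
    lower_hinge = median(data[:half])
    upper_hinge = median(data[n - half:])
    iqr = upper_hinge - lower_hinge
    low_fence = lower_hinge - 1.5 * iqr
    high_fence = upper_hinge + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "min": data[0],
        "lower_hinge": lower_hinge,
        "median": median(data),
        "upper_hinge": upper_hinge,
        "max": data[-1],
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


purchases = [54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, 59,
             61, 62, 63, 66, 66, 67, 69, 69]
summary = boxplot_summary(purchases + [95])   # 95 is a hypothetical extreme
print(summary)
```

Because the median and hinges are resistant, adding the extreme value moves the whisker ends and the box very little, while the mean would shift noticeably.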
Exhibit 16-9
Exhibit 16-9 summarizes several comparisons that are of help to the analyst. Boxplots are an
excellent diagnostic tool, especially when graphed on the same scale. The upper two plots in
the exhibit are both symmetric, but one is larger than the other. Larger box widths are
sometimes used when the second variable, from the same measurement scale, comes from a
larger sample size. The box widths should be proportional to the square root of the sample
size, but not all plotting programs account for this. Right- and left-skewed distributions and
those with reduced spread are also presented clearly in the plot comparison. Groups may be
compared by means of multiple plots.

Mapping
Geographic Information Systems (GIS) integrate participant data with geographic
dimensions, using maps to visualize demographics, behaviour, and segmentation for
targeted promotions and product rollouts. Comprehensive exploratory analysis with GIS
demands specialized software and expertise.

> Cross-Tabulation
Cross-tabulation provides valuable insights by comparing categorical variables, utilizing
tables to analyze relationships between demographics and target variables for effective data
examination.

Exhibit 16-11 demonstrates a computer-generated cross-tabulation of two categorical
variables, gender and assignment selection. The table displays joint classifications,
counts, and percentages, arranged in rows and columns with row and column totals.
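
A cross-tabulation of two categorical variables amounts to counting joint classifications and then adding the marginal totals. The sketch below uses a handful of hypothetical records (not the data behind Exhibit 16-11), and the field names are assumptions.

```python
from collections import Counter


def crosstab(records, row_key, col_key):
    """Return (cell counts, row totals, column totals) for two categorical fields."""
    cells = Counter((r[row_key], r[col_key]) for r in records)
    row_totals = Counter(r[row_key] for r in records)
    col_totals = Counter(r[col_key] for r in records)
    return cells, row_totals, col_totals


# Hypothetical records for illustration only.
records = (
    [{"gender": "Male", "overseas": "Yes"}] * 3
    + [{"gender": "Male", "overseas": "No"}] * 2
    + [{"gender": "Female", "overseas": "Yes"}] * 1
    + [{"gender": "Female", "overseas": "No"}] * 4
)

cells, row_totals, col_totals = crosstab(records, "gender", "overseas")
print(f"{'':<8}{'Yes':>5}{'No':>5}{'Total':>7}")
for g in ("Male", "Female"):
    print(f"{g:<8}{cells[(g, 'Yes')]:>5}{cells[(g, 'No')]:>5}{row_totals[g]:>7}")
print(f"{'Total':<8}{col_totals['Yes']:>5}{col_totals['No']:>5}{len(records):>7}")
```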
The Use of Percentages
Percentages serve two purposes in data presentation. First, they simplify the data by reducing
all numbers to a range from 0 to 100. Second, they translate the data into standard form, with
a base of 100, for relative comparisons.
Exhibit 16-12
One can see in Exhibit 16-12 that the percentage of females selected for overseas assignments
rose from 15.8 to 22.5 percent of their respective samples. Among all overseas selectees, in
the first study, 21.4% were women, while in the second study, 37.5% were women. The
tables verify an increase in women with overseas assignments, but we cannot conclude that
their gender had anything to do with the increase.
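
The two kinds of percentage quoted above differ only in the base used. In the sketch below, the counts are assumptions chosen so that they reproduce the quoted percentages; they are not copied from Exhibit 16-12.

```python
def pct(part, base):
    """Percentage of `part` relative to `base`, rounded to one decimal."""
    return round(100 * part / base, 1)


# Assumed counts, chosen only to be consistent with the quoted percentages.
study_1 = {"female_selected": 6, "female_total": 38, "all_selected": 28}
study_2 = {"female_selected": 225, "female_total": 1000, "all_selected": 600}

for name, s in (("Study 1", study_1), ("Study 2", study_2)):
    # Base = all women in the sample (share of women who were selected).
    of_women = pct(s["female_selected"], s["female_total"])
    # Base = all selectees (share of selectees who are women).
    of_selectees = pct(s["female_selected"], s["all_selected"])
    print(f"{name}: {of_women}% of women selected; "
          f"{of_selectees}% of selectees are women")
```

Stating which base was used is what lets a reader tell these two figures apart.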

Guidelines for using Percentages


• Averaging percentages.
• Use of too large percentages.
• Using too small a base.
• Percentage decreases can never exceed 100 percent.
These guidelines aim to prevent errors in reporting: use a weighted average when combining
percentages that rest on different bases, describe very large percentage changes in another
form (for example, as a multiple), reveal the base so the percentage can be put in context, and
use the higher figure as the denominator when calculating a percentage decrease.
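
Two of these guidelines are easy to check numerically. The sketch below uses hypothetical figures and hypothetical function names: it compares a simple average of two percentages with the average weighted by base size, and it computes percentage change with the starting figure as the base, so a decrease can never exceed 100 percent.

```python
def weighted_average_percent(groups):
    """groups: list of (percent, base_size); weight each percent by its base."""
    total_base = sum(n for _, n in groups)
    return sum(p * n for p, n in groups) / total_base


def percent_change(old, new):
    """Change from old to new, expressed as a percentage of the old figure."""
    return 100 * (new - old) / old


# Hypothetical groups: 20% of 50 respondents and 60% of 150 respondents.
groups = [(20.0, 50), (60.0, 150)]
simple = sum(p for p, _ in groups) / len(groups)
weighted = weighted_average_percent(groups)
print(f"simple average: {simple}%, weighted average: {weighted}%")

# A drop from 200 to 50 uses the higher figure (200) as the base.
print(f"decrease: {percent_change(200, 50)}%")
# The reverse move is a 300% increase, better described as a fourfold level.
print(f"increase: {percent_change(50, 200)}%")
```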
Other Table-Based Analysis
The recognition of a meaningful relationship between variables generally signals a need for
further investigation. Even if one finds a statistically significant relationship, the questions of
why and under what conditions remain. The introduction of a control variable to interpret the
relationship is often necessary. Cross-tabulation tables serve as the framework.
Exhibit 16-14
An advanced variation on n-way tables is automatic interaction detection (AID).
• AID is a computerized statistical process that requires that the researcher
identify a dependent variable and a set of predictors or independent variables.
The computer then searches among up to 300 variables for the best single
division of the data according to each predictor variable, chooses one, and
splits the sample using a statistical test to verify the appropriateness of this
choice.
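
The search for the best single division can be sketched in simplified form. This is not the AID program itself: the split criterion used here (the reduction in the sum of squared deviations of the dependent variable) is an assumption for illustration, the statistical test that verifies the split is omitted, and the data and field names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, target, predictors):
    """Return (predictor, threshold, reduction) for the best two-way split."""
    total = sse([r[target] for r in rows])
    best = None
    for p in predictors:
        # Candidate splits: predictor value <= t versus > t.
        for t in sorted({r[p] for r in rows})[:-1]:
            left = [r[target] for r in rows if r[p] <= t]
            right = [r[target] for r in rows if r[p] > t]
            reduction = total - sse(left) - sse(right)
            if best is None or reduction > best[2]:
                best = (p, t, reduction)
    return best


# Hypothetical ratings on a 1-to-5 scale.
rows = [
    {"overall": 5, "resolution": 5, "speed": 3},
    {"overall": 5, "resolution": 4, "speed": 5},
    {"overall": 4, "resolution": 4, "speed": 2},
    {"overall": 2, "resolution": 2, "speed": 4},
    {"overall": 1, "resolution": 1, "speed": 3},
    {"overall": 2, "resolution": 2, "speed": 5},
]

best = best_split(rows, "overall", ["resolution", "speed"])
print(best)
```

Applying the same search again within each resulting subgroup is what produces a tree diagram; a full procedure would also stop splitting when the test finds no appropriate division.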
Exhibit 16-14 shows the tree diagram that resulted from an AID study of customer
satisfaction with Mind Writer’s Complete Care repair service. The initial dependent variable
is the overall impression of the repair service. The variable was measured on an interval scale
of 1 to 5. The variables that contribute to perceptions of repair effectiveness were also
measured on the same scale but were rescaled to ordinal data for this example. The top box
shows that 62% of the respondents rated the repair service as excellent. The best predictor of
repair effectiveness is “resolution of the problem.”
