Chapter 16 Exploring

The document discusses various techniques for exploratory data analysis including visualizations like frequency tables, bar charts, histograms, stem-and-leaf displays, boxplots, and cross-tabulation tables. It emphasizes that exploratory data analysis uses visual representations to provide insights and diagnostics about relationships in the data. Cross-tabulation tables in particular examine relationships between categorical variables and serve as a framework for statistical testing and decision-making. Guidelines are also provided for effective usage of percentages in data presentation and analysis.

Uploaded by

kingbahbry

Chapter 16 Exploring, Displaying, and Examining Data

Learning Objectives
• That exploratory data analysis techniques provide insights and data diagnostics by
emphasizing visual representations of the data.
• How cross-tabulation is used to examine relationships involving categorical variables, serves
as a framework for later statistical testing, and makes an efficient tool for data visualization
and later decision-making.

> Exploratory Data Analysis

In exploratory data analysis, the researcher has the flexibility to respond to the patterns
revealed in the preliminary analysis of the data. Patterns in the collected data guide the data
analysis or suggest revisions to the preliminary data analysis plan. This flexibility is an
important attribute of this approach. When the researcher is attempting to show causation,
confirmatory data analysis is required.
Confirmatory data analysis is an analytical process guided by classical statistical inference
in its use of significance testing and confidence.

Exhibit 16-1 reminds one of the importance of data visualization as an integral element in the
data analysis process and as a necessary step prior to hypothesis testing.
Summary statistics, as you will see momentarily, may obscure, conceal, or even misrepresent
the underlying structure of the data. When numerical summaries are used exclusively and
accepted without visual inspection, the selection of confirmatory models may be based on
flawed assumptions. For these reasons, data analysis should begin with visual inspection.
After that, it is not only possible but also desirable to cycle between exploratory and
confirmatory approaches.
Frequency Tables, Bar Charts, and Pie Charts
Frequency tables, pie charts, and bar charts are basic display techniques for illustrating
data; the example variable here is the minimum age for social networking. For variables
with many values, however, these displays are less informative. The annual purchases of
PrimeSell's top 50 customers are an example: the data are spread sparsely across many
distinct values.

Exhibit 16-2 A Frequency Table (Minimum Age for Social Networking)
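
A frequency table of this kind can be sketched in a few lines of Python. The responses below are hypothetical placeholders, not the values from Exhibit 16-2, and `frequency_table` is an illustrative helper name rather than anything from the chapter.

```python
from collections import Counter

def frequency_table(responses):
    """Return (value, count, percent) rows, ordered by value for readability."""
    counts = Counter(responses)
    total = len(responses)
    return [(value, counts[value], round(100 * counts[value] / total, 1))
            for value in sorted(counts)]

# Hypothetical minimum-age responses (assumed for illustration only).
ages = [13, 13, 16, 18, 13, 16, 18, 18, 21, 16]
table = frequency_table(ages)

print(f"{'Value':>5} {'Count':>5} {'Percent':>7}")
for value, count, percent in table:
    print(f"{value:>5} {count:>5} {percent:>7}")
```

The percent column is what a bar chart or pie chart of the same variable would display.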

Histograms
Histograms are effective for interval-ratio data. They help visualize the shape of a
distribution and reveal skewness, kurtosis, and gaps, as in the display of PrimeSell's
average annual purchases (Exhibit 16-5). They are unsuitable for nominal variables, such as
minimum age for social networking, whose categories have no order.
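
The grouping step behind a histogram can be sketched as follows. The data are the 20 purchase values spelled out in the stem-and-leaf section of these notes; the interval width of 5 and the helper name `histogram_bins` are assumptions made for illustration, and the floor-division binning assumes integer values.

```python
from collections import Counter

def histogram_bins(values, width):
    """Count integer values in equal-width intervals [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    start, stop = min(counts), max(counts)
    # Keep empty intervals so that gaps in the distribution stay visible.
    return [(lo, lo + width, counts.get(lo, 0))
            for lo in range(start, stop + width, width)]

purchases = [54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, 59,
             61, 62, 63, 66, 66, 67, 69, 69]
bins = histogram_bins(purchases, width=5)

for lo, hi, n in bins:
    print(f"{lo}-{hi - 1}: {'*' * n}")
```

Each row of asterisks stands in for one bar. Note that the individual values are no longer recoverable once they are grouped.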

Stem-and-Leaf Displays
The stem-and-leaf display is a technique that is closely related to the histogram. Unlike
the histogram, it offers direct access to individual data values, preserving their order and
aiding in summary statistics and data linking.
Exhibit 16-6
The stem-and-leaf display shares some of the histogram’s features but offers several unique
advantages.
• In contrast to histograms, which lose information by grouping data values into
intervals, the stem-and-leaf presents actual data values that can be inspected directly,
without the use of enclosed bars or asterisks as the representation medium.
• Visualization is the second advantage of stem-and-leaf displays. The range of values
is apparent at a glance, and both shape and spread impressions are immediate.
Patterns in the data are easily observed.
Each line or row in the display is referred to as a stem, and each piece of information on the
stem is called a leaf. In the first stem, there are 12 items (leaves) in the data set whose first
digit is 5:
5 | 455666788889, representing 54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, 59
The second line shows that there are eight average annual purchase values whose first digit is
6:
6 | 12366799, representing 61, 62, 63, 66, 66, 67, 69, 69
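
The two stems above can be reproduced with a short sketch. It assumes two-digit values, so that the tens digit is the stem and the units digit is the leaf; `stem_and_leaf` is an illustrative helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem); units digits are the leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}

# The 20 values listed in the two stems above.
purchases = [54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, 59,
             61, 62, 63, 66, 66, 67, 69, 69]
display = stem_and_leaf(purchases)

for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```

Because every leaf is an actual digit from the data, each original value can be read back from the display.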

Pareto Diagrams
The Pareto diagram is a bar chart whose percentages sum to 100 percent. The data are
derived from a multiple-choice, single-response scale; a multiple-choice, multiple-response
scale; or frequency counts of words (or themes) from content analysis. The respondents’
answers are sorted in decreasing importance, with bar height in descending order from left to
right.
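
The sorting step can be sketched as below. The category counts are hypothetical, and the cumulative-percent column is a common companion to the bars that is included here as an assumption rather than something stated above.

```python
def pareto_table(counts):
    """Sort categories by descending count; add percent and cumulative percent."""
    total = sum(counts.values())
    rows, cumulative = [], 0
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for category, count in ordered:
        cumulative += count
        rows.append((category,
                     round(100 * count / total, 1),
                     round(100 * cumulative / total, 1)))
    return rows

# Hypothetical single-response counts (assumed for illustration only).
complaints = {"Cost": 20, "Slow repair": 40, "Other": 5,
              "Shipping damage": 25, "Rude staff": 10}
rows = pareto_table(complaints)

for category, percent, cumulative_percent in rows:
    print(f"{category:<16} {percent:>5} {cumulative_percent:>6}")
```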

Boxplots
Boxplots summarize distribution details like location, spread, and outliers based on resistant
statistics, offering a concise visualization distinct from stem-and-leaf displays.
Exhibit 16-8
The boxplot, or box-and-whisker plot, is another technique used frequently in exploratory
data analysis. A boxplot reduces the detail of the stem-and-leaf display and provides a
different visual image of the distribution’s location, spread, shape, tail length, and outliers.
Boxplots are extensions of the five-number summary of a distribution. This summary consists
of the median, the upper and lower quartiles, and the largest and smallest observations.
• The median and quartiles are used because they are particularly resistant statistics.
Resistance is a characteristic that provides insensitivity to localized misbehavior in
data.
• The mean and standard deviation are considered nonresistant statistics, because they
are susceptible to the effects of extreme values in the tails of the distribution and do
not represent typical values well under conditions of asymmetry.
• Boxplots may be constructed easily by hand or by computer programs. The
ingredients of the plot are
1) The rectangular plot that encompasses 50% of the data values,
2) A center line marking the median and going through the width of the box,
3) The edges of the box, called hinges, and
4) The whiskers that extend from the right and left hinges to the largest and
smallest values. These values may be found within 1.5 times the interquartile
range (IQR) from either edge of the box.
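
The ingredients listed above can be computed with Python's standard library. This is a sketch under two assumptions: the quartiles come from `statistics.quantiles` with the inclusive method, which can differ slightly from hand-computed hinges, and the value 218 is a hypothetical extreme observation added to the 20 purchase values from the stem-and-leaf section to show resistance.

```python
from statistics import mean, median, quantiles

def boxplot_summary(values):
    """Five-number summary plus whisker ends and outliers (1.5 x IQR rule)."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "min": data[0], "q1": q1, "median": med, "q3": q3, "max": data[-1],
        "whiskers": (inside[0], inside[-1]),  # most extreme values within the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

purchases = [54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, 59,
             61, 62, 63, 66, 66, 67, 69, 69]
with_extreme = purchases + [218]  # hypothetical extreme value

summary = boxplot_summary(with_extreme)
print(summary)

# Resistance: compare how the median and the mean react to the extreme value.
print("median:", median(purchases), "->", median(with_extreme))
print("mean:  ", round(mean(purchases), 2), "->", round(mean(with_extreme), 2))
```

The median stays where it was while the mean is pulled toward the extreme value, which is why the boxplot is built on the median and quartiles.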
Exhibit 16-9
Exhibit 16-9 summarizes several comparisons that are of help to the analyst. Boxplots are an
excellent diagnostic tool, especially when graphed on the same scale. The upper two plots in
the exhibit are both symmetric, but one is larger than the other. Larger box widths are
sometimes used when the second variable, from the same measurement scale, comes from a
larger sample size. The box widths should be proportional to the square root of the sample
size, but not all plotting programs account for this. Right- and left-skewed distributions and
those with reduced spread are also presented clearly in the plot comparison. Groups may be
compared by means of multiple plots.
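
The square-root rule for box widths is simple to apply by hand when a plotting program does not do it. The sketch below uses hypothetical group names and sample sizes; `relative_box_widths` is an illustrative helper, not a function of any plotting library.

```python
from math import sqrt

def relative_box_widths(sample_sizes, max_width=1.0):
    """Scale widths in proportion to sqrt(n); the largest sample gets max_width."""
    largest = max(sample_sizes.values())
    return {name: max_width * sqrt(n) / sqrt(largest)
            for name, n in sample_sizes.items()}

# Hypothetical sample sizes (assumed for illustration only).
sizes = {"Group A": 25, "Group B": 100}
widths = relative_box_widths(sizes)
print(widths)
```

With four times the sample size, Group B's box is drawn twice as wide as Group A's, not four times as wide.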

Mapping
Geographic Information Systems (GIS) link participant data to geographic dimensions. Maps
are used to visualize demographics, behaviour, and segmentation in support of targeted
promotions and product rollouts. Comprehensive exploratory analysis with GIS requires
specialized software and expertise.

> Cross-Tabulation
Cross-tabulation compares two or more categorical variables in a table. It is commonly used
to examine the relationship between demographic variables and a target variable of the
study.

Exhibit 16-11 demonstrates computer-generated cross-tabulation, comparing categorical
variables like gender and assignment selection using tables to display joint classifications,
counts, and percentages, typically arranged in rows and columns with row and column totals.
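
A cross-tabulation with row and column totals can be built from raw records as sketched below. The records are hypothetical and do not reproduce Exhibit 16-11; the field names `gender` and `overseas` and the helper `cross_tabulate` are assumptions made for illustration.

```python
from collections import Counter

def cross_tabulate(records, row_key, col_key):
    """Count joint classifications and add row, column, and grand totals."""
    cells = Counter((rec[row_key], rec[col_key]) for rec in records)
    rows = sorted({row for row, _ in cells})
    cols = sorted({col for _, col in cells})
    table = {row: {col: cells[(row, col)] for col in cols} for row in rows}
    for row in rows:
        table[row]["Total"] = sum(table[row][col] for col in cols)
    table["Total"] = {col: sum(table[row][col] for row in rows)
                      for col in cols + ["Total"]}
    return table

# Hypothetical records (assumed for illustration only).
records = (
    [{"gender": "Female", "overseas": "Yes"}] * 2
    + [{"gender": "Female", "overseas": "No"}] * 3
    + [{"gender": "Male", "overseas": "Yes"}] * 4
    + [{"gender": "Male", "overseas": "No"}] * 1
)
table = cross_tabulate(records, "gender", "overseas")

for label, row in table.items():
    print(f"{label:<7}", row)
```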
The Use of Percentages
Percentages serve two purposes in data presentation. First, they simplify the data by reducing
all numbers to a range from 0 to 100. Second, they translate the data into standard form, with
a base of 100, for relative comparisons.
Exhibit 16-12
One can see in Exhibit 16-12 that the percentage of females selected for overseas assignments
rose from 15.8 to 22.5 percent of their respective samples. Among all overseas selectees, in
the first study, 21.4% were women, while in the second study, 37.5% were women. The
tables verify an increase in women with overseas assignments, but we cannot conclude that
their gender had anything to do with the increase.
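
The two kinds of percentage quoted above use different bases: the row total (all women in the sample) and the column total (all overseas selectees). The sketch below makes the distinction explicit. The cell counts are illustrative values chosen so that the results match the percentages quoted in the text; they are not taken from Exhibit 16-12.

```python
def row_and_column_percent(table, row, column):
    """Return (percent of row total, percent of column total) for one cell."""
    cell = table[row][column]
    row_total = sum(table[row].values())
    column_total = sum(cells[column] for cells in table.values())
    return (round(100 * cell / row_total, 1),
            round(100 * cell / column_total, 1))

# Illustrative counts (assumed), keyed by gender, then overseas assignment.
study_1 = {"Male": {"Yes": 22, "No": 40}, "Female": {"Yes": 6, "No": 32}}
study_2 = {"Male": {"Yes": 225, "No": 675}, "Female": {"Yes": 135, "No": 465}}

first = row_and_column_percent(study_1, "Female", "Yes")
second = row_and_column_percent(study_2, "Female", "Yes")
print("Study 1 (row %, column %):", first)
print("Study 2 (row %, column %):", second)
```

Stating which base a percentage uses avoids confusing "percent of women selected" with "percent of selectees who are women".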

Guidelines for using Percentages

• Averaging percentages.
• Use of too large percentages.
• Using too small a base.
• Percentage decreases can never exceed 100 percent.
These guidelines aim to prevent errors in reporting. Percentages should be averaged only
when each is weighted by the size of its group; very large percentage changes are often
clearer when described another way; the base should be reported so that readers can put a
percentage in context; and the higher figure should be used as the denominator when
calculating a percentage decrease.
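
Two of these guidelines are easy to check with arithmetic. The sketch below uses hypothetical figures; the helper names `weighted_average_percent` and `percent_decrease` are illustrative.

```python
def weighted_average_percent(groups):
    """Average percentages weighted by each group's base (sample size)."""
    total_base = sum(base for _, base in groups)
    return sum(percent * base for percent, base in groups) / total_base

def percent_decrease(old, new):
    """Percentage decrease with the higher (original) figure as the denominator."""
    if new > old:
        raise ValueError("new exceeds old: this is an increase, not a decrease")
    return 100 * (old - new) / old

# Hypothetical groups as (percentage, base) pairs.
groups = [(40.0, 50), (10.0, 450)]
simple = sum(percent for percent, _ in groups) / len(groups)
weighted = weighted_average_percent(groups)
print("simple mean of percentages:", simple)
print("weighted by base:", weighted)

# A fall from 200 to 50, measured against the higher figure.
drop = percent_decrease(200, 50)
print("percentage decrease:", drop)
```

The simple mean ignores that one group is nine times larger than the other. Dividing the same drop by the lower figure would give 300 percent, which is the error the last guideline rules out.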
Other Table-Based Analysis
The recognition of a meaningful relationship between variables generally signals a need for
further investigation. Even if one finds a statistically significant relationship, the questions of
why and under what conditions remain. The introduction of a control variable to interpret the
relationship is often necessary. Cross-tabulation tables serve as the framework.
Exhibit 16-14
An advanced variation on n-way tables is automatic interaction detection (AID).
• AID is a computerized statistical process that requires that the researcher
identify a dependent variable and a set of predictors or independent variables.
The computer then searches among up to 300 variables for the best single
division of the data according to each predictor variable, chooses one, and
splits the sample using a statistical test to verify the appropriateness of this
choice.
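
One splitting step of this kind can be sketched as follows. This is a simplified illustration, not the AID procedure itself: it scores each candidate two-way split by the between-group sum of squares of the dependent variable and omits the statistical test mentioned above. The records, field names, and helper names are all hypothetical.

```python
from statistics import mean

def between_group_ss(left, right):
    """Between-group sum of squares for a two-way split of the dependent variable."""
    overall = mean(left + right)
    return (len(left) * (mean(left) - overall) ** 2
            + len(right) * (mean(right) - overall) ** 2)

def best_split(records, dependent, predictors):
    """Return (score, predictor, threshold) for the best single binary split."""
    best = None
    for name in predictors:
        values = sorted({rec[name] for rec in records})
        for threshold in values[:-1]:  # groups: value <= threshold, value > threshold
            left = [rec[dependent] for rec in records if rec[name] <= threshold]
            right = [rec[dependent] for rec in records if rec[name] > threshold]
            score = between_group_ss(left, right)
            if best is None or score > best[0]:
                best = (score, name, threshold)
    return best

# Hypothetical ratings on a 1-to-5 scale (assumed for illustration only).
records = [
    {"overall": 5, "resolution": 5, "speed": 3},
    {"overall": 5, "resolution": 4, "speed": 5},
    {"overall": 4, "resolution": 4, "speed": 2},
    {"overall": 2, "resolution": 2, "speed": 4},
    {"overall": 1, "resolution": 1, "speed": 3},
    {"overall": 2, "resolution": 2, "speed": 5},
]
score, predictor, threshold = best_split(records, "overall", ["resolution", "speed"])
print(predictor, threshold, round(score, 2))
```

Applying the same search again within each resulting group is what produces a tree diagram.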
Exhibit 16-14 shows the tree diagram that resulted from an AID study of customer
satisfaction with Mind Writer’s Complete Care repair service. The initial dependent variable
is the overall impression of the repair service. The variable was measured on an interval scale
of 1 to 5. The variables that contribute to perceptions of repair effectiveness were also
measured on the same scale but were rescaled to ordinal data for this example. The top box
shows that 62% of the respondents rated the repair service as excellent. The best predictor of
repair effectiveness is “resolution of the problem.”
