Statistics for Data Science -1
Statistics for Data Science -1
Describing Categorical Data- Single Variable
Usha Mohan
Indian Institute of Technology Madras
1/ 12
Statistics for Data Science -1
Mode and Median
Summarizing categorical data
I Graphical summaries of categorical data: bar chart and pie
chart.
2/ 12
Statistics for Data Science -1
Mode and Median
Summarizing categorical data
I Graphical summaries of categorical data: bar chart and pie
chart.
I Need for a compact measure.
2/ 12
Statistics for Data Science -1
Mode and Median
Summarizing categorical data
I Graphical summaries of categorical data: bar chart and pie
chart.
I Need for a compact measure.
I Numbers that are used to describe data sets are called
descriptive measures.
2/ 12
Statistics for Data Science -1
Mode and Median
Summarizing categorical data
I Graphical summaries of categorical data: bar chart and pie
chart.
I Need for a compact measure.
I Numbers that are used to describe data sets are called
descriptive measures.
I Descriptive measures that indicate where the center or most
typical value of a data set lies are called measures of central
tendency.
2/ 12
Statistics for Data Science -1
Mode and Median
Mode
Definition
The mode of a categorical variable is the most common category,
the category with the highest frequency
The mode labels
I The longest bar in a bar chart
I The widest slice in a pie chart.
I In a Pareto chart, the mode is the first category shown.
3/ 12
Statistics for Data Science -1
Mode and Median
Example
I Let consider the example A,A,B,C,A,D,A,B,C,C, A,B,C,D,A
I The longest bar in a bar chart
The most common category is ”A”
4/ 12
Statistics for Data Science -1
Mode and Median
Example
I Let consider the example A,A,B,C,A,D,A,B,C,C, A,B,C,D,A
I The widest slice in a pie chart.
The most common category is ”A” 5/ 12
Statistics for Data Science -1
Mode and Median
Bimodal and multimodal data
I If two or more categories tie for the highest frequency, the
data are said to be bimodal (in the case of two) or multimodal
(more than two).
I Let consider the example A,A,B,C,A,C,A,B,C,C,
A,C,C,D,A,A,C,D,B
I Both category ”A” and ”C” have highest frequency.
6/ 12
Statistics for Data Science -1
Mode and Median
Median
I Ordinal data offer another summary, the median, that is not
available unless the data can be put into order.
7/ 12
Statistics for Data Science -1
Mode and Median
Median
I Ordinal data offer another summary, the median, that is not
available unless the data can be put into order.
Definition
The median of an ordinal variable is the category of the middle
observation of the sorted values.
I If there are an even number of observations, choose the
category on either side of the middle of the sorted list as the
median.
7/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,B,B,A,C, B,B,C,D,A
8/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,B,B,A,C, B,B,C,D,A
I The ordered data is A,A,A,A,B,B,B,B,B,B,C,C,C,D,D
8/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,B,B,A,C, B,B,C,D,A
I The ordered data is A,A,A,A,B,B,B,B,B,B,C,C,C,D,D
I The median grade is the category associated with the 8
observation which is ”B”.
I Consider the grades of 14 students which is listed as
A,B,B,C,A,D,B,B,A,C, B,B,C,D
8/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,B,B,A,C, B,B,C,D,A
I The ordered data is A,A,A,A,B,B,B,B,B,B,C,C,C,D,D
I The median grade is the category associated with the 8
observation which is ”B”.
I Consider the grades of 14 students which is listed as
A,B,B,C,A,D,B,B,A,C, B,B,C,D
I The ordered data is A,A,A,B,B,B,B,B,B,C,C,C,D,D
8/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,B,B,A,C, B,B,C,D,A
I The ordered data is A,A,A,A,B,B,B,B,B,B,C,C,C,D,D
I The median grade is the category associated with the 8
observation which is ”B”.
I Consider the grades of 14 students which is listed as
A,B,B,C,A,D,B,B,A,C, B,B,C,D
I The ordered data is A,A,A,B,B,B,B,B,B,C,C,C,D,D
I The median grade is the category associated with the 7 or 8
observation which is ”B”.
8/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,B,B,A,C, B,B,C,D,A
9/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,B,B,A,C, B,B,C,D,A
I The ordered data is A,A,A,A,B,B,B,B,B,B,C,C,C,D,D
9/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,B,B,A,C, B,B,C,D,A
I The ordered data is A,A,A,A,B,B,B,B,B,B,C,C,C,D,D
I The median grade is the category associated with the 8
observation which is ”B”.
9/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,B,B,A,C, B,B,C,D,A
I The ordered data is A,A,A,A,B,B,B,B,B,B,C,C,C,D,D
I The median grade is the category associated with the 8
observation which is ”B”.
I The most common grade is ”B”, hence mode is ”B”
9/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,B,B,A,C, B,B,C,D,A
I The ordered data is A,A,A,A,B,B,B,B,B,B,C,C,C,D,D
I The median grade is the category associated with the 8
observation which is ”B”.
I The most common grade is ”B”, hence mode is ”B”
I In this example both mode and median are the same.
9/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,A,B,A,C,B,A,C,D,A
10/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,A,B,A,C,B,A,C,D,A
I The ordered data is A,A,A,A,A,A,B,B,B,B,C,C,C,D,D
10/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,A,B,A,C,B,A,C,D,A
I The ordered data is A,A,A,A,A,A,B,B,B,B,C,C,C,D,D
10/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,A,B,A,C,B,A,C,D,A
I The ordered data is A,A,A,A,A,A,B,B,B,B,C,C,C,D,D
I The median grade is the category associated with the 8
observation which is ”B”.
10/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,A,B,A,C,B,A,C,D,A
I The ordered data is A,A,A,A,A,A,B,B,B,B,C,C,C,D,D
I The median grade is the category associated with the 8
observation which is ”B”.
I The most common grade is ”A”, hence mode is ”A”
10/ 12
Statistics for Data Science -1
Mode and Median
Example
I Consider the grades of 15 students which is listed as
A,B,B,C,A,D,A,B,A,C,B,A,C,D,A
I The ordered data is A,A,A,A,A,A,B,B,B,B,C,C,C,D,D
I The median grade is the category associated with the 8
observation which is ”B”.
I The most common grade is ”A”, hence mode is ”A”
I In this example both mode and median are the different.
10/ 12
Statistics for Data Science -1
Mode and Median
Sectional summary
I The mode of a categorical variable is the most common
category.
I The median of an ordinal variable is the category of the
middle observation of the sorted values.
11/ 12
Statistics for Data Science -1
Mode and Median
Summary
1. Tabulate data: frequency and relative frequency.
2. Charts of categorical data
2.1 Pie charts
2.2 Bar charts and Pareto charts
3. Best practices and misleading graphs
3.1 Label your data.
3.2 Dealing with multiple categories.
3.3 Area principle
3.4 Misleading graphs
3.4.1 Decorated graphs
3.4.2 Truncated graphs.
3.4.3 Round-off errors.
4. Descriptive measures
4.1 Mode.
4.2 Median for ordinal data.
12/ 12