Lecture 2.5

Statistics for Data Science -1

Describing Categorical Data- Single Variable

Usha Mohan

Indian Institute of Technology Madras

Mode and Median

Summarizing categorical data

- Graphical summaries of categorical data: bar chart and pie chart.
- Need for a compact measure.
- Numbers that are used to describe data sets are called descriptive measures.
- Descriptive measures that indicate where the center or most typical value of a data set lies are called measures of central tendency.


Mode

Definition
The mode of a categorical variable is the most common category, that is, the category with the highest frequency.
- The mode labels the longest bar in a bar chart.
- The mode labels the widest slice in a pie chart.
- In a Pareto chart, the mode is the first category shown.


Example

- Consider the example A, A, B, C, A, D, A, B, C, C, A, B, C, D, A.
- The mode labels the longest bar in a bar chart.

The most common category is "A" (6 of the 15 observations).


Example
- Consider the example A, A, B, C, A, D, A, B, C, C, A, B, C, D, A.
- The mode labels the widest slice in a pie chart.

The most common category is "A".
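
The frequency count behind this example can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the variable names are assumptions, and `collections.Counter` is one of several ways to tabulate categories.

```python
from collections import Counter

# The 15 observations from the example above.
data = "A,A,B,C,A,D,A,B,C,C,A,B,C,D,A".split(",")

freq = Counter(data)   # category -> frequency
n = len(data)

# Frequency and relative frequency for each category.
for category in sorted(freq):
    count = freq[category]
    print(f"{category}: frequency={count}, relative frequency={count / n:.2f}")

# The mode is the category with the highest frequency.
mode, mode_count = freq.most_common(1)[0]
print(mode, mode_count)  # A 6
```

Note that `most_common(1)` returns a single category even when several tie for the highest frequency, so this sketch is only adequate when the mode is unique.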



Bimodal and multimodal data


- If two or more categories tie for the highest frequency, the data are said to be bimodal (in the case of two) or multimodal (more than two).
- Consider the example A, A, B, C, A, C, A, B, C, C, A, C, C, D, A, A, C, D, B.

- Categories "A" and "C" both have the highest frequency (7 each), so the data are bimodal.
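
A tie for the highest frequency can be detected by collecting every category whose count equals the maximum. The sketch below is illustrative only: the helper name `modes` and the "unimodal" label for a single mode are assumptions, not terms from the slides.

```python
from collections import Counter

def modes(values):
    """Return every category that attains the highest frequency (sorted)."""
    counts = Counter(values)
    highest = max(counts.values())  # raises ValueError on empty input
    return sorted(c for c, k in counts.items() if k == highest)

# The 19 observations from the example above.
data = "A,A,B,C,A,C,A,B,C,C,A,C,C,D,A,A,C,D,B".split(",")

result = modes(data)
# One mode, two modes (bimodal), or more than two (multimodal).
label = {1: "unimodal", 2: "bimodal"}.get(len(result), "multimodal")
print(result, label)  # ['A', 'C'] bimodal
```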


Median

- Ordinal data offer another summary, the median, that is not available unless the data can be put into order.

Definition
The median of an ordinal variable is the category of the middle observation of the sorted values.
- If there is an even number of observations, choose the category on either side of the middle of the sorted list as the median.

Example

- Consider the grades of 15 students, listed as A, B, B, C, A, D, B, B, A, C, B, B, C, D, A.
- The ordered data are A, A, A, A, B, B, B, B, B, B, C, C, C, D, D.
- The median grade is the category of the 8th observation, which is "B".
- Now consider the grades of 14 students, listed as A, B, B, C, A, D, B, B, A, C, B, B, C, D.
- The ordered data are A, A, A, B, B, B, B, B, B, C, C, C, D, D.
- The median grade is the category of the 7th or 8th observation; both are "B".
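
The odd and even cases above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `ordinal_median` and the explicit ordering list `GRADE_ORDER` are hypothetical, and the ordering A, B, C, D simply mirrors the sorted lists in the example. For an even number of observations the sketch returns both middle candidates rather than picking one.

```python
def ordinal_median(values, order):
    """Return the middle category (or both middle categories) as a tuple.

    `order` lists the categories from first to last; it supplies the
    ordering that makes the variable ordinal.
    """
    if not values:
        raise ValueError("median of empty data is undefined")
    rank = {category: i for i, category in enumerate(order)}
    ordered = sorted(values, key=rank.__getitem__)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: a single middle observation, e.g. the 8th of 15.
        return (ordered[n // 2],)
    # Even n: the two observations on either side of the middle,
    # e.g. the 7th and 8th of 14.
    return (ordered[n // 2 - 1], ordered[n // 2])

GRADE_ORDER = ["A", "B", "C", "D"]  # assumed ordering of the grades
grades_15 = "A,B,B,C,A,D,B,B,A,C,B,B,C,D,A".split(",")
grades_14 = "A,B,B,C,A,D,B,B,A,C,B,B,C,D".split(",")

print(ordinal_median(grades_15, GRADE_ORDER))  # ('B',)
print(ordinal_median(grades_14, GRADE_ORDER))  # ('B', 'B')
```

Returning both candidates in the even case leaves the choice to the caller, which matches the definition's "category on either side of the middle".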

Example

- Consider the grades of 15 students, listed as A, B, B, C, A, D, B, B, A, C, B, B, C, D, A.
- The ordered data are A, A, A, A, B, B, B, B, B, B, C, C, C, D, D.
- The median grade is the category of the 8th observation, which is "B".
- The most common grade is "B" (6 of 15), hence the mode is "B".
- In this example the mode and the median are the same.

Example

- Consider the grades of 15 students, listed as A, B, B, C, A, D, A, B, A, C, B, A, C, D, A.
- The ordered data are A, A, A, A, A, A, B, B, B, B, C, C, C, D, D.
- The median grade is the category of the 8th observation, which is "B".
- The most common grade is "A" (6 of 15), hence the mode is "A".
- In this example the mode and the median are different.
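
To see the two 15-grade examples side by side, the mode and the median can be computed for both lists in one pass. This is a rough sketch, not the lecture's method: the helper names and the ordering string `GRADE_ORDER` are assumptions, and `median` here handles only an odd number of observations, which is all these two lists require.

```python
from collections import Counter

GRADE_ORDER = "ABCD"  # assumed ordering of the grade categories

def mode(values):
    # Most frequent category (assumes the mode is unique).
    return Counter(values).most_common(1)[0][0]

def median(values):
    # Middle observation of the sorted data (odd number of observations only).
    ordered = sorted(values, key=GRADE_ORDER.index)
    return ordered[len(ordered) // 2]

examples = {
    "first": "A,B,B,C,A,D,B,B,A,C,B,B,C,D,A".split(","),
    "second": "A,B,B,C,A,D,A,B,A,C,B,A,C,D,A".split(","),
}

summary = {name: (mode(v), median(v)) for name, v in examples.items()}
for name, (m, med) in summary.items():
    relation = "same" if m == med else "different"
    print(f"{name}: mode={m}, median={med} ({relation})")
```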


Sectional summary

- The mode of a categorical variable is the most common category.
- The median of an ordinal variable is the category of the middle observation of the sorted values.


Summary
1. Tabulate data: frequency and relative frequency
2. Charts of categorical data
   2.1 Pie charts
   2.2 Bar charts and Pareto charts
3. Best practices and misleading graphs
   3.1 Label your data
   3.2 Dealing with multiple categories
   3.3 Area principle
   3.4 Misleading graphs
       3.4.1 Decorated graphs
       3.4.2 Truncated graphs
       3.4.3 Round-off errors
4. Descriptive measures
   4.1 Mode
   4.2 Median for ordinal data