
MAT211: Probability and Statistics

Dr. Md. Rezaul Karim


PhD(KULeuven & UHasselt), MS(Biostatistics), MS(Statistics)
Associate Professor, Department of Statistics
Jahangirnagar University (JU), Savar, Dhaka - 1342, Bangladesh
Mobile: 01912605556, Email: mrkarim5556sets@iub.edu.bd

Spring - 2023

Note: These course slides should not be reproduced or used by others without
permission.

Dr. Md. Rezaul Karim (Associate Professor, Department of Statistics, JU) Statistics Spring - 2023
Lecture Outline

1 Lecture - 1: Introduction

1.1 Problems & Motivation

1.2 Background

1.3 Text and Reference Book List

1.4 What is Statistics?

1.5 Limitation of Statistics

1.6 What is Data?

1.7 Population & Sample

1.8 Variable

1.9 Level or Scales of Measurement


1.10 Computer and Statistical packages

1.11 Homework

2 Lecture - 2: Summarizing Raw Data

2.1 Frequency Distribution and Relative Frequency Distribution

2.2 Graphical Presentation of data

2.3 Bar Diagram

2.4 Pie Chart

2.5 Homework

3 Lecture - 3: Summarizing Raw Data (Conti...)


3.1 Line Diagram

3.2 Frequency Polygons

3.3 Ogive or Cumulative Frequency Polygons

3.4 Histogram

3.5 Stem-and-Leaf Plot

3.6 Exercises

3.7 Homework

3.8 Summarizing bivariate data

3.9 Scatter Diagram

3.10 Practice Problem in the Classroom


4 Lecture - 4: Measures of Central Tendency

4.1 Measures of Central Tendency

4.2 Arithmetic Mean

4.3 Geometric Mean

4.4 Harmonic Mean

4.5 Median

4.6 Mode

4.7 Quartiles and Percentiles

4.8 Five-Number summaries and Boxplot

4.9 Boxplots

4.10 Exercises

4.11 Homework

5 Lecture - 5: Measures of Dispersion

5.1 Measure of Dispersion

5.2 Absolute Measures

5.3 Relative Measures

5.4 Exercises

5.5 Homework

6 Lecture - 6: Review Class


7 Lecture - 7: Class Test - 1

8 Lecture - 8: Measures of Shape of the Distribution

8.1 Shape of the Distribution

8.2 Skewness

8.3 Mean for grouped data

8.4 Weighted Mean

8.5 Median for Group Data

8.6 Mode for the Group data

8.7 Standard Deviation for Group Data


8.8 Homework

9 Lecture - 9: Introduction to probability

9.1 Sample Space and Events

9.2 Events

9.3 Probabilities Defined on Events

9.4 Counting Rules, Combinations, and Permutations

9.5 Approaches Assigning Probabilities

9.6 Homework

10 Lecture - 10: Basic rules for computing probability

10.1 Joint and Marginal Probabilities

10.2 Conditional Probability

10.3 Homework

11 Lecture - 11: Review Class

12 Lecture - 12: Mid-term test

13 Lecture 13: Normal Distribution

13.1 Random Variable

13.2 Normal Probability Distribution


14 Lecture 14: Standard Normal Distribution

14.1 Standard Normal Probability Distribution

14.2 Homework

15 Lecture 15: Class Test - 2

16 Lecture 16: Random Sampling Methods

16.1 Population & Sample

16.2 Census & Survey

16.3 Sampling Methods

16.4 Simple Random Sampling


16.5 How to draw a simple random sample?

17 Lecture 17: Interval estimation

17.1 Interval Estimation

17.2 Confidence interval for µ based on large samples

17.3 Confidence interval for µ based on small samples

17.4 Confidence interval for variance and standard deviation

18 Lecture 18: Interval Estimations about Two Population Means

18.1 Interval Estimation of µ1 − µ2
18.2 Determining the Sample Size

19 Lecture 19: Tests of hypothesis

19.1 Problems & Motivation

19.2 The Null and Alternative Hypotheses

19.3 Key Components of Test of Hypothesis

19.4 The p-value


19.5 Basic Steps for Test of Hypothesis

19.6 Testing on a Single Mean


20 Lecture 20: Tests of hypothesis (Conti...)

20.1 Testing on a Single Proportion

21 Lecture 21: Correlation Analysis

21.1 Scatter Diagram

21.2 Covariance

21.3 Correlation Coefficient

21.4 Exercises

21.5 Test for Significance Using Correlation


21.6 Homework

22 Lecture 22: Regression Analysis

22.1 Concept of Regression Analysis

22.2 Linear Regression Model

22.3 Homework

23 Lecture 23: Regression Analysis (Conti...)

23.1 Coefficient of Determination (R²)

23.2 Adjusted R²

23.3 Homework

24 Lecture 24: Review Class

24.1 Syllabus for the Final Exam and Marks Distribution


Lecture - 1: Introduction


Text and Reference Book List

Text Book

1 Anderson, D. R., Sweeney, D. J., and Williams, T. A. (2011): Statistics
for Business & Economics, 11th Edition, South-Western, A Division of
Thomson Learning.

Reference list

1 Lind, D. A., Marchal, W. and Wathen, S. (2019): Statistical
Techniques in Business and Economics, 17th Edition, McGraw Hill
Inc.


Motivation Examples to Study the Course


MATH211: Probability & Statistics


graphical presentation of the 4th T20 Bangladesh vs Australia


Budget of Bangladesh in 2022-2023

Source: https://www.newagebd.net/article/172787/big-budget-with-high-bank-borrowing-proposed


Problems & Motivation

1 From long-term experience, a factory owner knows that a worker can
produce a product in an average time of 89 min. However, on Sunday
mornings there is the impression that it takes longer. How do you
justify whether this impression is correct or not?

2 Is there any association between smoking habit and myocardial
infarction (MI)?

Table: Smoking habit and myocardial infarction

Smoking      Heart attack   No heart attack   Total
Smoker       325            175               500
Non-smoker   175            325               500
Total        500            500               1000


3 Do temperature and humidity affect the transmission of SARS-CoV-2?
See https://link.springer.com/article/10.1007/s40745-021-00351-y

4 Effect of coffee drinking on cancer: scientists want to study the
disease, and they may assume that coffee drinking increases the risk of
cancer in humans.

How do you justify these hypotheses?


Chapter-1: Introduction to Statistics


What is Statistics?

Statistics
▸ is the science of collecting, organizing, summarizing, classifying, and
comparing data, and of drawing inferences about the population

▸ is a backbone of data science and is the art of learning from data

▸ is concerned with the collection of data, its subsequent description
and presentation, and its analysis, which often leads to the drawing of
conclusions

▸ also provides tools for prediction and forecasting using data and
statistical models


The Scope of Modern Statistics

molecular biology (analysis of microarray data)

ecology (describing quantitatively how individuals in various animal
and plant populations are spatially distributed)

materials engineering (studying properties of various treatments to
retard corrosion)

marketing (developing market surveys and strategies for marketing
new products)

public health (identifying sources of diseases and ways to treat them)

civil engineering (assessing the effects of stress on structural elements
and the impacts of traffic flows on communities)


Limitation of Statistics

Any limitations?
statistics is not always suited to the study of qualitative phenomena

statistics does not study individuals

statistical laws are not exact

statistics is liable to be misused

privacy?


What is Data?
Data
data are individual units of information

these facts and figures are collected, analyzed, and summarized for
presentation and interpretation

all the data collected in a particular study are referred to as the data
set for the study

Datum
a datum (the singular form of "data") is a single piece of information

Raw data
raw data are collected by survey, census, etc.

raw data are also known as ungrouped data

Note: We always start with raw data. We have to process the data or
summarize them using various statistical techniques (covered in Chapters
2-3).

Types of data
1 quantitative data (ex: family size, score, age)
▸ discrete data (ex: family size, a whole number)
▸ continuous data (ex: score, age)
2 qualitative data, also called categorical data (ex: gender, grade, ID
number, cell number)
▸ nominal-level data
▸ ordinal-level data
Types of data based on sources
1 primary data
2 secondary data
Types of data based on the purposes of statistical analysis
1 time series data
2 cross-sectional data
3 pooled data


for the purpose of statistical analysis, it is useful to distinguish between
time series data and cross-sectional data
1 cross-sectional data

▸ are data collected at the same or approximately the same point in time
▸ for example, companies' profits or students' profiles collected at the
same time
Note: in this course most of the data will be treated as cross-sectional
data


the data in Table 1.1 are cross-sectional because they describe the
five variables for the 25 mutual funds at the same point in time

2 time series data

▸ are data collected at different points in time
▸ for example, exchange rates, interest rates, gross national product
(GNP), gross domestic product (GDP), and many others
▸ form a series of data points indexed in time order
▸ for example, the time series in Figure 1.1 shows the U.S. average price
per gallon of conventional regular gasoline between 2012 and 2018

3 pooled data

▸ are a combination of time series data and cross-sectional data
▸ one example is the GNP per capita of all European countries over ten years

Elements
elements are the entities on which data are collected

if the variable is denoted by X, then the elements are denoted by x1, x2, ..., xn

For the data set in Table 1.1, each individual mutual fund is an
element: the element names appear in the first column. With 25
mutual funds, the data set contains 25 elements.


Population and Sample

Target population (usually referred to simply as the population in
statistics)
▸ is the aggregate of all possible values of a variable, or all possible
objects whose characteristics are of interest in a particular study
Sample
▸ is a part of the target population


Random Sample
▸ is a representative part of the target population which has been drawn
randomly
Census
▸ method to collect data on the entire population
▸ As per the 2022 census, Bangladesh has a population of 165,158,616
people, of whom 81,712,824 are male and 83,347,206 are female. As
many as 113,063,587 of them live in rural areas and 52,009,072 live in
urban areas.
Sample survey
▸ method for collecting data from a random sample


Statistical methods
can be used to summarize or describe a collection of data; this is
called descriptive statistics
allow us to make predictions (inferences) from that data; this is called
inferential statistics

Parameter & Statistic

a measurable characteristic of a population ⇒ parameter

a measurable characteristic of a sample ⇒ statistic
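
The difference between a parameter and a statistic can be illustrated with a minimal Python sketch. The population of 10 ages, the sample size of 4, and the random seed are hypothetical values chosen only for illustration; they do not come from the slides.

```python
import random
from statistics import mean

# Hypothetical population: ages of all 10 members of a small club (illustrative only).
population = [18, 19, 19, 20, 21, 22, 22, 23, 24, 27]

# Parameter: a characteristic computed from the whole population.
population_mean = mean(population)

# Draw a simple random sample of size 4 without replacement.
rng = random.Random(2023)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 4)

# Statistic: the same characteristic computed from the sample only.
sample_mean = mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The parameter is fixed for a given population, while the statistic changes from one random sample to another.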


Variable & its types

Variable
. is a characteristic of interest for the elements
. is also called a feature or factor, and it varies from respondent to
respondent
. is denoted by X, Y, Z, or by the first letter of its name (e.g., Score (S), Age
(A))

Types of Variable
1 qualitative variable (also known as categorical variable)
2 quantitative variable (also known as numerical variable)
▸ discrete variable
▸ continuous variable


Types of Variable with Examples


Level or Scales of Measurement

level of measurement or scale of measure is a classification that describes
the nature of information within the values assigned to variables

1 nominal

2 ordinal

3 interval

4 ratio


Please read page 6 of the textbook for details.


Computer and Statistical packages

Statistical analyses generally involve a large amount of data, which is why
analysis frequently relies on computer software. Several very useful software
packages are available. These include:

Minitab

Excel

R language

MATLAB

Stata

SPSS, and many others.


Homework

HW: Read Chapter 1 of the text book

Exercises: 2, 4, 6, 9-13, Pages 21-23


Chapter-2: Descriptive Statistics: Tabular and Graphical Presentations


Lecture - 2: Summarizing Raw Data


Summarizing Raw Data

Summarizing Categorical Data
▸ Frequency Distribution
▸ Relative Frequency
▸ Percent Frequency Distributions
▸ Bar Diagram or Charts
▸ Pie Charts

Summarizing Quantitative Data
▸ Frequency Distribution
▸ Relative Frequency (with percentages)
▸ Cumulative Distributions
▸ Line Diagram
▸ Dot Plot
▸ Histogram
▸ Ogive
▸ Stem-and-Leaf Plot, and so on

Frequency Distribution
a frequency distribution is a tabular summary of data showing the number
(frequency) of items in each of several nonoverlapping classes

Relative frequency (rf_i) = frequency (f_i) ÷ number of observations (n)

Example: Suppose we have 15 students fill out a questionnaire. One of
the questions was: what was your score in the MATH211 course? The
data are as follows:

Make a tabular and graphical summary of the above data.



Solution:

where the relative frequency is rf_i = f_i / n and the percent frequency is
pf_i = rf_i × 100

Summary: 6 students performed excellently, 5 students performed well,
and so on. That is, 40% of the students performed excellently, about 33%
performed well, and so on.
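
The relative and percent frequencies can be computed with a short Python sketch. The counts of 6 "Excellent" and 5 "Good" follow the summary; the label "Average" for the remaining 4 students is an assumption made for illustration, because the original data table is not shown here.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (f, rf, pf)} for a list of categorical values."""
    n = len(values)
    counts = Counter(values)
    return {c: (f, f / n, 100 * f / n) for c, f in counts.items()}

# 6 "Excellent" and 5 "Good" follow the summary; "Average" is an assumed label
# for the remaining 4 of the 15 students.
performance = ["Excellent"] * 6 + ["Good"] * 5 + ["Average"] * 4

table = frequency_table(performance)
for category, (f, rf, pf) in table.items():
    print(f"{category:<10} f={f:>2}  rf={rf:.2f}  pf={pf:.0f}%")
```

The relative frequencies always add up to 1, which is a quick check on any frequency table.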


Data Presentation

Types of diagram/chart:
. bar diagram
. pie diagram
. line diagram
. frequency polygon
. cumulative frequency polygon or Ogive
. histogram
. stem-and-leaf plot
. box plot
. scatter diagram
. ⋯


Bar Diagram
bar diagram (also called bar chart) is a graphical device for displaying
categorical data summarized in a frequency, relative frequency, or percent
frequency distribution


Bar Diagram (Cont...)


we can use bar diagrams to show the relative sizes of many things, such as
what type of car people have, how many customers a shop has on different
days, and so on


Pie Chart

a pie chart (also called a pie diagram) is a special chart that uses "pie
slices" to show the relative sizes of data


Pie Chart (Cont...)

Category   Frequency (f_i)   Angle = (f_i / ∑_i f_i) × 360°
Comedy     4                 72
Action     5                 90
Romance    6                 108
Drama      1                 18
SciFi      4                 72
Total      ∑_i f_i = 20      360
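
The angle column can be reproduced with a few lines of Python. This is only a sketch; the function name `pie_angles` is illustrative.

```python
def pie_angles(frequencies):
    """Map each category to its pie-slice angle in degrees."""
    total = sum(frequencies.values())
    return {c: f * 360 / total for c, f in frequencies.items()}

# Frequencies from the table of film categories.
movies = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}
angles = pie_angles(movies)

for category, angle in angles.items():
    print(f"{category:<8} {angle:>6.1f} degrees")
```

The angles must add up to 360 degrees, since the slices together make up the whole pie.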


Example

Back to the first example: Suppose we have 15 students fill out a
questionnaire. One of the questions was: what was your score in the
MATH211 course? The data are as follows:

Make a tabular and graphical summary of the above data.


How to Construct a Frequency Distribution?


Steps for constructing a frequency table:
Step 1: Sort the data in ascending order
Step 2: Find the minimum and maximum observations of the data
Step 3: Decide on the number of classes (k) in the frequency distribution:
k = 1 + 3.322 log10(n), or choose k such that 2^k ≥ n

Step 4: Determine the class interval size (h):

h ≥ (Maximum observation − Minimum observation) / Number of classes (k)

Step 5: Decide the starting point: the lower class limit or class boundary
should cover the smallest value in the raw data.
Step 6: Tally and count the observations under each interval

Relative frequency (rf_i) = frequency (f_i) ÷ number of observations (n)
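
Steps 3 and 4 can be sketched in Python as follows. The sketch assumes a base-10 logarithm, rounds k to the nearest integer, and rounds h up to a whole number. The function names are illustrative, and the inputs (n = 30, minimum 363, maximum 431) are those of the weekly expenditure example in this lecture.

```python
import math

def number_of_classes(n):
    """Step 3: k = 1 + 3.322 * log10(n), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n))

def class_width(minimum, maximum, k):
    """Step 4: smallest whole-number width h with h >= (max - min) / k."""
    return math.ceil((maximum - minimum) / k)

k = number_of_classes(30)
h = class_width(363, 431, k)
print("number of classes k =", k)
print("class interval size h =", h)
```

Rounding h up rather than to the nearest integer guarantees that k classes of width h cover the whole range of the data.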


Example
Let's take an example to understand how to construct a frequency
distribution. Suppose we have the weekly expenditures of 30 students.
Construct a frequency distribution table for the following
observations:
423, 369, 387, 411, 393, 394, 371, 377, 389, 409, 392, 408, 431, 401, 363,
391, 405, 382, 400, 381, 399, 415, 428, 422, 396, 372, 410, 419, 386, 390

1 Step 1: sort data in ascending order

363 369 371 372 377 381 382 386 387 389 390 391 392 393 394
396 399 400 401 405 408 409 410 411 415 419 422 423 428 431


2 Step 2: The minimum observation is 363 and the maximum observation is 431

3 Step 3: Number of classes

k = 1 + 3.322 log10(30) = 5.907 ≈ 6

4 Step 4: Class interval size

h ≥ (Maximum observation − Minimum observation) / Number of classes
  = (431 − 363) / 6 = 11.33 ≈ 12

5 Step 5: Decide the starting point: 360


Frequency Distribution Table


Therefore, the distribution of weekly expenditure of 30 students is the
following.
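
The tally in Step 6 can be sketched in Python. The sketch uses the starting point 360, the class width 12, and the 6 classes from the steps in this example, and it assumes each class includes its lower limit and excludes its upper limit (for example, 360 up to but not including 372). The function name is illustrative.

```python
def frequency_distribution(data, start, width, k):
    """Count observations in k classes of the form [lower, lower + width)."""
    counts = [0] * k
    for x in data:
        index = (x - start) // width
        if not 0 <= index < k:
            raise ValueError(f"{x} falls outside the classes")
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i]) for i in range(k)]

# Weekly expenditure of the 30 students (sorted, as in Step 1).
expenditure = [363, 369, 371, 372, 377, 381, 382, 386, 387, 389,
               390, 391, 392, 393, 394, 396, 399, 400, 401, 405,
               408, 409, 410, 411, 415, 419, 422, 423, 428, 431]

table = frequency_distribution(expenditure, start=360, width=12, k=6)
n = len(expenditure)
for lower, upper, f in table:
    print(f"{lower} to under {upper}: f = {f}, rf = {f / n:.3f}")
```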


Supplementary Exercises


Frequency Distribution with Open-End Classes


Open-ended classes: these are classes in which either the lower
limit of the first class is not given, or the upper limit of the last class is
not given, or both are not given
a distribution with open-ended classes is known as a frequency
distribution with open-end classes
example:

▸ in the first class, the lower limit is not given
▸ in the last class, the upper limit is not given
▸ both of these classes are open-end classes

Practice Problem in the Classroom

Suppose we select 15 students. Their scores of MAT101 are the following:

Table: Test score of MAT 101

90 88 78
87 69 93
56 78 57
67 85 46
95 59 89

Summarize these data using a frequency distribution and comment on it.


Homework

HW: Read related text in Chapter 2 of the text book

Exercises: 4-10, pp.36-39

Exercises: 15-21, pp.46-48

Exercises: 39, 41, 42, pp.65-67


Lecture - 3: Summarizing Raw Data (Conti...)


Line Diagram
a line diagram is a graph that shows related information by drawing a
continuous line between all the points on a grid (usually, as change over
time)


Example of Line Diagram


Frequency Polygons
a frequency polygon is a graph constructed by using lines to join the
midpoints of each class interval. The heights of the points represent the
frequencies.
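
The points of a frequency polygon can be computed as in the following Python sketch. The classes are those of the weekly expenditure example (starting point 360, width 12). The frequencies were obtained by tallying the 30 observations under the assumption that each class includes its lower limit, so they are an illustration, not a copy of the slide's table.

```python
def polygon_points(class_limits, frequencies):
    """Pair each class midpoint with its frequency."""
    return [((lower + upper) / 2, f)
            for (lower, upper), f in zip(class_limits, frequencies)]

# Assumed classes and tallied counts for the weekly expenditure of 30 students.
class_limits = [(360, 372), (372, 384), (384, 396),
                (396, 408), (408, 420), (420, 432)]
frequencies = [3, 4, 8, 5, 6, 4]

points = polygon_points(class_limits, frequencies)
for midpoint, f in points:
    print(f"midpoint {midpoint:.0f}: frequency {f}")
```

Joining these points with straight lines, in order of the midpoints, gives the frequency polygon.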


Cumulative Frequency Polygons (also called Ogive)

graphical presentation of the 4th T20 Bangladesh vs Australia


Remember: Frequency Distribution Table


Recall from the last class that the distribution of weekly expenditure of 30
students is the following:


Ogive or Cumulative Frequency Polygons


an ogive is a graph whose vertices represent the cumulative frequencies of
the data

it is a cumulative distribution graph in which the horizontal axis
represents data values, and the vertical axis represents cumulative
relative frequencies, cumulative frequencies, or cumulative percent
frequencies
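
The vertices of an ogive can be computed with a short Python sketch. The upper class limits and frequencies are assumptions based on the weekly expenditure example (starting point 360, width 12, with counts tallied from the 30 observations), since the slide's table is not reproduced here.

```python
from itertools import accumulate

# Assumed upper class limits and class frequencies (weekly expenditure example).
upper_limits = [372, 384, 396, 408, 420, 432]
frequencies = [3, 4, 8, 5, 6, 4]

# Running totals of the class frequencies.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]

# Each ogive vertex: (upper class limit, cumulative frequency, cumulative percent).
ogive_points = [(u, cf, 100 * cf / n) for u, cf in zip(upper_limits, cumulative)]
for upper, cf, pct in ogive_points:
    print(f"less than {upper}: {cf} students ({pct:.1f}%)")
```

Reading a vertex is straightforward: under these assumptions, 15 of the 30 students (50%) spend less than 396 per week.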

How do you interpret these plots?



Bar Diagram vs Histogram

a histogram is a graphical display of data using bars whose bases are the
classes and whose heights are the class frequencies

it looks very much like a bar chart, but there are important
differences between them


Histogram
to draw a histogram, we sometimes use the mid-value of the class on the
X-axis and frequency (or percentages of relative frequency) on the Y-axis


Example: Histogram of Two Groups

data about the Basketball teams and the heights of the players

the coach is comparing the heights of his team, the Markwell Cougars
to the rival team, the Sampson Hawks

Markwell Cougars:
170, 172, 175, 176, 176, 176, 178, 181, 182, 183, 183, 183, 185, 185,
187, 188, 188, 189, 190, 195

Sampson Hawks:
169, 175, 176, 176, 178, 179, 180, 183, 183, 186, 186, 186, 187, 187,
187, 187, 187, 188, 190, 191, 192
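
To compare the two teams, the heights can first be grouped into common classes. The sketch below is one possible way to do this; the class width of 5 and the starting point 165 are assumptions, since the original figure is not available, and `class_counts` and `text_histogram` are hypothetical helper names.

```python
from collections import Counter

cougars = [170, 172, 175, 176, 176, 176, 178, 181, 182, 183, 183, 183,
           185, 185, 187, 188, 188, 189, 190, 195]
hawks = [169, 175, 176, 176, 178, 179, 180, 183, 183, 186, 186, 186,
         187, 187, 187, 187, 187, 188, 190, 191, 192]

WIDTH = 5  # assumed class width, not given in the slides


def class_counts(heights, width=WIDTH):
    """Count the values falling in each class [lower, lower + width)."""
    return Counter((h // width) * width for h in heights)


def text_histogram(name, heights):
    """Print one row of stars per class, using the same classes for any team."""
    counts = class_counts(heights)
    print(name)
    for lower in range(165, 200, WIDTH):
        print(f"  {lower}-{lower + WIDTH - 1}: {'*' * counts[lower]}")


text_histogram("Markwell Cougars", cougars)
text_histogram("Sampson Hawks", hawks)
```

Using the same classes for both teams keeps the two histograms directly comparable.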


Practice Problem in the Classroom

Suppose we select 15 students. Their scores of MAT101 are the following:

Table: Test score of MAT 101

90 88 78
87 69 93
56 78 57
67 85 46
95 59 89

Summarize these data using a histogram and an ogive.
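
As a starting point for this problem, the class frequencies (for the histogram) and the cumulative frequencies (for the ogive) can be tabulated as in the sketch below. The classes 40-49, 50-59, ..., 90-99 are an assumption; other class intervals are equally valid.

```python
from itertools import accumulate

scores = [90, 88, 78, 87, 69, 93, 56, 78, 57, 67, 85, 46, 95, 59, 89]

# Assumed classes of width 10: 40-49, 50-59, ..., 90-99.
lowers = list(range(40, 100, 10))
freqs = [sum(lo <= s < lo + 10 for s in scores) for lo in lowers]
cum_freqs = list(accumulate(freqs))

for lo, f, cf in zip(lowers, freqs, cum_freqs):
    # The histogram uses f; the ogive plots cf against the upper class limit.
    print(f"{lo}-{lo + 9}: frequency {f}, cumulative frequency {cf}")
```

The last cumulative frequency must equal the number of observations, which is a quick check on the table.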


Stem-and-Leaf Plot
A stem-and-leaf plot is a special table in which each data value is split
into a "stem" (the first digit or digits) and a "leaf" (usually the last
digit).

the "stem" values are listed down, and the "leaf" values go right (or
left) from the stem values
the "stem" is used to group the scores and each "leaf" shows the
individual scores within each group


rounding may be needed to create a stem-and-leaf display


based on the following set of data, the stem plot below would be
created:

−23.678758, −12.45, −3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8


for negative numbers, a negative sign is placed in front of the stem unit,
which is still the value X/10
non-integers are rounded


Practice Problem in the Classroom

Suppose we select 15 students. Their scores of MAT101 are the following:

Table: Test score of MAT 101

90 88 78
87 69 93
56 78 57
67 85 46
95 59 89

Summarize these data using a stem-and-leaf plot.
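
A minimal sketch of how such a plot can be built for two-digit scores, taking the tens digit as the stem and the units digit as the leaf. The helper name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

scores = [90, 88, 78, 87, 69, 93, 56, 78, 57, 67, 85, 46, 95, 59, 89]


def stem_and_leaf(values):
    """Group sorted two-digit values by tens digit (stem) and units digit (leaf)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)


for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Because the values are sorted first, the stems appear in increasing order and the leaves within each stem are ordered as well.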


Supplementary Exercises


Exercises
1 What do you mean by Statistics? Write down some applications of
Statistics in the engineering field.
2 What is meant by population and sample? Discuss the different types
of data.
3 What do you mean by sampling (or sampling technique)? Discuss the
different types of sampling technique for selecting a sample.
4 What are the points to be borne in mind in the formation of a
frequency table? Choosing appropriate class intervals, form a
frequency table for the following data:
10.2 0.5 5.2 6.1 3.1 6.7 8.9 7.2 8.9
5.4 3.6 9.2 6.1 7.3 2.0 1.3 6.4 8.0
4.3 4.7 12.4 8.6 13.1 3.2 9.5 7.6 4.0
5.1 8.1 1.1 11.5 3.1 6.8 7.0 8.2 2.0
3.1 6.5 11.2 12.0 5.1 10.9 11.2 8.5 2.3
3.4 5.2 10.7 4.9 6.2
Draw the histogram, frequency curve and stem-and-leaf plot for the above
data.

Exercises: Ross(2020)


Homework

HW: Read related text in Chapter 2 of the text book

Exercises: 15-21, pp.46-48

Exercises: 25-28, pp.52-53 and

Exercises: 31, 33-36, pp.60-61


Summarizing bivariate data


Tabular method: crosstabulation

▸ A tabular summary of data for two variables. The classes for one
variable are represented by the rows; the classes for the other variable
are represented by the columns.

Graphical method: scatter diagram


Example: Test Performances of MAT 101

ID   Gender   Test Performance   Study Hour   Score of STAT
 1   Male     Good               10           71
 2   Female   Good               11           75
 3   Male     Excellent          14           85
 4   Female   Excellent          10           90
 5   Male     Poor                8           50
 6   Female   Excellent          13           88
 7   Male     Poor                6           45
 8   Female   Excellent          15           80
 9   Male     Good               14           65
10   Female   Good               10           82
11   Male     Excellent          14           92
12   Female   Poor                8           55
13   Male     Poor                5           40
14   Male     Good               10           68
15   Female   Good                9           62


Table: The Crosstabulation of Gender and Test Performance.

Gender    Test Performance             Total
          Poor   Good   Excellent
Male      3      3      2              8
Female    1      3      3              7
Total     4      6      5              15
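
The crosstabulation can be reproduced by counting (Gender, Test Performance) pairs. The sketch below retypes the 15 records from the example data; the variable names are illustrative only.

```python
from collections import Counter

# (gender, test performance) for IDs 1 to 15, retyped from the example data.
records = [
    ("Male", "Good"), ("Female", "Good"), ("Male", "Excellent"),
    ("Female", "Excellent"), ("Male", "Poor"), ("Female", "Excellent"),
    ("Male", "Poor"), ("Female", "Excellent"), ("Male", "Good"),
    ("Female", "Good"), ("Male", "Excellent"), ("Female", "Poor"),
    ("Male", "Poor"), ("Male", "Good"), ("Female", "Good"),
]

counts = Counter(records)
levels = ["Poor", "Good", "Excellent"]

for gender in ["Male", "Female"]:
    row = [counts[(gender, level)] for level in levels]
    print(gender, row, "total:", sum(row))

column_totals = [sum(counts[(g, level)] for g in ["Male", "Female"]) for level in levels]
print("Total", column_totals, "total:", sum(column_totals))
```

The row totals, column totals, and grand total should all agree with the number of students, which is a useful check on any hand-built crosstabulation.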


Scatter Diagram
a scatter diagram is a graphic tool used to display the relationship between
two variables

the dependent variable is scaled on the Y-axis and is the variable
being estimated

the independent variable is scaled on the X-axis and is the variable
used as the predictor


Example: Scatter Diagram of Study Hour and Score

For interpretation, we have to focus on the following points:

Strength

Shape: linear, curved, etc.

Direction: positive or negative

Presence of outliers
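
One way to put a number on direction and strength is Pearson's correlation coefficient. It is not introduced in this lecture, so the sketch below is only a supplementary illustration using the Study Hour and Score columns of the example data; `pearson_r` is a hypothetical helper.

```python
from math import sqrt

hours = [10, 11, 14, 10, 8, 13, 6, 15, 14, 10, 14, 8, 5, 10, 9]
scores = [71, 75, 85, 90, 50, 88, 45, 80, 65, 82, 92, 55, 40, 68, 62]


def pearson_r(x, y):
    """Pearson correlation: sign gives direction, magnitude gives strength."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(hours, scores)
print(f"r = {r:.2f}")
```

A value of r near +1 or -1 suggests a strong linear relationship; shape and outliers still need to be judged from the plot itself.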

Chapter 3: Descriptive Statistics: Numerical Measures


Lecture - 4: Measures of Central Tendency


4 Lecture - 4: Measures of Central Tendency

4.1 Measures of Central Tendency

4.2 Arithmetic Mean

4.3 Geometric Mean

4.4 Harmonic Mean

4.5 Median

4.6 Mode

4.7 Quartiles and Percentiles

4.8 Five-Number summaries and Boxplot

4.9 Boxplots

4.10 Exercises

4.11 Homework

We will learn several numerical measures that provide a data summary
using numeric formulas. Now we will learn the following:

Measures of average: simple mean, weighted mean, median, mode,
quartiles, percentiles

Measures of variation: range, interquartile range, variance, standard
deviation

Measures of skewness: symmetry, positive skewness, negative skewness

Measures of kurtosis: leptokurtic, platykurtic and mesokurtic


A measure of central tendency

Questions

1 How can we estimate the central value of a population?
2 Can you name a single characteristic that represents the whole dataset?

Measure of central tendency
a summary statistic that represents the center point or typical value
of a dataset

the most common measures of central tendency are the
(i). mean
∎ arithmetic mean
∎ harmonic mean
∎ geometric mean
(ii). median
(iii). mode


Arithmetic Mean

Arithmetic mean (AM) (usually refers to the sample mean)

sum all of the values and divide by the number of observations in your
dataset

the (arithmetic) mean of a set of numbers $x_1, x_2, \ldots, x_n$,
typically denoted by $\bar{x}$, is the sum of the sampled values divided
by the number of items in the sample

$$\bar{x} = \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right) = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

ex: the arithmetic mean of five values: 4, 36, 45, 50, 75 is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_5}{5} = \frac{4 + 36 + 45 + 50 + 75}{5} = \frac{210}{5} = 42$$


Geometric Mean

Geometric mean (GM)

is an average that is useful for sets of positive numbers that are
interpreted according to their product and not their sum (as is the
case with the arithmetic mean); e.g., rates of growth

$$\bar{x} = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = (x_1 x_2 \cdots x_n)^{1/n}$$

ex: the geometric mean of five values: 4, 36, 45, 50, 75 is:

$$\bar{x} = (x_1 x_2 \cdots x_5)^{1/5} = (4 \times 36 \times 45 \times 50 \times 75)^{1/5} = \sqrt[5]{24\,300\,000} = 30$$


Harmonic Mean
Harmonic mean (HM)
is an average which is useful for sets of numbers which are defined in
relation to some unit, for example speed (distance per unit of time).

$$\bar{x} = n\left(\sum_{i=1}^{n} \frac{1}{x_i}\right)^{-1}$$

ex: the harmonic mean of the five values: 4, 36, 45, 50, 75 is

$$\bar{x} = 5\left(\sum_{i=1}^{5} \frac{1}{x_i}\right)^{-1} = \frac{5}{\frac{1}{4} + \frac{1}{36} + \frac{1}{45} + \frac{1}{50} + \frac{1}{75}} = \frac{5}{\frac{1}{3}} = 15$$

Relationship between AM, GM, and HM

$$AM \geq GM \geq HM$$

equality holds if and only if all the elements of the given sample are equal
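
The three means and the inequality can be checked numerically. The sketch below uses Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later) on the five values from the examples.

```python
from statistics import mean, geometric_mean, harmonic_mean

values = [4, 36, 45, 50, 75]

am = mean(values)            # arithmetic mean
gm = geometric_mean(values)  # geometric mean (floating point)
hm = harmonic_mean(values)   # harmonic mean

print(f"AM = {am}, GM = {gm:.4f}, HM = {hm:.4f}")

# AM >= GM >= HM for positive data.
print("AM >= GM >= HM:", am >= gm >= hm)
```

The geometric mean is computed in floating point, so it may differ from 30 in the last decimal places.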

Practice problem in the Class

1 Consider a sample with data values of 10, 20, 12, 17, and 16.
Compute the arithmetic mean, geometric mean and harmonic mean
and comment on your results.

2 Consider a sample with data values of 10, 20, 12, 17, 10, and 16.
Compute the arithmetic mean, geometric mean and harmonic mean
and comment on your results.


Properties of Arithmetic Mean

it requires at least the interval scale

all values are used

it is unique

it is easy to calculate and allows easy mathematical treatment

the sum of the deviations from the mean is 0

the arithmetic mean is the only measure of central tendency where
the sum of the deviations of each value from the mean is zero!

it is easily affected by extremes, such as very big or small numbers in
the set (non-robust)

...


Merits of Mean
rigidly defined by an algebraic formula
easy to calculate and simple to understand
based on all observations of the given data
capable of being treated mathematically; hence it is widely used in statistical
analysis
can be computed even if the detailed distribution is not known, provided the
sum of the observations and the number of observations are known
least affected by the fluctuation of sampling
the mean can be calculated for every kind of numeric data
...


Demerits of Mean
1 cannot be computed for qualitative data such as data on intelligence, honesty,
and smoking habit
2 strongly affected by extreme observations; hence it does not adequately
represent data containing some extreme points
3 cannot be computed when class intervals have open ends
4 if any one of the data values is missing, the mean cannot be calculated
5 ...


Median

is the value separating the higher half from the lower half of a data
sample, a population, or a probability distribution

the position of the middle value in a data set of n numerically ordered
values is (n + 1)/2. This gives either the middle value (for an odd
number of values) or the halfway point between the two middle values
(for an even number of values)

ex: If there is an odd number of values, the middle one is picked.
E.g., consider the list of numbers 1, 3, 3, 6, 7, 8, 9. This list contains
seven numbers. The median is the fourth of them, which is 6.
If there is an even number of observations, then there is no single
middle value; the median is then usually defined to be the mean of the
two middle values. For the data set 1, 2, 3, 4, 5, 6, 8, 9, the median
is the mean of the middle two numbers: this is (4 + 5)/2, which is 4.5
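
The (n + 1)/2 rule translates directly into code. A minimal sketch, assuming a non-empty list of numbers; the function name `median_by_position` is hypothetical.

```python
import math


def median_by_position(values):
    """Median via the 1-based position (n + 1) / 2 in the ordered data."""
    ordered = sorted(values)
    pos = (len(ordered) + 1) / 2
    lower = ordered[math.floor(pos) - 1]  # value at or just below the position
    upper = ordered[math.ceil(pos) - 1]   # value at or just above the position
    return (lower + upper) / 2


print(median_by_position([1, 3, 3, 6, 7, 8, 9]))     # odd number of values
print(median_by_position([1, 2, 3, 4, 5, 6, 8, 9]))  # even number of values
```

When n is odd the position is a whole number, so `lower` and `upper` are the same observation and averaging them changes nothing.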


Practice problem in the Class

1 Consider a sample with data values of 10, 20, 12, 17, and 16.
Compute the mean and median.

2 Consider a sample with data values of 10, 20, 12, 17, 10 and 16.
Compute the mean and median.


Merits of Median
1 easy to calculate and simple to understand
2 well defined, as an ideal average should be
3 can also be computed for a frequency distribution with open-ended
classes
4 not affected by extreme values, and also independent of the range or
dispersion of the data
5 can be determined graphically
6 proper average for qualitative data where items are not measured but
are scored
7 the only suitable average when the data are qualitative and it is possible
to rank the items according to qualitative characteristics
8 ...


Demerits of Median
1 to compute the median, the data need to be arranged in ascending or
descending order
2 not based on all the observations of the data
3 cannot be given further algebraic treatment
4 affected by fluctuation of sampling
5 not accurate when the data size is not large
6 ...


Mode
is simply the number which appears most often
⧫ To find the mode, or modal value, it is best to put the numbers in order,
then count how many there are of each number. The number that appears most
often is the mode.
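
The counting procedure can be sketched as follows. Returning a list (rather than a single number) is a design choice made here so that data with more than one mode are handled; `modes` is a hypothetical helper name.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 3, 3, 6, 7, 8, 9]))  # a single mode
print(modes([1, 1, 2, 2, 3]))        # two values tie, so both are modes
```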


Practice problem in the Classroom

1 Consider a sample with data values of 10, 12, 12, 16, and 18.
Compute the mode.

2 Consider a sample with data values of 10, 12, 12, 16, 16 and 18.
Compute the mode and comment on your results.


Quartiles and Percentiles


based on their position in a series of ordered observations
not necessarily central values
a quartile can be defined as

$$Q_k = \left(\frac{n+1}{4} \times k\right)\text{th observation}, \quad k = 1, 2, 3$$

a percentile can be defined as

$$P_k = \left(\frac{n+1}{100} \times k\right)\text{th observation}, \quad k = 1, 2, \ldots, 99$$

a decile can be defined as

$$D_k = \left(\frac{n+1}{10} \times k\right)\text{th observation}, \quad k = 1, 2, \ldots, 9$$
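
The position formulas above often give a fractional position. The sketch below assumes linear interpolation between the two neighbouring ordered observations in that case, and clamps positions that fall outside 1..n; both choices are assumptions, since the slides do not state a rule. `position_quantile` is a hypothetical helper.

```python
import math


def position_quantile(values, k, parts):
    """k-th quantile when the data are split into `parts` parts.

    parts=4 gives quartiles, parts=10 deciles, parts=100 percentiles.
    """
    ordered = sorted(values)
    n = len(ordered)
    pos = (n + 1) * k / parts      # 1-based position, possibly fractional
    pos = min(max(pos, 1), n)      # clamp to the available observations
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return ordered[lower - 1]
    # Assumed rule: interpolate linearly between neighbouring observations.
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


data = [1, 3, 3, 6, 7, 8, 9]
for k in (1, 2, 3):
    print(f"Q{k} =", position_quantile(data, k, 4))
```

With n = 7 the quartile positions are 2, 4 and 6, so no interpolation is needed for this data set.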


Five-Number Summary

in a five-number summary, five numbers are used to summarize the data:

1 smallest (or minimum) value
2 first quartile (Q1)
3 median (Q2)
4 third quartile (Q3)
5 largest (or maximum) value


Boxplot
a boxplot is a standardized way of displaying the distribution of data
based on a five-number summary (minimum, first quartile (Q1),
median, third quartile (Q3), and maximum)
it can tell you about your outliers and what their values are
it can also tell you if your data is symmetrical, how tightly your data
is grouped, and if and how your data is skewed.
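
A hedged sketch of the numbers behind a boxplot. The quartiles come from `statistics.quantiles` with `method="exclusive"`, which is based on (n + 1) positions; the 1.5 × IQR fences for flagging outliers are a common convention assumed here, not a rule taken from these slides. The value 30 in the second data set is hypothetical and added only to trigger the rule.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return min(values), q1, q2, q3, max(values)


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumed convention."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [1, 3, 3, 6, 7, 8, 9]
print(five_number_summary(data))
print(iqr_outliers(data))         # nothing flagged
print(iqr_outliers(data + [30]))  # the hypothetical extra value is flagged
```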

Detecting Outliers

What do you think about the following questions?


1 What is meant by central tendency of data? What are the various
measures of central tendency and location? Explain them with
examples.

2 What are the characteristics of an ideal measure of central tendency?
In your opinion, what is the ideal measure of central tendency, and
why?

3 What are the appropriate measures of central tendency for nominal and
ordinal data? Which measure of central tendency is applicable at all
levels of measurement? Why?

4 What are the median, quartiles, deciles and percentiles? The following
values represent the results of several measurements on the
concentration of calcium in a stream:
14, 8, 9, 8, 13, 14, 8, 8, 12, 11, 10, 12, 12
(a). Calculate the mean concentration
(b). Calculate the median and the mode

Homework

Read Course Guide, pp.15-16

HW: Read related text in Chapter 3 of the text book

Exercise: 5-10, pp.92-94


Lecture - 5: Measures of Dispersion


5 Lecture - 5: Measures of Dispersion

5.1 Measure of Dispersion

5.2 Absolute Measures

5.3 Relative Measures

5.4 Exercises

5.5 Homework


A Measure of Dispersion

Dispersion
. is the state of being dispersed, i.e., the spread of the data
. is contrasted with location or central tendency, and together they are
the most used properties of distributions
. some measures express the scattering of observations in terms of
distances, e.g., range, quartile deviation
. other measures express the variation in terms of the average of
deviations of observations, e.g., mean deviation and standard deviation
. spread can also be shown in graphs: dot plots, boxplots, and
stem-and-leaf plots are more spread out for samples that have a larger
dispersion, and vice versa


Different Measures of Dispersion

Absolute Measures
1 range: maximum value - minimum value
2 interquartile range (IQR): Q3 − Q1
3 semi-interquartile range: (Q3 − Q1)/2
4 variance (or standard deviation)
5 mean absolute difference (also known as Gini mean absolute
difference)
6 median absolute deviation (MAD)

Relative Measures
1 coefficient of variation
2 skewness
3 kurtosis

Absolute Measures

Practice in Classroom

1 Consider a sample with data values of 10, 20, 12, 17, and Compute
the range and interquartile range.

2 Consider a sample with data values of 10, 20, 12, 17, and Compute
the range, interquartile range and Semi-interquartile range.


Variance
the variance (and standard deviation) is a measure of how widely
values are dispersed from the average value (the mean)

variance is the average of the squared deviations of the values from the
mean ($\mu$)
−− usually, population variance is denoted by $\sigma^2$
−− usually, sample variance is denoted by $s^2$
where

▸ for population

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2 = \frac{1}{N}\left(\sum_{i=1}^{N} X_i^2 - N\mu^2\right)$$

▸ for sample

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$$

advantages & disadvantages?


Standard Deviation
is the square root of the variance; the sample standard deviation is
denoted by s
for sample

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)}$$

the major characteristics of the standard deviation are:
1 it is in the same units as the original data
2 it is the square root of the average squared distance from the mean
3 it cannot be negative
4 it is the most widely reported measure of dispersion
advantages & disadvantages?
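
The definitional and shortcut forms of the sample variance can be compared numerically. The sketch below reuses the five values 4, 36, 45, 50, 75 from the mean examples; that choice of data is made here for illustration only.

```python
from math import isclose, sqrt

values = [4, 36, 45, 50, 75]
n = len(values)
x_bar = sum(values) / n

# Definitional form: average squared deviation with divisor n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in values) / (n - 1)

# Shortcut form: (sum of squares - n * mean^2) / (n - 1).
s2_shortcut = (sum(x * x for x in values) - n * x_bar ** 2) / (n - 1)

s = sqrt(s2_definition)

print(f"s^2 (definition) = {s2_definition}")
print(f"s^2 (shortcut)   = {s2_shortcut}")
print(f"s = {s:.4f}")
```

Both forms give the same value up to rounding; the shortcut form is convenient by hand but can lose precision in floating point when the mean is large relative to the spread.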


Practice in Classroom

1 Consider a sample with data values of 10, 20, 12, 17, and Compute
the range and interquartile range.

2 Consider a sample with data values of 10, 20, 12, 17, and Compute
the variance and standard deviation.

3 Consider a sample with data values of 27, 25, 20, 15, 30, 34, 28, and
25. Compute the range, interquartile range, variance, and standard
deviation.


Mean Absolute Deviation


. the mean of the absolute deviations from the average value (the mean)
▸ for sample

$$\mathrm{MAD}(\bar{x}) = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$

where $\bar{X}$ and $\bar{x}$ are, respectively, the population mean and the sample mean
. this measure does not square the distance from the mean, so it is less
affected by extreme observations than are the variance and standard
deviation
. advantages & disadvantages?
. Use data given in Table 3.2, what is Mean Absolute Deviation?


Median Absolute Deviation


. the median of the absolute deviations from the median
▸ for sample

$$\mathrm{MedAD}(\tilde{x}) = \mathrm{median}\,|x_i - \tilde{x}|$$

where $\tilde{X}$ and $\tilde{x}$ are, respectively, the population median and the sample median
. this measure does not square the distance from the center, and it uses
medians rather than means, so it is less affected by extreme observations
than are the variance and standard deviation
. advantages & disadvantages?
. Use data given in Table 3.2, what is Median Absolute Deviation?
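
Table 3.2 is not reproduced here, so the sketch below illustrates both measures on the five values 4, 36, 45, 50, 75 used in the mean examples. The function names are hypothetical.

```python
from statistics import mean, median


def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    center = mean(values)
    return mean([abs(x - center) for x in values])


def median_absolute_deviation(values):
    """Median absolute distance of the observations from their median."""
    center = median(values)
    return median([abs(x - center) for x in values])


values = [4, 36, 45, 50, 75]
print("mean absolute deviation:  ", mean_absolute_deviation(values))
print("median absolute deviation:", median_absolute_deviation(values))
```

For these values the median-based measure is much smaller than the mean-based one, because the single low value 4 influences a mean more than it influences a median.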


Coefficient of Variation
the coefficient of variation (CV) is a statistical measure of the
dispersion of data points in a data series around the mean
the coefficient of variation represents the ratio of the standard
deviation to the mean, and it is a useful statistic for comparing the
degree of variation from one data series to another, even if the means
are drastically different from one another
▸ for sample

$$cv = \frac{s}{\bar{x}} \times 100\%$$

advantages & disadvantages?
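
A short sketch of the sample coefficient of variation, again on the illustrative values 4, 36, 45, 50, 75. It assumes a non-zero mean; `coefficient_of_variation` is a hypothetical helper.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(values) / mean(values) * 100


values = [4, 36, 45, 50, 75]
print(f"CV = {coefficient_of_variation(values):.2f}%")

# Rescaling the data (for example, changing units) leaves the CV unchanged.
rescaled = [10 * x for x in values]
print(f"CV after rescaling = {coefficient_of_variation(rescaled):.2f}%")
```

Because the CV is unit-free, it can compare the variability of series measured on different scales.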


Exercises

1 What do you mean by dispersion? What are the different measures of
dispersion? Discuss them with a suitable example.

2 What are the relative measures of dispersion? Discuss the importance
of the coefficient of variation.


Homework

Course Guide, pp.17-18

HW: Read related text in Chapter 3 of the text book

Exercise: 16-24, pp.100-102 and

Exercise: 40-41, pp.112-113

Lecture - 6: Review Class

Lecture - 7: Class Test - 1

Lecture - 8: Measures of Shape of the Distribution


8 Lecture - 8: Measures of Shape of the Distribution

8.1 Shape of the Distribution

8.2 Skewness

8.3 Mean for grouped data

8.4 Weighted Mean

8.5 Median for Group Data

8.6 Mode for the Group data

8.7 Standard Deviation for Group Data

8.8 Homework


Measures of Distribution Shape

shape of the distribution

▸ skewness
▸ kurtosis


Skewness
skewness characterizes the degree of asymmetry of a distribution
around its mean

▸ Fisher-Pearson coefficient of skewness

$$\text{skewness: } s_1 = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3$$

▸ Pearson median coefficient of skewness

$$\text{skewness: } s_1 = 3\left(\frac{\bar{x} - \text{median}}{s}\right)$$

positive skewness indicates a distribution with an asymmetric tail
extending toward more positive values ⇒ right skewed distribution

zero indicates a symmetric distribution ⇒ mean = median = mode

negative skewness indicates a distribution with an asymmetric tail
extending toward more negative values ⇒ left skewed distribution
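
Both coefficients can be computed directly from their formulas. The sketch below is an illustration on the values 4, 36, 45, 50, 75 (chosen here, not taken from this slide) and needs at least three observations with non-zero spread; the function names are hypothetical.

```python
from statistics import mean, median, stdev


def fisher_pearson_skewness(values):
    """Adjusted Fisher-Pearson coefficient: n/((n-1)(n-2)) * sum(z^3)."""
    n = len(values)
    x_bar, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)


def pearson_median_skewness(values):
    """Pearson median coefficient: 3 * (mean - median) / s."""
    return 3 * (mean(values) - median(values)) / stdev(values)


values = [4, 36, 45, 50, 75]
print(f"Fisher-Pearson: {fisher_pearson_skewness(values):.3f}")
print(f"Pearson median: {pearson_median_skewness(values):.3f}")
```

Here the mean (42) is below the median (45), so both coefficients come out negative, indicating a tail toward smaller values.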


Kurtosis

kurtosis characterizes the relative peakedness or flatness of a
distribution compared with the normal distribution

the kurtosis for a normal distribution is 3, so its excess kurtosis
(kurtosis − 3) is 0

in other words, kurtosis identifies whether the tails of a given
distribution contain extreme values

▸ most software packages (including Microsoft Excel) use the formula
below for the coefficient of kurtosis (or, excess kurtosis):

\[ s_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} \]

Dr. Wheeler defines kurtosis as:
The kurtosis parameter is a measure of the combined weight of the
tails relative to the rest of the distribution.
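As a rough check of the excess-kurtosis formula, here is a short Python sketch. It is not taken from the slides; the function name and the sample data are illustrative, and s is again assumed to be the sample standard deviation.

```python
import statistics

def excess_kurtosis(data):
    """Sample excess kurtosis, following the formula quoted above."""
    n = len(data)
    if n < 4:
        raise ValueError("at least four observations are required")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    fourth = sum(((x - mean) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

sample = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]  # illustrative data, not from the slides
print(excess_kurtosis(sample))
```

A result near zero suggests tails similar to those of a normal distribution; for small samples the estimate is quite noisy, so it should be read with caution.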


Kurtosis

positive (excess) kurtosis indicates a relatively peaked distribution ⇒
Leptokurtic
the leptokurtic distribution shows heavy tails on either side, indicating
large outliers

zero indicates a normal distribution ⇒ Mesokurtic

negative kurtosis indicates a relatively flat distribution ⇒ Platykurtic
the flat (light) tails indicate small outliers in the distribution

advantages & disadvantages?

Lecture - 8: Measures of Shape of the Distribution Mean for grouped data

Exercises

1. Suppose you have the following data:

423, 369, 387, 411, 393, 394, 371, 377, 389, 409, 392, 408, 431, 401,
363, 391, 405, 382, 400, 381, 399, 415, 428, 422, 396, 372, 410, 419,
386, 390.
Construct a frequency distribution table and hence calculate the mean,
median, and mode.


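for grouped data:

\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{\sum_{i=1}^{k} f_i} \]

where f_i is the frequency of the i-th class and x_i is its midpoint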

example:

▸ the weights (in gms) of 30 articles are given below :


14, 16, 16, 14, 22, 13, 15, 24, 23, 14, 20, 17, 21, 18, 18, 19, 20, 17,
16, 15, 11, 22, 21, 20, 17, 18, 19, 22, 23.
▸ form a grouped frequency table, by dividing the variate range into
intervals of equal width, one class being 11-13 and then compute the
arithmetic mean

Lecture - 8: Measures of Shape of the Distribution Weighted Mean

Weighted Mean

▸ the weighted mean is found by multiplying each observation by its
corresponding weight, summing the products, and dividing by the sum of
the weights

\[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_k x_k}{w_1 + w_2 + \cdots + w_k} \]

Self Exercise: Springers sold 95 Antonelli men's suits for the regular price
of $400. For the spring sale, the suits were reduced to $200 and 126 were
sold. At the final clearance, the price was reduced to $100 and the
remaining 79 suits were sold.

(a). What was the weighted mean price of an Antonelli suit?

(b). Springers paid $200 a suit for the 300 suits. Comment on the store's
profit per suit if a salesperson receives a $25 commission for each one
sold.


Practice problem in the Classroom

Consider the following data and corresponding weights.

xi weight (wi )
3.2 6
2.0 3
2.5 2
5.0 8

(i). Compute the weighted mean.

(ii). Compute the sample mean of the four data values without weighting.
Note the difference in the results provided by the two computations.
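One way to carry out both computations is sketched below in Python. The helper name weighted_mean is illustrative, not from the slides; the data are the four values and weights in the table above.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

x = [3.2, 2.0, 2.5, 5.0]
w = [6, 3, 2, 8]

weighted = weighted_mean(x, w)    # each value counted in proportion to its weight
unweighted = sum(x) / len(x)      # every value counted equally
print(weighted, unweighted)
```

The weighted result is pulled toward the values with the largest weights, which is why it differs from the plain sample mean.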

Lecture - 8: Measures of Shape of the Distribution Median for Group Data

Median for Group Data


Consider the following data.

Class Interval frequency (fi )


4-6 6
6-8 4
8-10 2
10-12 8

Compute the mean and median.
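A possible way to check the hand calculation is sketched below in Python. The mean uses class midpoints as in the grouped-data formula; the median uses the usual interpolation formula L + ((N/2 − cf) / f) · h, which is an assumption here because the corresponding slide did not survive as text. Function names are illustrative.

```python
def grouped_mean(classes):
    """classes: list of (lower, upper, frequency); uses class midpoints."""
    total = sum(f for _, _, f in classes)
    return sum((lo + hi) / 2 * f for lo, hi, f in classes) / total

def grouped_median(classes):
    """Interpolated median: L + ((N/2 - cf) / f) * h within the median class."""
    total = sum(f for _, _, f in classes)
    half = total / 2
    cumulative = 0  # frequency accumulated before the current class
    for lo, hi, f in classes:
        if f > 0 and cumulative + f >= half:
            return lo + (half - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("no observations")

table = [(4, 6, 6), (6, 8, 4), (8, 10, 2), (10, 12, 8)]
print(grouped_mean(table), grouped_median(table))
```

Because the frequencies are largest in the first and last classes, the mean and the median need not fall in a heavily populated class, which is worth commenting on when interpreting the results.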

Lecture - 8: Measures of Shape of the Distribution Mode for the Group data

Mode for the Group data


Practice problem in the Classroom

1 Consider the following data.

Class Interval frequency (fi )


4-6 6
6-8 4
8-10 2
10-12 8

Compute the mode and comment on your results.

Lecture - 8: Measures of Shape of the Distribution Standard Deviation for Group Data


Exercises: Ross(2020)

Lecture - 8: Measures of Shape of the Distribution Homework

Homework

Course Guide, pp.20-28

HW: Read related text in Chapter 3 of the text book

Exercise: 54-57, pp.128-129 and

Case problems 1, 2, 3, 4, pp.137-141


Chapter - 4: Introduction to
Probability

Lecture - 9: Introduction to probability


9 Lecture - 9: Introduction to probability

9.1 Sample Space and Events

9.2 Events

9.3 Probabilities Defined on Events

9.4 Counting Rules, Combinations, and Permutations

9.5 Approaches Assigning Probabilities

9.6 Homework


Chapter 3: Introduction to
probability


Chance or Probability Theory

What is the chance of getting grade-A for the course MAT-211?

What are the chances that sales will decrease if we increase prices?

What is the likelihood a new assembly method will increase
productivity?

How likely is it that the project will be finished on time?

What is the chance that a new investment will be profitable?

Lecture - 9: Introduction to probability Sample Space and Events

Sample Space and Events

Sample Space
an experiment is any action whose outcome is subject to uncertainty
(randomness), i.e., the outcome is not predictable in advance

let us suppose that the set of all possible outcomes of an experiment
is known

the sample space of an experiment is the set of all possible outcomes
of the experiment and is denoted by S or Ω
Events
an event is a subset of the sample space
for example, if S = {1, 3, 4} and A = {4, 1}, then A is a subset of the
sample space S and hence A is an event

a random experiment is a process that produces exactly one of the
possible outcomes in S each time it is performed


Examples: Sample Space and Events


Other Examples: Sample Space


Practice Problem in the Classroom

1. What is the sample space when a coin is tossed three times?

2. What is the sample space for counting the number of females in a
group of n people?

3. A car repair is performed either on time or late and either satisfactorily
or unsatisfactorily. What is the sample space for a car repair?

Lecture - 9: Introduction to probability Events

Events

Complements of Events:
the complement of an event A is the event consisting of everything in
the sample space S that is not contained within the event A

the notation A′ (also written A^c) is used for the complement of A
for example, a six-sided die has the sample space S = {1, 2, 3, 4, 5, 6};
if event A = {1, 3, 6}, then the complement of A is A′ = {2, 4, 5}

Lecture - 9: Introduction to probability Probabilities Defined on Events

Probabilities Defined on Events


probability is a numerical measure of the likelihood that an event will
occur
the probability of an event is a number between 0 and 1, where,
roughly speaking, 0 indicates impossibility of the event and 1
indicates certainty

given a sample space S, probability, denoted by P, is a function that
satisfies the following three conditions:
(i). 0 ≤ P(E) ≤ 1 for every event E,
(ii). P(S) = 1,
(iii). for mutually exclusive events E1 and E2, P(E1 ∪ E2) = P(E1) + P(E2)

Practice Problem in the Classroom

1. An experiment has five outcomes, I, II, III, IV, and V. If P(I) = 0.13,
P(II) = 0.24, P(III) = 0.07, and P(IV) = 0.38, what is P(V)?

2. An experiment has five outcomes, I, II, III, IV, and V. If P(I) = 0.08,
P(II) = 0.20, and P(III) = 0.33, what are the possible values for the
probability of outcome V? If outcomes IV and V are equally likely,
what are their probability values?

3. An experiment has three outcomes, I, II, and III. If outcome I is twice
as likely as outcome II, and outcome II is three times as likely as
outcome III, what are the probability values of the three outcomes?

Lecture - 9: Introduction to probability Counting Rules, Combinations, and Permutations

If an experiment can be described as a sequence of k steps with n1
possible outcomes on the first step, n2 possible outcomes on the
second step, and so on, then the total number of experimental
outcomes is given by (n1)(n2)⋯(nk)
Example: The experiment of tossing two coins can be thought of as a
two-step experiment in which step 1 is the tossing of the first coin
and step 2 is the tossing of the second coin. If we use H to denote a
head and T to denote a tail, (H, H) indicates the experimental
outcome with a head on the first coin and a head on the second coin.
The sample space is

S = {(H, H), (H, T), (T, H), (T, T)}.

Viewing the experiment of tossing two coins as a sequence of first
tossing one coin (n1 = 2) and then tossing the other coin (n2 = 2), we
can see from the counting rule that (2)(2) = 4 distinct experimental
outcomes are possible. The number of experimental outcomes in an
experiment involving tossing six coins is (2)(2)(2)(2)(2)(2) = 64.
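The counting rule for multiple-step experiments can be illustrated with a few lines of Python. This is only a sketch; the helper name count_outcomes is illustrative and not from the slides.

```python
from itertools import product
from math import prod

def count_outcomes(step_sizes):
    """Counting rule: (n1)(n2)...(nk) outcomes for a k-step experiment."""
    return prod(step_sizes)

# Enumerate the sample space for tossing two coins.
two_coin_space = list(product("HT", repeat=2))
print(two_coin_space)

print(count_outcomes([2, 2]))    # two coins
print(count_outcomes([2] * 6))   # six coins
```

Enumerating with itertools.product and multiplying the step sizes should always agree, which gives a quick sanity check on a hand-drawn tree diagram.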

Tree Diagram
A tree diagram is a graphical representation that helps in visualizing a
multiple-step experiment.


Counting rule for combination


A second useful counting rule allows one to count the number of
experimental outcomes when the experiment involves selecting n objects
from a (usually larger) set of N objects.

Consider a quality control procedure in which an inspector randomly
selects 2 of 5 parts to test for defects. In a group of five parts, how many
combinations of two parts can be selected?
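The quality-control question can be answered by listing the combinations directly. The sketch below assumes the standard formula N! / (n!(N − n)!) for the number of combinations, which the slide presumably showed as an image; the part labels A to E are hypothetical.

```python
from itertools import combinations
from math import comb

parts = ["A", "B", "C", "D", "E"]  # hypothetical labels for the five parts

# Every unordered selection of 2 parts out of 5.
pairs = list(combinations(parts, 2))
for pair in pairs:
    print(pair)

# The count from the formula N! / (n! (N - n)!) matches the enumeration.
print(len(pairs), comb(len(parts), 2))
```

Order does not matter here: ("A", "B") and ("B", "A") are the same sample of two parts, so only one of them is listed.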

Permutations
A third counting rule is the rule for permutations. It allows one to compute the
number of experimental outcomes when n objects are to be selected from
a set of N objects where the order of selection is important.
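To see how ordering changes the count, the same five hypothetical parts can be used again. This sketch assumes the standard permutation count N! / (N − n)!; the labels are illustrative.

```python
from itertools import permutations
from math import perm

parts = ["A", "B", "C", "D", "E"]  # hypothetical labels for five parts

# Every ordered selection of 2 parts out of 5.
ordered_pairs = list(permutations(parts, 2))

# The count from N! / (N - n)! matches the enumeration.
print(len(ordered_pairs), perm(len(parts), 2))
```

Each unordered pair now appears twice, once in each order, so the number of permutations is larger than the number of combinations for the same N and n.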

Lecture - 9: Introduction to probability Approaches Assigning Probabilities

Approaches Assigning Probabilities

there are three ways of assigning probability:

(i). classical,

(ii). empirical (Relative frequency method), and

(iii). subjective

the classical and empirical methods are objective and are based on
information and data

the subjective method is based on a person's belief or estimate of an
event's likelihood
▸ it contains no formal calculations and only reflects the subject's
opinions and past experience


Classical Probability
the classical probability is based on the assumption that the outcomes
of an experiment are equally likely

using the classical viewpoint, the probability of an event happening is
computed by dividing the number of favorable outcomes by the
number of possible outcomes:

\[ P(A) = \frac{\text{number of favorable outcomes}}{\text{number of possible outcomes}} \]
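The classical rule translates directly into a small function. The sketch below is illustrative only: the function name and the die example are assumptions, and the result is meaningful only when all outcomes are equally likely.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Favorable outcomes divided by possible outcomes (equally likely case)."""
    sample_space = set(sample_space)
    favorable = set(event) & sample_space  # ignore anything outside S
    return Fraction(len(favorable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}   # a fair six-sided die
even = {2, 4, 6}
print(classical_probability(even, die))
```

Using Fraction keeps the answer exact, which makes it easier to compare with a hand calculation.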


Complementary Event


Example of Events


Intersections of Events
the intersection of two events contains the outcomes common to both events

Suppose A = {1, 3, 4} and B = {4}. What is the intersection of events
A and B?
A ∩ B = {4}
another example:


Mutually Exclusive Events


events are mutually exclusive if the occurrence of any one event
means that none of the others can occur at the same time

two events A and B are said to be mutually exclusive if they have no
outcomes in common

for example, if A = {1, 3, 4} and B = {5}, then A and B have no
common outcome, so they are mutually exclusive events

Unions of Events
the union of events A and B consists of the outcomes that are contained
within at least one of the events A and B
for example, suppose A = {1, 3, 4} and B = {5}; then A ∪ B = {1, 3, 4, 5}
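Events are sets, so the operations in this lecture map onto Python's built-in set type. The sketch below reuses the small examples from the slides; the variable names are illustrative.

```python
S = {1, 2, 3, 4, 5, 6}      # sample space of a six-sided die
A = {1, 3, 4}
B = {4}
C = {5}

intersection = A & B         # outcomes common to A and B
union = A | C                # outcomes in at least one of A and C
exclusive = A.isdisjoint(C)  # True when A and C share no outcome
complement = S - {1, 3, 6}   # everything in S outside the event {1, 3, 6}

print(intersection, union, exclusive, complement)
```

Checking isdisjoint before applying the simple addition rule is a cheap way to avoid double-counting shared outcomes.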


Unions of Events

Lecture - 9: Introduction to probability Homework

Homework

Read the related topics of Chapter 4 of the text book

Exercise: 1-9, pp.158-159

Exercise: 14-21, pp.162-164

Lecture - 10: Basic rules for computing probability


10 Lecture - 10: Basic rules for computing probability

10.1 Joint and Marginal Probabilities

10.2 Conditional Probability

10.3 Homework


Rule of addition for computing probabilities


▸ given a sample space S, probability, denoted by P or Pr, is a function
that satisfies the following three conditions:
(i). 0 ≤ Pr(E) ≤ 1 for every event E,
(ii). Pr(S) = 1,
(iii). for mutually exclusive events E1, E2 ⊆ S,
Pr(E1 ∪ E2) = Pr(E1) + Pr(E2)

this is the addition rule for two mutually exclusive events

(iv). the addition rule for events that are not mutually exclusive is:
Pr(E1 ∪ E2) = Pr(E1) + Pr(E2) − Pr(E1 ∩ E2)

▸ the probability of both events occurring is denoted Pr(E1 ∩ E2); if the
two events are mutually exclusive, then

Pr(E1 and E2) = Pr(E1 ∩ E2) = 0


1. Example: A single 6-sided die is rolled. What is the probability of
rolling a 2 or a 5?

▸ Pr(2) = 1/6 and Pr(5) = 1/6
▸ therefore,

\[ \Pr(2 \text{ or } 5) = \Pr(2 \cup 5) = \Pr(2) + \Pr(5) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3} \]


2. Example: In a Math class of 30 students, 17 are boys and 13 are girls.
On a unit test, 4 boys and 5 girls made an A grade. If a student is
chosen at random from the class, what is the probability of choosing
a girl or an A-grade student?

▸ Pr(girl) = 13/30, Pr(A-grade student) = 9/30, and
Pr(girl ∩ A-grade student) = 5/30
▸ therefore,

\[ \Pr(\text{girl or A-grade student}) = \Pr(\text{girl}) + \Pr(\text{A-grade student}) - \Pr(\text{girl} \cap \text{A-grade student}) = \frac{13}{30} + \frac{9}{30} - \frac{5}{30} = \frac{17}{30} \]
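Both worked examples follow the same addition rule, which can be written as one small function. This is a sketch with an illustrative function name; passing no joint probability corresponds to the mutually exclusive case.

```python
from fractions import Fraction

def prob_union(p_a, p_b, p_both=Fraction(0)):
    """Addition rule: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)."""
    return p_a + p_b - p_both

# Example 1: rolling a 2 or a 5 (mutually exclusive, so the joint term is 0).
die_result = prob_union(Fraction(1, 6), Fraction(1, 6))

# Example 2: choosing a girl or an A-grade student (5 students are both).
class_result = prob_union(Fraction(13, 30), Fraction(9, 30), Fraction(5, 30))

print(die_result, class_result)
```

Subtracting the joint probability removes the five girls with an A grade who would otherwise be counted twice.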

Lecture - 10: Basic rules for computing probability Joint and Marginal Probabilities

Joint and Marginal Probability

joint probability: the probability of two events both occurring; that
is, the probability of the intersection of two events

marginal probability: the values in the margins of a joint probability
table that provide the probabilities of each event separately

example:


Another Example

Find the joint and marginal probabilities table.

Lecture - 10: Basic rules for computing probability Conditional Probability

Independent Events and Conditional Probability


independent events
if two events A and B are independent, then the joint probability is

P(A and B) = P(A ∩ B) = P(A)P(B)

▸ for example, if two fair coins are flipped, then the chance of both
being heads is 1/2 × 1/2 = 1/4
conditional probability
▸ is the probability of a particular event occurring, given that another
event has occurred, i.e., for P(B) > 0 the conditional probability is

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]

when events A and B are independent, the occurrence of one event does
not affect the occurrence of the other, so

P(A∣B) = P(A)
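The two-coin example can be used to check both the multiplication rule for independent events and the definition of conditional probability. The following Python sketch enumerates the equally likely outcomes; the helper names prob and cond_prob are illustrative.

```python
from fractions import Fraction
from itertools import product

space = list(product("HT", repeat=2))  # two fair coins, 4 equally likely outcomes

def prob(event):
    """Classical probability of the outcomes for which event(outcome) is true."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), assuming P(B) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

def first_head(o):
    return o[0] == "H"

def second_head(o):
    return o[1] == "H"

p_both = prob(lambda o: first_head(o) and second_head(o))
print(p_both, prob(first_head) * prob(second_head))
print(cond_prob(first_head, second_head), prob(first_head))
```

The two printed pairs agree, which is what independence means here: knowing the second coin shows a head does not change the probability for the first coin.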

Continuous Assessment:
1. What is the conditional probability P(A∣B) when two events A and B are
mutually exclusive?
2. What is the conditional probability P(A∣B) when an event B is contained
within an event A, that is, B ⊂ A?
Back to the Previous Example

What is the conditional probability of event A occurring, given B?



Back to the Previous Example

(a). What is the probability that an officer is promoted if he is a man?

(b). What is the probability that an officer is promoted if she is a woman?


Solution of (a).

Continuous Assessment:

(b). What is the probability that an officer is promoted if she is a woman?


Example: Games of Chance

Continuous Assessment:

(a). If somebody rolls a fair die without showing you but announces that
the result is even, then what is the probability of scoring a 6?

(b). A red die and a blue die are thrown. What is the probability that at
least one 6 is obtained on the two dice?

(c). Suppose that somebody rolls the two dice without showing you but
announces that at least one 6 has been scored. What is the
probability that the red die scored a 6?

(d). A red die and a blue die are thrown. What is the probability that
the red die scores a 6 given that exactly one 6 has been scored?


Example

Suppose that somebody secretly rolls two fair six-sided dice, and we
wish to compute the probability that the face-up value of the first one
is 2, given the information that their sum is no greater than 5.

▸ Let D1 be the value rolled on die 1.
▸ Let D2 be the value rolled on die 2.
What is the probability that D1 = 2 given that D1 + D2 ≤ 5?
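Since all 36 outcomes of two fair dice are equally likely, the conditional probability can be found by restricting attention to the outcomes that satisfy the condition. The sketch below does this by enumeration; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all 36 (D1, D2) pairs

# Restrict the sample space to the outcomes consistent with the information.
given = [(d1, d2) for d1, d2 in rolls if d1 + d2 <= 5]

# Within that restricted space, keep the outcomes where the first die shows 2.
favorable = [(d1, d2) for d1, d2 in given if d1 == 2]

answer = Fraction(len(favorable), len(given))
print(len(favorable), len(given), answer)
```

Enumeration gives 3 favorable outcomes among the 10 with sum at most 5, which agrees with P(D1 = 2 and D1 + D2 ≤ 5) / P(D1 + D2 ≤ 5) = (3/36) / (10/36).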

Lecture - 10: Basic rules for computing probability Homework

Homework

Read the related topics of Chapter 4 of the text book

Course Guide, pp.34-36

Exercise: 22-27, pp.169-170

Exercise: 32-35, pp.176-177

Lecture - 11: Review Class


11 Lecture - 11: Review Class

Lecture - 12: Mid-term test


12 Lecture - 12: Mid-term test


Chapter - 5 & 6: Discrete and
Continuous Probability
Distribution

Lecture 13: Normal Distribution


13 Lecture 13: Normal Distribution

13.1 Random Variable

13.2 Normal Probability Distribution

Lecture 13: Normal Distribution Random Variable

Random Variable

a random variable is a numerical description of the outcome of a
random experiment


Concept of a Random Variable

consider a random experiment in which three electronic components are
tested; the sample space, giving a detailed description of each possible
outcome (N = non-defective, D = defective), together with the number x of
defective components, may be written

Outcome   NNN  NDN  NND  DNN  NDD  DND  DDN  DDD
x          0    1    1    1    2    2    2    3

random variable

▸ is a variable (usually denoted by X, Y, Z, etc., with its values
denoted by x, y, z, etc.) whose value is determined by the outcome of a
random experiment
▸ is a function that associates a real number with each element in the
sample space
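The idea of a random variable as a function on the sample space can be made concrete with a dictionary. This sketch assumes that D marks a defective component and N a non-defective one, and that x counts the defectives, as the table suggests.

```python
from itertools import product

# All outcomes of testing three components, e.g. "NDN".
outcomes = ["".join(o) for o in product("ND", repeat=3)]

# The random variable X maps each outcome to its number of defectives.
X = {outcome: outcome.count("D") for outcome in outcomes}

for outcome, x in X.items():
    print(outcome, x)
```

Several outcomes map to the same value of X, which is why the probability distribution of X is usually simpler than the sample space itself.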


Example

Consider the sales of automobiles at DiCarlo Motors in Saratoga, New
York. Over the past 300 days of operation, sales data show 54 days
with no automobiles sold, 117 days with 1 automobile sold, 72 days
with 2 automobiles sold, 42 days with 3 automobiles sold, 12 days
with 4 automobiles sold, and 3 days with 5 automobiles sold. Suppose
we consider the experiment of selecting a day of operation at DiCarlo
Motors and define the random variable of interest as x = the number
of automobiles sold during a day.

From historical data, we know x is a discrete random variable that
can assume the values 0, 1, 2, 3, 4, or 5.


Graphical presentation of Probability Distribution


Expected Value of a Random Variable


The expected value, or mean, of a random variable is a measure of
the central location for the random variable. For a discrete random
variable with probability function f,

\[ \mu = E(X) = \sum_{i=1}^{n} x_i f(x_i) \]


Variance of a Random Variable

The variance of a random variable is a measure of the variation for
the random variable. It is denoted by σ² and defined by

\[ \sigma^2 = E(X - \mu)^2 = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i) \]
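The expected value and variance can be computed for the DiCarlo Motors example from the day counts quoted earlier. In this sketch the probabilities f(x) are taken to be the relative frequencies over the 300 days (the empirical approach), which is an assumption about how the slides assign them; the variable names are illustrative.

```python
from fractions import Fraction

# Number of days (out of 300) on which x automobiles were sold.
days = {0: 54, 1: 117, 2: 72, 3: 42, 4: 12, 5: 3}
total_days = sum(days.values())

# Relative frequencies used as the probability function f(x).
f = {x: Fraction(count, total_days) for x, count in days.items()}

mu = sum(x * p for x, p in f.items())                    # expected value
variance = sum((x - mu) ** 2 * p for x, p in f.items())  # variance

print(float(mu), float(variance))
```

The standard deviation, if needed, is the square root of the variance; it is expressed in the same units as x (automobiles per day).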


Practice in Classroom

Lecture 13: Normal Distribution Normal Probability Distribution

The most important probability distribution for describing a
continuous random variable is the normal probability distribution.

Characteristics of the normal distribution

Lecture 14: Standard Normal Distribution

14 Lecture 14: Standard Normal Distribution

14.1 Standard Normal Probability Distribution

14.2 Homework

Lecture 14: Standard Normal Distribution Standard Normal Probability Distribution

Standard Normal Probability Distribution

We start by showing how to compute the probability that z is less than or
equal to 1.00; that is, P(z ≤ 1.00). This cumulative probability is the area
under the normal curve to the left of z = 1.00 in the following graph.
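As a cross-check on the table lookup, the cumulative probability P(z ≤ 1.00) can also be computed from the error function. This is a minimal sketch using only Python's standard library; the function name `standard_normal_cdf` is an assumption made for this example.

```python
import math


def standard_normal_cdf(z):
    """Cumulative probability P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p = standard_normal_cdf(1.00)
print(f"P(z <= 1.00) = {p:.4f}")
```

The result should agree with the standard normal table value to four decimal places. Probabilities such as P(z > 1.00) follow as `1 - standard_normal_cdf(1.00)`.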

Practice in Classroom

Practice in Classroom

Practice in Classroom

Practice in Classroom

Lecture 14: Standard Normal Distribution Homework

Homework

Read the related topics of Chapters 5-6 of the textbook

Course Guide, pp. 38-40

Text Exercise: 10-25, pp. 248-250

Lecture 15: Class Test - 2


15 Lecture 15: Class Test - 2

Chapter - 7: Sampling and Sampling Distributions

Lecture 16: Random Sampling Methods

16 Lecture 16: Random Sampling Methods

16.1 Population & Sample

16.2 Census & Survey

16.3 Sampling Methods

16.4 Simple Random Sampling

16.5 How to draw a simple random sample?

Lecture 16: Random Sampling Methods Population & Sample

Population and Sample (from the first lecture)

Target Population (usually, we refer to it as a population in statistics)
▸ is the aggregate of all possible values of a variable or all possible
objects whose characteristics are of interest in any particular research
Sample
▸ is a part of the target population

Lecture 16: Random Sampling Methods Census & Survey

Random Sample
▸ is a representative part of the target population which has been drawn
randomly
Census
▸ method to collect data on the entire population
▸ As per the 2022 census, Bangladesh has a population of 165,158,616
people, of which 81,712,824 are male and 83,347,206 are female. As
many as 113,063,587 of them live in rural areas and 52,009,072 live in
urban areas.
▸ Bangladesh Population Census: 1974, 1981, 1991, 2001, 2011, 2022
▸ advantages of a census: accurate and complete information
▸ according to the UN, a census is characterized by the following features:
(i) individual enumeration, (ii) universality within a defined territory,
(iii) simultaneity and (iv) defined periodicity.
Sample survey
▸ method for collecting data from a random sample

Disadvantages of a Census

▸ huge cost and time consuming
▸ data may become out-of-date once it is collected
▸ sometimes it is not possible to carry out a census because the items
are unidentifiable
Ex: fish in the sea
Census vs. Survey
Census
▸ a census gathers information from every entity in a population
▸ data is accurately representative of the whole population and detailed
data can be made available right down to small areas
▸ the advantages of a census include accuracy and detail
▸ censuses are also expensive and time consuming
Survey
▸ in a survey, however, only part of the total population is selected
▸ since surveys do not represent the entire population, they are not
quite as accurate or reliable
▸ there is comparatively less data to process
▸ a survey is comparatively less expensive and less time consuming

Sampling or Sampling method
▸ the act, process, or technique of selecting a representative part of a
population for the purpose of determining parameters or characteristics
of the whole population
▸ a small part selected as a sample for inspection or analysis
Advantages of Sampling
▸ saving of time
▸ reduced cost
▸ detailed study
▸ accuracy of result
▸ administrative convenience
▸ impossibility of the use of census method, etc.
Disadvantages/Limitations of Sampling
▸ chances of bias
▸ difficulties in selecting a truly representative sample
▸ inadequate knowledge of the subject
▸ changeability of sampling units
▸ impossibility of sampling, etc.

Sampling Error
▸ the error which arises entirely due to sampling, and which cannot be
attributed to any other cause, is called sampling error
▸ is the difference between a sample statistic and its corresponding
population parameter
Non-sampling Error
▸ non-sampling errors can be attributed to sources other than sampling,
and they may be random or non-random
Non-response Error
▸ non-response error arises when some of the respondents included in the
sample do not respond
Response Error
▸ response error arises when respondents give inaccurate answers or their
answers are misreported
Other Sampling Errors: (i) sampling frame error (ii) data analysis error
(iii) respondent selection error (iv) questioning error (v) recording
error, etc.

Sampling Units
▸ sampling units are the members of the population from which
measurements are taken during sampling
▸ examples
∎ all rickshaw pullers of the J.U campus
∎ all under-five children of J.U
∎ all sex workers of a brothel
Sampling Frame
▸ a sampling frame is a representation of the elements of the target
population
▸ it consists of a list or a set of directions for identifying the target
population

Advantages of sampling over complete count

▸ saving of time
▸ reduced cost
▸ detailed study
▸ accuracy of result
▸ administrative convenience
▸ impossibility of the use of census method

Sampling with and without replacement

▸ if the specified selection procedure permits the unit selected at each
draw to go back into the population, so that it can be selected again,
it is called sampling with replacement
▸ if the unit selected at each draw is not permitted to go back into the
population, and so cannot be selected more than once, it is
termed sampling without replacement; this is used in most cases
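The difference between the two selection procedures can be seen in a short simulation. This is a sketch under assumed inputs: the population of ten numbered units and the seed value are hypothetical and chosen only to make the run repeatable.

```python
import random

rng = random.Random(211)          # fixed seed (arbitrary) so the run is repeatable
population = list(range(1, 11))   # hypothetical units numbered 1..10

# With replacement: a drawn unit goes back into the population,
# so the same unit may appear more than once.
with_replacement = [rng.choice(population) for _ in range(5)]

# Without replacement: a drawn unit does not go back,
# so all selected units are distinct.
without_replacement = rng.sample(population, 5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Repeats can only ever occur in the first list; the second list always contains five different units.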

Lecture 16: Random Sampling Methods Sampling Methods

Sampling Methods

Sampling techniques may be broadly classified as

1 non-probability sampling
(i). convenience sampling
(ii). judgement sampling
(iii). quota sampling
(iv). snowball sampling or networking sampling
2 probability sampling
(i). simple random sampling
(ii). stratified random sampling
(iii). systematic random sampling
(iv). cluster sampling
∎ first-stage cluster sampling
∎ second-stage cluster sampling
∎ multi-stage cluster sampling

Collection of Data

Sampling methods
1 simple random sampling
⇒ every sampling unit has the same (equal) chance of being selected

2 stratified random sampling
⇒ sampling units are selected from each stratum

3 systematic random sampling
⇒ the first sampling unit is selected at random, then every k-th member
of the population is selected

4 cluster sampling
⇒ selection of all (or some) sampling units from the selected cluster(s)
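The systematic and stratified schemes above can be sketched in a few lines. The frame of 50 numbered units, the two strata, the sample sizes and the helper names below are all assumptions made for illustration; the stratified helper draws an equal number from each stratum, which is only one of several possible allocation rules.

```python
import random


def systematic_sample(frame, n, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random starting index in 0 .. k-1
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of the same size from every stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


rng = random.Random(16)                      # arbitrary seed for repeatability
frame = list(range(1, 51))                   # hypothetical frame: units 1..50
sys_sample = systematic_sample(frame, 10, rng)

strata = {"A": list(range(1, 21)), "B": list(range(21, 51))}  # hypothetical strata
strat_sample = stratified_sample(strata, 3, rng)

print(sys_sample)
print(strat_sample)
```

With 50 units and n = 10 the interval is k = 5, so consecutive selected units in the systematic sample are always 5 apart.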

Lecture 16: Random Sampling Methods Simple Random Sampling

What is meant by Simple Random Sample?

a simple random sample is a subset of a statistical population in which
each member of the population has an equal probability of being chosen

it is the basic sampling technique where we select a group of subjects
(a sample) for study from a larger group (a population)

each individual is chosen entirely by chance and each member of the
population has an equal chance of being included in the sample

Advantages of Simple Random Sampling

▸ it is very simple: the researcher does not need to decide
whether a particular unit is representative or not
▸ simple random sampling provides the foundation for much of statistical
theory
▸ it is free from bias and therefore not affected by the choice of the
researcher
▸ it is generally more representative because each unit has an equal chance
of being selected
▸ simple random sampling provides a basis to which other methods can
be compared
▸ assessment of sampling error can be made and it is possible to
calculate the limit of error due to sampling

Disadvantages of Simple Random Sampling

SRS suffers from the following disadvantages:

▸ it is very difficult to have a completely catalogued universe, and thus
selection on a strictly random basis is frequently not possible
▸ cases studied may be too widely dispersed or even impossible to
contact, and thus adherence to the whole sample may not be possible
▸ if units are of different sizes and the population consists of many
heterogeneous units, simple random sampling is unsuitable
▸ sampled individuals may be so widely dispersed that visiting each
selected individual may be extremely expensive and time consuming
▸ when the population measurements vary considerably in size,
simple random sampling produces larger variances than other methods
of sampling

Simple Random Sample vs. Random Sample


▸ the difference between the two is that with a simple random sample,
each object in the population has an equal chance of being chosen,
whereas with a random sample, each object does not necessarily have
an equal chance of being chosen

Lecture 16: Random Sampling Methods How to draw a simple random sample?

How to draw a sample by simple random sampling?

lottery method

table of random numbers

computer random number generator

Steps for obtaining a simple random sample

Step 1: define a target population

Step 2: make a sampling frame and assign sequential numbers 1, 2, 3, ...
to the units in the sampling frame

Step 3: figure out what your sample size (n) is going to be

Step 4: use a random number generator or a random number table to select
the sample from your sampling frame. With a random number table:
(i). select at random any page of the random number table and pick up
the numbers in any row, column or diagonal at random
(ii). the population units corresponding to the numbers in step (i)
constitute the random sample
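The steps above can be sketched with a computer random number generator in place of a printed table. The frame size of 50, the sample size of 10, the seed and the function name are assumptions chosen for this example.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Select n distinct units from the frame, each with equal chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Step 2: a hypothetical sampling frame with sequential numbers 1..50
frame = list(range(1, 51))

# Step 3: the chosen sample size
n = 10

# Step 4: let the generator pick the unit numbers (seed fixed for repeatability)
sample = simple_random_sample(frame, n, seed=2023)
print(sorted(sample))
```

Fixing the seed makes the selection reproducible for checking; leaving `seed=None` gives a fresh draw on each run.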

Then, in the given random number table, start with the first
number and move row-wise (or column-wise or diagonal-wise),
picking out the numbers in pairs, one by one, and ignoring those numbers
which are greater than 50, until a selection of 10 numbers is made.

Selected row-wise sample numbers: 27, 15, 45, 11, 02, 14, 18, 07, 39, 31

Selected row-wise monthly pocket money (TK/-) of 10 students out
of 50: 7100, 2400, 3700, 7500, 1500, 2000, 6500, 3000, 1700, 7600.

Homework: Calculate the mean and standard deviation of the 10 students'
monthly pocket money (use the formula and a scientific calculator)
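A hand calculation can be checked with a short script. The sketch below uses the ten pocket-money values listed above and assumes the sample standard deviation (divisor n − 1) is the one intended; if the population formula (divisor n) is wanted, the divisor should be changed accordingly.

```python
pocket_money = [7100, 2400, 3700, 7500, 1500, 2000, 6500, 3000, 1700, 7600]

n = len(pocket_money)
mean = sum(pocket_money) / n

# Sample variance: sum of squared deviations divided by (n - 1)
sample_variance = sum((x - mean) ** 2 for x in pocket_money) / (n - 1)
sample_sd = sample_variance ** 0.5

print(f"n = {n}, mean = {mean:.2f}, sample SD = {sample_sd:.2f}")
```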

Homework

HW: Read Chapter 7 of the text book

Exercises: 3-8, pp.272-273

Chapter - 8: Interval estimation

Lecture 17: Interval estimation

17 Lecture 17: Interval estimation

17.1 Interval Estimation

17.2 Confidence interval for µ based on large samples

17.3 Confidence interval for µ based on small samples

17.4 Confidence interval for variance and standard deviation

Lecture 17: Interval estimation Interval Estimation

Estimation

There are two types of estimation

Point estimation
It is the single best value. For example, the mean and SD of total marks
for a course of IUB students are point estimates because each is a
single value.

Interval estimation
Interval estimation is the use of sample data to calculate an interval
of possible (or probable) values of an unknown population parameter,
in contrast to point estimation, which gives a single number.

What is meant by the confidence interval?

A confidence interval is an interval estimate with a specific level of
confidence. The level of confidence is the probability that the interval
estimate will contain the parameter. It is equal to 1 − α; that is, an
area of 1 − α lies within the confidence interval.

Lecture 17: Interval estimation Confidence interval for µ based on large samples

Lecture 17: Interval estimation Confidence interval for µ based on small samples

Confidence interval for µ based on small samples

Confidence interval for µ based on small sample size

When the sample size is less than 30, i.e., n < 30, and the population
standard deviation σ is unknown, the standardized sample mean follows
Student's t distribution with n − 1 degrees of freedom. The Student's t
distribution is very similar to the standard normal distribution:

▸ it is symmetric about its mean
▸ as the sample size increases, the t-distribution approaches the normal
distribution
▸ it is a bell-shaped distribution

(1 − α)100% confidence interval for µ

$$\bar{x} - t_{\alpha/2,\,(n-1)} \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,(n-1)} \frac{s}{\sqrt{n}}$$

This formula is used when n < 30 and the population standard deviation σ is
unknown.

Problem 1

Suppose we are given the sample heights of 20 IUB students, where
x̄ = 67.3” and the standard deviation is 3.6”, and the distribution is symmetric.
Develop a 95% confidence interval for the population mean and make a
summary based on your findings.

Solution:
Let X be the height of a student. It is assumed that X ∼ N(µ, σ²) with σ
unknown; the sample gives s = 3.6” and n = 20.
95% confidence interval for µ:

$$\bar{x} - t_{\alpha/2,\,(n-1)} \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,(n-1)} \frac{s}{\sqrt{n}}$$

Here, x̄ = 67.3”, s = 3.6”, n = 20, α = 1 − 0.95 = 0.05, α/2 = 0.025 and
$t_{\alpha/2,\,(n-1)} = t_{0.025,\,19} = 2.093$. Thus

$$67.3 - 2.093 \times \frac{3.6}{\sqrt{20}} \le \mu \le 67.3 + 2.093 \times \frac{3.6}{\sqrt{20}} \;\Rightarrow\; 65.62 \le \mu \le 68.98$$

Summary: Based on our findings, we are 95% confident that the population
mean height lies between 65.62” and 68.98”.
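The arithmetic in Problem 1 can be reproduced with a few lines of code. This sketch takes the critical value 2.093 for 19 degrees of freedom directly from the t table, as in the solution, rather than computing it; the variable names are chosen for this example only.

```python
import math

x_bar = 67.3      # sample mean height (inches)
s = 3.6           # sample standard deviation (inches)
n = 20            # sample size
t_crit = 2.093    # t value for alpha/2 = 0.025 with n - 1 = 19 df, read from the t table

margin = t_crit * s / math.sqrt(n)   # margin of error
lower = x_bar - margin
upper = x_bar + margin

print(f"margin of error = {margin:.3f}")
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

Changing `t_crit` to the table value for another confidence level (with the same 19 degrees of freedom) gives the corresponding interval.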
Home Work

Lecture 17: Interval estimation Confidence interval for variance and standard deviation

Confidence Interval for σ²

Lecture 18: Interval Estimations about Two Population Means

18 Lecture 18: Interval Estimations about Two Population Means

18.1 Interval Estimation of µ1 − µ2


18.2 Determining the Sample Size

Lecture 18: Interval Estimations about Two Population Means Interval Estimation of µ1 − µ2

Exercises

Lecture 18: Interval Estimations about Two Population Means Determining the Sample Size

Chapter - 9: Hypothesis Tests

Lectures 19: Tests of hypothesis

19 Lectures 19: Tests of hypothesis

19.1 Problems & Motivation

19.2 The Null and Alternative Hypotheses

19.3 Key Components of Test of Hypothesis

19.4 The p-value


19.5 Basic Steps for Test of Hypothesis

19.6 Testing on a Single Mean

Tests of hypothesis

In general, we do not know the true value of population parameters
(mean, proportion, variance, SD and others). They must be
estimated based on random samples. However, we do have
hypotheses about what the true values are.

The major purpose of hypothesis testing is to choose between two
competing hypotheses about the value of a population parameter.

Lectures 19: Tests of hypothesis Problems & Motivation

Problems & Motivation

1 From long-term experience, a factory owner knows that a worker can
produce a product in an average time of 89 min. However, on Sunday
morning, there is the impression that it takes longer. How do you
justify whether this impression is correct or not?

2 Is there any association between smoking habit and Myocardial
Infarction (MI)?

                    Myocardial Infarction
Smoking        Heart attack   No heart attack   Total
Smoker              325             175           500
Non-smoker          175             325           500
Total               500             500          1000
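Before any formal test, a first look at the smoking table is to compare the proportion of heart attacks in each group. The sketch below only computes these descriptive proportions from the counts in the table; it is not itself a test of association, and the dictionary layout and function name are assumptions made for the example.

```python
# Counts taken from the smoking / myocardial infarction table
table = {
    "Smoker": {"Heart attack": 325, "No heart attack": 175},
    "Non-smoker": {"Heart attack": 175, "No heart attack": 325},
}


def heart_attack_proportion(row):
    """Share of the group that had a heart attack."""
    return row["Heart attack"] / sum(row.values())


proportions = {group: heart_attack_proportion(row) for group, row in table.items()}

for group, p in proportions.items():
    print(f"{group}: {p:.2f}")
```

A large gap between the two proportions suggests an association, but whether it could be due to sampling variation alone is exactly the question a test of hypothesis is meant to answer.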

Problems & Motivation

3 Effect of coffee drinking on cancer:
the scientists want to see the effect of coffee drinking on the
disease, and they may assume that coffee drinking increases the risk of
cancer in humans

How do we justify these hypotheses?

A hypothesis is an assertion or assumption about a
population parameter or the characteristics of random variables.

Lectures 19: Tests of hypothesis The Null and Alternative Hypotheses

(1). Statistical Hypothesis

Hypothesis

Null Hypothesis (denoted by H0)
▸ assumed to be true
▸ a given value
▸ assumption, nothing new
▸ independent
▸ negation of the research aim
▸ usually contains an equality (e.g., =, ≥, ≤)

Alternative Hypothesis (denoted by H1 or Ha)
▸ any statement other than H0
▸ rejection of the given (assumed true) value
▸ rejection of the assumption
▸ does not contain equality (usually contains >, <, ≠)


Exercise

1. The manager of an automobile dealership is considering a new bonus
   plan designed to increase sales volume. Currently, the mean sales
   volume is 14 automobiles per month. The manager wants to conduct
   a research study to see whether the new bonus plan increases sales
   volume. To collect data on the plan, a sample of sales personnel will
   be allowed to sell under the new bonus plan for a 1-month period.
   Define the null and the alternative hypotheses.

2. The manager of an automobile dealership is considering a new bonus
   plan designed to increase sales volume. Currently, the mean sales
   volume is 14 automobiles per month. The manager wants to conduct
   a research study to see whether the new bonus plan decreases sales
   volume. To collect data on the plan, a sample of sales personnel will
   be allowed to sell under the new bonus plan for a 1-month period.
   Define the null and the alternative hypotheses.


3. The manager of an automobile dealership is considering a new bonus
   plan designed to increase sales volume. Currently, the mean sales
   volume is 14 automobiles per month. The manager wants to conduct
   a research study to see whether the new bonus plan changes sales
   volume. To collect data on the plan, a sample of sales personnel will
   be allowed to sell under the new bonus plan for a 1-month period.
   Define the null and the alternative hypotheses.


4. The manager of the Danvers-Hilton Resort Hotel stated that the
   mean guest bill for a weekend is $600 or less. A member of the
   hotel's accounting staff noticed that the total charges for guest bills
   have been increasing in recent months. The accountant will use a
   sample of weekend guest bills to test the manager's claim.


Key Components of Test of Hypothesis

(2). Level of significance

types of error
▸ reject H0 when it is true (type I error)
▸ do not reject H0 when it is false (type II error)

                            In Reality
Decision       H0 is TRUE                   H0 is FALSE
Accept H0      Correct Decision             Type II Error
               1 − α = Confidence level     β = Pr(Type II Error)
Reject H0      Type I Error                 Correct Decision
               α = Pr(Type I Error)         1 − β = Power of the test

▸ α is also called the level of significance; it is fixed first, before the test is carried out
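The quantities in the table can be made concrete for a right-sided Z-test. The following is a minimal sketch, assuming the factory setting used in this lecture (µ0 = 89, σ = 12, n = 12) and a hypothetical true mean of 92.2; the function name `one_sided_z_power` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the right-sided Z-test of H0: mu = mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)      # critical value z_alpha
    shift = (mu1 - mu0) / (sigma / sqrt(n))      # standardized true difference
    beta = std_normal.cdf(z_alpha - shift)       # Pr(do not reject H0 | mu = mu1)
    return 1 - beta


alpha = 0.05                                     # Pr(Type I error), fixed first
power = one_sided_z_power(mu0=89, mu1=92.2, sigma=12, n=12, alpha=alpha)
beta = 1 - power                                 # Pr(Type II error) at mu1 = 92.2 (assumed)
print(f"alpha = {alpha}, beta = {beta:.4f}, power = {power:.4f}")
```

When the true mean equals µ0, the function returns α, which matches the "Reject H0 / H0 is TRUE" cell of the table.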



(3). Test statistic

▸ is a random variable that is calculated from sample data and used in a
  hypothesis test
▸ in general, the test statistic for the null hypothesis H0 ∶ θ = θ0 is

$$\text{test statistic} = \frac{\hat{\theta} - \theta_0}{\operatorname{se}(\hat{\theta})}$$

▸ Z-test statistic
▸ t-test statistic
▸ F-test statistic
▸ χ²-test statistic
▸ ⋯

(4). Critical region: the region of values that corresponds to the
rejection of the null hypothesis at some chosen probability level

▸ acceptance region
▸ rejection region


Acceptance / Rejection regions (One-sided Ha )


Acceptance / Rejection regions (Two-sided Ha )


(5). Decision rule

▸ reject H0
  ∎ if the value of the test statistic belongs to the rejection region at the
    100α% level of significance
  ∎ or if the p-value is less than α
▸ otherwise, do not reject H0 at the 100α% level of significance
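The rejection-region form of the decision rule can be sketched for a Z-test as follows. This is only an illustration under the assumption that the test statistic is standard normal under H0; the function name `reject_h0` and the labels "greater", "less" and "two-sided" are hypothetical choices, not notation from the slides.

```python
from statistics import NormalDist


def reject_h0(z_cal, alpha=0.05, alternative="greater"):
    """Return True if z_cal falls in the rejection region of a Z-test."""
    std_normal = NormalDist()
    if alternative == "greater":       # Ha: theta > theta0, region [z_alpha, +inf)
        return z_cal >= std_normal.inv_cdf(1 - alpha)
    if alternative == "less":          # Ha: theta < theta0, region (-inf, -z_alpha]
        return z_cal <= std_normal.inv_cdf(alpha)
    if alternative == "two-sided":     # Ha: theta != theta0, region |z| >= z_{alpha/2}
        return abs(z_cal) >= std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


print(reject_h0(2.0))                            # right-sided test
print(reject_h0(1.8, alternative="two-sided"))   # same alpha, two-sided test
```

A value such as 1.8 is rejected by the right-sided test but not by the two-sided one, because the two-sided test splits α across both tails.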


The p -value


Basic Steps for Test of Hypothesis


Step 1: state the null and alternative hypotheses
Step 2: set the level of significance, usually denoted by α
Step 3: collect data, define a test statistic and then compute it
Step 4: construct the acceptance / rejection regions for the level of
significance α by collecting the tabulated value from
  ▸ the standard normal table for the Z-test
  ▸ the t-table for the t-test
  ▸ the F-table for the F-test
  ▸ the χ²-table for the χ²-test
or compute the p-value, and then define the decision rule
Step 5: based on steps 3 and 4, draw a conclusion about H0
  ▸ reject H0
    ∎ if the value of the test statistic belongs to the rejection region at the
      100α% level of significance
    ∎ or if the p-value is less than α
  ▸ otherwise, do not reject H0 at the 100α% level of significance
Lectures 19: Tests of hypothesis Testing on a Single Mean

Example: Normal distribution with known σ²

Example: From long-term experience, a factory owner knows that a
worker can produce a product in an average time of 89 min. However,
on Sunday morning, there is the impression that it takes longer. How
do you justify whether this impression is correct or not?
to test whether this impression is correct, a sample (n = 12) is taken
with x̄ = 92.2
we assume that the production time is normal with σ² = 144
Step 1: we get the hypotheses

H0 ∶ µ = 89 vs. Ha ∶ µ > 89

we generalize this situation and consider the following one-sided test

$$\begin{cases} H_0: \mu = \mu_0 \\ H_a: \mu > \mu_0 \end{cases}$$

Step 2: select the significance level α = 5%, i.e., α = 0.05

Step 3: test statistic

we have n = 12, x̄ = 92.2 and σ² = 144 (so σ = 12), such that

$$z_{cal} = \frac{92.2 - 89}{12/\sqrt{12}} = 0.9237$$

Step 4: critical value

$$z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1) \quad \text{under } H_0$$

we use the significance level α to find the critical value a

from the table of the standard normal distribution we get a = zα

in the example, we reject H0 if the test statistic lies in the interval

[zα , +∞) = [1.645, +∞)

Step 5: decision
since zcal = 0.9237 < 1.645, we cannot reject H0 at the 5% significance level

there is insufficient evidence to show that it takes longer to produce
on Sunday morning

Alternatively, we reach the same conclusion from the p-value

p-value = Pr(Z > 0.9237) = 0.1778

since the p-value is larger than 0.05, we do not reject H0 at the 5%
significance level
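The five steps of this example can be reproduced numerically. The sketch below uses only the numbers given in the example (µ0 = 89, x̄ = 92.2, σ = 12, n = 12, α = 0.05); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, x_bar, sigma, n, alpha = 89, 92.2, 12, 12, 0.05

# Step 3: test statistic for H0: mu = 89 vs. Ha: mu > 89 with known sigma
z_cal = (x_bar - mu0) / (sigma / sqrt(n))

# Step 4: critical value z_alpha and the p-value Pr(Z > z_cal)
std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)
p_value = 1 - std_normal.cdf(z_cal)

# Step 5: both forms of the decision rule lead to the same decision
reject_by_region = z_cal >= z_crit
reject_by_p_value = p_value < alpha

print(f"z_cal = {z_cal:.4f}, z_crit = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0:", reject_by_region)
```

The computed statistic agrees with the value 0.9237 on the slide up to rounding in the last digit.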


Exercise

Benzene is a possible cancer-causing agent. We want to check whether the
concentration of benzene in the air at a chemical company is greater than 1 ppm.

0.21 1.44 2.54 2.97 0.00
3.91 2.24 2.41 4.50 0.15
0.30 0.36 4.50 5.03 0.00
2.89 4.71 0.85 2.60 1.26

As hypotheses, we get

$$\begin{cases} H_0: \mu = 1 \\ H_a: \mu > 1 \end{cases}$$

Test this hypothesis using the 5% level of significance.
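One possible way to set up this exercise is sketched below. It assumes the measurements are approximately normal with unknown variance, so that with n = 20 < 30 a right-sided t-test applies; the critical value 1.729 is the tabulated upper 5% point of the t distribution with 19 degrees of freedom, entered by hand because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

benzene_ppm = [
    0.21, 1.44, 2.54, 2.97, 0.00,
    3.91, 2.24, 2.41, 4.50, 0.15,
    0.30, 0.36, 4.50, 5.03, 0.00,
    2.89, 4.71, 0.85, 2.60, 1.26,
]
mu0 = 1.0                          # hypothesized mean under H0

n = len(benzene_ppm)
x_bar = mean(benzene_ppm)
s = stdev(benzene_ppm)             # sample SD with divisor n - 1
t_cal = (x_bar - mu0) / (s / sqrt(n))

T_CRIT_19_DF = 1.729               # t-table value for alpha = 0.05, 19 df (assumed lookup)
reject = t_cal > T_CRIT_19_DF

print(f"n = {n}, mean = {x_bar:.4f}, s = {s:.4f}, t_cal = {t_cal:.4f}")
print("reject H0:", reject)
```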


Exam Question-1
Individuals filing federal income tax returns prior to March 31 had an
average refund of $1056. Consider the population of last-minute filers who
mail their returns during the last 5 days of the income tax period (typically
April 10 to April 15). A researcher suggests that one of the reasons
individuals wait until the last 5 days to file their returns is that on average
those individuals have a lower refund than early filers.

(a). Develop appropriate hypotheses such that rejection of the null hypothesis
     will support the researcher's argument.

(b). Using the 5% level of significance, what is the critical value for the test
     statistic and what is the rejection rule?

(c). For a sample of 400 individuals who filed a return between April 10
     and April 15, the sample mean refund was $910 and the sample
     standard deviation was $1600. Compute the value of the test statistic.

(d). What is your conclusion?

(e). What is the p-value for the test?


Exam Question-2


Exam Question-3


Exam Question-4


Lectures 20: Tests of hypothesis (Conti...)


20 Lectures 20: Tests of hypothesis (Conti...)

20.1 Testing on a Single Proportion


The One-Sample Problem: Mean Test

first we develop some hypothesis tests for the mean µ of a population X
and we distinguish the following three cases
1. X has a normal distribution, i.e., X ∼ N(µ, σ²) with
   ▸ known variance σ²
   ▸ unknown variance σ²
2. X has a general distribution, but we have a large sample size (n ≥ 30)

Hypothesis: H0 ∶ µ = µ0 vs. H1 ∶ µ ≠ µ0

Assumption       σ² is known (any n)        σ² is unknown, n < 30      σ² is unknown, n ≥ 30
Test statistic   zcal = (x̄ − µ0)/(σ/√n)     tcal = (x̄ − µ0)/(σ̂/√n)     zcal = (x̄ − µ0)/(σ̂/√n)
Reject H0        if ∣zcal∣ > z_{α/2}         if ∣tcal∣ > t_{α/2}         if ∣zcal∣ > z_{α/2}
                 or p-value < α             or p-value < α             or p-value < α


One Sided Mean Test

Hypothesis: H0 ∶ µ = µ0 vs. H1 ∶ µ > µ0 or H1 ∶ µ < µ0

Assumption       σ² is known (any n)        σ² is unknown, n < 30      σ² is unknown, n ≥ 30
Test statistic   zcal = (x̄ − µ0)/(σ/√n)     tcal = (x̄ − µ0)/(σ̂/√n)     zcal = (x̄ − µ0)/(σ̂/√n)
Reject H0        if zcal > zα (µ > µ0)      if tcal > tα (µ > µ0)      if zcal > zα (µ > µ0)
                 if zcal < −zα (µ < µ0)     if tcal < −tα (µ < µ0)     if zcal < −zα (µ < µ0)
                 or p-value < α             or p-value < α             or p-value < α

here T denotes the test statistic and tcal its calculated value:

p-value = Pr(T ≥ tcal ∣ H0) for right-sided Ha,
p-value = Pr(T ≤ tcal ∣ H0) for left-sided Ha,
p-value = 2 min{Pr(T ≤ tcal ∣ H0), Pr(T ≥ tcal ∣ H0)} for two-sided Ha
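The three p-value definitions can be sketched in code for the case where the test statistic is standard normal under H0 (known σ², or n ≥ 30). This is an assumption made so the example runs with the standard library alone; for the t-test the same logic applies with the t distribution in place of the normal. The function name `z_p_value` is hypothetical.

```python
from statistics import NormalDist


def z_p_value(z_cal, alternative):
    """p-value of a Z-test for a right-sided, left-sided or two-sided Ha."""
    left = NormalDist().cdf(z_cal)     # Pr(Z <= z_cal | H0)
    right = 1 - left                   # Pr(Z >= z_cal | H0)
    if alternative == "greater":
        return right
    if alternative == "less":
        return left
    if alternative == "two-sided":
        return 2 * min(left, right)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


for alt in ("greater", "less", "two-sided"):
    print(alt, round(z_p_value(0.9237, alt), 4))
```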

Lectures 20: Tests of hypothesis (Conti...) Testing on a Single Proportion


Chapter - 10: Correlation Analysis


Lecture 21: Correlation Analysis


21 Lecture 21: Correlation Analysis

21.1 Scatter Diagram

21.2 Covariance

21.3 Correlation Coefficient

21.4 Exercises

21.5 Test for Significance Using Correlation

21.6 Homework


Scatter Diagram
a scatter diagram is a graphic tool used to display the relationship between
two variables

▸ the dependent variable is scaled on the Y-axis and is the variable
  being estimated
▸ the independent variable is scaled on the X-axis and is the variable
  used as the predictor


Sample Covariance
what is the nature of the relationship between x and y?

sample covariance:

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
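A direct transcription of the sample covariance formula is sketched below. The five (x, y) pairs are the small sample that also appears in the exercises of this lecture; the function name `sample_covariance` is illustrative.

```python
def sample_covariance(x, y):
    """Sample covariance s_xy with divisor n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross_products / (n - 1)


x = [4, 5, 3, 6, 10]
y = [4, 6, 5, 7, 7]
s_xy = sample_covariance(x, y)
print(f"s_xy = {s_xy:.2f}")
```

A positive covariance indicates that x and y tend to move in the same direction; its size depends on the units of x and y, which is why the correlation coefficient is used to judge strength.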

What is Correlation Analysis?

Correlation analysis
▸ a group of techniques to measure the relationship between two variables

Pearson's correlation coefficient
▸ is a measure of the strength of the linear relationship between two
  variables
▸ is denoted by r or r_xy and defined as

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (the sample mean); and analogously for ȳ


Pearson's Correlation Coefficient

this correlation coefficient is called Pearson's correlation coefficient,
which can also be written as

$$r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}\,\sqrt{\sum_{i=1}^{n} y_i^2 - n\bar{y}^2}} = \frac{s_{xy}}{s_x s_y}$$

where $s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ (the sample standard deviation); and
analogously for s_y


How do you interpret the value of the correlation coefficient?

the following drawing summarizes the strength and direction of the
correlation coefficient


Write down the properties (or characteristics) of the
correlation coefficient.

Characteristics of the correlation coefficient

1. the sample correlation coefficient is identified by the lowercase letter r
2. it shows the direction and strength of the linear relationship between
   two interval or ratio-scale variables
3. it ranges from −1 to +1, inclusive
4. a value near 0 indicates there is little linear relationship between the
   variables
5. a value near 1 indicates a direct or positive linear relationship between
   the variables
6. a value near −1 indicates an inverse or negative linear relationship
   between the variables


Exercises
1. The following sample of observations was randomly selected.

   x   4   5   3   6   10
   y   4   6   5   7    7

(a). Draw a scatter diagram.
(b). Determine the correlation coefficient and interpret the relationship between x
     and y.
(c). Interpret these statistical measures.

Solution (b)

    x     y     x²    y²    xy
    4     4     16    16    16
    5     6     25    36    30
    3     5      9    25    15
    6     7     36    49    42
   10     7    100    49    70

∑ xi = 28, ∑ yi = 29, ∑ xi² = 186, ∑ yi² = 175, ∑ xi yi = 173

x̄ = 5.6, ȳ = 5.8

$$r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}\,\sqrt{\sum_{i=1}^{n} y_i^2 - n\bar{y}^2}} = \frac{173 - 5 \times 5.6 \times 5.8}{\sqrt{186 - 5 \times (5.6)^2}\,\sqrt{175 - 5 \times (5.8)^2}} = 0.7522$$
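The hand calculation in Solution (b) can be checked with a short script that follows the same computational formula (sums of squares and cross-products). The function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from the sums used in the worked solution."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    sum_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    sum_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    return sum_xy / sqrt(sum_xx * sum_yy)


x = [4, 5, 3, 6, 10]
y = [4, 6, 5, 7, 7]
r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

The result is positive and fairly close to 1, which points to a moderately strong direct linear relationship between x and y in this small sample.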


2. The owner of Maumee Ford-Volvo wants to study the relationship
   between the age of a car and its selling price. Listed below is a
   random sample of 12 used cars sold at the dealership during the last
   year.

   Car   Age (years)   Selling Price ($000)
    1         9               8.1
    2         7               6.0
    3        11               3.6
    4        12               4.0
    5         8               5.0
    6         7              10.0
    7         8               7.6
    8        11               8.0
    9        10               8.0
   10        12               6.0
   11         6               8.6
   12         6               8.0

(a). Draw a scatter diagram.
(b). Determine the correlation coefficient.
(c). Interpret the correlation coefficient. Does it surprise you that the
     correlation coefficient is negative?


Test whether the linear relationship is significant or not.


Another Example


Example


Conclusion: Since the calculated value of t = 3.297 belongs to the
rejection region, we reject H0. It means that there is a significant linear
relationship between calls (Y) and sales (X).
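The data behind the calls and sales example are not reproduced here, so the sketch below illustrates the same kind of test on the five-point sample from the exercises instead. It assumes the usual statistic t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom for H0: ρ = 0 against a two-sided alternative; the critical value 3.182 is the tabulated t value for α = 0.05 (two-sided) with 3 degrees of freedom, entered by hand.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and |r| < 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r, n = 0.7522, 5                       # values from the five-point exercise sample
t_cal = correlation_t_statistic(r, n)

T_CRIT_3_DF = 3.182                    # t-table value, alpha/2 = 0.025, 3 df (assumed lookup)
reject = abs(t_cal) > T_CRIT_3_DF

print(f"t_cal = {t_cal:.4f}, reject H0: {reject}")
```

With only five observations, even a sample correlation above 0.75 does not fall in the rejection region, which shows how strongly the test depends on the sample size.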


Homework

HW: Read Section 3.5 in Chapter 3 of the textbook

Exercises: 45-51, Pages 122-123


Chapter - 11: Regression Analysis


Lecture 22: Regression Analysis


22 Lecture 22: Regression Analysis

22.1 Concept of Regression Analysis

22.2 Linear Regression Model

22.3 Homework


Regression Analysis

▸ the regression equation is an equation that expresses the relationship
  between two variables
▸ in regression analysis, we find the relationship among variables and we
  also estimate or predict one variable based on another variable
▸ regression analysis describes how a set of independent variables is
  numerically related to the dependent variable
▸ the variable being predicted or estimated is called the
  dependent variable
▸ the variable or variables being used to predict the value of the
  dependent variable are called the independent variables
▸ both the independent and the dependent variables must be interval or
  ratio scale


Simple Linear Regression Model


suppose the linear mean regression model is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$

where β0 is the constant or intercept and β1 is the slope coefficient

β0 and β1 are also called regression coefficients

usually, β0 and β1 are unknown

we can estimate β0 and β1 by the following methods
▸ ordinary least squares (OLS) method
▸ maximum likelihood estimation
▸ method of moments
▸ ...
the estimated mean regression model (or equation) is

$$\hat{y}_i = b_0 + b_1 x_i$$

where b0 and b1 are the estimators of β0 and β1, respectively


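where the estimators of β0 and β1 are

$$b_0 = \bar{y} - b_1\bar{x}, \qquad b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$$

these estimators are also called ordinary least squares (OLS) estimators
the expression for b1 can also be written as

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

because

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \quad \text{and} \quad \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$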

Interpretation of Slope Coefficient

▸ b1 is the slope of the fitted (estimated) line
▸ it shows the amount of change in ŷ for a change of one unit in x
▸ a positive value for b1 indicates a direct relationship between the two
  variables and a negative value indicates an inverse relationship
▸ the sign of b1 and the sign of r, the correlation coefficient, are always
  the same


Example
To illustrate the least squares method, suppose data were collected from a
sample of 10 Armand's Pizza Parlor restaurants located near college
campuses. For the i th observation or restaurant in the sample, xi is the
size of the student population (in thousands) and yi is the quarterly sales
(in thousands of dollars).


(i). Show the relationship between the size of the student population and the
     quarterly sales by using a scatter diagram. Comment on the
     scatter diagram.

(ii). Write down the regression model and the estimated regression equation.

(iii). Find the least squares estimates and write the estimated regression
     equation. Interpret the results.

(iv). Draw the regression line.

(v). Predict quarterly sales for a restaurant to be located near a campus
     with 16,000 students.


To predict quarterly sales for a restaurant to be located near a campus
with 16,000 students, we would compute

ŷ = 60 + 5 × 16 = 140

Hence, we would predict quarterly sales of $140,000 for this restaurant.


Classroom Practice
1. The following sample of observations was randomly selected.

   x   4   5   3   6   10
   y   4   6   5   7    7

(a). Determine the regression equation.
(b). Write the estimated regression equation (or line).
(c). Determine the estimated value of y when x is 7.

Solution of the Exercises

    x     y     x²    y²    xy
    4     4     16    16    16
    5     6     25    36    30
    3     5      9    25    15
    6     7     36    49    42
   10     7    100    49    70

∑ xi = 28, ∑ yi = 29, ∑ xi² = 186, ∑ yi² = 175, ∑ xi yi = 173

x̄ = 5.6, ȳ = 5.8


(a). the least squares estimates of β1 and β0 are

$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = 0.3630 \quad \text{and} \quad b_0 = \bar{y} - b_1\bar{x} = 3.7671$$

(b). hence the estimated equation (or line) is

$$\hat{y} = b_0 + b_1 x = 3.7671 + 0.3630x$$

(c). the estimated value of y is

$$\hat{y} = 3.7671 + 0.3630 \times 7 = 6.3082$$

when x is 7.
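The solution above can be reproduced with a few lines of code that apply the OLS formulas for b1 and b0 directly. This is a minimal sketch; the function name `fit_simple_ols` is illustrative.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    sum_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    b1 = sum_xy / sum_xx               # slope
    b0 = y_bar - b1 * x_bar            # intercept
    return b0, b1


x = [4, 5, 3, 6, 10]
y = [4, 6, 5, 7, 7]
b0, b1 = fit_simple_ols(x, y)
y_hat_at_7 = b0 + b1 * 7               # part (c): prediction at x = 7

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, y_hat(7) = {y_hat_at_7:.4f}")
```

The prediction uses the unrounded coefficients, which is why the slide reports 6.3082 rather than the 6.3081 obtained from the rounded values 3.7671 and 0.3630.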


Homework

HW: Read Chapter 14 of the textbook

Exercises: 01-09, 13, 14, Pages 570-575


Lecture 23: Regression Analysis (Conti...)


23 Lecture 23: Regression Analysis (Conti...)

23.1 Coefficient of Determination (R²)

23.2 Adjusted R²

23.3 Homework


Coefficient of Determination
▸ the total sum of squares (proportional to the variance of the data):

$$SS_{tot} = \sum_i (y_i - \bar{y})^2$$

▸ the sum of squares of residuals, also called the residual sum of
  squares, where $\hat{\varepsilon}_i = y_i - \hat{y}_i$ is the i-th residual:

$$SS_{res} = \sum_i (y_i - \hat{y}_i)^2 = \sum_i \hat{\varepsilon}_i^2$$

▸ the coefficient of determination is the proportion of the total variation
  in the dependent variable Y that is explained, or accounted for, by
  the variation in the independent variable X
▸ it is denoted by R² and defined by

$$R^2 \equiv 1 - \frac{SS_{res}}{SS_{tot}}$$

▸ suppose R² = 0.49; this implies that 49% of the variability of the
  dependent variable has been accounted for, and the remaining 51% of
  the variability is still unaccounted for
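The definition of R² can be illustrated on the five-point sample used in the earlier exercises. The sketch below first fits the OLS line and then computes the two sums of squares; the function name `r_squared` is illustrative.

```python
def r_squared(y, y_hat):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_res / ss_tot


x = [4, 5, 3, 6, 10]
y = [4, 6, 5, 7, 7]

# OLS fit of y on x using the formulas for b1 and b0
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
    sum(xi ** 2 for xi in x) - n * x_bar ** 2
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

r2 = r_squared(y, fitted)
print(f"R^2 = {r2:.4f}")
```

For simple linear regression fitted by OLS, R² equals the square of Pearson's r, so the value here matches 0.7522² up to rounding.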

Coefficient of Determination (Continued)

$R^2$ is a statistic that gives some information about the goodness of fit of a model.

In regression, the coefficient of determination $R^2$ is a statistical measure of how well the regression predictions approximate the real data points.

An $R^2$ of 1 indicates that the regression predictions perfectly fit the data.

$R^2$ does not decrease as we increase the number of variables in the model ($R^2$ is monotone non-decreasing in the number of variables included, i.e., it will never decrease).

An adjusted $R^2$ is a modification of $R^2$ that adjusts for the number of explanatory terms in a model ($p$) relative to the number of data points ($n$).
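The claim that $R^2$ never decreases when a variable is added can be checked numerically. The sketch below fits least-squares models by solving the normal equations in plain Python. The `x` and `y` values are the five observations from the exercise in this lecture, while the second predictor `z` and the helper names `fitted_values` and `r_squared` are hypothetical, introduced only for this illustration.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_res / ss_tot


def fitted_values(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor
    columns; solves (X'X) b = X'y by Gauss-Jordan elimination."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        # Partial pivoting: bring the largest entry of column c to the diagonal.
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    coef = [A[i][k] / A[i][i] for i in range(k)]
    return [sum(b * v for b, v in zip(coef, row)) for row in X]


x = [4, 5, 3, 6, 10]
y = [4, 6, 5, 7, 7]
z = [1, 0, 2, 1, 0]  # hypothetical extra predictor, unrelated to the lecture data

r2_one = r_squared(y, fitted_values([x], y))      # model with x only
r2_two = r_squared(y, fitted_values([x, z], y))   # model with x and z
print(f"R^2 with x only: {r2_one:.4f}")
print(f"R^2 with x and z: {r2_two:.4f}")
```

The second value is at least as large as the first even though `z` was made up, which is why $R^2$ alone is a poor guide for deciding whether an extra variable belongs in the model.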


Adjusted $R^2$

The adjusted $R^2$ is defined as

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}$$

where $p$ is the total number of independent variables in the model (not including the constant term), and $n$ is the sample size.

The interpretation of this statistic is almost the same as that of $R^2$, but it penalizes the statistic as extra variables are included in the model.

$R^2$ and $\bar{R}^2$ are also used for measuring the goodness of fit of the fitted model.
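The penalty can be seen by holding $R^2$ fixed and varying $p$. This is a small sketch of the formula above, taking $R^2 = 0.49$ from the earlier slide. The sample size `n = 20` and the function name `adjusted_r_squared` are assumptions chosen only for illustration.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 requires n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


r2 = 0.49   # value used in the interpretation example above
n = 20      # hypothetical sample size

adj_one = adjusted_r_squared(r2, n, p=1)     # one independent variable
adj_three = adjusted_r_squared(r2, n, p=3)   # three independent variables
print(f"p = 1: adjusted R^2 = {adj_one:.4f}")
print(f"p = 3: adjusted R^2 = {adj_three:.4f}")
```

With the same $R^2$, the adjusted value falls from about 0.462 to about 0.394 as $p$ goes from 1 to 3, so extra variables must raise $R^2$ enough to offset the penalty.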


Exercises (Continued)
1 The following sample of observations was randomly selected.

x   4   5   3   6   10
y   4   6   5   7   7

(a) Fit a regression model of y on x.
(b) Find the coefficient of determination. Interpret your findings.
(c) Find the adjusted $R^2$. Interpret your findings.
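As a hedged check on this exercise, the computation can be sketched as follows, assuming the simple linear model $\hat{y} = b_0 + b_1 x$ fitted by least squares. The shorthand $S_{xx}$, $S_{xy}$, $S_{yy}$, $b_0$ and $b_1$ is introduced here and may differ from the notation used in class. For a simple linear regression, $R^2 = S_{xy}^2 / (S_{xx} S_{yy})$.

```latex
\begin{align*}
\bar{x} &= \tfrac{28}{5} = 5.6, \qquad \bar{y} = \tfrac{29}{5} = 5.8 \\
S_{xx} &= \sum_i (x_i - \bar{x})^2 = 29.2, \quad
S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) = 10.6, \quad
S_{yy} = \sum_i (y_i - \bar{y})^2 = 6.8 \\
b_1 &= \frac{S_{xy}}{S_{xx}} = \frac{10.6}{29.2} \approx 0.363, \qquad
b_0 = \bar{y} - b_1 \bar{x} \approx 5.8 - 0.363 \times 5.6 \approx 3.767 \\
R^2 &= \frac{S_{xy}^2}{S_{xx}\, S_{yy}} = \frac{112.36}{198.56} \approx 0.566 \\
\bar{R}^2 &= 1 - (1 - 0.566)\,\frac{5 - 1}{5 - 1 - 1} \approx 0.421
\end{align*}
```

Under these assumptions, about 57% of the variation in y is explained by x. The adjusted value is noticeably lower because the sample has only five observations.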


Classroom Practice
1 The following sample of observations was randomly selected.

No. of TV Commercials (x)    2   5   1   3   4   1   5   3   4   2
Total Sales (y)             50  57  41  54  54  38  63  48  59  46

(a) Find the linear relationship between the number of TV commercials and total sales.
(b) Fit a regression model of y on x.
(c) Find the coefficient of determination. Interpret your findings.
(d) Find the adjusted $R^2$. Interpret your findings.
(e) Predict (or forecast) total sales when x = 5.
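One way to check the hand calculations for this practice problem is the short script below. It assumes a simple linear model fitted by least squares, and it reads part (a) as asking for the correlation coefficient, which is only one possible interpretation. The variable names are illustrative.

```python
import math

x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]            # number of TV commercials
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]  # total sales

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# (a) Sample correlation coefficient (assumed reading of "linear relationship").
r = s_xy / math.sqrt(s_xx * s_yy)

# (b) Least-squares slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# (c) Coefficient of determination from the sums of squares.
y_hat = [b0 + b1 * xi for xi in x]
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
ss_tot = s_yy
r2 = 1 - ss_res / ss_tot

# (d) Adjusted R^2 with p = 1 independent variable.
p = 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# (e) Predicted total sales at x = 5.
pred = b0 + b1 * 5

print(f"r = {r:.4f}")
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
print(f"prediction at x = 5: {pred:.1f}")
```

Under these assumptions the fitted line is roughly $\hat{y} = 36.15 + 4.95x$, with $R^2 \approx 0.866$, adjusted $R^2 \approx 0.849$, and a predicted value of about 60.9 at $x = 5$.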


Homework

HW: Read Chapter 14 of the textbook

Exercises: 47-51, pp.122-124

Exercises: 4-14, 18-21, pp.570-582


Lecture 24: Review Class


24 Lecture 24: Review Class

24.1 Syllabus for the Final Exam and Marks Distribution


Syllabus for the Final Exam

Final Exam (35%)


▸ Date: 17 December, 2022 (Saturday)
▸ Time: 14:00 - 16:00
1 Normal Distribution (Lectures 13-14)
2 Interval Estimation (Lectures 17-18)
3 Test of Hypothesis (Lectures 19-20)
4 Correlation Analysis (Lecture 21)
5 Regression Analysis (Lectures 22-23)
Two Class Tests (15%+15%=30%)

One mid-term test (25%)

Class attendance and participation (10%)
