
SUMMARY OF PROBABILITY AND STATISTICS (TÓM TẮT XSTK)

The document outlines the content of a course on probability theory and statistics. It covers combinatorics, basic probability concepts, discrete and continuous random variables, probability distributions, random vectors, sampling theory and parameter estimation.

Uploaded by dtvyly7024
Copyright © All Rights Reserved

MINISTRY OF EDUCATION AND TRAINING

HO CHI MINH CITY UNIVERSITY OF LAW


DEPARTMENT OF MANAGEMENT

SUBJECT: STATISTICAL PROBABILITY THEORY


SUMMARY OF COURSE CONTENT
Lecturer: MSc. Trần Thị Bảo Trâm
Class: 143 - QTKD47A1
Group members:
No. | Name | Student code | Tasks | Evaluation
1 | Hồ Lan Anh | 2253401010001 | Requirement 1: prepare and record a video of the statistics section; Requirement 2: prepare chapters 5 + 6 | 100%
2 | Lê Nguyễn Vân Anh | 2253401010003 | Requirement 1: prepare and record a video of the statistics section; Requirement 2: prepare chapters 1 + 2 | 100%
3 | Nguyễn Thị Thu Hiền | 2253401010035 | Requirement 1: prepare and record a video of the probability section; Requirement 2: prepare chapters 3 + 4 | 100%
4 | Đặng Thị Thúy Ngân | 2253401010068 | Requirement 1: prepare and record a video of the probability section; Requirement 2: prepare chapters 7 + 8 | 100%
INDEX
CHAPTER 1. SUPPLEMENTARY COMBINATORIAL ANALYSIS
1.1. Mapping
1.2. Addition rule
1.3. Multiplication rule
1.4. Permutation
1.5. Partial Permutation
1.6. Combination
CHAPTER 2: BASIC CONCEPTS AND PROBABILITY FORMULAS
2.1. RANDOM EXPERIMENTS AND EVENTS
2.1.1. Random phenomenon
2.1.2. Random experiment and events
2.1.3. Relationships between events
2.1.4. Exhaustive events
2.2. PROBABILITY OF AN EVENT
2.2.1. Classical definition
2.2.2. Statistical definition
2.2.3. Properties of probability
2.3. PROBABILITY FORMULA
2.3.1. Probability addition formula
2.3.2. Conditional probability
2.3.3. Probability multiplication formula
2.3.4. Total probability formula and Bayes' formula
CHAPTER 3: RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
I. CONCEPT OF RANDOM VARIABLES
II. DISCRETE RANDOM VARIABLE
2.1. SOME CHARACTERISTICS
2.2. RANDOM QUANTITY
2.2.1. The expectation (mean) of the random variable X
2.2.2. The variance of the random variable X
2.2.3. The standard deviation of the random variable X
III. SOME COMMON PROBABILITY DISTRIBUTION LAWS
3.1. Hypergeometric distribution: X ~ H(N, N_A, n)
3.2. Binomial distribution: X ~ B(n, p)
3.3. Poisson distribution: X ~ P(λ)
IV. CONTINUOUS RANDOM VARIABLES
4.1. SOME CHARACTERISTICS
4.2. RANDOM QUANTITY
4.2.1. The expectation (mean) of the random variable X
4.2.2. The variance of the random variable X
4.2.3. The standard deviation of the random variable X
V. NORMAL DISTRIBUTION
5.1. Standard normal distribution: T ~ N(0; 1)
5.2. Normal distribution: X ~ N(μ, σ²)
CHAPTER 3.2. RANDOM VECTOR
I. PROBABILITY DISTRIBUTION OF DISCRETE RANDOM VECTOR
1.1. Component probability distribution (marginal distribution)
1.2. Conditional probability distribution
II. PROBABILITY DISTRIBUTION OF CONTINUOUS RANDOM VECTOR
2.1. Joint density function of (X,Y)
2.2. Component density function
2.3. Conditional density function
CHAPTER 5. SAMPLE THEORY
1. Population
2. Sample
3.1. The probability distribution of the sample mean
3.2. Probability distribution of the sample proportion F
CHAPTER 6: PARAMETER ESTIMATION
1. Interval estimate of the population mean μ
2. Finding the confidence level (case 4 not considered)
3. Finding the sample size (cases 1 and 2 only)
4. Interval estimate of the population proportion p
CHAPTER 7. STATISTICAL HYPOTHESIS TESTING
I. Concept of statistical hypothesis testing
1. General concepts
2. Parameter testing
3. Types of errors in testing
4. Significance level and rejection region
5. Steps of a test
6. Testing methods
7. Testing with the P-value
II. Tests comparing a parameter with a given number
1. Test comparing the mean with a number
2. Test comparing a proportion with a number
3. Test comparing a variance with a number
III. Tests comparing two parameters
1. Comparing the means of two populations X and Y
2. Comparing the proportions of two populations X and Y
3. Comparing the variances of two populations X and Y
4. Comparing two means for paired data (X, Y)
CHAPTER 8. CORRELATION AND REGRESSION
I. Sample correlation coefficient
1. Definition
2. Properties
II. Empirical linear regression line
1. Least squares method
CHAPTER 1. SUPPLEMENTARY COMBINATORIAL ANALYSIS
1.1. Mapping
1.2. Addition rule
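How to recognize it: the task can be completed in any one of several alternative cases (options); carrying out a single case is enough to complete the task, so dropping one option still leaves the task doable.
If case i can be carried out in $m_i$ ways (i = 1, …, n), the number of ways to complete the task is

$m_1 + m_2 + \dots + m_n$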
EX: A box contains 7 flowers of 5 colors: 2 red flowers, 2 yellow flowers, 1 blue flower, 1 white flower and 1 pink flower. One flower is taken from the box for an arrangement. How many ways are there to choose it?
- Solution:
Ways to choose a red flower: 2
Ways to choose a yellow flower: 2
Ways to choose the blue flower: 1
Ways to choose the white flower: 1
Ways to choose the pink flower: 1
⇒ 2 + 2 + 1 + 1 + 1 = 7 ways to choose a flower for the arrangement
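A minimal Python sketch of the addition rule for the flower example: the counts per color come from the example, and the per-flower labels are hypothetical, used only to show that listing every individual flower gives the same total as adding the counts.

```python
# Addition rule: exactly ONE flower is chosen, and each color group is an
# alternative case, so the numbers of ways per case are added.
counts = {"red": 2, "yellow": 2, "blue": 1, "white": 1, "pink": 1}

by_rule = sum(counts.values())  # m1 + m2 + ... + mn

# Brute-force check: give every individual flower a (hypothetical) label.
flowers = [f"{color}-{i}" for color, m in counts.items() for i in range(1, m + 1)]
by_enumeration = len(flowers)

print(by_rule, by_enumeration)  # 7 7
```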
1.3. Multiplication rule
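How to recognize it: the task is carried out in several successive stages, and all stages must be completed to obtain the result.
If stage i can be carried out in $m_i$ ways (i = 1, …, n), the number of ways to complete the task is

$m_1 \cdot m_2 \cdots m_n$
EX: A store sells shirts in two sizes: size 39 in 5 different colors and size 40 in 4 different colors. How many ways are there to choose 2 shirts, one of each size?
Solution:
Shirt size 39 has 5 options
Shirt size 40 has 4 options
⇒ 5 · 4 = 20 ways to choose 2 shirts of the two sizes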
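The two-stage choice can be checked by listing every (size 39, size 40) pair with `itertools.product`. This is a small sketch; the color names are placeholders, since the example only gives how many colors there are.

```python
from itertools import product

# Multiplication rule: choose a size-39 shirt AND a size-40 shirt (two stages).
colors_39 = ["c1", "c2", "c3", "c4", "c5"]  # 5 colors; names are placeholders
colors_40 = ["d1", "d2", "d3", "d4"]        # 4 colors; names are placeholders

by_rule = len(colors_39) * len(colors_40)    # m1 * m2
pairs = list(product(colors_39, colors_40))  # every (size 39, size 40) choice

print(by_rule, len(pairs))  # 20 20
```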
1.4. Permutation
How to recognize it: all m elements are taken and all of them are arranged in order.
Note: all permutations of a group contain the same elements; they differ only in the order in which the elements are arranged.
$P_m = m! = 1 \cdot 2 \cdot 3 \cdots m$
EX: Six teddy bears are to be arranged in a cabinet with 6 drawers, one bear per drawer. How many ways are there to arrange them?
Solution: $P_6 = 6! = 1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 = 720$ ways to arrange the bears in the cabinet
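A quick sketch comparing the formula $P_6 = 6!$ with an explicit enumeration of all orderings; the bear labels are hypothetical.

```python
import math
from itertools import permutations

bears = ["bear1", "bear2", "bear3", "bear4", "bear5", "bear6"]  # hypothetical labels

by_formula = math.factorial(len(bears))               # P_6 = 6!
by_enumeration = sum(1 for _ in permutations(bears))  # count every ordering

print(by_formula, by_enumeration)  # 720 720
```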

1.5. Partial Permutation


How to recognize it: an ordered group of k distinct elements chosen from n elements.
$A_n^k = \dfrac{n!}{(n-k)!}$

EX: Five students of class 12A1 are running for the class committee. Three of the five are to be elected as class president, class vice president and secretary, in that order. How many ways are there to elect them?
Solution:
$A_5^3 = \dfrac{5!}{(5-3)!} = 60$ ways to elect the committee
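Python's `math.perm(n, k)` (Python 3.8+) computes exactly $A_n^k = n!/(n-k)!$. The sketch below checks it against a direct count of ordered triples; the student labels are hypothetical.

```python
import math
from itertools import permutations

candidates = ["s1", "s2", "s3", "s4", "s5"]  # hypothetical labels for the 5 students

# Ordered triples (president, vice president, secretary) of distinct students.
by_formula = math.perm(5, 3)  # 5! / (5 - 3)!
by_enumeration = sum(1 for _ in permutations(candidates, 3))

print(by_formula, by_enumeration)  # 60 60
```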

1.6. Combination
How to recognize it: an unordered group (order does not matter) of k distinct elements chosen from n elements.
$C_n^k = \dfrac{n!}{k!(n-k)!}$

EX: For a school basketball tournament, each class must send a team of 7 boys. Class 12C5 has 25 boys. How many ways are there to choose 7 boys of class 12C5 for the team?
Solution:
$C_{25}^7 = \dfrac{25!}{7!(25-7)!} = 480700$ ways to choose the team
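A sketch checking $C_{25}^7$ three ways: with `math.comb`, by enumerating every unordered group of 7 out of 25 (about half a million tuples, still fast), and as the partial permutation divided by 7!, since a combination ignores the order of its 7 members.

```python
import math
from itertools import combinations

by_formula = math.comb(25, 7)  # 25! / (7! * 18!)

# Brute force: count all unordered 7-element subsets of the 25 boys.
by_enumeration = sum(1 for _ in combinations(range(25), 7))

# Ignoring order: each team corresponds to 7! ordered selections.
via_perm = math.perm(25, 7) // math.factorial(7)

print(by_formula, by_enumeration, via_perm)  # 480700 480700 480700
```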
CHAPTER 2: BASIC CONCEPTS AND PROBABILITY FORMULAS
2.1. RANDOM EXPERIMENTS AND EVENTS
2.1.1. Random phenomenon:
2.1.2. Random experiment and events:
The sample space is denoted Ω.
EX: Roll one fair die: Ω = {1; 2; 3; 4; 5; 6}
Solution:
Let A be the event “an even number appears”: A = {2; 4; 6}
Let B be the event “an odd number appears”: B = {1; 3; 5}
EX: A flower basket contains 2 flowers for sale. Each flower may be a type 1 product, a type 2 product or a defective product. Give examples of simple (elementary) events.
Solution: for example, “both flowers are type 1”, “both flowers are type 2” and “both flowers are defective” are simple events.
EX: 5 cans are taken from a box containing 5 cans of beer and 3 cans of soft drinks. Which event is a sure event and which event is an impossible event?
Solution:
Let A be the sure event: A = {at least 2 cans of beer are taken} (the box holds only 3 cans of soft drinks)
Let B be the impossible event: B = {5 cans of soft drinks are taken}
2.1.3. Relationships between events:
a.) Implication and equivalence
- A implies B (A ⊂ B) when the occurrence of A always entails the occurrence of B.
- A and B are equivalent when A ⊂ B and B ⊂ A. Notation: A = B
EX: Toss a die. Let A be the event “the die shows 5” and B the event “the die shows an odd number”; then A ⊂ B.
EX: Inspect 2 T-shirts. Let A be the event “there is at least 1 defective shirt” and B the event “there is 1 defective shirt or there are 2 defective shirts”; then A = B.
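b.) Sum and product of 2 events
The sum S ∪ T (also written S + T) is the event that at least one of S and T occurs; the product S ∩ T (also written S.T) is the event that both S and T occur.
[Venn diagram: events S and T in the sample space Ω]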
EX: Consider the experiment of incubating 2 chicken eggs.
Let Ni: “the i-th egg hatches” (i = 1; 2)
Ki: “the i-th egg does not hatch” (i = 1; 2)
A: “exactly 1 egg hatches”
Then the sample space of the experiment is:
Ω = {K1K2; N1K2; K1N2; N1N2}
The following events are elementary events:
ω1 = K1K2; ω2 = N1K2; ω3 = K1N2; ω4 = N1N2
Event A is not elementary because A = N1K2 ∪ K1N2
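Events are subsets of Ω, so Python sets model the sum (`|`), product (`&`) and opposite event (`Ω - A`) directly. This sketch rebuilds the egg example with outcome strings spelled like the text (for example "N1K2"); the string encoding is our own choice, not notation from the course.

```python
from itertools import product

# Each egg either hatches ("N") or does not ("K"); e.g. "N1K2" = egg 1 hatches, egg 2 does not.
omega = {f"{a}1{b}2" for a, b in product("KN", repeat=2)}

N1 = {w for w in omega if w.startswith("N1")}  # egg 1 hatches
N2 = {w for w in omega if w.endswith("N2")}    # egg 2 hatches
A = {w for w in omega if w.count("N") == 1}    # exactly one egg hatches

at_least_one = N1 | N2  # sum N1 + N2
both = N1 & N2          # product N1.N2
not_A = omega - A       # opposite event of A

print(sorted(omega))
print(sorted(A), sorted(at_least_one), sorted(both), sorted(not_A))
```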
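c.) Opposite (complementary) events:
The opposite event Ā occurs exactly when A does not occur: $\bar{A} = \Omega \setminus A$
EX: From a batch containing 10 genuine products and 3 defective products, 11 products are selected at random.
Let Ai be the event “exactly i genuine products are selected” (i = 8, 9, 10); since at most 3 of the 11 selected products can be defective, at least 8 of them are genuine.
Ω = A8 + A9 + A10 and $\bar{A}_{10} = \Omega \setminus A_{10} = A_8 + A_9$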
d.) Two mutually exclusive events:
A and B are mutually exclusive if they cannot both occur in the same trial (A ∩ B = ∅).
EX: Inspect 2 cans of mackerel. Let A be the event “exactly one can of mackerel is defective” and B the event “no can of mackerel is defective”.
⇒ A and B are two mutually exclusive events
[Venn diagrams: A ∪ B; A ∩ B; A and B mutually exclusive; A and its opposite event Ā]
2.1.4. Exhaustive events
The events A1, A2, …, An form an exhaustive system if they are pairwise mutually exclusive and their sum equals Ω, so that exactly one of them occurs in each trial.
EX: There are 4 coat racks. One coat is taken from one of the racks. Let Ai be the event “the coat is taken from the i-th rack”, i = 1, 2, 3, 4.
Then {A1; A2; A3; A4} is an exhaustive system of events.
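2.2. PROBABILITY OF AN EVENT
Notation: p(A)
2.2.1. Classical definition
If the experiment has n equally likely elementary outcomes, k of which are favorable to A, then
$p(A) = \dfrac{n(A)}{n(\Omega)} = \dfrac{k}{n}$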

EX: An art team needs to recruit 2 members. There are 4 girls and 2 boys applying, and all 6 applicants have the same chance of being accepted. Calculate the probability that:
1. both admitted candidates are female;
2. at least 1 female candidate is admitted.
Solution:
1.) Let A be the event that both admitted candidates are female:
$p(A) = \dfrac{n(A)}{n(\Omega)} = \dfrac{C_4^2}{C_6^2} = \dfrac{2}{5}$
2.) Let B be the event that at least one female candidate is admitted:
$p(B) = \dfrac{n(B)}{n(\Omega)} = \dfrac{C_4^1 C_2^1 + C_4^2}{C_6^2} = \dfrac{14}{15}$
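The classical definition can be applied literally by listing every equally likely pair of admitted applicants and counting the favorable ones. A sketch with hypothetical applicant labels and exact `Fraction` arithmetic:

```python
from fractions import Fraction
from itertools import combinations

applicants = ["G1", "G2", "G3", "G4", "B1", "B2"]  # hypothetical labels: 4 girls, 2 boys
outcomes = list(combinations(applicants, 2))      # n(Omega) = C(6, 2) = 15


def is_girl(name: str) -> bool:
    return name.startswith("G")


n_A = sum(1 for pair in outcomes if all(is_girl(p) for p in pair))  # both girls
n_B = sum(1 for pair in outcomes if any(is_girl(p) for p in pair))  # at least one girl

p_A = Fraction(n_A, len(outcomes))
p_B = Fraction(n_B, len(outcomes))
print(p_A, p_B)  # 2/5 14/15
```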

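2.2.2. Statistical definition
If the experiment is repeated n times and A occurs k times, the relative frequency k/n approximates p(A) when n is large:
$p(A) \approx \dfrac{k}{n}$

EX: Pearson tossed a balanced coin 12,000 times and observed tails 6,019 times (relative frequency 0.5016); in 24,000 tosses he observed tails 12,012 times (relative frequency 0.5005).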
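A small simulation illustrating the statistical definition: it tosses a simulated fair coin (not Pearson's data) and reports the relative frequency of tails. The fixed seed is an assumption made only so the run is reproducible; the frequencies drift toward 0.5 as n grows.

```python
import random


def relative_frequency(n_tosses: int, seed: int = 0) -> float:
    """Toss a simulated fair coin n_tosses times and return the share of tails."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    tails = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return tails / n_tosses


for n in (100, 12_000, 24_000):
    print(n, relative_frequency(n))
```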
2.2.3. Properties of probability
1.) For any event A: 0 ≤ p(A) ≤ 1
2.) p(∅) = 0; p(Ω) = 1
3.) If A ⊂ B then p(A) ≤ p(B)
2.3. PROBABILITY FORMULA
2.3.1. Probability addition formula
• If A and B are two arbitrary events

p(A+B) = p(A) + p(B) – p(A.B)

• If A and B are two mutually exclusive events

p(A+B) = p(A) + p(B)

• If the events Ai (i = 1, …, n) are pairwise mutually exclusive, then

p(A1 + A2 +…+ An) = p(A1) + p(A2) +…+ p(An)

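EX: A group of 30 investors of various kinds includes 13 securities investors, 17 equipment investors and 10 investors in both securities and equipment. A partner meets one investor of the group at random. Find the probability that the partner meets a securities or equipment investor.
Solution 1:
Let A be “the partner meets a securities or equipment investor”,
B “the partner meets a securities investor”,
C “the partner meets an equipment investor”.
$P(A) = P(B) + P(C) - P(B \cap C) = \dfrac{13}{30} + \dfrac{17}{30} - \dfrac{10}{30} = \dfrac{2}{3}$

Solution 2: split A into mutually exclusive parts: $P(A) = P(B\bar{C}) + P(BC) + P(\bar{B}C)$.
The group has 13 − 10 = 3 securities-only investors, 10 investors in both and 17 − 10 = 7 equipment-only investors:
3 + 10 + 7 = 20 ⇒ $P(A) = \dfrac{20}{30} = \dfrac{2}{3}$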
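The addition formula is the counting identity |B ∪ C| = |B| + |C| − |B ∩ C| divided by the group size. In this sketch the investor IDs are hypothetical, chosen only so that the set sizes match the example (13, 17 and 10 in both).

```python
from fractions import Fraction

TOTAL = 30                      # investors in the group
securities = set(range(0, 13))  # hypothetical IDs 0..12: 13 securities investors
equipment = set(range(3, 20))   # hypothetical IDs 3..19: 17 equipment investors

both = securities & equipment   # IDs 3..12: the 10 investors in both
either = securities | equipment # event A: securities or equipment investor

# Addition formula vs. direct counting of A.
p_formula = (Fraction(len(securities), TOTAL) + Fraction(len(equipment), TOTAL)
             - Fraction(len(both), TOTAL))
p_count = Fraction(len(either), TOTAL)
print(len(both), p_formula, p_count)  # 10 2/3 2/3
```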
EX: A gift basket has 10 candies, 3 of which are red. 3 candies are taken at random from the basket. Calculate the probability of getting at least 1 red candy.
Solution 1: Let A be the event “at least 1 red candy is taken”
and Ai the event “exactly i red candies are taken” (i = 0, 1, 2, 3).
A1, A2, A3 are pairwise mutually exclusive and A = A1 + A2 + A3, so
$P(A) = P(A_1) + P(A_2) + P(A_3) = \dfrac{C_3^1 C_7^2}{C_{10}^3} + \dfrac{C_3^2 C_7^1}{C_{10}^3} + \dfrac{C_3^3 C_7^0}{C_{10}^3} = \dfrac{17}{24}$

Solution 2: A0 is the opposite event of A, so
$P(A) = 1 - P(A_0) = 1 - \dfrac{C_3^0 C_7^3}{C_{10}^3} = \dfrac{17}{24}$
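Both solutions can be cross-checked by enumerating all $C_{10}^3 = 120$ equally likely draws. In this sketch the candies are numbered 0 to 9 and, as a labeling assumption, candies 0, 1 and 2 are the red ones.

```python
from fractions import Fraction
from itertools import combinations

candies = range(10)
red = {0, 1, 2}  # labeling assumption: candies 0, 1, 2 are the red ones

draws = list(combinations(candies, 3))           # C(10, 3) = 120 equally likely draws
favorable = sum(1 for d in draws if red & set(d))  # at least one red candy

p_direct = Fraction(favorable, len(draws))
# Opposite event A0: no red candy at all.
p_complement = 1 - Fraction(sum(1 for d in draws if not red & set(d)), len(draws))
print(p_direct, p_complement)  # 17/24 17/24
```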

Note (De Morgan's laws): $\overline{A \cap B} = \bar{A} \cup \bar{B}$; $\overline{A \cup B} = \bar{A} \cap \bar{B}$
2.3.2. Conditional probability
$P(A|B) = \dfrac{P(A \cap B)}{P(B)}, \quad P(B) > 0$

EX: A group of 10 employees consists of 3 men and 7 women; 2 of the men and 3 of the women are 30 years old. One employee is selected at random from the group. Let A be “the selected employee is female” and B “the selected employee is 30 years old”. Calculate P(A|B) and P(B|A).
Solution:
$P(A) = \dfrac{C_7^1}{C_{10}^1} = \dfrac{7}{10} = 0.7$; $P(B) = \dfrac{C_5^1}{C_{10}^1} = \dfrac{5}{10} = 0.5$; $P(A \cap B) = \dfrac{C_3^1}{C_{10}^1} = \dfrac{3}{10} = 0.3$
$P(A|B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{0.3}{0.5} = 0.6$
$P(B|A) = \dfrac{P(B \cap A)}{P(A)} = \dfrac{0.3}{0.7} = \dfrac{3}{7}$
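A sketch that computes the conditional probabilities straight from a roster. The roster is a hypothetical encoding of the example's counts as (sex, is 30 years old) tuples; only the counts come from the text.

```python
from fractions import Fraction

# Hypothetical roster matching the example's counts: (sex, is_30_years_old)
employees = (
    [("M", True)] * 2 + [("M", False)] * 1
    + [("F", True)] * 3 + [("F", False)] * 4
)


def prob(event) -> Fraction:
    """Classical probability: share of employees for which event(e) is true."""
    return Fraction(sum(1 for e in employees if event(e)), len(employees))


def female(e) -> bool:
    return e[0] == "F"


def age_30(e) -> bool:
    return e[1]


p_A = prob(female)                               # 7/10
p_B = prob(age_30)                               # 1/2
p_AB = prob(lambda e: female(e) and age_30(e))   # 3/10

p_A_given_B = p_AB / p_B
p_B_given_A = p_AB / p_A
print(p_A_given_B, p_B_given_A)  # 3/5 3/7
```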

Properties:
1.) 0 ≤ p(A|B) ≤ 1 for every A ⊂ Ω
2.) If A ⊂ C then p(A|B) ≤ p(C|B)
3.) p(A|B) = 1 − p(Ā|B)
2.3.3. Probability multiplication formula
• If A and B are two arbitrary events (not necessarily independent)
p(A ∩ B) = p(B).p(A|B) = p(A).p(B|A)
• If A and B are two independent events
p(A ∩ B) = p(A).p(B)
• For n arbitrary events Ai, i = 1, …, n (not necessarily independent)
p(A1A2…An) = p(A1).p(A2|A1)…p(An|A1…An−1)
EX: At Christmas, Mr. A has 1 large pine tree and 1 small pine tree for sale. The probability of selling the large tree is 0.9. If the large tree is sold, the probability of selling the small tree is 0.7; if the large tree is not sold, the probability of selling the small tree is 0.2. Given that Mr. A sold at least 1 pine tree, what is the probability that he sold both trees?
Solution:
Let A be the event “both trees are sold” and C the event “at least 1 tree is sold”. Since A ⊂ C, P(A ∩ C) = P(A) = 0.9 · 0.7; the opposite event of C (“no tree is sold”) has probability 0.1 · 0.8, so
$P(A|C) = \dfrac{P(A \cap C)}{P(C)} = \dfrac{0.9 \cdot 0.7}{1 - 0.1 \cdot 0.8} = \dfrac{0.63}{0.92} \approx 0.6848$
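The same computation with exact fractions, chaining the multiplication formula (for "both sold" and "none sold") into the conditional probability formula. Variable names are our own.

```python
from fractions import Fraction

p_large = Fraction(9, 10)               # P(large tree sold)
p_small_if_large = Fraction(7, 10)      # P(small sold | large sold)
p_small_if_not_large = Fraction(2, 10)  # P(small sold | large not sold)

# Multiplication formula: P(both) = P(large) * P(small | large)
p_both = p_large * p_small_if_large
# Opposite of "at least one sold": neither tree is sold
p_none = (1 - p_large) * (1 - p_small_if_not_large)
p_at_least_one = 1 - p_none

# "Both sold" is contained in "at least one sold", so P(A ∩ C) = P(A).
p_both_given_at_least_one = p_both / p_at_least_one
print(p_both_given_at_least_one, float(p_both_given_at_least_one))
```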

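2.3.4. Total probability formula and Bayes' formula

a.) Total probability formula
If {A1, A2, …, An} is an exhaustive system of events, then for any event B:
$p(B) = \sum_{i=1}^{n} p(A_i)\,p(B|A_i) = p(A_1)\,p(B|A_1) + \dots + p(A_n)\,p(B|A_n)$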

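EX: Aquarium I has 3 goldfish and 4 brown fish; aquarium II has 5 goldfish and 3 brown fish. A fish is observed jumping from aquarium I into aquarium II, after which one fish is taken at random from aquarium II. Calculate the probability that this fish is a goldfish.
Solution:
The fish that jumps is a goldfish with probability 3/7 (aquarium II then holds 6 goldfish out of 9) or a brown fish with probability 4/7 (aquarium II then holds 5 goldfish out of 9):
$p(\text{fish 1 gold and fish 2 gold}) = \dfrac{3}{7} \cdot \dfrac{6}{9}$; $p(\text{fish 1 brown and fish 2 gold}) = \dfrac{4}{7} \cdot \dfrac{5}{9}$
$\Rightarrow P = \dfrac{3}{7} \cdot \dfrac{6}{9} + \dfrac{4}{7} \cdot \dfrac{5}{9} = \dfrac{38}{63}$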
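The total probability formula as a weighted sum over the exhaustive system {a goldfish jumped, a brown fish jumped}. This sketch assumes the reading of the example in which the second fish is drawn from aquarium II after the jump.

```python
from fractions import Fraction

# Which fish jumps from aquarium I (3 gold, 4 brown): the exhaustive system.
p_jump = {"gold": Fraction(3, 7), "brown": Fraction(4, 7)}
# Chance of drawing a goldfish from aquarium II (5 gold, 3 brown) after the jump: 9 fish.
p_gold_from_II = {"gold": Fraction(6, 9), "brown": Fraction(5, 9)}

# Total probability formula: sum of p(A_i) * p(B | A_i).
p_gold = sum(p_jump[h] * p_gold_from_II[h] for h in p_jump)
print(p_gold)  # 38/63
```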

b.) Bayes' formula
$p(A_i|B) = \dfrac{p(A_i)\,p(B|A_i)}{\sum_{k=1}^{n} p(A_k)\,p(B|A_k)} = \dfrac{p(A_i)\,p(B|A_i)}{p(B)}$

EX: The ratio of trucks, cars and motorbikes passing along road X, on which there is a gas station, is 5 : 2 : 13. The probabilities that a passing truck, car and motorbike pull into the gas station are 0.1, 0.2 and 0.15 respectively. Given that a vehicle passing along road X has pulled into the gas station, calculate the probability that it is a car.
Solution:
$p(\text{truck passes X and enters the station}) = \dfrac{5}{20} \cdot 0.1 = 0.025$
$p(\text{car passes X and enters the station}) = \dfrac{2}{20} \cdot 0.2 = 0.02$
$p(\text{motorbike passes X and enters the station}) = \dfrac{13}{20} \cdot 0.15 = 0.0975$
$\Rightarrow p = \dfrac{0.02}{0.025 + 0.02 + 0.0975} = \dfrac{8}{57}$
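Bayes' formula for all three vehicle types at once: the denominator is the total probability p(B), and each posterior is one term of that sum divided by the whole. The dictionary keys are our own labels; the numbers are the example's.

```python
from fractions import Fraction

prior = {"truck": Fraction(5, 20), "car": Fraction(2, 20), "motorbike": Fraction(13, 20)}
p_enter = {"truck": Fraction(1, 10), "car": Fraction(2, 10), "motorbike": Fraction(15, 100)}

# Total probability that a passing vehicle enters the station.
p_B = sum(prior[v] * p_enter[v] for v in prior)

# Bayes' formula for every vehicle type.
posterior = {v: prior[v] * p_enter[v] / p_B for v in prior}
print(p_B, posterior["car"])  # 57/400 8/57
```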

CHAPTER 3: RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
I. CONCEPT OF RANDOM VARIABLES
A random variable X of an experiment with sample space Ω is a mapping
$X: \Omega \to \mathbb{R}, \quad \omega \mapsto X(\omega) = x$
The number x is called a value of the random variable X.
There are 2 types of random variables:
- Discrete random variables
- Continuous random variables

II. DISCRETE RANDOM VARIABLE


2.1. SOME CHARACTERISTICS
Let X be a discrete random variable with the probability distribution table:
X | x1 | x2 | … | xn | …
P | p1 | p2 | … | pn | …
where $P(X = x_i) = p_i$, i = 1, 2, …, with $p_i \ge 0$ and $\sum_i p_i = 1$
2.2. RANDOM QUANTITY
2.2.1. The expectation (mean) of the random variable X is:
$EX = \sum_i x_i p_i$
The expectation of the random variable X² is: $E(X^2) = \sum_i x_i^2 p_i$

2.2.2. The variance of the random variable X is:
$VarX = E(X^2) - (EX)^2$

2.2.3. The standard deviation of the random variable X is:
$\sigma = \sqrt{VarX}$

EX: Given a random variable X with the probability distribution table:
X | 1 | 2 | 3
P | 0.4 | 0.3 | 0.3
Calculate the expectation, variance and standard deviation of X.
We have: $EX = 1 \cdot 0.4 + 2 \cdot 0.3 + 3 \cdot 0.3 = 1.9$
$VarX = (1^2 \cdot 0.4 + 2^2 \cdot 0.3 + 3^2 \cdot 0.3) - (1 \cdot 0.4 + 2 \cdot 0.3 + 3 \cdot 0.3)^2 = 4.3 - 3.61 = 0.69$
$\sigma_X = \sqrt{0.69} \approx 0.83$
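A generic helper for the formulas $EX = \sum x_i p_i$ and $VarX = E(X^2) - (EX)^2$, applied to the example's table. The helper name `moments` is our own; exact fractions keep 1.9 and 0.69 free of floating-point rounding.

```python
import math
from fractions import Fraction


def moments(xs, ps):
    """Return (EX, VarX) for a discrete distribution with values xs and probabilities ps."""
    assert sum(ps) == 1, "probabilities must sum to 1"
    ex = sum(x * p for x, p in zip(xs, ps))
    ex2 = sum(x * x * p for x, p in zip(xs, ps))  # E(X^2)
    return ex, ex2 - ex ** 2


xs = [1, 2, 3]
ps = [Fraction(4, 10), Fraction(3, 10), Fraction(3, 10)]

ex, var = moments(xs, ps)
sigma = math.sqrt(var)
print(ex, var, round(sigma, 2))  # 19/10 69/100 0.83
```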
III. SOME COMMON PROBABILITY DISTRIBUTION LAWS
3.1. Hypergeometric distribution: X ~ H(N, N_A, n)
$p_k = P(X = k) = \dfrac{C_{N_A}^k \, C_{N-N_A}^{n-k}}{C_N^n}$

$EX = np; \quad VarX = npq\,\dfrac{N-n}{N-1}$
where $p = \dfrac{N_A}{N}$, $q = 1 - p$

EX: A shipment of N = 40 light bulbs contains 10 broken bulbs. 5 bulbs are taken at random for inspection. Let X be the number of broken bulbs among the 5 bulbs taken.
a) Make the probability distribution table of X.
b) Calculate the expected number of broken bulbs among the bulbs taken and the variance of X.
Solution:
a) We have X ∈ {0; 1; 2; 3; 4; 5} and N = 40, N_A = 10, n = 5 ⇒ X ~ H(40, 10, 5).
So the probability distribution table of X is:
X | 0 | 1 | 2 | 3 | 4 | 5
P | $\frac{C_{10}^0 C_{30}^5}{C_{40}^5}$ | $\frac{C_{10}^1 C_{30}^4}{C_{40}^5}$ | $\frac{C_{10}^2 C_{30}^3}{C_{40}^5}$ | $\frac{C_{10}^3 C_{30}^2}{C_{40}^5}$ | $\frac{C_{10}^4 C_{30}^1}{C_{40}^5}$ | $\frac{C_{10}^5 C_{30}^0}{C_{40}^5}$

b) $EX = np = 5 \cdot \dfrac{10}{40} = \dfrac{5}{4}$; $VarX = npq\,\dfrac{N-n}{N-1} = 5 \cdot \dfrac{1}{4} \cdot \left(1 - \dfrac{1}{4}\right) \cdot \dfrac{40-5}{40-1} = \dfrac{175}{208}$
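A sketch that evaluates the distribution table numerically with `math.comb` and then recomputes EX and VarX from the table itself rather than from the closed-form formulas, so the two routes can be compared. The helper name `hypergeom_pmf` is our own.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k: int, N: int, N_A: int, n: int) -> Fraction:
    """P(X = k) for X ~ H(N, N_A, n), as an exact fraction."""
    return Fraction(comb(N_A, k) * comb(N - N_A, n - k), comb(N, n))


N, N_A, n = 40, 10, 5
table = {k: hypergeom_pmf(k, N, N_A, n) for k in range(n + 1)}

ex = sum(k * p for k, p in table.items())
var = sum(k * k * p for k, p in table.items()) - ex ** 2

for k, p in table.items():
    print(k, p, round(float(p), 4))
print(ex, var)  # 5/4 175/208
```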

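3.2. Binomial distribution: X ~ B(n, p)
$p_k = P(X = k) = C_n^k p^k q^{n-k}, \quad k = 0, 1, \dots, n$, where q = 1 − p
$EX = np; \quad VarX = npq$
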
3.3. Poisson distribution: X ~ P(λ)

$p_k = P(X = k) = \dfrac{e^{-\lambda}\lambda^k}{k!} \quad (k = 0, 1, \dots, n, \dots)$

EX: At gas station H, on average 15 motorbikes come to fill up every 10 minutes. The number of motorbikes coming to refuel at this station in a period of t minutes is a random variable with a Poisson distribution.
a) Find the probability that at least 4 motorbikes come to fill up at this station in a 7-minute period.
b) Find the probability that from 20 to 25 motorbikes come to refuel at station H in a 15-minute period.
Solution:
Let X and Y be the numbers of motorbikes that come to refuel at station H in 7 minutes and in 15 minutes, respectively.
By assumption, X ~ P(λ) and Y ~ P(λ1) with $\lambda = \dfrac{7 \cdot 15}{10} = 10.5$ and $\lambda_1 = \dfrac{15 \cdot 15}{10} = 22.5$.
a) The probability that at least 4 motorbikes come to fill up in a 7-minute period is:
$P(X \ge 4) = 1 - P(X < 4) = 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]$
$= 1 - \left(\dfrac{\lambda^0 e^{-\lambda}}{0!} + \dfrac{\lambda^1 e^{-\lambda}}{1!} + \dfrac{\lambda^2 e^{-\lambda}}{2!} + \dfrac{\lambda^3 e^{-\lambda}}{3!}\right) \approx 0.99$
b) The probability that from 20 to 25 motorbikes come to fill up in 15 minutes is:
$P(20 \le Y \le 25) = \sum_{k=20}^{25} P(Y = k) = \sum_{k=20}^{25} \dfrac{\lambda_1^k}{k!} e^{-\lambda_1} = \sum_{k=20}^{25} \dfrac{(22.5)^k}{k!} e^{-22.5} \approx 0.473$
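The sums in a) and b) are tedious by hand; a short sketch evaluates them directly from the Poisson formula. The function name `poisson_pmf` is our own; the rates 10.5 and 22.5 come from scaling 15 arrivals per 10 minutes, as in the solution.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ P(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam_7 = 7 * 15 / 10    # mean count in 7 minutes  -> 10.5
lam_15 = 15 * 15 / 10  # mean count in 15 minutes -> 22.5

# a) P(X >= 4) = 1 - P(X <= 3)
p_a = 1 - sum(poisson_pmf(k, lam_7) for k in range(4))
# b) P(20 <= Y <= 25)
p_b = sum(poisson_pmf(k, lam_15) for k in range(20, 26))

print(round(p_a, 4), round(p_b, 4))
```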

IV. CONTINUOUS RANDOM VARIABLES


4.1. SOME CHARACTERISTICS
The function f : R → R is called the density function of the continuous random variable
X if:
$p(a \le X \le b) = \int_a^b f(x)\,dx, \quad \forall a, b \in \mathbb{R}$

4.2. RANDOM QUANTITY


4.2.1. The expectation (mean) of the random variable X is:
$EX = \int_{-\infty}^{+\infty} x\, f(x)\,dx$
The expectation of the random variable X² is: $E(X^2) = \int_{-\infty}^{+\infty} x^2 f(x)\,dx$

4.2.2. The variance of the random variable X is:
$VarX = E(X^2) - (EX)^2$

4.2.3. The standard deviation of the random variable X is:
$\sigma = \sqrt{VarX}$

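EX: Find the expectation, variance and standard deviation of the random variable X with the density function
$f(x) = \begin{cases} \frac{3}{4}(x^2 + 2x), & x \in [0; 1] \\ 0, & x \notin [0; 1] \end{cases}$
(the constant 3/4 is what makes $\int_0^1 f(x)\,dx = \frac{3}{4} \cdot \frac{4}{3} = 1$).
We have:
$EX = \int_0^1 \frac{3}{4}(x^2 + 2x)\,x\,dx = \frac{3}{4} \cdot \frac{11}{12} = \frac{11}{16}$;
$VarX = \int_0^1 x^2 \cdot \frac{3}{4}(x^2 + 2x)\,dx - \left[\int_0^1 x \cdot \frac{3}{4}(x^2 + 2x)\,dx\right]^2 = \frac{21}{40} - \frac{121}{256} = \frac{67}{1280}$
$\sigma_X = \sqrt{\frac{67}{1280}} = \frac{\sqrt{335}}{80} \approx 0.229$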
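A numerical check with composite Simpson's rule, written out by hand so no library is assumed. Rather than hard-coding the constant in front of x² + 2x, the sketch computes it as 1 / ∫₀¹(x² + 2x)dx; since that integral is 4/3, the constant comes out as 3/4, which any valid density of this shape must use.

```python
def simpson(g, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of g over [a, b] (n must be even)."""
    h = (b - a) / n
    s = g(a) + g(b)
    s += 4 * sum(g(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(g(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def shape(x: float) -> float:
    """Unnormalized shape x^2 + 2x of the density on [0, 1]."""
    return x * x + 2 * x


c = 1 / simpson(shape, 0, 1)  # normalizing constant so that the density integrates to 1


def f(x: float) -> float:
    return c * shape(x)


ex = simpson(lambda x: x * f(x), 0, 1)       # EX
ex2 = simpson(lambda x: x * x * f(x), 0, 1)  # E(X^2)
var = ex2 - ex ** 2
sigma = var ** 0.5
print(round(c, 6), round(ex, 6), round(var, 6), round(sigma, 4))
```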

V. NORMAL DISTRIBUTION
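5.1. Standard normal distribution: T ~ N(0; 1)
a. Definition:
$f(t) = \dfrac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}}, \quad t \in \mathbb{R}$

b. Characteristics: ModT = ET = 0; VarT = 1

c. Probability:
$p(a \le T \le b) = \int_a^b f(t)\,dt = \varphi(b) - \varphi(a)$
where $\varphi(x) = \dfrac{1}{\sqrt{2\pi}} \int_0^x e^{-\frac{t^2}{2}}\,dt$ is the Laplace function (an odd function, with φ(x) ≈ 0.5 for x ≥ 5).

5.2. Normal distribution: X ~ N(μ, σ²)
a. Definition:
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad x \in \mathbb{R}$

b. Characteristics: ModX = EX = μ; VarX = σ²

c. Probability:
$p(a \le X \le b) = \varphi\left(\dfrac{b-\mu}{\sigma}\right) - \varphi\left(\dfrac{a-\mu}{\sigma}\right)$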

EX: The lifespan of a type of equipment supplied by company A is (approximately) normally distributed with a mean of 1500 hours and a standard deviation of 150 hours. A unit is covered by company A's warranty if its lifespan is less than 1200 hours.
a) Calculate the percentage of equipment supplied by company A that falls under the warranty.
b) If company A wants the warranty rate to be 1%, what should the warranty period be (in hours)?
Solution:
a) Let X be the lifespan of that type of equipment (hours).
We have: X ~ N(1500; 150²)
$P(X < 1200) = P(0 \le X < 1200) = \varphi\left(\dfrac{1200 - 1500}{150}\right) - \varphi\left(\dfrac{0 - 1500}{150}\right)$
$= \varphi(-2) - \varphi(-10) = \varphi(10) - \varphi(2) \approx 0.5 - 0.47725 = 0.02275$
So 2.275% of the equipment falls under the warranty.
b) Let t (hours) be the warranty period.
The percentage of equipment under warranty is then:
$P(X < t) = P(0 \le X < t) = \varphi\left(\dfrac{t - 1500}{150}\right) - \varphi\left(\dfrac{0 - 1500}{150}\right) \approx 0.5 - \varphi\left(\dfrac{1500 - t}{150}\right)$
By the requirement of the problem:
$0.5 - \varphi\left(\dfrac{1500 - t}{150}\right) = 0.01 \Leftrightarrow \varphi\left(\dfrac{1500 - t}{150}\right) = 0.49 \Leftrightarrow \dfrac{1500 - t}{150} = 2.33 \Leftrightarrow t = 1150.5$

So for the warranty rate to be 1%, the warranty period should be set at 1150.5 hours.
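Python's standard library `statistics.NormalDist` (Python 3.8+) provides `cdf` and `inv_cdf`, which replace the Laplace-function table in this example. A sketch; the exact quantile differs slightly from the table value 2.33 used above.

```python
from statistics import NormalDist

lifespan = NormalDist(mu=1500, sigma=150)  # X ~ N(1500; 150^2), in hours

# a) Share of units whose lifespan is below 1200 hours (warranty cases).
share_warranty = lifespan.cdf(1200)

# b) Warranty period t with P(X < t) = 0.01, using the exact quantile.
t_exact = lifespan.inv_cdf(0.01)
# Same answer with the rounded table value (1500 - t) / 150 = 2.33.
t_table = 1500 - 150 * 2.33

print(round(share_warranty, 5), round(t_exact, 1), round(t_table, 1))
```

The exact quantile (about 2.3263) gives roughly 1151.0 hours, while the rounded table value 2.33 gives the 1150.5 hours found above; the half-hour gap comes only from rounding the quantile.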

CHAPTER 3.2. RANDOM VECTOR


I. PROBABILITY DISTRIBUTION OF DISCRETE RANDOM VECTOR
1.1. Component probability distribution (marginal distribution)
Here $p_{ij} = P(X = x_i; Y = y_j)$, i = 1, …, m, j = 1, …, n.
• Probability distribution table of X
X | x1 | x2 | … | xm
P | p1* | p2* | … | pm*
where $p_{i*} = p_{i1} + p_{i2} + \dots + p_{in}$
The expectation of X is: $EX = x_1 p_{1*} + x_2 p_{2*} + \dots + x_m p_{m*}$
• Probability distribution table of Y
Y | y1 | y2 | … | yn
P | p*1 | p*2 | … | p*n
where $p_{*j} = p_{1j} + p_{2j} + \dots + p_{mj}$
The expectation of Y is: $EY = y_1 p_{*1} + y_2 p_{*2} + \dots + y_n p_{*n}$

EX: The joint probability distribution of the random vector (X,Y) is given by the table:
X \ Y | 1 | 2 | 3
3 | 0.02 | 0.10 | 0.15
4 | 0.30 | 0.05 | 0.20
5 | 0.05 | 0.03 | 0.10

a) Calculate P(X = 5) and P(X ≥ 4, Y ≥ 2).
b) Make the tables of the component probability distributions and calculate EX, EY, VarX, VarY.
Solution:
(p_ij below denotes the entry in row i and column j of the table)
a) $P(X = 5) = p_{31} + p_{32} + p_{33} = 0.05 + 0.03 + 0.10 = 0.18$
$P(X \ge 4, Y \ge 2) = p_{22} + p_{23} + p_{32} + p_{33} = 0.05 + 0.20 + 0.03 + 0.10 = 0.38$
b) Probability distribution table of the component X:
X | 3 | 4 | 5
P | 0.27 | 0.55 | 0.18
$EX = 0.27 \cdot 3 + 0.55 \cdot 4 + 0.18 \cdot 5 = 3.91$;
$VarX = E(X^2) - (EX)^2 = 15.73 - 15.2881 = 0.4419$
Probability distribution table of the component Y:
Y | 1 | 2 | 3
P | 0.37 | 0.18 | 0.45
$EY = 0.37 \cdot 1 + 0.18 \cdot 2 + 0.45 \cdot 3 = 2.08$;
$VarY = E(Y^2) - (EY)^2 = 5.14 - 4.3264 = 0.8136$
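A sketch that stores the joint table as a dictionary keyed by (x, y), then gets the marginal tables by summing rows and columns. `Fraction("0.02")` parses the decimal exactly, so the printed moments match the hand computation with no rounding; the helper name `mean_var` is our own.

```python
from fractions import Fraction as F

# Joint table p_ij = P(X = x_i, Y = y_j) from the example.
xs, ys = [3, 4, 5], [1, 2, 3]
joint = {
    (3, 1): F("0.02"), (3, 2): F("0.10"), (3, 3): F("0.15"),
    (4, 1): F("0.30"), (4, 2): F("0.05"), (4, 3): F("0.20"),
    (5, 1): F("0.05"), (5, 2): F("0.03"), (5, 3): F("0.10"),
}

# a) Probabilities of events read off the joint table.
p_x5 = sum(joint[5, y] for y in ys)
p_x4_y2 = sum(p for (x, y), p in joint.items() if x >= 4 and y >= 2)

# b) Marginals: sum each row (for X) or each column (for Y).
p_x = {x: sum(joint[x, y] for y in ys) for x in xs}
p_y = {y: sum(joint[x, y] for x in xs) for y in ys}


def mean_var(dist):
    """(E, Var) of a discrete distribution given as {value: probability}."""
    e = sum(v * p for v, p in dist.items())
    return e, sum(v * v * p for v, p in dist.items()) - e ** 2


ex, var_x = mean_var(p_x)
ey, var_y = mean_var(p_y)
print(float(p_x5), float(p_x4_y2))
print({k: float(v) for k, v in p_x.items()}, float(ex), float(var_x))
print({k: float(v) for k, v in p_y.items()}, float(ey), float(var_y))
```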
1.2. Conditional probability distribution

$P(X = x_i | Y = y_j) = \dfrac{P(X = x_i; Y = y_j)}{P(Y = y_j)} = \dfrac{p_{ij}}{p_{*j}}, \quad i = 1, \dots, m$

$P(Y = y_j | X = x_i) = \dfrac{P(X = x_i; Y = y_j)}{P(X = x_i)} = \dfrac{p_{ij}}{p_{i*}}, \quad j = 1, \dots, n$

▪ Probability distribution table of X given Y = y_j:
X | x1 | x2 | … | xm
P(X = x_i | Y = y_j) | $p_{1j}/p_{*j}$ | $p_{2j}/p_{*j}$ | … | $p_{mj}/p_{*j}$

Expectation of X given Y = y_j:
$E(X | Y = y_j) = \dfrac{1}{p_{*j}}\,(x_1 p_{1j} + x_2 p_{2j} + \dots + x_m p_{mj})$

▪ Probability distribution table of Y given X = x_i:
Y | y1 | y2 | … | yn
P(Y = y_j | X = x_i) | $p_{i1}/p_{i*}$ | $p_{i2}/p_{i*}$ | … | $p_{in}/p_{i*}$

Expectation of Y given X = x_i:
$E(Y | X = x_i) = \dfrac{1}{p_{i*}}\,(y_1 p_{i1} + y_2 p_{i2} + \dots + y_n p_{in})$

EX: Given the joint probability distribution table of (X,Y):
X \ Y | 1 | 2 | 3
3 | 0.02 | 0.10 | 0.15
4 | 0.30 | 0.05 | 0.20
5 | 0.05 | 0.03 | 0.10
a) Make the probability distribution table of X given Y = 3 and calculate the conditional expectation of X.
b) Make the probability distribution table of Y given X = 4 and calculate the conditional expectation of Y.
Solution:
a) We have:
$P(X = 3 | Y = 3) = \dfrac{p_{13}}{p_{13} + p_{23} + p_{33}} = \dfrac{0.15}{0.15 + 0.20 + 0.10} = \dfrac{1}{3}$
$P(X = 4 | Y = 3) = \dfrac{p_{23}}{p_{13} + p_{23} + p_{33}} = \dfrac{0.20}{0.15 + 0.20 + 0.10} = \dfrac{4}{9}$
$P(X = 5 | Y = 3) = \dfrac{p_{33}}{p_{13} + p_{23} + p_{33}} = \dfrac{0.10}{0.15 + 0.20 + 0.10} = \dfrac{2}{9}$
Probability distribution table of X given Y = 3:
X | 3 | 4 | 5
P(X = x_i | Y = 3) | 1/3 | 4/9 | 2/9
$E(X | Y = 3) = 3 \cdot \dfrac{1}{3} + 4 \cdot \dfrac{4}{9} + 5 \cdot \dfrac{2}{9} = \dfrac{35}{9}$
b) We have:
$P(Y = 1 | X = 4) = \dfrac{p_{21}}{p_{21} + p_{22} + p_{23}} = \dfrac{0.30}{0.30 + 0.05 + 0.20} = \dfrac{6}{11}$
$P(Y = 2 | X = 4) = \dfrac{p_{22}}{p_{21} + p_{22} + p_{23}} = \dfrac{0.05}{0.30 + 0.05 + 0.20} = \dfrac{1}{11}$
$P(Y = 3 | X = 4) = \dfrac{p_{23}}{p_{21} + p_{22} + p_{23}} = \dfrac{0.20}{0.30 + 0.05 + 0.20} = \dfrac{4}{11}$
Probability distribution table of Y given X = 4:
Y | 1 | 2 | 3
P(Y = y_j | X = 4) | 6/11 | 1/11 | 4/11
$E(Y | X = 4) = 1 \cdot \dfrac{6}{11} + 2 \cdot \dfrac{1}{11} + 3 \cdot \dfrac{4}{11} = \dfrac{20}{11}$
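Conditional tables are just one column (or row) of the joint table divided by its total. A sketch with our own helper names `cond_x_given_y` and `cond_y_given_x`, using exact fractions so the results print as 35/9 and 20/11.

```python
from fractions import Fraction as F

xs, ys = [3, 4, 5], [1, 2, 3]
joint = {
    (3, 1): F("0.02"), (3, 2): F("0.10"), (3, 3): F("0.15"),
    (4, 1): F("0.30"), (4, 2): F("0.05"), (4, 3): F("0.20"),
    (5, 1): F("0.05"), (5, 2): F("0.03"), (5, 3): F("0.10"),
}


def cond_x_given_y(y):
    """Distribution of X given Y = y: p_ij / p_*j."""
    p_col = sum(joint[x, y] for x in xs)
    return {x: joint[x, y] / p_col for x in xs}


def cond_y_given_x(x):
    """Distribution of Y given X = x: p_ij / p_i*."""
    p_row = sum(joint[x, y] for y in ys)
    return {y: joint[x, y] / p_row for y in ys}


x_given_y3 = cond_x_given_y(3)
y_given_x4 = cond_y_given_x(4)
e_x_given_y3 = sum(x * p for x, p in x_given_y3.items())
e_y_given_x4 = sum(y * p for y, p in y_given_x4.items())
print({x: str(p) for x, p in x_given_y3.items()}, e_x_given_y3)  # ... 35/9
print({y: str(p) for y, p in y_given_x4.items()}, e_y_given_x4)  # ... 20/11
```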

II. PROBABILITY DISTRIBUTION OF CONTINUOUS RANDOM VECTOR


2.1. Joint density function of (X,Y)
• A two-variable function f(x, y) ≥ 0 defined on R² is called the density function of the random vector (X,Y) if:
$\iint_{\mathbb{R}^2} f(x, y)\,dx\,dy = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x, y)\,dx\,dy = 1$

• The probability that the vector (X,Y) falls in a set D ⊂ R² is:
$P\{(X, Y) \in D\} = \iint_D f(x, y)\,dx\,dy$

2.2. Component density function
• The density function of X is: $f_X(x) = \int_{-\infty}^{+\infty} f(x, y)\,dy$
• The density function of Y is: $f_Y(y) = \int_{-\infty}^{+\infty} f(x, y)\,dx$
• The means of the components are:
$EX = \int_{-\infty}^{+\infty} x\, f_X(x)\,dx; \quad EY = \int_{-\infty}^{+\infty} y\, f_Y(y)\,dy$

2.3. Conditional density function
• The conditional density function of X given Y = y (where f_Y(y) > 0) is:
$$f_X(x \mid y) = \frac{f(x,y)}{f_Y(y)}$$
• The conditional density function of Y given X = x (where f_X(x) > 0) is:
$$f_Y(y \mid x) = \frac{f(x,y)}{f_X(x)}$$

EX: Given a random vector (X,Y) with density function:
$$f_{X,Y}(x,y) = \begin{cases} C(x + xy), & (x,y) \in [0;1]^2 \\ 0, & (x,y) \notin [0;1]^2 \end{cases}$$
a) Find C.
b) Find the marginal (component) density functions of X and Y.
c) Find the conditional density functions f_X(x | y) and f_Y(y | x).
Solution:
a) By the normalization property of the joint density function:
$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f_{X,Y}(x,y)\,dx\,dy = \int_0^1\int_0^1 C(x + xy)\,dx\,dy = \int_0^1 \frac{C}{2}(y + 1)\,dy = \frac{3C}{4} = 1$$
$$\Rightarrow C = \frac{4}{3}$$
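b) Marginal density function of X:
$$f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x,y)\,dy = \int_0^1 \frac{4}{3}(x + xy)\,dy = 2x, \quad x \in [0;1]$$
Marginal density function of Y:
$$f_Y(y) = \int_{-\infty}^{+\infty} f_{X,Y}(x,y)\,dx = \int_0^1 \frac{4}{3}(x + xy)\,dx = \frac{2}{3}(y + 1), \quad y \in [0;1]$$
c) The conditional density function f_X(x | y):
$$f_X(x \mid y) = \frac{f(x,y)}{f_Y(y)} = \frac{\frac{4}{3}(x + xy)}{\frac{2}{3}(y + 1)} = 2x, \quad (x,y) \in [0;1]^2$$
The conditional density function f_Y(y | x):
$$f_Y(y \mid x) = \frac{f(x,y)}{f_X(x)} = \frac{\frac{4}{3}(x + xy)}{2x} = \frac{2}{3}(y + 1), \quad (x,y) \in [0;1]^2,\ x > 0$$
Since f(x,y) = f_X(x)·f_Y(y), each conditional density equals the corresponding marginal density: X and Y are independent.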
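As a rough numerical check of parts a) to c), the sketch below integrates the density with a midpoint rule on a grid over [0;1]². It assumes C = 4/3 and an arbitrary grid of 200 × 200 cells; because the density is linear in each variable separately, the midpoint rule happens to be exact here up to floating-point rounding.

```python
# Midpoint-rule check of the continuous example on the unit square [0, 1]^2.
# Assumption: f(x, y) = C * (x + x*y) with C = 4/3, and zero outside the square.
C = 4 / 3

def f(x, y):
    return C * (x + x * y)

N = 200                        # grid cells per axis (arbitrary choice)
h = 1.0 / N
mids = [(k + 0.5) * h for k in range(N)]

# Total probability: should equal 1 when C = 4/3.
total = sum(f(x, y) for x in mids for y in mids) * h * h

def f_X(x):
    """Marginal density of X: integrate f(x, y) over y in [0, 1]."""
    return sum(f(x, y) for y in mids) * h

def f_Y(y):
    """Marginal density of Y: integrate f(x, y) over x in [0, 1]."""
    return sum(f(x, y) for x in mids) * h

print(total)                          # close to 1
print(f_X(0.3), 2 * 0.3)              # marginal of X equals 2x
print(f_Y(0.5), 2 / 3 * (1 + 0.5))    # marginal of Y equals (2/3)(1 + y)
print(f(0.3, 0.5) / f_Y(0.5))         # conditional f_X(0.3 | 0.5) equals 2x = 0.6
```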

CHAPTER 5. SAMPLE THEORY


1. Population
The population (overall) is the whole collection of objects under study; its size is N. The characteristic numbers (parameters) of the population include:
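a) Population mean:
$$\mu = \sum_{i=1}^{k} x_i\,p_i$$
b) Population variance:
$$\sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2\,p_i$$
c) Population standard deviation:
$$\sigma = \sqrt{\sigma^2}$$
d) Population proportion:
$$p = \frac{M}{N}$$
where M is the number of elements in the population having the property of interest.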

EX: The rubber industry has 500,000 workers. To study their living standards, the indicator X: "Real income of rubber industry workers" (million dong) is surveyed, with the data assumed to be as in the following table:

x_i (million dong)   2,5    3,5    4,5    5,5    6,5    7,5    9
p_i                  0,10   0,14   0,30   0,24   0,11   0,06   0,05

-The average income of rubber industry workers (population mean):
$$\mu = \sum_{i=1}^{k} x_i p_i = 2{,}5\cdot 0{,}1 + 3{,}5\cdot 0{,}14 + 4{,}5\cdot 0{,}3 + 5{,}5\cdot 0{,}24 + 6{,}5\cdot 0{,}11 + 7{,}5\cdot 0{,}06 + 9\cdot 0{,}05 = 5{,}025 \text{ (million dong)}$$
-Variance of income (population variance):
$$\sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 p_i = (2{,}5 - 5{,}025)^2\cdot 0{,}1 + (3{,}5 - 5{,}025)^2\cdot 0{,}14 + (4{,}5 - 5{,}025)^2\cdot 0{,}3 + (5{,}5 - 5{,}025)^2\cdot 0{,}24 + (6{,}5 - 5{,}025)^2\cdot 0{,}11 + (7{,}5 - 5{,}025)^2\cdot 0{,}06 + (9 - 5{,}025)^2\cdot 0{,}05 = 2{,}4969$$
-Standard deviation of income (population standard deviation):
$$\sigma = \sqrt{\sigma^2} = \sqrt{2{,}4969} \approx 1{,}5802 \text{ (million dong)}$$
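The population characteristics of the rubber-industry example can be checked with a few lines of Python. The income values and proportions below are read off the worked calculation (with p = 0,24 for x = 5,5, the value that makes the proportions sum to 1 and reproduces both μ = 5,025 and σ² = 2,4969); treat them as an assumption rather than the original survey table.

```python
import math

# Assumed income distribution (million dong) reconstructed from the worked solution.
x = [2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 9.0]
p = [0.10, 0.14, 0.30, 0.24, 0.11, 0.06, 0.05]

mu = sum(xi * pi for xi, pi in zip(x, p))                # population mean
var = sum((xi - mu) ** 2 * pi for xi, pi in zip(x, p))   # population variance
sigma = math.sqrt(var)                                   # population standard deviation

print(round(mu, 4), round(var, 4), round(sigma, 4))  # about 5.025, 2.4969, 1.5802
```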

2. Sample
A sample of size n is a part of the population chosen randomly and objectively. The characteristics of the sample include:
a) The sample mean:
If we consider a random sample: $\bar X = \bar X_n = \dfrac{1}{n}\sum_{i=1}^{n} X_i$
If we have a specific sample: $\bar x = \dfrac{1}{n}\sum_{i=1}^{n} x_i$

*Property: If the original random variable X has expectation E(X) = μ and variance Var(X) = σ², then
$$E(\bar X) = \mu \quad\text{and}\quad Var(\bar X) = \frac{\sigma^2}{n}$$

b) The sample variance:
$$\hat s^2 = \hat s_n^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2$$
Corrected sample variance:
$$S^2 = S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2$$
If we have a specific sample, S² takes the value:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar x)^2$$
*Property of S²: if E(X) = μ and Var(X) = σ², then
$$E(S^2) = \sigma^2$$
that is, S² is an unbiased estimator of σ².
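c) The standard deviation of the sample:
$$s = \sqrt{s^2}$$
d) The sample proportion:
If we consider a random sample: $F = F_n = \dfrac{X_1 + X_2 + \dots + X_n}{n}$, where X_i = 1 if the i-th element has property A and X_i = 0 otherwise.
If we have a specific sample: $f = \dfrac{n_A}{n}$, where n_A is the number of elements in the sample with property A.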
e) Correspondence between sample characteristics and population parameters:
Sample characteristic   Population parameter
F                       p
X̄                       μ
S²                      σ²
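f) Formulas for calculating the sample characteristics
*When the sample is given as n observed values x_1, ..., x_n:
$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - n(\bar x)^2\right]$$
*When the sample is given as distinct values x_1, ..., x_k with frequencies n_i (n = n_1 + ... + n_k):
$$\bar x = \frac{1}{n}\sum_{i=1}^{k} n_i x_i, \qquad s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} n_i x_i^2 - n(\bar x)^2\right]$$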

3. Probability distributions of the sample characteristics

3.1. Probability distribution of the sample mean
a) The population has a normal distribution.
• Because $E\bar X = \mu$ and $Var\,\bar X = \dfrac{\sigma^2}{n}$:
$$\bar X \sim N\left(\mu; \frac{\sigma^2}{n}\right) \Rightarrow \frac{\bar X - \mu}{\sigma}\sqrt{n} \sim N(0;1)$$

• For a specific sample with n large enough, S² ≈ σ² and:
$$\bar X \approx N\left(\mu; \frac{S^2}{n}\right) \Rightarrow \frac{\bar X - \mu}{S}\sqrt{n} \approx N(0;1)$$
• When n < 30 and σ² is unknown, use the Student distribution with n − 1 degrees of freedom:
$$\frac{\bar X - \mu}{S}\sqrt{n} \sim St(n-1)$$
b) X does not have a normal distribution
• From the central limit theorem, as n → ∞:
$$\frac{\bar X - \mu}{\sigma}\sqrt{n} \to N(0;1), \qquad \frac{\bar X - \mu}{S}\sqrt{n} \to N(0;1)$$

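With n ≥ 30, we have the following approximate normal distributions:
-If σ² is known:
$$\bar X \approx N\left(\mu; \frac{\sigma^2}{n}\right) \Rightarrow \frac{\bar X - \mu}{\sigma}\sqrt{n} \approx N(0;1)$$
-If σ² is unknown (σ is replaced by S):
$$\bar X \approx N\left(\mu; \frac{S^2}{n}\right) \Rightarrow \frac{\bar X - \mu}{S}\sqrt{n} \approx N(0;1)$$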
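A small Monte Carlo sketch can make E(X̄) = μ, Var(X̄) = σ²/n and the central limit theorem concrete. The Uniform(0, 1) population (μ = 1/2, σ² = 1/12, deliberately not normal), n = 36, 20 000 repetitions and the seed are all arbitrary assumptions chosen for illustration.

```python
import random
import statistics

# Assumed population: Uniform(0, 1), so mu = 1/2 and sigma^2 = 1/12.
random.seed(2024)
n, reps = 36, 20000
mu, sigma2 = 0.5, 1 / 12

# Each entry is the mean of one simulated sample of size n.
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

print(statistics.fmean(means), mu)              # E(X-bar) is close to mu
print(statistics.pvariance(means), sigma2 / n)  # Var(X-bar) is close to sigma^2 / n

# Share of standardized means in (-1.96, 1.96); close to 0.95 by the CLT.
z = [(m - mu) * n ** 0.5 / sigma2 ** 0.5 for m in means]
coverage = sum(abs(v) < 1.96 for v in z) / reps
print(coverage)
```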

3.2. Probability distribution of the sample proportion F:
Assuming X_i ~ B(1; p) (i = 1, ..., n) and n is large enough, then (with q = 1 − p):
$$F \approx N\left(p; \frac{pq}{n}\right) \Rightarrow T = \frac{F - p}{\sqrt{F(1-F)}}\sqrt{n} \approx N(0;1)$$

EX: Investigating the yield of 100 hectares of rice in area A, we obtain the following data:

Yield (ton/ha)   3-3,5   3,5-4   4-4,5   4,5-5   5-5,5   5,5-6   6-6,5   6,5-7
Area (ha)        7       12      18      27      20      8       5       3

Fields with a yield of less than 4,4 tons/hectare are considered low-productivity. Calculate:
a) The proportion of rice area with low productivity:
$$f = \frac{m}{n} = \frac{7 + 12 + 18}{100} = 0{,}37$$
b) The average rice yield, the sample variance, and the sample standard deviation.
Using a pocket calculator (Shift Mode 6, then press 1), enter the class midpoints together with their areas from the table:
The average rice yield: x̄ = 4,75 (ton/ha)
Sample variance: ŝ² = 0,685; corrected sample variance: s² = 68,5/99 ≈ 0,6919
Sample standard deviation: ŝ ≈ 0,8276; corrected sample standard deviation: s ≈ 0,8318
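Instead of a pocket calculator, the rice-yield characteristics can be computed directly from the frequency formulas of section 2 f). The sketch assumes, as the calculator entry does, that each class is represented by its midpoint; the variable names are illustrative.

```python
import math

# Rice example: class midpoints (ton/ha) and areas (ha) taken from the table.
mid = [3.25, 3.75, 4.25, 4.75, 5.25, 5.75, 6.25, 6.75]
freq = [7, 12, 18, 27, 20, 8, 5, 3]

n = sum(freq)                                          # 100 ha in total
xbar = sum(ni * xi for ni, xi in zip(freq, mid)) / n   # sample mean
sum_sq = sum(ni * xi ** 2 for ni, xi in zip(freq, mid))
s_hat2 = sum_sq / n - xbar ** 2                        # uncorrected sample variance
s2 = (sum_sq - n * xbar ** 2) / (n - 1)                # corrected sample variance
f_low = sum(freq[:3]) / n                              # first three classes, as in a)

print(n, xbar, round(s_hat2, 4), round(s2, 4))
print(round(math.sqrt(s_hat2), 4), round(math.sqrt(s2), 4), f_low)
```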
CHAPTER 6: PARAMETER ESTIMATION
1. Interval estimate for the population mean μ
a) Case 1: n ≥ 30 and σ² is known
-Step 1: from the sample, calculate x̄ (the sample mean).
-Step 2: from 1 − α, set $\varphi(t_{\alpha/2}) = \dfrac{1-\alpha}{2}$ and look up t_{α/2} in the Laplace table.
-Step 3: the estimated interval is $(\bar x - \varepsilon;\ \bar x + \varepsilon)$, with $\varepsilon = t_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$.

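b) Case 2: n ≥ 30 and σ² is unknown
-Step 1: calculate x̄ and s (sample mean and corrected standard deviation).
-Step 2: from 1 − α, set $\varphi(t_{\alpha/2}) = \dfrac{1-\alpha}{2}$ and look up t_{α/2} in table B (Laplace table).
-Step 3: the estimated interval is $(\bar x - \varepsilon;\ \bar x + \varepsilon)$, with $\varepsilon = t_{\alpha/2}\dfrac{s}{\sqrt{n}}$.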

*Note: the relationship between the corrected standard deviation s and the uncorrected sample standard deviation ŝ:
$$s^2 = \frac{n}{n-1}\hat s^2 \Rightarrow s = \sqrt{\frac{n}{n-1}\hat s^2}$$

c) Case 3: n < 30, σ² is known and X has a normal distribution: proceed as in case 1.
d) Case 4: n < 30, σ² is unknown and X has a normal distribution
-Step 1: from the sample, calculate x̄ and s.
-Step 2: from 1 − α, obtain α and look up $t_{\alpha/2}^{(n-1)}$ in the Student distribution table.
(Remember to use n − 1 degrees of freedom when looking up the table.)
-Step 3: the estimated interval is $(\bar x - \varepsilon;\ \bar x + \varepsilon)$, with $\varepsilon = t_{\alpha/2}^{(n-1)}\dfrac{s}{\sqrt{n}}$.

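2. Finding the confidence level (case 4 is not considered)
Solve the equation
$$\varepsilon = t_{\alpha/2}\frac{\sigma}{\sqrt{n}} \Rightarrow t_{\alpha/2} = \frac{\varepsilon\sqrt{n}}{\sigma}$$
or
$$\varepsilon = t_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow t_{\alpha/2} = \frac{\varepsilon\sqrt{n}}{s}$$
Then, using table B: $\varphi(t_{\alpha/2}) = \dfrac{1-\alpha}{2} \Rightarrow 1 - \alpha = 2\varphi(t_{\alpha/2})$.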
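The confidence-level calculation can be written with the error function, since the Laplace function satisfies φ(t) = ½·erf(t/√2). A minimal sketch (the function names are hypothetical), checked against the numbers of the vitamin example that follows:

```python
import math

def laplace_phi(t):
    """Laplace function phi(t) = P(0 < Z < t) for Z ~ N(0, 1)."""
    return 0.5 * math.erf(t / math.sqrt(2))

def confidence_from_precision(eps, sd, n):
    """Confidence level 1 - alpha reached with precision eps (cases 1 to 3).

    t = eps * sqrt(n) / sd, then 1 - alpha = 2 * phi(t).
    """
    t = eps * math.sqrt(n) / sd
    return 2 * laplace_phi(t)

# Check with the vitamin data: eps = 0.4934, sigma = 3.98, n = 250.
print(round(confidence_from_precision(0.4934, 3.98, 250), 3))  # about 0.95
```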
EX: The amount of vitamin contained in fruit A is a random variable X (mg) with a standard deviation of 3,98 mg. Analyzing 250 fruits A gave an average vitamin content of 20 mg. With a confidence level of 95%, estimate the average amount of vitamin in one fruit A.
Solution:
Let μ be the average amount of vitamin in one fruit A (mg).
. n = 250 > 30 and σ = 3,98 is known (σ² = 3,98²), so this is case 1.
. x̄ = 20; 1 − α = 95% ⇒ φ(t_{α/2}) = 0,475 ⇒ t_{α/2} = 1,96
$$\varepsilon = t_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1{,}96\cdot\frac{3{,}98}{\sqrt{250}} \approx 0{,}4934$$
So μ ∈ (x̄ − ε; x̄ + ε), that is:
μ ∈ (19,5066; 20,4934)
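The same interval can be computed in Python under the case 1 assumption (σ known). Here `statistics.NormalDist().inv_cdf` stands in for the Laplace table, so t_{α/2} ≈ 1,959964 instead of the rounded 1,96; the bounds still agree to four decimals. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, confidence=0.95):
    """Interval (xbar - eps, xbar + eps) with eps = t_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    t = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95 %
    eps = t * sigma / math.sqrt(n)
    return xbar - eps, xbar + eps

low, high = mean_ci_known_sigma(20, 3.98, 250)
print(round(low, 4), round(high, 4))
```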
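3. Finding the sample size (only cases 1 and 2)
With s (or σ) fixed, we find the sample size N needed to reach a required precision ε'.
a) If the precision must satisfy ε > ε', we solve the inequality:
$$t_{\alpha/2}\frac{s}{\sqrt{N}} > \varepsilon' \Rightarrow N < \left(t_{\alpha/2}\frac{s}{\varepsilon'}\right)^2 \Rightarrow N_{max}$$
b) If the precision must satisfy ε < ε', we solve the inequality:
$$t_{\alpha/2}\frac{s}{\sqrt{N}} < \varepsilon' \Rightarrow N > \left(t_{\alpha/2}\frac{s}{\varepsilon'}\right)^2 \Rightarrow N_{min}$$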

4. Interval estimate for the population proportion p
- The proportion p of elements with property A in the population is unknown. With a given confidence level 1 − α, the estimated interval for p is (p1; p2) satisfying:
P(p1 < p < p2) = 1 − α
- If the sample proportion is $f = f_n = \dfrac{m}{n}$, where n is the sample size and m is the number of elements with property A in the sample, the estimated interval for p is:
$$(f - \varepsilon;\ f + \varepsilon), \qquad \varepsilon = t_{\alpha/2}\sqrt{\frac{f(1-f)}{n}}$$
where t_{α/2} is found from $\varphi(t_{\alpha/2}) = \dfrac{1-\alpha}{2}$ (Laplace table).
EX: Province X has 1,000,000 young people. A random survey of 20,000 young people in province X about their educational level found that 12,575 of them had graduated from high school. With 95% confidence, estimate the proportion of young people in province X who have graduated from high school. How many young people in province X have graduated from high school?
Solution:
Let p be the proportion of young people in province X who have graduated from high school.
$$f = \frac{m}{n} = \frac{12575}{20000} = 0{,}62875$$
1 − α = 95% ⇒ φ(t_{α/2}) = 0,475 ⇒ t_{α/2} = 1,96
$$\varepsilon = t_{\alpha/2}\sqrt{\frac{f(1-f)}{n}} = 1{,}96\sqrt{\frac{0{,}62875\cdot(1 - 0{,}62875)}{20000}} \approx 0{,}0067$$
So p ∈ (f − ε; f + ε), that is:
p ∈ (0,62205; 0,63545)
The number of young people who have graduated from high school is between:
1 000 000 · 0,62205 = 622 050 and 1 000 000 · 0,63545 = 635 450.
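A sketch of the same proportion interval, again using `NormalDist().inv_cdf` in place of the Laplace table (the function name is hypothetical). Because ε is not rounded to 0,0067 here, the population counts differ by a few people from 622 050 and 635 450.

```python
import math
from statistics import NormalDist

def proportion_ci(m, n, confidence=0.95):
    """Interval (f - eps, f + eps) with eps = t_{alpha/2} * sqrt(f (1 - f) / n)."""
    f = m / n
    t = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    eps = t * math.sqrt(f * (1 - f) / n)
    return f - eps, f + eps

low, high = proportion_ci(12575, 20000)
print(round(low, 4), round(high, 4))          # about 0.6221 and 0.6354
print(round(low * 1_000_000), round(high * 1_000_000))
```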

CHAPTER 7. TESTING STATISTICAL HYPOTHESES

I. Concept of statistical hypothesis testing.
1. General concepts.
- Model: we state two contradictory propositions; one is the (null) hypothesis H0 and the other is the alternative hypothesis H1.
- Solution: based on an observed sample, we apply a decision rule to accept or reject the hypothesis.
- Accepting the hypothesis means we believe it is true, and rejecting it means we believe it is false. However, because the decision is based on a sample, we can never be completely certain about the whole population.
2. Parameter testing.
- To compare a population parameter θ with a given real number θ0, there are three pairs of hypotheses:
H0: θ = θ0, H1: θ ≠ θ0 (two-sided)
H0: θ = θ0, H1: θ > θ0 (right-sided)
H0: θ = θ0, H1: θ < θ0 (left-sided)
- The value of the test statistic G computed on a specific sample is denoted G_qs (the observed value).
- We only consider the following types of parametric tests:
+ Comparing a characteristic with a number.
+ Comparing the characteristics of two populations.
3. Types of errors in testing
- Type I error: rejecting H0 when H0 is true.
- Type II error: accepting H0 when H0 is false.

Decision \ Reality   H0 is true                          H0 is false
Accept H0            Correct (probability = 1 − α)        Type II error (probability = β)
Reject H0            Type I error (probability = α)      Correct (probability = 1 − β)

- For a fixed sample size, α and β change in opposite directions: reducing one increases the other.


4. Significance level and rejection region.
- We determine a region W_α, called the rejection region.
- W_α is determined by critical values.
- α is called the significance level; it is the probability of a type I error.
- Commonly used α levels are 1%, 5%, 10%.
5. Testing steps.
- From the proposition, state the pair of hypotheses H0, H1.
- Calculate the statistic G_qs for the specific sample.
- With the given significance level α, determine the critical value and the rejection region W_α.
- Rules:
+ G_qs ∈ W_α → reject H0 (H0 is considered false).
+ G_qs ∉ W_α → H0 is not rejected (there is not enough evidence against H0).
- Draw a conclusion about the original proposition.
6. Testing methods
- Testing with the rejection (critical) region.
- Testing with the probability of significance (P-value).
- Testing with confidence intervals.
7. Testing with the P-value
- The P-value is the smallest significance level at which H0 would be rejected for the observed sample.
- The P-value is often computed with statistical software.
- Decision rule with the P-value, for a given significance level α:
+ If P-value < α, reject H0.
+ If P-value ≥ α, H0 is not rejected.
II. Tests comparing a characteristic with a number
1. Test comparing the mean with a number
- The population is normally distributed: X ~ N(μ, σ²).
- The parameter μ is unknown; the test compares μ with the number μ0.
- Three pairs of hypotheses:
H0: μ = μ0, H1: μ ≠ μ0
H0: μ = μ0, H1: μ > μ0
H0: μ = μ0, H1: μ < μ0
- Consider two cases:
+ The population variance σ² is known (theoretical).
+ The population variance σ² is unknown (practical).
* Test of μ when σ² is known
Statistic: $Z_{qs} = \dfrac{\bar X - \mu_0}{\sigma/\sqrt{n}}$
+ H0: μ = μ0, H1: μ ≠ μ0: reject H0 if $|Z_{qs}| > z_{\alpha/2}$; P-value $= 2P(Z > |Z_{qs}|)$
+ H0: μ = μ0, H1: μ > μ0: reject H0 if $Z_{qs} > z_{\alpha}$; P-value $= P(Z > Z_{qs})$
+ H0: μ = μ0, H1: μ < μ0: reject H0 if $Z_{qs} < -z_{\alpha}$; P-value $= P(Z < Z_{qs})$

* Test of μ when σ² is unknown
Statistic: $T_{qs} = \dfrac{\bar X - \mu_0}{s/\sqrt{n}}$
+ H0: μ = μ0, H1: μ ≠ μ0: reject H0 if $|T_{qs}| > t_{\alpha/2}^{(n-1)}$
+ H0: μ = μ0, H1: μ > μ0: reject H0 if $T_{qs} > t_{\alpha}^{(n-1)}$
+ H0: μ = μ0, H1: μ < μ0: reject H0 if $T_{qs} < -t_{\alpha}^{(n-1)}$

EX: Electricity Department A reported that, on average, a household pays 250,000 VND per month for electricity, with a standard deviation of 20,000 VND. A random survey of 500 households found that, on average, each household pays 252,000 VND for electricity per month. Test the hypothesis H: "The average monthly payment per household is 250 thousand VND" at significance level α = 1%. Give the value of the test statistic and the conclusion.
Solution
n = 500, σ = 20 (thousand VND), x̄ = 252, μ0 = 250.
α = 0,01 ⇒ φ(t_α) = (1 − α)/2 = 0,495 ⇒ t_α = 2,58 (two-sided critical value)
Test statistic:
$$t = \frac{\bar x - \mu_0}{\sigma/\sqrt{n}} = \frac{252 - 250}{20/\sqrt{500}} \approx 2{,}2361$$
Because |t| < t_α, we do not reject hypothesis H.
So t = 2,2361, and there is not enough evidence to reject the claim that an average household pays 250,000 VND for electricity per month.
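The electricity example as a two-sided Z test in Python, also reporting the P-value 2P(Z > |Z_qs|) from section 7. The function name is hypothetical, and the exact critical value (about 2,5758) replaces the table value 2,58.

```python
import math
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n, alpha):
    """Two-sided Z test of H0: mu = mu0 with known sigma.

    Returns the statistic, the critical value z_{alpha/2}, the P-value
    2 P(Z > |Zqs|) and whether H0 is rejected.
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, crit, p_value, abs(z) > crit

z, crit, p_value, reject = z_test_mean(252, 250, 20, 500, alpha=0.01)
print(round(z, 4), round(crit, 4), round(p_value, 4), reject)
# z about 2.2361, P-value about 0.025 > 0.01, so H0 is not rejected
```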
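2. Test comparing a proportion with a number.
Statistic (n ≥ 100): $Z_{qs} = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}$, where $\hat p = f$ is the sample proportion
+ H0: p = p0, H1: p ≠ p0: reject H0 if $|Z_{qs}| > z_{\alpha/2}$
+ H0: p = p0, H1: p > p0: reject H0 if $Z_{qs} > z_{\alpha}$
+ H0: p = p0, H1: p < p0: reject H0 if $Z_{qs} < -z_{\alpha}$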

EX: A report said that 25% of Vietnamese consumers are interested in Vietnamese products. A random survey of 1,000 Vietnamese people found that 385 respondents were interested in Vietnamese goods. At a significance level of 5%, test the above statement.
Solution
$$f = \frac{m}{n} = \frac{385}{1000} = 0{,}385$$
α = 5% = 0,05 ⇒ (1 − α)/2 = 0,475 = φ(1,96) ⇒ t_α = 1,96
Test statistic (q0 = 1 − p0):
$$t = \frac{|f - p_0|}{\sqrt{p_0 q_0}}\sqrt{n} = \frac{|0{,}385 - 0{,}25|}{\sqrt{0{,}25\cdot 0{,}75}}\sqrt{1000} \approx 9{,}8590$$
Because t > t_α, we reject hypothesis H.
Because f > p0, the percentage of consumers interested in Vietnamese products is in fact higher than 25%.
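A sketch of the proportion test for the consumer example. The helper name is hypothetical, and the two-sided critical value comes from `NormalDist` rather than the Laplace table.

```python
import math
from statistics import NormalDist

def z_test_proportion(m, n, p0, alpha):
    """Two-sided Z test of H0: p = p0 based on the sample proportion f = m / n."""
    f = m / n
    z = (f - p0) * math.sqrt(n) / math.sqrt(p0 * (1 - p0))
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return f, z, abs(z) > crit

f, z, reject = z_test_proportion(385, 1000, 0.25, alpha=0.05)
print(f, round(z, 4), reject)   # z about 9.859, H0 rejected
```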
3. Test comparing a variance with a number
Statistic: $\chi^2_{qs} = \dfrac{(n-1)s^2}{\sigma_0^2}$
+ H0: σ² = σ0², H1: σ² ≠ σ0²: reject H0 if $\chi^2_{qs} > \chi^{2\,(n-1)}_{\alpha/2}$ or $\chi^2_{qs} < \chi^{2\,(n-1)}_{1-\alpha/2}$
+ H0: σ² = σ0², H1: σ² > σ0²: reject H0 if $\chi^2_{qs} > \chi^{2\,(n-1)}_{\alpha}$
+ H0: σ² = σ0², H1: σ² < σ0²: reject H0 if $\chi^2_{qs} < \chi^{2\,(n-1)}_{1-\alpha}$

III. Tests comparing two characteristics.

1. Comparing the means of two populations X and Y.
Statistic (n1, n2 > 30): $T_{qs} = \dfrac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
+ H0: μ1 = μ2, H1: μ1 ≠ μ2: reject H0 if $|T_{qs}| > z_{\alpha/2}$
+ H0: μ1 = μ2, H1: μ1 > μ2: reject H0 if $T_{qs} > z_{\alpha}$
+ H0: μ1 = μ2, H1: μ1 < μ2: reject H0 if $T_{qs} < -z_{\alpha}$
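A sketch of the large-sample test for two means. The summary statistics below are hypothetical and only illustrate the calculation; the function name is also hypothetical.

```python
import math
from statistics import NormalDist

def z_test_two_means(x1, s1_sq, n1, x2, s2_sq, n2, alpha):
    """Two-sided large-sample test of H0: mu1 = mu2 (n1, n2 > 30)."""
    t = (x1 - x2) / math.sqrt(s1_sq / n1 + s2_sq / n2)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return t, abs(t) > crit

# Hypothetical summary statistics, for illustration only.
t, reject = z_test_two_means(52, 25, 50, 50, 36, 60, alpha=0.05)
print(round(t, 4), reject)   # |T| below 1.96, so H0 is not rejected
```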

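2. Comparing the proportions of two populations X and Y.
Statistic: $Z_{qs} = \dfrac{\hat p_1 - \hat p_2}{\sqrt{\bar p(1-\bar p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, with $\bar p = \dfrac{n_1\hat p_1 + n_2\hat p_2}{n_1 + n_2}$
+ H0: p1 = p2, H1: p1 ≠ p2: reject H0 if $|Z_{qs}| > z_{\alpha/2}$
+ H0: p1 = p2, H1: p1 > p2: reject H0 if $Z_{qs} > z_{\alpha}$
+ H0: p1 = p2, H1: p1 < p2: reject H0 if $Z_{qs} < -z_{\alpha}$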
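The pooled statistic for two proportions, sketched with hypothetical counts (120 of 200 versus 125 of 250); the function name is hypothetical.

```python
import math

def z_two_proportions(m1, n1, m2, n2):
    """Pooled Z statistic for H0: p1 = p2."""
    p1, p2 = m1 / n1, m2 / n2
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)          # pooled proportion
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical samples: 120 of 200 versus 125 of 250.
z = z_two_proportions(120, 200, 125, 250)
print(round(z, 3))
```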

3. Comparing the variances of two populations X and Y.
Statistic: $F_{qs} = \dfrac{s_1^2}{s_2^2}$
+ H0: σ1² = σ2², H1: σ1² ≠ σ2²: reject H0 if $F_{qs} > f_{\alpha/2}(n_1-1, n_2-1)$ or $F_{qs} < f_{1-\alpha/2}(n_1-1, n_2-1)$
+ H0: σ1² = σ2², H1: σ1² > σ2²: reject H0 if $F_{qs} > f_{\alpha}(n_1-1, n_2-1)$
+ H0: σ1² = σ2², H1: σ1² < σ2²: reject H0 if $F_{qs} < f_{1-\alpha}(n_1-1, n_2-1)$
Lower critical values are obtained from $f_{1-\alpha}(m_1, m_2) = \dfrac{1}{f_{\alpha}(m_2, m_1)}$.

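- A left-sided alternative σ1² < σ2² can be rewritten as σ2² > σ1² by swapping the roles of the two populations.
- In practice, the populations are labelled so that s1² > s2²; then only upper critical values from the F table are needed. With X1 ~ N(μ1, σ1²), X2 ~ N(μ2, σ2²), s1² > s2² and $F_{qs} = s_1^2/s_2^2$:
+ H0: σ1² = σ2², H1: σ1² > σ2²: reject H0 if $F_{qs} > f_{\alpha}(n_1-1, n_2-1)$
+ H0: σ1² = σ2², H1: σ1² ≠ σ2²: reject H0 if $F_{qs} > f_{\alpha/2}(n_1-1, n_2-1)$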

4. Comparing two means in paired form (X, Y)
- Set d = Y − X and test the hypothesis H: μ_d = 0.
- Calculate the statistic $t = \dfrac{|\bar d|}{s_d}\sqrt{n}$, where n is the number of pairs in the sample and s_d is the corrected standard deviation of the differences.
- Depending on n and on whether the variance is known, proceed as in the test comparing a mean with a number.
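A sketch of the paired statistic, assuming s_d is the corrected standard deviation of the differences (`statistics.stdev` divides by n − 1). The before/after data are hypothetical.

```python
import math
import statistics

def paired_statistic(x, y):
    """t = |mean(d)| * sqrt(n) / s_d with d = y - x and s_d the corrected std."""
    d = [yi - xi for xi, yi in zip(x, y)]
    return abs(statistics.fmean(d)) * math.sqrt(len(d)) / statistics.stdev(d)

# Hypothetical paired measurements (X before, Y after).
x = [10, 12, 9, 11]
y = [12, 13, 11, 12]
print(round(paired_statistic(x, y), 4))   # 5.1962
```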

CHAPTER 8. CORRELATION AND REGRESSION PROBLEMS


I. Sample correlation coefficient.
1. Definition.
- The sample correlation coefficient r measures the degree of linear dependence between two random variables X and Y observed on the same sample.
- Suppose we have a random sample of size n of the random vector (X;Y): (x_i; y_i), i = 1; 2; ...; n. Then the sample correlation coefficient r is:
$$r = \frac{\overline{xy} - \bar x\,\bar y}{\hat s_x\,\hat s_y}, \qquad \overline{xy} = \frac{1}{n}\sum_{i=1}^{n} x_i y_i$$
where ŝ_x and ŝ_y are the (uncorrected) sample standard deviations of X and Y.

2. Properties.
1) −1 ≤ r ≤ 1.
2) If r = 0, X and Y have no linear relationship; if |r| = 1, X and Y have an exact linear relationship.
3) If r < 0, the relationship between X and Y is inverse (Y tends to decrease as X increases).
4) If r > 0, the relationship between X and Y is direct (Y tends to increase as X increases).
II. Empirical (sample) linear regression line
- From an experimental sample of the random vector (X, Y), we plot the points (x_i; y_i) on the Oxy plane. The curve through these points approximates the dependence of Y on X that we want to find (see pictures a), b)).
- The empirical regression line is the straight line that best approximates the given sample points, and thus also approximates the curve we are looking for.
+ In figure a), the approximation is good (strong linear dependence).
+ In figure b), the approximation is poor.
1. Least squares method
- When there is a fairly strong linear dependence between two random variables X and Y, we look for the expression a + bX that best approximates Y in the sense of minimizing the mean squared error E(Y − a − bX)²; this method is called least squares.
- For each point (x_i; y_i), the approximation error is (see picture c)):
$$\varepsilon_i = y_i - (a + b x_i)$$

- The linear regression line of Y on X is: y = a + bx, with $b = r\,\dfrac{\hat s_y}{\hat s_x}$ and $a = \bar y - b\,\bar x$.
- The linear regression line of X on Y is: x = a' + b'y.
EX: Tracking interest rates (Y) and inflation rates (X) in several countries, we have the following data:

Y   17,5   15,6   9,8   5,3   7,9   10,0   19,2   13,1
X   14,2   11,7   6,4   2,1   4,8   8,1    15,4   9,8

Tasks:
- Calculate the sample correlation coefficient.
- Build the sample regression equation.
- Estimate the regression error.
- Forecast the interest rate if the inflation rate is 22,5.
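Solution
- From the sample data we calculate:
x̄ = 9,0625; ȳ = 12,3; $\overline{xy}$ = 130,9813;
ŝ_x² ≈ 18,5898; ŝ_x ≈ 4,3116;
ŝ_y² = 20,76; ŝ_y ≈ 4,5563.
- So the sample correlation coefficient is:
$$r = \frac{130{,}9813 - 9{,}0625\cdot 12{,}3}{4{,}3116\cdot 4{,}5563} = \frac{19{,}5125}{19{,}6449} \approx 0{,}9933$$
- The regression coefficients are:
$$\hat b = r\,\frac{\hat s_y}{\hat s_x} = \frac{\overline{xy} - \bar x\,\bar y}{\hat s_x^2} = \frac{19{,}5125}{18{,}5898} \approx 1{,}0496$$
$$\hat a = \bar y - \hat b\,\bar x = 12{,}3 - 1{,}0496\cdot 9{,}0625 \approx 2{,}788$$
- So the sample regression equation is:
$$\hat y = 2{,}788 + 1{,}0496\,x$$
- The estimate of the regression error is:
$$\varepsilon^2_{y/x} = \hat s_y^2\,(1 - r^2) = 20{,}76\cdot(1 - 0{,}98656) \approx 0{,}279$$
- If the inflation rate is x0 = 22,5, the forecast interest rate is:
$$y_0 = 2{,}788 + 1{,}0496\cdot 22{,}5 \approx 26{,}40$$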
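The regression example can be reproduced in Python, using the uncorrected variances ŝ_x², ŝ_y² as in the formula for r. This is only a check of the hand computation under those conventions, not a general regression tool; printed values are rounded.

```python
import math

# Interest rate (Y) and inflation rate (X) data from the example.
y = [17.5, 15.6, 9.8, 5.3, 7.9, 10.0, 19.2, 13.1]
x = [14.2, 11.7, 6.4, 2.1, 4.8, 8.1, 15.4, 9.8]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
xy_bar = sum(a * b for a, b in zip(x, y)) / n
sx2 = sum(a * a for a in x) / n - x_bar ** 2     # uncorrected variance of X
sy2 = sum(b * b for b in y) / n - y_bar ** 2     # uncorrected variance of Y

r = (xy_bar - x_bar * y_bar) / math.sqrt(sx2 * sy2)   # sample correlation
b = r * math.sqrt(sy2 / sx2)                          # slope
a = y_bar - b * x_bar                                 # intercept
err2 = sy2 * (1 - r ** 2)                             # regression error estimate
y0 = a + b * 22.5                                     # forecast at x = 22.5

print(round(r, 4), round(b, 4), round(a, 4), round(err2, 3), round(y0, 2))
```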
