H2 Mathematics
A Level (Statistics)
2017
Written by Seow Ryan
Topics covered:
1. Permutations and Combinations
2. Probability
3. Discrete Random Variables
4. Binomial Distribution
5. Normal Distribution
6. Sampling Distribution
7. Hypothesis Testing
8. Linear Correlation and Regression
Updated August 2017
Uploaded by Ryan Seow
On owlcove.sg
Permutations and Combinations
Addition: non-overlapping cases
case 1 – P(AAB)
case 2 – P(ABA)
case 3 – P(BAA)
P(2 As in 3 consecutive) = P(AAB) + P(ABA) + P(BAA)
Multiplication: successive cases
e.g. the "rain / don't rain" type of question
P(rain on 2nd day)
= P(rain 1st day) × P(rain 2nd day | rain 1st day) + P(no rain 1st day) × P(rain 2nd day | no rain 1st day)
Combination is an unordered selection of a number of objects from a given set.
• $^nC_r$: choose r objects from n objects
Permutation is an ordered arrangement of objects, used to find the total number of ways to arrange objects. (ORDER MATTERS!!)
• $^nP_r = {}^nC_r \times r!$: arrange r objects from n distinct objects
• Arrange n distinct objects: $n!$
• if p and q of them are identical: $\frac{n!}{p!\,q!}$
CIRCLE Arrangement
• indistinguishable seats: $(n-1)!$
• distinguishable seats: $n \times (n-1)! = n!$ (seats differ because of labelling, colour, shape, gifts etc.)
• refer to the list below, and divide by the number in the ÷ column to account for rotational symmetry
Shape            ÷
Triangle         1
Equilateral Δ    3
Kite             1
Trapezium        1
Parallelogram    2
Rhombus          2
Rectangle        2
Square           4
Regular shapes:
Pentagon         5
Hexagon          6
Octagon          8
Probability
$P(A) = \frac{\text{number of outcomes of } A}{\text{total number of possible outcomes}} = \frac{n(A)}{n(S)}$
Venn Diagram
• $P(A') = 1 - P(A)$
• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• $P(A) = P(A \cap B) + P(A \cap B')$
• $P(A \cap B') = P(A) - P(A \cap B) = P(A \cup B) - P(B)$
• $P(A' \cap B') = 1 - P(A \cup B)$
Mutually Exclusive events cannot occur together within a situation, e.g. in one coin toss, either heads or tails, but not both
• Separate circles in Venn diagram
• $P(A \cap B) = 0$
Conditional Probability
• Hint words: IF, GIVEN
• $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Independent events are events where the occurrence of one does not affect the probability of the other
• $P(A \cap B) = P(A) \times P(B)$
• $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = P(A)$
• $P(B \mid A) = \frac{P(B \cap A)}{P(A)} = P(B)$
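The counting and probability rules above can be checked by brute-force enumeration. The sketch below is illustrative only: the values n = 5, r = 2, the two-dice sample space and the events A and B are assumptions chosen for the example, not taken from the notes.

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb, factorial, perm

# Counting: the distinct orderings of the letters A, A, B are exactly the
# cases AAB, ABA, BAA, and their number is n!/(p! q!) = 3!/(2! 1!).
arrangements = sorted(set("".join(p) for p in permutations("AAB")))
n_arrangements = factorial(3) // (factorial(2) * factorial(1))

# nPr = nCr * r!  (checked here for the assumed values n = 5, r = 2)
n_p_r = perm(5, 2)
n_c_r_times_r_fact = comb(5, 2) * factorial(2)

# Probability: sample space S of two fair dice (hypothetical example).
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = n(event) / n(S) for equally likely outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def A(s):
    return s[0] % 2 == 0        # first die shows an even number

def B(s):
    return s[0] + s[1] == 7     # the two dice total 7

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(lambda s: A(s) and B(s))
p_a_or_b = prob(lambda s: A(s) or B(s))
p_a_given_b = p_a_and_b / p_b   # conditional probability P(A | B)

print(arrangements, n_arrangements)
print(p_a_or_b == p_a + p_b - p_a_and_b)   # addition rule for A ∪ B
print(p_a_given_b == p_a)                  # independence check
```

In this particular example A and B happen to be independent, because P(A | B) equals P(A).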
Written by Seow Ryan, Les Mathématiques #LeafySnails
Discrete Random Variables
x          0   1   2   3
P(X = x)   a   b   c   d
• Total sum of probabilities = 1, i.e. $\sum_{\text{all } x} P(X = x) = 1$, so $a + b + c + d = 1$
• Expectation, $E(X) = \sum_{\text{all } x} x\,P(X = x)$, so here $E(X) = 0(a) + b + 2c + 3d$
o $E(a) = a$, where a is a constant
o $E(aX \pm bY) = aE(X) \pm bE(Y)$
• Variance, $\mathrm{Var}(X) = \sigma^2 = E(X^2) - (E(X))^2$
o $\mathrm{Var}(a) = 0$, where a is a constant
o $\mathrm{Var}(aX \pm bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$, for independent X and Y
o standard deviation $= \sqrt{\text{variance}}$
Binomial Distribution
Properties
• Fixed number of independent trials
• Only TWO types of outcomes, success / failure
• Probability of success is the same for each trial
Application
For $X \sim B(n, p)$
• $E(X) = np$
• $\mathrm{Var}(X) = np(1 - p)$
• $P(X = r) = \binom{n}{r} p^r (1 - p)^{n - r}$
• binompdf is used to find an individual probability like P(X = 2)
• binomcdf is used to find a range like $P(X > 3) = 1 - P(X \le 3)$
• Mode is very close to the mean, but refers to the value of x that has the highest probability
Normal Distribution
For $X \sim N(\mu, \sigma^2)$, where μ is the mean and σ is the standard deviation
• Probability = area under curve
• Standard normal curve is $Z \sim N(0, 1)$
• Standardisation: $Z = \frac{X - \mu}{\sigma}$
• $P(X < a) = P(X \le a)$
• $P(X > a) = 1 - P(X < a)$
• $P(X > Y) = P(X - Y > 0)$
Note that the total mass of two books ($X_1 + X_2$) and twice the mass of one book ($2X$) are different things, because the two books take their masses independently; change your normal distribution accordingly (e.g. $X_1 + X_2 \sim N(2\mu, 2\sigma^2)$ and $2X \sim N(2\mu, 4\sigma^2)$ respectively)
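The binomial and normal formulas above can be tried out without a graphing calculator. This is a minimal sketch using only the Python standard library (3.8 or later); `binom_pmf` and `binom_cdf` are hypothetical helpers standing in for binompdf and binomcdf, and the parameters (n = 10, p = 0.3, a book mass with mean 500 and standard deviation 20) are made up for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p); plays the role of binompdf."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binom_cdf(n, p, r):
    """P(X <= r) for X ~ B(n, p); plays the role of binomcdf."""
    return sum(binom_pmf(n, p, k) for k in range(r + 1))

n, p = 10, 0.3                       # hypothetical parameters
p_eq_2 = binom_pmf(n, p, 2)          # P(X = 2)
p_gt_3 = 1 - binom_cdf(n, p, 3)      # P(X > 3) = 1 - P(X <= 3)

# E(X) and Var(X) from the definitions; they should match np and np(1 - p).
mean = sum(r * binom_pmf(n, p, r) for r in range(n + 1))
var = sum(r**2 * binom_pmf(n, p, r) for r in range(n + 1)) - mean**2

mu, sigma = 500.0, 20.0              # hypothetical mass of one book
X = NormalDist(mu, sigma)
p_above_530 = 1 - X.cdf(530)         # P(X > a) = 1 - P(X < a)
z = (530 - mu) / sigma               # standardisation Z = (X - mu) / sigma

# Two independent books versus twice one book: same mean, different variance.
two_books = NormalDist(2 * mu, sqrt(2) * sigma)   # X1 + X2 ~ N(2mu, 2sigma^2)
double_book = NormalDist(2 * mu, 2 * sigma)       # 2X ~ N(2mu, 4sigma^2)

print(mean, var)
print(two_books.variance, double_book.variance)
```

Comparing `two_books.variance` with `double_book.variance` makes the difference between $X_1 + X_2$ and $2X$ concrete.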
Sampling Distribution
Sample Mean, $\bar{X} = \frac{\sum X}{n}$, where X is a random variable
• $E(\bar{X}) = \mu$
• $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
Central Limit Theorem (CLT) states that for a sufficiently large n, the distribution of the sample mean is approximately normal (the larger n is, the closer it is to a normal distribution):
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately
Corollary to CLT: $X_1 + \dots + X_n \sim N(n\mu, n\sigma^2)$ approximately
Sample Variance (biased!)
$\sigma_x^2 = \frac{1}{n}\sum (x - \bar{x})^2 = \frac{1}{n}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$
Unbiased Estimator of Population…
• Mean: $\bar{x}$, since $E(\bar{X}) = \mu$
• Variance: $s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right] = \frac{n}{n-1}\,\sigma_x^2$
(note that the same variable must be used throughout: x with x, or (x − a) with (x − a); refer to the next section)
• Rules of E(X) and Var(X) still apply!
If given $\sum (x - a)$ for a sample of size n
• $\bar{x} = \frac{\sum (x - a)}{n} + a$
• $\sum x = n\bar{x}$
• $s^2 = \frac{1}{n-1}\left[\sum (x - a)^2 - \frac{(\sum (x - a))^2}{n}\right]$
Hypothesis Testing
Null Hypothesis (H0): a particular claim for the population mean
Alternative Hypothesis (Ha): the range of values that excludes the value specified in H0
Test statistic: random variable whose value is calculated from sample data
Critical region: range of values of the test statistic for which we reject H0
Level of significance α%: probability of rejecting H0 given that H0 is true
p-value: probability of observing a value of the test statistic as extreme or more extreme than the one obtained, given that H0 is true
Types of Tests
Test            Ha         Hint words
Left-tailed     µ < µ0     overestimates, overstates, or if $\bar{x}$ < µ0
Right-tailed    µ > µ0     underestimates, understates, or if $\bar{x}$ > µ0
Two-tailed      µ ≠ µ0     does not function, confirm if correct
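The unbiased estimates from coded data in the Sampling Distribution notes are the usual inputs to a hypothesis test, so it may help to see them computed once. A minimal sketch, assuming a hypothetical sample and a hypothetical coding constant a = 50; the function name `unbiased_estimates` is illustrative only.

```python
from statistics import mean, variance

def unbiased_estimates(n, sum_d, sum_d2, a=0.0):
    """Return (x_bar, s2) from coded sums, where d = x - a.

    x_bar = sum_d / n + a
    s2    = (sum_d2 - sum_d**2 / n) / (n - 1)
    """
    x_bar = sum_d / n + a
    s2 = (sum_d2 - sum_d**2 / n) / (n - 1)
    return x_bar, s2

data = [52, 48, 50, 55, 45]   # hypothetical sample
a = 50                        # hypothetical coding constant
sum_d = sum(x - a for x in data)           # sum of (x - a)
sum_d2 = sum((x - a) ** 2 for x in data)   # sum of (x - a)^2
x_bar, s2 = unbiased_estimates(len(data), sum_d, sum_d2, a)

# The coded result should agree with the estimates from the raw data;
# statistics.variance also divides by n - 1.
print(x_bar, s2)
print(mean(data), variance(data))
```

Subtracting a shifts the mean but leaves the variance unchanged, which is why only the mean formula adds a back.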
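z-test
• Under H0, (since n = ___ is sufficiently large, by CLT) $\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)$ approximately
(a sufficiently large value of n is a value more than 30)
• test statistic, $z = \frac{\bar{X} - \mu_0}{\sqrt{\sigma^2 / n}} \sim N(0, 1)$ approximately
(write "approximately" only if the distribution is not stated as normally distributed)
Where the variance is unknown and has to be estimated, $\bar{X} \sim N\left(\mu_0, \frac{s^2}{n}\right)$ and $z = \frac{\bar{X} - \mu_0}{\sqrt{s^2 / n}} \sim N(0, 1)$,
where $s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$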
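The test statistic above is a one-line calculation once the sample mean and variance are known. A small sketch, with made-up numbers (n = 50, sample mean 98.2, s² = 25, µ0 = 100) and a hypothetical helper name:

```python
from math import sqrt

def z_statistic(sample_mean, mu0, variance, n):
    """z = (sample_mean - mu0) / sqrt(variance / n).

    `variance` is the population variance if it is known,
    otherwise the unbiased estimate s^2.
    """
    return (sample_mean - mu0) / sqrt(variance / n)

# Hypothetical data: n = 50, sample mean 98.2, s^2 = 25, testing mu0 = 100.
z = z_statistic(sample_mean=98.2, mu0=100, variance=25, n=50)
print(round(z, 3))
```

A negative z simply means the sample mean lies below µ0; whether that is significant depends on the type of test and the critical value.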
TEST SIGNIFICANCE by z-test statistic (c > 0 is the critical value)
Ha                 μ < μ0      μ ≠ μ0               μ > μ0
Graph              [normal curve sketches of the rejection regions]
Reject H0          z ≤ −c      z ≤ −c OR z ≥ c      z ≥ c
Don't reject H0    z > −c      −c < z < c           z < c
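As an illustration, the critical value c is read off the standard normal distribution. Taking a 5% level of significance (an assumed level for this example), the standard values are:

```latex
\begin{align*}
\text{one-tailed: }\; & P(Z \ge c) = 0.05 \implies c \approx 1.645 \\
\text{two-tailed: }\; & P(|Z| \ge c) = 0.05 \implies P(Z \ge c) = 0.025 \implies c \approx 1.960
\end{align*}
```

So a left-tailed test at 5% rejects H0 when z ≤ −1.645, while a two-tailed test at 5% rejects H0 when z ≤ −1.960 or z ≥ 1.960.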
TEST SIGNIFICANCE by p-value
• p-value ≤ α%: reject H0, as the test statistic lies within the critical / rejection region
• p-value > α%: do not reject H0, as the test statistic lies outside the rejection region
• CONTEXTUALISE your answers
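The p-value rule can be sketched with the standard library's `statistics.NormalDist` (Python 3.8 or later). The function names, the tail labels "left", "right" and "two", and the sample z values are assumptions made for this example.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value of a z-test statistic for a left-, right- or two-tailed test."""
    Z = NormalDist()                       # standard normal N(0, 1)
    if tail == "left":                     # Ha: mu < mu0
        return Z.cdf(z)
    if tail == "right":                    # Ha: mu > mu0
        return 1 - Z.cdf(z)
    if tail == "two":                      # Ha: mu != mu0
        return 2 * (1 - Z.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")

def decide(z, tail, alpha):
    """Apply the rule: reject H0 exactly when p-value <= alpha."""
    return "reject H0" if p_value(z, tail) <= alpha else "do not reject H0"

# Hypothetical test statistics at the 5% level of significance.
print(decide(-2.0, "left", 0.05))
print(decide(1.5, "two", 0.05))
```

The code only gives the statistical decision; the conclusion still has to be written in the context of the question.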
Regression and Correlation
Scatter plots / diagrams
• Independent variable on horizontal axis
• Dependent variable on vertical axis
• Label max and min on both axes
[Scatter diagram sketches: +ve linear correlation, -ve linear correlation, quadratic correlation, unclear correlation]
Product Moment Correlation Coefficient
• Measure of the strength and direction of a linear correlation between 2 variables
• Independent of the scale of measurement
• r will not change if we add a constant to all values or multiply all values by a positive constant
• Value of r: $-1 \le r \le 1$
• Correlation is not causation
• If r = 0, there is no linear correlation, but this does not mean there is no relationship
• 0.8 < |r| < 1, strong linear correlation
• 0.5 < |r| < 0.8, moderate linear correlation
• 0 < |r| < 0.5, weak linear correlation
• $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$ (in MF26)
• $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$
• $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$
• $S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$
Regression Lines
• Least squares regression line of y on x: $y - \bar{y} = b(x - \bar{x})$, where $b = \frac{S_{xy}}{S_{xx}}$
• It minimises $\sum e_i^2 = \sum (\text{observed} - \text{predicted})^2$ in the vertical direction
• Least squares regression line of x on y: $x - \bar{x} = d(y - \bar{y})$, where $d = \frac{S_{xy}}{S_{yy}}$
• It minimises $\sum e_i^2$ in the horizontal direction
• $bd = r^2$
• b and d = estimated slope/gradient
• Both regression lines intersect at the sample mean $(\bar{x}, \bar{y})$
• x on y and y on x coincide when $r = \pm 1$
Estimation
Indep. variable    Estimate y    Estimate x
x                  y on x        y on x
y                  x on y        x on y
unclear            y on x        x on y
• Estimating a value within the given range is reliable if strongly correlated
• Estimating a value outside the given range (extrapolation) is unreliable
Linear Law ($y = f(x) \rightarrow Y = A + BX$)
Non-linear                  Y = A + BX
$y = a + bx^2$              $y = a + b(x^2)$
$y^2 = a + bx$              $(y^2) = a + b(x)$
$y = ax^b$                  $\ln y = \ln a + b \ln x$
$y = ae^{bx}$               $\ln y = \ln a + b(x)$
$y = \frac{1}{a + bx}$      $\frac{1}{y} = a + b(x)$
$y = a + \frac{b}{x}$       $y = a + b\left(\frac{1}{x}\right)$
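The formulas for r, b and d can be evaluated directly from the sums $S_{xy}$, $S_{xx}$ and $S_{yy}$. The following is a sketch on a small made-up data set; the function name `regression_summary` and the data values are hypothetical.

```python
from math import sqrt

def regression_summary(xs, ys):
    """Return r, the gradients b (y on x) and d (x on y), and the sample means."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x**2 / n
    s_yy = sum(y * y for y in ys) - sum_y**2 / n
    return {
        "r": s_xy / sqrt(s_xx * s_yy),   # product moment correlation coefficient
        "b": s_xy / s_xx,                # gradient of the y on x line
        "d": s_xy / s_yy,                # gradient of the x on y line
        "x_bar": sum_x / n,
        "y_bar": sum_y / n,
    }

# Hypothetical data with a strong positive linear correlation.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = regression_summary(xs, ys)

# Estimate y at x = 3.5 (inside the data range) using the y on x line:
# y - y_bar = b (x - x_bar)
y_at_3_5 = fit["y_bar"] + fit["b"] * (3.5 - fit["x_bar"])

print(fit)
print(y_at_3_5)
```

Since x = 3.5 lies within the data range and |r| is close to 1, this estimate falls under the "reliable" case described above.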