Solution Manual For Statistical Inference 2nd Edition
Probability Theory
“If any little problem comes your way, I shall be happy, if I can, to give you a hint or two as
to its solution.”
Sherlock Holmes
The Adventure of the Three Students
1.1 a. Each sample point describes the result of the toss (H or T) for each of the four tosses. So,
for example THTT denotes T on 1st, H on 2nd, T on 3rd and T on 4th. There are 2^4 = 16
such sample points.
b. The number of damaged leaves is a nonnegative integer. So we might use S = {0, 1, 2, . . .}.
c. We might observe fractions of an hour. So we might use S = {t : t ≥ 0}, that is, the half
infinite interval [0, ∞).
d. Suppose we weigh the rats in ounces. The weight must be greater than zero so we might use
S = (0,∞). If we know no 10-day-old rat weighs more than 100 oz., we could use S = (0,100].
e. If n is the number of items in the shipment, then S = {0/n, 1/n, . . . , 1}.
1.2 For each of these equalities, you must show containment in both directions.
1.3 a. x ∈ A ∪ B ⇔ x ∈ A or x ∈ B ⇔ x ∈ B ∪ A
x ∈ A ∩ B ⇔ x ∈ A and x ∈ B ⇔ x ∈ B ∩ A.
b. x ∈ A ∪ (B ∪ C) ⇔ x ∈ A or x ∈ B ∪ C ⇔ x ∈ A ∪ B or x ∈ C ⇔ x ∈ (A ∪ B) ∪ C.
(It can similarly be shown that A ∪ (B ∪ C) = (A ∪ C) ∪ B.)
x ∈ A ∩ (B ∩ C) ⇔ x ∈ A and x ∈ B and x ∈ C ⇔ x ∈ (A ∩ B) ∩ C.
c. x ∈ (A ∪ B)^c ⇔ x ∉ A ∪ B ⇔ x ∉ A and x ∉ B ⇔ x ∈ A^c and x ∈ B^c ⇔ x ∈ A^c ∩ B^c.
x ∈ (A ∩ B)^c ⇔ x ∉ A ∩ B ⇔ x ∉ A or x ∉ B ⇔ x ∈ A^c or x ∈ B^c ⇔ x ∈ A^c ∪ B^c.
1.4 a. “A or B or both” is A∪B. From Theorem 1.2.9b we have P (A∪B) = P (A)+P (B)−P (A∩B).
1.6 With p0 = (1 − u)(1 − w), p1 = u(1 − w) + w(1 − u), and p2 = uw, the conditions require
p0 = p2 ⇒ u + w = 1
p1 = p2 ⇒ uw = 1/3.
These two equations imply u(1 − u) = 1/3, which has no solution in the real numbers. Thus,
the probability assignment is not legitimate.
1.7 a.
$$P(\text{scoring } i \text{ points}) = \begin{cases} 1 - \dfrac{\pi r^2}{A} & \text{if } i = 0, \\[4pt] \dfrac{\pi r^2}{A} \cdot \dfrac{(6-i)^2 - (5-i)^2}{5^2} & \text{if } i = 1, \dots, 5. \end{cases}$$
b.
$$P(\text{scoring } i \text{ points} \mid \text{board is hit}) = \frac{P(\text{scoring } i \text{ points} \cap \text{board is hit})}{P(\text{board is hit})}$$
$$P(\text{board is hit}) = \frac{\pi r^2}{A}$$
$$P(\text{scoring } i \text{ points} \cap \text{board is hit}) = \frac{\pi r^2}{A} \cdot \frac{(6-i)^2 - (5-i)^2}{5^2}, \qquad i = 1, \dots, 5.$$
Therefore,
$$P(\text{scoring } i \text{ points} \mid \text{board is hit}) = \frac{(6-i)^2 - (5-i)^2}{5^2}, \qquad i = 1, \dots, 5,$$
which is exactly the probability distribution of Example 1.2.7.
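As a sanity check, here is a minimal Monte Carlo sketch in Python. It assumes that, given the board is hit, the dart is uniform on the board (taken as the unit disk); the sample size is an arbitrary choice.

```python
import math
import random
from collections import Counter

# Given the board is hit, the dart is uniform on the board; take radius 1.
# Score i corresponds to distance d in [(5 - i)/5, (6 - i)/5).
N = 200_000
counts = Counter()
for _ in range(N):
    while True:  # rejection-sample a uniform point on the unit disk
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            break
    d = math.hypot(x, y)
    counts[5 - int(d * 5)] += 1

for i in range(1, 6):
    exact = ((6 - i) ** 2 - (5 - i) ** 2) / 25
    print(i, counts[i] / N, exact)
```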
1.8 a. P(scoring exactly i points) = P(inside circle i) − P(inside circle i + 1). Circle i has radius (6 − i)r/5, so P(inside circle i) = π[(6 − i)r/5]^2/A and
$$P(\text{scoring exactly } i \text{ points}) = \frac{\pi r^2}{A} \cdot \frac{(6-i)^2 - (5-i)^2}{5^2}, \qquad i = 1, \dots, 5.$$
Now define $B_k = \bigcup_{i=k}^{\infty} A_i$. Note that $B_{k+1} \subset B_k$ and $B_k \to \emptyset$ as $k \to \infty$. (Otherwise the sum of the probabilities would be infinite.) Thus
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \lim_{n\to\infty} P\left(\bigcup_{i=1}^{n} A_i \cup B_{n+1}\right) = \lim_{n\to\infty}\left[\sum_{i=1}^{n} P(A_i) + P(B_{n+1})\right] = \sum_{i=1}^{\infty} P(A_i).$$
1.13 If A and B are disjoint, P(A ∪ B) = P(A) + P(B) = 1/3 + 3/4 = 13/12, which is impossible. More
generally, if A and B are disjoint, then A ⊂ B^c and P(A) ≤ P(B^c). But here P(A) = 1/3 > 1/4 = P(B^c),
so A and B cannot be disjoint.
1.14 If S = {s_1, . . . , s_n}, then any subset of S can be constructed by either including or excluding
s_i, for each i. Thus there are 2^n possible choices.
1.15 Proof by induction. The proof for k = 2 is given after Theorem 1.2.14. Assume true for k, that
is, the entire job can be done in n_1 × n_2 × ⋯ × n_k ways. For k + 1, the (k + 1)th task can be
done in n_{k+1} ways, and for each one of these ways we can complete the job by performing
the remaining k tasks. Thus for each of the n_{k+1} ways we have n_1 × n_2 × ⋯ × n_k ways of completing the job by the induction hypothesis. Thus, the number of ways we can do the job is
$$\underbrace{(1 \times (n_1 \times n_2 \times \cdots \times n_k)) + \cdots + (1 \times (n_1 \times n_2 \times \cdots \times n_k))}_{n_{k+1}\ \text{terms}} = n_1 \times n_2 \times \cdots \times n_k \times n_{k+1}.$$
b. Think of the n variables as n bins. Differentiating with respect to one of the variables is
equivalent to putting a ball in that bin. Thus there are r unlabeled balls to be placed in
n labeled bins, and there are $\binom{n+r-1}{r}$ ways to do this.
1.20 A sample point specifies on which day (1 through 7) each of the 12 calls happens. Thus there
are 7^12 equally likely sample points. There are several different ways that the calls might be
assigned so that there is at least one call each day. There might be 6 calls one day and 1 call
each of the other days. Denote this by 6111111. The number of sample points with this pattern
is $7\binom{12}{6}6!$. There are 7 ways to specify the day with 6 calls. There are $\binom{12}{6}$ ways to specify which of
the 12 calls are on this day. And there are 6! ways of assigning the remaining 6 calls to the
remaining 6 days. We will now count another pattern. There might be 4 calls on one day, 2 calls
on each of two days, and 1 call on each of the remaining four days. Denote this by 4221111.
The number of sample points with this pattern is $7\binom{12}{4}\binom{6}{2}\binom{8}{2}\binom{6}{2}4!$. (7 ways to pick the day with 4
calls, $\binom{12}{4}$ to pick the calls for that day, $\binom{6}{2}$ to pick the two days with two calls, $\binom{8}{2}$ ways to pick
the two calls for the lower numbered day, $\binom{6}{2}$ ways to pick the two calls for the higher numbered day,
and 4! ways to order the remaining 4 calls.) Here is a list of all the possibilities and the counts of the
sample points for each one.
pattern    number of sample points
6111111    $7\binom{12}{6}6!$ = 4,656,960
5211111    $7\cdot 6\binom{12}{5}\binom{7}{2}5!$ = 83,825,280
4221111    $7\binom{12}{4}\binom{6}{2}\binom{8}{2}\binom{6}{2}4!$ = 523,908,000
4311111    $7\cdot 6\binom{12}{4}\binom{8}{3}5!$ = 139,708,800
3321111    $\binom{7}{2}\cdot 5\binom{12}{3}\binom{9}{3}\binom{6}{2}4!$ = 698,544,000
3222111    $7\binom{6}{3}\binom{12}{3}\binom{9}{2}\binom{7}{2}\binom{5}{2}3!$ = 1,397,088,000
2222211    $\binom{7}{5}\binom{12}{2}\binom{10}{2}\binom{8}{2}\binom{6}{2}\binom{4}{2}2!$ = 314,344,800
total      3,162,075,840
The probability is the total number of sample points divided by 7^12, which is 3,162,075,840/7^12 ≈ .2285.
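The whole table can be reproduced mechanically with Python's math.comb; a sketch:

```python
from math import comb, factorial

# One term per occupancy pattern, as in the table above.
counts = {
    "6111111": 7 * comb(12, 6) * factorial(6),
    "5211111": 7 * 6 * comb(12, 5) * comb(7, 2) * factorial(5),
    "4221111": 7 * comb(12, 4) * comb(6, 2) * comb(8, 2) * comb(6, 2) * factorial(4),
    "4311111": 7 * 6 * comb(12, 4) * comb(8, 3) * factorial(5),
    "3321111": comb(7, 2) * 5 * comb(12, 3) * comb(9, 3) * comb(6, 2) * factorial(4),
    "3222111": 7 * comb(6, 3) * comb(12, 3) * comb(9, 2) * comb(7, 2) * comb(5, 2) * factorial(3),
    "2222211": comb(7, 5) * comb(12, 2) * comb(10, 2) * comb(8, 2) * comb(6, 2) * comb(4, 2) * factorial(2),
}
total = sum(counts.values())
print(total)          # 3,162,075,840
print(total / 7**12)  # ~0.2285
```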
1.21 The probability is $\dfrac{\binom{n}{2r}2^{2r}}{\binom{2n}{2r}}$. There are $\binom{2n}{2r}$ ways of choosing 2r shoes from a total of 2n shoes.
Thus there are $\binom{2n}{2r}$ equally likely sample points. The numerator is the number of sample points
for which there will be no matching pair. There are $\binom{n}{2r}$ ways of choosing 2r different shoe
styles. There are two ways of choosing within a given shoe style (left shoe or right shoe), which
gives $2^{2r}$ ways of arranging each one of the $\binom{n}{2r}$ arrays. The product of these is the numerator
$\binom{n}{2r}2^{2r}$.
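A sketch that checks the formula against simulation for one arbitrary choice of n and r:

```python
import random
from math import comb

# P(no matching pair) = C(n, 2r) * 2^(2r) / C(2n, 2r); n and r are arbitrary here.
n, r = 10, 3
exact = comb(n, 2 * r) * 2 ** (2 * r) / comb(2 * n, 2 * r)

shoes = [(style, side) for style in range(n) for side in ("L", "R")]
trials, hits = 100_000, 0
for _ in range(trials):
    sample = random.sample(shoes, 2 * r)
    if len({style for style, _ in sample}) == 2 * r:  # all styles distinct
        hits += 1
print(exact, hits / trials)
```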
1.22 a) $\dfrac{\binom{31}{15}\binom{29}{15}\binom{31}{15}\binom{30}{15}\cdots\binom{31}{15}}{\binom{366}{180}}$   b) $\dfrac{\binom{336}{30}}{\binom{366}{30}} = \dfrac{336\cdot 335\cdots 307}{366\cdot 365\cdots 337}$
1.23
$$P(\text{same number of heads}) = \sum_{x=0}^{n} P(\text{1st tosses } x,\ \text{2nd tosses } x) = \sum_{x=0}^{n}\left[\binom{n}{x}\left(\frac{1}{2}\right)^{x}\left(\frac{1}{2}\right)^{n-x}\right]^{2} = \left(\frac{1}{4}\right)^{n}\sum_{x=0}^{n}\binom{n}{x}^{2}.$$
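By Vandermonde's identity, $\sum_x \binom{n}{x}^2 = \binom{2n}{n}$, so the probability equals $\binom{2n}{n}/4^n$. A quick numeric check for an arbitrary n:

```python
from math import comb

n = 8  # arbitrary
lhs = sum(comb(n, x) ** 2 for x in range(n + 1)) / 4 ** n
rhs = comb(2 * n, n) / 4 ** n
print(lhs, rhs)  # both ~0.196381
```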
1.24 a.
$$P(A \text{ wins}) = \sum_{i=1}^{\infty} P(A \text{ wins on } i\text{th toss}) = \frac{1}{2} + \left(\frac{1}{2}\right)^{2}\frac{1}{2} + \left(\frac{1}{2}\right)^{4}\frac{1}{2} + \cdots = \sum_{i=0}^{\infty}\left(\frac{1}{2}\right)^{2i+1} = 2/3.$$
b. $P(A \text{ wins}) = p + (1-p)^{2}p + (1-p)^{4}p + \cdots = \sum_{i=0}^{\infty} p(1-p)^{2i} = \dfrac{p}{1-(1-p)^{2}}.$
c.
$$\frac{d}{dp}\left(\frac{p}{1-(1-p)^{2}}\right) = \frac{p^{2}}{[1-(1-p)^{2}]^{2}} > 0.$$
Thus the probability is increasing in p, and the minimum is at zero. Using L'Hôpital's rule we find $\lim_{p\to 0} \frac{p}{1-(1-p)^{2}} = 1/2$.
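A simulation sketch of the game in part (b); the values of p and the number of trials are arbitrary:

```python
import random

def a_wins(p: float) -> bool:
    """A and B alternate tosses, A first; the first head wins the game."""
    a_turn = True
    while True:
        if random.random() < p:
            return a_turn
        a_turn = not a_turn

for p in (0.1, 0.5, 0.9):
    trials = 200_000
    est = sum(a_wins(p) for _ in range(trials)) / trials
    print(p, est, p / (1 - (1 - p) ** 2))  # estimate vs. p / (1 - (1-p)^2)
```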
1.25 Enumerating the sample space gives S′ = {(B,B), (B,G), (G,B), (G,G)}, with each outcome
equally likely. Thus P(at least one boy) = 3/4 and P(both are boys) = 1/4, therefore
P(both are boys | at least one boy) = (1/4)/(3/4) = 1/3. An ambiguity may arise if order is not
acknowledged; then the space is S′ = {(B,B), (B,G), (G,G)}, with each outcome equally likely.
1.27 a. For n odd the proof is straightforward. There are an even number of terms in the sum
(0, 1, ⋯, n), and $\binom{n}{k}$ and $\binom{n}{n-k}$, which are equal, have opposite signs. Thus, all pairs cancel
and the sum is zero. If n is even, use the following identity, which is the basis of Pascal's
triangle: For k > 0, $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$. Then, for n even,
$$\sum_{k=0}^{n}(-1)^{k}\binom{n}{k} = \binom{n}{0} + \binom{n}{n} + \sum_{k=1}^{n-1}(-1)^{k}\binom{n}{k} = \binom{n}{0} + \binom{n}{n} + \sum_{k=1}^{n-1}(-1)^{k}\left[\binom{n-1}{k} + \binom{n-1}{k-1}\right] = \binom{n}{0} + \binom{n}{n} - \binom{n-1}{0} - \binom{n-1}{n-1} = 0.$$
b. Use the fact that for k > 0, $k\binom{n}{k} = n\binom{n-1}{k-1}$, to write
$$\sum_{k=1}^{n} k\binom{n}{k} = n\sum_{k=1}^{n}\binom{n-1}{k-1} = n\sum_{j=0}^{n-1}\binom{n-1}{j} = n2^{n-1}.$$
c.
$$\sum_{k=1}^{n}(-1)^{k+1}k\binom{n}{k} = n\sum_{k=1}^{n}(-1)^{k+1}\binom{n-1}{k-1} = n\sum_{j=0}^{n-1}(-1)^{j}\binom{n-1}{j} = 0 \quad\text{from part a).}$$
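All three identities can be checked numerically; a sketch:

```python
from math import comb

# Check parts a, b, c of 1.27 for a range of n (part c needs n >= 2).
for n in range(2, 12):
    a = sum((-1) ** k * comb(n, k) for k in range(n + 1))                # = 0
    b = sum(k * comb(n, k) for k in range(1, n + 1)) - n * 2 ** (n - 1)  # = 0
    c = sum((-1) ** (k + 1) * k * comb(n, k) for k in range(1, n + 1))   # = 0
    print(n, a, b, c)
```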
1.28 The average of the two integrals is
$$\left[(n\log n - n) + ((n+1)\log(n+1) - n)\right]/2 \approx \left(n + \frac{1}{2}\right)\log n - n.$$
Let $d_n = \log n! - [(n + 1/2)\log n - n]$, and we want to show that $\lim_{n\to\infty} d_n = c$, a constant.
This would complete the problem, since the desired limit is the exponential of this one. This
is accomplished in an indirect way, by working with differences, which avoids dealing with the
factorial. Note that
$$d_n - d_{n+1} = \left(n + \frac{1}{2}\right)\log\left(1 + \frac{1}{n}\right) - 1.$$
Differentiation will show that $(n + \frac{1}{2})\log(1 + \frac{1}{n})$ is decreasing in n and bounded below by its limit 1 (at n = 1 it equals (3/2) log 2 ≈ 1.04). Thus d_n − d_{n+1} > 0. Next recall the Taylor expansion
$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots.$$
The first three terms provide an upper bound on log(1 + x), as the remaining adjacent pairs are negative. Hence
$$0 < d_n - d_{n+1} < \left(n + \frac{1}{2}\right)\left(\frac{1}{n} - \frac{1}{2n^2} + \frac{1}{3n^3}\right) - 1 = \frac{1}{12n^2} + \frac{1}{6n^3}.$$
It therefore follows, by the comparison test, that the series $\sum_{n=1}^{\infty}(d_n - d_{n+1})$ converges. Moreover, the partial sums must approach a limit. Hence, since the sum telescopes,
$$\lim_{N\to\infty}\sum_{n=1}^{N}(d_n - d_{n+1}) = \lim_{N\to\infty}(d_1 - d_{N+1}) = c.$$
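A numeric illustration: d_n settles down quickly, and Stirling's formula identifies the constant as c = log √(2π) (that identification comes from Stirling's result, not from the argument above).

```python
import math

# d_n = log n! - [(n + 1/2) log n - n]; it should converge to a constant c.
for n in (1, 10, 100, 1000, 10000):
    d = math.lgamma(n + 1) - ((n + 0.5) * math.log(n) - n)
    print(n, d)

print(math.log(math.sqrt(2 * math.pi)))  # Stirling: c ~ 0.9189
```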
$$\frac{n!}{k_1!k_2!\cdots k_m!}\cdot\frac{1}{n^n} < \frac{n!}{n^n}, \quad\text{since } k_1!k_2!\cdots k_m! > 1.$$
Therefore the outcome with average $\frac{x_1 + x_2 + \cdots + x_n}{n}$ is the most likely.
b. Stirling's approximation is that, as $n \to \infty$, $n! \approx \sqrt{2\pi}\, n^{n+(1/2)} e^{-n}$, and thus
$$\frac{n!/n^n}{\sqrt{2n\pi}\,e^{-n}} = \frac{\sqrt{2\pi}\, n^{n+(1/2)} e^{-n}}{\sqrt{2n\pi}\, n^{n} e^{-n}} = \frac{\sqrt{2\pi n}}{\sqrt{2n\pi}} = 1.$$
c. Since we are drawing with replacement from the set {x_1, . . . , x_n}, the probability of choosing
any x_i is 1/n. Therefore the probability of obtaining an ordered sample of size n without x_i
is (1 − 1/n)^n. To prove that $\lim_{n\to\infty}(1 - 1/n)^n = e^{-1}$, calculate the limit of the log. That is,
$$\lim_{n\to\infty} n\log\left(1 - \frac{1}{n}\right) = \lim_{n\to\infty}\frac{\log\left(1 - \frac{1}{n}\right)}{1/n}.$$
L'Hôpital's rule shows that the limit is −1, establishing the result. See also Lemma 2.3.14.
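A quick numeric illustration of this limit:

```python
import math

for n in (10, 100, 1000, 100000):
    print(n, (1 - 1 / n) ** n)
print(math.exp(-1))  # ~0.367879
```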
1.32 This is most easily seen by doing each possibility. Let P (i) = probability that the candidate
hired on the ith trial is best. Then
$$P(1) = \frac{1}{N},\quad P(2) = \frac{1}{N-1},\quad \dots,\quad P(i) = \frac{1}{N-i+1},\quad \dots,\quad P(N) = 1.$$
1.33 Using Bayes' rule,
$$P(M \mid CB) = \frac{P(CB \mid M)P(M)}{P(CB \mid M)P(M) + P(CB \mid F)P(F)} = \frac{.05 \times \frac{1}{2}}{.05 \times \frac{1}{2} + .0025 \times \frac{1}{2}} = .9524.$$
1.34 a.
$$P(\text{Brown Hair}) = P(\text{Brown Hair} \mid \text{Litter 1})P(\text{Litter 1}) + P(\text{Brown Hair} \mid \text{Litter 2})P(\text{Litter 2}) = \left(\frac{2}{3}\right)\left(\frac{1}{2}\right) + \left(\frac{3}{5}\right)\left(\frac{1}{2}\right) = \frac{19}{30}.$$
b. Use Bayes' Theorem:
$$P(\text{Litter 1} \mid \text{Brown Hair}) = \frac{P(BH \mid L1)P(L1)}{P(BH \mid L1)P(L1) + P(BH \mid L2)P(L2)} = \frac{\left(\frac{2}{3}\right)\left(\frac{1}{2}\right)}{\frac{19}{30}} = \frac{10}{19}.$$
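Both parts are one application of Bayes' rule; a sketch that reproduces 19/30 and 10/19:

```python
# Bayes' rule for 1.34: P(L1) = P(L2) = 1/2, P(BH|L1) = 2/3, P(BH|L2) = 3/5.
p_l1 = p_l2 = 1 / 2
p_bh_l1, p_bh_l2 = 2 / 3, 3 / 5

p_bh = p_bh_l1 * p_l1 + p_bh_l2 * p_l2  # part a: 19/30
p_l1_bh = p_bh_l1 * p_l1 / p_bh         # part b: 10/19
print(p_bh, 19 / 30)
print(p_l1_bh, 10 / 19)
```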
1.35 Clearly P(· | B) ≥ 0, and P(S | B) = 1. If A_1, A_2, . . . are disjoint, then
$$P\left(\bigcup_{i=1}^{\infty} A_i \,\Big|\, B\right) = \frac{P\left(\bigcup_{i=1}^{\infty} A_i \cap B\right)}{P(B)} = \frac{P\left(\bigcup_{i=1}^{\infty} (A_i \cap B)\right)}{P(B)} = \frac{\sum_{i=1}^{\infty} P(A_i \cap B)}{P(B)} = \sum_{i=1}^{\infty} P(A_i \mid B).$$
1.37 a.
$$P(W) = P(W \mid A)P(A) + P(W \mid B)P(B) + P(W \mid C)P(C) = \gamma\left(\frac{1}{3}\right) + 0\left(\frac{1}{3}\right) + 1\left(\frac{1}{3}\right) = \frac{\gamma + 1}{3}.$$
Thus,
$$P(A \mid W) = \frac{P(A \cap W)}{P(W)} = \frac{\gamma/3}{(\gamma+1)/3} = \frac{\gamma}{\gamma+1},$$
where
$$\frac{\gamma}{\gamma+1} = \frac{1}{3} \text{ if } \gamma = \frac{1}{2}, \qquad \frac{\gamma}{\gamma+1} < \frac{1}{3} \text{ if } \gamma < \frac{1}{2}, \qquad \frac{\gamma}{\gamma+1} > \frac{1}{3} \text{ if } \gamma > \frac{1}{2}.$$
1.38 a. If P(B) = 1, then
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)}{1} = P(A),$$
since P(A ∩ B) = P(A) − P(A ∩ B^c) = P(A), because P(A ∩ B^c) ≤ P(B^c) = 0.
b. A ⊂ B implies A ∩ B = A. Thus,
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A)}{P(A)} = 1.$$
And also,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)}{P(B)}.$$
c. If A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B) and A ∩ (A ∪ B) = A. Thus,
$$P(A \mid A \cup B) = \frac{P(A \cap (A \cup B))}{P(A \cup B)} = \frac{P(A)}{P(A) + P(B)}.$$
1.41 a.
$$P(\text{dash sent} \mid \text{dash rec}) = \frac{P(\text{dash rec} \mid \text{dash sent})P(\text{dash sent})}{P(\text{dash rec} \mid \text{dash sent})P(\text{dash sent}) + P(\text{dash rec} \mid \text{dot sent})P(\text{dot sent})} = \frac{(2/3)(4/7)}{(2/3)(4/7) + (1/4)(3/7)} = \frac{32}{41}.$$
b. By a similar calculation as the one in (a), P(dot sent | dot rec) = 27/43. Then we have
P(dash sent | dot rec) = 16/43. Given that dot-dot was received, the distribution of the four
possibilities of what was sent is
Event Probability
dash-dash (16/43)2
dash-dot (16/43)(27/43)
dot-dash (27/43)(16/43)
dot-dot (27/43)2
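A sketch reproducing these posteriors exactly with fractions; the 3:4 dot-to-dash ratio and the error probabilities 1/4 (dot to dash) and 1/3 (dash to dot) come from the exercise statement:

```python
from fractions import Fraction as F

p_dot_sent, p_dash_sent = F(3, 7), F(4, 7)
p_dotrec_given_dot, p_dotrec_given_dash = F(3, 4), F(1, 3)

p_dot_rec = p_dotrec_given_dot * p_dot_sent + p_dotrec_given_dash * p_dash_sent
post_dot = p_dotrec_given_dot * p_dot_sent / p_dot_rec  # P(dot sent | dot rec)
post_dash = 1 - post_dot                                # P(dash sent | dot rec)
print(post_dot, post_dash)                 # 27/43, 16/43
print(post_dash**2, post_dash * post_dot)  # dash-dash, dash-dot given dot-dot
print(post_dot * post_dash, post_dot**2)   # dot-dash, dot-dot given dot-dot
```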
1.43 a. For Boole's Inequality,
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - P_2 + P_3 - \cdots \pm P_n \le \sum_{i=1}^{n} P(A_i),$$
since P_i ≥ P_j if i ≤ j and therefore the terms −P_{2k} + P_{2k+1} ≤ 0 for k = 1, . . . , (n−1)/2 when
n is odd. When n is even the last term to consider is −P_n ≤ 0. For Bonferroni's Inequality
apply the inclusion-exclusion identity to the A_i^c, and use the argument leading to (1.2.10).
b. We illustrate the proof that the P_i are decreasing by showing that P_2 ≥ P_3. The other
arguments are similar. Write
$$P_2 = \sum_{1\le i<j\le n} P(A_i \cap A_j) = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} P(A_i \cap A_j) = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left[\sum_{k=1}^{n} P(A_i \cap A_j \cap A_k) + P\big(A_i \cap A_j \cap (\cup_k A_k)^c\big)\right]$$
$$\ge \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\sum_{k=j+1}^{n} P(A_i \cap A_j \cap A_k) = \sum_{1\le i<j<k\le n} P(A_i \cap A_j \cap A_k) = P_3.$$
The sequence of bounds is improving because the upper bounds P_1, P_1 − P_2 + P_3, P_1 − P_2 + P_3 − P_4 + P_5, . . . are getting smaller, since P_i ≥ P_j if i ≤ j and therefore the terms −P_{2k} + P_{2k+1} ≤ 0. The lower bounds P_1 − P_2, P_1 − P_2 + P_3 − P_4, P_1 − P_2 + P_3 − P_4 + P_5 − P_6, . . . are getting bigger, since P_i ≥ P_j if i ≤ j and therefore the terms P_{2k+1} − P_{2k+2} ≥ 0.
c. If all of the A_i are equal, all of the probabilities in the inclusion-exclusion identity are the
same. Thus
$$P_1 = nP(A),\quad P_2 = \binom{n}{2}P(A),\quad \dots,\quad P_j = \binom{n}{j}P(A),$$
and the sequence of upper bounds on P(∪_i A_i) = P(A) becomes
$$P_1 = nP(A),\quad P_1 - P_2 + P_3 = \left[n - \binom{n}{2} + \binom{n}{3}\right]P(A),\quad \dots$$
which eventually sum to one, so the last bound is exact. For the lower bounds we get
$$P_1 - P_2 = \left[n - \binom{n}{2}\right]P(A),\quad P_1 - P_2 + P_3 - P_4 = \left[n - \binom{n}{2} + \binom{n}{3} - \binom{n}{4}\right]P(A),\quad \dots$$
which start out negative, then become positive, with the last one equaling P(A) (see Schwager 1984 for details).
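A numeric sketch of the alternating bounds, using small random sets under a uniform probability (all choices here are arbitrary):

```python
import itertools
import random

# Partial sums of inclusion-exclusion alternate between upper and lower
# bounds on P(union A_i); check for random sets A_i in a uniform space.
random.seed(1)
omega_size, n = 30, 4
sets = [frozenset(random.sample(range(omega_size), 12)) for _ in range(n)]
p = lambda s: len(s) / omega_size

p_union = p(frozenset().union(*sets))
P = [sum(p(frozenset.intersection(*c))
         for c in itertools.combinations(sets, j))
     for j in range(1, n + 1)]  # P_1, ..., P_n

partial, sign = 0.0, 1
for j, term in enumerate(P, start=1):
    partial += sign * term
    rel = ">=" if j % 2 == 1 else "<="  # odd: upper bound, even: lower bound
    print(f"sum through P_{j} = {partial:.4f} {rel} P(union) = {p_union:.4f}")
    sign = -sign
```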
1.44 $P(\text{at least 10 correct} \mid \text{guessing}) = \sum_{k=10}^{20}\binom{20}{k}\left(\frac{1}{4}\right)^{k}\left(\frac{3}{4}\right)^{20-k} = .01386.$
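The same tail probability in Python:

```python
from math import comb

p = sum(comb(20, k) * (1 / 4) ** k * (3 / 4) ** (20 - k) for k in range(10, 21))
print(p)  # ~0.01386
```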
1.45 X is finite. Therefore B is the set of all subsets of X. We must verify each of the three properties
in Definition 1.2.4. (1) If A ∈ B then P_X(A) = P(∪_{x_i∈A}{s_j ∈ S : X(s_j) = x_i}) ≥ 0 since P
is a probability function. (2) P_X(X) = P(∪_{i=1}^{m}{s_j ∈ S : X(s_j) = x_i}) = P(S) = 1. (3) If
A_1, A_2, . . . ∈ B are pairwise disjoint, then
$$P_X\left(\bigcup_{k=1}^{\infty} A_k\right) = P\left(\bigcup_{k=1}^{\infty}\big\{\cup_{x_i \in A_k}\{s_j \in S : X(s_j) = x_i\}\big\}\right) = \sum_{k=1}^{\infty} P\left(\cup_{x_i \in A_k}\{s_j \in S : X(s_j) = x_i\}\right) = \sum_{k=1}^{\infty} P_X(A_k),$$
where the second equality follows from the fact that P is a probability function.
1.46 This is similar to Exercise 1.20. There are 7^7 equally likely sample points. The possible values of
X3 are 0, 1 and 2. Only the pattern 331 (3 balls in one cell, 3 balls in another cell and 1 ball in
a third cell) yields X3 = 2. The number of sample points with this pattern is $\binom{7}{2}\cdot 5\binom{7}{3}\binom{4}{3}$ = 14,700.
So P(X3 = 2) = 14,700/7^7 ≈ .0178. There are 4 patterns that yield X3 = 1. The number of
sample points that give each of these patterns is given below.

pattern    number of sample points
34         $7\cdot 6\binom{7}{3}$ = 1,470
322        $7\binom{6}{2}\binom{7}{3}\binom{4}{2}$ = 22,050
3211       $7\cdot 6\binom{5}{2}\binom{7}{3}\binom{4}{2}2!$ = 176,400
31111      $7\binom{6}{4}\binom{7}{3}4!$ = 88,200
total      288,120

So P(X3 = 1) = 288,120/7^7 ≈ .3498. The number of sample points that yield X3 = 0 is
7^7 − 288,120 − 14,700 = 520,723, and P(X3 = 0) = 520,723/7^7 ≈ .6322.
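Since 7^7 = 823,543 is small, the entire sample space can be enumerated to confirm these counts; a sketch:

```python
from itertools import product
from collections import Counter

# Drop 7 balls into 7 cells; X3 = number of cells containing exactly 3 balls.
dist = Counter()
for assignment in product(range(7), repeat=7):  # 7^7 = 823,543 outcomes
    occupancy = Counter(assignment)
    x3 = sum(1 for c in occupancy.values() if c == 3)
    dist[x3] += 1

total = 7 ** 7
for x in sorted(dist):
    print(x, dist[x], dist[x] / total)
# expected: 0 -> 520,723; 1 -> 288,120; 2 -> 14,700
```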
1.47 All of the functions are continuous, hence right-continuous. Thus we only need to check the
limits, and that they are nondecreasing.
a. $\lim_{x\to-\infty}\left(\tfrac{1}{2} + \tfrac{1}{\pi}\tan^{-1}(x)\right) = 0$, $\lim_{x\to\infty}\left(\tfrac{1}{2} + \tfrac{1}{\pi}\tan^{-1}(x)\right) = 1$, and $\tfrac{d}{dx}\left(\tfrac{1}{2} + \tfrac{1}{\pi}\tan^{-1}(x)\right) = \tfrac{1}{\pi(1+x^2)} > 0$, so F(x) is increasing.
b. See Example 1.5.5.
c. $\lim_{x\to-\infty} e^{-e^{-x}} = 0$, $\lim_{x\to\infty} e^{-e^{-x}} = 1$, and $\tfrac{d}{dx}\, e^{-e^{-x}} = e^{-x} e^{-e^{-x}} > 0$.
d. With F(x) = 0 for x < 0: $\lim_{x\to-\infty} F(x) = 0$, $\lim_{x\to\infty}(1 - e^{-x}) = 1$, and for x > 0, $\tfrac{d}{dx}(1 - e^{-x}) = e^{-x} > 0$.
e. $\lim_{y\to-\infty}\frac{1-\epsilon}{1+e^{-y}} = 0$, $\lim_{y\to\infty}\left(\epsilon + \frac{1-\epsilon}{1+e^{-y}}\right) = 1$, $\frac{d}{dy}\,\frac{1-\epsilon}{1+e^{-y}} = \frac{(1-\epsilon)e^{-y}}{(1+e^{-y})^2} > 0$, and $\frac{d}{dy}\left(\epsilon + \frac{1-\epsilon}{1+e^{-y}}\right) > 0$. F_Y(y) is continuous except at y = 0, where $\lim_{y\downarrow 0}\left(\epsilon + \frac{1-\epsilon}{1+e^{-y}}\right) = F_Y(0)$. Thus F_Y(y) is right-continuous.
1.48 If F(·) is a cdf, F(x) = P(X ≤ x). Hence lim_{x→−∞} P(X ≤ x) = 0 and lim_{x→∞} P(X ≤ x) = 1.
F(x) is nondecreasing since the set {x : X ≤ x} is nondecreasing in x. Lastly, as x ↓ x_0,
P(X ≤ x) → P(X ≤ x_0), so F(·) is right-continuous. (This is merely a consequence of defining
F(x) with "≤".)
1.49 For every t, F_X(t) ≤ F_Y(t). Thus we have
P(X > t) = 1 − P(X ≤ t) = 1 − F_X(t) ≥ 1 − F_Y(t) = 1 − P(Y ≤ t) = P(Y > t).
And for some t*, F_X(t*) < F_Y(t*). Then we have that
P(X > t*) = 1 − F_X(t*) > 1 − F_Y(t*) = P(Y > t*).
1.54 a. $\int_0^{\pi/2} \sin x \, dx = 1$. Thus, c = 1/1 = 1.
b. $\int_{-\infty}^{\infty} e^{-|x|}\,dx = \int_{-\infty}^{0} e^{x}\,dx + \int_{0}^{\infty} e^{-x}\,dx = 1 + 1 = 2$. Thus, c = 1/2.
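Both normalizing constants can be confirmed by numerical integration; this sketch assumes SciPy is available:

```python
import numpy as np
from scipy.integrate import quad

area_a, _ = quad(np.sin, 0, np.pi / 2)
area_b, _ = quad(lambda x: np.exp(-abs(x)), -np.inf, np.inf)
print(area_a)  # 1.0 -> c = 1
print(area_b)  # 2.0 -> c = 1/2
```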
1.55
$$P(V \le 5) = P(T < 3) = \int_0^3 \frac{1}{1.5}\, e^{-t/1.5}\,dt = 1 - e^{-2}.$$
For v ≥ 6,
$$P(V \le v) = P(2T \le v) = P\left(T \le \frac{v}{2}\right) = \int_0^{v/2} \frac{1}{1.5}\, e^{-t/1.5}\,dt = 1 - e^{-v/3}.$$
Therefore,
$$P(V \le v) = \begin{cases} 0 & -\infty < v < 5, \\ 1 - e^{-2} & 5 \le v < 6, \\ 1 - e^{-v/3} & 6 \le v. \end{cases}$$
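A simulation sketch of the cdf of V; the sample size and evaluation points are arbitrary:

```python
import math
import random

def cdf(v: float) -> float:
    """cdf of V derived above."""
    if v < 5:
        return 0.0
    if v < 6:
        return 1 - math.exp(-2)
    return 1 - math.exp(-v / 3)

N = 200_000
vs = []
for _ in range(N):
    t = random.expovariate(1 / 1.5)  # T ~ exponential with mean 1.5
    vs.append(5 if t < 3 else 2 * t)

for v in (4, 5, 5.5, 6, 8, 12):
    emp = sum(val <= v for val in vs) / N
    print(v, emp, cdf(v))
```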