
Anne Bio-Stat Assignment Edited

Uploaded by Tsion Moges

1. Classify each of the following variables first as qualitative or quantitative. If quantitative, classify it further as discrete or continuous. Next, classify each variable as nominal, ordinal, interval or ratio level of measurement.

i. Classification of students by region of birth.......... qualitative, nominal

ii. Your ranking preference for four soft drinks.......... qualitative, ordinal

iii. Temperature measured in °C.......... quantitative, continuous, interval

iv. Number of children in a family.......... quantitative, discrete, ratio

v. Socioeconomic status of a family when classified as low, middle and upper.......... qualitative, ordinal

2. Sixty students from a certain school took part in a survey to discover how many hours a week they spent studying. The data are recorded below:

18 16 22 17 28 31 9 19 8 5 34 26 22 29 37 20 45 7 19 15 38 15 20 10

26 29 24 41 36 20 28 15 9 6 42 32 27 6 28 16 18 28 12 8 33 31 15 38

21 27 31 15 28 21 18 22 39 11 28 18


A. Calculate the mean and median of the data

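Mean = 1357/60 = 22.62

Median: since n = 60 is even, the median is the average of the 30th and 31st values of the sorted data. The 19th to 33rd sorted values are

17, 18, 18, 18, 18, 19, 19, 20, 20, 20, 21, 21, 22, 22, 22

so the 30th value is 21, the 31st is 22, and Median = (21 + 22)/2 = 21.5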
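As a cross-check of part A, the raw mean and median can be recomputed with Python's standard `statistics` module. This is a minimal sketch; the list name `hours` is illustrative, and the values are copied from the data above.

```python
from statistics import mean, median

# Weekly study hours of the 60 students, copied from the data above
hours = [
    18, 16, 22, 17, 28, 31, 9, 19, 8, 5, 34, 26, 22, 29, 37, 20, 45, 7, 19, 15, 38, 15, 20, 10,
    26, 29, 24, 41, 36, 20, 28, 15, 9, 6, 42, 32, 27, 6, 28, 16, 18, 28, 12, 8, 33, 31, 15, 38,
    21, 27, 31, 15, 28, 21, 18, 22, 39, 11, 28, 18,
]

raw_mean = mean(hours)      # sum(hours) / len(hours)
raw_median = median(hours)  # n is even, so this averages the 30th and 31st sorted values

print(len(hours), sum(hours), round(raw_mean, 2), raw_median)
```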

B. Find the grouped frequency distribution of the data using the class intervals 5-10, 11-16, 17-22, 23-28, 29-34, 35-40 and 41-46.

Class    Frequency (fi)    Cumulative frequency    Midpoint (Mi)    Mi·fi
5-10     9                 9                       7.5              67.5
11-16    9                 18                      13.5             121.5
17-22    15                33                      19.5             292.5
23-28    11                44                      25.5             280.5
29-34    8                 52                      31.5             252
35-40    5                 57                      37.5             187.5
41-46    3                 60                      43.5             130.5
Total    60                                                         1332
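The grouped table can also be rebuilt directly from the raw data, which is a quick way to catch counting slips in individual classes. A minimal sketch, assuming both class limits are inclusive (so 10 falls in 5-10 and 11 in 11-16); the names `classes` and `freqs` are illustrative.

```python
from itertools import accumulate

# Weekly study hours of the 60 students, as listed in the question
hours = [
    18, 16, 22, 17, 28, 31, 9, 19, 8, 5, 34, 26, 22, 29, 37, 20, 45, 7, 19, 15, 38, 15, 20, 10,
    26, 29, 24, 41, 36, 20, 28, 15, 9, 6, 42, 32, 27, 6, 28, 16, 18, 28, 12, 8, 33, 31, 15, 38,
    21, 27, 31, 15, 28, 21, 18, 22, 39, 11, 28, 18,
]

# Class limits from part B; both limits are inclusive and the width is 6
classes = [(5, 10), (11, 16), (17, 22), (23, 28), (29, 34), (35, 40), (41, 46)]

freqs = [sum(lo <= h <= hi for h in hours) for lo, hi in classes]  # count values in each class
cum_freqs = list(accumulate(freqs))                                # running totals
midpoints = [(lo + hi) / 2 for lo, hi in classes]                  # Mi
products = [m * f for m, f in zip(midpoints, freqs)]               # Mi * fi

for (lo, hi), f, cf, m, mf in zip(classes, freqs, cum_freqs, midpoints, products):
    print(f"{lo}-{hi}\t{f}\t{cf}\t{m}\t{mf}")
print("Total", sum(freqs), sum(products))
```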

C. Estimate the mean and median based on the grouped data.

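Mean = Σ(Mi·fi)/Σfi = 1332/60 = 22.2

To locate the median class, compute n/2 = 60/2 = 30. The cumulative frequency first reaches 30 in the third class (17-22), so 17-22 is the median class, with lower class boundary Lm = 16.5, cumulative frequency before it fc = 18, frequency fm = 15 and class width W = 6.

Median = Lm + [(n/2 - fc)/fm] × W
Median = 16.5 + [(30 - 18)/15] × 6
Median = 16.5 + 72/15 = 16.5 + 4.8
Median = 21.3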
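The grouped mean and median formulas above can be sketched as follows, assuming integer-valued data so that each lower class boundary is the lower limit minus 0.5 (for example, 17 becomes 16.5). The function name `grouped_median` is illustrative, not from a particular library.

```python
# Grouped data from the frequency table: lower class limits, common width, frequencies
lower_limits = [5, 11, 17, 23, 29, 35, 41]
width = 6
freqs = [9, 9, 15, 11, 8, 5, 3]

n = sum(freqs)
midpoints = [lo + (width - 1) / 2 for lo in lower_limits]  # e.g. (5 + 10) / 2 = 7.5
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n


def grouped_median(lower_limits, width, freqs):
    """Median = Lm + ((n/2 - fc) / fm) * W, with Lm the lower class boundary."""
    n = sum(freqs)
    cum = 0  # cumulative frequency of the classes before the current one (fc)
    for lo, f in zip(lower_limits, freqs):
        if cum + f >= n / 2:      # first class whose cumulative frequency reaches n/2
            boundary = lo - 0.5   # lower class boundary, e.g. 17 -> 16.5
            return boundary + (n / 2 - cum) / f * width
        cum += f
    raise ValueError("empty distribution")


print(round(grouped_mean, 2), round(grouped_median(lower_limits, width, freqs), 2))
```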
D. Compare the answers of parts (a) and (c). How close are the estimates of the mean and
median based on the grouped data to the true mean and median?
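The grouped estimates are close to the true values: the grouped mean (22.2) is about 0.4 below the true mean (about 22.6), and the grouped median (21.3) is 0.2 below the true median (21.5). In both cases the estimates agree with the true values in the whole-number part and differ only from the tenths place onward.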

3. If a variable takes the values x+4, x-7, x-5, x-3, x-2, x+1, x-1 and x+6, where x is any real number, find the median.

Arranged in ascending order, the values are

x-7, x-5, x-3, x-2, x-1, x+1, x+4, x+6

There are 8 values, so the median is the average of the 4th and 5th values:

Median = [(x-2) + (x-1)]/2 = (2x-3)/2 = x - 3/2 = x - 1.5
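A quick numerical spot-check of this result: for any chosen x, the median of the eight values should equal x - 1.5. The helper `median_of_shifts` is a hypothetical name used only for this sketch.

```python
from statistics import median


def median_of_shifts(x):
    # The eight values from the question, each written relative to x
    values = [x + 4, x - 7, x - 5, x - 3, x - 2, x + 1, x - 1, x + 6]
    return median(values)  # statistics.median sorts the values internally


for x in (0, 2.5, -10, 100):
    print(x, median_of_shifts(x), x - 1.5)
```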


4. An examiner decided to use a midterm test, a final exam and a laboratory practical. The practical counts twice as much as the midterm test, and the final exam counts three times as much as the midterm test. A student obtained 72 on the midterm test, 66 on the practical and 58 on the final exam. Calculate the average mark of the student.

Let M, L and F be the weights of the midterm test, the laboratory practical and the final exam, with M + L + F = 100%, L = 2M and F = 3M.

M + 2M + 3M = 100%, so 6M = 100%, giving M = 16.67% (1/6), L = 33.33% (1/3) and F = 50% (1/2).

Weighted contributions:
Midterm: 72 × 1/6 = 12
Practical: 66 × 1/3 = 22
Final exam: 58 × 1/2 = 29

Average mark = 12 + 22 + 29 = 63
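The same answer follows from the general weighted-mean formula, Σ(wᵢ·markᵢ)/Σwᵢ, with relative weights 1 : 2 : 3. A minimal sketch; `weighted_average` is an illustrative helper, not a library function.

```python
def weighted_average(marks, weights):
    """Weighted mean: sum(w * mark) / sum(w)."""
    return sum(w * m for m, w in zip(marks, weights)) / sum(weights)


# Relative weights from the question: practical = 2 x midterm, final = 3 x midterm
marks = {"midterm": 72, "practical": 66, "final": 58}
weights = {"midterm": 1, "practical": 2, "final": 3}

avg = weighted_average([marks[k] for k in marks], [weights[k] for k in marks])
print(avg)
```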

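5. Two groups of students were trained to perform a certain task and tested to find out which group learns the task faster. For the two groups the following information was given (values as used in the calculations below): Group I: n = 12, mean = 25, SD = 4; Group II: n = 8, mean = 15, SD = 3.

A. Combined mean:
$$\bar{X}_c = \frac{(12 \times 25) + (8 \times 15)}{12 + 8} = \frac{420}{20} = 21$$

B. Group I: coefficient of variation
$$CV = \frac{SD}{\bar{X}} \times 100 = \frac{4}{25} \times 100 = 16\%$$
Group II: coefficient of variation
$$CV = \frac{SD}{\bar{X}} \times 100 = \frac{3}{15} \times 100 = 20\%$$
Group I has the smaller coefficient of variation, so its results are relatively less variable (more consistent) than those of Group II.

C. Student "A" in Group I: Z-score
$$Z = \frac{X - \bar{X}}{SD} = \frac{28 - 25}{4} = 0.75$$
Student "B" in Group II: Z-score
$$Z = \frac{X - \bar{X}}{SD} = \frac{17 - 15}{3} = 0.67$$
Student "B" lies fewer standard deviations above their group mean than student "A" does, so, relative to their own groups, student "B" is faster in performing the task than student "A".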
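The three calculations in this question can be reproduced in a few lines. This sketch assumes the summary values read off the solution (Group I: n = 12, mean 25, SD 4; Group II: n = 8, mean 15, SD 3) and the two students' values 28 and 17; the variable names are illustrative.

```python
# Summary values used in the solution: (sample size, mean, SD) for each group
n1, mean1, sd1 = 12, 25, 4
n2, mean2, sd2 = 8, 15, 3

# Combined mean: weight each group mean by its sample size
combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Coefficient of variation in percent: relative spread, SD / mean * 100
cv1 = sd1 / mean1 * 100
cv2 = sd2 / mean2 * 100


def z_score(x, mean, sd):
    """Number of standard deviations x lies from its group mean."""
    return (x - mean) / sd


z_a = z_score(28, mean1, sd1)  # student A, Group I
z_b = z_score(17, mean2, sd2)  # student B, Group II
print(combined_mean, round(cv1, 1), round(cv2, 1), round(z_a, 2), round(z_b, 2))
```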

6. The frequency distribution of the scores (out of 15) of nursing students on a Basic Biostatistics test is presented in the following table, where A is a missing frequency. The average score of the class is 8.

Score (class interval)    Midpoint (mi)    Frequency (fi)    mi·fi     Class boundaries
1-3                       2                1                 2         0.5-3.5
4-6                       5                3                 15        3.5-6.5
7-9                       8                5                 40        6.5-9.5
10-12                     11               A                 11A       9.5-12.5
13-15                     14               2                 28        12.5-15.5
Sum                                        11+A              85+11A

$$\text{Mean} = \frac{\sum m_i f_i}{\sum f_i} = \frac{85 + 11A}{11 + A} = 8$$

Hence 85 + 11A = 8(11 + A) = 88 + 8A, so 3A = 3 and A = 1. The missing frequency is 1.
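Because the mean equation is linear in A once both sides are multiplied by (11 + A), the missing frequency can be solved for directly: A = (8 × 11 - 85)/(11 - 8). A minimal sketch using exact fractions; marking the unknown frequency with `None` is an illustrative convention.

```python
from fractions import Fraction

# Class midpoints and frequencies; None marks the unknown frequency A
midpoints = [2, 5, 8, 11, 14]
freqs = [1, 3, 5, None, 2]
target_mean = 8

known_f = sum(f for f in freqs if f is not None)                          # 11
known_mf = sum(m * f for m, f in zip(midpoints, freqs) if f is not None)  # 85
m_missing = midpoints[freqs.index(None)]                                   # 11

# (known_mf + m_missing * A) / (known_f + A) = target_mean, rearranged for A
A = Fraction(target_mean * known_f - known_mf, m_missing - target_mean)
print(A)
```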

7. The following table shows the results of a screening test evaluation in which a random
sample of 650 subjects with the disease and an independent sample of 2550 subjects
without the disease participated.

Test result    Disease present    Disease absent
Positive       538                1125
Negative       112                1425
Total          650                2550

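a) What is the probability that the test result is positive given that a randomly chosen person has the disease?

Let A = test positive, disease present (538); B = test negative, disease present (112); C = test positive, disease absent (1125); D = test negative, disease absent (1425).

P(positive | disease present) = A/(A + B) = 538/650 = 0.83

b) What is the probability that the test result is negative given that a randomly chosen person does not have the disease?

P(negative | disease absent) = D/(C + D) = 1425/2550 = 0.56

c) What is the probability that a randomly chosen person has the disease given that the test result is positive?

P(disease present | positive) = A/(A + C) = 538/(538 + 1125) = 538/1663 = 0.32

Because the diseased and non-diseased subjects were sampled separately, this value treats the sample proportion with disease (650/3200) as the prevalence.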
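These three conditional probabilities are the test's sensitivity, specificity and positive predictive value. A minimal sketch using the cell labels above (written in lowercase); the variable names are illustrative.

```python
# Cells of the 2x2 table: rows = test result, columns = true disease status
a = 538   # test positive, disease present
b = 112   # test negative, disease present
c = 1125  # test positive, disease absent
d = 1425  # test negative, disease absent

sensitivity = a / (a + b)  # P(test + | disease present)
specificity = d / (c + d)  # P(test - | disease absent)
ppv = a / (a + c)          # P(disease present | test +); depends on the sample's disease mix
print(round(sensitivity, 2), round(specificity, 2), round(ppv, 2))
```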

8. The mean and variance of the number of students who failed a certain exam are 80 and 40,
respectively. Find the probability of passing the exam and the number of trials.
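No worked answer is given for this question. Assuming the number of students who fail follows a binomial distribution with n trials and failure probability p (the question implies this but does not state it), mean = np and variance = npq, so q = npq/np = 40/80 = 0.5 is the probability of passing and n = 80/0.5 = 160. A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Binomial model: mean = n*p, variance = n*p*q, with p = P(fail) and q = 1 - p
mean_fail = 80
var_fail = 40

q = var_fail / mean_fail  # npq / np = q, the probability of passing
p = 1 - q                 # probability of failing
n = mean_fail / p         # number of trials (students)
print(q, p, n)
```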

9. The average length of stay in a hospital after open-heart surgery is believed to be about 10 days. A random sample of 10 patients who had open-heart surgery at a certain hospital showed the number of days spent by each patient in the hospital as:

10, 8, 7, 12, 16, 9, 8, 10, 12, 13.


A. The mean for the sample is
$$\bar{x} = \frac{10+8+7+12+16+9+8+10+12+13}{10} = \frac{105}{10} = 10.5$$
The sample variance, s², is
$$s^2 = \frac{(10-10.5)^2 + (8-10.5)^2 + (7-10.5)^2 + \dots + (13-10.5)^2}{10-1} = \frac{0.25 + 6.25 + 12.25 + \dots + 6.25}{9} = \frac{68.5}{9} = 7.61$$
Therefore, a 95% C.I. for the population mean is
$$\mu = 10.5 \pm 1.96\left(\frac{\sqrt{7.61}}{\sqrt{10}}\right) = 10.5 \pm 1.71 = (8.79,\ 12.21)$$

We are 95% confident that the average number of days spent in hospital after open-heart surgery, for the population from which this sample was drawn, lies between 8.79 and 12.21 days.
B. Hypotheses: H0: μ = 10, HA: μ ≠ 10
$$Z_{cal} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{10.5 - 10}{2.76/\sqrt{10}} = \frac{0.5}{0.873} = 0.57$$
At the 5% level of significance the two-sided critical values are ±1.96, and H0 is rejected if Zcal > +1.96 or Zcal < -1.96, that is, if |Zcal| > 1.96. Here |Zcal| = 0.57 < 1.96, so the statistic falls inside the non-rejection region from -1.96 to +1.96. Hence we fail to reject the null hypothesis: the sample gives no evidence that the mean length of stay differs from 10 days.
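A sketch reproducing parts A and B with the standard library, following the normal (Z) approach used above. One assumption worth flagging: with only n = 10 observations and a sample standard deviation, many texts would use the t critical value with 9 degrees of freedom (2.262) instead of 1.96; changing `z_crit` is the only edit needed in that case. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

stay = [10, 8, 7, 12, 16, 9, 8, 10, 12, 13]  # days in hospital, n = 10
n = len(stay)
x_bar = mean(stay)
s2 = variance(stay)  # sample variance, divisor n - 1
se = sqrt(s2 / n)    # standard error of the mean, s / sqrt(n)

z_crit = 1.96        # two-sided 5% normal critical value, as in the solution
ci = (x_bar - z_crit * se, x_bar + z_crit * se)

mu0 = 10
z_cal = (x_bar - mu0) / se  # divide by s / sqrt(n), not s / n
reject = abs(z_cal) > z_crit
print(x_bar, round(s2, 2), tuple(round(v, 2) for v in ci), round(z_cal, 2), reject)
```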

10. A survey was conducted on a random sample of 1,000 rural residents. Residents were asked whether
they have health insurance. 650 individuals surveyed said they do have health insurance, and 350 said they
do not have health insurance. Compute the 95% CI for the population proportion of rural residents with
health insurance.

Sample size n = 1,000.

650 individuals have health insurance, so the sample proportion with insurance is p̂ = 650/1000 = 0.65.
350 individuals do not have health insurance, so q̂ = 350/1000 = 1 - p̂ = 0.35.

A 95% CI for the population proportion of rural residents with health insurance is given by
$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.65 \pm 1.96\sqrt{\frac{(0.65)(0.35)}{1000}} = 0.65 \pm 0.03 = (0.62,\ 0.68)$$
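The same normal-approximation (Wald) interval as a short sketch; the variable names are illustrative.

```python
from math import sqrt

n = 1000
x = 650                 # respondents with health insurance
p_hat = x / n
q_hat = 1 - p_hat
margin = 1.96 * sqrt(p_hat * q_hat / n)  # 1.96 * standard error of p_hat
ci = (p_hat - margin, p_hat + margin)
print(round(margin, 4), tuple(round(v, 3) for v in ci))
```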

11. If the total cholesterol values for a certain population are approximately normally distributed with a mean of 200 mg/100 ml and a standard deviation of 20 mg/100 ml, find the probability that an individual picked at random from this population will have a cholesterol level
A. between 180 and 200 mg/100 ml;
B. greater than 225 mg/100 ml;
C. less than 150 mg/100 ml.

A. The Z-score corresponding to 180 mg/100 ml is
$$Z = \frac{X - \mu}{\sigma} = \frac{180 - 200}{20} = -1$$
The proportion below this cholesterol level is P(Z < -1) = P(Z > 1) = 0.1587.

The Z-score corresponding to 200 mg/100 ml is
$$Z = \frac{X - \mu}{\sigma} = \frac{200 - 200}{20} = 0$$
The proportion above this cholesterol level is P(Z > 0) = 0.5000.

The proportion with a cholesterol level between 180 and 200 mg/100 ml is
$$P(180 < X < 200) = P(-1 < Z < 0) = 1 - 0.1587 - 0.5000 = 0.3413$$


B. $$P(X > 225) = P\left(Z > \frac{225 - 200}{20}\right) = P(Z > 1.25) = 0.1056$$

C. $$P(X < 150) = P\left(Z < \frac{150 - 200}{20}\right) = P(Z < -2.5) = P(Z > 2.5) = 0.0062$$
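The three normal probabilities can be checked without statistical tables by using the error function, since Φ(z) = [1 + erf(z/√2)]/2. A minimal sketch; `normal_cdf` is an illustrative helper built only on Python's `math` module.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), computed via the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, sigma = 200, 20  # mg/100 ml
p_a = normal_cdf(200, mu, sigma) - normal_cdf(180, mu, sigma)  # between 180 and 200
p_b = 1 - normal_cdf(225, mu, sigma)                           # above 225
p_c = normal_cdf(150, mu, sigma)                               # below 150
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```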

12. Suppose it is known that 10% of a certain population is HIV positive. If a random sample of 4 is drawn from this population, what is the probability that at least one of them is HIV positive? How many individuals would you expect to be infected?

Solution: Let p = 0.10 be the probability that any individual in the population is HIV positive, and let X be the number of HIV-positive individuals among the n = 4 subjects selected, so that X follows a binomial distribution. The probability that at least one (X ≥ 1) is HIV positive is
$$P(X \ge 1) = 1 - P(X = 0) = 1 - \binom{4}{0}(0.10)^0(1 - 0.10)^{4} = 1 - 0.6561 = 0.3439 \approx 0.34$$

The probability of obtaining at least one HIV-positive individual in the sample is 0.34.
The expected number of individuals to be infected is
$$\mu = n \times p = 4 \times 0.10 = 0.40$$
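A short binomial sketch contrasting "at least one" with "exactly one"; `binom_pmf` is an illustrative helper built on `math.comb` (Python 3.8+).

```python
from math import comb

n, p = 4, 0.10


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_at_least_one = 1 - binom_pmf(0, n, p)  # complement of "nobody is positive"
p_exactly_one = binom_pmf(1, n, p)       # a different event from "at least one"
expected = n * p                         # mean of the binomial distribution
print(round(p_at_least_one, 4), round(p_exactly_one, 4), expected)
```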

13. A group of researchers wants to assess the prevalence of a certain disease in a community of size 9000, and decides to take a representative sample from the community. Based on previous literature, the prevalence in the area is 50%, and the researchers want a margin of error of 5%. Assuming that the sample will be selected using simple random sampling, calculate the final sample size.

Solution: Population size N = 9000, margin of error d = 5% = 0.05, proportion p = 50% = 0.5, and for a 95% CI, Z = 1.96. The required (minimum) sample size for a very large population is given by:
$$n_0 = \frac{Z^2 p(1-p)}{d^2} = \frac{(1.96)^2(0.5)(0.5)}{(0.05)^2} = 384.16 \approx 384$$
For a population of N = 9000, the required sample size would be
$$n = \frac{n_0}{1 + \frac{n_0}{N}} = \frac{384.16}{1 + \frac{384.16}{9000}} = \frac{384.16}{1.0427} = 368.4 \approx 368$$
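The two-step sample-size calculation as a sketch. Carrying full precision gives n ≈ 368.4; the `math.ceil` call shows the common convention of rounding sample sizes up, which would give 369 rather than 368. Variable names are illustrative.

```python
import math

z, p, d, N = 1.96, 0.5, 0.05, 9000

n0 = z**2 * p * (1 - p) / d**2  # sample size for a very large population
n = n0 / (1 + n0 / N)           # finite population correction for N = 9000
print(round(n0, 2), round(n, 2), math.ceil(n))
```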

14. A researcher wishes to determine whether there is a relationship between the gender of an individual and the amount of alcohol consumed. A sample of 68 people is selected, and the following data are obtained. Is there an association between the gender of an individual and the amount of alcohol consumed?

Observed frequencies (alcohol consumption):

Gender    Low    Moderate    High    Total
Male      10     9           8       27
Female    13     16          12      41
Total     23     25          20      68

Expected frequencies (row total × column total / grand total):
$$e_{11} = \frac{27 \times 23}{68} = 9.13, \quad e_{12} = \frac{27 \times 25}{68} = 9.93, \quad e_{13} = \frac{27 \times 20}{68} = 7.94$$
$$e_{21} = \frac{41 \times 23}{68} = 13.87, \quad e_{22} = \frac{41 \times 25}{68} = 15.07, \quad e_{23} = \frac{41 \times 20}{68} = 12.06$$

Expected frequencies (alcohol consumption):

Gender    Low      Moderate    High     Total
Male      9.13     9.93        7.94     27
Female    13.87    15.07       12.06    41
Total     23       25          20       68

Hypotheses: H0: gender and alcohol consumption are independent (not associated),
HA: there is an association between gender and alcohol consumption.

The degrees of freedom (df) in a contingency table with R rows and C columns are:
df = (R - 1)(C - 1) = (2 - 1)(3 - 1) = 2

Hence, the tabulated χ² with df = 2 at the 0.01 level of significance is 9.21.
$$\chi^2_{cal} = \frac{(10 - 9.13)^2}{9.13} + \frac{(9 - 9.93)^2}{9.93} + \dots + \frac{(12 - 12.06)^2}{12.06} = 0.083 + 0.087 + 0.0005 + 0.055 + 0.057 + 0.0003 = 0.283$$

Since χ²cal = 0.283 < χ²tab = 9.21, we fail to reject H0: the data provide no evidence of an association between the gender of an individual and the amount of alcohol consumed.
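A sketch of the chi-square test of independence on the observed table, using only built-in Python. Because it keeps the expected counts unrounded, the statistic comes out at about 0.281 rather than 0.283; the conclusion is unchanged. The critical value 9.21 is taken from the text, and the names are illustrative.

```python
# Observed counts: rows = gender (male, female), columns = consumption (low, moderate, high)
observed = [
    [10, 9, 8],
    [13, 16, 12],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of (O - E)^2 / E over all six cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_001 = 9.21  # tabulated value for df = 2 at the 0.01 level, from the text
print(round(chi2, 3), df, chi2 > critical_001)
```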

15. The correlation coefficient r is given by
$$r = \frac{\sum xy - n\bar{x}\bar{y}}{(n-1)\,SD(x)\,SD(y)}$$

$$\sum xy = x_1y_1 + x_2y_2 + \dots + x_{11}y_{11} = 1(8.1) + 1.1(7.8) + \dots + 2(10.5) = 152.59$$
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_{11}}{11} = \frac{1 + 1.1 + \dots + 2}{11} = \frac{16.5}{11} = 1.5, \qquad \bar{y} = \frac{y_1 + y_2 + \dots + y_{11}}{11} = \frac{8.1 + 7.8 + \dots + 10.5}{11} = \frac{100.4}{11} = 9.13$$

Since nȳ = Σy = 100.4 exactly, nx̄ȳ = 1.5 × 100.4 = 150.6, so
$$\sum xy - n\bar{x}\bar{y} = 152.59 - 150.6 = 1.99 \qquad (*)$$

Standard deviation of x, SD(x), is the square root of the variance, S²(x):
$$S^2(x) = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + (x_3-\bar{x})^2 + \dots + (x_{11}-\bar{x})^2}{n-1} = \frac{0.25 + 0.16 + \dots + 0.25}{10} = \frac{1.1}{10} = 0.11$$

Therefore, SD(x) = √0.11 = 0.332.

Standard deviation of y, SD(y), is the square root of the variance, S²(y):
$$S^2(y) = \frac{(y_1-\bar{y})^2 + (y_2-\bar{y})^2 + (y_3-\bar{y})^2 + \dots + (y_{11}-\bar{y})^2}{n-1} = \frac{1.061 + 1.769 + 0.397 + \dots + 1.877}{10} = \frac{7.203}{10} = 0.7203$$

Therefore, SD(y) = √0.7203 = 0.849.

$$(n-1)\,SD(x)\,SD(y) = (11-1)(0.332)(0.849) = 2.819 \qquad (**)$$

From (*) and (**), we have
$$r = \frac{\sum xy - n\bar{x}\bar{y}}{(n-1)\,SD(x)\,SD(y)} = \frac{1.99}{2.819} = 0.71$$

The correlation coefficient of 0.71 indicates a strong positive correlation between the amount of converted sugar and the temperature.
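The full (x, y) pairs are not reproduced in this document, so this sketch recomputes r only from the summary quantities reported in the solution (n = 11, Σx = 16.5, Σy = 100.4, Σxy = 152.59, and sums of squared deviations 1.1 for x and 7.203 for y). Keeping the standard deviations unrounded gives r ≈ 0.707, which agrees with 0.71. Names are illustrative.

```python
from math import sqrt

# Summary quantities reported in the solution (n = 11 pairs)
n = 11
sum_x, sum_y, sum_xy = 16.5, 100.4, 152.59
ss_x = 1.1    # sum of (x - x_bar)^2
ss_y = 7.203  # sum of (y - y_bar)^2

x_bar, y_bar = sum_x / n, sum_y / n
s_xy = sum_xy - n * x_bar * y_bar  # uses unrounded means: 152.59 - 150.6
sd_x = sqrt(ss_x / (n - 1))
sd_y = sqrt(ss_y / (n - 1))
r = s_xy / ((n - 1) * sd_x * sd_y)
print(round(s_xy, 3), round(r, 3))
```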
