SM_CMT 05101 Epidemiology and Biostatistics
CMT 05101
Epidemiology and
Biostatistics
NTA Level 5 Semester 1
Student Manual
August 2010
Copyright © Ministry of Health and Social Welfare – Tanzania 2010
CMT 05101 Epidemiology and Biostatistics NTA Level 5 Semester 1 Student Manual
Table of Contents
Module Sessions
Session 1: Introduction to Biostatistics and Methods for Qualitative Data
Session 2: Descriptive Methods for Quantitative Data
Session 3: The Normal Distribution
Session 4: Sampling Techniques
Session 5: Estimation of Mean and Proportion
Session 6: Significance Tests of One Sample
Session 7: Chi-Square (χ²) Test
Session 8: Source and Uses of Morbidity and Mortality Statistics
Session 9: Introduction to Epidemiology
Session 10: Ecology and Epidemiological Approach to Causation
Session 11: Natural History and Levels of Prevention of Diseases
Session 12: Introduction to Epidemiological Methods/Studies
Session 13: Case-Control Studies
Session 14: Cohort Studies
Session 15: Testing and Screening of a Disease
Session 16: Control of Epidemics
Session 17: Integrated Disease Surveillance and Response
Session 18: Planning for Disease Prevention and Control
Background and Acknowledgement
In April 2009, a planning meeting was held at Kibaha. It was followed by a Task Force
Committee meeting at Dodoma in June 2009, which developed a proposal to guide the
development of standardised Clinical Assistant (CA) and Clinical Officer (CO) training
materials based on the CA/CO curricula. The purpose of this process was to standardize the
entire curriculum with up-to-date content, which would then be provided to all Clinical
Assistant and Clinical Officer Training Centres (CATCs/COTCs).
The perceived benefit was that, by standardizing the quality of content and integrating
interactive teaching methodologies, students would be able to learn more effectively and that
the assessment of students’ learning would have more uniformity and validity across all
schools.
The new training package for CA/CO cadres includes a Facilitator Guide, Student Manual
and Practicum. There are 40 modules with approximately 600 content sessions. This product
is a result of a lengthy collaborative process, with significant input from key stakeholders and
experts of different organizations and institutions, from within and outside the country.
The MOHSW would like to thank all those involved during the process for their valuable
contribution to the development of these materials for CA/CO cadres. We would first like to
thank the U.S. Centers for Disease Control and Prevention’s Global AIDS Program
(CDC/GAP) Tanzania, and the International Training and Education Center for Health
(I-TECH) for their financial and technical support throughout the process. At CDC/GAP, we
would like to thank Ms. Suzzane McQueen and Ms. Angela Makota for their support and
guidance. At I-TECH, we would especially like to acknowledge Ms. Alyson Shumays,
Country Program Manager, Dr. Flavian Magari, Country Director, Mr. Tumaini Charles,
Deputy Country Director, and Ms. Susan Clark, Health Systems Director. The MOHSW
would also like to thank the World Health Organization (WHO) for technical and financial
support in the development process.
Particular thanks are due to those who led this important process: Dr. Bumi L.A.
Mwamasage, the Assistant Director for Allied Health Sciences Training, Dr. Mabula Ndimila
and Mr. Dennis Busuguli, Coordinators of Allied Health Sciences Training, Ministry of
Health and Social Welfare, Dr. Stella Kasindi Mwita, Programme Officer Integrated
Management of Adults and Adolescent Illnesses (IMAI), WHO Tanzania and Stella M.
Mpanda, Pre-service Programme Manager, I-TECH.
Sincere gratitude is expressed to small group facilitators: Dr. Otilia Gowele, Principal, Kilosa
COTC, Dr. Violet Kiango, Tutor, Kibaha COTC, Ms. Stephanie Smith, Ms. Stephanie
Askins, Julie Stein, Ms. Maureen Sarewitz, Mr. Golden Masika, Ms. Kanisia Ignas, Ms.
Yovitha Mrina and Mr. Nicholous Dampu, all of I-TECH, for their tireless efforts in guiding
participants and content experts through the process. A special note of thanks also goes to
Dr. Julius Charles and Dr. Moses Bateganya, I-TECH’s Clinical Advisors, and other Clinical
Advisors who provided input. We also thank individual content experts from different
departments of the MOHSW and other governmental and non-governmental organizations,
including EngenderHealth, Jhpiego and AIHA, for their technical guidance.
Special thanks go to a team of I-TECH staff, namely Ms. Lauren Dunnington, Ms.
Stephanie Askins, Ms. Stephanie Smith, Ms. Aisling Underwood, Golden Masika, Yovitha
Mrina, Kanisia Ignas, Nicholous Dampu, Michael Stockman and Stella M. Mpanda, for
finalising the editing, formatting and compilation of the modules.
Finally, we very much appreciate the contributions of the tutors and content experts
representing the CATCs/COTCs, various hospitals, universities, and other health training
institutions. Their participation in meetings and workshops, and their input in the
development of content for each of the modules have been invaluable. It is the commitment
of these busy clinicians and teachers that has made this product possible.
Tutors
Ms. Magdalena M. Bulegeya – Tutor, Kilosa COTC
Mr. Pius J. Mashimba – Tutor, Kibaha Clinical Officers Training Centre (COTC)
Dr. Naushad Rattansi – Tutor, Kibaha COTC
Dr. Salla Salustian – Principal, Songea CATC
Dr. Kelly Msafiri – Principal, Sumbawanga CATC
Dr. Joseph Mapunda - Tutor, Songea CATC
Dr. Beda B. Hamis – Tutor, Mafinga COTC
Col Dr. Josiah Mekere – Principal, Lugalo Military Medical School
Mr. Charles Kahurananga – Tutor, Kigoma CATC
Dr. Ernest S. Kalimenze – Tutor, Sengerema COTC
Dr. Lucheri Efraim – Tutor, Kilosa COTC
Dr. Kevin Nyakimori – Tutor, Sumbawanga CATC
Mr. John Mpiluka – Tutor, Mvumi COTC
Mr. Gerald N. Mngóngó – Tutor, Kilosa COTC
Dr. Tito M. Shengena – Tutor, Mtwara COTC
Dr. Fadhili Lyimo – Tutor, Kilosa COTC
Dr. James William Nasson – Tutor, Kilosa COTC
Dr. Titus Mlingwa – Tutor, Kigoma CATC
Dr. Rex F. Mwakipiti – Principal, Musoma CATC
Dr. Wilson Kitinya – Principal, Masasi Clinical Assistants Training Centre (CATC)
Ms. Johari A. Said – Tutor, Masasi CATC
Dr. Godwin H. Katisa – Tutor, Tanga Assistant Medical Officers Training Centre (AMOTC)
Dr. Lautfred Bond Mtani – Principal, Sengerema COTC
Ms Pamela Henry Meena – Tutor, Kibaha COTC
Dr. Fidelis Amon Ruanda – Tutor, Mbeya AMOTC
Dr. Cosmas C. Chacha – Tutor, Mbeya AMOTC
Dr. Ignatus Mosten – Ag. Principal, Tanga AMOTC
Dr. Muhidini Mbata – Tutor, Mafinga COTC
Dr. Simon Haule – Ag. Principal, Kibaha COTC
Ms. Juliana Lufulenge - Tutor, Kilosa COTC
Dr. Peter Kiula – Tutor, Songea CATC
Mr. Hassan Msemo – Tutor, Kibaha COTC
Dr. Sangare Antony – Tutor, Mbeya AMOTC
Content Experts
Ms. Emily Nyakiha – Principal, Bugando Nursing School, Mwanza
Mr. Gustav Moyo - Registrar, Tanganyika Nurses and Midwives Council, Ministry of Health
and Social Welfare (MOHSW)
Dr. Kohelet H. Winani - Reproductive and Child Health Services, MOHSW
Mr. Hussein M. Lugendo – Principal, Vector Control Training Centre (VCTC), Muheza
Dr. Elias Massau Kwesi - Public Health Specialist, Head of Unit Health Systems Research
and Survey, MOHSW
Dr. William John Muller - Pathologist, Muhimbili National Hospital (MNH)
Mr. Desire Gaspered - Computer Analyst, Institute of Finance Management (IFM), Dar es
Salaam
Mrs. Husna Rajabu - Health Education Officer, MOHSW
Mr. Zakayo Simon - Registered Nurse and Tutor, Public Health Nursing School (PHNS)
Morogoro
Dr. Ewaldo Vitus Komba - Lecturer, Department of Internal Medicine, Muhimbili University
of Health and Allied Sciences School (MUHAS)
Mrs. Asteria L.M. Ndomba - Assistant Lecturer, School of Nursing, MUHAS
Mrs. Zebina Msumi - Training Officer, Expanded Programme on Immunization (EPI),
MOHSW
Mr. Lister E. Matonya - Health Officer, School of Environmental Health Sciences (SEHS),
Ngudu, Mwanza.
Dr. Joyceline Kaganda - Nutritionist, Tanzania Food and Nutrition Centre (TFNC),
MOHSW.
Dr. Suleiman C. Mtani - Obstetrician and Gynecologist, Director, Mwananyamala Hospital,
Dar es Salaam
Mr. Brown D. Karanja - Pharmacist, Lugalo Military Hospital
Mr. Muhsin Idd Nyanyam - Tutor, Primary Health Care Institute (PHCI), Iringa
Dr. Judith Mwende - Ophthalmologist, MNH
Dr. Paul Marealle - Orthopaedic and Traumatic Surgeon, Muhimbili Orthopedic Institute
(MOI)
Dr. Erasmus Mndeme - Psychiatrist, Mirembe Referral Hospital
Mrs. Bridget Shirima - Nurse Tutor (Midwifery), Kilimanjaro Christian Medical Centre
(KCMC)
Dr. Angelo Nyamtema - Tutor, Tanzania Training Centre for International Health (TTCIH),
Ifakara
Ms. Vumilia B. E. Mmari - Nurse Tutor (Reproductive Health) MNH-School of Nursing
Dr. David Kihwele - Obs/Gynae Specialist, and Consultant
Dr. Amos Mwakigonja – Pathologist and Lecturer, Department of Morbid Anatomy and
Histopathology, MUHAS
Mr. Claud J. Kumalija - Statistician and Head, Health Management Information System
(HMIS), MOHSW
Ms. Eva Muro, Lecturer and Pharmacist, Head Pharmacy Department, KCMC
Dr. Ibrahim Maduhu - Paediatrician, EPI/MOHSW
Dr. Merida Makia - Lecturer Head, Department of Surgery, MNH
Dr. Gabriel S. Mhidze - ENT Surgeon, Lugalo Military Hospital
Dr. Sira Owibingire - Lecturer, Dental School, MUHAS
Mr. Issai Seng’enge - Lecturer (Health Promotion), University of Dar es Salaam (UDSM)
Prof. Charles Kihamia - Professor, Parasitology and Entomology, MUHAS
Mr. Benard Konga - Economist, MOHSW
Dr. Martha Kisanga - Field Officer Manager, EngenderHealth, Dar es Salaam
Dr. Omary Salehe - Consultant Physician, Mbeya Referral Hospital
Ms Yasinta Kisisiwe - Principal Nursing Officer, Health Education Unit (HEU), MOHSW
Dr. Levina Msuya - Paediatrician and Principal, Assistant Medical Officers Training Centre
(AMOTC), Kilimanjaro Christian Medical Centre (KCMC)
Dr. Mohamed Ali - Epidemiologist, MOHSW
Mr. Fikiri Mazige - Tutor, PHCI-Iringa
Mr. Salum Ramadhani - Lecturer, Institute of Finance Management
Ms. Grace Chuwa - Regional RCH Coordinator, Coastal Region
Mr. Shija Ganai - Health Education Officer, Regional Hospital, Kigoma
Dr. Emmanuel Suluba - Assistant Lecturer, Anatomy and Histology Department, MUHAS
Mr. Mdoe Ibrahim - Tutor, KCMC Health Records Technician Training Centre
Mr. Sunny Kiluvia - Health Communication Consultant, Dar es Salaam
Dr. Nkundwe Gallen Mwakyusa - Ophthalmologist, MOHSW
Dr. Nicodemus Ezekiel Mgalula -Dentist, Principal Dental Training School, Tanga
Mrs. Violet Peter Msolwa - Registered Nurse Midwife, Programme Officer, National AIDS
Control Programme (NACP), MOHSW
Dr. Wilbert Bunini Manyilizu - Lecturer, Mzumbe University, Morogoro
IT support
Mr. Isaac Urio - IT Consultant, I-TECH
Mr. Michael Fumbuka - Computer Systems Administrator, Institute of Finance Management
(IFM), Dar es Salaam
Introduction
Module Overview
This module content has been prepared to enhance the learning of students of Clinical Assistant
(CA) and Clinical Officer (CO) schools. The session contents are based on the sub-enabling
outcomes of the CA and CO curricula. The module sub-enabling outcomes are as follows:
6.1.1 Differentiate determinants of health and diseases of public health importance
6.1.2 Apply epidemiological methods in assessing distribution of health and diseases
6.1.3 Describe disease causation, prevention and control
6.1.4 Utilise concept of epidemic control measures and disaster preparedness
6.1.5 Utilise tools for gathering epidemiological data
6.2.1 Describe different types and sources of health information and biostatistics
6.2.2 Utilise different methods of data collection
guidance and to respond to any difficulties encountered by students. One module will be
assigned to 5 students, and it is the responsibility of the tutor to make this assignment so
that the student manuals are easy for students to use and access.
Abbreviations
AIDS Acquired Immunodeficiency Syndrome
ALu Artemether Lumefantrine
AMREF African Medical and Research Foundation
AR Attributable Risk
ASFR Age-Specific Fertility Rate
CBOs Community Based Organizations
CBR Crude Birth Rate
CDC Centers for Disease Control and Prevention
CFR Case Fatality Rate
CHD Coronary Heart Disease
CHMT Council Health Management Team
COTC Clinical Officers Training Centre
DALYs Disability Adjusted Life Years
df Degrees of Freedom
DL Distance Learning
EPI Expanded Program on Immunization
FELTP Field Epidemiology and Laboratory Training Programmes
GFR General Fertility Rate
GRR Gross Reproductive Rate
H1N1 Influenza A virus subtype H1N1 (haemagglutinin type 1, neuraminidase type 1)
HIV Human Immunodeficiency Virus
IDS Integrated Disease Surveillance
IDSR Integrated Disease Surveillance and Response
IMCI Integrated Management of Childhood Illness
KAP Knowledge, Attitudes and Practices
MUCHS Muhimbili University College of Health Sciences
NNT Neonatal Tetanus
OC Oral Contraceptives
OR Odds Ratio
PTB Pulmonary Tuberculosis
PYLL Potential Years of Life Lost
RD Risk Difference
RR Relative Risk
SBP Systolic Blood Pressure
SND Standard Normal Deviate
STIs Sexually Transmitted Infections
TDHS Tanzania Demographic and Health Survey
TFR Total Fertility Rate
TPHA Treponema pallidum Haemagglutination Assay
USA United States of America
VDRL Venereal Disease Research Laboratory
WHO World Health Organization
WHO/AFRO World Health Organization/African Region
Session 1: Introduction to Biostatistics and
Methods for Qualitative Data
Learning Objectives
By the end of this session, students are expected to be able to:
• Define terms used in biostatistics
• Explain the need for studying biostatistics in medical science
• List applications of biostatistics
• Explain descriptive statistics
• Describe descriptive methods for qualitative data
Introduction to Biostatistics
• Biostatistics can be defined as the application of statistics to biological problems.
• To many biomedical scientists, the term is considered to mean the application of statistics
specifically to medical problems.
• For this group of people, therefore, biostatistics and medical statistics are synonymous.
o How much of a drug (e.g. Artemether Lumefantrine [ALu]) is distributed to health
units, hospitals, health centers, dispensaries, etc.
• Secondly, statistics as a discipline is a field of study concerned in broad terms with:
o Collecting, organizing and summarizing data in a systematic way
o Drawing of inferences about a population on the basis of only a part of the population
targeted
Note: When referring to the discipline, the word ‘statistics’ has no singular form. In the
same way, the words ‘mathematic’ and ‘physic’ have no meaning as singular forms of the
disciplines of mathematics and physics.
• This course is mainly concerned with the second sense of the meaning of statistics, which
is statistics as a discipline.
• The introductory portion of the study of statistics is usually referred to as descriptive
statistics, and the second part is referred to as inferential statistics, which provides
objective means for drawing conclusions.
• The kind of biostatistics referred to in this course will be that of medical statistics.
Figure 1: Results of Comparison Between Two Treatments
                 Outcome
Treatment    Improved    Did not improve    Total    % improved
Standard     80          120                200      40
New          100         100                200      50
Total        180         220                400      45
• With these results, one may be tempted to conclude that the new treatment is better than
the old (standard) treatment.
• An analysis that looks at the results for male patients separately from the female patients
revealed the following:
• From this table, we note that for female patients, the standard treatment shows more
improvement.
• This is exactly the opposite of what we saw in the overall assessment, so one might
expect the new treatment to fare better among the male patients.
• If this were so, the conclusion would be:
o Female patients show more of an improved outcome with the standard treatment,
while male patients show more improvement with the new treatment.
o In practical terms, the decision following this controversial conclusion would be
understandable.
• When we look at the results relating to the male patients we see the following:
• Figure 3 above shows that just as in female patients, the standard treatment is also more
effective in male patients.
• The calculations can be checked against the overall figures. For example, the overall rate
of improvement with the standard treatment is (32 + 48) / (40 + 160) = 80/200 = 40%, as
shown above.
• With a proper statistical method of analysis, it becomes clear that the difference in
improvement between the two treatments when gender has been taken into account is
20% in favour of the standard treatment.
• Such features are common in medical surveys and are a typical aspect of observational
studies.
• In an experimental study, the situation would have been controlled.
• These arguments emphasize the need for biostatistical methods for both data analysis and
study designs.
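The reversal described above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original study. The standard-treatment counts for the two patient groups (32 of 40 and 48 of 160) come from the calculation quoted above. The new-treatment counts (96 of 160 and 4 of 40) are assumed values, chosen only because they reproduce the totals in Figure 1 and a difference of 20 percentage points within each group. The labels "group A" and "group B" are placeholders, since this section does not state which counts belong to which sex.

```python
# Counts are (improved, total) for each treatment within each patient group.
# Standard-treatment counts follow the text; new-treatment counts are ASSUMED.
groups = {
    "group A": {"standard": (32, 40), "new": (96, 160)},
    "group B": {"standard": (48, 160), "new": (4, 40)},
}


def percent_improved(improved, total):
    """Return the percentage of patients who improved."""
    return 100 * improved / total


def pooled(treatment):
    """Add the counts for one treatment across all groups."""
    improved = sum(g[treatment][0] for g in groups.values())
    total = sum(g[treatment][1] for g in groups.values())
    return improved, total


# Rates within each group
for name, g in groups.items():
    std = percent_improved(*g["standard"])
    new = percent_improved(*g["new"])
    print(f"{name}: standard {std:.0f}%, new {new:.0f}%")

# Rates when the groups are pooled, as in Figure 1
for treatment in ("standard", "new"):
    rate = percent_improved(*pooled(treatment))
    print(f"overall {treatment}: {rate:.0f}%")
```

With these counts the standard treatment has the higher rate within each group, yet the pooled rates favour the new treatment, because the two treatments were given to the two groups in very different proportions.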
Definition of a Variable
• Variable: A term for a characteristic that differs between members of a population
or sample, such as height.
o Its measurement is not constant from one member to another, and it is therefore called a
variable.
o Variables can be qualitative or quantitative, continuous or discrete.
o Random variables take values that cannot be predicted with certainty, and are the most
useful for statistical purposes.
Types of Variables
• There are two types of variables:
o Qualitative (categorical) variables
o Quantitative (numerical) variables
Quantitative (Numerical) Variables
• Quantitative variables take numerical values, for example:
o Age (years): 10, 19, 45, 60
o Height (cm): 140, 50.6, 200
o Parity: 0, 1, 2, 3, 4, 5, 6, 10
o Hemoglobin (g/dl): 16.3, 8.9, 12.7
• Quantitative variables are of two types: continuous and discrete
• Continuous variables take any value within meaningful extremes, for example:
o Height (cm): 159, 25, 160.35
o Weight (kg): 71.12, 80.56
o Exact age like 21 yrs 6 months and 4 days
• Discrete variables take only fixed values, in most cases whole numbers, for example:
o Parity: 0, 1, 2, 3, 4, 5, 6, 10
o Age at last birthday: 5, 19, 45, 90
o Counts: 1, 2, 3, 4, 5, 9
o Number of AIDS cases: 100, 10000, 34278
Levels of Measurement
• Variables are measured on different levels/scales
• The term ‘measurement’ is used here in a broad sense
• These are nominal, ordinal, interval and ratio measurements
Refer to Handout 1.1: Differences Between Nominal, Ordinal, Interval and Ratio
Measurements
Use of Tallies in Making Frequency Distribution
• A frequency distribution is normally formed (manually) by a process known as
tallying. This involves the following steps:
o Scan the data and determine the categories
o List the categories
o Work through the data and allocate each observation to the category where it
belongs using the tally marks to keep a count of the number in each category
o Add the tally marks to give the frequency
• The following data show a qualitative variable, ‘Result of sputum examination’, coded as
follows:
o 1 = Smear negative (–ve), culture negative (–ve)
o 2 = Smear negative (–ve), culture not done
o 3 = Smear positive (+ve), culture positive (+ve)
1211311332131123113123113113131321131121123
1112122311213111112131131112111323331112111
Figure 5: Frequency, Relative Frequency and Cumulative Relative Frequency for Sputum
Examination
Value    Frequency    Relative Frequency    Cumulative Relative Frequency
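The tallying steps listed above can also be carried out by computer. The sketch below is one possible way to do it in Python, using the two rows of sputum examination codes given above. The variable names are illustrative, and relative frequencies are expressed as percentages, which is an assumption about how the columns of Figure 5 are presented.

```python
from collections import Counter

# Sputum examination codes copied from the two data rows above.
data = (
    "1211311332131123113123113113131321131121123"
    "1112122311213111112131131112111323331112111"
)

counts = Counter(data)   # tally: how many times each code occurs
n = len(data)            # total number of observations

cumulative = 0.0
table = []               # rows of (value, frequency, relative %, cumulative %)
for value in sorted(counts):
    frequency = counts[value]
    relative = 100 * frequency / n
    cumulative += relative
    table.append((value, frequency, relative, cumulative))

print("Value  Frequency  Relative (%)  Cumulative (%)")
for value, frequency, relative, cum in table:
    print(f"{value:>5}  {frequency:>9}  {relative:>12.1f}  {cum:>14.1f}")
```

The cumulative relative frequency in the last row should come to 100%, which is a useful check that no observation has been missed.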
Use of Diagrams
• Frequency distributions can be illustrated visually by means of statistical diagrams.
• These diagrams serve two main purposes:
o To present information/data (e.g. in reports and articles) so that it is easy to appreciate
o To serve as a private aid for further statistical analysis
• Two types of diagrams are commonly used to illustrate qualitative data. These are:
o Pie charts
o Bar charts
Pie Charts
• These are used to express the distribution of individual observations into different
categories.
• Note that the frequencies should be converted into percentages totaling 100 for a pie chart
to be used.
• Example of a pie chart illustrating the distribution of students enrolled for academic year
2009 at Kilosa Clinical Officers Training Centre (Kilosa COTC): first year (A) = 57,
second year (B) = 44, third year (C) = 38, and Distance Learning (DL) = 12
Figure 6: The Numbers of Students at Kilosa COTC for Academic Year 2008/9
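Before a pie chart is drawn, the frequencies are converted into percentages. The sketch below shows this step in Python for the enrolment figures in the example above. It also converts each share into a slice angle, using the fact that a full circle is 360 degrees. The variable names are illustrative only.

```python
# Enrolment figures from the example above.
enrolment = {"A": 57, "B": 44, "C": 38, "DL": 12}

total = sum(enrolment.values())

# Percentage share of each category, and the matching slice angle in degrees.
percentages = {k: 100 * v / total for k, v in enrolment.items()}
angles = {k: 360 * v / total for k, v in enrolment.items()}

for k in enrolment:
    print(f"{k}: {enrolment[k]} students, "
          f"{percentages[k]:.1f}%, {angles[k]:.1f} degrees")
```

The percentages should add up to 100 and the angles to 360, apart from small rounding differences when they are printed to one decimal place.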
Bar Chart
• The bar chart is the simplest and most effective means of illustrating qualitative data
• The various categories of a variable are represented on the horizontal axis and the
frequency or relative frequency is represented on the vertical axis
• The length of each bar represents the number of observations (frequency) in each
category or the relative frequency in percentage
• For example, consider the following birth control method mix in a certain population:
Note: A pie chart would not be suitable for this variable because the diagram would be
too congested. Hence a bar chart is more appropriate (see Figure 8 below).
Figure 8: Bar Chart for Percentage of Birth Control Method Use
[Bar chart. One bar per birth control method: Norplant, hysterectomy, vasectomy, condoms,
spermicides, loop, Depo-Provera, oral contraceptives and abstinence. Scale: percentage of
utilization, from 0 to 35.]
Two-Way Tables
• Statistical information on two variables can be presented simultaneously in a form of a
two-way table.
• This table makes the information easier to assimilate by showing many of the properties
of the data at a glance.
• In a two-by-two table, data are presented in rows and columns.
• The format for a table depends upon the data and the aspects of the data which are
important to portray.
• A two-way table should include the following:
o A clear title
o A caption for the rows and columns with units of measurement of the variable
o Labels for each individual row or column, i.e. the values taken by the variable
concerned
o Marginal and grand totals
Activity: Demonstration
Instructions
Using the scenario below, the tutor will show how to create a two-way table as shown in Figure 9.
Scenario
In a study to investigate whether or not HIV infection is a risk factor for pulmonary
tuberculosis (PTB), a total of 2165 individuals were examined. Blood samples were also
collected from these individuals for laboratory diagnosis of HIV infection. Of the 2165
individuals examined, 651 were found to be negative for HIV infection. Of those who were
negative, 57 were found to have PTB. Of the 1514 who were HIV positive, 875 were found to
have PTB. This information can be summarized in a two-by-two table as shown in Figure 9 below.
Figure 9: Pulmonary Tuberculosis Infection by HIV Status
                             PTB status
HIV status    Positive        Negative         Total
Positive      875 (57.8%)     639 (42.2%)      1514 (100.0%)
Negative      57 (8.8%)       594 (91.2%)      651 (100.0%)
Total         932 (43.0%)     1233 (57.0%)     2165 (100.0%)
Note: Numbers in brackets show the row percentages.
• The cells of a two way table may contain percentages instead of the real counts.
• Calculation of percentages may be row-wise or column-wise depending on the purpose of
the table.
• In the above table, the interest is to investigate whether HIV infection is a risk factor for
PTB.
o The aim is to see whether the proportion with PTB is higher in HIV positives than in HIV
negatives.
o The row percentages are more appropriate in this case.
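Row percentages can be worked out directly from the four cell counts. The sketch below does this in Python for the counts given in the scenario. The dictionary layout and names are illustrative choices, not a prescribed method.

```python
# Counts from the scenario: rows are HIV status, columns are PTB status.
table = {
    "HIV positive": {"PTB positive": 875, "PTB negative": 639},
    "HIV negative": {"PTB positive": 57, "PTB negative": 594},
}
columns = ("PTB positive", "PTB negative")


def row_percentages(row):
    """Express each cell of a row as a percentage of the row total."""
    row_total = sum(row.values())
    return {col: 100 * count / row_total for col, count in row.items()}


row_pct = {status: row_percentages(row) for status, row in table.items()}

# Marginal totals and grand total
row_totals = {status: sum(row.values()) for status, row in table.items()}
column_totals = {col: sum(row[col] for row in table.values()) for col in columns}
grand_total = sum(row_totals.values())

for status in table:
    cells = ", ".join(
        f"{col}: {table[status][col]} ({row_pct[status][col]:.1f}%)"
        for col in columns
    )
    print(f"{status}: {cells}, total {row_totals[status]}")
print(f"Column totals: {column_totals}, grand total {grand_total}")
```

Because each row percentage uses the row total as its denominator, the two rows can be compared directly: the percentage with PTB among HIV positives is set against the percentage with PTB among HIV negatives.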
Key Points
• The term biostatistics means the application of statistics to biological and health problems.
• Biostatistics needs to be studied in medical science because standardized techniques are
required to cope with the inevitable biological variability.
• Biostatistical methods have a role to play in official health statistics, epidemiology,
clinical studies, human biology, laboratory studies and health service administration,
including where target groups need to be prioritized for necessary interventions.
• The methods of descriptive statistics vary with the different types of data that are
generated from different types of variables.
• A frequency distribution is a descriptive method for qualitative data.
• This means that the number of times (or the frequency) that each value (or group of
values) occurs in the study population is tallied and summarized by using a variety of
methods (pie charts, bar charts, etc.) depending on the type of data and purpose.
Evaluation
• What does the term biostatistics mean?
• Why do we need to study biostatistics in medical science?
• What are the applications of biostatistics?
• What is descriptive statistics?
• What are the descriptive methods for qualitative data?
References
• Bonita R. et al. (2006). Basic Epidemiology (2nd ed.). Geneva, Switzerland: WHO.
• Jones D. et al. (2008). Biostatistics. Work Book-Field Epidemiology and Laboratory
Training Programmes (FELTP).
• McCusker J. (2001). Epidemiology in Community Health, Rural Health Series No. 9
(Revised Edition). Nairobi, Kenya: AMREF.
• Rosner B. (2006). Fundamentals of Biostatistics (6th ed.). Australia, Canada, Singapore,
Spain, United Kingdom, United States: Thomson Brooks/Cole.
• Varkevisser et al. (1995). Designing and Conducting Health Systems Research Projects,
Volume 2 Part 2 Module 24. Health Systems Research Training Series.
Handout 1.1: Differences Between Nominal, Ordinal, Interval
and Ratio Measurements
Variables are measured on different levels/scales. The term ‘measurement’ is used here in a
broad sense.
Nominal Measurement
• The nominal scale classifies persons or things based on a qualitative assessment of the
characteristic being assessed. It neither includes information on quantity or amount nor
does it indicate ‘more than’ or ‘less than’.
o Example 1: Gender (male or female) is a common nominal variable used in
epidemiologic studies.
o Example 2: Country telephone codes are an example of numeric variables that do not
indicate more or less (country code 82 is not more than country code 37).
o Other examples: Codes used for identifying the various categories that make up a given
variable, e.g. Religion: 1 = Muslim, 2 = Christian, 3 = Other. Note that the numbering
codes do not signify ranking, and that the categories comprising a nominal variable
cannot occur together (they are mutually exclusive) and are not ordered.
Ordinal Measurement
• The ordinal scale also classifies persons or things based on the characteristic being
assessed but does indicate ‘more than’ or ‘less than’. In this sense, it provides more
information than the nominal scale. However, the ordinal scale does not indicate how
much more than or less than.
o Example: Rating students’ performance as being poor, average, good, or excellent
indicates how well students perform and provides a basis for comparison. However,
it does not indicate how much better an excellent performance is compared to a good
one.
Interval Measurement
• The interval scale has the same characteristics as the ordinal scale – classifying persons or
things based on the characteristic assessed and indicating more than or less than – but the
interval scale also indicates how much more than or less than.
• The interval scale does not have a true zero point, meaning that a value of zero does not
indicate the absence of the characteristic being measured. Additionally, ratios of two
numbers on the interval scale do not have meaning.
o Example: Temperature measured in degrees Celsius or Fahrenheit is an interval variable,
in that different values can tell you how much more or less. However, there is no true
zero point. The value of zero in temperature does not indicate absence of temperature.
Also, when comparing two temperatures, their ratio is not meaningful. We would not say
that a 90 degree temperature is twice as hot as a 45 degree temperature.
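One way to see why ratios are not meaningful on an interval scale is to express the same two temperatures in a different unit. The sketch below does this in Python. The handout does not name the temperature scale, so the values 90 and 45 are assumed here to be in degrees Fahrenheit. The conversion used is the standard Fahrenheit-to-Celsius formula.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


hot_f, mild_f = 90, 45              # ASSUMED to be degrees Fahrenheit
hot_c = fahrenheit_to_celsius(hot_f)
mild_c = fahrenheit_to_celsius(mild_f)

ratio_f = hot_f / mild_f            # ratio of the two readings in Fahrenheit
ratio_c = hot_c / mild_c            # ratio of the same readings in Celsius

difference_f = hot_f - mild_f       # difference in Fahrenheit degrees
difference_c = hot_c - mild_c       # difference in Celsius degrees

print(f"Ratio in Fahrenheit: {ratio_f:.2f}")
print(f"Ratio in Celsius:    {ratio_c:.2f}")
print(f"Difference: {difference_f} F degrees = {difference_c:.1f} C degrees")
```

The ratio changes when the unit changes, so it says nothing about the temperatures themselves. The difference, by contrast, describes the same gap in either unit, which is what the interval scale is able to express.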
Ratio Measurement
• The ratio scale includes all the characteristics of the interval scale and, in addition, has a
true zero point.
o Example: Height and weight measurements indicate how much more or less, but also
have a true zero point. A weight of zero indicates an absence of weight.
Session 2: Descriptive Methods for Quantitative
Data
Learning Objectives
By the end of this session, students are expected to be able to:
• Describe the descriptive methods of quantitative data
• Describe the different methods of presenting frequency distribution data for grouped and
ungrouped data
• Describe the difference between the mean, median and mode
• Calculate the mean, median, variance and standard deviation
Figure 2: Frequency Distribution of No. of Lesions caused by Smallpox Virus in an Egg Membrane

No. of Lesions    Frequency of No. of Membranes (f)    Class Mid-Point (x)    (fx)
0 -               1                                    5                      5
10 -              6                                    15                     90
20 -              14                                   25                     350
30 -              14                                   35                     490
40 -              17                                   45                     765
50 -              8                                    55                     440
60 -              9                                    65                     585
70 -              3                                    75                     225
80 -              6                                    85                     510
90 -              1                                    95                     95
100 -             0                                    105                    0
110 -             1                                    115                    115
Total             80                                                          3670
Note that the dash symbol (-) means ‘up to but not including’ the next tabulated value. (That is,
according to the table in Figure 2, the class 10- has 10 as its lower limit and extends up to, but
does not include, 20. The value 15 is therefore taken as the midpoint of the class interval 10-.)
The Rules That are Used to Make a Frequency Distribution for Grouped Data
• Determine the Range, R, of values. (R=largest value-smallest value)
• Decide on the number, I, of classes.
o This number depends on the form of data and the requirements of the frequency
distribution, but usually they should be between 5 and 20 for convenience.
• Determine the width of the class interval, W, such that W=R/I.
o A constant width for all classes is preferable.
• Choose the upper and lower limits of the class interval carefully to avoid ambiguities.
• List the intervals in order.
• Use tallies to allocate each observation into the class in which it falls.
• Add the tally marks to obtain class frequencies.
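The rules above can be sketched in Python. This is a minimal illustration and not part of the original manual: the observations in `values`, the choice of 5 classes and the function name `grouped_frequencies` are assumptions made for the example, and the data are assumed to be whole numbers.

```python
import math

def grouped_frequencies(values, num_classes):
    """Tally observations into equal-width classes [lower, upper)."""
    smallest, largest = min(values), max(values)
    data_range = largest - smallest                 # R = largest value - smallest value
    width = math.ceil(data_range / num_classes)     # W = R / I, rounded up to a whole number
    if smallest + width * num_classes <= largest:   # widen if the largest value would not fit
        width += 1
    table = []
    for k in range(num_classes):
        lower = smallest + k * width
        upper = lower + width                       # 'up to but not including' upper
        freq = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, freq))
    return table

# Hypothetical data: ages (in years) of 20 patients
values = [12, 15, 21, 22, 22, 25, 27, 30, 31, 33,
          34, 36, 38, 41, 44, 45, 47, 52, 58, 61]

for lower, upper, freq in grouped_frequencies(values, 5):
    print(f"{lower}-{upper - 1}: {freq}")
```

Using half-open classes (lower limit included, upper limit excluded) is one way of choosing class limits that avoids ambiguity, since every observation falls into exactly one class.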
Figure 3: Frequency Distribution of Age at Loss of Last Tooth

Age at Loss of Last Tooth    Frequency    Interval Width    Average Number per Year of Age
11 – 15                      1            5                 0.20
16 – 19                      7            4                 1.75
20 – 24                      21           5                 4.20
25 – 29                      35           5                 7.00
30 – 34                      40           5                 8.00
35 – 44                      58           10                5.80
45 – 54                      28           10                2.80
55 – 74                      10           20                0.50
Total                        200
Line Diagrams
• Line diagrams are often used to express the change in some quantity over a period of time
or to illustrate the relationship between continuous quantities.
• Each point on the graph represents a pair of values, i.e. a value on the x-axis and a
corresponding value on the y-axis.
• Straight lines then connect the adjacent points.
Figure 5: Line Diagram for Cumulative Number of AIDS Cases in Tanzania 1983 to 1992
(Line diagram not reproduced; x-axis: Year, 1983 to 1992.)
Frequency Polygons
• Frequency polygons are a series of points (located at the mid-point of the interval)
connected by straight lines.
• The height of these points is equal to the frequency or relative frequency associated with the
values of the variable (or the interval).
• The end points are joined to the horizontal axis at the mid points of the groups
immediately below and above the lowest and highest non-zero frequencies, respectively.
• Frequency polygons are not as popular as histograms, but are a visual presentation of a
frequency distribution.
• They can easily be superimposed and are therefore superior to histograms for comparing sets
of data.
• The following Figure 6 shows the example of a frequency polygon.
Figure 6: Frequency Polygon for the Number of Trypanosomes in the Blood of a Rat’s Tail
(Frequency polygon not reproduced; axes: Frequency, Counts.)
Figure 7: Cumulative Frequency Curve for the Number of Trypanosomes in the Blood of a Rat’s Tail
(Cumulative frequency curve not reproduced; axes: Frequency, Counts.)
• When making a statistical diagram, the axes should be clearly labeled and units of
measurement indicated.
• The choice of scales should be made with care.
$$\bar{x} = \frac{\sum x}{n}$$
• For example: consider the following heights of 10 men in centimeters (cm): 165, 167,
169, 169, 171, 173, 175, 176, 176, 169
• The mean height is calculated by adding the heights for the ten men and dividing the sum
by 10.
$$\text{Arithmetic mean} = \frac{165 + 167 + 169 + 169 + 171 + 173 + 175 + 176 + 176 + 169}{10}$$
$$\bar{x} = \frac{1710}{10} = 171 \text{ cm}$$
o Where,
∑ = the sum of all the values of the variable x, from i = 1 to i = n
n = number of observations
• The arithmetic mean can also be calculated from frequency distributions
• Refer to data in Figure 7: multiply each value of the variable with its frequency
• Add them up and divide by the total frequency, for example:
$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$
• Where xi stands for the value of the variable and fi stands for frequency for value xi
• For example: the mean count of trypanosomes in the tail blood of a rat is given by:
$$\bar{x} = \frac{(0 \times 4) + (1 \times 27) + (2 \times 27) + \dots + (7 \times 2) + (8 \times 1) + (9 \times 2)}{128} = \frac{402}{128} = 3.1$$
• With grouped data, the class midpoint should be used when calculating the mean.
Consider the data in Figure 2: the mean number of lesions caused by smallpox virus in egg
membranes is:
$$\bar{x} = \frac{(5 \times 1) + (15 \times 6) + (25 \times 14) + \dots + (95 \times 1) + (105 \times 0) + (115 \times 1)}{80} = \frac{3670}{80} = 45.9$$
• The arithmetic mean is a preferred measure since it uses more information from each
observation.
• However, it tends to be pulled by extreme values.
• The following is the duration of stay in hospital (in days) for some condition:
5, 5, 5, 7, 10, 20, 102.
• The mean duration of stay is calculated as follows:
$$\bar{x} = \frac{154}{7} = 22 \text{ days}$$
• This does not reflect the typical duration of stay, because six of the seven patients stayed
20 days or fewer.
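The three mean calculations in this section can be checked with a short Python sketch. The function names `mean` and `mean_from_frequencies` are chosen for this illustration only; the data are the heights, the Figure 2 class midpoints and frequencies, and the hospital stays given in the text.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)

def mean_from_frequencies(xs, fs):
    """Mean from a frequency distribution: sum(x * f) / sum(f)."""
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)

# Heights of 10 men (cm)
heights = [165, 167, 169, 169, 171, 173, 175, 176, 176, 169]
print(mean(heights))                            # 171.0

# Figure 2: class midpoints (x) and frequencies (f)
midpoints = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95, 105, 115]
freqs = [1, 6, 14, 14, 17, 8, 9, 3, 6, 1, 0, 1]
print(mean_from_frequencies(midpoints, freqs))  # 45.875

# Duration of stay in hospital (days): one extreme value pulls the mean up
stays = [5, 5, 5, 7, 10, 20, 102]
print(mean(stays))                              # 22.0
```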
The Median
• The median is the 50th percentile of the values in a dataset and represents the literal
middle of the data.
• The median is found by arranging all values in the dataset in numerical order and then
choosing the middle value.
• If the number of values in a dataset is even, take the mean of the two middle numbers to
find the median.
• For example, below is a series of durations (in days) of absence from classes due to
sickness: 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 6, 7, 8, 10, 10, 38, 80.
o The median duration is 5 days.
o Generally, when ‘n’ (number of observations) is odd, the median is the ½(n+1)th
observation. Here n = 21, so the median is the 11th observation.
o But when ‘n’ is even, there is no middle observation, and the median is the mean of
the two middle observations, i.e. the (n/2)th and (n/2 + 1)th observations.
• In frequency distributions, the median can be obtained by accumulating the frequencies
and noting the value of the variable which divides the data into two equal halves (i.e. the
value below which ½n of the observations lie).
o The median is less efficient than the mean because it takes no account of the
magnitude of most of the observations.
o If two groups of observations are pooled, the median of the combined group cannot be
expressed in terms of the medians of the two component groups.
o The median is much less amenable than the mean to mathematical treatments so it is
less used in more elaborate statistical techniques.
• However, if the data are distributed asymmetrically the median is more stable than the
mean.
o For example: Drawing from the data on the duration of stay in the hospital, the
median is 7 which is a more realistic estimate than the calculated mean of 22 days.
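The rule for finding the median can be sketched as follows. The function name `median` is chosen for this illustration, and the four-value list is a hypothetical example added to show the even case; the other two datasets come from the text.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # the (n+1)/2-th observation
    return (ordered[middle - 1] + ordered[middle]) / 2  # mean of the two middle observations

# Durations of absence from classes (days), n = 21
absences = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 6, 7, 8, 10, 10, 38, 80]
print(median(absences))      # 5

# Durations of stay in hospital (days), n = 7
stays = [5, 5, 5, 7, 10, 20, 102]
print(median(stays))         # 7

# Hypothetical even-sized dataset
print(median([1, 2, 3, 4]))  # 2.5
```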
The Mode
• The mode represents the value that is found most frequently in a set of numbers, though it
is not often used.
• Note that it is possible to have more than one mode.
o For example: in the following set of numbers (8, 7, 8, 8, 9, 6, 5, 6, 4, 6, 7) the modes are
8 and 6, since each is included in the dataset three times.
• This dataset is referred to as bimodal because it has two modes.
• It is also possible not to have a mode in a set of numbers.
o For example: in the following set of numbers (5, 4, 9, 7, 6, 3, 8) there is no number
which occurs more frequently than any other, therefore, there is no mode.
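A short sketch of finding the mode or modes follows. The function name `modes` and the convention of returning an empty list when every value occurs only once are assumptions made for this illustration.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often; an empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                    # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([8, 7, 8, 8, 9, 6, 5, 6, 4, 6, 7]))  # [6, 8]
print(modes([5, 4, 9, 7, 6, 3, 8]))              # []
```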
Figure 8: Population of the United States by State
(Bar chart not reproduced. y-axis: Population, 0 to 40,000,000; x-axis: State; the mean and the median are marked on the chart.)
Source: Bonita R. et al, 2006
• The states on the left side of the bar chart have a significantly larger
population than the other states.
• Because of this, we expect the mean to be higher in value than the median.
• The calculated mean in this sample is 5,811,968.706, which is marked on the graph
above in Figure 8.
• The median is 4,173,405, also marked on the graph: the mean in this example is greater
than the median.
• A general rule to follow is that if the data is skewed either to the left or to the right, the
median represents the data better than the mean.
• If a sample is normally distributed, the mean and median will be nearly the same.
• With symmetrical data, the mode will be similar as well.
• The mode is rarely used as it can easily be misinterpreted and is not used in statistical
tests.
• When the sample size is small, however, the mode may represent the data most
accurately.
• It is possible that in bimodal data, the modes will be a more accurate description as well.
• The mode is also frequently used to describe qualitative data.
• For example, you might find a modal diagnosis, or use the mode to describe medical
diagnoses by stating the diagnosis that was seen most frequently over a given period of
time.
Measures of Variability
• Measures of variability express the degree of variation or scatter of a series of
observations.
• Common measures of variation are range, variance and standard deviation.
The Range
• Is defined as the difference between the maximum value and the minimum value.
o For example: if the lowest and highest of a series of diastolic blood pressure are 65
mm Hg and 95 mm Hg, then the range = 95-65 = 30 mm Hg.
• The range is seldom used in statistical analysis because:
o It wastes information since it uses information from only two extreme values.
o The two extreme values are more likely to be faulty.
o The range increases with increasing number of observations.
• Since these differences are squared, the variance is measured in the square of the units in
which the variable X is measured.
o For example, if X is height in cm, the variance will be in cm2.
• A measure of variation that is measured in the original units of the variable is the standard
deviation, which is the square root of the variance:
$$SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$
o The standard deviation shows the average deviation of observations from the mean.
For approximately normally distributed data, the interval x̄ ± 2SD covers roughly 95%
of all the observations.
o The population variance is in most cases unknown because data are normally not
available for the whole population.
o When this is the case, the population variance, σ², is estimated by the sample variance,
s²:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
• Note a change in the denominator from n to n–1.
• When n–1 is used in the denominator, it gives a better estimate of the population variance
than when n is used.
• To calculate the variance and standard deviation for the seven observations in Figure 9
below, (x – x̄) and (x – x̄)² have to be calculated for each observation, and the squared
deviations are then added up to give ∑ (x – x̄)².
Note: Variance and standard deviation can be calculated using the shortcut formula for
∑ (xi – x̄)².
This is:
$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$$
Instructions
You will work in small groups to calculate the variance and standard deviation by the
shortcut formula. Use the data from the table below. One group will report their
experience calculating and the answers they came up with, and others will share in
the discussion.
sn 1 2 3 4 5 6 7 8
Value 24 34 38 46 47 53 53 61
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$$
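The sketch below compares the definition of the sample variance with the shortcut formula. So as not to give away the group exercise, it uses the heights of the 10 men from the section on the arithmetic mean; the function names are chosen for this illustration only.

```python
import math

def sample_variance_definition(values):
    """s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Heights of 10 men (cm)
heights = [165, 167, 169, 169, 171, 173, 175, 176, 176, 169]

s2 = sample_variance_shortcut(heights)
sd = math.sqrt(s2)                      # standard deviation, back in cm
print(round(s2, 2), round(sd, 2))       # 14.89 3.86
print(round(sample_variance_definition(heights), 2))  # same variance from the definition
```

Both functions give the same variance; the shortcut form only needs the running totals ∑x and ∑x², which is why it is convenient for hand calculation.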
Key Points
• Histograms, line diagrams, frequency polygons and cumulative frequency curves are the
most common methods used to present data for both grouped and ungrouped data.
• Mean, median and mode are the most common measures used to determine central tendency
while standard deviation, range and variance are common measures of dispersion.
Evaluation
• What are descriptive methods of quantitative data?
• What are the different methods of presenting frequency distribution data for grouped and
ungrouped data?
• How can you calculate the mean, median, variance and standard deviation?
References
• Bonita R. et al. (2006). Basic Epidemiology (2nd ed.). Geneva, Switzerland: WHO.
• Jones D. et al. (2008). Biostatistics Work Book. Field Epidemiology and Laboratory
Training Programmes (FELTP).
• McCusker J. (2001). Epidemiology in Community Health, Rural Health Series No 9
(Revised Edition). Nairobi, Kenya: AMREF.
• Rosner B. (2006). Fundamentals of Biostatistics (6th ed.). Belmont, California:
Thomson Brooks/Cole.
• Varkevisser et al. (1995). Designing and Conducting Health Systems Research Projects,
Volume 2 Part 2 Module 24. Health Systems Research Training Series.
Session 3: The Normal Distribution
Learning Objectives
By the end of this session, students are expected to be able to:
• Describe the normal distribution curve and its characteristics
• Explain probability distribution and continuous probability distribution
• Demonstrate skills to calculate standard normal distribution (SND)
• If we select three workers then the probability distribution becomes more complicated.
Possible Outcomes      Probability
All male               0.216 = (0.60 x 0.60 x 0.60)
2 male, 1 female       0.432 = 3 x (0.60 x 0.60 x 0.40)
2 female, 1 male       0.288 = 3 x (0.40 x 0.40 x 0.60)
All female             0.064 = (0.40 x 0.40 x 0.40)
(The factor of 3 counts the three possible orders in which the workers can be selected.)
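The probabilities in this table follow the binomial distribution. The sketch below reproduces them, assuming (as the table implies) that each selected worker is male with probability 0.60 independently of the others; the function name `binomial_probability` is chosen for this illustration.

```python
from math import comb

def binomial_probability(k, n, p):
    """Probability of exactly k 'successes' in n independent selections."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_male = 0.60   # probability that a selected worker is male (taken from the table)
n_workers = 3

for males in range(n_workers, -1, -1):
    prob = binomial_probability(males, n_workers, p_male)
    print(f"{males} male, {n_workers - males} female: {prob:.3f}")
# 3 male, 0 female: 0.216
# 2 male, 1 female: 0.432
# 1 male, 2 female: 0.288
# 0 male, 3 female: 0.064
```

The four probabilities add up to 1, as they must for a probability distribution.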
Characteristics of a Normal Distribution Curve
• It is specified by two parameters: the population mean and the standard deviation.
• It is symmetrical around the mean, bell-shaped, and unimodal. This is why the normal
curve is frequently referred to as the ‘bell curve’.
• The mean, median, and mode are all in the middle of the curve.
• The total area under the curve above the x-axis is one square unit with 50% of the area to
the right of the mean and 50% to the left of the mean.
• The area bounded by one standard deviation to the right and one standard deviation to the
left of the mean represents approximately 68% of the values.
• The area bounded by two standard deviations to the right and two to the left
represents approximately 95% of the values.
• The area bounded by three standard deviations to the right and three to the left
represents approximately 99.7% of the values (i.e. 99.7% of the values will be within
three standard deviations of the mean).
Figure 1: Areas Under the Normal Curve that Lie Between 1, 2 and 3 Standard Deviations on
Each Side of the Mean
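The 68%, 95% and 99.7% areas can be checked numerically. This is a small sketch using `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the helper name `area_within` is chosen for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```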
• Knowing the mean and standard deviation of a normal distribution allows one to
determine the following values:
o The proportion of individuals who fall into any range of values
o The percentile at which a given value falls
o The value which corresponds to a given percentile
Activity: Exercise 1
Instruction
The tutor will provide an example of calculating standard normal distribution. Follow along
with the calculations.
A study of blood pressure of African American school boys gave a distribution of systolic
blood pressure (SBP) close to the normal with µ = 105.8mm Hg and σ = 13.4mm Hg.
• What percentage of boys would be expected to have SBP greater than 120 mm Hg?
• Calculate SND = (120 – 105.8) / 13.4 = 1.06
• From the table of Standard Normal Distribution, the area to the right of SND = 1.06 is
0.14457, so about 14.5% of the boys would be expected to have SBP greater than 120
mm Hg.
Refer to Handout 3.1: Table of Standard Normal Distribution and Figure 2 below.
Figure 2: Distribution Curve Showing the Probability That SBP is Greater Than 120 mm Hg
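As a cross-check on the table lookup in Exercise 1, the same area can be computed directly. This sketch uses `NormalDist` from Python's standard `statistics` module in place of the printed table; the variable names are chosen for this illustration. Because the SND is not rounded to 1.06 before the area is computed, the result may differ from the table value in the fourth decimal place.

```python
from statistics import NormalDist

mu, sigma = 105.8, 13.4          # mean and standard deviation of SBP (mm Hg)

snd = (120 - mu) / sigma         # standard normal deviate for SBP = 120
area_right = 1 - NormalDist().cdf(snd)   # area to the right of the SND

print(f"SND = {snd:.2f}")
print(f"Proportion with SBP > 120 mm Hg = {area_right:.3f}")
```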
Activity: Exercise 2
Instruction
The tutor will provide an example of calculating SND1 and SND2 based on results from Exercise
1.
Example
What percentage of boys would be expected to have systolic blood pressure less than 120 mm
Hg?
• If 14.5% have SBP greater than 120 mm Hg, then 100 – 14.5 = 85.5% will have SBP
less than 120 mm Hg.
• What proportion of boys would be expected to have SBP between 85 and 120 mm Hg?
• Calculate SND1 = (85 – 105.8) / 13.4 = –1.55 (SND2 = 1.06, from Exercise 1)
Figure 3: Distribution Curve Showing the Probability that SBP is Between 85 and 120 mm Hg
(Figure not reproduced.)
Source: Jones et al, 2008
Activity: Exercise 3
Instruction
The tutor will provide an example of calculating the area between SND1 and SND2 based on
results from Exercise 2.
• The area to the right of SND2 = 1.06 is 0.14457 and the area to the left of SND1 = –1.55 is
0.060571, so the proportion with SBP between 85 mm Hg and 120 mm Hg is 100 – 14.5 –
6.1 = 79.4%.
• What range of blood pressures covers the central 95% of school boys? That is,
within what limits would the central 95% of SBPs be expected?
o If µ = 105.8 and σ = 13.4 then, µ ± 1.96 σ includes 95% of SBP
o 105.8 – 1.96 x 13.4 to 105.8 + 1.96 x 13.4 i.e. 79.5 to 132.1 mm Hg
o i.e. 95% of the school boys have SBPs between 79.5 mm Hg and 132.1 mm Hg.
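The calculations in Exercise 3 can be cross-checked with a short sketch that uses `NormalDist` from Python's standard `statistics` module in place of the printed table. The variable names are chosen for this illustration. Because the SNDs are not rounded to two decimal places first, the proportion comes out at about 79.5% rather than the 79.4% obtained from the table.

```python
from statistics import NormalDist

sbp = NormalDist(mu=105.8, sigma=13.4)   # distribution of SBP (mm Hg)

# Proportion of boys with SBP between 85 and 120 mm Hg
between = sbp.cdf(120) - sbp.cdf(85)
print(f"Proportion between 85 and 120 mm Hg = {between:.3f}")

# Limits containing the central 95% of SBPs: mean ± 1.96 SD
z = NormalDist().inv_cdf(0.975)          # approximately 1.96
lower = sbp.mean - z * sbp.stdev
upper = sbp.mean + z * sbp.stdev
print(f"Central 95%: {lower:.1f} to {upper:.1f} mm Hg")   # 79.5 to 132.1 mm Hg
```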
Instruction
You will work in small groups to calculate SND. You will need blank sheets of paper and
calculators for this activity. One group will present their responses and let other groups share
the discussion.
Key Points
This session emphasized the importance of:
• Normal distribution curve and its characteristics
• Probability distribution
• Continuous probability distribution
Evaluation
• What are the characteristics of a normal distribution curve?
• What are probability distribution and normal probability distribution?
• What is the formula for calculating standard deviation?
References
• Jones, D. et al. (2008). Biostatistics Work Book-Field Epidemiology and Laboratory
Training Programmes (FELTP).
• Makwaya, et al. (1997). Lecture Notes in Biostatistics. Department of Epidemiology and
Biostatistics, MUCHS: Tanzania.
• Varkevisser, et al. (1995). Designing and Conducting Health Systems Research Projects,
Volume 2 Part 2 Module 24. Health Systems Research Training Series.
Handout 3.1: Table of Standard Normal Distribution
Worksheet 3.1: Calculating the SND
Instructions
Use Handout 3.1, plain papers, and calculators for this activity. Select a reporter to record and
report the work you do in small groups. Present your responses in plenary.
Question
Suppose the average length of stay in a chronic disease hospital of a certain type of patient is
60 days with a standard deviation of 15. If it is reasonable to assume an approximately
normal distribution of lengths of stay, find the probability that a randomly selected patient
from this group will have a length of stay:
a) Greater than 50 days
b) Less than 30 days
c) Between 30 and 50 days
d) Greater than 90 days
Answers:
Session 4: Sampling Techniques
Learning Objectives
By the end of this session, students are expected to be able to:
• Describe the concept of sampling
• State the types of sampling methods
• Describe probability (random sampling) and non-probability sampling
• Calculate sample size for estimation of a mean and proportion
Concept of Sampling
Definition of Terms
• Sampling: The process of selecting any portion of a population as representative of that
population.
o In research we are often dealing with groups which are effectively infinite, such as the
number of children under-five in a district.
o In sampling, part of a group (population) is chosen to provide information which can
be generalized to the whole group, although in theory it would be possible to
investigate the whole group.
o Sampling is adopted to reduce labor and costs. If the whole population is studied, the
process is referred to as taking a census.
• Sampling Unit: An element or set of elements considered for selection in some stage of
sampling.
• Sampling Frame: A list of all the sampling units in the population.
• Sampling Scheme: A method of selecting sampling units from a sampling frame.
• In this discussion, we shall confine ourselves to surveys designed to provide estimates of
certain characteristics of populations, particularly the mean and the proportion, as
opposed to other study types.
Non-Probability Sampling
• There are two common non-probability sampling methods: convenience sampling and
quota sampling.
• Convenience Sampling: The sample is obtained on a convenience basis.
o Investigators select the study units that happen to be available at the time of data
collection. (Many hospital-based studies use convenience samples).
o A major limitation of this approach is that the sample drawn may be quite
unrepresentative of the study population.
• Quota Sampling: A fixed predetermined number of sample units from different
categories of the study population is obtained.
o A sample obtained in this manner ensures that a certain number of sample units from
different categories with specific characteristics (such as age, sex and religion) are
represented in the sample.
o It is useful when one desires to provide a balance of study units according to some
characteristics of interest.
o Convenience sampling would not achieve this sort of balance.
Probability Sampling
• Probability sampling is also frequently called random sampling.
• In probability sampling the selection procedure has some element of probability/chance.
In particular, a study unit has a known probability of being selected into the sample.
• The steps involved in simple random sampling include:
o Obtain a numbered list of all units in the study population (i.e. availability of
complete sampling frame).
o Decide on the size of the sample.
o Select the required number of units using either the ‘lottery’ system or tables of
random numbers.
Refer to Handout 4.2: Table of Random Sampling Numbers
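The steps of simple random sampling can be sketched in Python, with the computer's random number generator standing in for the lottery system or the table of random numbers. The frame of 720 numbered units, the sample size of 80, the seed and the function name `simple_random_sample` are assumptions made for this illustration.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Select sample_size units from the frame, without replacement."""
    rng = random.Random(seed)      # a fixed seed makes the selection reproducible
    return rng.sample(frame, sample_size)

# Step 1: a numbered list of all units in the study population (hypothetical)
frame = list(range(1, 721))

# Step 2: decide on the size of the sample
sample_size = 80

# Step 3: select the required number of units at random
sample = simple_random_sample(frame, sample_size, seed=1)
print(sorted(sample)[:10])         # first few selected unit numbers
```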
Systematic Sampling
• In systematic sampling, elements in the sample are obtained in a systematic way.
• The steps involved in systematic sampling include:
o Obtain the sampling frame and the size of the study population N.
o Decide on the sample size, n.
o Calculate the sampling interval, k = N/n.
o Select the first element at random from the first k units.
o Include every kth unit from the frame into the sample.
• For example, with a study population of N = 720 and a sample size of n = 80, the sampling
interval is:
$$k = \frac{N}{n} = \frac{720}{80} = 9$$
• To determine the first unit in the sample (the 4th step in systematic sampling listed
above), select one individual randomly from the first 9 individuals on the list.
• If, using simple random sampling, the initial selection was 7, the selected individuals
would be those occupying positions 7, 7+9=16, 16+9=25, 25+9=34, ..., etc.,
according to the 5th step listed above (i.e., include every kth unit). This continues until
80 individuals have been obtained.
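The systematic sampling steps can be sketched as follows, using N = 720 and n = 80 as in the example. The function name `systematic_sample` and the seed are assumptions made for this illustration, and the frame is simply the list of positions 1 to 720.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Select every k-th unit after a random start within the first k units."""
    k = len(frame) // sample_size            # sampling interval, k = N / n
    rng = random.Random(seed)
    start = rng.randint(1, k)                # random position among the first k units
    positions = [start + i * k for i in range(sample_size)]
    return [frame[p - 1] for p in positions] # positions are counted from 1

frame = list(range(1, 721))                  # N = 720 numbered units
sample = systematic_sample(frame, 80, seed=3)
print(sample[:4])                            # first four selected positions, 9 apart
```

With a random start of 7, for instance, the selected positions would be 7, 16, 25, 34 and so on, as described above.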
Stratified Sampling
• In stratified sampling the population is divided into subgroups (or strata) whereby each
stratum is sampled randomly with a known sample size.
• Strata may be defined according to some characteristics of importance in the survey.
o Examples of strata include occupation, religion, age groups or even locality
(whereby regions of the country may be taken as strata in a national health survey).
• The steps involved in stratified sampling are as follows:
o Divide the population into subgroups (strata).
o Draw a sample (of predetermined size) randomly from each stratum.
• An important stratification principle is that the between-strata variability should be as
high as possible, or equivalently that each stratum should be as homogeneous as possible.
That is, units within a stratum should be as much alike as possible and units in different
strata should be as different as possible.
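The two steps of stratified sampling can be sketched as below. The strata ("urban" and "rural"), their sizes, the sample sizes per stratum and the function name `stratified_sample` are all hypothetical values chosen for this illustration.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of a predetermined size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Step 1: divide the population into strata (hypothetical households by locality)
strata = {
    "urban": [f"U{i}" for i in range(1, 301)],   # 300 urban households
    "rural": [f"R{i}" for i in range(1, 701)],   # 700 rural households
}

# Step 2: draw a sample of predetermined size from each stratum
sizes = {"urban": 30, "rural": 70}
sample = stratified_sample(strata, sizes, seed=5)

for name, units in sample.items():
    print(name, len(units))
```

Here the sample sizes are proportional to the stratum sizes (10% of each), which is one common choice; other allocations are possible.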
Cluster Sampling
• There are situations in which obtaining a complete list of individuals in the study
population is not feasible or practical, or a complete sampling frame is not available
before the investigation starts.
• In such cases it would be easy and convenient to consider a sampling frame in which the
sampling units are a collection (cluster) of study units.
o Examples of clusters include schools, hospital wards, villages, etc.
• Because the sampling unit is a cluster (e.g. a school) the sampling method is known as
cluster sampling.
• The selection steps are the same as those for any of the above random
sampling methods, except that the sampling unit is the cluster.
o Divide the population into clusters.
o Draw a sample (of predetermined size) of clusters at random.
• Unlike in stratified sampling, an important principle in cluster sampling is that units
within a cluster should be as heterogeneous as possible while the between-cluster variability
should be as low as possible.
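A sketch of cluster sampling follows, treating the cluster (here a school) as the sampling unit and including every study unit in each selected cluster. The school names, pupil lists and the function name `cluster_sample` are hypothetical and serve only this illustration.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Select whole clusters at random; all units in a chosen cluster are included."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)   # the cluster is the sampling unit
    return {name: clusters[name] for name in chosen}

# Hypothetical sampling frame: 10 schools, each with 5 pupils
clusters = {
    f"School {s}": [f"Pupil {s}-{p}" for p in range(1, 6)]
    for s in range(1, 11)
}

sample = cluster_sample(clusters, 3, seed=2)
for school, pupils in sample.items():
    print(school, len(pupils))
```

Note that only a list of schools is needed to draw the sample; a list of individual pupils is needed only for the schools that are selected.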
Multistage Sampling
• Multi-stage sampling is carried out in many (more than 1) stages, and different sampling
techniques can be employed at every stage.
• In this method the sampling frame is divided into a population of first-stage sampling
units, of which a first-stage sample is taken.
• Each first-stage unit selected is subdivided into second-stage sampling units, which are
then sampled.
• The process continues until it is convenient to stop.
• To illustrate multistage sampling consider a health survey of primary school children in
Tanzania mainland.
o An immediate problem to taking a sample of these children is that it is almost
impossible to construct a complete sampling frame.
o A multistage sample might be:
Take a sample of regions.
Within each selected region take a sample of districts.
Within each selected district, take a sample of schools.
Within each selected school, take a sample of school children, and carry out the
investigation.
o The sample would thus be accomplished in four stages. Notice that the construction of
a complete sampling frame for each stage is relatively easy.
o In addition to the advantage of easily identifying complete sampling frames, a
multistage sampling procedure is likely to result in an appreciable cost savings by
concentrating resources at selected schools instead of a sample made up of children
scattered in all parts of the country.
• Sometimes, in the final stage of sampling, complete enumeration of the available units is
undertaken.
• In the above example, once a survey team has reached the level of a school it may cost
little extra to examine all the children in the school, and it may be worthwhile to
expend that cost and effort for several reasons (for example, to avoid feelings of exclusion
among children in the same school who are not included in the study).
• Advantages of multistage sampling:
o No complete listing of population required
o Construction of a complete sampling frame for each stage relatively easy
o Most feasible approach for large populations (cost saving)
• Disadvantages of multistage sampling:
o Several sampling lists
o Sampling error difficult to measure
o Therefore, the required sample size, n, is given by:
$$n = \frac{4\sigma^2}{\varepsilon^2}$$
o This formula requires prior knowledge of the population standard deviation σ,
which is unknown in almost all surveys.
o Thus, it is necessary to replace σ with an estimate. This estimate may be obtained
from results of previous studies on the variable or, alternatively, directly from the
results of a pilot study.
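The sample size formula can be applied as in the sketch below. It assumes that ε is the largest acceptable difference between the sample mean and the population mean, and that the factor 4 is the rounded square of 1.96 (about 95% confidence); these interpretations are assumptions here, because the derivation leading to the formula is not reproduced in this section. The values σ = 13.4 mm Hg and ε = 2 mm Hg, and the function name `sample_size_for_mean`, are hypothetical.

```python
import math

def sample_size_for_mean(sigma, epsilon):
    """n = 4 * sigma^2 / epsilon^2, rounded up to a whole number of study units."""
    return math.ceil(4 * sigma ** 2 / epsilon ** 2)

sigma = 13.4     # estimated standard deviation (e.g. from a previous or pilot study)
epsilon = 2      # acceptable margin of error, in the same units as sigma

print(sample_size_for_mean(sigma, epsilon))   # 180
```

Rounding up rather than to the nearest whole number ensures the sample is at least as large as the formula requires. Halving ε quadruples the required sample size, since ε is squared in the denominator.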
Instructions
Bias in Sampling
• Bias in sampling refers to the systematic error in sampling procedures that may lead to
distortion in results. Sources of bias in sampling include the following:
o Non-response: This is encountered mainly when subjects refuse to participate. They
may refuse an interview, or forget to fill out a questionnaire. The non-respondents
(particularly those due to refusal) may differ systematically from the respondents.
o Studying volunteers only: The fact that some people volunteer to participate in a
study may mean that they differ from the general population on the factors being
studied.
o Sampling registered patients only: Patients going to a hospital are likely to differ
from those being treated elsewhere.
o Missing cases of short duration: In prevalence studies, cases of short duration (e.g.
fatal cases, cases with short episodes, and mild cases) are more likely to be missed.
o Seasonal bias: If the condition under study exhibits different characteristics in
different seasons of the year, this may lead to a distortion in the results, depending on
the period of data collection.
o Tarmac bias: Selecting a study area on the basis of accessibility will generally
constitute a selection bias.
Ethical Considerations
• If recommendations from a study are intended for the entire study population (e.g all
relevant individuals in a region) then one is bound ethically to ensure the sample studied
is representative of that population.
• Remember that random selection of a sample does not guarantee representativeness.
Key Points
• It is cheaper and more realistic to study a portion of population than to study the whole
population.
• Sampling methods can be grouped into two types: random sampling and non-random
sampling. With random sampling, each subject has a known chance (in simple random
sampling, an equal chance) of being selected. Therefore, there is less likelihood of bias.
• The ideal number of subjects/study units to be included in the study can best be obtained
through sample size calculation using a mean or proportion.
• If conclusions that are valid for the whole population are to be drawn on the basis of a
sample, then the sample should be representative.
Evaluation
• What is sampling?
• What are the types of sampling methods?
• What is random sampling and non-random sampling?
• How can you calculate sample size for estimation of a mean and estimation of a
proportion?
References
• Bonita, R. et al. (2006). Basic Epidemiology, 2nd Edition. Geneva, Switzerland: WHO.
• Jones, D. et al. (2008). Biostatistics Work Book. Field Epidemiology and Laboratory
Training Programmes. (FELTP).
• Makwaya, et al. (1997). Lecture Notes in Biostatistics. Department of Epidemiology and
Biostatistics, MUCHS. Tanzania.
• McCusker, J. (2001). Epidemiology in Community Health. (Revised Edition) Rural
Health Series No 9. Nairobi, Kenya: AMREF.
• Rosner, B. (2006). Fundamentals of Biostatistics (6th edition). Belmont, California:
Thomson Brookes/Cole.
• Varkevisser, C., et al. (1995). Designing and Conducting Health Systems Research
Projects, Vol. 2, Part 2, Module 24. Health Systems Research Training Series.
Handout 4.1: Sampling Techniques
The diagram above depicts drawing a sample of size n, using a particular sampling method, from
a study population with N units (subjects). Inferential statistics techniques are then used to
make inferences about the study population on the basis of results from the sample.
Steps:
1. Identify the study population. (Note that it is possible to have several study populations
in one study.)
2. Draw a sample from the study population.
3. Describe the sample by calculating relevant statistics.
4. Make inferences about the parameters.
5. Draw conclusions about the study population.
Handout 4.2: Table of Random Sampling Numbers
Handout 4.3: Systematic Random Sampling
Worksheet 4.1: Calculating Sample Size for Estimation of a
Proportion
Instructions:
• Please work in small groups.
• Read the following scenario and answer the questions that follow.
• Be prepared to briefly share your response with the class.
Scenario:
You have been assigned to conduct a study in order to estimate the prevalence (i.e.
proportion) of people affected with Bancroftian filarial infection in the Dar es Salaam region.
A review of literature on the subject reveals that studies done along the East African coastal
strip some years back showed the prevalence to be in the order of 30%.
Question:
• What sample size do you require in order to come up with a reasonable estimate in your
study?
• Give a complete answer. Describe any assumptions or prior decisions that you undertake.
Answer:
Session 5: Estimation of Mean and Proportion
Learning Objectives
By the end of this session, students are expected to be able to:
• Define sampling errors and non-sampling errors
• Describe the standard error of the mean
• Describe the standard error of the proportion
• Estimate the standard error of the mean
• Estimate the standard error of the proportion
• In general, the sample mean or sample proportion is unlikely to be exactly equal to the
mean or proportion in the population, although the former is intended to estimate the
latter. If the two are exactly equal to one another, it is just by coincidence.
• Our conclusion about a population on the basis of the sample we have taken will almost
always have some error.
• We distinguish between two sorts of error:
o Sampling errors
o Non-sampling errors
Sampling Errors
• Sampling errors are those which arise due to the fact that we have observed only part of
the whole population, and they get less important as the sample size increases.
o For example, an estimate of the mean number of children per household in a certain
district based on two households only (in the district) will certainly be poorer than an
estimate based on a sample of 100 households.
o We say there is less sampling error in the latter situation than in the former. If we
investigated the whole population (i.e. all households in the district) the sampling
error would be zero because we would know the population mean exactly.
Non-Sampling Errors
• Non-sampling errors are due mainly to faults in the sampling process, which create room
for potential sources of bias. (These are sometimes also referred to as systematic
errors.)
o These errors are potentially serious since the bias they cause may lead to invalid
conclusions.
o Increasing the size of a sample will not necessarily reduce the non-sampling errors.
o Subjects who refuse to participate in an interview, or who forget to fill in a
questionnaire, may differ systematically from those who diligently respond.
o Non-sampling errors also occur through equipment faults, observer errors and during
data processing through coding, data entry, etc.
o However, in this section we will direct our attention to sampling errors (also known as
random errors).
• Let us revisit the issue of the sampling error in the situation of a sample taken once.
• Two properties about the sampling error are apparent:
o The larger the sample size the better the precision in estimating (i.e. large samples are
more likely to produce closer estimates than small samples).
o If the variability of the observations in the parent (study) population is small we
would expect the error to be small, and vice-versa.
o Thus, the sampling error depends on the variability of observations in the population.
• Take a moment to recall the idea of repeatedly taking a random sample of size ‘n’ and
calculating the sample mean ‘x̄’ each time. This would lead to a series of values of ‘x̄’,
and the natural questions relating to this (new) variable ‘x̄’ will be about its
distribution as well as its mean and variance.
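The idea of repeated sampling can be tried out with a small simulation. The sketch below is illustrative only: the population (normally distributed, mean 16 kg, standard deviation 2 kg), the sample size of 25 and the number of repeats are all hypothetical choices. It draws many samples, records each sample mean, and compares the spread of those means with the population standard deviation divided by the square root of n.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 16.0, 2.0  # hypothetical population values (kg)
N = 25                        # size of each sample
REPEATS = 2000                # how many samples are drawn

# One sample mean per repeated sample.
sample_means = [
    statistics.mean([random.gauss(POP_MEAN, POP_SD) for _ in range(N)])
    for _ in range(REPEATS)
]

empirical_se = statistics.stdev(sample_means)  # spread of the sample means
theoretical_se = POP_SD / math.sqrt(N)         # SD divided by sqrt(n)

print(round(statistics.mean(sample_means), 2))
print(round(empirical_se, 3), round(theoretical_se, 3))
```

The sample means cluster around the population mean, and their standard deviation comes out close to SD/√n, which is the quantity called the standard error of the mean.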
How to Calculate the Standard Error and 95% Confidence Interval of a Mean
• When dealing with numerical data you may wish to estimate to what degree the sample
mean varies from the population mean.
• The standard error for the mean is calculated by dividing the standard deviation by the
square root of the sample size:
$$SE = \frac{\text{Standard deviation}}{\sqrt{\text{Sample size}}} \quad \text{or} \quad SE = \frac{SD}{\sqrt{n}}$$
• It can be assumed, for a normally distributed variable, that approximately 95% of all
possible sample means lie within two standard errors of the population mean. In other
words, we can be 95% sure that the population mean, of which we want to have the best
possible estimate, lies within two standard errors of our sample mean.
• When describing variables statistically you usually present the calculated sample mean
plus or minus two standard errors.
• This is called the 95% Confidence Interval. It means that you are about 95% certain that
the true population mean is within this interval.
Activity: Exercise 1
Instructions
Tutor will show example of calculating standard error and confidence interval, as follows:
The weights of a random sample of 11 three-year-old children were taken in a village. The
sample mean was 16 kg and the standard deviation of the sample was 2 kg.
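Standard Error:
$$SE = \frac{2}{\sqrt{11}} = 0.6 \text{ kg}$$
95% Confidence Interval:
$$16 \pm (2 \times 0.6) = 14.8 \text{ to } 17.2 \text{ kg}$$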
This means that we are approximately 95% certain that the mean weight of all three-year-old
children in this population lies between 14.8 and 17.2 kg.
Activity: Exercise 2
Instructions
Tutor will show another example of calculating standard error and confidence interval.
Follow along with the calculations.
Now, imagine that the size of the random sample in Exercise 1 is increased. The weights of a
random sample of 20 three-year-old children were taken in a village. The sample mean was
16 kg and the standard deviation of the sample was 2 kg.
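Standard Error:
$$SE = \frac{2}{\sqrt{20}} = 0.45 \text{ kg}$$
95% Confidence Interval:
$$16 \pm (2 \times 0.45) = 15.1 \text{ to } 16.9 \text{ kg}$$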
This means that we are approximately 95% certain that the mean weight of all three-year-old
children in this population lies between 15.1 and 16.9 kg.
Note that the increase in sample size in Exercise 2 improved the precision of the
estimate: the confidence interval is narrower.
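The two exercises can be checked with a few lines of Python. This is a minimal sketch; the function name `mean_ci` is hypothetical, and it uses the multiplier of 2 standard errors described in this session rather than the more exact 1.96.

```python
import math


def mean_ci(mean, sd, n, multiplier=2.0):
    """Return (SE, lower limit, upper limit) for a sample mean."""
    se = sd / math.sqrt(n)  # standard error = SD / sqrt(n)
    return se, mean - multiplier * se, mean + multiplier * se


# Exercise 1: n = 11, mean 16 kg, SD 2 kg
se11, lo11, hi11 = mean_ci(16, 2, 11)
# Exercise 2: n = 20, same mean and SD
se20, lo20, hi20 = mean_ci(16, 2, 20)

print(round(se11, 2), round(lo11, 1), round(hi11, 1))
print(round(se20, 2), round(lo20, 1), round(hi20, 1))
```

Both intervals are centred on 16 kg. Only the sample size differs, so the narrower interval in Exercise 2 comes entirely from the larger n.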
How to Calculate the Standard Error and 95% Confidence Interval of a Percentage
• In the previous section we calculated the standard error and the 95% confidence interval
of a sample mean, starting with numerical data. We will now do the same for a
percentage that was calculated from categorical data.
• The formula for calculating the standard error of a percentage is:
$$SE = \sqrt{\frac{p(100 - p)}{n}}$$
In this equation, p represents one of the percentages and (100 - p) represents the other
percentage. The standard error of the percentage is obtained by multiplying the
percentages, dividing the result by the number in the sample, and taking the square root.
• Note that instead of percentages we can use proportions. A proportion can take on any
value between 0 and 1.
• The formula for calculating the standard error of a proportion would be:
$$SE = \sqrt{\frac{p(1 - p)}{n}}$$
where p is the proportion of the sample with the characteristic, and (1 − p) is the
proportion without it.
Activity: Exercise 3
Instructions
Tutor will provide an example of calculating standard error and confidence interval using
categorical data as follows:
Among a sample of 120 TB patients, which was drawn from the total population of TB
patients in the country, it was found that 28 (or 23.3%) did not comply with their out-patient
treatment. The other 92 (or 76.7%) exhibited a satisfactory degree of compliance.
We now want to calculate the standard error of the percentage of non-compliance (23.3%).
This is done as follows:
If p represents one of the percentages (23.3%) and (100 - p) represents the other (76.7%),
then the standard error of the percentage is obtained by multiplying them, dividing the result
by the number in the sample and taking the square root.
Standard Error:
$$SE = \sqrt{\frac{23.3 \times 76.7}{120}} = 3.9$$
Now, we calculate the 95% confidence interval for the percentage of non-compliance in the
country: 23.3% ± (2 × 3.9%) = 15.5% to 31.1%.
If we are to calculate this example as a proportion, we would do it as follows:
Standard Error:
$$SE = \sqrt{\frac{p(1 - p)}{n}}$$
$$SE = \sqrt{\frac{0.233 \times 0.767}{120}} = 0.039$$
Note that with this example, we can extrapolate from our proportions to come to the same
conclusion: We are 95% confident that in the population of all TB patients in the country
from which the sample of 120 was drawn, 15.5% to 31.1% do not comply with their out-
patient treatment.
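The TB compliance example can be reproduced as follows. This is a sketch under two assumptions: the interval is taken as the percentage plus or minus 2 standard errors, and the standard error is rounded to one decimal place before the interval is formed, which is what yields 15.5% to 31.1%. The function name is hypothetical.

```python
import math


def percentage_se(p, n):
    """Standard error of a percentage p (0-100) from a sample of size n."""
    return math.sqrt(p * (100 - p) / n)


p, n = 23.3, 120                 # 23.3% non-compliant among 120 patients
se = percentage_se(p, n)         # unrounded standard error
se_rounded = round(se, 1)        # rounded as in the worked example

lower = p - 2 * se_rounded
upper = p + 2 * se_rounded

# Same calculation on the proportion scale (0-1).
se_proportion = math.sqrt(0.233 * 0.767 / n)

print(se_rounded, round(lower, 1), round(upper, 1), round(se_proportion, 3))
```

Without the intermediate rounding the limits shift by about a tenth of a percentage point, so small differences in the last digit are a rounding effect and not a different method.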
Instructions
REFER to Worksheet 5.1: Calculate Standard Error of a Mean. Read the
worksheet instructions.
Calculate the Standard Error of the mean and 95% confidence interval. Record your
answers.
Question:
The haemoglobin levels of a random sample of 40 five-year-old children were taken in a
village. The sample mean was 13 gm/dl and the standard deviation of the sample was 3 gm/dl.
Please work together to calculate the Standard Error of the mean, and to calculate the 95%
confidence interval.
Key Points
• The larger the sample size, the smaller the standard error and the narrower the confidence
interval will be.
• The advantage of having a sufficiently large sample is that the sample mean will be a better
estimate of the population mean.
• At a certain point, increases in sample size demand vast investments in time and money,
whereas the confidence interval only marginally decreases.
Evaluation
• What is standard error?
• Why is it important to know standard error?
References
• Makwaya et al. (1997). Lecture notes in biostatistics. Department of Epidemiology and
Biostatistics. MUHAS.
• Varkevisser et al. (1995). Designing and conducting health systems research projects
(Volume 2, Part 2, Module 24). Health Systems Research Training Series.
Worksheet 5.1: Calculating Standard Error of a Mean
Instructions
Work in small groups to answer the following question.
Question:
The haemoglobin levels of a random sample of 40 children of five years old were taken in a
village. The sample mean was 13 gm/dl and the standard deviation of the sample was 3 gm/dl.
Please calculate the Standard Error of the mean and the 95% confidence interval.
Each group should record their answer on a flipchart (or on a section of chalkboard) in the
classroom.
Answer:
Session 6: Significance Tests of One Sample
Learning Objectives
By the end of this session, students are expected to be able to:
• Define the terms statistical hypothesis, null hypothesis (H0), alternative
hypothesis (H1), test statistic, significance level, and critical value
• Describe p-value and its interpretation
• Differentiate statistical significance from practical significance
• Describe the Student’s t-test
Definition of Terms
• Statistical hypothesis: A statement about the parameter(s) or distribution of the
population(s) being sampled.
• Null hypothesis (H0): A term describing the particular hypothesis being tested.
o In many instances it is formulated for the sole purpose of being rejected or nullified. It
is often a hypothesis of ‘no difference’.
• Alternative hypothesis (H1): A statistical hypothesis that disagrees with the null
hypothesis.
o The null hypothesis H0 and the alternative hypothesis H1 concern populations, but
our conclusions are based on samples taken from these populations.
o Generalizing from a sample to the population can be dangerous due to sampling
errors. Therefore, we are unable to say that H0 or H1 is definitely true.
o If sampling errors are taken into account, we can investigate the likelihood that the
null or alternative hypotheses are true. We have to measure the relevant information
in the sampled data and weigh this information in relation to the sampling errors
involved.
• Test statistic: A statistic which represents the relevant sample information for the
question under investigation.
o The test statistic provides a basis for testing a statistical hypothesis and has a known
sampling distribution with tabulated percentage points (e.g. standard normal, etc.).
The value of a test statistic differs from one sample to another.
• Significance level: An arbitrary cut-off point which gives some small probability for
deciding when to declare the null hypothesis untenable or false.
• Critical value: A cut-off value corresponding to a given significance level, as determined
from the sampling distribution of the test statistic (by using statistical tables).
o The critical value is the boundary value such that if the value of the test statistic is
more extreme (i.e. more unlikely) than the critical value, then H0 is rejected and the
probability of rejecting H0 when it is true is less than the significance level.
The p-Value
• The p-value is the probability that a difference as large as (or larger than) the one we
have observed could have occurred simply by chance, that is, if the null hypothesis were
true.
Figure 2: The Relationship between the p-Value and the Sample Size
                                   Sample size
p-Value   Small                                    Large
Small     Evidence against H0;                     Evidence against H0;
          results point away from H0               results support H1
Large     Difficult to interpret;                  No evidence against H0;
          can’t distinguish between H0 and H1      results point at H0
• The following results on malnutrition among under-fives in Dodoma and Mwanza, obtained
with different sample sizes, confirm the above explanation. We can observe that as the
sample size increases, even a small difference between the populations in Dodoma and
Mwanza becomes statistically significant.
Figure 3: Malnutrition in under-fives in Dodoma and Mwanza: p-Values and Sample Size
Sample size (N)   Dodoma   Mwanza   p-Value   Conclusion
50                40%      30%      0.29      No significant difference
500               40%      30%      0.0009    Highly significant
50,000            31%      30%      0.0006    Highly significant
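The p-values in Figure 3 can be approximated with a two-sample test for proportions. The manual does not say which test produced them, so the sketch below rests on assumptions: it uses a pooled two-proportion z-test with a two-sided p-value, and it treats the stated sample size as the number of children examined in each town. The function name is hypothetical.

```python
import math


def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value for the difference between two proportions,
    assuming equally sized groups and a pooled standard error."""
    pooled = (p1 + p2) / 2                                # pooled proportion
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p1 - p2) / se                                    # standard normal deviate
    return math.erfc(abs(z) / math.sqrt(2))               # area in both tails


rows = [(50, 0.40, 0.30), (500, 0.40, 0.30), (50_000, 0.31, 0.30)]
p_values = [two_proportion_p_value(p1, p2, n) for n, p1, p2 in rows]

for (n, p1, p2), p in zip(rows, p_values):
    print(n, p1, p2, round(p, 4))
```

Under these assumptions the same 10-point difference is not significant with 50 children per town but is highly significant with 500, and a 1-point difference becomes significant with 50,000.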
• Similarly, it is not reasonable to take a non-significant result as indicating no effect, just
because we cannot rule out the null hypothesis.
Activity: Exercise 1
Instructions
Tutor will use the example below to illustrate the relationship between confidence intervals
and significance tests.
Example
Consider a study in which there are two groups of patients, randomized to a group receiving
a new treatment for a medical condition (i.e., treatment group), and into a control group. We
know that the mean survival time of the patients after being treated by the new technique is
46.9 months. From this, we have determined that the standard error is 4.33 months. (That is,
x̄ = 46.9 months, and SE(x̄) = 4.33 months.)
A 95% confidence interval for the true mean survival time due to this new technique can be
calculated as shown below:
The value proposed in the null hypothesis is 38.3 months. Note that it is not included in the
confidence interval.
Therefore, it can be concluded that 38.3 is an unlikely value for the mean survival time of
patients after treatment with the new technique. This means that the null hypothesis is
rejected at the 5% level (i.e. p<0.05).
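The link between the confidence interval and the significance test in this example can be sketched in code. One assumption is made: the interval uses the multiplier 1.96, the 5% critical value of the standard normal deviate quoted later in this session.

```python
mean_survival = 46.9   # sample mean survival time (months)
se = 4.33              # standard error of the mean (months)
null_value = 38.3      # value proposed in the null hypothesis (months)
z_95 = 1.96            # assumed multiplier for a 95% interval

lower = mean_survival - z_95 * se
upper = mean_survival + z_95 * se

# The null hypothesis is rejected at the 5% level when its value
# falls outside the 95% confidence interval.
null_inside = lower <= null_value <= upper

print(round(lower, 1), round(upper, 1), null_inside)
```

The lower limit lies just above 38.3 months, so the null value falls outside the interval, in line with the rejection of the null hypothesis at the 5% level.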
Student’s t-Test
• As already shown above, the standard normal deviate (SND) test requires the population
variance in its calculation.
• The SND is then compared with the critical values 1.96 or 2.58.
• This was applied since the population standard deviation (σ) was known.
• If σ is unknown, the SND cannot be calculated.
• However, the value of σ can be estimated from the sample by the standard deviation s.
• Replacing σ with s (see the formulas below), we obtain a new quantity t, which follows
the t-distribution with (n-1) Degrees of Freedom (df).
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
• As the sample size increases, s should be nearly equal to σ and t will be very close to the
standard normal deviate.
Activity: Exercise 2
Instructions
Tutor will provide example of how to calculate standard error and confidence interval.
Example
The following data are uterine weights (in mg) of each of 20 rats drawn at random from a
large stock. Is it likely that the mean weight for the whole stock could be 24 mg, a value
observed in some previous work?
9 18 21 26 14 18 22 27 16 30
15 19 22 29 15 19 24 30 24 32
In this problem:
n=20
µ = 24
∑x = 430
x̄ = 430 / 20 = 21.5
∑x² = 9984
s² = 38.895
s = 6.237
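The null hypothesis is that the mean weight for the whole stock is 24 mg (µ0 = 24).
The standard error of the mean is SE(x̄) = s/√n = 6.237/√20 = 1.395.
Therefore,
$$t = \frac{\bar{x} - \mu_0}{SE(\bar{x})} = \frac{21.5 - 24}{1.395} = -1.79$$
Ignoring the sign, t = 1.79. The df are (20 – 1) = 19.
We then consult the Student’s t-Table. We find that the value of t at p=0.05 and df=19 is
2.093.
Since 1.79 < 2.093, then p >0.05.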
There is not sufficient evidence to suggest that the mean uterine weight of the stock is
different from 24 mg. The 95% confidence interval for the true mean is:
$$\bar{x} \pm t_{(0.05,\,19)} \times SE(\bar{x})$$
$$21.5 \pm 2.093 \times 1.395 = 18.6 \text{ to } 24.4$$
The inclusion of the value 24 corresponds to the non-significant result of testing this value at
the 5% level.
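The whole calculation can be reproduced directly from the 20 listed weights. This is a minimal sketch using only the Python standard library; the critical value 2.093 is copied from the text and not computed, and `statistics.stdev` uses the (n − 1) divisor.

```python
import math
import statistics

weights = [9, 18, 21, 26, 14, 18, 22, 27, 16, 30,
           15, 19, 22, 29, 15, 19, 24, 30, 24, 32]  # uterine weights (mg)
mu0 = 24          # mean proposed by the null hypothesis (mg)
t_crit = 2.093    # t at p = 0.05 with 19 df, as quoted in the text

n = len(weights)
mean = statistics.mean(weights)
s = statistics.stdev(weights)   # sample SD with divisor n - 1
se = s / math.sqrt(n)           # standard error of the mean
t = (mean - mu0) / se           # one-sample t statistic

lower = mean - t_crit * se
upper = mean + t_crit * se
significant = abs(t) > t_crit

print(n, mean, round(s ** 2, 3), round(se, 3))
print(round(t, 2), round(lower, 1), round(upper, 1), significant)
```

Run on these data, the test statistic is about −1.79, which is smaller in size than 2.093, and the 95% interval runs from roughly 18.6 to 24.4 mg, so it contains the hypothesised value of 24 mg.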
Instructions
You will work in small groups to calculate significance level and confidence limits. Record
your answers neatly on a paper. One group will present their answers, and other groups will
contribute to the discussion drawing from their experiences.
Key Points
• For a proper interpretation of the p-value the sample size should be considered.
• If the sample size is too small the sampling error will be large. This will prevent us from
finding evidence against H0 and result in high p-values, even if H0 is not true.
• Sample size is important in the interpretation of the p-value.
Evaluation
• What do p<0.05 and p>0.05 mean?
• What is the distinction between statistical significance and practical significance?
• What is the use of the Student’s t-test?
References
• Bonita, R., Beaglehole, R., Kjellstrom, T. (2006). Basic Epidemiology. (2nd ed). Geneva,
Switzerland: WHO
• Jones, et al. (2008). Biostatistics Workbook. Atlanta, USA: CDC Field epidemiology and
lab training program (FELTP).
• Makwaya, et al. (1997). Lecture notes in Biostatistics. Department of Epidemiology and
Biostatistics. MUHAS.
• McCusker, J. (2001). Epidemiology in Community Health (Rural Health Series, No. 9).
Nairobi, Kenya: AMREF.
• Rosner, B. (2006). Fundamentals of Biostatistics. (6th ed.). Belmont, CA: Thomson
Brookes/Cole.
• Varkevisser, et al. (1995). Designing and Conducting Health Systems Research Projects.
(Vol. 2, Part 2, Module 24). Health Systems Research Training Series.
Handout 6.1: Student’s t-Test Table
Table of t
Degrees of freedom   p = 0.50 0.20 0.10 0.05 0.02 0.01
1 1.00 3.08 6.31 12.71 31.82 63.66
2 0.82 1.89 2.92 4.30 6.96 9.92
3 0.76 1.64 2.35 3.18 4.54 5.84
4 0.74 1.53 2.13 2.78 3.75 4.60
5 0.73 1.48 2.02 2.57 3.36 4.03
6 0.72 1.44 1.94 2.45 3.14 3.71
7 0.71 1.41 1.89 2.36 3.00 3.50
8 0.71 1.40 1.86 2.31 2.90 3.36
9 0.70 1.38 1.83 2.26 2.82 3.25
10 0.70 1.37 1.81 2.23 2.76 3.17
11 0.70 1.36 1.80 2.20 2.72 3.11
12 0.70 1.36 1.78 2.18 2.68 3.05
13 0.69 1.35 1.77 2.16 2.65 3.01
14 0.69 1.35 1.76 2.14 2.62 2.98
15 0.69 1.34 1.75 2.13 2.60 2.95
16 0.69 1.34 1.75 2.12 2.58 2.92
17 0.69 1.33 1.74 2.11 2.57 2.90
18 0.69 1.33 1.73 2.10 2.55 2.88
19 0.69 1.33 1.73 2.09 2.54 2.86
20 0.69 1.33 1.72 2.09 2.53 2.85
21 0.69 1.32 1.72 2.08 2.52 2.83
22 0.69 1.32 1.72 2.07 2.51 2.82
23 0.69 1.32 1.71 2.07 2.50 2.81
24 0.68 1.32 1.71 2.06 2.49 2.80
25 0.68 1.32 1.71 2.06 2.49 2.79
For more than 25 degrees of freedom, calculate the value of t that is required for each level of
significance from the expression: a+ (b/degree of freedom), where a and b have the values set
out below. As examples of the calculation the values of t needed for significance are given
for 30 and 40 degrees of freedom. Thus for P=0.05 the value of t needed is 1.96 +
(2.50/30)=2.04 for 30 degrees of freedom and 1.96 + (2.50/40)=2.02 for 40 degrees of
freedom.
Degrees of freedom   P = 0.50 0.20 0.10 0.05 0.02 0.01
a 0.67 1.28 1.65 1.96 2.33 2.58
b 0.26 0.86 1.58 2.50 3.98 5.30
30 0.68 1.31 1.70 2.04 2.46 2.76
40 0.68 1.30 1.69 2.02 2.43 2.71
Source: Makwaya et al. (1997): Lecture notes in Biostatistics. MUCHS. Tanzania.
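The rule for more than 25 degrees of freedom, a + (b / degrees of freedom), can be written as a small lookup. The values of a and b below are copied from the handout. The dictionary and function names are hypothetical.

```python
# Constants a and b from the handout, keyed by significance level P.
A = {0.50: 0.67, 0.20: 1.28, 0.10: 1.65, 0.05: 1.96, 0.02: 2.33, 0.01: 2.58}
B = {0.50: 0.26, 0.20: 0.86, 0.10: 1.58, 0.05: 2.50, 0.02: 3.98, 0.01: 5.30}


def approx_t(p, df):
    """Approximate value of t needed for significance level p when df > 25."""
    return A[p] + B[p] / df


for df in (30, 40):
    print(df, [round(approx_t(p, df), 2) for p in A])
```

For 30 and 40 degrees of freedom the rounded results match the two rows printed in the handout. As the degrees of freedom grow, b/df shrinks and t approaches a, the corresponding standard normal value.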
Worksheet 6.1: Calculating Significance Level and Confidence
Limits
Instructions:
Work together in your small group to answer the following questions on a sheet of paper.
Select a recorder in your group, and record your answer and calculations.
Question:
The mean level of prothrombin in the normal population is known to be 20.0 mg/100 ml of
plasma and the standard deviation is 4 mg/100 ml. A sample of 40 patients showing vitamin
K deficiency has a mean prothrombin level of 18.5 mg/100 ml.
a) How reasonable is it to conclude that the true mean for patients with vitamin K deficiency
is the same as that for a normal population?
b) Within what limits would the mean prothrombin level be expected to lie for all patients
with vitamin K deficiency? Give the 95% confidence limits.
Answer:
Session 7: Chi-Square (χ²) Test
Learning Objectives
By the end of this session, students are expected to be able to:
• Define chi-square test
• Describe chi-square test of 2-by-2 table
• Demonstrate calculation of chi-square from a 2-by-2 table
• Describe chi-square test for a larger contingency table
Definition
• Chi-square: A test used to find out whether observed differences between proportions of
events in groups may be considered statistically significant.
• This table is called a ‘2-by- 2 contingency table’ because there are 2 rows and 2 columns.
o In general, we can have any ‘r × c’ contingency table, with ‘r’ rows and ‘c’ columns.
• From the above table, the observed frequencies are 41, 216, 64 and 180. We need to
obtain the expected frequencies under the null hypothesis that: ‘There is no difference in
outcome for patients receiving Treatment A and Treatment B.’
o In contingency table problems, the expected frequencies are the frequencies that you
would predict (‘expect’) in each cell of the table, if you were only provided with the
row and column totals (assuming that the variables under comparison were
independent).
o The expected frequencies are calculated in the following way, where E = expected
frequency:
$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$
• For example, in the top left cell, where we observe 41 deaths, the expected frequency
under the null hypothesis is:
$$E = \frac{105 \times 257}{501} = 53.86$$
• These expected frequencies are shown in the table below.
• They add up to the same grand total as the observed frequencies.
• We can then compare the observed and the expected frequencies by looking at their
differences.
• We need also to consider the importance of the magnitude of the differences relative to
the expected values.
o For example, a difference of 5 between 995 and 1000 is not as important as the
discrepancy of 5 between 2 and 7.
• The percentage points of the chi-square distribution are provided in Handout 7.1.
• From our example, we can determine that:
df = (2-1)(2-1) = 1. Therefore from the above table, χ² = 7.97 with 1 degree of freedom
(df).
• The Chi-Square table in Handout 7.1 shows that the observed value of 7.97 is beyond the
0.01 point of the Chi-square distribution.
• Therefore p is < 0.01. We can conclude that the difference between the two treatments is
significant.
• A short cut formula for computing χ2 for a 2-by-2 table is given as follows:
Figure 3: Obtaining Chi-square Value Using the Degree of Freedom
                               Variable x
Variable y                X1            X2            Row marginal total
Y1                        a             b             r1 = a+b
Y2                        c             d             r2 = c+d
Column marginal total     s1 = (a+c)    s2 = (b+d)    n = (a+b+c+d)
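In the notation of the table above, the shortcut that is commonly used for a 2-by-2 table is χ² = n(ad − bc)² / (r1 × r2 × s1 × s2). This is the standard textbook form and is stated here as an assumption, since the formula itself is not reproduced in this text. The sketch below compares it with the general sum of (O − E)²/E, using the observed frequencies 41, 216, 64 and 180 from the treatment example. The placement of those four numbers in cells a to d is also an assumption, chosen so that the marginal totals 257, 105 and 501 agree with the worked expected frequency.

```python
def chi_square_general(a, b, c, d):
    """Sum of (O - E)^2 / E over the four cells of a 2-by-2 table."""
    n = a + b + c + d
    r1, r2 = a + b, c + d          # row marginal totals
    s1, s2 = a + c, b + d          # column marginal totals
    observed = [a, b, c, d]
    expected = [r1 * s1 / n, r1 * s2 / n, r2 * s1 / n, r2 * s2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_shortcut(a, b, c, d):
    """Shortcut form for a 2-by-2 table (assumed standard formula)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


a, b, c, d = 41, 216, 64, 180      # assumed cell layout
general = chi_square_general(a, b, c, d)
shortcut = chi_square_shortcut(a, b, c, d)
print(round(general, 2), round(shortcut, 2))
```

The two functions return the same value, close to the 7.97 quoted in the text. Any small difference in the last digit comes from rounding the expected frequencies by hand.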
Instructions
Refer to:
• Handout 7.1: Table of Chi-Square (χ2) Test Values
• Handout 7.2: Steps in Calculating Chi-Square (χ2) Test
• Worksheet 7.1: Calculate the Chi-square Test
You will work in small groups to calculate Chi-square values. Read the question in
Worksheet 7.1, and refer to Handout 7.1 and Handout 7.2 for guidance as needed. One group
will present their solution in plenary, and others share in the discussion.
Activity: Exercise 1
Instructions
Tutor will provide an example of the Chi-square Test for larger contingency tables.
The following data show a sample of 10-year-old children classified according to the state of
oral hygiene and type of school attended.
Oral Hygiene among 10-year-olds, by type of school
                        Oral Hygiene
Type of School     Good    Fair+   Fair-   Poor    Total
Below average      62      103     57      11      233
Average            50      36      26      7       119
Above average      80      69      18      2       169
Total              192     208     101     20      521
The expected number of children with good oral hygiene attending below average schools, out
of the 192 children with good oral hygiene, is:
$$\frac{233 \times 192}{521} = 85.9$$
Similarly, the expected number of children attending below average schools out of 208
children with fair+ oral hygiene is:
$$\frac{233 \times 208}{521} = 93.0$$
Now, let’s look at the expected frequencies for oral hygiene by type of school.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Now, apply that to our data:
$$\chi^2 = \frac{(62 - 85.9)^2}{85.9} + \frac{(103 - 93.0)^2}{93.0} + \dots + \frac{(2 - 6.5)^2}{6.5}$$
Therefore, p <0.001.
Thus, we reject H0 and conclude that it is probable that there is an association between oral
hygiene and type of school attended.
Note that a larger proportion of children with good oral hygiene attended above average
schools compared to those who attended below average schools.
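For a larger contingency table the same steps apply to every cell. The sketch below works through the oral hygiene table with a general function. The function name is hypothetical, and the degrees of freedom follow the rule (rows − 1) × (columns − 1).

```python
def chi_square(table):
    """Return (chi-square, degrees of freedom) for an r-by-c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: below average, average, above average schools.
# Columns: good, fair+, fair-, poor oral hygiene.
oral_hygiene = [
    [62, 103, 57, 11],
    [50, 36, 26, 7],
    [80, 69, 18, 2],
]

chi2, df = chi_square(oral_hygiene)
print(round(chi2, 1), df)
```

With 3 rows and 4 columns there are 6 degrees of freedom, and the statistic comes out at a little over 31. That value is then compared with the chi-square table at 6 degrees of freedom to obtain the p-value.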
Key Points
• The Chi-square test is only valid for comparing observed and expected frequencies. It is
not valid for other variables such as percentages, means and rates.
• The Chi-square test is not valid for cells with expected frequencies less than 5.
Evaluation
• Define ‘chi-square test’.
• What is the formula for calculating the chi-square test for a 2-by-2 table?
• What is the formula for calculating the Chi-square test for a larger contingency table?
References
• Makwaya, et al. (1997). Lecture notes in biostatistics. Department of Epidemiology and
Biostatistics. MUCHS.
• Varkevisser, et al. (1995). Designing and Conducting Health Systems Research Projects.
(Vol. 2, Part 2, Module 24). Health Systems Research Training Series.
Handout 7.1: Table of Chi-Square (χ2) Test Values
If the number of degrees of freedom is greater than 30, calculate the value of the expression:
$$\sqrt{\chi^2} - \sqrt{df}$$
The level P associated with this value is (to a close approximation) as follows:
P = 0.50 0.20 0.10 0.05 0.02 0.01
Value 0.00 0.60 0.91 1.16 1.45 1.64
Example:
• Assume that χ² = 49 and df = 36.
• The expression √χ² – √df = √49 – √36 = 7 – 6 = 1.0
• This is not quite significant at P=0.05 (where a value of 1.16 is required).
o Note: If χ² = 55, the expression would give 7.42 – 6.00 = 1.42, which is almost
significant at P = 0.02 (where a value of 1.45 is required).
Source: Makwaya et al. (1997): Lecture notes in Biostatistics. MUCHS. Tanzania
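The rule for more than 30 degrees of freedom can be sketched as follows. The form of the expression, the square root of χ² minus the square root of df, is inferred from the worked example (7 – 6 = 1.0). The required values are copied from the handout, and the function names are hypothetical.

```python
import math

# Value the expression must reach for each significance level P (from the handout).
REQUIRED = {0.50: 0.00, 0.20: 0.60, 0.10: 0.91, 0.05: 1.16, 0.02: 1.45, 0.01: 1.64}


def large_df_statistic(chi2, df):
    """sqrt(chi-square) minus sqrt(df), for use when df > 30."""
    return math.sqrt(chi2) - math.sqrt(df)


def smallest_level_reached(chi2, df):
    """Smallest tabulated P whose required value is reached, or None."""
    value = large_df_statistic(chi2, df)
    reached = [p for p, needed in REQUIRED.items() if value >= needed]
    return min(reached) if reached else None


for chi2 in (49, 55):
    print(chi2, round(large_df_statistic(chi2, 36), 2), smallest_level_reached(chi2, 36))
```

With df = 36, a χ² of 49 reaches only the P = 0.10 value, while a χ² of 55 passes the P = 0.05 value and falls just short of the one for P = 0.02.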
Handout 7.2: Steps in Calculating Chi-Square (χ2) Test
Example
Suppose that in a cross-sectional study of the factors affecting the utilization of antenatal clinics
you found that 64% of the women who lived within 10 kilometres of the clinic came for
antenatal care, compared to only 47% of those who lived more than 10 kilometres away. This
suggests that antenatal care (ANC) is used more often by women who live close to the clinics.
The complete results are presented in the table below:
Utilisation of Antenatal Clinics by Women Living Far From and Near the Clinic
Distance from ANC Used ANC Did not use ANC TOTAL
Less than 10 km 51 (64%) 29 (36%) 80 (100%)
10 km or more 35 (47%) 40 (53%) 75 (100%)
Total 86 69 155
From the table we conclude that there seems to be a difference in the use of antenatal care
between those who live close to and those who live far from the clinic (64% versus 47%). We
now want to know if this observed difference is statistically significant or not.
The chi-square test can be used to give us the answer. This test is based on measuring the
difference between the observed frequencies and the expected frequencies if the null
hypothesis (i.e. the hypothesis of no difference) were true.
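a) For each cell, calculate the expected frequency (E): multiply the row total by the column total and divide by the grand total.
b) For each cell, subtract the expected frequency from the observed frequency (O - E).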
c) For each cell, square the result of (O - E) and divide by the expected frequency E.
d) Add the squared results (calculated in step c) for all the cells.
e) The formula for calculating a chi-square value (steps b to d) is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
o O is the observed frequency (indicated in the table)
o E is the expected frequency (to be calculated)
o ∑ means ‘sum of’ and directs you to add together the values of [(O - E)2 /E] for all the
cells of the table.
o For a 2-by-2 table (which contains 4 cells) the formula is:
$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \frac{(O_4 - E_4)^2}{E_4}$$
As for the t-test, the calculated χ2 value has to be compared with a theoretical χ2 value in order
to determine whether the null hypothesis is rejected or not.
Note: Handout 7.1: Table of Chi-Square (χ2) Test Values contains a table of theoretical χ2
values.
a) First, decide what significance level you want to use (alpha or α-value). We usually take
0.05.
b) Then, calculate the degrees of freedom. With the χ2 test the number of degrees of freedom
is related to the number of cells (i.e., the number of groups you are comparing).
o The number of degrees of freedom is found by multiplying the number of rows (r)
minus 1 by the number of columns (c) minus 1:
df = (r–1)(c–1)
o For a simple two-by-two table the number of degrees of freedom is 1:
df = (2–1)(2–1) = 1
c) Locate in the table the χ2 value that corresponds to the chosen α-value and the number of
df. If the calculated χ2 value is equal to or larger than the χ2 value from the table, then
the p-value is smaller than the chosen level of significance (α-value). In this case, we
reject the null hypothesis and conclude that there is a statistically significant
difference between the groups.
d) If the calculated χ2 value is smaller than the χ2 value from the table, then the p-value
is larger than the chosen significance level of 0.05. In this case, we do not reject the
null hypothesis and conclude that the observed difference is not statistically significant.
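The steps in this handout can be sketched in Python using the antenatal clinic table. This is a minimal illustration, not the manual's own method. It assumes the standard rule that each expected frequency equals the row total times the column total divided by the grand total, and it uses 3.84 as the tabulated χ2 value for df = 1 at α = 0.05. The function name `chi_square` is hypothetical.

```python
def chi_square(observed):
    """Return (chi-square value, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency if the null hypothesis (no difference) were true.
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Observed counts: rows = distance from clinic, columns = used ANC / did not use ANC.
anc_table = [[51, 29],
             [35, 40]]
CRITICAL_VALUE = 3.84  # assumed table value for df = 1, alpha = 0.05

chi2, df = chi_square(anc_table)
print(f"chi-square = {chi2:.2f}, df = {df}")
print("reject H0" if chi2 >= CRITICAL_VALUE else "do not reject H0")
```

For these counts the calculated value is about 4.57, which exceeds 3.84, so the difference in ANC use between the two groups is statistically significant at the 0.05 level.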
Worksheet 7.1: Calculate the Chi-square Test
Instructions
• Work in small groups to complete the following worksheet.
• Refer to Handout 7.1: Table of Chi-square (χ2 ) Test and Handout 7.2: Steps in
Calculating Chi-Square (χ2) Test as needed.
Question
A study of the factors affecting the utilization of antenatal clinics found that 64% of the
women who lived within 10 km of the clinic came for antenatal care, compared to only 47% of
those who lived more than 10 km away. This suggests that antenatal care is used more often
by women who live close to the clinics. The complete results are presented below:
Utilization of Antenatal Clinic by Women Living Far From and Near the Clinics
Distance from ANC    Used ANC    Did not use ANC    Total
Less than 10 km      51 (64%)    29 (36%)           80 (100%)
10 km or more        35 (47%)    40 (53%)           75 (100%)
Total                86          69                 155
From the table we determine that there seems to be a difference in utilization of antenatal care
between those who live close to and those who live far from the clinic. We want to know
whether this observable difference is statistically significant.
Please calculate the χ2 value. Use Handout 7.1: Table of Chi-square (χ2 ) Test to interpret
the results.
Answer:
Session 8: Source and Uses of Morbidity and
Mortality Statistics
Learning Objectives
By the end of this session, students are expected to be able to:
• Define vital statistics and demography
• Describe key sources of demographic data
• State the definition of rate, ratio and proportion
• Describe the measures of fertility, morbidity and mortality rates
Sources of Data
• Quality of data depends on many factors, one of which is the source of data.
• Sources of data have a direct implication on information quality in terms of coverage,
completeness and cost.
• In this session we will concentrate on the following sources of demographic data:
o Census
o Vital registration systems
o Sample surveys
Census
• The main characteristic of a census is that it covers the whole population.
• Although commonly limited to population, a census can be used to quantify any number
of items in a category.
o For example, censuses of agriculture, business, livestock, housing, etc. have been
conducted, sometimes concurrently with a population census.
• No sampling is involved and each person is enumerated separately.
• A census must have a legal basis to make it complete and compulsory.
• A census reflects a single point in time (such as 1-January-2010), although the whole
process of data collection/enumeration can take a longer time.
• Basic questions which should appear on a census questionnaire include:
o Name, age, sex, relationship to the head of household, marital status, race, religion,
ethnicity, education level, occupation, employment status, migration status, and
household amenities.
• Additional questions would depend on the availability and quality of vital registration.
• A population census can be carried out using either de facto or de jure method.
o De Facto Census
The de facto method of enumeration assigns persons to the area or location where they
are found during enumeration (i.e., it enumerates the population ‘in fact’ there at the
time of the census, regardless of the location of their legal or permanent/normal
residence). A person's place of origin is not considered.
For example, in Tanzania’s 1988 Population Census, Zanzibar had a population of
641,000. This implies that 641,000 people were in Zanzibar on census night.
Tanzania follows de facto census enumeration.
o De Jure Census
The de jure method of enumeration allocates persons to their normal/usual
residence. That is, the census counts people who belong to an area or have the
right to live there through citizenship, legal residence, etc. For example, a
businessman working in Dar es Salaam but living in Arusha would be assigned to
Arusha.
• In Tanzania, a census is normally conducted every ten years (decennial).
o This long interval creates difficulties for planning, because the population can
change rapidly as a result of births, deaths, and migration/movement.
• To overcome this problem, inter-censal surveys or mini-surveys are conducted.
Examples of such surveys are the Tanzania Demographic and Health Survey (TDHS),
conducted approximately every 5 years.
• Further surveys on morbidity and for specific diseases (e.g., maternal mortality,
HIV/AIDS, childhood malnutrition, etc.) can be conducted whenever a need arises.
Vital Registration
• Vital registration systems are common in developed countries, where information on
births, marriages, deaths and migrations is collected. In developing countries, vital
registration systems are often incomplete, unreliable, or non-existent.
• Questions in the vital registration system are always very simple and few.
o Consider hospital or health service data here in Tanzania. Examples of such
registrations are information on deaths found in hospitals (death certificates), birth
and marriage data found in churches, mosques and District Commissioners’ offices
and migration data found at airports and borders.
• The shortcoming of vital registration systems is that they are often incomplete, cover
only selective samples of the population, and are unreliable in practice. This does not
mean that the system should be discarded; instead it should be improved to remove these
errors.
Sample Surveys
• Sample surveys give the same information in a more detailed form when a reliable vital
registration system does not exist.
• Only a sample of the population is involved; thus, sample surveys are less costly than a
complete census. In addition, information can usually be collected more quickly in a
sample survey than in a census.
• Sample surveys allow more detailed, nuanced data collection than census systems.
• One key disadvantage to sample surveys is the error introduced through sampling.
Ratio
• Any number (numerator) divided by any other number (denominator) gives a ratio.
• For example, $X / Y$ is a ratio, where X is the numerator and Y is the denominator.
X and Y do not need to have the same units.
• Sex Ratio at Birth is a commonly used ratio in epidemiology and vital statistics.
$$\text{Sex Ratio at Birth} = \frac{\text{No. of male births}}{\text{No. of female births}}$$
Proportion
• A proportion is a special form of ratio in which the numerator is part of the
denominator.
• For example:
o Proportion of females among first-year MUCHS students
$$\text{Proportion of females} = \frac{\text{No. of females in 1st year}}{\text{Total no. of 1st-year students}}$$
Rate
• A rate is a proportion with the added dimension of time.
• A population must be studied throughout a specified time period (e.g., 1 year), during
which the frequency of an event of interest (e.g., disease, death, etc.) is counted.
• A rate indicates the frequency of events occurring in a population per unit of time.
• For example:
o The death rate per year is given by the number of deaths during the year, divided by
the number of person-years of exposure to the risk of death.
$$\text{Crude Death Rate} = \frac{\text{No. of deaths in one year}}{\text{Total population}} \times 1{,}000$$
• Rates may be expressed per 1,000; per 100,000; or per 1,000,000 population depending
on convention and convenience.
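The distinction between ratio, proportion and rate can be sketched in Python. This is a minimal illustration only: the function names are hypothetical and all the figures are invented for the example, not taken from any census or survey.

```python
def ratio(numerator, denominator):
    """Any number divided by any other number."""
    return numerator / denominator


def proportion(part, whole):
    """A ratio in which the numerator is part of the denominator."""
    if part > whole:
        raise ValueError("the numerator must be part of the denominator")
    return part / whole


def rate(events, population, per=1000):
    """Events counted over a stated time period, expressed per `per` population."""
    return events * per / population


# Hypothetical figures, for illustration only.
sex_ratio_at_birth = ratio(1050, 1000)      # male births / female births
proportion_female = proportion(40, 100)     # females among 1st-year students
crude_death_rate = rate(150, 10_000)        # deaths in one year per 1,000 population

print(sex_ratio_at_birth, proportion_female, crude_death_rate)
```

The sketch makes the time period implicit in the event count (deaths in one year); a fuller version would carry the period as an explicit argument.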
Measures of Fertility
• Common measures of fertility include:
o Crude Birth Rate (CBR)
o General Fertility Rate (GFR)
o Total Fertility Rate (TFR)
o Gross Reproductive Rate (GRR)
Crude Birth Rate (CBR)
• CBR is called a ‘rate,’ but in practice it is a ratio.
• The rate is ‘crude’ because it does not take into account the risk of giving birth according
to age and sex differences.
Activity: Exercise 1
Instructions
Tutor will provide an example of the steps required to calculate Total Fertility Rate (TFR).
Example
Figure 1: Number of Live Births and Maternal Age, Tanzania, 1988
Age      No. of women    No. of live births    Age-specific fertility rate
                                               (no. of live births / no. of women)
15-19      665,000          21,000             0.0316
20-24      516,000         114,000             0.2209
25-29      459,000         118,000             0.2571
30-34      344,000         123,000             0.3576
35-39      310,000          37,000             0.1194
40-44      229,000           6,000             0.0262
45-49      218,000           5,000             0.0229
Total    2,741,000         424,000             1.0357
• Age-specific fertility rates are calculated by dividing the no. of live births by no. of
women in each age cohort.
• For women ages 15-19, ASFR = 21,000 / 665,000 = 0.0316
The TFR is the sum of all age-specific fertility rates, multiplied by the width of the age
interval.
• TFR = 1.0357 × 5 = 5.1785
• The sum of all ASFRs is multiplied by 5 because of the 5-year age group interval.
• If ages are in single years, then there is no need to multiply this sum.
• The figure 5.1785 means that, on average, each woman will have about 5 children during
her reproductive period (assuming that these age-specific fertility rates continue to
apply until she finishes her reproductive life).
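The TFR calculation in this exercise can be sketched in Python using the counts from Figure 1. This is a minimal illustration; the names `FIGURE_1`, `asfr_by_age` and `total_fertility_rate` are hypothetical. Because the sketch sums unrounded ASFRs, its result (about 5.18) differs slightly in the fourth decimal place from the 5.1785 obtained from the rounded table values.

```python
# Rows copied from Figure 1: (age group, number of women, number of live births).
FIGURE_1 = [
    ("15-19", 665_000, 21_000),
    ("20-24", 516_000, 114_000),
    ("25-29", 459_000, 118_000),
    ("30-34", 344_000, 123_000),
    ("35-39", 310_000, 37_000),
    ("40-44", 229_000, 6_000),
    ("45-49", 218_000, 5_000),
]
AGE_INTERVAL_YEARS = 5  # each age group spans 5 years


def asfr_by_age(rows):
    """Age-specific fertility rate = live births / women, for each age group."""
    return {age: births / women for age, women, births in rows}


def total_fertility_rate(rows, interval=AGE_INTERVAL_YEARS):
    """Sum of the ASFRs multiplied by the width of the age interval."""
    return interval * sum(asfr_by_age(rows).values())


asfr = asfr_by_age(FIGURE_1)
tfr = total_fertility_rate(FIGURE_1)
print(f"ASFR 15-19 = {asfr['15-19']:.4f}")
print(f"TFR = {tfr:.2f}")
```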
Activity: Exercise 2
Instructions
Tutor will use the example below to show the steps required to calculate the Gross
Reproductive Rate (GRR).
Example
Figure 2: Number of Female Live Births and Maternal Age, Tanzania, 1988
Age      No. of women    No. of live births    No. of female births    Female ASFR
15-19      665,000          21,000                11,000               0.0165
20-24      516,000         114,000                58,000               0.1124
25-29      459,000         118,000                60,000               0.1307
30-34      344,000         123,000                63,000               0.1831
35-39      310,000          37,000                19,000               0.0613
40-44      229,000           6,000                 3,000               0.0131
45-49      218,000           5,000                 3,000               0.0138
Total    2,741,000         424,000               217,000               0.5309
• Female ASFRs are calculated by dividing the no. of live female births by no. of women in
each age cohort.
• For women ages 15-19, Female ASFR = 11,000 / 665,000 = 0.0165
• We calculate GRR as follows:
GRR = 0.5309 × 5 = 2.6545.
• Note: If the proportion of live births that are female is known, the GRR can be
calculated using the TFR.
• Remember that in Exercise 1, we found that the TFR was 5.1785.
• Calculate GRR using the TFR and the proportion of female births, as follows:
$$\text{GRR} = 5.1785 \times \frac{217{,}000}{424{,}000} = 2.65$$
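Both ways of obtaining the GRR in this exercise can be sketched in Python using the counts from Figure 2. This is a minimal illustration; the names `FIGURE_2`, `grr_direct` and `grr_from_tfr` are hypothetical. The two results agree to about two decimal places (roughly 2.65) rather than exactly, because the second method applies one overall proportion of female births to every age group.

```python
# Rows copied from Figure 2:
# (age group, number of women, number of live births, number of female births).
FIGURE_2 = [
    ("15-19", 665_000, 21_000, 11_000),
    ("20-24", 516_000, 114_000, 58_000),
    ("25-29", 459_000, 118_000, 60_000),
    ("30-34", 344_000, 123_000, 63_000),
    ("35-39", 310_000, 37_000, 19_000),
    ("40-44", 229_000, 6_000, 3_000),
    ("45-49", 218_000, 5_000, 3_000),
]
AGE_INTERVAL_YEARS = 5

# Method 1: sum the female ASFRs and multiply by the age interval.
grr_direct = AGE_INTERVAL_YEARS * sum(
    female / women for _, women, _, female in FIGURE_2
)

# Method 2: multiply the TFR by the overall proportion of female births.
tfr = AGE_INTERVAL_YEARS * sum(births / women for _, women, births, _ in FIGURE_2)
total_births = sum(births for _, _, births, _ in FIGURE_2)
total_female_births = sum(female for _, _, _, female in FIGURE_2)
grr_from_tfr = tfr * total_female_births / total_births

print(f"GRR (direct)   = {grr_direct:.2f}")
print(f"GRR (from TFR) = {grr_from_tfr:.2f}")
```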
Measures of Morbidity
Incidence Rates
• Incidence is a measure of the risk of developing a disease/condition within a specified
time period.
• The incidence rate is the number of new cases per population in a given time period (i.e.,
the rate of contracting a disease among those still at risk).
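• Incidence rate is expressed as follows, where k = 2, 3, 4, 5 or 6 depending on the
convenience or convention:
$$\text{Incidence rate} = \frac{\text{No. of new cases during a specified period}}{\text{Population at risk during that period}} \times 10^k$$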
Prevalence Rates
• The prevalence of a disease is the total number of existing cases among the entire
population.
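• It can be measured at a single point in time (point prevalence) or by counting cases
over a stretch of time (period prevalence).
• Point prevalence is expressed as follows, where k = 2, 3, 4, 5 or 6 depending on the
convenience or convention:
$$\text{Point prevalence} = \frac{\text{No. of existing cases at a point in time}}{\text{Total population at that time}} \times 10^k$$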
• This index is prone to bias because cases with long duration have a higher probability of
being in the sample than those with short duration.
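The two morbidity measures can be sketched in Python. This is a minimal illustration under stated assumptions: it uses the standard definitions (new cases divided by the population at risk for incidence, existing cases divided by the total population for point prevalence, each multiplied by 10 to the power k), the function names are hypothetical, and all figures are invented for the example.

```python
def incidence_rate(new_cases, population_at_risk, k=3):
    """New cases in a period per 10**k persons at risk during that period."""
    return new_cases * 10 ** k / population_at_risk


def point_prevalence(existing_cases, total_population, k=3):
    """Existing cases at one point in time per 10**k total population."""
    return existing_cases * 10 ** k / total_population


# Hypothetical figures, for illustration only.
incidence = incidence_rate(new_cases=50, population_at_risk=20_000)        # per 1,000
prevalence = point_prevalence(existing_cases=300, total_population=25_000)  # per 1,000

print(f"incidence  = {incidence} per 1,000 at risk per year")
print(f"prevalence = {prevalence} per 1,000 population")
```

Changing `k` only changes the reporting unit (for example, k = 5 reports per 100,000); it does not change the underlying measure.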
Specific Rates
• Specific rates apply to:
o Defined geographic areas
o Defined age groups
o Different sexes (male, female)
o Defined socio-economic characteristics (e.g., education level, marital status, etc.)
• They are named after the characteristic specified (e.g., age-specific or sex-specific rates).
Measures of Mortality
• Crude Death Rate (CDR)
$$\text{Crude Death Rate} = \frac{\text{No. of deaths in one year}}{\text{Total population}} \times 1{,}000$$
• Note: When the denominator is approximated by the ‘total population’, then the index
obtained is not the actual crude death rate, but rather a ‘crude mortality ratio.’
• Infant Mortality Rate (IMR)
o Infant mortality rate is often broken down into several indices depending on the age
categories of an infant.
o Generally, these rates are expressed as the number of deaths per 1,000 live births.
• Maternal Mortality
o Maternal death is defined as the death of a woman while pregnant or within 42 days of
termination of that pregnancy, regardless of the length and site of the pregnancy, due
to any cause related to or aggravated by the pregnancy itself or its care, but not due
to accidental or incidental causes.
Key Points
• The main sources of statistical data are census, vital registration and sample surveys.
• It is important to be able to distinguish between rates, ratios and proportions.
• Measures of fertility, morbidity and mortality are expressed as rates, ratios or proportions.
Evaluation
• Describe the gross reproductive rate.
• Define prevalence rate.
• Define incidence rate.
• What is the neonatal mortality rate?
References
• Bonita, R., Beaglehole, R., Kjellstrom, T. (2006). Basic Epidemiology. (2nd Edition).
Geneva, Switzerland: WHO
• Jones, et al. (2008). Biostatistics Workbook. Atlanta, USA: CDC Field epidemiology and
lab training program (FELTP).
• Makwaya, et al. (1997). Lecture notes in biostatistics, Department of Epidemiology and
Biostatistics, MUCHS.
• McCusker, J. (2001). Epidemiology in Community Health (Rural Health Series, No. 9).
Nairobi, Kenya: AMREF.
• Rosner, B. (2006). Fundamentals of Biostatistics. (6th edition). Belmont, CA: Thomson
Brooks/Cole.
Handout 8.1 Infant Mortality Rate (IMR) Measures
o Infant mortality rate is often broken down into several indices depending on the age
categories of an infant.
o Generally, these rates are expressed as the number of deaths per 1,000 live births.
• Stillbirth Rate
$$\text{Stillbirth Rate} = \frac{\text{No. of stillbirths in time period}}{\text{No. of live births} + \text{no. of stillbirths in time period}} \times 1{,}000$$
o This index is important because it documents fetal and neonatal death during or very
soon after delivery. It includes neonates that are born dead or alive.
Session 9: Introduction to Epidemiology
Learning Objectives
By the end of this session, students are expected to be able to:
• Define the concepts of epidemiology, health and disease
• Identify key applications and achievements of epidemiology
• Describe two key epidemiological theories of disease causation
• Explain the determinants of health and disease
Definition of Concepts
Definition
• Epidemiology: The study of the distribution and determinants of health-related states or
events in specified populations, and the application of this study to the control of health
problems.
• Key elements of this definition can be further understood as follows:
o Study = basic science
o Distribution = time, place, person
o Determinants = cause, risk factors
o Event = health status
o Population = public health
o Application = information for action
• Three closely-related components (distribution, determinants and frequency) encompass
all epidemiological principles and methods.
• Epidemiology is a multidisciplinary subject which has borrowed from the fields of
demography, statistics, sociology and other sciences to become a distinct discipline with
its own philosophy.
Health-related Events
• Epidemic communicable diseases
• Endemic communicable diseases
• Non-communicable Diseases
• Chronic Diseases
• Injuries
• Maternal and Child Health
• Occupational and Environmental Health
• Health Behaviors
Definition of Health
• Health: A state of complete physical, mental, and social well-being and not merely the
absence of disease or infirmity. (World Health Organization)
• Health is more than just the absence of pain or discomfort. Good health is a dynamic
relationship between the individual, friends, family and the environment within which we
live and work.
Definition of Disease
• Disease: A disorder of structure or function in a human, especially one that produces
specific symptoms or that affects a specific part.
Definition of Reservoir
• Reservoir: The habitat in which disease-causing organisms normally live and multiply.
• Reservoirs can be human, animal, or environmental.
o Diseases with human reservoirs:
Smallpox (symptomatic)
HIV (asymptomatic)
o Diseases with animal reservoirs (also known as zoonoses):
Brucellosis (can be found in goats, sheep, cattle, pigs)
Plague (can be found in rats and other wild rodents)
Anthrax (can be found in cattle, sheep, goats, and other herbivores)
o Diseases with environmental reservoirs:
Histoplasmosis (caused by a fungus that is often found in areas with lots of
bird/bat droppings such as caves)
Legionnaires’ disease (caused by aquatic bacteria that grow in warm water)
• Note: a reservoir is different from a vector or disease carrier, which are agents of disease
transmission.
Achievements in Epidemiology
• The field of epidemiology has contributed to many advancements in disease prevention
and eradication. Two examples are described below.
Smallpox Eradication
• During the late 1970s, epidemiology played a central role in smallpox eradication.
• The World Health Organisation (WHO) coordinated intensive smallpox eradication
campaigns, informed by epidemiological data about the distribution of cases, and the
model, mechanisms, and levels of disease transmission.
• Data was accumulated by mapping outbreaks of the disease, and by evaluating control
measures.
• About ten years after the intensified campaign to eradicate smallpox began, only two
countries were still reporting smallpox cases. The last naturally occurring case of
smallpox was reported in 1977.
• The eradication of smallpox was certified in 1979 and officially declared by the World
Health Assembly in 1980.
The Miasma Theory
• This theory, dating back to the early 1700s, offered an alternative explanation for the
origin of epidemics.
• The idea was based on the notion that when air is of bad odor or quality, persons
breathing that air would become ill (e.g., with malaria or cholera).
• The miasma theory of disease did inspire many sanitation reforms in England in the
1800s; however, the theory was not supported by any scientific explanation and was
abandoned.
II. Chemical Agents (presence of poisons or allergens)
Factor             Disease
Pesticides         Poisoning
Drugs, alcohol     Intoxication/dependency
Allergens          Eczema
Pollen             Hay fever
III. Physical Agents
Factor             Disease
Energy, speed      Accidents
Solar radiation    Sunburn
Radioactivity      Neoplasm
IV. Infectious Agents
Organism       Factors (e.g.)                Disease
Bacteria       Mycobacterium tuberculosis    Tuberculosis
               Vibrio cholerae               Cholera
Viruses        Measles virus                 Measles
               Polio virus                   Poliomyelitis
               Smallpox virus                Smallpox
Rickettsiae    Rickettsia prowazekii         Typhus
               Rickettsia conorii            Tick bite fever
Protozoa       Plasmodium malariae           Malaria
               Entamoeba histolytica         Amoebiasis
               Trypanosoma                   Trypanosomiasis
Helminths      Schistosoma haematobium       Urinary schistosomiasis
               Ascaris lumbricoides          Ascariasis
Fungi          Trichophyton spp.             Tinea corporis
Figure 3. Factors Related to the Environment
Factors                         Example
Physical                        Climate, geology, radiation, heat, light, air pollution associated with chronic respiratory disease
Biological       Human          Population density
                 Flora          Food source, influence on disease agents/vectors
                 Fauna          Influences presence of host/vectors and agents
Socio-Economic   Occupation     Occupational hazards
                 Urbanization   Crowding, stress
                 Development    Education, poverty, availability of health services
                 Disruption     War and conflict (Rwanda 1994/5), natural disasters (Haiti earthquake, 2010)
Key Points
• Three closely interrelated components (distribution, determinants and frequency)
encompass all epidemiological principles and methods.
• Epidemiology is a multidisciplinary subject which has borrowed from demography,
statistics, sociology and other sciences to become a distinct discipline with its own
philosophy.
• The miasma and contagion theories are the main historical epidemiological theories of
disease causation.
• Disease results from the interplay of determinants/factors related to the host, the agent
and the environment.
Evaluation
• Define the following epidemiological concepts:
o Health
o Disease
• Name 2 major achievements of epidemiology.
• What is a determinant of a disease?
• Provide an example of determinants/factors influencing disease that is related to:
o Agent
o Host
o Environment
References
• Bonita, R., Beaglehole, R., Kjellstrom, T. (2006). Basic Epidemiology. (2nd ed). Geneva,
Switzerland: WHO
• Jones, et al. (2008). Biostatistics Workbook. Atlanta, USA: CDC Field epidemiology and
lab training program (FELTP).
• Kapiga, S. et al. (1998). Lecture notes in epidemiology and research methodology.
Department of Epidemiology and Biostatistics. MUCHS.
• McCusker, J. (2001). Epidemiology in Community Health (Rural Health Series, No. 9).
Nairobi, Kenya: AMREF.
• Rosner, B. (2006). Fundamentals of Biostatistics. (6th ed). Belmont, CA: Thomson
Brooks/Cole.
• WHO. (2003) WHO Definition of health. Retrieved from
http://www.who.int/about/definition/en/print.html/ (date unknown)
Handout 9.1: Examples of Determinants/Factors Influencing
Disease Occurrence
Figure 2. Factors Related to the Host
Factor                Example(s)
Age                   Paralytic polio: the ratio of paralytic cases to infections increases with age (1:1000 among young children and 1:75 among adults).
Sex                   Males usually have a higher poliomyelitis attack rate than females.
Genetic               Persons with sickle cell trait have a decreased risk of malaria due to Plasmodium falciparum.
                      Persons with blood group A have an increased risk of gastric cancer, while those with group O have an increased risk of duodenal ulcers.
Ethnicity             Certain ethnic groups have an increased risk of keloid and gastro-intestinal tract cancer.
Physiology            Pregnancy: candidiasis
                      Puberty: goiter
                      Stress, nutritional state and fatigue
Immunology            Hypersensitivity, allergy
  Active              Prior infection, immunization
  Passive             Maternal antibodies
Existing Pathology    Pre-existing disease may initiate another by interfering with immunity (e.g. malaria and herpes simplex, or other concurrent disease)
Behavior              Personal hygiene, religion, customs, habits, utilization of health resources and related diseases.
Factor(s)                       Example(s)
Physical                        Climate, geology, radiation, heat, light, air pollution associated with chronic respiratory disease
Biological       Human          Population density
                 Flora          Food source, influence on disease agents/vectors
                 Fauna          Influences presence of host/vectors and agents
Socio-Economic   Occupation     Occupational hazards
                 Urbanization   Crowding, stress
                 Development    Education, poverty, availability of health services
                 Disruption     War and conflict (Rwanda 1994/5), natural disasters (Haiti earthquake, 2010)
Session 10: Ecology and Epidemiological
Approach to Causation
Learning Objectives
By the end of this session, students are expected to be able to:
• Define basic concepts of ecology
• Describe the epidemiological models of disease causation
• Explain the concept of causation in epidemiology
• Identify the guidelines for causation of disease
For example, increased numbers of domestic animals can lead to overgrazing,
erosion, and desertification.
When humans enter new environments, they may encounter new habitats for
disease vectors, parasites, etc. (for example, onchocerciasis transmitted by
blackflies, trypanosomiasis transmitted by tsetse flies, yellow fever transmitted by
mosquitoes, etc.).
o High human population densities (urbanization) together with human patterns of
socialization provide good opportunities for the spread of diseases of close contact
(droplet transmission, sexual transmission, etc.).
o It has been found that the probability of contracting certain diseases (e.g. measles)
does not depend on the proportion of susceptible individuals in a community, but rather
on the absolute number of susceptible individuals.
• Socialization
o Socialization refers to the way in which humans come in contact with each other.
Within each community or culture a great number of patterns exists in circumstances
such as work, school, religion and recreation.
• Adaptation
o A change or the process of change by which an organism or species becomes better
suited to its environment and better able to survive and reproduce.
o Two organisms can coexist in a host-parasite relationship; over time they often
demonstrate adaptation as the parasite becomes less virulent and the host develops
resistance.
o Due to selection pressure, more sensitive hosts will be weeded out or relatively
resistant mutants will develop.
o This is in the interest of the parasite as well, because the worst that might happen to
the parasite is extinction of its host.
• Herd Immunity
o Herd immunity is the ability of a community to resist disease. It can occur naturally
by exposure to infection or artificially by vaccination.
o High herd immunity indicates the decreased probability of a group or community to
develop a disease upon introduction of an infectious agent (although there may be a
certain number of persons who are individually susceptible to the disease agent).
o This decreased probability (resistance) is a product of the number of susceptible
persons and the probability that those who are susceptible will come into contact with
an infected person/disease-causing agent.
o The percentage of vaccination/natural immunity required to produce herd immunity in
a population depends on the disease concerned, and on socialization patterns.
For example, measles transmission still occurs in Tanzania and other tropical
developing countries.
In these places the measles virus may be present in the community.
The number of cases depends upon the number of susceptible persons (children)
exposed to the virus.
If there is a cyclic pattern with peaks occurring every 2-3 years, this suggests that
every 2-3 years there is a sufficient number of newly susceptible children, and hence
low herd immunity.
o Contributory Factors
• Essential/necessary factors are ‘required ingredients’ for disease to occur.
o These are agents of disease such as bacteria and viruses in infectious diseases, and
fire, nutrition, radiation or various poisons in non-infectious diseases.
• Contributory factors are host and environmental factors that are associated with an
increased likelihood of disease occurrence.
o For example:
Host Factors: immunity, sex, age, etc.
Socio-economic factors: poverty, development, etc.
Physical/Environmental Factors: rainfall, temperature, etc.
Biological Factors: presence of vectors, animal reservoirs, etc.
• There are different models that are used to represent more complex systems in the
interplay of these factors and human ecology for disease causation. These models are:
o Epidemiologic Triangle
o Wheel Model of Disease Causation
o Web of Causation Model
Triangular Model
• In this model, also called the epidemiological triad, three main components are important
in the chain of disease transmission:
o Agent
o Host
o Environment
• Under stable ecological conditions, the epidemiological triad is in a balanced state, and
the disease can be said to be absent or endemic.
• If this balance is disturbed and becomes unfavorable to the agent, the incidence of the
disease will decrease.
o If the situation remains unfavorable to the agent, the disease may become sporadic or
even disappear.
• If the balance alters to favour the agent, then an epidemic may occur.
• Although there may be a great deal of information available about a particular disease
agent, the roles of host and environment are not completely understood.
o For example, we do not know why only some of the people exposed to large doses of
x-rays develop leukemia, or why not all heavy smokers develop lung cancer.
o Some diseases like schizophrenia, coronary heart disease, rheumatoid arthritis, and
essential hypertension have not been linked to any causative agent.
• Given these limitations, new models have been developed which de-emphasize the role of
the agent and stress the multiplicity of interactions between host and environment.
• This is described in the wheel model and web of causation model.
interact to bring about atherosclerosis, hypertension and coagulation/clot lysis
which in turn lead to coronary heart disease, cerebro-vascular disease and
hypertensive disease.
o Jaundice (serum hepatitis) development after Syphilis Treatment:
Before the advent of antibiotics the treatment of syphilis was by intravenous
injection of arsenical compounds. Many of the syphilis patients who received this
treatment developed jaundice. In 1967 viral antigens (necessary factor) became
associated with hepatitis for the first time.
Instructions
Think about the models of disease causation that you have just discussed. Brainstorm and
discuss different factors that result in or contribute to disease or disability.
Consider the following four categories: Host Factors, Social/Environmental Factors,
Physical/Environmental Factors, and Biological Environment Factors. Answer the questions
below.
Questions
• What factors contribute to road accidents? What category do these factors belong in?
• What factors contribute to coronary heart disease and essential hypertension? What
category do these factors belong in?
• The term ‘risk factor’ is commonly used to describe factors that are positively associated
with the development of a disease but that are not sufficient to cause the disease.
o Some risk factors (e.g., tobacco smoking) are associated with several diseases and
some diseases (e.g., coronary heart disease) are associated with several risk factors.
o Epidemiological studies can measure the relative contribution of each factor to
disease occurrence, and also the corresponding potential reduction in disease from the
elimination of each risk factor.
Refer to Handout 10.1: Assessing the Relationship between a Possible Cause and
an Outcome
• The following are steps in assessing the nature of the relationship between a possible
cause and an outcome:
• Temporal Relationship
o Exposure to the factor must necessarily precede development of the disease in order
to consider a causal association.
o This is usually self-evident, although difficulties may arise in case-control and
cross-sectional studies when measurements of the possible cause and effect are made at
the same time and the effect may in fact alter the exposure.
o In cases where the cause is an exposure that can be encountered at different levels, it
is essential that a high enough level be reached before the disease occurs for the
correct temporal relationship to exist.
o Repeated measurement of the exposure at more than one point in time and in different
locations may strengthen the evidence.
• Consistency
o Consistency is demonstrated by several studies giving the same result.
o If the same association has been repeatedly observed by different researchers in
different places, under different circumstances and at different times, it is very likely
that the association is causal.
• Plausibility
o An association is biologically plausible when it agrees with what is already known
about the biology of the disease.
o If an association is consistent with other knowledge, it is more likely to be causal.
For instance, laboratory experiments may have shown how exposure to the
particular factor could lead to changes associated with the effect measured.
• Strength
o A strong association between possible cause and effect, as measured by the size of the
risk ratio (relative risk), is more likely to be causal than a weak association.
o A weak association, by contrast, could easily be the result of confounding or bias.
o The stronger the association (the higher the relative risk), the more readily we can
accept direct causation as a likely explanation of the observed association.
• Dose-Response Relationship
o If a dose-response relationship can be demonstrated, then the likelihood that the
exposure is causal increases.
o A dose-response relationship occurs when changes in the level of the possible
cause/agent are associated with changes in the prevalence or incidence of the effect.
(This is also termed the ‘biological gradient’.)
• Reversibility
o When the removal of a possible cause results in a reduced disease risk, the likelihood
of the association being causal is strengthened.
For example, the cessation of cigarette smoking is associated with reduction in
the risk of lung cancer relative to the risk in people who continue to smoke.
This finding strengthens the likelihood that cigarette smoking causes lung cancer.
o If the cause leads to rapid, irreversible changes that subsequently produce disease
regardless of continued exposure (as with HIV infection), then reversibility cannot be a
condition for causality.
• Study Design
o The ability of a study design to prove causation is an important consideration.
o The best evidence comes from well-designed, competently conducted randomized
controlled trials; however, it is not always practical to use this study method for all
investigations, and data of this quality may not always be available.
• Judging the Evidence
o A particular exposure should produce one specific disease; otherwise there is a weak
argument in favour of causation.
For example, if an association is limited to specific workers, particular sites and
one type of disease, and no association is found between the type of occupation and
any other disease, this is a strong argument in favour of causation.
Key Points
• The basic concepts of ecology are: ecology, climax state, food chain, habitat, population
density, socialization and adaptation.
• The concept of causation in epidemiology is an important element in determining disease
occurrence and trends.
• The common important epidemiological models of disease causation are the
epidemiological triangle, the wheel model, and the web of causation model.
• The guidelines for establishing the cause of a disease are: biological plausibility, temporal
relationship, consistency, strength, dose-response relationship, reversibility, study design
and judging the evidence.
Evaluation
• Define the following concepts of ecology: climax state, food chain, habitat, population
density, socialization and adaptation.
• Explain the epidemiological models of disease causation.
• What are the guidelines in establishing the cause of a disease?
References
• Bonita, R., Beaglehole, R., Kjellstrom, T. (2006). Basic Epidemiology. (2nd Edition).
Geneva, Switzerland: WHO
• Kapiga S.H. et al. (1998). Lecture notes in epidemiology and research methodology.
Department of Epidemiology and Biostatistics. MUCHS.
• McCusker, J. (2001). Epidemiology in Community Health (Rural Health Series, No. 9).
Nairobi, Kenya: AMREF.
Handout 10.1: Assessing the Relationship Between a Possible
Cause and an Outcome
Session 11: Natural History and Levels of
Prevention of Diseases
Learning Objectives
By the end of this session, students are expected to be able to:
• Define the concept of natural history
• Describe the stages of pathogenesis in the natural history of disease
• Recognize factors responsible for pre-pathogenesis, pathogenesis and post-pathogenesis.
• Identify the components of disease transmission
• Describe the roles of agent, source and host in disease transmission
o Agent factors are related to the survival in external environment. These include:
Infectivity
Pathogenicity
Virulence
Antigenicity
o Environmental factors can create favourable climates for the development of disease
agents or risk factors. Examples include:
Climate
Altitude
Temperature
Presence and density of population vectors
Environmental sanitation
Culture
Economic conditions/poverty
Health services (quality, availability, etc.)
Social services
Pathogenesis
• This is the period from the onset of disease stimuli (the interaction between host, agent
and environmental factors), through the development of discernible lesions, up to
recovery or progression of the disease process to disability or death.
• This disease process period can be terminated or shortened by human interventions in
terms of treatment (or secondary prevention).
Post-Pathogenesis
• This is the stage where the agent (or necessary factor) has already been removed from the
affected populations (patients), but the effect of the disease persists in the form of
disability (or sequelae of the disease).
• A new disease may be a post-pathogenesis of another disease, e.g. rheumatic fever is a
post-pathogenesis of streptococcal sore throat.
Levels of Prevention
Prevention
• Prevention: Inhibiting the development of a disease before it occurs.
o This is a relatively narrow conceptualization of prevention.
o In epidemiology, ‘prevention’ includes measures that interrupt or slow the
progression of a disease.
• Four levels of prevention have been identified in epidemiology:
o Primordial Prevention
o Primary Prevention
o Secondary Prevention
o Tertiary Prevention
Primordial Prevention
• Primordial prevention: Preventing agents or risk factors; preventing the interaction
between host, agent and environmental factors, so that disease may not occur.
o Mainly deals with underlying conditions, and reflects more on non-communicable
diseases. (In contrast, primary prevention deals with a specific agent or causal factor.)
• Efforts and interventions at the primordial prevention level involve anticipation of disease
occurrence and modification of the conditions responsible for the occurrence, before
disease happens.
o Whenever possible, these efforts should be evidence-based, drawing from solid
research and experience in other areas/countries.
o Examples of primordial prevention interventions include:
Policy and public health interventions that discourage, limit, and/or
prohibit cigarette smoking. Cigarette smoking can lead to high blood pressure,
strokes, or lung cancer.
Environmental interventions that reduce air pollution, the greenhouse
effect, acid rain, and ozone layer depletion can also result in a reduction of the
prevalence and severity of respiratory problems in the general population.
Primary Prevention
• Primary prevention: preventing healthy people from becoming ill.
• Typically considered the most cost-effective form of healthcare, because these efforts
help offset the suffering, cost and burden associated with disease.
• Primary prevention helps to lower disease incidence and control disease.
• Examples of primary prevention include:
o Immunization
o Wearing shoes to prevent hookworm
o Adequate intake of proteins and vitamins to prevent malnutrition
o Use of mosquito nets to prevent malaria
o Health education and promotion initiatives aimed at fostering positive health
behaviors
For example, promoting the use of latrines, promoting condom use, etc.
Secondary Prevention
• Secondary prevention: identifying/detecting individuals who are already infected with a
given disease as early as possible, in order to stop the disease from spreading/developing
further.
• Infected individuals should be diagnosed and treated as early as possible, to increase
recovery rates and reduce disability, morbidity, and mortality rates.
o Screening for early diagnosis and treatment can be done for sub-clinical diseases
using laboratory tests.
o Clinical examination can be done to discover early manifestation of disease, which is
easier to reverse.
o Treatment can be provided via drugs, lifestyle modification, or by natural remedies.
Tertiary Prevention
• Tertiary prevention: preventing further disability, or preventing/postponing death due to
disability or secondary disease(s).
o Because the disease is now established, primary prevention activities may have been
unsuccessful.
o Early detection through secondary prevention may have minimized the impact of the
disease.
o Sometimes, the agent has been removed while the disability or effect of the disease is
still visible or felt.
• Tertiary prevention may refer to patients who have been cured of a primary disease but
have a permanent disability (e.g. post-polio paralysis or post-trachoma blindness) and/or
need rehabilitation.
• Examples of tertiary prevention include:
o Palliative care for patients with AIDS or cancer
• Agent: the etiological factor which must be present for the disease, disability or
pathological state to occur in a susceptible host.
o An agent may be defined as the presence, absence, excess or deficiency of a certain
factor.
• One measure of virulence is the Case Fatality Rate (CFR) which can be expressed as
follows:
$$\text{Case Fatality Rate (CFR)} = \frac{\text{No. of persons dying of a disease during a stated time period}}{\text{Total no. of persons with the disease during the same time period}} \times 100$$
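The CFR calculation can be sketched in a few lines of Python. This is an illustration only: the function name `case_fatality_rate` and the outbreak figures are hypothetical and are not taken from the manual.

```python
def case_fatality_rate(deaths, cases):
    """Return the case fatality rate as a percentage.

    deaths: persons dying of the disease during the stated time period
    cases:  total persons with the disease during the same time period
    """
    if cases <= 0:
        raise ValueError("cases must be greater than zero")
    if deaths < 0 or deaths > cases:
        raise ValueError("deaths must lie between 0 and cases")
    return 100 * deaths / cases


# Hypothetical outbreak: 200 persons had the disease and 10 of them died.
cfr = case_fatality_rate(deaths=10, cases=200)
print(f"CFR = {cfr:.1f}%")  # prints "CFR = 5.0%"
```

Both counts must refer to the same disease and the same time period, otherwise the percentage is not meaningful.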
Suitable Means of Transmission
• Transmission of infectious agents is any mechanism by which a susceptible host is
exposed to an infectious agent. It may be either direct or indirect.
Direct Transmission
• Direct transmission means that the agent is transmitted directly from the infected host
(man or animal) to the new host (e.g. influenza, gonorrhoea, etc.). Direct transmission
may be horizontal or vertical.
o Examples of horizontal transmission of diseases include:
Droplet infection
Faeco-oral route
Genital
Direct skin contact
o Examples of vertical transmission of diseases include:
Transplacental
Genital tract
Indirect Transmission
• Indirect transmission requires a vehicle or vector to carry the agent of disease from one
host to another.
Vehicles
• Vehicles are substances such as water, air, food, blood used in transfusion, or fomites
(inanimate objects used by an infectious host, such as clothing, handkerchiefs,
doorknobs, etc.).
• Multiplication may or may not take place in or on the vehicle (e.g. in influenza, hepatitis,
and streptococcal disease)
Vectors
• Vectors may be either mechanical or biological, but are always living.
• Mechanical vectors are animals and insects that carry agents from place to place on
their feet, proboscis, or other body parts.
o For example, flies are vectors of Shigellosis. Flies can breed in infected feces, and
then contaminate food, which humans may ingest.
• In biological vectors, the organism must grow or multiply within the body of the
vector.
o For example, fleas infected with Yersinia pestis bacteria transmit plague to humans
and other mammals through their bites.
Characteristics of the Host
• Infected
o An infected person harbours an infectious agent and has either manifest disease or an
asymptomatic infection.
o They may or may not be infectious to others.
• Infectious
o A person from whom the infectious agent can be naturally acquired.
o A person or their articles or clothing may also be merely contaminated with an
infectious agent, without being infected.
• Immune
o A person who possesses specific protective antibodies or cellular immunity as a result
of previous infection or immunisation.
o Immunity is relative: an ordinarily effective protection may be overwhelmed by an
excessive dose of the agent.
o It may also be impaired by immune-suppressive drug therapy or concurrent disease
(such as AIDS).
• Inherent resistance
o The ability to resist disease, independent of antibodies or specifically developed tissue
response.
o It commonly resides in anatomical or physiological characteristics of the host, and
may be genetic or acquired, permanent or temporary.
• Susceptible
o A person is considered susceptible if they do not possess sufficient resistance
(inherent and/or acquired) against a particular pathogenic agent to prevent contracting
a disease when exposed to the agent.
o Persons or animals must belong to a species that is biologically capable of being an
efficient host to the agent in question.
o Susceptibility of a host may be modified by characteristics, such as age, sex, race,
genetic make-up, physiological state, habits and customs, pathological state and
previous experience with the agent (immunity).
Instructions
Refer to Worksheet 11.1: Natural History of Disease and Levels of Prevention for
details on the steps for this assignment.
You will work in small groups to accomplish the assignment as instructed by the tutor.
Complete Option 1: Written Assignment or Option 2: In Class Presentation based on class
schedule and amount of time available.
Key Points
• The natural history of disease is important for understanding the course and effects of
disease, in order to determine the diagnosis, treatment and prevention of disease.
• There are four levels of prevention of disease which all have interplay in the natural
history of disease: primordial, primary, secondary, and tertiary.
• Transmission of infectious agents, by which a susceptible host is exposed to an
infectious agent, can be direct or indirect.
Evaluation
• Define natural history of disease.
• What are the three stages of pathogenesis in natural history of disease?
• What are the four levels of prevention?
• Describe the six key characteristics of infectious agents in disease transmission.
References
• Bonita, R., Beaglehole, R., Kjellstrom, T. (2006). Basic Epidemiology. (2nd Edition).
Geneva, Switzerland: WHO
• Kapiga S.H. et al. (1998). Lecture notes in epidemiology and research methodology.
Department of Epidemiology and Biostatistics. MUCHS.
• McCusker, J. (2001). Epidemiology in Community Health (Rural Health Series, No. 9).
Nairobi, Kenya: AMREF.
• WHO. 2007. World health statistics 2007. Geneva, Switzerland: World Health
Organization. Retrieved from: http://www.who.int/whosis/whostat2007/en/ (date
unknown)
Worksheet 11.1: Natural History of Disease and Levels of
Prevention
Instructions
• The class will work in small groups.
• Each group will investigate a different topic. Groups should work together to:
o Outline the natural history of the disease/health issue that they are investigating
o Identify a comprehensive list of actions that can be taken to prevent morbidity and
mortality at each level (primordial, primary, secondary, tertiary).
o Include a list of resources/references in assignment/presentation.
o Allow 2-3 hours for group discussion and report/presentation preparation.
• The instructor will inform students to follow instructions for Option 1 or Option 2 below.
o Option 1: Written Assignment
Each group will prepare a brief written paper outlining the content above (natural
history of disease, and opportunities for prevention at all four levels).
Assignment will be handed in to the instructor for feedback and grading.
o Option 2: In Class Presentation
Each group will prepare a brief presentation detailing the content above (natural
history of disease, and opportunities for prevention at all four levels). All
resources for presentation (flipcharts, etc.) should be prepared in advance.
Instructor will inform students about the date of their group presentation, and
groups will present and discuss in class.
Group Assignments:
1. Road and Traffic Safety: High rate of road traffic accidents in Tanzania resulting in
injury and death to motorists and pedestrians
2. Infant Mortality: High infant mortality rate in Tanzania: 76 infant deaths per 1,000
live births (2005) (Source: World Health Organization, 2007.)
3. Maternal Mortality: High maternal mortality ratio in Tanzania: 1,500 maternal deaths
per 100,000 live births (2000) (Source: World Health Organization, 2007.)
4. Skilled Birth Attendance: In Tanzania, 47.1% of live births take place at a health
facility, and 52.7% take place at home. (Source: Tanzania Demographic and Health
Survey, 2004-5)
5. Child Health: Low coverage of primary health indicators, including number of children
under-five immunized, nutritional status of children under five, etc.
Session 12: Introduction to Epidemiological
Methods/Studies
Learning Objectives
By the end of this session, students are expected to be able to:
• Describe key types of studies/methods in epidemiological research
• Describe types of analytical and descriptive surveys
• Specify the formulas for calculating relative risk and odds ratio
• Identify three methods of hypothesis formation
Descriptive Studies
• Descriptive studies are useful in studying the natural history of diseases.
• Descriptive studies tell us about the distribution of disease and disease determinants
in human populations.
• They are generally used to describe exposure variables and patterns of disease occurrence
with all types of studies.
o They explain the what, who, when, and where of health events.
o They provide information about person, place and time.
Person/Who: What are the characteristics of the persons affected by the disease?
(age, sex, race, socioeconomic status, genetic constitution, immunological status,
etc.)
Place/Where: What are the geographic characteristics of individuals and groups
who are affected by a particular disease? (geographical placement, altitude,
latitude, climate, vegetation, proximity to another key location such as body of
water or factory, etc.)
Time/When: Does the disease have any time trend? Many infectious diseases
occur during certain periods of the year (seasonal distribution).
• Descriptive studies are sometimes called descriptive statistics.
o Data can be presented in the form of rates, frequency distributions, measures of
central location and dispersion, graphs, charts and maps.
• Descriptive studies can be time-bound, or can be ongoing.
o For example, disease registries in a particular area are not time-limited. They provide
an ongoing record of various characteristics of the affected individuals including age,
sex, occupation, duration of symptoms, etc.
o Cross-sectional studies are examples of time-limited descriptive studies.
• Key types of descriptive studies include:
o Ecological/Correlational Studies
o Case Reports/Case Series
o Cross-Sectional studies (which can be considered partially analytical)
Analytical Studies
• Analytical studies (or explanatory studies) try to explain a disease in context (i.e., provide
a situational analysis).
• These studies are designed specifically to explain the determinants (i.e., the how and the
why) of disease. They answer the following questions:
o Why does the disease occur in the persons experiencing it, and not in the persons not
experiencing it?
o Why do certain persons fail to make use of health services?
o Can the decreased incidence of the disease be attributed to the introduction of
preventive measures?
• To answer these questions, hypotheses are formulated and tested that may help to explain
the situation.
• Examples of analytical studies include:
o Ecological Studies
o Cross-sectional Studies
o Cohort Studies
o Case-Control Studies
• Note that there may be some overlap between the categories of Analytical Studies and
Descriptive Studies. Some types (ecological, cross-sectional) often fit into both
categories.
o ‘Ecological fallacy’ is when a person believes that what they observe at a group level
also applies on an individual level. It is easy to draw weak or false conclusions, such
as the following examples:
Proportions affiliated to certain religions in a country, and suicide rates in a
country. (While a country with a higher proportion of Protestants may also have
higher suicide rates than a country with different religious composition, there is no
evidence that individual Protestants are more likely to commit suicide than a
member of any other religious group.)
Proportion of the population in mining occupations and lung cancer rates in the
country.
• Individual level surveys also survey groups; however, they utilize information from
individuals. Such surveys are performed to test hypotheses that a specific factor is related
to a specific disease.
o Cross-sectional studies have an unselected population (i.e., prevalence studies)
o Case-control and cohort studies require data referring to more than one point in time,
and hence are longitudinal/time-span studies.
Ecological Studies
• These studies are often referred to as correlational studies.
• They frequently initiate the epidemiological process and allow for more detailed analysis
of observed correlations.
• Measures that represent characteristics of the entire population are used to describe
disease in relation to a factor of interest.
o Factors of interest may include per capita food consumption, per capita cigarette
consumption, infant mortality, mean annual rainfall, etc.
o For example, in one country a relationship was demonstrated between average sales
of an anti-asthma drug and the occurrence of an unusually high number of asthma
deaths.
• The two variables are correlated, and the measure of correlation is called the correlation
coefficient (r).
o For example, correlating per capita cigarette smoking and the occurrence of lung
cancer.
• The correlation coefficient (r) quantifies the extent to which there is linear relationship
between exposure and disease.
o It ranges from -1 to +1.
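As a rough illustration of how r is obtained, the sketch below computes the Pearson correlation coefficient from group-level data. The function name `pearson_r` and the five data points (per capita cigarette consumption and lung cancer death rates in five imaginary populations) are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / sqrt(sxx * syy)


# Hypothetical group-level data: each value describes a whole population.
cigarettes_per_capita = [200, 400, 600, 800, 1000]
lung_cancer_rate = [5, 9, 16, 19, 26]
r = pearson_r(cigarettes_per_capita, lung_cancer_rate)
print(f"r = {r:.2f}")
```

A value of r close to +1 or -1 indicates a strong linear relationship between the group-level measures. It does not show that the relationship holds for individuals.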
• The units of analysis are groups of people rather than individuals.
• Such relationships may be studied by comparing populations in different countries at the
same time, or the same population in one country at different times. (The latter approach
may avoid some socio-economic confounding.)
• Although simple to conduct and thus attractive, ecological studies are often difficult to
interpret since they usually rely on data collected for other purposes, and essential
exposure data may not be available.
• Because the unit of analysis is a population or group, the individual link between
exposure and effect cannot be made.
• One attraction of ecological studies is that data can be used from populations with widely
differing characteristics.
o For example, correlation between esophageal cancer rates in communities with
different patterns of salt consumption.
• Ecological fallacy or bias may result from ecological studies if one erroneously infers that
relationships established between two or more variables, measured at an aggregate level,
will also hold at the individual level.
• Advantages of correlational studies:
o They can be undertaken rapidly
Often, these studies use routine data that is already available.
o They are inexpensive to conduct
• Limitations of correlational studies:
o Unable to link exposure with disease in particular individuals
For example, a correlational study cannot prove a link between increasing pap
smear rates among women and decreasing mortality from cancer of the cervix. In
this case it is not easy to prove that those women who underwent pap smears are
the same who experienced a reduction in cancer mortality.
o Inability to control for the effects of potential confounding factors.
For example, it has been shown that there is strong correlation between per
capita number of colour television sets and Coronary Heart Disease (CHD) in
various countries. Owning a colour television is an indicator of economic well-
being, and related to other factors which are more likely to increase the risk of
CHD than owning a colour television.
Cross-Sectional Studies
• Cross-sectional studies or prevalence studies are carried out at a certain point in time and
in a given population or geographical area.
• They depend on a single examination of a cross-section of the population in which sick
and healthy, or exposed and unexposed, are not distinguished until results are examined.
• Information is collected through surveys/questionnaires, and/or laboratory or physical
examination of individual members of the study population.
• The prevalence of a risk factor or a disease is expressed as the proportion of affected
individuals in the study population in a given geographical area at a given point in
time.
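A minimal sketch of this proportion, assuming a single survey in which every member of the study population is examined once. The function name `point_prevalence` and the survey figures are hypothetical.

```python
def point_prevalence(affected, population, per=100):
    """Return the prevalence per `per` persons (per=100 gives a percentage)."""
    if population <= 0:
        raise ValueError("population must be greater than zero")
    if affected < 0 or affected > population:
        raise ValueError("affected must lie between 0 and population")
    return per * affected / population


# Hypothetical survey: 3,000 persons examined, 150 found to have the disease.
prevalence = point_prevalence(affected=150, population=3000)
print(f"Prevalence = {prevalence:.1f} per 100")  # prints "Prevalence = 5.0 per 100"
```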
• From a well-defined population, disease status and exposure are assessed simultaneously.
• The point in time could be:
o Calendar year (mid-year, mid-month)
o A fixed point in the course of events
This varies in real time from person to person. Examples include menarche,
adolescence, military recruitment, school age, etc.
• An important limitation of cross-sectional studies is their inability to sort out cause and
effect relationships since both are found in the study population at the same time.
• Cross-sectional studies can be used for the following purposes:
o To determine the magnitude of disease or disease determinants in a community in
terms of their prevalence.
o To study preliminary associations between disease and possible aetiological factors by
comparing the characteristics of the sick with those of the healthy.
o To screen undiagnosed disease in a community.
• Limitations of cross-sectional studies:
o Not possible to determine whether the exposure preceded or resulted from the disease.
o Reflects determinants of survival as well as aetiology.
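The prevalence calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `prevalence` and the survey counts are hypothetical and do not come from this manual.

```python
def prevalence(affected: int, population: int) -> float:
    """Proportion of the study population affected at the survey time point."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= affected <= population:
        raise ValueError("affected must be between 0 and population")
    return affected / population


# Hypothetical survey: 120 of 2,000 people examined have the condition.
survey_prevalence = prevalence(120, 2000)
print(f"Prevalence = {survey_prevalence:.1%}")
```

Because disease and exposure are measured at the same time, a figure like this describes how common the condition is, not whether the exposure came first.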
Refer to:
• Handout 12.3: Summary of Case-Control and Cohort Studies
• Handout 12.4: Measures of Effect in Cohort and Case-Control Studies
Note: students will have additional practice calculating risk ratios and odds ratios in later
sessions.
Cohort Studies
• Cohort studies are synonymous with prospective studies, longitudinal studies, follow-up
studies or incidence studies.
• They can be carried out prospectively or retrospectively (i.e., historically).
• These studies are carried out on a sample of the population to determine the rate at which
groups of the population develop disease or die from it when differentially exposed. This
is one way of testing a hypothesis in disease causation.
• Basic data from cohort studies are represented in Figure 1:
• Association is said to exist between exposure and development of disease if the measure
of effect (the relative risk) is significantly different from unity (1).
• Standard statistical techniques are available to test this difference.
• Although cohort studies may take long to accomplish, they have the advantage that they
are more reliable in providing evidence for causation than other analytical studies.
Case-Control Studies
• Also called case-referent studies.
• These studies involve the comparison of cases of the disease under study with
comparable controls for levels of exposure.
• The effect of exposure in such studies is measured by the Odds Ratio (OR = ad/bc) as
shown in Figure 2, which is an approximation to the Relative Risk (RR).
• Case-control studies are by far the simplest in determining cause and effect relationships.
• They take a short time to complete and have the advantage that they use cases of the
disease under study and are especially useful for studying rare diseases.
• Important disadvantages of case-control studies are that they are more susceptible to
selection bias and that information on exposure is less accurately ascertained than in
cohort studies.
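The statement that the odds ratio approximates the relative risk can be checked numerically. The sketch below uses the cell labels a, b, c, d from Figure 2 and two invented cohorts (all counts are hypothetical). It suggests that the approximation is close when the disease is rare and drifts apart when the disease is common.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (a * d) / (b * c)."""
    return (a * d) / (b * c)


# a = exposed diseased, b = exposed healthy,
# c = unexposed diseased, d = unexposed healthy (hypothetical counts).
rare_disease = (20, 9980, 10, 9990)    # risk 0.2% vs 0.1%
common_disease = (400, 600, 200, 800)  # risk 40% vs 20%

for label, table in (("rare", rare_disease), ("common", common_disease)):
    rr = relative_risk(*table)
    or_ = odds_ratio(*table)
    print(f"{label:>6} disease: RR = {rr:.3f}, OR = {or_:.3f}")
```

In both invented cohorts the relative risk is 2, but only in the rare-disease cohort does the odds ratio stay close to it.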
Figure 2: Exposure Status Among Cases and Controls
Exposure status    Cases (diseased)    Controls (healthy)    Total
Exposed            a                   b                     a + b
Not exposed        c                   d                     c + d
Total              a + c               b + d                 a + b + c + d
Experimental Studies
• An experimental study is an investigation in which the researcher wishes to study the
effects of exposure to or deprivation of a defined factor, and designs a situation in which
subjects (persons, animals, communities, etc.) will be exposed to or deprived of the factor.
• If the investigator compares subjects exposed to the factor with subjects not exposed to it,
the study is a controlled experiment.
• Common experimental studies include:
o Intervention studies
o Clinical trials
o Prophylactic trials
Refer to Handout 12.5: Additional Study Types and Terminology for a list of
common types of research conducted in the healthcare sector that do not investigate epidemiology.
Methods of Hypothesis Formulation
• There are three methods of hypothesis formulation.
o Method of Difference
If the disease frequency is significantly different between two sets of
circumstances, the disease might have a causal association with a particular factor
that differs between the two.
o Method of Agreement
If a single factor is common in a number of circumstances in which the disease
occurs, causal association can be suspected.
o Method of Concomitant Variation
If a factor varies in proportion to the frequency of disease, causal association
can also be suspected.
Key Points
• The main categories of epidemiological studies are observational studies and
experimental studies.
• The aim of descriptive studies is to generate ideas or hypotheses for association(s)
between risk factors and diseases, while analytical studies use a comparison group to
establish an association.
• Cross-sectional studies are important in determining disease prevalence, while cohort studies
are useful in determining disease incidence.
• There are three methods of hypothesis formulation:
o Method of agreement
o Method of difference
o Method of concomitant variation
Evaluation
• What are the main categories of epidemiological studies?
• Name two types of analytical studies.
• What is the formula for calculating odds ratio in case control studies?
• What are the three methods of hypothesis formulation?
References
• Bonita, R., Beaglehole, R., Kjellstrom, T. (2006). Basic Epidemiology. (2nd Edition).
Geneva, Switzerland: WHO.
• Freedman, D. (1999). Ecological Inference and the Ecological Fallacy. Retrieved June
18, 2010 from http://www.stanford.edu/class/ed260/freedman549.pdf.
• Jones, et al. (2008). Biostatistics Workbook. Atlanta, USA: CDC Field epidemiology and
lab training program (FELTP).
• Kapiga, S. et al. (1998). Lecture notes in epidemiology and research methodology.
Department of Epidemiology and Biostatistics. MUCHS.
• MacMahon B., Trichopoulos, D. (1996). Epidemiology: principles and methods. Boston:
Little, Brown, & Co.
• McCusker, J. (2001). Epidemiology in Community Health (Rural Health Series, No. 9).
Nairobi, Kenya: AMREF.
• Rosner, B. (2006). Fundamentals of Biostatistics. (6th edition). Belmont, CA: Thomson
Brooks/Cole.
Handout 12.1: Types of Epidemiological Studies
[Figure: types of epidemiological studies (diagram not reproduced)]
Source: Adapted from Kapiga S., et al. (1998).
Note: The above figure is a very simplified one and is meant to assist in understanding the possible relationships
in epidemiological methods. In practice, there will be overlaps between certain methods.
Handout 12.2: Summary of Types of Descriptive Studies
Descriptive Studies
• Populations
o Ecological studies
• Individuals
o Cross-sectional studies
o Case reports
o Case series
Handout 12.3: Summary of Case-Control and Cohort Studies
Handout 12.4: Measures of Effect in Cohort and Case-Control Studies
$$\text{Relative Risk} = \frac{a / (a + b)}{c / (c + d)}$$
• Association is said to exist between exposure and development of disease if the measure is
significantly different from unity.
• Standard statistical techniques are available to test this difference.
Handout 12.5: Additional Study Types and Terminology
• Evaluative Studies
o Carried out to appraise the value or quality of healthcare or health programs
• Programme review
o A programme is any enterprise organized to eliminate or reduce one or more
problems.
o A programme review evaluates the care given to specific patients, communities, or
populations, or may evaluate a particular programme that operates in a defined setting
with specified aims and goals.
o Programme examples include immunization coverage in an area, iodization of salt for
goitre prevention, fluoridation of water supplies for the prevention of dental caries,
etc.
• In-built evaluation
o An evaluation that is planned in advance (for example, during the early programme
planning phase), for which the required evaluation data are collected in a systematic way as an
integral part of the provision of service.
• Medical audit
o A type of evaluation in which the quality of service is evaluated by appraising the
quality of care given to individuals. If a medical audit discovers that some useful
procedure is not being done in the course of patient care, then it recommends that it
should be done (e.g., testing of blood for Hepatitis B virus before transfusion).
• Surveillance
o A systematic collection, analysis, and use of information for the control of a specific
disease. Generally it is used to observe the ongoing health status of a
community/population.
• Pilot study
o A dress rehearsal of an investigation performed in order to identify defects in the
study design.
• KAP studies
o Studies of Knowledge, Attitudes, and Practices (KAP) towards healthcare or health
behaviors. These are important in health education methods.
• Operational research
o Research concerned with organizational problems that seeks to determine how best a
service can be provided given all the possible constraints.
o For example, we know vaccination against measles is effective and could eradicate
measles, but there are many constraints, including:
Cold chain maintenance problems
Faulty immunization techniques
Lack of proper supervision
Inadequate facilities and resources
Lack of staff motivation
Lack of community motivation
o Operational research determines which one of these constraints is most important in
the control of measles.
Session 13: Case-Control Studies
Learning Objectives
By the end of this session, students are expected to be able to:
• Describe case-control studies
• Explain advantages and disadvantages of case-control studies
• Describe retrospective and prospective case-control designs
• Discuss measures of effect from a case-control study
• Calculate an odds ratio from a 2-by-2 table
Types of Case-Control Study Designs
• Retrospective Case-Control Design
• Prospective Case-Control Design
Retrospective Design
• All the cases have been diagnosed by the time the investigator initiates the study.
• They use prevalent (existing) cases.
Prospective Design
• The study is begun and all new cases diagnosed within a specified period of time are
included into the study.
• This design tries to avoid studying survivors and hence addresses aetiological factors
as close as possible to the commencement of the disease process.
Selecting Cases
• Cases should, as far as possible, represent a homogeneous disease entity.
o Establish strict diagnostic criteria for the disease.
For example, study cervical cancer or cancer of the body of the uterus, and not
uterine cancer (which includes both types).
o Depending on the certainty of the diagnosis and the amount of information available it
is often useful to perform the analysis and present the results separately for cases
classified as definite, probable or possible.
• Selection of cases should be done from a well-defined population called the source
population.
• Possible sources of cases are:
o Hospital or healthcare facility
Commonly involves identifying people with disease who have been treated at a
particular health facility during a specified period of time.
Although such cases may be identified easily, the underlying source population
may not be well defined.
o General Population
All diseased individuals or a random sample from a defined population.
Avoids bias from selection factors that led the affected individual to utilise a
certain health care facility.
Selecting Controls
• A control should be as much like a case as possible, except that the control does not have
the disease (outcome) in question.
• They must have the same opportunity for exposure as a case and must be subject to the
same inclusion and exclusion criteria.
• No one control group is optimal for all situations.
• Scientific, economic and practical considerations should be weighed before selecting
controls.
Sources of Controls
• Hospital/Healthcare Facility
o These may be patients admitted at the same hospitals as the cases for conditions other
than the disease being studied.
o Advantages of Selecting Controls from Hospital/Health Care Facility
They are easily identified.
They are readily available and minimize assembling costs.
Minimizes potential for recall bias in both cases and controls.
Provides for similar hospital selection factors for cases and controls
More likely to co-operate than healthy individuals, hence minimizing non-response.
o Disadvantages of Selecting Controls from Hospital/Health Care Facility
They are ill by definition and differ from the healthy in a number of ways that may
be associated with the exposure.
The experience of these patients may not accurately represent the exposure
distribution in the population from which the cases are derived.
• General Population
o This is the source to be used when the cases have been selected to represent affected
individuals in a defined general population and also when hospital based controls are
not scientifically desirable or feasible.
o Recruitment may be done by random sampling methods using the available sampling
frame (administrative), selection of individuals from population registers, voting lists,
census records, salary lists, etc.
o Though controls from the general population may represent the non-ill individuals in
the community they have some limitations:
Usually costly and time consuming.
Sampling frame may not be available.
Difficult to get hold of busy people with a lot of scheduled activities.
May not recall exposures at the same level as the cases.
Individuals who have not experienced an adverse health effect, are less motivated
to participate, hence non-response may be higher than in hospital based controls
• Special Groups
o These include friends, relatives, spouses and neighbours of the cases.
o Advantages of Using Special Groups for Controls
They are healthy like other members of the population.
Likely to be co-operative due to the interest they have in the health of the case.
May offer some degree of control of confounders such as ethnic background,
socio-economic status, environmental factors, etc.
o Disadvantages of Using Special Groups for Controls
Due to the closeness of the controls to the cases, distribution of the factor under
study may be similar in the cases as in the controls, and hence an underestimate
of the true effect of the exposure may occur.
How Many Controls should be selected per Case?
• The precision of the study can be improved by increasing the number of study subjects.
• In a case-control study, the limiting factor is usually the number of available cases.
Controls may be easier to find, so we can increase the number of controls. This will
increase the power of the study, but not infinitely.
• Going from one to two controls per case vastly increases the power. Above a ratio of 4 or
5 to 1, the gain becomes very small. It is usually not worthwhile to have a ratio this large,
unless the information is already collected and easily accessible (for instance in a
computer database/file).
• There is a trade-off between the need for higher study power and the cost of finding
more cases and controls.
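The diminishing return from adding controls can be illustrated with a commonly quoted rule of thumb, which is an assumption brought in here and not part of this manual: with k controls per case, the statistical efficiency relative to an unlimited supply of controls is roughly k / (k + 1) when the odds ratio is close to 1. The function name `relative_efficiency` is hypothetical.

```python
def relative_efficiency(k: int) -> float:
    """Approximate efficiency of k controls per case, relative to unlimited controls.

    Rule-of-thumb approximation k / (k + 1); assumed here for illustration and
    most reasonable when the odds ratio is near 1.
    """
    if k < 1:
        raise ValueError("need at least one control per case")
    return k / (k + 1)


for k in range(1, 7):
    print(f"{k} control(s) per case: about {relative_efficiency(k):.0%} efficiency")
```

Under this approximation the largest single gain comes from the second control per case, and each control added beyond the fourth or fifth contributes only a few percentage points.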
$$\text{Odds Ratio} = \frac{a / c}{b / d} = \frac{a \times d}{b \times c}$$
• In interpreting results from case control studies and indeed from any other
epidemiological study, the following should always be considered:
o Bias should be minimized by adopting a good design.
o Chance should be evaluated by statistical methods.
o Confounding may be minimized by adopting an adequate design and adjusting for it in the
analysis.
• When the above are taken care of, the odds ratio may be interpreted as follows:
o OR = 1
An odds ratio of one means lack of association between exposure and disease.
o OR ≠ 1
Odds ratios different from one indicate the possibility of an association between
exposure and disease.
o OR > 1
If the odds ratio is greater than one, exposure to the factor is associated with an increased
risk of disease.
o OR < 1
If the odds ratio is less than one it shows a protective effect of the exposure under
investigation.
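The calculation and interpretation above can be sketched as follows. The cell labels follow Figure 2 from Session 12 (a, b = exposed cases and controls; c, d = unexposed cases and controls). The function names and the counts are hypothetical, and the interpretation is of the point estimate only: bias, chance and confounding still have to be assessed separately.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (a * d) / (b * c) for a 2-by-2 table laid out as in Figure 2."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def interpret(or_value: float) -> str:
    """Plain-language reading of the point estimate (no significance test)."""
    if math.isclose(or_value, 1.0):
        return "no association between exposure and disease"
    if or_value > 1:
        return "exposure associated with increased risk of disease"
    return "exposure associated with decreased risk (possible protective effect)"


# Hypothetical counts, invented for illustration only.
example_or = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {example_or:.2f}: {interpret(example_or)}")
```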
Instructions
You will work in small groups to calculate and interpret Odds Ratio (OR). Record your
answers. Prepare to share your answers in plenary.
Refer to:
• Handout 12.4: Measures of Effect in Cohort and Case-Control Studies
• Worksheet 13.1: Calculating Odds Ratio
Key Points
• A case-control study is an epidemiologic investigation which involves comparing the
characteristics of diseased persons (the cases) with those of non-diseased persons (the
controls).
• The purpose of this comparison is to identify factors which occur more (or less)
frequently in the cases as compared to the controls.
• A case-control study provides clues regarding the role of the factors in elevating or
reducing the risk of the disease under investigation.
Evaluation
• What are the advantages of case control studies?
• What are the disadvantages of case control studies?
• What are the advantages of selecting controls from hospital or health facility?
• What are the disadvantages of selecting controls from hospital or health facility?
• What are the advantages of using special groups for controls?
• How is the Odds Ratio interpreted?
References
• Bonita, R., Beaglehole, R., Kjellstrom, T. (2006). Basic Epidemiology. (2nd Edition).
Geneva, Switzerland: WHO.
• Greenberg, R.S., et al. (1993). Medical epidemiology. East Norwalk, CT: Appleton
Lange.
• Mausner J.S., Kramer, S. (1984). Epidemiology: An introductory text. Philadelphia:
Saunders.
Worksheet 13.1: Calculating Odds Ratio
Instructions
• Work together in small groups to calculate and interpret the Odds Ratio (OR) for each
problem below.
• Note that all data in the following problem is hypothetical, and the information below is
intended only to help illustrate how to calculate Odds Ratios.
• Refer to Handout 12.4: Measures of Effect in Cohort and Case-Control Studies as
needed (from previous Session)
Problem 1
• Calculate Odds Ratio using the following data.
• Interpret the result (strength of association)
Problem 2
• Calculate Odds Ratio using the following data.
• Interpret the result (strength of association)
Problem 3
• Calculate Odds Ratio using the following data.
• Interpret the result (strength of association)
Problem 4
• Calculate Odds Ratio using the following data.
• Interpret the result (strength of association)
Problem 5
• Calculate Odds Ratio using the following data.
• Interpret the result (strength of association)
Distribution of Cases and Controls According to Bottled Water Consumption and
Occurrence of Diarrhoeal Disease in a Case-Control Study
Exposure status    Case    Control    Total
Exposed            15      20         35
Not exposed        20      15         35
Total              35      35         70
Session 14: Cohort Studies
Learning Objectives
By the end of this session, students are expected to be able to:
• Describe cohort studies
• Explain advantages and disadvantages of cohort studies
• Discuss measures of effect from a cohort study
• Compare case-control studies and cohort studies
Key Characteristics of Cohort Studies
• Start with ‘healthy’ subjects
o That is, subjects who are free of the disease of interest at the start of follow-up.
• Classify them on the basis of exposure or risk factor.
• Follow subjects and assess the occurrence of outcome of interest among exposure groups.
• Compare incidence of outcome in both groups.
o For Example: In evaluating potential adverse health outcomes associated with the use
of Oral Contraceptives (OC), choice of an appropriate comparison group is of critical
importance.
Some studies have used a comparison group of women not using OC, while others
have selected a group using some form of contraceptive other than OC.
Women not using contraception may differ substantially from users of any type of
contraception in terms of ability or desire to become pregnant or the nature of
their sexual practices.
On the other hand, women using various forms of contraceptives are likely to
differ from OC users with respect to religion, socioeconomic status, and other
lifestyle factors.
o Thus, it may be that no one comparison group is clearly superior, and information on
the role of OC in disease can best be interpreted by comparing the results from
various comparison groups.
• Consistent results from multiple comparison groups reinforce the belief that the
exposure under observation is related to the outcome or disease under investigation.
Selection of Subjects
These can be selected based on the following options:
• Random samples of populations
o Not useful when exposure is rare
• Special exposure groups (e.g. atomic bomb survivors)
o Groups with high exposure prevalence
o Useful for occupational or environmental exposures
• Within special information groups (e.g. doctors, students)
o Groups with high anticipated quality of information
• Ie = Cumulative Incidence of disease among exposed
$$I_e = \frac{a}{a + b} = \frac{a}{N_e}$$
• Io = Cumulative Incidence of disease among unexposed
$$I_o = \frac{c}{c + d} = \frac{c}{N_o}$$
• Relative risk is the measure of strength of the association between exposure and disease.
• It measures the risk of a disease in exposed groups when compared to the unexposed
group.
• Interpreting Relative Risk:
o RR = 1
A relative risk of one (1) shows no increased risk in the exposed group (i.e., no
association between exposure and disease).
o RR > 1
A relative risk greater than one (1) indicates increased risk among the exposed group.
o RR < 1
A relative risk less than one (1) indicates a decreased risk in the exposed group.
Activity: Exercise 1
Instructions
Tutor will provide example of calculating cumulative incidence of disease among exposed
and unexposed, and of calculating relative risk. Follow along with the calculations.
Example
Figure 2: Relationship Between Diabetes and Obesity
             Diabetic    Not Diabetic    Total
Obese        773         27              800
Non-Obese    227         33              260
Total        1000        60              1060
Relative Risk = Ie / Io = (773/800) / (227/260) = 0.966 / 0.873 ≈ 1.1
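The worked example can be reproduced in Python as a check on the arithmetic. This is a minimal sketch using the counts from the diabetes and obesity table; the function and variable names are illustrative only.

```python
def cumulative_incidence(diseased: int, total: int) -> float:
    """Cumulative incidence = number who developed disease / number at risk."""
    if total <= 0:
        raise ValueError("total must be positive")
    return diseased / total


# Counts from the diabetes and obesity example.
ie = cumulative_incidence(773, 800)  # obese (exposed)
io = cumulative_incidence(227, 260)  # non-obese (unexposed)
rr = ie / io

print(f"Ie = {ie:.3f}, Io = {io:.3f}, RR = {rr:.2f}")
```

An RR slightly above 1 points to a modestly higher risk in the exposed group. Whether the difference from 1 is statistically significant needs a separate test.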
• Ie = Incidence of disease in exposed
Io = Incidence of disease in unexposed
• Risk Difference:
RD = Ie - Io
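As a rough illustration, the risk difference for the diabetes and obesity example used earlier in this session can be computed as below. The variable names are hypothetical, and the counts are the ones from that example.

```python
# Counts from the diabetes and obesity example in this session.
exposed_diseased, exposed_total = 773, 800      # obese
unexposed_diseased, unexposed_total = 227, 260  # non-obese

ie = exposed_diseased / exposed_total        # incidence in exposed
io = unexposed_diseased / unexposed_total    # incidence in unexposed
rd = ie - io                                 # risk difference

print(f"RD = {ie:.3f} - {io:.3f} = {rd:.3f}")
print(f"About {rd * 1000:.0f} extra cases per 1,000 exposed people")
```

Unlike the relative risk, which is a ratio, the risk difference is on the same scale as the incidence itself, so it can be read as excess cases per number of people exposed.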
Instructions
You will work in small groups to answer Question 1, 2, or 3 in the worksheet.
Refer to:
• Handout 12.4: Measures of Effect in Cohort and Case-Control Studies
• Worksheet 14.1: Calculate Relative Risk (RR)
Comparison between Case Control and Cohort Studies
Figure 3: Comparison Between Case-Control and Cohort Studies
Characteristic        Case-Control Studies                                     Cohort Studies
Sample size           Small                                                    Large
Costs                 Less                                                     More
Study time            Short                                                    Long
Rare disease          Advantage                                                Disadvantage
Rare exposure         Disadvantage                                             Advantage
Multiple exposures    Good for studying multiple exposures (Advantage)         Not good for studying multiple exposures (Disadvantage)
Multiple outcomes     Not good for studying multiple outcomes (Disadvantage)   Good for studying multiple outcomes (Advantage)
Take-Home Assignment
Activity: Take-Home Assignment: Practice Calculating Odds Ratio and Relative Risk
Instructions
Refer to:
• Handout 12.4: Measures of Effect in Cohort and Case-Control Studies
• Worksheet 14.2: Homework Assignment: Practice Calculating Odds Ratio and
Relative Risk
Key Points
• Cohort studies are good at studying an exposure that has multiple outcomes.
• It is easy to establish the temporal relationship in cohort studies.
• The measure of risk for cohort studies is relative risk (RR).
• Risk Difference is a good measure when you want to find how much the disease will be reduced
if the exposure is removed in a population.
• There are some limitations to conducting cohort studies.
Evaluation
• What is ‘risk difference’ and how is it calculated?
• What are the six advantages of cohort studies?
• What are the six limitations of cohort studies?
References
• Bonita, R., Beaglehole, R., Kjellstrom, T. (2006). Basic Epidemiology. (2nd Edition).
Geneva, Switzerland: WHO.
• Greenberg, R.S., et al. (1993). Medical epidemiology. East Norwalk, CT: Appleton Lange.
• Mausner J.S., Kramer, S. (1984). Epidemiology: An introductory text. Philadelphia:
Saunders.
Worksheet 14.1: Calculate Relative Risk (RR)
Instructions
• Work in small groups to complete the problems in this worksheet during class.
• The tutor will assign you Question 1, 2, or 3 to begin with.
• After you have finished the question assigned to your group by the tutor, continue
working on the next question in the worksheet.
• You will have approximately 15 minutes to complete your work.
Question 1
In a study of exposure to dye from a certain industry and occurrence of cancer of the urinary
bladder, the following results were obtained.
Exposure to Dye and Occurrence of Cancer of Urinary Bladder
Exposure status       Diseased    Non Diseased    Total
Exposed to dye        20          5               25
Not exposed to dye    5           20              25
Total                 25          25              50
Work together to:
a) Calculate Relative Risk (RR)
b) Calculate the Risk Difference (RD)
c) Interpret the results
Question 2
A study was conducted among workers and surrounding communities in a mountainous
region of Tanzania to investigate exposure to iodized salt and occurrence of thyroid goitre.
The results are presented below:
Occurrence of Thyroid Goitre and Exposure to Iodized Salt
Exposure status                Diseased    Non Diseased    Total
Exposed to iodized salt        100         120             220
Not exposed to iodized salt    100         100             200
Total                          200         220             420
Question 3
A study was conducted among workers in a sheet metal factory to investigate exposure to
asbestos dust from the sheet metal industry and occurrence of lung cancer. The results are
presented below:
Occurrence of Lung Cancer and Exposure to Asbestos Dust
Exposure status            Diseased    Non Diseased    Total
Exposed to Asbestos        100         100             200
Not exposed to Asbestos    100         120             220
Total                      200         220             420
Work together to:
a) Calculate Relative Risk (RR)
b) Calculate the Risk Difference (RD)
c) Interpret the results
Worksheet 14.2: Calculating Odds Ratio And Relative Risk
Instructions
• Work in small groups to complete the worksheet as homework.
• Refer to class materials/notes, and also to Handout 12.4: Measures of Effect in Cohort
and Case-Control Studies (from Session 12) for reference as needed.
• Be sure that all group members participate in the problem-solving and discussion.
• Write down all of your work, and note any questions/challenges you have along the way.
• Submit your assignment to the instructor at the next class period. Be sure all group
members’ names are recorded on the work you submit.
• If there is class time available, the tutor may discuss this assignment in plenary, and your
group may be asked to share the answers.
Question 1
A case-control study was done to determine the association between the use of aspirin and a
suspected adverse effect. 200 cases and 200 controls were recruited. Among the cases, 190
had used aspirin before, compared with 130 among the controls.
Question 2
In a prospective cohort study to determine the risk of exposure to arsenic and squamous cell
carcinoma of skin, 600 non-diseased people were involved in the study. Among them 300
were exposed to arsenic metal and the other 300 were not. After a period of 10 years of
follow up, 150 people among those who were exposed had developed squamous cell
carcinoma while only 20 people developed squamous cell carcinoma among those who were
not exposed to the metal.
Session 15: Testing and Screening of a Disease
Learning Objectives
By the end of this session, students are expected to be able to:
• Define the concepts of testing and screening
• Identify types of screening and their specific aims
• Describe the measurement properties of a screening test
• Explain criteria for initiating a screening programme
• Identify types of accepted screening methods for specific diseases and target populations
Definitions of Screening
• Screening:
o The examination of asymptomatic people in order to classify them as likely or
unlikely to have the disease of interest.
o The presumptive identification of unrecognized disease or defect by the application of
tests, examinations, or other procedures which can be applied rapidly to sort out
those who probably have a disease from those who probably do not.
Types of Screening, Specific Aims, and Measurement Properties
• The results of a screening test and disease status can be examined conveniently by use of
the four-fold contingency table.
• From a population of N people, D+ have the disease, while D- do not have the disease.
• The prevalence of disease in this population can be represented as:
$$\text{Prevalence} = \frac{D^{+}}{N}$$
• In this population:
o T+ persons are positive on the screening test, while
o T- persons are negative on the screening test.
o True Positives (TP): Diseased individuals with a positive screening test
o False Positives (FP): Healthy individuals with a positive screening test
o False Negatives (FN): Diseased individuals with a negative screening test
o True Negatives (TN): Healthy individuals with a negative screening test
• An ideal screening test has as few false positives and false negatives as possible.
• Sensitivity: The ability of a test to give a positive finding when the tested person truly
has the disease under study.
o It is the probability that the test will be reactive in a diseased individual.
• A test with a high sensitivity will detect a high percentage of diseased individuals.
• Specificity: The ability of a test to give a negative finding when the tested person is truly
free of the disease under study.
o It is the probability that a test result will be non-reactive in an individual who is not
diseased.
Specificity = [True Negatives (TN) / Total Disease Negative (D-)] × 100%
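The definitions above can be checked numerically from the four cells of the four-fold table. The following Python sketch is illustrative only: the function name `screening_metrics` and the counts are hypothetical, and sensitivity is taken as true positives divided by all diseased persons (D+), the usual definition that mirrors the specificity formula.

```python
def screening_metrics(tp, fp, fn, tn):
    """Return prevalence, sensitivity and specificity (as percentages)
    from the four cells of a four-fold table."""
    diseased = tp + fn          # D+: everyone who truly has the disease
    healthy = fp + tn           # D-: everyone who is truly disease-free
    total = diseased + healthy  # N: whole screened population
    return {
        "prevalence": 100 * diseased / total,
        "sensitivity": 100 * tp / diseased,
        "specificity": 100 * tn / healthy,
    }

# Hypothetical counts, for illustration only
result = screening_metrics(tp=90, fp=50, fn=10, tn=850)
for name, value in result.items():
    print(f"{name}: {value:.1f}%")
```

With these made-up counts, 100 of 1,000 people are diseased, so the prevalence is 10%, and 90 of the 100 diseased test positive, so the sensitivity is 90%.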
• However, the pattern and prevalence of other diseases will influence specificity of the
test. The mix of sub-clinical and clinical cases in a population might affect the sensitivity
of a test.
• The predictive value is influenced by the sensitivity and specificity of the test, as well as
the prevalence of the disease in the population.
• The following table illustrates the influence of prevalence on predictive value, and the
independence of sensitivity and specificity from prevalence:
Figure 2: Influence of Prevalence on The Predictive Value and Accuracy of Screening Test
• In both cases sensitivity and specificity are 90% but the predictive value of a positive test
is different and depends on the prevalence of the disease in the study sample.
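The effect of prevalence on predictive value can be reproduced with a short calculation. The sketch below is not the table from the manual: the prevalence values (1%, 10% and 50%), the population size of 1,000 and the function name `predictive_values` are assumptions chosen for illustration. Sensitivity and specificity are held at 90% throughout.

```python
def predictive_values(sensitivity, specificity, prevalence, n=1000):
    """Build the expected four-fold table for n people and
    return (PVP, PVN) as fractions."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = sensitivity * diseased   # true positives
    fn = diseased - tp            # false negatives
    tn = specificity * healthy    # true negatives
    fp = healthy - tn             # false positives
    pvp = tp / (tp + fp)          # predictive value of a positive test
    pvn = tn / (tn + fn)          # predictive value of a negative test
    return pvp, pvn

# Same test (90% sensitivity, 90% specificity) at three assumed prevalences
for prevalence in (0.01, 0.10, 0.50):
    pvp, pvn = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PVP {pvp:.1%}, PVN {pvn:.1%}")
```

The same test gives a much lower predictive value of a positive result when the disease is rare, because most positive results then come from the large healthy group.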
• The choice of cut-off point depends on several considerations:
o Is it harmful or serious to miss a case?
If this is true, then choose a value for the positivity criterion that minimizes the
false negatives. Sensitivity should be high, usually at the expense of specificity.
(For example, neonatal phenylketonuria screening).
o Is treatment risky? (Does it carry a risk of serious side-effects, operative mortality, etc.?)
If this is true, then the number of false positives should be low, and specificity
high (usually at the expense of sensitivity).
This also applies when a false positive diagnosis may have deleterious effects on a
patient’s lifestyle, self-image, or financial situation (For example: AIDS, mental
illness, or learning disorders).
o Is a highly specific and sensitive confirmatory test available?
If this is true, then aim for a very high sensitivity in the preliminary (screening)
test.
• For a given test, sensitivity and specificity trade off against each other as the cut-off
point is moved: by increasing sensitivity one generally has to accept a loss in specificity,
and vice-versa.
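The effect of moving the cut-off point can be seen in a small worked example. The measurements below are invented for illustration (arbitrary units, with higher values suggesting disease), and the rule "positive if the value is at or above the cut-off" is an assumption of this sketch rather than a property of any particular test.

```python
# Hypothetical test measurements (arbitrary units), for illustration only
diseased = [4, 5, 6, 6, 7, 8, 9, 9]
healthy = [1, 2, 3, 3, 4, 5, 5, 6]

def sens_spec(cutoff):
    """Call the test positive when the measurement is >= cutoff and
    return (sensitivity, specificity) as fractions."""
    tp = sum(value >= cutoff for value in diseased)  # diseased, test positive
    tn = sum(value < cutoff for value in healthy)    # healthy, test negative
    return tp / len(diseased), tn / len(healthy)

for cutoff in (4, 6, 8):
    sensitivity, specificity = sens_spec(cutoff)
    print(f"cut-off {cutoff}: sensitivity {sensitivity:.0%}, "
          f"specificity {specificity:.0%}")
```

A low cut-off catches every diseased person in this made-up data but labels half of the healthy people positive, while a high cut-off does the reverse.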
Serial/Consecutive Testing
• In serial testing, we first apply one test to a certain population and all people identified as
having a positive test are then submitted to a second test.
• This is called re-testing of reactive individuals with another test.
• For example, consider the Venereal Disease Research Laboratory (VDRL) test as a
screening test for syphilis, and the TPHA (Treponema pallidum Haemagglutination Assay)
as a confirmatory test.
• We will assume a population with a prevalence of syphilis of 20%.
Figure 3: First Serial Test for Screening of Syphilis Disease (by VDRL)
• In serial testing, the 260 people who tested positive on the VDRL will then be subjected
to a second series of testing with TPHA, which is known to have the following test
characteristics: 95% Sensitivity and 99% Specificity.
• Positive Predictive Value = (171/171.8) × 100 = 99.53%
• Sensitivity = 95%
• Specificity = 99%
• Overall sensitivity = [(180 – 9)/200] × 100 = (171/200) × 100 = 85.5%
• Overall specificity = [(720 + 79.2)/800] × 100 = (799.2/800) × 100 = 99.9%
• Overall positive predictive value = (171/171.8) × 100 = 99.53%
• When using tests in series, the overall sensitivity decreases, while the overall specificity
increases.
• Most importantly, the positive predictive value improves because the prevalence is
higher among the people who undergo the second test.
o In this example, the prevalence increased to 69.2%. [(180/260) × 100 = 69.2%]
• In general, one can use the following formulae to calculate overall sensitivity and
specificity, assuming the results of the two tests are independent:
o Overall Sensitivity = Sensitivity A × Sensitivity B
o Overall Specificity = 1 – (1 – Specificity A) × (1 – Specificity B)
• As far as sensitivity and specificity are concerned, it does not matter in which order the
tests are carried out, but efficiency and total cost may differ considerably.
• In general, one benefits most from serial testing if the most sensitive test is used for
screening, and the most specific test as confirmatory.
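The serial testing arithmetic can be reproduced step by step. In the sketch below, the population size of 1,000 and the VDRL sensitivity and specificity of 90% are not stated in the text; they are inferred from the quoted counts (200 diseased, 800 healthy, 180 true positives and 260 total positives on VDRL). The function name `serial_testing` is hypothetical. Overall specificity is computed from the counts, which agrees with the formula 1 – (1 – Specificity A) × (1 – Specificity B).

```python
def serial_testing(n, prevalence, sens_a, spec_a, sens_b, spec_b):
    """Apply test A to everyone, then test B only to those positive on A.
    Returns expected counts and overall measures as fractions."""
    diseased = n * prevalence
    healthy = n - diseased
    # First test (A) applied to the whole population
    tp_a = sens_a * diseased
    fp_a = (1 - spec_a) * healthy
    # Second test (B) applied only to people positive on A
    tp_b = sens_b * tp_a
    fp_b = (1 - spec_b) * fp_a
    return {
        "positives_after_a": tp_a + fp_a,
        "prevalence_for_b": tp_a / (tp_a + fp_a),
        "overall_sensitivity": tp_b / diseased,
        "overall_specificity": (healthy - fp_b) / healthy,
        "overall_pvp": tp_b / (tp_b + fp_b),
    }

# VDRL (inferred 90% / 90%) followed by TPHA (95% / 99%), prevalence 20%
result = serial_testing(1000, 0.20, 0.90, 0.90, 0.95, 0.99)
for name, value in result.items():
    print(f"{name}: {value:.4g}")
```

Under these assumptions the sketch returns about 260 positives after the first test, a prevalence of about 69.2% among those re-tested, an overall sensitivity of 85.5%, an overall specificity of 99.9% and an overall positive predictive value of about 99.53%, matching the figures quoted in the example.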
Summary
• Serial tests result in:
o Lower sensitivity (higher False Negative Rate)
o Increased specificity
o Increased PVP (lower False Positive Rate)
Diagnostic Test
• Sensitive and specific
• Simple and inexpensive
• Safe and acceptable to both the public and the profession
• Reliable
• Costs of a screening programme must be balanced against the number of cases detected
and the consequences of not screening. The cost should also be economically balanced in
relation to the total expenditure on medical care.
Lead-time Bias
• Lead time is defined as the interval between the diagnosis of a disease at screening and
the time when it would have been detected due to the development of symptoms.
• This occurs because diseases with a long preclinical phase are more readily detected than
rapidly progressing cases with a short preclinical phase.
• A program may be seen as successful when in fact observed differences in mortality were a
result merely of the detection of less rapidly fatal cases through screening, while those
more rapidly fatal are diagnosed after development of symptoms.
• If time to outcome (e.g. death) is measured from point of diagnosis, early diagnosis will
increase the time to outcome (e.g. the length of survival) by the interval between
diagnosis by screening, and when diagnosis would have occurred by ordinary means.
• This can make early diagnosis appear to increase survival time, even when it has had no
effect, or may even have a damaging effect.
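A small numerical illustration of lead-time bias follows. The ages are hypothetical, and the sketch assumes that screening changes only the date of diagnosis, not the date of death.

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Survival time measured from the point of diagnosis, in years."""
    return age_at_death - age_at_diagnosis

# Hypothetical patient: death occurs at the same age in both scenarios
age_at_death = 70
symptomatic_diagnosis_age = 67  # diagnosed when symptoms develop
screening_diagnosis_age = 64    # diagnosed earlier, by screening

lead_time = symptomatic_diagnosis_age - screening_diagnosis_age
survival_without_screening = survival_from_diagnosis(
    symptomatic_diagnosis_age, age_at_death)
survival_with_screening = survival_from_diagnosis(
    screening_diagnosis_age, age_at_death)

print(f"lead time: {lead_time} years")
print(f"survival without screening: {survival_without_screening} years")
print(f"survival with screening: {survival_with_screening} years")
```

Measured survival doubles from 3 to 6 years in this example, yet the patient dies at the same age. The whole difference equals the lead time.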
Length-time Bias
• Length-time bias is the over-representation among screen-detected cases of those with a
long preclinical phase and hence a favorable prognosis.
• The proportion of slowly progressing disease picked up by screening will be greater than
the proportion picked up by standard clinical practice, since rapidly progressing disease
will tend to become symptomatic more quickly.
• Therefore, patients diagnosed by screening will progress more slowly than those
diagnosed by conventional means, even if early treatment has no impact.
Compliance Bias
• Volunteers for screening are generally more health conscious/concerned than the general
population. They tend to assume greater responsibility for their own care, and are often
more likely to comply with therapy.
• Groups detected by screening may do better than others, not because early treatment
matters, but because they comply with treatment more than others.
o This is not due to early detection by screening, but to the same reason that made them
volunteer for screening in the first place. It is a ‘volunteer’ bias.
Instructions
Refer to Worksheet 15.1: Calculate Sensitivity, Specificity, Positive and Negative
Predictive Value of a Test
Key Points
• Screening is defined as examination of asymptomatic people in order to classify them as
likely or unlikely to have the disease of interest.
• Types of screening are: mass screening, multiple or multiphase screening, targeted
screening, case-finding or opportunistic screening.
• The measurement properties of a screening test are accuracy, reproducibility, validity.
• Sensitivity is the ability of a test to give a positive finding when the tested person truly
has the disease under study.
• Specificity is the ability of a test to give a negative finding when the tested person is
truly free of the disease under study.
• The predictive value of a positive test (PVP) means the proportion of diseased amongst
all positive tests.
Evaluation
• What are the four types of screening?
• What are the measurement properties of a screening test?
• What are screening criteria for initiating a program for screening?
• Mention any five (5) accepted screening methods for disease and target population?
References
• Bonita, R., Beaglehole, R., Kjellstrom, T. (2006). Basic Epidemiology. (2nd Edition).
Geneva, Switzerland: WHO.
• Greenberg, R.S., et al. (1993). Medical epidemiology. East Norwalk, CT: Appleton
Lange.
• Mausner J.S., Kramer, S. (1984). Epidemiology: An introductory text. Philadelphia:
Saunders.
Worksheet 15.1: Calculate Sensitivity, Specificity, Positive and Negative
Predictive Value of a Test
Instructions
• Individually complete the worksheet as homework.
• Refer to class materials/notes, and also to Handout 12.4: Measures of Effect in Cohort
and Case-Control Studies (from Session 12) for reference as needed.
• Write down all of your work, and note any questions/challenges you have along the way.
• Submit your assignment to the instructor at the next class period.
• If there is class time available, the tutor may discuss this assignment in plenary, and you
may be asked to share your answers.
Question 1
In a cervical cancer screening programme, Pap smears were collected from 12,350 women
aged 25 to 45 years. Among these women, 1,250 were known to have pre-cancer
lesions. All Pap smears were processed in a reputable cytology laboratory and 1,235 (650 from
women with pre-cancer lesions) were reported to be abnormal.
Work together to:
a) Put the above data in a table
b) Calculate the sensitivity and specificity of the pap smear
c) Calculate the predictive value positive (PVP) and predictive value negative (PVN) of the
pap Smear
d) Would you recommend that the pap smear be used for cervical screening in this
population?
Session 16: Control of Epidemics
Learning Objectives
By the end of this session, students are expected to be able to:
• Define the terms epidemic, endemic and pandemic
• Identify different types of epidemics
• Identify steps taken during an investigation of an epidemic
• Explain principles of outbreak/epidemic investigations
Endemic Disease
• Diseases that are continuously or habitually transmitted in a population throughout
the year (such as malaria)
• Endemicity denotes the habitual presence of a disease in a community.
Epidemic Disease
• Epidemic: The occurrence of cases of a specific disease in a population clearly in
excess of the expected incidence in a specified period of time.
o The number of cases that constitute an epidemic will vary with the type of disease.
In some epidemic-prone diseases such as cholera and poliomyelitis, one case is
considered an epidemic.
o In order to say that there is an epidemic, it is necessary to know the level of
endemicity of the disease.
In the USA, a single case of a disease such as malaria is considered an epidemic,
since malaria has already been eliminated there.
Diseases like Ebola do not occur habitually in human populations. A single case
will constitute an epidemic in any part of the world (such as the outbreak in Zaire
in May 1995).
o In Tanzania, it is important to know the average acceptable numbers of cases for
endemic diseases (such as malaria) from a prior year’s records before deciding that an
epidemic is occurring.
Pandemic Disease
• A pandemic occurs when an epidemic spreads to affect many countries globally.
o Modern epidemiology arose out of the study of so-called ‘classical epidemics’, such
as plague, smallpox, cholera, typhus, typhoid fever and dietary deficiencies.
o Some of these epidemics remain an important threat to many tropical countries.
• Rubella
• Hepatitis A
• Streptococcal infections
• Meningococcal meningitis
• Food poisoning
• For Tanzania and other tropical countries, the most important causes of epidemics are
infectious diseases. For other countries (Iraq, Pakistan, Guatemala, etc.) it is also
important to consider road accidents, drug addiction, poisoning, etc. as epidemics that can
affect mortality and morbidity.
o Poisoning and neurological disability epidemics have been reported as a result of
ingestion of wheat products treated with methyl- and ethyl-mercuric compounds. The
wheat was intended only for use as seed and was so treated to prevent fungus growth.
o Other disease outbreaks involving the nervous system (Konzo) have been reported
from Mozambique and Tanzania and were later found to be associated with the
consumption of certain types of cassava with high content of cyanide.
Types of Epidemics
Common Source
• Occurs when a group of people are exposed to the same causative agent.
• If the period of exposure to the agent is brief and essentially simultaneous for all persons
contracting the disease, the epidemic is called a ‘point source epidemic’.
• All persons are affected by the same source and person-to-person transmission does not
occur.
• Common source epidemics are not necessarily caused by infectious agents; they may also
result from common exposure to noxious agents in the environment. Examples include:
o The Bhopal disaster: A large industrial catastrophe that occurred in 1984 at the Union
Carbide India Limited pesticide plant. Over 500,000 people were exposed to the harmful
gas and toxins that leaked. Chemicals continue to pollute groundwater in the area.
o The Chernobyl disaster: A nuclear accident that occurred in 1986 in Ukraine. A
series of explosions occurred in one of the nuclear reactors, and radioactive materials
polluted the surrounding areas.
o Other examples might include children swimming in a chemically polluted river or
factory workers exposed to extreme heat or volatile chemicals.
• Sometimes it may be difficult to identify the nature of an epidemic from the shape of the
epidemic curve alone.
• The typical common source epidemic curves may be affected by the continued
development of cases through persistent contamination of the source, or exposure
occurring repeatedly or by a long and variable incubation period.
o The shape of the curve may also vary depending on the size of the population
exposed, the type of source distribution and the extent of use or the extent of contact
with the susceptible population.
o The typical shape of a point source epidemic may be modified by the presence of more
than one disease agent, each with a different incubation period, or if secondary cases
(person-to-person transmission) follow exposure to the original point source.
o Conversely, a propagated epidemic can create a rapidly rising and rapidly falling
epidemic curve similar to that of a common source epidemic. This is especially so
when the disease has a short incubation period and is highly infectious (e.g. cholera).
Principles of Investigations
• In a clinical case, investigation and treatment must go side by side for the successful
management of the patient.
• Likewise an epidemic is always an emergency where action to counteract it must begin
even before complete investigation.
o For example, in 1854, Dr. John Snow showed that the outbreak of cholera around
Broad Street in London resulted from contamination of drinking water with
excrement from cholera sufferers. The epidemic was well managed and brought under
control 30 years before the identification and description of Vibrio cholerae, the
causative organism.
o Index case: the earliest documented case of a disease that is included in an
epidemiological study
Data Analysis
• To describe the outbreak
o by person/population (tables, bar charts, pie charts)
o place (spot maps)
o time (histograms, graphs)
• Person: who is the population at risk (age, sex, race, occupation, medical status, etc).
• Exposure: occupation, environment, cultural practices, socio-economic factors, etc.
• Determine size of the population at risk.
o Calculate Attack Rate, Case Fatality Rate (assess quality of case management).
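The two rates named above can be computed directly from outbreak counts. The sketch below uses the usual definitions (attack rate as cases divided by the population at risk, case fatality rate as deaths divided by cases, both expressed as percentages). The function names and the outbreak figures are hypothetical.

```python
def attack_rate(cases, population_at_risk):
    """Percentage of the population at risk who became cases."""
    return 100 * cases / population_at_risk

def case_fatality_rate(deaths, cases):
    """Percentage of cases who died of the disease."""
    return 100 * deaths / cases

# Hypothetical outbreak figures, for illustration only
population_at_risk = 2000
cases = 150
deaths = 6

print(f"attack rate: {attack_rate(cases, population_at_risk):.1f}%")
print(f"case fatality rate: {case_fatality_rate(deaths, cases):.1f}%")
```

For these invented figures the attack rate is 7.5% and the case fatality rate is 4.0%.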
o Reservoir
• Deal with the reservoir (if any)
• Interrupt transmission.
• Reduce susceptibility of the host by vaccination, chemo-prophylaxis, improved nutrition,
etc.
• Treat cases
Report Writing
• Describe the situation using the answers and comments to the steps outlined above.
• Describe the need for outside assistance based on the gaps in resources.
• Make conclusions on the outbreak you are dealing with.
o Give recommendations on priority activities (short term, long term) based on findings
and conclusions.
Dissemination of Findings
• Convey the report to higher levels of the Ministry of Health (relevant division/program,
senior/top management)
• Disseminate report to the Council Health Management Team (CHMT).
• If epidemic has been confirmed, convey report to World Health Organisation (WHO)
through top management (i.e., MoH).
Intensify Surveillance
• Maintain contact with the district for daily updates (cases, deaths, number admitted,
number discharged, areas affected, etc) until end of the epidemic.
Role of the District Health Team
• District level coordination, dissemination of guidelines, implementation of control
measures:
o Coordination: District Task Force
o Case management (transport emergency supplies to affected area and serving health
units, set up treatment sites).
o Surveillance (retrieve data from health units and affected communities, report to
MoH and use the information for control).
o Public information in the communities affected.
o Environmental sanitation and preventive measures: address risk factors including
mass immunization.
o Logistics management/monitoring.
o Investigation: District rapid response team
Key Points
• Epidemics are usually either point source or propagated
• The purpose of investigating an epidemic is to identify its cause and the best means to
control it.
• For proper control of epidemics all steps of investigations should be done.
• Every level of health care delivery has responsibilities in the investigation of an
epidemic.
Evaluation
• What are the steps in investigation of an epidemic?
• What are the roles of the district health team in control of an epidemic?
• Differentiate a point source epidemic from a propagated epidemic.
• Define the following terms: secondary attack rate and herd immunity.
References
• Bonita, R., Beaglehole, R., Kjellstrom, T. (2006). Basic Epidemiology. (2nd Ed). Geneva,
Switzerland: WHO.
• Chin, J. (2000). Control of communicable diseases manual. (17th Ed.) Washington, DC:
American Public Health Association.
Session 17: Integrated Disease Surveillance and
Response
Learning Objectives
By the end of this session, students are expected to be able to:
• Define the terms surveillance and Integrated Disease Surveillance and Response (IDSR)
• Define the terms standard case definition and action threshold
• Identify the priority diseases that are in the IDSR for Tanzania
• State the reporting frequency of IDSR priority diseases
• Identify the non-outbreak-related surveillance responses
Objective of IDSR
• Broadly, the concept of IDSR is:
o To provide epidemiological evidence for use in making decisions and implementing
public health interventions for the control and prevention of communicable diseases.
• A technical definition of IDSR is:
o Surveillance includes the ongoing systematic collection, analysis and interpretation of
health data in the processes of describing and monitoring of a health event.
• The strategy aims to integrate surveillance functions at all levels and is expected to
enhance early detection, reporting, and timely response to epidemic-prone and other
priority endemic diseases.
o A priority activity of IDSR is data collection for action.
• Overall guiding principles for IDSR:
o Usefulness of data collected
o Simplicity of system
o Flexibility of system
o Integration of common activities
o Orientation to action
Aims of IDSR
• Strengthen the capacity to conduct effective surveillance activities.
• Integrate multiple systems so that forms, people and resources can be used more
efficiently and allow health staff to focus more on disease prevention, control and
reporting.
• Improve the use of information for decision making.
• Improve the flow of surveillance information between and within levels of the health
system.
• Improve laboratory capacity and involvement in confirmation of pathogens and
monitoring of drug sensitivity
• Increase the involvement of clinicians in the surveillance system
• Emphasize community participation in detection and response to public health problems
• Improve communication:
o Between all levels of health care and public health system using data that can alter the
availability of resources and strengthen the ability of health staff to provide improved
services.
o With the public, target populations, donors, and organizations that provide similar
services. Coordination and collaboration can take place between groups, agencies and
organizations that share similar target populations for disease control objectives.
• Increase the access to and use of standard surveillance case definitions and laboratory
services for confirming suspected cases.
• Increase district-level decision making for defining, recognizing and responding to issues
and needs in the local area as well as to meeting national priorities and targets.
• Strengthen preparedness by integrating transportation, training and supervisory activities.
• Provide district level support in using epidemiological tools to detect, investigate and
respond to epidemics.
• Increase the alertness of clinicians to respond to a possible public health epidemic, even
if it is only 1 case, and help clinicians see the value of sharing information about these
cases with health staff responsible for surveillance.
• The MoH is committed to strengthening IDSR at multiple levels, including community,
health facility, district, region and national.
Action Threshold
• Denotes the critical point at which action/intervention must be taken to address a disease
outbreak or epidemic. It can be expressed in terms of numbers or proportions.
Figure 1: Priority Diseases Under National Surveillance Strategy
• These diseases have been selected based on severity, importance as a burden to the
community and the preferred frequency of reporting.
• The MOHSW requires immediate reporting of all epidemics and encourages case-based
reporting and line-listing of cases during epidemics. Zero reporting should be conducted
on a weekly and monthly basis as shown in the following table.
o Zero reporting: designated reporting sites at all levels should report at a specified
frequency (e.g. weekly or monthly) even if there are zero cases during that time span.
Figure 2: Disease Reporting Schedule for Diseases under National Surveillance Strategy
• Diseases like tuberculosis, leprosy and HIV/AIDS are not included in the
IDSR reports, although they are also diseases of public health importance. This is because they
have very strong case reporting systems in place.
• Community involvement will also improve linkages between communities and the formal
health system.
• In each community, an inventory of community based organizations (CBOs), village
health workers, traditional healers and traditional birth attendants should be drawn up.
• These groups need to be sensitized to be able to detect cases and report to the nearest
health facility.
• The flow of information from a community member can be in a verbal or written form.
• This information is passed on to the identified community leader in that particular
location who will in turn send the data to the person in charge (or any staff) at a health
facility.
o Any child with diarrhoea and two or more of the following:
Restless or irritable
Sunken eyes
Drinks eagerly
Skin pinch goes back slowly
o Action threshold
In a defined locality where it is observed that the number of cases of diarrhoea for
the period of time clearly exceeds the number of cases of the previous
year/season.
• Diarrhoea with severe dehydration
o Any child with diarrhoea and two or more of the following:
Lethargic or unconscious
Sunken eyes
Not able to drink or drinking poorly
Skin pinch goes back very slowly.
o Action threshold
When it is observed that the number of cases of diarrhoea for the period of time
clearly exceeds the number of cases of the previous year/season in a defined
locality.
• Uncomplicated Malaria
o Standard case definition
Any person having fever with or without joint pains, sweats, nausea, chills and
vomiting.
o Action threshold
When it is observed that the number of cases for that period exceeds the expected
number by 50% in a defined locality/health facility.
• Measles
o Standard case definition
Any person with history of fever, skin rash and any of the following: cough,
runny nose, red eyes.
o Action threshold
One case at a health facility is a suspected outbreak.
• Meningococcal Meningitis
o Standard case definition
Any person with sudden onset of fever (more than 38.5°C rectal or 38.0°C
axillary) AND any one of the following: neck stiffness, altered consciousness,
bleeding under the skin.
o Action threshold
Single suspected case is a suspected outbreak
• Neonatal Tetanus (NNT)
o Standard case definition
Any newborn with normal ability to suck or cry during the first two days of life
and who between 2nd and 28th day of age cannot suck normally and becomes stiff
or has convulsions (or both).
o Action threshold
A single suspected case needs action.
• Plague
o Standard Case Definition
Any person with sudden onset of fever, headache and painful swelling of inguinal
and axillary lymph nodes or cough with blood-stained sputum.
o Action Threshold
Single case is a suspected epidemic.
• Pneumonia (in children 2 months up to 5 years of age)
o Standard case definition:
Any child with cough or difficulty in breathing, and fast breathing:
2 months up to 12 months: 50 breaths per minute or more.
12 months up to 5 years: 40 breaths per minute or more.
o Action threshold
When it is observed that the number of cases for that period of time clearly
exceeds the number of cases of the previous year/season in a defined locality.
• Severe Pneumonia (in children 2 months up to 5 years of age)
o Standard case definition:
Any child with cough or difficulty in breathing, and any danger signs or chest
indrawing or stridor in a calm child.
Danger signs: Not able to drink or breastfeed, vomiting everything, convulsions,
lethargy or unconsciousness.
o Action threshold
When it is observed that the number of cases for that period of time clearly
exceeds the number of cases of the previous year/season in a defined locality.
• Rabies
o Standard case definition
History of animal bite and the following: fever, mental confusion, fear of drinking
water, altered consciousness or death.
o Action threshold
Single suspected case is a suspected outbreak.
• Typhoid fever
o Standard case definition
Any person with prolonged history of fever excluding malaria with history of
abdominal pain with or without skin rash, constipation or diarrhoea.
o Action threshold
Two suspected cases in a week at a health facility.
• Yellow Fever
o Standard case definition
Any person with sudden onset of fever, followed by jaundice within two weeks of
first symptoms, and/or a history of travelling from an endemic area.
o Action threshold
Single suspected case is a suspected outbreak.
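Several of the action thresholds listed above are simple counting rules and can be sketched in code. The example below is a teaching sketch, not part of the IDSR guidelines. The function name `threshold_reached` is hypothetical, only the diseases with numeric thresholds are covered, and "exceeds the expected number by 50%" is interpreted here as observed cases being at least 1.5 times the expected number.

```python
# Diseases for which a single suspected case triggers action (from the list above)
SINGLE_CASE_DISEASES = {
    "measles", "meningococcal meningitis", "neonatal tetanus",
    "plague", "rabies", "yellow fever",
}

def threshold_reached(disease, observed_cases, expected_cases=None):
    """Return True if the observed count meets the action threshold.
    observed_cases is the count for the reporting period (one week
    at a health facility for typhoid fever)."""
    disease = disease.lower()
    if disease in SINGLE_CASE_DISEASES:
        return observed_cases >= 1
    if disease == "typhoid fever":
        return observed_cases >= 2
    if disease == "uncomplicated malaria":
        if expected_cases is None:
            raise ValueError("expected_cases is required for malaria")
        # Assumed reading of "exceeds the expected number by 50%"
        return observed_cases >= 1.5 * expected_cases
    raise ValueError(f"no numeric threshold defined here for {disease!r}")

print(threshold_reached("Measles", 1))
print(threshold_reached("Typhoid fever", 1))
print(threshold_reached("Uncomplicated malaria", 150, expected_cases=100))
```

Thresholds worded as "clearly exceeds the number of cases of the previous year/season" (diarrhoea, pneumonia) are left out because the text gives no numeric cut-off for them.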
Introduction
• This section contains information that may be useful for the District Health Team to
use in interpreting surveillance data and providing guidelines for possible action
based on the interpretation.
• The district team will be looking at routine surveillance data for two main purposes:
o To examine district surveillance data over a time frame of months or years in
order to evaluate the public health interventions targeted at reducing the mortality
and morbidity of certain diseases/conditions that are under surveillance.
o To examine district surveillance data for "hidden" outbreaks
For example, shigella (diarrhoea with blood), meningitis, malaria, and/or measles
that were not detected by health facilities.
• The analysis of longer-term surveillance data is extremely important since more than
80% of the deaths covered by the diseases/conditions under surveillance occur as non-
outbreak related deaths.
Time Analysis
• Since data on cases and deaths are collected for most types of diseases/conditions under
surveillance, trends in cases, deaths, and case fatality ratios can be examined.
• Since most diseases/conditions have separate data collection for in-patients and out-
patients, separate trends by in-and out-patients can be examined.
• In-patient trends are often valuable because in-patients have more severe disease,
diagnosis is often more accurate and therefore the data is more specific than outpatient
data.
o Because we are most interested in preventing communicable diseases and deaths, examining
trends in in-patient cases and deaths separately from outpatient data should be a high
priority.
o Trends by age and other factors can only be examined for diseases with case based
information.
In general, trend lines are going up, remaining level, or going down. For each of
these different types of trend lines, some possible explanations are given below
for the District Health Team to consider:
• Could it be a change in reporting criteria or modified case definition?
o Is the case definition being followed? Have new staff joined the facility that may be
reporting cases differently than occurred previously?
o Are the cases confirmed or suspected? For example, are some facilities now reporting
suspect cases (with no laboratory confirmation) when previously they reported only
lab-confirmed cases?
• Could it be a seasonal variation?
o Review disease incidence data (i.e., ‘new’ case totals) from a similar time period in
the previous year(s). Is this increase ‘expected’ based on seasonality of the condition
under surveillance?
• Could the neighboring districts be experiencing similar changes?
o For example, if you suspect an outbreak of diarrhoea with blood (shigella), ask
neighboring districts if they are seeing a similar trend in diarrhoea with blood.
• Are there any common features among the reported cases?
o Any geographic clustering? Are most of the ‘excess’ cases coming from just one
or all health facilities? In either case, develop a hypothesis and contact the appropriate
health facility staff to determine whether the increased incidence represents an outbreak.
• Has a health facility initiated a new screening and treatment program?
o If so, this may lead to the identification and reporting of more cases than in prior time
periods.
• Has any new outreach or health education activity been implemented?
o This may increase healthcare-seeking behaviour and lead to the identification and
reporting of more cases than in prior time periods.
• Has there been any recent immigration of at-risk persons into the community?
o This may lead to an increase in susceptible individuals and may result in increased
disease incidence and reporting.
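The seasonal check described above, comparing new-case totals with the same period in previous years, can be sketched as follows. The counts are invented, and the ratio threshold of 1.5 is an illustrative assumption, not an official IDSR alert threshold.

```python
from statistics import mean

# Hypothetical new-case totals for the same calendar month in earlier years.
previous_years = {2007: 42, 2008: 38, 2009: 40}
current_count = 75  # hypothetical total for the same month this year


def compare_with_previous_years(current, history, ratio_threshold=1.5):
    """Compare the current count with the average for the same period in earlier years.

    ratio_threshold is an illustrative cut-off chosen for this example only.
    """
    baseline = mean(history.values())
    ratio = current / baseline
    return {
        "baseline": baseline,
        "ratio": ratio,
        "exceeds_expected": ratio > ratio_threshold,
    }


result = compare_with_previous_years(current_count, previous_years)
print(f"Baseline for this month: {result['baseline']}")
print(f"Current count is {result['ratio']:.2f} times the baseline")
print("Higher than expected" if result["exceeds_expected"] else "Within the expected range")
```

A count that exceeds the seasonal baseline does not by itself confirm an outbreak; the other questions in the list (reporting changes, new screening programs, migration) still need to be considered.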
screening and treatment for selected disease)?
o If so, this may lead to the identification and reporting of fewer cases than in prior time
periods.
• Has any community outreach or health education program ceased to operate in the
community?
o This may lead to a reduction in healthcare-seeking behaviour and to the
identification and reporting of fewer cases than in prior time periods.
• Has there been any recent out-migration of at-risk persons from the community?
o This may lead to a decrease in susceptible individuals and may result in decreased
disease incidence and reporting.
Key Points
• Surveillance is being watchful or vigilant for health problems and determinants with the
intention to take action for improvement of health of a population.
• Integrated Disease Surveillance and Response is a strategy proposed and adopted by the
WHO/AFRO Regional Assembly in 1998 to strengthen disease surveillance in member
countries using an integrated approach.
• The overall objective of Integrated Disease Surveillance and Response (IDSR) is to
provide epidemiological evidence for use in making decisions and implementing public
health interventions for the control and prevention of communicable diseases.
• There are specific issues to consider when there is an observed increase, decrease or no
change in disease incidence.
Evaluation
o Polio
o Yellow fever
o Neonatal tetanus
o Plague
• What are the two diseases that are reported weekly and two diseases reported monthly?
References
• MOHSW. (2001). National Guidelines for Integrated Disease Surveillance and Response
(IDSR). Dar es Salaam, Tanzania: Ministry of Health and Social Welfare.
• WHO. (2001). Technical Guidelines Integrated Disease Surveillance and Response in the
African Region. Harare, Zimbabwe: World Health Organization/Regional Office for
Africa.
Session 18: Planning for Disease Prevention and Control
Learning Objectives
By the end of this session, students are expected to be able to:
• Define the concepts of prevention of disease and control of disease
• Explain the healthcare planning cycle
• Describe reassessment of burden of disease in healthcare planning
Figure 1: Healthcare Intervention
(Cycle diagram with ‘HEALTH CARE INTERVENTION’ at the centre and six stages around it:
1. Burden of illness; 2. Causation; 3. Community effectiveness; 4. Efficiency;
5. Implementation; 6. Monitoring)
$$\text{Maternal Mortality Rate} = \frac{\text{No. of pregnancy-related deaths in time period}}{100{,}000\ \text{live births}}$$
• Median survival time: Refers to the time by which fifty percent of individuals with a
certain diagnosis, or who have had a certain intervention, would have died, or would have
died if there had been no intervention.
• Disability
o A measure of presence of consequences of disease
o Several levels are known and defined as follows:
Impairment: any loss or abnormality of psychological, physiological, anatomical
structure or function.
Disability: any restriction or lack of ability to perform in the manner or within the
range considered normal for a human being.
Handicap: a disadvantage of a given individual resulting from an impairment or a
disability, that limits or prevents the fulfillment of a role that is normal for that
individual.
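Two of the measures above lend themselves to a short worked example. The sketch below reads the maternal mortality formula as pregnancy-related deaths per 100,000 live births in the same period, and computes a median survival time under the simplifying assumption that every individual was followed until death (no censoring). All figures and function names are hypothetical.

```python
from statistics import median


def maternal_mortality_rate(pregnancy_related_deaths, live_births):
    """Pregnancy-related deaths per 100,000 live births in the same time period."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return 100_000 * pregnancy_related_deaths / live_births


# Hypothetical district figures for one year.
rate = maternal_mortality_rate(pregnancy_related_deaths=12, live_births=3_000)
print(f"Maternal mortality rate: {rate:.0f} per 100,000 live births")

# Hypothetical survival times in months after diagnosis.
# Assumption: every individual was followed until death, so the plain median applies.
survival_months = [3, 8, 12, 15, 20, 26, 40]
median_survival = median(survival_months)
print(f"Median survival time: {median_survival} months")
```

When some individuals are still alive at the end of follow-up, the plain median no longer applies and a survival-analysis method is needed instead.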
o In health interventions, the cost-effectiveness of two or more interventions can be
compared using DALYs. An intervention that averts more DALYs per unit of cost is
said to be comparatively more cost-effective and is hence preferred to others.
• Efficiency
o This is a measure of the relationship between the results achieved and the effort
expended in terms of money, resources and time.
o It provides the basis for the optimal use of resources and involves the complex
inter-relationship of costs and effectiveness of an intervention.
o This is the area where epidemiology and health economics are applied together.
o There are two main approaches to the assessment of efficiency:
Cost-Effectiveness
Cost-Benefit Analysis
o These two measures are important in prioritizing which intervention is best especially
for developing countries.
• Cost-Effectiveness Analysis
o Compares the ratio of financial expenditure to effectiveness, for example:
Dollars per case prevented, dollars per life-year gained, dollars per quality-adjusted
life year gained, etc.
• Cost-Benefit Analysis
o In this measure, both the denominator and numerator are expressed in monetary
terms.
o The health benefits (e.g. lives saved) are measured and given a monetary value.
o If the cost-benefit analysis shows that economic benefits of the program are greater
than the costs, the program should be seriously considered.
o The measurement of efficiency requires many assumptions, and it should be used very
cautiously; it is not value-free and can serve only as a general guideline.
• Implementation
o The fifth stage in the planning process begins by determining a specific intervention and
takes into account the problems likely to be faced in and by the community.
For example, if a planned intervention involves screening women for breast
cancer using mammography, it is important to ensure that the necessary
equipment and personnel are available.
o This stage involves setting specific quantified targets.
For example, ‘To reduce the frequency of smoking in young women from 30% to
20% over a five year period.’
This type of target-setting is essential for assessing the success of an intervention.
• Monitoring
o Monitoring is the continuous follow-up of activities to ensure that they are proceeding
according to plan.
o Monitoring must be directed to the requirements of the specific program, the success of
which may be measured in a variety of ways using short-, intermediate- and long-term
criteria.
o For example, in a community-level hypertension program, monitoring could include
the regular assessment of:
Personnel training
The availability and accuracy of sphygmomanometers (structural)
The appropriateness of case-finding and management procedures (process
evaluation)
The effect on blood-pressure levels in treated patients (outcome evaluation)
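Returning to the efficiency stage described earlier in this session, the two approaches can be illustrated with a small calculation. This is a minimal sketch: the intervention names, costs, DALYs averted and monetary benefit are invented, and the function names are chosen only for the example.

```python
# Hypothetical interventions; costs and effects are invented for illustration.
interventions = [
    {"name": "Intervention A", "cost": 50_000, "dalys_averted": 250},
    {"name": "Intervention B", "cost": 80_000, "dalys_averted": 320},
]


def cost_per_daly_averted(cost, dalys_averted):
    """Cost-effectiveness ratio: money spent for each DALY averted."""
    return cost / dalys_averted


def cost_benefit(monetary_benefit, cost):
    """Cost-benefit analysis: both sides are expressed in monetary terms."""
    return {
        "net_benefit": monetary_benefit - cost,
        "benefit_cost_ratio": monetary_benefit / cost,
    }


# Cost-effectiveness: a lower cost per DALY averted is more cost-effective.
ranked = sorted(
    interventions,
    key=lambda item: cost_per_daly_averted(item["cost"], item["dalys_averted"]),
)
for item in ranked:
    ratio = cost_per_daly_averted(item["cost"], item["dalys_averted"])
    print(f"{item['name']}: {ratio:.0f} dollars per DALY averted")

# Cost-benefit: assume the health benefits of Intervention B were valued at 120,000 dollars.
analysis = cost_benefit(monetary_benefit=120_000, cost=80_000)
print(f"Net benefit: {analysis['net_benefit']} dollars")
print(f"Benefit-cost ratio: {analysis['benefit_cost_ratio']:.2f}")
```

As the session notes, such figures rest on many assumptions (here, the monetary value placed on health benefits), so the results serve only as a general guideline.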
Key Points
• Prevention of disease is any activity which reduces the possibility of occurrence, burden
of morbidity, disability or mortality of a disease.
• The process of healthcare planning includes:
o Measurement or assessment of the burden of illness
o Identification of the cause of illness
o Measurement of the effectiveness of different community interventions
o Assessment of their efficiency in terms of resources used
o Implementation of interventions
o Monitoring of activities
o Reassessment of the burden of illness to determine whether it has been altered.
• The measures of efficiency are cost-effectiveness analysis and cost-benefit analysis.
• Cost-effectiveness analysis looks at the ratio of financial expenditure to effectiveness.
• In cost-benefit analysis, both the denominator and numerator are expressed in
monetary terms.
Evaluation
• What is the meaning and significance of PYLLs and DALYs?
• Define efficiency, cost-benefit analysis and cost-effectiveness analysis.
• What are the stages in the healthcare planning cycle?
References
• Bonita, R., Beaglehole, R., Kjellstrom, T. (2006). Basic Epidemiology. (2nd ed). Geneva,
Switzerland: WHO
• Kapiga S.H. et al. (1998). Lecture notes in epidemiology and research methodology.
Department of Epidemiology and Biostatistics. MUCHS.
• McCusker, J. (2001). Epidemiology in Community Health (Rural Health Series, No. 9).
Nairobi, Kenya: AMREF.
The development of these training materials was supported through funding from the President’s Emergency Plan for AIDS Relief
(PEPFAR) through the U.S. Department of Health and Human Services, Health Resources and Services Administration (HRSA)
Cooperative Agreement No. 6 U91 HA 06801, in collaboration with the U.S. Centers for Disease Control and Prevention’s Global AIDS
Programme (CDC/GAP) Tanzania. Its contents are solely the responsibility of the authors and do not necessarily represent the official
views of HRSA or CDC.