Biostatistics in A Nutshell
Types of Data:
1. Data (Plural), Datum (Singular)
○ Defined as numbers used in statistics, derived from counting or measurements.
2. Variable:
○ Anything that can vary.
○ Examples: blood pressure, age, sex.
3. Categories of Data:
○ Discrete Data: Numerical counts (e.g., number of accidents).
○ Categorical Data:
■ Nominal: No order, e.g., gender.
■ Ordinal: Ordered categories, e.g., education levels.
○ Continuous Data:
■ Interval: Measurable differences but no true zero, e.g., temperature in Celsius.
■ Ratio: Has a true zero, e.g., height, weight.
General Guidelines for Presenting Data:
1. Simplify messages but don't distort data.
2. Always label charts and include a title.
3. Indicate data source and date.
4. Use the appropriate chart type for your data (bar charts for categories, line graphs for trends).

Questions and Answers:
1. What is the difference between data and datum?
○ Answer: Data is the plural form, referring to multiple pieces of information. Datum refers to a single piece of information.
2. What is a variable in statistics?
○ Answer: A variable is any characteristic or attribute that can change, such as age, weight, or temperature.
3. What are the two main types of variables?
○ Answer: Qualitative and quantitative.
4. What is qualitative data?
○ Answer: Data that describes qualities or characteristics, such as colors, names, or types.
5. What is quantitative data?
○ Answer: Data that is numerical and can be measured, such as height, weight, or temperature.
6. What is unstructured data?
○ Answer: Data that is not organized and does not have a predefined model, such as emails, videos, and audio files.
7. What is structured data?
○ Answer: Data organized according to a predefined model, such as the rows and columns of a spreadsheet or database table.
8. What is discrete data?
○ Answer: Data that can only take certain distinct values, such as counts or integers.
9. What is continuous data?
○ Answer: Data that can take any value within a given range, such as height or temperature.
10. Give an example of nominal data.
○ Answer: Gender, type of vehicle.
11. Give an example of ordinal data.
○ Answer: Education level (e.g., high school, bachelor's, master's).
12. What is interval data?
○ Answer: Data with meaningful intervals between values but no true zero point, like temperature in Celsius.
13. What is ratio data?
○ Answer: Data that has both meaningful intervals and a true zero point, such as weight or income.
14. Why is unstructured data important?
○ Answer: It represents the majority of data and is becoming increasingly useful with advancements in data processing.
15. What is a frequency distribution?
○ Answer: A table showing how often each value or group of values appears in a dataset.
16. What is relative frequency?
○ Answer: The proportion of occurrences of a certain value (its frequency divided by the total number of observations), often expressed as a percentage.
17. What is a bar chart used for?
○ Answer: Comparing different categories or groups of data.
18. What is a line graph used for?
○ Answer: Displaying trends or changes over time.
19. What is a pie chart used for?
○ Answer: Showing how parts make up a whole, typically in percentages.
20. What is the best way to present trends over time?
○ Answer: Line graphs.
21. What type of data is represented by pie charts?
○ Answer: Proportional or percentage data.
22. How should data be summarized?
○ Answer: Simplify the message, ensure clarity, and use either graphs or tables for presentation.
23. What is the importance of labeling components in a graphic?
○ Answer: It ensures the viewer can clearly understand the data being presented.
24. What is the role of a table in data presentation?
○ Answer: It allows for exact comparisons between data points.
25. What is a stacked bar chart used for?
○ Answer: Representing parts of a whole and comparing totals across categories.
26. How do bar charts and line graphs differ?
○ Answer: Bar charts compare categories, while line graphs show trends over time.
27. What should always be included when presenting data?
○ Answer: A title, labels, source of data, and reference to the number of observations.
28. What is the importance of data presentation in statistics?
○ Answer: It helps communicate complex data in a simple and understandable format.
29. What is a common mistake when presenting data visually?
○ Answer: Overcomplicating the presentation or misrepresenting data to mislead the audience.
30. Why is it important to use the correct type of graph?
○ Answer: The right graph ensures the data is presented in the clearest and most effective way for understanding.

MCQs:
1. What does the term 'data' refer to?
2. …
○ d) Height
○ Answer: c) Gender
3. What type of data is referred to as 'interval data'?
○ a) Data with no order and no true zero
○ b) Data with meaningful intervals and no true zero
○ c) Data with a true zero
○ d) Data in categories without numerical values
○ Answer: b) Data with meaningful intervals and no true zero
4. Which type of data can only take specific distinct values (e.g., number of accidents)?
○ a) Continuous data
○ b) Discrete data
○ c) Interval data
○ d) Ratio data
○ Answer: b) Discrete data
5. What is the primary purpose of a bar chart?
…
○ a) Gender
○ b) Number of children
○ c) Temperature
○ d) Satisfaction rating
○ Answer: c) Temperature
10. What is the key difference between discrete and continuous data?
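Questions 15 and 16 describe the frequency distribution and relative frequency. Both can be illustrated with a short Python sketch that uses only the standard library. The `frequency_table` helper and the `children` data are hypothetical and were made up for illustration.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, frequency, relative frequency in %) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(value, freq, 100 * freq / n) for value, freq in sorted(counts.items())]


# Hypothetical data: number of children in 10 families
children = [0, 1, 2, 2, 1, 3, 2, 0, 2, 1]

print("Value\tFrequency\tRelative frequency")
for value, freq, rel in frequency_table(children):
    print(f"{value}\t{freq}\t\t{rel:.1f}%")
```

In this example the relative frequencies sum to 100%, and each one is the frequency divided by the 10 observations.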
● Central Tendency refers to the central or typical value in a dataset.
● Purpose: To find a single value that represents the entire dataset.
1. Mean (Arithmetic Mean):
○ Formula: $\text{Mean} = \frac{\sum X}{n}$
○ Sum of all values divided by the number of values.
○ Best for symmetric distributions without outliers.
2. Median:
○ The middle value when the data is sorted in ascending or descending order.
○ For an odd number of values, the median is the value at position $\frac{n+1}{2}$ in the sorted list.
○ For an even number of values, take the average of the two middle values.
○ Resistant to extreme values (outliers).
3. Mode:
○ The most frequent value in a dataset.
○ Can be:
■ Unimodal: One mode.
■ Bimodal: Two modes.
■ Polymodal: More than two modes.

Measures of Dispersion:
● Purpose: To describe the spread of the data, or how far the data points are from the central tendency.
Types of Measures:
1. Variance:
○ Measures how far each value is from the mean (the average squared deviation from the mean).
○ Formula (Sample): $s^2 = \frac{\sum (X - \overline{X})^2}{n - 1}$
○ Formula (Population): $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
2. Standard Deviation (SD):
○ The square root of the variance.
○ Formula (Sample): $s = \sqrt{s^2}$
○ Formula (Population): $\sigma = \sqrt{\sigma^2}$
○ Describes how spread out the values are.
3. Coefficient of Variation (CV):
○ The ratio of the standard deviation to the mean.
○ Formula: $CV = \frac{\sigma}{\mu}$ for a population or $CV = \frac{s}{\overline{X}}$ for a sample (often multiplied by 100 and reported as a percentage).
○ A measure of relative variability.
4. Range:
○ The difference between the highest and lowest values in a dataset.
○ Simple, but ignores all other values.
5. Inter-Quartile Range (IQR):
○ The range between the 1st quartile (Q1) and the 3rd quartile (Q3).
○ Formula: $IQR = Q3 - Q1$
○ Preferred measure of spread in skewed distributions.
Quartiles:

5. Which of the following is a measure of dispersion?
○ a) Mean
○ b) Median
○ c) Mode
○ d) Standard Deviation
○ Answer: d) Standard Deviation
6. Which measure of central tendency is not influenced by extreme values?
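Python's standard `statistics` module can compute the measures of central tendency and dispersion listed above. This is a minimal sketch that assumes Python 3.8+ for `multimode` and `quantiles`, and the sample values are hypothetical. Statistical packages use different quartile conventions, so Q1, Q3 and the IQR may differ slightly from a hand calculation or another program. Here `statistics.quantiles` uses its default `'exclusive'` method.

```python
import statistics

# Hypothetical sample of 10 observations
data = [4, 8, 6, 5, 3, 8, 9, 7, 8, 2]

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)        # averages the two middle values when n is even
modes = statistics.multimode(data)      # one mode -> unimodal, two -> bimodal, ...

# Dispersion
s2 = statistics.variance(data)          # sample variance, divides by n - 1
sigma2 = statistics.pvariance(data)     # population variance, divides by N
s = statistics.stdev(data)              # sample SD = sqrt(s2)
cv = s / mean                           # coefficient of variation (sample)
data_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points (default 'exclusive' method)
iqr = q3 - q1

print(f"mean={mean}, median={median}, modes={modes}")
print(f"s2={s2:.2f}, sigma2={sigma2:.2f}, s={s:.2f}, CV={cv:.1%}")
print(f"range={data_range}, Q1={q1}, Q3={q3}, IQR={iqr}")
```

For this sample the results are:
- mean 6, median 6.5, single mode 8 (unimodal)
- sample variance 52/9 ≈ 5.78, population variance 5.2
- range 7, IQR 8.0 - 3.75 = 4.25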
2. Types of Hypotheses:
○ Research Hypothesis (RH): A statement about a population parameter that we want to test.
○ Statistical Hypothesis: Includes two types:
■ Null Hypothesis (Ho): A claim that there is no effect or no difference.
■ Alternative Hypothesis (Ha): A claim that there is an effect or difference.
3. Errors in Hypothesis Testing:
○ Type I Error (α): Rejecting the null hypothesis when it is true.
○ Type II Error (β): Failing to reject the null hypothesis when it is false.
4. Rejection Regions:
○ Two-tailed test: Rejection regions in both tails of the distribution curve.
○ Left-tailed test: Rejection region in the left tail.
○ Right-tailed test: Rejection region in the right tail.
5. Test Statistics:
○ For hypothesis tests about the population mean (μ), if the population standard deviation (σ) is unknown, we use the t-distribution; if σ is known, we use the z (standard normal) distribution.
7. Conclusion: If the p-value is smaller than the significance level (α), we reject the null hypothesis. Otherwise, we fail to reject it.

30 Questions and Answers on Hypothesis Testing:
1. What is a hypothesis?
○ A hypothesis is a statement or claim that is tested through statistical methods.
2. What is the difference between a Research Question (RQ) and a Research Hypothesis (RH)?
○ An RQ is phrased as a question and does not imply a specific claim. An RH is a statement about a population parameter.
3. What is the null hypothesis (Ho)?
○ It is the claim that there is no effect or difference in the population parameter.
4. What is the alternative hypothesis (Ha)?
○ It is the claim that there is an effect or difference, opposing the null hypothesis.
5. What does a Type I error refer to?
○ It refers to rejecting the null hypothesis when it is actually true.
6. What does a Type II error refer to?
○ It refers to failing to reject the null hypothesis when it is actually false.
7. What is a one-tailed test?
○ A test whose rejection region lies in only one tail of the distribution (a left-tailed or right-tailed test).
12. When should you use a t-distribution?
○ Use a t-distribution when the population standard deviation is unknown and must be estimated from the sample. This matters most when the sample size is small (n < 30).
13. When should you use a z-distribution?
○ Use a z-distribution when the population standard deviation is known. With a large sample (n > 30), it is also commonly used as an approximation.
14. What is the formula for the test statistic in a t-test?
○ For a one-sample t-test: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$, with n − 1 degrees of freedom.
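The conclusion step (reject H0 when p < α) can be sketched for the case where σ is known, so the z (standard normal) distribution applies. This minimal illustration uses `statistics.NormalDist` (Python 3.8+). The function name `z_test` and the example numbers are hypothetical. When σ is unknown, the t-distribution should be used instead, and the Python standard library does not provide it.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z-test for a mean when the population SD (sigma) is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # mean 0, SD 1
    if tail == "two":
        p = 2 * (1 - std_normal.cdf(abs(z)))   # area in both tails
    elif tail == "right":
        p = 1 - std_normal.cdf(z)              # area in the upper tail
    elif tail == "left":
        p = std_normal.cdf(z)                  # area in the lower tail
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return z, p, decision


# Hypothetical data: sample mean 52 from n = 25, H0: mu = 50, known sigma = 5
z, p, decision = z_test(52, 50, 5, 25)
print(f"z = {z:.2f}, p = {p:.4f}, {decision}")
```

Here z = 2.00 and the two-tailed p ≈ 0.0455. That is below α = 0.05, so H0 is rejected. With `tail="right"`, the same data give p ≈ 0.0228.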
1. What does the null hypothesis (Ho) represent?
2. …
A) Type I Error
B) Type II Error
C) Alpha Error
D) Beta Error
Answer: A) Type I Error
3. If the p-value is smaller than the significance level (α), what is the correct decision?
…
Answer: B) Reject the null hypothesis
5. What are the degrees of freedom (df) for a one-sample t-test when the sample size (n) is 20?
A) 20
B) 19
C) 18
D) 21
Answer: B) 19
6. Which type of test is used when you are testing if a parameter is greater than a value?
…
A) z-distribution
B) t-distribution
C) Normal distribution
D) F-distribution
Simplified Notes on Normal Distribution:
1. Normal Distribution Basics:
○ A continuous probability distribution, often referred to as the Gaussian distribution.
○ It is bell-shaped and symmetric around the mean (μ).
○ Parameters:
■ Mean (μ): central location of the curve.
■ Standard deviation (σ): determines the spread of the distribution.
2. Standard Normal Distribution:
○ The special case where the mean (μ) is 0 and the standard deviation (σ) is 1.
○ In this distribution, Z-scores are used, which represent the number of standard deviations a value is from the mean.
3. Z-Scores:
○ Z = (X - μ) / σ
○ The Z-score helps in standardizing a normal distribution to the standard normal distribution.
4. Probability under the Normal Curve:
○ The area under the curve represents probability. For the standard normal distribution (as for any probability distribution), the total area is 1.
○ To calculate probabilities, you can use the Z-table, which gives the cumulative probability for Z-scores.
5. Converting to Standard Normal Distribution:
○ When a random variable (X) follows a normal distribution with a mean (μ) and standard deviation (σ), it can be converted to the standard normal distribution using the Z-score formula.
6. Example of Calculating Probability:
○ If X is normally distributed with a mean of 24 and a standard deviation of 3, the probability that X > 19 can be calculated by converting the X value to Z: $Z = \frac{19 - 24}{3} \approx -1.67$, so $P(X > 19) = P(Z > -1.67) = 1 - P(Z < -1.67) \approx 1 - 0.0475 = 0.9525$.
7. Z-Table:
○ A Z-table provides the area under the standard normal curve for various Z-values.
○ You use it to find the cumulative probability up to a certain Z-score.
8. Key Areas Under the Curve:
○ Area between two Z-scores: Look up the cumulative area for each Z-score and subtract the smaller area from the larger one.
○ Right-tail and left-tail areas: Z-tables can also be used to find the probability in the right or left tail of the distribution (right-tail area = 1 − cumulative area).
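The worked example in point 6 (X normal with mean 24 and SD 3, find P(X > 19)) can be checked with `statistics.NormalDist` (Python 3.8+). Its exact cumulative distribution function takes the place of the Z-table lookup. This is a sketch, and the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 24, 3   # X ~ N(24, 3)
x = 19

# Standardize, then use the standard normal CDF (the "Z-table")
z = (x - mu) / sigma
p_greater = 1 - NormalDist().cdf(z)          # P(X > 19) = P(Z > z)

# Same probability without standardizing by hand
p_direct = 1 - NormalDist(mu, sigma).cdf(x)

print(f"z = {z:.2f}, P(X > 19) = {p_greater:.4f}, check = {p_direct:.4f}")
```

This code uses the exact z (-5/3) and gives about 0.952. A Z-table calculation rounds z to two decimals, so its result can differ in the last digit.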
30 Questions and Answers on Normal Distribution:
1. What is the normal distribution?
○ A continuous probability distribution that is symmetric and bell-shaped, with mean (μ) and standard deviation (σ).
2. What does a Z-score represent?
○ A Z-score represents how many standard deviations a value (X) is from the mean (μ).
3. What is the formula for a Z-score?
○ $Z = \frac{X - \mu}{\sigma}$
4. What is the significance of the area under the normal curve?
○ The area under the curve represents probability, and for the entire normal distribution, the area equals 1.
5. How do you convert a raw score to a Z-score?
○ Subtract the mean (μ) from the raw score (X) and divide by the standard deviation (σ).
6. What is the standard normal distribution?
○ It is a normal distribution with a mean of 0 and a standard deviation of 1.
7. What does the Z-table show?
○ The Z-table shows the cumulative probability for each Z-score in the standard normal distribution.
8. How do you use the Z-table to find probabilities?
○ Find the Z-score in the Z-table and use the corresponding area under the curve to find the probability.
9. What is the cumulative probability at a Z-score of 0 in the standard normal distribution?
○ The cumulative probability (area to the left) at Z = 0 is 0.5, because Z = 0 is the center of the curve.
10. How do you find the probability between two Z-scores?
○ Look up the cumulative area for each Z-score in the Z-table and subtract the smaller area from the larger one.
11. What is the probability of a Z-score greater than 1.64?
○ Use the Z-table to find the area for Z = 1.64 (0.9495) and subtract it from 1 to find the probability to the right: 1 − 0.9495 = 0.0505.
12. What does it mean if the Z-score is negative?
○ A negative Z-score means the value is below the mean.
13. What does it mean if the Z-score is positive?
○ A positive Z-score means the value is above the mean.
14. How do you calculate the probability for values less than a given score in a normal distribution?
○ Use the Z-table to find the area corresponding to the Z-score for the given value.
15. What is the probability of a value being within 1 standard deviation of the mean in a normal distribution?
○ Approximately 68% of the values lie within 1 standard deviation of the mean.
16. What is the probability of a value being within 2 standard deviations of the mean in a normal distribution?
○ Approximately 95% of the values lie within 2 standard deviations of the mean.
17. What is the probability of a value being within 3 standard deviations of the mean in a normal distribution?
○ Approximately 99.7% of the values lie within 3 standard deviations of the mean.
18. How do you calculate the probability for a value greater than a given value in a normal distribution?
○ Use the Z-score formula to standardize the value and find the area to the right of the Z-score using the Z-table.
19. How do you find the value corresponding to a given probability in a normal distribution?
○ Use the Z-table to find the Z-score that corresponds to the cumulative probability, and then reverse the Z-score formula to find the raw score: X = μ + Zσ.
20. What does a Z-score of -2 represent?
○ A Z-score of -2 represents a value that is 2 standard deviations below the mean.
21. How do you calculate the probability that a random variable is between two values in a normal distribution?
○ Find the Z-scores for both values, use the Z-table to find their cumulative probabilities, and subtract them.
22. What is the area to the left of Z = 1.96?
○ The area to the left of Z = 1.96 is approximately 0.9750.
23. How do you calculate the area under the normal curve for values greater than 2?
○ Find the Z-score for 2, use the Z-table to find the area, and subtract it from 1.
24. What is the probability for a Z-score between -1 and 1?
○ The probability between Z = -1 and Z = 1 is approximately 0.6826.
25. How do you find the probability for a Z-score greater than -1.5?
○ Find the cumulative probability for Z = -1.5 and subtract it from 1.
26. What is the probability of a value falling between Z = -2 and Z = 2?
○ The probability is approximately 0.9544.
27. What does the area under the normal curve represent?
○ It represents the probability of a certain outcome occurring within a given range.
28. How do you find the area under the standard normal curve for a value less than 0?
○ Use the Z-table to find the cumulative probability for Z = 0, which is 0.5.
29. How can the Z-score be used in real-world applications?
○ The Z-score helps in comparing scores from different distributions and finding probabilities in normal distributions.
30. What does it mean if the area under the normal curve is 0.95?
○ This means that 95% of the values lie within this range under the normal distribution.

1. …
C) The total area under the curve
D) The mean of the distribution
Answer: B) The number of standard deviations a value is from the mean
2. In a normal distribution, what percentage of the data lies within 1 standard deviation of the mean?
A) 68%
B) 95%
C) 99.7%
D) 50%
Answer: A) 68%
3. What is the mean of the standard normal distribution?
A) 1
B) 0
C) μ
D) σ
Answer: B) 0
6. How do you calculate a Z-score?
A) Z = X - μ
B) Z = (X - μ) / σ
C) Z = (σ - X) / μ
D) Z = μ / σ
Answer: B) Z = (X - μ) / σ
7. What does a negative Z-score indicate?
A) The value is above the mean
B) The value is below the mean
C) The value is at the mean
D) The value is impossible
Answer: B) The value is below the mean
10. Which of the following is the correct area under the curve to the left of Z = -1.96?
A) 0.025
B) 0.5
C) 0.975
D) 0.025
Answer: A) 0.025
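The Z-table values quoted in the questions above can be reproduced with `statistics.NormalDist` (Python 3.8+). These include the 68-95-99.7 rule, the area to the left of 1.96 and the right tail beyond 1.64. In this sketch the helper `area_between` is a hypothetical name, and the raw-score line reuses μ = 24, σ = 3 purely as an illustration. Exact values may differ from rounded table entries in the fourth decimal place (for example, 0.6827 rather than 0.6826).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution: mean 0, SD 1


def area_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores."""
    return Z.cdf(z_high) - Z.cdf(z_low)


# 68-95-99.7 rule
for k in (1, 2, 3):
    print(f"P({-k} < Z < {k}) = {area_between(-k, k):.4f}")

print(f"Area left of Z = 1.96:  {Z.cdf(1.96):.4f}")
print(f"Area left of Z = -1.96: {Z.cdf(-1.96):.4f}")
print(f"Area right of Z = 1.64: {1 - Z.cdf(1.64):.4f}")

# Reverse lookup: z for a cumulative probability, then X = mu + z * sigma
z_975 = Z.inv_cdf(0.975)
print(f"z for cumulative 0.975: {z_975:.2f}")
print(f"Raw score for mu = 24, sigma = 3: {24 + z_975 * 3:.2f}")
```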
1. What is a point estimate?
○ A point estimate is a single value calculated from the sample data that is used to estimate a population parameter.
2. What is the difference between a point estimate and an interval estimate?
○ A point estimate provides a single value, while an interval estimate provides a range within which the population parameter is likely to fall.
3. How do you calculate the margin of error for a confidence interval for the mean?
○ The margin of error is calculated as: $ME = z \times \frac{\sigma}{\sqrt{n}}$
4. What does the confidence level represent in a confidence interval?
○ The confidence level is the long-run proportion of intervals, built by the same procedure from repeated samples, that contain the true population parameter.
5. What is the formula for a confidence interval for a population mean when the standard deviation is known?
○ The formula is: $\bar{x} \pm z \times \frac{\sigma}{\sqrt{n}}$
6. What is the formula for a confidence interval for a population mean when the standard deviation is unknown?
○ The formula is: $\bar{x} \pm t \times \frac{s}{\sqrt{n}}$, where t has n − 1 degrees of freedom.
7. How do you calculate the confidence interval for a population proportion?
○ The formula is: $\hat{p} \pm z \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
8. What is the t-distribution used for?
○ The t-distribution is used when the population standard deviation is unknown and the sample size is small (n < 30).
9. How do you calculate the confidence interval for the mean when the sample size is large?
○ Use the z-distribution if the sample size is large and the population standard deviation is known.
10. How do you interpret a 95% confidence interval for the population mean?
○ If you take many samples and construct 95% confidence intervals for each, about 95% of these intervals will contain the true population mean.
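The confidence-interval formulas above translate directly into code. This sketch assumes Python 3.8+, where `NormalDist.inv_cdf` supplies the z critical value. The standard library has no t-distribution, so the t critical value must be taken from a t-table and passed in. All function names and example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided z critical value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_ci_known_sigma(x_bar, sigma, n, confidence=0.95):
    """x_bar +/- z * sigma / sqrt(n)."""
    me = z_critical(confidence) * sigma / sqrt(n)   # margin of error
    return x_bar - me, x_bar + me


def mean_ci_unknown_sigma(x_bar, s, n, t_crit):
    """x_bar +/- t * s / sqrt(n); t_crit must come from a t-table with df = n - 1."""
    me = t_crit * s / sqrt(n)
    return x_bar - me, x_bar + me


def proportion_ci(p_hat, n, confidence=0.95):
    """p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    me = z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me


# Hypothetical inputs
print(mean_ci_known_sigma(120, 15, 36))
print(mean_ci_unknown_sigma(50, 8, 16, t_crit=2.131))  # t(0.025, df = 15) from a t-table
print(proportion_ci(0.40, 100))
```

The three examples give these 95% intervals:
- x̄ = 120, σ = 15, n = 36: about (115.10, 124.90)
- x̄ = 50, s = 8, n = 16, table value t = 2.131: (45.738, 54.262)
- p̂ = 0.40, n = 100: about (0.304, 0.496)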
…
○ A) Wider
○ B) Narrower
○ C) The same
○ D) Unpredictable
○ Answer: B
…
○ A) Sample mean
○ B) Sample standard deviation
○ C) Sample size
○ D) All of the above
○ Answer: D
1. Definition: A normal distribution is a probability distribution that forms a bell-shaped curve when plotted.
○ Key Characteristics:
■ The total area under the curve is 1 (or 100%).
■ The distribution is symmetric about the mean.
■ The two tails extend indefinitely.
○ Mean (μ) and Standard Deviation (σ) are key parameters.
○ Standard Normal Distribution is when the mean is 0 and the standard deviation is 1.
2. Data Presentation & Statistical Tests:
○ For normally distributed data, use the mean and standard deviation.
○ For non-normal data, use the median (as the measure of central tendency) and the minimum-maximum (as the measure of spread).
○ Hypothesis Testing:
■ If data is normally distributed, use parametric tests.
■ If not, use non-parametric tests.

Methods Used to Determine Normality:
1. Statistical Tests:
○ Kolmogorov-Smirnov Test: For large sample sizes (>100).
○ Shapiro-Wilk Test: For smaller sample sizes (<100).
○ p-value:
■ If p > 0.05, there is no significant departure from normality, and the data can be treated as normally distributed.
■ If p < 0.05, the data is not normally distributed.
2. Descriptive Methods:
○ Skewness: Measures asymmetry of data.
■ Positive skewness means the right tail is longer.
■ Negative skewness means the left tail is longer.
○ Kurtosis: Measures the "peakedness" of the distribution.
■ Positive kurtosis indicates a sharper peak, while negative kurtosis suggests a flatter distribution.
○ Skewness and kurtosis should fall within acceptable ranges.
○ Coefficient of Variation (CV): Standard deviation divided by the mean.
■ As a rule of thumb, if CV < 30%, the data is considered normal.
3. Visual Methods:
○ Box plot: Symmetry, with outliers being exceptions.
○ Histogram: Symmetrical, with no extreme peaks or flat regions.

5. How is skewness interpreted?
○ Positive skew indicates the right tail is longer; negative skew indicates the left tail is longer.
6. What does a coefficient of variation less than 30% indicate?
○ The data is considered normal.
7. What is the significance of a p-value greater than 0.05 in normality testing?
…
○ a) Parametric tests
○ b) Non-parametric tests
○ c) Z-scores
○ d) Histogram
○ Answer: b) Non-parametric tests
10. Which plot can be used to check if data points follow a straight line?
○ a) Histogram
○ b) Q-Q plot
○ c) Stem-and-leaf plot
○ d) Box plot
○ Answer: b) Q-Q plot
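The descriptive normality checks (skewness, kurtosis and the CV rule of thumb) can be sketched in Python with the standard library. These functions use the simple moment-based definitions, with kurtosis reported as excess kurtosis (0 for a normal distribution). Many statistical packages report bias-adjusted sample versions instead, so their values will differ for small samples. The function names and data are hypothetical.

```python
from statistics import mean, stdev


def central_moment(xs, k):
    """k-th central moment: average of (x - mean)^k."""
    m = mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness: > 0 means a longer right tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3: > 0 sharper peak, < 0 flatter."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


def cv_percent(xs):
    """Coefficient of variation (sample SD / mean) as a percentage."""
    return 100 * stdev(xs) / mean(xs)


# Hypothetical samples
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(f"symmetric: skew = {skewness(symmetric):.2f}, "
      f"kurtosis = {excess_kurtosis(symmetric):.2f}, CV = {cv_percent(symmetric):.1f}%")
print(f"right_skewed: skew = {skewness(right_skewed):.2f}")
```

Under the notes' rule of thumb, the symmetric sample (CV ≈ 52.7%) would not be considered normal even though its skewness is 0. This shows why these checks are best used together.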
t-tests:
1. One Sample t-test:
● Purpose: Used to determine if the mean of a sample differs from a known population mean.
● Example: Testing if the average stress score of a group is different from a known standard value (e.g., 84).
● Steps:
1. State the hypotheses:
■ Null Hypothesis (H0): Mean = 84
■ Alternative Hypothesis (Ha): Mean ≠ 84
2. Calculate the t-value: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
■ Where $\bar{x}$ = sample mean, $\mu$ = population mean, $s$ = sample standard deviation, and $n$ = sample size.
3. Use the t-table (df = n − 1) to find the critical t-value and compare it with the calculated t-value to decide whether to reject the null hypothesis.

2. Independent t-test:
● Purpose: Compares the means of two independent groups (e.g., comparing stress scores between males and females).
● Hypothesis:
○ Null Hypothesis (H0): $\mu_1 = \mu_2$ (The means are equal).
○ Alternative Hypothesis (Ha): $\mu_1 \neq \mu_2$ (The means are different).
● Assumptions:
○ The two groups should have normal distributions.
○ The variances of the two groups may be assumed equal or unequal; this choice determines which test statistic is used.
● Test Statistic:
○ If variances are assumed equal (pooled t-test): $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
○ If variances are assumed unequal (Welch's t-test): $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

3. Paired t-test:
● Purpose: Compares the means of two related groups (e.g., measuring the effect of a treatment before and after the intervention).
● Hypothesis:
○ Null Hypothesis (H0): The mean of the differences (d) is 0.
○ Alternative Hypothesis (Ha): The mean of the differences (d) is not 0.
● Test Statistic: $t = \frac{\bar{d}}{s_d / \sqrt{n}}$
○ Where $\bar{d}$ is the mean of the differences and $s_d$ is the standard deviation of the differences.
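A short standard-library sketch can compute the one-sample, independent (pooled and Welch) and paired t statistics above. The standard library has no t-distribution, so each result is meant to be compared with a critical value from a t-table, as in step 3 of the one-sample procedure. The Welch degrees of freedom use the Welch-Satterthwaite approximation, which the notes do not cover. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def one_sample_t(xs, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), df = n - 1."""
    n = len(xs)
    return (mean(xs) - mu0) / (stdev(xs) / sqrt(n)), n - 1


def pooled_t(a, b):
    """Independent t-test assuming equal variances (pooled variance)."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(a, b):
    """Independent t-test without assuming equal variances (Welch)."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def paired_t(before, after):
    """Paired t-test on the differences d = after - before."""
    d = [y - x for x, y in zip(before, after)]
    n = len(d)
    return mean(d) / (stdev(d) / sqrt(n)), n - 1


# Hypothetical stress scores tested against the standard value 84
stress = [80, 85, 90, 78, 82]
t, df = one_sample_t(stress, 84)
t_crit = 2.776  # two-tailed critical value for alpha = 0.05, df = 4, from a t-table
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > t_crit}")
```

For these hypothetical scores t ≈ -0.48 with df = 4. Since |t| is below the critical value 2.776, H0 (mean = 84) would not be rejected.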
1. What is a one-sample t-test used for?
○ It is used to compare the mean of a sample with a known population mean.
2. What is the null hypothesis in a one-sample t-test?
○ The null hypothesis states that the population mean equals the known (hypothesized) value, so any difference between the sample mean and that value is due to chance.
3. In an independent t-test, what does a p-value less than 0.05 indicate?
○ It indicates strong evidence to reject the null hypothesis, meaning the means of the two groups are significantly different.
4. What is the assumption for an independent t-test regarding variances?
○ The variances may be assumed equal or unequal; the assumption determines which form of the test statistic is used.
5. What is a paired t-test used for?
○ It compares the means of two related groups (e.g., before and after treatment).
6. How is the test statistic for an independent t-test calculated?
○ The test statistic is calculated as the difference in means divided by the standard error of the difference in means.
7. What is the purpose of the critical t-value?
○ It is used to determine the rejection region for the null hypothesis.
8. What does the t-table provide?
○ It provides the critical t-values for different degrees of freedom and significance levels.
9. What happens if the calculated t-value exceeds the critical t-value?
○ If the calculated t-value exceeds the critical t-value (in absolute value, for a two-tailed test), the null hypothesis is rejected.
10. What does the paired t-test assume about the data?
○ It assumes that the data consists of paired observations (e.g., before and after measurements on the same individuals).

1. …
○ A) To compare two sample means
○ B) To compare a sample mean to a known population mean
○ C) To test the variance between two samples
○ D) To test the differences within a sample
○ Answer: B
2. In an independent t-test, what do the null and alternative hypotheses represent?
○ A) The sample mean equals the population mean
○ B) The two sample means are equal or different
○ C) The variance between the samples is equal
○ D) The sample mean is less than the population mean
○ Answer: B
3. Which of the following is a requirement for conducting an independent t-test?
…
○ B) The average value of all test statistics
○ C) The maximum sample size for the test
○ D) The value used to calculate the degrees of freedom
○ Answer: A
7. For a two-tailed test with $\alpha = 0.05$, what are the rejection areas?
● The null hypothesis (Ho) states that all group means are equal.
3. What assumption is made regarding the population in ANOVA?
● The population is assumed to be normally distributed.
4. What is the F-statistic in ANOVA used for?
● It compares the variance between groups to the variance within groups to determine if group means are significantly different.
…
● Tukey, Bonferroni, and LSD (Least Significant Difference).
9. What is a Type I error?
● A Type I error occurs when the null hypothesis is rejected when it is actually true.
10. What does it mean if the F-statistic is less than the F-critical value?
● You fail to reject the null hypothesis, indicating no significant difference between the group means.
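The F-statistic described above can be sketched for a one-way ANOVA. It divides the between-group mean square by the within-group mean square. The function name and the three example groups are hypothetical, and the resulting F is compared with an F-critical value from a table.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n_total = len(groups), len(all_values)

    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for three independent groups
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

For the example groups, F = 27.0 with df = (2, 6). This exceeds the tabulated F-critical value of about 5.14 at α = 0.05, so the null hypothesis that all group means are equal would be rejected.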
1. What is simple linear regression?
○ Simple linear regression is a method to model the relationship between two variables, where one is independent and the other is dependent.
2. What does the slope (B) represent in a regression model?
○ The slope (B) represents the change in the dependent variable (y) for a one-unit change in the independent variable (x).
3. What is the purpose of the y-intercept (A) in the regression equation?
○ The y-intercept (A) is the predicted value of y when x = 0.
…
6. What is homoscedasticity?
○ Homoscedasticity refers to the assumption that the variance of the errors is constant across all levels of the independent variable.
7. How can we test if the slope is significantly different from zero?
○ We can use hypothesis testing:
■ Null Hypothesis (H₀): B = 0
■ Alternative Hypothesis (H₁): B ≠ 0
8. Why is the least squares method used in linear regression?
○ The least squares method minimizes the sum of the squared differences between the observed values and the predicted values, resulting in the best-fit line.
9. What is the meaning of a p-value less than 0.05 in regression analysis?
○ A p-value less than 0.05 indicates that the relationship between the variables is statistically significant, meaning the slope is likely not zero.
10. What is the significance of a high r² value?
○ A high r² value indicates that the independent variable explains a large portion of the variance in the dependent variable, meaning the model fits the data well.

1. What does simple linear regression predict?
○ A) The relationship between multiple independent variables and a dependent variable
○ B) The relationship between two variables (one independent and one dependent)
○ C) The correlation between two dependent variables
○ D) The relationship between a dependent variable and a random variable
○ Answer: B
2. What is the function of the y-intercept (A) in the regression equation?
○ A) To predict future values
○ B) To indicate the error term
○ C) The predicted value when x = 0
○ D) To calculate the slope
○ Answer: C
3. Which of the following is NOT an assumption of simple linear regression?
○ A) The relationship between x and y is linear
○ B) The errors are normally distributed
○ C) There is a homoscedastic relationship
○ D) The independent and dependent variables are categorical
○ Answer: D
4. What does the coefficient of determination (r²) indicate?
○ A) The exact predicted value of y
○ B) The percentage of the variance in y explained by x
○ C) The slope of the regression line
○ D) The error of prediction
○ Answer: B
5. What does a p-value less than 0.05 indicate in hypothesis testing?
○ A) The regression line is not useful
○ B) The null hypothesis is likely true
○ C) The regression model is statistically significant
○ D) There is no linear relationship between x and y
○ Answer: C
6. Which of the following best describes the slope (B) in the equation y = A + Bx?
○ A) The expected value of y when x = 0
○ B) The constant term of the regression line
○ C) The amount by which y increases for a one-unit increase in x
○ D) The error term of the model
○ Answer: C
7. In regression analysis, what does the error term (ε) account for?
○ A) The predicted value of y
○ B) The random variations and unaccounted factors
○ C) The slope of the regression line
○ D) The variance explained by the independent variable
○ Answer: B
8. …
○ C) Test the significance of the slope
○ D) Write the regression equation
○ Answer: B
9. Which of the following indicates a strong positive linear relationship?
○ A) r = -0.8
○ B) r = 0.0
○ C) r = 1.0
○ D) r = 0.5
○ Answer: C
10. What does homoscedasticity refer to in regression analysis?
○ A) The relationship between x and y is linear
○ B) The variance of errors is consistent across all levels of x
○ C) The data follows a normal distribution
○ D) There are no outliers in the data
○ Answer: B
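A short sketch can compute the least-squares intercept (A), slope (B) and r² for y = A + Bx. It uses the standard least-squares formulas B = Sxy / Sxx, A = ȳ − B·x̄ and r² = 1 − SSE/SST. The data and function name are hypothetical.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Least-squares fit of y = A + Bx; returns (A, B, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx                 # slope: change in y per one-unit change in x
    a = y_bar - b * x_bar           # intercept: predicted y when x = 0

    predicted = [a + b * xi for xi in x]
    sse = sum((yi - yhat) ** 2 for yi, yhat in zip(y, predicted))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total variation
    return a, b, 1 - sse / sst


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, r2 = simple_linear_regression(x, y)
print(f"y = {a:.2f} + {b:.2f}x, r^2 = {r2:.2f}")
```

For the example, B = 0.6, A = 2.2 and r² = 0.6. The fitted line is y = 2.2 + 0.6x, and x explains 60% of the variance in y.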
1. Non-Parametric Tests:
○ These tests do not assume a specific distribution (such as the normal distribution) for the population. They are useful when the data is non-normal or when the data is ordinal.
○ Common non-parametric tests include:
■ Mann-Whitney U Test (Wilcoxon Rank-Sum Test): Compares two independent groups.
■ Kruskal-Wallis H Test: Non-parametric equivalent of one-way ANOVA, comparing three or more independent groups.
■ Wilcoxon Signed Rank Test: Used for comparing two related samples; the non-parametric equivalent of the paired t-test.
■ Spearman's Rho: Measures the correlation between two ranked variables.
■ Friedman's Test: Non-parametric equivalent of repeated measures ANOVA.
2. Advantages and Disadvantages:
○ Advantages:
■ Fewer assumptions and valid for non-normal data.
■ Useful for ordinal data, ranked data, and data with outliers.
○ Disadvantages:
■ May waste information by using ranks rather than actual values.
■ Less powerful than parametric tests when the parametric assumptions hold.
3. Ranking Data:
○ Rank data when performing non-parametric tests like Mann-Whitney and Kruskal-Wallis.
○ In the case of ties, ranks are averaged.
4. Assumptions of Non-Parametric Tests:
○ Independence of samples (for tests comparing independent groups).
○ Measurement scale at least ordinal (ranked data).

1. What is a non-parametric test?
○ Non-parametric tests are statistical tests that do not assume a specific distribution for the data. They are useful when the data are ordinal or not normally distributed.
2. What is the Mann-Whitney U test used for?
○ The Mann-Whitney U test compares two independent groups to determine if there is a significant difference between them.
3. What is the non-parametric equivalent of a one-way ANOVA?
○ The Kruskal-Wallis H test is the non-parametric equivalent of one-way ANOVA.
4. What are the assumptions of the Wilcoxon Signed Rank Test?
○ The test assumes that the samples are paired (related), and the data should be at least ordinal.
5. How do you handle tied ranks in non-parametric tests?
○ Tied values each receive the average of the ranks they would otherwise occupy.
…
● A p-value of less than 0.05 typically indicates that there is a statistically significant difference between the groups being compared.
…
○ A) 0.01
○ B) 0.05
○ C) 0.10
○ D) 0.50
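Ranking with averaged ties, and the Mann-Whitney U statistic built on those ranks, can be sketched as below. The sketch computes only U, using U₁ = R₁ − n₁(n₁ + 1)/2 where R₁ is the rank sum of the first group. The p-value would come from a U table or statistical software. The function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at sorted position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1   # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(group1, group2):
    """Return (U1, U2) from the rank sum of group1 in the combined sample."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r1 = sum(ranks[:n1])                   # rank sum of group 1
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2


# Hypothetical ordinal scores for two independent groups
print(average_ranks([10, 20, 20, 30]))
print(mann_whitney_u([3, 4, 2, 6], [9, 7, 5, 10]))
```

The smaller of U1 and U2 is the value usually compared with a critical value from a Mann-Whitney table.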
Notes on Chi-Square Test
4. Formula for Chi-Square
● $\chi^2 = \sum \frac{(O - E)^2}{E}$, where:
○ O = observed frequency
○ E = expected frequency
● Compare the calculated chi-square statistic to the critical value (from chi-square tables).
● If $\chi^2$ > critical value, reject the null hypothesis.
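The chi-square formula can be sketched in Python for both a goodness-of-fit test and a test of independence. In the independence version, each expected count uses the standard E = (row total × column total) / N, and df = (rows − 1)(columns − 1); the notes do not spell out either formula. The function names and data are hypothetical, and the statistic is compared with a critical value from a chi-square table.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_independence(table):
    """Test of independence for a contingency table given as a list of rows.

    Returns (chi2, df), using E = row total * column total / N.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: are three categories equally popular (100 responses)?
print(chi_square_gof([50, 30, 20], [100 / 3] * 3))
# Hypothetical 2 x 2 table (rows: group, columns: outcome)
print(chi_square_independence([[10, 20], [20, 10]]))
```

The two examples give:
- Goodness-of-fit: χ² = 14.0 with df = 2 (critical value 5.991 at α = 0.05).
- 2 × 2 table: χ² ≈ 6.67 with df = 1 (critical value 3.841).

The null hypothesis would be rejected in both cases.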
9. Reporting Chi-Square Results
● Example: χ²(df, N = sample size) = calculated value, p = p-value.
● The report should include the degrees of freedom, the test statistic value, and the p-value.

1. Short Answer Questions
1. What is the Chi-Square test used for?
○ Answer: It is used to determine if there is a significant association between two categorical variables or if an observed distribution differs from an expected distribution.
2. What are the assumptions of the Chi-Square test?
○ Answer: Variables should be categorical, data must be randomly sampled, observations should be independent, and all expected cell frequencies must be at least 5.
3. Explain the formula used in the Chi-Square test.
○ Answer: The formula is $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency and E is the expected frequency.
4. What is the purpose of the goodness-of-fit test?
○ Answer: The goodness-of-fit test checks if observed frequencies match an expected distribution. It answers whether certain categories are more popular than expected.
5. How do you report Chi-Square test results in APA format?
○ Answer: Report results as χ²(df, N = sample size) = calculated value, p = p-value, for example, "χ²(4, N = 500) = 286.00, p < .001".
6. Which of the following is NOT an assumption for the Chi-Square test?
○ A) Variables must be continuous
○ B) Observations should be independent
○ C) Expected frequencies should be at least 5
○ D) Data should be randomly sampled
○ Answer: A) Variables must be continuous
7. What is the main purpose of a Chi-Square test of independence?
○ A) To test the goodness-of-fit
○ B) To determine if there is a relationship between two categorical variables
○ C) To compare the means of different groups
○ D) To check if observed frequencies differ from expected frequencies
○ Answer: B) To determine if there is a relationship between two categorical variables
8. What does a high chi-square statistic indicate?
○ A) A high degree of independence
○ B) A significant difference between observed and expected frequencies
○ C) No difference between observed and expected frequencies
○ D) More samples are needed
○ Answer: B) A significant difference between observed and expected frequencies
9. In which of the following cases would you use Fisher's Exact Test instead of the Chi-Square test?