Biostatistics in A Nutshell

The document provides an overview of biostatistics, detailing types of data (numerical and categorical), data presentation methods (tables, charts), and measures of central tendency and dispersion. It includes definitions and examples of various data types, guidelines for presenting data, and answers to common questions about statistical concepts. Additionally, it covers the importance of data presentation and the relationship between different statistical measures.

Biostatistics in a nutshell

Basic Definitions:

1. Data (Plural), Datum (Singular)
   ○ Defined as numbers used in statistics, derived from counting or measurements.
2. Variable:
   ○ Anything that can vary.
   ○ Examples: blood pressure, age, sex.

Types of Data:

1. Qualitative (Non-numeric)
2. Quantitative (Numeric)

Categories of Data:

1. Unstructured Data (Non-numeric)
   ○ Characteristics:
      ■ No organized structure (text or non-text).
      ■ Examples: Emails, videos, images, and reports.
   ○ Popular in Big Data Era.
   ○ Needs structuring before analysis.
2. Structured Data (Numeric)
   ○ Discrete Data:
      ■ Specific measurements (integer values).
      ■ Categories:
         ■ Numerical: Counts (e.g., number of accidents).
         ■ Categorical:
            ■ Nominal: No order, e.g., gender.
            ■ Ordinal: Ordered categories, e.g., education levels.
   ○ Continuous Data:
      ■ Interval: Measurable differences but no true zero, e.g., temperature in Celsius.
      ■ Ratio: Measurable differences and a true zero, e.g., height, weight.

Data Presentation Methods:

1. Tables:
   ○ Exact data and comparisons.
   ○ Frequency and relative frequency distributions.
2. Bar Charts:
   ○ Compare categories (e.g., number of enrollments per site).
   ○ Suitable for comparisons.
3. Line Graphs:
   ○ Show trends over time (e.g., changes in clinic staff numbers).
4. Pie Charts:
   ○ Display percentages of a whole (e.g., market share).

General Guidelines for Presenting Data:

1. Simplify messages but don't distort data.
2. Always label charts and include a title.
3. Indicate data source and date.
4. Use the appropriate chart type for your data (Bar for categories, Line for trends).

30 Questions and Answers:

1. What is the difference between data and datum?
   ○ Answer: Data is the plural form, referring to multiple pieces of information. Datum refers to a single piece of information.
2. What is a variable in statistics?
   ○ Answer: A variable is any characteristic or attribute that can change, such as age, weight, or temperature.
3. What are the two main types of variables?
   ○ Answer: Qualitative and Quantitative.
4. What is qualitative data?
   ○ Answer: Data that describes qualities or characteristics, such as colors, names, or types.
5. What is quantitative data?
   ○ Answer: Data that is numerical and can be measured, such as height, weight, or temperature.
6. What is unstructured data?
   ○ Answer: Data that is not organized and does not have a predefined model, such as emails, videos, and audio files.
7. What is structured data?
   ○ Answer: Data that is organized and easily searchable, such as databases with numeric values or categories.
8. What is discrete data?
   ○ Answer: Data that can only take certain distinct values, such as counts or integers.
9. What is continuous data?
   ○ Answer: Data that can take any value within a given range, such as height or temperature.
10. Give an example of nominal data.
   ○ Answer: Gender, type of vehicle.
11. Give an example of ordinal data.
   ○ Answer: Education level (e.g., high school, bachelor's, master's).
12. What is interval data?
   ○ Answer: Data with meaningful intervals between values, but no true zero point, like temperature in Celsius.
13. What is ratio data?
   ○ Answer: Data that has both meaningful intervals and a true zero point, such as weight or income.
14. Why is unstructured data important?
   ○ Answer: It represents the majority of data and is becoming increasingly useful with advancements in data processing.
15. What is a frequency distribution?
   ○ Answer: A table showing how often each value or group of values appears in a dataset.
16. What is relative frequency?
   ○ Answer: The proportion of occurrences of a certain value, often expressed as a percentage.
17. What is a bar chart used for?
   ○ Answer: Comparing different categories or groups of data.
18. What is a line graph used for?
   ○ Answer: Displaying trends or changes over time.
19. What is a pie chart used for?
   ○ Answer: Showing how parts make up a whole, typically in percentages.
20. What is the best way to present trends over time?
   ○ Answer: Line graphs.
21. What type of data is represented by pie charts?
   ○ Answer: Proportional or percentage data.
22. How should data be summarized?
   ○ Answer: Simplify the message, ensure clarity, and use either graphs or tables for presentation.
23. What is the importance of labeling components in a graphic?
   ○ Answer: It ensures the viewer can clearly understand the data being presented.
24. What is the role of a table in data presentation?
   ○ Answer: It allows for exact comparisons between data points.
25. What is a stacked bar chart used for?
   ○ Answer: Representing parts of a whole and comparing totals across categories.
26. How do bar charts and line graphs differ?
   ○ Answer: Bar charts compare categories, while line graphs show trends over time.
27. What should always be included when presenting data?
   ○ Answer: A title, labels, source of data, and reference to the number of observations.
28. What is the importance of data presentation in statistics?
   ○ Answer: It helps communicate complex data in a simple and understandable format.
29. What is a common mistake when presenting data visually?
   ○ Answer: Overcomplicating the presentation, or misrepresenting the data in a way that misleads the audience.
30. Why is it important to use the correct type of graph?
   ○ Answer: The right graph ensures the data is presented in the clearest and most effective way for understanding.

MCQs:

1. What does the term 'data' refer to?
   ○ a) A singular piece of information
   ○ b) A collection of numbers or information
   ○ c) A type of graph
   ○ d) None of the above
   ○ Answer: b) A collection of numbers or information
2. Which of the following is an example of qualitative data?
   ○ a) Age
   ○ b) Temperature
   ○ c) Gender
   ○ d) Height
   ○ Answer: c) Gender
3. What type of data is referred to as 'interval data'?
   ○ a) Data with no order and no true zero
   ○ b) Data with meaningful intervals and no true zero
   ○ c) Data with a true zero
   ○ d) Data in categories without numerical values
   ○ Answer: b) Data with meaningful intervals and no true zero
4. Which type of data can only take specific distinct values (e.g., number of accidents)?
   ○ a) Continuous data
   ○ b) Discrete data
   ○ c) Interval data
   ○ d) Ratio data
   ○ Answer: b) Discrete data
5. What is the primary purpose of a bar chart?
   ○ a) To show trends over time
   ○ b) To compare categories of data
   ○ c) To display percentages of a whole
   ○ d) To represent continuous data
   ○ Answer: b) To compare categories of data
6. Which of the following is an example of ratio data?
   ○ a) Temperature in Celsius
   ○ b) Height of a mountain
   ○ c) Level of satisfaction
   ○ d) Education level
   ○ Answer: b) Height of a mountain
7. Which type of chart is used to compare parts of a whole?
   ○ a) Line graph
   ○ b) Pie chart
   ○ c) Bar chart
   ○ d) Frequency table
   ○ Answer: b) Pie chart
8. In data presentation, what is the importance of including labels?
   ○ a) To make the graphic more attractive
   ○ b) To ensure clarity and understanding
   ○ c) To make the graph look more professional
   ○ d) None of the above
   ○ Answer: b) To ensure clarity and understanding
9. Which of the following best represents continuous data?
   ○ a) Gender
   ○ b) Number of children
   ○ c) Temperature
   ○ d) Satisfaction rating
   ○ Answer: c) Temperature
10. What is the key difference between discrete and continuous data?
   ○ a) Discrete data can only take integer values, while continuous data can take any value within a range.
   ○ b) Continuous data is always text-based, while discrete data is numerical.
   ○ c) Discrete data is based on percentages, and continuous data is based on categories.
   ○ d) There is no difference between the two.
   ○ Answer: a) Discrete data can only take integer values, while continuous data can take any value within a range.
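
The frequency and relative frequency distributions mentioned in this section can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `frequency_table` and the accident counts are hypothetical, chosen only to show how a count per value and its share of the total are obtained.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, relative frequency) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, count / total) for value, count in sorted(counts.items())]


# Hypothetical discrete data: number of accidents recorded on each of 10 days.
accidents = [0, 1, 1, 2, 0, 1, 3, 0, 1, 2]

table = frequency_table(accidents)
for value, count, rel in table:
    # Relative frequency is the count divided by the number of observations.
    print(f"{value} accidents: frequency={count}, relative frequency={rel:.0%}")
```

The relative frequencies always sum to 1 (100%), which is why the same table can feed either a bar chart (counts) or a pie chart (shares of the whole).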
Measures of Central Tendency:

● Central Tendency refers to the central or typical value in a dataset.
● Purpose: To find a single value that represents the entire dataset.

Types of Measures:

1. Mean (Arithmetic Mean):
   ○ Formula: $\text{Mean} = \frac{\sum X}{n}$
   ○ Sum of all values divided by the number of values.
   ○ Best for symmetric distributions without outliers.
2. Median:
   ○ The middle value when the data is sorted in ascending or descending order.
   ○ For an odd number of values, the median is the value at position $\frac{n+1}{2}$ in the sorted data.
   ○ For an even number of values, take the average of the middle two numbers.
   ○ Resistant to extreme values (outliers).
3. Mode:
   ○ The most frequent value in a dataset.
   ○ Can be:
      ■ Unimodal: One mode.
      ■ Bimodal: Two modes.
      ■ Polymodal: More than two modes.

Measures of Dispersion (Spread/Variability):

● Purpose: To describe the spread of the data or how far the data points are from the central tendency.

Types of Measures:

1. Variance:
   ○ Measures how far the values are from the mean, using squared deviations.
   ○ Formula (Sample): $s^2 = \frac{\sum (X - \overline{X})^2}{n-1}$
   ○ Formula (Population): $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
2. Standard Deviation (SD):
   ○ The square root of the variance.
   ○ Formula (Sample): $s = \sqrt{s^2}$
   ○ Formula (Population): $\sigma = \sqrt{\sigma^2}$
   ○ Describes how spread out the values are.
3. Coefficient of Variation (CV):
   ○ The ratio of the standard deviation to the mean.
   ○ Formula: $CV = \frac{\sigma}{\mu}$ for population or $CV = \frac{s}{\overline{X}}$ for sample.
   ○ A measure of relative variability.
4. Range:
   ○ The difference between the highest and lowest values in a dataset.
   ○ Simple but ignores other values.
5. Inter-Quartile Range (IQR):
   ○ The range between the 1st quartile (Q1) and the 3rd quartile (Q3).
   ○ Formula: $IQR = Q3 - Q1$
   ○ Preferred measure of spread in skewed distributions.

Quartiles:

● Q1 (25th percentile): 1st quartile.
● Q2 (50th percentile): Median.
● Q3 (75th percentile): 3rd quartile.

30 Questions and Answers:

1. What is the purpose of measures of central tendency?
   ○ Answer: To find a single value that represents the center of the dataset.
2. Which measure of central tendency is most affected by extreme values (outliers)?
   ○ Answer: The mean.
3. How do you calculate the median in an odd-numbered dataset?
   ○ Answer: Sort the data and select the middle value.
4. What is the mode of a dataset?
   ○ Answer: The most frequent value in the dataset.
5. Which of the following is a measure of dispersion?
   ○ a) Mean
   ○ b) Median
   ○ c) Mode
   ○ d) Standard Deviation
   ○ Answer: d) Standard Deviation
6. Which measure of central tendency is not influenced by extreme values?
   ○ Answer: The median.
7. What is the formula for calculating the mean?
   ○ Answer: $\text{Mean} = \frac{\sum X}{n}$
8. What is the standard deviation a measure of?
   ○ Answer: The spread or dispersion of a dataset.
9. Which of the following describes the range of a dataset?
   ○ a) Difference between the highest and lowest values
   ○ b) The square root of the variance
   ○ c) The middle value
   ○ d) The most frequent value
   ○ Answer: a) Difference between the highest and lowest values
10. What does the coefficient of variation measure?
   ○ Answer: The relative variability of a dataset.
11. What is the formula for calculating variance (sample)?
   ○ Answer: $s^2 = \frac{\sum (X - \overline{X})^2}{n-1}$
12. If a dataset has a positive skew, which of the following is true?
   ○ a) Mean > Median > Mode
   ○ b) Mean < Median < Mode
   ○ c) Mean = Median = Mode
   ○ d) None of the above
   ○ Answer: a) Mean > Median > Mode
13. How do you calculate the median in an even-numbered dataset?
   ○ Answer: Take the average of the two middle values after sorting the data.
14. What is the difference between sample variance and population variance?
   ○ Answer: Sample variance divides by $n-1$, whereas population variance divides by $N$.
15. What is the purpose of inter-quartile range (IQR)?
   ○ Answer: To measure the spread of data between the 25th percentile (Q1) and the 75th percentile (Q3).
16. What is a bimodal distribution?
   ○ Answer: A distribution with two modes.
17. How do you calculate the mode in a dataset?
   ○ Answer: Identify the value that appears most frequently.
18. In a normal distribution, what is the relationship between mean, median, and mode?
   ○ Answer: Mean = Median = Mode.
19. What is the primary use of the standard deviation?
   ○ Answer: To measure the spread of data around the mean.
20. What is the formula for calculating the coefficient of variation (CV)?
   ○ Answer: $CV = \frac{s}{\overline{X}}$ for sample.
21. How does the median differ from the mean in terms of sensitivity to outliers?
   ○ Answer: The median is largely unaffected by outliers, while the mean is pulled toward them.
22. What is the definition of variance?
   ○ Answer: The average of the squared deviations from the mean.
23. What does a higher coefficient of variation indicate?
   ○ Answer: More relative variability in the dataset.
24. What is the significance of the inter-quartile range (IQR) in skewed distributions?
   ○ Answer: IQR is preferred over standard deviation because it is not influenced by extreme values.
26. What is the mean of the dataset: 10, 8, 14, 15, 7, 3, 3, 8, 12, 10, 9?
   ○ Answer: 9 (the 11 values sum to 99, and 99 / 11 = 9).
27. What does a high standard deviation indicate?
   ○ Answer: The data is more spread out from the mean.
28. What is the first quartile (Q1) of a dataset?
   ○ Answer: The value at the 25th percentile, or the median of the lower half of the data.
29. How do you calculate the variance of a sample?
   ○ Answer: Subtract the sample mean from each data point, square the result, sum them up, and divide by $n-1$.
30. Which of the following is true for the range?
   ○ a) It considers all data points
   ○ b) It only considers the highest and lowest values
   ○ c) It is the average of the data
   ○ d) It cannot be calculated for skewed data
   ○ Answer: b) It only considers the highest and lowest values
31. What happens when the data is skewed negatively?
   ○ Answer: Mean < Median < Mode

MCQs:

1. What does the measure of central tendency help to determine?
   ○ a) The spread of data
   ○ b) The center of the data distribution
   ○ c) The highest value in the dataset
   ○ d) The difference between the highest and lowest values
   ○ Answer: b) The center of the data distribution
2. Which measure of central tendency is most affected by outliers?
   ○ a) Median
   ○ b) Mode
   ○ c) Mean
   ○ d) Range
   ○ Answer: c) Mean
3. What is the median of the dataset: 10, 15, 20, 25, 30?
   ○ a) 10
   ○ b) 20
   ○ c) 25
   ○ d) 15
   ○ Answer: b) 20
4. Which of the following is true about the mode?
   ○ a) It is the most frequent value in the dataset
   ○ b) It is always the middle value
   ○ c) It is the sum of all values divided by the number of values
   ○ d) It cannot be calculated for categorical data
   ○ Answer: a) It is the most frequent value in the dataset
5. What is the relationship between the mean, median, and mode in a normal distribution?
   ○ a) Mean = Median = Mode
   ○ b) Mean > Median > Mode
   ○ c) Mean < Median < Mode
   ○ d) There is no relationship
   ○ Answer: a) Mean = Median = Mode
6. Which of the following is a measure of dispersion?
   ○ a) Mean
   ○ b) Median
   ○ c) Mode
   ○ d) Standard Deviation
   ○ Answer: d) Standard Deviation
7. What is the formula for calculating variance (sample)?
   ○ a) $s^2 = \frac{\sum (X - \overline{X})^2}{n}$
   ○ b) $s^2 = \frac{\sum (X - \overline{X})^2}{n-1}$
   ○ c) $s^2 = \frac{\sum X^2}{n}$
   ○ d) $s^2 = \frac{\sum X}{n-1}$
   ○ Answer: b) $s^2 = \frac{\sum (X - \overline{X})^2}{n-1}$
8. What does the standard deviation measure?
   ○ a) The central value of a dataset
   ○ b) The average deviation from the mean
   ○ c) The difference between the highest and lowest values
   ○ d) The total sum of data values
   ○ Answer: b) The average deviation from the mean
9. What does the coefficient of variation (CV) represent?
   ○ a) The variance relative to the mean
   ○ b) The standard deviation as a percentage of the mean
   ○ c) The range divided by the mean
   ○ d) The variance divided by the sample size
   ○ Answer: b) The standard deviation as a percentage of the mean
10. Which of the following is true for the interquartile range (IQR)?
   ○ a) It measures the spread between the smallest and largest values
   ○ b) It measures the range between the 1st and 3rd quartiles
   ○ c) It is calculated by subtracting the mean from the median
   ○ d) It is used when data is normally distributed
   ○ Answer: b) It measures the range between the 1st and 3rd quartiles
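
As a worked illustration of the measures in this section, the sketch below applies Python's standard `statistics` module to the 11-value dataset from the questions (10, 8, 14, 15, 7, 3, 3, 8, 12, 10, 9). The variable names are illustrative only. One assumption to note: quartiles are computed with the module's default "exclusive" method, and other textbooks or software may use a slightly different quartile convention.

```python
import statistics

data = [10, 8, 14, 15, 7, 3, 3, 8, 12, 10, 9]

# Measures of central tendency
mean = statistics.mean(data)          # sum of values / number of values
median = statistics.median(data)      # middle value of the sorted data
modes = statistics.multimode(data)    # every value tied for most frequent

# Measures of dispersion
sample_var = statistics.variance(data)       # divides by n - 1
population_var = statistics.pvariance(data)  # divides by N
sample_sd = statistics.stdev(data)           # square root of the sample variance
cv = sample_sd / mean                        # relative variability (sample CV)
data_range = max(data) - min(data)

# Quartiles (default 'exclusive' method) and the inter-quartile range
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"mean={mean}, median={median}, modes={sorted(modes)}")
print(f"sample variance={sample_var}, population variance={population_var:.3f}")
print(f"sample SD={sample_sd:.3f}, CV={cv:.1%}, range={data_range}")
print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
```

Here three values (3, 8, and 10) each appear twice, so the dataset has more than two modes, and the sample variance (squared deviations summing to 150, divided by 10) is larger than the population variance (150 divided by 11).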
Simplified Notes on Hypothesis Testing:

1. Hypothesis Definition: A hypothesis is a claim or statement that is tested through statistical analysis.
2. Types of Hypotheses:
   ○ Research Hypothesis (RH): A statement about a population parameter that we want to test.
   ○ Statistical Hypothesis: Includes two types:
      ■ Null Hypothesis (Ho): A claim that there is no effect or no difference.
      ■ Alternative Hypothesis (Ha): A claim that there is an effect or difference.
3. Errors in Hypothesis Testing:
   ○ Type I Error (α): Rejecting the null hypothesis when it is true.
   ○ Type II Error (β): Failing to reject the null hypothesis when it is false.
4. Rejection Regions:
   ○ Two-tailed test: Rejection regions in both tails of the distribution curve.
   ○ Left-tailed test: Rejection region in the left tail.
   ○ Right-tailed test: Rejection region in the right tail.
5. Test Statistics:
   ○ For hypothesis tests about the population mean (μ), if the population standard deviation (σ) is unknown, we use the t-distribution.
   ○ For known σ and n > 30, we use the z-distribution.
6. P-Value: The probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.
7. Conclusion: If the p-value is smaller than the significance level (α), we reject the null hypothesis. Otherwise, we fail to reject it.

30 Questions and Answers on Hypothesis Testing:

1. What is a hypothesis?
   ○ A hypothesis is a statement or claim that is tested through statistical methods.
2. What is the difference between a Research Question (RQ) and a Research Hypothesis (RH)?
   ○ An RQ is phrased as a question and does not imply a specific claim. An RH is a statement about a population parameter.
3. What is the null hypothesis (Ho)?
   ○ It is the claim that there is no effect or difference in the population parameter.
4. What is the alternative hypothesis (Ha)?
   ○ It is the claim that there is an effect or difference, opposing the null hypothesis.
5. What does a Type I error refer to?
   ○ It refers to rejecting the null hypothesis when it is actually true.
6. What does a Type II error refer to?
   ○ It refers to failing to reject the null hypothesis when it is actually false.
7. What is a one-tailed test?
   ○ It is a hypothesis test where the rejection region is on one side of the distribution curve (left or right).
8. What is a two-tailed test?
   ○ It is a hypothesis test where the rejection regions are on both sides of the distribution curve.
9. What is the significance level (α)?
   ○ It is the probability of making a Type I error. Common values are 0.05 or 0.01.
10. What is the p-value in hypothesis testing?
   ○ The p-value is the probability of observing data at least as extreme as the observed data, assuming the null hypothesis is true.
11. What is the decision rule in hypothesis testing?
   ○ If the p-value is less than the significance level (α), reject the null hypothesis. Otherwise, fail to reject it.
12. When should you use a t-distribution?
   ○ Use a t-distribution when the population standard deviation is unknown and the sample size is small (n < 30).
13. When should you use a z-distribution?
   ○ Use a z-distribution when the population standard deviation is known and the sample size is large (n > 30).
14. What is the formula for the test statistic in a t-test?
   ○ $t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}}$
15. What is the degrees of freedom (df) for a t-test?
   ○ $df = n - 1$, where $n$ is the sample size.
16. What happens if the test statistic falls in the rejection region?
   ○ If the test statistic falls in the rejection region, we reject the null hypothesis.
17. What is the critical value in hypothesis testing?
   ○ The critical value is the value that separates the rejection region from the non-rejection region. It depends on the significance level and the degrees of freedom.
18. What does it mean to reject the null hypothesis?
   ○ Rejecting the null hypothesis means there is sufficient evidence to support the alternative hypothesis.
19. What does it mean to fail to reject the null hypothesis?
   ○ Failing to reject the null hypothesis means there is insufficient evidence to support the alternative hypothesis.
20. What is the relationship between the confidence level and significance level?
   ○ The confidence level is $1 - \alpha$, where $\alpha$ is the significance level.
21. What is the power of a test?
   ○ The power of a test is the probability of correctly rejecting the null hypothesis when it is false.
22. What does a p-value of 0.03 mean?
   ○ A p-value of 0.03 means there is a 3% chance of observing the data, or something more extreme, if the null hypothesis is true.
23. What is the difference between a left-tailed and a right-tailed test?
   ○ A left-tailed test checks if the parameter is less than a value, while a right-tailed test checks if the parameter is greater than a value.
24. What is a non-directional hypothesis?
   ○ A non-directional hypothesis, or two-tailed test, states that a parameter is different from a specific value but does not specify whether it is higher or lower.
25. What is the difference between a population and a sample in hypothesis testing?
   ○ A population is the entire group being studied, while a sample is a subset of the population.
26. What is the formula for calculating the test statistic in a z-test?
   ○ $z = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{n}}}$
27. What is a critical region?
   ○ A critical region is the range of values for which the null hypothesis is rejected.
28. What happens if you increase the sample size?
   ○ Increasing the sample size generally increases the power of the test and makes it easier to detect a true effect.
29. How do you calculate the degrees of freedom for a t-test?
   ○ Degrees of freedom (df) for a t-test is $n - 1$, where $n$ is the sample size.
30. What is the purpose of hypothesis testing?
   ○ The purpose of hypothesis testing is to make inferences about a population based on sample data and determine whether there is enough evidence to support a claim.

MCQs:

1. What does the null hypothesis (Ho) represent?

A) The claim that there is a significant difference in the population
B) The claim that there is no significant difference in the population
C) The sample mean
D) The sample standard deviation

Answer: B) The claim that there is no significant difference in the population

2. Which of the following errors occurs when the null hypothesis is rejected when it is true?

A) Type I Error
B) Type II Error
C) Alpha Error
D) Beta Error

Answer: A) Type I Error

3. If the p-value is smaller than the significance level (α), what is the correct decision?

A) Fail to reject the null hypothesis
B) Reject the null hypothesis
C) Accept the alternative hypothesis
D) Increase the sample size

Answer: B) Reject the null hypothesis

4. What does a Type II error refer to?

A) Rejecting the null hypothesis when it is true
B) Accepting the null hypothesis when it is false
C) Failing to reject the null hypothesis when it is false
D) Making the correct decision

Answer: C) Failing to reject the null hypothesis when it is false

5. What is the degrees of freedom (df) for a t-test when the sample size (n) is 20?

A) 20
B) 19
C) 18
D) 21

Answer: B) 19
6. Which type of test is used when you are testing if a parameter is greater than a value?

A) Two-tailed test
B) Left-tailed test
C) Right-tailed test
D) Non-directional test

Answer: C) Right-tailed test

7. In a two-tailed test, if the test statistic falls in the rejection region, what should you do?

A) Fail to reject the null hypothesis
B) Reject the null hypothesis
C) Increase the sample size
D) Change the significance level

Answer: B) Reject the null hypothesis

8. Which of the following distributions is used for hypothesis testing when the population standard deviation is unknown?

A) z-distribution
B) t-distribution
C) Normal distribution
D) F-distribution

Answer: B) t-distribution

9. What does a two-tailed test check for?

A) A difference in only one direction (greater or less)
B) A difference in both directions (greater or less)
C) A difference from a mean of 0
D) Whether the data are normally distributed

Answer: B) A difference in both directions (greater or less)

10. Which of the following is true about the confidence level and significance level (α)?

A) Confidence level = α
B) Confidence level = 1 − α
C) Significance level is always 1
D) Confidence level increases as α increases

Answer: B) Confidence level = 1 − α
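
The test statistics and decision rule from this section can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: the function names `z_test` and `t_statistic`, the sample values, and the hypothesized means are all hypothetical. The standard library has no t-distribution, so for the t-test the sketch only computes the statistic and degrees of freedom and compares against a critical value taken from a printed t-table (2.776 for df = 4, two-tailed, α = 0.05).

```python
import math
from statistics import NormalDist, mean, stdev


def z_test(sample_mean, mu0, sigma, n):
    """Two-tailed z-test for a mean when the population SD (sigma) is known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # p-value: probability of a result at least this extreme in either tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def t_statistic(sample, mu0):
    """t statistic and degrees of freedom when the population SD is unknown."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t, n - 1


alpha = 0.05

# Hypothetical z-test: sample mean 52 from n = 36, known sigma = 6, Ho: mu = 50.
z, p_value = z_test(sample_mean=52, mu0=50, sigma=6, n=36)
reject_z = p_value < alpha
print(f"z = {z:.2f}, p = {p_value:.4f}, reject Ho: {reject_z}")

# Hypothetical t-test: five measurements, Ho: mu = 10.
t, df = t_statistic([12, 15, 11, 14, 13], mu0=10)
t_critical = 2.776  # assumption: read from a t-table for df = 4, two-tailed, alpha = 0.05
reject_t = abs(t) > t_critical
print(f"t = {t:.3f}, df = {df}, reject Ho: {reject_t}")
```

Both routes express the same decision rule: reject Ho when the p-value is below α, or equivalently when the test statistic falls in the rejection region beyond the critical value.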
Simplified Notes on Normal Distribution:

1. Normal Distribution Basics:
   ○ A continuous probability distribution often referred to as the Gaussian distribution.
   ○ It is bell-shaped and symmetric around the mean (μ).
   ○ Parameters:
      ■ Mean (μ): central location of the curve.
      ■ Standard deviation (σ): determines the spread of the distribution.
2. Standard Normal Distribution:
   ○ When the mean (μ) is 0 and standard deviation (σ) is 1.
   ○ In this distribution, Z-scores are used, which represent the number of standard deviations a value is from the mean.
3. Z-Scores:
   ○ Z = (X - μ) / σ
   ○ The Z-score helps in standardizing a normal distribution to the standard normal distribution.
4. Probability under the Normal Curve:
   ○ The area under the curve represents probability. For the standard normal distribution, the total area is 1.
   ○ To calculate probabilities, you can use the Z-table, which gives the cumulative probability for Z-scores.
5. Converting to Standard Normal Distribution:
   ○ When a random variable (X) follows a normal distribution with a mean (μ) and standard deviation (σ), it can be converted to the standard normal distribution using the Z-score formula.
6. Example of Calculating Probability:
   ○ If X is normally distributed with a mean of 24 and a standard deviation of 3, the probability for $X > 19$ can be calculated by converting the X value to Z: Z = (19 - 24) / 3 ≈ -1.67, so P(X > 19) = P(Z > -1.67) ≈ 0.9525.
7. Z-Table:
   ○ A Z-table provides the area under the standard normal curve for various Z-values.
   ○ You use it to find the cumulative probability up to a certain Z-score.
8. Key Areas Under the Curve:
   ○ Area between two Z-scores: To find the area between two Z-scores, calculate the area under the curve for each Z-score and subtract the smaller area from the larger area.
   ○ Right-tail and Left-tail areas: Z-tables can also be used to find the probability for the right or left tails of the distribution.
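
The Z-table lookups described in these notes can be reproduced with `statistics.NormalDist` from Python's standard library, whose `cdf` method returns the cumulative probability (area to the left). The sketch below is illustrative only: it reuses the example of X with mean 24 and standard deviation 3, and the variable names are assumptions. Results differ slightly from a printed Z-table because no rounding of Z to two decimals takes place.

```python
from statistics import NormalDist

mu, sigma = 24, 3
x = 19

# Standardize: Z = (X - mu) / sigma
z = (x - mu) / sigma

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Right-tail area: P(X > 19) = 1 - (cumulative area up to Z)
p_greater = 1 - standard_normal.cdf(z)

# Same probability without standardizing by hand
p_greater_direct = 1 - NormalDist(mu, sigma).cdf(x)

# Area between two Z-scores: subtract the smaller cumulative area from the larger
p_within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

# Reverse lookup: the X value with cumulative probability 0.975 (Z close to 1.96)
x_975 = NormalDist(mu, sigma).inv_cdf(0.975)

print(f"Z = {z:.2f}")
print(f"P(X > 19) = {p_greater:.4f}")
print(f"P(-1 < Z < 1) = {p_within_1_sd:.4f}")
print(f"X at cumulative probability 0.975 = {x_975:.2f}")
```

The reverse lookup is the Z-score formula rearranged, X = μ + Zσ, which is how a raw score is recovered from a given probability.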
30 Questions and Answers on Normal Distribution:

1. What is the normal distribution?
   ○ A continuous probability distribution that is symmetric and bell-shaped, with mean (μ) and standard deviation (σ).
2. What does a Z-score represent?
   ○ A Z-score represents how many standard deviations a value (X) is from the mean (μ).
3. What is the formula for a Z-score?
   ○ $Z = \frac{X - \mu}{\sigma}$
4. What is the significance of the area under the normal curve?
   ○ The area under the curve represents probability, and for the entire normal distribution, the area equals 1.
5. How do you convert a raw score to a Z-score?
   ○ Subtract the mean (μ) from the raw score (X) and divide by the standard deviation (σ).
6. What is the standard normal distribution?
   ○ It is a normal distribution with a mean of 0 and a standard deviation of 1.
7. What does the Z-table show?
   ○ The Z-table shows the cumulative probability for each Z-score in the standard normal distribution.
8. How do you use the Z-table to find probabilities?
   ○ Find the Z-score in the Z-table and use the corresponding area under the curve to find the probability.
9. What is the probability for a Z-score of 0 in the standard normal distribution?
   ○ The cumulative probability (area to the left) for a Z-score of 0 is 0.5, as it is the center of the curve.
10. How do you find the probability between two Z-scores?
   ○ Calculate the cumulative area for each Z-score from the Z-table and subtract the smaller area from the larger one.
11. What is the probability of a Z-score greater than 1.64?
   ○ Use the Z-table to find the area for Z = 1.64 (about 0.9495) and subtract it from 1 to find the probability to the right (about 0.0505).
12. What does it mean if the Z-score is negative?
   ○ A negative Z-score means the value is below the mean.
13. What does it mean if the Z-score is positive?
   ○ A positive Z-score means the value is above the mean.
14. How do you calculate the probability for values less than a given score in a normal distribution?
   ○ Use the Z-table to find the area corresponding to the Z-score for the given value.
15. What is the probability of a value being within 1 standard deviation of the mean in a normal distribution?
   ○ Approximately 68% of the values lie within 1 standard deviation of the mean.
16. What is the probability of a value being within 2 standard deviations of the mean in a normal distribution?
   ○ Approximately 95% of the values lie within 2 standard deviations of the mean.
17. What is the probability of a value being within 3 standard deviations of the mean in a normal distribution?
   ○ Approximately 99.7% of the values lie within 3 standard deviations of the mean.
18. How do you calculate the probability for a value greater than a given value in a normal distribution?
   ○ Use the Z-score formula to standardize the value and find the area to the right of the Z-score using the Z-table.
19. How do you find the value corresponding to a given probability in a normal distribution?
   ○ Use the Z-table to find the Z-score that corresponds to the cumulative probability and then reverse the Z-score formula to find the raw score.
20. What does a Z-score of -2 represent?
   ○ A Z-score of -2 represents a value that is 2 standard deviations below the mean.
21. How do you calculate the probability that a random variable is between two values in a normal distribution?
   ○ Find the Z-scores for both values, and use the Z-table to find the cumulative probabilities and subtract them.
22. What is the area to the left of Z = 1.96?
   ○ The area to the left of Z = 1.96 is approximately 0.9750.
23. How do you calculate the area under the normal curve for values greater than 2?
   ○ Find the Z-score for 2, use the Z-table to find the area, and subtract it from 1.
24. What is the probability for a Z-score between -1 and 1?
   ○ The probability between Z = -1 and Z = 1 is approximately 0.6826.
25. How do you find the probability for a Z-score greater than -1.5?
   ○ Find the cumulative probability for Z = -1.5 and subtract it from 1.
26. What is the probability of a value falling between Z = -2 and Z = 2?
    ○ The probability is approximately 0.9544.
27. What does the area under the normal curve represent?

    ○ It represents the probability that a value falls within a given range.
28. How do you find the area under the normal curve for Z less than 0?

    ○ Use the Z-table to find the cumulative probability for Z = 0, which is 0.5.
29. How can the Z-score be used in real-world applications?

    ○ The Z-score helps in comparing scores from different distributions and finding probabilities in normal distributions.
30. What does it mean if the area under the normal curve over a range is 0.95?

    ○ It means that 95% of the values lie within that range.


10 Multiple-Choice Questions (MCQs) on Normal Distribution:

1. What does the Z-score represent in the context of a normal distribution?
   A) The probability of an event occurring
   B) The number of standard deviations a value is from the mean
   C) The total area under the curve
   D) The mean of the distribution

   Answer: B) The number of standard deviations a value is from the mean

2. In a normal distribution, what percentage of the data lies within 1 standard deviation of the mean?
   A) 68%
   B) 95%
   C) 99.7%
   D) 50%

   Answer: A) 68%

3. What is the mean of the standard normal distribution?
   A) 1
   B) 0
   C) μ
   D) σ

   Answer: B) 0

4. What is the standard deviation of the standard normal distribution?
   A) 1
   B) 0
   C) μ
   D) σ

   Answer: A) 1

5. Which of the following is true about the total area under the standard normal curve?
   A) It is always 0
   B) It is 0.5
   C) It is 1
   D) It can vary depending on the distribution

   Answer: C) It is 1

6. How do you calculate a Z-score?
   A) Z = X - μ
   B) Z = (X - μ) / σ
   C) Z = (σ - X) / μ
   D) Z = μ / σ

   Answer: B) Z = (X - μ) / σ

7. What does a negative Z-score indicate?
   A) The value is above the mean
   B) The value is below the mean
   C) The value is at the mean
   D) The value is impossible

   Answer: B) The value is below the mean

8. What is the cumulative probability for Z = 1.96 in the standard normal distribution?
   A) 0.9750
   B) 0.9500
   C) 0.5
   D) 0.0250

   Answer: A) 0.9750

9. What is the approximate probability of a value being greater than a Z-score of 2?
   A) 0.025
   B) 0.975
   C) 0.5
   D) 0.05

   Answer: A) 0.025

10. Which of the following is the correct area under the curve to the left of Z = -1.96?
   A) 0.025
   B) 0.5
   C) 0.975
   D) 0.025

   Answer: A) 0.025
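
The Z-table lookups used throughout this section can also be reproduced in code. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the example values (a score of 130 with μ = 100 and σ = 15) and the helper names are made up for illustration and do not come from these notes.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def prob_between(z_low: float, z_high: float) -> float:
    """P(z_low < Z < z_high): the difference of two cumulative areas."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


# Hypothetical raw score, mean, and standard deviation.
z = z_score(130, mu=100, sigma=15)

area_left = std_normal.cdf(1.96)        # cumulative area left of Z = 1.96
area_right = 1 - std_normal.cdf(2.0)    # area to the right of Z = 2
within_one = prob_between(-1.0, 1.0)    # within 1 standard deviation
within_two = prob_between(-2.0, 2.0)    # within 2 standard deviations

# Reverse lookup: Z for a cumulative probability, then back to a raw score.
z_975 = std_normal.inv_cdf(0.975)
raw_score = 100 + z_975 * 15
```

Printed Z-tables round to four decimals, so the table values 0.6826 and 0.9544 quoted above can differ in the last digit from what the code computes.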
Confidence Intervals

1. Estimation

● Definition: Estimation involves assigning values to a population parameter based on sample statistics. It can be done through:
    ○ Point Estimate: A single value from the sample used to estimate a population parameter.
    ○ Interval Estimate: A range of values (confidence interval) likely to contain the population parameter.

2. Confidence Interval (CI)

● Confidence Interval (CI): A range around the point estimate that gives us a confidence level about where the population parameter lies. The confidence level is often 95%, meaning we are 95% confident that the true population parameter is within the interval.
    ○ Formula: $\text{Point Estimate} \pm \text{Margin of Error}$
    ○ Margin of Error (ME): This is how much you add or subtract from the point estimate, calculated using: $ME = z \times \frac{\sigma}{\sqrt{n}}$ Where:
        ■ $z$ is the z-value from the normal distribution based on confidence level.
        ■ $\sigma$ is the population standard deviation.
        ■ $n$ is the sample size.

3. Estimating Population Mean (μ)

● Known Standard Deviation (σ):
    ○ When the population standard deviation is known, use the z-distribution to compute the CI for the population mean (μ).
    ○ Formula: $\mu = \bar{x} \pm z \times \frac{\sigma}{\sqrt{n}}$
● Unknown Standard Deviation (σ):
    ○ When the population standard deviation is unknown, use the t-distribution to compute the CI.
    ○ Formula: $\mu = \bar{x} \pm t \times \frac{s}{\sqrt{n}}$ Where $t$ is the value from the t-distribution, and $s$ is the sample standard deviation.

4. Estimating Population Proportion (p)

● Formula: $p = \hat{p} \pm z \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ Where $\hat{p}$ is the sample proportion.

5. Example Applications

● In practice, you apply these formulas to estimate parameters such as the average price of textbooks, average insurance premium, or proportions in surveys.
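
The z-based interval formulas in this section can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the sample numbers (a mean of 50 with σ = 10 and n = 100, and a proportion of 0.40 with n = 200) are assumptions, not values from these notes. The t-based interval is left out because the standard library has no t-distribution; a t-table or a statistics package would supply the critical value.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical z-value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_ci_known_sigma(x_bar, sigma, n, confidence=0.95):
    """CI for the mean when sigma is known: x_bar +/- z * sigma / sqrt(n)."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


def proportion_ci(p_hat, n, confidence=0.95):
    """CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical inputs chosen only to exercise the formulas.
low, high = mean_ci_known_sigma(x_bar=50.0, sigma=10.0, n=100)
p_low, p_high = proportion_ci(p_hat=0.40, n=200)
```

With these made-up inputs the mean interval is roughly 48.04 to 51.96 and the proportion interval is roughly 0.332 to 0.468.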


10 Questions and Answers

1. What is a point estimate?

    ○ A point estimate is a single value calculated from the sample data that is used to estimate a population parameter.
2. What is the difference between point estimate and interval estimate?

    ○ A point estimate provides a single value, while an interval estimate provides a range within which the population parameter is likely to fall.
3. How do you calculate the margin of error for a confidence interval for the mean?

    ○ The margin of error is calculated as: $ME = z \times \frac{\sigma}{\sqrt{n}}$
4. What does the confidence level represent in a confidence interval?

    ○ The confidence level is the long-run proportion of intervals, built by the same method from repeated samples, that contain the true population parameter.
5. What is the formula for a confidence interval for a population mean when the standard deviation is known?

    ○ The formula is: $\mu = \bar{x} \pm z \times \frac{\sigma}{\sqrt{n}}$
6. What is the formula for a confidence interval for a population mean when the standard deviation is unknown?

    ○ The formula is: $\mu = \bar{x} \pm t \times \frac{s}{\sqrt{n}}$
7. How do you calculate the confidence interval for a population proportion?

    ○ The formula is: $p = \hat{p} \pm z \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
8. What is the t-distribution used for?

    ○ The t-distribution is used when the population standard deviation is unknown; it matters most when the sample size is small (n < 30).
9. How do you calculate the confidence interval for the mean when sample size is large?

    ○ Use the z-distribution if the sample size is large and the population standard deviation is known.
10. How do you interpret a 95% confidence interval for the population mean?

    ○ If you take many samples and construct 95% confidence intervals for each, about 95% of these intervals will contain the true population mean.
10 Multiple Choice Questions (MCQs)

1. What does a 95% confidence interval mean?

    ○ A) 95% of the population is within the interval.
    ○ B) 95% of the time, the interval will contain the population parameter.
    ○ C) 95% of the sample data is within the interval.
    ○ D) None of the above.
    Answer: B

2. Which formula is used to calculate the margin of error for the mean when the population standard deviation is known?

    ○ A) $ME = z \times \frac{s}{\sqrt{n}}$
    ○ B) $ME = z \times \frac{\sigma}{\sqrt{n}}$
    ○ C) $ME = t \times \frac{s}{\sqrt{n}}$
    ○ D) $ME = t \times \frac{\sigma}{\sqrt{n}}$
    Answer: B

3. Which distribution is used for constructing confidence intervals when the population standard deviation is unknown?

    ○ A) Normal Distribution
    ○ B) t-Distribution
    ○ C) Binomial Distribution
    ○ D) Poisson Distribution
    Answer: B

4. What is the point estimate for the population mean based on a sample of size 30 with a sample mean of 50?

    ○ A) 50
    ○ B) 30
    ○ C) 0
    ○ D) Not enough information
    Answer: A

5. What does a confidence interval with a 99% confidence level mean?

    ○ A) We are 99% sure the sample mean is correct.
    ○ B) 99% of the sample data lies within the interval.
    ○ C) We are 99% confident the population parameter lies within the interval.
    ○ D) None of the above.
    Answer: C

6. Which of the following does not affect the width of a confidence interval?

    ○ A) Sample size
    ○ B) Confidence level
    ○ C) Population mean
    ○ D) Sample standard deviation
    Answer: C

7. How do you calculate the confidence interval for a population proportion?

    ○ A) Use the normal distribution
    ○ B) Use the t-distribution
    ○ C) Use the binomial distribution
    ○ D) Use the chi-square distribution
    Answer: A

8. If the sample size increases, the confidence interval for the population mean becomes:

    ○ A) Wider
    ○ B) Narrower
    ○ C) The same
    ○ D) Unpredictable
    Answer: B

9. Which of the following is required to construct a confidence interval for the population mean?

    ○ A) Sample mean
    ○ B) Sample standard deviation
    ○ C) Sample size
    ○ D) All of the above
    Answer: D

10. What does a 95% confidence interval imply about repeated sampling?

    ○ A) 95% of the sample data will fall within the interval.
    ○ B) 95% of all possible samples will give the same confidence interval.
    ○ C) 95% of the confidence intervals from repeated sampling will contain the population parameter.
    ○ D) None of the above.
    Answer: C
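
The repeated-sampling interpretation of a 95% confidence interval can be checked with a small simulation. The sketch below is an assumption-laden illustration: the population (normal with mean 50 and standard deviation 10), the sample size, the number of trials, and the function name `coverage` are all invented for the example, and σ is treated as known so that the z-based interval applies.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu=50.0, sigma=10.0, n=30, trials=2000, confidence=0.95, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from the hypothetical population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        # Count the interval as a hit if it covers the true mean.
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


rate = coverage()
```

The returned fraction should land close to 0.95, which is the sense in which "about 95% of these intervals will contain the true population mean".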


Normal Distribution

1. Definition: A normal distribution is a probability distribution that forms a bell-shaped curve when plotted.

    ○ Key Characteristics:
        ■ The total area under the curve is 1 (or 100%).
        ■ The distribution is symmetric about the mean.
        ■ The two tails extend indefinitely.
    ○ Mean (μ) and Standard Deviation (σ) are key parameters.
    ○ Standard Normal Distribution is when the mean is 0 and the standard deviation is 1.
2. Data Presentation & Statistical Tests:

    ○ For normally distributed data, use mean and standard deviation.
    ○ For non-normal data, use the median as the measure of central tendency and the minimum-maximum range as the measure of spread.
    ○ Hypothesis Testing:
        ■ If data is normally distributed, use parametric tests.
        ■ If not, use non-parametric tests.


Testing Normality

To test if data is normally distributed, several methods are used:

1. Analytical Methods:

    ○ Kolmogorov-Smirnov Test: For large sample sizes (>100).
    ○ Shapiro-Wilk Test: For smaller sample sizes (<100).
    ○ p-value:
        ■ If p > 0.05, there is no significant departure from normality, so the data is treated as normally distributed.
        ■ If p < 0.05, the data is treated as not normally distributed.
2. Descriptive Methods:

    ○ Skewness: Measures asymmetry of data.
        ■ Positive skewness means the right tail is longer.
        ■ Negative skewness means the left tail is longer.
    ○ Kurtosis: Measures the "peakedness" of the distribution.
        ■ Positive kurtosis indicates a sharper peak, while negative kurtosis suggests a flatter distribution.
    ○ Coefficient of Variation (CV): Standard deviation divided by the mean.
        ■ As a rough rule of thumb, if CV < 30%, the data is considered normal.
3. Visual Methods:

    ○ Histogram: Should resemble a bell curve for normal data.
    ○ Box Plot: The median should be centered within the box, indicating symmetry.
    ○ Q-Q Plot: If the points fall along a straight line, the data is normal.
Outliers: Outliers can be removed or dealt with depending on their impact on the analysis.


Methods Used to Determine Normality:

● Box plot: Symmetry, with outliers being exceptions.
● Histogram: Symmetrical with no extreme peaks or flats.
● Skewness and Kurtosis: Should fall within acceptable ranges.
● Kolmogorov-Smirnov/Shapiro-Wilk Tests: p-value greater than 0.05 for normality.


Quiz Questions (Q&A)

1. What does the normal distribution curve look like?

    ○ A bell-shaped curve, symmetric about the mean.
2. What is the purpose of hypothesis testing in relation to normality?

    ○ To determine whether the data follows a normal distribution.
3. What are parametric tests used for?

    ○ They are used to test hypotheses when the data is normally distributed.
4. What test is used for large sample sizes to test normality?

    ○ Kolmogorov-Smirnov Test.
5. How is skewness interpreted?

    ○ Positive skew indicates the right tail is longer; negative skew indicates the left tail is longer.
6. What does a coefficient of variation less than 30% indicate?

    ○ As a rough rule of thumb, the data is considered normal.
7. What is the significance of a p-value greater than 0.05 in normality testing?

    ○ There is no significant departure from normality, so the data is treated as normally distributed.
8. What does a Q-Q plot indicate when the points fall in a straight line?

    ○ The data is normally distributed.
9. What does the Shapiro-Wilk test assess?

    ○ It tests the normality of data, especially for smaller sample sizes.
10. Why are outliers important in normality testing?

    ○ Outliers can distort the results of normality tests and may need to be removed.
MCQs 5.​ Which of the following plots should resemble a bell shape
for normality?​
1.​ What is the shape of a normal distribution curve?​
○​ a) Box plot
○​ a) Square ○​ b) Stem-and-leaf plot
○​ b) Bell-shaped ○​ c) Histogram
○​ c) Triangular ○​ d) Q-Q plot
○​ d) Rectangular ○​ Answer: c) Histogram
○​ Answer: b) Bell-shaped 6.​ What does positive skewness indicate?​
2.​ Which test is used for small sample sizes to test normality?​
○​ a) The data is symmetric
○​ a) Kolmogorov-Smirnov ○​ b) The right tail is longer
○​ b) T-test ○​ c) The left tail is longer
○​ c) Shapiro-Wilk ○​ d) The data is flat
○​ d) ANOVA ○​ Answer: b) The right tail is longer
○​ Answer: c) Shapiro-Wilk 7.​ Which test is used for large sample sizes to test normality?​
3.​ What does a p-value of 0.07 indicate in normality testing?​
○​ a) Shapiro-Wilk
○​ a) Data is not normal ○​ b) Kolmogorov-Smirnov
○​ b) Data is normal ○​ c) Z-test
○​ c) Data has outliers ○​ d) ANOVA
○​ d) Data is skewed ○​ Answer: b) Kolmogorov-Smirnov
○​ Answer: b) Data is normal 8.​ What does a kurtosis greater than 0 indicate?​
4.​ What does the coefficient of variation (CV) measure?​
○​ a) Flatter distribution
○​ a) Skewness ○​ b) Symmetric distribution
○​ b) Standard deviation relative to the mean ○​ c) Peaked distribution
○​ c) Kurtosis ○​ d) Skewed distribution
○​ d) Data spread ○​ Answer: c) Peaked distribution
○​ Answer: b) Standard deviation relative to the mean
9.​ What would you use if your data does not follow a normal
distribution?​

○​ a) Parametric tests
○​ b) Non-parametric tests
○​ c) Z-scores
○​ d) Histogram
○​ Answer: b) Non-parametric tests
10.​Which plot can be used to check if data points follow a
straight line?​

●​ a) Histogram
●​ b) Q-Q plot
●​ c) Stem-and-leaf plot
●​ d) Box plot
●​ Answer: b) Q-Q plot
t-tests:

1. One Sample t-test:

● Purpose: Used to determine if the mean of a sample differs from a known population mean.
● Example: Testing if the average stress score of a group is different from a known standard value (e.g., 84).
● Steps:
    1. State the hypothesis:
        ■ Null Hypothesis (H0): Mean = 84
        ■ Alternative Hypothesis (Ha): Mean ≠ 84
    2. Calculate the t-value: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
        ■ Where $\bar{x}$ = sample mean, $\mu$ = population mean, $s$ = sample standard deviation, and $n$ = sample size.
    3. Use the t-table to find the critical t-value and compare it with the calculated t-value to decide whether to reject the null hypothesis.

2. Independent t-test:

● Purpose: Compares the means of two independent groups (e.g., comparing stress scores between males and females).
● Hypothesis:
    ○ Null Hypothesis (H0): $\mu_1 = \mu_2$ (The means are equal).
    ○ Alternative Hypothesis (Ha): $\mu_1 \neq \mu_2$ (The means are different).
● Assumptions:
    ○ The two groups should have normal distributions.
    ○ The variances of the two groups may be equal or unequal; which case holds determines the test statistic used.
● Test Statistic:
    ○ If variances are assumed equal (pooled variance): $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
    ○ If variances are assumed unequal: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

3. Paired t-test:

● Purpose: Compares the means of two related groups (e.g., measuring the effect of a treatment before and after the intervention).
● Hypothesis:
    ○ Null Hypothesis (H0): The mean of the differences (d) is 0.
    ○ Alternative Hypothesis (Ha): The mean of the differences (d) is not 0.
● Test Statistic: $t = \frac{\bar{d}}{s_d/\sqrt{n}}$
    ○ Where $\bar{d}$ is the mean of the differences and $s_d$ is the standard deviation of the differences.
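
The t statistics for the one-sample, independent, and paired tests can be computed by hand in a few lines. The sketch below is illustrative only: the data are invented, the function names are assumptions, and the equal-variance case uses the standard pooled-variance form. It stops at the t-value; the critical value would still come from a t-table for the appropriate degrees of freedom.

```python
from math import sqrt
from statistics import fmean, stdev, variance


def one_sample_t(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    return (fmean(sample) - mu0) / (stdev(sample) / sqrt(n))


def pooled_t(a, b):
    """Independent t-test statistic assuming equal variances."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (fmean(a) - fmean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))


def unequal_variance_t(a, b):
    """Independent t-test statistic without assuming equal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (fmean(a) - fmean(b)) / se


def paired_t(before, after):
    """Paired t-test: a one-sample test on the differences against 0."""
    diffs = [y - x for x, y in zip(before, after)]
    return one_sample_t(diffs, 0.0)


# Hypothetical stress scores tested against the standard value 84.
t_one = one_sample_t([82, 86, 90, 88, 84], mu0=84)

# Hypothetical independent groups of equal size.
t_pooled = pooled_t([1, 2, 3], [2, 4, 6])
t_unequal = unequal_variance_t([1, 2, 3], [2, 4, 6])

# Hypothetical before/after measurements on the same four subjects.
t_paired = paired_t([10, 12, 14, 16], [12, 13, 17, 18])
```

When the two groups have the same size, the pooled and unequal-variance statistics coincide; they differ once the group sizes differ.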
1. What is a one-sample t-test used for?

    ○ It is used to compare the mean of a sample with a known population mean.
2. What is the null hypothesis in a one-sample t-test?

    ○ The null hypothesis states that the mean of the population the sample comes from is equal to the known population mean.
3. In an independent t-test, what does a p-value less than 0.05 indicate?

    ○ It indicates sufficient evidence to reject the null hypothesis at the 0.05 level, meaning the means of the two groups are significantly different.
4. What is the assumption for an independent t-test regarding variances?

    ○ The variances of the two groups may be equal or unequal; the test statistic is chosen according to which case is assumed.
5. What does the paired t-test compare?

    ○ It compares the means of two related groups (e.g., before and after treatment).
6. How is the test statistic for an independent t-test calculated?

    ○ The test statistic is calculated as the difference in means divided by the standard error of the difference in means.
7. What is the purpose of the critical t-value?

    ○ It is used to determine the rejection region for the null hypothesis.
8. What does the t-table provide?

    ○ It provides the critical t-values for different degrees of freedom and significance levels.
9. What happens if the calculated t-value exceeds the critical t-value?

    ○ If the absolute value of the calculated t-value exceeds the critical t-value, the null hypothesis is rejected.
10. What does the paired t-test assume about the data?

    ○ It assumes that the data consists of paired observations (e.g., before and after measurements on the same individuals).


1. What is the main purpose of a one-sample t-test?

    ○ A) To compare two sample means
    ○ B) To compare a sample mean to a known population mean
    ○ C) To test the variance between two samples
    ○ D) To test the differences within a sample
    ○ Answer: B
2. In an independent t-test, what do the null and alternative hypotheses represent?

○​ A) The sample mean equals the population mean
○​ B) The two sample means are equal or different ○​ B) The average value of all test statistics
○​ C) The variance between the samples is equal ○​ C) The maximum sample size for the test
○​ D) The sample mean is less than the population ○​ D) The value used to calculate the degrees of
mean freedom
○​ Answer: B ○​ Answer: A
3.​ Which of the following is a requirement for conducting 7.​ For a two-tailed test with α=0.05\alpha = 0.05, what are
an independent t-test?​ the rejection areas?​

○​ A) The samples must be dependent ○​ A) -1.96 and +1.96


○​ B) The samples must be normally distributed ○​ B) 0 and 1.96
○​ C) The variances must be equal ○​ C) 0 and -1.96
○​ D) The sample size must be greater than 50 ○​ D) -2.00 and +2.00
○​ Answer: B ○​ Answer: A
4.​ What is assumed about the variances in a paired t-test?​ 8.​ Which test is used to compare the means of two related
groups?​
○​ A) The variances must be equal
○​ B) The variances must be unequal ○​ A) One-sample t-test
○​ C) The variances do not need to be considered ○​ B) Independent t-test
○​ D) The variances must be normal ○​ C) Paired t-test
○​ Answer: C ○​ D) Z-test
5.​ What does a p-value greater than 0.05 typically ○​ Answer: C
indicate?​ 9.​ What does the t-table help determine?​

○​ A) Reject the null hypothesis ○​ A) The mean of the sample


○​ B) Accept the null hypothesis ○​ B) The critical t-value for a given degree of freedom
○​ C) Strong evidence to reject the null hypothesis ○​ C) The standard deviation of the sample
○​ D) The sample is too large for testing ○​ D) The p-value for the hypothesis test
○​ Answer: B ○​ Answer: B
6.​ What is the critical value in hypothesis testing?​ 10.​In an independent t-test, if the calculated t-value is
greater than the critical t-value, what is the decision?​
○​ A) The value at which the null hypothesis is rejected
●​ A) Accept the null hypothesis
●​ B) Reject the null hypothesis
●​ C) Do not conduct the test
●​ D) Increase the sample size
●​ Answer: B
ANOVA:

● ANOVA (Analysis of Variance) is a statistical method used to compare the means of three or more groups.
● It is an extension of the two-sample t-test.
● One-way ANOVA is used when you have one factor with multiple levels (groups) and want to test if there's a significant difference between the means of those groups.

2. Hypothesis Testing:

● Null Hypothesis (Ho): All group means are equal (e.g., μ1 = μ2 = μ3 = ...).
● Alternative Hypothesis (Ha): At least one group mean is different.

3. Assumptions:

● The population is normally distributed.
● Samples are randomly selected and independent.
● Variance is similar across groups.

4. ANOVA Formula:

● F-statistic: It compares the variance between groups to the variance within groups.
    ○ $F = \frac{\text{Variance Between Groups}}{\text{Variance Within Groups}}$

5. ANOVA Steps:

1. State Hypothesis (Ho and Ha).
2. Select Significance Level (e.g., α = 0.05).
3. Calculate the F-statistic.
4. Find F-critical from the F-distribution table.
5. Decision:
    ○ If the calculated F-statistic (Fcalc) > F-critical (Fcritical), reject Ho (there is a significant difference).
    ○ If Fcalc < Fcritical, fail to reject Ho (there is no significant difference).

6. Post-hoc Tests:

● After rejecting Ho, post-hoc tests compare which specific groups differ. Examples include:
    ○ LSD, Tukey, Bonferroni, etc.
● Post-hoc tests help control the Type I error (false positive).

7. Example:

A study on blood pressure readings compares three methods (Machine, Standard Curve, and Difference between Machine and Standard Curve). The hypothesis is tested to see if the mean blood pressure readings differ across methods.


1. What is the main purpose of ANOVA?

● To determine if there is a significant difference between the means of three or more groups.
2. What is the null hypothesis in ANOVA? 8. What are some common post-hoc tests?

●​ The null hypothesis (Ho) states that all group means are ●​ Tukey, Bonferroni, and LSD (Least Significant Difference).
equal.
9. What is a Type I error?
3. What assumption is made regarding the population in
ANOVA? ●​ A Type I error occurs when the null hypothesis is rejected
when it is actually true.
●​ The population is assumed to be normally distributed.
10. What does it mean if the F-statistic is less than the
4. What is the F-statistic in ANOVA used for? F-critical value?

●​ It compares the variance between groups to the variance ●​ You fail to reject the null hypothesis, indicating no significant
within groups to determine if group means are significantly difference between the groups.
different.

5. What is a post-hoc test?


1.​ What does ANOVA test for?​
●​ A post-hoc test is used after ANOVA to identify which
specific groups differ when the null hypothesis is rejected. ○​ A) Difference in individual data points
○​ B) Whether at least one group mean is different
6. What happens if the calculated F-statistic is greater than the ○​ C) The standard deviation of groups
critical F-value? ○​ D) Variance within a single group​
Answer: B
●​ You reject the null hypothesis and conclude that at least one 2.​ What assumption is required for ANOVA?​
group mean is different.
○​ A) Populations are skewed
7. In a one-way ANOVA, what is considered the "factor"
○​ B) Samples are dependent
variable?
○​ C) The population is normally distributed
●​ The factor variable is the categorical variable used to define ○​ D) All groups have different variances​
the groups being compared. Answer: C
3.​ In ANOVA, the F-statistic is calculated by:​ 7.​ Which of the following is a requirement for ANOVA to
work correctly?​
○​ A) Dividing the variance between groups by the
variance within groups ○​A) Equal sample sizes across all groups
○​ B) Dividing the sum of squares by the number of ○​B) Normally distributed population
groups ○​C) Independent samples
○​ C) Adding the variance between and within groups ○​D) Both B and C​
○​ D) Subtracting the means of groups​ Answer: D
Answer: A 8.​ Which post-hoc test is used to control for Type I error?​
4.​ What is the alternative hypothesis in ANOVA?​
○​ A) Tukey's Test
○​ A) All group means are equal ○​ B) LSD
○​ B) At least one group mean is different ○​ C) Bonferroni Test
○​ C) The sample sizes are equal ○​ D) Gabriel Test​
○​ D) There is no variance within groups​ Answer: C
Answer: B 9.​ In one-way ANOVA, what is compared between the
5.​ What is a common post-hoc test used after ANOVA?​ groups?​

○​ A) T-test ○​A) Median values


○​ B) Regression Analysis ○​B) Variance
○​ C) Tukey's Test ○​C) Means of the groups
○​ D) Chi-square Test​ ○​D) Range of the data​
Answer: C Answer: C
6.​ What happens if the calculated F-statistic is smaller 10.​Which of the following is NOT a type of post-hoc test?​
than the F-critical value?​
●​ A) Tukey's HSD
○​ A) Reject the null hypothesis ●​ B) Bonferroni Correction
○​ B) Fail to reject the null hypothesis ●​ C) Linear Regression
○​ C) Accept the alternative hypothesis ●​ D) LSD​
○​ D) None of the above​ Answer: C
Answer: B
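
The F-statistic used throughout the ANOVA material above (variance between groups divided by variance within groups) can be worked out directly. The sketch below is a minimal illustration with invented data and an assumed function name; it returns the F-value with its degrees of freedom, and the comparison against F-critical would still use an F-distribution table.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: values around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical readings for three groups of three observations each.
f_calc, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

For this made-up data the between-group mean square is 13 and the within-group mean square is 1, giving F = 13 on (2, 6) degrees of freedom.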
Simple Linear Regression

1. Simple Linear Regression:

● A statistical method used to model the relationship between two variables: one independent variable (x) and one dependent variable (y).
● The goal is to predict or estimate the value of the dependent variable based on the independent variable.
● Example: Estimating the amount of food expenditure based on income levels.

2. Key Concepts:

● Simple Regression: Involves only two variables: one independent (x) and one dependent (y).
● Linear Regression: A type of regression where the relationship between variables is assumed to be linear (a straight line).
● Equation:
    ○ The regression line is expressed as y = A + Bx + ε
        ■ A: Y-intercept (value of y when x = 0)
        ■ B: Slope (rate of change in y for a unit change in x)
        ■ ε: Error term (random error or noise in the model)

3. Important Terms:

● Slope (B): Shows the rate of change in the dependent variable (y) due to a one-unit change in the independent variable (x).
● Y-Intercept (A): The predicted value of y when x = 0.
● Error Term (ε): Accounts for random variations that the model doesn't explain.
● Coefficient of Determination (r²): Measures how well the independent variable explains the variance in the dependent variable. A higher r² means the model is a better fit.

4. Scatter Plot and Regression Line:

● A scatter plot helps visualize the relationship between variables.
● The regression line is the line of best fit that minimizes errors (using the least squares method) to predict the dependent variable.

5. Interpretation of Results:

● Interpretation of A: If x = 0, then y = A. For example, in a food expenditure and income study, if income is zero, the food expenditure might still be a fixed value (A).
● Interpretation of B: A unit change in x results in a change of B units in y. For instance, an increase of 1 gram of sugar in a cereal increases its calorie count by B units.

6. Assumptions in Linear Regression:

● Linearity: The relationship between x and y is linear.
● Homoscedasticity: The variance of errors is constant across all values of x.
● Independence of observations: The data points are independent.
● Normality: The errors should follow a normal distribution.
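
The least squares fit for y = A + Bx, together with r², can be computed from the textbook formulas. The sketch below is an illustration under stated assumptions: the income and food-expenditure figures are invented, and the function name `fit_line` is hypothetical. It shows the arithmetic that statistical software performs, not the output of any particular package.

```python
from statistics import fmean


def fit_line(x, y):
    """Least squares estimates (A, B) and r squared for y = A + B*x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    b = sxy / sxx            # slope: change in y per one-unit change in x
    a = y_bar - b * x_bar    # intercept: predicted y when x = 0

    # r squared: share of the variance in y explained by the line.
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical income (x) and food expenditure (y) in arbitrary units.
income = [1, 2, 3, 4, 5]
food = [2, 4, 5, 4, 5]
a, b, r2 = fit_line(income, food)
```

For this made-up data the fitted line is y = 2.2 + 0.6x with r² = 0.6, so each extra unit of income is associated with 0.6 more units of food expenditure.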
7. Hypothesis Testing: ○​ The y-intercept (A) is the value of the dependent
variable (y) when the independent variable (x) is
●​ Testing the significance of the slope (B) to determine if it is zero.
significantly different from zero: 4.​ What is the coefficient of determination (r²)?​
○​ Null Hypothesis (H₀): B = 0 (No linear relationship).
○​ Alternative Hypothesis (H₁): B ≠ 0 (There is a linear ○​ The coefficient of determination (r²) measures how
relationship). well the independent variable explains the variation
in the dependent variable. It ranges from 0 to 1.
8. Example in SPSS: 5.​ What does a scatter plot show in simple linear
regression?​
●​ SPSS can be used to run simple linear regression, and the
output gives the coefficients (A and B), R² value, and
○​ A scatter plot shows the relationship between the
p-value to determine if the model is significant.
independent variable (x) and the dependent variable
(y) by plotting their data points.
6.​ What is homoscedasticity in the context of regression
1.​ What is simple linear regression?​ analysis?​

○​ Simple linear regression is a method to model the ○​ Homoscedasticity refers to the assumption that the
relationship between two variables, where one is variance of the errors is constant across all levels of
independent and the other is dependent. the independent variable.
2.​ What does the slope (B) represent in a regression 7.​ How can we test if the slope is significantly different
model?​ from zero?​

○​ The slope (B) represents the change in the ○​ We can use hypothesis testing:
dependent variable (y) for a one-unit change in the ■​ Null Hypothesis (H₀): B = 0
independent variable (x). ■​ Alternative Hypothesis (H₁): B ≠ 0
3.​ What is the purpose of the y-intercept (A) in the 8.​ Why is the least squares method used in linear
regression equation?​ regression?​
○​ The least squares method minimizes the sum of the 3.​ What is the function of the y-intercept (A) in the
squared differences between the observed values regression equation?​
and the predicted values, resulting in the best fit line.
9.​ What is the meaning of a p-value less than 0.05 in ○​ A) To predict future values
regression analysis?​ ○​ B) To indicate the error term
○​ C) The predicted value when x = 0
○​ A p-value less than 0.05 indicates that the ○​ D) To calculate the slope
relationship between the variables is statistically 4.​ Answer: C​
significant, meaning the slope is likely not zero.
10.​What is the significance of a high r² value?​ 5.​ Which of the following is NOT an assumption of simple
linear regression?​
○​ A high r² value indicates that the independent
variable explains a large portion of the variance in ○​A) The relationship between x and y is linear
the dependent variable, meaning the model fits the ○​B) The errors are normally distributed
data well. ○​C) There is a homoscedastic relationship
○​D) The independent and dependent variables are
categorical
6.​ Answer: D​
1.​ What does simple linear regression predict?​
7.​ What does the coefficient of determination (r²) indicate?​
○​ A) The relationship between multiple independent
variables and a dependent variable ○​ A) The exact predicted value of y
○​ B) The relationship between two variables (one ○​ B) The percentage of the variance in y explained by
independent and one dependent) x
○​ C) The correlation between two dependent variables ○​ C) The slope of the regression line
○​ D) The relationship between a dependent variable ○​ D) The error of prediction
and a random variable 8.​ Answer: B​
2.​ Answer: B​
9. What does a p-value less than 0.05 indicate in hypothesis testing?

○ A) The regression line is not useful
○ B) The null hypothesis is likely true
○ C) The regression model is statistically significant
○ D) There is no linear relationship between x and y
10. Answer: C

11. Which of the following best describes the slope (B) in the equation y = A + Bx?

○ A) The expected value of y when x = 0
○ B) The constant term of the regression line
○ C) The amount by which y increases for a one-unit increase in x
○ D) The error term of the model
12. Answer: C

13. In regression analysis, what does the error term (ε) account for?

○ A) The predicted value of y
○ B) The random variations and unaccounted factors
○ C) The slope of the regression line
○ D) The variance explained by the independent variable
14. Answer: B

15. What is the first step in performing simple linear regression?

○ A) Calculate the error term
○ B) Draw a scatter plot of the data
○ C) Test the significance of the slope
○ D) Write the regression equation
16. Answer: B

17. Which of the following indicates a strong positive linear relationship?

○ A) r = -0.8
○ B) r = 0.0
○ C) r = 1.0
○ D) r = 0.5
18. Answer: C

19. What does homoscedasticity refer to in regression analysis?

● A) The relationship between x and y is linear
● B) The variance of errors is consistent across all levels of x
● C) The data follows a normal distribution
● D) There are no outliers in the data

Answer: B
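The regression questions above refer to the line y = A + Bx and to r². As a minimal sketch (the data values and the function name are made up for illustration and are not from these notes), the least-squares intercept, slope, and r² can be computed with the standard library alone:

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (A, B, r_squared) for the least-squares line y = A + B*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sums of squares and cross-products around the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # B: change in y per one-unit change in x
    intercept = y_bar - slope * x_bar    # A: expected y when x = 0
    r_squared = sxy ** 2 / (sxx * syy)   # share of variance in y explained by x
    return intercept, slope, r_squared


# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
A, B, r2 = fit_simple_linear_regression(hours, scores)
print(f"y = {A:.2f} + {B:.2f}x, r^2 = {r2:.3f}")
```

This sketch does not test whether the slope is significant. In practice a library routine such as `scipy.stats.linregress` reports the slope, intercept, r, and the p-value for the slope together.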
1. Non-Parametric Tests:

○ These tests do not assume a specific (e.g., normal) distribution for the population. They are useful when the data are non-normal or ordinal.
○ Common non-parametric tests include:
■ Mann-Whitney U Test (Wilcoxon Rank-Sum Test): Compares two independent groups.
■ Kruskal-Wallis H Test: Non-parametric equivalent of one-way ANOVA, comparing three or more independent groups.
■ Wilcoxon Signed Rank Test: Used for comparing two related samples, equivalent to the paired t-test.
■ Spearman’s Rho: Measures the correlation between two ranked variables.
■ Friedman’s Test: Non-parametric equivalent of repeated measures ANOVA.
2. Advantages and Disadvantages:

○ Advantages:
■ Fewer assumptions and valid for non-normal data.
■ Useful for ordinal data, ranked data, and data with outliers.
○ Disadvantages:
■ May waste information by using ranks rather than actual values.
■ Less powerful compared to parametric tests.
3. Example of Data Handling:

○ Rank data when performing non-parametric tests like Mann-Whitney and Kruskal-Wallis.
○ In the case of ties, ranks are averaged.
4. Assumptions of Non-Parametric Tests:

○ Independence of samples.
○ Measurement scale at least ordinal (ranked data).

1. What is a non-parametric test?

○ Non-parametric tests are statistical tests that do not assume a specific distribution for the data. They are useful when the data are ordinal or not normally distributed.
2. What is the Mann-Whitney U test used for?

○ The Mann-Whitney U test compares two independent groups to determine if there is a significant difference between them.
3. What is the non-parametric equivalent of a one-way ANOVA?

○ The Kruskal-Wallis H test is the non-parametric equivalent of one-way ANOVA.
4. What are the assumptions of the Wilcoxon Signed Rank Test?
○ The test assumes that the samples are paired (related), and the data should be at least ordinal.
5. How do you handle tied ranks in non-parametric tests?

○ When there are tied ranks, average the ranks of the tied values.
6. What is the Friedman test used for?

○ The Friedman test is used to compare three or more related groups and is the non-parametric equivalent of repeated measures ANOVA.
7. What is Spearman's Rho used to measure?

○ Spearman’s Rho measures the strength and direction of the association between two ranked variables.
8. What is the advantage of non-parametric tests?

○ They are less reliant on assumptions about the data distribution and can handle ordinal data or data with outliers.
9. What is the Kruskal-Wallis H test used for?

○ The Kruskal-Wallis H test is used to compare three or more independent groups, especially when the data are non-normally distributed.
10. What does a p-value of less than 0.05 indicate in non-parametric tests?

○ A p-value of less than 0.05 typically indicates that there is a statistically significant difference between the groups being compared.

1. Which of the following is a non-parametric test?

○ A) t-test
○ B) Mann-Whitney U test
○ C) Paired t-test
○ D) Pearson Correlation
○ Answer: B) Mann-Whitney U test
2. What is the non-parametric equivalent of one-sample t-test?

○ A) Mann-Whitney U test
○ B) Wilcoxon Signed Rank Test
○ C) Kruskal-Wallis Test
○ D) Friedman’s Test
○ Answer: B) Wilcoxon Signed Rank Test
3. Which test is used to compare three or more independent groups?

○ A) Paired t-test
○ B) Kruskal-Wallis H test
○ C) Spearman's Rho
○ D) Mann-Whitney U test
○ Answer: B) Kruskal-Wallis H test
4. Which of the following tests is used for repeated measures?

○ A) Mann-Whitney U test
○ B) Kruskal-Wallis H test
○ C) Friedman’s Test
○ D) Wilcoxon Signed Rank Test
○ Answer: C) Friedman’s Test
5. Which assumption is common for non-parametric tests?

○ A) Normal distribution of the data
○ B) Data are at least ordinal
○ C) Homogeneity of variance
○ D) Parametric assumptions of normality
○ Answer: B) Data are at least ordinal
6. What does Spearman’s Rho measure?

○ A) The mean difference between groups
○ B) The linear relationship between two variables
○ C) The correlation between two ranked variables
○ D) The variance within a group
○ Answer: C) The correlation between two ranked variables
7. What is the p-value threshold for significance in most statistical tests?

○ A) 0.01
○ B) 0.05
○ C) 0.10
○ D) 0.50
○ Answer: B) 0.05
8. What type of data is required for the Mann-Whitney U test?

○ A) Ordinal or continuous with non-normal distribution
○ B) Nominal
○ C) Only nominal
○ D) Only interval data
○ Answer: A) Ordinal or continuous with non-normal distribution
9. Which non-parametric test is used for paired samples?

○ A) Kruskal-Wallis H test
○ B) Wilcoxon Signed Rank Test
○ C) Friedman’s Test
○ D) Spearman’s Rho
○ Answer: B) Wilcoxon Signed Rank Test
10. What is the advantage of using non-parametric tests over parametric tests?

○ A) They are more powerful
○ B) They don’t require assumptions about data distribution
○ C) They are easier to calculate
○ D) They require larger sample sizes
○ Answer: B) They don’t require assumptions about data distribution
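The non-parametric notes say that data are ranked and that tied values receive the average of their ranks. A minimal sketch of that rule, together with the Mann-Whitney U statistic built from the pooled ranks, might look like this (the function names and the two small groups are hypothetical, chosen only to show a tie):

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) computed from average ranks of the pooled data."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical groups; the value 5 appears three times across the pooled data
group_a = [3, 5, 5, 8]
group_b = [5, 9, 10, 12]
print(average_ranks(group_a + group_b))
print(mann_whitney_u(group_a, group_b))
```

The sketch stops at the U statistics. Judging significance still requires a critical-value table or a p-value, which a library routine such as `scipy.stats.mannwhitneyu` provides.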
Notes on Chi-Square Test

1. Definition of Chi-Square Test

● A Chi-Square test is a non-parametric statistical method used to examine how well observed frequencies match expected frequencies.
● It is used for:
○ Goodness-of-Fit Test: Tests if an observed distribution fits an expected distribution.
○ Test of Independence/Association: Checks if two categorical variables are independent or associated.

2. Assumptions

● Variables should be categorical.
● Random sampling of data.
● Independent observations (one participant should not affect another's participation).
● All cells must have at least 5 expected observations.

3. Contingency Tables

● These tables display frequencies for different categories of variables (rows for independent variables, columns for dependent variables).
● Used to test the relationship between the variables.

4. Formula for Chi-Square

● $\chi^2 = \sum \frac{(O - E)^2}{E}$, where:
○ O = observed frequency
○ E = expected frequency

5. Hypothesis Testing Steps

1. State hypotheses:
○ Null Hypothesis (H₀): No association or difference.
○ Alternative Hypothesis (H₁): There is an association or difference.
2. Calculate the test statistic.
3. Compare the test statistic to the critical value.
4. Make a decision: Reject or fail to reject the null hypothesis.
5. Report results (e.g., in APA format).

6. Goodness-of-Fit Test

● Compares observed frequencies to a hypothesized distribution.
● Example: Do the choices of post-graduation plans differ from expected?

7. Test of Independence

● Determines if two categorical variables are independent or related.
● Example: Are gender and owning a cell phone related?

8. Decision Rule:

● Compare the calculated chi-square statistic to the critical value (from chi-square tables).
● If χ² > critical value, reject the null hypothesis.
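As a minimal sketch of the chi-square formula and the critical-value decision rule, the goodness-of-fit case can be worked through in a few lines. The counts below are invented for illustration (100 graduates choosing among four plans, with equal choices expected), and the function name is hypothetical:

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 graduates, four post-graduation plans,
# with an equal split (25 each) expected under the null hypothesis
observed = [40, 30, 20, 10]
expected = [25, 25, 25, 25]

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1      # goodness-of-fit: number of categories minus 1
critical_value = 7.815      # chi-square table value for df = 3, alpha = 0.05
reject_null = chi2 > critical_value

print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_null}")
```

Every expected count here is 25, so the "at least 5 expected observations" assumption holds. A library routine such as `scipy.stats.chisquare` computes the same statistic and also returns a p-value.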
9. Reporting Chi-Square Results

● Example: χ² (df, N = sample size) = calculated value, p = p-value.
● Report should include degrees of freedom, test statistic value, and p-value.

1. Short Answer Questions

1. What is the Chi-Square test used for?

○ Answer: It is used to determine if there is a significant association between two categorical variables or if an observed distribution differs from an expected distribution.
2. What are the assumptions of the Chi-Square test?

○ Answer: Variables should be categorical, data must be randomly sampled, observations should be independent, and all expected cell frequencies must be at least 5.
3. Explain the formula used in the Chi-Square test.

○ Answer: The formula is $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency and E is the expected frequency.
4. What is the purpose of the goodness-of-fit test?

○ Answer: The goodness-of-fit test checks if observed frequencies match an expected distribution. It answers whether certain categories are more popular than expected.
5. How do you report Chi-Square test results in APA format?

○ Answer: Report results as χ²(df, N = sample size) = calculated value, p = p-value, for example, "χ²(4, N = 500) = 286.00, p < .001".
6. Which of the following is NOT an assumption for the Chi-Square test?

○ A) Variables must be continuous
○ B) Observations should be independent
○ C) Expected frequencies should be at least 5
○ D) Data should be randomly sampled
○ Answer: A) Variables must be continuous
7. What is the main purpose of a Chi-Square test of independence?

○ A) To test the goodness-of-fit
○ B) To determine if there is a relationship between two categorical variables
○ C) To compare the means of different groups
○ D) To check if observed frequencies differ from expected frequencies
○ Answer: B) To determine if there is a relationship between two categorical variables
8. What does a high chi-square statistic indicate?
○​ A) A high degree of independence
○​ B) A significant difference between observed and
expected frequencies
○​ C) No difference between observed and expected
frequencies
○​ D) More samples are needed
○​ Answer: B) A significant difference between
observed and expected frequencies
9.​ In which of the following cases would you use Fisher’s
Exact Test instead of the Chi-Square test?​

○ A) Large sample sizes with sufficient expected frequencies
○​ B) Small sample sizes or expected frequencies less
than 5 in any cell
○​ C) Continuous data
○​ D) Normal distribution data
○​ Answer: B) Small sample sizes or expected
frequencies less than 5 in any cell
10.​If the calculated chi-square statistic is greater than the
critical value, you should:​

●​ A) Accept the null hypothesis


●​ B) Fail to reject the null hypothesis
●​ C) Reject the null hypothesis
●​ D) Calculate the p-value
●​ Answer: C) Reject the null hypothesis
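For the test of independence, the expected frequency of each cell in a contingency table is (row total × column total) / N, and the degrees of freedom are (rows − 1) × (columns − 1). The following is a sketch under assumed data: a made-up 2×2 table of gender by cell phone ownership, with hypothetical function names.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi2, df, smallest expected count) for a contingency table."""
    expected = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    min_expected = min(min(row) for row in expected)
    return chi2, df, min_expected


# Hypothetical table: rows = gender, columns = owns a cell phone (yes, no)
table = [[30, 20],
         [20, 30]]
n = sum(sum(row) for row in table)
chi2, df, min_expected = chi_square_independence(table)

critical_value = 3.841  # chi-square table value for df = 1, alpha = 0.05
reject_null = chi2 > critical_value
report = f"χ²({df}, N = {n}) = {chi2:.2f}"
print(report, "| reject H0:", reject_null, "| smallest expected count:", min_expected)
```

If the smallest expected count falls below 5, the notes point to Fisher's Exact Test instead. For real analyses, `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom, and expected counts in one call; note that it applies a continuity correction to 2×2 tables by default, so its statistic can differ from this sketch.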
