Unit 5

MBA (Year 1 Semester 2)

By: Nilakshi Goel


Business Research Methods
▪ Data Analysis is the process of systematically applying statistical
and/or logical techniques to describe and illustrate, condense and
recap, and evaluate data
▪ A sufficient amount of data must be collected to allow for a substantial
analysis
▪ Information collected must now be described, analysed and
interpreted
▪ The data may be collected by survey, interviews, literature review,
participant observation, experiments, simulation studies etc.
▪ The collected data is processed and analysed to come to some
conclusion or to verify the hypothesis made.
Key considerations and potential issues in data analysis include:
▪ Having the necessary skills to analyse
▪ Concurrently selecting data collection methods and appropriate analysis
▪ Drawing unbiased inference
▪ Inappropriate subgroup analysis
▪ Determining statistical significance
▪ Lack of clearly defined and objective outcome measurements
▪ Providing honest and accurate analysis
▪ Manner of presenting data
▪ Data recording method
▪ Partitioning ‘text’ when analysing qualitative data
▪ Training of staff conducting analyses
▪ Reliability and validity
The analysis of data requires a number of closely related operations, such as the
establishment of categories. This stage mainly includes:
▪ Editing
▪ Coding
▪ Classification
▪ Tabulation
▪ Editing – The process of examining the collected raw data to detect errors and
omissions and to correct these when possible; basically scrutiny of the completed
questionnaires
▪ Coding – Refers to the process of assigning numerals or other symbols to answers
so that responses can be put into a limited number of categories or classes
▪ Classification – Arranging data into homogeneous groups, in two ways:
classification on attributes and classification on class intervals
▪ Tabulation – The process of summarizing raw data and displaying the same in
compact form
▪ Data editing is defined as the process involving the review and adjustment of
collected survey data. The purpose is to control the quality of the collected data
▪ The basic purpose served by data editing is that it improves the quality, accuracy
and adequacy of the collected data, thereby making it more suitable for the
purpose for which it was collected
▪ Detection of errors in the data that would otherwise affect the validity of outputs
▪ Validation of data for the purposes for which it was collected
▪ Provision of information that helps assess the overall level of accuracy of
the data
▪ Detection and identification of any inconsistencies and outliers in the data so
that adjustments can be made for them
▪ Validity and completeness of data: refers to the correctness and completeness of
obtained responses. This helps ensure that there are no missing values or empty
fields in the databases
▪ Range: verifies that data within a field fall between the boundaries specified for
that particular field
▪ Duplicate data entry: helps ensure that there is no repetition or duplication of
data and that each unit in the database or register was entered once only
▪ Logical consistency: through this type of editing, connections between data fields
or variables are taken into account
▪ Outliers: this type of editing helps detect values that are too extreme or unusual so
that they can be verified and checked
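A minimal sketch of how the checks just listed might be automated, assuming the survey responses sit in a pandas DataFrame; the column names, boundary values and cutoffs are hypothetical:

```python
# Hypothetical survey data with deliberate problems planted in it.
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "age": [25, 130, 130, 41, 35],          # 130 is out of range
    "employed": ["yes", "no", "no", "yes", "yes"],
    "hours_worked": [40, 20, 20, 38, 500],  # 500 is an outlier
})

# Completeness: flag rows with missing values or empty fields
missing = df[df.isna().any(axis=1)]

# Range check: values must fall within the boundaries set for the field
bad_age = df[~df["age"].between(0, 110)]

# Duplicate check: each unit should appear in the register once only
dupes = df[df.duplicated(subset="id", keep=False)]

# Logical consistency: unemployed respondents should report zero hours
inconsistent = df[(df["employed"] == "no") & (df["hours_worked"] > 0)]

# Outlier check: flag values far from the mean in standard-deviation
# units (a loose cutoff here because the sample is tiny; 3 is typical)
z = (df["hours_worked"] - df["hours_worked"].mean()) / df["hours_worked"].std()
outliers = df[z.abs() > 1.5]
```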
Stage – I: Rules are set for editing. This stage is further subdivided into two steps
▪ In step one, instructions are provided to desk editors who then check the data for
coherence and consistency
▪ In step two, rules are set by establishing logical relations between the variables according
to various criteria. This set of rules is called automated validation rules and this type of
editing seeks to detect errors during data entry and to screen them

Stage – II: The manual desk editing stage is a traditional method that is put into effect by a
specialized editing team. Data collected on paper is checked after collection and before it is
fed into the databases. If, however, electronic means have been used to collect the data,
the forms entered into the database are revised individually.
The automated data editing method makes use of computer programs and systems for
checking the data all at once after it has been entered electronically. These programs and
systems contain the auditing rules which validate the data, detect errors and determine
unacceptable responses
▪ What is Code?
▪ A code in research methodology is a short word or phrase describing the
meaning and context of the whole sentence, phrase or paragraph. The code
makes the process of data analysis easier
▪ Numerical quantities can be assigned to codes, and these quantities can then be
interpreted. Codes help quantify qualitative data and give meaning to raw data.

▪ What is Coding?
▪ Data coding is the process of deriving codes from the observed data. In
qualitative research the data is either obtained from observations, interviews or
from questionnaires.
▪ Preliminary Codes
When a data coder assigns codes to the observed data, they rarely manage to assign
well-refined codes in the first instance
▪ Final Codes
The final codes will help you observe a better pattern in the data. This pattern is
necessary to reach the final evaluation or analysis stage of the data.
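A minimal sketch of coding in practice, assuming a five-point satisfaction question; the category labels and numerals are hypothetical:

```python
# Coding: map each raw answer to a numeral so that responses fall
# into a limited number of categories.
satisfaction_codes = {
    "very dissatisfied": 1,
    "dissatisfied": 2,
    "neutral": 3,
    "satisfied": 4,
    "very satisfied": 5,
}

responses = ["satisfied", "neutral", "very satisfied", "satisfied"]

coded = [satisfaction_codes[r] for r in responses]
print(coded)  # [4, 3, 5, 4]
```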
▪ Textual

In this form of presentation, data is simply mentioned as mere text, generally in a paragraph. This is commonly used when the
data is not very large

e.g. “The 2002 earthquake proved to be a mass killer of humans. As many as 10,000 citizens have been reported dead”

▪ Tabular

The data is organised in rows and columns. This is one of the most widely used forms of presentation of data since data tables are easy
to construct and read
Components:
▪ Table Number
▪ Title
▪ Headnotes
▪ Stubs
▪ Caption
▪ Body or Field
▪ Footnotes
▪ Source
Advantages of tabulation:
▪ Ease of representation
▪ Ease of analysis
▪ Helps in comparison
▪ Economical
▪ Quantitative Classification: In quantitative classification, data is classified on the
basis of quantitative attributes.

Marks  | No. of Students
0-50   | 29
51-100 | 54

▪ Spatial Classification: When data is classified according to location, it becomes a
spatial classification

Country | No. of Teachers
India   | 1,24,000
China   | 56,000
▪ The frequency of a particular data value is the number of times the data value
occurs. E.g., if four students have a score of 80 in mathematics, then the score
of 80 is said to have a frequency of 4. The frequency of a data value is often
represented by f.
▪ A frequency table is constructed by arranging collected data values in ascending
order of magnitude with their corresponding frequencies
▪ Example: the marks awarded for an assignment set for a year 8 class of 20 students
were as follows:

Present this information in a frequency table


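The slide's raw marks are not reproduced in the text, so the sketch below assumes a hypothetical set of 20 marks and builds the frequency table with Python's collections.Counter:

```python
# Build a frequency table: arrange the data values in ascending order
# with their corresponding frequencies.
from collections import Counter

marks = [6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8]

freq = Counter(marks)
print("Mark  Frequency (f)")
for value in sorted(freq):
    print(f"{value:>4}  {freq[value]}")
```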
▪ When the set of data values is spread out, it is difficult to set up a frequency table
for every data value, as there will be too many rows in the table. So we group data
into class intervals (or groups) to help us organise, interpret and analyse the data
▪ Ideally, we should have between five and ten rows in a frequency table. Bear this in
mind when deciding the size of the class interval (or group)
▪ The frequency of a group (or class interval) is the number of data values that fall in
the range specified by that group (or class interval)
A frequency distribution is a tabular arrangement of data into classes according to
size or magnitude along with corresponding class frequencies (the number of values
which fall in each class)
1) Find the range of the data: The range is the difference between the largest and the
smallest values.
2) Decide the approximate number of classes in which the data are to be grouped. There
is no hard-and-fast formula for determining the approximate number of classes, but
Sturges' rule is commonly used:
K = 1 + 3.322 log N
Where K = number of classes
And log N = logarithm (base 10) of the total number of observations
Example: if the total number of observations is 50, the number of classes would be
K = 1 + 3.322 log 50
K = 1 + 3.322 (1.69897)
K = 1 + 5.644
K = 6.644
≈ 7 classes
3) Determine the approximate class interval size: the size of the class interval is
obtained by dividing the range of the data by the number of classes, and is denoted by h:
h = Range / Number of Classes
In the case of fractional results, the next higher whole number is taken as the size of
the class interval

4) Decide the starting point: The lower class limit or class boundary should cover
the smallest value in the raw data. It is usually a multiple of the class interval size.
Example: 0, 5, 10, 15, 20, etc. are commonly used.
5) Determine the remaining class limits (boundaries): When the lowest class boundary
has been decided, the upper class boundary is computed by adding the class interval size
to the lower class boundary. The remaining lower and upper class limits may be
determined by adding the class interval size repeatedly until the largest value in the data
is covered.

6) Distribute the data into respective classes: All the observations are divided into
respective classes by using the tally bar method which is suitable for tabulating the
observations into respective classes. The number of tally bars is counted to get the
frequency against each class. The frequency of all the classes is noted to get the grouped
data of frequency distribution of the data. The total of the frequency columns must be
equal to the number of observations.
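A short sketch tying the six steps together in Python; the data values are hypothetical, and step 2 uses Sturges' rule as given above:

```python
import math

data = [12, 45, 7, 23, 56, 34, 18, 41, 29, 50, 3, 38, 22, 47, 15,
        31, 9, 44, 27, 52, 19, 36, 11, 48, 25]

# Step 1: range of the data
rng = max(data) - min(data)

# Step 2: Sturges' rule, K = 1 + 3.322 log10(N)
k = math.ceil(1 + 3.322 * math.log10(len(data)))

# Step 3: class interval size, rounded up to the next whole number
h = math.ceil(rng / k)

# Steps 4-6: start at a convenient lower limit, then tally each
# observation into its class and print the grouped distribution
start = (min(data) // h) * h
for lower in range(start, max(data) + 1, h):
    upper = lower + h
    f = sum(lower <= x < upper for x in data)
    print(f"{lower:>3} - {upper - 1:<3} frequency: {f}")
```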
There are four important characteristics of frequency distribution. They are as
follows:
▪ Measures of central tendency and location (mean, median, mode)
▪ Measures of dispersion (range, variance, standard deviation)
▪ The extent of symmetry/asymmetry (skewness)
▪ The flatness or peakedness (kurtosis)
Generally, the central tendency of a dataset can be described using the following
measures:
▪ Mean (Average): Represents the sum of all values in a dataset divided by the total
number of values
▪ Median: the middle value in a dataset that is arranged in ascending order (from
the smallest value to the largest value). If a dataset contains an even number of
values, the median of the dataset is the mean of the two middle values.
▪ Mode: Defines the most frequently occurring value in a dataset. In some cases, a
dataset may contain multiple modes, while some datasets may not have any mode
at all
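A quick illustration using Python's built-in statistics module on a small made-up dataset:

```python
import statistics

data = [4, 8, 6, 5, 3, 8, 9, 5, 8]

print(statistics.mean(data))    # sum of values / number of values
print(statistics.median(data))  # middle value of the sorted data -> 6
print(statistics.mode(data))    # most frequently occurring value -> 8
```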
In statistics, the measures of dispersion help to interpret the variability of data, i.e. to know
how homogeneous or heterogeneous the data is. In simple terms, it shows how
squeezed or scattered the variable is.
▪ Range: simply the difference between the maximum value and the minimum value in a
data set. Example: 1, 3, 5, 6, 7 => Range = 7 - 1 = 6
▪ Variance: deduct the mean from each value in the set, square each of these deviations,
add the squares, and finally divide by the total number of values in the data set to obtain
the variance.
▪ Standard Deviation: the square root of the variance is known as the standard deviation,
i.e. σ = √variance
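The same measures computed in Python, using the population definitions given above (divide by the total number of values):

```python
import statistics

data = [1, 3, 5, 6, 7]

rng = max(data) - min(data)        # 7 - 1 = 6
var = statistics.pvariance(data)   # mean of squared deviations = 4.64
sd = statistics.pstdev(data)       # square root of the variance ≈ 2.15

print(rng, var, sd)
```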
Skewness refers to a distortion or asymmetry that deviates from the symmetrical
bell curve, or normal distribution, in a set of data. If the curve is shifted to the left or
to the right, it is said to be skewed. Skewness can be quantified as a representation
of the extent to which a given distribution varies from a normal distribution. A
normal distribution has a skew of zero, while a lognormal distribution, for example,
would exhibit some degree of right-skew.
Like skewness, kurtosis is a statistical measure that is used to describe distribution.
Whereas skewness differentiates extreme values in one versus the other tail, kurtosis
measures extreme values in either tail. Distributions with large kurtosis exhibit tail
data exceeding the tails of the normal distribution (e.g. five or more standard
deviations from the mean). Distributions with low kurtosis exhibit tail data that are
generally less extreme than the tails of the normal distribution.
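A sketch quantifying both measures with scipy.stats (assumed installed); note that scipy's kurtosis() reports excess kurtosis, so a normal distribution scores near zero on both:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
normal = rng.normal(size=10_000)
right_skewed = rng.lognormal(size=10_000)  # lognormal -> right-skew

print(skew(normal), kurtosis(normal))              # both close to 0
print(skew(right_skewed), kurtosis(right_skewed))  # clearly positive
```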
Apart from diagrams, Graphic presentation is another way of the presentation of data
and information. Usually, graphs are used to present time series and frequency
distributions.
General rules for constructing graphs:
▪ Suitable Title
▪ Unit of Measurement
▪ Suitable Scale
▪ Index
▪ Data Sources
▪ Keep it Simple
▪ Neat
▪ Bar Graph - contains a vertical axis and horizontal axis and displays data as
rectangular bars with lengths proportional to the values that they represent, a
useful visual aid for marketing purposes
▪ Histogram – a graphical representation of a frequency distribution that uses adjacent
vertical bars erected over discrete intervals to represent the data frequency within
a given interval; a useful visual aid for meteorology and environmental purposes
▪ Pie Chart – shows percentage values as a slice of pie; a useful visual aid for
marketing purposes
▪ A bar chart is used when you want to show a distribution of data points or perform
a comparison of metric values across different subgroups of your data. From a bar
chart, we can see which groups are highest or most common, and how the other
groups compare against them
▪ A pie chart can only be used if the sum of the individual parts add up to a
meaningful whole, and is built for visualizing how each part contributes to that
whole. Meanwhile, a bar chart can be used for a broader range of data types, not
just for breaking down a whole into components.
▪ A histogram is used to summarize discrete or continuous data. In other words, it
provides a visual interpretation of numerical data by showing the number of data
points that fall within a specified range of values (called ‘bins’). It is similar to a
vertical bar graph.
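Minimal matplotlib sketches of the three chart types just described; the library is assumed to be installed and the data is illustrative:

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3.5))

# Bar chart: comparing a metric across categorical subgroups
ax1.bar(["North", "South", "East", "West"], [120, 95, 150, 80])
ax1.set_title("Bar chart")

# Pie chart: parts that add up to a meaningful whole
ax2.pie([40, 30, 20, 10], labels=["A", "B", "C", "D"], autopct="%1.0f%%")
ax2.set_title("Pie chart")

# Histogram: counts of numeric data falling into bins
scores = [55, 62, 64, 68, 70, 71, 73, 75, 75, 78, 80, 82, 85, 90, 95]
ax3.hist(scores, bins=5)
ax3.set_title("Histogram")

plt.tight_layout()
plt.show()
```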
ADVANTAGES:
▪ Understanding content: visuals are more effective than text in human understanding
▪ Flexibility of use: graphical representation can be leveraged in nearly every field
involving data
▪ Increases structured thinking: users can make quick, data-driven decisions at a
glance with visual aids
▪ Supports creative, personalized reports for more engaging and stimulating visual
presentation
▪ Improves communication: analyzing graphs that highlight relevant themes is
significantly faster than reading through a descriptive report line by line
▪ Shows the whole picture: an instantaneous, full view of all variables, time frames,
data behavior and relationships

DISADVANTAGES:
▪ Cost of human effort and resources
▪ Process of selecting the most appropriate graphical and tabular representation of data
▪ Greater design complexity of visualizing data
▪ Potential for human bias
▪ A pie-chart shows the relationship of the parts of the whole by visually
comparing the sizes of the sections (slices). Pie charts can be
constructed by using a hundreds disk or by using a circle
▪ The hundreds disk is built on the concept that the whole of anything is
100%, while the circle is built on the concept that 360° is the whole of
anything. Both methods of creating a pie chart are acceptable, and both
will produce the same results
▪ The sections have different colours to enable an observer to clearly see
the differences in the sizes of the sections
▪ A pie chart is best used when trying to work out the composition of
something. If you have categorical data then using a pie chart would
work really well as each slice can represent a different category
▪ Another good use for a pie chart would be to compare areas of growth
within a business, such as turnover, profit and exposure
▪ The bar graph helps to compare different sets of data among
different groups easily
▪ It shows the relationship using two axes, with the categories
on one axis and the discrete values on the other axis
▪ The graph shows the major changes in data over time
▪ Bar graphs are used to compare things between different groups
or to trace changes over time. However, when trying to estimate
change over time, bar graphs are most suitable when the
changes are large
▪ Bar charts possess a discrete domain of divisions and are
normally scaled so that all the data can fit on the graph. When
there is no natural order to the divisions being compared, bars on
the chart may be organized in any order.
▪ It can be used in many different situations to offer an
insightful look at frequency distribution. For example, it can
be used in sales and marketing to develop the most effective
pricing plans and marketing campaigns.
▪ Over time, histograms can show what the normal distribution
is for a process that is running smoothly. However, by
routinely producing histograms, any variation is quickly
detected. This is a major advantage for organisations
because it supports finding and dealing with process
variation quickly.
▪ In a bar graph, each bar represents one value or category. On the other
hand, in a histogram, each bar represents a range of continuous data
▪ In a bar graph, the x-axis need not always be a numerical value; it
can also be a category. However, in a histogram, the x-axis always
carries quantitative, continuous data
▪ Due to the above factor, a histogram can be inspected for its
skewness, i.e. a pattern or tendency of the data to fall more at the low
end or the high end. The same cannot be done for a bar chart
▪ A hypothesis may be defined as a proposition, or a set of propositions, set forth as an
explanation for the occurrence of some specified group of phenomena, either
asserted merely as a provisional conjecture to guide some investigation or
accepted as highly probable in the light of established facts. Quite often a
research hypothesis is a predictive statement, capable of being tested by scientific
methods, that relates an independent variable to some dependent variable
▪ E.g. consider statements like: “Students who receive counselling will show a
greater increase in creativity than students not receiving counselling” or
“Automobile A is performing as well as automobile B”
▪ These are hypotheses capable of being objectively verified and tested. Thus, we
may conclude that a hypothesis states what we are looking for, and it is a proposition
which can be put to a test to determine its validity.
Characteristics of a good hypothesis:
▪ Conceptual Clarity – It should be clear and precise
▪ Specificity – It should be specific and limited in scope
▪ Testability – It should be capable of being tested
▪ Expectancy – It should state the expected relationships between variables
▪ Simplicity – It should be stated as far as possible in simple terms
▪ Objectivity – It should not include value judgements, relative terms or any
moral preaching
▪ Theoretical Relevance – It should be consistent with a substantial body of
established or known facts or existing theory
▪ Availability of Techniques – Statistical methods should be available for testing
the proposed hypothesis
BASIC CONCEPTS IN HYPOTHESIS TESTING:
▪ Null Hypothesis & Alternative Hypothesis
▪ Level of Significance
▪ Decision Rule
▪ Type I and Type II Errors
▪ Two-tailed and One-tailed Tests
▪ Null Hypothesis and Alternative Hypothesis:
In the context of statistical analysis, we often talk about null hypothesis and alternative hypothesis
▪ If we compare method A with method B regarding its superiority, and if we proceed on the
assumption that both methods are equally good, then this assumption is termed the null
hypothesis
▪ As against this, if we think that method A is superior or that method B is inferior, we are
stating what is termed the alternative hypothesis
▪ The Null Hypothesis is stated as H0
▪ Alternative hypothesis is stated as HA
▪ If our sample results do not support the null hypothesis, we should conclude that something else
is true.
▪ What we conclude on rejecting the null hypothesis is known as the alternative hypothesis.
▪ In other words, the set of alternatives to the null hypothesis is referred to as the alternative
hypothesis.
▪ If we accept H0, then we are rejecting HA, and if we reject H0, then we are accepting HA.
▪ The null hypothesis and the alternative hypothesis are chosen before the sample is drawn.
▪ The researcher must avoid the error of deriving hypotheses from the data that he collects
and then testing those hypotheses on the same data. In the choice of null hypothesis, the
following considerations are usually kept in view:
▪ The alternative hypothesis is usually the one which one wishes to prove, and the null
hypothesis is the one which one wishes to disprove. Thus, a null hypothesis represents
the hypothesis we are trying to reject, and the alternative hypothesis represents all other
possibilities.
▪ If the rejection of a certain hypothesis when it is actually true involves great risk, it is
taken as the null hypothesis, because then the probability of rejecting it when it is true
(the level of significance) can be chosen to be very small
▪ The null hypothesis should always be a specific hypothesis, i.e., it should not state an
approximate value
▪ This is a very important concept in the context of hypothesis testing.
▪ It is always some percentage (usually 5%) which should be chosen with great care,
thought and reason.
▪ If we take the significance level at 5%, this implies that H0 will be
rejected when the sampling result (i.e., the observed evidence) has a less than 0.05
probability of occurring if H0 is true
▪ In other words, the 5% level of significance means that the researcher is willing to take
as much as a 5% risk of rejecting the null hypothesis when it (H0) happens to be
true.
▪ Thus the significance level is the maximum value of the probability of rejecting H0
when it is true, and it is usually determined in advance, before testing the hypothesis
▪ Given a hypothesis H0 and an alternative hypothesis HA, we make a rule, known as
the decision rule, according to which we accept H0 (i.e. reject HA) or reject H0
(i.e. accept HA)
▪ For instance, if H0 is that a certain lot is good (there are very few defective items
in it) against HA that the lot is not good (there are too many defective items in it),
then we must decide the number of items to be tested and the criterion for
accepting or rejecting the hypothesis.
▪ We might test 10 items in the lot and frame our decision rule saying that if there are none
or only 1 defective item among the 10, we will accept H0; otherwise we will reject
H0 (or accept HA).
▪ This sort of basis is known as a decision rule.
▪ The degrees of freedom of an estimate is the number of independent
pieces of information that went into calculating the estimate. It is not
quite the same as the number of items in the sample.
▪ In order to get the df for the estimate, you have to subtract 1 from the
number of items. Let’s say you were finding the mean weight loss for
a low-carb diet. You could use 4 people, giving 3 degrees of freedom
(4 - 1 = 3), or you could use one hundred people with df = 99
▪ Type I Error: We reject H0 when H0 is true
▪ Rejection of a hypothesis which should have been accepted
▪ Denoted by α (alpha), known as the α error
▪ Also called the level of significance of the test

▪ Type II Error: We accept H0 when H0 is not true
▪ Accepting a hypothesis which should have been rejected
▪ Denoted by β (beta), known as the β error
▪ The probability of Type I error is usually determined in advance and is understood
as the level of significance of testing the hypothesis
▪ If Type I error is fixed at 5%, it means that there are about 5 chances in 100 that we
will reject H0 when H0 is true.
▪ We can control Type I error by fixing it at a lower level
▪ For instance, if we fix it at 1%, we will say that the maximum probability of
committing a Type I error would only be 0.01
▪ When testing a hypothesis, the researcher can make a correct decision but is also
at the risk of committing either Type I or a Type II error
▪ Between the two, the researcher always seeks to minimize the risk of committing a
Type I error
▪ The significance level represents the risk of committing a Type I error
▪ In practice, significance levels used, vary between fields of study
▪ However, the most common significance levels used are 1%, 5% and 10%
▪ A 1%, 5% or 10% significance level means you accept a risk of 1%, 5% or 10%,
respectively, of committing a Type I error
▪ In the context of hypothesis testing, these two terms are quite important
and must be clearly understood
▪ A two-tailed test rejects the null hypothesis if, say, the sample mean is
significantly higher or lower than the hypothesized value of the mean of
the population. Such a test is appropriate when the null hypothesis is
some specified value and the alternative hypothesis is a value not equal
to the specified value of the null hypothesis
▪ Symbolically, the two-tailed test is appropriate when we have H0: µ = µH0
and HA: µ ≠ µH0, which may mean µ > µH0 or µ < µH0
▪ A one-tailed test would be used when we are to test, say, whether the
population mean is either lower than or higher than some hypothesized
value
STEPS IN HYPOTHESIS TESTING:
Step – I: State the Null Hypothesis (H0) and the Alternative Hypothesis (HA).
Under a two-sample test, H0 and HA are formulated differently.
Step – II: Select a Level of Significance.
Step – III: Identify the Test Statistic. Under two-sample tests, different
formulae are used.
Step – IV: Formulate the Decision Rule.
Step – V: Make a Decision: either do not reject the null hypothesis or reject
the null hypothesis.
▪ Known as tests of significance
▪ Two types:
▪ Parametric Tests/Standard Tests of Hypothesis –
▪ Usually assume certain properties of the parent population from which we draw samples
▪ If the information about the population is completely known by means of its parameters,
then the statistical test is called a parametric test
▪ E.g.: t-test, F-test, ANOVA, Pearson Correlation
▪ Non-Parametric Tests/Distribution-free Tests of Hypothesis –
▪ If there is no knowledge about the population or its parameters, but it is still required to
test a hypothesis about the population, then it is called a non-parametric test
▪ E.g.: Chi-Square Test, Mood’s Median Test, Kruskal-Wallis Test, Spearman Rank Correlation
▪ Unlike parametric tests, they can be used for nominal and ordinal scale data
Parametric vs non-parametric tests:
▪ Parametric: information about the population is completely known. Non-parametric: no
information about the population is available
▪ Parametric: specific assumptions are made regarding the population. Non-parametric: no
assumptions are made regarding the population
▪ Parametric: the null hypothesis is made on parameters of the population distribution.
Non-parametric: the null hypothesis is free from parameters
▪ Parametric: the test statistic is based on the distribution. Non-parametric: the test
statistic is arbitrary
▪ Parametric tests are applicable only to variables. Non-parametric tests apply to both
variables and attributes
▪ A parametric test is powerful, if it exists. Non-parametric tests are not as powerful as
parametric tests
▪ No parametric test exists for nominal scale data. Non-parametric tests do exist for
nominal and ordinal scale data
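As one concrete non-parametric example, a chi-square test of independence on a hypothetical 2x2 table of counts, using scipy.stats (assumed installed); no assumption about the population distribution is needed:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: two attributes (e.g. gender vs. preference)
observed = [[30, 10],   # group 1: prefer A / prefer B
            [20, 40]]   # group 2: prefer A / prefer B

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, df = {dof}")
# If p < 0.05, reject H0 that the two attributes are independent
```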
▪ The t-test tells you how significant the differences between groups are: In other
words, it lets you know if those differences (measured in means) could have
happened by chance
▪ A t-test is a type of inferential statistics used to determine if there is a significant
difference between the means of two groups, which may be related in certain
features
▪ The t-test is one of many tests used for the purpose of hypothesis testing in
statistics
▪ Calculating a t-test requires three key data values. They include the difference
between the mean values from each data set (called the mean difference), the
standard deviation of each group, and the number of data values of each group
▪ There are several different types of t-test that can be performed depending on the
data and the type of analysis required
The T Score:
▪ The t-score is a ratio of the difference between two groups to the
difference within the groups
▪ The larger the t-score, the more difference there is between the groups
▪ The smaller the t-score, the more similarity there is between the groups
▪ A t-score of 3 means that the groups are three times as different from
each other as the variability within the groups
▪ T-Values and P-Values: How big is “big enough”? Every t-value has a p-value to go
with it. A p-value is the probability that the results from your sample data occurred
by chance. P-values range from 0% to 100% and are usually written as a decimal,
e.g. a p-value of 5% is 0.05. Low p-values are good: they indicate your data did not
occur by chance. E.g., a p-value of 0.01 means there is only a 1% probability that the
results from an experiment happened by chance. In most cases, a p-value of 0.05
(5%) is accepted as the threshold for statistical significance.
▪ There are three main types of t-tests:
▪ An Independent Samples t-test compares the means for two groups
▪ A paired sample t-test compares means from the same group at different times (say, one
year apart)
▪ A one sample t-test tests the mean of a single group against a known mean
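A sketch of the three variants above as implemented in scipy.stats (assumed installed); all data is illustrative:

```python
from scipy import stats

group_a = [24, 27, 21, 30, 26, 25, 28]
group_b = [31, 33, 29, 35, 32, 30, 34]

# Independent samples: compares the means of two separate groups
t, p = stats.ttest_ind(group_a, group_b)

# Paired samples: same group measured twice (say, a year apart)
before = [70, 72, 68, 75, 71]
after = [68, 70, 67, 72, 70]
t_paired, p_paired = stats.ttest_rel(before, after)

# One sample: tests a single group's mean against a known value
t_one, p_one = stats.ttest_1samp(group_a, popmean=25)

print(p, p_paired, p_one)  # p < 0.05 => significant difference
```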
▪ A z-test is a statistical test used to determine whether two population means are
different when the variances are known and the sample size is large
▪ The test statistic is assumed to have a normal distribution, and nuisance parameters
such as the standard deviation should be known in order for an accurate z-test to be
performed
▪ A z-test is a hypothesis test in which the z-statistic follows a normal distribution
▪ A z-statistic or z-score is a number representing the result of the z-test
▪ Z-tests are closely related to t-tests, but t-tests are best performed when an
experiment has a small sample size
▪ Z-tests assume the standard deviation is known, while t-tests assume it is unknown
Several different types of tests are used in statistics. You would use a Z test if:
▪ Your sample size is greater than 30. Otherwise, use a t-test.
▪ Data points should be independent from each other. In other words, one data
point isn’t related to and doesn’t affect another data point
▪ Your data should be normally distributed. However, for large sample sizes (over 30)
this doesn’t always matter.
▪ Your data should be randomly selected from a population, where each item has an
equal chance of being selected.
▪ Sample sizes should be equal if at all possible.
▪ Let’s say we need to determine if girls on average score higher than 600 in the
exam. We have the information that the standard deviation for girls' score is 100. So,
we collect the data of 20 girls by using random samples and record their marks.
Finally, we also set our α value (significance level) to be 0.05

▪ In this example:
▪ Mean Score for Girls is 640
▪ The size of the sample is 20
▪ The population mean is 600
▪ Standard Deviation for Population is 100
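Working the example through in Python (a one-tailed z-test; scipy is assumed for the critical value and p-value):

```python
# H0: mu = 600 vs HA: mu > 600 (one-tailed),
# population sigma = 100, n = 20, sample mean = 640, alpha = 0.05
import math
from scipy.stats import norm

sample_mean, pop_mean, sigma, n, alpha = 640, 600, 100, 20, 0.05

# z = (sample mean - population mean) / (sigma / sqrt(n))
z = (sample_mean - pop_mean) / (sigma / math.sqrt(n))  # ≈ 1.789

critical = norm.ppf(1 - alpha)  # one-tailed critical value ≈ 1.645
p_value = 1 - norm.cdf(z)       # ≈ 0.037

print(f"z = {z:.3f}, critical = {critical:.3f}, p = {p_value:.4f}")
if z > critical:
    print("Reject H0: girls score significantly higher than 600")
else:
    print("Do not reject H0")
```

Since z ≈ 1.789 exceeds the one-tailed critical value of 1.645 (equivalently, p ≈ 0.037 < 0.05), the null hypothesis is rejected at the 5% level.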
Video resources:
▪ https://youtu.be/3Jhg4tbtIuM
▪ https://youtu.be/qTk4pWRH7Ic
▪ https://youtu.be/tcWpAP0JKSE
▪ https://youtu.be/sCfpA_vycQI
▪ https://youtu.be/yrPsgj6gThY
