Research Method
Quantitative Data Analysis
Topics
• Getting Data Ready
• Getting a Feel for the Data
• Hypothesis Testing
Getting the Data Ready for Analysis
• Data coding: assigning a number to the participants’ responses so they can be
entered into a database.
• Data entry: after responses have been coded, they can be entered into a
database. Raw data can be entered with a statistical software package (e.g., Stata,
EViews, SPSS).
• Editing suspicious data:
• An illogical response or an outlier response. An outlier is an observation that
is substantially different from the other observations.
• Inconsistent responses are responses that are not in harmony with other
information.
• Illegal codes are values that are not specified in the coding instructions.
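The coding and editing steps above can be sketched in a few lines of Python (standard library only). This is an illustration under stated assumptions: the 5-point codebook, the function names and the sample values are hypothetical, and the 1.5 × IQR rule is just one common way to flag outliers, not a rule prescribed by the source.

```python
import statistics

# Hypothetical coding instructions for a 5-point Likert item
CODEBOOK = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
            "agree": 4, "strongly agree": 5}
VALID_CODES = set(CODEBOOK.values())

def code_responses(answers):
    """Data coding: map each text answer to its number (None if not in the codebook)."""
    return [CODEBOOK.get(a.strip().lower()) for a in answers]

def illegal_codes(codes):
    """Positions of values that are not specified in the coding instructions."""
    return [i for i, c in enumerate(codes) if c not in VALID_CODES]

def iqr_outliers(values, k=1.5):
    """Positions of values outside [Q1 - k*IQR, Q3 + k*IQR] (one common outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = k * (q3 - q1)
    return [i for i, v in enumerate(values) if v < q1 - spread or v > q3 + spread]

codes = code_responses(["Agree", "neutral", "Strongly agree", "agre"])
print(codes)                                         # the typo "agre" cannot be coded
print(illegal_codes([4, 3, 5, 7]))                   # 7 is not a valid code
print(iqr_outliers([23, 25, 24, 26, 22, 25, 240]))   # hypothetical ages
```

Flagged values are candidates for editing, not automatic deletions: each one should be checked against the original questionnaire.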
Getting a Feel for the Data
Example charts:
• Economic Growth: rapid expansion of the service sector, while the manufacturing sector lags behind.
• Innovation Capability in 1900 Explains Development Today
• Faster Quality Growth is Riskier Quality Growth
Hypothesis Testing
• Null hypothesis (denoted Ho):
the statement being tested in a test of hypothesis.
• Alternative hypothesis (Ha):
what is believed to be true if the null hypothesis is false.
• Steps:
• Determine the null and alternative hypotheses.
• What is the claim?
• What is the opposite of the claim?
• Is the claim the null or the alternative hypothesis?
• Specify the test statistic and its distribution if the null hypothesis is true.
• Select α and determine the rejection region.
• Calculate the sample value of the test statistic and, if desired, the p-value.
• State the conclusion.
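The five steps can be traced in a minimal Python sketch of a two-tailed one-sample z test (standard library only). It assumes the population standard deviation is known; the function name `z_test` and the sample numbers are hypothetical, chosen only to illustrate the steps.

```python
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z test, written step by step."""
    std_normal = NormalDist()
    # Step 1: Ho: mu = mu0 versus Ha: mu != mu0
    # Step 2: Z = (xbar - mu0) / (sigma / sqrt(n)) follows N(0, 1) if Ho is true
    # Step 3: choose alpha; reject Ho when |Z| > z_(alpha/2)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Step 4: sample value of the test statistic and its p-value
    z = (xbar - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Step 5: conclusion (the critical-value and p-value rules always agree)
    reject = abs(z) > z_crit
    return z, p_value, reject

# Hypothetical sample: n = 25, mean 12.9, known sigma = 2, testing Ho: mu = 12
z, p_value, reject = z_test(12.9, 12, 2, 25)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject Ho: {reject}")
```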
Null Hypothesis
• We begin with the assumption that Ho is true and any difference
between the sample statistic and true population parameter is due to
chance and not a real (systematic) difference
• Similar to the notion of “innocent until proven guilty”
• That is, “innocence” is a null hypothesis
• Refers to the status quo
• Always contains the “=”, “≤” or “≥” sign
• May or may not be rejected
Alternative Hypothesis
• Is the opposite of the null hypothesis
• Challenges the status quo
• Never contains the “=”, “≤” or “≥” sign; it uses “≠”, “<” or “>”
• May or may not be supported by the sample evidence
• Is generally the hypothesis that the researcher is trying to prove.
• Evidence is always examined with respect to Ha, never with respect to Ho
• We never “accept” Ho, we either “reject” or “not reject” it
Rejection Region or Critical Value Approach
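The given level of significance = α. The critical value separates the non-rejection region from the shaded rejection region.
• Two-tail test: H0: μ = 12 vs. H1: μ ≠ 12; a rejection region of area α/2 in each tail.
• Upper-tail test: H0: μ ≤ 12 vs. H1: μ > 12; a rejection region of area α in the upper tail.
• Lower-tail test: H0: μ ≥ 12 vs. H1: μ < 12; a rejection region of area α in the lower tail.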
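For a z statistic (σ known), the critical values that bound these rejection regions can be computed with Python's `statistics.NormalDist`. The function name `critical_values` and its `tail` labels are hypothetical; for a t test the values would come from the t distribution instead.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Critical z value(s) marking the boundary of the rejection region."""
    z = NormalDist()
    if tail == "two":      # H1: mu != mu0, area alpha/2 in each tail
        c = z.inv_cdf(1 - alpha / 2)
        return (-c, c)
    if tail == "upper":    # H1: mu > mu0, area alpha in the upper tail
        return (z.inv_cdf(1 - alpha),)
    if tail == "lower":    # H1: mu < mu0, area alpha in the lower tail
        return (z.inv_cdf(alpha),)
    raise ValueError("tail must be 'two', 'upper' or 'lower'")

for tail in ("two", "upper", "lower"):
    print(tail, [round(c, 3) for c in critical_values(0.05, tail)])
```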
Conclusions in Hypothesis Testing
We always test the null hypothesis. The initial conclusion will always be one of the following:
• Reject the null hypothesis.
• Fail to reject the null hypothesis.
Caution
• Never conclude a hypothesis test with only the statement “reject the null hypothesis”
or “fail to reject the null hypothesis.” Always restate the conclusion in simple,
nontechnical wording that addresses the original claim.
• Accept Versus Fail to Reject:
• Some texts use “accept the null hypothesis”, but we are not proving the null
hypothesis.
• “Fail to reject” is the more accurate wording.
• The available evidence is not strong enough to warrant rejection of the null
hypothesis (such as not enough evidence to convict a suspect).
Type I and Type II Errors
A Type I error:
• The mistake of rejecting the null hypothesis when it is actually true.
• The symbol α (alpha) is used to represent the probability of a type I error.
A Type II error:
• The mistake of failing to reject the null hypothesis when it is actually false.
• The symbol β (beta) is used to represent the probability of a type II error.
Memory aid: take the phrase “routine for fun” and use only the consonants from those words (RouTiNe FoR FuN):
• type I error is RTN: Reject True Null (hypothesis)
• type II error is FRFN: Fail to Reject a False Null (hypothesis).
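β depends on how far the true mean lies from the hypothesized one. A minimal Python sketch for a two-tailed z test with known σ follows; the function name `type2_error` is hypothetical, and the numbers (μ0 = 8.6, σ = 0.3, n = 36, true mean 8.7) are illustrative, borrowed from the steel-rod exercise.

```python
from statistics import NormalDist

def type2_error(mu0, mu_true, sigma, n, alpha=0.05):
    """beta = P(fail to reject Ho: mu = mu0 | true mean is mu_true), two-tailed z test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) / (sigma / n ** 0.5)   # true mean's distance from mu0, in standard errors
    # Under the true mean, Z ~ N(shift, 1); Ho is kept when -z_crit <= Z <= z_crit
    return z.cdf(z_crit - shift) - z.cdf(-z_crit - shift)

beta = type2_error(mu0=8.6, mu_true=8.7, sigma=0.3, n=36)
print(f"beta = {beta:.3f}, power = {1 - beta:.3f}")
```

When the true mean equals μ0, the same formula returns 1 − α, the probability of correctly keeping a true null; lowering α widens the non-rejection region and therefore raises β.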
Parametric Test
t-test and z-test
• The t-test is a statistical test used to determine whether the means of two populations
differ from one another (or whether a single mean differs from a hypothesized value) when
the population standard deviation is not known. By contrast, the z-test is a parametric test
applied when the population standard deviation is known, to determine whether the means of
two datasets differ from each other.
• The t-test is based on Student’s t-distribution, whereas the z-test relies on the
assumption that the distribution of sample means is normal. Student’s t-distribution and
the normal distribution look alike, as both are symmetrical and bell-shaped. However, the
t-distribution has less probability in the centre and more in the tails; as the degrees of
freedom increase, it approaches the normal distribution.
• An important condition for using the t-test is that the population variance is unknown.
Conversely, for a z-test the population variance should be known or assumed to be known.
• As a rule of thumb, the z-test is used when the sample size is large (n > 30), and the
t-test when the sample is small (n < 30).
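The "less in the centre, more in the tails" difference can be checked numerically. This Python sketch evaluates the standard normal density and the Student's t density at a few points; the helper `t_pdf` is written out from the textbook formula (via `math.lgamma`) and is not a library function.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

normal = NormalDist()
for x in (0, 1, 2, 3):
    print(f"x={x}: normal={normal.pdf(x):.4f}  t(5)={t_pdf(x, 5):.4f}  t(100)={t_pdf(x, 100):.4f}")
```

With 5 degrees of freedom the t density is lower than the normal at 0 and higher at 3; with 100 degrees of freedom the two are already close, which is why the choice matters most for small samples.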
Practice
1. . cd "C:\Bahan Ajar\Metode Penelitian\Metode Penelitian 2022-2023 Ganjil\Pertemuan 10-11\Data"
2. . use "nhanes_clean.dta"
3. Statistics > Summaries, tables, and tests > Frequency tables > One-way tables
• . tabulate sex
• . tabulate race
4. Graphics > Pie chart
• . graph pie, over(sex) plabel(_all percent, color(white) size(huge)) title(Sex)
• . graph export piechart.png, as(png) width(3200) height(2400) replace
5. Graphics > Boxplots
• . graph box sbp, over(race)
6. Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Summary statistics
• . summarize sbp, detail
• Open .\Introduction to Stata – Huber\02_Descriptives.pdf
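To make clear what a one-way `tabulate` reports (Freq., Percent, Cum.), here is a rough Python analogue. It is a sketch, not Stata's implementation: the function name `one_way_table` and the five values are hypothetical, and rows are simply sorted by value.

```python
from collections import Counter

def one_way_table(values):
    """Frequency, percent and cumulative percent for each distinct value, sorted by value."""
    counts = Counter(values)
    total = len(values)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        percent = 100 * counts[value] / total
        cumulative += percent
        rows.append((value, counts[value], round(percent, 2), round(cumulative, 2)))
    return rows

sex = ["Female", "Male", "Female", "Female", "Male"]   # hypothetical values
for row in one_way_table(sex):
    print(*row)
```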
Stata Syntax for Proportion
• One-sample test of proportion
prtest varname == #p [if] [in] [, onesampleopts]
• Two-sample test of proportions using groups
prtest varname [if] [in] , by(groupvar) [twosamplegropts]
• Two-sample test of proportions using variables
prtest varname1 == varname2 [if] [in] [, level(#)]
• Immediate form of one-sample test of proportion
prtesti #obs1 #p1 #p2 [, level(#) count]
• Immediate form of two-sample test of proportions
prtesti #obs1 #p1 #obs2 #p2 [, level(#) count]
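The one-sample test of proportion rests on the large-sample statistic z = (p̂ − p0) / √(p0(1 − p0)/n). A Python sketch follows; the function name and the data (56 "yes" answers out of 100, testing p = 0.5) are hypothetical, and the Stata counterpart should be the immediate form `prtesti 100 .56 .5`, although that correspondence is an assumption here rather than something verified against Stata output.

```python
from statistics import NormalDist

def one_sample_prop_z(p_hat, p0, n):
    """Large-sample z test of Ho: p = p0, with the standard error computed under Ho."""
    se = (p0 * (1 - p0) / n) ** 0.5
    z = (p_hat - p0) / se
    p_two = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
    return z, p_two

# Hypothetical data: 56 "yes" answers out of n = 100, testing Ho: p = 0.5
z, p_two = one_sample_prop_z(0.56, 0.5, 100)
print(f"z = {z:.3f}, two-sided p = {p_two:.4f}")
```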
Stata Syntax for Known σ
• One-sample z test
ztest varname == # [if] [in] [, onesampleopts]
• Two-sample z test using groups
ztest varname [if] [in] , by(groupvar) [twosamplegropts]
• Two-sample z test using variables
ztest varname1 == varname2 [if] [in], unpaired [twosamplevaropts]
• Paired z test
ztest varname1 == varname2 [if] [in] , sddiff(#) [level(#)]
ztest varname1 == varname2 [if] [in] , corr(#) [pairedopts]
• Immediate form of one-sample z test
ztesti #obs #mean #sd #val [, level(#)]
• Immediate form of two-sample unpaired z test
ztesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2 [, level(#)]
Stata Syntax for Unknown σ
• One-sample t test
ttest varname == # [if] [in] [, level(#)]
• Two-sample t test using groups
ttest varname [if] [in] , by(groupvar) [options1]
• Two-sample t test using variables
ttest varname1 == varname2 [if] [in], unpaired [unequal welch level(#)]
• Paired t test
ttest varname1 == varname2 [if] [in] [, level(#)]
• Immediate form of one-sample t test
ttesti #obs #mean #sd #val [, level(#)]
• Immediate form of two-sample t test
ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2 [, options2]
One Sample Test
A manufacturer of steel rods considers the manufacturing process to be working properly if the mean length of the
rods is 8.6 inches. The standard deviation of these rods always runs about 0.3 inches. Suppose a random sample of size
n = 36 yields an average length of 8.7 inches. Should the manufacturer conclude that the process is working properly
or improperly? Use the .05 level of significance.
Immediate form of one-sample z test
ztesti #obs #mean #sd #val [, level(#)]
. ztesti 36 8.7 .3 8.6, level(95)
One-sample z test
------------------------------------------------------------------------------
| Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
x | 36 8.7 .05 .3 8.602002 8.797998
------------------------------------------------------------------------------
mean = mean(x) z = 2.0000
H0: mean = 8.6
Ha: mean < 8.6 Ha: mean != 8.6 Ha: mean > 8.6
Pr(Z < z) = 0.9772 Pr(|Z| > |z|) = 0.0455 Pr(Z > z) = 0.0228
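The Stata output can be reproduced by hand. This Python sketch (standard library only) recomputes the z statistic, the two-sided p-value and the 95% confidence interval from the problem's numbers; the variable names are illustrative.

```python
from statistics import NormalDist

# Steel-rod example: sigma is known, so a one-sample z test applies
n, xbar, sigma, mu0, alpha = 36, 8.7, 0.3, 8.6, 0.05
se = sigma / n ** 0.5                          # standard error = 0.3 / 6 = 0.05
z = (xbar - mu0) / se                          # test statistic
p_two = 2 * (1 - NormalDist().cdf(abs(z)))     # Pr(|Z| > |z|)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
ci = (xbar - z_crit * se, xbar + z_crit * se)
print(f"z = {z:.4f}, p = {p_two:.4f}, 95% CI = ({ci[0]:.6f}, {ci[1]:.6f})")
print("reject Ho" if p_two < alpha else "fail to reject Ho")
```

Because the two-sided p-value (about 0.0455) is below .05, and 8.6 lies just outside the 95% interval, the sample suggests the process is not working properly.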
Source: https://www3.nd.edu/~rwilliam/stats1/OneSample-Stata.pdf
The Deans contend that the average graduate student makes $8,000 a year (8 in thousands of dollars). Zealous
administration budget cutters contend that the students are being paid more than that, while the
Graduate Student Union contends that the figure is less. A random sample of 6 students has an
average income (measured in thousands of dollars) of 6.5 and a sample variance of 2. Using both
confidence intervals and significance tests, test the Deans’ claim at the .10 and .02 levels of
significance.
Immediate form of one-sample t test (the sample variance is 2, so sd = √2 ≈ 1.414213562):
ttesti #obs #mean #sd #val [, level(#)]
ttesti 6 6.5 1.414213562 8, level(90)
One-sample t test using raw data:
ttest varname == # [if] [in] [, level(#)]
use "1sample-III.dta"
ttest pay = 8, level(90)
. ttesti 6 6.5 1.414213562 8, level(90)
One-sample t test
------------------------------------------------------------------------------
| Obs Mean Std. err. Std. dev. [90% conf. interval]
---------+--------------------------------------------------------------------
x | 6 6.5 .5773503 1.414214 5.336611 7.663389
------------------------------------------------------------------------------
mean = mean(x) t = -2.5981
H0: mean = 8 Degrees of freedom = 5
Ha: mean < 8 Ha: mean != 8 Ha: mean > 8
Pr(T < t) = 0.0242 Pr(|T| > |t|) = 0.0484 Pr(T > t) = 0.9758
.
. use "1sample-III.dta"
(One Sample Tests, Case III, sigma unknown)
.
. ttest pay = 8, level(90)
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [90% conf. interval]
---------+--------------------------------------------------------------------
pay | 6 6.5 .5773503 1.414214 5.336611 7.663389
------------------------------------------------------------------------------
mean = mean(pay) t = -2.5981
H0: mean = 8 Degrees of freedom = 5
Ha: mean < 8 Ha: mean != 8 Ha: mean > 8
Pr(T < t) = 0.0242 Pr(|T| > |t|) = 0.0484 Pr(T > t) = 0.9758
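The same numbers can be reproduced without Stata. In this Python sketch the helpers `t_pdf` and `t_cdf` are hypothetical names written for illustration: the t CDF is obtained by integrating the density with Simpson's rule, a numerical shortcut chosen here because the standard library has no t distribution. The two critical values are taken from a standard t table (df = 5).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): integrate the density from 0 to |x| with Simpson's rule (steps must be even)."""
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Deans' example: n = 6, sample mean 6.5, sample variance 2, Ho: mu = 8
n, xbar, s, mu0 = 6, 6.5, math.sqrt(2), 8
df = n - 1
se = s / math.sqrt(n)                  # standard error of the mean
t = (xbar - mu0) / se                  # test statistic
p_two = 2 * (1 - t_cdf(abs(t), df))    # Pr(|T| > |t|)

# Two-sided critical values for df = 5, taken from a standard t table
t_crit = {0.10: 2.015048, 0.02: 3.364930}
for alpha, c in t_crit.items():
    lo, hi = xbar - c * se, xbar + c * se
    decision = "reject" if p_two < alpha else "fail to reject"
    print(f"alpha = {alpha}: {100 * (1 - alpha):.0f}% CI = ({lo:.4f}, {hi:.4f}), "
          f"p = {p_two:.4f} -> {decision} Ho")
```

The two approaches agree: at α = .10 the 90% interval excludes 8 and p < .10, while at α = .02 the 98% interval contains 8 and p > .02.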
Users recorded their fuel consumption over 2 months in the file GasCons.txt. Test whether the LCGC car
manufacturer's claim is true if the company states that its car's fuel consumption is:
a. 10.7 liters/100 km
b. 12.1 liters/100 km
import delimited "GasCons.txt", clear
ttest cons=10.7
ttest cons=12.1
What is the conclusion?
. import delimited "GasCons.txt", clear
(encoding automatically selected: ISO-8859-1)
(1 var, 50 obs)
.
. ttest cons=10.7
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
cons | 50 11.4278 .3876539 2.741127 10.64878 12.20682
------------------------------------------------------------------------------
mean = mean(cons) t = 1.8774
H0: mean = 10.7 Degrees of freedom = 49
Ha: mean < 10.7 Ha: mean != 10.7 Ha: mean > 10.7
Pr(T < t) = 0.9668 Pr(|T| > |t|) = 0.0664 Pr(T > t) = 0.0332
.
. ttest cons=12.1
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
cons | 50 11.4278 .3876539 2.741127 10.64878 12.20682
------------------------------------------------------------------------------
mean = mean(cons) t = -1.7340
H0: mean = 12.1 Degrees of freedom = 49
Ha: mean < 12.1 Ha: mean != 12.1 Ha: mean > 12.1
Pr(T < t) = 0.0446 Pr(|T| > |t|) = 0.0892 Pr(T > t) = 0.9554
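Since GasCons.txt is not reproduced here, this Python sketch starts from the summary statistics that Stata reports (n = 50, mean 11.4278, sd 2.741127) and applies the critical-value approach. The two-sided 5% critical value for df = 49 is taken from a t table; a one-sided alternative would instead use the one-tailed p-value columns shown in the output.

```python
import math

# Summary statistics as reported by Stata for the 50 observations in GasCons.txt
n, mean, sd = 50, 11.4278, 2.741127
se = sd / math.sqrt(n)        # standard error of the mean, about 0.3877
t_crit = 2.009575             # two-sided 5% critical value for df = 49, from a t table

for mu0 in (10.7, 12.1):
    t = (mean - mu0) / se
    decision = "reject" if abs(t) > t_crit else "fail to reject"
    print(f"Ho: mu = {mu0}: t = {t:.4f} -> {decision} Ho at the 5% level (two-sided)")
```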
Two Sample Tests
Indiana University (population 1) claims that it has a lower crime rate than Ohio State University
(population 2). A random sample of crime rates for 12 different months is drawn for each school,
yielding x̄1 = 370 and x̄2 = 400. It is known that σ1² = 400 and σ2² = 800. Test Indiana's claim at the
.02 level of significance. Also, construct the 99% confidence interval.
Immediate form of two-sample unpaired z test (sd1 = √400 = 20, sd2 = √800 ≈ 28.28427125):
ztesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2 [, level(#)]
ztesti 12 370 20 12 400 28.28427125, level(99)
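A Python sketch of the same calculation, assuming (as the problem states) that both population variances are known. Indiana's claim is taken as the lower-tailed alternative Ha: μ1 < μ2; variable names are illustrative.

```python
from statistics import NormalDist

# Indiana (1) vs Ohio State (2): known population variances, 12 months each
n1, xbar1, var1 = 12, 370, 400
n2, xbar2, var2 = 12, 400, 800
se = (var1 / n1 + var2 / n2) ** 0.5        # sqrt(400/12 + 800/12) = 10
z = (xbar1 - xbar2) / se                   # test statistic
p_lower = NormalDist().cdf(z)              # Pr(Z < z) for Ha: mu1 < mu2
z99 = NormalDist().inv_cdf(0.995)          # about 2.576
diff = xbar1 - xbar2
ci99 = (diff - z99 * se, diff + z99 * se)
print(f"z = {z:.4f}, Pr(Z < z) = {p_lower:.5f}")
print("reject Ho at alpha = .02" if p_lower < 0.02 else "fail to reject Ho at alpha = .02")
print(f"99% CI for mu1 - mu2: ({ci99[0]:.3f}, {ci99[1]:.3f})")
```

With z = −3 the lower-tail p-value is about 0.0013, well below .02, so the data support Indiana's claim; the 99% interval lies entirely below zero.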
A professor believes that women do better on her exams than men do. A sample of
8 women (N1 = 8) and 10 men (N2 = 10) yields x̄1 = 7, x̄2 = 5.5, s1² = 1 and s2² = 1.7
(so s1 = 1 and s2 = 1.303840481).
a. Using α = .01, test whether the female mean is greater than the male mean.
Assume that σ1 = σ2 = σ.
b. Compute the 99% confidence interval.
Immediate form of two-sample t test:
ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2 [, options2]
ttesti 8 7 1 10 5.5 1.303840481, level(99)
If σ1 ≠ σ2, add the unequal option:
ttesti 8 7 1 10 5.5 1.303840481, level(99) unequal
Using raw data (two-sample t test using groups):
ttest varname [if] [in] , by(groupvar) [options1]
use "2sample-II.dta"
ttest score, by(gender) level(99)
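Under the equal-variance assumption the test uses the pooled variance sp² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2). This Python sketch works part (a) and part (b) from the summary statistics; the critical values for df = 16 are taken from a standard t table, and the variable names are illustrative.

```python
import math

# Summary statistics: women (1) and men (2)
n1, xbar1, var1 = 8, 7.0, 1.0
n2, xbar2, var2 = 10, 5.5, 1.7
df = n1 + n2 - 2
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df     # pooled variance (sigma1 = sigma2 assumed)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t = (xbar1 - xbar2) / se

# Critical values for df = 16, from a standard t table
t_one_sided_01 = 2.583487   # upper 1% point, for Ha: mu1 > mu2 at alpha = .01
t_two_sided_01 = 2.920782   # upper 0.5% point, for the 99% confidence interval
print(f"t = {t:.4f}, df = {df}")
print("reject Ho" if t > t_one_sided_01 else "fail to reject Ho")
ci = (xbar1 - xbar2 - t_two_sided_01 * se, xbar1 - xbar2 + t_two_sided_01 * se)
print(f"99% CI for mu1 - mu2: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The one-sided test rejects Ho at α = .01, yet the two-sided 99% interval still contains 0: the interval corresponds to a two-sided test, which puts only 0.5% in each tail.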
• In a drug trial, participants in group 1 were given the actual drug, while participants in group 2 were given
only a placebo, which has no therapeutic effect and is harmless.
• The data in the file medicine.txt show the following mean values:
• Does the drug make patients recover faster?
To see whether the variances are approximately equal:
• sdtest variable, by(sorting variable)
• robvar variable, by(sorting variable)
If variances are equal:
• ttest [dependent variable name], by([independent variable])
If variances are not equal:
• ttest [dependent variable name], by([independent variable]) unequal
Example:
• import delimited "medicine.txt", clear
• ttest time, by(group)
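The difference between the equal-variance and `unequal` versions can be seen from summary statistics alone. In this Python sketch the function name `two_sample_t` is hypothetical, and the inputs are illustrative numbers borrowed from the exam-score example, not the medicine.txt data. The unequal-variance version uses Satterthwaite's approximate degrees of freedom, which Stata's documentation describes as the default for the `unequal` option; the variance ratio is the statistic of the classical F test for equal variances.

```python
import math

def two_sample_t(n1, mean1, var1, n2, mean2, var2):
    """Return (t_pooled, df_pooled, t_unequal, df_satterthwaite) from summary statistics."""
    diff = mean1 - mean2
    # Equal variances assumed: pooled variance, df = n1 + n2 - 2
    df_pooled = n1 + n2 - 2
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df_pooled
    t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Unequal variances: separate variance estimates, Satterthwaite's approximate df
    a, b = var1 / n1, var2 / n2
    t_unequal = diff / math.sqrt(a + b)
    df_satt = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_pooled, df_pooled, t_unequal, df_satt

# Illustrative summary statistics (borrowed from the exam-score example)
n1, mean1, var1 = 8, 7.0, 1.0
n2, mean2, var2 = 10, 5.5, 1.7
f_ratio = var1 / var2   # variance ratio s1^2 / s2^2 used by the F test of equal variances
print(f"F = {f_ratio:.4f}")
print(two_sample_t(n1, mean1, var1, n2, mean2, var2))
```

When the variance ratio is close to 1 the two versions give similar answers; the `unequal` version is the safer choice when the variances look clearly different.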
Reference
• Sekaran, Bougie, 2016, Research Methods for Business, 7E.
• Cooper, Schindler, 2014, Business Research Methods, 12E.
• Saunders, Lewis, Thornhill, 2016, Research Methods for Business Students, 7E.
• Hamilton, 2013, Statistics with Stata, Version 12.
• Hill, Griffiths, Lim, 2011, Principles of Econometrics, 4E.
• Huber, 2016, Introduction to Stata (ppt).
• Triola, 2018, Elementary Statistics Using Excel, 6E.
• https://www3.nd.edu/~rwilliam/stats/StataHighlights.html
• https://stats.idre.ucla.edu/stata/webbooks/reg/