No command
1 insheet using filename, or the File > Import menu
alternatively, one can import using the following command
import excel filename.xlsx, sheet("mean") cellrange(a1:H10) firstrow
note: save output and data files separately
output can be saved in both smcl and txt format
2 use "path"
3 log using "path.smcl", append
4 log using "path.smcl", replace
5 cmdlog using " "
6 *
7 sum
tabstat varname, stats(mean median kurtosis skewness)
8 display ""
sort
order var1 var2 var3
9 recode var (min/2000=1) (2001/5000=2) (5001/max=3)
10 corr
11 regress var1 var2
twoway lfit depvar indvar || scatter depvar indvar
* tabulate varname, gen(D)
12 ttesti obsv samplemean SD populationmean
13 ttest beforevar=aftervar
14 ttest varname, by(classifier)
NOTE: the above t test commands are used if the statistic/parameter value is known
15 describe
16 tabulate varname varname
tabulate varname1 varname2, row
tabulate varname varname, column
tabulate varname varname, chi2
17 findit keyword
18 bysort varname: sum varname2
19 if
20 codebook
21 generate
22 keep if varname == category
23 replace
eg: replace Gender="1" if Gender=="Male"
replace age=25 in 50
24 ttest varname, by(category)
ttest var1=var2
26 tabstat varname, stats()
eg: tabstat Income, stats(min max mean)
to find out descriptive statistics for a particular group
tabstat varname, stats() by()
25 oneway varname factor
if frequencies are needed, then the command will be
oneway varname factor, tabulate
27 anova varname factor1 factor2
eg: anova Output Fert_type Pest_type
28 generate logvarname = log(varname)
alternatively
generate logvarname = ln(varname)
29 preserve and restore
syntax:
preserve
generate Savings=Income+Expenditure
restore
30 egen
31 egen varname2=rank(varname)
example: egen Inc_rank= rank(Income)
32 gen newvar=oldvar
recode newvar()
meaning
to import excel file
to use an already saved stata file
to use the same log file: existing results
are kept and new results get appended
to replace the existing results with new results
to save the commands in a different place
comment
descriptive statistics
displays only selected statistics as requested
displays strings and values of scalar expressions
to arrange the values of a variable in ascending order
rearranges the dataset in the particular order
to categorize variables into different variables or groups
to perform correlation between two or more variables
to do regression
to perform a twoway scatter with a line of fit
to generate a dummy variable
to do one sample t test
paired t test
independent sample ttest
gives the details of the variables
frequencies of the given variables (crosstabs)
to get row percentages
to get column percentages
gives the Pearson chi-square
to show the associated help with a plethora of options
to sort the output on two fronts
if can be put at the end of a command
information about one or more variables
to generate a new variable, similar to COMPUTE in SPSS
replaces the value of age with 25 in the 50th observation
for two-sample t test
for paired sample ttest
The tabstat command provides a more flexible alternative to
summarize. We can specify just
which summary statistics we want to see.
to perform One Way ANOVA
to perform twoway ANOVA
to create a natural log of the variable
gives you the option to undo a command that you've typed
incorrect
OOPS! this is wrong
this helps you to undo the command
to create a new variable called
to create a new variable and recode it into different categories
remarks or use
the sheet name goes in the brackets; with firstrow, the first row's values act as variable names
Pg 15 in Statistics with book
to check for file name
used inside the smcl log to write the interpretation, i.e., * followed by any text
to type in the command window
eg: calculator, used to print some key info
for string variables, the values will be arranged alphabetically
also use nonmissing option
dependent var followed by independent var(s)
a categorical variable can take any value: 1, 2, 3
a dummy variable will only take 0 and 1 (a subset of categorical variables)
comparing the group mean with a benchmark (popmean)
varnames, for eg: IQbeforetest=IQaftertest
ttest cost_sqft, by(location)
to do crosstabs - combined frequencies
alternative to help
varname is the one according to which the output will be displayed
independent sample t test, where you check if the 2 samples come from identical populations with the same mean
to check if the observed difference between the two means is statistically significant
statistical significance
to get stats only for female
lfitci
The syntax of generate and replace is identical, except
to get output as per categories, say developed and
underdeveloped
1. gen newvar=income
2. recode newvar
3. sort newvar
4. by newvar: any statistical operation
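The four steps above can be sketched as a short do-file. This is only an illustration: it assumes Stata's bundled auto dataset, with price standing in for income, and the cut-offs 5000 and 10000 are arbitrary choices, not values from these notes.

```stata
* Sketch only: the auto dataset and the cut-offs are assumptions for illustration
sysuse auto, clear
* 1. copy the variable so the original stays untouched
gen newvar = price
* 2. recode the copy into three ordered groups
recode newvar (min/5000 = 1) (5001/10000 = 2) (10001/max = 3)
* 3. sort by the new grouping variable (required before using by)
sort newvar
* 4. run any statistical operation group by group
by newvar: summarize mpg
```

Steps 3 and 4 can be combined as bysort newvar: summarize mpg.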
Double equal: Test for equality
The double equals, ==, is used to test for equality. It is sometimes
called logical equals because it is part of a logical test that returns
either a one (true) or a zero (false). Here are some examples:
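A few illustrative uses of ==, sketched with Stata's bundled auto dataset (the dataset and the variable name is_foreign are assumptions for illustration, not part of these notes):

```stata
* Sketch only: the auto dataset is assumed for illustration
sysuse auto, clear
* a true comparison evaluates to 1
display 2 == 2
* a false comparison evaluates to 0
display 2 == 3
* count the observations that satisfy the test
count if foreign == 1
* restrict a command to matching observations
list make price if rep78 == 5
* store the result of the test as a 0/1 dummy (is_foreign is a hypothetical name)
gen byte is_foreign = (foreign == 1)
```

A single = assigns a value, as in gen or replace, while == only compares.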
to check if there is a relationship - to see how representative is the sample for the population
rule of thumb: the benchmark value of t is 2, i.e., if the computed absolute value of t is greater than 2, we can say the result is statistically significant
the relationship between t and the p-value is inverse
tabstat Income if Gender=="Female", stats(min max mean)
regression line with 95% C.I.
- generate works when the variable does not yet exist and will give an error if the variable already exists.
- replace works when the variable already exists, and will give an error if the variable does not yet exist.
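A minimal sketch of the difference, assuming Stata's bundled auto dataset; the variable names price_k and price_m are hypothetical and used only for illustration:

```stata
* Sketch only: the auto dataset and the variable names are assumptions
sysuse auto, clear
* generate creates a variable that does not exist yet
generate price_k = price / 1000
* replace changes a variable that already exists
replace price_k = round(price_k, 0.1)
* a second generate on price_k stops with an error; capture stores the
* return code in _rc instead of halting the do-file
capture generate price_k = 0
display _rc
* replace on a variable that was never created also stops with an error
capture replace price_m = price
display _rc
```

A nonzero _rc after each capture indicates that the command was rejected.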