CD 314: Research Methods
and Skills
STATISTICAL DATA
PROCESSING
Mr. John Lanata
Scope
Initial situation Processes
• Set of data • manual editing &
• Recorded on forms coding
Objective • data capture
• Description • machine editing
• Sample
• estimation
• Population
• Inference • summarization
• From sample to • inference
population
• presentation
Methodological analysis
Sample size
Response rate
Non-response bias
Missing data
Manual editing
Less done now than in days
before personal computers
became widespread
Only those tasks that are not
feasible to be done by computer
need be done, eg.
• checking that all forms are present
• assessing legibility
Manual coding
Needs to be done where data are
in textual form
Process consists of allocating a
suitable code to each such data
item
It may be necessary to develop
suitable code lists based on data
if not already done
Data entry/capture
Process consists of entering data into
computer records, usually through a
keyboard
Storage options include:
• Spreadsheet (eg. Excel)
• Database (eg. Access)
• Other programs
Choice depends on volume &
complexity of data
Data entry into Excel
Raw data should be kept in one
or more sheets
Analysis etc. should be done in
other sheets or in other
workbooks
Data entry into Access
May be done directly into one or
more tables
Usually better to design special
forms for data entry
Machine editing
Process consists of detecting and
possibly amending data that is or
may be incorrect
Two main types of edit check:
validity & consistency
Simple editing may be done in Excel
Access queries enable much more
complex editing
Estimation
Process consists of producing estimates
relating to population from sample data
For self-weighting sample, estimates are
calculated directly from sample data
Otherwise, suitable weights must be
calculated and applied to sample data
These calculations more easily done in
Access than in Excel
Summarization
Forms include:
• Frequency distributions (one or more
variables)
• Measures of central tendency
• Measures of dispersion
• Positional measures
• Correlation
• Regression
Extent of multivariate tabulation is
constrained by sample size
Summarizing in Excel
Frequency distribution for one variable
may be constructed by using Histogram
option
Cumulative distribution may be easily
constructed with COUNTIF function
Statistical functions use raw data as input
If data available only in form of frequency
distribution, calculation tables may be set
up
Multivariate analysis in
Excel
Bivariate & multivariate frequency
distributions are difficult to
construct in Excel
Data Analysis option of Tools menu
enables more advanced multivariate
analysis of raw data, eg.
• multiple regression
• moving averages
Summarizing in Access
Only few statistical functions
available
Frequency distribution for one
variable may be easily constructed
by using Total query
Bivariate & multivariate
frequency distributions may be
constructed by using Crosstab query
Inference in Excel &
Access
Excel has many suitable
functions, eg.
• CHIDIST
• FDIST
• NORMDIST
• TDIST
Access does not facilitate
inference
Inference in other
programs
Many specialized statistical
programs available, eg.
• SPSS
• SAS
• Minitab
These require some knowledge of
inference theory in order to choose
suitable tests & interpret them
Tabular presentation
Simple tabulations may be
constructed in Excel
Complex tabulations may be
constructed in Access by using
wizards
Graphical presentation
Excel may be used to produce
wide range of charts
Access does not produce charts
so easily
Joint use of Excel &
Access
Each program has its relative
advantages
Easy to transfer data between
Excel & Access
Best to use both & transfer data
etc. between them as needed
Strengths of Access
Generally best to use Access
for:
• data entry & storage
• machine editing
• complex estimation
• frequency distributions
Strengths of Excel
Generally best to use Excel for:
• summarization
• inference
• charts
Overall presentation
Easy to transfer tables & charts
from Access & Excel into:
• Word for full report
• PowerPoint for summary
presentation
OpenOffice programs
Similar remarks apply to equivalent
programs in the OpenOffice suite,
which is available in some ITS labs
Equivalent programs are:
MS OFFICE OPENOFFICE
Word <> Writer
Excel <> Calc
Access <> Base
PowerPoint <> Impress