Assignment 1 - Data Screening (16 March)

The document describes screening a dataset for missing data, unengaged responses, outliers, and performing appropriate imputations. It involves the following steps: 1) Checking for missing data by counting blank cases and deleting cases with many missing values. Two cases were deleted with 17 and 20 missing values. 2) Checking for outliers in the Age variable and deleting 9 cases with extreme ages above 45 years. 3) Identifying missing values in variables C4 and E5 and imputing them using the median of nearby points. Age was also found to have 9 missing values after outlier removal and was imputed using the series mean.

Uploaded by

gaprabaa

Question 1 (Case Screening)

a) Data and variables copied to Excel:

b) Check for missing data:

To find missing data, count the blank cells for each case:
(i) Create a column titled BLANK, enter the COUNTBLANK formula, and select the
variables from B1 to E5.
(ii) Sort and filter (largest to smallest) to find the cases with many missing
values, and delete them.
(iii) In this data set, cases No. 57 and 161 have 17 and 20 missing values, so both
cases are deleted.
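
The blank-counting step can also be sketched outside Excel. The Python below is only an illustration, not the spreadsheet procedure used in this assignment: the toy rows, the `count_blank` helper, and the `max_missing` cutoff are all hypothetical, since the assignment does not state a numeric threshold for "many" missing values.

```python
# Hypothetical toy data: each case maps to its item responses; None marks a blank cell.
cases = {
    1: [4, 5, None, 3, 4],
    2: [None, None, None, None, 2],
    3: [5, 5, 4, 4, 5],
}


def count_blank(responses):
    """Count blank (None) responses in one case, like COUNTBLANK over a row."""
    return sum(1 for value in responses if value is None)


# Assumed cutoff, for illustration only; the source does not state one.
max_missing = 3

blank_counts = {case_id: count_blank(r) for case_id, r in cases.items()}

# Sort largest to smallest, mirroring the sort-and-filter step.
ranked = sorted(blank_counts.items(), key=lambda item: item[1], reverse=True)

# Keep only the cases at or below the cutoff.
kept = {cid: r for cid, r in cases.items() if blank_counts[cid] <= max_missing}
```

Here `ranked` puts the case with the most blanks first, and `kept` drops any case whose blank count exceeds the assumed cutoff.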

c) Check for unengaged data:
To find unengaged responses, compute the standard deviation for each case:
(i) Create a column titled SD, enter the STDEV.P formula, and select the variables
from B1 to E5.
(ii) Sort and filter (smallest to largest) to find the cases with the lowest
standard deviation.
(iii) Under Gaskin's rule, any value below 0.5 can be considered an unengaged
response; however, we use the general rule that any value below 0.4 is
considered an unengaged response and is deleted.
(iv) No unengaged responses are found in this data set.
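
As a rough cross-check of the STDEV.P step, the same per-case population standard deviation can be computed with Python's standard `statistics.pstdev`. This is a sketch under stated assumptions: the toy responses and the names `case_sd` and `THRESHOLD` are hypothetical, and only the 0.4 cutoff comes from the text above.

```python
from statistics import pstdev

# Hypothetical toy responses on a 1-5 scale; not taken from the assignment data.
cases = {
    1: [4, 5, 3, 2, 4, 1, 5, 2, 3, 4],
    2: [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],  # same answer everywhere: SD = 0
    3: [4, 4, 4, 4, 4, 4, 4, 4, 4, 5],  # nearly constant answers: SD = 0.3
}

THRESHOLD = 0.4  # the "general rule" cutoff used in this assignment


def case_sd(responses):
    """Population standard deviation of one case, as STDEV.P computes it."""
    return pstdev(responses)


sds = {case_id: case_sd(r) for case_id, r in cases.items()}

# Cases whose responses barely vary are flagged as unengaged.
unengaged = sorted(cid for cid, sd in sds.items() if sd < THRESHOLD)
```

With these toy rows, cases 2 and 3 fall below the cutoff and would be flagged, while case 1 varies enough to be kept.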

Question 2 (Variable Screening)

a) Check for missing values:

i. Run frequencies to identify the missing values.
ii. Analyze → Descriptive Statistics → Frequencies (all variables except ID).
iii. In the data set, two variables have missing data: item C4
(perceived ease of use) has 3 missing values and E5 (System use item 5) has 2
missing values, as shown below:

Missing value

             Perceived ease of use item 4   System use item 5
N  Valid     219                            220
   Missing   3                              2

b) Check for outliers

i. Run Analyze → Descriptive Statistics → Explore, enter Age in the Dependent List,
then choose Outliers and Percentiles.
ii. Age has 9 extreme-value cases, as shown below. The age in each of those cases is
above 45, the upper extreme-outlier boundary, so the Age value is deleted for those
cases.

Percentiles

                                               Percentiles
                                               5        10       25       50       75       90       95
Weighted Average (Definition 1)  Age (years)   19.0000  20.0000  21.0000  23.0000  27.0000  39.0000  43.0000
Tukey's Hinges                   Age (years)                     21.0000  23.0000  27.0000

IQR = Third quartile - First quartile
    = 27 - 21
    = 6

Outlier                  Calculation   Result

Mild outlier             1.5 x 6       9
Extreme outlier          3 x 6         18
Upper mild outlier       27 + 9        36
Upper extreme outlier    27 + 18       45
Lower mild outlier       21 - 9        12
Lower extreme outlier    21 - 18       3
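
The boundary arithmetic can be reproduced with a few lines of Python. This is a minimal sketch, not SPSS output: the function names `outlier_fences` and `is_extreme` are hypothetical, and the quartiles 21 and 27 are the values reported in the percentiles table.

```python
def outlier_fences(q1, q3):
    """Return the IQR plus mild and extreme outlier boundaries from the quartiles."""
    iqr = q3 - q1
    mild_step = 1.5 * iqr    # distance used for mild outliers
    extreme_step = 3 * iqr   # distance used for extreme outliers
    return {
        "iqr": iqr,
        "lower_extreme": q1 - extreme_step,
        "lower_mild": q1 - mild_step,
        "upper_mild": q3 + mild_step,
        "upper_extreme": q3 + extreme_step,
    }


def is_extreme(age, fences):
    """True when an age lies outside the extreme boundaries."""
    return age < fences["lower_extreme"] or age > fences["upper_extreme"]


# Quartiles of Age as reported in the percentiles table.
fences = outlier_fences(q1=21, q3=27)
```

Calling `is_extreme(46, fences)` returns `True`, while an age of exactly 45 sits on the boundary and is not flagged.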
9 Extreme Value Cases

Extreme outlier
Case No. Age
181 55
179 50
33 52
212 49
57 49
208 48
159 47
18 47
74 46

[Figure: Age (years)]

c) Use appropriate value imputation for missing values
i. Item C4 (perceived ease of use) has 3 missing values and E5 (System use item 5)
has 2 missing values.
ii. Age has 9 missing values after the outlier check.
iii. Run Transform → Replace Missing Values, enter C4 and E5, change the names,
and substitute with the median of nearby points because both are measured on an
interval scale. For Age, substitute with the series mean.
iv. The results are shown below:

Result Variables

                      N of Replaced    Case Number of Non-Missing Values   N of Valid   Creating
    Result Variable   Missing Values   First          Last                 Cases        Function

1   C4                3                1              222                  222          MEDIAN(C4,2)
2   E5                2                1              222                  222          MEDIAN(E5,2)
3   Age               9                1              222                  222          SMEAN(Age)
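
To make the two replacement methods concrete, here is a rough Python sketch of a series-mean fill and a median-of-nearby-points fill with a span of 2. It is an approximation, not the SPSS implementation: the helper names and toy data are hypothetical, and the handling of gaps near the ends of a series (using whatever valid neighbours exist) is an assumption that may differ from SPSS.

```python
from statistics import mean, median


def impute_series_mean(values):
    """Replace each None with the mean of all valid values in the series."""
    valid = [v for v in values if v is not None]
    fill = mean(valid)
    return [fill if v is None else v for v in values]


def impute_nearby_median(values, span=2):
    """Replace each None with the median of up to `span` valid values
    before it and up to `span` valid values after it."""
    result = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        before = [x for x in values[:i] if x is not None][-span:]
        after = [x for x in values[i + 1:] if x is not None][:span]
        neighbours = before + after
        if neighbours:  # assumed behaviour: leave the gap if nothing valid is nearby
            result[i] = median(neighbours)
    return result


# Hypothetical toy data, not the assignment's values.
c4 = [4, 5, None, 3, 4, 2]
age = [21, None, 23, 27, None, 25]

c4_filled = impute_nearby_median(c4, span=2)
age_filled = impute_series_mean(age)
```

In the toy `c4` series, the gap is filled from the two valid values on each side (4, 5, 3, 4), while every gap in `age` receives the same overall mean.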
