Handling Missing Data

The document discusses the challenges and techniques for handling missing data, emphasizing the importance of imputation methods to preserve data integrity and improve predictive accuracy. It highlights multiple imputation using Bayesian methods, specifically the MICE (Multiple Imputation by Chained Equations) approach, which accounts for uncertainty in missing data. The document also outlines various imputation techniques and their comparative advantages and disadvantages, providing insights into practical implementation using statistical programming tools like R and Python.

Presented for the ICEAA 2021 Online Workshop - www.iceaaonline.com

Dealing with Missing Data:
The Art and Science of Imputation
May 2021

For the International Cost Estimating and Analysis Association Conference, May 2021

IMPUTATION
FILLING IN HOLES IN DATASETS

THE PROBLEM OF MISSING DATA

A significant problem, especially for small datasets
Often dealt with by removing observations with missing data

TECHNIQUES FOR HANDLING MISSING DATA

A variety of techniques exist for filling in missing data, though
some perform better than others

FILLING IN HOLES WITH STATISTICS

Recognizing the inherent uncertainty in missing data, we
adopt and advocate the method of multiple imputation
using Bayesian methods (“chained equations”)


Why Imputation?
Is it worth it?

Preserves Data
Imputation prevents the reduction of sample size due to missing values. This helps to preserve all responses in the sample

Fooled by Randomness
Having more data prevents us from falling prey to overly optimistic models that are fit to more noise than signal

Impute and Assess Risk!

Preserves Structure of Data
When we remove data points, we could be missing important patterns in the data, which can cause our analysis to distort true patterns within the data

Predictive Accuracy
Reducible uncertainty can be reduced by increasing sample size. This helps to improve predictive accuracy

DATA
Foundation of All Analyses

How Should We Handle It?
The bulk of the time in analytics should be spent on collecting, normalizing and verifying data. In defense and aerospace applications, datasets are small. Data should be preserved when possible

"The goal is to turn data into information, and information into insight."
-Carly Fiorina

IMPUTATION
To impute or not to impute, that is the question

01 Understand the available data
02 Determine variables that would benefit from imputation
03 Know when blanks are intentional

Imputation is a powerful method for filling blanks where values are missing within a dataset
An analyst must understand the data intimately to know if a blank means that the factor is not applicable for that data point
Sometimes a blank does not reflect a nonresponse and should be observed "as is"

Is the response missing at random?

The US Census Bureau deals with missing data all the time. If no response is provided for the name of Person 7 on the Census form from a household of six members, this missing value is not an omission; the response is "Not Applicable"

ISSUES WITH DATA GAPS

What can go wrong?

Fewer Degrees of Freedom
Removing observations with missing values results in fewer degrees of freedom in models

Reduction of Predictive Power
Predictive power is diminished when degrees of freedom are small

Inability to Use Advanced Methods
Certain Machine Learning methods cannot be applied when missing values are prevalent

METHODS ALLOWING
MISSING DATA
Complete-Case Analysis
Approach that excludes any records with missing data.
Disadvantage: bias can be introduced into the analysis because the removed records may carry information about the population

Available-Case Analysis
Approach that analyzes subsets of the complete dataset, each using the records available for the variables involved, so that multiple aspects of a problem can be studied.
Disadvantage: bias is again introduced if data are missing in a pattern

Alternative to Allowing Missingness

Though methods exist to continue with analysis upon removal
of missing data, better alternatives exist for filling data gaps


IMPUTATION METHODS

Mean Imputation
Filling missing values with the mean of the observed values

Imputing using Related Observations
Filling missing values with responses from related observations

Regression Imputation
Replacing missing values with a predicted value based on the results of fitting a regression line to the available data

Expectation Maximization
Replacing missing values by exploring the covariation among variables in order to infer values for the missing data

To retain as much of the precious gold (data) as possible, we should consider using imputation methods. There are several methods you can choose from to make the best statistical inference of a response that will close a data gap
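As a rough illustration of the first and third methods, the base R sketch below fills the one missing unit cost (UC) in the engine dataset used later in this deck, once with the observed mean and once with a regression prediction from brake horsepower (bHP). The names `uc_mean` and `uc_reg` are made up for the example, and this is a single-imputation sketch under those assumptions, not the procedure the presenters ran.

```r
# Brake horsepower and unit cost for the nine engines; engine 8 has no cost.
bHP <- c(290, 330, 330, 515, 675, 675, 500, 362, 340)
UC  <- c(40079, 40927, 29563, 63931, 111976, 120661, 47873, NA, 40661)
missing <- is.na(UC)

# Mean imputation: every gap gets the mean of the observed values.
uc_mean <- UC
uc_mean[missing] <- mean(UC, na.rm = TRUE)

# Regression imputation: fit UC ~ bHP on the complete rows (lm drops rows
# with NA by default), then predict each gap from its own bHP.
fit <- lm(UC ~ bHP)
uc_reg <- UC
uc_reg[missing] <- predict(fit, newdata = data.frame(bHP = bHP[missing]))

print(round(c(mean = uc_mean[8], regression = uc_reg[8])))
```

The two filled-in values differ noticeably here because engine 8 has below-average horsepower, which the regression takes into account and the mean does not.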

IMPUTATION METHODS
How do they compare?

Mean Imputation
This method helps to restrict the variability of the data
Disadvantage: it weakens covariances and correlations among features

Related Observations
This method also helps to restrict variability in the data
Disadvantage: introduces measurement error

Regression Imputation
This method uses regression to predict missing values. MICE is a regression imputation method
Advantage: produces unbiased estimates with data that are Missing At Random (MAR)

Expectation Maximization
This method uses the maximum likelihood method to estimate missing values
Advantage: increases precision and decreases parameter bias
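The disadvantage listed for mean imputation can be seen directly on the engine data used later in this deck. The base R sketch below compares the correlation between dry weight and displacement before and after mean-filling the four missing dry weights; the variable names are chosen for the example only.

```r
# Dry weight (four gaps) and displacement (complete) for the nine engines.
DryWGT <- c(NA, 1296, 1905, 3090, NA, NA, NA, 3230, 912)
DISP   <- c(7.2, 7.2, 8.8, 15.2, 14.8, 14.8, 12.1, 12.1, 6.6)

# Correlation using only the rows where both values were observed.
r_observed <- cor(DryWGT, DISP, use = "complete.obs")

# Mean imputation: replace each gap with the observed mean dry weight.
filled <- ifelse(is.na(DryWGT), mean(DryWGT, na.rm = TRUE), DryWGT)
r_mean_imputed <- cor(filled, DISP)

# The spread of the filled column also shrinks relative to the observed spread.
sd_observed <- sd(DryWGT, na.rm = TRUE)
sd_mean_imputed <- sd(filled)

print(round(c(r_observed = r_observed, r_mean_imputed = r_mean_imputed), 2))
```

Every imputed row sits exactly at the mean of dry weight, so it adds nothing to the covariance while still adding to the displacement variance, which is why the correlation drops.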

Tools for Imputation

R
R is a language and environment for statistical computing and graphics. It is an integrated suite of software facilities for data manipulation, calculation and graphical display

Python
Python is a high-level programming language with dynamic semantics. Like R, Python supports modules and packages to help with analysis

MICE

MULTIPLE IMPUTATION BY CHAINED EQUATIONS (MICE)

Method
This method creates multiple imputations for a missing value
that accounts for the statistical uncertainty in the imputation

Assumptions
This method operates under the assumption that the missing data are Missing At Random (MAR). Data are MAR when the missingness is fully accounted for by variables for which there is complete information

Iterations
Multiple regression models are fitted, and each variable with missing data is modeled conditionally on the responses of the other variables within the dataset. With this method, each variable is modeled according to its own distribution


HOW MICE FILLS GAPS

Several imputed versions of the data are created using plausible data values

NUMBER #01
Multiple imputation is a series of stochastic regression imputations

NUMBER #02
The first step is an imputation step (I-step) that fills data gaps using stochastic regression

NUMBER #03
The number of iterations, m, is specified for the number of imputations that are conducted in the I-step

NUMBER #04
In the posterior step (P-step), the mean and covariance distributions are calculated from the filled-in data

NUMBER #05
The P-step proceeds by taking a random draw from the mean and covariance distributions, which are used to calculate regression coefficients

NUMBER #06
The coefficients of the individual equations are averaged using a simple, unweighted mean. Goodness-of-fit measures are calculated using the pooled results
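The I-step and P-step can be sketched for a single variable in base R. The example below is a simplified stand-in, assuming a normal linear model of unit cost (UC) on brake horsepower (bHP) with the engine values used later in this deck; it is not what the mice package does internally, and `impute_once` is a hypothetical helper.

```r
set.seed(23109)  # fixed seed so the random draws are repeatable

bHP <- c(290, 330, 330, 515, 675, 675, 500, 362, 340)
UC  <- c(40079, 40927, 29563, 63931, 111976, 120661, 47873, NA, 40661)
obs <- !is.na(UC)
m <- 5  # number of imputed datasets

# Least-squares fit on the observed rows.
X <- cbind(1, bHP[obs])
XtX_inv <- solve(crossprod(X))
beta_hat <- XtX_inv %*% crossprod(X, UC[obs])
rss <- sum((UC[obs] - X %*% beta_hat)^2)
df <- sum(obs) - ncol(X)

impute_once <- function() {
  # P-step: draw a residual variance, then coefficients around the estimates.
  sigma2 <- rss / rchisq(1, df)
  beta <- beta_hat + sqrt(sigma2) * t(chol(XtX_inv)) %*% rnorm(ncol(X))
  # I-step: predicted value plus random noise for each missing row.
  X_mis <- cbind(1, bHP[!obs])
  as.numeric(X_mis %*% beta) + rnorm(sum(!obs), sd = sqrt(sigma2))
}

imputations <- replicate(m, impute_once())
print(round(imputations))
```

Each of the m values is a plausible unit cost for engine 8; the spread among them is what carries the uncertainty of the imputation into the pooled results.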

THE MICE PROCESS

Given the multiple imputations, the coefficients of the individual equations are averaged (using a simple, unweighted mean). The other parameters, including the degrees of freedom, standard errors, and R² values, are combined using what is known as Rubin's Rules, after the statistician who developed them
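For a single coefficient, the usual statement of Rubin's Rules combines the average sampling variance within the imputed datasets with the variance of the estimates between them. The base R sketch below uses made-up slope estimates and standard errors (they are not results from this deck) to show the arithmetic.

```r
# Hypothetical slope estimates and standard errors from m = 5 imputed datasets.
est <- c(205.1, 209.8, 198.4, 211.3, 203.9)
se  <- c(31.2, 29.8, 33.5, 30.1, 32.0)
m <- length(est)

pooled_est  <- mean(est)   # simple, unweighted mean of the coefficients
within_var  <- mean(se^2)  # average sampling variance within datasets
between_var <- var(est)    # variance of the estimates across datasets
total_var   <- within_var + (1 + 1 / m) * between_var
pooled_se   <- sqrt(total_var)

print(c(estimate = pooled_est, std_error = pooled_se))
```

The pooled standard error is never smaller than the square root of the within-dataset variance, which is how the extra uncertainty from imputing shows up in the final model.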

UNDERSTANDING THE DATA

Exploring engine data

Dataset
The data used for analysis is a Wheeled and Tracked Vehicle
Engine dataset. The dataset is small, which makes the use of
imputation very important

Included Features
Identification (ID), Brake Horsepower (bHP), Displacement
(DISP), Engine Speed (EngSP), Cylinders (CYL), Unit Cost in
Dollars (UC), Dry Weight (DryWGT)

Missing Counts
Of the seven features included in the dataset, four of those
seven have missing values.
N=9


Dataset Example
Four variables have missing data (NA marks a missing value)

ID  bHP  EngSP  CYL  DryWGT  DISP  UC
1   290  2600   6    NA      7.2   $40,079
2   330  2400   6    1296    7.2   $40,927
3   330  2200   6    1905    8.8   $29,563
4   515  1500   6    3090    15.2  $63,931
5   675  2101   8    NA      14.8  $111,976
6   675  2101   8    NA      14.8  $120,661
7   500  2100   8    NA      12.1  $47,873
8   362  2300   NA   3230    12.1  NA
9   340  NA     8    912     6.6   $40,661
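To make the cost of dropping incomplete rows concrete, the base R sketch below enters the nine engines as a data frame and counts the gaps. The data frame name `engines` is chosen for the example; the values are the ones in the table above, with blank cells entered as NA.

```r
engines <- data.frame(
  ID     = 1:9,
  bHP    = c(290, 330, 330, 515, 675, 675, 500, 362, 340),
  EngSP  = c(2600, 2400, 2200, 1500, 2101, 2101, 2100, 2300, NA),
  CYL    = c(6, 6, 6, 6, 8, 8, 8, NA, 8),
  DryWGT = c(NA, 1296, 1905, 3090, NA, NA, NA, 3230, 912),
  DISP   = c(7.2, 7.2, 8.8, 15.2, 14.8, 14.8, 12.1, 12.1, 6.6),
  UC     = c(40079, 40927, 29563, 63931, 111976, 120661, 47873, NA, 40661)
)

# Number of missing values in each column.
missing_counts <- colSums(is.na(engines))

# Complete-case analysis keeps only rows with no gaps at all.
n_complete <- sum(complete.cases(engines))

# Available-case analysis for a UC ~ bHP model only needs those two columns.
n_for_model <- sum(complete.cases(engines[, c("bHP", "UC")]))

print(missing_counts)
print(c(complete_rows = n_complete, rows_for_UC_bHP = n_for_model))
```

Only three of the nine engines survive a strict complete-case filter, while eight are usable for a model of unit cost on horsepower alone.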

IMPLEMENTING MICE

01
We used the statistical programming platform R and the 'mice' package to calculate imputed data

R code:
    install.packages('mice')
    library(mice)
    data <- read.csv("Example.csv")
    imputedata <- mice(data, m = 5, method = 'pmm', seed = 23109)

Fixed seed to ensure the analysis is repeatable
The default in mice is m=5. This parameter will need to be included if another number of imputations is desired

02
Conduct linear regression on each of the five imputed datasets
To view each of the imputed datasets, we use the complete() function:

R code:
    completedData <- complete(imputedata, 1)

The number one in the complete function indicates that you want to see the first imputed dataset. To see datasets 2 through 5, call complete() again with the corresponding number

03
Pooling Results
Combining the results of these separate analyses is referred to as pooling
The pooled regression equation has coefficients that are the arithmetic means of the coefficients for the five individual regressions
Let m denote the number of imputed datasets, $\beta_i$ denote the ith coefficient, and $\beta_{ij}$ denote the ith coefficient for the jth imputed dataset; then:

$$\beta_i = \frac{\sum_{j=1}^{m} \beta_{ij}}{m}$$
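The 'pmm' setting in the mice() call stands for predictive mean matching, which fills a gap with a value borrowed from an observed record whose predicted value is close. The base R sketch below is a simplified illustration of that idea on the engine data; `pmm_once` and its `donors` argument are hypothetical, and the mice package's own procedure differs in detail (for example, it also draws the regression coefficients at random).

```r
set.seed(23109)

bHP <- c(290, 330, 330, 515, 675, 675, 500, 362, 340)
UC  <- c(40079, 40927, 29563, 63931, 111976, 120661, 47873, NA, 40661)

pmm_once <- function(x, y, donors = 3) {
  d <- data.frame(x = x, y = y)
  obs <- !is.na(d$y)
  fit <- lm(y ~ x, data = d)          # rows with missing y are dropped
  pred <- predict(fit, newdata = d)   # predicted y for every row
  for (i in which(!obs)) {
    # Observed rows whose predictions are closest to the gap's prediction.
    nearest <- order(abs(pred[obs] - pred[i]))[seq_len(donors)]
    candidates <- which(obs)[nearest]
    # Borrow the observed value of one randomly chosen donor.
    d$y[i] <- d$y[candidates[sample.int(donors, 1)]]
  }
  d$y
}

imputed <- pmm_once(bHP, UC)
print(imputed[8])
```

Because the filled-in value is always copied from a real engine, imputed values can never fall outside the range of what was observed.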

IMPLEMENTING MICE

04
Pooling Results - 2
To fit a linear model to a dataset, use the lm() function. Then, pool the m estimates $\hat{Q}^{(1)}, \ldots, \hat{Q}^{(m)}$ into one model $\bar{Q}$

R code:
    Fit1 <- with(imputedata, lm(UC ~ bHP))
    summary(pool(Fit1))

05
Goodness-of-Fit Statistics
Unlike the coefficients, you cannot simply average the R² values, standard errors, F-statistics, etc., in order to calculate the goodness-of-fit statistics

R code:
    pool.r.squared(Fit1, adjusted = FALSE)

    library(miceadds)  # provides mi.anova()
    poolF <- mi.anova(mi.res = imputedata, formula = "UC~bHP")

06
Compare Results
Compare the results from the imputed dataset to the original dataset with missing values removed
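One common way to pool R² across imputed datasets, and the approach that pool.r.squared is generally described as using, is to average on the Fisher z scale and then transform back. The base R sketch below shows that calculation with made-up R² values; it illustrates the idea and is not a reimplementation of the package function.

```r
# Hypothetical R-squared values from m = 5 imputed-data regressions.
r2 <- c(0.84, 0.86, 0.81, 0.87, 0.83)

z <- atanh(sqrt(r2))          # Fisher z-transform of each correlation
pooled_r2 <- tanh(mean(z))^2  # average on the z scale, then back-transform

print(round(c(pooled = pooled_r2, simple_mean = mean(r2)), 4))
```

The pooled value always lies between the smallest and largest individual R², but it is generally not equal to their simple average.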

ANALYZING RESULTS
Creating plots to determine reasonableness of imputations

Scatterplot Analysis
There is a linear relationship
between UC and bHP. The pattern
of the relationship seems plausible
for the imputed values (pink) as
compared to the observed values
(blue)

Density Plot Analysis

Density plots show the shape of the distribution of each imputation. The plot is useful for spotting outlier imputations and works for variables with two or more missing values

MICE Results
Cells with several comma-separated values show the imputed values across the imputed datasets

ID  bHP  EngSP                         CYL         DryWGT                       DISP  UC
1   290  2600                          6           3090, 1296, 1905, 1905, 912  7.2   $40,079
2   330  2400                          6           1296                         7.2   $40,927
3   330  2200                          6           1905                         8.8   $29,563
4   515  1500                          6           3090                         15.2  $63,931
5   675  2101                          8           912, 3230, 1296, 3090, 1905  14.8  $111,976
6   675  2101                          8           3090, 1905, 3090, 912, 912   14.8  $120,661
7   500  2100                          8           912, 3090, 1296, 3090, 912   12.1  $47,873
8   362  2300                          8, 8, 8, 6  3230                         12.1  $47,873, $47,873, $40,079, $40,927, $111,976
9   340  2400, 2400, 2300, 2300, 2400  8           912                          6.6   $40,661

FIT RESULTS
Comparing results from the original dataset to the imputed (pooled) dataset

Linear Model
The model is a solid one with a statistically significant p-value less than alpha = 0.05 and an R² equal to 87.5%. One data point was removed due to missing a unit cost value

MICE Imputed Model
Though the R² statistic is lower than for the original dataset, we gained some degrees of freedom by using imputation, and the model is still statistically significant. The model does not gain a full degree of freedom since the iterations are pooled
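The complete-case linear model described above can be reproduced in base R from the engine values listed earlier in this deck. This is a sketch of the UC ~ bHP fit with the one engine lacking a unit cost dropped; the variable names are chosen for the example.

```r
bHP <- c(290, 330, 330, 515, 675, 675, 500, 362, 340)
UC  <- c(40079, 40927, 29563, 63931, 111976, 120661, 47873, NA, 40661)

# lm() drops the row with the missing unit cost, leaving eight engines.
fit <- lm(UC ~ bHP)
fit_summary <- summary(fit)

r2       <- fit_summary$r.squared
df_resid <- df.residual(fit)
p_value  <- fit_summary$coefficients["bHP", "Pr(>|t|)"]

print(round(c(r_squared = r2, residual_df = df_resid, p_value = p_value), 4))
```

With eight usable engines and two estimated coefficients, the model has six residual degrees of freedom, which is the baseline the imputed models are compared against.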

EXPECTATION
MAXIMIZATION

Expectation
Maximization
Imputing by optimizing

Maximum Likelihood
The maximum likelihood method is used to impute missing values.
This method uses available data to impute a value and then checks
to determine the reasonableness of the guess

Covariance
The covariation among variables is used to infer probable values for
the missing data

Two-Step Process
The method follows a two-step process to fill in missing data


EM TWO-STEP PROCESS
How EM fills data gaps

EM is an iterative process used to fill data gaps

STEP #01
First Pass at Filling Gaps
The algorithm begins by filling the gaps with the conditional mean of the missing values.

STEP #02
Iterative Process
The maximum likelihood estimates of the mean vector and covariance matrix are calculated. The covariance matrix is then used to derive regression equations for the next iteration, and the cycle continues until the difference between the covariance matrices in subsequent runs falls below the convergence criteria
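The two steps can be written out in base R for the simplest case: two variables, with gaps only in the second. The sketch below assumes a bivariate normal model for unit cost (y) and brake horsepower (x) using the engine values from this deck; it is an illustration written for this purpose, not the code of the norm package.

```r
x <- c(290, 330, 330, 515, 675, 675, 500, 362, 340)                    # bHP
y <- c(40079, 40927, 29563, 63931, 111976, 120661, 47873, NA, 40661)   # UC
mis <- is.na(y)

# Starting values from the complete rows (y first, x second).
mu <- c(mean(y[!mis]), mean(x[!mis]))
sigma <- cov(cbind(y, x)[!mis, ])

for (iter in 1:200) {
  # Step 1: fill gaps with the conditional mean of y given x, and carry the
  # conditional variance into the expected value of y^2.
  slope <- sigma[1, 2] / sigma[2, 2]
  ey <- y
  ey[mis] <- mu[1] + slope * (x[mis] - mu[2])
  ey2 <- ey^2
  ey2[mis] <- ey2[mis] + sigma[1, 1] - slope * sigma[1, 2]

  # Step 2: maximum likelihood mean vector and covariance matrix.
  mu_new <- c(mean(ey), mean(x))
  s_yy <- mean(ey2) - mu_new[1]^2
  s_yx <- mean(ey * x) - mu_new[1] * mu_new[2]
  s_xx <- mean(x^2) - mu_new[2]^2
  sigma_new <- matrix(c(s_yy, s_yx, s_yx, s_xx), 2, 2)

  # Stop when the covariance matrix barely changes between runs.
  converged <- max(abs(sigma_new - sigma)) < 1e-6 * max(abs(sigma_new))
  mu <- mu_new
  sigma <- sigma_new
  if (converged) break
}

print(round(mu))
```

On these data the estimated mean unit cost should land close to the $59,771 that appears for engine 8 in the EM table later in the deck.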

IMPLEMENTING EM

01
Show missingness patterns
The function prelim.norm is used on a matrix of the x (bHP) and y (cost) variables to sort rows according to the missingness patterns
Fixed seed to ensure the analysis is repeatable

R code:
    library(norm)
    a <- prelim.norm(cbind(y, x))

02
Performing maximum likelihood estimation using EM algorithm

R code:
    b <- em.norm(a)
    c1 <- getparam.norm(a, b)

em.norm produces a parameter vector, which getparam.norm then converts into a list of parameters (mu and sigma)

03
Pooling Results
The average of the imputations is calculated for the variable with missing values

R code:
    c1$mu[1]

The coefficients of the model are then estimated

    b.est <- c(c1$mu[1] - (c1$sigma[1,2] / c1$sigma[2,2]) * c1$mu[2],
               c1$sigma[1,2] / c1$sigma[2,2])

The model can then be used to calculate the missing values for the dataset
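Written as equations, the b.est expression is the standard conversion from a mean vector and covariance matrix to a simple regression line. The notation below is introduced for this note and is not from the slides: $\mu_y$ and $\mu_x$ stand for c1$mu[1] and c1$mu[2], $\sigma_{yx}$ for c1$sigma[1,2], and $\sigma_{xx}$ for c1$sigma[2,2].

```latex
\beta_1 = \frac{\sigma_{yx}}{\sigma_{xx}}, \qquad
\beta_0 = \mu_y - \beta_1 \mu_x, \qquad
\hat{y} = \beta_0 + \beta_1 x
```

Here b.est holds $(\beta_0, \beta_1)$ in that order, and $\hat{y}$ evaluated at an engine's bHP is the model's prediction for its unit cost.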

EM
ID bHP UC
1 290 $40,079
2 330 $40,927
3 330 $29,563
4 515 $63,931
5 675 $111,976
6 675 $120,661
7 500 $47,873
8 362 $59,771
9 340 $40,661


FIT RESULTS - 2
Comparing results from the original dataset to the EM imputed dataset

Linear Model
The model is a solid one with a statistically significant p-value less than alpha = 0.05 and an R² equal to 87.5%. One data point was removed due to missing a unit cost value

EM Imputed Model
Compared to the results produced from removing the data points with missing values, this is a better performing model. A degree of freedom was gained and the R² metric increased while the model retained statistical significance

EXPECTATION MAXIMIZATION
Why choose EM?

ADVANTAGES
EM preserves the relationship with other variables, unlike mean imputation

DISADVANTAGES
EM can sometimes underestimate the standard error

COMPARING METHODS
MICE VERSUS EM

MICE and EM are based on similar assumptions and in practice they often produce similar results. The Bayesian estimation in MICE is asymptotically equivalent to the maximum likelihood estimates in EM, so for large data sets the two methods should provide similar results

For small data sets, it is wise to run both and compare the results, as small differences in the methods could have an outsized impact when the number of data points is limited

There are multiple methods which can be used to impute data. Two of the strongest techniques, MICE
and EM, should be considered first as they preserve relationships between independent and
dependent variables and estimate error more accurately.

The MICE method for imputation has an edge over EM since MICE calculates multiple imputations for
the missing values instead of one single estimate.


Q&A

THE FUTURE. DELIVERED.

Galorath provides solutions that help organizational leaders make complex business decisions
with confidence. Our predictive analytics products and services give complete insight into the
implications of significant technical or financial decisions, allowing organizations to execute a
plan with assurance and reach their goals with absolute certainty.

Learn more or schedule a demo

(310) 906-6320 • sales@galorath.com
Kimberly Roye, kroye@galorath.com
Christian Smart, PhD, CCEA, csmart@galorath.com


Presenters

Kimberly Roye, Senior Data Scientist, Kroye@galorath.com
Christian Smart, Chief Scientist, csmart@galorath.com
Dustin Hilton, Senior Cost Analyst, dhilton@galorath.com
