Chapter 11
Output Analysis for a Single
Model
Banks, Carson, Nelson & Nicol
Discrete-Event System Simulation
Purpose
n Objective: Estimate system performance via simulation.
n If θ is the system performance, the precision of the estimator θˆ can be
measured by:
¨ The standard error of θˆ.
¨ The width of a confidence interval (CI) for θ.
n Purpose of statistical analysis:
¨ To estimate the standard error or CI.
¨ To determine the number of observations required to achieve a
desired error/CI.
n Potential issues to overcome:
¨ Autocorrelation, e.g., the times in system of successive jobs lack
statistical independence.
¨ Initial conditions, e.g., jobs in queue at the start of the simulation
influence the time in system of subsequent jobs for a while.
2
Outline
n Distinguish the two types of simulation: transient vs.
steady state.
n Illustrate the inherent variability in a stochastic discrete-
event simulation.
n Cover the statistical estimation of performance measures.
n Discuss the analysis of transient simulations.
n Discuss the analysis of steady-state simulations.
3
Type of Simulations
n Terminating versus non-terminating simulations
n Terminating simulation:
¨ Runs for some duration of time TE, where E is a specified event
that stops the simulation.
¨ Starts at time 0 under well-specified initial conditions.
¨ Ends at the stopping time TE.
¨ Bank example: Opens at 8:30 am (time 0) with no customers
present and 8 of the 11 tellers working (initial conditions), and
closes at 4:30 pm (time TE = 480 minutes).
¨ The simulation analyst chooses to consider it a terminating
system because the object of interest is one day’s operation.
4
Type of Simulations
n Non-terminating simulation:
¨ Runs continuously, or at least over a very long period of time.
¨ Examples: assembly lines that shut down infrequently, telephone
systems, hospital emergency rooms.
¨ Initial conditions defined by the analyst.
¨ Runs for some analyst-specified period of time TE.
¨ Study the steady-state (long-run) properties of the system,
properties that are not influenced by the initial conditions of the
model.
n Whether a simulation is considered to be terminating or
non-terminating depends on both
¨ The objectives of the simulation study and
¨ The nature of the system.
5
Stochastic Nature of Output Data
n Model output consists of one or more random variables (r.v.'s)
because the model is an input-output transformation and the input
variables are r.v.'s.
n M/G/1 queueing example:
¨ Poisson arrivals at rate λ = 0.1 per minute;
service time ~ N(µ = 9.5, σ = 1.75).
¨ System performance: long-run mean queue length, where LQ(t) is the
number in queue at time t.
¨ Suppose we run a single simulation for a total of 5,000 minutes.
n Divide the time interval [0, 5000) into 5 equal subintervals of 1,000
minutes.
n Yj is the time-average number of customers in queue over the interval
[(j − 1)·1000, j·1000).
6
Stochastic Nature of Output Data
n M/G/1 queueing example (cont.):
¨ Batched average queue length for 3 independent replications:
Batching Interval    Batch, j      Replication 1, Y1j    Replication 2, Y2j    Replication 3, Y3j
[0, 1000)               1               3.61                  2.91                  7.67
[1000, 2000)            2               3.21                  9.00                 19.53
[2000, 3000)            3               2.18                 16.15                 20.36
[3000, 4000)            4               6.92                 24.53                  8.11
[4000, 5000)            5               2.82                 25.19                 12.62
[0, 5000)           entire run          3.75                 15.56                 13.66
¨ There is inherent variability in a stochastic simulation, both within a
single replication and across different replications.
¨ The across-replication averages Y1., Y2., Y3. can be regarded as
independent observations, but the batch averages within a replication,
Y11, …, Y15, cannot. (A minimal simulation sketch of this experiment
follows below.)
7
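To make the experiment above concrete, here is a minimal Python sketch (mine, not from the text): it simulates an M/G/1 queue with Poisson arrivals at rate 0.1 per minute and normal service times (mean 9.5, standard deviation 1.75, truncated at zero), and reports the time-average number in queue for each 1,000-minute batch over a 5,000-minute run. The function name and seeds are hypothetical choices, and because the random-number streams differ from those behind the table, the output illustrates the same kind of variability but will not reproduce the table's values.

import random

def mg1_batch_means(lam=0.1, mu=9.5, sigma=1.75, horizon=5000.0,
                    batch_len=1000.0, seed=None):
    # Time-average number waiting in queue, LQ(t), per 1000-minute batch.
    rng = random.Random(seed)
    t = 0.0
    next_arrival = rng.expovariate(lam)
    next_departure = float("inf")
    in_system = 0                                # customers in queue + in service
    areas = [0.0] * int(horizon / batch_len)     # integral of LQ(t) over each batch

    def service_time():
        return max(0.0, rng.gauss(mu, sigma))    # normal service, truncated at 0

    while t < horizon:
        t_next = min(next_arrival, next_departure, horizon)
        waiting = max(in_system - 1, 0)          # LQ(t): exclude the customer in service
        a = t
        while a < t_next:                        # accumulate area batch by batch
            k = int(a // batch_len)
            end = min(t_next, (k + 1) * batch_len)
            areas[k] += waiting * (end - a)
            a = end
        t = t_next
        if t >= horizon:
            break
        if next_arrival <= next_departure:       # arrival event
            in_system += 1
            if in_system == 1:
                next_departure = t + service_time()
            next_arrival = t + rng.expovariate(lam)
        else:                                    # departure event
            in_system -= 1
            next_departure = t + service_time() if in_system else float("inf")

    return [area / batch_len for area in areas]

# Three independent replications, analogous to the table above
for r in (1, 2, 3):
    Y = mg1_batch_means(seed=r)
    print("replication", r, [round(y, 2) for y in Y], "overall", round(sum(Y) / len(Y), 2))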
Measures of performance
n Consider the estimation of a performance parameter, θ (or
φ), of a simulated system.
¨ Discrete-time data: [Y1, Y2, …, Yn], with ordinary mean θ.
¨ Continuous-time data: {Y(t), 0 ≤ t ≤ TE}, with time-weighted mean φ.
n Point estimation for discrete-time data:
¨ The point estimator is
θˆ = (1/n) ∑_{i=1}^{n} Yi
¨ It is unbiased if its expected value is θ, that is, if E(θˆ) = θ (the desired
property).
¨ It is biased if E(θˆ) ≠ θ.
(A small computational sketch of the point estimators follows this slide.)
8
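A small illustrative sketch (not from the text) of the two kinds of point estimator: the ordinary mean for discrete-time data and the time-weighted mean for continuous-time data stored as a piecewise-constant path. The function names and the sample data are hypothetical.

def discrete_mean(y):
    # theta-hat = (1/n) * sum of Y_i, the ordinary mean of discrete-time observations
    return sum(y) / len(y)

def time_weighted_mean(times, values, T_end):
    # phi-hat = (1/T_E) * integral of Y(t) dt, where Y(t) = values[k] on
    # [times[k], times[k+1]) and values[-1] on [times[-1], T_end)
    area = 0.0
    for k in range(len(times)):
        t_next = times[k + 1] if k + 1 < len(times) else T_end
        area += values[k] * (t_next - times[k])
    return area / T_end

print(discrete_mean([3.2, 4.1, 2.7]))                        # ordinary mean
print(time_weighted_mean([0.0, 2.0, 5.0], [0, 2, 1], 10.0))  # time-weighted mean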
Confidence-Interval Estimation
[Performance Measures]
n Central Limit Theorem:
If X1, X2, …, Xn are i.i.d. samples from a distribution with mean θ and
variance σ², then as n → ∞ the random variable
θˆ = (1/n) ∑_{i=1}^{n} Xi
is approximately normal with mean θ and variance σ²/n.
n Key point: the sample mean is an unbiased estimator of θ.
n The variance describes how close the sample mean is likely to be to the
true, unknown θ.
n The i.i.d. assumption is satisfied by using samples taken across
replications.
(A small Monte Carlo illustration follows this slide.)
9
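A small Monte Carlo sketch (illustrative only; the distribution and parameter values are hypothetical) of the statement above: across many independent replications, the mean of n i.i.d. samples averages out to θ and has variance close to σ²/n.

import random
from statistics import mean, variance

rng = random.Random(1)
theta, n, R = 10.0, 50, 2000           # true mean, sample size per replication, replications
sigma2 = theta ** 2                    # variance of an exponential with mean theta

# One sample mean per independent replication
rep_means = [mean(rng.expovariate(1 / theta) for _ in range(n)) for _ in range(R)]

print("mean of the sample means:", round(mean(rep_means), 3))            # close to theta
print("variance of the sample means:", round(variance(rep_means), 3),
      "vs sigma^2 / n =", sigma2 / n)                                    # close to sigma^2/n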
Confidence-Interval Estimation
[Performance Measures]
n Suppose the model output is normally distributed with mean θ and
variance σ² (both unknown), e.g., a sample mean.
¨ Let Yi. be the average time in system for the ith replication of the
simulation (its mathematical expectation is θ).
¨ The average time in system will vary from replication to replication, but
over the long run the average of the averages will be close to θ.
¨ Sample variance across R replications:
S² = [1/(R − 1)] ∑_{i=1}^{R} (Yi. − Y..)²
where Yi. is the sample mean from replication i and Y.. is the sample
mean of the sample means.
10
Confidence-Interval Estimation
[Performance Measures]
n Confidence Interval (CI):
¨ A measure of error.
¨ Where the Yi. are normally distributed:
Y.. ± t_{α/2, R−1} · S/√R
¨ We cannot know for certain how far Y.. is from θ, but the CI attempts to
bound that error.
¨ A CI level, such as 95%, tells us how much we can trust the interval to
actually bound the error between Y.. and θ.
¨ The more replications we make, the less error there is in Y..
(converging to 0 as R goes to infinity).
(A computational sketch of this interval follows this slide.)
11
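A minimal sketch of the t-based interval above, assuming SciPy is available for the t critical value; the function name is a hypothetical choice. As a usage check, the three across-replication averages from the earlier M/G/1 table (3.75, 15.56, 13.66) are plugged in.

import math
from statistics import mean, stdev
from scipy.stats import t

def replication_ci(rep_means, alpha=0.05):
    # 100(1 - alpha)% CI: Y.. +/- t_{alpha/2, R-1} * S / sqrt(R)
    R = len(rep_means)
    y_bar = mean(rep_means)        # Y.. , sample mean of the replication means
    s = stdev(rep_means)           # S, sample standard deviation (divisor R - 1)
    half = t.ppf(1 - alpha / 2, R - 1) * s / math.sqrt(R)
    return y_bar - half, y_bar + half

# The three replication averages from the earlier M/G/1 example
print(replication_ci([3.75, 15.56, 13.66]))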
Confidence-Interval Estimation
[Performance Measures]
n Confidence Interval (CI) at X% confidence:
¨ A random interval, constructed from the data, that X% of the time will
contain the true θ.
¨ Where the Yi. are normally distributed:
Y.. ± t_{α/2, R−1} · S/√R
¨ Bottom line: if the sample mean falls in the extreme (100 − X)% tails of its
distribution, then θ is not in the interval.
¨ The more replications we make, the less error there is in Y..
(converging to 0 as R goes to infinity).
12
C.I. with Specified Precision
[Terminating Simulations]
n The half-length H of a 100(1 − α)% confidence interval for a mean θ,
based on the t distribution, is given by:
H = t_{α/2, R−1} · S0/√R
where R is the number of replications and S0² is the sample variance.
n Suppose that an error criterion ε is specified: with probability 1 − α, a
sufficiently large sample size should satisfy:
P( |Y.. − θ| < ε ) ≥ 1 − α
13
C.I. with Specified Precision
[Terminating Simulations]
n This is a statement about the half-width of the C.I.:
H = t_{α/2, R−1} · S0/√R < ε
n Satisfied when
√R ≥ z_{α/2} · S0/ε, i.e., R ≥ (z_{α/2} · S0/ε)²,
because the N(0,1) critical value z_{α/2} ≤ t_{α/2, R−1}.
14
C.I. with Specified Precision
[Terminating Simulations]
n Assume that an initial sample of R0 independent replications has been
observed.
n Obtain an initial estimate S0² of the population variance σ².
n Then, choose the sample size R such that R ≥ R0:
¨ Since t_{α/2, R−1} ≥ z_{α/2}, an initial estimate of R is
R ≥ (z_{α/2} · S0 / ε)²
where z_{α/2} is the critical value of the standard normal distribution.
¨ R is the smallest integer satisfying R ≥ R0 and
R ≥ (t_{α/2, R−1} · S0 / ε)²
n Collect R - R0 additional observations.
n The 100(1 − α)% C.I. for θ:
Y.. ± t_{α/2, R−1} · S0/√R
15
C.I. with Specified Precision
[Terminating Simulations]
n Multi-server example: estimate a CPU’s utilization ρ over the first 2
hours of the workday.
¨ An initial sample of size R0 = 4 is taken, and the initial estimate of the
population variance is S0² = (0.072)² = 0.00518.
¨ The error criterion is ε = 0.04 and the confidence coefficient is
1 − α = 0.95; hence, the final sample size must be at least:
(z_{0.025} · S0 / ε)² = (1.96² × 0.00518) / 0.04² ≈ 12.44
¨ For the final sample size, evaluate (t_{0.025, R−1} · S0 / ε)² for candidate
values of R:
R                              13       14       15
t_{0.025, R−1}                 2.18     2.16     2.14
(t_{0.025, R−1} · S0 / ε)²     15.39    15.10    14.83
¨ R = 15 is the smallest integer satisfying the constraint (15 ≥ 14.83), so
R − R0 = 11 additional replications are needed.
¨ After obtaining the additional outputs, the half-width should be re-checked
to ensure the constraint is met, given the better estimate of the variance.
(A sketch of this sample-size procedure follows this slide.)
16
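A sketch of the sample-size procedure from the last two slides, assuming SciPy for the normal and t critical values; the function name is a hypothetical choice. With the example's inputs (R0 = 4, S0² = 0.00518, ε = 0.04, α = 0.05) it returns R = 15, in agreement with the slide.

import math
from scipy.stats import norm, t

def required_replications(S0_sq, eps, alpha=0.05, R0=4):
    # Smallest R >= R0 such that R >= (t_{alpha/2, R-1} * S0 / eps)^2
    S0 = math.sqrt(S0_sq)
    R = max(R0, math.ceil((norm.ppf(1 - alpha / 2) * S0 / eps) ** 2))  # z-based initial estimate
    while (t.ppf(1 - alpha / 2, R - 1) * S0 / eps) ** 2 > R:
        R += 1
    return R

R = required_replications(S0_sq=0.00518, eps=0.04, R0=4)
print(R, "replications in total;", R - 4, "beyond the initial sample")   # 15 and 11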
Output Analysis for Steady-State Simulation
n Consider a single run of a simulation model to estimate a steady-state
or long-run characteristic of the system.
¨ The single run produces observations Y1, Y2, ... (generally samples of
an autocorrelated time series).
¨ Performance measures:
θ = lim_{n→∞} (1/n) ∑_{i=1}^{n} Yi , for a discrete measure (with probability 1)
φ = lim_{TE→∞} (1/TE) ∫_0^{TE} Y(t) dt , for a continuous measure (with probability 1)
n Independent of the initial conditions.
17
Output Analysis for Steady-State Simulation
n The sample size is a design choice, with several
considerations in mind:
¨ Any bias in the point estimator that is due to artificial or arbitrary
initial conditions (bias can be severe if run length is too short).
¨ Desired precision of the point estimator.
¨ Budget constraints on computer resources.
n Notation: the estimation of θ from a discrete-time output
process.
¨ One replication (or run), the output data: Y1, Y2, Y3, …
¨ With several replications, the output data for replication r: Yr1, Yr2,
Yr3, …
18
Initialization Bias [Steady-State Simulations]
n Methods to reduce the point-estimator bias caused by using
artificial and unrealistic initial conditions:
¨ Intelligent initialization.
¨ Divide simulation into an initialization phase and data-collection
phase.
n Intelligent initialization
¨ Initialize the simulation in a state that is more representative of
long-run conditions.
¨ If the system exists, collect data on it and use these data to specify
more nearly typical initial conditions.
¨ If the system can be simplified enough to make it mathematically
solvable, e.g. queueing models, solve the simplified model to find
long-run expected or most likely conditions, use that to initialize the
simulation.
19
Initialization Bias [Steady-State Simulations]
n Divide each simulation into two phases:
¨ An initialization phase, from time 0 to time T0.
¨ A data-collection phase, from T0 to the stopping time T0+TE.
¨ The choice of T0 is important:
n After T0, system should be more nearly representative of steady-state
behavior.
¨ System has reached steady state: the probability distribution of the
system state is close to the steady-state probability distribution
(bias of response variable is negligible).
20
Initialization Bias [Steady-State Simulations]
n M/G/1 queueing example: A total of 10 independent
replications were made.
¨ Each replication begins in the empty and idle state.
¨ Simulation run length on each replication was T0+TE = 15,000
minutes.
¨ Response variable: queue length, LQ(t,r) (at time t of the rth
replication).
¨ Batching intervals of 1,000 minutes were used, yielding batch means Yrj.
n Ensemble averages:
¨ To identify the trend in the data due to initialization bias.
¨ The average of the jth batch mean across the R replications:
Y.j = (1/R) ∑_{r=1}^{R} Yrj
¨ The preferred method to determine the deletion point.
21
Initialization Bias [Steady-State Simulations]
n A plot of the ensemble averages, Y.j, versus 1000j, for j = 1, 2, …, 15:
¨ Illustrates the downward bias of the initial observations.
22
Initialization Bias [Steady-State Simulations]
n Cumulative average sample mean (after deleting d observations):
Y..(n, d) = [1/(n − d)] ∑_{j=d+1}^{n} Y.j
i.e., the cumulative sample mean of the ensemble averages up to a given
point in time.
¨ Not recommended for determining the initialization phase.
¨ It is apparent that downward bias is present and that this bias can be
reduced by deleting one or more observations. (A sketch computing the
ensemble and cumulative averages follows this slide.)
23
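A brief sketch (function names are hypothetical) of the two quantities defined on the preceding slides: the ensemble average Y.j of batch j across R replications, and the cumulative average Y..(n, d) after deleting the first d batches. The sample data reuse the batch means from the slide-7 table purely for illustration.

def ensemble_averages(Y):
    # Y[r][j] = jth batch mean of replication r; returns Y.j for each batch j
    R, n = len(Y), len(Y[0])
    return [sum(Y[r][j] for r in range(R)) / R for j in range(n)]

def cumulative_average(ensemble, d):
    # Y..(n, d): mean of the ensemble averages Y.j for j = d+1, ..., n
    kept = ensemble[d:]
    return sum(kept) / len(kept)

Y = [[3.61, 3.21, 2.18, 6.92, 2.82],      # replication 1 batch means (from slide 7)
     [2.91, 9.00, 16.15, 24.53, 25.19],   # replication 2
     [7.67, 19.53, 20.36, 8.11, 12.62]]   # replication 3
ens = ensemble_averages(Y)
print([round(v, 2) for v in ens])                 # Y.1, ..., Y.5
print(round(cumulative_average(ens, d=1), 2))     # Y..(5, 1): first batch deleted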
Initialization Bias [Steady-State Simulations]
n No widely accepted, objective and proven technique to
guide how much data to delete to reduce initialization bias
to a negligible level.
n Plots can, at times, be misleading but they are still
recommended.
¨ Ensemble averages reveal a smoother and more precise trend as
the # of replications, R, increases.
¨ Ensemble averages can be smoothed further by plotting a moving
average (a simple smoother is sketched after this slide).
¨ The cumulative average becomes less variable as more data are
averaged.
¨ The more correlation present, the longer it takes for Y.j to approach
steady state.
¨ Different performance measures could approach steady state at
different rates.
24
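A tiny sketch of the moving-average smoothing mentioned above; the window half-width w and the function name are hypothetical choices. Each point is replaced by the average of the points within w positions of it, with the window shortened at the ends.

def moving_average(x, w=1):
    # Smooth x by averaging over a window of up to 2*w + 1 neighboring points
    out = []
    for j in range(len(x)):
        lo, hi = max(0, j - w), min(len(x), j + w + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

# Applied to the ensemble averages computed in the previous sketch
print(moving_average([4.73, 10.58, 12.90, 13.19, 13.54], w=1))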
Batch Means for Interval Estimation
[Steady-State Simulations]
n Using a single, long replication:
¨ Problem: the data are dependent, so the usual variance estimator is biased.
¨ Solution: batch means.
n Batch means: divide the output data from 1 replication (after
appropriate deletion) into a few large batches and then treat the
means of these batches as if they were independent.
n For a continuous-time process, {Y(t), T0 ≤ t ≤ T0 + TE}:
¨ Form k batches of size m = TE/k, with batch means:
Ȳj = (1/m) ∫_{(j−1)m}^{jm} Y(t + T0) dt
n For a discrete-time process, {Yi, i = d+1, d+2, …, n}:
¨ Form k batches of size m = (n − d)/k, with batch means:
Ȳj = (1/m) ∑_{i=(j−1)m+1}^{jm} Y_{i+d}
25
Batch Means for Interval Estimation
[Steady-State Simulations]
Y1, …, Yd | Yd+1, …, Yd+m | Yd+m+1, …, Yd+2m | … | Yd+(k−1)m+1, …, Yd+km
  deleted        Ȳ1                Ȳ2                         Ȳk
n Starting either with continuous-time or discrete-time data, the variance
of the sample mean is estimated by:
S²/k = [1/(k(k − 1))] ∑_{j=1}^{k} (Ȳj − Ȳ)² = [∑_{j=1}^{k} Ȳj² − k·Ȳ²] / [k(k − 1)]
n If the batch size is sufficiently large, successive batch means
will be approximately independent, and the variance estimator
will be approximately unbiased.
n There is no widely accepted and relatively simple method for choosing
an acceptable batch size m (see the text for a suggested approach);
some simulation software does it automatically. (A sketch of the
batch-means interval estimate follows this slide.)
26
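A minimal sketch of the batch-means interval estimate for a discrete-time output series, assuming the deletion point d and the number of batches k have already been chosen and that SciPy is available for the t critical value; the function name and the synthetic data are hypothetical.

import math
import random
from scipy.stats import t

def batch_means_ci(y, d, k, alpha=0.05):
    # Delete y[0:d], split the rest into k batches of size m, and form
    # Y-bar +/- t_{alpha/2, k-1} * sqrt(S^2 / k)
    kept = y[d:]
    m = len(kept) // k                                    # batch size (remainder discarded)
    means = [sum(kept[j * m:(j + 1) * m]) / m for j in range(k)]
    y_bar = sum(means) / k
    s2_over_k = sum((b - y_bar) ** 2 for b in means) / (k * (k - 1))
    half = t.ppf(1 - alpha / 2, k - 1) * math.sqrt(s2_over_k)
    return y_bar - half, y_bar + half

rng = random.Random(0)
y = [rng.gauss(10, 2) for _ in range(10_000)]   # stand-in for one long simulation run
print(batch_means_ci(y, d=1_000, k=20))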
Summary
n Stochastic discrete-event simulation is a statistical experiment.
¨ Purpose of statistical experiment: obtain estimates of the performance
measures of the system.
¨ Purpose of statistical analysis: acquire some assurance that these
estimates are sufficiently precise.
n Distinguish: terminating simulations and steady-state simulations.
n Steady-state output data are more difficult to analyze
¨ Decisions: initial conditions and run length
¨ Possible solutions to bias: deletion of data and increasing run length
n The statistical precision of point estimators is estimated by the
standard error or a confidence interval.
n Method of independent replications was emphasized.
27