Risk Analysis using Simulation
Page 1 of 29
Risk Analysis using simulation
Introduction
Simulation is a process of understanding behavior of a system or
evaluating the various strategies for the operation of the system.
Simulation calculates multiple scenarios of a model by repeatedly
sampling values from the probability distributions for the uncertain
variables.
Simulation involves the generation of the artificial history of the system
and observation of the artificial history to draw inferences concerning the
operating characteristics of the real system that is presented.
Why Simulation?
When it is too costly to do physical studies on the system itself (e.g., trying
alternative layout of a factory, building new facility)
The corresponding analytic models are too complicated to study (e.g. a
queuing/transportation network)
When uncertainty, non-stationarity dominates the output results.
Steps in Simulation
Model development: System defination, objectives to study, decision
variables, output measures, input variables and parameters.
Data collection: Collect data from the real system, obtain probability
distributions of the input parameters by statistical analysis
Model translation: Translate the physical model into computer simulation
software.
Simulation runs & Output Data Analysis: Run the simulation upto
satisfactory level and use statistical analysis of the ouput data to estimate
the performance measures.
Risk Analysis using Simulation
Page 2 of 29
Model Verification and Validation:
z Verification: Ensuring that the model is free from logical errors. It
does what it is intended to do.
z Validation: Ensuring that the model is a valid representation of the
whole system. Model outputs are compared with the real system
outputs.
Optimization/Experimental Design: Analyze alternative strategies on the
validated simulation model.
Sensitivity analysis: Analuze the input data sensitivity in performance
measure of the system. Cheking model robustmess.
Tools for simulation
@Risk
Crystal Ball
MatLab
Minitab
Excel Spreadsheet
And so on, etc
Simulation using Excel Spreadsheet
Basic Excel Skills
To facilitate the understanding of simulation using Excel, it is necessary to be
familiar with the following features of Excel
Copying formula and cell references
Function: It is used in performing special calculations in the Spread sheet
cells. Some of the more common functions that are usually used in
simulation models are including:
9 MlN (range) - finds the smallest value in a range of cells.
9 MAX (range) - finds the largest value in a range of cells
9 SUM (range) - finds the sum of values in a range of cells
Risk Analysis using Simulation
Page 3 of 29
9 AVERAGE (range) - finds the average of the values in a range of
cells
9 STDEV(range) -finds the standard deviation for a sample in a
range of cells
9 AND (condition 1, condition 2, ) -a logical function that returns
TRUE if all conditions are true, and FALSE if not.
9 OR (condition1, conditiott2 . . .) - a logical function that returns
TRUE if any condition is true, and FALSE if not.
9 IF (condition, value if true, value if false) -a logical function that
returns one value if the condition is true and another if the condition
is false.
9 VLOOKIUP (value, table range column number) -looks up a
value in a table.
Charts and graphs
Excel has many other functions for statistical, financial and other
applications
Others useful features
Split Screen: Division of displayed spreadsheet screen
Paste Special: Controlling of direct pasting into the cells
Column and Row width: Customization of Column and Row size.
Displaying Formulas in Worksheets: Showing actual formula in
spreadsheet
Displaying Grid Lines and Row, and Column Headers for Printing
Filling a Range with a Series of Numbers.
Comment Boxes
Building Simulation Models in Excel
However a good design is essential to user understanding. Any good simulation
model design in any spreadsheet should be included with the following features.
A descriptive title
Risk Analysis using Simulation
Page 4 of 29
A separate input data section area
A separate working space
A separate output section that provides the model results.
Appropriate formatting such as in currency or comma formatting.
Complex calculations should be divided into several cells to minimize the
chances of error and enhance understanding.
Comments should be placed next to formula cells or in comment boxes for
explanation, if appropriate.
Example: A Simulation Model for Dave's Candies
Daves Candies is a small family owned business that offers gourmet chocolates
and ice cream fountain service. For special occasions such as Valentines day,
the store must place orders for special packaging several weeks in advance from
their supplier. One product, Valentines day chocolate massacre, is bought for
$7,50 a box and sells for $12.00. Any boxes that are not sold by February 14 are
discounted by 50% and can always be sold easily. Historically Daves candies
has sold between 40-90 boxes each year with no apparent trend. Daves
dilemma is deciding how many boxes to order for the Valentines day customers.
If the order quantity, Q is 70, what is the expected profit?
Formulation:
Selling price=$12
Cost = $7.50
Discount price=$6
| If *D<Q
Profit=selling price*D - cost*Q + discount price*(Q-D)..........[1]
| D>Q
Profit=selling price*Q-cost*Q................................................[2]
*D=Demand & Q=Quantity
Risk Analysis using Simulation
Page 5 of 29
Simulation model in Excel
Model description
The input data are given in Column A and Column B.
Simulation results are shown in Columns D through F.
IF function is used in Column F to calculate the profit based on order
demand and quantity relation using the equations [1] and [2].
Risk Analysis using Simulation
Page 6 of 29
Each row in the results table represents one trial of the simulation.
Demand in each row was generated from probability distribution.
Since selling boxes range 40-90 boxes/year, the values in Column E were
generated by rolling a die and entered into the worksheet to verify the
formulas for profit in column F.
Probability & Statistic in Simulation
Fundamentals
Probability - Probability is a measure of how likely a value or event is to
occur. It can be measured from simulation data as frequency by
calculating the number of occurrences of the value or event divided by the
total number of occurrences. This calculation returns a value between 0
and 1 which then can be converted to percentage by multiplying by 100.
Probabilistic Risk Assessment (PRA) - A risk assessment that yields a
probability distribution for risk, generally by assigning a probability
distribution to represent variability or uncertainty in one or more inputs to
the risk equation.
Probability Density Function (PDF) - A function representing the
probability distribution of a continuous random variable. The density at a
point refers to the probability that the variable will have a value in a narrow
range about that point.
Probability Distribution - A probability distribution is a set of probabilities
associated with all possible outcomes of an uncertain event.
Probability Mass Function (PMF) - A function representing the
probability distribution for a discrete random variable. The mass at a point
refers to the probability that the variable will have a value at that point.
Random Variable - A variable that may assume any value from a set of
values according to chance. Discrete random variables can assume only a
finite or countable infinite number of values (e.g., number of rainfall events
per year). A random value is continuous if its set of possible values is an
entire interval of numbers (e.g., quantity of rain in a year).
Risk Analysis using Simulation
Page 7 of 29
Central Tendency Exposure (CTE) - A risk descriptor representing the
average or typical individual in a population, usually considered to be the
mean or median of the distribution.
Confidence Interval - Confidence intervals refers an interval that provides
a range where the exact or true probability value of parameters may lie.
Confidence Limit - The upper or lower value of a confidence interval.
Cumulative Distribution Function (CDF) - Obtained by integrating the
PDF, gives the cumulative probability of occurrence for a random
independent variable. Each value c of the function is the probability that a
random observation will be less than or equal to c.
Frequency Distribution or Histogram - A graphic (plot) summarizing the
frequency of the values observed or measured from a population. It
conveys the range of values and the count (or proportion of the sample)
that was observed across that range.
Monte Carlo Analysis (MCA) or Monte Carlo Simulation - A technique
for characterizing the uncertainty and variability in risk estimates by
repeatedly sampling the probability distributions of the risk equation inputs
and using these inputs to calculate a range of risk values.
Point Estimate - In statistical theory, a quantity calculated from values in
a sample to estimate a fixed but unknown population parameter. Point
estimates typically represent a central tendency or upper bound estimate
of variability. [Source: U.S. EPA]
Types of Distribution:
Continuous distribution: A probability distribution is continuous when
any value between the minimum and maximum is possible (has finite
probability). For example, an uncertainty function describing the possible
annual rainfall in Ithaca, New York next year would be a continuous
function since any value between 0 and some upper limit is possible.
Risk Analysis using Simulation
Page 8 of 29
Discrete distribution: A discrete probability distribution has only a finite
number of possible values between the maximum and minimum. For
example, an uncertainty function describing the outcome of a coin toss is
discrete since only two values are possible: heads or tails.
Descriptive Statistics
Parameter - A value that characterizes the distribution of a random variable.
Parameters commonly characterize the location, scale, shape, or bounds of the
distribution. For example, a truncated normal probability distribution may be
defined by four parameters: arithmetic mean [location], standard deviation
[scale], and min and max [bounds]. It is important to distinguish between a
variable (e.g., ingestion rate) and a parameter (e.g., arithmetic mean ingestion
rate).
Measures of Central Tendency: Mean, Median, & Mode
Mean: The mean of a set of values is the sum of all the values in the set
divided by the total number of values in the set.
1
N
i =1
xi
Median: The median is the middle data point when the data is ordered.
The middle value (50th percentile) in the ordered sequence of measured
values. For highly skewed data sets the median can give a better
Risk Analysis using Simulation
Page 9 of 29
representation than the mean of the middle of the data distribution. The
median is not as comprehensive a measure of the data set as the mean.
Mode: The most likely value or mode is the value that occurs most often
in a set of values. In a histogram and a result distribution, it is the center
value in the class or bar with the highest probability.
Measures of Variation
Standard deviation: The standard deviation is a measure of how widely
dispersed the values are in a distribution. Equals the square root of the
variance.
N
(x
i =1
)2
N 1
Variance: The variance is a measure of how widely dispersed the values
are in a distribution, and thus is an indication of the "risk" of the
distribution. It is calculated as the average of the squared deviations about
the mean. The variance gives disproportionate weight to "outliers", values
that are far away from the mean. The variance is the square of the
standard deviation.
Range: The range is the absolute difference between the maximum and
minimum values in a set of values. The range is the simplest measure of
the dispersion or "risk" of a distribution
Kurtosis: Kurtosis is a measure of the shape of a distribution. Kurtosis
indicates how flat or peaked the distribution is. The higher the kurtosis
value, the more peaked the distribution. The coefficient of kurtosis is
computed as:
N
CK =
(x
i =1
)4
Risk Analysis using Simulation
Skewness:
Page 10 of 29
Skewness is a measure of the shape of a distribution.
Skewness indicates the degree of asymmetry in a distribution. Skewed
distributions have more values to one side of the peak or most likely value
one tail is much longer than the other. A skewness of 0 indicates a
symmetric distribution, while a negative skewness means the distribution
is skewed to the left. Positive skewness indicates a skew to the right. The
coefficient of skewness is computed as:
N
(x
CS =
i =1
)3
Coefficient of Variation: It is the ratio of standarad deviation and the
sample mean.
CV =
Correlation: It is measure the interdependences of two variables. The
sample correlation coefficient is computed as:
N
r=
(x
i =1
x )( y i y )
( N 1) S x S y
Random numbers & Probability distribution in Simulation
It is a numerical description of the outcome of some experiment. Example
of outcome of experiment could be profit values, failure rate or chance of
an event occurrence, etc.
It can generated from a probability distribution.
In simulation terminology, a random number is one that is uniformly
distributed in between 0 and 1.
Recall from statistic that uniform probability distribution characterizes a
random variable for which all outcomes between a minimum value a and a
maximum value b are equally likely.
Risk Analysis using Simulation
Page 11 of 29
Some Important Continuous Distribution
Exponential Distribution: The exponential distribution is characterized
by the single parameter (.) The exponential random variable T (t > 0) has
a probability density function
f(t) = e-t
for t > 0
Where,
= mean number of occurrences per unit time
t= number of time units until the next occurrence
The cumulative distribution function is
t
F (t ) = e x dx = e x = 1 e t for t > 0.
0
Mean: =
1
and
1
Variance: 2 =
Lognormal Distribution: Distribution parameter (, )
Risk Analysis using Simulation
Page 12 of 29
Risk Analysis using Simulation
Normal Distribution: Distribution parameter (, )
Page 13 of 29
Risk Analysis using Simulation
Page 14 of 29
Triangular Distribution : Distribution Parameter (min, m.likely, max)
Risk Analysis using Simulation
Page 15 of 29
Risk Analysis using Simulation
Uniform Distribution : Distribution parameter (min, max)
Page 16 of 29
Risk Analysis using Simulation
Page 17 of 29
Risk Analysis using simulation
Some Important Discrete Distribution
Binomial Distribution: Distribution parameter (n, p)
Page 18 of 29
Risk Analysis using simulation
Page 19 of 29
Risk Analysis using simulation
Poisson distribution: Distribution parameter ()
Page 20 of 29
Risk Analysis using simulation
Page 21 of 29
Generating random variables in Excel
Two properties of probability distribution are useful to generate random numbers:
1. the probability of any outcome is always between 0 and 1 and
2. the sum of the probabilities of all outcomes adds to 1
Random numbers from discrete distribution
Divide the range from 0 to 1 into intervals that correspond to the probabilities of
the discrete outcomes.
Use VLOOKUP function (look up value, table array, column index number) or
Random number generation function in excel.
Risk Analysis using simulation
Example : Random number generation for Dave's Candies
Page 22 of 29
Risk Analysis using simulation
Page 23 of 29
Random numbers from continuous distribution
Inverse transform method is used in case if there is no direct function to
generate random numbers.
Inverse transform method uses the following steps:
1. generate
random
number
from
the
Uniform
distribution:
u=Uniform(0,1),
2. Calculate inverse cumulative distribution function (ICDF). In Excel, the
function name to calculate the ICDF for Normal distribution NORMINV
and for LogNormal distribution is Lognormal (LOGINV). RAND function
in excel can also be used to generate random numbers from the Uniform
distribution, and apply the built-in functions to calculate the ICDF.
Example:
For example, the following formula will return the inverse CDF of the Normal
distribution with mean=1 and standard_deviation=2 evaluated at p=0.2:
=NORMINV(0.2; 1; 2)
Replacing 0.2 with RAND will yield the Normal random number generation formula:
=NORMINV(RAND(); 1; 2)
Similarly for LogNormal distribution the function is
LOGINV (probability, mean, Standard_dev)
For triangular distribution, It may need to use If function to generate random numbers
in Excel. The structure of the IF function is:
=IF(expression, what is returned if true, what is returned if false)
If function first determines which side of the distribution corresponds to the random
number and then evaluates the appropriate formula.
=IF (RAND () < (Mode Min)/(Max Min),Left formula, Right formula)
Let assume,
In a triangular distribution, if Min is a, Mode is b and Max. is c, then
Left Formula = a + RAND () * (b a ) * (c a )
Right Formula = c (1 RAND()) * (c b) * (c a )
a xb
bxc
Risk Analysis using simulation
Page 24 of 29
Monte-Carlo Simulation
Figure 1: Monte Carlo analysis to a model.
Monte-Carlo simulation uses the following steps for assessing parameters uncertainty:
i.
Select a distribution to describe possible values of a parameter.
ii.
Generate data from this distribution.
iii.
Use the generated data as probable values of the parameters in the model to
produce output.
Monte-Carlo simulation on Excel
Monte-Carlo simulation on excel spreadsheet follows the following steps::
1. Develop the spreadsheet model including a separate input and output
region.
2. Generate random numbers from the assigned probability distribution and
use the random data into the appropriate formula in the simulation model.
3. Repeat step 2 until to obtain the satisfactory output to create a distribution of
results.
Risk Analysis using simulation
Page 25 of 29
4. Make a summary for the descriptive statistics and collect output data in a
frequency distribution or histogram for analysis.
Example: Monte-Carlo simulation on excel for Dave's Candies problem
Output
Example: Methyl-mercury has been inadvertently released to a nearby lake. Use
Monte Carlo simulation. With 90% (subjective) confidence, what is the risk to the
maximally exposed individual? The following table shows the (subjective) probability
distributions for this problem.
Risk Analysis using simulation
Page 26 of 29
Solution:
Step 1: Selecting distribution for model parameters
9 Table 1 is used as input for a Monte Carlo simulation
Table 1: Input data for MCS
Input Region
Parameter
Distribution
Min.
Max.
Mean/Mode
STD
2.06E-01
4.22E-02
Fish concentration(CF),
mg/kg
Normal
Intake of Fish (IF),kg/d
Uniform
2.00E-02
1.30E-01
6.50E-02
Methyl mercury RfD
(RfDMM), mg/kg-d
Triangle
1.50E-04
3.00E-03
3.00E-04
Body mass (BM), kg
Triangle
4.50E+01
1.20E+02
7.00E+01
Step 2: Generate data from this distribution.
9 Table 3 shows the generated dated for this example:
Table 3: Random number generation
Sample
Number
CF
IF
RfD
BM
1
2
3
4
5
6
7
8
9
10
11
12
96
97
98
99
100
0.209
0.268
0.203
0.199
0.140
0.230
0.191
0.182
0.197
0.220
0.218
0.198
0.265
0.213
0.233
0.265
0.161
0.032
0.047
0.052
0.026
0.123
0.083
0.025
0.102
0.095
0.041
0.099
0.024
0.098
0.109
0.113
0.070
0.066
5.39E-04
9.61E-04
2.35E-03
2.51E-04
3.71E-04
6.53E-04
2.77E-03
1.00E-03
1.36E-03
6.73E-04
7.61E-04
1.57E-03
2.90E-04
4.57E-04
1.55E-03
4.98E-04
6.63E-04
59.203
66.066
109.907
95.667
94.443
60.822
78.394
80.310
57.432
84.090
59.938
88.117
90.694
84.283
64.637
93.816
87.898
Risk Analysis using simulation
Page 27 of 29
Step 3: Use the generated data as probable values in the model to calculate HQ.
9 Used model to calculate the Hazard Index (HI):
HI =
CF I F
BM RfD
Table 3: MCS to calculate HI using 100 iterations
Sample
Number
CF
IF
RfD
BM
HI
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-
0.209
0.268
0.203
0.199
0.140
0.230
0.191
0.182
0.197
0.220
0.218
0.198
0.142
0.268
0.217
-
0.032
0.047
0.052
0.026
0.123
0.083
0.025
0.102
0.095
0.041
0.099
0.024
0.037
0.099
0.070
-
5.39E-04
9.61E-04
2.35E-03
2.51E-04
3.71E-04
6.53E-04
2.77E-03
1.00E-03
1.36E-03
6.73E-04
7.61E-04
1.57E-03
2.05E-03
1.83E-03
2.17E-03
-
59.203
66.066
109.907
95.667
94.443
60.822
78.394
80.310
57.432
84.090
59.938
88.117
64.939
87.285
76.220
-
2.12E-01
1.97E-01
4.07E-02
2.13E-01
4.92E-01
94
95
96
97
98
99
100
0.216
0.223
0.265
0.213
0.233
0.265
0.161
0.119
0.078
0.098
0.109
0.113
0.070
0.066
2.01E-03
2.68E-03
2.90E-04
4.57E-04
1.55E-03
4.98E-04
6.63E-04
69.327
74.652
90.694
84.283
64.637
93.816
87.898
4.79E-01
2.16E-02
2.32E-01
2.40E-01
1.59E-01
4.71E-01
3.48E-02
3.92E-02
1.66E-01
9.23E-02
1.85E-01
8.68E-02
9.83E-01
6.04E-01
2.63E-01
3.95E-01
1.83E-01
Risk Analysis using simulation
Page 28 of 29
Output Summary:
The output from MCS is shown in Fig 1.
The frequency distribution for the output from MCS (Monte Carlo simulation) is
shown in Fig 2.
From Monte Carlo simulation, the confidence interval for 90% & 95% are
obtained [0.193, 0.251] & [0.188, 0.256] respectively.
This implies that after taking into account the uncertainties on the parameters,
one is highly confident (at a subjective level of 95%) that the true HI should lie
between 0.188 and 0.256.
Since the 95% upper confidence limit of HI is still below 1, there is high
confidence that the maximally exposed individual for this scenario is not
exposed to an unacceptable level of risk, and remediation should not be
warranted.
Figure 1: MCS output
Risk Analysis using simulation
Page 29 of 29
Forcast: Hazard Index (HI)
25
Frequenc
20
15
10
5
1.
01
0.
91
0.
81
0.
71
0.
61
0.
51
0.
41
0.
31
0.
21
0.
11
0.
01
Hazard Index
Figure 2: Frequency distribution of HI
References:
Hammonds, J. S. , Hoffman, F. O., and Bartell, S .M., An Introductory Guide to
Uncertainty Analysis in Environmental and Health Risk Assessment managed by
MARTIN MARIETTA ENERGY SYSTEMS, INC. for the U.S. DEPARTMENT OF
ENERGY
James R. Evans, David Louis Olson, Introduction to Simulation and Risk Analysis,
Prentice Hall PTR, Upper Saddle River, NJ, 2001
http://www.cas.lancs.ac.uk/glossary_v1.1/prob.html#pdf
http://www.ltcconline.net/greenl/courses/201/probdist/zScore.htm