FTA Fault Tree Analysis
EUROCONTROL
SAF.ET1.ST03.1000-FTA-01-00
DOCUMENT CHARACTERISTICS
TITLE
Keywords
SAM SSA
Safety Assessment Fault Tree Analysis
ATM Procedure Fault Tree
PSSA
Contact Person(s) Tel Unit
Patrick MANA 93295 DAP/SAF
EATMP Infocentre
EUROCONTROL Headquarters
96 Rue de la Fusée
B-1130 BRUSSELS
CONTENTS
1 - INTRODUCTION
1.1. Relation to SAM
1.2. Purpose
1.3. Structure of this document
1.4. Target audience
1.5. Readership
1.6. When is the Fault Tree Analysis used?
1.7. What is the Fault Tree Analysis (FTA)?
1.8. What about software tools?
1.9. Safety / Dependability
Acronyms
N None
FHA Functional Hazard Assessment
FTA Fault Tree Analysis
PSSA Preliminary System Safety Assessment
SAM Safety Assessment Methodology
SO Safety Objective
SR Safety Requirement
SRS Safety Requirement Specification
SSA System Safety Assessment
Documents
Ref. 3 Safety and Performance Requirements Standard For Initial Air Traffic Data Link
Services In Continental Airspace (ED120)
RTCA SC-189 / EUROCAE WG-53
Executive summary
This document provides guidance to apply Fault Tree Analysis in the framework of the Safety
Assessment Methodology (SAM). Therefore, this document constitutes a SAM Level 2
Guidance.
Specifically, it introduces the basic principles underlying this technique, which is
widely used in the safety assessment of many domains (not only aviation). It details the
advantages and limitations of the technique when performing the Safety Assessment of an
Air Navigation System.
It explains how to specifically use Fault Tree Analysis for both PSSA (Preliminary System
Safety Assessment) and SSA (System Safety Assessment):
1 - INTRODUCTION
[Figure: SAM documentation levels (Level 1: Methodology; Level 3: Examples)]
1.2. Purpose
This document constitutes a Level 2 document of the SAM (Part IV Annex K).
This document does not prescribe the only way of performing Fault Tree
Analysis. It rather provides one possible way to achieve such a task.
The purpose of these guidelines is to provide more insight into Fault Tree
Analysis (FTA) by:
Providing the link with the SAM PSSA and SSA process,
Main body (Chapters), which describe how and where to use FTA as
part of a safety assessment;
They are responsible for the verification and review of the safety
documentation.
1.5. Readership
The following table suggests the minimum level of attention each reader should give to this document.
Chapter 1 – Introduction
Chapter 2 – Basic Principles of FTA
Chapter 3 – Fault Tree Description
Chapter 4 – FTA in support of the PSSA
Chapter 5 – FTA in support of the SSA
Appendix A – Standardized basic symbols: N/A, N/A
Appendix B – Fault Tree Construction Fundamentals: N/A, N/A
Appendix C – Boolean algebra / Minimal Cut Set / Qualitative Approach: N/A, N/A
Appendix D – Quantitative Approach: N/A, N/A
Appendix E – Quantitative allocation through the fault tree: N/A, N/A
: Detailed knowledge;
: Aware;
N/A: Not Applicable.
FTA History
Historically, FTA appeared in the early 1960s to assess and improve the
reliability of a missile launch system. Since the mid-1960s it has been used
frequently in many industrial domains, such as aeronautics, the chemical industry,
the nuclear industry and ground transportation (railway, automotive).
A deductive method
There are several reasons for performing a fault tree analysis independently of, or in
conjunction with, other safety analyses. These include:
The fault tree is particularly suited to the analysis of complex systems comprising
several functionally related or dependent subsystems with different performance
objectives. This is especially true whenever the system design requires the
collaboration of many specialized technical design groups. Examples of systems to
which fault tree analysis is commonly applied include nuclear power generating
stations, aeroplanes, communication systems, chemical and other industrial
processes, etc.
The development of the fault tree should start early in the system design stage. The
growth of the fault tree should be such that it reflects the progress of the design.
Thus an increased understanding of the failure modes will be obtained as the design
proceeds. The "analysis concurrent with design" allows for early systems design
change as significant failure modes and their contribution to the undesired events
are identified. The final fault trees may be large, in which case a specialized software
tool will be needed to handle them.
It is important to note that fault tree events are not confined solely to equipment
(hardware or software) failures, but include all conditions or other factors which are
relevant to the top event for the system concerned, like human errors, inappropriate
procedures and adverse environment conditions.
Steps to be done
In order to use the fault tree technique effectively as a method for system analysis,
the procedure shall consist of at least the following steps:
Boolean reduction is applicable to fault trees, provided that the occurrence of the top
event does not depend on the timing or sequencing of events.
Each event in the fault tree shall be uniquely identified. Events should be
labelled so that cross reference from the fault tree to the corresponding design
documentation can easily be made.
The top event of the fault tree is the undesirable event which is the primary
reason for undertaking the fault tree analysis. It should be noted that only a
single top event may be associated with a given fault tree (note, however,
that particular sub-trees of a given tree might be used to analyse specific
feared events representing causes of the main top event).
If several events in a fault tree all refer to different failure modes of the same
item, then such events shall be labelled so as to enable them to be
distinguished. At the same time, it should be clear that they are a group of
events related to the same item.
The fault tree is in effect a diagram in which the events are linked by logic
gates. Each gate has one output event but one or more input events. The
gates show the relationship of events (causes) explaining the occurrence of a
"higher" event.
The input events identify possible causes and conditions for the occurrence of
the output events. However, such linking does not necessarily define the
sequential (time) relationship between the events.
The basic fault tree uses AND and OR gates. However, for complex systems
analysis, additional gate symbols may be required to assure that the fault
trees are readable and as simple as possible.
It is important for the analyst to define and report the symbols being used
and to ensure uniform and consistent use throughout a given fault tree analysis.
This is particularly true if computer-aided techniques are to be used.
The most commonly used symbols, namely the two basic types of fault tree gates
(OR gate, AND gate), the elementary event symbol and the forward events, are
drawn in the table below together with their meaning. A complete presentation of
the standardized basic fault tree symbols is provided in APPENDIX A.
NOTE: These symbols may be slightly different in other Fault Tree Analysis
technique reference documents.
Once the fault tree has been drawn, it can be evaluated to yield its qualitative
and quantitative characteristics. This is obtained from the equivalent Boolean
equations.
The OR-gate represents the logical "union" of the input events. The OR gate
is equivalent to the Boolean symbol " + ".
For example, the OR gate with two input events, as shown in Figure 3.1, is
equivalent to the following Boolean expression:
FE = A + B
[Figure 3.1: OR gate G7 with input events A and B]
In terms of probability:
NOTE: For n input events attached to the OR-gate, the equivalent Boolean
expression is FE = A1 + A2 + A3 + … + An.
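As a reminder from elementary probability theory, the probability of the output of a two-input OR gate can be written as shown below. The notation P(·) is introduced here for illustration and is not taken from the original figure; the second line additionally assumes that A and B are independent.

```latex
\begin{align}
P(FE) &= P(A) + P(B) - P(A \cdot B) \\
      &= P(A) + P(B) - P(A)\,P(B) \qquad \text{(A and B independent)}
\end{align}
```

When P(A) and P(B) are both small, the product term is often neglected, which gives the conservative approximation P(FE) ≈ P(A) + P(B).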
The AND gate represents the logical "intersection" of the input events. The
AND gate is equivalent to the Boolean symbol "∙".
For example, the AND gate with two input events, as shown in Figure 3.2, is
equivalent to the following Boolean expression:
FE = A ∙ B
[Figure 3.2: AND gate G6 with input events A and B]
In terms of probability:
NOTE: For n input events attached to the AND-gate, the equivalent Boolean
expression is FE = A1 ∙ A2 ∙ A3 ∙ … ∙ An.
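The Boolean expressions for n-input gates translate directly into probabilities when the input events are independent. The following is a minimal sketch under that assumption; the function names and the numerical values are hypothetical and serve only as illustration.

```python
from math import prod


def and_gate_probability(input_probabilities):
    """AND gate: product of the input probabilities (independent inputs assumed)."""
    return prod(input_probabilities)


def or_gate_probability(input_probabilities):
    """OR gate: 1 minus the probability that none of the inputs occurs
    (independent inputs assumed)."""
    return 1.0 - prod(1.0 - p for p in input_probabilities)


if __name__ == "__main__":
    p_a, p_b = 1e-3, 2e-3  # hypothetical probabilities of events A and B
    print("AND:", and_gate_probability([p_a, p_b]))
    print("OR: ", or_gate_probability([p_a, p_b]))
```

For two inputs the OR result equals P(A) + P(B) - P(A)P(B), which is the usual two-event formula written in product form.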
IMPORTANT
Note that a necessary condition for the Fault Trees to be correctly evaluated
qualitatively and quantitatively (based on the Boolean reduction) is the
independency of the basic events.
Before proceeding to the qualitative and quantitative evaluation, the Fault Tree
should be developed to a low enough level to ensure that this condition is met.
Moreover, be aware that Fault Trees cannot account for stochastic
dependency. A classical example of stochastic dependency is that of two
pumps operating in active redundancy, each one loaded at 50% and
displaying a failure rate λ; if one pump fails, the remaining one
increases its load to 100%, resulting in an increase in its failure rate (λ' > λ).
Stochastic dependency can be modelled and evaluated by using state-transition
diagrams (e.g. Markov chains).
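The two-pump example can be made concrete with a small state-transition model. The sketch below is an illustration only: it assumes constant failure rates, no repair, and three states (two pumps running, one pump running, none running); the function names and the closed-form solution of the three-state chain are not taken from this document.

```python
import math


def both_failed_independent(lam, t):
    """Fault-tree style result: two independent pumps, each with constant rate lam."""
    return (1.0 - math.exp(-lam * t)) ** 2


def both_failed_load_sharing(lam, lam_prime, t):
    """Three-state Markov chain without repair.

    State 'two running' is left at rate 2*lam; state 'one running' is left
    at rate lam_prime (the increased rate of the surviving pump).
    Returns the probability of being in state 'none running' at time t.
    """
    p_two_running = math.exp(-2.0 * lam * t)
    if math.isclose(lam_prime, 2.0 * lam):
        # Degenerate case where both exit rates are equal.
        p_one_running = 2.0 * lam * t * math.exp(-2.0 * lam * t)
    else:
        p_one_running = (2.0 * lam / (lam_prime - 2.0 * lam)) * (
            math.exp(-2.0 * lam * t) - math.exp(-lam_prime * t)
        )
    return 1.0 - p_two_running - p_one_running


if __name__ == "__main__":
    lam, t = 1e-3, 1000.0  # hypothetical failure rate (per hour) and mission time (hours)
    print(both_failed_independent(lam, t))
    print(both_failed_load_sharing(lam, 3e-3, t))  # hypothetical lam' = 3 * lam
```

With lam_prime equal to lam the Markov result coincides with the independent calculation; with lam_prime greater than lam it is larger, which is the effect a fault tree built on independent basic events would miss.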
The construction of the fault tree should be preceded by the following steps:
Scope of analysis,
System familiarization,
etc.
The top event is the focus of the entire analysis. Such an event may be the
onset or existence of a dangerous condition, or the inability of the system to
provide a desired performance.
In a global safety assessment process, the top event is usually output from a
previous analysis (e.g. a hazard identified by a Functional Hazard
Assessment, a sub-system failure mode resulting from a system-level
Failure Modes and Effects Analysis, an event associated with a Safety Objective
or Requirement that has to be met by the system, etc.). The top event is
defined with respect to the mission(s) that the analyzed system must fulfil.
Generally, one or several measures are defined for a top event. The top event
measure is necessary for the Fault Tree quantitative evaluation.
The "immediate cause" concept requires the analyst to determine the
immediate necessary and sufficient causes for the occurrence of the analyzed
event (which could be the top event or an intermediary event in the tree). This
concept is explained in APPENDIX B.
The immediate, necessary and sufficient causes of the top event are
addressed as sub-top events and the analyst proceeds to determine their
immediate, necessary and sufficient causes.
In this way, the analyst proceeds down the tree continually approaching a finer
resolution.
The concept of "basic units" can be used to save the analyst the effort of
developing fault tree diagrams which do not yield new or useful information.
This concept is explained in APPENDIX B.
In the two following figures (Figure 3.3 and the figure after it), two
examples are presented to show the development and
representation of a fault tree. The symbols used in these examples are the ones
previously described (AND gate, OR gate, elementary cause symbol).
In Figure 3.3, event A will occur only if both events B and C occur. Event
C is present if either event D or E occurs.
[Figure 3.3: Event A at the top, with input events B and C; Event C with input events D and E]
NOTE: In this example, for each event, A, B, etc., information included in the
event description box is:
event code;
In a fault tree, common events and common cause events could appear.
Common cause events are elementary (or basic) events that involve a
common cause of failure not made evident at the current level of detail of the
analysis (e.g. several identical HW components exposed to the same
temperature or humidity conditions, two replicas of the same SW application,
etc.).
NOTE: A sub fault tree which appears in different branches of a fault tree
could be considered as a common intermediate event.
The impact of common events is illustrated by the two following fault tree
structures (see event A in Figure 3.4 and Figure 3.5) which may appear to be
different; however, according to Boolean logic defined above, they are
equivalent.
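[Figure 3.4: Feared event FE with the two intermediate events "A AND B" (inputs A, B) and "A AND C" (inputs A, C)]
[Figure 3.5: Feared event FE with Event A and the intermediate event "B OR C" (inputs B, C)]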
NOTE: There is not one "correct" fault tree for a problem but many correct
forms which are equivalent to one another. The rules of Boolean
algebra can thus be applied to restructure the tree to a simpler,
equivalent form for ease of understanding or for simplifying the
evaluation of the tree. Boolean algebra rules shall be applied to
obtain the reduced form of the fault tree, called the minimal cut set
form, which allows quantitative and qualitative evaluations to be
performed in a straightforward manner.
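The equivalence of the two structures, FE = (A · B) + (A · C) and FE = A · (B + C), can be checked exhaustively over the truth table. The sketch below is illustrative; the function names are hypothetical.

```python
from itertools import product


def structure_with_repeated_a(a, b, c):
    """FE = (A AND B) OR (A AND C): event A appears in two branches."""
    return (a and b) or (a and c)


def structure_factored(a, b, c):
    """FE = A AND (B OR C): event A appears once."""
    return a and (b or c)


def are_equivalent(f, g, n_inputs=3):
    """Compare two Boolean structure functions on every input combination."""
    return all(
        f(*values) == g(*values)
        for values in product([False, True], repeat=n_inputs)
    )


if __name__ == "__main__":
    print(are_equivalent(structure_with_repeated_a, structure_factored))
```

An exhaustive check of this kind is only practical for a handful of events, since the truth table doubles with every additional input; for larger trees the Boolean reduction rules are applied instead.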
Investigation;
Investigation
In all other cases, investigation is necessary for a correct decision on the type
and extent of further analyses. Direct visual investigation is possible only for
small trees of limited complexity. Investigation of larger or more complex
trees, such as those arising from the analysis of actual systems, requires a suitable
software tool, but the overall approach remains the same.
Investigation deals with the review of the fault tree structure. All events which
are linked to the top event through a continuous chain of OR gates are single
causes for the top event to occur. Therefore, if a fault tree consists only of OR
gates, no further analysis is required. If the fault tree includes other gate types,
the analyzed system incorporates some sort of redundancy or other fault
tolerance features which could be invalidated by common or common cause
events. Investigation might identify those events, but that process becomes
error prone as the size or complexity of the tree increases.
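The rule that every event linked to the top event through a continuous chain of OR gates is a single cause can be sketched as a simple traversal. The tree representation below (a dictionary mapping each gate name to its type and inputs, with any name absent from the dictionary treated as a basic event) is a hypothetical convention chosen for this illustration.

```python
def single_causes(tree, top):
    """Return the basic events reachable from `top` through OR gates only."""
    found = set()
    stack = [top]
    while stack:
        node = stack.pop()
        if node not in tree:
            # Not a gate: a basic event reached through OR gates only.
            found.add(node)
            continue
        gate_type, inputs = tree[node]
        if gate_type == "OR":
            # Only OR gates preserve the "single cause" property.
            stack.extend(inputs)
    return found


if __name__ == "__main__":
    example_tree = {
        "TOP": ("OR", ["E1", "G1"]),
        "G1": ("OR", ["E2", "G2"]),
        "G2": ("AND", ["E3", "E4"]),
    }
    print(sorted(single_causes(example_tree, "TOP")))
```

In this hypothetical tree, E1 and E2 are single causes, while E3 and E4 sit below an AND gate and are not reported.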
Boolean reduction can be carried out by solving Boolean equations for the
fault tree. The result is the set of minimal cut sets.
A cut set is a group of events which, when occurring together, cause the top
event to happen. A minimal cut set is the smallest such group in which all
events must occur for the top event to occur. If any of the events in a minimal
cut set does not occur, it prevents the top event from occurring (by this
combination).
Minimal Cut Sets are the main instrument for performing the qualitative
analysis of the Fault Tree, mainly the qualitative importance analysis and the
common cause analysis.
Moreover, minimal cut sets are the necessary basis for the quantitative
evaluation of the Fault Tree.
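As an illustration of Boolean reduction, the sketch below derives minimal cut sets for a tree made of AND and OR gates by top-down expansion followed by absorption. It is one possible approach, not the procedure prescribed by this document; the tree representation (gate name mapped to gate type and inputs, any other name being a basic event) and the function names are hypothetical.

```python
from itertools import product


def cut_sets(tree, node):
    """Return the cut sets of `node` as a set of frozensets of basic events."""
    if node not in tree:
        return {frozenset([node])}  # a basic event is its own cut set
    gate_type, inputs = tree[node]
    child_sets = [cut_sets(tree, child) for child in inputs]
    if gate_type == "OR":
        # Any cut set of any input is a cut set of the output.
        return set().union(*child_sets)
    if gate_type == "AND":
        # One cut set must be taken from every input and merged.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unsupported gate type: {gate_type}")


def minimal_cut_sets(tree, top):
    """Remove every cut set that strictly contains another one (absorption)."""
    all_sets = cut_sets(tree, top)
    return {s for s in all_sets if not any(other < s for other in all_sets)}


if __name__ == "__main__":
    # Structure of Figure 3.3: A = B AND C, C = D OR E.
    figure_3_3 = {"A": ("AND", ["B", "C"]), "C": ("OR", ["D", "E"])}
    for cut_set in sorted(minimal_cut_sets(figure_3_3, "A"), key=sorted):
        print(sorted(cut_set))
```

For the structure of Figure 3.3 this yields the two minimal cut sets {B, D} and {B, E}. The full expansion grows quickly with tree size, which is why dedicated tools are used for real systems.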
After obtaining the minimal cut sets, some idea of basic event importance (in
terms of contribution to the top event) can be obtained by ordering the minimal
cut sets according to their size. The single-event minimal cut sets (if any) are
listed first, then the double-event minimal cut sets, then the triple, etc.
Because the failure probabilities associated with the minimal cut sets often
decrease by orders of magnitude as the size of the cut sets increases, the
ranking according to size gives a gross indication of the importance of the
minimal cut set.
The single-event minimal cut sets (events which by occurring alone lead to the
undesired event) provide essential information on the critical failures in the
system. In a safety allocation process during the system design, these failures
are likely to generate stringent safety requirements, in terms of limiting
their frequency of occurrence or of limiting the severity of their effects (e.g. by
providing detection and recovery means that allow those effects to be mitigated).
In some cases, the double-event minimal cut sets can provide some useful
information as well. If the same basic event occurs in a large proportion of these
cut sets, it might reveal a critical point in the system. Nevertheless, the
qualitative analysis should be completed by a quantitative evaluation (if
probabilistic data are available) before drawing final conclusions.
in a minimal cut set. A cause which does not trigger all the basic events in a
minimal cut set will not by itself cause system failure.
The common cause analysis requires the identification of minimal cut sets
which are susceptible to common cause failures. To do so, common
cause categories, which are general areas that cause component
dependence, could be defined first. Examples of common cause categories
include manufacturer, location, seismic susceptibility, flood susceptibility,
wear-out susceptibility, other environment factors (temperature, humidity,
radiation), operator interactions, test degradation or maintenance degradation.
For each common cause category, several elements are defined (e.g. for the
"location" category, the site might be divided into a number of physical
locations). Then, when basic events are labelled, a part of the label could denote
the category and element for that event. The aim is to be able to easily find the
minimal cut sets whose basic events all have the same element of a given
category. Finally, the most difficult task is to screen these cut sets to
determine those which may require further action.
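The screening step described above can be sketched as a simple filter over labelled basic events. Everything in the sketch is hypothetical: the label structure (one dictionary of category to element per basic event), the event names and the locations. Single-event cut sets are skipped here by choice, since they involve no dependence between components.

```python
def common_cause_candidates(minimal_cut_sets, labels, category):
    """Return the minimal cut sets whose basic events all share the same
    element of the given common cause category."""
    candidates = []
    for cut_set in minimal_cut_sets:
        if len(cut_set) < 2:
            continue  # a single event cannot share a cause with another event
        elements = {labels[event].get(category) for event in cut_set}
        if len(elements) == 1 and None not in elements:
            candidates.append(cut_set)
    return candidates


if __name__ == "__main__":
    labels = {
        "PUMP_1": {"location": "room_1", "manufacturer": "M1"},
        "PUMP_2": {"location": "room_1", "manufacturer": "M2"},
        "VALVE_1": {"location": "room_2", "manufacturer": "M1"},
    }
    cut_sets = [frozenset({"PUMP_1", "PUMP_2"}), frozenset({"PUMP_1", "VALVE_1"})]
    print(common_cause_candidates(cut_sets, labels, "location"))
    print(common_cause_candidates(cut_sets, labels, "manufacturer"))
```

The filter only flags candidates; deciding which of them require further action remains an engineering judgement.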
in order to identify the most efficient strategy of improving the system safety by
improving the probabilistic characteristics of its elements.
- Unavailability Q(t)
    o For minimal cut set: the probability that all the components (i.e.
      basic events) in the minimal cut set are down at time t and
      unable to operate,
    o For top event: the probability that the system is in the degraded
      state involved by the top event at time t and unable to operate if
      called on;
- Occurrence rate W(t)
    o For minimal cut set: the probability per unit time of the minimal
      cut set occurring,
    o For top event: the probability per unit time of top event occurrence
      at time t;
- Unreliability F(t)
    o For minimal cut set: the probability of the minimal cut set
      occurring on (0, t]
The computation of the minimal cut set safety related characteristics is
based on the basic event probabilistic characteristics, which in turn are
computed from the parameters associated with each basic event. There are
several types of basic events, each of them being characterized by a specific
set of probabilistic parameters. A non-exhaustive list of these types, together
with the relevant parameters and an approximation for calculating their
probabilistic characteristics (supposing that the occurrence rate is constant in
time), is provided in Table 3-2.
The same table provides approximate formulas for the quantification of safety
related measures for a minimal cut set and for the top event.
Where
λ = component failure rate per hour (operating or standby, as applicable)
TD = average downtime per failure in hours
T = test interval in hours
TR = average repair time per failure in hours
p = probability of cyclic component failure per demand
n(t) = expected number of demands in time t (note that for one demand n=1 and q(t)=p)
k(t) = cyclic component demand rate per hour at time t
Table 3-2: Summary of equations for approximate quantification of safety related characteristics
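To show how the parameters listed above are typically combined, the sketch below implements a few commonly used textbook approximations for basic event unavailability q and occurrence rate w. They are assumptions made for illustration, valid only when the products involved are much smaller than 1, and they may differ in detail from the entries of Table 3-2; the function names are hypothetical.

```python
def q_repairable(lam, t_d):
    """Repairable component: q is approximately lam * TD (valid when lam * TD << 1)."""
    return lam * t_d


def q_periodically_tested(lam, t_test, t_r=0.0):
    """Periodically tested component: average q is approximately lam * (T / 2 + TR)."""
    return lam * (t_test / 2.0 + t_r)


def q_on_demand(p, n=1):
    """Cyclic component: probability of at least one failure in n demands.
    For a single demand (n = 1) this reduces to q = p."""
    return 1.0 - (1.0 - p) ** n


def w_on_demand(p, k):
    """Cyclic component: failure occurrence rate per hour, p * k(t)."""
    return p * k


if __name__ == "__main__":
    print(q_repairable(lam=1e-4, t_d=10.0))            # hypothetical values
    print(q_periodically_tested(lam=1e-5, t_test=1000.0))
    print(q_on_demand(p=1e-3))
    print(w_on_demand(p=1e-3, k=2.0))
```

For a repairable or tested component with a constant failure rate, the occurrence rate w is approximately λ itself as long as the unavailability is small.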
The practical example below illustrates the quantification of each of the safety
measures above.
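[Figure: fault tree for "Radar ground station failure": top gate G1 with inputs GS_chain1_fail and gate G2; gate G2 with inputs GS_chain2_fail and switch_fail]

$$
\begin{aligned}
W_{top} &= q(\text{GS\_chain2\_fail}) \cdot w(\text{GS\_chain1\_fail}) + q(\text{GS\_chain1\_fail}) \cdot w(\text{GS\_chain2\_fail}) + q(\text{switch\_fail}) \cdot w(\text{GS\_chain1\_fail}) \\
&= \lambda \, T_D \, \lambda + \lambda \, T_D \, \lambda + p \, \lambda = 3 \times 10^{-7}
\end{aligned}
$$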
Note that confusing the top event occurrence rate with the top event
unavailability leads, in this example, to an error of one order of
magnitude!
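The difference between occurrence rate and unavailability can be reproduced numerically. The input values below are assumptions chosen for illustration because they are consistent with the stated result of 3e-7; they are not given in this section. The sketch also assumes two identical chains, the minimal cut sets {GS_chain1_fail, GS_chain2_fail} and {GS_chain1_fail, switch_fail}, and the rare-event approximation for the top event unavailability.

```python
LAMBDA = 1e-4    # assumed failure rate of each ground station chain, per hour
T_D = 10.0       # assumed average downtime per failure, in hours
P_SWITCH = 1e-3  # assumed probability that the switch fails on demand

q_chain = LAMBDA * T_D  # unavailability of one chain, q ~ lambda * TD
w_chain = LAMBDA        # occurrence rate of one chain failure, w ~ lambda

# Occurrence rate: for each cut set, one event occurs while the other is already present.
w_top = q_chain * w_chain + q_chain * w_chain + P_SWITCH * w_chain

# Unavailability: sum over the minimal cut sets of the product of unavailabilities.
q_top = q_chain * q_chain + q_chain * P_SWITCH

print(f"W_top = {w_top:.1e} per hour")
print(f"Q_top = {q_top:.1e}")
```

With these assumed inputs the occurrence rate is about 3e-7 per hour while the unavailability is about 2e-6, so the two measures differ by roughly one order of magnitude and must not be interchanged.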
The quantitative evaluation can be performed for the top event, for an
intermediary event or for a minimal cut set.
The following rules are to be followed to ensure that the calculations are
correct:
Fault Trees allow modelling of both equipment failures (HW and SW)
and human errors (nevertheless, the quantitative evaluation is
Fault Trees are generally easier to read and to understand than other
dependability models (e.g. state-transitions graphs or Petri Nets);
Fault Trees might be less easy to build and understand than other
types of models (e.g. the Reliability Block Diagrams), but far more
powerful in terms of representation of failure behaviour of complex
systems;
The technique is well accepted and lends itself to quantification (within
the limits described above);
The results can provide either qualitative or quantitative data for the
risk assessment process;
Mature and powerful software tools exist to support fault tree editing
and evaluation;
A fault tree may get very large and complex, but some commercially
available computer tools support this complexity; the difficulty of
validating the fault tree models still remains;
The FTA technique provides valuable support to all these stages. The fault
trees might be progressively built and refined along these stages.
[Figure: PSSA-SRS process. Stage 1: B) Refinement; Stage 2: Evaluate system architecture(s) (design modification, data output); Stage 3: Apply risk mitigation strategies; Stage 4: Apportion safety objectives into safety requirements (List of Safety Requirements); Stage 5: Balanced/Reconciled? (No: back to FHA or PSSA; Yes: end of PSSA-SRS)]
a system definition:
- assumptions;
- design constraints;
regulatory requirements;
applicable standards.
All of these inputs or most of them, depending on the analysis to be done, are
necessary to develop the level of understanding of the system design and its
rationale that is required to perform the FTA.
To do so, a Fault Tree Analysis might be performed until the sub-functions
decomposition level is reached (note that generally the FHA
addresses the failure of services or user-oriented functions; the role of the
current FTA is to analyze the failure of lower level technical functions that
implement the former).
In this case, the FMEA has a great interest as a preliminary study to FTA,
facilitating the construction of this latter.
It is performed at the level of decomposition of the sub-functions.
Note that generally the FHA addressed the failure of services or user-oriented
functions; the role of the current FMEA is to analyze the failure of lower level
technical functions that implement the former.

In this case, FTA is first used to refine functions (in sub-functions).
Functional FMEA is better indicated to refine sub-functions which are made
with no redundant constituents.
For example, a functional FMEA, after an FTA is highly recommended for
software functions study.
The FTA for refining sub-functions safety contribution is better indicated for:
The FTA may validate some design assumptions and fail to validate
others.
The fault tree(s) developed during this stage continue the one(s) begun during the
previous stage (see chapter 4.3). They are extended and refined in order to
evaluate the system architecture.
The fault trees can be constructed as explained in the first part of this
guidance material.
The more detail the fault tree includes and the more its nodes are refined,
the more information is available to the designer for mitigating the associated
risk.
Note that the Risk Mitigation Strategies can be determined using other
documents such as "Guidelines for the Safety Assessment of ATM procedures
(SAAP)" or Guidance Material A of PSSA – Chapter 3, etc.
EXAMPLE:
This means that, after the first apportionment iteration, if a safety requirement
allocated to a human error is more severe than Probable, then the
apportionment needs to be re-evaluated in order to satisfy the defined
strategy.
In the following fault tree, after the first apportionment iteration, a "Remote"
requirement is allocated to a human error (HE-EARLY). This does not follow
the previously defined rule.
[Figure: fault tree for "Early delivery of CPDLC message used for separation" (REMOTE): gate "EARLY delivery" with inputs HE-EARLY (REMOTE) and HMI-EARLY (REMOTE)]
One solution (described in the following figure) can therefore be proposed in order to
allocate a Probable requirement to the human error (HE-EARLY event),
instead of the previous "Remote" requirement.
[Figure: revised fault tree. Safety Objective at the top, with inputs "Early delivery of CPDLC message used for separation" (PROBABLE) and "Voice Readback fails to detect that message was too early delivered" (PROBABLE); below, HE-EARLY (PROBABLE) and HMI-EARLY (PROBABLE)]
As can be seen, in this case the solution is to add a detection means, realized with a
Voice Read-Back action.
Warning: this is a fictitious example. Therefore it does not suggest that Voice
Read-Back shall always be used for Controller Pilot Data Link Communications
(CPDLC) applications.
The document "Safety and Performance Requirements Standard For Initial Air
Traffic Data Link Services In Continental Airspace" (ED120) provides actual
guidance for such application.
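The rule used in this example (no requirement more severe than Probable allocated to a human error) lends itself to an automated check after each apportionment iteration. The sketch below is hypothetical: the stringency scale lists only the two classes named in the example, and the classification of HMI-EARLY as an equipment event is an assumption made for illustration.

```python
# Ordered from least to most stringent; only the classes named in the example are listed.
STRINGENCY_SCALE = ["PROBABLE", "REMOTE"]


def human_error_violations(allocations, event_types, limit="PROBABLE"):
    """Return the human-error events whose allocated requirement is more
    stringent than `limit`."""
    limit_rank = STRINGENCY_SCALE.index(limit)
    return [
        event
        for event, requirement in allocations.items()
        if event_types[event] == "human"
        and STRINGENCY_SCALE.index(requirement) > limit_rank
    ]


if __name__ == "__main__":
    event_types = {"HE-EARLY": "human", "HMI-EARLY": "equipment"}  # assumed classification
    first_iteration = {"HE-EARLY": "REMOTE", "HMI-EARLY": "REMOTE"}
    after_mitigation = {"HE-EARLY": "PROBABLE", "HMI-EARLY": "PROBABLE"}
    print(human_error_violations(first_iteration, event_types))
    print(human_error_violations(after_mitigation, event_types))
```

The first iteration is flagged because of HE-EARLY, while the allocation obtained after adding the detection means passes the check.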
[Figure: PSSA-SRS process, starting from the FHA. Stage 1: B) Refinement; Stage 2: Evaluate system architecture(s) (design modification); Stage 3: Apply risk mitigation strategies; Stage 4: Apportion safety objectives into safety requirements; Stage 5: Balanced/Reconciled? (No: back to FHA or PSSA; Yes)]
After the previous three stages of the PSSA SRS process, the fourth stage
performs the apportionment of Safety Objectives into Safety Requirements for
each individual system element (people, procedures and equipment).
This step and the subsequent balancing / reconciliation with the non-safety
related requirements are required to ensure that the system will be able to
meet the safety objectives defined in the FHA. Together they form a local iterative
process.
The iterative allocation process has to be carried on until the final set of Safety
Requirements (taking into account the architecture and all the inputs) can
reasonably be expected to achieve the Safety Objectives.
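A first apportionment iteration can be sketched as a top-down split of a quantitative objective through the tree. The equal-share rule used below is only one simple possibility and is not the method of APPENDIX E: under an OR gate the budget is divided among the inputs (rare-event approximation), and under an AND gate each input receives the n-th root of the budget (independent inputs). The tree representation and names are hypothetical, and the sketch assumes that no event is repeated in the tree.

```python
def apportion(tree, node, target, allocation=None):
    """Recursively split the probability budget `target` equally among gate inputs."""
    if allocation is None:
        allocation = {}
    allocation[node] = target
    if node in tree:
        gate_type, inputs = tree[node]
        n = len(inputs)
        if gate_type == "OR":
            share = target / n           # input probabilities add up
        elif gate_type == "AND":
            share = target ** (1.0 / n)  # independent input probabilities multiply
        else:
            raise ValueError(f"unsupported gate type: {gate_type}")
        for child in inputs:
            apportion(tree, child, share, allocation)
    return allocation


if __name__ == "__main__":
    example_tree = {"TOP": ("OR", ["G1", "E3"]), "G1": ("AND", ["E1", "E2"])}
    for event, budget in apportion(example_tree, "TOP", 1e-6).items():
        print(f"{event}: {budget:.2e}")
```

In practice the shares would then be adjusted during the balancing / reconciliation iterations, for example to relax the requirement on a human action and tighten the requirement on an equipment item.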
Note that before beginning the apportionment task, the fault tree constructed
in the previous stages needs to be developed so as to give a detailed and
comprehensive overview of the contribution of all domains (Ground / Airborne)
and types of element (equipment, human factor, procedure). The fault tree is
elaborated by searching for the causes, and the contributions thereof, leading to the
top event (as seen in the chapter Fault Tree Description). If needed,
architectural FMEA (as opposed to functional FMEA) might be used to
contribute to a thorough fault tree construction.
Apportionment should account for any available data related to fault tree
events:
field experience;
availability;
integrity;
maintainability, etc.
Note that quantitative safety objectives result, through the allocation process, in
Safety Requirements addressing reliability, availability, integrity and
maintainability.
- design constraints;
- minimal cuts;
- qualitative importance;
- Importance factor;
- Sensitivity analysis;
Conclusions *.
* Note that the marked items are to be detailed for each hazard to be studied.
to collect evidences;
The FTA technique provides valuable support for the "providing safety
assurance" step.
a system definition:
- definitive/validated assumptions;
- design constraints;
regulatory requirements;
applicable standards.
All of these inputs or most of them, depending on the analysis to be done, are
necessary to complete the level of understanding of the system design and its
rationale that is required to perform the FTA.
to realize a more detailed analysis for some events (or each event)
linked to the system elements (human, procedure, equipment);
In this case, the aim of using FTA is to detail the causes of an event in
order to focus the search for data and to ease the quantitative evaluation.
The most useful application of FTA during SSA is to verify that the Safety
Objectives (at hazard level) are satisfied.
After detailing the fault tree (if necessary) and collecting data, a bottom-up
method is to be performed (as explained in APPENDIX E).
Note that all the values are to be either qualitative ones or quantitative ones,
corresponding to the characteristic of the Safety Objectives to be met.
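For a quantitative Safety Objective, the bottom-up verification can be sketched as follows. This is a minimal illustration and not the method of APPENDIX E: it assumes independent basic events, a tree without repeated events, and AND / OR gates only; the tree representation, names and numerical values are hypothetical.

```python
from math import prod


def evaluate(tree, node, basic_probabilities):
    """Propagate basic event probabilities bottom-up to `node`."""
    if node not in tree:
        return basic_probabilities[node]
    gate_type, inputs = tree[node]
    values = [evaluate(tree, child, basic_probabilities) for child in inputs]
    if gate_type == "AND":
        return prod(values)
    if gate_type == "OR":
        return 1.0 - prod(1.0 - value for value in values)
    raise ValueError(f"unsupported gate type: {gate_type}")


def objective_satisfied(tree, top, basic_probabilities, safety_objective):
    """True if the computed top event probability does not exceed the objective."""
    return evaluate(tree, top, basic_probabilities) <= safety_objective


if __name__ == "__main__":
    example_tree = {"TOP": ("OR", ["G1", "E3"]), "G1": ("AND", ["E1", "E2"])}
    collected_data = {"E1": 1e-3, "E2": 1e-3, "E3": 1e-7}
    print(evaluate(example_tree, "TOP", collected_data))
    print(objective_satisfied(example_tree, "TOP", collected_data, 1e-5))
```

When repeated events are present, the evaluation should be based on the minimal cut sets instead of a direct gate-by-gate propagation.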
Note that the FTA is not used to search for and define data and the associated
evidence.
If the Safety Objective is not satisfied, the FTA can also be used to
identify the cause(s) of this shortfall.
- design constraints;
Conclusions *.
* Note that the marked items are to be detailed for each hazard to be studied.
** Note that the required outputs may be limited to the discrepancies
between the PSSA and SSA phases.
OR gate: Output (event) occurs if at least one of the input events occurs.
A description of the event is provided in the rectangle.
AND gate: Output (event) occurs only if all input events occur simultaneously.
A description of the event is provided in the rectangle.
{4} Forward target: Target event that will be used in another tree as a called sub-tree.
The calling event (see previous line) has the same reference inside the
triangle (the calling event is presented in this particular example as
beginning with an "OR" gate).
A description of this called event (identical to that of the calling event) is
provided in the rectangle.
Exclusive-OR gate: Output (event) occurs only if one of the input events occurs
alone (used typically with two input events).
A description of the event is provided in the rectangle.
NOT gate: Output (event) represents a condition which is the inverse of the condition
defined by the input event.
A description of the event is provided in the rectangle.
The NOT gate renders the objects selected in the tree negative (or vice
versa): an AND gate becomes NAND, an OR gate becomes NOR, and an
event becomes a complementary event.
Combination gate (m/n): Output (event) occurs only if at least m of the n inputs occur
simultaneously (with m < n).
Cardinality gate (m <> n): Output (event) occurs only if x inputs occur, with x between
m (min) and n (max).
IF … ELSE gate: The output (event) of the IF … ELSE gate corresponds to the event
on the left (if condition verified Gate) IF the Condition event is true,
ELSE it corresponds to the event on the right (if condition not verified
Gate).
This gate is used for representing a structure of the type If Evt1 then
Evt2 else Evt3. This type of gate has been introduced to represent a
twin-input switch.
[Figure: equivalence (↔) between two fault trees with the same top event and the basic events A, B and C]
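The logic of the less common gates can be summarized as Boolean functions. The sketch below is an illustrative reading of the descriptions in this appendix; the function names are hypothetical, and each function takes the truth values of the input events.

```python
def xor_gate(inputs):
    """Exclusive-OR gate: output occurs only if exactly one input occurs."""
    return sum(bool(x) for x in inputs) == 1


def not_gate(x):
    """NOT gate: output is the inverse of the input condition."""
    return not x


def combination_gate(inputs, m):
    """Combination (m/n) gate: output occurs if at least m of the n inputs occur."""
    return sum(bool(x) for x in inputs) >= m


def cardinality_gate(inputs, minimum, maximum):
    """Cardinality gate: output occurs if the number of occurring inputs
    lies between `minimum` and `maximum` (inclusive)."""
    return minimum <= sum(bool(x) for x in inputs) <= maximum


def if_else_gate(condition, event_if_true, event_if_false):
    """IF ... ELSE gate: selects one of two events depending on the condition."""
    return event_if_true if condition else event_if_false


if __name__ == "__main__":
    print(xor_gate([True, False]))
    print(combination_gate([True, True, False], m=2))
    print(cardinality_gate([True, True, True], minimum=1, maximum=2))
    print(if_else_gate(True, False, True))
```

A 2-out-of-3 combination gate, for instance, is equivalent to an OR gate over the three possible AND pairs of its inputs, which is how such a gate is expanded before Boolean reduction.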
The second step is to construct the fault tree. This construction requires a few
fundamental rules, which are explained hereafter.
The system failure mode defined in the first step constitutes the fault tree top
event. Knowing this, the analyst determines the immediate, necessary, and
sufficient causes for the occurrence of the top event. It should be noted that
these are not the basic causes of the event but the immediate causes or
immediate mechanisms for the event. This is an extremely important point
which will be clarified and illustrated in later examples.
The immediate, necessary, and sufficient causes of the top event are now
treated as sub-top events and the analyst proceeds to determine their
immediate, necessary, and sufficient causes.
In this way, the analyst proceeds down the tree, continually approaching
further resolution, until ultimately, the limit of resolution of the tree is reached .
[Figure: block diagram of the example system, showing blocks A, D and E]
The immediate cause of the top event, "no signal to E," is "no output from D."
The analyst should strongly resist the temptation to list the event, "no input to
D" as the immediate cause of "no signal to E." In the determination of
immediate causes, one step should be taken at a time. The "immediate cause"
concept is sometimes called the "Think Small" Rule because of the
methodical, one-step-at-a-time approach.
The sub-top event, "no output from D," is now identified and it is next
necessary to determine its immediate cause or causes. There are two
possibilities:
Therefore, the sub-top event, "no output from D," can arise from the union of
the two events, 1 or 2.
NOTE: The reader should note that if more than one step was taken and the
cause of "no input to D," had been identified (improperly) then event 1 above
would have been missed. In fact, the motivation for considering immediate
causes is now clear: it provides assurance that no fault event in the sequence
is overlooked.
The immediate causes for the new mode failures, events 1 and 2 can now be
sought. If the limit of resolution is the subsystem level, then event 1 (which can
be rephrased, "D fails to perform its proper function due to some fault internal
to D") is not analyzed further and constitutes a basic input to the tree. With
respect to event 2, its immediate, necessary and sufficient cause is "no output
from B and no output from C," which appears as an intersection of two events,
i.e.,
2 = 3 and 4
Where
As far as 3 is concerned:
3 = 5 or 6
Where
The further steps in the analysis of this system can now be easily supplied by
the reader. The analysis will be terminated when all the relevant basic tree
inputs have been identified. In this connection, the event "no input to A" is also
considered to be a basic tree input.
The analysis of the top event ("no signal to E") consequently produced a
linkage of fault events connected by "AND" and "OR" logic. The framework (or
system model) on which this linkage is "hung" is the fault tree.
The following section provides the necessary details for connecting the fault
event linkage to its framework (fault tree).
Observance of these rules helps to ensure successful fault trees so that the
process is now less of an art and more of a science.
The basic rules for successful fault tree analysis are now examined.
The fault tree in Figure 5.1 is considered as a simple fault tree or perhaps
a part of a larger fault tree.
Note that none of the failure events have been "written in"; they have been
designated just Q, A, B, C, D.
[Figure 5.1: simple fault tree with events Q, A, B, C and D]
Write the statements that are entered in the event boxes as faults; state
precisely what the fault is and when it occurs.
The "what-condition" describes the relevant failed (or operating) state of the
component. The "when-condition" describes the condition of the system – with
respect to the component of interest - which makes that particular state of
existence of the component a fault.
Note that Ground Rule I may frequently require a fairly verbose statement. So
be it. The analyst is cautioned not to be afraid of wordy statements. Do not
tailor the length of your statement to the size of the box that you have drawn. If
necessary, make the box bigger. It is permissible to abbreviate words but
resist the temptation to abbreviate ideas. Examples of fault statements are:
(1) Normally closed relay contacts fail to open when EMF is applied to
coil.
The next step in the procedure is to examine each boxed statement and ask
the question: "Can this fault consist of a component failure?" This question
and its answer lead us to Ground Rule II:
In addition to the above ground rules, there are a number of other procedural
statements that have been developed over the years. The first of these is the
No Miracles Rule:
that the component functions normally, thus allowing the passage of the fault
sequence in question.
Two other procedural statements address the dangers of not being methodical
and attempting to shortcut the analysis process. The first is the Complete-the-
Gate Rule:
Gate inputs should be properly defined fault events, and gates should
not be directly connected to other gates.
The Complete-the-Gate Rule states that the fault tree should be developed in
levels, and each level should be completed before any consideration is given
to a lower level. With regard to the No Gale-to-Gate Rule, a "shortcut" fault
tree is shown below.
[Figure: "shortcut" fault tree with events A, B, X, Y and C]
In order for the unit to be considered "basic", it is necessary and sufficient that
the following requirements be satisfied:
X + Y = Y + X
Law of Absorption: X ∙ (X + Y) = X
                   X + (X ∙ Y) = X
Complementation:   X ∙ X' = Ø = 0
                   X + X' = Ω = 1
(X')' = X
1 ∙ X = X
1 + X = 1
0' = 1
1' = 0
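[Figure: fault tree in which the feared event is the AND of four OR gates,
(A + B), (A + C), (D + B) and (D + C), over the basic events A, B, C and D]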
(A + B) ∙ (A + C) ∙ (D + B) ∙ (D + C) (1)
(A + B) ∙ (A + C) = (A ∙ A) + (A ∙ B) + (A ∙ C) + (B ∙ C)
= A + (A ∙ B) + (A ∙ C) + (B ∙ C)
= A + (B ∙ C)
Likewise,
(D + B) ∙ (D + C) = (D ∙ D) + (D ∙ B) + (D ∙ C) + (B ∙ C)
= D + (D ∙ B) + (D ∙ C) + (B ∙ C)
= D + (B ∙ C)
If E represents (B ∙ C):
[A + (B ∙ C)] ∙ [D + (B ∙ C)] = (A + E) ∙ (D + E)
(A + E) ∙ (D + E) = (A ∙ D) + (A ∙ E) + (D ∙ E) + (E ∙ E)
= E + (A ∙ E) + (D ∙ E) + (A ∙ D)
= E + (A ∙ D)
(B ∙ C) + (A ∙ D)
(A + B) ∙ (A + C) ∙ (D + B) ∙ (D + C) = (B ∙ C) + (A ∙ D)
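A reduction of this kind can be cross-checked by comparing both sides over
all combinations of the basic events. The sketch below is an illustration
only; the function names lhs, rhs and equivalent are hypothetical.

```python
from itertools import product


def lhs(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Original expression: (A + B) . (A + C) . (D + B) . (D + C)."""
    return (a or b) and (a or c) and (d or b) and (d or c)


def rhs(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Reduced expression: (B . C) + (A . D)."""
    return (b and c) or (a and d)


def equivalent(f, g, n_vars: int) -> bool:
    """True if f and g agree on every row of the truth table."""
    return all(f(*row) == g(*row)
               for row in product([False, True], repeat=n_vars))


print(equivalent(lhs, rhs, 4))  # prints True
```

An exhaustive check of this kind is only practical for a small number of
basic events, since the truth table doubles in size with each added event.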
[Figure: fault tree for the feared event, with intermediate events A ∙ B,
A ∙ B' and A' ∙ B' over the basic events A and B]
This example can be worked either by (a) removing the outermost prime as a first
step or by (b) manipulating the terms inside the large brackets and removing the
outermost prime as a last step.
= [A' + (B ∙ B')] ∙ (A + B)
= A' ∙ B
= A' ∙ B
[Figure: fault tree with a feared event, two intermediary events and the
basic events X, Y, A, B and C]
De Morgan's theorem can be applied to the second term inside the second bracket:
= (D ∙ D) + (D ∙ E') + (D ∙ E) + (E ∙ E')
= D + (D ∙ E') + (D ∙ E) + Ø
= D
= X∙Y
Any fault tree will consist of a finite number of minimal cut sets, which are
unique for that top event. The one-event minimal cut sets, if there are any,
represent those single events which will cause the top event to occur. The
two-event minimal cut sets, if there are any, represent the double events
which together will cause the top event to occur. For an n-event minimal cut
set, all n events in the cut set must occur in order for the top event to occur.
The minimal cut set expression for the top event can be written in the following
general form:
T = M1 + M2 + … + Mi + … + Mm
where T is the top event and Mi are the minimal cut sets. Each minimal cut set
consists of a combination of specific elementary events, and hence the
general n-event minimal cut set can be expressed as:
Mi = X1 ∙ X2 ∙ … ∙ Xj ∙ … ∙ Xn
T = A + (B ∙ C)
This top event has a one-event minimal cut set (A) and a two-event minimal cut
set (B ∙ C). The minimal cut sets are unique for a top event and are
independent of the different equivalent forms the same fault tree may have.
To determine the minimal cut sets of a fault tree, the tree is first translated to
its equivalent Boolean equations and then a substitution method is used.
The redundancies are removed using the distributive and the absorption laws.
For small fault trees, the determination of the minimal cut sets can be done by
hand. For larger trees, various computer algorithms and codes for fault tree
evaluation are available.
In complex systems, the minimal cut set computation provides the analyst with
a thorough and systematic method for identifying the basic combinations of
events which can cause the undesired event (top event).
Considering the following fault tree (Figure 5.2), the Boolean equations are
shown hereafter.
[Figure 5.2: fault tree with top event T, intermediate events E1, E2, E3 and
E4, and basic events A, B and C]
T = E1 ∙ E2
E1 = A + E3
E3 = B + C
E2 = C + E4
E4 = A ∙ B
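T = (A + E3) ∙ (C + E4)
Substituting E3 = B + C and expanding the product:
T = (A + B + C) ∙ (C + E4)
= (A ∙ C) + (A ∙ E4) + (B ∙ C) + (B ∙ E4) + (C ∙ C) + (C ∙ E4)
Using the idempotent law (C ∙ C = C) and the absorption law (C + x∙C = C):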
T = (A ∙ E4) + C + (B ∙ E4)
T = (A ∙ (A ∙ B)) + C + (B ∙ (A ∙ B))
= (A ∙ A ∙ B) + C + (B ∙ A ∙ B)
T = C + (A ∙ B)
The minimal cut sets of the top event are thus C and (A ∙ B): one single event
minimal cut set and one double event minimal cut set.
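The substitution method can be sketched in a few lines of code. This is an
illustrative sketch only, not a description of any particular fault tree
tool: the TREE dictionary encodes the Boolean equations of Figure 5.2, and
the names minimize and cut_sets are hypothetical. Representing each cut set
as a set applies the idempotent law automatically; the absorption law is
applied by removing every cut set that strictly contains another.

```python
from itertools import product

# Boolean equations of Figure 5.2: gate name -> (operator, inputs).
TREE = {
    "T": ("AND", ["E1", "E2"]),
    "E1": ("OR", ["A", "E3"]),
    "E3": ("OR", ["B", "C"]),
    "E2": ("OR", ["C", "E4"]),
    "E4": ("AND", ["A", "B"]),
}


def minimize(cut_sets):
    """Absorption law: drop any cut set that strictly contains another one."""
    return {c for c in cut_sets if not any(other < c for other in cut_sets)}


def cut_sets(node, tree):
    """Return the minimal cut sets of `node` as a set of frozensets."""
    if node not in tree:                       # basic event
        return {frozenset([node])}
    operator, inputs = tree[node]
    children = [cut_sets(child, tree) for child in inputs]
    if operator == "OR":                       # union of the input cut sets
        result = set().union(*children)
    elif operator == "AND":                    # one cut set taken from each input
        result = {frozenset().union(*combo) for combo in product(*children)}
    else:
        raise ValueError(f"unsupported gate type: {operator}")
    return minimize(result)


for cs in sorted(cut_sets("T", TREE), key=lambda c: (len(c), sorted(c))):
    print(sorted(cs))
```

For this tree the sketch returns the cut sets {C} and {A, B}, in agreement
with the hand calculation.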
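The fault tree can thus be represented as shown in Figure 5.3, which is
equivalent to the original tree (both trees have the same minimal cut sets).
[Figure 5.3: equivalent fault tree in which top event T is the OR of the
basic event C and the AND of the basic events A and B]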
Considering the following fault tree (Figure 5.4), the Boolean equations are
shown hereafter.
[Figure 5.4: fault tree with top event E1, intermediate events E2 to E7, and
basic events A, B, C, D, E, F, G and H]
E1 = E2 ∙ E3
E2 = A + E4
E3 = B + E5
E4 = C ∙ E6
E5 = D + E + F
E6 = D + E7 + F
E7 = E ∙ G ∙ H
E1 = (A + E4) ∙ (B + E5)
E1 = (A ∙ B) + (A ∙ (D + E + F)) + (C ∙ E6 ∙ B) + (C ∙ E6 ∙ (D + E + F))
E1 = (A ∙ B) + (A ∙ D) + (A ∙ E) + (A ∙ F) + (C ∙ (D + E7 + F) ∙ B) +
(C ∙ (D + E7 + F) ∙ (D + E + F))
E1 = (A ∙ B) + (A ∙ D) + (A ∙ E) + (A ∙ F) + (C ∙ (D + E7 + F) ∙ B) +
(C ∙ ((D ∙ D) + (D ∙ E) + (D ∙ F) + (E7 ∙ D) + (E7 ∙ E) + (E7 ∙ F) +
(F ∙ D) + (F ∙ E) + (F ∙ F)))
E1 = (A ∙ B) + (A ∙ D) + (A ∙ E) + (A ∙ F) + (C ∙ (D + E7 + F) ∙ B) +
(C ∙ (D + (E7 ∙ E) + F))
E1 = (A ∙ B) + (A ∙ D) + (A ∙ E) + (A ∙ F) + (C ∙ (D + E7 + F) ∙ B) +
(C ∙ (D + (E ∙ G ∙ H ∙ E) + F))
E1 = (A ∙ B) + (A ∙ D) + (A ∙ E) + (A ∙ F) + (C ∙ B ∙ D) + (C ∙ B ∙ E7) +
(C ∙ (D + (E ∙ G ∙ H) + F))
E1 = (A ∙ B) + (A ∙ D) + (A ∙ E) + (A ∙ F) + (C ∙ B ∙ D) + (C ∙ B ∙ E7) +
(C ∙ D) + (C ∙ E ∙ G ∙ H) + (C ∙ F)
E1 = (A ∙ B) + (A ∙ D) + (A ∙ E) + (A ∙ F) + (C ∙ B ∙ E ∙ G ∙ H) + (C ∙ D) +
(C ∙ E ∙ G ∙ H) + (C ∙ F)
E1 = (A ∙ B) + (A ∙ D) + (A ∙ E) + (A ∙ F) + (C ∙ D) + (C ∙ F) + (C ∙ E ∙ G ∙ H)
A∙B
A∙D
A∙E
A∙F
C∙D
C∙F
C∙E∙G∙H
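The seven minimal cut sets listed above can be cross-checked by brute force.
The sketch below is an illustration under stated assumptions: the function
top encodes the Boolean equations E1 to E7 of Figure 5.4, and the names top
and minimal_cut_sets are hypothetical. Candidate sets are enumerated by
increasing size, so a set that triggers the top event without containing an
already found cut set is minimal. This holds because the tree contains only
AND and OR gates.

```python
from itertools import combinations

EVENTS = "ABCDEFGH"


def top(failed):
    """Top event E1 of Figure 5.4 for a given set of failed basic events."""
    A, B, C, D, E, F, G, H = (name in failed for name in EVENTS)
    e7 = E and G and H
    e6 = D or e7 or F
    e5 = D or E or F
    e4 = C and e6
    e3 = B or e5
    e2 = A or e4
    return e2 and e3


def minimal_cut_sets():
    """Enumerate the minimal cut sets by increasing size."""
    found = []
    for size in range(1, len(EVENTS) + 1):
        for combo in combinations(EVENTS, size):
            candidate = frozenset(combo)
            if top(candidate) and not any(m <= candidate for m in found):
                found.append(candidate)
    return found


for cs in minimal_cut_sets():
    print("".join(sorted(cs)))
```

With eight basic events the enumeration covers only 255 candidate sets; for
larger trees a substitution-based algorithm is the more practical choice.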
As previously discussed, the minimal cut sets give all the unique combinations
of basic events that cause the top event.
Qualitative importances
After obtaining the minimal cut sets, some idea of basic event importances
can be obtained by ordering the minimal cut sets according to their size. The
single-event minimal cut sets (if any) are listed first, then the double-event
minimal cut sets, then the triple, etc.
It is often the practice to sort only the single, double, and perhaps triple-event
minimal cut sets. As an additional calculation, higher order minimal cut sets
(quadruples, etc.) can also be sorted if they show potential susceptibility to
common cause events.
Because the basic event probabilities associated with the minimal cut sets
often decrease by orders of magnitude as the size of the cut set increases, the
ranking according to size gives a gross indication of the importance of the
minimal cut set. For example, if individual basic event probabilities are of the
order of 10^-3, a single-event cut set probability will be of the order of 10^-3,
a double-event cut set 10^-6, a triple-event cut set 10^-9, etc.
The minimal cut set information can sometimes be used directly to check
design criteria. For example, if a design criterion states that no single event
shall lead to the top event, then this is equivalent to stating that the system
shall contain no single event minimal cut sets. The minimal cut sets can be
checked to see if this criterion is satisfied.
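The ranking by size and the "no single event" design criterion lend
themselves to a short illustration. The sketch below is not part of the
original guidance: the function names are hypothetical, and the order of
magnitude estimate assumes independent basic events that all share the same
probability (10^-3 by default, as in the example above).

```python
import math


def rank_by_size(cut_sets):
    """Order the minimal cut sets: singles first, then doubles, triples, etc."""
    return sorted((frozenset(c) for c in cut_sets),
                  key=lambda c: (len(c), sorted(c)))


def order_of_magnitude(cut_set, p_basic=1e-3):
    """Rough cut set probability, assuming independent basic events that all
    have the same probability p_basic (an assumption of this sketch)."""
    return p_basic ** len(cut_set)


def single_event_cut_sets(cut_sets):
    """Cut sets that violate a 'no single event leads to the top event' criterion."""
    return [c for c in cut_sets if len(c) == 1]


# Minimal cut sets of T = C + (A . B).
CUT_SETS = [{"A", "B"}, {"C"}]

for cs in rank_by_size(CUT_SETS):
    print(sorted(cs), order_of_magnitude(cs))
print("criterion satisfied:", not single_event_cut_sets(CUT_SETS))
```

In this example the criterion is not satisfied, because C alone is a minimal
cut set.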
In evaluating a fault tree, the events which could be common cause events are
not known. However, the susceptibility that component failures may have a
common initiating cause can be indicated.
By definition, the top event occurs if all the basic events in a minimal cut set
occur. Hence, the only common causes of interest are those which can trigger
all the basic events in a minimal cut set. A cause which does not trigger all the
basic events in a minimal cut set will not by itself cause system failure.
To identify minimal cut sets which are susceptible to common cause events,
common cause categories can be defined. These are general areas that can
cause event dependence.
The list below gives some example categories which might be considered in a
common cause susceptibility evaluation:
Location;
Temperature;
Humidity;
Flood susceptibility;
Manufacturer;
Wear-out susceptibility;
Maintenance degradation;
Energy sources;
Etc.
For each common cause category, specific "elements" can be defined. For
example, for the category "Manufacturer" the elements would be particular
manufacturers involved which might be coded as "Manufacturer 1",
"Manufacturer 2", etc. For the "Location" category, a system can be divided
into a given number of physical locations which would be the elements. For
the category "Flood susceptibility", several sensitivity levels might be defined
ranging from no sensitivity to extreme sensitivity.
The next task in the common cause susceptibility evaluations involves event
coding. As part of the event name code (reference) or in associated event
description fields, for each event occurrence, the element of each category
associated with the basic event can be denoted. The categories can be
indexed or keyed according to any convenient coding system.
Having performed this coding, the potentially susceptible minimal cut sets can
be identified among the collection of minimal cut sets determined for the fault
tree. The minimal cut sets which are potentially susceptible to common cause
failures are those whose primary failures all have the same element of a given
category. Having identified the potentially susceptible minimal cut sets, the
minimal cut sets need finally to be screened in order to determine those which
may require further action. This final screening may be based on past histories
of common cause occurrences, some sort of quantification analysis, and/or
expert judgement.
APPENDIX D QUANTITATIVE APPROACH
In addition to the evaluation of the safety measure(s) associated with the top event (or
a selected event), the quantitative analysis encompasses the importance analysis,
the sensitivity analysis and the uncertainty analysis.
in order to identify the most efficient strategy of improving the system safety by
improving the probabilistic characteristics of its elements.
The critical importance factor can also be calculated for a minimal cut
set: it is the probability that the minimal cut set causes the top
event, given that the top event has occurred. The critical
importance factor is particularly useful because it gives the relative
weight of each minimal cut set in the contribution to the system failure.
the top event, given that the top event has effectively occurred.
It is useful for diagnosing the causes of the top event occurrence.
NOTE: Provided that adequate SW tools are available for supporting it, the
sensitivity analysis should be preferred to the analysis based on importance
factors.
D.4 Bibliography
[1] Z. W. Birnbaum – On the importance of different components in a
multicomponent system – Multivariate Analysis II – P. R.
Krishnaiah Editor, 1969
[2] H. E. Lambert – Fault trees for decision making in system analysis
– Ph. D. Thesis, Lawrence Livermore Laboratory, 1975
[6] I.B. Wall and D.H. Worledge – Some perspectives on risk importance
measures. In Proceedings of the international conference on
Probabilistic Safety Assessment – PSA 96, 1996
APPENDIX E QUANTITATIVE ALLOCATION THROUGH THE FAULT TREE
According to the fault tree, safety objectives are allocated to the different elementary
causes in order to derive quantitative probability requirements that prevent the
hazard from occurring with a probability higher than the one allocated by the
corresponding Safety Objective.
As for the qualitative allocation method, two apportionment strategies could be used
for quantitative apportionment:
A "bottom-up" strategy, which aims at filling the missing safety data in the
fault tree (by "guessing" the Safety Requirements to be allocated to the
elementary events) in order to reach the qualitative Safety Objective at the
top of the tree.
NOTES:
Each probability can be expressed in scientific notation as "X.10^-Y" or "X E-Y".
The number X of the probability assigned to each input could include
several decimal places. To simplify the apportionment process, X is taken to
be an integer (whole number).
The conversion of a decimal number into an integer must still be "safety
conservative": the number X must be rounded down to the nearest integer.
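The conservative rounding of X can be illustrated with decimal arithmetic.
This is a minimal sketch, assuming that the probability is supplied as a
string so that no binary floating point error is introduced; the function
name round_down_mantissa is hypothetical.

```python
from decimal import Decimal, ROUND_FLOOR


def round_down_mantissa(probability: str) -> Decimal:
    """Round X down to an integer in X.10^-Y, keeping the exponent unchanged."""
    value = Decimal(probability)
    if not (0 < value <= 1):
        raise ValueError("a probability in the interval (0, 1] is expected")
    exponent = value.adjusted()              # power of ten of the leading digit
    quantum = Decimal(1).scaleb(exponent)    # 1 x 10^exponent
    return value.quantize(quantum, rounding=ROUND_FLOOR)


print(round_down_mantissa("3.7E-5"))
```

Rounding down always yields a requirement that is at least as stringent as
the unrounded value, which is why it is the conservative direction.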
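[Figure: OR gate with output objective X.10^-Y on event A. For example, an
output objective of 10^-6 on A is apportioned as 5.10^-7 to each of the two
inputs B and C.]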
X . 10-Y
A
For example:
10-6 10-6
A A
For the two methods proposed above, what matters is that the
combination of all inputs remains a conservative "order of magnitude" of the
output objective.
NOTE: Most of the time, the apportioned probabilities are rounded down to
the nearest power of ten. If the OR gate safety objective is to be apportioned
to 2 to 10 gate inputs, then the probability for an input is the output
probability divided by ten. If the OR gate safety objective is to be
apportioned to 11 to 100 gate inputs, then the probability for an input is the
output probability divided by one hundred. If the OR gate has only one input,
the input objective is the same as the output objective.
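The two ways of apportioning an OR gate objective can be sketched as
follows. This is an illustration only: the function names are hypothetical,
the sum of the input probabilities is taken as the OR gate output (rare
event approximation), and the extension of the "divide by ten, divide by one
hundred" rule to more than 100 inputs is an assumption of the sketch.

```python
import math


def or_gate_even(objective: float, n_inputs: int) -> float:
    """Even apportionment: each input receives objective / N."""
    if n_inputs < 1:
        raise ValueError("an OR gate needs at least one input")
    return objective / n_inputs


def or_gate_power_of_ten(objective: float, n_inputs: int) -> float:
    """Order of magnitude apportionment: divide by the smallest power of ten
    that is not less than the number of inputs (1, 10, 100, ...)."""
    if n_inputs < 1:
        raise ValueError("an OR gate needs at least one input")
    divisor = 1
    while divisor < n_inputs:
        divisor *= 10
    return objective / divisor


print(or_gate_even(1e-6, 2))          # 5.10^-7 per input
print(or_gate_power_of_ten(1e-6, 2))  # about 10^-7 per input
```

The second function is the more conservative of the two, since dividing by a
power of ten never allocates more than an even split would.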
10-6 10-6
A A
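[Figure: AND gate with output objective X.10^-Y on event A and N inputs
(B, C, ...), each input being apportioned the Nth root of the output
objective]
If the output objective is X.10^-Y, then for one input the objective is:
\sqrt[N]{X \cdot 10^{-Y}} = \sqrt[N]{X} \cdot 10^{-Y/N}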
For example, an output objective of 4.10^-6 on A is apportioned as 2.10^-3 to
each of the two inputs B and C.
X . 10-Y
A
For example:
4.10-6 4.10-6
A A
NOTE: Most of the time, the apportioned probabilities are rounded down to
the nearest power of ten. For example, an output objective of 5.10^-6 on A,
apportioned to the three inputs B, D and C, gives 1.71.10^-2 for each input,
which is rounded down to 10^-2.
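The Nth root apportionment through an AND gate, followed by rounding down to
a power of ten, can be sketched as follows. The function names are
hypothetical and the inputs are assumed to be independent, so that the
output probability is the product of the input probabilities.

```python
import math
from decimal import Decimal


def and_gate_even(objective: float, n_inputs: int) -> float:
    """Even apportionment through an AND gate: the Nth root of the objective."""
    if n_inputs < 1:
        raise ValueError("an AND gate needs at least one input")
    return objective ** (1.0 / n_inputs)


def round_down_to_power_of_ten(probability: float) -> float:
    """Largest power of ten that does not exceed the given probability."""
    if probability <= 0:
        raise ValueError("a strictly positive probability is expected")
    # Read the exponent from the decimal representation of the number.
    exponent = Decimal(str(probability)).adjusted()
    return 10.0 ** exponent


per_input = and_gate_even(5e-6, 3)
print(per_input, round_down_to_power_of_ten(per_input))
```

With the rounded values the product of the three inputs is 10^-6, which
stays below the output objective of 5.10^-6.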
[Figure: example fault tree with the events "Valve B open", "Valve A closed",
"Valve B failed open", "Incorrect control to valve B", "Controller failed",
"Level sensing failed", "Sensor X failed" and "Sensor Y failed"]
Valve B
Controller
o SR-C-1: Controller shall not fail to close valve B more than 6 x 10^-6
per operating hour.
Sensors
from an OR gate:
The difference between the Safety Objective and the sum of all the known
quantitative requirements is determined.
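[Figure: OR gate with output objective X.10^-Y on event A; the requirement
on input B is unknown (???), the requirements on inputs C and D are known
(N.10^-n and M.10^-m)]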
Then, this difference is apportioned (evenly or not) to the basic events for which
no safety requirement was defined.
For example, with a Safety Objective of 10^-6 on A and known requirements of
3.10^-7 on C and 4.10^-7 on D, the remaining 3.10^-7 is allocated to B.
Note that for this method, the Safety Objective must be greater than the sum
of all the known quantitative requirements. If not, the apportionment cannot be
done because the result of the fault tree is greater than the defined Safety
Objective.
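The difference method for an OR gate can be sketched as follows. This is an
illustration only: the function name is hypothetical, the remainder is split
evenly (one of the options mentioned above), and the sum of the input
probabilities is taken as the OR gate output.

```python
import math


def or_gate_remaining(objective: float, known: list, n_unknown: int) -> float:
    """Requirement for each input without a requirement, for an OR gate."""
    if n_unknown < 1:
        raise ValueError("at least one input without a requirement is expected")
    remaining = objective - sum(known)
    if remaining <= 0:
        raise ValueError("the known requirements already reach or exceed "
                         "the Safety Objective")
    return remaining / n_unknown


# Objective 10^-6 on A, known requirements 3.10^-7 (C) and 4.10^-7 (D).
print(or_gate_remaining(1e-6, [3e-7, 4e-7], 1))  # about 3.10^-7 for B
```

The error raised when nothing remains mirrors the condition stated above:
the apportionment cannot be done if the known requirements alone exceed the
Safety Objective.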
The Safety Objective is divided by the product of all the known quantitative
requirements.
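[Figure: AND gate with output objective X.10^-Y on event A; the requirement
on input B is unknown (???), the requirements on inputs C and D are known
(N.10^-n and M.10^-m)]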
Then, the result of the division is apportioned, evenly (using an "nth root")
or not, to the basic events for which no safety requirement was defined. The
product of all the apportioned Safety Requirements must not exceed the result
of the division.
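For example, with a Safety Objective of 10^-6 on A and known requirements of
3.10^-2 on C and 4.10^-3 on D, the requirement allocated to B is
10^-6 / (3.10^-2 x 4.10^-3) ≈ 8.10^-3.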
Note that if the product of all the known quantitative requirements is less than
the required Safety Objective, then the basic events, for which a quantitative
requirement is needed, can be allocated with a probability of 1.
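The division method for an AND gate can be sketched as follows, using the
values 3.10^-2 and 4.10^-3 from the example. This is an illustration only:
the function name is hypothetical, the inputs are assumed to be independent,
and the quotient is split evenly with an nth root.

```python
import math


def and_gate_remaining(objective: float, known: list, n_unknown: int) -> float:
    """Requirement for each input without a requirement, for an AND gate."""
    if n_unknown < 1:
        raise ValueError("at least one input without a requirement is expected")
    quotient = objective / math.prod(known)
    if quotient >= 1.0:
        # The known requirements alone already meet the objective.
        return 1.0
    return quotient ** (1.0 / n_unknown)


# Objective 10^-6 on A, known requirements 3.10^-2 (C) and 4.10^-3 (D).
print(and_gate_remaining(1e-6, [3e-2, 4e-3], 1))  # about 8.3.10^-3 for B
```

Rounding the result down to 8.10^-3 keeps the product of the three
requirements below the Safety Objective.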
Because of the definition of the minimal cut sets, the apportionment of Safety
Objectives into Safety Requirements using this method requires apportioning
the objectives through AND and OR gates.
Indeed, as seen in APPENDIX C, the minimal cut set expression for the
top event can be written in the following general form:
T = M1 + M2 + … + Mi + … + Mm
where T is the top event and Mi are the minimal cut sets. Each minimal cut set
consists of a combination of specific elementary events, and hence the
general n-event minimal cut set can be expressed as:
Mi = X1 ∙ X2 ∙ … ∙ Xj ∙ … ∙ Xn
The "Minimal Cut Set" method can be used to apportion a Safety Objective
whether or not some Safety Requirements are already known.
Then, for each minimal cut set, the apportionment is done through an AND gate
(Mi = X1 ∙ X2 ∙ … ∙ Xj ∙ … ∙ Xn).
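A possible reading of the "Minimal Cut Set" method, with no Safety
Requirement known in advance, is sketched below. Several choices in this
sketch are assumptions rather than statements of the guidance: the Safety
Objective is first split evenly across the minimal cut sets (OR gate, with
the sum taken as the output), each share is then split evenly inside its cut
set (AND gate, independent events), and an event that belongs to several cut
sets keeps its most stringent value. The function name is hypothetical.

```python
import math


def apportion_by_cut_sets(objective: float, cut_sets: list) -> dict:
    """Return a requirement per basic event for the given minimal cut sets."""
    per_cut_set = objective / len(cut_sets)          # OR gate: T = M1 + ... + Mm
    requirements = {}
    for cut_set in cut_sets:
        per_event = per_cut_set ** (1.0 / len(cut_set))   # AND gate: nth root
        for event in cut_set:
            # Keep the most stringent value if the event appears several times.
            requirements[event] = min(requirements.get(event, 1.0), per_event)
    return requirements


# Minimal cut sets of T = C + (A . B), with an objective of 2.10^-6 on T.
for event, value in sorted(apportion_by_cut_sets(2e-6, [{"C"}, {"A", "B"}]).items()):
    print(event, value)
```

In this example C receives 10^-6 and A and B receive 10^-3 each, so that the
two cut sets contribute 10^-6 each to the top event.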