Root Cause Analysis Tools
Welcome
Our mission is preventing patient harm.
We look in large part to High-Reliability Organizations (HROs) like aviation and nuclear power for
guidance in accomplishing our mission. Both of these industries have very low accident rates in spite of
the complex and hazardous nature of their business. Some keys to their success are:
Mindfulness, an active awareness of the risks at hand and attention to cues signaling that something
isn't right.
Standardized approaches to tasks (e.g., checklists) to ensure consistency regardless of who is doing
the task.
Teamwork that features a common understanding of the task at hand, its intended outcome, rules
for communication, and individual roles.
Seeking out and listening to local experts regardless of rank.
Resiliency through well-known and rehearsed plans (e.g., drills, simulations) for recovering or
rescuing a situation when things go wrong.
An organizational expectation that employees speak up as soon as a situation seems unsafe.¹
Organizational openness to and support of employees speaking up with their concerns, constructive
criticism and suggestions is fundamental to a just culture.²
Many HRO strategies are grounded in the science of Human Factors Engineering (HFE). HFE
examines how people interact, accomplish work, and respond to surroundings using their thinking,
communication and physical abilities (senses, size, strength, etc.) under various personal and
environmental conditions (hot/cold, dark/light, quiet/noisy, clear/confusing, focused/distracted,
rested/fatigued, etc.). HFE provides evidence-based approaches for improving processes, performance
and the places we work, by design.
We can design in safety! In other words, we can intentionally make things safer. The strongest
designs are permanent and physical (installing a staircase railing) rather than temporary and procedural
(warnings to "Watch out for the steps!"). Design may naturally lead us to do the correct thing
(three-pronged plugs fit three-pronged sockets) or simplify complicated tasks (adding step-by-step photos to
written instructions). HFE strategies aim to put knowledge in the world when and where we need it,
instead of adding burden to our limited attention span and memory (reminders to "Be careful!" or
repetitive training about things we already know).³
Each of us comes to work to do our part in providing ideal care for Veterans, not to make a
mistake. But health care is complex and hazardous. Our best intentions are not enough. We must apply
the lessons learned and practice the techniques HROs and HFE provide to us.
From a patient safety perspective, when things go wrong it's about design or system failure, not
individual fault.* We use a confidential and protected** team-based approach called Root Cause
Analysis (RCA), adapted from aviation, to figure out the answers to four core questions that will help
us design safer care:
What happened?
Why did it happen?
What action can we take to prevent it from happening again?
How will we know if the action we took made a difference?
Notice that none of these core questions ask who was involved. Our focus is on system failures, not
individual fault. If you make a mistake today, I could make the same one tomorrow, unless we speak
up, analyze the situation, fix something, and monitor whether our action made a difference.
What you'll find in the rest of this document is a series of questions and technical guidance that will
assist you and your team in answering the four core questions and designing and sustaining safer care.
Section 1: What Happened?
    Environment
    Equipment
    Training
    Communication
Section 2: Why Did It Happen?
Section 3: What Action Can We Take to Prevent It From Happening Again?
Section 4: How Will We Know If the Action We Took Made a Difference?
Thank you
*NOTE: Purposeful or intentional harm (crimes, patient abuse, intentionally unsafe acts, and events involving provider substance abuse) is
not a part of patient safety. Intentional harm is handled by administrative processes.
**VHA Directive 2008-077, "QM and patient safety activities that can generate confidential documents," describes quality improvement
processes protected by 38 USC 5705. http://vaww1.va.gov/vhapublications/ViewPublication.asp?pub_ID=1803
1. Weick, K.E. and Sutcliffe, K.M. Managing the unexpected: resilient performance in an age of uncertainty. 2nd ed. San Francisco: Jossey-Bass, 2007.
2. Marx, D. Patient safety and the just culture: a primer for health care executives. New York: Columbia University Press, 2001.
3. Norman, D.A. The design of everyday things. New York: Doubleday, 1988.
Section 1
What Happened?
To discover what actually happened and why, RCA teams must keep an open mind, suspend their
personal biases, and actively resist jumping to conclusions. Teams pursue objective, deep questioning
about how things are supposed to work (organizational rules) versus how things actually work
day-to-day (local norms and work-arounds). The difference teams find between what should be and what is
points them to what needs to be fixed.
Organizations with independent (silo) management and work processes for delivering exceptional
service on time to an ever-changing population are constantly at risk for missed opportunities and
accidents. This is true for any industry, including health care. James Reason has famously illustrated the
defenses (e.g., culture, individual competency, teamwork) and failures (e.g., short staffing, time
pressure, clumsy technology) in constant tension with each other in his Swiss Cheese model.⁴
A series of standard questions has been updated* to assist all RCA teams in figuring out what
happened and why in a uniform, purposeful way. The questions are designed to reveal vulnerabilities in
work processes and systems and prompt the team to review documents, conduct interviews, walk
around and observe, and safely simulate or reenact events as needed. Not all questions will be directly
relevant or applicable to every situation, but they are useful to consider for any event or close call. You
will notice some overlap in questions between some sections; it reflects their inter-relatedness in real
life. Of course, teams are welcome to develop additional questions of their own.
We begin with questions about what should happen ideally (Rules and Safeguards) and proceed
from there to the physical space and tools available to patients and providers (Environment, Equipment,
and Information Technology) and conclude with questions that look more closely at how we function as
humans in complex situations (Fatigue and Scheduling, Training, and Communication). You will find
everything from narrowly focused technical questions to more general cultural questions in this series.
*NCPS Triage Cards for Root Cause Analysis, 2001
4. Reason, J. The human contribution: unsafe acts, accidents and heroic recoveries. Burlington, VT: Ashgate Publishing Company, 2008.
2. Were these policies and procedures consistent with relevant federal and VHA policies, standards
and regulations?
3. Were these policies and procedures clear, understandable and easily available to all staff?
5. Did management have an audit or quality control process to inform them about how key processes
related to the event were functioning?
8. If a previous RCA or audit had been done for a similar event, were effective actions taken on a timely
basis and evaluated?
9. Was the care the patient required within the scope of the facility's mission, staffing and support
service resources?
Rules Training
10. Was staff involved in the adverse event or close call qualified and trained to perform their tasks?
11. Was staff oriented to the job, facility and relevant unit policies regarding: safety, security, hazardous
material management, emergency preparedness, life safety management, medical equipment, and
utilities management?
Patient-Centered Safeguards
1. Were patient risks and unintended consequences considered in designing or selecting the
safeguard? For example: while sally ports may reduce elopement from locked units, they may also
hinder rapid access to the unit in a medical emergency.
2. Was the safeguard intended to protect the patient (or staff, equipment, environment)?
5. (If the safeguard malfunctioned) would the event or close call have been prevented if the safeguard
had functioned correctly?
6. If the safeguard was a piece of equipment or device, was it routinely checked and referred for
maintenance and repair?
7. If relevant, did selection of the safeguard include evaluation of blueprints, design specifications,
installation and maintenance requirements, and consideration of its effects on other processes and
tasks?
8. Was the safeguard's performance tested before it was put into use (did it reliably do what it was
supposed to do)?
9. Before the safeguard was put into use, was a method for monitoring its use and effectiveness in
place (to identify the results of implementation)?
Environment
RCA teams really need to visit the area involved in an adverse event or close call and safely
simulate the steps leading up to the event (do not have a second accident!). Walking around in the area
and getting an actual sense of the physical layout and how things work in that space will provide the
team with a clear understanding of how things could have happened.
2. Were the tools and equipment staff needed easily accessible to them?
3. Was the level of automation and technology appropriate for the area?
Risk Assessment
4. Did the work area meet current codes, specifications and regulations?
5. Were environmental stressors an obstacle (e.g., distraction, distance, noise, lighting, odors,
vibration, temperature, construction, etc.)?
6. Had relevant patient emergency and facility disaster drills been conducted (e.g., cardiac arrest
response, MRI suite, fire and weather related evacuation)?
7. Had an environmental risk assessment of the area been done (e.g., Environment of Care Rounds,
Annual Workplace Evaluation, Mental Health Environment of Care Checklist, Patient Safety
Assessment Tool, etc.)?
9. Did new or temporary staff (students, floats) receive an orientation to the unit and emergency
procedures?
Equipment
There are many kinds of equipment, also known as devices, used in health care settings. The
following questions apply to all kinds of equipment used by clinical, administrative and support service
staff as well as patients, families and significant others (from telephones and fax machines to MRI
machines, from syringes and medication vials to programmable IV pumps, from slippers and beds to
patient lift devices, etc.). RCA teams need to examine equipment, request an expert demonstration of
how it works, and safely simulate using the equipment to better understand how things happened.
3. Did equipment work ideally in terms of: staff needs and expertise, workload (enough equipment on
hand as needed), and physical space?
6. Did the equipment detect malfunctions and make them obvious in a timely manner (e.g., alarms,
warning, color changes, etc.)?
7. Were equipment displays (screens) and controls (dials, switches, knobs) easy to understand and
use?
Equipment Maintenance
9. Was an NCPS Alert, Advisory or Recall associated with this equipment?
10. Was there a routine maintenance program in place to ensure equipment stayed in good working
condition (checking expiration dates and no use if expired, proper cleaning and sterilization, no
missing or damaged parts, etc.)?
11. If there was a maintenance program, did the most recent inspection indicate the equipment was
working properly?
12. If the previous inspection pointed to equipment problems, what corrective actions were taken and
were they effective?
Equipment Training
13. Was staff properly trained to operate the equipment before using it?
Equipment Policy
14. Did the equipment meet current codes, specifications and regulations?
15. If a vendor/manufacturer brought in equipment for use, was VHA policy followed (or was the
equipment obtained through a non-standard process)?
16. If equipment was re-used, was it supposed to be re-used (not a single use device)?
18. Were emergency supplies and back-up systems available on-demand in case of equipment failure?
Information Technology
Information Technology (IT) includes a wide variety of products that employees use in the delivery of
care (e.g., CPRS, VistA, VistA Imaging, BCMA, etc.). The questions in this section apply to any such
electronic products. RCA teams need a safe demonstration of the product involved in an adverse event
or close call to better understand how things happened.
Please request that IT support staff enter a remedy ticket for all IT malfunctions. In particular,
remedy tickets should note "potential patient safety issue" to ensure that the Informatics Patient Safety
Office is notified in a timely manner. (Consider interim measures that may be taken while awaiting a
national fix: communication, training, safeguards, etc.)
2. Once the screen or report information was accessed, was it difficult for staff to read or interpret it?
8. If there was a mistake in the IT system, where in the process was it detected? For example: an
overdose was undetected during order entry, and caught by a nurse before it was given.
9. If two or more IT systems were used, was the information displayed in the same way?
12. Did staff know what the software can and cannot do?
13. Did IT policies, procedures or training have an effect on the event or close call?
3. Was there a process in place to minimize the effects of fatigue (e.g., strategic naps, modification of
assignments, etc.)?
5. If relevant, was the patient actively engaged in unit level treatment programs and activities?
7. Was the mix of staff (clinical, administrative, support) adequate for the workload?
8. Was there sufficient staff on hand for the clinical care workload?
9. Was there sufficient staff on hand for the administrative workload (logistics, phone calls, patient
transport, etc.)?
Training
As adults, we learn in many different ways, and our preferences vary. Some people learn best by
listening to information, some by reading documents, some by writing down what they hear and see,
and some by watching a demonstration followed by their own return demonstration, etc. The modalities
available for teaching adult learners are equally varied: spoken instruction, discussion, coaching,
scholarly papers, posters, videos, computer-based training modules with embedded tests, low- or
high-fidelity simulation, etc. It is important to know how each learner learns best, and to pair that preference
with the most effective training modality available.
2. If relevant, were the patient and family/significant others trained in the use of medical devices and
treatments (e.g., oxygen, patient controlled analgesia, hand hygiene, etc.)?
4. Was training designed with the intent of helping staff perform tasks correctly using an HFE approach?
For example: practice scenarios, drills, simulation, etc.
5. Was the level of staff experience, training, scope of practice and competency consistent with their
tasks?
Training Content
7. Was staff trained in the use of safeguards and back-up/contingency plans in case their task did not
proceed as planned?
8. Was staff trained in how to report adverse events and close calls, calling for help, and hand-off
communications?
Training Effectiveness
9. Was relevant training provided before the task started?
11. Did the results of training show that it was effective and successful?
Communication
Communication With Patients
1. Was the patient identified correctly?
2. Were the patient and their family/significant others actively involved in assessment and treatment
decisions?
7. Was communication between front-line staff within the patient care area adequate?
8. Was communication across patient care areas adequate (e.g., transfers, consults)?
9. If the patient had a life threatening condition, was it communicated effectively with urgency?
Communication by Documentation
10. Were patient assessments from relevant disciplines available and used by the treatment team in a
timely manner?
11. Did the patient's record give a clear picture of his/her work-up, treatment plan, and response to
treatment?
12. Were relevant policies and procedures available and communicated adequately to the people who
needed them?
13. Was there an existing recall/alert/bulletin related to the event or close call? Did relevant staff know
about it?
14. Did information technology (CPRS, BCMA, VistA, etc.) adequately support communication about the
patient and their plan of care?
16. Were standardized communication tools and techniques used within and across disciplines (e.g.,
checklists, read-back, hand-off templates, briefings, debriefings, time-out procedures, etc.)?
17. Were there any obstacles to relaying or talking about safety risks?
18. Did relevant work areas welcome staff to speak up about risks and offer suggestions for reducing or
eliminating risk?
19. Does the organization promote a just culture by openly encouraging and rewarding staff when they
report actual or potential mistakes?
Section 2
Why Did It Happen?
Root Causes and Contributing Factors
Once the RCA team fully understands what happened and why, they come to a critical step: clearly
describing how the root cause(s) or contributing factor(s) led to the event or close call. We call these
descriptions root cause/contributing factor statements.
David Marx's research and findings describing why things go wrong during airplane maintenance are
our inspiration for health care root cause/contributing factor statements.⁵ With the help of colleagues in
the aviation community, David found that accurate, precise and unemotional HFE-based descriptions of
what happened led to deeper, more thorough understanding. This is important, because the better we
understand why things happened, the better we can develop and implement remedies to fix or minimize
future problems. David developed seven rules of causation, and we have adopted five of those rules.
A scenario followed by an example of the correct and incorrect use of each of the five rules of
causation follows. You will see how these rules enable us to pinpoint the things that need to be fixed.
There is a 1-2-3 rhythm to these statements: 1 - something (a system breakdown or gap), 2 - led
to/caused/increased the likelihood of, 3- something (close call or adverse event). You can see that these
statements point us to fixing system breakdowns or gaps in order to prevent or minimize the close call or
adverse event in the future.
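As a rough illustration, the 1-2-3 rhythm can be sketched as a small data structure. This is only a sketch in Python: the class name `CausalStatement`, its field names, and the `render` method are hypothetical and are not part of the NCPS tools. The three linking phrases are the ones named in the paragraph above, and the sample wording is taken from the lighting example used for Rule 3.

```python
from dataclasses import dataclass

# Linking phrases named in the text above (part 2 of the rhythm).
ALLOWED_LINKS = ("led to", "caused", "increased the likelihood of")


@dataclass
class CausalStatement:
    """Hypothetical container for a root cause/contributing factor statement."""

    system_gap: str  # 1 - a system breakdown or gap
    link: str        # 2 - led to / caused / increased the likelihood of
    outcome: str     # 3 - the close call or adverse event

    def render(self) -> str:
        # Reject statements whose linking phrase is not one of the three above.
        if self.link not in ALLOWED_LINKS:
            raise ValueError(f"link must be one of {ALLOWED_LINKS}")
        return f"{self.system_gap} {self.link} {self.outcome}."


statement = CausalStatement(
    system_gap="Low nighttime bedside lighting",
    link="increased the likelihood of",
    outcome="mislabeled lab specimens",
)
print(statement.render())
```

Keeping the three parts in separate fields makes it harder to write a statement that names only the outcome and leaves out the system breakdown or gap behind it.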
Rule 2: Negative or derogatory words (e.g., poorly, careless, wrong) do not belong in
causal statements.
Scenario:
Do we always read the instructions/user manuals that come with our home appliances? Or, do we
often assume we'll know how to use them since we're smart people and have lots of hands-on
experience with similar appliances, and besides the instructions are just too hard to tackle? Our sense
of instructions being "too hard" or not worth our time is related to our limited attention span and the
constant burden on our active memory. If we add production pressure (many tasks and limited time) to
the sense of "too hard," it's no wonder we skip instructions sometimes!
In this scenario, an experienced staff member decided not to read the instructions for insertion and
use of a new nasogastric tube after glancing at them; they were too hard to read and understand.
Correct: Instructions for insertion and use of a new nasogastric tube were in 8-point font and did
not include photos or step-by-step diagrams, so staff did not review the instructions, and the tube was
misplaced.
VA National Center for Patient Safety RCA Tools REV.02.26.2015
Incorrect: The nasogastric tube instructions were poorly written. (This negative statement doesn't
tell us what could be fixed to make the instructions more useful.)
Rule 3: Each human error or mistake must have a preceding system level cause.
Scenario:
If you can make a mistake today, I can make the same one tomorrow unless we figure out what the
system level vulnerability is and fix it (e.g., inadequate lighting for a precise task). In this scenario
phlebotomists are required to draw blood samples at night. And, lights are kept low at night to help
patients sleep after their exhausting day. Enabling patients to sleep at night is highly valued and
emphasized. One night, under low light conditions, a phlebotomist unintentionally misidentifies a patient
and the samples are mislabeled at the bedside.
Correct: Low nighttime bedside lighting substantially reduced visibility, increasing the likelihood of
mislabeled lab specimens.
Incorrect: The phlebotomist mislabeled the specimen. (We don't need to fix well-intended staff; we
need to fix the lighting or light sources available to them.)
Rule 4: Violations of policy/procedures are not root causes; they must have a preceding
system level cause.
These statements are easiest to write if your team has identified informal norms, work-arounds, or
short cuts that differ from policy requirements. Work-arounds are a cue or clue to us that something isn't
working ideally. It is also useful to discover perceived positive (addition of something valued) or negative
(loss of something valued) incentives that led to the situation in order to leverage them.
Scenario:
Sometimes patients knowingly or unknowingly bring in contraband (e.g., weapons, illicit drugs, etc.)
with their belongings. Contraband checks may occur at the time of admission to mental health units in
order to discover and safely remove such items, reducing risk to the individual patient and the entire unit
(a positive incentive). These checks may be time consuming for staff and decrease the comfort level of
newly admitted patients (two negative incentives). In this scenario, the evening shift receives three new
admissions in a short period of time. In the interest of addressing higher priority issues, evening staff
decide to defer contraband checks, and contraband enters the unit.
Correct: Time-consuming contraband checks were deferred for three newly admitted patients on
the evening shift, therefore prohibited items entered the unit.
Incorrect: The evening shift did not do contraband checks. (Blaming the evening shift for not doing
the checks won't fix anything. But, in the correct statement, we are pointed to figuring out a less
time-consuming process for use on off-tours.)
Rule 5: Failure to act is only causal when there was a pre-existing duty to act.
Task or role expectations must be clearly communicated and understood.
Scenario:
Patients scheduled for surgery and anesthesia are often required to be NPO (no food or liquids by
mouth) for a number of hours before their operation. It is very important that staff and patients/family
know about the surgical schedule and NPO restrictions beforehand.
Correct: The night shift was not notified of an update to the surgical schedule and NPO restrictions,
so the patient received breakfast, resulting in a delay in surgery.
Incorrect: The night shift violated NPO status.
5. Marx, D. Maintenance Error Causation. Prepared under contract for FAA Headquarters Office of Aviation Medicine (AAM240) Washington DC: June 9, 1999.
Section 3
What Action Can We Take to Prevent It From Happening Again?
Developing and Choosing Actions
Purpose of Actions
Now that you have figured out WHY the adverse event occurred it is time to examine/answer the
question:
Where will it be done? In the patient's room (bedding, bathroom, closets, etc.).
Why are they doing this? Prevent patient harm resulting from contraband.
How is it going to be accomplished? Staff will wear gloves and manually search the bed and
4) Identify and consult the person responsible (process owner) for completing the action
a) Have them read and evaluate feasibility of the action. Ask them about what has been tried in the
past and whether it made things better or worse. Ask them for their best suggestions/ideas for
making care safer. Finally, ask them how they would know if things got safer (this piece of
information will help you design outcome measures).
5) Pilot test or simulate prior to system-wide implementation
a) Can these tools/actions be accurately used in the real world setting:
i) Will people have time to do this?
vii) Will the patient, staff or process be penalized by the action in any way?
viii) Will people be provided with feedback about whether or not the action was done correctly?
Action Strength
Action strengths are based on the principles of human factors where the most effective actions
accommodate or control for the limitations of human behavior and how people interact with systems,
tools, tasks and the environment through the use of design and standardization.
Stronger actions:
o The best at removing the dependence on the human to get it right (they are physical and
permanent, rather than procedural and temporary).
o Questions to ask in evaluating if the action is stronger in preventing the event/cause:
Does it involve standardized forcing functions to remove human error and variation
through technology and/or design?
Intermediate actions:
o Reduce the reliance on the human to get it right, but do not fully control for human error.
Example of a Weaker Action statement (a policy): All computer users must maintain
strong, unique passwords for system accounts.
Action Hierarchy
The following table breaks down some actions by strength category. For more information on other
action categories please reference the Primary Analysis Categorization (PAC) Glossary Keyword
Categories and Rules for Applying Them.
ACTION                  PAC GLOSSARY
Stronger Actions
Intermediate Actions    Redundancy/back-up systems
                        Increase in staffing/decrease in workload
                        Software enhancements/modifications
                        Eliminate/reduce distractions
                        Checklist/cognitive aid
                        Eliminate look- and sound-alikes
                        Enhanced documentation/communication
Weaker Actions          Double checks
                        Warnings and labels
                        New procedure/memorandum/policy
                        Training
                        Additional study/analysis
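
One way a team might use the hierarchy is as a simple lookup when reviewing a draft action plan. The sketch below is illustrative only: the names `ACTION_STRENGTH`, `strength_of` and `summarize` are hypothetical, and the sample plan is invented. The category entries are copied from the table above, which lists no entries under Stronger Actions, so none are assumed here.

```python
from collections import Counter

# Entries copied from the Action Hierarchy table above. No Stronger Actions
# entries appear in that table, so none are included in this lookup.
ACTION_STRENGTH = {
    "Redundancy/back-up systems": "Intermediate",
    "Increase in staffing/decrease in workload": "Intermediate",
    "Software enhancements/modifications": "Intermediate",
    "Eliminate/reduce distractions": "Intermediate",
    "Checklist/cognitive aid": "Intermediate",
    "Eliminate look- and sound-alikes": "Intermediate",
    "Enhanced documentation/communication": "Intermediate",
    "Double checks": "Weaker",
    "Warnings and labels": "Weaker",
    "New procedure/memorandum/policy": "Weaker",
    "Training": "Weaker",
    "Additional study/analysis": "Weaker",
}


def strength_of(action: str) -> str:
    """Return the strength category for an action, or 'Not listed'."""
    return ACTION_STRENGTH.get(action, "Not listed")


def summarize(actions):
    """Count how many proposed actions fall into each strength category."""
    return Counter(strength_of(action) for action in actions)


# Hypothetical draft plan, used only to show the lookup.
plan = ["Training", "Checklist/cognitive aid", "Warnings and labels"]
print(summarize(plan))
```

A plan whose count is dominated by weaker actions is a cue to look for an intermediate or stronger alternative before finalizing it.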
Section 4
How Will We Know If the Action We Took Made a Difference?
Selecting Outcome Measures and Sustaining Improvement
Purpose of Outcome Measures
Measure the actions to see if they are making patient care safer by:
Eliminating/reducing the occurrence of the adverse event or root cause(s).
Measure what you expect to improve through the implementation of your actions.
completed
b) Implementation Measure: Does not measure effectiveness of the action, only completion of the
action.
Example: 95 percent of staff on the unit will have completed the training by June 2013. (This
outcome measure just tells us that staff completed the training; we don't know if the training made
care safer or not.)
2) Non-Quantifiable
a) No threshold, numerator and denominator: no counts/number/percentage of expected results
Example: A quarterly random sample of acute psychiatry admissions will be reviewed for notification
of mental health outpatient providers upon admission and prior to discharge.
3) Non-Outcome
a) Restatement of the original action, or mention of a new action
Example: Staff will assess patients on their potential for aggressive behavior upon admission. (A
new action offered in addition to the original action of notifying mental health outpatient providers
about an acute inpatient admission.)
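
By contrast, a quantifiable measure carries a numerator, a denominator and a threshold. The sketch below shows one possible way to record and check such a measure. It is an assumption-laden illustration: the class `OutcomeMeasure`, its fields, and the sample counts (27 of 30 admissions, 95 percent threshold) are all hypothetical and do not come from NCPS guidance.

```python
from dataclasses import dataclass


@dataclass
class OutcomeMeasure:
    """Hypothetical record of a quantifiable outcome measure."""

    description: str
    numerator: int     # cases that met the expected result
    denominator: int   # cases reviewed
    threshold: float   # required percentage, e.g. 95.0

    def rate(self) -> float:
        # A measure with nothing reviewed cannot be evaluated.
        if self.denominator <= 0:
            raise ValueError("denominator must be positive")
        return 100.0 * self.numerator / self.denominator

    def met(self) -> bool:
        # The measure is met when the observed rate reaches the threshold.
        return self.rate() >= self.threshold


# Invented counts, used only to show the calculation.
measure = OutcomeMeasure(
    description="Admissions with outpatient provider notified",
    numerator=27,
    denominator=30,
    threshold=95.0,
)
print(measure.rate(), measure.met())
```

Writing the measure this way makes the missing pieces of a non-quantifiable measure obvious: without a numerator, denominator and threshold there is nothing to compute.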
Conclusion
We hope that this information on the Root Cause Analysis process has been interesting and useful
to you and your team. By applying Human Factors Engineering and High-Reliability Organization
concepts we can design safer care. By keeping an open mind and asking thoughtful questions we can
figure out why adverse events or close calls happen and how to prevent them from happening again.
Thank you and best wishes from your friends and colleagues at NCPS.