1
A Story of a Failure
“Listen to the people doing the work…That’s how managers learn…”
Management’s Summary: A worker drilled into the top of a 5 gallon Liquid
Propane tank with a ¾ inch drill causing a potentially fatal near miss of an
explosion in a pressurized gas storage area.
Management’s Action: Discipline the employee in question; send all shop
personnel to additional training; rewrite shop procedures to include a
section prohibiting drilling into pressurized 5 gallon Propane tanks with
hand tools.
Freon comes in many different sized pressurized containers for many different
reasons. Once used, any sized empty tank becomes a problem to remove from
the facility (even before used, Freon brings with it its own set of environmental
issues). The problem is that while the tanks are still under pressure, these tanks
cannot easily, legally, or environmentally be removed. In fact, what is normally
done is that the cans are punctured, flattened, and recycled.
For this story, the Freon tanks in question are 5 gallon containers. These
Freon tanks are used all over the facility and, in a stroke of organizational and
environmental brilliance, a central collection point was created in order to
provide a place to gather all the empty Freon tanks when they had been used.
This central empty Freon tank dumping spot was basically a square pen made
of chain-link fencing with a gate. This area was specifically designated as a
place for the facility to collect the empty Freon tanks, depressurize and flatten
these tanks, and prepare them to be recycled.
This Freon tank collection system worked fairly well. No one person at
the company owned the Freon process. The collection area and the process
of recycling the Freon tanks was the “brainchild” of a team of workers who
had been tasked with creating a way to recycle these tanks, minimize waste,
and reduce the costs associated with disposing of the tanks. The maintenance
organization kind of inherited this tank disposal program. It was not assigned
to the pressurized gas department, nor was it directly assigned to the waste
minimization teams. This program was given to the folks who managed the
maintenance shops.
Conklin, Todd, Dr. Pre-Accident Investigations : An Introduction to Organizational Safety, Ashgate Publishing Ltd,
2012. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/portsmouth-ebooks/detail.action?docID=976591.
When the cage filled up with many empty tanks, someone was selected
to go out to the storage area, sort through the tanks, depressurize and flatten
the tanks, and pack them up on a pallet to go to the metal recycler. No one
worker owned this job. This job was seen as a straightforward process and was
usually assigned to new or less experienced workers. It was a straightforward
task, requiring little experience, easy to do, and was seen as more of a nuisance
than actual work.
For this event, a new worker who was not getting along well with coworkers
within the shop was selected to clean out the Freon cage. He was taken to the
work area and shown how to pop a small plastic seal on each Freon tank to be
recycled, in order to depressurize and begin the process of flattening. He was
shown the work activity by a supervisor, observed doing the work twice on his
own, and left to finish the project alone in the work area. The worker seemed
to understand the task, to be able to accomplish it with few problems, and
to be capable of continuing with it to completion.
The disposal of Freon tanks is an easy, unimportant, low risk task. This
task was considered so unimportant that there was no process owner, no
formal procedures, no identified risk assessment activity, and, therefore, no
risk reduction activities created to do this work. Why would a worker need any
of these things? Recycling Freon tanks is not a high risk activity. Well, that was
the case at least until the process failed.
And fail it did. The worker drilled a hole in the side of a 5 gallon, pressurized
Propane tank instead of a Freon tank. Propane is explosive. Freon is not
explosive. Luckily, in this case the tank did not explode. In this case, the
organization got lucky. This is a case of an event that did not happen, and
in turn was a gift to the organization since they could learn from it without
having a terrible failure.

However, the decision was made by the post-event investigation that this
worker should have known the difference between the two types of tank. If
the worker had recognized the difference between these two tanks, the worker
would not have tried drilling a hole in the tank of explosive gas, placing his life
at risk, and, as importantly, the future of the facility at risk. Nothing stops work
at a company faster than blowing up a worker.
The Second Story of the Same Failure
At this company’s facility there was no disposal path for small, 5 gallon Propane
tanks. You know these tanks pretty well. Every Propane grill in almost every
back yard in the country has a small, white, 5 gallon Propane tank underneath.
Many work groups at this facility owned Propane cookers. These grills were
used for cookouts and picnics that were held in the same facility: worker
celebrations, parties, and picnics. The Propane tanks on the cookers last quite
a while.
Nothing is exceedingly unusual about Propane tanks and grills being used
at a facility. The problem at this company started when the tanks became
empty. In this company, it was immensely complicated to replace a Propane
tank, or refill a Propane tank. A task that would take minutes to do if you were
at your home was almost bureaucratically impossible to do at this facility. This
created the problem of having empty Propane tanks at the facility, and no real
way to refill these tanks. In fact, at this facility it was much easier to buy a new
tank than to refill an old tank. That left empty Propane tanks with no disposal
path available.
Unfortunately, the facility had no way to dispose of all of these tanks. There
was no Propane tank recycling program. The pressure shop did not handle
Propane tanks at any time for any reason. If you had an empty Propane tank
in your work area, it was a problem. In fact, the pressure shop foreman said in
an interview that an empty Propane tank at this facility was a problem for life.
There was no process to dispose of these empty tanks.
Which led to an interesting part of this story of failure. It seems that because
empty Freon tanks had a special little fenced-in area, an occasional Propane
tank would not be seen as out of place there. After all, in the mind of a worker,
a place to recycle a pressurized gas tank is a place to recycle a pressurized gas
tank. In workers’ minds all tanks are tanks when they have to find a place to
deposit an empty one. So, after hours, when nobody was watching, employees
at this facility were dropping off empty Propane tanks in with the Freon tanks.
These tanks don’t look alike, but they don’t look that different either. The
problem is that the tanks are entirely different. Freon tanks can be smashed
and recycled. Propane tanks are almost impossible to smash and depressurize.
They are actually designed to be refilled, not recycled.
To add to the complexity of this case, pushing in on a pressure-release seal
button on the tank easily depressurizes a Freon tank. Propane tanks are not
easily depressurized: even if you open the valve at the top of the tank all the
way, it will not depressurize.
If your job is to clean all the white-painted tanks out of the storage area and
depressurize, flatten, and recycle all of these tanks, you might find yourself in
a position where in order to accomplish the boss’s task you must adapt to any
and all tanks that appear in the empty Freon storage area. More amazingly,
this worker actually drilled a hole with a ¾ inch drill through the side of
a Propane tank. It is hard to drill a hole in a pressure vessel
like a 5 gallon Propane tank. It takes time and discipline. It is not an easy task
to accomplish.
The Story of the Story
Suddenly, an event that was thought to have lots to do with a lack of
experience, skill, and intellect on the part of this worker (how can that guy be
so stupid? Geez, nobody drills a hole in a Propane tank) was now a story about
the procedure, ability, and policy for the disposal of 5 gallon Propane tanks at
this company.
It is easy to say that the worker made an error in judgment in trying to
recycle the wrong kind of tank, and, in fact, the worker did make an error in
judgment of the situation; but the problem was not the worker’s ability to
know the difference between a Propane tank and a Freon tank—the problem
is much more systemic, and more compelling.
The problem was a unique mixture of some pretty normal conditions at
this facility. We can list many of them with little effort: a dual location of many
small tanks in the same area, the inability to refill tanks at this facility, the
inability to get rid of old and empty tanks, a new employee, the absence of
100% supervision, production pressures, performance management, HR—and
the list can go on and on and on.
It is at this point that the facility will feel a need to make a decision. The
facility will probably not realize that the decision it is making is not the one
it thinks it is making. The decision is not about how to handle the worker. The
decision is far more about how the facility chooses to see this failure.
There are Two Choices

Fix the worker (training, discipline, or termination) who did something he did
not mean to do, in the hope that he won’t again do something he did not
want to do in the first place.
Fix the system that determines what should happen with the empty
Propane tanks. A system that was clearly not designed to imagine the presence
of Propane tanks where Freon tanks were to be stored.
Fixing the worker gives the impression of an immediate solution to the
problem, but probably fixes the wrong thing. Punishing the worker is a
fast and easy way to “solve” the problem; the only issue is that it
unquestionably fixes nothing at all, not even the worker in question, because
the whole failure will inevitably happen again with a different worker. It may
take some time to drift to this point again, but over time it will happen.
Fixing the system that determines where empty tanks are stored and
disposed of looks at a much larger problem that will take longer to fix and
will most assuredly cost more money. However, it will fix
the right problem, and will ensure that the facility will never be in a position
again to accidentally drill a hole through a Propane tank.
The second answer will be better for the worker, better for the organization,
and better for the world, and will ensure a safer, more reliable path forward
for this organization.
Sadly, it is not the path that most facilities normally choose to follow.
The goal for this book is to provide a usable road map for understanding a
different way to manage safety in your workplace. In a way, this book should be
more like a road map than a textbook. This book should answer one question:
“All these ideas are great, but what should I do differently in my organization
to make this change? How can I find, and solve these problems, before they
become accidents?”
It is essential to note early in this book that I am standing on the shoulders
of giants in this field. So much effort and thought have gone ahead of all of
us in just understanding the theoretical underpinnings and creating a “new
view” for the world of Human Performance and reliability. Many, many times
I will direct you through quotes, cartoons, models, and ideas that are directly
attributed to these fathers and mothers of the Human Performance idea. We
won’t spend much time on theory (although it is my hope that this book will
make you want to read everything ever written about the topic); that is not
the goal.
This book will give you practical guides, tips and tools, and suggestions for
managing risks, hazards, and safety.
Practical means that the ideas and techniques you read in this book can
be used in your organization almost the day you read them. It is the goal of
this book to give simple and fast options you can deploy on the “plant floor”
immediately. If you try some of these ideas, you will see changes immediately—
seriously, immediately.
This is a substantial claim, a claim that in most cases I would be afraid to
make to you. Not here, trust me.
The Three Parts of Every Failure

Figure 1.1 The Three Parts of Every Failure (panels: The Context; The Consequence; The Retrospective Understanding; numbered markers 1 to 4)
What is the Difference between the Word “Event” and the Word “Failure”?
In our business we talk a lot about failures. Not only do we talk a lot about
failures, we talk even more about events. In many ways, these words are used
to mean the same thing—they are semi-interchangeable. In many other ways,
these two words are worlds apart. We will need to give these two terms an
understood meaning for our discussion, but first let’s define what we mean
by the word “performance,” as in the title “Human Performance.” Because to
understand what is happening, a failure or an event, we must first understand
how the idea of performance expectation sets the goal. Operational outcomes
affect the meanings and context of a word. Performance, by its very nature,
is a word that is defined by its outcome.
What is Performance?
Performance is the degree to which you get what you expect from a
person, a machine, or a process.
Anything other than what you expect is some type of deviation from what you
wanted. Sometimes systems and people over-perform (although that seems
rare, sadly), and sometimes our systems and people underperform (that one
you probably know intimately). Either way, performance is defined by the
degree to which an action or process meets a desired outcome.
In reality, I don’t like the terms “event,” “accident,” or even “failure” all
that much. The best way to think about these “distinct moments in time or
a process” that have either attracted or should have attracted organizational
attention is as a moment where performance expectations were not met.
Instead of using “event” or “failure” as words to describe these times, I actually
prefer to use the term “deviation from an expected outcome.”
Any performance expectation that is not met (over-performance or
underperformance) is a deviation from an expected outcome. An accident, an
event, even a near miss, is certainly a deviation from an expected outcome. A
process improvement, a good idea, a success against all odds is also a deviation
from an expected outcome. One is a bad thing, the other is a good thing, but
they both are something different from what was expected.
Deviation from Expected Outcome (DFEO)
The only problem with using the term “deviation from expected outcome” is
that it gets really old, really fast. I tried to make an acronym out of this phrase
(DFEO); I never could get that to stick. DFEO just never caught on. I am not
surprised it did not stick. It is awkward to say, and difficult to remember. It also
doesn’t sound very practical. Saying “deviation from an expected outcome”
is something that a college professor would say, not a safety professional or
even a manager.
So what is the difference between the term “event,” and the term “failure”?
The quick answer is that one of these terms has a big and bad outcome. The
other has a small and simple outcome. But honestly the answer is that there is
no difference at all in the words themselves. The difference in the meanings of
these terms is in the person using the terms, and in the people that are listening
to the terms. There is no difference in the terms, themselves—they are both
deviations from expected outcomes. Both of these terms are indicators of a
difference in expected performance.
A New Definition of Safety
For all too long now our organizations have tried to define successful safety
programs by counting the number of times our workers screwed up or
got injured. In actuality, what we count is the absence of accidents. Our
safety incentives are measured and built around a zero accident goal. The
institutional safety numbers we report are usually numbers of injuries or days
away from the workplace. These numbers are usually reported “up and out”
of the organization, to customers, regulators, and the world.
You don’t have to be a genius to know that something seems oddly wrong
about the way we measure safety success. We count the number of people we
hurt, and totally discount all the people we are keeping safe. The problem is,
and always has been, you can’t count what doesn’t happen. It is hard to count
the millions of decisions that are made every day in the field that don’t lead to
some type of failure. Those millions of operational decisions were all safety and
performance reliability successes.
This new definition of safety or performance reliability demands not only
a different way to think about events, accidents, and failures, but also, more
importantly, a new way to manage your organization. You must change the
way you manage safety in order to align your organization and operations
to a new definition of what “safe” is, and why it matters. Assuming your
management knows how to lead and manage this new method is a gigantic
mistake, and will lead to horrible outcomes. These are new ideas and new
ways of doing business. How would your managers know this information if
you did not inform them of this new way of doing business?
Management now wants the safest workplace that is humanly possible.
Obviously, no manager wants workers to get hurt. Knowing that management’s
intent is true, good, and hopeful, allows us to search for reasons for not
changing other than motivation.
Those reasons almost always come back to one single factor. Most
managers want to manage safety and performance using the best and latest
ideas for success. The problem is that we, as a new-view safety professional
community, have not told them about those ideas, and how to use this new
view. Take the way we define safety—this is a perfect example of the old world
view dominating our new world organizations.
Safety Defined
Safety is not the absence of events; safety is the presence of defenses.
Safety can’t be the absence of events. Safety is the presence of defenses within
the system, environment, and processes. If you constantly choose to use the
traditional standards in order to measure safety success, you will constantly
create workplaces where information about events and injuries is hidden from
management. Worse yet, we are measuring our failures.
The problem is that it is hard to measure what doesn’t happen. For every
accident that occurs at a workplace, how many accidents didn’t happen? How
many people didn’t get hurt? How many potential problems were solved by workers
who identified the error precursors before they had a chance to become
full-fledged events? These numbers are seemingly impossible to count. That could be
why these numbers are not valued by your organization.
In reality, safety is probably best defined by this idea:
Safety is the ability to perform work in a varying and unpredictable
workplace environment.
This definition, although probably the most accurate, is the most difficult to
use operationally and discuss with the workforce. Yet, we do this every day.
Every day we live in an unpredictable and constantly varying world.
Human Error
It is distressing enough that the world is varying and unpredictable. Human
beings are also prone to their own unpredictability. Humans make errors.
People are fallible, and even the best of us make mistakes.
Error is an unexpected deviation from an expected outcome. Error is an
unintentional event. Error is doing something that you didn’t intend to do.
Errors happen all the time. Everybody makes errors, everybody. The very
worst performers make errors. The very best workers make errors. Error is a
predictable and natural part of being a human being.
People make a lot of errors. Not all errors have a consequence. In fact, not
all errors are actually errors. We only really notice an error if it has some type of
outcome or consequence that is large enough to be noticed by either you or
other people around you. Error only becomes apparent if you notice an error.
Contemporary wisdom says that the average skilled worker, someone who
works with their hands, makes 5 to 7 errors per hour. That same wisdom says
that a knowledge worker, someone who works with ideas and concepts rather than
making things, makes between 15 and 20 errors per hour. True or untrue,
we all make errors. Errors are how we are wired, how we are made, a natural
part of being human. Human error is inevitable—all workers are error-making
machines. What all this means is pretty simple: error is everywhere, and there
is nothing you can do to avoid the errors. You can’t punish error away. You
can’t reward error away. Error is an unintentional, unpredictable event. You
know it, and I know it. Now we have to make sure that our organizational
managers remember that fact.
But it is not that easy. Error is always attributed in retrospect to the
worker by the organization after some type of consequence happens to the
organization. In other words, the worker making an error is determined after
an event happens, not before, and is usually seen as a moment in time where
the worker did something wrong. If the worker had done something other
than what the worker did, the event would not have happened.
This is always going to be true, but wrong, of every failure that will ever
happen in your organization. The notion that the worker chose to make this
error is also always true, but wrong. Choosing to make an error only becomes
an actual choice after the event has happened. In fact, error can never be
a choice. Error can never be a violation. Error is simply the unintentional
deviation from an expected behavior.

It is easy to find errors in retrospect. It is even easier to judge these errors
as wrong in retrospect. This process is called a fundamental attribution error.
Think of it like this:
A worker is walking across the office parking lot to go in to his office.
During this walk, the worker steps on a rock and sprains his ankle. The
injury is bad enough that the worker has to go to the clinic and have his
leg treated. The worker ends up with a series of x-rays, a cast on his foot,
and a day or two off from work.
This is an example of an error; however, this error will be attributed to the
worker’s judgment and walking ability. At some point, some manager will
comment that if this worker had “watched where he was going” this event
would not have happened. If this worker had cared more, the worker would
have been more attentive. If the worker had been more attentive, the worker
would have stepped over the rock, and completely avoided this injury. This
injury is clearly the worker’s fault.
The organization is attributing the error to the worker’s judgment and
behavioral choices. In a way, what the organization will do is assume that
a bad outcome must happen to a bad person. You will hear some manager
say something like this, “if only…if only the worker would have paid more
attention…”
Read on and see how remarkably unfair that way of thinking is to your
workforce.
The next day, the manager of the “twisted ankle worker” who was
injured the day before was walking across the same parking lot. The
manager had a lot on her mind. She had a reportable injury. She had
a worker who had to go to the hospital and get medical attention. She
had an employee that had gotten hurt under her watchful eye. While this
manager was thinking of all these things she stepped on a rock, the very
same rock, in the parking lot, and twisted her ankle. She was in pain and
could barely walk…but she didn’t report the injury.
Instead, this manager found out whose job it was in her organization to
sweep the parking lots and sidewalks. The manager immediately called
the roads and grounds crew supervisor, and had his crew sweep the
parking lot. Because this manager was so busy thinking about the event
that had happened to her worker, she was concentrating on her safety
problem and not on where she was walking.
What is intriguing about this concept is the idea that when the worker
walked incorrectly it was the worker’s fault. When the manager walked badly,
her response was to not only fix the problem, but also to move the fault from
her as the manager to the person whose job it was to keep rocks off the parking
lot. That is the fundamental attribution error, and it happens all the time with
safety events. Every critique or after action meeting you will ever attend will
have a moment when some manager will claim that had he been the worker
he would not have made the same errors that were made by the worker in
question.
Human error is a weird thing. Error is always present, but not always
noticeable or noteworthy. Error is usually attributed after the fact—a way
to explain rationally what happened and why. Error is hard to explain and
hard to predict—these are accidental actions done without knowledge of the
consequences. Error is never intentional—there is no such thing as an honest
mistake because there is no such thing as a dishonest mistake—they are all
errors, unintentional deviations from expected behaviors.
If all this is not complicated enough, the real question is if an error is actually
ever an error. Error is essential to our discussion, as you will see as you read on
in these chapters. Error, however, is not everything—I would caution you not
to fall prey to the idea that if you just could stop error you would stop failure.
You can never stop error, because errors we identify become visible only after
the failure.
Is Zero Accidents the Right Goal?
The quick answer is you don’t want to work in an organization that has any
other goal than zero. Zero is probably the best individual goal that any worker
could have. What is more compelling is the fact that most workers actually meet
that goal. Most workers will work their entire careers and not have a serious
accident or event. In reality, zero accidents becomes an outcome, the
right outcome, but having zero accidents does not make a safety program. It
is, at best, the target at which your program is aiming. Asking workers to have
zero accidents does not tell workers how to have zero accidents.
The problem is that “zero” is not a particularly realistic goal for the entire
organization. In short, the problem is that “zero” is a standard of perfection.
Your workers and managers are not perfect, and will make mistakes. They
know this fact, and you know it. You will have failures. You will not be able to
hold events to a “zero standard” over the lifetime of the organization. You will
have failures.
In many ways, that’s why your organization has learned to count the
number of failures, and track that number—with the goal of constantly driving
that number down to zero. It is a noble and noteworthy goal, zero. The
only problem with having zero for a goal is that you (and every other safety
professional in the world) are, at all times, only one ankle sprain away from
being the worst safety manager.
There is another problem, and that is the problem with using “perfection”
as our goal. Perfection leaves no room for the inevitable human error that will
happen. And we know it—errors will take place. We just need to learn where
and when they could take place. When your organization uses only perfection
as its standard, anything other than perfection is just fundamentally wrong.
Let’s not belabor this point too much; you live this every day of your life, but
suffice it to say that if a victory in your business is “no accident,” we will never
actually know how you won the contest.
Humans and organizations learn by trial and error. We learn by becoming
smarter after we fail. In fact, when we fail, we most often go right back out
to the worksite and try again—only this time we try to fail smarter. When we
devalue failure, in many ways we are inadvertently devaluing our ability to
learn.
We Can Learn a Lot from the Auto Industry
Almost every day in some workshop someplace, I ask this question: “How
many people were killed in fatal highway accidents last year?” It is a depressing
question to ask a group of workers. Every highway death is a tragedy. But this
is a compelling question.
Why is this compelling? Partially because in the United States the fatal
highway numbers have stayed right around 38,000 people a year for almost
50 years. In 2000, our numbers were around 38,000. In 1990, the numbers
were around 38,000. In 1960, the numbers were also nearly 38,000. We have
leveled out in the number of people killed on our highways.
Yet, the number of cars on the road has dramatically increased. The
number of drivers has increased. Also, the number of millions of miles driven
by American drivers increases every year. Still our highway fatality numbers
stay essentially the same.
Why?
Every day, workers tell me the answer is that cars have changed. Cars are safer
today than they ever have been in automobile history. Today, roads are safer,
better designed, better engineered, better marked, better patrolled than in
the past. Everything in our highway and vehicle system is better engineered,
smarter, and safer. In fact, the auto industry has done everything but change
the driver.
The statement “change everything but the driver…” is important. In many
ways on our job sites, we have taken the opposite approach. We have tried to
get to safety performance by “leaving everything the same except fixing the
worker.” It is about time to admit to ourselves that we have been managing
safety wrong. We were simply doing safety management backwards.
We can learn much from the idea that our organizations respond in the
opposite way. We have almost always “changed nothing but the worker.” We
rarely fix the system around the worker. We almost always try to “fix” the
worker—as our sole corrective action.
Why is it that safety programs in our organizations are so interested in fixing the
worker and not truly engaged in creating survivable space in our work systems
and processes? I fear that the answer may be either we are lazy, or we are
cheap. All defenses in vehicle transportation that can be engineered to create
the least number of failure consequences possible—or the most survivable
space possible—are being created every day with one goal: “We cannot fix
drivers, drivers will never be perfect, drivers will screw up. We must design and
create systems around the drivers and passengers that keep them safer.”
Drivers, according to the daily discussions I have with workers and managers
in classes, and workshops, are getting worse. There are many things that make
drivers less attentive, less effective, less safe in their operation of a vehicle. Cell
phones, texting, GPS units, fancy stereos, iPods—the list can go on and on
of things that steal attention away from the highway. In fact, the automotive
industry assumes that drivers will be placed in positions of trading attention
between driving and “functioning” in the driver’s seat.
This is most compelling, because highway fatalities are getting rarer and
rarer. In fact, chances are extremely high that you will not die while driving or
riding in a vehicle. This is a direct result not of trying to stop car accidents (which
are all bad outcomes and should be avoided), but, in fact, of knowing that
when an accident inevitably happens the goal is to reduce the consequences
of the accident. Make cars fail safer.
Making our cars fail safer? That is exactly the primary idea of defending
against accident consequence. Not one person has the power to stop all
accidents in the workplace. Human beings simply are not particularly good at
predicting the future. I will gladly admit that I am not smart enough to avoid
all accidents in my personal life, let alone in my facility.
Because you cannot prevent all accidents, you must assume that accidents
will happen, and use your time, energy, effort, and resources in dramatically
reducing the consequences of the accidents that will happen in your workplace.
You must build systems that allow your workers to fail safer. Start thinking like
the car industry, and make your systems safer.
Rumble Strips
Probably my personal favorite defense against vehicle accident consequence
is the humble rumble strip (pardon my rhyme). Rumble strips are the grooves
that are placed in the actual road surface along the roadside, just outside the
normal driving system, that alert the driver that they have drifted out of the
system. These strips vibrate and audibly alert the driver that something in the
system is different.
Rumble strips assume that the driver will make a mistake. Rumble strips
are designed to work when the driver screws up. In fact, rumble strips will not
work when the driver is actually behaving as road designers assume drivers
should behave. Rumble strips only work when the driver becomes inattentive.
What rumble strips do is allow the driver to have a reasonable warning
that something is wrong. The driver may then correct their behavior before
an accident happens. The screw up is still there; the only difference is that the
driver has time to recover. This example illustrates the importance of this idea.
Workers (and drivers) are at their best when they are allowed to detect and
correct their performance in normal systems. Normal systems are most reliable
(and they are by definition terribly reliable) when the humans using those
systems are allowed to detect trouble, and then correct that same trouble
while in normal operation of work.
From Today on…Look Forward
It is not that your managers, workers, and safety professionals are bad, or that
they make bad decisions. That is simply not true. Your managers have done
and continue to do their very best in making the workplace safer and more
reliable. Managers assuredly do want what is best for the organization, and
the fixes and corrective actions that have been done for years were fixes and
corrective actions that managers believed would fix the problem and make the
workplace safe.
In fact, it is essential to remember that managers have the best intentions
in mind when they march workers off the job site, or when they fire a worker
for making a mistake, or when they scream at workers for not stopping work
right before the failure that just happened.
The problem that we are attempting to solve with this new Human
Performance philosophy is not in how we are motivating owners and managers;
the problem is much more about teaching a new way to manage safety in your
organization. Just as we say it is bad to try to “fix” your workers, it is just as bad
to try to “fix” your managers.
Consider this: you are not trying to change the managers in your
organization, you are just trying to move their understanding of safety and
reliability to a different perspective. These managers will still be able to use
their talents, experience, and management tools—they will just be using them
in a different way.
What you are actually trying to do is to move your organization from a
“crime and punishment” model of managing safety performance to more
of a “diagnose and treat” model of safety management. We know that we
can’t punish away safety issues. Try as we might, we cannot hold workers
so accountable that they never again have a failure. We have to identify and
understand the underlying issues that create the environment in which these
failures can happen.
Don’t go back and look at old investigations and corrective actions to see
how you could have responded better or differently. That only serves to bring
up unpleasant feelings around events that you worked hard to both understand
and “fix.” It is far better simply to draw an imaginary line in your calendar
today, and simply respond differently the next time you have to respond. Look
forward, not backwards, in managing safety. You will have plenty of chances to
try these ideas. You will also have many chances to talk your managers out
of old school reactions and into new view responses. In our business we always have
opportunities to practice these skills.
Try these ideas, and ask your management team to try these ideas. You
honestly have nothing to lose and everything to gain. Remember, you can
always move back to “blame and fix.”
2
Why Think about Failure at All, Let Alone Think “Differently” about Failure?
Questions about Your Organization:
• Is your organization as safe as it should be?
• How does your organization measure safety success?
• Is the production of product more valued than protection of people,
really?
• Look around and see what is valued with money, time, and recognition—
what is most relevant to your management team?
• Are your processes written to make work happen successfully, or to
avoid compliance failure?
• Do you learn as an organization from your successes as much as, or
more than, from your failures?
Here’s what we have learned about safety…not everything, but some
crucial things that seem to make a difference at work. If you will allow me the
pleasure, I am going to make a list and, in turn throughout this book, discuss
these ideas in more detail:
• The safest workers seem to be workers actually performing work.
• Workers are as safe as they need to be, without being overly safe, in
order to get work done.
• Workers consistently create safety in practice while they do their work.
• Workers are constantly detecting and correcting variations in the work,
the work environment, themselves, and others—to create safety.
• The work the workers do in an organization is remarkably different from
the work that was planned for the workers to follow.
• Workers constantly have to make the work match the process, not the
processes match the work. Workers are not empowered to “officially”
change your process—only you have the power to change the process.
• Your worksites are undoubtedly a “hidden laboratory of alternative
choices,” in which we are always hoping that the workers make the
right choice—and normally they do…until they don’t.
• Your job sites, work areas, and facilities are normally exceedingly safe
and dependable—worse things can happen than will happen—and
because work is normally safe, workers are less careful.
In the opening case study, the Propane versus Freon tank failure, the failure is so
damn stupid that it is hard to believe there would be any other organizational
reaction to this failure than terminating the worker. The worker in this case
study had it all—bad attitude, lack of ability to get along with his coworkers,
“stupid” written on his forehead; he probably had bad habits and smelled
funny. And in many ways this is the precise problem we are trying our hardest
to combat. It is faster, cheaper, and a lot easier simply to claim the problem
belongs to the worker and “fix” the worker. By simply choosing to purchase
this book, you are telling me that you already have thought about why “fixing”
people is so remarkably risky.
When you think of Human Performance, think of the three layers of your
organization that are always present, during both failure and success. All three
layers must be considered:
• The individual worker
• The organizational system
• The performance expectations set up by management
Too often organizations tend only to consider the first third of this list: the
worker. The worker is where we put most of the effort; the worker is what
we try to fix. The problem is if you only try to fix the worker, you are only
getting ⅓ of the total organizational landscape. You therefore miss ⅔ of the
potential problem: the system and the management expectations. You must
teach yourselves to look at all three levels.
If the organization in the case study had fired the worker, it is a decent bet
that in this case the problem would have gone away. The chance that there could
ever be another perfect combination of a storage area, a tank smasher, that
worker, that supervisor, that day, and that particular organization is incredibly
low. But it is still there.
Is “the chance that the event will happen is extremely low” an operational limit
you can live with in your organization? Could your organization survive two
failures involving Freon tanks being smashed dangerously?
By understanding the entire story, the whole system that “housed” the
event, you can begin to understand not only the failure in the case study. You
can also understand the other failures that are waiting to happen within that
system, that organization, even that same work team. The ability to investigate
an accident before it happens seems to be an extremely positive step forward
in the understanding of not just failure, but the systems, people, technology,
and components that must be present for failure to happen.
There are Lots of Ways to be Safe, but Failure Seems to Have a Single Path
This part gets a little tricky. My advice is not to over-think these next ideas, but
to understand how vital it is to your future success to recognize that these ideas
have power over how you react and respond to failure. In short, the failures
you look at in your organization could have only happened the way that they
happened. If anything else had happened during the event, the failure would
not have taken place. This idea is always true in retrospect, and seems to need
the addition of the word “duh!” every time it is said in a meeting or event
review.
But ponder this idea for a moment. Knowing the nature of all the things
that had to have taken place in order for failure to happen is pretty valuable
knowledge about not only the failure event, but also the organization that
hosted the failure.
How much of what happened always happens?
Answer: Almost all of what happened always happens.
How much of what happened has never happened before?
Answer: Usually nothing different is recognized until after the accident
happens. Remember this: it is exceedingly hard to notice something
that doesn’t happen.
How relevant is it to know the difference between the two sides of the
Freon/Propane story: the first story and the second story?
Answer: Knowing both sides of the story is the most vital part of
understanding the event. In many ways our entire job is to be able to
look at both sides of normal and abnormal factors, and make sense of
this information for the organization.
We are often lulled into believing that if we can just find the single, root place
where the world went wrong we will have done our job and will then be
able to make work a safer place for all. The problem with this idea is that we
are learning that there is no one, single root cause that is the problem. The
problem is always the relationships, causalities, or spaces around that single
root cause, which seem to be in a constant state of motion and variability.
Knowing what was normal, what was not normal, and the entire gradient
between normal and not normal begins to tell a different story of the accident,
incident, failure, error, mistake, or upset.
What is Failure?
It is pretty essential to get our vocabulary together. So let’s start with the
understanding that when we use the word “failure” we actually mean
the universe of words we can use to describe a deviation from expected
outcome—DFEO. Failure is a good word to use because it is a large word that
encompasses many other ideas. Accident, event, mistake, operational upset,
flaw, deviation from the norm, screw up, foul up—these are all words that
fit within the notion of failure. So when you see or say the word failure you
are opening up all of the above mentioned situations in a quick and efficient
manner.
Failure as a term is also significant because it crosses professional boundaries
well. Quality professionals understand failure, production professionals
understand failure, budget professionals understand failure, and, therefore,
you don’t have to spend much time translating these ideas internally.
Perhaps as significant, failure as a word translates well between professions
and between cultures. There actually is not a direct translation for the word
“safety” in Spanish, Japanese, or many Middle Eastern languages (all cultures
in which I have spent much time working on Human Performance issues in the
last couple of years). But the word failure moves easily between concepts and
languages in many cultures.
Why some cultures don’t have a word for “safety” and do have a word
for “failure” is, to a great extent, a real fundamental part of this book, and is
pretty darn interesting. Safety, in many ways, and in many parts of the world,
is a luxury that exists when the rest of societal needs are met, or close to being
met. It hardly matters if the workplace is safe if you are more prone to being
killed at home or on the streets. The luxury of safety for those of us in the
developed world is no luxury at all—it has become a necessity and now a true
business advantage.
Let’s further define failure using Erik Hollnagel’s definition. Hollnagel, an
expert in this field, says, “A failure is the unexpected combination of normal
performance variability.” Hollnagel goes on to discuss that accidents don’t
happen because workers in the field have gambled their lives, and the
organization’s reputation; failure happens because of three key ideas.
First, failure happens because the worker believes that what is about to
happen to them is simply not possible. This is so profoundly true that it seems
simple and obvious. However, after a failure takes place we so easily move to
a place where we see the entire picture. We know how the story started, what
happened in the middle, and how the story ends, so we can see every possibility
for the failure along the way. The worker has none of that information, and,
therefore, does not have the same story or view of the failure within the event
that we hold after the event.
Every time you drive home from work you fall into this potential failure
mode. I will bet you that you don’t perform a major safety systems check, take
a pre-drive walk around your car, or even check your mirrors before you go
home. I will bet you hop in the car, and drive home. That is what most of us do
every single night of the work year. Why? Don’t you know better?
Sure, you know better, but you also know that that same car got you to
work in the morning without failure, so why would it not get you home that
night? You have no indicator that anything is wrong, or even could be wrong.
So you just hop in the car and go. Workers do the same thing at work. It
worked last time. It worked the last 10,000 times. Normally, it always works
OK. Why would it not work the next time?
All of us know that success one time does not lead to success the next time.
Yet, success one time actually does usually lead to success the next time. In
fact, failure is extremely rare. Your car does not break down very often. In fact,
your car is much more dependable than it is not dependable. Think about this
idea for a moment. Your car runs many, many more times than it doesn’t run.
Your car is more successful than the occasional mechanical failure. The same
goes for workers.
Second, failure happens because it often has nothing to do with the
tasks and processes that the worker is currently doing in the field. We focus
exceptionally well as humans on one thing at a time. Current neuroscience is
telling us that humans are not particularly proficient at multitasking. In fact,
neuroscience studies show that humans don’t multitask—instead, humans
share attention; some of us are fast attention sharers, and some of us are
slow attention sharers. Depending on the context, we often find ourselves in
situations where we cannot share attention at all. Think of the moment you
pull a rental car out of some foreign airport and have no clue which way to go.
At that moment, you are focused entirely on one thing—driving in a direction
that will get you to your destination. You stop attention sharing, you turn
down the radio, and you are not talking on the phone. You are driving for your
life, or so it seems.
That same notion is true of things like texting and driving, or welding
and catching your clothes on fire. These things are connected, but they are
usually connected post-failure and in such a way that the connection becomes
obvious and simple—after you see the connection.
Third, failure happens when the worker feels the possibility of getting
the intended outcome is well worth whatever risk is present in the work
environment. The worker makes a trade-off between being safe and being
productive. You want productive workers. The worker chooses to take a
chance and guess that this iteration of this process will not fail this time and
in this way. In other words, the worker looks around to see if any manager is
watching and goes for it. The pop culture reference to this is “get ’er done!”
Many of you are fully prepared to argue that that is gambling. It may
appear to have all the components of gambling. It may look like gambling.
Hell, it probably smells like a gamble. Problem is, it’s not gambling at all.
When you think of worker intent, or, as we have discussed it so far, gambling
workers, think of how serious the outcome is to the notion of risk. In fact, think
of this by using one simple word: surprise!
If a worker looks around and takes a chance and it fails, and the worker
is not surprised, then the worker knew better than to take the chance, and
probably was really gambling with your organization’s reputation. But, if the
worker looks around, takes a chance, and it fails, and the worker is genuinely
surprised by the failure, then the worker was trying their best to get a difficult
task done. It honestly is that simple, and is the greatest test of intent that I know
personally. What do I normally discover when I am sent in to investigations?
The surprises. I find many places in the middle of failure contexts that are
just plainly and simply a surprise to the worker. I use the word surprise in my
write-ups of events. I find the word and the idea of a surprise condition that
the worker discovers while doing work to be a concept that is well understood
by the people doing the work, and also well understood by the people who
manage the people who do the work.
Types of Failure
There are primarily two types of failure. These two types of failure are
noteworthy because they help us understand how to move forward with the
process of understanding not only how to investigate a failure, but also how
to create corrective actions that will ensure that this failure, and others like it,
are less likely to take place in our organizations. Knowing the type of failure
allows us the power to know how to proceed in organizational understanding
and learning. Soon we will discuss the power of learning in creating safe and
efficient workplace systems.
• Individual Failure: A failure happens and the worker is not protected
from the dangers present in the work environment. The consequence
of the failure is almost always to the worker or workers in the event.
This category of failure would include events similar to cuts, slips, trips,
and falls; chemical or hazard exposure; strains. In general, failing to
protect the worker from anything physical happening to that worker is
an individual failure. Hurting people is an awful thing. Hurting a worker
is seen as the worker’s failure to operate safely.
• Organizational Failure/System Failure: A failure where the
organizational systems allow some type of threat to the system to have
a consequence where many people are adversely affected. In a system
failure, someone or something has been able to break through the
many layers of defenses that were thought to be in place protecting the
facility, its people, and reputation. What you often find is the profound
difference between thinking there are many layers of defenses, and
actually having the many layers of defenses in place. Organizational
events are things like oil spills, reactor coolant failure, single point
recycling storage pens for collocating different small gas containers, or
even air traffic controllers sleeping during the nighttime hours when no
planes are scheduled to land at an airport.
Why do you care about these two failure types—and I might add you certainly
do care about these two failure types? Because failure is almost always much
more complex than it may at first appear.
Dekker says it something like this: it is much more seductive to see Human
Performance as puzzling and perplexing, as opposed to complex. We become
much more interested in the “why” question and somehow miss the “how”
question about failure. Where we see puzzling “whys,” the story of the failure
often tells us something quite different. The story of the failure finds that
success is complex: safety critical work depends on expert human performers
performing work in an expert way. Our organizational systems tend to run
degraded (from the launch of a system until it runs to failure), and plans,
processes, rules, regulations, and procedures that we assume guide work
towards safe operations are almost always incomplete.
You use the types of failure (something you have done automatically your
whole career) to understand where to start and stop the telling of the story of
the failure. Most of the stories I tell in my work are stories about the way the
organization failed the worker, and almost never about how the worker failed
the organization.
The concept gets clearer when you break a failure down a bit.
Caution: I am not asking you to deconstruct the failure down to its
smallest parts. Near as I can tell, examining an event by taking the event
down to its smallest parts has little value at all for you, for the workers
involved, or for the organization. What we want to deconstruct is not the
failure, but the way you and your organization understand the failure.
This next idea will help a lot.
Failure has Three Parts
Every failure that takes place in your workplace can be divided into three
distinct parts. Let me rephrase that statement: failure must be divided into
three parts in order for you and your organization to begin to understand what
is happening. You must be able to understand how the failure environment
changed while the actual failure was happening.
The three parts of a failure are straightforward, and immensely important:
Part 1. The Context: Everything that led up to the actual failure event
We will call this “the context of the failure.” Sidney Dekker, my friend and
challenger on all things involving my organization, calls this part of a failure
“getting into the tunnel with the worker.” Everything—and I do mean
everything—that is happening while the work is being done plays a role in the
outcome of that specific work. We must try to capture and understand all the
things that are happening to set up this failure to fail. The problem is that we
must look at the timeline of the event (a rather linear view of the world
and not at all truly representative of reality) while at the same time looking at the
complex relationships that exist between all the many moving parts that had
to align to cause this failure to actually be triggered by these workers.
Part 2. The Consequence: The failure itself
Someone or something triggers an unexpected outcome that for our discussion
is seen by the organization as a bad outcome. Oddly enough, this is both the
easiest part to identify and the most uninteresting part of the failure. This
actual failure shows itself to us in many ways. All of these ways will be a clear
deviation from an expected outcome. Often these ways are tragic and horrible.
But the failure itself is only significant in the sense that it is the event. The
failure will have an immediate consequence and will drive an often immediate
organizational reaction.
Part 3. The Retrospective Understanding: Everything that happens after the failure happens
Call this the organizational reaction. Ah, the power of retrospective. This is
where understanding the three distinct and different parts of a failure presents
its best and most valuable pay-off. Everything that happens after the failure
is colored with one giant rational advantage—that is, the fact that from this
point on you and your organization know how this story ended. Remember,
those workers back up in Part 1 of the failure had no inkling of this knowledge.
Had they the rational advantage of knowing this failure would happen, it is a
pretty solid bet that they would have done something—anything—differently
in order to avoid the Part 2 of this event.
All three of these parts of a failure always exist in every failure, no matter the
size. All three must be recognized and understood. Perhaps most importantly,
you and your organization must be able to understand that all three of these
failure parts are different, and must be defined and understood as different in
order to understand fully the “story” of the failure itself.
When Understanding Failure, Some Things Always Matter
All divisions of failure aside, there is a set of core factors that seem to be
present and relevant in all events. These core factors are necessary and
collectively sufficient to help you tell the story of the failure. Without the core
context information, your story is incomplete.
Core Context Information List:
I must have an explanation of the failure
I don’t need the causal analysis document; I don’t want the fault tree; I am
not even sure I need a timeline; but I have to have an explanation of what
happened. I am surprised by how often I go to the field and look at amazing
report documents that don’t thoroughly explain how the event happened.
I actually look for a story that can be told of how the failure happened.
Remember your junior high school English class? A story has a beginning,
middle, and an ending; the story must make sense (have fidelity), and move
through time logically and sequentially, but not necessarily linearly.
I need to understand how the energy of the failure was or was not isolated from the
worker or workers involved in the failure
That sentence is written in High Reliability-ese; let’s see if we can reframe this
idea in a way that makes sense to people who are in the field. I need to know
what went wrong and what went right. Where did the failure start, and when
did our systems kick in to mitigate further failure (if at all)? Which all leads to
the third requirement…
I need an understanding of how some type of defense or defenses built into the
organizational system were either not present, or if present for some reason failed
This chapter of this book seems to contain many absolutes. Here is one more:
every failure that takes place happens because some defense either did not
work or was not present in the work environment. This defense comment is
always true; albeit incredibly retrospective, it is always true. Think of it like
this: I am not surprised that air traffic controllers, working alone in the middle
of the night at an airport with no scheduled landings or takeoffs, take naps. I
would be surprised if there were no way to wake the controllers up when they
were needed to perform their mission. A horn or a buzzer connected to the
radio would be an excellent defense against an exceptionally severe lack of
stimulation and an incredibly high amount of boredom.
I must be ever vigilant in remembering, “size does not matter”
The consequence of a failure does not determine the importance of the failure.
We are humans, our organizations are made up of humans, and we all believe
that consequence drives outcomes. I have spent the last 15 years of my career
asking this question of classrooms full of people just like us: “What is the
difference between a big event and a small event?” The answer, and I think
I have heard them all, really should always be, “nothing.” All events allow us
the opportunity to learn. Small events, those events with small consequences,
are often much richer in context and story than what our organizations define
as large events.
I must always remember that bad things don’t always happen to bad people
Or at least stop thinking the person is bad (or allowing the person to tell
me that they were for some reason bad). The truth of the matter is that bad
things happen all the time to very good people, very experienced people who
had no intention other than doing the very best job they could do for their
organization. As silly as this seems, not thinking that the people involved in an
event are bad is difficult, and it will take deliberate effort on your part not to
label the workers involved as something less than perfect.
Here is what you know from years of doing safety work. Sometimes big,
horrific events are actually quite simple, straightforward failures. A bolt snaps,
scaffolding falls, and the consequences are terrible. The other side of that
coin is that sometimes tiny events have profound learning implications for
the organization. A small event happens, and you suddenly realize your entire
system is flawed and must be immediately strengthened. Event size is not
nearly as powerful as event learning. Don’t let the drama of an event (or lack
of drama of an event) dictate how much or how deeply you respond.
The problem is that the only way you know which events are information
rich is to look at them. Therefore, every event demands event learning. Every
event must demand event learning.
Case Study
The Titanic: A Story to Help You Rethink How You Think About Failure
Before you read this case study, a study you will undoubtedly know something
about, ask these three questions as a pretest of your failure understanding
ability:
• Whose fault was the sinking of the Titanic?
• What caused the Titanic to sink?
• What corrective actions would you have recommended for the next
ocean liner to be built?
The Titanic was built to be the flagship for the White Star Line. It was the
largest and most opulent passenger ship in the world. This stately ship was
“state of the art” for its time. White Star designers touted a ship built with
all of the luxuries of the “gilded age,” with countless fineries and undeniable
craftsmanship: a Turkish bath, teak wood furniture (even in third class), a
heated pool and lending library. There were crystal chandeliers, hand carved
banisters, and gold leaf detailed throughout the ship. No expense was spared
in its luxury and beauty.
The same eye for detail was brought to the technological design of the
Titanic. New technologies were integrated. It boasted a 5 kilowatt Marconi
wireless radio (invented just 11 years before), a telephone system, two bronze
10’ propellers, and a double plated 2’ steel hull. The builders and designers
believed they had built the safest ship in the world. In a word, it was
“unsinkable.”
The Safest Vessel of its Time
The Titanic was to be the safest vessel of its time. A major improvement to its
design was its hull. The hull was subdivided into watertight compartments.
This was a new safety design and had never been done on any other passenger
vessel.
The ship was monumental, measuring 175’ high, with 9 decks. At its
top speed it could travel 23 knots (26 mph). Fully loaded it had a passenger
capacity of 3,247 in all three classes, and a crew capacity of 885.
The Titanic began its maiden trip from Southampton, England to New York
City on April 10, 1912 with 2,223 crew and passengers. The first class passenger
list boasted the “jet set” of its time. The list included Benjamin Guggenheim,
John Jacob Astor IV and his wife Madeleine, and Isidor and Ida Straus, the
owners of Macy’s department store. In the second and third classes were many
families moving to start a new life in America.
Systems are Normally Dependable…
The first few days of the journey went incredibly smoothly, leading the ship’s
owner, Bruce Ismay, and its experienced Captain, Edward John Smith, to begin
to feel a bit more at ease. They considered pushing the ship’s speed a bit.
The normally choppy North Atlantic seas were calm, and the weather was
cooperative. Every hour the crew would drop a bucket over the side of the
ship and bring it up, drop in a thermometer, and measure the temperature
of the water. The temperature would indicate the presence of ice. The crew
were rotating their shifts in the crow’s nest at the top of the ship, watching for
ice. The ship was also receiving Marconi messages from other ships in the area
providing the location of ice fields moving into the shipping lanes. These were
the standards of practice at the time. The crew was making sure that they, and
more importantly, the passengers, would have a smooth trip to New York.
Until they Aren’t
In the late evening of April 14, 1912, right after the first class passengers
enjoyed a magnificent 10-course meal, at 11:40 pm, the Titanic hit an iceberg.
Only seconds after the crew saw the ice, the ship was hit on its starboard
(right) side, with the impact lasting a few seconds, leaving a giant gash in the
steel hull.
In just under three hours, this dazzling, beautiful liner sank to the bottom of
the North Atlantic, resulting in the deaths of some 1,500 people. This is one of
the deadliest peacetime maritime disasters in history.
How?
How did this happen? This ship was “unsinkable.” There were systems in place
to ensure everyone’s safety. The Titanic was a technological marvel. It had
everything to conquer the predictable hazards of transatlantic passage at the
time. Multiple safety systems existed around every analyzed hazard that was
known. Safety was not just a significant value; safety was also a marketing
strategy. It would be hard to find evidence that reliable and safe operations
were not one of the most important operational values. Nothing could have or
should have ever happened. The entire mission of the Titanic was not to sink,
and yet it did.
Let’s continue the story of this complex, multilayered failure. All of the
designed safety systems, and trained and safe workers were in place, and
functioning exactly the way they should have been. Nothing unusual happened;
in fact, what happened was an amazing and unexpected combination of many
normal things.
We will never know the exact size of the iceberg; however, early newspaper
reports of the time estimate that the iceberg was around 50 to 100’ high and
200 to 400’ long. An interesting point about the size of icebergs, however, is
that only a small part of the berg is visible above water; the greater part of
the iceberg is underwater, and is roughly nine times larger than the visible portion. Most
icebergs expose about 1/10 of their mass above water, leaving the other 9/10
of their mass below the water line. In this part of the Atlantic, icebergs were an
expected, accepted, and normal part of sea travel.
Sixteen separate cells of Titanic’s hull could each be isolated and closed
with special doors in the event of an emergency. This was an amazing new
design for any passenger ship. However, to be watertight they would have to
be able to be closed like a box on six sides, making parts of the ship impassable
to crew and passengers. This unintended impediment was operationally
impractical, so doors were placed in the bulkheads to allow access. Because
of these doors, an adaptation in the ship’s design so that the ship could be
operated, the watertight bulkheads were not really watertight. Because of the
design confidence in these separate cells, no consideration was given to the
idea of what would happen if water did penetrate them. The cell walls were
tall but not watertight all the way up to the ceilings of the ship. Water could
flow over into each hull compartment. All of these new and highly technical
defenses were designed with the best intention of safety. This system was the
most modern and the safest ever developed.
In the crow’s nest of the ship, the place high above the ship where crew
kept a lookout for ice, there was supposed to be a pair of binoculars. White
Star leadership made a relatively normal last minute change in the equipment
officer. The old equipment officer forgetfully took the key to the equipment
storage locker with him when he was transferred from the ship. Titanic’s new
equipment officer couldn’t find a replacement key, and because there was no
access to the equipment storage area, the lookouts were not issued with any
binoculars.
The White Star Line’s official policy was that binoculars were supposed to
be given to all lookouts. Since they only had a few binoculars on the ship due
to the error, these binoculars were given to the highest ranking officers, not
the lookouts. On the night of the collision, the weather was cooperating. It is
very difficult to notice something that doesn’t happen, and in this case in this
sea condition, nothing was happening. In these conditions it was very difficult
to see the iceberg. Churning or violent water would have caused breakers,
with some flotsam and jetsam, around the base of an iceberg in the water,
making it easier to see it from a greater distance.
Titanic did not strike the “typical” iceberg. It wasn’t white, gleaming out
of the ocean like most icebergs. Witnesses say that it was almost invisible from
continuous melting and refreezing. This clear surface of the berg reflected the
water and dark, clear night sky like a mirror, thereby making it almost black, or
the same color as the evening ocean. It would have been the ocean equivalent
of the black ice seen on roads. This type of iceberg is nearly impossible for a
qualified and competent lookout to see in these conditions, even with the
keenest eye and the highest professional motivations. The very best lookout
could and did easily miss this hazard. The miss happened not because the lookout
could have done better, but because this hazard was so difficult to identify in
those conditions.
The Titanic crew had a total of 6 hours for sea trials before they sailed from
Southampton to New York. The sea trials were pretty much a formality. The
crew members who stoked the fires and representatives of the White Star Line
were on board, but no domestic staff were aboard. Once the surveyor
from the British Board of Trade, who was also present, saw that everything
worked to his satisfaction, and that the ship was seaworthy and the crew was fit
to serve passengers, he signed an “Agreement and Account of Voyages and
Crew.” This document, valid for 12 months, deemed the ship
seaworthy. The crew did have additional time to prepare when the ship traveled
from the shipyard in Belfast, Ireland to Southampton. Otherwise, the crew had 6 hours
of practice to prepare for its first intercontinental cruise. Many of the crew did
not know where stairwells and exits were located, especially in the second
and third class areas. Some of the exits were locked, keeping the passengers
below deck. Familiarity with the ship was clearly as complete as it needed to
be without being overly complete. This short sea trial was perfectly acceptable
for the most technically advanced and safest ship ever built. This sea trial only
becomes incomplete, rushed, and wrong because the Titanic was lost at sea.
Safety Systems Compliant
The Titanic was fitted with a total of 20 lifeboats, comprising three different
types. Initial designs for the ship had included 48 lifeboats. However, in an
uncharacteristic but financially reasonable cost-cutting measure the designers cut 28,
leaving only enough lifeboats to hold about 53 percent of the passengers and crew on board.
Because of the added safety design features built into the ship itself, it was
acceptable to determine that the additional lifeboats were not necessary. The
White Star Line was fully compliant, well within the law. The British Board of
Trade regulations stated that all vessels over 10,000 tons must carry 16 lifeboats
with a capacity of 5,500 cubic feet. These regulations were becoming antiquated since
the sizes of passenger vessels were growing enormously. The Titanic weighed
in at 46,000 tons. Not only had the White Star Line fulfilled their legal and
regulatory commitments, they were providing more lifeboat accommodations
than were required. The high casualty rate in the sinking was due to the fact
that although White Star were meeting the current regulatory requirements,
the Titanic only carried enough lifeboats for 1,178 out of a passenger and
crew list of 2,223. An outcome from the investigations into the disaster in
both England and the United States was a drastic change in the antiquated
regulations.
Recent tests on the steel from the Titanic reveal that it was far more brittle
than the steel used on modern ships, but it was the best available at the time.
It is easy to look back and say that they used inferior steel. However, that is the
advantage of time, retrospect, and knowing the ending of this tragic story. The
White Star Line spared no expense in this part of the construction, and used
what was the best material of the time.
Profit over Protection
Titanic housed an office for the wireless company, the Marconi Company.
The wireless allowed the passengers to send and receive cables, a luxury in
communications at that time. It also allowed other ships in the area to transmit
the coordinates of ice floes. Policies on these types of cables were simple: the
cables with the locations of ice fields or icebergs were given to the bridge
officer, who would then tack these paper cable transcriptions on to a board in
the bridge. This provided current information to the officers or those coming
on the next shift.
The Marconi officer working the night of the sinking logged that he gave
a cable to Capt. Smith in the early evening. He remembered that the Captain
was in a rush and that he placed it into the right-hand pocket of his jacket
with just a cursory glance. There were six iceberg warnings received by
the Titanic on the day of the disaster. One message got to the Captain, but
the others were ignored by the wireless operator who was busy transmitting
passenger messages. Passenger messages were the Marconi Company’s bread and
butter. The wireless was clearly less available for operational messages (which
generated no money for the company) because of the substantial profitability
of the passenger use.
New Rules or Old Rules?
Company policy only required the wireless offices to be operating until
midnight. Titanic hit the iceberg at 11:40. Most of the ships in the area had
turned in for the night by the time the ship was transmitting calls for help.
For several years the distress call for ships was CQD, which translates as: CQ =
attention all stations, D = distress or danger. A short time before the Titanic’s
launch, an international convention introduced a new distress call to supersede
the traditional CQD. The letters chosen were SOS, not because they stood for
anything in particular (although rumored to be Save our Ship), but because
they were simple enough for the new amateurs flocking to the wireless craze
to send and receive. Titanic was tapping out at 12:15 “CQD MGY” (the ship’s
call letters). Titanic was one of the first ships to use the newly negotiated
distress call SOS. The ship’s wireless crew tapped both CQD and SOS in order
to cover all bases.
At 12:19, on the Cunard Line’s Carpathia, just as the wireless operator
prepared to turn in that night, he heard the call. The ship was less than 60
miles away. The Carpathia picked up and saved the Titanic survivors. Had the
operator turned off his wireless, the distress call would not have been heard
and no rescue craft would have responded.
At around 10:30 pm, the Californian, a cargo ship, had stopped
for the night on the edge of an ice field. As was the custom of the time, the
wireless operator had turned the radio off and had gone to bed. The ship’s
night crew noticed a behemoth liner stop an estimated 6 miles to the south
at 11:40 pm.
Shortly after midnight, the Captain of the Californian was informed by the
crew that an enormous passenger liner was shooting rockets into the sky. The
crew concluded that this ship had anchored for the night and that there was
some type of party. These were not fireworks but emergency flares. At 2:20 am,
it was noticed on the Californian that the Titanic had disappeared. Those on
the Californian believed the other ship had steamed away. The Californian’s
wireless operator woke up in the early morning and turned on his Marconi
radio. It was at that moment the crew learned of the tragic fate of the Titanic.
In both the British and U.S. Senate inquiries into the disaster, the Captain
of the Californian insisted his ship had been many miles north of the Titanic,
and that he could not have reached it in time to rescue passengers. However,
many survivors testified in the investigations that they had seen the lights of
another ship roughly six miles north of the Titanic. Both international inquiries
concluded that the Californian may have been only six miles or so to the north
of the Titanic. The Californian could have reached the Titanic before it sank.
