MTBF, MTTF, MTTR 2009
MTBF, MTTF, MTTR
BY:
KIRAN.R (080922018)
RAJATH RAO (080922
CLASS: 2ND SEMESTER M Tech C.A.M.D.A
DEPARTMENT OF MECHANICAL AND MANUFACTURING ENGINEERING
MANIPAL INSTITUTE OF TECHNOLOGY, MANIPAL
DATE OF SUBMISSION: 29-01-2009
MIT(CAMDA) Page 1
Seminar report on MTBF, MTTF, MTTR
Introduction
Mean Time Between Failures (MTBF) and Mean Time To Failure (MTTF) have been used
for many years as a basis for various decisions. Over the years, more than 20 methods
and procedures for lifecycle predictions have been developed. It is therefore no
wonder that MTBF has been the subject of endless debate.
Explanation:
Definition of failure
Failure can be defined in two ways:
1. The termination of the ability of the product as a whole to perform its
required function.
2. The termination of the ability of any individual component to perform its
required function, but not the termination of the ability of the product as a
whole to perform.
MTBF impacts both reliability and availability. Before MTBF methods can be
explained, it is important to have a solid foundation of these concepts. The difference
between reliability and availability is often unknown or misunderstood. High
availability and high reliability often go hand in hand, but they are not
interchangeable terms.
Reliability is the ability of a system or component to perform its required functions
under stated conditions for a specified period of time.
MTBF, or Mean Time Between Failures, is a basic measure of a system's reliability. It
is typically expressed in hours. The higher the MTBF, the higher the reliability of
the product. The equation R(T) = exp(-T/MTBF), given later in this report,
illustrates this relationship.
Availability, on the other hand, is the degree to which a system or component is
operational and accessible when required for use.
Mean Time between Failures (MTBF)
Referring to the figure below, MTBF is the sum of the operational periods divided
by the number of observed failures.
Reliability is quantified as MTBF (Mean Time Between Failures) for repairable
products and MTTF (Mean Time To Failure) for non-repairable products. A correct
understanding of MTBF is important. MTBF is measured in hours and indicates the
expected failure rate of a product. If a device has an MTBF of 200,000 hours, this
indicates that, on average, one failure occurred per 200,000 device-hours
accumulated during the manufacturer's testing of the product. The higher the
MTBF, the more reliable the product is.
Downtime
For each observation, downtime is the instant at which the system went down, which
is after (i.e. greater than) the moment it went up, the uptime. The difference
(downtime - uptime) is the amount of time the system was operating between these
two events.
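The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard routine: the function name `mtbf_from_events` and the sample log are hypothetical, and it assumes every recorded operating period ends in exactly one failure.

```python
def mtbf_from_events(events):
    """Return the MTBF from (uptime, downtime) pairs, all in hours.

    Each pair holds the instant the system came up and the later instant
    it went down, so each pair contributes one operating period and one
    failure (an assumption of this sketch).
    """
    if not events:
        raise ValueError("at least one observation is required")
    operating_time = 0.0
    for uptime, downtime in events:
        if downtime <= uptime:
            raise ValueError("downtime must be later than uptime")
        operating_time += downtime - uptime
    return operating_time / len(events)


# Hypothetical log: three operating periods of 100, 150 and 50 hours.
observations = [(0, 100), (110, 260), (275, 325)]
print(mtbf_from_events(observations))  # 100.0
```

The gaps between one downtime and the next uptime (repair time) are deliberately left out of the sum, since only operating periods count toward MTBF in this definition.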
An MTBF of 40,000 hours does not mean that a power supply should last for an
average of 40,000 hours. According to the theory behind the statistics of confidence
intervals, the statistical average approaches the true average as the number of
samples increases. An MTBF of 40,000 hours for one module (about 4.6 years of
continuous operation) becomes 40,000/2 = 20,000 hours for two modules and
40,000/4 = 10,000 hours for four modules. Sometimes failure rates are measured in
percent failed per million hours of operation instead of MTBF. One FIT (failure in
time) is equivalent to one failure per billion device-hours, which corresponds to an
MTBF of 1,000,000,000 hours.
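The arithmetic for multiple modules and for FIT can be checked with a short Python sketch. It assumes identical, independent modules with a constant failure rate, where the failure of any one module counts as a failure; the function names and the 8,760 hours-per-year constant (365 days of continuous operation) are assumptions of this illustration.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours, continuous operation assumed


def combined_mtbf(module_mtbf_hours, module_count):
    """MTBF until any one of several identical modules fails."""
    return module_mtbf_hours / module_count


def fit_to_mtbf(fit):
    """Convert FIT (failures per 1e9 device-hours) to MTBF in hours."""
    return 1e9 / fit


for n in (1, 2, 4):
    hours = combined_mtbf(40_000, n)
    # Prints the module count, the MTBF in hours, and the MTBF in years.
    print(n, hours, round(hours / HOURS_PER_YEAR, 2))

print(fit_to_mtbf(1))  # 1000000000.0
```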
The formula for calculating the MTBF is
Ɵ = T/r
Ɵ = MTBF
T = total time
r = number of failures
When there are no delays in repair:
MTBF = MTTF + MTTR
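Under this convention, one failure cycle consists of an operating period (MTTF) followed by a repair period (MTTR). The sketch below, with hypothetical function names and sample figures, adds the two and also derives the fraction of each cycle spent operating. Treating that fraction as the steady-state availability is a standard result but is an addition of this illustration, not a formula stated in the report.

```python
def mtbf_from_cycle(mttf_hours, mttr_hours):
    """MTBF as the length of one full failure cycle (no repair delays)."""
    return mttf_hours + mttr_hours


def availability(mttf_hours, mttr_hours):
    """Fraction of each failure cycle during which the system operates."""
    return mttf_hours / mtbf_from_cycle(mttf_hours, mttr_hours)


# Hypothetical system: runs 990 hours on average, then takes 10 hours to repair.
print(mtbf_from_cycle(990.0, 10.0))  # 1000.0
print(availability(990.0, 10.0))     # 0.99
```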
Mean time to failure (MTTF)
MTTF is an estimate of the average (mean) time until the first failure of a design or
component, or until a disruption occurs in the operation of the product, process,
procedure, or design. External failures may be excluded from the estimate. MTTF
assumes that the product cannot be repaired and cannot resume any of its normal
operations after the failure. It relates to quantities such as expected or operating
life, and it is generally applied to items that are not repaired or replaced, even
though they sometimes may be.
Some systems, such as spacecraft, cannot be repaired after a major failure. In other
cases, even though maintenance tasks can be performed offline, they cannot be
performed during a mission. For all of these types of non-repairable systems, the
time to system failure is an important reliability characteristic. Its expected value
is known as the mean time to failure (MTTF). Because a non-repairable system can
fail only once, mean time to first failure (MTTFF) and MTTF refer to the same
metric: the time to failure is the same as the time before the first failure.
Difference between MTTF and MTBF
MTBF (mean time between failures) is the expected time between two successive
failures of a system. Therefore, MTBF is a key reliability metric for systems that can
be repaired or restored. MTTF (mean time to failure) is the expected time to failure
of a system. Non-repairable systems can fail only once. Therefore, for a non-
repairable system, MTTF is equivalent to the mean of its failure time distribution.
Repairable systems can fail several times. In general, it takes more time for the first
failure to occur than it does for subsequent failures to occur.
To distinguish between the two, the concept of suspensions must first be
understood. In reliability calculations, a suspension occurs when a destructive test or
observation has been completed without observing a failure. MTBF calculations do
not consider suspensions, whereas MTTF calculations do. MTTF is the total number of
hours of service of all devices divided by the number of devices. It is only when all
the parts fail with the same failure mode that MTBF converges to MTTF.
γ = T/N
γ = MTTF
T = total time
N = number of units under test
Example: Suppose 10 devices are tested for 500 hours each. During the test, 2
failures occur.
The estimate of the MTBF is:
Ɵ = (10 × 500)/2 = 2,500 hours per failure.
Whereas the estimate of the MTTF is:
γ = (10 × 500)/10 = 500 hours per device.
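The worked example can be reproduced with a small Python sketch that applies the report's two formulas, Ɵ = T/r and γ = T/N. The function names are hypothetical, and the sketch assumes every device accumulates the same number of test hours.

```python
def estimate_mtbf(units, hours_per_unit, failures):
    """MTBF estimate: total test time T divided by the failure count r."""
    total_time = units * hours_per_unit
    return total_time / failures


def estimate_mttf(units, hours_per_unit):
    """MTTF estimate: total test time T divided by the unit count N."""
    total_time = units * hours_per_unit
    return total_time / units


# 10 devices tested for 500 hours each, with 2 failures observed.
print(estimate_mtbf(10, 500, 2))  # 2500.0
print(estimate_mttf(10, 500))     # 500.0
```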
If the MTBF is known, the failure rate can be calculated as the inverse of the MTBF.
The formula for the failure rate (λ) is:
λ = 1/Ɵ = r/T
Where:
λ = failure rate
r = number of failures
T = total time
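Assuming a constant failure rate, the probability that the product will work for some
time T without failure is given by the expression below. Here T is the operating time
of interest, not the total test time used in the MTBF formula.
R(T) = exp(-T/MTBF)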
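A short Python sketch shows how the failure rate and the reliability function behave numerically. It assumes a constant failure rate (the exponential model), reuses the 2,500-hour MTBF from the earlier example, and uses hypothetical function names.

```python
import math


def failure_rate(mtbf_hours):
    """Failure rate (failures per hour) as the inverse of the MTBF."""
    return 1.0 / mtbf_hours


def reliability(t_hours, mtbf_hours):
    """Probability of operating for t_hours without a failure."""
    return math.exp(-t_hours / mtbf_hours)


mtbf = 2500.0
print(failure_rate(mtbf))                 # 0.0004
print(round(reliability(500, mtbf), 4))   # 0.8187
print(round(reliability(mtbf, mtbf), 4))  # 0.3679
```

The last line illustrates that, under this model, only about 37% of units are expected to run for a full MTBF period without failing.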
Mean time to recovery (MTTR)
MTTR is the average time that a device will take to recover from any failure.
Examples of such devices range from self-resetting fuses (where the MTTR would be
very short, probably seconds), up to whole systems which have to be repaired or
replaced.
MTTR can range from a few milliseconds, as in the case of an uninterruptible power
supply (UPS), to many hours or even days in the case of application software or
complex machinery.
The time taken to restore the system to normal operation includes the time needed
to diagnose the problem as well as the time needed to rectify it. When the failure is
predictable and well documented, the MTTR can be considerably reduced. On the
other hand, if the system fails unexpectedly, the time taken to diagnose the problem
might itself be quite high. Sometimes improper diagnosis can lead to faulty repairs
that complicate matters and lengthen the recovery period. All of these factors can
raise the MTTR of the system.
Some systems have redundancy built into them so that when one subsystem fails,
another takes its place and keeps the whole system running. While the overall
system has a zero MTTR, the faulty subsystem still needs to be repaired or replaced
and hence the subsystem alone has a non-zero MTTR.
When the MTTR is built into a maintenance contract, a lower MTTR would
normally entail a higher cost since the service provider has to ensure that the system
is restored within a shorter period of time. Hence the service buyer has to pay more
for this quicker turnaround time.
System reliability is a matter of importance to a wide range of industries. Be it the
manufacture of automobiles, aeroplanes and rockets or the creation of complex
software for the smooth running of a major business corporation, system reliability is
an area of great concern for the creators as well as the users of the system. MTTR is
therefore a vital parameter: it indicates how soon things will get back to normal,
which has a great bearing on the overall stability of the system.
Misconceptions about MTBF
A common misconception about MTBF is that it is equivalent to the expected
number of operating hours before a system fails, or the “service life”. It is not
uncommon, however, to see an MTBF number on the order of 1 million hours, and it
would be unrealistic to think the system could actually operate continuously for
over 100 years without a failure. The reason these numbers are often so high is
that they are based on the failure rate of the product while it is still in its “useful
life” or “normal life”, and it is assumed that the product will continue to fail at this
rate indefinitely. While in this phase of its life, the product experiences its
lowest (and constant) rate of failure. In reality, wear-out modes of the product would
limit its life much earlier than its MTBF figure. Therefore, no direct correlation
should be drawn between the service life of a product and its failure rate or MTBF. It
is quite feasible to have a product with extremely high reliability (MTBF) but a short
expected service life.
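The scale of a 1 million hour MTBF can be made concrete with a brief Python sketch. It assumes continuous operation (8,760 hours per year) and a constant failure rate during the useful-life phase; the function names and the population of 1,000 units are hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours, continuous operation assumed


def mtbf_in_years(mtbf_hours):
    """Express an MTBF figure in years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def expected_failures_per_year(mtbf_hours, unit_count):
    """Expected yearly failures across a population during useful life."""
    return unit_count * HOURS_PER_YEAR / mtbf_hours


print(round(mtbf_in_years(1_000_000), 1))                     # 114.2
print(round(expected_failures_per_year(1_000_000, 1000), 2))  # 8.76
```

Read this way, the figure describes how often failures occur across a population of units in service, not how long any single unit will last.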
Application:
One area in particular where reliability is critical is the design of mission-critical
facilities that house IT and telecommunications equipment. When minutes of
downtime can negatively impact the market value of a business, it is crucial
that the physical infrastructure supporting this networking environment be
reliable. The business reliability target may not be achieved without a solid
understanding of MTBF.