White Paper On MTBF

FAILURE RATE
• The number of failures of an item within the population per unit of operation (time, cycles, miles, runs, etc.)
ELECTRONIC SYSTEM RELIABILITY - WHY IMPORTANT?
• PROBLEMS
– Electronic systems involve very large numbers of very similar components.
– The designer has little control over component production and manufacture, and must specify catalogue items.
– The designer has little control over device reliability.
– Control of the production process is a major determinant of reliability.
– It is difficult to test for electronic component defects that do not immediately affect performance.
• SOLUTION: Very close attention must be paid to electronic part reliability. The design must involve a reliability team.
OUTLINE
• DEFINITIONS
• CAUSES OF ELECTRONIC COMPONENT FAILURE
• PREDICTION METHODS - TEST
• MIL-HDBK-217 PREDICTION METHODS - CALCULATIONS
– PARTS STRESS ANALYSIS PREDICTIONS
– PARTS COUNT RELIABILITY METHOD
– LIMITATIONS
• ADDITIONAL INFORMATION
– Other Failure Rate Data Sources
– Arrhenius Model
DEFINITIONS
• OPERATING STRESS
– The actual stress (or load) applied during operation of the part (e.g., voltage for capacitors, dissipated power for resistors).
• RATED STRESS
– The manufacturer's rating for the part.
• STRESS RATIO
– Ratio of operating stress to rated stress.
• PART GRADES
– Grade 1, 2, etc. designate high-quality standard parts.
– JAN, Industrial and Commercial Grades are designations for other parts that can be used.
BACKGROUND
• Reliability engineering and management grew up largely in response to the problems of electronic equipment reliability.
• Many reliability techniques have been developed from electronics applications.
CAUSES OF ELECTRONIC COMPONENT FAILURES
Electronic Failures = f(design, mfg. process, quality type, temperature, electrical load, vibration, chemical, stresses)
OTHER CAUSES OF ELECTRONIC COMPONENT FAILURES (con't)
Electrical Load
• Higher than anticipated voltage or current loads can cause arcing and other damage.
Vibration
• Shock and vibration can cause fatigue damage to even properly made components.
Chemical
• Contaminants introduced in the manufacturing process may eventually degrade an IC or other device.
• Environmental contaminants (moisture, etc.) may promote chemical attacks on components.
MIL-HDBK-217 PREDICTION METHODS
PARTS STRESS ANALYSIS PREDICTIONS
• This method is applicable when most of the design is completed and a detailed parts list including parts stresses is available.
• This model takes into account part quality, use environment, and the base failure rate (which includes electrical and temperature stresses).

PARTS STRESS ANALYSIS (con't)
$$\lambda_p = \lambda_b \, \pi_T \, \pi_A \, \pi_R \, \pi_S \, \pi_C \, \pi_Q \, \pi_E \quad \text{(failures per } 10^6 \text{ hours)}$$
where:
$\lambda_p$ = part failure rate (failures per $10^6$ hours)
$\lambda_b$ = base failure rate (often includes electrical and temperature stress)
$\pi_T$ = Temperature Factor (dimensionless, typically 1 to 150)
$\pi_A$ = Application Factor (dimensionless, typically 1 to 5)
$\pi_R$ = Power Rating Factor (dimensionless, typically 0.5 to 1.0)
$\pi_S$ = Voltage Stress Factor (dimensionless, typically 0.1 to 1.0)
$\pi_C$ = Construction Factor (dimensionless, typically 1 to 5)
$\pi_Q$ = Quality Factor (dimensionless, typically 0.7 to 8.0)
$\pi_E$ = Environmental Factor (dimensionless, typically 1 to 450)
Each device uses some or all of these factors. Other factors are also used.
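The part stress model above is a base rate multiplied by whichever factors apply to the device. A minimal Python sketch of that product follows. The function name and all numeric values are hypothetical and chosen only for illustration; real base rates and factors come from the handbook tables for the specific part type.

```python
from math import prod


def part_failure_rate(base_rate, pi_factors):
    """Return lambda_p in failures per 1e6 hours.

    base_rate  -- lambda_b, base failure rate (failures per 1e6 hours)
    pi_factors -- mapping of factor name to dimensionless value; only the
                  factors that apply to the device are supplied
    """
    for name, value in pi_factors.items():
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    return base_rate * prod(pi_factors.values())


# Hypothetical example values, not taken from MIL-HDBK-217 tables.
lambda_p = part_failure_rate(0.002, {"pi_T": 2.0, "pi_Q": 3.0, "pi_E": 4.0})
print(f"lambda_p = {lambda_p:.4f} failures per 1e6 hours")
```

Passing the factors as a mapping reflects the note that each device uses only some of the factors: an omitted factor simply does not enter the product.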

COMBINING RESULTS
• The general procedure for determining board-level failure rate is to:
– Sum the individually calculated failure rates for each component.
– Add this sum to a failure rate for the circuit board (which includes the effects of soldering parts to it).
– Account for the effects of connecting circuit boards together by adding a failure rate for each connector.
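The combining procedure is a plain sum of failure rates, which can be sketched as below. The function name and the sample rates are assumptions made for illustration; all rates are taken to be in failures per 10^6 hours.

```python
def board_failure_rate(component_rates, board_rate, connector_rates=()):
    """Combine failure rates (failures per 1e6 hours) for one board.

    component_rates -- individually calculated part failure rates
    board_rate      -- rate for the bare circuit board, including soldering
    connector_rates -- one rate per connector joining boards together
    """
    return sum(component_rates) + board_rate + sum(connector_rates)


# Hypothetical example values.
total = board_failure_rate(
    component_rates=[0.048, 0.010, 0.002],
    board_rate=0.005,
    connector_rates=[0.003, 0.003],
)
print(f"board-level failure rate = {total:.3f} failures per 1e6 hours")
```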

Non-operating Failures
• Parts continue to fail even when not in use. In general, electronic parts fail less frequently when not operating because failures are related to operating stress. But other components tend to degrade even when not in use. Examples:
– Hydraulic parts fail because organic rubber seals outgas and cross-link when exposed to heat and ultraviolet light.
– Solid rocket engines undergo chemical degradation and can develop cracks.
• $R_s = R_{\text{operating}} \times R_{\text{non-operating}}$
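The product of operating and non-operating reliability can be illustrated with a short sketch. It assumes a constant failure rate in each state, so that reliability over a period is exp(-rate × time); that exponential model is an assumption of this example and is not stated in the slides. Function names and values are hypothetical.

```python
import math


def reliability(rate_per_1e6_hours, hours):
    """Reliability over `hours`, assuming a constant failure rate."""
    return math.exp(-rate_per_1e6_hours * hours / 1e6)


def system_reliability(op_rate, op_hours, nonop_rate, nonop_hours):
    """R_s = R_operating * R_non-operating."""
    return reliability(op_rate, op_hours) * reliability(nonop_rate, nonop_hours)


# Hypothetical example: one year split into operating and dormant hours.
r_s = system_reliability(op_rate=10.0, op_hours=1000,
                         nonop_rate=1.0, nonop_hours=7760)
print(f"R_s = {r_s:.5f}")
```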
Parts Count Reliability Method
• Used early in the design or when detailed data is not available.
• Uses the Generic Part Type, a Quality Factor and an Environmental Factor.
• Information needed:
– (1) generic part types (including complexity for microcircuits) and quantities,
– (2) part quality levels, and
– (3) equipment environment.

Parts Count Reliability Method (con't)
$$\lambda_{EQUIP} = \sum_{i=1}^{n} N_i \, (\lambda_g \, \pi_Q)_i$$
where:
$\lambda_{EQUIP}$ = total equipment failure rate (failures per $10^6$ hours)
$\lambda_g$ = generic failure rate for the $i$th generic part
$\pi_Q$ = quality factor for the $i$th generic part
$N_i$ = quantity of the $i$th generic part
$n$ = number of different generic part categories in the equipment
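The parts count sum lends itself to a small worked example. In the sketch below, the `GenericPart` class, the part names, and every number are hypothetical; generic failure rates for a given environment would normally be read from the handbook's parts count tables.

```python
from dataclasses import dataclass


@dataclass
class GenericPart:
    name: str
    quantity: int          # N_i
    generic_rate: float    # lambda_g, failures per 1e6 hours
    quality_factor: float  # pi_Q, dimensionless


def equipment_failure_rate(parts):
    """Sum N_i * (lambda_g * pi_Q)_i over all generic part categories."""
    return sum(p.quantity * p.generic_rate * p.quality_factor for p in parts)


# Hypothetical bill of materials.
bom = [
    GenericPart("resistor", 40, 0.002, 1.0),
    GenericPart("capacitor", 25, 0.004, 1.0),
    GenericPart("microcircuit", 5, 0.05, 2.0),
]
lambda_equip = equipment_failure_rate(bom)
print(f"lambda_EQUIP = {lambda_equip:.2f} failures per 1e6 hours")
```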

LIMITATIONS
• RELIABILITY PREDICTION MUST BE USED INTELLIGENTLY, WITH DUE CONSIDERATION TO ITS LIMITATIONS
• FAILURE RATE MODELS ARE POINT ESTIMATES WHICH ARE BASED ON AVAILABLE DATA
– THEY ARE VALID FOR THE CONDITIONS UNDER WHICH THE DATA WERE OBTAINED AND FOR THE DEVICES COVERED.
– MODELS ARE INHERENTLY EMPIRICAL
1.1 Purpose - The purpose of this handbook is to establish and maintain consistent and uniform methods for estimating the inherent reliability (i.e., the reliability of a mature design) of military electronic equipment and systems. It provides a common basis for reliability predictions during acquisition programs for military electronic systems and equipment. It also establishes a common basis for comparing and evaluating reliability predictions of related or competitive designs. The handbook is intended to be used as a tool to increase the reliability of the equipment being designed.
1.2 Application - This handbook contains two methods of reliability prediction - "Part Stress Analysis" in Sections 5 through 23 and "Parts Count" in Appendix A. These methods vary in degree of information needed to apply them. The Part Stress Analysis Method requires a greater amount of detailed information and is applicable during the later design phase when actual hardware and circuits are being designed. The Parts Count Method requires less information, generally part quantities, quality level, and the application environment. This method is applicable during the early design phase and during proposal formulation. In general, the Parts Count Method will usually result in a more conservative estimate (i.e., higher failure rate) of system reliability than the Parts Stress Method.
1.3 Computerized Reliability Prediction - Rome Laboratory - ORACLE is a computer program developed to aid in applying the part stress analysis procedure of MIL-HDBK-217. Based on environmental use characteristics, piece part count, thermal and electrical stresses, subsystem repair rates and system configuration, the program calculates piece part, assembly and subassembly failure rates. It also flags overstressed parts, allows the user to perform tradeoff analyses and provides system mean-time-to-failure and availability. The ORACLE computer program software (available in both VAX and IBM compatible PC versions) is available at replacement tape/disc cost to all DoD organizations, and to contractors for application on specific DoD contracts as government furnished property (GFP). A statement of terms and conditions may be obtained upon written request to: Rome Laboratory/ERSR, Griffiss AFB, NY 13441-5700.
What is MTBF?
MTBF is an acronym for Mean Time Between Failures. In general, a higher MTBF number
indicates a more reliable product. Beyond this simple definition, you’ll find a wide variety of
special meanings.
In the military/aerospace industries, MTBF is defined by a specific set of calculations. The formula
for system longevity is based on the thermal, electrical and environmental stresses on each
component. The engineer evaluates the components and subassemblies in a particular product
by these formulas and produces an overall number called calculated MTBF.
Another way to compute MTBF is to evaluate product reliability based on the product’s actual
performance in the field. Instead of theoretical calculations of what might occur, field MTBF is a
measure of the numbers and types of failures that the products actually experience in real
applications.
At Liebert, we track two types of field MTBF statistics: critical bus MTBF and hardware MTBF. In
the next few paragraphs, we will explain each of these.
Critical Bus MTBF

Our primary focus is on critical bus MTBF. This measures how effectively the UPS, batteries and
bypass source can support the customer’s critical load without a failure attributable to the UPS or
System Control Cabinet.
Liebert maintains a database with information on every Series 600 UPS ever shipped. We also
keep records of all reported failures. Each quarter we evaluate the reliability information and tally
up the critical bus outages that were attributable to the UPS or System Control Cabinet.
Some events are excluded from the total. For example, if a UPS experiences an alarm condition
and successfully transfers the load to bypass, there is no critical bus outage.
Likewise if utility input power fails and the UPS and batteries support the critical load for the
proper number of minutes, the UPS has done its job. If the utility power (or backup Diesel
generator) is not available when the UPS has drained the batteries, the UPS -- with ample
warning to the operator -- will perform an orderly shutdown. This is not a chargeable critical bus
outage since the equipment performed as designed.
Other excluded situations are those caused by site conditions or operator error. For example, one
customer wired his facility fire alarm system to trigger the Emergency Power Off circuit on the
UPS. Unfortunately, he forgot to disconnect the circuit before performing a routine test of the fire
alarm system. This caused a critical bus outage, but did not count against UPS MTBF.
What have we done lately?

Each quarter we tally up the cumulative system operating hours and the total number of critical
bus outages reported since the introduction of the Series 600 UPS.
As of this writing, we have records of more than 7,000 Series 600 modules in more than 5,500
systems. Cumulative system operating hours exceed 220 million. Since shipments began in
1989, we have records of just 80 critical bus failures. Considering our exposure is approximately
4 million system operating hours per month, this is a remarkably small number of failures.
We compute our field MTBF numbers by dividing system operating hours by “failures plus one.”
We do this to be conservative and to be consistent with earlier published documents. Dividing
220 million hours by 81 (80 + 1) gives us a number considerably in excess of 2 million hours. We
recognize that some Series 600 sites are not under contract to Liebert Global Services and might
not be reporting all failures. Therefore we choose not to advertise the exact calculated number.
“In excess of one million hours” is sufficient.
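The "failures plus one" calculation described above can be reproduced in a few lines. The function name and the `conservative` flag are assumptions of this sketch; the input figures (220 million hours, 80 failures) are the ones quoted in the text.

```python
def field_mtbf(operating_hours, failures, conservative=True):
    """Field MTBF in hours.

    With conservative=True the divisor is failures + 1, which lowers the
    estimate slightly and keeps it defined when no failures are recorded.
    """
    divisor = failures + 1 if conservative else failures
    if divisor <= 0:
        raise ValueError("at least one failure is needed when conservative=False")
    return operating_hours / divisor


critical_bus_mtbf = field_mtbf(220e6, 80)  # 220 million hours / 81
print(f"critical bus field MTBF = {critical_bus_mtbf:,.0f} hours")
```

The result is roughly 2.7 million hours, consistent with the statement that the figure is considerably in excess of 2 million hours.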
Module MTBF
The other way we track reliability is the field MTBF of the UPS modules. For these purposes, we
count every type of module or System Control Cabinet failure that causes the module to take
itself off-line. As before, we exclude incidents of operator error, site problems or instances of
shutdown after successful discharge of batteries.
To compile this number, we have taken various sample periods. For a challenge, one of the
periods was chosen to coincide with one of the worst heat waves on record in large portions of
the Midwest and Northeast. A difficult test indeed!
During the sample periods, Series 600 UPS modules accumulated approximately 6 million
operating hours and 35 hardware failures. Of these, only one caused a critical bus outage. The
other 34 events featured the UPS successfully transferring the load to the bypass source.
Dividing 6 million hours by 35 gives a module MTBF of approximately 170,000 hours.
Methodology
Description of Methodology
The parts count method is a technique for developing an estimate or
prediction of the average life, the Mean Time Between Failures (MTBF),
of an assembly. It is a prediction process whereby a numerical
estimate is made of the ability, with respect to failure, of a design to
perform its intended function. Once the failure rate is determined,
MTBF is easily calculated as the inverse of the failure rate, as follows:
$$\text{MTBF} = \frac{1}{FR_1 + FR_2 + FR_3 + \cdots + FR_n}$$
where $FR_i$ is the failure rate of the $i$th component and $n$ is the number of components in the system.
The general procedure for determining a board level (or system level)
failure rate is to sum individual failure rates for each component. For
MIL-HDBK-217, the summation is then added to a failure rate for the
circuit board, which includes the effect of solder joints. Component
failure rates are provided by MIL-HDBK-217, "Military Handbook,
Reliability Prediction of Electronic Equipment", as standard part failure
rate models or directly from the manufacturers.
The failure rates presented apply to equipment under normal operating
conditions, i.e., with power on and performing its intended function in
its intended environment. Consideration is given to various
environments, component quality, and thermal aspects.
The Equations
A sample calculation for integrated circuits taken from MIL-HDBK-217 is
as follows:
$$\text{Failure Rate} = (C_1 \, \pi_T + C_2 \, \pi_E) \, \pi_Q \, \pi_L$$
Each factor in this equation is dependent upon a certain part
parameter. The end result of this equation is the failure rate of the
integrated circuit.
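As a rough illustration of how the integrated circuit equation is evaluated, the sketch below plugs in made-up numbers. In MIL-HDBK-217, C1 is the die complexity failure rate, C2 the package failure rate, and the pi terms are the temperature, environment, quality and learning factors. The function name and all values here are hypothetical; actual values come from the handbook tables for the specific device.

```python
def ic_failure_rate(c1, c2, pi_t, pi_e, pi_q, pi_l):
    """Failure rate = (C1 * PiT + C2 * PiE) * PiQ * PiL, in failures per 1e6 hours."""
    return (c1 * pi_t + c2 * pi_e) * pi_q * pi_l


# Hypothetical example values, not taken from the handbook.
rate = ic_failure_rate(c1=0.01, c2=0.005, pi_t=0.5, pi_e=2.0, pi_q=1.0, pi_l=1.0)
print(f"IC failure rate = {rate:.3f} failures per 1e6 hours")
```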
Failure Rate, MTBF, and FITs

For this discussion, we will assume that the resulting failure rate is
shown in failures per million hours. This is simply the number of
failures that you would expect to have in a million hours of operation of
your equipment. Failure rates for many basic devices are well below 1
failure per million hours, so these values may seem insignificant. But if
you have hundreds of parts in your design and have a thousand
systems operating in the field, you can see that the failure rates will
quickly add up. MTBF, or Mean Time Between Failures, is the inverse of
the failure rate and is the average time between failures. It is
calculated from the failure rate as follows:
MTBF = 1,000,000/Failure Rate
You can choose the units in which the failure rate is shown. Another
common unit used, besides failures/million hours, is failures per billion
hours which is also known as FITs (Failures In Time).
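The unit conversions above, and the way small per-part rates add up across a fleet, can be sketched as follows. The function names, the part count, the per-part rate and the fleet size are hypothetical, and the fleet estimate assumes a constant failure rate with every system running continuously.

```python
HOURS_PER_YEAR = 8760


def mtbf_hours(failures_per_million_hours):
    """MTBF = 1,000,000 / failure rate (rate in failures per million hours)."""
    return 1_000_000 / failures_per_million_hours


def to_fits(failures_per_million_hours):
    """Convert failures per million hours to FITs (failures per billion hours)."""
    return failures_per_million_hours * 1000


def expected_fleet_failures(failures_per_million_hours, systems, hours):
    """Expected failure count across a fleet, assuming a constant failure rate."""
    return failures_per_million_hours * 1e-6 * systems * hours


# Hypothetical design: 300 parts at 0.01 failures per million hours each.
system_rate = 300 * 0.01
print(f"system MTBF = {mtbf_hours(system_rate):,.0f} hours")
print(f"system rate = {to_fits(system_rate):,.0f} FITs")
print(f"expected failures per year across 1,000 systems = "
      f"{expected_fleet_failures(system_rate, 1000, HOURS_PER_YEAR):.1f}")
```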
What is MIL-HDBK-217?
MIL-HDBK-217 is a reliability prediction standard originally developed for defense and aerospace
related organizations, but later adopted by many commercial and industrial companies. Many
times referred to simply as 217, MIL-HDBK-217 includes mathematical reliability models for
nearly all types of electrical and electronic components. These reliability models are based on
parameters of the components such as number of pins, number of transistors, power dissipation,
and environmental factors. Results from MIL-HDBK-217 are provided as both a failure rate and
as an MTBF (Mean Time Between Failures) where the MTBF is the mathematical inverse of the
failure rate.
