RELIABILITY

The concept ‘reliability’ is often confused with another concept: ‘quality’. The quality of a component, product or service (generally speaking, a ‘system’) is determined by the degree to which the properties of that system are within predetermined and specified tolerances. If the system specification contains no requirements with regard to the expected life, the quality only pertains to the state of the system at the time of delivery. If, however, the properties of the system are also of recognized importance after the time of delivery, then the fraction of the total number of supplied products that still functions in accordance with the specifications at a time after delivery expresses the reliability of that system.

The following section defines exactly what is understood by the reliability of a system.

Definitions
In this manual the reliability of a system shall be the probability that this system uninterruptedly performs certain (accurately) specified functions during a stated interval of a life variable, on the condition that the system is used within a certain specified environment. This general definition contains six elements, which are explained briefly below:
a) Reliability: This is a statistical probability which is usually denoted as R(t). It is often confused
with the concept ‘quality’. Both concepts originated in the area of quality control, from which
reliability engineering later emerged as a separate field of specialization.

b) Probability: One should distinguish predicted, or a priori, reliability, which is defined as a sheer likelihood, and proven, or a posteriori, reliability, which is a retrospective certainty and is defined as the fraction of surviving systems. For a future design one can only predict; afterwards, in a case history for example, one has certainty.

c) System: A system encompasses a collection of elements (components, units, and modules) between which there is a mutual interaction (interconnection) and which can be separated from the environment of that system (system boundaries). The mutual interaction between the elements of a system realizes the system function, which can, in general, be divided into a number of specified attributes or properties. The designation ‘system’ does not only imply technical systems, but also, for example, biological organisms, organizations, and services. For convenience we will restrict our examples to technical systems.
d) Specified function: The purpose of a certain system is reflected by the system functions, which in turn consist of one or more specified properties or attributes. A system function (for example amplification) can be separated into a number of properties (e.g. voltage amplification 100, bandwidth 2 MHz) which are subjected to tolerances (voltage amplification 100 ± 5%, bandwidth > 2 MHz).

Table 2.1 An example of a system (instrumentation amplifier) with a certain function (voltage amplification) which is specified. The environment in which the instrument should be used is also specified. The reliability specification is given as the expected average life.

NAME                        Instrumentation Amplifier
MANUFACTURER                Xxx Corporation
MODEL NUMBER                3456-B
ALL SPECIFICATIONS TRACEABLE TO US BUREAU OF STANDARDS

Function        Voltage amplification
Specification   Gain                            100 ± 5%
                Frequency range (−3 dB)         DC to 2 MHz
                Noise (referred to input)       < 1.5 nV/√Hz
                Input impedance                 > 1 MΩ
                Output impedance                < 0.1 Ω
                Nonlinearity (input < 1 V)      < 10⁻³
                Max. output current             > 100 mA (short-circuit protected)
                Required line power             < 42 VA
Environment     Temperature range
                  Operational                   0 °C to 50 °C
                  Storage                       −40 °C to 75 °C
                Humidity range                  < 95%, no condensation
                Altitude
                  Operational                   < 4.5 km
                Mechanical shocks               < 50 m/s²
                Line voltage range              120 V +5%, −10%
                Line frequency range            48 Hz to 440 Hz
Reliability     Mean time to failure            5 years (no maintenance)

In Table 2.1 the specifications of an analog measurement instrument are given. If one or more specified
properties exceed the tolerance intervals the system is no longer reliable; it has failed. In the case of analog
systems (here the amplifier) the system may still be able to function, but outside the tolerances. In systems
operating with binary signals (digital systems) one usually sees that a certain function (for example, access
to a background memory) or a property of it (the ability to store information) ceases completely, i.e. can no
longer be used, after a failure has occurred. Therefore, the temptation to continue using a failed system is
not as strong here.
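
As a small illustration of this tolerance check, the sketch below compares a set of measured amplifier properties against a few limits taken from Table 2.1 (gain 100 ± 5%, bandwidth > 2 MHz, output impedance < 0.1 Ω). The measured values and the property names are hypothetical; this is only a minimal sketch of the idea, not part of the original specification.

# Minimal sketch (assumed values): flag a system as failed when any
# specified property falls outside its tolerance interval (Table 2.1).

SPEC = {
    # property: (lower limit, upper limit); None means unbounded on that side
    "gain":            (95.0, 105.0),   # 100 +/- 5 %
    "bandwidth_MHz":   (2.0,  None),    # > 2 MHz
    "output_imp_ohm":  (None, 0.1),     # < 0.1 ohm
}

def has_failed(measured: dict) -> bool:
    """Return True if any measured property violates its tolerance."""
    for name, (lo, hi) in SPEC.items():
        value = measured[name]
        if lo is not None and value < lo:
            return True
        if hi is not None and value > hi:
            return True
    return False

# Hypothetical measurement of one delivered amplifier
print(has_failed({"gain": 93.8, "bandwidth_MHz": 2.4, "output_imp_ohm": 0.05}))
# -> True: the gain has drifted outside 100 +/- 5 %, so the system has failed
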
e) Life variable: In almost all cases the elapsed time will be the life variable. This may be calendar time, but also accumulated usage time (operating time). The time that the system is not in use must also be accounted for, however, if it contributes to a shortening of the system’s life. The total time is then t = te + a·to, in which te is the operating time and to the time that the system is not in use. The coefficient a, which indicates the severity of ‘non-use’, is almost always smaller than 1. However, there are cases in which systems out of operation have, per unit of time, a greater mortality than when in operation. Just think of electrolytic capacitors, of the effects of condensation in systems that are not in use, and of people with a task too light or no task at all, who more often make mistakes from plain boredom. Besides time, the life variable may also be the number of times a system is switched on and off (relays), the number of load changes (fatigue fractures in airplane wings, landing gear, jet turbine blades, etc.), or it may be the distance travelled (cars). (A small numerical sketch of this bookkeeping is given after this list.)

f) Specified environment: Every system is placed in a certain environment. All elements that are not part of the system belong to this environment, thus usually also the user and the rest of the installation of which the system in question is a part. If a system is put in the wrong environment (i.e. outside the specified environment parameter ranges), either on purpose or inadvertently, the system may fail or age more quickly. Examples are an environment that is too hot or too wet, a supply voltage that is too high, input signals that are too large, or a load that is too great or too small (applying full throttle while the car’s gear is in neutral). This so-called misuse of a system outside the specified environment cannot be accurately forecast by the designer and must therefore be excluded from the reliability definition.
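
To make the life-variable bookkeeping of point (e) concrete, here is a minimal sketch with assumed numbers: it accumulates the effective life t = te + a·to and estimates the a posteriori reliability of point (b) as the fraction of surviving systems. The failure times and the value of a are hypothetical.

# Minimal sketch with assumed numbers: effective life t = te + a * to,
# where te is operating time, to is idle time and 'a' weights the
# severity of non-use (usually a < 1).

def effective_life(t_e_hours: float, t_o_hours: float, a: float = 0.3) -> float:
    return t_e_hours + a * t_o_hours

# A posteriori reliability at time t: fraction of delivered systems still
# functioning within specification at that time.
def a_posteriori_reliability(failure_times: list[float], t: float) -> float:
    survivors = sum(1 for tf in failure_times if tf > t)
    return survivors / len(failure_times)

# Hypothetical field data: effective lives (in hours) at which 8 units failed
failures = [1200.0, 3500.0, 4100.0, 5200.0, 6900.0, 8000.0, 9400.0, 12000.0]
t = effective_life(t_e_hours=3000.0, t_o_hours=2000.0, a=0.3)   # 3600 h
print(f"R({t:.0f} h) = {a_posteriori_reliability(failures, t):.2f}")  # 0.75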

NB: In practice most systems fail due to misuse, either by the user or by the designer who wrongly applies the components in the system; so most systems fail because of human error.

In the above, the definition of reliability has been explained in detail. It turns out that no statement about the reliability of a system can be made without an explicit, clearly formulated description of the system under observation, the system functions, and the allowed environment. For example, what is the reliability of a human being? Is a human outside the specifications if he or she has a headache?
2.1 METHODS FOR INCREASING RELIABILITY OF ENGINEERING SYSTEMS
There are several ways in which the inherent reliability of a system can be assured. The inherent reliability is the reliability intrinsic to the system, which will indeed be realized in the field provided the system is not misused. In this section the most important measures that can be taken to secure a high inherent reliability are discussed.

a) The introduction of reliability as one of the aims of the design, in as early a phase of the system design as possible. This early introduction is necessary because, if reliability is only introduced in the late phases where the design is final or nearly final, the only thing a designer can do is to resort to the use of highly reliable (and therefore expensive) components, apply redundancy at the system level (which is very ineffective), or improve the weakest link in the chain. These are all methods that are not very cost effective.

b) The choice of those technical means and technologies that can easily realize the required system functions without necessitating a complex configuration. After the choice of a proper technology or a proper combination of technologies, one should be able to design the system with a configuration of minimal quantitative and qualitative complexity. The design must also aim to be tolerant of variations with time in the properties of the less critical components.

c) The application of derating. Derating is the reduction of the operational and environmental stresses to which the components of a system are exposed. The components are operated well below their maximum ratings, for example by using more components to share the load or by utilizing other stress-reducing measures.

d) The thorough testing of the system prototype for unreliability and the interim inspection of the
systems for flaws during production.

e) The introduction of a burn-in period to trace early failures by running the system for a period of time, possibly under increased stress.

f) The conduction of life tests resulting in failure-rate data which can be used to adjust the initial design. More failure-rate data are obtained from the field in a later phase and should be reported back to the designer. These actual data are used for validity studies of the life tests which have been conducted and for use in later designs or design updates.
g) The use of redundancy. Incorporating redundancy into a system means providing alternative means for the realization of the required system functions when the primary means have failed. To avoid dependent errors, the redundant (sub)systems should preferably realize the required function in a different way than the primary subsystems. They should consist of different components and be made by different manufacturers. If the reliability of the primary (sub)systems is higher, a redundant system connected in parallel has a greater effect, i.e. the reliability of the combination is increased to a greater extent. Because of this, redundancy should be used in the system at a hierarchical level that is as low as possible, so preferably at component level (a numerical sketch comparing the two levels is given after this list).

h) The introduction of preventive maintenance where this is possible. Preventive maintenance aims to avoid system breakdown. Because preventive maintenance is usually conducted according to a predetermined plan, the costs are lower than those of corrective maintenance (repairs). Also the costs resulting from an unscheduled stop of the system due to a sudden breakdown are reduced. However, preventive maintenance is not useful in all systems. Moreover, some corrective maintenance will always be necessary.

i) The establishment of an organization structure aimed at designing, developing, producing, and maintaining a reliable product. The principal management aspects involved here are organization, training, logistics and coordination of manpower and means, etc.
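
To illustrate the claim in point (g) that redundancy is most effective at a low hierarchical level, the sketch below compares duplicating each component with duplicating the whole series system. The component reliability and the number of components are assumed values, not data from the text.

# Minimal sketch, assumed reliabilities: compare redundancy applied at
# component level with redundancy applied at system level for a series
# system of n identical components.

def series(reliabilities):
    r = 1.0
    for x in reliabilities:
        r *= x
    return r

def parallel(reliabilities):
    q = 1.0
    for x in reliabilities:
        q *= (1.0 - x)           # probability that all parallel branches fail
    return 1.0 - q

R = 0.9   # assumed reliability of one component
n = 4     # components in series

# (1) Duplicate each component, then put the n pairs in series
component_level = series([parallel([R, R])] * n)      # ~ 0.961

# (2) Build the series system first, then duplicate the whole system
system_level = parallel([series([R] * n)] * 2)        # ~ 0.882

print(f"component-level redundancy: {component_level:.3f}")
print(f"system-level redundancy:    {system_level:.3f}")
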

2.2 RELIABILITY EVALUATION METHODS


Introduction:
The reliability engineering evaluation methods usually comprise a graphical representation of the causal connection between the events that may give rise to the occurrence of a certain undesired event and that event itself. This representation takes the form of oriented graphs with a tree structure. The nodes in the structure represent events; the transitions between the nodes represent the occurrence of those events.

The reliability models allow us to tackle structural failures (hardware failures) that are the cause of an undesired deviation in the system performance which, if large enough, would in turn result in the respective system function exceeding the tolerance stated in the system specification for that function. There are two different groups of event-oriented evaluation methods. In one group the relations between cause and event are set up in the causal direction: first the cause is plotted and subsequently the resulting event. These evaluation methods are therefore called forward methods. One starts with all possible failure events in the components of a system (of which there usually are many) and one finishes with the single event that the system fails (of which there is only one). The graphical presentation of such an evaluation gives a graph that starts wide and ends narrow. Such an evaluation is therefore often called a bottom-up evaluation. Another name for this evaluation method stems from the logical reasoning of induction, in which one derives the general (or more global) information from the particular (or detailed) information. Since this group of evaluation methods is based on such derivation by inductive reasoning, they are also called inductive methods.

Figure 2.1 (a) Inductive, forward or bottom-up graph: the causes are at the top, the consequences at the bottom.

Figure 2.2 (b) Deductive, backward or top-down graph: the consequences are at the top, the causes at the bottom.

The figures show the structure and the orientation of the graphs used for the evaluation of system reliability: (a) the inductive, forward or bottom-up graph and (b) the deductive, backward or top-down graph. The arrows indicate the direction in which the tree-like graphs are built up.

The second group of evaluation methods is characterized by the fact that its basic reasoning is anti-causal: one starts with the resulting event and traces this back to all possible causes. After what has been discussed above, it will be clear that these methods are also called backward, top-down or deductive methods.
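
As a small, assumed-number illustration of the bottom-up (inductive) direction of reasoning, the sketch below combines independent component failure probabilities through OR/AND relations up to the single undesired top event. The component names and probabilities are hypothetical.

# Minimal sketch, assumed probabilities: a bottom-up (inductive) evaluation
# that combines independent component failure events into the single
# system-failure event.

def any_of(probs):        # OR of independent failure events
    q = 1.0
    for p in probs:
        q *= (1.0 - p)
    return 1.0 - q

def all_of(probs):        # AND of independent failure events
    r = 1.0
    for p in probs:
        r *= p
    return r

# Hypothetical component failure probabilities over one year of operation
sensor, controller, valve_a, valve_b = 0.02, 0.01, 0.05, 0.05

# The system fails if the sensor OR the controller fails,
# OR if both (redundant) valves fail.
p_system_failure = any_of([sensor, controller, all_of([valve_a, valve_b])])
print(f"P(system failure) ~ {p_system_failure:.4f}")   # ~ 0.0322
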

2.3 FMECA ANALYSIS


The most important member of the group of causal evaluation methods is the so-called Failure Mode, Effects and Criticality Analysis (FMECA).

Example
A petrochemical plant requires an investment of many hundreds of millions of dollars. Relatively short production breaks in an industry like this may amount to losses of millions of dollars per year. For that reason an availability of very close to 100% is required. In this branch of industry, which produces plastic and rubber from crude oil, good process engineering is of key importance. For properly controlling the complex process, the control engineers make intensive use of computers, transmitters, terminals and control equipment.

Level A is required to function most reliably, since the instruments at this level are closest to the process and have to pass on data to higher hierarchical levels. These instruments are usually composed of modules and are standardized to a very high degree, so that they are mutually interchangeable. In addition, at critical points these instruments have been implemented redundantly. Level B, that is the level of the process control computer, can do with a lower reliability. This is because the instruments at level A are also able to operate autonomously (for not too long a period of time). This will, of course, make the process less efficient; the autonomous mode of operation will therefore only be used to bridge the time it takes to get level B back on its feet. The process computer at level B combines data from various parts of the process and converts these data into control data for the instruments at level A. Further, so-called data logging and reporting are important functions of this B level. For that reason, a modular composition and the use of some redundancy are usually prescribed at level B. However, the much greater complexity at this level lowers the availability realized. At the highest level, level C, the breakdown of a system is more a nuisance than a cause of catastrophic consequences. The availability may therefore be lower here than at level B.

Drawing: An example of the hierarchy in a control system used in the petroleum industry. Level A is the level with the highest required availability; level C is that with the lowest availability.

Figure 2.3 The control loop of level A.

Figure 2.4 The control loop of level A shown in more detail: (a) functional representation.

Figure 2.5 The control loop of level A shown in more detail: (b) electrical representation (E is the power supply).

Table 2.2 gives, for the components of the detailed control loop, the failure mode, the consequence of the failure, the degree of risk (criticality) associated with that failure and the relative failure frequency.

Table 2.2 Failure mode, effect, criticality and frequency table

S/N  Component             Failure mode                  Effect                           Criticality   Frequency
1    Measuring orifice     (i) contamination             Flow controlled to wrong level   Critical      High
                           (ii) nearly blocked           Instability                      Marginal      Very low
2    Pressure transmitter  (i) out of calibration        Wrong flow                       Critical      Low
                           (ii) zero-offset failure      Wrong flow                       Critical      Low
                           (iii) short circuit           Control stops                    Negligible    Very low
3    Controller            (i) no set point              Control stops                    Negligible    Very low
                           (ii) amplification too high   Tends to become unstable         Marginal      Very low
                           (iii) amplification too low   Slow                             Negligible    Very low
                           (iv) shorted                  Process stops                    Marginal      Very low
4    Control valve         (i) stuck (faulty)            Control stops                    Negligible    Low
                           (ii) fully open               Explosion                        Catastrophic  Very low
                           (iii) completely closed       Process stops                    Marginal      Low
5    Power supply E        Defective                     Process stops                    Marginal      Very low

We may consider the FMECA method as being composed of a failure mode analysis (FMA), followed by a failure effect analysis (FEA) and a failure criticality analysis (FCA).

a) FMA: We have considered all important system components and all failure modes of these components. In the table just given not all failure modes have been detailed, of course. In general, one is concerned with much larger matrices than the one given in the table; the computer is a very useful tool for these real-life analyses.
b) FEA: With this sub-analysis the consequences for the system operation are traced for each system component failing in a certain failure mode. It should be noted that in the example above the system apparently was already designed well, i.e. fail-safe. If the controller erringly allowed full flow, an explosion would be likely to occur here. This must be avoided, of course, and that is the case in this design: if the supply voltage E becomes too low, the control valve (held open against the spring) closes completely; if an input of the controller fails, it freezes the valve in the position it was in before the error; if the controller fails, the valve also closes; and so on.
c) FCA: The risk associated with a failure is an important quantity. After all, a system failure may result in expensive consequences or in consequences that are hazardous to human beings or the environment. For that reason several criticality levels are distinguished: negligible, marginal, critical and catastrophic. In the above example an incorrect flow may result in an overdose of one ingredient in the chemical process, which is dangerous. It has therefore been marked critical.

FMECA is a technique that can also be used to determine the maintenance needs of a system. In the example, contamination (clogging) of the measuring orifice results in a critical failure and occurs with a high frequency, so scheduled maintenance has to be performed at relatively brief intervals T. Furthermore, extra safety circuitry would be advisable in view of the high frequency of this contamination (for example, by measuring the turbulence in the flow resulting from oil and tar residues clogging the orifice plate). In addition, the pressure transmitter should be regularly re-adjusted and the control valve should be checked for operability.

In this way one can already come up with the maintenance program during the design phase of the system.
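
A minimal sketch of how part of an FMECA worksheet like Table 2.2 could be screened by computer to flag the failure modes that most need scheduled maintenance or extra safeguards. The numeric ranking scores are assumptions introduced for illustration, not part of the original method.

# Minimal sketch: represent part of the FMECA worksheet (Table 2.2) as
# records and rank the failure modes by an assumed risk score
# (criticality weight x frequency weight).

CRIT = {"negligible": 1, "marginal": 2, "critical": 3, "catastrophic": 4}
FREQ = {"very low": 1, "low": 2, "high": 3}

worksheet = [
    ("measuring orifice",    "contamination",      "critical",     "high"),
    ("measuring orifice",    "nearly blocked",     "marginal",     "very low"),
    ("pressure transmitter", "out of calibration", "critical",     "low"),
    ("control valve",        "fully open",         "catastrophic", "very low"),
]

ranked = sorted(worksheet,
                key=lambda row: CRIT[row[2]] * FREQ[row[3]],
                reverse=True)

for component, mode, crit, freq in ranked:
    print(f"{component:22s} {mode:20s} {crit:12s} {freq}")
# Orifice contamination comes out on top, matching the conclusion above that
# it needs brief maintenance intervals and extra safety circuitry.
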

2.4 MANAGING FAILURE ANALYSIS


To be a good failure analyst one must also be a good manager. After all, failure analysis or problem solving is more than just brainstorming a solution to an identified problem. Successful analysis can only be achieved when a structured technique that uncovers the facts of the incident being investigated is used and adhered to at every step of the analysis process. As the manager or Principal Analyst for the failure, your management skills will not only be put to the test but will be an integral part of the investigation.

2.4.1 Managing the Failure Definition


The first step in the analysis effort would be to clearly define what constitutes a failure. This may sound
simple but I can assure you that it is not. Ask anyone and they will all tell you that they know what their
failures are. Now explore a little deeper and you will find that they all know what’s breaking down but they
care for a different reason. The fact is we all tend to care for a different reason and there are many factors
that will directly affect the reason why we care thereby changing our failure definition. For example,
consider a plant whose production levels are low and whose maintenance costs, downtime, and parts costs are high. In this
example the Operations Manager considers the low production levels to be the failure, while the
Maintenance Manager considers the Mean Time between Failure (MTBF) and Mean Time to Repair
(MTTR) to be the failure. The Plant Manager considers the low bottom line to be the failure while the
maintenance staff cares about the number of times that they must repair the equipment. What we have here
is clearly a failure but a different failure definition at every level of the organization. Now add to the thought
process by considering another factor that affects how we feel about the failure; i.e., the business
environment. Low production levels in a non-sold-out condition are not as big a problem as high maintenance costs. Conversely, in a sold-out condition maintenance costs are not nearly as important as
production levels and downtime. The job of the Principal Analyst is to recognize these factors and apply
the necessary focusing tools (Impact – Effort Matrix, Decision by Pairs, Force Field Analysis, Failure
Modes and Effects Analysis, etc.) to uncover those failures that represent the greatest amount of potential
return or unrealized opportunity based on the right definition of failure for the facility.
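
Since the paragraph above mentions MTBF and MTTR as one possible failure definition, here is a minimal sketch, with a hypothetical event log, of how these figures and the resulting availability can be computed.

# Minimal sketch, hypothetical numbers: MTBF, MTTR and availability
# from a simple list of (uptime before failure, repair time) pairs in hours.

events = [(310.0, 6.0), (125.0, 2.5), (560.0, 12.0), (205.0, 4.0)]  # assumed

uptime = sum(u for u, _ in events)
repair = sum(r for _, r in events)
n_fail = len(events)

mtbf = uptime / n_fail                 # mean time between failures
mttr = repair / n_fail                 # mean time to repair
availability = mtbf / (mtbf + mttr)

print(f"MTBF = {mtbf:.1f} h, MTTR = {mttr:.1f} h, "
      f"availability = {availability:.3f}")
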

2.4.2 Managing the Scope of the Analysis


Don’t bite off more than you can chew! The size and scope of the analysis you intend to tackle should not
exceed the available resources for the analysis effort. Therefore, the scope of the analysis should be
directly proportional to the resources available to conduct the analysis. Always remember that the bigger
the scope, the bigger the analysis. Process- or system-related analyses tend to be the largest in size because of the many variables associated with the modes of failure, whereas single components tend to be the smallest due to the relatively few variables associated with a single item. The key is to determine what is really important and what you can reasonably manage. These are easily done if you
