
Software Redundancy.

Software redundancy involves additional software in fault-tolerant computers to manage recovery from errors, which can be either software design errors or hardware malfunctions. Various methods, including parallel programming and error detection techniques, are utilized to handle these errors, with a focus on maintaining redundant storage for critical data. Diagnostic programs are essential for locating faults and ensuring system reliability by testing for malfunctions proactively.

Uploaded by

Shiju Kp.

Software redundancy refers to all additional software installed in a system that would not be needed for a fault-free computer. Software redundancy plays a major role in most fault-tolerant computers. Even computers that recover from failures mainly by hardware means use software to control their recovery and decision-making processes. The level of software used depends on the recovery system design. The recovery design depends on the type of error or malfunction that is expected. Different schemes have been found to be more appropriate for the handling of different errors. Some can be accomplished most efficiently solely by hardware means. Others need only software, but most use a mixture of the two.

For a functional system, i.e., one without hardware design faults, errors can be classified into two varieties: (1) software design errors and (2) hardware malfunctions.

The first category can be corrected mainly by means of software. It is extremely difficult for hardware to be designed to correct for programmers’ errors. The software methods, though, are often used to correct hardware faults, especially transient ones. The reduction and correction of software design errors can be accomplished through the techniques outlined below.

Computers may be designed to detect several kinds of software errors [14, 15]. Examples include the use of illegal instructions (i.e., instructions that do not exist), the use of privileged instructions when the system has not been authorized to process them, and address violations. The last of these refers to reading or writing into locations beyond usable memory. These limits can often be set physically on the hardware. Computers capable of detecting these errors allow the programmer to handle the errors by causing interrupts. The interrupts route the program to specific locations in memory. The programmer, knowing these locations, can then add code there that branches to specific subroutines, which can handle each error in a specified manner.
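The interrupt mechanism described above can be pictured with a small simulation. This is only a sketch under stated assumptions: Python exceptions stand in for hardware interrupts, a dictionary stands in for the fixed memory locations the interrupts route to, and every name here (`Trap`, `TrapError`, `execute`, `run`, the opcode names, `MEMORY_SIZE`) is hypothetical rather than taken from the text.

```python
from enum import Enum, auto


class Trap(Enum):
    """The three error classes named in the text."""
    ILLEGAL_INSTRUCTION = auto()
    PRIVILEGE_VIOLATION = auto()
    ADDRESS_VIOLATION = auto()


class TrapError(Exception):
    """Stands in for a hardware interrupt raised on a detected error."""

    def __init__(self, trap, detail):
        super().__init__(f"{trap.name}: {detail}")
        self.trap = trap
        self.detail = detail


MEMORY_SIZE = 16                      # hypothetical limit of usable memory
KNOWN_OPS = {"LOAD", "STORE", "HALT_SYSTEM"}
PRIVILEGED_OPS = {"HALT_SYSTEM"}


def execute(op, address, memory, privileged=False):
    """Check one instruction the way the detecting hardware would."""
    if op not in KNOWN_OPS:
        raise TrapError(Trap.ILLEGAL_INSTRUCTION, op)
    if op in PRIVILEGED_OPS and not privileged:
        raise TrapError(Trap.PRIVILEGE_VIOLATION, op)
    if op in ("LOAD", "STORE") and not 0 <= address < MEMORY_SIZE:
        raise TrapError(Trap.ADDRESS_VIOLATION, str(address))
    return memory[address] if op == "LOAD" else None


# One handler slot per trap type, playing the role of the known
# memory locations where the programmer installs a branch.
handlers = {
    Trap.ILLEGAL_INSTRUCTION: lambda e: f"skipped unknown op {e.detail}",
    Trap.PRIVILEGE_VIOLATION: lambda e: f"denied {e.detail}",
    Trap.ADDRESS_VIOLATION: lambda e: f"blocked access to {e.detail}",
}


def run(op, address, memory, privileged=False):
    """Execute an instruction, routing any trap to its handler."""
    try:
        return execute(op, address, memory, privileged)
    except TrapError as err:
        return handlers[err.trap](err)
```

Replacing an entry in `handlers` corresponds to the programmer installing a different subroutine for that error type.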

Software recovery from software errors can be accomplished via several methods. As mentioned before, parallel programming, in which alternative methods are used to determine a correct solution, can be used when an incorrect solution can be identified. Some less sophisticated systems print out diagnostics so that the user can correct the program off line from the machine. This should only be a last resort for a fault-tolerant machine. Nevertheless, a computer should always keep a log of all errors incurred, memory size permitting.

© 2003 by Béla Lipták
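A minimal sketch of recovery through alternative methods, combined with an error log, might look like the following. It assumes the alternates are tried one after another rather than truly in parallel, and that an "acceptance test" is available to identify an incorrect solution. The routine names (`sqrt_buggy`, `sqrt_newton`, `sqrt_ok`) and the log size of 100 entries are illustrative assumptions, not details from the text.

```python
from collections import deque

# Bounded log: the oldest entries are dropped once it is full,
# which is one way to read "memory size permitting".
error_log = deque(maxlen=100)


def run_with_alternates(alternates, acceptance_test, x):
    """Try each alternate in turn and return the first accepted result."""
    for routine in alternates:
        try:
            result = routine(x)
        except Exception as exc:
            error_log.append((routine.__name__, x, repr(exc)))
            continue
        if acceptance_test(x, result):
            return result
        error_log.append((routine.__name__, x, f"rejected result {result!r}"))
    raise RuntimeError("all alternates failed")


def sqrt_buggy(x):
    """Hypothetical primary routine containing a design error."""
    return x / 2


def sqrt_newton(x):
    """Hypothetical alternate: Newton's iteration for the square root."""
    guess = x if x > 1 else 1.0
    for _ in range(60):
        guess = 0.5 * (guess + x / guess)
    return guess


def sqrt_ok(x, y):
    """Acceptance test: squaring the answer must reproduce the input."""
    return abs(y * y - x) <= 1e-9 * max(1.0, x)
```

Calling `run_with_alternates([sqrt_buggy, sqrt_newton], sqrt_ok, 9.0)` rejects the faulty primary result, records the rejection in `error_log`, and falls through to the alternate.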

Preventive measures used with software methods refer mainly to the use of redundant storage. Hardware failures often result in a garbling or a loss of data or instructions that are read from memory. If hardware techniques such as coding cannot recover the correct bit pattern, those words will become permanently lost. Therefore, it is important to at least duplicate all necessary program and data storage so that it can be retrieved if one copy is destroyed. In addition, special measures should be taken so that critical programs such as error recovery programs are placed in nonvolatile storage, i.e., read-only memory. Critical data as well should be placed in nondestructive readout memories. An example of such a memory is a plated-wire memory.
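The duplication of storage can be sketched in software as follows. The example assumes two dictionaries standing in for two independent memory banks and uses a CRC-32 checksum (Python's `zlib.crc32`) to decide which copy is intact. The class name `DuplicatedStore` and its methods are hypothetical, and the checksum is an added assumption, since the text only calls for duplication.

```python
import zlib


class DuplicatedStore:
    """Keeps two copies of every record, each with a CRC-32 checksum."""

    def __init__(self):
        # Two dictionaries stand in for two independent memory banks.
        self._copies = [{}, {}]

    def write(self, key, data: bytes):
        record = (data, zlib.crc32(data))
        for bank in self._copies:
            bank[key] = record

    def read(self, key) -> bytes:
        """Return the first copy whose checksum verifies, repairing the other."""
        for bank in self._copies:
            record = bank.get(key)
            if record is not None:
                data, checksum = record
                if zlib.crc32(data) == checksum:
                    self._repair(key, record)
                    return data
        raise OSError(f"all copies of {key!r} are lost or corrupted")

    def _repair(self, key, good_record):
        # Overwrite every bank with the verified copy.
        for bank in self._copies:
            bank[key] = good_record

    def corrupt(self, bank_index, key, garbage: bytes):
        """Test hook simulating a hardware fault that garbles one copy."""
        _, checksum = self._copies[bank_index][key]
        self._copies[bank_index][key] = (garbage, checksum)
```

A read succeeds as long as one copy still verifies; only when both copies are damaged is the data reported as lost.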

The second task of the software in fault tolerance is to detect and diagnose errors. Software error-detection techniques for software errors often can be used to detect transient hardware faults. This is important, since “a relatively large number of malfunctions are intermittent in nature rather than solid failures” [9]. Time-redundant processes, i.e., repeated trials, shall be used for their recovery.
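Time redundancy amounts to repeating an operation in the hope that an intermittent fault has cleared. The sketch below assumes a detected error surfaces as a Python exception and that three trials are enough to separate a transient fault from a solid failure. Both the retry limit and the `FlakySensor` class are illustrative assumptions.

```python
def retry(operation, attempts=3):
    """Repeat `operation` up to `attempts` times.

    Returns (result, failures), where `failures` lists the trials
    that raised before one succeeded.
    """
    failures = []
    for trial in range(1, attempts + 1):
        try:
            return operation(), failures
        except Exception as exc:
            failures.append((trial, repr(exc)))
    # Every trial failed: treat the fault as solid rather than transient.
    raise RuntimeError(f"fault persisted through {attempts} trials")


class FlakySensor:
    """Hypothetical device that fails transiently on its first N reads."""

    def __init__(self, bad_reads, value):
        self.bad_reads = bad_reads
        self.value = value

    def read(self):
        if self.bad_reads > 0:
            self.bad_reads -= 1
            raise OSError("parity error")
        return self.value
```

With `FlakySensor(2, 7.5)`, `retry(sensor.read)` succeeds on the third trial and reports the two earlier failures. A sensor that keeps failing exhausts the trials and is escalated as a solid failure.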

Software detection techniques do not localize the sources of the errors. Therefore, diagnostic test programs are frequently implemented to locate the module or modules responsible. These programs often test the extent of the faults at the time of failure or perform periodic tests to detect malfunctions before they manifest themselves as errors during program execution. Almost every computer system uses some form of diagnostic routines to locate faults. In a fault-tolerant system, the system itself initiates these tests and interprets their results, as opposed to the outside insertion of test programs by operators in other systems.
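A self-initiated diagnostic that localizes a fault to a module could be sketched as a pattern test over each memory module. The text does not prescribe a particular test, so the write-and-read-back patterns, the 8-bit cell width, and the names `MemoryModule`, `module_passes`, and `run_diagnostics` are all assumptions made for illustration.

```python
class MemoryModule:
    """Hypothetical 8-bit memory module; `stuck_bits` simulates stuck-at-1 cells."""

    def __init__(self, name, size=8, stuck_bits=0):
        self.name = name
        self.cells = [0] * size
        self.stuck_bits = stuck_bits

    def write(self, addr, value):
        # Stuck bits are forced to 1 regardless of what is written.
        self.cells[addr] = (value | self.stuck_bits) & 0xFF

    def read(self, addr):
        return self.cells[addr]


PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def module_passes(module):
    """Write and read back each pattern, restoring the original contents."""
    for addr in range(len(module.cells)):
        saved = module.read(addr)
        try:
            for pattern in PATTERNS:
                module.write(addr, pattern)
                if module.read(addr) != pattern:
                    return False
        finally:
            module.write(addr, saved)
    return True


def run_diagnostics(modules):
    """Return the names of the modules that fail the self-test."""
    return [m.name for m in modules if not module_passes(m)]
```

A fault-tolerant system would call something like `run_diagnostics` on its own schedule, or immediately after an error is detected, and act on the returned list without operator involvement.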
