US5260945A - Intermittent component failure manager and method for minimizing disruption of distributed computer system - Google Patents
Intermittent component failure manager and method for minimizing disruption of distributed computer system Download PDFInfo
- Publication number
- US5260945A US5260945A US07/721,143 US72114391A US5260945A US 5260945 A US5260945 A US 5260945A US 72114391 A US72114391 A US 72114391A US 5260945 A US5260945 A US 5260945A
- Authority
- US
- United States
- Prior art keywords
- working
- broken
- status
- signal
- filter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0817—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/02—Topology update or discovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/16—Multipoint routing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/24—Multipath
- H04L45/247—Multipath using M:N active or standby paths
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/28—Routing or path finding of packets in data switching networks using route fault recovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/48—Routing tree calculation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/44—Star or tree networks
- H04L2012/445—Star or tree networks with switching in a hub, e.g. ETHERNET switch
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0681—Configuration of triggering conditions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/35—Switches specially adapted for specific applications
- H04L49/351—Switches specially adapted for specific applications for local area network [LAN], e.g. Ethernet switches
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/55—Prevention, detection or correction of errors
- H04L49/555—Error detection
Definitions
- the present invention relates generally to communications between components of a distributed computer system, and particularly to a method of minimizing disruption caused by intermittent component failures.
- in a mesh connected local area network there are usually redundant interconnections between system components so that messages can be routed between any two network members in multiple ways.
- the network's switches monitor directly connected links and neighboring switches and set up appropriate tables so that messages are routed only through links and switches that are known to be available (i.e., which appear to be working properly). If any switch or link in the network is "not available” (e.g., not working, or disconnected), the network is configured to ignore the existence of these non-working components.
- each switch in the network includes hardware and software for automatically testing the status of the links connected to that switch.
- as with any self-diagnostic tool, it is not perfect: it cannot detect every type of failure, especially intermittent failures. Thus, as in most systems, the ultimate test of whether a component is working is actual use.
- Every change in the status of a component imposes a certain amount of overhead on the system, such as requiring that the system reconfigure itself.
- a component with a history of frequent, intermittent failure should only be reinstated when it has demonstrated that it can continuously remain in working condition for a period of time. Attempting to use a component that is broken can be harmful if it causes system users to lose information, or unnecessarily delays their work.
- One prior art technique for avoiding interruptions caused by intermittently failing components is to allow only a limited number of failures during a specified amount of time. For instance, one could allow any component to fail no more than ten times per hour. That is, it will be allowed to change from "working" to "broken” status no more than ten times per hour. After ten transitions from "working" to "broken” during any one hour period, the component is simply treated as being “broken” until the end of the one hour period. Then the process starts all over again. Thus, if the component is fixed in the middle of the one hour period, its recovery will be delayed, but the system will be spared possibly hundreds or thousands of failures by the component.
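For contrast with the skeptic mechanism described below, here is a minimal sketch of that prior-art quota scheme. The class and method names, and the fixed one-hour window reset, are illustrative assumptions rather than details taken from the patent.

```python
import time

class FailureQuota:
    """Prior-art scheme: allow at most `quota` working->broken transitions per
    `window` seconds; after that, treat the component as broken until the
    current window ends, then start counting again."""

    def __init__(self, quota=10, window=3600.0, clock=time.monotonic):
        self.quota = quota
        self.window = window
        self.clock = clock
        self.window_start = clock()
        self.failures = 0

    def record_failure(self):
        now = self.clock()
        if now - self.window_start >= self.window:
            # Start a fresh one-hour period.
            self.window_start = now
            self.failures = 0
        self.failures += 1

    def component_usable(self):
        # Once the quota is exhausted, the component stays "broken"
        # until the current window expires.
        if self.clock() - self.window_start >= self.window:
            return True
        return self.failures < self.quota
```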
- Requirement 1: a component with a good history must be allowed to fail and recover several times without significant penalty.
- Requirement 2: a component's average long term failure rate must not be allowed to exceed some predetermined low rate.
- Requirement 3: common behaviors shown by bad components should result in exceedingly low average long-term failure rates.
- Requirement 4: a component that stops being bad must eventually be forgiven its bad history.
- Requirement 1 is met because a low number of failures (e.g., less than ten) doesn't result in the component being unused for a long period of time.
- Requirement 2 is met because in the worst case, the long term failure rate cannot exceed a specified number of failures per hour.
- Requirement 4 is met because once a broken component is fixed, and any remaining recovery time period left over from when it was broken expires, its use is no longer prevented.
- Requirement 3 distinguishes the present invention from the prior art "ten failures per hour" mechanism. Regardless of the failure mechanism, this prior art technique will still allow a specified number of failures per hour.
- the present invention does better than this by providing for a recovery period that increases every time that component is allowed to be used by the system and then fails.
- the recovery time period is automatically increased (up to a predetermined maximum).
- conversely, when the component subsequently remains in working condition for long enough, the recovery period applied to later failures is decreased.
- the present invention is a status filter for limiting the impact of intermittently failing components on a computer system.
- a fault monitor coupled to one such component detects whether that component is working. Whenever the fault monitor detects that the component has failed, i.e., changed from working to broken status, the status filter transmits without delay a "broken" signal. Whenever the fault monitor detects that the component has changed from broken to working status, however, the status filter transmits a "working" signal only after a recovery time interval corresponding to a computed skepticism level, and only if the component does not fail during that recovery time interval.
- the status filter increases the computed skepticism level and redetermines the recovery time interval each time that the component fails after the status filter has transmitted a "working" signal. That is, the skepticism level is increased whenever the component fails after having been declared by the status filter to be working.
- the status filter also decreases the computed skepticism level and redetermines the recovery time interval when, after the status filter has transmitted a "working" signal, the component does not fail for at least a defined interval of time.
- the status filter is implemented as a state machine with three states: DEAD, WAIT, and GOOD.
- the status filter receives "working” and “broken” signals from a fault monitor, and responds by moving to a corresponding state and selectively transmitting filtered "working" and “broken” signals to the computer system.
- in the WAIT or GOOD states, the status filter responds to receiving a "broken" signal by moving to the DEAD state and retransmitting the "broken" signal without delay.
- the computed skepticism level is increased by the status filter when, in the GOOD state, the status filter receives a "broken" signal.
- the computed skepticism level is decreased by the status filter when, in the GOOD state, no "broken" signals are received for at least a defined interval of time.
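A minimal sketch of this three-state filter in Python follows. The class and method names are illustrative (not from the patent), and the wait and good timers are assumed to be driven externally, with their expirations delivered through explicit calls.

```python
DEAD, WAIT, GOOD = "DEAD", "WAIT", "GOOD"

class Skeptic:
    """Status filter: passes 'broken' along immediately, but delays 'working'
    by a recovery interval derived from the current skepticism level."""

    def __init__(self, emit, max_level=20, initial_level=8):
        self.emit = emit              # callback to the reconfiguration layer
        self.level = initial_level    # skepticism level (LEVEL)
        self.max_level = max_level    # MAXLEVEL
        self.state = DEAD
        self.emit("BROKEN")           # at power-up, report the component broken

    def on_broken(self):              # signal from the fault monitor
        if self.state == DEAD:
            return                    # ignored: no new information
        if self.state == GOOD:
            # Failure after having been declared working: grow skepticism.
            self.level = min(self.level + 1, self.max_level)
        self.state = DEAD             # from WAIT, the pending wait is abandoned
        self.emit("BROKEN")           # retransmitted without delay

    def on_working(self):             # signal from the fault monitor
        if self.state == DEAD:
            self.state = WAIT         # caller starts the wait timer here

    def on_wait_timer_expired(self):
        if self.state == WAIT:
            self.state = GOOD         # the only way to reach GOOD
            self.emit("WORKING")      # caller starts the good timer here

    def on_good_timer_expired(self):
        if self.state == GOOD and self.level > 0:
            self.level -= 1           # forgive old failures; caller restarts timer
```

When entering the WAIT state, the wait timer would be started for the interval given by the WTIME formula described below.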
- FIG. 1 is a block diagram of a mesh connected local area network.
- FIG. 2 is a block diagram of one node of the local area network shown in FIG. 1.
- FIG. 3 is a conceptual diagram of an intermittent failure manager program in accordance with the present invention.
- FIG. 4 is a block diagram of the fault detection mechanisms used in the preferred embodiment, and their relationship to the intermittent failure management program of the preferred embodiment.
- FIG. 5 is a flow chart of the intermittent failure management program of the preferred embodiment.
- FIG. 6 is a diagram showing the relationship between a "skepticism level” and the wait time before a transition from “broken” to "working" of a system component is accepted.
- the network 100 is a set of host computers 120 and switches 124-130 that are interconnected by links, which are bi-directional data channels.
- each host 120 in the network has a network controller 132 which couples the host 120 to two distinct switches (e.g., switches 124 and 126 in the case of host 120-1).
- the two links 134 and 136 which couple the host 120 to switches 124 and 126 are identical, except that only one of the two links is active at any one time. For this reason link 136 is shown as a dashed line to indicate that it is inactive.
- if the active link fails, the host's network controller 132 automatically activates the other link 136, thereby reconnecting the host to the network. It is strongly preferred that the two links 134 and 136 for each host be coupled to two different switches so that if an entire switch fails, all the hosts coupled to that switch will have alternate paths to the network. Generally, the provision of two alternate paths or channels from each host to the network provides sufficient redundancy that no single hardware failure can isolate a host from the network.
- each switch 124 in the local area network is coupled to its links 150 by interfaces 152.
- the switch has a central processing unit (CPU) 140, memory 142, and software stored in memory 142 including a network reconfiguration program 144 and a "skeptic" status change filtering program 146.
- Link error detectors 154 periodically send status information via the switch's internal bus 156 to a fault monitor routine 158 which determines whether each link is currently "WORKING" or "BROKEN", at least insofar as the fault monitor is able to determine. It sends corresponding "WORKING" and "BROKEN" signals to the skeptic program 146. The skeptic, in turn, sends out a filtered version of these signals to the network reconfiguration program 144.
- the skeptic 146 can be viewed as a mechanism that stands between a subordinate object 160 and a higher level of the system, and which provides a "filtered object" 162 whose rate of status change is limited.
- the subordinate object 160 is an abstraction that emits a series of signals, each of which says either "working” or "broken”.
- the skeptic 146 sends out a filtered version of these signals to the next higher level of the system.
- the links in the preferred embodiment could be other types of objects or system components in other embodiments of the present invention.
- the preferred embodiment has three link error detectors 164-168.
- the corrupt packet detector 164 tests a CRC error correction code included with each transmitted packet.
- This detector uses a quota mechanism for detecting faults, based on the concept that a few corrupt packets may be the result of random glitches, but that more than a few corrupt packets are indicative of a fault. Thus, if more than five corrupted packets are received in a specified period of time (e.g., 40 minutes), this detector issues a fault, which causes the fault monitor 170 to issue a "BROKEN" signal.
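A sketch of such a quota-based detector, assuming a sliding window over recent error timestamps; the patent excerpt only specifies the quota of more than five errors and an example 40-minute period, so the windowing details and names here are illustrative.

```python
import time

class CorruptPacketDetector:
    """Quota-style detector: a few CRC errors are treated as random glitches,
    but more than `quota` within `period` seconds is reported as a fault."""

    def __init__(self, report_fault, quota=5, period=40 * 60, clock=time.monotonic):
        self.report_fault = report_fault   # e.g. tells the fault monitor to signal BROKEN
        self.quota = quota
        self.period = period
        self.clock = clock
        self.errors = []                   # timestamps of recent CRC errors

    def on_crc_error(self):
        now = self.clock()
        self.errors = [t for t in self.errors if now - t < self.period]
        self.errors.append(now)
        if len(self.errors) > self.quota:
            self.errors.clear()
            self.report_fault()
```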
- the Stuck Link detector 166 detects when a link becomes stuck in a state which prevents any data transmission. When this happens, the switch automatically clears all messages in the switch and reinitializes itself. If this happens only occasionally, the condition may be due to mis-transmission of a single flow-control command or packet framing command code (i.e., due to errors in critical data bits that control usage of the link). This error detector also imposes a quota on how many such errors it will forgive in any period of time before declaring a fault.
- the coding violation detector 168 detects static on the communication link. For example, coding violations can result from connecting or disconnecting the link cable, from a cable that is too long for good transmission, or from a nearby electric motor. As with other types of errors, isolated violations should be ignored but a burst of violations is a significant error.
- the violation detector counts the number of violations during successive test periods, each about 170 milliseconds long, and declares a fault only if a threshold violation rate is exceeded. The permitted number of violations depends on whether the skeptic says the link is working or broken. If the link is working (according to the skeptic 146), three errors are permitted per test period, but if the link is broken no errors are permitted. The stricter rule for broken links ensures that no link will recover unless it can pass the entire skeptic recovery time without a single coding violation, while occasional violations on working links are ignored.
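The dependence of the violation quota on the skeptic's current opinion of the link can be sketched as follows; the names and per-period bookkeeping are assumptions for illustration.

```python
class CodingViolationDetector:
    """Per-test-period violation counter whose threshold depends on whether the
    skeptic currently considers the link working (3 allowed) or broken (0)."""

    def __init__(self, report_fault, link_is_working):
        self.report_fault = report_fault        # notify the fault monitor
        self.link_is_working = link_is_working  # callable: skeptic's current view
        self.count = 0

    def on_violation(self):
        self.count += 1

    def on_test_period_end(self):               # called every ~170 ms
        allowed = 3 if self.link_is_working() else 0
        if self.count > allowed:
            self.report_fault()
        self.count = 0
```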
- the skeptic program 146 is a state machine with auxiliary variables (e.g., level, wtime, gtime), timers, and policy parameters.
- when the system is first powered on, the skeptic defines a set of policy parameters and sends a "BROKEN" signal to the system reconfiguration program (step 200). Then it moves to the DEAD state (step 202). When in the DEAD state, "BROKEN" signals received by the skeptic are ignored (step 204), since they provide no new information. A "WORKING" signal causes the skeptic to start a wait timer (step 206) and to move into the WAIT state (step 208). The duration of the wait timer is calculated by a formula described below. If the skeptic receives a "BROKEN" signal and returns to the DEAD state before the wait timer expires, the timer is stopped (step 210).
- the skeptic responds to intermittent failures by maintaining a level of skepticism about the subordinate object.
- the skepticism level is kept in an auxiliary variable called LEVEL.
- the skepticism level is used to compute WTIME, the duration set on the wait timer, according to the formula WTIME = WBASE + WMULT × 2^LEVEL, where WBASE and WMULT are policy parameters.
- a policy parameter MAXLEVEL establishes an upper limit on skepticism.
- when the wait timer expires, the skeptic sends a "WORKING" signal to the system reconfiguration program, starts a good timer and moves to the GOOD state (steps 214 and 216). This is the only way the skeptic can get to the GOOD state.
- the skeptic forgives old failures by decrementing the skepticism level occasionally. Whenever the good timer expires, the skeptic decrements the skepticism level and then sets and restarts the good timer (step 220), unless the skepticism level has already been reduced to zero.
- the formula used to compute the duration of the good timer, GTIME, is the same as the formula used for the wait timer, except that it uses different policy parameters GBASE and GMULT: GTIME = GBASE + GMULT × 2^LEVEL.
- the minimum amount of time required to forgive one level of skepticism is ten minutes. This limits the worst case long-term average failure rate to about six times per hour.
- in the GOOD state, receiving a "BROKEN" signal causes the skeptic to immediately send a "BROKEN" signal and stop the good timer. It also increments the skepticism level by one (step 222), and moves the skeptic into the DEAD state (step 202).
- the maximum wait time will be about seventeen minutes.
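As a rough numeric check, here is a sketch of the two interval formulas. The policy parameter values are not given in this excerpt; WBASE = 5 seconds, WMULT = 1 millisecond and GBASE = 600 seconds are assumed because they reproduce the five-second minimum wait, the roughly seventeen-minute wait at level twenty, the roughly 9.3-hour wait at level twenty-five, and the ten-minute forgiveness floor quoted in the surrounding text. GMULT is likewise an assumed placeholder.

```python
def wtime(level, wbase=5.0, wmult=0.001):
    """Recovery (wait) interval in seconds: WTIME = WBASE + WMULT * 2**LEVEL."""
    return wbase + wmult * 2 ** level

def gtime(level, gbase=600.0, gmult=0.001):
    """Forgiveness (good) interval in seconds: GTIME = GBASE + GMULT * 2**LEVEL."""
    return gbase + gmult * 2 ** level

if __name__ == "__main__":
    print(round(wtime(0), 1))          # ~5 s:     five-second minimum wait
    print(round(wtime(20) / 60, 1))    # ~17.6 min: maximum wait at MAXLEVEL = 20
    print(round(wtime(25) / 3600, 1))  # ~9.3 h:   maximum wait at MAXLEVEL = 25
    print(round(gtime(0) / 60, 1))     # 10 min:   minimum time to forgive one level
```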
- the initial skepticism LEVEL is set equal to 8 because at power-up time we don't know the link's history, and therefore we don't know whether the link should have a high or low skepticism level.
- Another common link failure mode occurs when a technician connects a link cable. As the metal components scrape past each other, the link transceiver hardware detects bursts of coding violations that are evaluated as faults. Each additional wiggle of the cable tends to generate more faults. The five second minimum wait time in the skeptic causes all of these faults to be reflected as only one failure.
- a third common failure mode occurs on marginal links.
- the error rate on a marginal link is usually very data dependent: it is much higher when the link is carrying packets than when it is idle. This results in such a link failing soon after it recovers, but then having no further faults until it recovers again.
- the skepticism level increases over time until it reaches its maximum value, MAXLEVEL, which is set to twenty in the preferred embodiment, at which point the wait time is about seventeen minutes.
- if longer maximum wait times are desired, the MAXLEVEL and WMULT parameters can be set accordingly. For instance, a MAXLEVEL of twenty-five will result in a maximum wait time of about 9.3 hours.
- the duration set on the wait timer actually varies as a random fraction between one and two times the value calculated for WTIME. This random variation causes different skeptics in the computer system to disperse their wait timer expirations. If the network is running with several intermittent links, this randomness reduces the possibility of getting caught in a systematic pattern.
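A sketch of the randomized timer duration, using the same assumed WBASE and WMULT values as above:

```python
import random

def randomized_wait(level, wbase=5.0, wmult=0.001):
    """Duration actually set on the wait timer: a random value between one and
    two times WTIME, so skeptics for different links do not expire in lockstep."""
    wtime = wbase + wmult * 2 ** level
    return wtime * random.uniform(1.0, 2.0)
```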
- the recovery time intervals associated with various skepticism levels could be determined simply by precomputing or predetermining the recovery time interval for all the possible skepticism levels, and then looking up the appropriate recovery time interval whenever the wait timer is to be started.
- the table lookup mechanism is flexible in that it allows "manual" fine tuning of the system, for instance by allowing the first few skepticism levels to be programmed with a constant recovery time interval, or with a slowly, linearly increasing amount, or whatever the system programmer chooses.
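A sketch of such a lookup table follows; the table contents are purely illustrative of the kind of manual tuning described (constant intervals for the first few levels, then a linear ramp, then the exponential rule), not values from the patent.

```python
# Hypothetical table indexed by skepticism level (0..20): constant 5 s for
# levels 0-3, a linear ramp for levels 4-7, then the exponential rule.
RECOVERY_TABLE = [5.0] * 4 + [10.0, 20.0, 30.0, 40.0] + \
                 [5.0 + 0.001 * 2 ** lvl for lvl in range(8, 21)]

def recovery_interval(level):
    """Look up the precomputed recovery interval for a skepticism level."""
    return RECOVERY_TABLE[min(level, len(RECOVERY_TABLE) - 1)]
```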
Abstract
Description
WTIME = WBASE + WMULT × 2^LEVEL
GTIME = GBASE + GMULT × 2^LEVEL
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US07/721,143 US5260945A (en) | 1989-06-22 | 1991-06-26 | Intermittent component failure manager and method for minimizing disruption of distributed computer system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US07/370,285 US5088091A (en) | 1989-06-22 | 1989-06-22 | High-speed mesh connected local area network |
US07/721,143 US5260945A (en) | 1989-06-22 | 1991-06-26 | Intermittent component failure manager and method for minimizing disruption of distributed computer system |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US07/370,285 Continuation-In-Part US5088091A (en) | 1989-06-22 | 1989-06-22 | High-speed mesh connected local area network |
Publications (1)
Publication Number | Publication Date |
---|---|
US5260945A true US5260945A (en) | 1993-11-09 |
Family
ID=27004888
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US07/721,143 Expired - Lifetime US5260945A (en) | 1989-06-22 | 1991-06-26 | Intermittent component failure manager and method for minimizing disruption of distributed computer system |
Country Status (1)
Country | Link |
---|---|
US (1) | US5260945A (en) |
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5495471A (en) * | 1994-03-09 | 1996-02-27 | Mci Communications Corporation | System and method for restoring a telecommunications network based on a two prong approach |
WO1996006474A1 (en) * | 1994-08-23 | 1996-02-29 | Nokia Telecommunications Oy | A method for recovering a faulty unit and a recovery system |
US5533195A (en) * | 1993-07-01 | 1996-07-02 | Larochelle; Paul E. | Testing tool for diagnosing defective computer system devices |
US5581543A (en) * | 1995-02-27 | 1996-12-03 | Motorola, Inc. | Communication network and method which respond to a failed link |
US5606659A (en) * | 1993-02-10 | 1997-02-25 | Telefonaktiebolaget Lm Ericsson | Method and system for demounting a chain of linked processes in a distributed operating system |
US5678005A (en) * | 1993-07-02 | 1997-10-14 | Tandem Computers Inorporated | Cable connect error detection system |
US5859959A (en) * | 1996-04-29 | 1999-01-12 | Hewlett-Packard Company | Computer network with devices/paths having redundant links |
US6192302B1 (en) | 1998-07-31 | 2001-02-20 | Ford Global Technologies, Inc. | Motor vehicle diagnostic system and apparatus |
US6212649B1 (en) * | 1996-12-30 | 2001-04-03 | Sentar, Inc. | System and method for providing highly-reliable coordination of intelligent agents in a distributed computing system |
US6392990B1 (en) | 1999-07-23 | 2002-05-21 | Glenayre Electronics, Inc. | Method for implementing interface redundancy in a computer network |
US20020073338A1 (en) * | 2000-11-22 | 2002-06-13 | Compaq Information Technologies Group, L.P. | Method and system for limiting the impact of undesirable behavior of computers on a shared data network |
US20020162044A1 (en) * | 2001-04-27 | 2002-10-31 | Kenichi Kuwako | Backup system for operation system in communications system |
US20030028817A1 (en) * | 2001-08-06 | 2003-02-06 | Shigeru Suzuyama | Method and device for notifying server failure recovery |
US6570881B1 (en) * | 1999-01-21 | 2003-05-27 | 3Com Corporation | High-speed trunk cluster reliable load sharing system using temporary port down |
US6601185B1 (en) * | 1999-04-07 | 2003-07-29 | Lucent Technologies Inc. | Secondary alarm filtering |
US6606630B1 (en) * | 2000-08-21 | 2003-08-12 | Hewlett-Packard Development Company, L.P. | Data structure and method for tracking network topology in a fiber channel port driver |
US6628661B1 (en) | 1998-08-27 | 2003-09-30 | Intel Corporation | Spanning tree recovery in computer networks |
US6766482B1 (en) | 2001-10-31 | 2004-07-20 | Extreme Networks | Ethernet automatic protection switching |
US6922414B1 (en) | 2000-08-21 | 2005-07-26 | Hewlett-Packard Development Company, L.P. | Apparatus and method for dynamic command queue depth adjustment for storage area network nodes |
US20050177779A1 (en) * | 2004-01-23 | 2005-08-11 | Pomaranski Ken G. | Cluster node status detection and communication |
US20050210311A1 (en) * | 2004-03-08 | 2005-09-22 | Rodeheffer Thomas L | Method and system for probabilistic defect isolation |
US6952734B1 (en) * | 2000-08-21 | 2005-10-04 | Hewlett-Packard Development Company, L.P. | Method for recovery of paths between storage area network nodes with probationary period and desperation repair |
US20050281191A1 (en) * | 2004-06-17 | 2005-12-22 | Mcgee Michael S | Monitoring path connectivity between teamed network resources of a computer system and a core network |
US20060007869A1 (en) * | 2004-07-09 | 2006-01-12 | Fujitsu Limited | Method for preventing control packet loop and bridge apparatus using the method |
US20060031186A1 (en) * | 2003-08-06 | 2006-02-09 | Canon Kabushiki Kaisha | Information processing method, information processing program, and information processing apparatus |
US20070070885A1 (en) * | 2005-09-13 | 2007-03-29 | Lsi Logic Corporation | Methods and structure for detecting SAS link errors with minimal impact on SAS initiator and link bandwidth |
WO2007122603A2 (en) | 2006-04-21 | 2007-11-01 | Cisco Technology, Inc. | Configurable resolution policy for data switch feature failures |
WO2008037679A1 (en) * | 2006-09-28 | 2008-04-03 | Siemens Aktiengesellschaft | Method for reconfiguring a communication network |
US20100011096A1 (en) * | 2008-07-10 | 2010-01-14 | Blackwave Inc. | Distributed Computing With Multiple Coordinated Component Collections |
US20100082197A1 (en) * | 2008-09-30 | 2010-04-01 | Honeywell International Inc. | Intermittent fault detection and reasoning |
EP2222023A1 (en) * | 2007-12-14 | 2010-08-25 | Huawei Technologies Co., Ltd. | Link fault processing method and data forwarding device |
US20100325471A1 (en) * | 2009-06-17 | 2010-12-23 | International Business Machines Corporation | High availability support for virtual machines |
US20110191635A1 (en) * | 2010-01-29 | 2011-08-04 | Honeywell International Inc. | Noisy monitor detection and intermittent fault isolation |
US20130343228A1 (en) * | 2012-06-25 | 2013-12-26 | Qualcomm Atheros, Inc. | Spanning tree protocol for hybrid networks |
US20150106781A1 (en) * | 2013-10-14 | 2015-04-16 | International Business Machines Corporation | Verification of uml state machines |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4105420A (en) * | 1977-05-23 | 1978-08-08 | Bayfront Carpet And Vacuum, Inc. | Canister vacuum cleaner with transparent lid |
US5146452A (en) * | 1990-10-26 | 1992-09-08 | Alcatel Network Systems, Inc. | Method and apparatus for rapidly restoring a communication network |
US5173689A (en) * | 1990-06-25 | 1992-12-22 | Nec Corporation | Self-distributed logical channel node failure restoring system |
-
1991
- 1991-06-26 US US07/721,143 patent/US5260945A/en not_active Expired - Lifetime
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4105420A (en) * | 1977-05-23 | 1978-08-08 | Bayfront Carpet And Vacuum, Inc. | Canister vacuum cleaner with transparent lid |
US5173689A (en) * | 1990-06-25 | 1992-12-22 | Nec Corporation | Self-distributed logical channel node failure restoring system |
US5146452A (en) * | 1990-10-26 | 1992-09-08 | Alcatel Network Systems, Inc. | Method and apparatus for rapidly restoring a communication network |
Cited By (62)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5606659A (en) * | 1993-02-10 | 1997-02-25 | Telefonaktiebolaget Lm Ericsson | Method and system for demounting a chain of linked processes in a distributed operating system |
US5533195A (en) * | 1993-07-01 | 1996-07-02 | Larochelle; Paul E. | Testing tool for diagnosing defective computer system devices |
US5678005A (en) * | 1993-07-02 | 1997-10-14 | Tandem Computers Inorporated | Cable connect error detection system |
US5495471A (en) * | 1994-03-09 | 1996-02-27 | Mci Communications Corporation | System and method for restoring a telecommunications network based on a two prong approach |
CN1047035C (en) * | 1994-08-23 | 1999-12-01 | 诺基亚电信公司 | A method for recovering a faulty unit and a recovery system |
WO1996006474A1 (en) * | 1994-08-23 | 1996-02-29 | Nokia Telecommunications Oy | A method for recovering a faulty unit and a recovery system |
US5852650A (en) * | 1994-08-23 | 1998-12-22 | Nokia Telecommunications Oy | Method for recovering a faulty unit and a recovery system |
US5581543A (en) * | 1995-02-27 | 1996-12-03 | Motorola, Inc. | Communication network and method which respond to a failed link |
US5859959A (en) * | 1996-04-29 | 1999-01-12 | Hewlett-Packard Company | Computer network with devices/paths having redundant links |
US6212649B1 (en) * | 1996-12-30 | 2001-04-03 | Sentar, Inc. | System and method for providing highly-reliable coordination of intelligent agents in a distributed computing system |
US6192302B1 (en) | 1998-07-31 | 2001-02-20 | Ford Global Technologies, Inc. | Motor vehicle diagnostic system and apparatus |
US6940825B2 (en) | 1998-08-27 | 2005-09-06 | Intel Corporation | Spanning tree recovery in machine networks |
US20040062209A1 (en) * | 1998-08-27 | 2004-04-01 | Goldman Tomasz J. | Spanning tree recovery in machine networks |
US6628661B1 (en) | 1998-08-27 | 2003-09-30 | Intel Corporation | Spanning tree recovery in computer networks |
US6570881B1 (en) * | 1999-01-21 | 2003-05-27 | 3Com Corporation | High-speed trunk cluster reliable load sharing system using temporary port down |
US6601185B1 (en) * | 1999-04-07 | 2003-07-29 | Lucent Technologies Inc. | Secondary alarm filtering |
US6392990B1 (en) | 1999-07-23 | 2002-05-21 | Glenayre Electronics, Inc. | Method for implementing interface redundancy in a computer network |
US6922414B1 (en) | 2000-08-21 | 2005-07-26 | Hewlett-Packard Development Company, L.P. | Apparatus and method for dynamic command queue depth adjustment for storage area network nodes |
US6606630B1 (en) * | 2000-08-21 | 2003-08-12 | Hewlett-Packard Development Company, L.P. | Data structure and method for tracking network topology in a fiber channel port driver |
US6952734B1 (en) * | 2000-08-21 | 2005-10-04 | Hewlett-Packard Development Company, L.P. | Method for recovery of paths between storage area network nodes with probationary period and desperation repair |
US20020073338A1 (en) * | 2000-11-22 | 2002-06-13 | Compaq Information Technologies Group, L.P. | Method and system for limiting the impact of undesirable behavior of computers on a shared data network |
US7383574B2 (en) * | 2000-11-22 | 2008-06-03 | Hewlett Packard Development Company L.P. | Method and system for limiting the impact of undesirable behavior of computers on a shared data network |
US20020162044A1 (en) * | 2001-04-27 | 2002-10-31 | Kenichi Kuwako | Backup system for operation system in communications system |
US6792558B2 (en) * | 2001-04-27 | 2004-09-14 | Fujitsu Limited | Backup system for operation system in communications system |
US6874106B2 (en) * | 2001-08-06 | 2005-03-29 | Fujitsu Limited | Method and device for notifying server failure recovery |
US20030028817A1 (en) * | 2001-08-06 | 2003-02-06 | Shigeru Suzuyama | Method and device for notifying server failure recovery |
US6766482B1 (en) | 2001-10-31 | 2004-07-20 | Extreme Networks | Ethernet automatic protection switching |
US7761470B2 (en) * | 2003-08-06 | 2010-07-20 | Canon Kabushiki Kaisha | Information processing method, information processing program, and information processing apparatus |
US20060031186A1 (en) * | 2003-08-06 | 2006-02-09 | Canon Kabushiki Kaisha | Information processing method, information processing program, and information processing apparatus |
US20050177779A1 (en) * | 2004-01-23 | 2005-08-11 | Pomaranski Ken G. | Cluster node status detection and communication |
US7228462B2 (en) * | 2004-01-23 | 2007-06-05 | Hewlett-Packard Development Company, L.P. | Cluster node status detection and communication |
US20050210311A1 (en) * | 2004-03-08 | 2005-09-22 | Rodeheffer Thomas L | Method and system for probabilistic defect isolation |
US20050281191A1 (en) * | 2004-06-17 | 2005-12-22 | Mcgee Michael S | Monitoring path connectivity between teamed network resources of a computer system and a core network |
US9491084B2 (en) * | 2004-06-17 | 2016-11-08 | Hewlett Packard Enterprise Development Lp | Monitoring path connectivity between teamed network resources of a computer system and a core network |
US20060007869A1 (en) * | 2004-07-09 | 2006-01-12 | Fujitsu Limited | Method for preventing control packet loop and bridge apparatus using the method |
US8582467B2 (en) * | 2004-07-09 | 2013-11-12 | Fujitsu Limited | Method for preventing control packet looping and bridge apparatus using the method |
US20070070885A1 (en) * | 2005-09-13 | 2007-03-29 | Lsi Logic Corporation | Methods and structure for detecting SAS link errors with minimal impact on SAS initiator and link bandwidth |
US7738366B2 (en) * | 2005-09-13 | 2010-06-15 | Lsi Corporation | Methods and structure for detecting SAS link errors with minimal impact on SAS initiator and link bandwidth |
US7877505B1 (en) | 2006-04-21 | 2011-01-25 | Cisco Technology, Inc. | Configurable resolution policy for data switch feature failures |
EP2014018A2 (en) * | 2006-04-21 | 2009-01-14 | Cisco Technology, Inc. | Configurable resolution policy for data switch feature failures |
CN101496365B (en) * | 2006-04-21 | 2013-08-28 | 思科技术公司 | Configurable resolution policy for data switch feature failures |
EP2014018A4 (en) * | 2006-04-21 | 2010-06-09 | Cisco Tech Inc | Configurable resolution policy for data switch feature failures |
WO2007122603A2 (en) | 2006-04-21 | 2007-11-01 | Cisco Technology, Inc. | Configurable resolution policy for data switch feature failures |
US7924702B2 (en) | 2006-09-28 | 2011-04-12 | Siemens Aktiengesellschaft | Method for reconfiguring a communication network |
WO2008037679A1 (en) * | 2006-09-28 | 2008-04-03 | Siemens Aktiengesellschaft | Method for reconfiguring a communication network |
US20100110880A1 (en) * | 2006-09-28 | 2010-05-06 | Vivek Kulkarni | Method for reconfiguring a communication network |
CN101523805B (en) * | 2006-09-28 | 2012-07-04 | 西门子公司 | Method for reconfiguring a communication network |
US8331222B2 (en) | 2007-12-14 | 2012-12-11 | Huawei Technologies Co., Ltd. | Link fault handling method and data forwarding apparatus |
EP2222023A1 (en) * | 2007-12-14 | 2010-08-25 | Huawei Technologies Co., Ltd. | Link fault processing method and data forwarding device |
EP2222023A4 (en) * | 2007-12-14 | 2011-05-25 | Huawei Tech Co Ltd | Link fault processing method and data forwarding device |
US20100260041A1 (en) * | 2007-12-14 | 2010-10-14 | Huawei Technologies Co., Ltd. | Link fault handling method and data forwarding apparatus |
US8650270B2 (en) * | 2008-07-10 | 2014-02-11 | Juniper Networks, Inc. | Distributed computing with multiple coordinated component collections |
US20100011096A1 (en) * | 2008-07-10 | 2010-01-14 | Blackwave Inc. | Distributed Computing With Multiple Coordinated Component Collections |
US20100082197A1 (en) * | 2008-09-30 | 2010-04-01 | Honeywell International Inc. | Intermittent fault detection and reasoning |
US8135985B2 (en) * | 2009-06-17 | 2012-03-13 | International Business Machines Corporation | High availability support for virtual machines |
US20100325471A1 (en) * | 2009-06-17 | 2010-12-23 | International Business Machines Corporation | High availability support for virtual machines |
US8386849B2 (en) | 2010-01-29 | 2013-02-26 | Honeywell International Inc. | Noisy monitor detection and intermittent fault isolation |
US20110191635A1 (en) * | 2010-01-29 | 2011-08-04 | Honeywell International Inc. | Noisy monitor detection and intermittent fault isolation |
US20130343228A1 (en) * | 2012-06-25 | 2013-12-26 | Qualcomm Atheros, Inc. | Spanning tree protocol for hybrid networks |
US9160564B2 (en) * | 2012-06-25 | 2015-10-13 | Qualcomm Incorporated | Spanning tree protocol for hybrid networks |
US20150106781A1 (en) * | 2013-10-14 | 2015-04-16 | International Business Machines Corporation | Verification of uml state machines |
US9454382B2 (en) * | 2013-10-14 | 2016-09-27 | International Business Machines Corporation | Verification of UML state machines |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5260945A (en) | Intermittent component failure manager and method for minimizing disruption of distributed computer system | |
Rodeheffer et al. | Automatic reconfiguration in Autonet | |
US5390326A (en) | Local area network with fault detection and recovery | |
EP3836485B1 (en) | Switching method, device and transfer control separation system of control plane device | |
US6580697B1 (en) | Advanced ethernet auto negotiation | |
EP0416942B1 (en) | Method of detecting a cable fault and switching to a redundant cable in a universal local area network | |
US5859959A (en) | Computer network with devices/paths having redundant links | |
US6202170B1 (en) | Equipment protection system | |
US5983371A (en) | Active failure detection | |
JP3454297B2 (en) | Method and apparatus for testing a link between network switches | |
US5329528A (en) | Duplex communication control device | |
JPH0799688A (en) | Multiplex transmission system | |
EP1601140B1 (en) | Method of monitoring a member router in a VRRP group | |
US20030224827A1 (en) | Concentrator and reset control method therefor | |
CN110213402B (en) | Electronic data distribution control device and method for operating such a control device | |
US5784274A (en) | System and method for monitoring errors occurring in data processed by a duplexed communication apparatus | |
Cisco | Troubleshooting WAN Connectivity | |
Cisco | Troubleshooting WAN Connectivity | |
Cisco | Troubleshooting WAN Connectivity | |
Cisco | Troubleshooting WAN Connectivity | |
Cisco | Troubleshooting WAN Connectivity | |
Cisco | Troubleshooting WAN Connectivity | |
Cisco | Troubleshooting WAN Connectivity | |
JP2578985B2 (en) | Redundant controller | |
EP0139125A2 (en) | Computer system including a workstation takeover control apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: DIGITAL EQUIPMENT CORPORATION, A CORPORATION OF MA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST.;ASSIGNOR:RODEHEFFER, THOMAS L.;REEL/FRAME:005761/0237 Effective date: 19910626 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
FPAY | Fee payment |
Year of fee payment: 8 |
|
AS | Assignment |
Owner name: COMPAQ INFORMATION TECHNOLOGIES GROUP, L.P., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DIGITAL EQUIPMENT CORPORATION;COMPAQ COMPUTER CORPORATION;REEL/FRAME:012447/0903;SIGNING DATES FROM 19991209 TO 20010620 |
|
AS | Assignment |
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS Free format text: CHANGE OF NAME;ASSIGNOR:COMPAQ INFORMATION TECHNOLOGIES GROUP, LP;REEL/FRAME:015000/0305 Effective date: 20021001 |
|
FPAY | Fee payment |
Year of fee payment: 12 |
|
CC | Certificate of correction |