WO2005053231A1 - Communication fault containment via indirect detection - Google Patents
- Publication number
- WO2005053231A1 (application PCT/US2004/039260)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- fault
- component
- node
- observing
- condition
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/44—Star or tree networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0659—Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0681—Configuration of triggering conditions
Definitions
- the self-checking pair provides near perfect coverage for preventing the propagation of faults in the network.
- Many other techniques have also evolved. Many of these techniques involve independent guardian functions that look at the content of the message itself to determine whether the data is faulty. These techniques include, but are not limited to, the use of a cyclic redundancy check (CRC), timers, etc. that determine whether there is a fault with the message based on some aspect of the message itself.
- Complexity has two detriments. First, an increase in complexity means an increase in the probability of hardware failure.
- Summary: Embodiments of the present invention provide improved fault coverage through indirect detection of the operating conditions of components in a system, e.g., faults and proper operating conditions.
- indirect detection means that the component that detects a fault does so based on other components' responses to a faulty signal, rather than observing the faulty signal directly.
- the method includes monitoring for an expected action of the system that indirectly identifies the operating condition of the first component to a second component of the system, when the monitored expected action indicates a faulty operating condition, isolating the first component's errant behavior, and when the monitored expected action indicates a proper operating condition, proceeding with normal operation of the system.
- Figure 1 is a block diagram of a system with a guardian function that uses indirect detection of faults.
- Figure 2 is a flow chart of one embodiment of a process for indirect detection of a fault.
- FIG. 1 is a block diagram of a system, indicated generally at 100, with a central guardian function 102 that uses indirect detection of faults.
- system 100 is a communication system.
- the system 100 uses a time-triggered protocol such as the TTP/C time-triggered protocol. In other embodiments, other TDMA protocols are used.
- System 100 includes a plurality of components 104-1 to 104-N, e.g., nodes with transceivers for sending and receiving messages over the system 100.
- components 104-1 to 104-N are coupled in a star configuration as shown in Figure 1.
- components 104-1 to 104-N are coupled together in other known or later developed configurations, e.g., a mesh, bus or other appropriate communication architecture.
- components 104-1 to 104-N may also include other electronic circuitry such as, for example, actuators, sensors, processors, controllers, or the like.
- System 100 includes a central component or hub 106.
- Hub 106 is configured to include the central guardian 102 that uses indirect detection to detect faults in system 100.
- central guardian 102 isolates the node that caused the fault to thereby prevent propagation of the fault.
- the central guardian 102 allows the nodes of the system 100 to operate normally.
- indirect detection means that the component that detects a fault or operating condition of a system component does so based on other components' responses or expected actions to a faulty or good signal, rather than observing the faulty or good signal directly.
- the information that is used to indirectly detect a fault or operating condition is based on control signals generated by other components that are used for other specific purposes in the system.
- central guardian 102 uses indirect detection of an operating condition, e.g., faulty or good, in system 100.
- Central guardian 102 monitors a condition or an expected action of network 100 to indirectly detect a fault.
- central guardian 102 monitors control signals, e.g., beacons (action time signals), Clear to Send signals, or other appropriate control signals.
- central guardian 102 monitors other messages, e.g., X frames, or modified CRC or other check value, to isolate faults in the network through indirect detection.
- FIG. 2 is a flow chart of one embodiment of a process for indirect detection of a fault in a component of a system having a plurality of components.
- the method begins at block 200.
- the method monitors a condition or expected action in the system. For example, in one embodiment, the method observes inaction in one component. In another embodiment, the method monitors status information derived by other system components, e.g., a status vector of an X-Frame. In yet another embodiment, the method observes the relative timing of actions of multiple system components. In yet a further embodiment, the method observes conflicting requests for access to system resources.
- the method derives sequencing information from messages communicated in the network.
- the process analyzes the observed condition or expected action to determine, indirectly, the operating condition, e.g., good or faulty, of a component in the system. Continuing the examples from above, if the method observed inaction in one component after a message intended to cause action, then the method identifies a fault condition. On the other hand, if the proper action is observed, the method identifies a good or proper operating condition.
- the method determines that the component is faulty without independent analysis of the underlying faulty data.
- when the method observes that the relative timing of actions of multiple system components includes one that falls outside of a system specification, the process identifies a fault condition.
- the process determines that the operating condition of the component is good.
- the method identifies a fault condition.
- the process determines that the components are operating properly.
- the method identifies a fault condition.
- the process identifies a proper operating condition. If there is no fault, the process proceeds with normal operation at block 206 and returns to block 202 to further observe conditions or expected actions in the system. If there is a fault, the process proceeds to block 208 and takes action to prevent the propagation of faults in the system.
- the method identifies a node as faulty by mapping a number of indirect fault detection observations to an inference of which node is faulty. Further, the method drops further messages generated by the faulty node at least for a period of time or takes other action to prevent the fault from propagating through the network. The method then returns to block 202 to observe further conditions in the system.
- indirect detection are described in the co-pending applications incorporated by reference above. Provisional Patent Application serial no.
- Provisional Patent Application serial number 60/523,899, entitled “CONTROLLED START UP IN A TIME DIVISION MULTIPLE ACCESS SYSTEM,” filed on November 19, 2003 and co-pending application attorney docket number H0005066 entitled “CONTROLLING START UP IN A NETWORK,” filed on even date herewith describe a technique for indirectly identifying a fault based on a lack of beacons, e.g., action time signals, or other signal normally generated in the synchronous mode of operation following a message from a node in an unsynchronized mode of operation.
- these applications also use indirect detection to detect entry into a synchronized state by observing the transmittal of signals, e.g., guardian messages for voted schedule enforcement or beacons (action time signals) from the many nodes after start up. When the signals are not present, a fault is detected.
- H0005061 entitled “MESSAGE ERROR VERIFICATION USING CRC WITH HIDDEN DATA,” filed on even date herewith describe a technique for deriving sequence information from CRC values.
- the methods and techniques described here may be implemented in digital electronic circuitry, or with a programmable processor (for example, a special-purpose processor or a general-purpose processor such as a computer), firmware, software, or in combinations of them.
- Apparatus embodying these techniques may include appropriate input and output devices, a programmable processor, and a storage medium tangibly embodying program instructions for execution by the programmable processor.
- a process embodying these techniques may be performed by a programmable processor executing a program of instructions stored on a machine readable medium to perform desired functions by operating on input data and generating appropriate output.
- the techniques may advantageously be implemented in one or more programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device.
- a processor will receive instructions and data from a read-only memory and/or a random access memory.
- Storage devices or machine readable medium suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and DVD disks. Any of the foregoing may be supplemented by, or incorporated in, specially-designed application-specific integrated circuits (ASICs).
- ASICs application-specific integrated circuits
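
The monitoring, analysis, and isolation steps described for FIG. 2 can be sketched in code. The Python below is a minimal illustration under stated assumptions, not the implementation disclosed in the application: the class name `CentralGuardian`, its methods, the majority rule for deciding that the expected beacons are missing, the `threshold` of two observations, and the three-round isolation period are all hypothetical choices made for this example. The guardian never inspects message content; it judges a sender only by whether the other nodes reacted as expected.

```python
from collections import Counter


class CentralGuardian:
    """Hypothetical hub-side guardian; names and thresholds are illustrative."""

    def __init__(self, nodes, threshold=2, isolation_rounds=3):
        self.nodes = set(nodes)
        self.threshold = threshold              # assumed: observations needed to infer a fault
        self.isolation_rounds = isolation_rounds  # assumed: how long a node stays isolated
        self.suspicion = Counter()              # node -> count of indirect fault observations
        self.isolated = {}                      # node -> rounds of isolation remaining

    def forward(self, sender):
        """Return True if a message from `sender` may be relayed to the other nodes."""
        return sender not in self.isolated

    def observe(self, sender, beacons_seen):
        """Judge `sender` by how the other nodes reacted to its message."""
        expected = self.nodes - {sender} - set(self.isolated)
        responded = expected & set(beacons_seen)
        # Indirect evidence: fewer than half of the expected responders
        # produced a beacon after the sender's message.
        if expected and len(responded) * 2 < len(expected):
            self.suspicion[sender] += 1
            if self.suspicion[sender] >= self.threshold:
                # Map repeated observations to an inference and isolate the node.
                self.isolated[sender] = self.isolation_rounds
                return "isolated"
            return "suspect"
        # Expected action observed: treat the sender as operating properly.
        self.suspicion[sender] = 0
        return "good"

    def end_round(self):
        """Age out isolations so a node is dropped only for a period of time."""
        for node in list(self.isolated):
            self.isolated[node] -= 1
            if self.isolated[node] == 0:
                del self.isolated[node]
                self.suspicion[node] = 0


guardian = CentralGuardian(["A", "B", "C", "D"])
print(guardian.observe("A", beacons_seen={"B", "C", "D"}))  # all nodes responded
print(guardian.observe("D", beacons_seen=set()))            # no node responded
print(guardian.observe("D", beacons_seen={"B"}))            # only one of three responded
print(guardian.forward("D"))                                # relay decision for node D
```

In this sketch a single missing response is not enough to isolate a node, which loosely mirrors the description's mapping of a number of indirect observations to an inference of which node is faulty. Calling `end_round()` three times after the isolation releases node `D`, so its messages are dropped only for a limited period.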
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Maintenance And Management Of Digital Transmission (AREA)
- Small-Scale Networks (AREA)
- Debugging And Monitoring (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006541636A JP2007511989A (en) | 2003-11-19 | 2004-11-19 | Confinement of communication failure by indirect detection |
EP04811902A EP1698105A1 (en) | 2003-11-19 | 2004-11-19 | Communication fault containment via indirect detection |
Applications Claiming Priority (10)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US52378203P | 2003-11-19 | 2003-11-19 | |
US52389903P | 2003-11-19 | 2003-11-19 | |
US52390003P | 2003-11-19 | 2003-11-19 | |
US52386503P | 2003-11-19 | 2003-11-19 | |
US52378303P | 2003-11-19 | 2003-11-19 | |
US60/523,783 | 2003-11-19 | ||
US60/523,865 | 2003-11-19 | ||
US60/523,782 | 2003-11-19 | ||
US60/523,899 | 2003-11-19 | ||
US60/523,900 | 2003-11-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005053231A1 (en) | 2005-06-09 |
Family
ID=34637436
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2004/039260 WO2005053231A1 (en) | 2003-11-19 | 2004-11-19 | Communication fault containment via indirect detection |
Country Status (4)
Country | Link |
---|---|
US (1) | US20050172167A1 (en) |
EP (1) | EP1698105A1 (en) |
JP (1) | JP2007511989A (en) |
WO (1) | WO2005053231A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8204037B2 (en) * | 2007-08-28 | 2012-06-19 | Honeywell International Inc. | Autocratic low complexity gateway/ guardian strategy and/or simple local guardian strategy for flexray or other distributed time-triggered protocol |
US8498276B2 (en) | 2011-05-27 | 2013-07-30 | Honeywell International Inc. | Guardian scrubbing strategy for distributed time-triggered protocols |
US11481291B2 (en) * | 2021-01-12 | 2022-10-25 | EMC IP Holding Company LLC | Alternative storage node communication channel using storage devices group in a distributed storage system |
US11221907B1 (en) * | 2021-01-26 | 2022-01-11 | Morgan Stanley Services Group Inc. | Centralized software issue triage system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5049873A (en) * | 1988-01-29 | 1991-09-17 | Network Equipment Technologies, Inc. | Communications network state and topology monitor |
US5864662A (en) * | 1996-06-28 | 1999-01-26 | Mci Communication Corporation | System and method for reported root cause analysis |
US6292508B1 (en) * | 1994-03-03 | 2001-09-18 | Proxim, Inc. | Method and apparatus for managing power in a frequency hopping medium access control protocol |
WO2002045315A2 (en) * | 2000-11-28 | 2002-06-06 | Micromuse Inc. | Method and system for predicting causes of network service outages using time domain correlation |
US20020152185A1 (en) * | 2001-01-03 | 2002-10-17 | Sasken Communication Technologies Limited | Method of network modeling and predictive event-correlation in a communication system by the use of contextual fuzzy cognitive maps |
US20030084146A1 (en) * | 2001-10-25 | 2003-05-01 | Schilling Cynthia K. | System and method for displaying network status in a network topology |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5987432A (en) * | 1994-06-29 | 1999-11-16 | Reuters, Ltd. | Fault-tolerant central ticker plant system for distributing financial market data |
FR2724026B1 (en) * | 1994-08-29 | 1996-10-18 | Aerospatiale | METHOD AND DEVICE FOR IDENTIFYING FAULTS IN A COMPLEX SYSTEM |
DE19509558A1 (en) * | 1995-03-16 | 1996-09-19 | Abb Patent Gmbh | Process for fault-tolerant communication under high real-time conditions |
US5809220A (en) * | 1995-07-20 | 1998-09-15 | Raytheon Company | Fault tolerant distributed control system |
JPH10276196A (en) * | 1997-03-28 | 1998-10-13 | Ando Electric Co Ltd | Communication monitor |
US6272648B1 (en) * | 1997-05-13 | 2001-08-07 | Micron Electronics, Inc. | System for communicating a software-generated pulse waveform between two servers in a network |
JP4108877B2 (en) * | 1998-07-10 | 2008-06-25 | 松下電器産業株式会社 | NETWORK SYSTEM, NETWORK TERMINAL, AND METHOD FOR SPECIFYING FAILURE LOCATION IN NETWORK SYSTEM |
US20010052084A1 (en) * | 1998-11-10 | 2001-12-13 | Jiandoug Huang | Apparatus and methods for providing fault tolerance of networks and network interface cards |
US6577599B1 (en) * | 1999-06-30 | 2003-06-10 | Sun Microsystems, Inc. | Small-scale reliable multicasting |
US6775236B1 (en) * | 2000-06-16 | 2004-08-10 | Ciena Corporation | Method and system for determining and suppressing sympathetic faults of a communications network |
AT410490B (en) * | 2000-10-10 | 2003-05-26 | Fts Computertechnik Gmbh | METHOD FOR TOLERATING "SLIGHTLY-OFF-SPECIFICATION" ERRORS IN A DISTRIBUTED ERROR-TOLERANT REAL-TIME COMPUTER SYSTEM |
US6782489B2 (en) * | 2001-04-13 | 2004-08-24 | Hewlett-Packard Development Company, L.P. | System and method for detecting process and network failures in a distributed system having multiple independent networks |
US7284047B2 (en) * | 2001-11-08 | 2007-10-16 | Microsoft Corporation | System and method for controlling network demand via congestion pricing |
US6721907B2 (en) * | 2002-06-12 | 2004-04-13 | Zambeel, Inc. | System and method for monitoring the state and operability of components in distributed computing systems |
-
2004
- 2004-11-19 WO PCT/US2004/039260 patent/WO2005053231A1/en not_active Application Discontinuation
- 2004-11-19 EP EP04811902A patent/EP1698105A1/en not_active Withdrawn
- 2004-11-19 JP JP2006541636A patent/JP2007511989A/en not_active Withdrawn
- 2004-11-19 US US10/993,916 patent/US20050172167A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
US20050172167A1 (en) | 2005-08-04 |
EP1698105A1 (en) | 2006-09-06 |
JP2007511989A (en) | 2007-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2137892B1 (en) | Node of a distributed communication system, and corresponding communication system | |
US7430261B2 (en) | Method and bit stream decoding unit using majority voting | |
US20100229046A1 (en) | Bus Guardian of a User of a Communication System, and a User of a Communication System | |
US9417982B2 (en) | Method and apparatus for isolating a fault in a controller area network | |
EP3185481B1 (en) | A host-to-host test scheme for periodic parameters transmission in synchronous ttp systems | |
KR100848853B1 (en) | Error handling method and system of error-tolerant distributed computer system | |
KR20090088381A (en) | Restoration method, system and computer readable recording medium on network | |
US8228953B2 (en) | Bus guardian as well as method for monitoring communication between and among a number of nodes, node comprising such bus guardian, and distributed communication system comprising such nodes | |
WO2013044281A1 (en) | Method for a clock-rate correction in a network consisting of nodes | |
CN111130951B (en) | Equipment state detection method, device and storage medium | |
US20050172167A1 (en) | Communication fault containment via indirect detection | |
Cranen | Model checking the FlexRay startup phase | |
US7729254B2 (en) | Parasitic time synchronization for a centralized communications guardian | |
Daniel et al. | Failure detection in tsn startup using deep learning | |
US20070271486A1 (en) | Method and system to detect software faults | |
Steiner et al. | Layered diagnosis and clock-rate correction for the ttethernet clock synchronization protocol | |
US7802150B2 (en) | Ensuring maximum reaction times in complex or distributed safe and/or nonsafe systems | |
US7698395B2 (en) | Controlling start up in a network | |
Latronico et al. | Design time reliability analysis of distributed fault tolerance algorithms | |
Kordes et al. | Startup error detection and containment to improve the robustness of hybrid FlexRay networks | |
Milbredt et al. | An investigation of the clique problem in FlexRay | |
Pfeifer | Formal methods in the automotive domain: The case of TTA | |
JPH08307438A (en) | Token ring type transmission system | |
EP2761795B1 (en) | Method for diagnosis of failures in a network | |
CN119109827A (en) | Vehicle communication test method, vehicle, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
WWE | Wipo information: entry into national phase | Ref document number: 2004811902 Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 2006541636 Country of ref document: JP |
NENP | Non-entry into the national phase | Ref country code: DE |
WWW | Wipo information: withdrawn in national office | Country of ref document: DE |
WWP | Wipo information: published in national office | Ref document number: 2004811902 Country of ref document: EP |