CN109032822B - Method and device for storing crash information - Google Patents
- Publication number
- CN109032822B (application CN201710432510.XA)
- Authority
- CN
- China
- Prior art keywords
- watchdog
- information
- condition
- cpu
- under
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0787—Storage of error reports, e.g. persistent data storage, storage using memory protection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/24—Resetting means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
- G06F11/0754—Error or fault detection not based on redundancy by exceeding limits
- G06F11/0757—Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Retry When Errors Occur (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a method and a device for storing crash information. The method comprises: when it is determined that the system is restarting abnormally, controlling a first watchdog to reset the CPU; after the CPU is reset, setting the first watchdog and a second watchdog to a hardware feeding mode and saving the crash information to flash memory; and configuring the second watchdog as a software watchdog, so that all devices on the board are reset when the second watchdog times out. The invention addresses the technical problem in the prior art that crash information cannot be reliably saved when the system crashes and restarts abnormally, which prevents subsequent fault analysis, and achieves the technical effect that crash information can be reliably saved even when the system crashes severely.
Description
Technical Field
The present invention relates to the field of computers, and in particular, to a method and an apparatus for storing crash information.
Background
Crash information plays an important role in analyzing the cause of a failure. Normally, the system saves crash information to flash when a crash occurs. Sometimes, however, the crash is so severe that the system cannot respond in time, and the crash information cannot be saved. In such a severe crash the crash information is lost, so neither failure analysis nor recovery analysis can be performed.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The invention provides a method and a device for storing crash information, to solve the technical problem in the prior art that crash information cannot be reliably saved when the system crashes and restarts abnormally, which prevents subsequent fault analysis.
To solve the above technical problem, in one aspect, the present invention provides a method for storing crash information, including: when it is determined that the system is restarting abnormally, controlling a first watchdog to reset the CPU; after the CPU is reset, setting the first watchdog and a second watchdog to a hardware feeding mode and saving the crash information to flash memory; and configuring the second watchdog as a software watchdog, so that all devices on the board are reset when the second watchdog times out.
Optionally, controlling the first watchdog to reset the CPU when it is determined that the system is restarting abnormally includes: when the system crashes and becomes unresponsive, software stops feeding the first watchdog; and when feeding times out, the first watchdog resets the CPU and a first identifier is set, where the first identifier indicates that the system restarted abnormally.
Optionally, setting the first watchdog and the second watchdog to the hardware feeding mode after the CPU is reset and saving the crash information to flash memory includes: determining whether the first watchdog and the second watchdog were successfully set to the hardware feeding mode and the crash information saved to flash memory; if not, retrying the setting of the first watchdog and the second watchdog to the hardware feeding mode and the saving of the crash information to flash memory, and recording the number of retries; and when the number of retries exceeds a preset threshold, abandoning the attempt to set the first watchdog and the second watchdog to the hardware feeding mode.
Optionally, software feeding is performed by software, and hardware feeding is performed by a programmable logic device.
Optionally, the crash information includes at least one of: memory image information, and register information of one or more devices on the board.
Optionally, before the first watchdog is controlled to reset the CPU when it is determined that the system is restarting abnormally, the method further includes: while the system is running, software continuously writes the current stack pointer to a preset memory address. After all devices on the board are reset, the method further includes: examining data by inspecting the stack pointer recorded at the time of the crash.
In another aspect, the invention also provides a device for storing crash information, which includes: a control module, configured to control the first watchdog to reset the CPU when it is determined that the system is restarting abnormally; a saving module, configured to set the first watchdog and the second watchdog to the hardware feeding mode after the CPU is reset, and to save the crash information to flash memory; and a reset module, configured to configure the second watchdog as a software watchdog, so that all devices on the board are reset when the second watchdog times out.
Optionally, the control module includes: a suspension unit, configured to stop software feeding of the first watchdog when the system crashes and becomes unresponsive; and a control unit, configured so that, when feeding times out, the first watchdog resets the CPU and a first identifier is set, where the first identifier indicates that the system restarted abnormally.
Optionally, the saving module includes: a determining unit, configured to determine whether the first watchdog and the second watchdog were successfully set to the hardware feeding mode and the crash information saved to flash memory; a retry unit, configured to retry, if unsuccessful, the setting of the first watchdog and the second watchdog to the hardware feeding mode and the saving of the crash information to flash memory, and to record the number of retries; and an abandoning unit, configured to abandon the attempt to set the first watchdog and the second watchdog to the hardware feeding mode when the number of retries exceeds a preset threshold.
Optionally, the device for storing crash information further includes: an updating module, configured so that, before the first watchdog is controlled to reset the CPU when it is determined that the system is restarting abnormally, software continuously writes the current stack pointer to a preset memory address while the system is running; and a checking module, configured to examine data, after all devices on the board are reset, by inspecting the stack pointer recorded at the time of the crash.
In another aspect, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.
The invention has the following beneficial effects: when it is determined that the system is restarting abnormally, two watchdogs are used. The first watchdog resets only the CPU, so that the system can save the crash information; the second watchdog then resets the whole board, so that the system restarts. Crash information can thus be reliably saved even when the system crashes severely, which solves the technical problem in the prior art that crash information cannot be reliably saved when the system crashes and restarts abnormally and subsequent fault analysis is therefore impossible.
Drawings
FIG. 1 is a flow chart of a method for storing crash information in an embodiment of the invention;
FIG. 2 is a block diagram of a device for storing crash information according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a system architecture according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for saving crash information according to an embodiment of the invention.
Detailed Description
To solve the technical problem in the prior art that crash information cannot be reliably saved when the system crashes and restarts abnormally, which prevents subsequent fault analysis, the invention provides a method and a device for storing crash information. The invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In this example, as shown in FIG. 1, a method for saving crash information is provided, which may include the following steps:
Step 101: when it is determined that the system is restarting abnormally, the first watchdog is controlled to reset the CPU;
Step 102: after the CPU is reset, the first watchdog and the second watchdog are set to the hardware feeding mode, and the crash information is saved to flash memory;
Step 103: the second watchdog is configured as a software watchdog, and all devices on the board are reset when the second watchdog times out.
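Steps 101 to 103 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Watchdog` class, the `run` helper, the tick-based timing, and the timeout value of 3 are all hypothetical and do not come from the patent. It shows a software-fed watchdog firing once software stops feeding it, while a hardware-fed watchdog keeps being fed regardless of the state of the software.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Toy watchdog timer; all names and values are illustrative assumptions."""
    name: str
    reset_scope: str            # "cpu" for the first watchdog, "board" for the second
    timeout: int                # ticks without feeding before the watchdog fires
    hardware_fed: bool = True   # True: fed by the programmable logic device
    idle_ticks: int = 0

    def tick(self, software_alive: bool) -> bool:
        """Advance one tick and return True if the watchdog fires."""
        fed = self.hardware_fed or software_alive
        self.idle_ticks = 0 if fed else self.idle_ticks + 1
        return self.idle_ticks >= self.timeout


def run(wd1: Watchdog, wd2: Watchdog, software_alive: bool, ticks: int) -> list:
    """Return the reset scopes triggered over the given number of ticks."""
    events = []
    for _ in range(ticks):
        for wd in (wd1, wd2):
            if wd.tick(software_alive):
                events.append(wd.reset_scope)
                wd.idle_ticks = 0
    return events


wd1 = Watchdog("first", "cpu", timeout=3, hardware_fed=False)
wd2 = Watchdog("second", "board", timeout=3)

# Step 101: the system hangs, software stops feeding the first watchdog.
step101 = run(wd1, wd2, software_alive=False, ticks=3)

# Step 102: both watchdogs are hardware-fed while crash information is saved.
wd1.hardware_fed = True
step102 = run(wd1, wd2, software_alive=False, ticks=10)

# Step 103: the second watchdog becomes a software watchdog and is left unfed.
wd2.hardware_fed = False
step103 = run(wd1, wd2, software_alive=False, ticks=3)
```

In this model the save window of step 102 can last as long as needed, because neither watchdog depends on software feeding during that time.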
In the above example, when it is determined that the system is restarting abnormally, two watchdogs are used. The first watchdog resets only the CPU, so that the system can save the crash information; the second watchdog then resets the whole board, so that the system restarts. Crash information can thus be reliably saved even when the system crashes severely, which solves the technical problem in the prior art that crash information cannot be reliably saved when the system crashes and restarts abnormally and subsequent fault analysis is therefore impossible.
There is a process between the system crash and the restart in which several devices must cooperate. So that system information is recorded reliably and the components cooperate effectively, a flag can be set. In one embodiment, controlling the first watchdog to reset the CPU when it is determined that the system is restarting abnormally may proceed as follows: when the system crashes and becomes unresponsive, software stops feeding the first watchdog; when feeding times out, the first watchdog resets the CPU and a first identifier is set, where the first identifier indicates that the system restarted abnormally.
Specifically, the programmable logic device may provide two registers: a crash_flag register for recording whether an abnormal restart has occurred, and a dump_retry register for recording the number of attempts to enter dump mode. When an abnormal restart is determined, an identifier is recorded in the crash_flag register to indicate that the system restarted abnormally. For example, the initial value of the crash_flag register may be set to 0, where 0 indicates a normal restart of the system and 1 indicates an abnormal restart of the system.
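The role of the two registers can be sketched as follows. The register names crash_flag and dump_retry and their meanings come from the text; the classes `ProgrammableLogic` and `Cpu` and the function `on_first_watchdog_timeout` are hypothetical names introduced only for illustration. The point of the sketch is that the flag lives outside the CPU, so it is still readable after a CPU-only reset.

```python
class ProgrammableLogic:
    """Toy stand-in for the programmable logic device registers."""

    def __init__(self):
        self.crash_flag = 0   # 0: normal restart, 1: abnormal restart
        self.dump_retry = 0   # number of attempts to enter dump mode


class Cpu:
    """Toy CPU whose volatile state is lost on reset (illustrative only)."""

    def __init__(self):
        self.volatile_state = {}

    def reset(self):
        self.volatile_state = {}


def on_first_watchdog_timeout(pld: ProgrammableLogic, cpu: Cpu) -> None:
    """Record the abnormal restart, then reset only the CPU."""
    pld.crash_flag = 1
    cpu.reset()


pld, cpu = ProgrammableLogic(), Cpu()
cpu.volatile_state["task"] = "hung driver call"
on_first_watchdog_timeout(pld, cpu)

# After the CPU comes back up, it can still tell how it got here.
restart_was_abnormal = pld.crash_flag == 1
```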
To save the crash information reliably, a first watchdog chip and a second watchdog chip can be provided and connected through a programmable logic device, where the first watchdog chip is connected to the CPU reset signal and the second watchdog chip is connected to the whole-board reset signal. When the first watchdog chip triggers a restart, only the CPU is restarted, and the other devices remain in their pre-reset state.
After an abnormal restart, the system should enter a state or mode in which crash information is saved. If the system kept trying to enter that mode indefinitely, however, it could no longer proceed in an orderly way. A retry limit can therefore be set: once the number of retries exceeds the limit, the attempt is abandoned so that the system can continue to execute in an orderly manner. In one embodiment, setting the first watchdog and the second watchdog to the hardware feeding mode after the CPU is reset and saving the crash information to flash memory may include: determining whether the first watchdog and the second watchdog were successfully set to the hardware feeding mode and the crash information saved to flash memory; if not, retrying the setting and the saving, and recording the number of retries; and when the number of retries exceeds a preset threshold, abandoning the attempt to set the first watchdog and the second watchdog to the hardware feeding mode. That is, it is determined whether the exception handling mode was entered successfully, and if it cannot be entered after several attempts, the attempt to enter it is abandoned.
When feeding the watchdogs, software feeding can be performed by software and hardware feeding can be performed by the programmable logic device; that is, different feeding modes can be selected as needed.
The crash information described above may include, but is not limited to, at least one of: memory image information, and register information of one or more devices on the board.
To allow the data to be read and examined, a stack pointer can be recorded. Specifically, before the first watchdog is controlled to reset the CPU when it is determined that the system is restarting abnormally, software continuously writes the current stack pointer to a preset memory address while the system is running. After all devices on the board are reset, the data can then be examined by inspecting the stack pointer recorded at the time of the crash. For example, stack backtracking, variable inspection, and examination of code segments and data can be performed starting from that stack pointer. By examining the register information of the critical devices, the state of the faulty module can be analyzed.
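The idea of continuously recording the current stack position at a fixed location can be illustrated by analogy. The sketch below is not how an embedded target would do it: real firmware would store the hardware stack pointer at a physical address. Here `SP_SLOT`, `retained_memory`, `track_stack`, and `SystemHang` are hypothetical names, a dictionary stands in for memory that survives a CPU-only reset, and a list of function names stands in for the stack.

```python
import functools

SP_SLOT = 0x1000        # hypothetical preset memory address
retained_memory = {}    # stands in for memory that survives a CPU-only reset


class SystemHang(Exception):
    """Simulates the point at which the system stops responding."""


def track_stack(func):
    """Record the call on entry and remove it only on a normal return."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stack = retained_memory.setdefault(SP_SLOT, [])
        stack.append(func.__name__)
        result = func(*args, **kwargs)
        stack.pop()   # deliberately skipped if the call never returns normally
        return result
    return wrapper


@track_stack
def driver_poll():
    raise SystemHang()


@track_stack
def main_loop():
    driver_poll()


try:
    main_loop()
except SystemHang:
    pass

# What a post-restart inspection would find at the preset address.
crash_time_stack = retained_memory[SP_SLOT]
```

Because the record is updated on every call rather than at crash time, it is available even when the crash handler itself never gets to run.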
The method for storing crash information provided by this embodiment can be applied in the embedded field and can be implemented through hardware design and programming of the programmable logic device.
Based on the same inventive concept, an embodiment of the invention also provides a device for storing crash information, as described in the following embodiments. Because the device solves the problem on a principle similar to that of the method, the implementation of the device can refer to the implementation of the method, and repeated parts are not described again. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements the intended function. While the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. FIG. 2 is a block diagram of a device for storing crash information according to an embodiment of the present invention. As shown in FIG. 2, the device may include a control module 201, a saving module 202, and a reset module 203, which are described below.
The control module 201 is configured to control the first watchdog to reset the CPU if it is determined that the system is abnormally restarted;
The saving module 202 is configured to set the first watchdog and the second watchdog to the hardware feeding mode after the CPU is reset, and to save the crash information to flash memory;
The reset module 203 is configured to configure the second watchdog as a software watchdog, so that all devices on the board are reset when the second watchdog times out.
In one embodiment, the control module 201 may include: a suspension unit, configured to stop software feeding of the first watchdog when the system crashes and becomes unresponsive; and a control unit, configured so that, when feeding times out, the first watchdog resets the CPU and a first identifier is set, where the first identifier indicates that the system restarted abnormally.
In one embodiment, the saving module 202 may include: a determining unit, configured to determine whether the first watchdog and the second watchdog were successfully set to the hardware feeding mode and the crash information saved to flash memory; a retry unit, configured to retry, if unsuccessful, the setting of the first watchdog and the second watchdog to the hardware feeding mode and the saving of the crash information to flash memory, and to record the number of retries; and an abandoning unit, configured to abandon the attempt to set the first watchdog and the second watchdog to the hardware feeding mode when the number of retries exceeds a preset threshold.
In one embodiment, the device for storing crash information may further include: an updating module, configured so that, before the first watchdog is controlled to reset the CPU when it is determined that the system is restarting abnormally, software continuously writes the current stack pointer to a preset memory address while the system is running; and a checking module, configured to examine data, after all devices on the board are reset, by inspecting the stack pointer recorded at the time of the crash.
In this example, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
S1: when it is determined that the system is restarting abnormally, the first watchdog is controlled to reset the CPU;
S2: after the CPU is reset, the first watchdog and the second watchdog are set to the hardware feeding mode, and the crash information is saved to flash memory;
S3: the second watchdog is configured as a software watchdog, and all devices on the board are reset when the second watchdog times out.
That is, when it is determined that the system is restarting abnormally, two watchdogs are used. The first watchdog resets only the CPU, so that the system can save the crash information; the second watchdog then resets the whole board, so that the system restarts. Crash information can thus be reliably saved even when the system crashes severely, which solves the technical problem in the prior art that crash information cannot be reliably saved when the system crashes and restarts abnormally and subsequent fault analysis is therefore impossible.
The method and apparatus for storing crash information are described below with reference to a specific embodiment, however, it should be noted that the specific embodiment is only for better explaining the present application and is not meant to be unduly limiting.
In existing crash handling, crash information cannot be recorded when the CPU becomes completely unresponsive. To solve this problem, in this embodiment the system is recovered after it has completely stopped responding and the crash information is then recorded, so that crash information is captured even when a severe crash occurs.
While the system is running normally, the current stack pointer is continuously written to memory. When the system needs to be restarted, an exception handling mode (referred to as DUMP mode for short) is entered, and the crash information is saved in this mode. The crash information may include: a complete memory image, register state information of critical devices, and so on.
Based on the above-mentioned conception, in this example, a method for saving crash information is provided, so that the crash information can be saved for fault analysis in the case of serious crash phenomenon without affecting normal restarting of the system. The method can comprise the following steps:
S1: the programmable logic device connects two watchdog chips: a first watchdog chip connected to the CPU reset signal, and a second watchdog chip connected to the reset signal of the whole board (i.e., the reset signal covering the CPU and all other devices). When the first watchdog chip triggers a restart, only the CPU is restarted, and the other devices (e.g., DDR, DSP, etc.) remain in their pre-reset state.
S2: the configuration of the watchdog chip is controlled by a programmable logic device. The programmable logic device provides two registers: a crash_flag register for recording whether there is an abnormal restart phenomenon; and a dump_retry register for recording the number of attempts to enter dump mode.
The initial configuration of the programmable logic device is as follows: the first watchdog chip and the second watchdog chip are both configured as hardware watchdogs, fed by the programmable logic device; the initial value of the crash_flag register is 0; and the initial value of the dump_retry register is 0. For the crash_flag register, 0 indicates a normal restart of the system and 1 indicates an abnormal restart of the system.
S3: in the boot stage of the system, before the code is relocated to memory, the crash_flag register is read to determine whether an abnormal restart occurred. If crash_flag=1, the last restart of the system was abnormal; the exception handling mode (i.e., DUMP mode) is to be entered, and the process goes to step S4. If crash_flag=0, the last restart of the system was normal; normal operation continues, and the process goes to step S5. The check is made before the code is relocated to memory so that the boot process does not modify the memory contents; that is, if the last restart was abnormal, the memory contents are still the same as at the time of the abnormality.
S4: attempting to enter DUMP mode:
1) If DUMP mode is entered successfully, both the first and second watchdog chips are configured as hardware watchdogs, dump_retry=0 is set, and the crash information (e.g., the complete memory image and the register state of key devices) is saved to flash. After the save operation finishes, the programmable logic device register crash_flag is set to 0 and the second watchdog chip is configured as a software watchdog; when it times out, all devices of the whole board are reset and step S3 is re-entered. The saved crash information (i.e., the complete memory image, etc.) is the same as when the exception occurred.
2) If entering DUMP mode fails, the programmable logic device has not been reset at this point, so the first watchdog chip is still a software watchdog and the second watchdog chip is still a hardware watchdog. Therefore, after the fixed timeout (which varies with the device selected), the CPU is reset again and step S3 is re-entered, where crash_flag=1 causes another attempt to enter DUMP mode. The number of retries is recorded in dump_retry. When the number of retries exceeds 3, the programmable logic device registers may be set to crash_flag=0 and dump_retry=0, indicating that the system gives up entering DUMP mode.
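The retry behaviour of S4 can be sketched as a loop over successive boots. The text does not say exactly when dump_retry is incremented, so this sketch assumes it is incremented on every attempt and cleared on success; `boot_with_dump`, `try_dump`, and `MAX_DUMP_RETRIES` are hypothetical names.

```python
MAX_DUMP_RETRIES = 3  # the text gives up once the retry count exceeds 3


def boot_with_dump(regs: dict, try_dump) -> str:
    """One boot pass through S3/S4. `regs` holds crash_flag and dump_retry."""
    if regs["crash_flag"] == 0:
        return "normal"
    if regs["dump_retry"] > MAX_DUMP_RETRIES:
        # Give up on DUMP mode so the system can start normally.
        regs["crash_flag"] = 0
        regs["dump_retry"] = 0
        return "give_up"
    regs["dump_retry"] += 1  # assumption: counted on each attempt
    if try_dump():
        regs["dump_retry"] = 0
        regs["crash_flag"] = 0  # cleared once the crash data is in flash
        return "dumped"
    return "retry"  # watchdog 1 will time out and reset the CPU again


regs = {"crash_flag": 1, "dump_retry": 0}
history = []
while True:
    outcome = boot_with_dump(regs, try_dump=lambda: False)
    history.append(outcome)
    if outcome != "retry":
        break
print(history)
```

Under these assumptions, a system that can never enter DUMP mode makes four failed attempts and then falls back to a normal start on the fifth boot, so a broken dump path cannot trap the board in a reset loop.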
S5: The boot continues to run normally, and the boot stage sets crash_flag=1. Once the system reaches a stage where software can feed the watchdog (generally the kernel stage), the two watchdog chips are configured: the first watchdog chip is fed by software, and the second watchdog chip is fed by hardware, i.e., by the programmable logic device. In addition, whenever the currently executing function is pushed onto the stack, the stack pointer is updated and stored at a fixed memory address.
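Storing the stack pointer at a fixed address means it can later be recovered from a raw memory image without any other context. The following Python simulation illustrates the idea only; real firmware would do this in C or assembly, and the slot address `SP_SLOT`, the 32-bit little-endian encoding, and the sample stack pointer values are all assumptions.

```python
import struct

RAM = bytearray(256)
SP_SLOT = 0xF0  # hypothetical fixed address reserved for the stack pointer


def record_stack_pointer(sp: int) -> None:
    """Store the current stack pointer as a 32-bit little-endian word."""
    struct.pack_into("<I", RAM, SP_SLOT, sp)


def read_stack_pointer(image: bytes) -> int:
    """Recover the last recorded stack pointer from a memory image."""
    return struct.unpack_from("<I", image, SP_SLOT)[0]


# Three nested calls, each pushing a 16-byte frame; the system is assumed
# to hang inside the innermost call.
for sp in (0x8000, 0x7FF0, 0x7FE0):
    record_stack_pointer(sp)

last_sp = read_stack_pointer(bytes(RAM))
print(hex(last_sp))
```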
S6: The following three restart situations may be encountered during operation:
1) Restart after a crash: software stops feeding the first watchdog chip, and after the fixed timeout (which varies with the device selected) the CPU is reset. The other devices are not reset and remain in their pre-reset state; what is preserved here is mainly the memory contents and the register information of key devices. The process returns to step S3, and since crash_flag=1 at this time, DUMP mode is eventually entered to save the crash information.
2) Manual restart: this is treated as a normal restart. Software sets crash_flag=0, all devices of the whole board are reset, and the process returns to step S3.
3) Restart after power failure: this is treated as a normal restart. All devices of the whole board are powered down and restarted, and the process returns to step S3.
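The three restart situations differ in what the boot stage finds afterwards. The sketch below summarises them under two assumptions not stated in the text: that a whole-board reset or power cycle leaves no usable RAM contents, and that the logic device returns to its initial crash_flag value of 0 after a power failure. The function name `restart` and the dictionary keys are hypothetical.

```python
def restart(cause: str, ram: str) -> dict:
    """Return the state seen by the boot stage after a restart.

    `ram` stands for the RAM contents just before the restart.
    During normal running crash_flag is 1 (set in S5).
    """
    if cause == "crash":
        # Watchdog 1 times out: CPU-only reset, flag and RAM are preserved.
        return {"crash_flag": 1, "ram": ram, "next": "DUMP"}
    if cause == "manual":
        # Software clears the flag, then the whole board is reset.
        return {"crash_flag": 0, "ram": None, "next": "NORMAL"}
    if cause == "power_loss":
        # Assumption: the logic device comes back with its initial value 0.
        return {"crash_flag": 0, "ram": None, "next": "NORMAL"}
    raise ValueError(f"unknown restart cause: {cause}")


for cause in ("crash", "manual", "power_loss"):
    print(cause, restart(cause, ram="contents at time of restart"))
```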
With this method, crash information can be effectively saved even when the system has crashed and is unresponsive.
The following is a specific example:
In this example, as shown in fig. 3, in addition to the main body of the system, the design mainly consists of a programmable logic device (which may be a device such as a CPLD or an FPGA, hereinafter simply referred to as the logic device) and two watchdog chips (hereinafter simply referred to as watchdog 1 and watchdog 2).
The logic device outputs two watchdog input signals (MDI_1 and MDI_2), which are fed to the watchdog input (MDI) pins of the two watchdog chips respectively. The output RESET pin of watchdog 1 is connected to the CPU reset signal (CPU RESET) of the system, and the output RESET pin of watchdog 2 is connected to the whole-board reset signal (System RESET) of the system.
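The effect of this wiring during normal running (watchdog 1 fed by software, watchdog 2 fed by the logic device) can be seen in a small tick-based simulation. This is only a sketch: the `Watchdog` class, the timeout of 3 ticks, and the moment at which software hangs are all assumptions chosen for illustration.

```python
class Watchdog:
    """Counts ticks since the last feed and fires its reset line on timeout."""

    def __init__(self, reset_line: str, timeout: int):
        self.reset_line = reset_line
        self.timeout = timeout
        self.elapsed = 0

    def feed(self) -> None:
        self.elapsed = 0

    def tick(self):
        """Advance one time unit; return the reset line name if it fires."""
        self.elapsed += 1
        if self.elapsed >= self.timeout:
            self.elapsed = 0
            return self.reset_line
        return None


watchdog1 = Watchdog("CPU RESET", timeout=3)     # fed by software
watchdog2 = Watchdog("System RESET", timeout=3)  # fed by the logic device

resets = []
for t in range(6):
    software_alive = t < 2  # software hangs from t = 2 onwards
    if software_alive:
        watchdog1.feed()
    watchdog2.feed()        # the logic device keeps feeding regardless
    for wd in (watchdog1, watchdog2):
        fired = wd.tick()
        if fired:
            resets.append((t, fired))
print(resets)
```

In this model only the CPU reset line fires after the hang, so memory and peripheral state are left untouched for the subsequent dump.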
In this example, taking the case where a crash occurs and the system does not respond, the method for saving crash information comprises the following steps, as shown in fig. 4:
Step S1: While the system is running, software continuously updates the current stack pointer at a fixed memory address. When the system crashes and stops responding, software stops feeding the watchdog; after the feeding timeout, watchdog 1 resets the CPU. At this time crash_flag=1.
Step S2: After the CPU is reset, the boot stage is entered and crash_flag is read before the code is relocated to memory. Because crash_flag=1, the system is known to have restarted abnormally, and the exception handling mode (DUMP mode) is entered.
Step S3: On entering DUMP mode, watchdog 1 and watchdog 2 are both configured for hardware feeding, dump_retry is set to 0, and the complete memory image and the register information of key devices (DSP, etc.) are saved to flash. After the save operation completes, the logic device register crash_flag is written to 0 and watchdog 2 is configured as a software watchdog; after this watchdog times out, all devices of the whole board are reset.
Step S4: After the whole board is reset, the system restarts and enters the boot stage, and crash_flag is read before the code is relocated to memory. Since crash_flag=0, the system enters the normal start-up stage.
Step S5: After the system has started normally, the saved crash information can be examined by reading the flash. Using the stack pointer recorded at the time of the crash, stack backtracing, variable inspection, and examination of code segment and data information can be performed; by examining the register information of key devices, the state of the faulty module can be analyzed.
In the above example, a combination of hardware design and software solves the problem in existing systems that crash information is lost because it cannot be saved when the system crashes and is completely unresponsive. The implementation does not affect normal operation of the system and is relatively simple.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the invention described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases, the steps shown or described may be performed in an order different from that shown or described. They may also be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations can be made to the embodiments of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. A method for storing crash information, comprising:
in a case where it is determined, before code is relocated to memory, that the system has restarted abnormally, controlling only the first watchdog, of a first watchdog and a second watchdog, to reset the CPU; wherein the first watchdog is connected to a CPU reset signal, and the second watchdog is connected to a whole-board reset signal;
after the CPU is reset, setting the first watchdog and the second watchdog to a hardware feeding mode, and saving crash information to a flash memory;
configuring the second watchdog as a software watchdog, and resetting all devices of the whole board when the second watchdog times out;
wherein controlling only the first watchdog, of the first watchdog and the second watchdog, to reset the CPU in the case where it is determined, before the code is relocated to memory, that the system has restarted abnormally comprises:
in a case where the system crashes and does not respond, software stops feeding the first watchdog;
and in a case where watchdog feeding times out, the first watchdog resets the CPU and a first identifier is set, wherein the first identifier is used to identify that the system has restarted abnormally.
2. The method of claim 1, wherein setting the first watchdog and the second watchdog to the hardware feeding mode after the CPU is reset and saving the crash information to the flash memory comprises:
determining whether setting the first watchdog and the second watchdog to the hardware feeding mode and saving the crash information to the flash memory has succeeded;
if unsuccessful, retrying setting the first watchdog and the second watchdog to the hardware feeding mode and saving the crash information to the flash memory, and recording the number of retries;
and in a case where the number of retries exceeds a preset threshold, abandoning setting the first watchdog and the second watchdog to the hardware feeding mode.
3. The method of claim 1, wherein the software watchdog is fed by software and the hardware watchdog is fed by a programmable logic device.
4. The method of claim 1, wherein the crash information comprises at least one of: memory image information, and register information of one or more devices on the whole board.
5. The method of claim 1, wherein, before the first watchdog resets the CPU in the case where it is determined that the system has restarted abnormally, the method further comprises:
during operation of the system, software continuously updates the current stack pointer to a preset memory address;
and after resetting all devices of the whole board, the method further comprises:
examining data information by examining the stack pointer recorded at the time of the crash.
6. A crash information storage device, comprising:
a control module, configured to control only the first watchdog, of a first watchdog and a second watchdog, to reset the CPU in a case where it is determined, before code is relocated to memory, that the system has restarted abnormally; wherein the first watchdog is connected to a CPU reset signal, and the second watchdog is connected to a whole-board reset signal;
a storage module, configured to set the first watchdog and the second watchdog to a hardware feeding mode after the CPU is reset, and to save crash information to a flash memory;
a reset module, configured to configure the second watchdog as a software watchdog, and to reset all devices of the whole board when the second watchdog times out;
wherein the control module comprises:
a suspension unit, configured to stop software feeding of the first watchdog in a case where the system crashes and does not respond;
and a control unit, configured to cause the first watchdog to reset the CPU in a case where watchdog feeding times out, and to set a first identifier, wherein the first identifier is used to identify that the system has restarted abnormally.
7. The apparatus of claim 6, wherein the storage module comprises:
a determining unit, configured to determine whether setting the first watchdog and the second watchdog to the hardware feeding mode and saving the crash information to the flash memory has succeeded;
a retry unit, configured to, if unsuccessful, retry setting the first watchdog and the second watchdog to the hardware feeding mode and saving the crash information to the flash memory, and to record the number of retries;
and a discarding unit, configured to abandon setting the first watchdog and the second watchdog to the hardware feeding mode in a case where the number of retries exceeds a preset threshold.
8. The apparatus of claim 6, further comprising:
an updating module, configured to cause software to continuously update the current stack pointer to a preset memory address during operation of the system, before the first watchdog resets the CPU in the case where it is determined that the system has restarted abnormally;
and a checking module, configured to examine data information by examining the stack pointer recorded at the time of the crash, after all devices of the whole board have been reset.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710432510.XA CN109032822B (en) | 2017-06-09 | 2017-06-09 | Method and device for storing crash information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109032822A CN109032822A (en) | 2018-12-18 |
CN109032822B true CN109032822B (en) | 2024-01-09 |
Family
ID=64628786
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710432510.XA Active CN109032822B (en) | 2017-06-09 | 2017-06-09 | Method and device for storing crash information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109032822B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109739675A (en) * | 2018-12-24 | 2019-05-10 | 深圳航天东方红海特卫星有限公司 | A method of program exception is captured using hardware watchdog |
CN109783267A (en) * | 2019-01-17 | 2019-05-21 | 广东小天才科技有限公司 | Method and system for solving abnormal downloading mode |
CN109828858A (en) * | 2019-01-17 | 2019-05-31 | 广东小天才科技有限公司 | Method and system for preventing system startup from being locked |
CN113010336A (en) * | 2019-12-20 | 2021-06-22 | 珠海全志科技股份有限公司 | Application processor crash field debugging method and application processor |
CN114077512A (en) * | 2020-08-21 | 2022-02-22 | 华为技术有限公司 | Exception reset processing method, exception handling device and storage medium |
CN112068980B (en) * | 2020-09-18 | 2023-06-23 | 展讯通信(上海)有限公司 | Method and device for sampling information before CPU suspension, equipment and storage medium |
CN114443330A (en) * | 2020-11-02 | 2022-05-06 | 迈普通信技术股份有限公司 | Watchdog restart fault determination method, device, electronic device and storage medium |
CN114741233A (en) * | 2020-12-23 | 2022-07-12 | 华为技术有限公司 | Quick Start Method |
CN113535448B (en) * | 2021-06-30 | 2024-04-26 | 浙江中控技术股份有限公司 | Multiple watchdog control method and control system thereof |
CN113946148B (en) * | 2021-09-29 | 2023-11-10 | 浙江零跑科技股份有限公司 | MCU chip awakening system based on multi-ECU cooperative control |
CN114138599B (en) * | 2021-11-25 | 2024-09-27 | 云尖信息技术有限公司 | Method for acquiring equipment state information after network operation equipment system and equipment crashes |
CN114911642B (en) * | 2022-04-27 | 2024-04-19 | 北京计算机技术及应用研究所 | Firmware restarting method based on UEFI event mechanism and watchdog |
CN115061752A (en) * | 2022-06-28 | 2022-09-16 | 展讯通信(上海)有限公司 | Terminal equipment restarting method and device |
CN115904793B (en) * | 2023-03-02 | 2023-05-23 | 上海励驰半导体有限公司 | Memory transfer method, system and chip based on multi-core heterogeneous system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1400529A (en) * | 2001-07-30 | 2003-03-05 | 华为技术有限公司 | A Fault Location Method for Real-time Embedded System |
CN101369237A (en) * | 2007-08-14 | 2009-02-18 | 中兴通讯股份有限公司 | A watchdog reset circuit and reset method |
CN102521098A (en) * | 2011-11-23 | 2012-06-27 | 中兴通讯股份有限公司 | Processing method and processing device for monitoring dead halt of CPU (Central Processing Unit) |
US9274894B1 (en) * | 2013-12-09 | 2016-03-01 | Twitter, Inc. | System and method for providing a watchdog timer to enable collection of crash data |
CN106326055A (en) * | 2016-08-29 | 2017-01-11 | 四川九洲空管科技有限责任公司 | Method for software and hardware crashing detection and resetting of airborne collision avoidance system |
Also Published As
Publication number | Publication date |
---|---|
CN109032822A (en) | 2018-12-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |