Detailed Description
In order that the above-recited objects, features and advantages of the present disclosure will become more readily apparent, a more particular description of the disclosure will be rendered by reference to the appended drawings and appended detailed description.
The inventor has found that existing STR debugging approaches cannot accurately attribute a fault to the memory, which makes memory-related STR faults harder to resolve and reduces the efficiency of STR debugging. The main reason is as follows. After the system enters the STR state, the memory is in a self-refresh state: the memory refresh circuit periodically refreshes the data in the memory so that the data is not lost during STR and can be restored exactly to its state before STR. In existing STR debugging, the test program resides in memory, and the STR test is carried out through the operating system to verify whether the memory data has changed. On wake-up from STR, the operating system is woken as well. This process involves the sleep and wake-up of the operating system, the kernel, and external devices, and therefore requires a large number of hardware and software operations. Data loss under STR may be caused by a defect in the memory circuit design, such as an error in the refresh circuit, a refresh failure, or an overly long refresh period, or by a failure of or interference with the memory cell circuits, so that the data cannot be retained. The operating system itself may also change memory data between the start and the end of an STR test. Thus, if a memory data change is found across STR in a test run under an operating system, it cannot be determined whether the change originates from the memory circuit or from the operating system.
In view of the above technical problems, an embodiment of the present disclosure provides a memory detection method that can accurately identify faults of the memory circuit and the memory self-refresh circuit. This is accomplished by shielding the effects that the operating system may have on the detection of memory circuit failures. In an embodiment, the STR test program and the wake-up program may be placed in the UEFI. In the STR test wake-up state, the CPU and the UEFI are started, but the operating system is not woken and remains dormant. The CPU executes the test program in the UEFI to detect whether the memory data is consistent before and after STR. If the data is inconsistent, the problem can be attributed to the memory circuit and the memory self-refresh circuit, and troubleshooting is then carried out on the storage circuit and the self-refresh circuit of the memory.
Fig. 1 illustrates a flow chart of the steps of an exemplary embodiment of a method for detecting a memory failure of the present disclosure. Referring to Fig. 1, the method for detecting a memory failure specifically includes the following steps:
Step 101, switching a software and hardware system between an STR state and an STR test wake-up state through UEFI, wherein in the STR test wake-up state, the UEFI is started and the operating system is kept dormant.
The software and hardware system refers to the totality of the software and hardware of the electronic device. UEFI stands for Unified Extensible Firmware Interface.
In the STR test wake-up state, the UEFI is started and the operating system remains dormant. The STR test does not involve the kernel, the operating system, or peripheral devices; that is, no kernel, peripheral, or operating system processes take part in the test. The test involves only the memory, and the influence of the kernel, the peripherals, and the operating system is shielded.
The STR state requires starting the CPU and the UEFI and also waking the operating system; that is, the STR state involves the sleep and wake-up of the operating system, the kernel, and external devices. The main difference between the STR state and the STR test wake-up state is that the former requires the operating system to be woken up, whereas the latter does not.
Step 102, in the STR test wake-up state, checking the memory data during the last STR state through the test program in the UEFI.
The test program in the UEFI is taken to be a correct test program; that is, it has been run on a large number of memories and has been shown to start the UEFI successfully while the operating system remains dormant.
The memory data during the last STR state refers to the memory data held during the STR state of the electronic device that most recently preceded the test. As described above, an ordinary STR state requires starting the CPU and the UEFI and waking the operating system, and it involves the sleep and wake-up of the operating system, the kernel, and external devices.
In the STR test wake-up state, the test program in the UEFI checks the memory data from the last STR state. Because the STR test wake-up state does not enter the kernel or the operating system and does not involve peripheral devices, no new data is stored in the memory under test in this state. Consequently, in the absence of a fault, the memory data from the last STR state, that is, the memory data of the memory under test, does not change. If the memory data is the same before and after, the memory data check passes; if the memory data differs, the check fails.
For the memory, faults observed in the STR test wake-up state fall into two classes: test program faults and memory circuit faults. The test program in the UEFI is the correct test program described above, so the test program itself is taken to be fault-free. Under the UEFI firmware, the test program checks the memory data from the last STR state. This check shields the influence of the kernel, the peripherals, and the operating system and verifies only whether the memory has a circuit fault. In this case, if the memory has no circuit fault, the check of the memory data from the last STR state passes; if the memory has a circuit fault, the check fails.
Step 103, determining that the memory has a circuit fault in the case that the memory data check fails.
If the memory has a circuit fault, the check of the memory data from the last STR state fails. Therefore, when that check fails, it is determined that the memory has a circuit fault, which allows circuit faults in the memory to be detected conveniently and quickly.
For example, if the test program in the UEFI has successfully started the UEFI while keeping the operating system dormant on the memories of a large number of computers or motherboards, the test program is considered correct. For a new motherboard or computer, such as a newly produced one, with the test program in the UEFI known to be correct, the memory data from the last STR state is checked by the test program in the UEFI in the STR test wake-up state. In this state the operating system remains dormant, the influence of the kernel, peripheral devices, and the operating system is shielded, and the check verifies only whether the memory has a circuit fault. If the memory of the newly produced motherboard or computer has no circuit fault, the memory data check passes; if it has a circuit fault, the check fails. The memory data check can thus detect, simply and quickly and without the support of an operating system, whether the memory of the newly produced motherboard or computer has a circuit fault. It should be noted that, to further improve the stability of the STR function of the memory of a newly produced motherboard or computer, the memory fault test may be performed multiple times, for example 1000 times. For the memories of motherboards or computers in the same batch, a certain number of memories may be selected at random, and the memory fault test may be performed multiple times on each selected memory.
Optionally, the method may further include determining that the memory has no circuit fault if the memory data passes the check in step 102. Because the influence of the kernel, the peripherals, and the operating system is shielded, the check verifies only whether the memory has a circuit fault: if the memory has no circuit fault, the memory data check passes, and if the memory has a circuit fault, the check fails. Therefore, once that influence is shielded, whether the memory has a circuit fault can be detected conveniently and quickly.
Optionally, a circuit fault of the memory generally indicates that the fault lies in the storage circuit of the memory or the self-refresh circuit of the memory. After step 103, the method may therefore further include troubleshooting the storage circuit and the self-refresh circuit of the memory, for example by examining the power-on timing sequence of the memory, so that the memory circuit fault can be resolved in a targeted manner as soon as possible.
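The decision made in steps 102 and 103 can be sketched as a comparison between a memory image captured before STR and the image read back in the STR test wake-up state. The following Python sketch is a simulation only: byte strings stand in for physical memory, and the names `find_mismatches` and `diagnose` are hypothetical and do not come from the disclosure.

```python
from typing import List


def find_mismatches(before: bytes, after: bytes) -> List[int]:
    """Return the offsets at which the two memory images differ."""
    if len(before) != len(after):
        raise ValueError("memory images must have the same length")
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


def diagnose(before: bytes, after: bytes) -> str:
    """Map the check result onto the conclusion drawn in steps 102 and 103."""
    if not find_mismatches(before, after):
        return "no circuit fault"
    return "circuit fault: inspect storage circuit and self-refresh circuit"


# Simulated memory image held across STR (a real test reads physical memory).
image_before = bytes(range(16))
image_after = bytearray(image_before)
image_after[5] ^= 0x01  # simulate one bit lost during self-refresh

print(diagnose(image_before, image_before))
print(find_mismatches(image_before, bytes(image_after)))
print(diagnose(image_before, bytes(image_after)))
```

Because nothing but the test program runs between the two images, any mismatch reported by such a comparison is attributed to the memory circuit rather than to software.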
In one embodiment, the method for detecting a memory failure is further explained herein, and the embodiment may include the following procedures.
First, the system is switched between the STR state and the STR test wake-up state through the UEFI, wherein in the STR test wake-up state the CPU and the UEFI are started and the operating system is kept dormant. The UEFI stores the test program and the sleep and wake-up programs, and the switching is implemented by these programs in the UEFI. In the STR test wake-up state, the test program in the UEFI checks the memory data from the last STR state and determines whether the memory data in the STR test wake-up state is consistent with the memory data before the STR state was last entered. Because the influence of the operating system has been eliminated, if the memory data fails the check, it can be determined that the memory has a circuit fault under STR, for example the failure of some memory cells caused by a circuit design defect, or a failure of the memory refresh circuit.
In an embodiment, the system sets registers of the ACPI (Advanced Configuration and Power Interface) according to the sleep and wake-up programs in the UEFI, causing the system to enter the STR state or to enter the STR test wake-up state from the STR state, and sets the memory to enter self-refresh through the UEFI before entering the STR state or the STR test wake-up state.
Fig. 2 is a flowchart illustrating steps of another embodiment of a method for detecting a memory failure according to the present disclosure. Referring to fig. 2, the method for detecting a memory failure specifically includes the following steps:
Step 201, determining a target memory space to be verified in the memory.
In the related art, once the operating system has started, the UEFI firmware is essentially idle and releases at least part of the memory space it occupied, and the operating system may store data in the released space. An STR test performed under the operating system therefore has to cover at least the part of the memory space released by the UEFI firmware. By contrast, when the test program in the UEFI checks the memory data from the last STR state, the operating system is not started, so the UEFI firmware does not release that memory space; the unreleased space does not need to be checked and is not part of the target memory space. It should be noted that, compared with entering STR under the operating system, checking through the test program in the UEFI covers a smaller target memory space, so the check is faster. Moreover, although less memory is checked under the UEFI firmware, faults occur essentially at random across the whole memory, and the probability that faults occur only in the unchecked space is low. Checking through the test program in the UEFI therefore covers a smaller target memory space while still achieving reliability substantially equal to that of an STR test under an operating system.
Fig. 3 shows a schematic diagram of the allocation of a CPU address space of the present disclosure. For example, referring to Fig. 3, the CPU address space from 0x0 to 0x280000000 is divided into 6 parts, numbered 1 to 6. The address space No. 1 is from 0x0 to 0x00100000, No. 2 is from 0x00100000 to 0xf000000, No. 3 is from 0xf000000 to 0x10000000, No. 4 is from 0x10000000 to 0x90000000, No. 5 is from 0x90000000 to 0x100000000, and No. 6 is from 0x100000000 to 0x280000000. The address space No. 4 is a non-memory space and is primarily used as register space. Regardless of whether STR is entered under the operating system or the test program in the UEFI enters the STR test wake-up state, the UEFI firmware uses memory spaces No. 1 and No. 3 at startup. As described above, when STR is entered under the operating system, the UEFI firmware has already released memory space No. 5 after the operating system booted, so in this example an STR test performed under the operating system needs to cover memory spaces No. 2, No. 5, and No. 6. When the test program in the UEFI enters the STR test wake-up state, the UEFI firmware does not release memory space No. 5, and the target memory space to be checked consists of memory spaces No. 2 and No. 6. Compared with entering STR under the operating system, the target memory space to be checked therefore excludes memory space No. 5.
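The selection of the target memory space in the Fig. 3 example can be illustrated with a small Python sketch. The address ranges are transcribed from the example above; the names `Region`, `target_regions`, and `total_size`, and the grouping of region numbers into sets, are hypothetical and serve only this illustration.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    number: int
    start: int  # inclusive
    end: int    # exclusive


# Address map transcribed from the Fig. 3 example.
REGIONS = [
    Region(1, 0x0, 0x00100000),
    Region(2, 0x00100000, 0x0F000000),
    Region(3, 0x0F000000, 0x10000000),
    Region(4, 0x10000000, 0x90000000),
    Region(5, 0x90000000, 0x100000000),
    Region(6, 0x100000000, 0x280000000),
]

NON_MEMORY = {4}           # register space, not memory
UEFI_AT_STARTUP = {1, 3}   # used by the UEFI firmware in both cases
RELEASED_TO_OS = {5}       # released by the UEFI only after the OS boots


def target_regions(under_os: bool) -> List[Region]:
    """Return the regions an STR check has to cover."""
    excluded = NON_MEMORY | UEFI_AT_STARTUP
    if not under_os:
        # Without an OS the UEFI keeps region 5, so it is not checked.
        excluded = excluded | RELEASED_TO_OS
    return [r for r in REGIONS if r.number not in excluded]


def total_size(regions: List[Region]) -> int:
    """Total number of bytes covered by the given regions."""
    return sum(r.end - r.start for r in regions)


for label, under_os in (("under OS", True), ("under UEFI", False)):
    regions = target_regions(under_os)
    numbers = [r.number for r in regions]
    print(label, numbers, total_size(regions) // 2**20, "MiB")
```

In this example the UEFI-based check covers regions 2 and 6 only, which is the reduction in checked memory that the paragraph above credits for the faster check.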
Step 202, switching a software and hardware system between an STR state and an STR test wake-up state through UEFI, wherein in the STR test wake-up state, UEFI is started and an operating system remains dormant.
Step 202 may refer to the relevant descriptions of step 101, and may achieve the same or similar advantages, and in order to avoid repetition, the description is omitted here.
Step 203, in the STR test wake-up state, checking the memory data in the target memory space during the last STR state through the test program in the UEFI.
Step 204, determining that the memory has a circuit fault if the memory data in the target memory space fails the check.
In the STR test wake-up state, the test program in the UEFI checks the memory data in the target memory space from the last STR state. Because the STR test wake-up state does not enter the kernel or the operating system and does not involve peripherals, no new data is stored in the target memory space during the STR test. If the memory data in the target memory space is unchanged before and after STR, the check passes; if the memory data in the target memory space differs before and after STR, the check fails.
Optionally, after step 201 and before step 203, the method may further include writing an initial value into the target memory space. Step 204 may then include determining that the memory has a circuit fault if the memory data in the target memory space differs from the initial value. Specifically, in the STR test wake-up state, the test program in the UEFI checks the memory data in the target memory space from the last STR state, and no new data is stored in the target memory space. If the memory data in the target memory space is unchanged before and after the check, the check passes; if it differs, the check fails. Before the check, however, the original memory data in the target memory space may be unknown. An initial value can therefore be written directly into the target memory space, so that the original memory data need not be acquired and memory fault detection can be performed quickly. Compared with acquiring the original memory data in the target memory space, writing an initial value directly takes significantly less time. Furthermore, for repeated memory fault detection on the same computer, the initial value needs to be written only once and need not be written again for subsequent rounds, which saves time.
It should be noted that the initial value may be data that is easy to compare, for example all 0s or all 1s. For example, writing an initial value of all 0s into the target memory space can further improve the efficiency of the STR test.
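The initial-value variant can be sketched as a fill followed by a word-by-word comparison against the known value, so that no pre-STR snapshot is required. The Python sketch below is a simulation under stated assumptions: a `bytearray` stands in for the target memory space, a 4-byte little-endian word is assumed, and the names `fill` and `verify` are hypothetical.

```python
from typing import List

WORD = 4  # assumed word size in bytes for this sketch


def fill(memory: bytearray, value: int) -> None:
    """Write the initial value into every word of the target memory space."""
    if len(memory) % WORD:
        raise ValueError("memory size must be a multiple of the word size")
    memory[:] = value.to_bytes(WORD, "little") * (len(memory) // WORD)


def verify(memory: bytearray, value: int) -> List[int]:
    """Return the offsets of all words that differ from the initial value."""
    expected = value.to_bytes(WORD, "little")
    return [
        offset
        for offset in range(0, len(memory), WORD)
        if memory[offset:offset + WORD] != expected
    ]


target = bytearray(64)       # simulated target memory space
fill(target, 0x00000000)     # all-0 initial value
print(verify(target, 0x00000000))

target[9] = 0x80             # simulate a bit flipped during STR
print(verify(target, 0x00000000))
```

An empty result corresponds to the check passing; any reported offset corresponds to the condition under which step 204 concludes that the memory has a circuit fault.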
Optionally, step 202 may include starting the UEFI firmware, setting the memory to enter self-refresh through the UEFI, and setting an ACPI (Advanced Configuration and Power Interface) register on the bridge, so that the software and hardware system is switched between the STR state and the STR test wake-up state. In this manner, the software and hardware system can be switched successfully between the two states.
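The disclosure does not state which ACPI register on the bridge is written or how it is laid out. As an illustration only, the sketch below assumes the fixed-hardware PM1 control register layout defined by the ACPI specification, in which SLP_TYPx occupies bits 10 to 12 and SLP_EN is bit 13. The sleep-type code for STR is platform-specific, so the value 5 used here is a placeholder, and the function name `pm1_sleep_value` is hypothetical.

```python
SLP_TYP_SHIFT = 10
SLP_TYP_MASK = 0x7 << SLP_TYP_SHIFT  # SLP_TYPx field, bits 10-12
SLP_EN = 1 << 13                     # setting this bit triggers the transition


def pm1_sleep_value(current: int, slp_typ: int) -> int:
    """Compute the 16-bit value to write to request a sleep state.

    `current` is the present register content; bits outside the
    SLP_TYPx and SLP_EN fields are preserved.
    """
    if not 0 <= slp_typ <= 7:
        raise ValueError("SLP_TYP is a 3-bit field")
    preserved = current & 0xFFFF & ~(SLP_TYP_MASK | SLP_EN)
    return preserved | (slp_typ << SLP_TYP_SHIFT) | SLP_EN


# Placeholder sleep type; a real platform reports the STR code itself.
ASSUMED_STR_SLP_TYP = 5
print(hex(pm1_sleep_value(0x0001, ASSUMED_STR_SLP_TYP)))
```

On real hardware the computed value would be written to the register only after the memory has been placed in self-refresh, in line with the order of operations described above.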
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the disclosed embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the disclosed embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the disclosed embodiments.
Referring to fig. 4, fig. 4 shows a block diagram of an embodiment of a memory failure detection apparatus of the present disclosure, which may specifically include the following modules:
a switching module 301, configured to switch, through UEFI, a software and hardware system between an STR state and an STR test wake-up state, where in the STR test wake-up state the UEFI is started and an operating system remains dormant;
a checking module 302, configured to check, in the STR test wake-up state, memory data during the last STR state through a test program in the UEFI;
a fault determining module 303, configured to determine that the memory has a circuit fault if the memory data check fails.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
The present disclosure is further illustrated below in connection with a specific example. Fig. 5 is a flow chart illustrating a method for detecting a memory failure according to the present disclosure.
Before the test begins, the target memory space to be verified in the memory is determined. As shown in Fig. 3, for the STR test wake-up state, the target memory space to be verified in the memory consists of the memory spaces No. 2 and No. 6.
First, the UEFI firmware is started normally.
Second, an initial value is written into the target memory space. For example, the initial value 0x5a5a5a5a is written into the target memory space to be verified, that is, the memory spaces No. 2 and No. 6.
Third, the memory is set to enter self-refresh through the UEFI.
Fourth, the ACPI register on the bridge is set through the UEFI to switch the software and hardware system between the STR state and the STR test wake-up state.
Fifth, after waiting for a few minutes, the system is powered on to wake up.
Sixth, the UEFI firmware is started normally, and the operating system is kept dormant.
Seventh, the test program in the UEFI checks the memory data in the target memory space from the last STR state and determines whether that data is identical to the initial value. If the memory data in the target memory space is the same as the initial value, it is determined that the memory has no circuit fault; if the memory data differs from the initial value, it is determined that the memory has a circuit fault.
More specifically, after the UEFI firmware is started, checking the memory data in the target memory space with the number of 2 and the number of 6, judging whether the memory data in the target memory space with the number of 2 and the number of 6 is equal to the previously written 0x5a5a5a5a, if so, determining that the memory has no circuit fault, and if not, determining that the memory has the circuit fault.
When the memory has a circuit fault, an error message may be reported indicating that the memory has a circuit fault. Memory fault detection is then stopped or ended, the memory circuit is troubleshot, and detection resumes once troubleshooting is finished. Because circuit faults in the memory occur randomly, multiple rounds of fault detection are needed to detect them completely. Therefore, when no circuit fault is found, the fourth step may be carried out again to test once more whether the memory has a circuit fault. Testing the memory for faults multiple times further improves the stability of the STR function of the memory.
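The repeat-until-failure policy described above can be sketched as a loop around a single test cycle. In the Python sketch below, the cycle is passed in as a callable that returns True when the memory data check passes. The names `run_repeated` and `make_simulated_cycle` are hypothetical, and the simulated cycle merely stands in for the real sequence of entering STR, waking, and checking.

```python
from typing import Callable, Optional


def run_repeated(cycle: Callable[[], bool], max_rounds: int) -> Optional[int]:
    """Run the STR check up to max_rounds times.

    Returns the 1-based round in which the check first failed, or None
    if every round passed. Testing stops at the first failure so that
    the memory circuit can be troubleshot.
    """
    for round_no in range(1, max_rounds + 1):
        if not cycle():
            return round_no
    return None


def make_simulated_cycle(fail_on: Optional[int]) -> Callable[[], bool]:
    """Build a stand-in cycle that fails on the given call, if any."""
    calls = 0

    def cycle() -> bool:
        nonlocal calls
        calls += 1
        return calls != fail_on

    return cycle


# A healthy memory passes all rounds; a faulty one stops the test early.
print(run_repeated(make_simulated_cycle(None), 1000))
print(run_repeated(make_simulated_cycle(3), 1000))
```

Stopping at the first failed round mirrors the text: once a circuit fault is reported, detection ends until troubleshooting is finished.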
Fig. 6 is a block diagram of an electronic device provided in an embodiment of the present disclosure. Referring to Fig. 6, the disclosure further provides an electronic device including a processor 501, a memory 502, and a computer program 5021 stored in the memory and capable of running on the processor, where the steps of the embodiments of the method for detecting a memory failure are implemented when the processor executes the program.
In one embodiment, the memory holds the UEFI firmware, the STR test program and the wake-up program are placed in the UEFI, and the processor executes the processes of the method embodiments according to the programs in the UEFI.
The present disclosure also provides a readable storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the steps of the embodiments of the method for detecting a memory failure described above.
The present disclosure also provides a computer program product comprising instructions that, when executed by a processor in an electronic device, enable the electronic device to perform the steps of the above-described embodiments of a method of detecting a memory failure.
In this specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present disclosure may be provided as a method, apparatus, or computer program product. Accordingly, the disclosed embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present disclosure may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present disclosure are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the disclosed embodiments have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the disclosed embodiments.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other like elements in the process, method, article, or terminal device comprising the element.
The foregoing describes in detail a method and apparatus for detecting a memory failure, an electronic device, and a storage medium. Specific examples are used herein to illustrate the principles and embodiments of the present disclosure, and the above examples are provided only to assist in understanding the method and core ideas of the present disclosure. Meanwhile, for those skilled in the art, the specific embodiments and the scope of application may vary in accordance with the ideas of the present disclosure. In summary, the content of this specification should not be construed as limiting the present disclosure.