
CN118093287B - Memory failure detection method, device, electronic device and storage medium - Google Patents

Memory failure detection method, device, electronic device and storage medium

Info

Publication number
CN118093287B
Authority
CN
China
Prior art keywords
memory
str
uefi
state
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410323930.4A
Other languages
Chinese (zh)
Other versions
CN118093287A (en
Inventor
王玉龙
杜望宁
钱东彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Loongson Technology Corp Ltd
Original Assignee
Loongson Technology Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Loongson Technology Corp Ltd
Priority to CN202410323930.4A
Publication of CN118093287A
Application granted
Publication of CN118093287B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/4401 Bootstrapping
    • G06F9/4418 Suspend and resume; Hibernate and awake

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The disclosed embodiments provide a memory fault detection method, apparatus, electronic device, and storage medium, relating to the field of computer technology. The method includes: using UEFI to switch the software and hardware system between an STR state and an STR test wake-up state, wherein the operating system remains dormant in the STR test wake-up state; in the STR test wake-up state, verifying, by a test program in UEFI, the memory data held during the last STR state; and, if the memory data verification fails, determining that the memory has a circuit fault. The disclosed embodiments can reduce the difficulty of resolving memory-related STR faults and improve the efficiency of STR debugging.

Description

Memory fault detection method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of computer technologies, and in particular, to a method and device for detecting a memory failure, an electronic device, and a storage medium.
Background
STR (Suspend to RAM) is a sleep state with low wake-up latency that lets an idle software and hardware system resume quickly. In the STR state, the CPU (Central Processing Unit) and the peripherals are powered off and only the system memory stays powered, so power consumption is low. When the standby command is issued, the system stores all context information in memory; the memory stays in a self-refresh state after the system enters STR, and when a wake-up event occurs the system reads the data back from memory and quickly restores the state it had before STR. Memory is the only component that remains powered in the STR state. Because the state of the operating system, all applications, opened documents and so on are held in memory, the user can resume work exactly where it was left; that is, the memory content when the computer returns from the STR state is the same as when it entered the STR state. The memory data must therefore not be lost between going to sleep and waking up; if it is lost, the wake-up fails.
At present, STR debugging adds a memory check after sleep and wake-up that compares whether the memory data is consistent; if it is inconsistent, the STR function is considered faulty.
However, this existing STR debugging approach cannot pinpoint the type of a memory-related fault, which makes memory-related STR faults harder to resolve and lowers the efficiency of STR debugging.
Disclosure of Invention
In view of the above problems, embodiments of the present disclosure provide a method for detecting a memory fault that overcomes, or at least partially solves, these problems, so as to reduce the difficulty of resolving memory-related STR faults and improve the efficiency of STR debugging.
In a first aspect, the present disclosure provides a method for detecting a memory failure, where the method includes:
switching a software and hardware system between an STR state and an STR test wake-up state through UEFI, wherein in the STR test wake-up state, the UEFI is started and an operating system is kept dormant;
in the STR test wake-up state, checking the memory data held during the last STR state through a test program in the UEFI;
and, in the case that the memory data check fails, determining that the memory has a circuit fault.
In a second aspect, the present disclosure provides a device for detecting a memory failure, where the device includes:
the switching module is used for switching the software and hardware system between an STR state and an STR test wake-up state through the UEFI, wherein in the STR test wake-up state, the UEFI is started and the operating system is kept dormant;
the checking module is used for checking, in the STR test wake-up state, the memory data held during the last STR state through a test program in the UEFI;
and the fault determining module is used for determining that the memory has a circuit fault in the case that the memory data check fails.
In a third aspect, the present disclosure provides an electronic device comprising a processor;
UEFI firmware having stored thereon a computer program which, when executed by the processor, causes the processor to implement a method of detecting a memory failure as described in any of the preceding claims.
In a fourth aspect, the present disclosure provides a computer program product comprising instructions that, when executed by a processor in an electronic device, cause the electronic device to perform any of the foregoing methods of detecting a memory failure.
In a fifth aspect, the present disclosure provides a readable storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the above-described method of detecting a memory failure.
The present disclosure includes the following advantages:
The STR test program and the wake-up program can be placed in the UEFI. In the STR test wake-up state, the CPU and the UEFI are started, but the operating system is not woken up and remains dormant, so the kernel and the operating system are never entered, no peripherals are involved, and only the memory is exercised. The CPU executes the test program in the UEFI to detect whether the memory data is consistent before and after STR. If it is inconsistent, the problem can be attributed to the memory circuit and the memory self-refresh circuit, and troubleshooting can then focus on the storage circuit and the self-refresh circuit of the memory, which is simpler and faster. The scheme of the present disclosure therefore reduces the difficulty of resolving memory-related STR faults, improves the efficiency of resolving them, and in turn improves the efficiency of STR debugging.
Drawings
FIG. 1 is a flow chart illustrating steps of an embodiment of a method for detecting a memory failure of the present disclosure;
FIG. 2 is a flow chart illustrating steps of another embodiment of a method for detecting a memory failure of the present disclosure;
FIG. 3 illustrates a schematic diagram of allocation of CPU address space of the present disclosure;
FIG. 4 is a block diagram illustrating an embodiment of a memory failure detection apparatus of the present disclosure;
FIG. 5 is a flow chart of a method for detecting a memory failure according to the present disclosure;
FIG. 6 is a block diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
In order that the above-recited objects, features and advantages of the present disclosure will become more readily apparent, a more particular description of the disclosure will be rendered by reference to the appended drawings and appended detailed description.
The inventors found that the existing STR debugging approach cannot pinpoint the type of a memory-related fault, which makes memory-related STR faults harder to resolve and lowers the efficiency of STR debugging. The main reason is as follows. After the system enters the STR state, the memory is in a self-refresh state: the memory refresh circuit periodically refreshes the data in the memory so that it is not lost during STR and the system can be restored accurately to its pre-STR state. In the existing approach, the test program resides in memory and the STR test is run through the operating system to verify whether the memory data has changed, so the operating system is woken up in the STR test state. This process involves sleep and wake-up of the operating system, the kernel, and external devices, and therefore requires a large number of software and hardware operations. Data loss under STR may be caused by a defect in the memory circuit design, such as an error in the refresh circuit, a failed refresh, or an overly long refresh period, or by failure of or interference with the memory cell circuits, so that the data cannot be retained. The operating system may also introduce memory data changes before and after the STR test. Thus, if an STR test run under an operating system finds that memory data changed across STR, it cannot be determined whether the change originates from the memory circuit itself or from the operating system.
In view of the above technical problem, an embodiment of the present disclosure provides a memory detection method that can accurately identify faults of the memory circuit and the memory self-refresh circuit. It does so by shielding the detection of memory circuit faults from any influence of the operating system. In one embodiment, the STR test program and the wake-up program are placed in the UEFI, and in the STR test wake-up state the CPU and the UEFI are started, but the operating system is not woken up and remains dormant. The CPU executes the test program in the UEFI to detect whether the memory data is consistent before and after STR. If the data is inconsistent, the problem can be attributed to the memory circuit and the memory self-refresh circuit, and the storage circuit and the self-refresh circuit of the memory are then troubleshot.
FIG. 1 illustrates a flow chart of the steps of an exemplary embodiment of a method for detecting a memory failure of the present disclosure. Referring to FIG. 1, the method specifically includes the following steps:
Step 101, switching a software and hardware system between an STR state and an STR test wake-up state through UEFI, wherein in the STR test wake-up state, the UEFI is started and an operating system is kept dormant.
The software and hardware system refers to the software and hardware of the electronic device taken as a whole. UEFI stands for Unified Extensible Firmware Interface.
The STR test wake-up state means that the UEFI is started while the operating system remains dormant. The STR test does not involve the kernel, the operating system, or the peripherals; it involves only the memory, so the influence of the kernel, the peripherals, and the operating system is shielded.
In the STR state as used here, waking up starts the CPU and the UEFI and also wakes up the operating system; that is, the STR state covers sleep and wake-up of the operating system, the kernel, and the external devices. The main difference between the STR state and the STR test wake-up state is that the former requires the operating system to be woken up and the latter does not.
Step 102, in the STR test wake-up state, checking the memory data held during the last STR state through the test program in the UEFI.
The test program in the UEFI is a test program known to be correct; that is, it has been run on a large number of memories, on which it successfully started the UEFI while the operating system stayed dormant.
The memory data during the last STR state refers to the memory data of the most recent STR state that the electronic device was in before the test is performed. During that STR state, the CPU and the UEFI need to be started and the operating system needs to be woken up, which involves sleep and wake-up of the operating system, the kernel, and the external devices.
In the STR test wake-up state, the test program in the UEFI checks the memory data held during the last STR state. Because the STR test wake-up state does not enter the kernel or the operating system and does not involve the peripherals, no new data is stored in the memory under check. If the memory data of the memory under check is unchanged before and after the STR test wake-up state, the memory data check passes; if it differs, the memory data check fails.
For the memory, faults in the STR test wake-up state fall into two classes: test program faults and memory circuit faults. Since the test program in the UEFI is a known-correct test program, it can be assumed to be fault-free. Under the UEFI firmware, the test program checks the memory data held during the last STR state; the check is shielded from the influence of the kernel, the peripherals, and the operating system and verifies only whether the memory has a circuit fault. In this case, if the memory has no circuit fault, the memory data of the last STR state passes the check; if the memory has a circuit fault, it does not.
Step 103, in the case that the memory data check fails, determining that the memory has a circuit fault.
If the memory has a circuit fault, the check of the memory data held during the last STR state fails. Conversely, when that check fails, it is determined that the memory has a circuit fault, which makes it quick and convenient to detect whether the memory has a circuit fault.
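Steps 101 to 103 can be sketched as a small host-side simulation. The Python below is illustrative only: the source does not say how the comparison is implemented, so the per-block CRC32 snapshot, the 4096-byte block size, and the names `checksum_blocks`, `changed_blocks`, `detect_memory_fault`, and the `str_cycle` callback (a stand-in for sleeping into STR and waking into the STR test wake-up state) are all assumptions. Real firmware would run this logic inside the UEFI and keep the snapshot in memory outside the region under check.

```python
import zlib

BLOCK_SIZE = 4096  # assumed granularity; not specified in the source


def checksum_blocks(memory, block_size=BLOCK_SIZE):
    """Return one CRC32 per block of the memory under check."""
    return [zlib.crc32(memory[i:i + block_size])
            for i in range(0, len(memory), block_size)]


def changed_blocks(before, after, block_size=BLOCK_SIZE):
    """Return the byte offsets of blocks whose CRC32 differs."""
    return [i * block_size
            for i, (a, b) in enumerate(zip(before, after)) if a != b]


def detect_memory_fault(memory, str_cycle):
    """Simulate steps 101-103 and return (fault_found, bad_block_offsets)."""
    before = checksum_blocks(memory)   # snapshot taken before entering STR
    str_cycle(memory)                  # hypothetical: sleep, then wake with the OS dormant
    after = checksum_blocks(memory)    # step 102: check after the STR test wake-up
    bad = changed_blocks(before, after)
    return bool(bad), bad              # step 103: any change means a circuit fault
```

Passing a callback that leaves the buffer untouched models healthy memory; a callback that flips a bit models data lost during self-refresh, and the returned offsets show which blocks to inspect.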
For example, if the test program in the UEFI has successfully started the UEFI with the operating system kept dormant on a large number of computer or motherboard memories, the test program is considered correct. For a new motherboard or computer, such as a newly produced one, given a correct test program, the memory data held during the last STR state is checked by the test program in the UEFI in the STR test wake-up state. In that state the operating system stays dormant, the influence of the kernel, the peripherals, the operating system and so on is shielded, and only the presence of a memory circuit fault is verified. If the memory of the newly produced motherboard or computer has no circuit fault, the memory data check passes; if it has a circuit fault, the check fails. A circuit fault in the memory of the newly produced motherboard or computer can thus be detected simply and quickly, without support from the operating system. It should be noted that, to further improve the stability of the STR function of that memory, the memory fault test may be performed multiple times, for example 1000 times on the memory of a newly produced motherboard or computer. For the memories of motherboards or computers from the same batch, a certain number of memories can be selected at random and the memory fault test performed multiple times on each.
Optionally, the method may further include determining that the memory has no circuit fault if the memory data passes the check in step 102. With the influence of the kernel, the peripherals, and the operating system shielded, only the presence of a memory circuit fault is verified: if the memory has no circuit fault, the memory data check passes, and if it has one, the check fails. Whether the memory has a circuit fault can therefore be detected quickly and conveniently.
Optionally, a memory circuit fault essentially indicates a fault in the storage circuit of the memory and the self-refresh circuit of the memory. After step 103, the method may therefore further include troubleshooting the storage circuit and the self-refresh circuit of the memory, for example checking the power-on timing sequence of the memory, so that the memory circuit fault is resolved in a targeted manner as soon as possible.
In one embodiment, the method for detecting a memory failure is further explained herein, and the embodiment may include the following procedures.
First, the system is switched between the STR state and the STR test wake-up state through the UEFI; in the STR test wake-up state, the CPU and the UEFI are started and the operating system is kept dormant. The UEFI stores the test program and the sleep and wake-up programs, and the switching is implemented by these programs in the UEFI. In the STR test wake-up state, the test program in the UEFI checks the memory data held during the last STR state to determine whether the memory data in the STR test wake-up state is consistent with the memory data before the STR state was last entered. Because the influence of the operating system has been eliminated, a failed memory data check means that the memory has a circuit fault under STR, for example failure of some memory cells caused by a circuit design defect, or failure of the memory refresh circuit.
In an embodiment, the system sets the registers of ACPI (Advanced Configuration and Power Interface) according to the sleep and wake-up programs in the UEFI, so that the system enters the STR state or enters the STR test wake-up state from the STR state, and the memory is set to enter self-refresh through the UEFI before the system enters the STR state or the STR test wake-up state.
FIG. 2 is a flowchart illustrating the steps of another embodiment of a method for detecting a memory failure according to the present disclosure. Referring to FIG. 2, the method specifically includes the following steps:
Step 201, determining a target memory space to be verified in the memory.
In the related art, the UEFI firmware is essentially idle after the operating system has started, so at least part of the memory space it occupied is released and the operating system may store data there. An STR test performed under the operating system therefore has to include that released memory space in the memory space under test. When the test program in the UEFI checks the memory data held during the last STR state, however, the operating system is not started, the UEFI firmware does not release the memory space it occupies, and that unreleased memory space neither needs to be checked nor belongs to the target memory space. It should be noted that, compared with entering STR under the operating system, checking through the test program in the UEFI reduces the target memory space to be checked, so the check is faster. Although less memory is checked under the UEFI firmware, faults occur essentially at random across the whole memory space, and the probability that faults occur only in the unchecked memory space is low. Checking the memory data through the test program in the UEFI therefore covers a smaller target memory space while still achieving essentially the same reliability as an STR test under an operating system.
FIG. 3 shows a schematic diagram of the allocation of the CPU address space of the present disclosure. For example, referring to FIG. 3, the CPU address space from 0x0 to 0x280000000 is divided into 6 parts, numbered 1 to 6. The address space numbered 1 runs from 0x0 to 0x00100000, number 2 from 0x00100000 to 0xf000000, number 3 from 0xf000000 to 0x10000000, number 4 from 0x10000000 to 0x90000000, number 5 from 0x90000000 to 0x100000000, and number 6 from 0x100000000 to 0x280000000. The address space numbered 4 is a non-memory space used mainly as register space. Whether STR is entered under the operating system or the test program in the UEFI enters the STR test wake-up state, the UEFI firmware uses the memory spaces numbered 1 and 3 when it starts. As described above, when STR is entered under the operating system, the UEFI firmware has released the memory space numbered 5 once the operating system has booted, so in this example an STR test performed under the operating system has to cover the memory spaces numbered 2, 5, and 6. When the test program in the UEFI enters the STR test wake-up state, the UEFI firmware does not release the memory space numbered 5, and the target memory space to be checked consists of the memory spaces numbered 2 and 6. Compared with entering STR under the operating system, the target memory space therefore excludes the memory space numbered 5.
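The region selection described for FIG. 3 can be expressed as a short table-driven sketch. The address ranges and region numbers below are transcribed from the text; the Python names (`REGIONS`, `target_regions`, `total_bytes`) and the half-open `[start, end)` interpretation of each range are assumptions made for illustration.

```python
# Address map transcribed from the FIG. 3 description; ranges read as [start, end).
REGIONS = {
    1: (0x0, 0x00100000),
    2: (0x00100000, 0xf000000),
    3: (0xf000000, 0x10000000),
    4: (0x10000000, 0x90000000),    # non-memory space, mainly registers
    5: (0x90000000, 0x100000000),
    6: (0x100000000, 0x280000000),
}

NON_MEMORY = {4}
UEFI_ALWAYS_USED = {1, 3}     # used by the UEFI firmware in both flows
UEFI_RELEASED_TO_OS = {5}     # released only after the operating system boots


def target_regions(under_os):
    """Return the region numbers that have to be verified in each flow."""
    excluded = NON_MEMORY | UEFI_ALWAYS_USED
    if not under_os:
        # In the STR test wake-up flow the OS never boots, so the UEFI
        # firmware still holds region 5 and it is not checked.
        excluded = excluded | UEFI_RELEASED_TO_OS
    return sorted(n for n in REGIONS if n not in excluded)


def total_bytes(numbers):
    """Return the combined size in bytes of the given regions."""
    return sum(REGIONS[n][1] - REGIONS[n][0] for n in numbers)
```

Comparing `total_bytes(target_regions(True))` with `total_bytes(target_regions(False))` shows how much less memory the UEFI-side check has to scan, which is the source of the speed-up claimed in the text.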
Step 202, switching a software and hardware system between an STR state and an STR test wake-up state through UEFI, wherein in the STR test wake-up state, the UEFI is started and an operating system is kept dormant.
For step 202, refer to the description of step 101; it achieves the same or similar advantages and is not described again here.
Step 203, in the STR test wake-up state, checking the memory data in the target memory space held during the last STR state through the test program in the UEFI.
Step 204, determining that the memory has a circuit fault if the check of the memory data in the target memory space fails.
In the STR test wake-up state, the test program in the UEFI checks the memory data in the target memory space held during the last STR state. Because the STR test wake-up state does not enter the kernel or the operating system and does not involve the peripherals, no new data is stored in the target memory space during the STR test. If the memory data in the target memory space is unchanged before and after STR, the check of the target memory space passes; if it differs, the check fails.
Optionally, after step 201 and before step 203, the method may further include writing an initial value to the target memory space. Step 204 may then include determining that the memory has a circuit fault if the memory data in the target memory space differs from the initial value. Specifically, as described above, the check passes when the memory data in the target memory space is unchanged and fails when it has changed. Before the check, however, the original memory data in the target memory space may be unknown. Writing an initial value directly to the target memory space removes the need to acquire the original memory data, so memory fault detection can be carried out quickly. Writing an initial value takes noticeably less time than acquiring the original memory data in the target memory space, and for repeated memory fault detection on the same computer the initial value needs to be written only once; subsequent detections do not need to write it again, which saves time.
It should be noted that the initial value may be data that is easy to compare, for example all 0s or all 1s. For example, writing an initial value of all 0s to the target memory space can further improve the efficiency of the STR test.
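The initial-value variant can be sketched as a fill followed by a word-by-word comparison. This is a simulation on a Python `bytearray`, not firmware code: the 32-bit word size, the little-endian layout, and the names `fill_pattern` and `find_mismatches` are assumptions chosen for the example.

```python
import struct


def fill_pattern(memory, pattern):
    """Write a 32-bit initial value across the whole target buffer."""
    if len(memory) % 4:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    memory[:] = struct.pack("<I", pattern) * (len(memory) // 4)


def find_mismatches(memory, pattern):
    """Return (byte_offset, observed_word) for every word that differs."""
    mismatches = []
    for offset in range(0, len(memory), 4):
        (word,) = struct.unpack_from("<I", memory, offset)
        if word != pattern:
            mismatches.append((offset, word))
    return mismatches
```

An empty list from `find_mismatches` corresponds to a passed check. Because only the known initial value is compared, no copy of the original memory contents has to be stored, which matches the time saving described above.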
Optionally, step 202 may include starting the UEFI firmware, setting the memory to enter self-refresh through the UEFI, and setting an ACPI (Advanced Configuration and Power Interface) register on the bridge, so that the software and hardware system is switched between the STR state and the STR test wake-up state. In this way the software and hardware system can be switched successfully between the two states.
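As a rough illustration of what setting an ACPI register to request sleep can look like, the sketch below computes a value for a PM1 control register using the field layout given in the ACPI specification (SLP_TYPx in bits 10 to 12, SLP_EN in bit 13). The source does not state whether the bridge follows this layout or which SLP_TYP value selects STR on it, so the function name `pm1_cnt_sleep_value` and any concrete SLP_TYP value passed to it are assumptions.

```python
SLP_TYP_SHIFT = 10                    # SLP_TYPx occupies bits 10-12 (ACPI PM1 control)
SLP_TYP_MASK = 0x7 << SLP_TYP_SHIFT
SLP_EN = 1 << 13                      # writing 1 starts the sleep transition


def pm1_cnt_sleep_value(current, slp_typ):
    """Return the 16-bit value to write to request the given sleep type.

    `current` is the register value read beforehand, so the other bits
    are preserved; `slp_typ` is the platform-specific 3-bit sleep type.
    """
    if not 0 <= slp_typ <= 7:
        raise ValueError("SLP_TYP is a 3-bit field")
    cleared = current & ~SLP_TYP_MASK & 0xFFFF
    return cleared | (slp_typ << SLP_TYP_SHIFT) | SLP_EN
```

For instance, `pm1_cnt_sleep_value(0x0001, 5)` uses 5 purely as a placeholder sleep type; on real hardware the value comes from the platform's sleep-state definition, and the memory must already be in self-refresh before the write takes effect.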
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the disclosed embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the disclosed embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the disclosed embodiments.
Referring to FIG. 4, FIG. 4 shows a block diagram of an embodiment of a memory failure detection apparatus of the present disclosure, which may specifically include the following modules:
a switching module 301, configured to switch, through UEFI, a software and hardware system between an STR state and an STR test wake-up state, wherein in the STR test wake-up state the UEFI is started and an operating system remains dormant;
a checking module 302, configured to check, in the STR test wake-up state, the memory data held during the last STR state through a test program in the UEFI;
a fault determining module 303, configured to determine that a circuit fault exists in the memory if the memory data check fails.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
The application is further illustrated below in connection with specific examples. Fig. 5 is a flow chart illustrating a method for detecting a memory failure according to the present disclosure.
Before the test, a target memory space to be verified in the memory is determined. As shown in fig. 3, in the STR test wake-up state, the target memory space to be verified consists of the memory spaces numbered 2 and 6.
First, the UEFI firmware is started normally.
Second, an initial value is written to the target memory space. For example, the initial value 0x5a5a5a5a is written to the memory spaces numbered 2 and 6.
Third, the memory is set to enter self-refresh through UEFI.
Fourth, the bridge ACPI register is set through UEFI so that the software and hardware system switches between the STR state and the STR test wake-up state.
Fifth, after waiting a few minutes, the system is powered on to wake up.
Sixth, the UEFI firmware is started normally while the operating system is kept dormant.
Seventh, the memory data held in the target memory space during the last STR state is checked by a test program in UEFI to determine whether it is identical to the initial value. If the memory data in the target memory space is the same as the initial value, it is determined that the memory has no circuit fault; if it differs from the initial value, it is determined that the memory has a circuit fault.
More specifically, after the UEFI firmware is started, the memory data in the target memory spaces numbered 2 and 6 is checked against the previously written 0x5a5a5a5a. If the data is equal to that value, it is determined that the memory has no circuit fault; if not, it is determined that the memory has a circuit fault.
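The write and check steps above can be sketched as a small host-side simulation. This is only an illustration under stated assumptions: memory is modelled as a `bytearray`, each numbered memory space is assumed to be 64 bytes long (`REGION_SIZE` is hypothetical), and the helper names `fill_region` and `find_mismatches` are invented for this example. Only the pattern 0x5a5a5a5a and the space numbers 2 and 6 come from the text.

```python
PATTERN = 0x5A5A5A5A    # initial value from the example above
WORD_SIZE = 4           # bytes per 32-bit word
REGION_SIZE = 64        # hypothetical size of one numbered memory space
TARGET_REGIONS = (2, 6)  # memory spaces to be verified


def fill_region(memory: bytearray, region: int, pattern: int = PATTERN) -> None:
    """Write the pattern to every word of one numbered region."""
    word = pattern.to_bytes(WORD_SIZE, "little")
    start = region * REGION_SIZE
    for offset in range(start, start + REGION_SIZE, WORD_SIZE):
        memory[offset:offset + WORD_SIZE] = word


def find_mismatches(memory: bytearray, region: int, pattern: int = PATTERN) -> list:
    """Return byte offsets of words in the region that differ from the pattern."""
    word = pattern.to_bytes(WORD_SIZE, "little")
    start = region * REGION_SIZE
    return [
        offset
        for offset in range(start, start + REGION_SIZE, WORD_SIZE)
        if memory[offset:offset + WORD_SIZE] != word
    ]
```

In this model, an empty list from `find_mismatches` for every target region corresponds to "no circuit fault", and any non-empty list corresponds to a failed check. Returning the offsets rather than a single boolean is a design choice of the sketch that makes a failing location easier to report.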
If the memory has a circuit fault, an error message indicating the fault can be reported. Memory fault detection is then stopped, the memory circuit is inspected for the fault, and detection is run again once the inspection is finished. Because circuit faults in the memory occur randomly, they can only be fully detected after multiple rounds of fault detection. Therefore, if the memory is found to have no circuit fault, the flow can return to the fourth step to test the memory again. Repeated fault detection further improves the STR stability of the memory.
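The repeat-until-failure behaviour described above might be sketched as follows. The function `run_str_cycles`, the `max_cycles` limit and the simulated cycle helper are assumptions made for illustration; the disclosure does not specify how many rounds are run or how a round is represented in code.

```python
from typing import Callable, Optional


def run_str_cycles(one_cycle: Callable[[], bool], max_cycles: int) -> Optional[int]:
    """Repeat the STR test cycle; return the 1-based cycle that failed, or None."""
    for cycle_number in range(1, max_cycles + 1):
        if not one_cycle():
            # A failed check ends detection so the circuit can be inspected.
            return cycle_number
    return None


def make_simulated_cycle(fail_on: Optional[int]) -> Callable[[], bool]:
    """Build a fake cycle that fails its check on the given call (hypothetical)."""
    count = 0

    def one_cycle() -> bool:
        nonlocal count
        count += 1
        return count != fail_on

    return one_cycle
```

For example, `run_str_cycles(make_simulated_cycle(3), 10)` stops at the third round, which models a fault that only shows up intermittently.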
Fig. 6 is a block diagram of an electronic device provided in an embodiment of the present disclosure. Referring to fig. 6, the disclosure further provides an electronic device including a processor 501, a memory 502, and a computer program 5021 stored in the memory and capable of running on the processor; the steps of the above embodiments of the method for detecting a memory failure are implemented when the processor executes the program.
In one embodiment, the memory is the UEFI firmware, the STR test program and the wake-up program are set in the UEFI, and the processor executes the processes of the method embodiments according to the programs in the UEFI.
The present disclosure also provides a readable storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the steps of the embodiments of the method for detecting a memory failure described above.
The present disclosure also provides a computer program product comprising instructions that, when executed by a processor in an electronic device, enable the electronic device to perform the steps of the above-described embodiments of a method of detecting a memory failure.
In this specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present disclosure may be provided as a method, apparatus, or computer program product. Accordingly, the disclosed embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present disclosure may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present disclosure are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the disclosed embodiments have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the disclosed embodiments.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The foregoing describes in detail a method and apparatus for detecting a memory failure, an electronic device and a storage medium. Specific examples are used herein to illustrate the principles and embodiments of the present disclosure, and the above examples are provided only to assist in understanding the method and core ideas of the present disclosure. Meanwhile, those skilled in the art may, according to the ideas of the present disclosure, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the disclosure.

Claims (10)

1. The method for detecting the memory fault is characterized by comprising the following steps:
switching a software and hardware system between an STR state and an STR test awakening state through UEFI, wherein in the STR test awakening state, the UEFI is started and an operating system is kept dormant;
In the STR test awakening state, checking the memory data during the last STR state through a test program in UEFI;
under the condition that the memory data check is not passed, determining that the memory has circuit faults;
before checking the memory data during the last STR state by the test program in UEFI, the method further includes determining a target memory space to be checked in the memory, the target memory space excluding memory space that UEFI has not released because the operating system has not been started;
wherein, the memory is determined to have a circuit fault by:
firstly, normally starting the UEFI firmware;
secondly, writing an initial value in the target memory space;
thirdly, setting the memory to enter self-refresh through UEFI;
fourthly, setting a bridge ACPI register through UEFI, so that the software and hardware system is switched between an STR state and an STR test awakening state;
fifthly, waiting for a preset time length and then powering on to wake up;
sixthly, normally starting the UEFI firmware, and keeping the operating system dormant;
seventhly, checking the memory data in the target memory space during the last STR state through a test program in UEFI, determining whether the memory data in the target memory space is identical to the initial value, determining that the memory has no circuit fault if the memory data in the target memory space is identical to the initial value, and determining that the memory has a circuit fault if the memory data in the target memory space is different from the initial value.
2. The method according to claim 1, wherein the method further comprises:
In the STR test awake state, the checking the memory data during the last STR state by the test program in the UEFI includes:
in the STR test awakening state, checking the memory data in the target memory space during the last STR state through a test program in UEFI;
And under the condition that the memory data check is not passed, determining that the memory has a circuit fault comprises the following steps:
and under the condition that the memory data in the target memory space does not pass the check, determining that the memory has a circuit fault.
3. The method according to claim 2, wherein the method further comprises:
Writing an initial value in the target memory space;
and under the condition that the memory data in the target memory space does not pass the check, determining that the memory has a circuit fault comprises the following steps:
And determining that the memory has a circuit fault under the condition that the memory data in the target memory space is different from the initial value.
4. A method according to any one of claims 1 to 3, wherein the method further comprises:
Before entering STR state or STR test wake-up state, setting memory to enter self-refresh through UEFI;
switching the software and hardware system between an STR state and an STR test wake-up state through UEFI, comprising:
And setting an ACPI register through UEFI to enable the software and hardware system to be switched between an STR state and an STR test awakening state.
5. A method according to any one of claims 1 to 3, wherein the method further comprises:
and under the condition that the memory data passes the verification, determining that the memory has no circuit fault.
6. A method according to any one of claims 1 to 3, wherein the method further comprises:
and under the condition that the circuit faults exist in the memory, performing fault checking on the storage circuit and the self-refreshing circuit of the memory.
7. A memory failure detection apparatus, the apparatus comprising:
the switching module is used for switching the software and hardware system between an STR state and an STR test awakening state through the UEFI, wherein in the STR test awakening state, the UEFI is started and the operating system is kept dormant;
the checking module is used for checking the memory data during the last STR state through a testing program in the UEFI under the STR test awakening state;
the fault determining module is used for determining that the memory has a circuit fault under the condition that the memory data check is not passed;
the device is further configured to determine a target memory space to be verified in the memory, where the target memory space does not include memory space that UEFI has not released because the operating system has not been started;
wherein, the memory is determined to have a circuit fault by:
firstly, normally starting the UEFI firmware;
secondly, writing an initial value in the target memory space;
thirdly, setting the memory to enter self-refresh through UEFI;
fourthly, setting a bridge ACPI register through UEFI, so that the software and hardware system is switched between an STR state and an STR test awakening state;
fifthly, waiting for a preset time length and then powering on to wake up;
sixthly, normally starting the UEFI firmware, and keeping the operating system dormant;
seventhly, checking the memory data in the target memory space during the last STR state through a test program in UEFI, determining whether the memory data in the target memory space is identical to the initial value, determining that the memory has no circuit fault if the memory data in the target memory space is identical to the initial value, and determining that the memory has a circuit fault if the memory data in the target memory space is different from the initial value.
8. An electronic device, comprising:
a processor; and
UEFI firmware having stored thereon a computer program which, when executed by the processor, causes the processor to implement the method of detecting a memory failure as claimed in any one of claims 1 to 6.
9. A computer program product comprising instructions that, when executed by a processor in an electronic device, cause the electronic device to perform the method of detecting a memory failure of any of claims 1-6.
10. A readable storage medium, characterized in that instructions in said storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of detecting a memory failure according to any one of claims 1-6.
CN202410323930.4A 2024-03-20 2024-03-20 Memory failure detection method, device, electronic device and storage medium Active CN118093287B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410323930.4A CN118093287B (en) 2024-03-20 2024-03-20 Memory failure detection method, device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN118093287A CN118093287A (en) 2024-05-28
CN118093287B true CN118093287B (en) 2025-10-10

Family

ID=91142191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410323930.4A Active CN118093287B (en) 2024-03-20 2024-03-20 Memory failure detection method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN118093287B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120821605B (en) * 2025-09-17 2025-12-02 苏州元脑智能科技有限公司 Fault detection systems, methods, electronic devices, storage media, and program products

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116662092A (en) * 2023-06-21 2023-08-29 飞腾信息技术有限公司 Memory data testing method and device, storage medium and electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6389556B1 (en) * 1999-01-21 2002-05-14 Advanced Micro Devices, Inc. Mechanism to prevent data loss in case of a power failure while a PC is in suspend to RAM state
US9830457B2 (en) * 2015-05-05 2017-11-28 Dell Products, L.P. Unified extensible firmware interface (UEFI) credential-based access of hardware resources
CN113703799B (en) * 2020-05-21 2024-06-04 华为技术有限公司 Computing device and BIOS updating method and medium thereof
CN113934561A (en) * 2020-06-29 2022-01-14 龙芯中科技术股份有限公司 Fault positioning method, device, system, hardware platform and storage medium
CN115658407A (en) * 2022-11-02 2023-01-31 深圳市金泰克半导体有限公司 Memory bank testing method, system, device, equipment and storage medium

Also Published As

Publication number Publication date
CN118093287A (en) 2024-05-28

Similar Documents

Publication Publication Date Title
US7900090B2 (en) Systems and methods for memory retention across resets
CN109885343B (en) Controller low-power-consumption starting method and device, computer equipment and storage medium
KR102288558B1 (en) Memory built-in self-test for a data processing apparatus
TWI450078B (en) Debug register for terminating the processor core after reset or shutdown
US8862953B2 (en) Memory testing with selective use of an error correction code decoder
WO2016062084A1 (en) Power-off processing method and apparatus, and electronic device
US20040181656A1 (en) System and method for testing memory during boot operation idle periods
CN118093287B (en) Memory failure detection method, device, electronic device and storage medium
CN101236515A (en) Recovery method for single-core exception in multi-core system
TWI665606B (en) A system and a method for testing a data storage device
TWI534707B (en) Computer system, shutdown and boot method thereof
US20250225023A1 (en) Server maintainability configuration method and apparatus, electronic device and storage medium
CN111338698A (en) A kind of method and system for BIOS to accurately guide server to start
CN116662092A (en) Memory data testing method and device, storage medium and electronic equipment
JP2006065440A (en) Process management system
US10725689B2 (en) Physical memory region backup of a volatile memory to a non-volatile memory
CN112463508A (en) Server dormancy state testing method, system, terminal and storage medium
CN113900843A (en) Detection and repair method, device, equipment and readable storage medium
WO2023206926A1 (en) User configuration data recovery method and device, and medium
CN115437859A (en) Dormancy awakening processing method and device, electronic equipment and storage medium
CN120743630B (en) Processor fault recovery method
JPH07211066A (en) Storage system with backup function
CN118093248A (en) Fault determination method and device for STR debugging, electronic equipment and storage medium
CN119002821B (en) Data storage method, device, equipment and medium
CN101996129B (en) Method for detecting computer system crash

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant