PSU power failure reason detection method and device
Technical Field
The invention relates to the technical field of servers, in particular to a PSU power failure reason detection method and device.
Background
As electronic information technology develops, customers and operation and maintenance personnel place increasingly strict demands on equipment maintenance and fault location, and expect fault detection to be both accurate and convenient. Designs and strategies therefore need to be upgraded so that products meet the needs of the market and of customers.
The existing method for detecting the cause of a power failure in a power supply unit (PSU) is generally to read the black box log inside the PSU manually through a command line after the PSU is powered on again, and to judge the cause from that log. In practice, however, when a power failure occurs, customers generally do not read the black box log but simply swap in a spare PSU. The cause of the power failure therefore remains unknown, and the same problem may recur in later use. In addition, customers may not have time to replace the PSU, which can also lead to equipment damage and increased costs.
Disclosure of Invention
To solve the above technical problems, the invention provides a PSU power failure cause detection method and device that can report the cause of a PSU power failure in time, ensure that power failure problems are located promptly, and reduce operation and maintenance costs.
To achieve this purpose, the invention adopts the following technical scheme:
a PSU power failure cause detection method comprises the following steps:
the BMC is powered up and initialized;
the BMC acquires and records the cause of the last PSU power failure;
the interrupt is enabled in the system power-down interrupt mask state register;
the BMC polls the PSU working state signals, and when all the PSU working state signals are low, the CPLD sends a system power-down interrupt to the BMC;
the BMC reads the internal record of the black box log register of the PSU and records the power failure cause of the PSU;
the system is powered off.
Further, the step in which the BMC acquires and records the cause of the last PSU power failure includes:
the BMC judges whether all PSUs have the same power failure cause, and if they do, the system records that cause as the system power failure cause;
if the PSU power failure causes differ, the system records every cause that is not marked invalid, and all of them together serve as the system power failure causes.
Further, a restart reason query function is provided through a Web page for querying the system power failure cause.
Further, the step in which the BMC polls the PSU working state signals and, when all the PSU working state signals are low, the CPLD sends a system power-down interrupt to the BMC includes:
the BMC polls the PWR_OK state of each PSU and records the PWR_OK state as a power failure detection object flag bit;
the CPLD aggregates the PWR_OK states of all PSUs, and when the working state signals of all PSUs are low, the CPLD sends a system power-down interrupt to the BMC, the system power-down interrupt being defined as the highest-priority interrupt inside the BMC.
Further, the power failure detection object flag bit records whether the corresponding PSU caused the system power failure.
Further, the step in which the BMC reads the internal record of the black box log register of the PSU includes:
according to the power failure detection object flag bit, selecting and reading the internal record of the black box log register of the PSU that caused the system power failure.
Further, after the CPLD sends the system power-down interrupt to the BMC, the system power-down interrupt is masked.
The invention also provides a PSU power failure cause detection device, which comprises a BMC, a CPLD and a PSU, wherein:
the CPLD sends a system power-down interrupt signal to the BMC through CPLD_BMC_INIT, and the BMC writes to the system power-down interrupt mask state register in the CPLD over I2C to enable the interrupt;
the BMC reads the internal record of the black box log register of the PSU over PMBus;
the PSU sends a working state signal to the BMC and the CPLD through PSU_PWR_OK.
Further, the PSU also sends a presence signal to the BMC and the CPLD through PSU_PRESENT_N.
The invention has the beneficial effects that:
The invention provides a PSU power failure cause detection method and device in which a BMC (baseboard management controller) and a CPLD (complex programmable logic device) cooperate to detect the cause of a system power failure resulting from a PSU power failure. This effectively improves the accuracy and timeliness of reporting the PSU power failure cause and locating the problem, and ensures that customers and operation and maintenance personnel can see the cause promptly and make a correct judgment in time. It thereby helps prevent unnecessary losses, such as a board burning out after a PSU is replaced on a system that was itself damaged by the power failure. It also reduces the personnel cost of operation and maintenance.
Drawings
FIG. 1 is a flowchart of a PSU power failure cause detection method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a PSU power failure cause detection method according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of the connection structure of the PSU power failure cause detection device of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
As shown in FIG. 1, a first embodiment of the present invention provides a PSU power failure cause detection method, including:
the BMC is powered up and initialized;
the BMC acquires and records the cause of the last PSU power failure;
the interrupt is enabled in the system power-down interrupt mask state register;
the BMC polls the PSU working state signals, and when all the PSU working state signals are low, the CPLD sends a system power-down interrupt to the BMC;
the BMC reads the internal record of the black box log register of the PSU and records the power failure cause of the PSU;
the system is powered off.
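The ordering of the six steps above can be sketched as a small simulation. This is only an illustrative sketch, not the invention's firmware: the function name `run_detection`, the step labels, the sample format and the `read_black_box` callable are all hypothetical, and the CPLD's "all signals low" check is folded into the polling loop for brevity.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_detection(
    last_causes: Dict[str, int],
    pwr_ok_samples: Iterable[Dict[str, bool]],
    read_black_box: Callable[[str], int],
) -> Tuple[List[str], Dict[str, Dict[str, int]]]:
    """Walk through the six steps in order and return (steps, recorded causes).

    pwr_ok_samples is a hypothetical sequence of polls, each mapping a PSU
    name to its working state signal (True = high, False = low).
    """
    steps = ["bmc_power_up_and_init"]

    # Step 2: record the cause of the last PSU power failure.
    recorded = {"last": dict(last_causes)}
    steps.append("record_last_cause")

    # Step 3: enable the interrupt in the mask state register.
    steps.append("enable_power_down_interrupt")

    # Step 4: poll until every working state signal is low.
    for sample in pwr_ok_samples:
        steps.append("poll_pwr_ok")
        if sample and not any(sample.values()):
            steps.append("power_down_interrupt")
            # Step 5: read each PSU's black box record and record the cause.
            recorded["current"] = {psu: read_black_box(psu) for psu in sample}
            steps.append("record_current_cause")
            # Step 6: the system powers off.
            steps.append("power_off")
            break
    return steps, recorded
```

In this sketch the interrupt fires only on the first poll in which every signal is low; a poll in which only one PSU has dropped does not end the loop.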
Building on the PSU power failure cause detection method of the first embodiment, the second embodiment takes a system with two PSUs as an example and describes each step specifically. The flow of the method is shown in FIG. 2 and is explained as follows:
The BMC is powered up and initialized; at this point the system power-down interrupt is masked by default in the CPLD.
The BMC acquires the cause of the last PSU power failure and records it to the system.
Specifically, the BMC determines whether the two PSUs have the same power failure cause.
If the two PSUs have the same power failure cause, that cause is recorded to the system as the system power failure cause, and the Web page provides a restart reason query function;
if the two PSUs have different power failure causes, the cause whose read value is not 15 is recorded to the system as the system power failure cause (15 is a decimal number, corresponds to hexadecimal F, and represents invalid abnormal information); if neither cause is 15, both causes are recorded to the system as system power failure causes, and the Web page provides a restart reason query function.
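The comparison rule above can be expressed in a few lines. The following is a minimal sketch under stated assumptions: the function name and the example cause codes (3, 7) are hypothetical, and only the invalid value 15 (0xF) comes from the text.

```python
from typing import List

INVALID_CAUSE = 0xF  # decimal 15: invalid abnormal information, per the text


def system_power_failure_causes(psu1_cause: int, psu2_cause: int) -> List[int]:
    """Return the cause(s) to record as the system power failure cause."""
    if psu1_cause == psu2_cause:
        # Same cause on both PSUs: record it once. The text does not single
        # out the case where both are 15, so it is recorded as-is here.
        return [psu1_cause]
    # Different causes: keep every cause that is not the invalid marker.
    return [c for c in (psu1_cause, psu2_cause) if c != INVALID_CAUSE]
```

For example, causes (3, 15) yield a single system cause 3, while causes (3, 7) yield both.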
The BMC writes to the system power-down interrupt mask state register in the CPLD over I2C to enable the interrupt, and polls the PWR_OK working state signals of the two PSUs at a polling interval of 1 s. The PWR_OK state obtained is recorded as the power failure detection object flag bit, which records whether the corresponding PSU caused the system power failure.
When the system powers down, the power failure detection object flag bit determines which PSU's power failure cause the BMC acquires, for example:
the flag bit "0x11" indicates that the power failure causes of both PSUs need to be acquired;
the flag bit "0x10" indicates that only the power failure cause of PSU1 is acquired;
the flag bit "0x01" indicates that only the power failure cause of PSU2 is acquired.
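The three example values suggest that the high nibble of the flag refers to PSU1 and the low nibble to PSU2. A minimal sketch of encoding and decoding under that assumption follows; the constant and function names are hypothetical, and the meaning of any value other than the three listed is not defined by the text.

```python
from typing import List

PSU1_MASK = 0x10  # assumption: high nibble set means "acquire PSU1's cause"
PSU2_MASK = 0x01  # assumption: low nibble set means "acquire PSU2's cause"


def encode_flag(acquire_psu1: bool, acquire_psu2: bool) -> int:
    """Build the power failure detection object flag from two booleans."""
    flag = 0
    if acquire_psu1:
        flag |= PSU1_MASK
    if acquire_psu2:
        flag |= PSU2_MASK
    return flag


def decode_flag(flag: int) -> List[str]:
    """List the PSUs whose power failure cause should be acquired."""
    targets = []
    if flag & PSU1_MASK:
        targets.append("PSU1")
    if flag & PSU2_MASK:
        targets.append("PSU2")
    return targets
```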
The CPLD aggregates the PWR_OK states of the two PSUs. When both PWR_OK states are low, it sends a system power-down interrupt to the BMC; this interrupt is defined as the highest-priority interrupt inside the BMC.
The system power-down interrupt is then masked to prevent the interrupt from being triggered a second time.
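The interplay of the default mask, the enable write, the all-low condition and the re-masking can be modelled in software as below. This is a behavioural sketch only: real CPLD logic would be written in an HDL, the class and method names are hypothetical, and the text does not say which component performs the re-masking, so having the model mask itself is an assumption.

```python
from typing import Sequence


class CpldModel:
    """Behavioural model of the CPLD's power-down interrupt logic."""

    def __init__(self) -> None:
        # The interrupt is masked by default after BMC power-up.
        self.interrupt_enabled = False
        self.interrupts_sent = 0

    def enable_interrupt(self) -> None:
        """Models the BMC's I2C write to the interrupt mask state register."""
        self.interrupt_enabled = True

    def update(self, pwr_ok_states: Sequence[bool]) -> bool:
        """Feed the current PWR_OK states; return True if an interrupt fires."""
        all_low = len(pwr_ok_states) > 0 and not any(pwr_ok_states)
        if all_low and self.interrupt_enabled:
            # Assumption: mask immediately so the interrupt cannot re-trigger.
            self.interrupt_enabled = False
            self.interrupts_sent += 1
            return True
        return False
```

Feeding the model two consecutive all-low samples produces exactly one interrupt, which is the behaviour the masking step is meant to guarantee.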
According to the power failure detection object flag bit, the internal record of the black box log register of the PSU that caused the system power failure is selected and read.
When the flag bit is "0x11", the BMC reads the latest record inside the black box log register MFR_SPECIFIC_20 (MFR_PAGE, E4h) of both PSUs over PMBus;
when the flag bit is "0x10", the BMC reads the latest record inside the black box log register MFR_SPECIFIC_20 (MFR_PAGE, E4h) of PSU1 over PMBus;
when the flag bit is "0x01", the BMC reads the latest record inside the black box log register MFR_SPECIFIC_20 (MFR_PAGE, E4h) of PSU2 over PMBus.
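The selection logic above can be sketched with the bus access injected as a callable, so that no particular PMBus library is assumed. The function name and the `read_register(psu, command)` signature are hypothetical; E4h is the command code given in the text for MFR_SPECIFIC_20. Any MFR_PAGE selection and the layout of the black box record are vendor-specific and are left to the injected callable.

```python
from typing import Callable, Dict

MFR_SPECIFIC_20 = 0xE4  # black box log register command code, per the text


def read_black_box_records(
    flag: int, read_register: Callable[[str, int], int]
) -> Dict[str, int]:
    """Read the latest black box record of each PSU selected by the flag.

    read_register is a hypothetical callable that performs the PMBus read
    for a named PSU and returns that PSU's latest record.
    """
    targets = []
    if flag & 0x10:  # high nibble selects PSU1, as in the "0x10" example
        targets.append("PSU1")
    if flag & 0x01:  # low nibble selects PSU2, as in the "0x01" example
        targets.append("PSU2")
    return {psu: read_register(psu, MFR_SPECIFIC_20) for psu in targets}
```

Passing a fake `read_register` backed by a dictionary is enough to exercise the selection without hardware.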
After the power failure cause is recorded, the system is powered off.
The invention also provides a PSU power failure cause detection device for implementing the above PSU power failure cause detection method. Taking the detection of two PSUs as an example, as shown in FIG. 3, the device comprises a BMC, a CPLD, a PSU1 and a PSU2;
the CPLD sends a system power-down interrupt signal to the BMC through CPLD_BMC_INIT, and the BMC writes to the system power-down interrupt mask state register in the CPLD over I2C to enable the interrupt;
the BMC reads the internal records of the black box log registers of PSU1 and PSU2 over PMBus;
PSU1 and PSU2 send working state signals to the BMC and the CPLD through PSU1_PWR_OK and PSU2_PWR_OK, respectively.
PSU1 and PSU2 also send presence signals to the BMC and the CPLD through PSU1_PRESENT_N and PSU2_PRESENT_N, respectively.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the scope of the present invention is not limited thereto. Various modifications and alterations will occur to those skilled in the art based on the foregoing description, and it is neither necessary nor possible to enumerate all embodiments here. Any modification or change that a person skilled in the art can make on the basis of the technical scheme of the invention without creative effort remains within the protection scope of the invention.