
CN111488050B - Power supply monitoring method, system and server - Google Patents

Publication number
CN111488050B
Authority
CN
China
Prior art keywords
power supply
monitoring device
power
fault
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010300845.8A
Other languages
Chinese (zh)
Other versions
CN111488050A (en)
Inventor
滕学军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010300845.8A
Publication of CN111488050A
Application granted
Publication of CN111488050B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/28 Supervision thereof, e.g. detecting power-supply failure by out of limits supervision
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0745 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in an input/output transactions management context
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a power supply monitoring method that detects whether the communication link between a power supply and a monitoring device used to monitor the working condition of the power supply is interrupted. If the link is not interrupted, the power failure alarm of the monitoring device is determined to be valid; if the link is interrupted, the power failure alarm of the monitoring device is determined to be invalid, and the communication ports of the power supply and the monitoring device, together with the communication bus between them, are reset to repair the communication link. Thus, when communication between the monitoring device and the power supply is interrupted, the present application determines that the monitoring device has raised a false alarm and repairs the communication link between the two, thereby avoiding false alarms caused by the interruption. The invention also discloses a power supply monitoring system and a server, which have the same beneficial effects as the above power supply monitoring method.

Figure 202010300845

Description

Power supply monitoring method, system and server
Technical Field
The invention relates to the field of power supply monitoring, in particular to a power supply monitoring method, a power supply monitoring system and a server.
Background
Servers in a data center play an important role in computing and storing data. Once a server's power supply fails, the server goes down, which can easily cause data loss. At present, to avoid data loss caused by a power failure, a server is configured with a power system composed of two identical power supplies, together with a monitoring device that monitors the working condition of both. When the monitoring device detects that one power supply has failed, the other takes over so that the server keeps running normally. However, during communication between the monitoring device and a power supply, electromagnetic interference may cause a handshake failure and interrupt communication. The monitoring device then cannot monitor the power supply and raises a power failure alarm, which is a false alarm.
Therefore, how to provide a solution to the above technical problem is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a power supply monitoring method, a system and a server, which can determine false alarm of a monitoring device and repair a communication link between the monitoring device and a power supply when the communication between the monitoring device and the power supply is interrupted, thereby avoiding the false alarm problem caused by the communication interruption between the monitoring device and the power supply.
In order to solve the technical problem, the invention provides a power supply monitoring method, which comprises the following steps:
detecting whether a communication link between a power supply and a monitoring device which is directly connected with the power supply and is used for monitoring the working condition of the power supply is interrupted;
if not, determining that the power failure alarm of the monitoring device is valid;
and if so, determining that the power failure alarm of the monitoring device is invalid, and resetting the communication ports of the power supply and the monitoring device and the communication bus between them to repair the communication link.
Preferably, the power supply monitoring method further includes:
pre-establishing address information corresponding relation between the address of a register used for storing power failure information and the stored power failure information;
after analyzing the operation parameter information of the power supply to obtain the actual fault information of the power supply, determining a target address corresponding to the actual fault information according to the address information corresponding relation, and writing a preset fault value into the target register corresponding to the target address for the monitoring device to read.
Preferably, the operation parameter information includes input/output parameter information of the power supply and operation parameter information of key components inside the power supply;
and the power supply monitoring method further comprises:
when the actual fault information of the power supply is analyzed, recording the fault analysis condition of the power supply;
and periodically acquiring the current operation parameter information of the power supply, and predicting the future fault condition of the power supply by combining the fault analysis condition of the historical record.
Preferably, the power supply monitoring method further includes:
pre-establishing an index relation corresponding table for searching the power failure type and the failure processing mode according to the power failure information;
and after analyzing the operation parameter information of the power supply to obtain the actual fault information of the power supply, finding the power supply fault type and the fault processing mode corresponding to the actual fault information according to the index relation corresponding table.
Preferably, the power supply monitoring method further includes:
and when the searched fault processing mode is a firmware upgrading mode, triggering a chip for upgrading the firmware in the power supply to carry out online firmware upgrading.
Preferably, the chip comprises a first chip core and a second chip core;
correspondingly, the process of triggering a chip for firmware upgrade in the power supply to perform online firmware upgrade includes:
detecting whether a first chip core pre-designated to execute firmware upgrading operation fails;
if not, triggering the first chip core to execute firmware upgrading operation;
and if so, triggering the second chip core to execute firmware upgrading operation.
In order to solve the above technical problem, the present invention further provides a power supply monitoring system, including:
a first communication fault-tolerance module, arranged in the power supply and configured to reset the communication port of the power supply when the communication link between the power supply and a monitoring device, which is directly connected to the power supply and monitors its working condition, is interrupted;
a second communication fault-tolerance module, arranged in the monitoring device and configured to detect whether the communication link between the power supply and the monitoring device is interrupted; if not, to determine that the power failure alarm of the monitoring device is valid; if so, to determine that the power failure alarm of the monitoring device is invalid and to reset the communication port of the monitoring device and the communication bus between the monitoring device and the power supply, so as to repair the communication link.
Preferably, the power supply monitoring system further comprises:
a register, arranged in the power supply and configured to store power supply fault information;
a fault processing module, arranged in the power supply and configured to pre-establish an address information corresponding relation between the address of the register and the power supply fault information it stores; and, after analyzing the operation parameter information of the power supply to obtain the actual fault information of the power supply, to determine a target address corresponding to the actual fault information according to the address information corresponding relation and write a preset fault value into the target register corresponding to the target address for the monitoring device to read.
In order to solve the technical problem, the invention also provides a server, which comprises a power supply and a monitoring device which is directly connected with the power supply and is used for monitoring the working condition of the power supply; wherein, the power supply is monitored by adopting any one of the power supply monitoring methods.
Preferably, the monitoring device is specifically a BMC in the server.
The invention provides a power supply monitoring method that detects whether the communication link between a power supply and a monitoring device for monitoring the working condition of the power supply is interrupted. If the link is not interrupted, the power failure alarm of the monitoring device is determined to be valid; if the link is interrupted, the power failure alarm of the monitoring device is determined to be invalid, and the communication ports of the power supply and the monitoring device and the communication bus between them are reset to repair the communication link. Therefore, when communication between the monitoring device and the power supply is interrupted, the alarm of the monitoring device is identified as a false alarm and the communication link is repaired, which avoids false alarms caused by the interruption.
The invention also provides a power supply monitoring system and a server, and the power supply monitoring system and the server have the same beneficial effects as the power supply monitoring method.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the prior art and the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a power supply monitoring method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of power supply monitoring under an Intel chip topology according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of improved power supply monitoring according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of power supply fault monitoring according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a power supply monitoring method, system and server that, when communication between the monitoring device and the power supply is interrupted, identify the alarm of the monitoring device as a false alarm and repair the communication link between the two, thereby avoiding false alarms caused by the interruption.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating a power monitoring method according to an embodiment of the present invention.
The power supply monitoring method comprises the following steps:
Step S1: detecting whether a communication link between a power supply and a monitoring device, which is directly connected to the power supply and monitors its working condition, is interrupted; if not, go to step S2; if so, go to step S3.
Step S2: determining that the power failure alarm of the monitoring device is valid.
Step S3: determining that the power failure alarm of the monitoring device is invalid, and resetting the communication ports of the power supply and the monitoring device and the communication bus between them to repair the communication link.
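Steps S1 to S3 can be sketched as a small decision function. This is only an illustrative sketch: the function name, the `AlarmDecision` type and the four callbacks are hypothetical, and it collapses into one place the resets that the description later splits between a module in the power supply and a module in the monitoring device.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AlarmDecision:
    alarm_valid: bool
    actions: List[str] = field(default_factory=list)


def handle_power_alarm(
    link_interrupted: Callable[[], bool],
    reset_psu_port: Callable[[], None],
    reset_monitor_port: Callable[[], None],
    reset_bus: Callable[[], None],
) -> AlarmDecision:
    """Apply steps S1-S3 to a pending power failure alarm."""
    # Step S1: check the link between the power supply and the monitoring device.
    if not link_interrupted():
        # Step S2: the link is up, so the alarm is treated as valid.
        return AlarmDecision(alarm_valid=True)
    # Step S3: the link is down, so the alarm is treated as false and the
    # two communication ports and the bus between them are reset.
    reset_psu_port()
    reset_monitor_port()
    reset_bus()
    return AlarmDecision(
        alarm_valid=False,
        actions=["psu_port", "monitor_port", "bus"],
    )
```

Passing the link check and the resets in as callbacks keeps the decision logic independent of any particular bus driver.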
Specifically, referring to Fig. 2, Fig. 2 is a schematic diagram of power supply monitoring under an Intel chip topology according to an embodiment of the present invention. In this topology, the Intel chipset ME (Management Engine) first reads the power supply information over an I2C bus, and the monitoring device (for example, a BMC (Baseboard Management Controller)) then reads that information from the ME over a second I2C bus, so that the monitoring device can monitor the power supply in real time; the ME acts as a bridge in this process. When the server is in the S5 state (a motherboard state in which AC power is applied but the system is not powered on), the ME does not work normally; after the server enters the S0 state (a motherboard state in which the system is powered on), the ME starts to work normally. When the server goes from S5 to S0, the motherboard power-on signal is sent to the monitoring device and the PCH (Platform Controller Hub, the integrated south bridge) at the same time: the monitoring device starts monitoring the power supply information on receiving the signal, and the PCH controls the server boot on receiving it. At this point the ME is not yet working normally, so the monitoring device cannot communicate with it when it scans the power supply information. The monitoring device detects the failed communication, records a power supply fault and raises an alarm. This alarm is false rather than a real fault, and it creates a large amount of work for operation and maintenance personnel.
To solve this problem, a direct-connection topology is adopted between the power supply and the monitoring device that monitors its working condition, as shown in Fig. 3. The monitoring device communicates directly with the power supply in every state, with no intermediate link, which eliminates the false alarms caused by the intermediate link.
In addition, during communication between the monitoring device and the power supply, electromagnetic interference may cause a handshake failure and interrupt communication. The monitoring device then cannot monitor the power supply and raises a power failure alarm, which is a false alarm. To address this, the present application adopts the following technical means:
The communication link between the power supply and the monitoring device is checked for interruption. If the link is not interrupted, the power failure alarm of the monitoring device was not caused by a loss of communication with the power supply, so the alarm is determined to be valid. If the link is interrupted, the alarm was caused by the loss of communication, so the alarm is determined to be invalid, that is, the monitoring device has raised a false alarm because its communication with the power supply was interrupted, and the communication between the two is repaired.
The communication repair operation between the monitoring device and the power supply is as follows. A first communication fault-tolerance module is arranged in the power supply. It detects whether communication between the power supply and the monitoring device is interrupted; if so, it resets the communication port of the power supply to restore that port, and if not, it performs no reset. Similarly, a second communication fault-tolerance module is arranged in the monitoring device. It detects whether communication between the monitoring device and the power supply is interrupted; if so, it resets the communication port of the monitoring device to restore that port and, at the same time, resets the communication bus between the monitoring device and the power supply to repair the communication link. If communication is not interrupted, it resets neither the communication port of the monitoring device nor the communication bus.
More specifically, the first communication fault-tolerance module detects a communication interruption as follows: it records the time at which the monitoring device polls the power supply, and if the monitoring device has not accessed the power supply for 15 polling periods (the monitoring device generally polls the power supply once per second, so 15 polling periods are 15 seconds), it determines that communication between the power supply and the monitoring device is interrupted and resets the communication port of the power supply. This ensures that a fault in the power supply's communication port is recovered promptly. The second communication fault-tolerance module detects a communication interruption as follows: when periodic detection finds that communication with the power supply gets no response, it determines that communication between the monitoring device and the power supply is interrupted and resets the communication port of the monitoring device. This ensures that a fault in the monitoring device's communication port is recovered promptly. In addition, the second communication fault-tolerance module also resets the communication port of the monitoring device when it detects a PEC (Packet Error Checking) transmission error during communication.
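The two detection mechanisms in this paragraph can be sketched as follows. The sketch assumes that the PEC mentioned here is the SMBus/PMBus packet error check, a CRC-8 with polynomial x^8 + x^2 + x + 1; the source does not say which checksum is used. The class and function names are hypothetical, and timestamps are passed in explicitly so the logic stays independent of any clock source.

```python
POLL_PERIOD_S = 1.0    # the text says the monitor typically polls once per second
MAX_MISSED_POLLS = 15  # 15 polling periods without access means "interrupted"


class PollWatchdog:
    """Power-supply-side check: has the monitoring device polled recently?"""

    def __init__(self, now: float) -> None:
        self.last_poll = now

    def record_poll(self, now: float) -> None:
        # Called whenever the monitoring device accesses the power supply.
        self.last_poll = now

    def link_interrupted(self, now: float) -> bool:
        # True once 15 polling periods have passed with no access.
        return (now - self.last_poll) >= MAX_MISSED_POLLS * POLL_PERIOD_S


def smbus_pec(data: bytes) -> int:
    """CRC-8, polynomial 0x07, initial value 0 (assumed SMBus-style PEC)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def pec_ok(frame: bytes) -> bool:
    """True if the last byte of the frame equals the PEC of the bytes before it."""
    return len(frame) >= 2 and smbus_pec(frame[:-1]) == frame[-1]
```

In this sketch, a `link_interrupted` result of `True` on the power supply side, or a `pec_ok` result of `False` on the monitoring device side, would be the trigger for the corresponding port reset.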
The second communication fault-tolerance module resets the communication bus between the monitoring device and the power supply as follows: the monitoring device re-sends to the power supply the signal (9 clocks) used to establish communication with it.
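Sending nine clock pulses is a common way to free an I2C bus when a device is stuck holding the data line low in the middle of a byte. The sketch below assumes that this is what the "9 clocks" signal refers to; the source does not describe the mechanism further. `SimulatedI2CLines` is a hypothetical stand-in for real pin access on the monitoring device.

```python
class SimulatedI2CLines:
    """Stand-in for SCL/SDA access; real firmware would drive pins instead."""

    def __init__(self, stuck_bits: int) -> None:
        # Number of clock pulses the hung device still needs before it
        # releases the data line.
        self.stuck_bits = stuck_bits
        self.clock_pulses = 0

    def sda_is_high(self) -> bool:
        return self.stuck_bits == 0

    def pulse_scl(self) -> None:
        self.clock_pulses += 1
        if self.stuck_bits > 0:
            self.stuck_bits -= 1


def recover_bus(lines: SimulatedI2CLines, pulses: int = 9) -> bool:
    """Send the nine-clock signal, then report whether the data line is free."""
    for _ in range(pulses):
        lines.pulse_scl()
    return lines.sda_is_high()
```

Nine pulses cover the eight data bits plus the acknowledge bit of one byte, which is why a device that is stuck mid-byte is normally released within that count.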
The invention provides a power supply monitoring method that detects whether the communication link between a power supply and a monitoring device for monitoring the working condition of the power supply is interrupted. If the link is not interrupted, the power failure alarm of the monitoring device is determined to be valid; if the link is interrupted, the power failure alarm of the monitoring device is determined to be invalid, and the communication ports of the power supply and the monitoring device and the communication bus between them are reset to repair the communication link. Therefore, when communication between the monitoring device and the power supply is interrupted, the alarm of the monitoring device is identified as a false alarm and the communication link is repaired, which avoids false alarms caused by the interruption.
On the basis of the above-described embodiment:
as an optional embodiment, the power supply monitoring method further includes:
pre-establishing address information corresponding relation between the address of a register used for storing power failure information and the stored power failure information;
after the operation parameter information of the power supply is analyzed to obtain the actual fault information of the power supply, a target address corresponding to the actual fault information is determined according to the address information corresponding relation, and a preset fault value is written into the target register corresponding to the target address for the monitoring device to read.
It should be noted that "preset" in the present application means set in advance; a preset value only needs to be set once and does not need to be set again unless it has to be modified according to the actual situation.
Further, the present application may also establish in advance a corresponding relation (the address information corresponding relation for short, which may take the form of a table) between the address of each register used for storing power supply fault information and the fault information it stores. That is, the address information corresponding relation indicates which kind of power supply fault information (such as an OVP overvoltage fault or a UVP undervoltage fault) each register stores. On this basis, after the actual fault information of the power supply is obtained by analyzing its operation parameter information, the target address corresponding to that fault information, namely the address of the target register for storing it, can be determined from the address information corresponding relation, and the preset fault value is then written into the target register at that address to indicate that the power supply has the corresponding fault. The monitoring device, in turn, can interact with the power supply to read the registers in the power supply and determine the fault condition of the power supply from their contents.
More specifically, referring to Fig. 4, Fig. 4 is a schematic diagram of power supply fault monitoring according to an embodiment of the present invention. The monitoring device monitors power supply faults as follows: a fault processing module and a register for storing power supply fault information are arranged in the power supply. The fault processing module pre-establishes the address information corresponding relation between the address of the register and the power supply fault information it stores. After analyzing the operation parameter information of the power supply to obtain the actual fault information, it determines the target address corresponding to the actual fault information according to the address information corresponding relation and writes the preset fault value into the target register at that address for the monitoring device to read.
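The register scheme can be illustrated with a small in-memory model. The addresses, the fault value and all names below are hypothetical; the source names OVP and UVP as example faults but gives no concrete register layout.

```python
from typing import Dict, List

# Hypothetical address information corresponding relation: fault -> register address.
FAULT_REGISTER_MAP: Dict[str, int] = {
    "OVP": 0x10,  # overvoltage fault
    "UVP": 0x11,  # undervoltage fault
}
FAULT_VALUE = 0x01  # hypothetical preset fault value


class PowerSupplyRegisters:
    """In-memory model of the fault registers inside the power supply."""

    def __init__(self) -> None:
        self._regs = {addr: 0x00 for addr in FAULT_REGISTER_MAP.values()}

    def report_fault(self, fault: str) -> int:
        # Fault processing module: find the target address for the fault
        # and write the preset fault value into that register.
        addr = FAULT_REGISTER_MAP[fault]
        self._regs[addr] = FAULT_VALUE
        return addr

    def read(self, addr: int) -> int:
        return self._regs[addr]


def active_faults(psu: PowerSupplyRegisters) -> List[str]:
    """Monitoring-device side: read every fault register and list set faults."""
    return [
        name
        for name, addr in FAULT_REGISTER_MAP.items()
        if psu.read(addr) == FAULT_VALUE
    ]
```

Because both sides share the same map, the monitoring device can tell which fault occurred from the register address alone, without parsing any message from the power supply.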
As an optional embodiment, the operation parameter information includes input/output parameter information of the power supply and operation parameter information of key components inside the power supply;
and the power supply monitoring method further comprises the following steps:
when the actual fault information of the power supply is analyzed, the fault analysis condition of the power supply is recorded;
the current operation parameter information of the power supply is periodically acquired, and the future fault condition of the power supply is predicted by combining the fault analysis condition of the historical record.
Further, when the present application analyzes the operation parameter information of the power supply to obtain its actual fault information, it analyzes both the input/output parameter information of the power supply and the operation parameter information of the key components inside the power supply. Analyzing the input/output parameter information reveals the externally visible faults of the power supply; analyzing the operation parameter information of the key internal components reveals faults in the internal structure of the power supply, such as combined faults and out-of-limit voltage, current and temperature of those components.
Therefore, when the actual fault information of the power supply is obtained, the fault analysis result can be recorded and used as the basis for subsequently predicting power supply faults. In addition, the present application periodically acquires the current operation parameter information of the power supply and predicts its future fault condition in combination with the historical fault analysis records.
More specifically, the fault processing module analyzes the operation parameter information of the power supply to obtain its actual fault information, records a fault analysis log, and sends the log to the monitoring device for storage. The monitoring device periodically polls the fault processing module for the current operation parameter information of the power supply and predicts the future fault condition of the power supply in combination with the stored fault analysis logs.
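The source does not specify how the prediction is made. As one possible illustration only, the sketch below flags a likely future fault when the stored log already contains earlier faults of the same kind and the current reading is close to its limit. The rule, the thresholds and every name here are assumptions, not the patented method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaultLogEntry:
    fault: str    # fault kind recorded in the log, e.g. "OVP"
    value: float  # reading that was recorded with the fault


def predict_fault(
    history: List[FaultLogEntry],
    fault: str,
    current: float,
    limit: float,
    margin: float = 0.05,
    min_prior: int = 2,
) -> bool:
    """Hypothetical rule: repeated past faults plus a reading near the limit."""
    # Count how often this fault kind appears in the stored log.
    prior = sum(1 for entry in history if entry.fault == fault)
    # Treat the reading as "near the limit" within the given relative margin.
    near_limit = current >= limit * (1.0 - margin)
    return prior >= min_prior and near_limit
```

A monitoring device could evaluate such a rule on every polling cycle; a real design would likely replace it with trend analysis tuned to the specific power supply.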
As an optional embodiment, the power supply monitoring method further includes:
pre-establishing an index relation corresponding table for searching the power failure type and the failure processing mode according to the power failure information;
after the operation parameter information of the power supply is analyzed to obtain the actual fault information of the power supply, the power supply fault type and the fault processing mode corresponding to the actual fault information are found according to the index relation corresponding table.
Further, the present application can also establish in advance an index relation corresponding table for looking up the power supply fault type and the fault processing mode according to power supply fault information; that is, the index relation corresponding table gives the power supply fault type and the fault processing mode corresponding to any item of power supply fault information. On this basis, after the operation parameter information of the power supply is analyzed to obtain the actual fault information of the power supply, the power supply fault type and the fault processing mode corresponding to the obtained actual fault information can be found according to the index relation corresponding table.
More specifically, the fault processing module analyzes the operation parameter information of the power supply to obtain actual fault information of the power supply, and sends the actual fault information to the monitoring device (such as the BMC). The BMC stores the index relation corresponding table in advance and, after receiving the actual fault information, looks up the power supply fault type and the fault processing mode corresponding to that information according to the table. Operation and maintenance personnel can learn the fault type and how to handle the current fault by remotely accessing the BMC web interface, which saves maintenance cost.
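A minimal sketch of such an index relation corresponding table is shown below. The fault codes, fault types and handling texts are invented for illustration and do not come from the source; only the idea of mapping fault information to a fault type and a fault processing mode does.

```python
from typing import NamedTuple, Optional


class FaultEntry(NamedTuple):
    fault_type: str
    handling: str


# Hypothetical table stored in advance on the monitoring device:
# fault information code -> (fault type, fault processing mode).
FAULT_INDEX = {
    "VIN_UV": FaultEntry("input undervoltage", "check the AC input"),
    "OTP": FaultEntry("over-temperature", "check fans and airflow"),
    "FW_ERR": FaultEntry("firmware anomaly", "upgrade firmware"),
}


def look_up_fault(fault_info: str) -> Optional[FaultEntry]:
    """Return the table entry for the received fault information, if any."""
    return FAULT_INDEX.get(fault_info)
```

For example, `look_up_fault("FW_ERR").handling` yields the handling text that a web interface could display to maintenance personnel, while an unknown code returns `None`.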
As an optional embodiment, the power supply monitoring method further includes:
and when the searched fault processing mode is the firmware upgrading mode, triggering a chip for upgrading the firmware in the power supply to carry out online firmware upgrading.
Further, if the fault processing mode corresponding to the current fault information of the power supply is the firmware upgrading mode, the current fault of the power supply is eliminated by upgrading the firmware of the power supply. The existing upgrading mode is offline: the power supply is taken out of the system, and its firmware is upgraded one unit at a time using a tool set composed of a jig board, a computer, a burner, a USB (Universal Serial Bus) cable, a USB adapter, and a PMBus (Power Management Bus) cable. The present application instead adopts online upgrading: the monitoring device sends an upgrading instruction to the power supply to trigger the chip used for firmware upgrade in the power supply to upgrade the firmware online, which is simple and convenient.
As an alternative embodiment, the chip comprises a first chip core and a second chip core;
correspondingly, the process of triggering a chip for firmware upgrade in a power supply to perform online firmware upgrade comprises the following steps:
detecting whether a first chip core pre-designated to execute firmware upgrading operation fails;
if not, triggering the first chip core to execute firmware upgrading operation;
if yes, triggering the second chip core to execute the firmware upgrading operation.
Specifically, the chip for firmware upgrade is a dual-core chip, that is, it has two chip cores that mirror each other. If one chip core fails, the other chip core can continue to execute the firmware upgrade operation, which ensures that the power supply firmware is upgraded online successfully. It also prevents an abnormality during the firmware upgrade (such as interruption, interference, code errors, or sudden power failure during the upgrade) from causing the upgrade to fail and the system to crash.
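The core selection steps above can be sketched as a small function. The callback parameters (`first_core_failed`, `upgrade_on_first`, `upgrade_on_second`) are hypothetical stand-ins for whatever the real chip interface provides; the source does not describe that interface.

```python
from typing import Callable


def run_online_upgrade(
    first_core_failed: Callable[[], bool],
    upgrade_on_first: Callable[[], None],
    upgrade_on_second: Callable[[], None],
) -> str:
    """Run the upgrade on the pre-designated first core unless it has failed."""
    if not first_core_failed():
        upgrade_on_first()
        return "first"
    # The first core is faulty, so the mirrored second core takes over.
    upgrade_on_second()
    return "second"
```

The return value only reports which core performed the upgrade, which makes the fallback decision easy to log or test.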
The present application further provides a power monitoring system, including:
the first communication fault-tolerant-resisting module is arranged in the power supply and used for resetting a communication port of the power supply when a communication link between the power supply and a monitoring device which is directly connected with the power supply and used for monitoring the working condition of the power supply is interrupted;
the second communication fault-tolerant-resisting module is arranged in the monitoring device and used for detecting whether a communication link between the power supply and the monitoring device is interrupted or not, and if not, determining that the power supply fault alarm of the monitoring device is effective; if so, determining that the power failure alarm of the monitoring device is invalid, and resetting a communication port of the monitoring device and a communication bus between the monitoring device and a power supply so as to repair a communication link.
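The decision made on the monitoring-device side can be sketched as below. The function and parameter names are assumptions for illustration; how the link interruption is actually detected and how the port and bus are reset are not specified in this passage.

```python
from typing import Callable


def handle_fault_alarm(
    link_interrupted: bool,
    reset_port: Callable[[], None],
    reset_bus: Callable[[], None],
) -> bool:
    """Return True if the power supply fault alarm should be treated as valid."""
    if not link_interrupted:
        # The link is healthy, so the alarm reflects a real power supply fault.
        return True
    # The link is broken: treat the alarm as invalid and try to repair the link
    # by resetting the monitoring device's port and the bus to the power supply.
    reset_port()
    reset_bus()
    return False
```

Passing the reset actions as callbacks keeps the sketch independent of any particular bus driver.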
As an alternative embodiment, the power supply monitoring system further comprises:
the register is arranged in the power supply and used for storing power supply fault information;
the fault processing module is arranged in the power supply and used for pre-establishing an address information corresponding relation between the address of each register and the power supply fault information it stores; after the operation parameter information of the power supply is analyzed to obtain the actual fault information of the power supply, a target address corresponding to the actual fault information is determined according to the address information corresponding relation, and a preset fault value is written into the target register at the target address for the monitoring device to read.
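The register mechanism can be illustrated with the following sketch. The register addresses, the fault codes and the preset fault value `0x01` are all hypothetical; the source only states that each fault maps to a register address and that a preset value is written there.

```python
# Hypothetical address information corresponding relation:
# fault information code -> register address.
FAULT_REGISTER_MAP = {"OVP": 0x10, "OCP": 0x11, "OTP": 0x12}
PRESET_FAULT_VALUE = 0x01  # assumed value meaning "fault present"


class RegisterFile:
    """Toy model of the fault registers inside the power supply."""

    def __init__(self) -> None:
        # All fault registers start cleared.
        self._regs = {addr: 0x00 for addr in FAULT_REGISTER_MAP.values()}

    def report_fault(self, fault_info: str) -> int:
        # Fault processing module: find the target address and write the value.
        addr = FAULT_REGISTER_MAP[fault_info]
        self._regs[addr] = PRESET_FAULT_VALUE
        return addr

    def read(self, addr: int) -> int:
        # Monitoring device: read a register to learn whether a fault is set.
        return self._regs[addr]
```

After `report_fault("OTP")`, reading address `0x12` returns the preset fault value while the other registers remain cleared.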
For introduction of the power monitoring system provided in the present application, reference is made to the embodiments of the power monitoring method described above, and details of the power monitoring system are not repeated herein.
The application also provides a server, which comprises a power supply and a monitoring device which is directly connected with the power supply and is used for monitoring the working condition of the power supply; wherein, the power supply is monitored by adopting any one of the power supply monitoring methods.
As an alternative embodiment, the monitoring device is embodied as a BMC within the server.
For the introduction of the server provided in the present application, please refer to the above embodiments of the power monitoring method, which are not described herein again.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A power supply monitoring method, comprising:
detecting whether a communication link between a power supply and a monitoring device, which is directly connected to the power supply and used for monitoring the working condition of the power supply, is interrupted;
if not, determining that the power supply fault alarm of the monitoring device is valid;
if so, determining that the power supply fault alarm of the monitoring device is invalid, and resetting the communication ports of the power supply and of the monitoring device and the communication bus between the two, so as to repair the communication link;
pre-establishing an index relation corresponding table for looking up the power supply fault type and the fault processing mode according to power supply fault information;
after the operation parameter information of the power supply is analyzed to obtain the actual fault information of the power supply, finding the power supply fault type and the fault processing mode corresponding to the actual fault information according to the index relation corresponding table.
2. The power supply monitoring method according to claim 1, wherein the power supply monitoring method further comprises:
pre-establishing an address information corresponding relation between the address of a register for storing power supply fault information and the power supply fault information stored in that register;
after the operation parameter information of the power supply is analyzed to obtain the actual fault information of the power supply, determining a target address corresponding to the actual fault information according to the address information corresponding relation, and writing a preset fault value into the target register corresponding to the target address for the monitoring device to read.
3. The power supply monitoring method according to claim 2, wherein the operation parameter information comprises input and output parameter information of the power supply and operation parameter information of key components inside the power supply;
and the power supply monitoring method further comprises:
when the actual fault information of the power supply is obtained by analysis, recording the fault analysis condition of the power supply;
periodically acquiring the current operation parameter information of the power supply, and predicting the future fault condition of the power supply in combination with the historically recorded fault analysis condition.
4. The power supply monitoring method according to claim 1, wherein the power supply monitoring method further comprises:
when the fault processing mode found is the firmware upgrading mode, triggering a chip used for firmware upgrade in the power supply to perform online firmware upgrade.
5. The power supply monitoring method according to claim 4, wherein the chip comprises a first chip core and a second chip core;
correspondingly, the process of triggering the chip used for firmware upgrade in the power supply to perform online firmware upgrade comprises:
detecting whether the first chip core, which is pre-designated to execute the firmware upgrade operation, has failed;
if not, triggering the first chip core to execute the firmware upgrade operation;
if so, triggering the second chip core to execute the firmware upgrade operation.
6. A power supply monitoring system, comprising:
a first communication fault-tolerant-resisting module arranged in a power supply and used for resetting a communication port of the power supply when it is detected that a communication link between the power supply and a monitoring device, which is directly connected to the power supply and used for monitoring the working condition of the power supply, is interrupted;
a second communication fault-tolerant-resisting module arranged in the monitoring device and used for detecting whether the communication link between the power supply and the monitoring device is interrupted; if not, determining that the power supply fault alarm of the monitoring device is valid; if so, determining that the power supply fault alarm of the monitoring device is invalid, and resetting the communication port of the monitoring device and the communication bus between the monitoring device and the power supply, so as to repair the communication link; pre-establishing an index relation corresponding table for looking up the power supply fault type and the fault processing mode according to power supply fault information; and after the operation parameter information of the power supply is analyzed to obtain the actual fault information of the power supply, finding the power supply fault type and the fault processing mode corresponding to the actual fault information according to the index relation corresponding table.
7. The power supply monitoring system according to claim 6, wherein the power supply monitoring system further comprises:
a register arranged in the power supply and used for storing power supply fault information;
a fault processing module arranged in the power supply and used for pre-establishing an address information corresponding relation between the address of the register and the power supply fault information stored in it; and, after the operation parameter information of the power supply is analyzed to obtain the actual fault information of the power supply, determining a target address corresponding to the actual fault information according to the address information corresponding relation, and writing a preset fault value into the target register corresponding to the target address for the monitoring device to read.
8. A server, comprising a power supply and a monitoring device which is directly connected to the power supply and used for monitoring the working condition of the power supply; wherein the power supply is monitored by the power supply monitoring method according to any one of claims 1 to 5.
9. The server according to claim 8, wherein the monitoring device is specifically a BMC within the server.
CN202010300845.8A 2020-04-16 2020-04-16 Power supply monitoring method, system and server Active CN111488050B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010300845.8A CN111488050B (en) 2020-04-16 2020-04-16 Power supply monitoring method, system and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010300845.8A CN111488050B (en) 2020-04-16 2020-04-16 Power supply monitoring method, system and server

Publications (2)

Publication Number Publication Date
CN111488050A CN111488050A (en) 2020-08-04
CN111488050B true CN111488050B (en) 2022-04-22

Family

ID=71791756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010300845.8A Active CN111488050B (en) 2020-04-16 2020-04-16 Power supply monitoring method, system and server

Country Status (1)

Country Link
CN (1) CN111488050B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113625696B (en) * 2021-08-31 2023-03-24 东风商用车有限公司 Safety processing method and system for overcurrent protection of vehicle-mounted controller
CN117527478A (en) * 2024-01-05 2024-02-06 西安图为电气技术有限公司 Monitoring system for power module and power module management system
CN118112454A (en) * 2024-03-01 2024-05-31 东莞市嘉田电子科技有限公司 A device for diagnosing potential power failure of a server and a method for diagnosing the same

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101102377A (en) * 2007-07-24 2008-01-09 北京意科通信技术有限责任公司 A communication power operation management and alert system and its method
CN102624584A (en) * 2012-03-01 2012-08-01 中兴通讯股份有限公司 Link detection method and link detection device
CN104656531A (en) * 2015-01-16 2015-05-27 张泽 Monitoring method and device for intelligent equipment
CN106292986A (en) * 2016-08-08 2017-01-04 浪潮电子信息产业股份有限公司 A kind of server power supply PSU fault determination method and device
CN106712287A (en) * 2016-11-21 2017-05-24 国家电网公司 Intelligent alarm analysis system of intelligent transformer substation
CN106788712A (en) * 2017-01-11 2017-05-31 山西恒海创盈科技有限公司 Electric power optical cable on-line intelligence monitoring system
CN108399116A (en) * 2018-03-02 2018-08-14 郑州云海信息技术有限公司 A kind of server power-up state monitoring system and method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6949916B2 (en) * 2002-11-12 2005-09-27 Power-One Limited System and method for controlling a point-of-load regulator
CN103792923A (en) * 2014-02-14 2014-05-14 浪潮电子信息产业股份有限公司 Method for detecting and controlling sets of power supplies of main board through digital chips
US10386425B2 (en) * 2014-03-24 2019-08-20 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Method and system for managing power faults
CN105897491A (en) * 2016-06-24 2016-08-24 努比亚技术有限公司 Method and device for filtering invalid monitoring alarm information
CN109885151A (en) * 2019-01-31 2019-06-14 郑州云海信息技术有限公司 A kind of server power monitoring method and system

Also Published As

Publication number Publication date
CN111488050A (en) 2020-08-04

Similar Documents

Publication Publication Date Title
JP6333410B2 (en) Fault processing method, related apparatus, and computer
CN116225812B (en) Baseboard management controller system operation method, device, equipment and storage medium
CN111488050B (en) Power supply monitoring method, system and server
CN111324192A (en) System board power supply detection method, device, equipment and storage medium
CN106789306B (en) Method and system for detecting, collecting and recovering software fault of communication equipment
JP2001325124A (en) Computer, system management support device, and management method
CN106980562A (en) A kind of hard disk monitoring method and device
TW201119173A (en) Method of using power supply to execute remote monitoring of an electronic system
CN114816022A (en) Server power supply abnormity monitoring method, system and storage medium
CN109032863A (en) Determination method, the system of a kind of NVMe solid state hard disk and its failure cause
CN116126772A (en) UART serial port management system and method applied to ARM server
CN113672306B (en) Method, device, system and medium for recovery from abnormal self-checking of server components
CN112069032A (en) Availability detection method, system and related device for virtual machine
CN115129516B (en) A method for handling I2C deadlock problems of PCIe devices and related components
CN116775141A (en) Abnormality detection method, abnormality detection device, computer device, and storage medium
CN114138574B (en) Controller testing methods, apparatus, servers, and storage media
CN114003426B (en) Fault handling method, system and electronic equipment
CN101140540B (en) A method and system for automatically monitoring magnetic array faults
CN100369009C (en) Monitoring system and method using system management interrupt signal
CN114168396B (en) A fault location method and related components
CN116301276A (en) Device and method for detecting status of server power supply module
CN111352789B (en) Alternating current circulation test method and device for server and storage medium
CN114884021A (en) Power supply control method of power supply circuit and related components
CN111539044A (en) Server power firmware write protection control method, device, equipment and storage medium
CN113722185B (en) Domestic computer remote management system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Building 9, No.1, guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Wuzhong District, Jinan City, Shandong Province

Patentee after: Suzhou Yuannao Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: Building 9, No.1, guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Wuzhong District, Jinan City, Shandong Province

Patentee before: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.

Country or region before: China
