CN102693166A - Method, device and system for processing information

Info

Publication number: CN102693166A
Application number: CN2012101431532A
Authority: CN
Inventors: 牛功彪; 赵俊峰; 贾群
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2012-05-10
Filing date: 2012-05-10
Publication date: 2012-09-26
Anticipated expiration: 2032-05-10
Also published as: CN102693166B

Abstract

The invention discloses an information processing method, device and system, belonging to the technical field of communications. The method includes: when a plurality of nodes managed by a device management system (DMS) are running, the DMS acquires the CPU operation information of each node; when a faulty node appears among the plurality of nodes, the DMS identifies the faulty node according to the CPU operation information of each node; and the DMS debugs the faulty node. Because the CPU operation information of each node is obtained online, a faulty node can be found promptly when a node in the system fails and can be debugged online, without being removed from the system for separate debugging. This solves the prior-art problem that debugging a faulty node takes a long time.

Description

Method, device and system for information processing

Technical field

The present invention relates to the field of communication technology, and in particular to a method, device and system for information processing.

Background art

Many large communication systems today are made up of multiple servers. Each server is a node in the system, and the nodes are connected through preset links to form a complete system. Each node has its own processing tasks, and the nodes work in cooperation with one another. In a system composed of multiple nodes, a fault in one node affects the other nodes, so detecting and handling node faults is a problem that needs to be solved.

In the prior art, the temperature of each node is monitored through sensors while the system is running. When the temperature of a node is found to exceed a preset threshold, that node is deemed to have failed. The operation of the system is then interrupted, and the node is taken out of the system and debugged separately to determine the cause of the fault. The node is recovered according to that cause and put back into the system once debugging is complete, after which the system is debugged and run again. Fault detection and recovery of a node in the system are achieved in this way.

In the course of implementing the present invention, the inventors found that the prior art has at least the following problem: when a fault is found in the system, the prior art can only identify the faulty node through sensors and cannot obtain detailed fault information. The operation of the system must be interrupted and the faulty node removed for debugging, and the system can be debugged and run again only after the faulty node has been debugged and reinstalled. This takes a long time and reduces the working efficiency of the system.

Summary of the invention

To solve the prior-art problem that debugging a faulty node takes a long time, embodiments of the present invention provide a method, device and system for information processing. The technical solutions are as follows:

In one aspect, a method for information processing is provided, the method comprising:

when a plurality of nodes managed by a device management system (DMS) are running, the DMS acquires the central processing unit (CPU) operation information of each node;

when a faulty node appears among the plurality of nodes, the DMS identifies the faulty node according to the CPU operation information of each node; and

the DMS debugs the faulty node.

In another aspect, a device management system is provided, the device management system comprising:

an acquisition module, configured to acquire the CPU operation information of each node when a plurality of nodes managed by the DMS are running;

a fault diagnosis module, configured to identify the faulty node according to the CPU operation information of each node when a faulty node appears among the plurality of nodes; and

a fault debugging module, configured to debug the faulty node.

In another aspect, a node is also provided, the node comprising a field programmable gate array (FPGA), a debugging interface XDP and a central processing unit (CPU), wherein the FPGA comprises:

a receiving module, configured to receive the CPU operation information transmitted by the CPU through the XDP; and

an information processing module, configured to convert the interface information (that is, the CPU operation information received through the XDP) into information that the device management system (DMS) can display, and to transmit the converted information to the CPU of an on-board management controller (BMC), so that the CPU of the BMC encapsulates the converted information and transmits the encapsulated information to the DMS.

In another aspect, a system for information processing is also provided, the system comprising the device management system described above and the node described above.

The technical solutions provided by the embodiments of the present invention bring the following beneficial effects: when a plurality of nodes managed by the DMS are running, the DMS acquires the CPU operation information of each node; when a faulty node appears among the plurality of nodes, the DMS identifies the faulty node according to the CPU operation information of each node; and the DMS debugs the faulty node. Because the CPU operation information of each node is obtained online, a faulty node can be found promptly when a node in the system fails and can be debugged online, without being removed for separate debugging. This solves the prior-art problem that debugging a faulty node takes a long time.

Description of drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of the information processing method provided in Embodiment 1 of the present invention;

Fig. 2 is a flowchart of the information processing method provided in Embodiment 2 of the present invention;

Fig. 3 is a schematic diagram of the device management system provided in Embodiment 3 of the present invention;

Fig. 4 is a schematic diagram of the node provided in Embodiment 3 of the present invention;

Fig. 5 is a schematic diagram of the FPGA provided in Embodiment 3 of the present invention;

Fig. 6 is a schematic diagram of the information processing system provided in Embodiment 3 of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

The present invention mainly addresses systems composed of multiple nodes. When the CPUs of one or more nodes fail, it is no longer necessary to remove the individual node from the system, debug it separately, and reassemble the system after debugging. Instead, the CPU operation information of each node is collected and displayed online in real time, and CPU faults are debugged online using the collected information, which shortens system troubleshooting time and improves system maintenance efficiency.

Embodiment 1

Referring to Fig. 1, this embodiment provides a method for information processing, the method comprising:

101. When a plurality of nodes managed by a DMS (Device Management System) are running, the DMS acquires the CPU (Central Processing Unit) operation information of each node;

102. When a faulty node appears among the plurality of nodes, the DMS identifies the faulty node according to the CPU operation information of each node;

103. The DMS debugs the faulty node.
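As a rough illustration, steps 101 to 103 can be sketched as one monitoring cycle. This is only a sketch under stated assumptions: the function `run_dms_cycle`, the callback parameters, the node names and the `error_count` field are all hypothetical and do not appear in the patent, which does not define a software interface.

```python
from typing import Callable, Dict, List

# All names below are hypothetical; the patent does not define a software API.
CpuInfo = Dict[str, int]


def run_dms_cycle(
    read_cpu_info: Dict[str, Callable[[], CpuInfo]],
    is_faulty: Callable[[CpuInfo], bool],
    debug_node: Callable[[str, CpuInfo], None],
) -> List[str]:
    # Step 101: acquire CPU operation information from every running node.
    infos = {node_id: read() for node_id, read in read_cpu_info.items()}
    # Step 102: identify the faulty nodes from that information.
    faulty = [node_id for node_id, info in infos.items() if is_faulty(info)]
    # Step 103: debug each faulty node online, without removing it.
    for node_id in faulty:
        debug_node(node_id, infos[node_id])
    return faulty


debugged: List[str] = []
faulty_nodes = run_dms_cycle(
    read_cpu_info={
        "node-a": lambda: {"error_count": 0},
        "node-b": lambda: {"error_count": 3},
    },
    is_faulty=lambda info: info["error_count"] > 0,  # illustrative rule only
    debug_node=lambda node_id, info: debugged.append(node_id),
)
print(faulty_nodes)  # ['node-b']
```

Passing the acquisition, diagnosis and debugging steps in as callbacks keeps the three steps separate, mirroring the way the method lists them.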

Preferably, each node managed by the DMS includes an FPGA (Field Programmable Gate Array), an XDP (eXtended Debug Port) and a CPU. The XDP is a debugging interface defined by Intel and is used to debug the CPU. Before the DMS acquires the CPU operation information of each node, each node managed by the DMS performs the following steps:

the FPGA of the node receives the CPU operation information transmitted by the CPU of the node through the XDP; and

the FPGA converts the CPU operation information into information that the DMS can display and transmits the converted information to the CPU of the BMC, so that the CPU of the BMC encapsulates the converted information and transmits the encapsulated information to the DMS.

Preferably, transmitting the converted information to the CPU of the BMC comprises:

the FPGA sends interrupt request information to the CPU of the BMC, so that the CPU of the BMC fetches the converted information from the FPGA.
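The interrupt-driven handoff can be pictured as a signal-then-pull pattern: the FPGA does not push the data itself, it only signals the BMC CPU, which then fetches the data. The sketch below models this in software purely for illustration; the classes `Fpga` and `BmcCpu` and their methods are hypothetical names, and a real design would use a hardware interrupt line and bus reads rather than Python callbacks.

```python
from typing import Callable, List, Optional


class Fpga:
    """Toy model of the FPGA side: holds converted data and raises an interrupt."""

    def __init__(self) -> None:
        self._converted: Optional[bytes] = None
        self._irq_handler: Optional[Callable[[], None]] = None

    def attach_irq(self, handler: Callable[[], None]) -> None:
        # The BMC CPU registers the routine to run when the FPGA interrupts it.
        self._irq_handler = handler

    def publish(self, converted: bytes) -> None:
        # Store the converted information, then signal the BMC CPU.
        self._converted = converted
        if self._irq_handler is not None:
            self._irq_handler()

    def take(self) -> Optional[bytes]:
        # Called by the BMC CPU: hand over the data and clear the slot.
        data, self._converted = self._converted, None
        return data


class BmcCpu:
    """Toy model of the BMC CPU: pulls data from the FPGA when interrupted."""

    def __init__(self, fpga: Fpga) -> None:
        self.fetched: List[bytes] = []
        self._fpga = fpga
        fpga.attach_irq(self._on_irq)

    def _on_irq(self) -> None:
        data = self._fpga.take()
        if data is not None:
            self.fetched.append(data)


fpga = Fpga()
bmc = BmcCpu(fpga)
fpga.publish(b"cpu-state")
print(bmc.fetched)  # [b'cpu-state']
```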

Preferably, the DMS debugging the faulty node comprises:

the DMS obtains the fault recovery information of the faulty node according to the CPU operation information of the faulty node; and

the DMS transmits the fault recovery information of the faulty node to the FPGA through the CPU of the BMC, so that the FPGA converts the fault recovery information into instructions that the CPU of the faulty node can execute and transmits the instructions to the CPU of the faulty node through the XDP.

The beneficial effects of this embodiment include: when a plurality of nodes managed by the DMS are running, the DMS acquires the CPU operation information of each node; when a faulty node appears among the plurality of nodes, the DMS identifies the faulty node according to the CPU operation information of each node; and the DMS debugs the faulty node. Because the CPU operation information of each node is obtained online, a faulty node can be found promptly when a node in the system fails and can be debugged online, without being removed for separate debugging. This solves the prior-art problem that debugging a faulty node takes a long time.

Embodiment 2

This embodiment provides a method for information processing in which a node includes an FPGA, an XDP and a CPU. The XDP signals on the mainboard are converted by FPGA logic, and the converted signals are then passed to the CPU of the BMC in some manner. The CPU of the BMC encapsulates the information in the IPMI protocol format, and the online handling of CPU faults is then carried out over the channel between the BMC and the DMS. The BMC may be a module integrated on the node or an independent module; this embodiment does not impose a specific limitation. Referring to Fig. 2, the information processing method in this embodiment specifically comprises:

201. When a plurality of nodes managed by the DMS are running, the DMS acquires the CPU operation information of each node.

In this embodiment, for each node managed by the DMS, the FPGA device in the node is attached to the local bus of the BMC, and the XDP signals on the mainboard are parsed by FPGA logic, so that the XDP debugging interface of the node CPU and the FPGA can communicate with each other. Before the DMS acquires the CPU operation information of each node, each node managed by the DMS performs the following steps: the FPGA of the node receives the CPU operation information transmitted by the CPU of the node through the XDP; the FPGA converts this interface information into information that the DMS can display and transmits the converted information to the CPU of the BMC, so that the CPU of the BMC encapsulates the converted information and transmits the encapsulated information to the DMS through an FE PHY (Fast Ethernet PHY).

In this embodiment, in a specific implementation, a data receiving area OUT, a data buffer 1, a data buffer 2 and a data receiving area IN are set up on the FPGA. Data receiving area OUT stores the data that enters the FPGA from the XDP of the node CPU. Data buffer 1 stores the data obtained after the FPGA parses the data in data receiving area OUT. Data buffer 2 stores the commands that the DMS issues to the node, that is, the fault recovery information. Data receiving area IN stores the data obtained after the FPGA converts the data in data buffer 2 into a form the CPU understands. Based on this division of data areas, acquiring the CPU operation information of each node in the system specifically comprises:
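The four data areas can be modelled as a small data structure to make their roles and directions explicit. This is a minimal sketch, assuming queue-like storage; the class name `FpgaDataAreas`, the field names and the use of `deque` are illustrative choices, not details given in the patent.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class FpgaDataAreas:
    # Uplink direction (node CPU -> DMS)
    receive_out: Deque[bytes] = field(default_factory=deque)  # raw data from the node CPU's XDP
    buffer_1: Deque[str] = field(default_factory=deque)       # parsed data awaiting the BMC CPU
    # Downlink direction (DMS -> node CPU)
    buffer_2: Deque[str] = field(default_factory=deque)       # commands (fault recovery information)
    receive_in: Deque[bytes] = field(default_factory=deque)   # commands converted for the node CPU


areas = FpgaDataAreas()
areas.receive_out.append(b"\x01\x02")  # CPU operation information arrives over XDP
areas.buffer_2.append("recover")       # a DMS command arrives through the BMC CPU
print(len(areas.receive_out), len(areas.buffer_1), len(areas.buffer_2), len(areas.receive_in))  # 1 0 1 0
```

Note that, despite its name, area OUT receives data coming into the FPGA from the CPU (on its way out to the DMS), while area IN holds data going into the CPU.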

1) While the system is running, the XDP debugging interface information of the node CPU enters data receiving area OUT of the FPGA;

Here, the XDP debugging interface information of the node CPU is the CPU operation information.

2) The FPGA performs protocol parsing on the data in data receiving area OUT and puts the parsed data into data buffer 1;

Here, performing protocol parsing on the data in data receiving area OUT is the process of converting the CPU operation information into information that the DMS can display. While the system is running, the operation information of the node CPU is binary code. If this code were displayed directly on the DMS, it could appear garbled and the DMS could not recognize it. The FPGA therefore needs to convert the CPU operation information into a form that the DMS can display.

3) The FPGA sends interrupt request information to the CPU of the BMC. After receiving the interrupt request, the CPU of the BMC fetches the data from data buffer 1, encapsulates it in the IPMI protocol format, and transmits it to the DMS over the channel between the CPU and the FE PHY. The DMS processes and displays the information packet.

Here, the sequence in which the FPGA sends the interrupt request to the CPU of the BMC, and the CPU of the BMC fetches the data from data buffer 1, encapsulates it in the IPMI protocol format and transmits it to the DMS over the channel between the CPU and the FE PHY, is the process described above of transmitting the parsed information to the CPU of the BMC, with the CPU of the BMC encapsulating the converted information and transmitting the encapsulated information to the DMS through the FE PHY.
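The uplink path in steps 1) to 3) can be sketched end to end as follows. Every encoding here is a stand-in chosen for illustration: hex text replaces the unspecified FPGA protocol parsing, and a JSON envelope replaces the IPMI encapsulation, whose actual message format is not reproduced. The function names and the node identifier are likewise hypothetical.

```python
import json
from typing import Dict, List


def fpga_parse(raw: bytes) -> str:
    # Step 2: turn binary CPU operation information into displayable text.
    # Hex encoding is a stand-in; the patent does not specify the conversion.
    return raw.hex()


def bmc_encapsulate(node_id: str, parsed: str) -> bytes:
    # Step 3: wrap the parsed data for transport to the DMS.
    # JSON is a placeholder envelope, NOT the real IPMI message format.
    return json.dumps({"node": node_id, "cpu_info": parsed}).encode("utf-8")


def dms_display(packet: bytes) -> str:
    # The DMS unpacks the message and renders it for the operator.
    msg: Dict[str, str] = json.loads(packet.decode("utf-8"))
    return f"{msg['node']}: {msg['cpu_info']}"


receive_out: List[bytes] = [b"\x00\xff\x10"]                 # step 1: data arrives over XDP
buffer_1 = [fpga_parse(raw) for raw in receive_out]          # step 2: parse into data buffer 1
packets = [bmc_encapsulate("node-a", p) for p in buffer_1]   # step 3: after the interrupt
print([dms_display(p) for p in packets])  # ['node-a: 00ff10']
```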

202. When a faulty node appears among the plurality of nodes, the DMS identifies the faulty node according to the CPU operation information of each node.

In this embodiment, the CPU operation information of each node in the system is acquired in real time while the system is running. When one or more nodes in the system fail, the DMS analyzes the CPU operation information of each node and finds the one or more faulty nodes. Specifically, the analysis includes comparing the operation result in the operation information with a preset operation result; if the operation result exceeds the preset operation result, the node is determined to be faulty. Analyzing CPU operation information to determine whether a CPU has failed belongs to the prior art and is not described further in this embodiment.
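The comparison rule can be written as a short filter. This is a minimal sketch, assuming each node reports a single numeric operation result and that "exceeds" means strictly greater than the preset value; the function name and sample values are hypothetical.

```python
from typing import Dict, List


def find_faulty_nodes(results: Dict[str, int], preset: int) -> List[str]:
    """Return the IDs of nodes whose operation result exceeds the preset result."""
    return [node for node, value in results.items() if value > preset]


results = {"node-a": 40, "node-b": 75, "node-c": 50}
print(find_faulty_nodes(results, preset=50))  # ['node-b']
```

A result equal to the preset value is not flagged under this reading; a real system would need to settle that boundary case explicitly.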

203. The DMS debugs the faulty node.

In this embodiment, after finding the faulty node, the DMS switches to the faulty node, analyzes the cause of the fault, and issues an instruction to the faulty node according to that cause. The instruction contains the fault recovery information of the faulty node and is used to debug the CPU of the faulty node online, so that the faulty node can resume normal operation after running the fault recovery information. Debugging the faulty node specifically comprises: the DMS obtains the fault recovery information of the faulty node according to the CPU operation information of the faulty node; the DMS transmits the fault recovery information to the FPGA through the CPU of the BMC, so that the FPGA converts the fault recovery information into instructions that the CPU of the faulty node can execute and transmits the instructions to the CPU of the faulty node through the XDP. In this step, the CPU of the node cannot directly run the instruction issued by the DMS, so the instruction must be converted into one that the node CPU can execute. This embodiment uses the FPGA for the conversion, but the conversion is not limited to this approach, and this embodiment does not impose a specific limitation.

In a specific implementation, based on the data areas divided on the FPGA, the information flow by which the DMS debugs the faulty node after finding it runs in the opposite direction to the information flow in step 201. Specifically, the information flow comprises:

1) The DMS issues a command, encapsulates it in the IPMI protocol format, and passes the encapsulated command to the CPU of the BMC;

Here, the command issued by the DMS contains the fault recovery information of the node.

2) The CPU of the BMC parses the IPMI-format information and transmits the parsed information to the FPGA over the channel between the BMC CPU and the FPGA, and the FPGA puts the received parsed information into data buffer 2;

3) The FPGA converts the data in data buffer 2 into XDP protocol format and puts the converted content into data receiving area IN. The data in data receiving area IN then enters the node CPU through the XDP, so that the CPU executes the command and the node resumes normal operation.
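The downlink path in steps 1) to 3) can be sketched as the reverse of the uplink. As before, the encodings are placeholders: a JSON envelope stands in for the IPMI encapsulation and plain ASCII bytes stand in for the XDP protocol format, neither of which is specified in the patent. The function names and the command string `RESET_CORE` are hypothetical.

```python
import json
from typing import List


def dms_encapsulate(node_id: str, recovery_cmd: str) -> bytes:
    # Step 1: the DMS wraps the fault recovery command (JSON stands in for IPMI).
    return json.dumps({"node": node_id, "cmd": recovery_cmd}).encode("utf-8")


def bmc_parse(packet: bytes) -> str:
    # Step 2: the BMC CPU unwraps the message and forwards the bare command.
    return json.loads(packet.decode("utf-8"))["cmd"]


def fpga_to_xdp(command: str) -> bytes:
    # Step 3: the FPGA converts the command into the form sent over XDP.
    # ASCII bytes are a placeholder; the real XDP encoding is not given.
    return command.encode("ascii")


buffer_2: List[str] = [bmc_parse(dms_encapsulate("node-b", "RESET_CORE"))]
receive_in: List[bytes] = [fpga_to_xdp(cmd) for cmd in buffer_2]
print(receive_in)  # [b'RESET_CORE']
```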

204. After debugging of the faulty node is completed, the system works normally.

In this step, after debugging of the faulty node is completed and the node is working normally, the system also works normally and continues to acquire the CPU operation information of each node, repeating steps 201-204. This is not described further in this embodiment.

The beneficial effects of this embodiment include: when a plurality of nodes managed by the DMS are running, the DMS acquires the CPU operation information of each node; when a faulty node appears among the plurality of nodes, the DMS identifies the faulty node according to the CPU operation information of each node; and the DMS debugs the faulty node. Because the CPU operation information of each node is obtained online, a faulty node can be found promptly when a node in the system fails and can be debugged online, without being removed for separate debugging. This solves the prior-art problem that debugging a faulty node takes a long time. In addition, implementing the communication between the XDP of the node and the BMC through the FPGA is simple and highly practicable.

Embodiment 3

Referring to Fig. 3, this embodiment provides a device management system, the device management system comprising an acquisition module 301, a fault diagnosis module 302 and a fault debugging module 303.

The acquisition module 301 is configured to acquire the CPU operation information of each node when a plurality of nodes managed by the DMS are running;

The fault diagnosis module 302 is configured to identify the faulty node according to the CPU operation information of each node when a faulty node appears among the plurality of nodes;

The fault debugging module 303 is configured to debug the faulty node.

Preferably, the fault debugging module 303 comprises:

an obtaining unit, configured to obtain the fault recovery information of the faulty node according to the CPU operation information of the faulty node; and

a command issuing unit, configured to transmit the fault recovery information of the faulty node to the FPGA through the CPU of the BMC, so that the FPGA converts the fault recovery information into instructions that the CPU of the faulty node can execute and transmits the instructions to the CPU of the faulty node through the XDP.

Referring to Fig. 4, this embodiment also provides a node, the node comprising a field programmable gate array (FPGA) 100, a debugging interface XDP 200 and a central processing unit (CPU) 300. Referring to Fig. 5, the FPGA comprises a receiving module 401 and an information processing module 402.

The receiving module 401 is configured to receive the CPU operation information transmitted by the CPU through the XDP;

The information processing module 402 is configured to convert the interface information (that is, the CPU operation information received through the XDP) into information that the DMS can display, and to transmit the converted information to the CPU of the BMC, so that the CPU of the BMC encapsulates the converted information and transmits the encapsulated information to the DMS.

Preferably, the information processing module 402 comprises:

a first sending unit, configured to send interrupt request information to the CPU of the BMC, so that the CPU of the BMC fetches the converted information from the FPGA.

Preferably, the information processing module 402 further comprises:

a receiving unit, configured to receive the fault recovery information of the node transmitted by the DMS through the CPU of the BMC;

a conversion unit, configured to convert the fault recovery information into instructions that the CPU of the node can execute; and

a second sending unit, configured to send the instructions to the CPU through the XDP.

The FPGA includes four data areas: data receiving area OUT, data buffer 1, data buffer 2 and data receiving area IN. Data receiving area OUT stores the data that enters the FPGA from the XDP of the CPU. Data buffer 1 stores the data obtained after the FPGA parses the data in data receiving area OUT. Data buffer 2 stores the commands that the DMS issues to the node, that is, the fault recovery information. Data receiving area IN stores the data obtained after the FPGA converts the data in data buffer 2 into a form the CPU understands.

Referring to Fig. 6, this embodiment also provides a system for information processing, the system comprising the device management system 400 described above and the node 500 described above.

The device management system, node and system provided in this embodiment belong to the same concept as the method embodiments; for their specific implementation, see the method embodiments, which are not repeated here.

A person of ordinary skill in the art will understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.

The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for information processing, characterized in that the method comprises:

When a plurality of nodes managed by a device management system DMS are running, the DMS acquires the CPU operation information of each node;

When a faulty node appears in the plurality of nodes, the DMS identifies the faulty node according to the CPU operation information of each node;

The DMS debugs the faulty node.

2. The method according to claim 1, wherein each node managed by the DMS includes a Field Programmable Gate Array FPGA, a debugging interface XDP and a CPU, and before the DMS acquires the CPU operation information of each node, each node managed by the DMS performs the following steps:

The FPGA of the node receives the CPU operation information transmitted by the CPU of the node through the debugging interface XDP;

The FPGA converts the CPU operation information into information that the DMS can display, and transmits the converted information to the CPU of the on-board management controller BMC, so that the CPU of the BMC encapsulates the converted information and transmits the encapsulated information to the DMS.

3. The method according to claim 2, wherein transmitting the converted information to the CPU of the BMC comprises:

The FPGA sends interrupt request information to the CPU of the BMC, so that the CPU of the BMC takes out the converted information from the FPGA.

4. The method according to any one of claims 1-3, wherein the DMS debugging the faulty node includes:

The DMS acquires fault recovery information of the faulty node according to the CPU information of the faulty node;

The DMS transmits the fault recovery information of the faulty node to the FPGA through the CPU of the BMC, so that the FPGA converts the fault recovery information into instructions that the CPU of the faulty node can execute and transmits the instructions to the CPU of the faulty node through XDP.

5. A device management system DMS, characterized in that the device management system includes:

An acquisition module, configured to acquire the CPU operation information of each node when multiple nodes managed by the DMS are in operation;

A fault diagnosis module, configured to obtain the faulty node according to the CPU operation information of each node when a faulty node appears in the plurality of nodes;

A fault debugging module, configured to debug the faulty node.

6. The device management system according to claim 5, wherein the fault debugging module comprises:

an obtaining unit, configured to obtain fault recovery information of the faulty node according to the CPU operation information of the faulty node;

A command issuing unit, configured to transmit the fault recovery information of the faulty node to the Field Programmable Gate Array FPGA through the CPU of the on-board management controller BMC, so that the FPGA converts the fault recovery information into instructions that the CPU of the faulty node can execute and transmits the instructions to the CPU of the faulty node through XDP.

7. A node, characterized in that the node comprises: a Field Programmable Gate Array FPGA, a debugging interface XDP and a central processing unit CPU, wherein the FPGA comprises:

a receiving module, configured to receive CPU operation information transmitted by the CPU through the XDP;

An information processing module, configured to convert the interface information into information that can be displayed by the device management system DMS, and to transmit the converted information to the CPU of the on-board management controller BMC, so that the CPU of the BMC encapsulates the converted information and transmits the encapsulated information to the DMS.

8. The node according to claim 7, wherein the information processing module comprises:

a first sending unit, configured to send interrupt request information to the CPU of the BMC, so that the CPU of the BMC fetches the converted information from the FPGA.

9. The node according to claim 7 or 8, wherein the information processing module further comprises:

a receiving unit, configured to receive the fault recovery information of the node transmitted by the DMS through the CPU of the BMC;

a conversion unit, configured to convert the fault recovery information into instructions executable by the CPU;

a second sending unit, configured to send the instructions to the CPU of the node through the XDP.

10. An information processing system, characterized in that the system comprises: the device management system DMS according to any one of claims 5-6 and the node according to any one of claims 7-9.